๐Ÿšดโ€โ™‚๏ธ
TIL
  • MAIN
  • : TIL?
  • : WIL
  • : Plan
  • : Retrospective
    • 21Y
      • Wait a moment!
      • 9M 2W
      • 9M1W
      • 8M4W
      • 8M3W
      • 8M2W
      • 8M1W
      • 7M4W
      • 7M3W
      • 7M2W
      • 7M1W
      • 6M5W
      • 1H
    • ์ƒˆ์‚ฌ๋žŒ ๋˜๊ธฐ ํ”„๋กœ์ ํŠธ
      • 2ํšŒ์ฐจ
      • 1ํšŒ์ฐจ
  • TIL : ML
    • Paper Analysis
      • BERT
      • Transformer
    • Boostcamp 2st
      • [S]Data Viz
        • (4-3) Seaborn ์‹ฌํ™”
        • (4-2) Seaborn ๊ธฐ์ดˆ
        • (4-1) Seaborn ์†Œ๊ฐœ
        • (3-4) More Tips
        • (3-3) Facet ์‚ฌ์šฉํ•˜๊ธฐ
        • (3-2) Color ์‚ฌ์šฉํ•˜๊ธฐ
        • (3-1) Text ์‚ฌ์šฉํ•˜๊ธฐ
        • (2-3) Scatter Plot ์‚ฌ์šฉํ•˜๊ธฐ
        • (2-2) Line Plot ์‚ฌ์šฉํ•˜๊ธฐ
        • (2-1) Bar Plot ์‚ฌ์šฉํ•˜๊ธฐ
        • (1-3) Python๊ณผ Matplotlib
        • (1-2) ์‹œ๊ฐํ™”์˜ ์š”์†Œ
        • (1-1) Welcome to Visualization (OT)
      • [P]MRC
        • (2๊ฐ•) Extraction-based MRC
        • (1๊ฐ•) MRC Intro & Python Basics
      • [P]KLUE
        • (5๊ฐ•) BERT ๊ธฐ๋ฐ˜ ๋‹จ์ผ ๋ฌธ์žฅ ๋ถ„๋ฅ˜ ๋ชจ๋ธ ํ•™์Šต
        • (4๊ฐ•) ํ•œ๊ตญ์–ด BERT ์–ธ์–ด ๋ชจ๋ธ ํ•™์Šต
        • [NLP] ๋ฌธ์žฅ ๋‚ด ๊ฐœ์ฒด๊ฐ„ ๊ด€๊ณ„ ์ถ”์ถœ
        • (3๊ฐ•) BERT ์–ธ์–ด๋ชจ๋ธ ์†Œ๊ฐœ
        • (2๊ฐ•) ์ž์—ฐ์–ด์˜ ์ „์ฒ˜๋ฆฌ
        • (1๊ฐ•) ์ธ๊ณต์ง€๋Šฅ๊ณผ ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ
      • [U]Stage-CV
      • [U]Stage-NLP
        • 7W Retrospective
        • (10๊ฐ•) Advanced Self-supervised Pre-training Models
        • (09๊ฐ•) Self-supervised Pre-training Models
        • (08๊ฐ•) Transformer (2)
        • (07๊ฐ•) Transformer (1)
        • 6W Retrospective
        • (06๊ฐ•) Beam Search and BLEU score
        • (05๊ฐ•) Sequence to Sequence with Attention
        • (04๊ฐ•) LSTM and GRU
        • (03๊ฐ•) Recurrent Neural Network and Language Modeling
        • (02๊ฐ•) Word Embedding
        • (01๊ฐ•) Intro to NLP, Bag-of-Words
        • [ํ•„์ˆ˜ ๊ณผ์ œ 4] Preprocessing for NMT Model
        • [ํ•„์ˆ˜ ๊ณผ์ œ 3] Subword-level Language Model
        • [ํ•„์ˆ˜ ๊ณผ์ œ2] RNN-based Language Model
        • [์„ ํƒ ๊ณผ์ œ] BERT Fine-tuning with transformers
        • [ํ•„์ˆ˜ ๊ณผ์ œ] Data Preprocessing
      • Mask Wear Image Classification
        • 5W Retrospective
        • Report_Level1_6
        • Performance | Review
        • DAY 11 : HardVoting | MultiLabelClassification
        • DAY 10 : Cutmix
        • DAY 9 : Loss Function
        • DAY 8 : Baseline
        • DAY 7 : Class Imbalance | Stratification
        • DAY 6 : Error Fix
        • DAY 5 : Facenet | Save
        • DAY 4 : VIT | F1_Loss | LrScheduler
        • DAY 3 : DataSet/Lodaer | EfficientNet
        • DAY 2 : Labeling
        • DAY 1 : EDA
        • 2_EDA Analysis
      • [P]Stage-1
        • 4W Retrospective
        • (10๊ฐ•) Experiment Toolkits & Tips
        • (9๊ฐ•) Ensemble
        • (8๊ฐ•) Training & Inference 2
        • (7๊ฐ•) Training & Inference 1
        • (6๊ฐ•) Model 2
        • (5๊ฐ•) Model 1
        • (4๊ฐ•) Data Generation
        • (3๊ฐ•) Dataset
        • (2๊ฐ•) Image Classification & EDA
        • (1๊ฐ•) Competition with AI Stages!
      • [U]Stage-3
        • 3W Retrospective
        • PyTorch
          • (10๊ฐ•) PyTorch Troubleshooting
          • (09๊ฐ•) Hyperparameter Tuning
          • (08๊ฐ•) Multi-GPU ํ•™์Šต
          • (07๊ฐ•) Monitoring tools for PyTorch
          • (06๊ฐ•) ๋ชจ๋ธ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ
          • (05๊ฐ•) Dataset & Dataloader
          • (04๊ฐ•) AutoGrad & Optimizer
          • (03๊ฐ•) PyTorch ํ”„๋กœ์ ํŠธ ๊ตฌ์กฐ ์ดํ•ดํ•˜๊ธฐ
          • (02๊ฐ•) PyTorch Basics
          • (01๊ฐ•) Introduction to PyTorch
      • [U]Stage-2
        • 2W Retrospective
        • DL Basic
          • (10๊ฐ•) Generative Models 2
          • (09๊ฐ•) Generative Models 1
          • (08๊ฐ•) Sequential Models - Transformer
          • (07๊ฐ•) Sequential Models - RNN
          • (06๊ฐ•) Computer Vision Applications
          • (05๊ฐ•) Modern CNN - 1x1 convolution์˜ ์ค‘์š”์„ฑ
          • (04๊ฐ•) Convolution์€ ๋ฌด์—‡์ธ๊ฐ€?
          • (03๊ฐ•) Optimization
          • (02๊ฐ•) ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ - MLP (Multi-Layer Perceptron)
          • (01๊ฐ•) ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ๋ณธ ์šฉ์–ด ์„ค๋ช… - Historical Review
        • Assignment
          • [ํ•„์ˆ˜ ๊ณผ์ œ] Multi-headed Attention Assignment
          • [ํ•„์ˆ˜ ๊ณผ์ œ] LSTM Assignment
          • [ํ•„์ˆ˜ ๊ณผ์ œ] CNN Assignment
          • [ํ•„์ˆ˜ ๊ณผ์ œ] Optimization Assignment
          • [ํ•„์ˆ˜ ๊ณผ์ œ] MLP Assignment
      • [U]Stage-1
        • 1W Retrospective
        • AI Math
          • (AI Math 10๊ฐ•) RNN ์ฒซ๊ฑธ์Œ
          • (AI Math 9๊ฐ•) CNN ์ฒซ๊ฑธ์Œ
          • (AI Math 8๊ฐ•) ๋ฒ ์ด์ฆˆ ํ†ต๊ณ„ํ•™ ๋ง›๋ณด๊ธฐ
          • (AI Math 7๊ฐ•) ํ†ต๊ณ„ํ•™ ๋ง›๋ณด๊ธฐ
          • (AI Math 6๊ฐ•) ํ™•๋ฅ ๋ก  ๋ง›๋ณด๊ธฐ
          • (AI Math 5๊ฐ•) ๋”ฅ๋Ÿฌ๋‹ ํ•™์Šต๋ฐฉ๋ฒ• ์ดํ•ดํ•˜๊ธฐ
          • (AI Math 4๊ฐ•) ๊ฒฝ์‚ฌํ•˜๊ฐ•๋ฒ• - ๋งค์šด๋ง›
          • (AI Math 3๊ฐ•) ๊ฒฝ์‚ฌํ•˜๊ฐ•๋ฒ• - ์ˆœํ•œ๋ง›
          • (AI Math 2๊ฐ•) ํ–‰๋ ฌ์ด ๋ญ์˜ˆ์š”?
          • (AI Math 1๊ฐ•) ๋ฒกํ„ฐ๊ฐ€ ๋ญ์˜ˆ์š”?
        • Python
          • (Python 7-2๊ฐ•) pandas II
          • (Python 7-1๊ฐ•) pandas I
          • (Python 6๊ฐ•) numpy
          • (Python 5-2๊ฐ•) Python data handling
          • (Python 5-1๊ฐ•) File / Exception / Log Handling
          • (Python 4-2๊ฐ•) Module and Project
          • (Python 4-1๊ฐ•) Python Object Oriented Programming
          • (Python 3-2๊ฐ•) Pythonic code
          • (Python 3-1๊ฐ•) Python Data Structure
          • (Python 2-4๊ฐ•) String and advanced function concept
          • (Python 2-3๊ฐ•) Conditionals and Loops
          • (Python 2-2๊ฐ•) Function and Console I/O
          • (Python 2-1๊ฐ•) Variables
          • (Python 1-3๊ฐ•) ํŒŒ์ด์ฌ ์ฝ”๋”ฉ ํ™˜๊ฒฝ
          • (Python 1-2๊ฐ•) ํŒŒ์ด์ฌ ๊ฐœ์š”
          • (Python 1-1๊ฐ•) Basic computer class for newbies
        • Assignment
          • [์„ ํƒ ๊ณผ์ œ 3] Maximum Likelihood Estimate
          • [์„ ํƒ ๊ณผ์ œ 2] Backpropagation
          • [์„ ํƒ ๊ณผ์ œ 1] Gradient Descent
          • [ํ•„์ˆ˜ ๊ณผ์ œ 5] Morsecode
          • [ํ•„์ˆ˜ ๊ณผ์ œ 4] Baseball
          • [ํ•„์ˆ˜ ๊ณผ์ œ 3] Text Processing 2
          • [ํ•„์ˆ˜ ๊ณผ์ œ 2] Text Processing 1
          • [ํ•„์ˆ˜ ๊ณผ์ œ 1] Basic Math
    • ๋”ฅ๋Ÿฌ๋‹ CNN ์™„๋ฒฝ ๊ฐ€์ด๋“œ - Fundamental ํŽธ
      • ์ข…ํ•ฉ ์‹ค์Šต 2 - ์บ๊ธ€ Plant Pathology(๋‚˜๋ฌด์žŽ ๋ณ‘ ์ง„๋‹จ) ๊ฒฝ์—ฐ ๋Œ€ํšŒ
      • ์ข…ํ•ฉ ์‹ค์Šต 1 - 120์ข…์˜ Dog Breed Identification ๋ชจ๋ธ ์ตœ์ ํ™”
      • ์‚ฌ์ „ ํ›ˆ๋ จ ๋ชจ๋ธ์˜ ๋ฏธ์„ธ ์กฐ์ • ํ•™์Šต๊ณผ ๋‹ค์–‘ํ•œ Learning Rate Scheduler์˜ ์ ์šฉ
      • Advanced CNN ๋ชจ๋ธ ํŒŒํ—ค์น˜๊ธฐ - ResNet ์ƒ์„ธ์™€ EfficientNet ๊ฐœ์š”
      • Advanced CNN ๋ชจ๋ธ ํŒŒํ—ค์น˜๊ธฐ - AlexNet, VGGNet, GoogLeNet
      • Albumentation์„ ์ด์šฉํ•œ Augmentation๊ธฐ๋ฒ•๊ณผ Keras Sequence ํ™œ์šฉํ•˜๊ธฐ
      • ์‚ฌ์ „ ํ›ˆ๋ จ CNN ๋ชจ๋ธ์˜ ํ™œ์šฉ๊ณผ Keras Generator ๋ฉ”์ปค๋‹ˆ์ฆ˜ ์ดํ•ด
      • ๋ฐ์ดํ„ฐ ์ฆ๊ฐ•์˜ ์ดํ•ด - Keras ImageDataGenerator ํ™œ์šฉ
      • CNN ๋ชจ๋ธ ๊ตฌํ˜„ ๋ฐ ์„ฑ๋Šฅ ํ–ฅ์ƒ ๊ธฐ๋ณธ ๊ธฐ๋ฒ• ์ ์šฉํ•˜๊ธฐ
    • AI School 1st
    • ํ˜„์—… ์‹ค๋ฌด์ž์—๊ฒŒ ๋ฐฐ์šฐ๋Š” Kaggle ๋จธ์‹ ๋Ÿฌ๋‹ ์ž…๋ฌธ
    • ํŒŒ์ด์ฌ ๋”ฅ๋Ÿฌ๋‹ ํŒŒ์ดํ† ์น˜
  • TIL : Python & Math
    • Do It! ์žฅ๊ณ +๋ถ€ํŠธ์ŠคํŠธ๋žฉ: ํŒŒ์ด์ฌ ์›น๊ฐœ๋ฐœ์˜ ์ •์„
      • Relations - ๋‹ค๋Œ€๋‹ค ๊ด€๊ณ„
      • Relations - ๋‹ค๋Œ€์ผ ๊ด€๊ณ„
      • ํ…œํ”Œ๋ฆฟ ํŒŒ์ผ ๋ชจ๋“ˆํ™” ํ•˜๊ธฐ
      • TDD (Test Driven Development)
      • template tags & ์กฐ๊ฑด๋ฌธ
      • ์ •์  ํŒŒ์ผ(static files) & ๋ฏธ๋””์–ด ํŒŒ์ผ(media files)
      • FBV (Function Based View)์™€ CBV (Class Based View)
      • Django ์ž…๋ฌธํ•˜๊ธฐ
      • ๋ถ€ํŠธ์ŠคํŠธ๋žฉ
      • ํ”„๋ก ํŠธ์—”๋“œ ๊ธฐ์ดˆ๋‹ค์ง€๊ธฐ (HTML, CSS, JS)
      • ๋“ค์–ด๊ฐ€๊ธฐ + ํ™˜๊ฒฝ์„ค์ •
    • Algorithm
      • Programmers
        • Level1
          • ์†Œ์ˆ˜ ๋งŒ๋“ค๊ธฐ
          • ์ˆซ์ž ๋ฌธ์ž์—ด๊ณผ ์˜๋‹จ์–ด
          • ์ž์—ฐ์ˆ˜ ๋’ค์ง‘์–ด ๋ฐฐ์—ด๋กœ ๋งŒ๋“ค๊ธฐ
          • ์ •์ˆ˜ ๋‚ด๋ฆผ์ฐจ์ˆœ์œผ๋กœ ๋ฐฐ์น˜ํ•˜๊ธฐ
          • ์ •์ˆ˜ ์ œ๊ณฑ๊ทผ ํŒ๋ณ„
          • ์ œ์ผ ์ž‘์€ ์ˆ˜ ์ œ๊ฑฐํ•˜๊ธฐ
          • ์ง์‚ฌ๊ฐํ˜• ๋ณ„์ฐ๊ธฐ
          • ์ง์ˆ˜์™€ ํ™€์ˆ˜
          • ์ฒด์œก๋ณต
          • ์ตœ๋Œ€๊ณต์•ฝ์ˆ˜์™€ ์ตœ์†Œ๊ณต๋ฐฐ์ˆ˜
          • ์ฝœ๋ผ์ธ  ์ถ”์ธก
          • ํฌ๋ ˆ์ธ ์ธํ˜•๋ฝ‘๊ธฐ ๊ฒŒ์ž„
          • ํ‚คํŒจ๋“œ ๋ˆ„๋ฅด๊ธฐ
          • ํ‰๊ท  ๊ตฌํ•˜๊ธฐ
          • ํฐ์ผ“๋ชฌ
          • ํ•˜์ƒค๋“œ ์ˆ˜
          • ํ•ธ๋“œํฐ ๋ฒˆํ˜ธ ๊ฐ€๋ฆฌ๊ธฐ
          • ํ–‰๋ ฌ์˜ ๋ง์…ˆ
        • Level2
          • ์ˆซ์ž์˜ ํ‘œํ˜„
          • ์ˆœ์œ„ ๊ฒ€์ƒ‰
          • ์ˆ˜์‹ ์ตœ๋Œ€ํ™”
          • ์†Œ์ˆ˜ ์ฐพ๊ธฐ
          • ์†Œ์ˆ˜ ๋งŒ๋“ค๊ธฐ
          • ์‚ผ๊ฐ ๋‹ฌํŒฝ์ด
          • ๋ฌธ์ž์—ด ์••์ถ•
          • ๋ฉ”๋‰ด ๋ฆฌ๋‰ด์–ผ
          • ๋” ๋งต๊ฒŒ
          • ๋•…๋”ฐ๋จน๊ธฐ
          • ๋ฉ€์ฉกํ•œ ์‚ฌ๊ฐํ˜•
          • ๊ด„ํ˜ธ ํšŒ์ „ํ•˜๊ธฐ
          • ๊ด„ํ˜ธ ๋ณ€ํ™˜
          • ๊ตฌ๋ช…๋ณดํŠธ
          • ๊ธฐ๋Šฅ ๊ฐœ๋ฐœ
          • ๋‰ด์Šค ํด๋Ÿฌ์Šคํ„ฐ๋ง
          • ๋‹ค๋ฆฌ๋ฅผ ์ง€๋‚˜๋Š” ํŠธ๋Ÿญ
          • ๋‹ค์Œ ํฐ ์ˆซ์ž
          • ๊ฒŒ์ž„ ๋งต ์ตœ๋‹จ๊ฑฐ๋ฆฌ
          • ๊ฑฐ๋ฆฌ๋‘๊ธฐ ํ™•์ธํ•˜๊ธฐ
          • ๊ฐ€์žฅ ํฐ ์ •์‚ฌ๊ฐํ˜• ์ฐพ๊ธฐ
          • H-Index
          • JadenCase ๋ฌธ์ž์—ด ๋งŒ๋“ค๊ธฐ
          • N๊ฐœ์˜ ์ตœ์†Œ๊ณต๋ฐฐ์ˆ˜
          • N์ง„์ˆ˜ ๊ฒŒ์ž„
          • ๊ฐ€์žฅ ํฐ ์ˆ˜
          • 124 ๋‚˜๋ผ์˜ ์ˆซ์ž
          • 2๊ฐœ ์ดํ•˜๋กœ ๋‹ค๋ฅธ ๋น„ํŠธ
          • [3์ฐจ] ํŒŒ์ผ๋ช… ์ •๋ ฌ
          • [3์ฐจ] ์••์ถ•
          • ์ค„ ์„œ๋Š” ๋ฐฉ๋ฒ•
          • [3์ฐจ] ๋ฐฉ๊ธˆ ๊ทธ๊ณก
          • ๊ฑฐ๋ฆฌ๋‘๊ธฐ ํ™•์ธํ•˜๊ธฐ
        • Level3
          • ๋งค์นญ ์ ์ˆ˜
          • ์™ธ๋ฒฝ ์ ๊ฒ€
          • ๊ธฐ์ง€๊ตญ ์„ค์น˜
          • ์ˆซ์ž ๊ฒŒ์ž„
          • 110 ์˜ฎ๊ธฐ๊ธฐ
          • ๊ด‘๊ณ  ์ œ๊ฑฐ
          • ๊ธธ ์ฐพ๊ธฐ ๊ฒŒ์ž„
          • ์…”ํ‹€๋ฒ„์Šค
          • ๋‹จ์†์นด๋ฉ”๋ผ
          • ํ‘œ ํŽธ์ง‘
          • N-Queen
          • ์ง•๊ฒ€๋‹ค๋ฆฌ ๊ฑด๋„ˆ๊ธฐ
          • ์ตœ๊ณ ์˜ ์ง‘ํ•ฉ
          • ํ•ฉ์Šน ํƒ์‹œ ์š”๊ธˆ
          • ๊ฑฐ์Šค๋ฆ„๋ˆ
          • ํ•˜๋…ธ์ด์˜ ํƒ‘
          • ๋ฉ€๋ฆฌ ๋›ฐ๊ธฐ
          • ๋ชจ๋‘ 0์œผ๋กœ ๋งŒ๋“ค๊ธฐ
        • Level4
    • Head First Python
    • ๋ฐ์ดํ„ฐ ๋ถ„์„์„ ์œ„ํ•œ SQL
    • ๋‹จ ๋‘ ์žฅ์˜ ๋ฌธ์„œ๋กœ ๋ฐ์ดํ„ฐ ๋ถ„์„๊ณผ ์‹œ๊ฐํ™” ๋ฝ€๊ฐœ๊ธฐ
    • Linear Algebra(Khan Academy)
    • ์ธ๊ณต์ง€๋Šฅ์„ ์œ„ํ•œ ์„ ํ˜•๋Œ€์ˆ˜
    • Statistics110
  • TIL : etc
    • [๋”ฐ๋ฐฐ๋Ÿฐ] Kubernetes
    • [๋”ฐ๋ฐฐ๋Ÿฐ] Docker
      • 2. ๋„์ปค ์„ค์น˜ ์‹ค์Šต 1 - ํ•™์ŠตํŽธ(์ค€๋น„๋ฌผ/์‹ค์Šต ์œ ํ˜• ์†Œ๊ฐœ)
      • 1. ์ปจํ…Œ์ด๋„ˆ์™€ ๋„์ปค์˜ ์ดํ•ด - ์ปจํ…Œ์ด๋„ˆ๋ฅผ ์“ฐ๋Š”์ด์œ  / ์ผ๋ฐ˜ํ”„๋กœ๊ทธ๋žจ๊ณผ ์ปจํ…Œ์ด๋„ˆํ”„๋กœ๊ทธ๋žจ์˜ ์ฐจ์ด์ 
      • 0. ๋“œ๋””์–ด ์ฐพ์•„์˜จ Docker ๊ฐ•์˜! ์™•์ดˆ๋ณด์—์„œ ๋„์ปค ๋งˆ์Šคํ„ฐ๋กœ - OT
    • CoinTrading
      • [๊ฐ€์ƒ ํ™”ํ ์ž๋™ ๋งค๋งค ํ”„๋กœ๊ทธ๋žจ] ๋ฐฑํ…Œ์ŠคํŒ… : ๊ฐ„๋‹จํ•œ ํ…Œ์ŠคํŒ…
    • Gatsby
      • 01 ๊นƒ๋ถ ํฌ๊ธฐ ์„ ์–ธ
  • TIL : Project
    • Mask Wear Image Classification
    • Project. GARIGO
  • 2021 TIL
    • CHANGED
    • JUN
      • 30 Wed
      • 29 Tue
      • 28 Mon
      • 27 Sun
      • 26 Sat
      • 25 Fri
      • 24 Thu
      • 23 Wed
      • 22 Tue
      • 21 Mon
      • 20 Sun
      • 19 Sat
      • 18 Fri
      • 17 Thu
      • 16 Wed
      • 15 Tue
      • 14 Mon
      • 13 Sun
      • 12 Sat
      • 11 Fri
      • 10 Thu
      • 9 Wed
      • 8 Tue
      • 7 Mon
      • 6 Sun
      • 5 Sat
      • 4 Fri
      • 3 Thu
      • 2 Wed
      • 1 Tue
    • MAY
      • 31 Mon
      • 30 Sun
      • 29 Sat
      • 28 Fri
      • 27 Thu
      • 26 Wed
      • 25 Tue
      • 24 Mon
      • 23 Sun
      • 22 Sat
      • 21 Fri
      • 20 Thu
      • 19 Wed
      • 18 Tue
      • 17 Mon
      • 16 Sun
      • 15 Sat
      • 14 Fri
      • 13 Thu
      • 12 Wed
      • 11 Tue
      • 10 Mon
      • 9 Sun
      • 8 Sat
      • 7 Fri
      • 6 Thu
      • 5 Wed
      • 4 Tue
      • 3 Mon
      • 2 Sun
      • 1 Sat
    • APR
      • 30 Fri
      • 29 Thu
      • 28 Wed
      • 27 Tue
      • 26 Mon
      • 25 Sun
      • 24 Sat
      • 23 Fri
      • 22 Thu
      • 21 Wed
      • 20 Tue
      • 19 Mon
      • 18 Sun
      • 17 Sat
      • 16 Fri
      • 15 Thu
      • 14 Wed
      • 13 Tue
      • 12 Mon
      • 11 Sun
      • 10 Sat
      • 9 Fri
      • 8 Thu
      • 7 Wed
      • 6 Tue
      • 5 Mon
      • 4 Sun
      • 3 Sat
      • 2 Fri
      • 1 Thu
    • MAR
      • 31 Wed
      • 30 Tue
      • 29 Mon
      • 28 Sun
      • 27 Sat
      • 26 Fri
      • 25 Thu
      • 24 Wed
      • 23 Tue
      • 22 Mon
      • 21 Sun
      • 20 Sat
      • 19 Fri
      • 18 Thu
      • 17 Wed
      • 16 Tue
      • 15 Mon
      • 14 Sun
      • 13 Sat
      • 12 Fri
      • 11 Thu
      • 10 Wed
      • 9 Tue
      • 8 Mon
      • 7 Sun
      • 6 Sat
      • 5 Fri
      • 4 Thu
      • 3 Wed
      • 2 Tue
      • 1 Mon
    • FEB
      • 28 Sun
      • 27 Sat
      • 26 Fri
      • 25 Thu
      • 24 Wed
      • 23 Tue
      • 22 Mon
      • 21 Sun
      • 20 Sat
      • 19 Fri
      • 18 Thu
      • 17 Wed
      • 16 Tue
      • 15 Mon
      • 14 Sun
      • 13 Sat
      • 12 Fri
      • 11 Thu
      • 10 Wed
      • 9 Tue
      • 8 Mon
      • 7 Sun
      • 6 Sat
      • 5 Fri
      • 4 Thu
      • 3 Wed
      • 2 Tue
      • 1 Mon
    • JAN
      • 31 Sun
      • 30 Sat
      • 29 Fri
      • 28 Thu
      • 27 Wed
      • 26 Tue
      • 25 Mon
      • 24 Sun
      • 23 Sat
      • 22 Fri
      • 21 Thu
      • 20 Wed
      • 19 Tue
      • 18 Mon
      • 17 Sun
      • 16 Sat
      • 15 Fri
      • 14 Thu
      • 13 Wed
      • 12 Tue
      • 11 Mon
      • 10 Sun
      • 9 Sat
      • 8 Fri
      • 7 Thu
      • 6 Wed
      • 5 Tue
      • 4 Mon
      • 3 Sun
      • 2 Sat
      • 1 Fri
  • 2020 TIL
    • DEC
      • 31 Thu
      • 30 Wed
      • 29 Tue
      • 28 Mon
      • 27 Sun
      • 26 Sat
      • 25 Fri
      • 24 Thu
      • 23 Wed
      • 22 Tue
      • 21 Mon
      • 20 Sun
      • 19 Sat
      • 18 Fri
      • 17 Thu
      • 16 Wed
      • 15 Tue
      • 14 Mon
      • 13 Sun
      • 12 Sat
      • 11 Fri
      • 10 Thu
      • 9 Wed
      • 8 Tue
      • 7 Mon
      • 6 Sun
      • 5 Sat
      • 4 Fri
      • 3 Tue
      • 2 Wed
      • 1 Tue
    • NOV
      • 30 Mon
Powered by GitBook
On this page
  • Recent Trends
  • GPT-1
  • BERT
  • Pre training Tasks in BERT : Masked Language Model
  • Pre training Tasks in BERT : Next Sentence Prediction
  • BERT Summary
  • Fine-tuning Process
  • BERT vs GPT-1
  • GLUE Benchmark Results
  • Machine Reading Comprehension(MRC), Question Anwsering
  • SQuAD 1.1
  • SQuAD 2.0
  • On SWAG
  • Ablation Study

Was this helpful?

  1. TIL : ML
  2. Boostcamp 2st
  3. [U]Stage-NLP

(09๊ฐ•) Self-supervised Pre-training Models

210915, 210916

Previous(10๊ฐ•) Advanced Self-supervised Pre-training ModelsNext(08๊ฐ•) Transformer (2)

Last updated 3 years ago

Was this helpful?

Recent Trends

ํŠธ๋žœ์Šคํฌ๋จธ ๋ฐ self-attention block์€ ๋ฒ”์šฉ์ ์ธ sequence encoder and decoder๋กœ์จ ์ตœ๊ทผ ์ž์—ฐ์–ด์ฒ˜๋ฆฌ์— ๋งŽ์€ ๋ถ„์•ผ์—์„œ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋‚ด๊ณ ์žˆ๋‹ค. ์‹ฌ์ง€์–ด, ๋‹ค๋ฅธ ๋ถ„์•ผ์—์„œ๋„ ํ™œ๋ฐœํ•˜๊ฒŒ ์‚ฌ์šฉ๋œ๋‹ค.

ํŠธ๋žœ์Šคํฌ๋จธ๋Š” ์ด๋Ÿฌํ•œ self-attention block์„ 6๊ฐœ๋งŒ ์Œ“์•˜๋Š”๋ฐ, ์ตœ๊ทผ์˜ ๋ฐœ์ „๋™ํ–ฅ์€ (๋ชจ๋ธ ๊ตฌ์กฐ ์ž์ฒด์˜ ๋ณ€ํ™”๋Š” ์—†์ด) ์ด๋ฅผ ์ ์  ๋” ๋งŽ์ด ์Œ“๊ฒŒ๋˜์—ˆ๋‹ค. ์ด๋ฅผ, ๋Œ€๊ทœ๋ชจ ํ•™์Šต ๋ฐ์ดํ„ฐ๋ฅผ ํ†ตํ•ด์„œ ํ•™์Šตํ•  ๋•Œ self-supervised learning framework๋ฅผ ํ†ตํ•ด ํ•™์Šตํ•˜๊ณ  transfer learning์˜ ํ˜•ํƒœ๋กœ fine tuningํ•ด์„œ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋‚ด๊ณ ์žˆ๋‹ค.

ํŠธ๋žœ์Šคํฌ๋จธ๋Š” ์ถ”์ฒœ ์‹œ์Šคํ…œ, ์‹ ์•ฝ ๊ฐœ๋ฐœ, ์˜์ƒ ์ฒ˜๋ฆฌ๋ถ„์•ผ๊นŒ์ง€๋„ ํ™•์žฅํ•˜๊ณ  ์žˆ์ง€๋งŒ, ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ๋ผ๋Š” ๋ถ„์•ผ์—์„œ๋Š” <sos> ๋ผ๋Š” ํ† ํฐ๋ถ€ํ„ฐ ํ•˜๋‚˜์”ฉ ๋‹จ์–ด๋ฅผ ์ƒ์„ฑํ•œ๋‹ค๋Š” ์ ์—์„œ ๋ฒ—์–ด๋‚˜์ง€ ๋ชปํ•œ๋‹ค๋Š” ํ•œ๊ณ„๊ฐ€ ์žˆ๋‹ค.

GPT-1

ํ…Œ์Šฌ๋ผ์˜ ์ฐฝ์—…์ž ์ผ๋ก  ๋จธ์Šคํฌ๊ฐ€ ์„ธ์šด ๋น„์˜๋ฆฌ ์—ฐ๊ตฌ๊ธฐ๊ด€์ธ Open AI์—์„œ ๋‚˜์˜จ ๋ชจ๋ธ์ด๋‹ค. ์ตœ๊ทผ์— GPT-2์™€ 3๊นŒ์ง€ ์ด์–ด์ ธ ๋†€๋ผ์šด ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ๊ณ  ์žˆ๋‹ค.

๋‹ค์–‘ํ•œ Special Token์„ ์ œ์•ˆํ•ด์„œ, ๋‹จ์ˆœํ•œ ์–ธ์–ด ๋ชจ๋ธ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๋‹ค์–‘ํ•œ ์–ธ์–ด ๋ชจ๋ธ์„ ๋™์‹œ์— ์ปค๋ฒ„ํ•˜๋Š” ํ†ตํ•ฉ๋œ ๋ชจ๋ธ์„ ์ œ์•ˆํ–ˆ๋‹ค๋Š” ๊ฒƒ์ด ํŠน์ง•์ด๋‹ค.

๊ธฐ๋ณธ์ ์œผ๋กœ GPT-1์˜ ๋ชจ๋ธ๊ตฌ์กฐ์™€ ํ•™์Šต๋ฐฉ์‹์— ๋Œ€ํ•ด ์•Œ์•„๋ณด์ž.

  • ํŠธ๋žœ์Šคํฌ๋จธ์™€ ๋ชจ์–‘์€ ๋‹ฌ๋ผ๋„, Text์— Position Embedding์„ ๋”ํ•œ ๊ฐ’์ด ์ž…๋ ฅ์œผ๋กœ ๋“ค์–ด๊ฐ€๋ฉฐ, self-attention์„ ์Œ“์€ ์ธต์ด 12๊ฐœ์ด๋‹ค.

  • ๊ฒฐ๊ณผ๋Š” Text Prediction๊ณผ Text Classifier๋กœ ๋ฐ˜ํ™˜๋œ๋‹ค.

  • Text prediction

    • Predicts the next word at each position, sequentially from the first word.

  • Text Classifier

    • Returns sequence-level results such as sentiment classification.

๋‹จ์ˆœํ•œ Task ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ, ๋ฌธ์žฅ ๋ ˆ๋ฒจ ๋˜๋Š” ๋‹ค์ˆ˜์˜ ๋ฌธ์žฅ์ด ์กด์žฌํ•˜๋Š” ๊ฒฝ์šฐ์—๋„ ๋ชจ๋ธ์ด ์†์‰ฝ๊ฒŒ ๋ณ€ํ˜• ์—†์ด ํ™œ์šฉ๋  ์ˆ˜ ์žˆ๋„๋ก ํ•™์Šต์˜ framework๋ฅผ ์ œ์‹œํ–ˆ๋‹ค.

  • ์šฐ๋ฆฌ๊ฐ€ ์•Œ๊ณ  ์žˆ๋Š” ํ† ํฐ ์ด์™ธ์—๋„ Delim์ด๋‚˜ Extract๋ผ๋Š” ํ† ํฐ์„ ์‚ฌ์šฉํ•˜๋ฉด์„œ ์—ฌ๋Ÿฌ๊ฐ€์ง€ Task๋ฅผ ์ง„ํ–‰ํ•  ์ˆ˜ ์žˆ๋‹ค.

๋งŒ์•ฝ ๋ชจ๋ธ์„ ํ†ตํ•ด์„œ, ์ฃผ์ œ ๋ถ„๋ฅ˜๋ฅผ ํ•ด๋ณธ๋‹ค๊ณ  ํ•˜์ž. (ex ํ•ด๋‹น doc์ด ์ •์น˜, ๊ฒฝ์ œ, ์‚ฌํšŒ, ์Šคํฌ์ธ  ๋ถ„์•ผ ์ค‘ ์–ด๋–ค ๋ถ„์•ผ์ธ์ง€) ์ด ๋•Œ๋Š” ์ด์ „์— ์‚ฌ์šฉํ•˜๋˜ Text Prediction์ด๋‚˜ Task Classifier๋Š” ๋–ผ๋ฒ„๋ฆฌ๊ณ  ๊ทธ ์ „๊นŒ์ง€์˜ output์ธ word๋ณ„ embedding ๋ฒกํ„ฐ๋“ค์„ ์‚ฌ์šฉํ•ด์„œ ์ถ”๊ฐ€์ ์ธ Task์— ๋Œ€ํ•œ ๋ ˆ์ด์–ด๋ฅผ ์ถ”๊ฐ€ํ•ด์„œ ํ•™์Šตํ•œ๋‹ค.

  • ์ด ๋•Œ ๋งˆ์ง€๋ง‰์— ์ถ”๊ฐ€๋˜๋Š” layer๋Š” random initialization์ด ๋˜์—ˆ๊ธฐ ๋•Œ๋ฌธ์— ์ถ”๊ฐ€์ ์œผ๋กœ ํ•™์Šต์„ ํ•˜์ง€๋งŒ, ๊ทธ ์ด์ „๊นŒ์ง€์˜ layer๋“ค์€ ํ•™์Šต์ด ์ด๋ฏธ ๋˜์–ด์žˆ๋Š” ์ƒํƒœ์ด๋‹ค. ๊ทธ๋ž˜์„œ, ์ด์ „ layer๋“ค์—๊ฒŒ๋Š” ํ•™์Šต๋ฅ ์„ ๋งค์šฐ ์ž‘๊ฒŒ์ฃผ๋ฉด์„œ ํฐ ๋ณ€ํ™”๊ฐ€ ์ผ์–ด๋‚˜์ง€ ์•Š๋„๋ก ํ•œ๋‹ค. ๊ทธ๋ž˜์„œ ์ด์ „์— ํ•™์Šตํ•œ ๋‚ด์šฉ์„ ์ž˜ ๋‹ด๊ณ ์žˆ์œผ๋ฉด์„œ ์›ํ•˜๋Š” Task์—๋Š” ์ž˜ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•œ๋‹ค. ์ด๋Š” pre-training๊ณผ fine-tuning์„ ๋™์‹œ์— ์ ์šฉํ•˜๋Š” ๊ณผ์ •์ด๋‹ค.

  • ๋˜, document์˜ class๋ฅผ ๋ถ„๋ฅ˜ํ•˜๋Š” ๋ฌธ์ œ์—์„œ๋Š” ๋ฐ์ดํ„ฐ๊ฐ€ ์ ์„ ์ˆ˜ ๋ฐ–์— ์—†๋‹ค๋ณด๋‹ˆ, ์ด๋Ÿฌํ•œ ๋ฐ์ดํ„ฐ๋ฅผ ๋Š˜๋ ค๋„ ์ข‹์ง€๋งŒ

    ๋˜, ํ…์ŠคํŠธ์˜ class๋ฅผ ๊ตฌ๋ถ„ํ•˜๋ ค๋ฉด labeling์ด ๋˜์–ด์žˆ์–ด์•ผ ํ•˜๋Š”๋ฐ, ์ด๋Ÿฌํ•œ ๋ฐ์ดํ„ฐ์…‹์€ ์ƒ๋Œ€์ ์œผ๋กœ ๊ทธ ์–‘์ด ์ž‘๋‹ค. ๊ทธ๋ž˜์„œ, ์ด์ „์— self-supervised ๋ฐฉ์‹์˜ pre-trained ๋ชจ๋ธ์„ ๋ถˆ๋Ÿฌ์™€์„œ fine tuning ํ•˜๊ฒŒ ๋œ๋‹ค.

์ด๋ ‡๊ฒŒ pre trained ๋œ GPT-1 ์„ ๋‹ค์–‘ํ•œ task์— fine tuningํ–ˆ์„ ๋•Œ์˜ ์„ฑ๋Šฅ์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

  • ๊ฑฐ์˜ ๋Œ€๋ถ€๋ถ„์˜ task์—์„œ ์„ฑ๋Šฅ์ด ํ›จ์”ฌ ์ข‹์€ ๋ชจ์Šต์„ ๋ณด์ธ๋‹ค.

BERT

๋ฒ„ํŠธ ๋ชจ๋ธ์€ ํ˜„์žฌ๊นŒ์ง€๋„ ๋„๋ฆฌ ์“ฐ์ด๋Š” Pre trained ๋ชจ๋ธ์ด๋‹ค. GPT์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ Language ๋ชจ๋ธ๋กœ์จ ๋ฌธ์žฅ์˜ ์ผ๋ถ€ ๋‹จ์–ด๋ฅผ ๋งž์ถ”๋Š” Task์— ๋Œ€ํ•ด Pretrained๋ฅผ ์ˆ˜ํ–‰ํ•œ ๋ชจ๋ธ์ด๋‹ค.

Self-supervised learning ๋ฐฉ์‹์œผ๋กœ ํ•™์Šตํ•˜๋Š” Transformer ์ด์ „์— language ๋ชจ๋ธ ์ค‘์—์„œ๋Š” LSTM ๊ธฐ๋ฐ˜์˜ ์ธ์ฝ”๋”๋กœ Pre-train ํ•˜๋Š” ์ ‘๊ทผ ๊ธฐ๋ฒ•๋„ ์กด์žฌํ–ˆ๋Š”๋ฐ, ์ด๊ฒƒ์ด ๋ฐ”๋กœ ELMo์ด๋‹ค. ์ด๋Ÿฌํ•œ LSTM ๊ธฐ๋ฐ˜์˜ ์ธ์ฝ”๋”๋ฅผ Transformer ๊ธฐ๋ฐ˜์˜ ์ธ์ฝ”๋”๋กœ ๋ฐ”๊พธ๋ฉด์„œ ์—ฌ๋Ÿฌ Task์— ๋Œ€ํ•ด ์ข‹์€ ์„ฑ๋Šฅ์„ ๊ฐ€์ง€๋Š” ๋ชจ๋ธ๋“ค์ด ๋‚˜์™”๋Š”๋ฐ, ๊ทธ ์ค‘ ํ•˜๋‚˜๊ฐ€ BERT์ด๋‹ค.

๊ธฐ์กด์— GPT๋Š” ์ „ํ›„ ๋ฌธ๋งฅ์„ ํŒŒ์•…ํ•˜์ง€ ๋ชปํ•˜๊ณ  ์•ž์ชฝ ๋ฌธ๋งฅ๋งŒ ๋ณด๊ณ  ๋’ท ๋‹จ์–ด๋ฅผ ์˜ˆ์ธกํ•ด์•ผ ํ•œ๋‹ค๋Š” ํ•œ๊ณ„์ ์ด ์กด์žฌํ–ˆ๋‹ค.

  • ์‹ค์ œ ์‚ฌ๋žŒ์˜ ๋Œ€ํ™”์—์„œ๋“ , ํ…์ŠคํŠธ์—์„œ๋“  ๋’ค์ชฝ์—์„œ ๋ฌธ๋งฅ์„ ํŒŒ์•…ํ•˜๋Š” ์ผ์€ ์ž์ฃผ์žˆ๋Š” ์ผ์ด๋‹ค.

Pre-training Tasks in BERT : Masked Language Model

์ฃผ์–ด์ง„ Input์— ๋Œ€ํ•˜์—ฌ ํ™•๋ฅ ์ ์œผ๋กœ ํŠน์ • ํ† ํฐ์„ Maskingํ•˜๊ฒŒ ๋˜๊ณ  ์ด mask๊ฐ€ ์›๋ž˜ ์–ด๋–ค ๋‹จ์–ด์žˆ๋Š”์ง€ ์•Œ์•„๋‚ด๋Š” ๋ฐฉ์‹์œผ๋กœ ํ•™์Šต์ด ์ง„ํ–‰๋˜๊ฒŒ ๋œ๋‹ค.

๋ณดํ†ต์€ 15% ์˜ ๋น„์œจ๋กœ ๋งˆ์Šคํ‚น์„ ์ง„ํ–‰ํ•˜๋Š”๋ฐ ์ด ๋น„์œจ์ด ๋„ˆ๋ฌด ํฌ๋ฉด ์˜ˆ์ธกํ•˜๋Š” ๊ฒƒ์ด ๋„ˆ๋ฌด ์–ด๋ ค์›Œ์ง€๊ณ  ๋„ˆ๋ฌด ์ ์œผ๋ฉด ํ•™์Šต์„ ํ•  ๋•Œ ๋น„์šฉ์ด ์ปค์ง€๊ฒŒ ๋œ๋‹ค.

  • ์ด 15%์˜ ๋น„์œจ์€ BERT๊ฐ€ ์ฐพ์€ ์ตœ์ ์˜ ๋น„์œจ์ด๋‹ค.

์—ฌ๊ธฐ์„œ ์ค‘์š”ํ•œ ๊ฒƒ์€, ๋งˆ์Šคํ‚น์„ ํ•˜๊ธฐ๋กœ ํ•œ 15% ์ „๋ถ€๋ฅผ ๋งˆ์Šคํ‚นํ•˜์ง€ ์•Š๋Š”๋‹ค๋Š” ๊ฒƒ์ด๋‹ค. ๋งŒ์•ฝ, ์šฐ๋ฆฌ๊ฐ€ ์–ด๋–ค ํ…์ŠคํŠธ์˜ ๊ฐ์ • ๋ถ„์„์„ ํ•œ๋‹ค๊ณ  ํ•˜์ž. ์ด ๋•Œ๋Š” ๋‹จ์–ด๋ฅผ ์˜ˆ์ธกํ•  ์ผ์ด ์—†๊ธฐ ๋•Œ๋ฌธ์— ๋งˆ์Šคํ‚น์ด ํ•„์š”๊ฐ€ ์—†์–ด์ง„๋‹ค. ์˜คํžˆ๋ ค ์ด๋Ÿฐ ๋งˆ์Šคํ‚น์„ ํ†ตํ•ด ํ•™์Šต๋œ ๋ชจ๋ธ์€ ์‹ค์ œ ํƒœ์Šคํฌ์™€๋Š” ์ฐจ์ด๊ฐ€ ์žˆ์–ด์„œ ์ „์ด ํ•™์Šต์„ ํ•˜๋Š”๋ฐ๋„ ๋ถˆ๊ณผํ•˜๊ณ  ์„ฑ๋Šฅ์ด ๊ฐ์†Œํ•˜๊ฒŒ ๋˜๋Š” ๋ชจ์Šต์„ ๋ณด์ธ๋‹ค.

๊ทธ๋ž˜์„œ, 15%์˜ ๋งˆ์Šคํ‚น์€ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋น„์œจ๋กœ ๋ฐ”๋€๋‹ค.

๋ฒ„ํŠธ์—์„œ ์“ฐ์ธ Pre trained ๊ธฐ๋ฒ•์ด ๋‹จ์ง€ MASK๋œ ๋‹จ์–ด๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๊ฒƒ ์ด์™ธ์˜ ๋ฌธ์žฅ ๋ ˆ๋ฒจ ํƒœ์Šคํฌ์—์„œ๋„ ์ ์šฉ๋  ์ˆ˜ ์žˆ๋„๋ก ์ œ์•ˆ๋˜์—ˆ๋Š”๋ฐ ์ด๊ฒƒ์ด ๋ฐ”๋กœ Next Sentence Prediction์ด๋ผ๋Š” ๊ธฐ๋ฒ•์ด๋‹ค.

Pre-training Tasks in BERT : Next Sentence Prediction

๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋‘ ๊ฐœ์˜ ๋ฌธ์žฅ์„ ๋ฝ‘๋Š”๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๋‘ ๊ฐœ์˜ ๋ฌธ์žฅ ์‚ฌ์ด์™€ ๋์—๋Š” [SEP] (seperate) ํ† ํฐ์„ ์ถ”๊ฐ€ํ•ด์ค€๋‹ค. ๋˜, ๋ฌธ์žฅ ๋ ˆ๋ฒจ์—์„œ์˜ ์˜ˆ์ธก Task๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ์—ญํ• ์„ ๋‹ด๋‹นํ•˜๋Š” [CLS] (classification) ํ† ํฐ์„ ๋ฌธ์žฅ์— ์•ž์— ์ถ”๊ฐ€ํ•ด์ค€๋‹ค.

ํ•˜๊ณ ์ž ํ•˜๋Š” ๊ฒƒ์€, ์—ฐ๊ฒฐ๋œ ๋‘ ๊ฐœ์˜ ๋ฌธ์žฅ์ด ์‹ค์ œ๋กœ ์—ฐ๊ฒฐ๋ผ์„œ ๋‚˜์˜ฌ ์ˆ˜ ์žˆ๋Š” ๋ฌธ์žฅ์ธ์ง€ ์ ˆ๋Œ€ ๋‚˜์˜ฌ ์ˆ˜ ์—†๋Š” ๋ฌธ์žฅ์ธ์ง€๋ฅผ ์˜ˆ์ธกํ•œ๋‹ค. ์ด ์ž‘์—…์ด ์ˆ˜ํ–‰๋˜๋А ์ˆœ์„œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

  • ๋‘ ๊ฐœ์˜ ๋ฌธ์žฅ์„ ๋ฝ‘๊ณ  masking ์ž‘์—…์„ ํ•œ๋‹ค.

  • ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋ฅผ ํ†ตํ•ด mask ์ž๋ฆฌ์˜ ๋‹จ์–ด๋ฅผ ์˜ˆ์ธกํ•œ๋‹ค.

  • CLS ํ† ํฐ์„ ๊ฐ€์ง€๊ณ  ๋‘ ๋ฌธ์žฅ์ด ์ด์–ด์งˆ ์ˆ˜ ์žˆ๋Š”์ง€ ์—†๋Š”์ง€์— ๋Œ€ํ•œ Binary classification์„ ํ•˜๋ฉฐ ์ด์— ๋Œ€ํ•œ Ground Truth๋Š” ๋‘ ๋ฌธ์žฅ์ด ์‹ค์ œ๋กœ ์ธ์ ‘ํ•œ์ง€์— ๋Œ€ํ•œ ๋ถ€๋ถ„์ด๋‹ค.

  • ๊ฒฐ๊ณผ์— ๋Œ€ํ•œ Loss๋ฅผ ๊ฐ€์ง€๊ณ  CLS ํ† ํฐ์ด ์ˆ˜์ •๋œ๋‹ค.

ํ˜„์žฌ๋Š” ๋งค์šฐ ๊ฐ„๋‹จํ•˜๊ฒŒ ์ด์•ผ๊ธฐํ–ˆ์œผ๋ฏ€๋กœ ์ด์— ๋Œ€ํ•ด ์ข€ ๋” ์•Œ์•„๋ณด์ž.

BERT Summary

1. Model Architecture

๋ชจ๋ธ ๊ตฌ์กฐ ์ž์ฒด๋Š” ํŠธ๋žœ์Šคํฌ๋จธ์˜ self-attention block์„ ๊ทธ๋Œ€๋กœ ์‚ฌ์šฉํ–ˆ์œผ๋ฉฐ ์ด์— ๋Œ€ํ•œ ๋‘ ๊ฐ€์ง€ ๋ฒ„์ „์œผ๋กœ ํ•™์Šต๋œ ๋ชจ๋ธ์„ ์ œ์•ˆํ–ˆ๋‹ค.

  • L์€ attention layer์˜ ๊ฐœ์ˆ˜, A๋Š” Attention head์˜ ๊ฐœ์ˆ˜์ด๋‹ค.

  • H๋Š” self attention block์—์„œ ์‚ฌ์šฉํ•˜๋Š” ์ธ์ฝ”๋”ฉ ๋ฒกํ„ฐ์˜ ์ฐจ์›์ˆ˜์ด๋‹ค.

  • base ๋ฒ„์ „์€ large๋ณด๋‹ค ๊ฒฝ๋Ÿ‰ํ™”๋œ ๋ฒ„์ „์ด๋ผ๊ณ  ๋ณผ ์ˆ˜ ์žˆ๋‹ค.

2. Input

  • ๋ฒ„ํŠธ๋Š” ์ž…๋ ฅ sequence๋ฅผ ๋„ฃ์–ด์ค„ ๋•Œ word ๋ณ„ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ, sub word๋ณ„ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค.

  • ํŠธ๋žœ์Šคํฌ๋จธ์—์„œ ์ œ์•ˆ๋œ ํŠน์ • ์ฃผ๊ธฐํ•จ์ˆ˜์˜ ๊ณ ์ •๋œ ๊ฐ’์œผ๋กœ Positional embedding์„ ์‚ฌ์šฉํ–ˆ๋Š”๋ฐ, ๋ฒ„ํŠธ์—์„œ๋Š” ํ•™์Šต๋œ Positional embedding vector๋ฅผ ์‚ฌ์šฉํ–ˆ๋‹ค.

  • CLS์™€ SEP๋ฅผ ์‚ฌ์šฉํ–ˆ๋‹ค.

  • Segment embeddings are used together with the task of predicting whether two sentences are actually adjacent. With two sentences joined by [SEP], the first word of the second sentence is positionally far from the start of the sequence, yet the model must also be told that, seen from the second sentence alone, it is that sentence's first word. The segment embedding plays this role, marking which sentence each token belongs to.

BERT์™€ GPT์˜ ์ฐจ์ด์ ์„ ์‚ดํŽด๋ณด์ž.

GPT์˜ ๊ฒฝ์šฐ ๋ฐ”๋กœ ๋‹ค์Œ๋‹จ์–ด๋ฅผ ์˜ˆ์ธกํ•ด์•ผํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋’ค์— ์œ„์น˜ํ•˜๋Š” ๋‹จ์–ด๋“ค์˜ ์ ‘๊ทผ์„ ํ—ˆ์šฉํ•˜๋ฉด ์•ˆ๋œ๋‹ค. ๊ทธ๋ž˜์„œ ํŠน์ • ์Šคํ…์—์„œ๋Š” ์ž๊ธฐ ์ž์‹ ์„ ํฌํ•จํ•œ ์ด์ „ ๋‹จ์–ด๋“ค์˜ ์ •๋ณด๋งŒ ํ—ˆ์šฉ๋œ๋‹ค.

  • ๊ทธ๋ž˜์„œ Transformer์˜ ๋””์ฝ”๋”์—์„œ ์‚ฌ์šฉํ•˜๋˜ Masked Self-attention ๋ชจ๋“ˆ์„ ์‚ฌ์šฉํ•œ๋‹ค.

๋ฐ˜๋ฉด, ๋ฒ„ํŠธ์˜ ๊ฒฝ์šฐ Masked๋กœ ์น˜ํ™˜๋œ ํ† ํฐ๋“ค์„ ์˜ˆ์ธกํ•˜๋Š” ๊ฒƒ์ด ๋ชฉ์ ์ด๊ณ  ๊ทธ๋ž˜์„œ Mask๋œ ๋‹จ์–ด๋ฅผ ํฌํ•จํ•œ ๋ชจ๋“  ๋‹จ์–ด๋“ค์— ๋Œ€ํ•œ ์ ‘๊ทผ์ด ๊ฐ€๋Šฅํ•˜๋‹ค.

  • ๊ทธ๋ž˜์„œ Transformer์˜ ์ธ์ฝ”๋”์—์„œ ์‚ฌ์šฉํ•˜๋˜ Self-attention ๋ชจ๋“ˆ์„ ์‚ฌ์šฉํ•˜๊ฒŒ๋œ๋‹ค.

Fine-tuning Process

Mask๋œ ๋‹จ์–ด๋ฅผ ์˜ˆ์ธกํ•˜๋Š” Task์™€ ์ธ์ ‘ ๋ฌธ์žฅ์ธ์ง€๋ฅผ ๊ตฌ๋ถ„ํ•˜๋Š” Task๋ฅผ ๊ฐ€์ง€๊ณ  ์‚ฌ์ „ํ•™์Šตํ•œ ๋ชจ๋ธ์„ ์—ฌ๋Ÿฌ๊ฐ€์ง€ ๋‹ค์–‘ํ•œ Task์— Fine tuningํ•œ ๋ชจ๋ธ๋“ค์˜ ๊ตฌ์กฐ๋ฅผ ์•Œ์•„๋ณด์ž.

Sentence Pair Classification Tasks

๋…ผ๋ฆฌ์ ์œผ๋กœ ๋‚ดํฌ๊ด€๊ณ„ ๋˜๋Š” ๋ชจ์ˆœ๊ด€๊ณ„๋ฅผ ํŒ๋‹จํ•˜๋Š” ์ผ์ด๋‹ค. ๋‘ ๊ฐœ์˜ ๋ฌธ์žฅ์„ SEP ํ† ํฐ์œผ๋กœ ํ•˜๋‚˜์˜ ์‹œํ€€์Šค๋กœ ์ž…๋ ฅํ•˜๊ณ  BERT๋กœ ์ธ์ฝ”๋”ฉ์„ํ•œ๋‹ค. ๊ฐ๊ฐ์˜ word์— ๋Œ€ํ•œ ์ธ์ฝ”๋”ฉ ๋ฒกํ„ฐ๋ฅผ ์–ป์—ˆ๋‹ค๋ฉด CLS ํ† ํฐ์— ํ•ด๋‹นํ•˜๋Š” ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋ฅผ Output layer์˜ ์ž…๋ ฅ์œผ๋กœ ์ฃผ์–ด์„œ ๋‹ค์ˆ˜ ๋ฌธ์žฅ์— ๋Œ€ํ•œ ์˜ˆ์ธก์„ ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•œ๋‹ค.

Single Sentence Classification Tasks

๋ฌธ์žฅ์ด ํ•˜๋‚˜๋ฐ–์— ์—†๊ธฐ ๋•Œ๋ฌธ์— ํ•œ ๋ฌธ์žฅ์— ๋Œ€ํ•œ CLS ํ† ํฐ์„ ํ•™์Šตํ•œ๋‹ค.

Question Answering Tasks

์ข€ ๋” ๋ณต์žกํ•œ Task์ธ QA Tasks๋Š” ๋’ค์—์„œ ์ถ”๊ฐ€์ ์œผ๋กœ ์„ค๋ช…ํ•œ๋‹ค.

Single Sentence Tagging Tasks

๊ฐ๊ฐ์˜ ๋‹จ์–ด๋ณ„๋กœ ํ’ˆ์‚ฌ๋‚˜ ์˜๋ฏธ๋ฅผ ํŒŒ์•…ํ•ด์•ผ ํ•˜๋Š” ๊ฒฝ์šฐ CLS ํ† ํฐ๊ณผ ๊ฐ๊ฐ์˜ word์— ๋Œ€ํ•œ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋ฅผ ํ•™์Šตํ•˜๊ฒŒ๋œ๋‹ค.

BERT vs GPT-1

  • ๋ฐฐ์น˜ ์‚ฌ์ด์ฆˆ์˜ ํฌ๊ธฐ๊ฐ€ ํด์ˆ˜๋ก ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์ด ์ฆ๊ฐ€ํ•˜๊ณ  ์•ˆ์ •ํ™”๋œ๋‹ค๋Š” ์‚ฌ์‹ค์ด ์•Œ๋ ค์ ธ์žˆ๋‹ค. ๊ทธ๋ ‡์ง€๋งŒ, ๋ฐฐ์น˜ ์‚ฌ์ด์ฆˆ๋ฅผ ํ‚ค์šฐ๋ ค๋ฉด ๋” ๋งŽ์€ GPU ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ํ•„์š”๋กœํ•˜๊ฒŒ๋œ๋‹ค.

BERT๋Š” ๊ฐ ๋‹จ์–ด์˜ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋ฅผ ์–ป๊ณ  Masked ๋œ word๋ฅผ ์˜ˆ์ธกํ•˜๋Š” Output layer๋ฅผ ์ œ๊ฑฐํ•œ ๋’ค ์›ํ•˜๋Š” Task์— ๋งž๊ฒŒ layer๋ฅผ ๊ตฌ์„ฑํ•  ์ˆ˜ ์žˆ๋‹ค.

GLUE Benchmark Results

BERT๋ฅผ ๋‹ค์–‘ํ•œ ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ Task์— Fine Tuning ํ˜•ํƒœ๋กœ ์ ์šฉํ–ˆ์„ ๋•Œ ์ผ๋ฐ˜์ ์œผ๋กœ ๋” ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ƒˆ๋‹ค.

์œ„ ํ‘œ ์ฒ˜๋Ÿผ ์—ฌ๋Ÿฌ Task๋ฅผ ํ•œ๊ณณ์— ๋ชจ์•„๋†“์€ ํ‘œ๋ฅผ GLUE ๋ผ๊ณ  ํ•œ๋‹ค.

Machine Reading Comprehension (MRC), Question Answering

์งˆ์˜ ์‘๋‹ต์— ๋Œ€ํ•œ Task์ด๋‹ค. ๋‹จ์ˆœํžˆ ์งˆ๋ฌธ๋งŒ ์ฃผ์–ด์ง€๊ณ  ๋‹ต์„ ์–ป์–ด๋‚ด๋Š” Task๊ฐ€ ์•„๋‹ˆ๋ผ, ๋…ํ•ด๋ ฅ์— ๊ธฐ๋ฐ˜ํ•œ Task์ด๋‹ค. ์ฃผ์–ด์ง„ ์ง€๋ฌธ์— ๋Œ€ํ•œ ์งˆ๋ฌธ์˜ ๋‹ต์„ ๊ตฌํ•˜๋Š” ์ผ์ด๋‹ค. ๊ทธ๋ž˜์„œ ๊ธฐ๊ณ„ ๋…ํ•ด ๊ธฐ๋ฐ˜์˜ ์งˆ์˜์‘๋‹ต์ด๋ผ๊ณ  ํ•œ๋‹ค.

  • 4๊ฐœ์˜ ์žฅ์†Œ๊ฐ€ ์žˆ์œผ๋ฉฐ ๋‘๋ฒˆ์งธ ์ค„์˜ they์—๋Š” Daniel์„ ํฌํ•จํ•˜์ง€๋งŒ ๋„ค๋ฒˆ์งธ ์ค„์˜ they์—๋Š” ํฌํ•จํ•˜์ง€ ์•Š๋Š”๋‹ค. ์ด๋Ÿฌํ•œ ๋ถ€๋ถ„๊นŒ์ง€ ๋‹ค ๊ตฌ๋ณ„ํ•ด์„œ ๋‹ต์„ ๋„์ถœํ•ด์•ผํ•œ๋‹ค.

  • ์‹ค์ œ๋กœ๋Š” ๋” ์–ด๋ ต๊ณ  ์œ ์˜๋ฏธํ•œ Task๋ฅผ ํ•ด์•ผํ•˜๋ฉฐ, ์ด์—๋Œ€ํ•œ ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ๋Š” SQuAD๊ฐ€ ์žˆ๋‹ค.

SQuAD 1.1

Stanford ๋Œ€ํ•™๊ต์—์„œ ๋งŒ๋“ค์—ˆ๊ธฐ ๋•Œ๋ฌธ์— Stanford Question Answering Dataset ์„ ์ค„์—ฌ์„œ ๋ช…๋ช…ํ–ˆ๋‹ค.

๋ฒ„ํŠธ์˜ ์ž…๋ ฅ์œผ๋กœ ์ง€๋ฌธ๊ณผ ๋‹ต์„ ํ•„์š”๋กœ ํ•˜๋Š” ์งˆ๋ฌธ์„ SEP ํ† ํฐ์„ ํ†ตํ•ด Concatํ•ด์„œ ํ•˜๋‚˜์˜ Sequence๋กœ ์ œ๊ณตํ•œ ๋’ค ์ธ์ฝ”๋”ฉ์„ ์ง„ํ–‰ํ•œ๋‹ค.

์งˆ๋ฌธ์— ๋Œ€ํ•œ ๋‹ต์ด ์‹œ์ž‘๋˜๋Š” ๋ฌธ์žฅ์„ ์ฐพ๋Š” ๊ฒƒ์„ ์‹œ์ž‘์ ์œผ๋กœ ํ•œ๋‹ค. ๊ฐ ๋‹จ์–ด์˜ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋ฅผ ์–ป์€ ๋’ค ์ด ๋ฒกํ„ฐ๋“ค์€ FC๋ฅผ ๊ฑฐ์ณ ๊ฐ ๋‹จ์–ด๋ณ„๋กœ ์Šค์นผ๋ผ๊ฐ’์„ ์–ป๊ฒŒ๋œ๋‹ค.

๋˜, ์งˆ๋ฌธ์— ๋Œ€ํ•œ ๋‹ต์ด ๋๋‚˜๋Š” ๋ฌธ์žฅ๋„ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ์ฐพ์•„์•ผ ํ•˜๋ฉฐ ๋˜ ๋‹ค๋ฅธ FC๋ฅผ ๊ฑฐ์ณ ์Šค์นผ๋ผ ๊ฐ’์„ ์–ป๊ฒŒ๋œ๋‹ค.

์ดํ›„, ์Šคํƒ€ํŒ… ํฌ์ธํŠธ์™€ ์—”๋”ฉ ํฌ์ธํŠธ๋ฅผ ํ•™์Šตํ•ด์„œ Ground Truth์— Softmax Loss๋ฅผ ๊ฐ€์ง€๊ณ  ํ•™์Šต์„ ์ง„ํ–‰ํ•˜๊ฒŒ ๋œ๋‹ค.

SQuAD 2.0

์ฃผ์–ด์ง„ ์ง€๋ฌธ์— ๋Œ€ํ•œ ์งˆ๋ฌธ์ด ํ•ญ์ƒ ์ •๋‹ต์ด ์กด์žฌํ•˜์ง€ ์•Š์„ ์ˆ˜๋„ ์žˆ๋‹ค. ์ด๊ฒƒ๊นŒ์ง€ ํŒ๋‹จํ•ด์„œ ์งˆ๋ฌธ์ด ์žˆ์œผ๋ฉด ๋‹ต์„, ์—†์œผ๋ฉด No answer๋ฅผ ๋ฐ˜ํ™˜ํ•œ๋‹ค.

์ตœ์ข…์ ์œผ๋กœ ์˜ˆ์ธก์— ์ด ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•  ๋•Œ๋Š” CLS๋ฅผ ๊ฐ€์ง€๊ณ  Cross Entropy๋ฅผ ํ†ตํ•ด ๋‹ต์˜ ์กด์žฌ ์œ ๋ฌด๋ฅผ ๋จผ์ € ํŒŒ์•…ํ•œ๋‹ค. ์ดํ›„๋Š” 1.1์˜ ๋ฐฉ์‹๊ณผ ๋™์ผํ•˜๋‹ค.

On SWAG

์ฃผ์–ด์ง„ ๋ฌธ์žฅ์ด ์žˆ์„ ๋•Œ ๋‹ค์Œ์— ๋‚˜ํƒ€๋‚ ๋ฒ•ํ•œ ์ ์ ˆํ•œ ๋ฌธ์žฅ์„ ๊ณ ๋ฅด๋Š” Task์ด๋‹ค. ์—ฌ๊ธฐ์„œ๋„ CLS ํ† ํฐ์„ ์‚ฌ์šฉํ•˜์ง€๋งŒ, ๊ฐ๊ด€์‹์œผ๋กœ ์ด๋ฃจ์–ด์ ธ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ๋ฌธ์ œ์™€ ๋ณด๊ธฐ๋ฅผ SEP ํ† ํฐ์œผ๋กœ Concatํ•ด์„œ BERT๋ฅผ ํ†ตํ•ด ์ธ์ฝ”๋”ฉํ•ด์„œ ์–ป์€ CLS๋ฅผ ๊ฐ€์ง€๊ณ  FC๋ฅผ ๊ฑฐ์ณ ์Šค์นผ๋ผ๊ฐ’์„ ์–ป๋Š”๋‹ค. ์ด ์ค‘ ๊ฐ€์žฅ ํฐ ์Šค์นผ๋ผ๊ฐ’์ด ์ •๋‹ต์ด ๋œ๋‹ค.

Ablation Study

BERT์—์„œ ๊ฐ layer ๋ณ„ ํŒŒ๋ผ๋ฏธํ„ฐ ์ˆ˜๋ฅผ ์ ์  ๋Š˜๋ฆฐ๋‹ค๊ณ  ํ•  ๋•Œ, ๋ชจ๋ธ์˜ ํฌ๊ธฐ๊ฐ€ ์ ์  ์ปค์งˆ ์ˆ˜๋ก ์„ฑ๋Šฅ์ด ๊ณ„์†์ ์œผ๋กœ ๋ˆ์ž„์—†์ด ์ข‹์•„์ง„๋‹ค๋Š” ์—ฐ๊ตฌ๊ฒฐ๊ณผ์ด๋‹ค. GPU๋ฅผ ๊ฐ€๋Šฅํ•œ ๋งŽ์ด ์จ์„œ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ๋Š˜๋ ค ํ•™์Šตํ•˜๋ฉด ๊ทธ๋งŒํผ ๋˜ ์„ฑ๋Šฅ์ด ์˜ค๋ฅธ๋‹ค๊ณ ํ•œ๋‹ค.

๊ฐ€๋Šฅํ•˜๋‹ค๋ฉด ๋ชจ๋ธ์˜ ์‚ฌ์ด์ฆˆ๋ฅผ ํ‚ค์›Œ๋ผ!