(10๊ฐ•) Advanced Self-supervised Pre-training Models

210916



GPT-2

GPT-2 is a more recent model than GPT-1. There is no difference in model architecture, but it has the following characteristics:

  • The Transformer stack has more layers.

  • It still uses a language model that predicts the next word.

  • The training data grew to 40GB.

    • And this was not just any data: very high-quality data was used.

  • It showed that many different down-stream tasks can all be handled in a zero-shot setting.

  • A sample in which the model continues the text (blue) very naturally after the human-written part (red).

Motivation

It offered the insight that every natural-language task can be reformulated as question answering.

  • Previously, sentiment analysis (binary classification) and predicting an answer to a question such as "How are you" were treated as different tasks, because each requires a different output-layer structure.

  • Sentiment analysis

    • "Do you think this sentence is positive?"

  • Summarization

    • "What is topic or point on this literature?"

Datasets

๋ฐ์ดํ„ฐ์…‹์˜ ํฌ๊ธฐ๊ฐ€ ๋งค์šฐ ํฌ๋ฉด์„œ๋„ ํ’ˆ์งˆ ์—ญ์‹œ ์ข‹์•„ ์ง€์‹์„ ํšจ๊ณผ์ ์œผ๋กœ ์ž˜ ๋ฐฐ์šธ ์ˆ˜ ์žˆ๋‹ค.

์ผ๋ถ€๋Š” ๋ ˆ๋”ง์— ์žˆ๋Š” ๋‹ต๋ณ€์ค‘ ์™ธ๋ถ€๋งํฌ๊ฐ€ ์žˆ๊ณ , ๋˜ ์ด ๋‹ต๋ณ€์ด ์ข‹์•„์š”๊ฐ€ 3๊ฐœ ์ด์ƒ ๋ฐ›์•˜์„ ๋•Œ ์ด ์™ธ๋ถ€๋งํฌ ์† ๊ฒŒ์‹œ๋ฌผ์„ ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ ๋งŒ๋“ค์—ˆ๋‹ค.

  • ๋ ˆ๋”ง์€ ์งˆ๋ฌธ/๋‹ต๋ณ€ ์‚ฌ์ดํŠธ

๋˜ํ•œ, Byte pair encoding์„ ์‚ฌ์šฉํ–ˆ์œผ๋ฉฐ Layer Normalization์˜ ์œ„์น˜๊ฐ€ ์กฐ๊ธˆ ๋ฐ”๋€Œ์—ˆ๋‹ค. ๋˜, ๋ ˆ์ด์–ด๊ฐ€ ์˜ฌ๋ผ๊ฐˆ์ˆ˜๋ก ์„ ํ˜•๋ณ€ํ™˜์— ์‚ฌ์šฉ๋˜๋Š” ์ˆ˜๋“ค์ด 0์— ๊ฐ€๊นŒ์›Œ์ง€๋„๋ก ํ–ˆ๋Š”๋ฐ ์ด๋Š”, ์œ„์ชฝ์˜ ๋ ˆ์ด์–ด์˜ ์—ญํ• ์ด ์ค„์–ด๋“ค ์ˆ˜ ์žˆ๋„๋ก ๋ชจ๋ธ์„ ๊ตฌ์„ฑํ–ˆ๋‹ค.

Question Answering

Every task can be recast as question answering. Normally one would fine-tune on conversational question-answering data, but running inference without that training (= zero-shot setting) already reached 55 F1, and after fine-tuning it reached 89 F1.

Summarization

Summarization was possible even in the zero-shot setting. Since GPT-2 predicts the next word from the previous ones, a TL;DR: token was appended to each document during pre-training so that the model learns to produce a summary when it reaches that token (this is also how fine-tuning for the summarization task is done). As a result, the model can summarize from its original training alone, with no extra data preprocessing at all.

Translation

Likewise, for translation, appending something like "they say in french" after a sentence or paragraph made the model translate it into that language (here, French) quite well.
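The task framings above can be sketched as plain string templates (my own toy illustration; apart from the TL;DR: marker and the French cue described in the notes, the exact prompt strings are assumptions, not the paper's):

```python
# Hypothetical sketch: GPT-2 style zero-shot task framing as text prompts.
# The idea is that one language model handles every task, with the task
# expressed entirely in the input string.

def sentiment_prompt(sentence: str) -> str:
    # Frame binary sentiment classification as a question.
    return f"{sentence}\nDo you think this sentence is positive?"

def summarization_prompt(document: str) -> str:
    # Summarization is triggered by appending the TL;DR: token.
    return f"{document}\nTL;DR:"

def translation_prompt(sentence: str) -> str:
    # Appending a cue like "they say in french" induces translation.
    return f"{sentence} they say in french"

print(summarization_prompt("Long article text ..."))
```

Feeding such prompts to the language model and letting it continue the text is the whole "fine-tuning-free" recipe.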

GPT-3

GPT-3 is a further refinement of GPT-2. The model architecture is unchanged; instead, a far larger dataset, far deeper layers, and a far larger batch size pushed performance up considerably.

GPT-3 won the Best Paper award at NeurIPS 2020, a leading AI conference, for the remarkable results below.

Language Models are Few-Shot Learners

It greatly expanded the zero-shot potential that GPT-2 had demonstrated.

  • Translating English to French without any fine-tuning at all.

  • One-shot: provide exactly one example of the task.

  • Only a single translation pair is given as training data, and the model's layers are not modified in any way to learn it; the example is simply fed in in the same form as ordinary input data.

  • This performs far better than the zero-shot setting.

  • Few-shot improves performance even further than one-shot.

GPT-3's strength is that it learns from data "dynamically" (in context) and still performs well.

Moreover, the larger the model, the faster this "dynamic adaptation" ability improves.

A Lite BERT for Self-supervised Learning of Language Representations

The ALBERT model aimed to lower the existing memory and training-cost barriers without a large drop in performance, and in fact to improve it.

It also proposed a new, modified form of sentence-level self-supervised pre-training task.

Let's look at the details.

Factorized Embedding Parameterization

If the embedding dimension is small, it cannot capture all the features of the sequence; if it is too large, computation and parameter count both grow. A dilemma.

But think about it: stacking layers deeply means extracting the sequence's fine-grained, meaningful features. So that feature-extraction role can be entrusted to the deep layers, and the embedding dimension of each word can shrink somewhat. This is the insight behind ALBERT's reduced embedding dimension.

Previously, the positional vector was added to the word embedding and fed into the layers as-is,

whereas ALBERT keeps the dimension fed into the layers the same as before, but proposes that the layer-input dimension and the embedding dimension need not be equal.

  • In the example, BERT's embedding vectors have dimension 4, while ALBERT's have dimension 2.

  • The 2-dimensional embedding vector is multiplied by a weight matrix W (of shape E x H) to produce the 4-dimensional vector.

Did this method actually reduce the parameter count?

Suppose the vocab size is 500 and the dimension needed in the attention layers is 100.

  • BERT (left) needs 500*100 = 50,000 parameters.

  • ALBERT (right), with an embedding size of 15, needs 500*15 + 15*100 = 9,000 parameters.

This reduces the required parameter count, and the savings grow as the vocab size and hidden dimension increase.
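The arithmetic above can be checked directly (V, H, and E = 15 are the values from the example):

```python
# Parameter-count comparison for factorized embedding parameterization.
# V = vocab size, H = hidden size used by the attention layers,
# E = reduced embedding size (15, as in the example above).
V, H, E = 500, 100, 15

bert_params = V * H            # one V x H embedding matrix
albert_params = V * E + E * H  # V x E embedding, then an E x H projection

print(bert_params, albert_params)  # 50000 9000
```

The factorization pays off precisely because V*H grows much faster than V*E + E*H when V and H are large.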

ALBERT has another advantage. In the original Transformer, the parameter count keeps growing as layers are stacked, for the following reasons:

  • In multi-head attention, each head is independent and uses its own Q, K, V.

  • Every layer in the stack likewise uses its own Q, K, V.

ALBERT addressed this as follows:

  • Multi-head attention must not share parameters across heads, so that part is left alone.

  • shared-attention: all layers share the same attention parameters.

  • shared-ffn: all layers share the same feed-forward (output) parameters.

  • all-shared: share everything.

  • As expected, performance drops, but the drop is not that large.
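A minimal toy sketch of the all-shared idea (my own illustration, not ALBERT code): one set of weights is applied at every depth instead of N separately parameterized layers, so depth no longer multiplies the parameter count.

```python
# Toy cross-layer parameter sharing ("all-shared"): the same "layer"
# (here just an affine map) is reused at every depth. Real ALBERT shares
# the full Transformer block (attention + FFN) this way.

def make_layer(w: float, b: float):
    def layer(x: float) -> float:
        return w * x + b
    return layer

shared = make_layer(0.5, 1.0)  # the single parameter set

def forward(x: float, num_layers: int = 3) -> float:
    for _ in range(num_layers):  # same parameters reused at every depth
        x = shared(x)
    return x

print(forward(8.0))  # 0.5*(0.5*(0.5*8+1)+1)+1 = 2.75
```

Adding more depth here changes compute but not the number of learned parameters, which is the point of the technique.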

The other technique ALBERT proposes is Sentence Order Prediction. Original BERT uses two pre-training tasks: predicting the MASKed words, and deciding whether two consecutive sentences actually follow each other in context (NSP).

However, follow-up work after BERT argued that NSP is too easy to be effective, and several experiments showed that removing NSP makes little difference to model performance.

To make the ineffective NSP meaningful, ALBERT extended it. Instead of judging "do these two sentences appear consecutively", it takes genuinely consecutive sentence pairs, presents them in either original or reversed order, and asks the model to tell which.

The key point is that the negative samples are drawn from adjacent sentences in the same document. In BERT's NSP, a False sentence pair was very likely to contain very different words, while a True pair was likely to share many words. So during NSP the model tended to rely on low-level features, such as how often the same words appear, rather than analyzing high-level features; that is why removing NSP barely changed performance. SOP, by contrast, cannot be solved from shared-word counts alone, so it becomes a genuinely higher-level problem.

  • The original None and NSP settings show little performance difference, with None sometimes even higher, whereas SOP performs better in every case.

  • ALBERT outperforms the variant models.

  • Also, performance rises further as the model grows.

ELECTRA: Efficiently Learning an Encoder that Classifies Token Replacements Accurately

ELECTRA is pre-trained in a different way from BERT or GPT. It goes a step beyond BERT's MASK objective and GPT's standard language modeling: alongside the MLM that re-predicts masked words, a discriminator is added that decides, for each predicted word, whether it is the word originally in the sentence or a predicted replacement.

So the model is trained with two models in an adversarial relationship, an idea taken from Generative Adversarial Networks. Since the ground truth is known, training is straightforward, and repeating this process progressively strengthens the model.

A distinctive point is that the pre-trained model used downstream is not the generator but the discriminator, which tells replaced tokens from original ones.

  • For the same amount of training, it performs better than the original BERT.
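The discriminator's target can be sketched very simply (my own toy example): given the original tokens and the generator's corrupted version, the label per position is 1 for "replaced" and 0 for "original".

```python
# Sketch of ELECTRA's replaced-token-detection target: a binary label
# for every position, so the discriminator gets a learning signal from
# ALL tokens, not just the ~15% that were masked.

def rtd_labels(original: list[str], corrupted: list[str]) -> list[int]:
    assert len(original) == len(corrupted)
    return [int(o != c) for o, c in zip(original, corrupted)]

orig = ["the", "chef", "cooked", "the", "meal"]
corr = ["the", "chef", "ate",    "the", "meal"]  # generator replaced "cooked"
print(rtd_labels(orig, corr))  # [0, 0, 1, 0, 0]
```

Getting a loss term at every position (rather than only at masked ones) is often cited as the reason ELECTRA is more sample-efficient than BERT.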

Light-weight Models

๊ธฐ์กด์˜ ๋ชจ๋ธ๋“ค์€ self-attention์„ ์ ์  ๋งŽ์ด ์Œ“์œผ๋ฉด์„œ ์„ฑ๋Šฅ์„ ์ฆ๊ฐ€์‹œ์ผฐ๊ณ  ๊ฒฝ๋Ÿ‰ํ™” ๋ชจ๋ธ์˜ ์—ฐ๊ตฌ ์ถ”์„ธ๋Š” ์ด๋Ÿฌํ•œ ํฐ size์˜ ๋ชจ๋ธ์ด ๊ฐ€์ง€๋˜ ์„ฑ๋Šฅ์„ ์ตœ๋Œ€ํ•œ ์œ ์ง€ํ•˜๋ฉด์„œ ๋ชจ๋ธ์˜ ํฌ๊ธฐ๋ฅผ ์ค„์ด๊ณ  ๊ณ„์‚ฐ์†๋„๋ฅผ ๋น ๋ฅด๊ฒŒ ํ•˜๋Š” ๊ฒƒ์— ์ดˆ์ ์„ ๋งž์ถ˜๋‹ค.

๊ทธ๋Ÿฌ๋ฏ€๋กœ, ํด๋ผ์šฐ๋“œ๋‚˜ ๊ณ ์„ฑ๋Šฅ์˜ GPU๋ฅผ ์‚ฌ์šฉํ•˜์ง€ ์•Š๊ณ  ๋ชจ๋ฐ”์ผ ํฐ์—์„œ๋„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•œ๋‹ค.

๊ฒฝ๋Ÿ‰ํ™”ํ•˜๋Š” ๋ฐฉ์‹์€ ๋‹ค์–‘ํ•˜๊ฒŒ ์กด์žฌํ•˜๋ฏธ๋‚˜ ์—ฌ๊ธฐ์„œ๋Š” Distilation์ด๋ผ๋Š” ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•œ๋‹ค.

DistillBERT

Transformer์˜ ๊ตฌํ˜„์ฒด๋ฅผ ์‰ฝ๊ฒŒ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•œ huggingface ๋ผ๋Š” ํšŒ์‚ฌ์—์„œ ๋ฐœํ‘œํ•œ ๋ชจ๋ธ์ด๋‹ค. ์—ฌ๊ธฐ์—๋Š” Teacher๋ชจ๋ธ๊ณผ Student ๋ชจ๋ธ์ด ์žˆ๋‹ค. Student๋ชจ๋ธ์€ Teacher๋ชจ๋ธ๋ณด๋‹ค ๋ ˆ์ด์–ด์˜ ์ˆ˜๋‚˜ ํŒŒ๋ผ๋ฏธํ„ฐ ์ˆ˜๊ฐ€ ์ ์€ ๋ชจ๋ธ์ด๋‹ค. ์ด Student ๋ชจ๋ธ์ด ๊ฒฝ๋Ÿ‰ํ™” ๋ชจ๋ธ์— ์ดˆ์ ์„ ๋งž์ถ˜ ๋ชจ๋ธ์ด๋‹ค.

Teacher ๋ชจ๋ธ์ด ๊ฐ ์‹œํ€€์Šค์— ๋Œ€ํ•ด ๋‹ค์Œ์— ์˜ฌ ๋‹จ์–ด๋กœ ์˜ˆ์ธกํ•œ ํ™•๋ฅ ๋ถ„ํฌ๊ฐ€ ์กด์žฌํ•  ๊ฒƒ์ธ๋ฐ Student ๋ชจ๋ธ์€ ์ด ํ™•๋ฅ ๋ถ„ํฌ๋ฅผ ์ตœ๋Œ€ํ•œ ๋ชจ์‚ฌํ•˜๋Š”๊ฒƒ์ด ๋ชฉํ‘œ์ด๋‹ค. ๊ทธ๋ž˜์„œ Student ๋ชจ๋ธ์˜ Ground Truth๋Š” Teacher ๋ชจ๋ธ์˜ ํ™•๋ฅ ๋ถ„ํฌ์ด๋‹ค. knowledge distillation ์ด๋ผ๋Š” ํ…Œํฌ๋‹‰์„ ์‚ฌ์šฉํ•œ ๋ชจ๋ธ
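The core soft-label term can be sketched as a cross-entropy against the Teacher's distribution (a minimal sketch of my own; DistilBERT's actual loss also includes a temperature and additional terms):

```python
# Knowledge distillation sketch: the Student treats the Teacher's output
# distribution as a soft ground truth and minimizes cross-entropy
# against it.
import math

def soft_cross_entropy(teacher_probs, student_probs):
    # -sum_i t_i * log(s_i); lower when the Student matches the Teacher.
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

teacher = [0.7, 0.2, 0.1]          # Teacher's distribution over 3 tokens
good_student = [0.68, 0.22, 0.10]  # close to the Teacher
bad_student = [0.10, 0.20, 0.70]   # far from the Teacher

print(soft_cross_entropy(teacher, good_student) <
      soft_cross_entropy(teacher, bad_student))  # True
```

Unlike a one-hot label, the soft target tells the Student how the Teacher ranks every alternative, which is where the extra signal comes from.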

TinyBERT

Like DistilBERT, TinyBERT uses the knowledge distillation technique, with one difference: DistilBERT imitates only the final output, while TinyBERT imitates the intermediate results as well. It tries to match each layer's hidden states and attention parameters too, using MSE for this.

However, the Student's attention parameters cannot simply be made identical to the Teacher's: the dimensions differ, so "identical" is hard to even define. To solve this, the Teacher's parameters are passed through a fully-connected layer that reduces them to a smaller dimension, and the Student is trained to match that reduced vector, resolving the mismatch.

  • This FC layer is trained as well.
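The dimension-matching step can be sketched as follows (my own toy numbers; the projection W would be learned in practice):

```python
# Sketch of TinyBERT's hidden-state matching: the Teacher's hidden vector
# is projected down by a learned FC layer W so its dimension matches the
# Student's, then an MSE loss pulls the Student toward the projection.

def matvec(W, x):
    # multiply a (rows x cols) matrix by a vector
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mse(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

teacher_hidden = [1.0, 2.0, 3.0, 4.0]   # Teacher dim = 4
W = [[0.25, 0.25, 0.25, 0.25],          # learned 2x4 projection (toy values)
     [0.5, 0.0, 0.0, 0.5]]
student_hidden = [2.5, 2.5]             # Student dim = 2

target = matvec(W, teacher_hidden)      # projected Teacher state
print(mse(student_hidden, target))      # 0.0 for this toy Student
```

Gradients flow into W too, which is what the last bullet means by the FC layer being trained as well.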

Fusing Knowledge Graph into Language Model

A recent research direction combines existing pre-trained models with external information called a knowledge graph. Much analysis has been done on whether BERT really understands linguistic properties. BERT grasps context well within a given sentence, and captures similarity and relations between words, but when additional information not contained in the sentence is needed, it does not use that information effectively.

Suppose the given sentence is:

"(Someone) dug the ground."

In one case the ground was dug to plant flowers, in another to build a house. Asked "What was the ground dug with?", a person can answer "a trowel" for the flowers and "heavy equipment" for the house, because they draw not only on the sentence but on external knowledge they already have (= common sense). In AI, this common sense is represented in the form of a knowledge graph.

Because BERT is weak when external knowledge is needed, research is under way on defining that knowledge well as a knowledge graph and combining it with BERT to solve such problems better.