(Lecture 08) Transformer (2)

210913


Last updated 3 years ago


2. Transformer(cont'd)

cont'd is short for continued: this picks up where the previous lecture left off.

Multi-Head Attention

Multi-head attention runs the single-attention procedure several times in parallel. The procedure is identical each time; only the internal parameters differ, so each head produces a different output. Methodologically, it can be seen as a kind of ensemble. The final outputs are concatenated.

Why do this? One answer is that varying the model in several ways improves generalization, which is true, but we can be more specific. For each token in the sequence, the heads extract different kinds of information in parallel, so the output is computed from a richer view of that token.

If attention is run 8 times, the 8 results are concatenated, so the combined output becomes very wide along the feature axis.

A linear layer is then applied: multiplying by a weight matrix W produces the final vector Z.
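The concat-then-project step above can be sketched in plain Python. This is a minimal illustration, not the lecture's code: the helper names (`matmul`, `softmax`, `attention`, `multi_head_attention`), the toy weights, and the identity output projection `w_o` are all assumptions chosen to keep the example small.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention for a single head."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)

def multi_head_attention(x, heads, w_o):
    """heads: list of (W_q, W_k, W_v), one tuple per head; w_o: output projection W."""
    outputs = [attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))
               for w_q, w_k, w_v in heads]
    # Concatenate the per-head outputs along the feature axis.
    concat = [sum((head[i] for head in outputs), []) for i in range(len(x))]
    # The final linear layer maps the wide concatenation to Z.
    return matmul(concat, w_o)

# Toy input: 3 tokens with 4 features each (hypothetical values).
x = [[1.0, 0.0, 1.0, 0.0], [0.0, 2.0, 0.0, 2.0], [1.0, 1.0, 1.0, 1.0]]
w_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # head 1: 4 -> 2
w_b = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # head 2: 4 -> 2
heads = [(w_a, w_a, w_a), (w_b, w_b, w_b)]
w_o = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # identity, for clarity
z = multi_head_attention(x, heads, w_o)
print(len(z), len(z[0]))
```

Each head maps 4 features to 2, so concatenating two heads restores a width of 4 before the output projection. In a real model the projections are learned and differ per head.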

Let's look at the computational cost of multi-head attention.

Complexity per Layer

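Per layer, self-attention costs O(n²·d) while an RNN costs O(n·d²), where n is the sequence length and d is the representation dimension. Self-attention also has to store the n×n attention matrix, so for long sequences it needs far more computation and memory than an RNN.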

  • d is a hyperparameter and can be tuned, whereas n is determined by the input data, so the longer the input, the more computation is required.
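To make the role of n and d concrete, here is a rough operation count using the commonly cited per-layer orders, O(n²·d) for self-attention and O(n·d²) for a recurrent layer. The function names are hypothetical and constant factors are ignored, so only the trend is meaningful.

```python
def self_attention_cost(n, d):
    """Rough per-layer operation count for self-attention: n^2 * d."""
    return n * n * d

def rnn_cost(n, d):
    """Rough per-layer operation count for a recurrent layer: n * d^2."""
    return n * d * d

d = 512  # fixed hyperparameter
for n in (64, 512, 4096):  # sequence length depends on the data
    print(n, self_attention_cost(n, d), rnn_cost(n, d))
```

With d fixed, the self-attention count grows quadratically in n and overtakes the recurrent count once n exceeds d.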

Sequential Operations

Self-attention can process all positions at once when run in parallel, whereas an RNN cannot be parallelized because each step can start only after the previous one has finished. As a result, forward and backward propagation in an RNN take time proportional to the sequence length.

  • The input is given all at once, so it may look as if it is processed in one go, but for the reason above the time steps can never be parallelized.

In short, an RNN needs less computation but runs sequentially and is slow, while self-attention needs more computation but parallelizes and is fast.

Maximum Path Length

This is related to long-term dependency.

In an RNN, information about the first word has to pass through n recurrent steps to reach the last step, whereas the Transformer can fetch it directly through attention, regardless of the time step.

Block-Based Model

  • From the bottom, the input splits into three branches, which are K, Q, and V. Each attention head applies its own Wk, Wq, and Wv to them, and the outputs of all heads are concatenated and returned.

There is one part here that we have not seen before: Add & Norm.

  • Add, a residual connection, is applied first, followed by Layer Normalization.

  • The result then passes through the Feed Forward layer, after which Add & Norm is applied again.

Add

  • A technique that mitigates the vanishing gradient problem in deep networks and stabilizes training, which leads to better performance.

  • Suppose that in the sentence "I study math" the embedding vector for "I" is [1, -4], and the encoding vector produced by multi-head attention is [2, 3]. Applying Add sums the two vectors to give [3, -1], which becomes the final encoding vector for "I".
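The Add step from the example above is just an element-wise sum of the sublayer input and the sublayer output. A minimal sketch, where `residual_add` is a hypothetical helper name:

```python
def residual_add(x, sublayer_out):
    """Residual connection: add the sublayer's input to its output."""
    return [a + b for a, b in zip(x, sublayer_out)]

embedding = [1, -4]   # embedding vector for "I"
attn_out = [2, 3]     # output of multi-head attention for "I"
encoded = residual_add(embedding, attn_out)
print(encoded)  # [3, -1]
```

Because the two vectors are summed, the attention output must have the same dimension as the input embedding.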

Several kinds of normalization exist; here we look at Batch Norm and Layer Norm.

Batch Normalization

  • Compute the mean and standard deviation of the values in each batch, and use them to normalize the values so that they have mean 0 and standard deviation 1.

  • An affine transformation is then applied to shift and scale the result to the desired mean and variance.

Layer Normalization

  • Batch Norm normalizes each feature across the samples in a batch (= horizontally), whereas Layer Norm normalizes across the features of each individual sample (= vertically).
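The difference between the two normalizations comes down to the axis over which the statistics are taken. The sketch below assumes a batch stored as a list of rows, one row per sample (or token) and one column per feature; the function names are hypothetical, and the learnable affine parameters are left out for brevity.

```python
import math

def normalize(values, eps=1e-5):
    """Shift and scale a list of numbers to mean 0 and (approximately) std 1."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def layer_norm(batch):
    """Normalize each sample (row) over its own features."""
    return [normalize(row) for row in batch]

def batch_norm(batch):
    """Normalize each feature (column) over the samples in the batch."""
    cols = [normalize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*cols)]

batch = [[1.0, 2.0, 3.0],
         [4.0, 6.0, 8.0]]
ln = layer_norm(batch)
bn = batch_norm(batch)
print(ln)
print(bn)
```

After `layer_norm` every row has mean 0; after `batch_norm` every column has mean 0. A full implementation would follow each with a learned scale and shift.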

Positional Encoding

If we fed "I love you" and "love I you" into the model as described so far, each word would always get the same output. This is because the Transformer does not step through time; it processes the whole input at once, so word order is not taken into account.

We need a way to take order into account. Consider the following example.

Suppose that in "I Study math" the encoding vector for "I" is [3, -2, 4]. Since "I" appears in the first position, we add a constant 1000 to the first component of the vector to get [1003, -2, 4]. This is the idea behind Positional Encoding.

  • The vector takes a different value depending on the position.

  • Here we simply added 1000, but the actual scheme is not that simple.

Vectors that distinguish positions are determined using periodic functions built from sin and cos.

There is a different curve for each dimension, and the index of each position in the sequence serves as the x value.

In the plot, the horizontal axis is the embedding dimension and the vertical axis is the index (= position). The row for a given index, a vector as long as the embedding dimension, is used as the positional encoding vector and added to the original vector.
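A minimal sketch of the sinusoidal scheme, assuming the formulation from the original Transformer paper: even dimensions use sin(pos / 10000^(2i/d_model)) and odd dimensions use the matching cos. The function name and the toy embeddings are hypothetical.

```python
import math

def positional_encoding(max_len, d_model):
    """Return a max_len x d_model table of sinusoidal position vectors."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            # i is the even dimension index, so i / d_model equals 2i'/d_model.
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

pe = positional_encoding(max_len=3, d_model=4)

# Add the row for each position to that token's embedding (toy values).
embeddings = [[3.0, -2.0, 4.0, 1.0],
              [0.5, 0.5, 0.5, 0.5],
              [1.0, 0.0, -1.0, 2.0]]
with_position = [[e + p for e, p in zip(emb, pos_vec)]
                 for emb, pos_vec in zip(embeddings, pe)]
print(pe[0])
```

Each row of the table is one position and each column is one curve, which matches the description of one curve per dimension with the position index as the x value.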

Warm-up Learning Rate Scheduler

Training aims for the point where the loss is smallest. Parameters are initialized randomly, so they most likely start far from that goal. Moreover, given the shape of the loss function, the gradient tends to be very large when the parameters are far from the goal.

  • The note reads "gradient very large".

Because the gradient is so large early on, training starts from a small learning rate and gradually increases it, so that the steps are not too big. Then, as we approach the goal, the learning rate is decreased again so that an overly large learning rate does not prevent convergence.

  • In the plot legend, the first number is the model dimension (d_model) and the second number is the number of warm-up steps.

  • The smaller d_model is, the steeper the initial rise of the learning rate; the more warm-up steps there are, the lower the peak and the later it is reached.
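The rise-then-decay shape can be reproduced with the schedule given in the original Transformer paper, lr = d_model^-0.5 · min(step^-0.5, step · warmup_steps^-1.5). The sketch below assumes that formula; the function name `warmup_lr` and the default values are illustrative choices.

```python
def warmup_lr(step, d_model=512, warmup_steps=4000):
    """Learning rate that rises linearly during warm-up, then decays as step^-0.5."""
    step = max(step, 1)  # guard against step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

for step in (1, 1000, 4000, 8000, 20000):
    print(step, warmup_lr(step))
```

The peak occurs at `step == warmup_steps`. Lowering `d_model` scales the whole curve up, and raising `warmup_steps` lowers the peak and delays it.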

Encoder Self-Attention Visualization

Let's analyze and visualize the attention vectors.

  • In the given sentence, the word making attends to itself, but it attends most strongly to more and difficult. In other words, it attends to the words of the object complement, "making ... more difficult". It also attends slightly to 2009 and since, which carry temporal meaning.

Let's look at another word.

  • We can see which word its refers to, and also that the word application is related to its to some degree.

Decoder

  • If the encoder was given "I", "go", "home", the decoder is given "<sos>", "나는" ("I"), "집에" ("home"). These pass through Positional Encoding and then Multi-Head Attention. This corresponds to the step in seq2seq that produces the decoder's hidden states.

Masked Self-Attention

A method that restricts the range of information the decoder can access when decoding each output.

Prediction must proceed as follows.

  • Predicting "나는": only "<SOS>" may be used

  • Predicting "집에": only "<SOS>" and "나는" may be used

  • ...

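After the softmax, every word has an attention probability, but the probabilities for positions that must not be accessed have to be 0.

  • To make those softmax values 0, the corresponding attention scores are replaced with -inf before the softmax is applied.

The softmax then normalizes the remaining values so that each row sums to 1; equivalently, one can zero out the masked probabilities after the softmax and renormalize each row.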
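A minimal sketch of the masking step, assuming a square score matrix whose rows are the queries "<SOS>", "나는", "집에" and whose columns are the same tokens as keys. The scores are made-up numbers and `causal_mask` is a hypothetical helper name.

```python
import math

def softmax(row):
    """Numerically stable softmax; exp(-inf) evaluates to 0.0."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_mask(scores):
    """Replace scores for future positions (column index > row index) with -inf."""
    n = len(scores)
    return [[scores[i][j] if j <= i else -math.inf for j in range(n)]
            for i in range(n)]

scores = [[1.0, 2.0, 3.0],
          [1.0, 1.0, 1.0],
          [0.5, 1.5, 2.5]]
weights = [softmax(row) for row in causal_mask(scores)]
for row in weights:
    print(row)
```

The first row can only attend to "<SOS>", the second row splits its weight between "<SOS>" and "나는", and every row still sums to 1 without a separate renormalization step.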