(Lecture 03) Recurrent Neural Network and Language Modeling

210907




1. Basics of Recurrent Neural Networks (RNNs)

An RNN takes the hidden state h(t-1) computed at the previous time step as input and produces the current time step's hidden state h(t) as output. Because the same parameters are applied at every time step, i.e. the same module is used over and over, the structure is called Recurrent.

RNN์˜ ๊ธฐํ˜ธ๋“ค์ด ์˜๋ฏธํ•˜๋Š” ๋ฐ”๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

In particular, y is a value generated from h; it can be produced at every time step or only at the last one.

  • In machine translation an output is generated at every time step, while sentence sentiment classification produces an output only at the last step.

Usually, the hyperbolic tangent (tanh) is used when computing h.

Here you could regard W as two separate matrices, one applied to x and one to h(t-1), but since their products are summed, you can equivalently think of a single W multiplied by the vertical concatenation of x and h(t-1).
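This equivalence can be checked directly with a small NumPy sketch (the sizes and random values below are arbitrary, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_h = 3, 4                      # arbitrary input / hidden sizes

x = rng.normal(size=(d_x,))          # input at time t
h_prev = rng.normal(size=(d_h,))     # hidden state h(t-1)

W_xh = rng.normal(size=(d_h, d_x))   # weight applied to x
W_hh = rng.normal(size=(d_h, d_h))   # weight applied to h(t-1)

# Form 1: two separate weight matrices, products summed
h_two = np.tanh(W_xh @ x + W_hh @ h_prev)

# Form 2: a single W = [W_xh | W_hh] times the concatenation [x; h(t-1)]
W = np.concatenate([W_xh, W_hh], axis=1)         # (d_h, d_x + d_h)
h_one = np.tanh(W @ np.concatenate([x, h_prev]))

print(np.allclose(h_two, h_one))  # True
```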

2. Types of RNNs

One-to-One

  • e.g., a model that takes a person's height as input and predicts their weight

  • This depicts the ordinary setting, with no time steps or sequence.

One-to-Many

  • Commonly used for tasks such as image captioning.

  • A single, non-sequential input is given, and the words describing the image are generated one time step at a time.

  • In the diagram the input enters only at the first time step, but since this is still an RNN, a zero-filled matrix or tensor is fed as the input from the second time step onward.

Many-to-One

  • Used for sentiment classification tasks.

  • The model receives a sequence of words as input and analyzes the sentiment of the whole sentence from those words.

Many-to-Many

  • Used for machine translation.

  • The model reads the input up to the end of the sequence without emitting anything, and only after consuming the last input does it start producing outputs.

As another structure, there are tasks with no delay, where an output is produced every time an input is given.

  • Used for video classification on the frame level, or for POS tagging.

    • Frame-level video classification analyzes what each individual frame of the video means, e.g. this scene shows a battle, or this scene does not feature the protagonist, and so on.

3. Character-level Language Model

์–ธ์–ด ๋ชจ๋ธ์€ ๊ธฐ๋ณธ์ ์œผ๋กœ ์ฃผ์–ด์ง„ ๋ฌธ์ž์—ด์˜ ์ˆœ์„œ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ๋‹ค์Œ ๋‹จ์–ด๊ฐ€ ๋ฌด์—‡์ธ์ง€ ์•Œ์•„๋‚ด๋Š” Task์ด๋‹ค. ๋ณด๋‹ค ์‹ฌํ”Œํ•œ ์˜ˆ์ œ๋กœ์„œ Character๋กœ ๋‹ค๋ฃฌ๋‹ค.

  • ์ฒ˜์Œ์—๋Š” ์ค‘๋ณต์„ ์ œ๊ฑฐํ•ด์„œ ์‚ฌ์ „์„ ๊ตฌ์ถ•ํ•œ๋‹ค.

  • ์ „์ฒด ๊ธธ์ด๋งŒํผ์˜ ์ฐจ์›์„ ๊ฐ€์ง€๋Š” ์›ํ•ซ๋ฒกํ„ฐ๋กœ ์•ŒํŒŒ๋ฒณ์„ ๋‚˜ํƒ€๋‚ธ๋‹ค.
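These two steps can be illustrated with a toy sketch (the corpus "hello" and the helper names are made up for illustration, not from the lecture):

```python
# Toy character vocabulary and one-hot encoding.
text = "hello"
vocab = sorted(set(text))                      # deduplicated characters
char2idx = {c: i for i, c in enumerate(vocab)}

def one_hot(ch):
    vec = [0] * len(vocab)                     # dimension = vocabulary size
    vec[char2idx[ch]] = 1
    return vec

print(vocab)          # ['e', 'h', 'l', 'o']
print(one_hot("h"))   # [0, 1, 0, 0]
```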

A bias is used as well, and h0 is initialized to the zero vector.

The output vector can be computed as follows.

Then a softmax is applied and the largest value determines the output; backpropagation proceeds from the error against the ground truth, a one-hot vector with a 1 at the target character.

Inference then works as follows.

  • Feed the first character and use the resulting output as the second input; repeat this to generate the entire sequence.
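This greedy generation loop can be sketched in PyTorch as below; the model is untrained with made-up toy sizes, so the generated ids are arbitrary — the point is only the feed-the-output-back-as-input mechanics.

```python
import torch
from torch import nn

vocab_size, hidden_size = 4, 8          # made-up toy sizes
embedding = nn.Embedding(vocab_size, hidden_size)
rnn = nn.RNN(hidden_size, hidden_size)  # expects (L, B, d) input
to_logits = nn.Linear(hidden_size, vocab_size)

def generate(first_idx, length):
    idx = torch.LongTensor([[first_idx]])          # (L=1, B=1)
    h = torch.zeros(1, 1, hidden_size)             # zero-initialized h0
    out = [first_idx]
    for _ in range(length - 1):
        hs, h = rnn(embedding(idx), h)             # one step forward
        idx = to_logits(hs[-1]).argmax(-1, keepdim=True)  # greedy pick
        out.append(idx.item())
    return out

print(generate(0, 5))  # five token ids; values depend on the random init
```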

The text above is from one of Shakespeare's tragedies. Looking closely, the model continues not just with letters but with spaces and punctuation as well; in practice these must be modeled too.

์ฒ˜์Œ์—๋Š” ์ž˜ ํ•™์Šตํ•˜์ง€ ๋ชปํ•ด ๋ง๋„ ๋˜์ง€ ์•Š๋Š” ์—‰ํ„ฐ๋ฆฌ ๋‹จ์–ด๋“ค์„ ๋‚ด๋†“๋‹ค๊ฐ€ ํ•™์Šต์„ ๊ฑฐ๋“ญํ•  ์ˆ˜๋ก ๋ง์ด ๋˜๋Š” ๋ฌธ์žฅ์ด ๋˜๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

With an RNN you can even generate papers, play scripts, or program code.

Backpropagation Through Time (BPTT)

When training on data of thousands or tens of thousands of tokens, the input sequence can be very long, and in principle every output must be aggregated into the loss and backpropagated through. Realistically, once the sequence grows this long, the information and data for one pass no longer fit in limited resources, so we instead train on sequences of limited length at a time.
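A minimal PyTorch sketch of this truncation, under assumed toy sizes and a stand-in loss: the hidden state is carried across chunks but detached, so gradients stop at chunk boundaries.

```python
import torch
from torch import nn

# Truncated BPTT sketch: process a long sequence in fixed-size chunks,
# carrying the hidden state across chunks but detaching it so that
# gradients do not flow past the chunk boundary. Sizes are illustrative.
seq_len, chunk, d = 100, 20, 16
rnn = nn.RNN(d, d)                      # toy RNN, (L, B, d) input
inputs = torch.randn(seq_len, 1, d)
h = torch.zeros(1, 1, d)

for start in range(0, seq_len, chunk):
    out, h = rnn(inputs[start:start + chunk], h)
    loss = out.pow(2).mean()            # stand-in for a real loss
    loss.backward()                     # backprop within this chunk only
    h = h.detach()                      # truncate the gradient path here
```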

๋‹ค์Œ ์ด๋ฏธ์ง€๋Š” Hidden state์˜ ํŠน์ •ํ•œ dimension์„ ๊ด€์ฐฐํ•ด์„œ ํฌ๊ธฐ๊ฐ€ ์ปค์ง€๋ฉด ํ‘ธ๋ฅธ์ƒ‰์œผ๋กœ ํฌ๊ธฐ๊ฐ€ ์Œ์ˆ˜๋กœ ์ž‘์•„์ง€๋ฉด ๋ถ‰์€์ƒ‰์œผ๋กœ ํ‘œํ˜„ํ–ˆ๋‹ค.

Inspecting dimensions one by one this way, an interesting pattern shows up at a particular position: the cell stays blue from an opening double quote to the closing one, stays red after the quote closes, and turns blue again when the next quote appears.

  • In other words, this dimension serves to remember where a quoted span begins and ends.

The next image shows program code.

  • We can see that this cell remembers that the current statement is inside an if statement.

In fact these behaviors come from an LSTM or GRU, not from the vanilla RNN. The simple vanilla RNN is rarely used in practice, because the following problem arises.

Since the same W multiplies the hidden state at every time step, the gradient takes the form of a geometric sequence: when the ratio is smaller than 1 the vanishing gradient problem occurs, and when it is larger than 1 the exploding gradient problem occurs.
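A quick numeric illustration of that geometric behavior (the factors 0.5 and 1.5 and the 50 steps are arbitrary):

```python
# Repeated multiplication by the same factor is a geometric sequence:
# a ratio below 1 collapses toward 0 (vanishing gradient), a ratio
# above 1 blows up (exploding gradient).
steps = 50
print(0.5 ** steps)  # tiny: on the order of 1e-16
print(1.5 ** steps)  # huge: on the order of 1e+8
```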

์œ„์˜ ์ˆซ์ž๋Š” backpropagtaion ๋  ๋•Œ W์˜ gradient๊ฐ’์„ ๋‚˜ํƒ€๋‚ด๋ฉฐ ์ ์  ์ž‘์•„์ง€๊ณ  ์žˆ๋‹ค. 0์— ๊ฐ€๊นŒ์›Œ์งˆ์ˆ˜๋ก ์œ ์˜๋ฏธํ•œ signal์„ ๋’ค์ชฝ์œผ๋กœ ์ „๋‹ฌํ•  ์ˆ˜ ์—†๊ฒŒ๋œ๋‹ค. ํšŒ์ƒ‰์€ 0์„ ์˜๋ฏธํ•˜๊ณ  ๊ทธ ์™ธ์˜ ๊ฐ’์€ 0๋ณด๋‹ค ํฌ๊ฑฐ๋‚˜ ์ž‘์€ ๊ฐ’์„ ์˜๋ฏธํ•œ๋‹ค. RNN์€ ์‰ฝ๊ฒŒ gradient๊ฐ€ 0์ด ๋˜๋Š” ๋ฐ˜๋ฉด์— LSTM์€ ๊ฝค ๊ธด ํƒ€์ž„ ์Šคํ…๊นŒ์ง€๋„ gradient๊ฐ€ ์‚ด์•„์žˆ๋Š” ๋ชจ์Šต์ด๋‹ค.

  • Long Term Dependency๋ฅผ ํ•ด๊ฒฐํ•ด์ฃผ๋Š” ๋ชจ์Šต.

Practice

Required Packages

import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

from tqdm import tqdm

๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ

vocab_size = 100
pad_id = 0

data = [
  [85,14,80,34,99,20,31,65,53,86,3,58,30,4,11,6,50,71,74,13],
  [62,76,79,66,32],
  [93,77,16,67,46,74,24,70],
  [19,83,88,22,57,40,75,82,4,46],
  [70,28,30,24,76,84,92,76,77,51,7,20,82,94,57],
  [58,13,40,61,88,18,92,89,8,14,61,67,49,59,45,12,47,5],
  [22,5,21,84,39,6,9,84,36,59,32,30,69,70,82,56,1],
  [94,21,79,24,3,86],
  [80,80,33,63,34,63],
  [87,32,79,65,2,96,43,80,85,20,41,52,95,50,35,96,24,80]
]
  • At this point the sequences all have different lengths.

max_len = len(max(data, key=len))
print(f"Maximum sequence length: {max_len}")

valid_lens = []
for i, seq in enumerate(tqdm(data)):
  valid_lens.append(len(seq))
  if len(seq) < max_len:
    data[i] = seq + [pad_id] * (max_len - len(seq))
  • Pad every sequence to the length of the longest one.

  • Sequences shorter than that length are filled with pad_id == 0.

for i in data:
  print(i)
print(valid_lens)
[85, 14, 80, 34, 99, 20, 31, 65, 53, 86, 3, 58, 30, 4, 11, 6, 50, 71, 74, 13]
[62, 76, 79, 66, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[93, 77, 16, 67, 46, 74, 24, 70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[19, 83, 88, 22, 57, 40, 75, 82, 4, 46, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[70, 28, 30, 24, 76, 84, 92, 76, 77, 51, 7, 20, 82, 94, 57, 0, 0, 0, 0, 0]
[58, 13, 40, 61, 88, 18, 92, 89, 8, 14, 61, 67, 49, 59, 45, 12, 47, 5, 0, 0]
[22, 5, 21, 84, 39, 6, 9, 84, 36, 59, 32, 30, 69, 70, 82, 56, 1, 0, 0, 0]
[94, 21, 79, 24, 3, 86, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[80, 80, 33, 63, 34, 63, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[87, 32, 79, 65, 2, 96, 43, 80, 85, 20, 41, 52, 95, 50, 35, 96, 24, 80, 0, 0]

[20, 5, 8, 10, 15, 18, 17, 6, 6, 18]
  • ๊ธธ์ด๊ฐ€ ๋ชจ๋‘ 20์œผ๋กœ ํ†ต์ผ๋œ ๋ชจ์Šต

  • ๋˜, ์›๋ž˜ ๊ธธ์ด๋ฅผ ์•Œ๊ธฐ์œ„ํ•œ valid_lens ๋ฅผ ์„ ์–ธํ•œ๋‹ค.

# B: batch size, L: maximum sequence length
batch = torch.LongTensor(data)  # (B, L)
batch_lens = torch.LongTensor(valid_lens)  # (B)
  • ๋ฐ์ดํ„ฐ๋ฅผ ํ•˜๋‚˜์˜ batch๋กœ ๋งŒ๋“ ๋‹ค.

    • ๋‹จ์ˆœํžˆ Tensorํ™” ํ•ด์ฃผ๋Š” ๊ณผ์ •์œผ๋กœ ์ƒ๊ฐํ•˜๋ฉด ๋œ๋‹ค.

Using an RNN

Before feeding the data into the RNN, we need word embeddings.

embedding_size = 256
embedding = nn.Embedding(vocab_size, embedding_size)

# d_w: embedding size
batch_emb = embedding(batch)  # (B, L, d_w)
  • ์ฒ˜์Œ์— vocab_size๋Š” 100์œผ๋กœ ์ •ํ•ด์ฃผ์—ˆ๋‹ค.

    • ์‹ค์ œ๋กœ ๊ฐ๊ฐ์˜ ๋ฐ์ดํ„ฐ์˜ ์›์†Œ๋Š” 0๋ถ€ํ„ฐ 99๊นŒ์ง€์˜ ์ˆ˜๋กœ ์ด๋ฃจ์–ด์ ธ์žˆ๋‹ค.

  • embedding ์ฐจ์›์€ 256์œผ๋กœ ์ž„์˜๋กœ ์ •ํ•œ๋‹ค.

hidden_size = 512  # RNN hidden state size
num_layers = 1  # number of stacked RNN layers
num_dirs = 1  # 1: unidirectional RNN, 2: bidirectional RNN

rnn = nn.RNN(
    input_size=embedding_size,
    hidden_size=hidden_size,
    num_layers=num_layers,
    bidirectional=True if num_dirs > 1 else False
)

h_0 = torch.zeros((num_layers * num_dirs, batch.shape[0], hidden_size))  # (num_layers * num_dirs, B, d_h)
  • This defines the RNN (hidden layer).

    • Each token is transformed 100-dim -> 256-dim -> 512-dim -> 256-dim -> 100-dim.

    • 100-dim: one-hot encoding

    • 256-dim: word embedding

    • 512-dim: transformed through the RNN network

  • The initial hidden state h_0 is zero-initialized, with shape (1, 10, 512).

Next, feed the batch data into the RNN; we get two outputs.

hidden_states, h_n = rnn(batch_emb.transpose(0, 1), h_0)

# d_h: hidden size, num_layers: number of layers, num_dirs: number of directions
print(hidden_states.shape)  # (L, B, d_h)
print(h_n.shape)  # (num_layers*num_dirs, B, d_h) = (1, B, d_h)
torch.Size([20, 10, 512])
torch.Size([1, 10, 512])
  • transpose swaps the two dimensions given as arguments.

    • Here they are 0 and 1, so X_ij becomes X_ji.

  • hidden_states: the collection of hidden states, one per time step.

  • h_n: the last hidden state, after the whole sequence has been processed.

Applying the RNN

Using the last hidden state, the RNN can be applied to a text classification task.

num_classes = 2
classification_layer = nn.Linear(hidden_size, num_classes)

# C: number of classes
output = classification_layer(h_n.squeeze(0))  # (1, B, d_h) => (B, C)
print(output.shape)
torch.Size([10, 2])

๊ฐ time step์— ๋Œ€ํ•œ hidden state๋ฅผ ์ด์šฉํ•˜๋ฉด token-level์˜ task๋ฅผ ์ˆ˜ํ–‰ํ•  ์ˆ˜๋„ ์žˆ๋‹ค.

num_classes = 5
entity_layer = nn.Linear(hidden_size, num_classes)

# C: number of classes
output = entity_layer(hidden_states)  # (L, B, d_h) => (L, B, C)
print(output.shape)
torch.Size([20, 10, 5])

Using PackedSequence

The given data includes unnecessary computation over pad tokens, so we sort it first.

  • Why unnecessary? Multiplying 0 by anything always yields 0, so those positions need not be computed at all.

  • Is that really a problem? It is: computation happens at every time step, and skipping the many zeros is faster than computing them.

  • Does sorting solve it? It does: after sorting, we only need to remember how many sequences are still active at each time step.

  • torch.nn.utils.rnn provides pack_padded_sequence and pad_packed_sequence for exactly this.

sorted_lens, sorted_idx = batch_lens.sort(descending=True)
sorted_batch = batch[sorted_idx]

sorted_batch_emb = embedding(sorted_batch)
packed_batch = pack_padded_sequence(sorted_batch_emb.transpose(0, 1), sorted_lens)

packed_outputs, h_n = rnn(packed_batch, h_0)

outputs, outputs_lens = pad_packed_sequence(packed_outputs)

print(outputs.shape)  # (L, B, d_h)
print(outputs_lens)
torch.Size([20, 10, 512])
tensor([20, 18, 18, 17, 15, 10,  8,  6,  6,  5])
  • Sort the given data, embed it, convert it into a PackedSequence, and feed it to the RNN.

  • The resulting output differs from the usual output format, so pad_packed_sequence is used to restore the original shape.

See https://simonjisu.github.io/nlp/2018/07/05/packedsequence.html for a walkthrough that makes PackedSequence easier to understand.