(Lecture 03) Optimization

210810


Important Concepts in Optimization

Generalization

Good generalization means that the network performs about as well on unseen data as it does on the training data. The generalization gap is the difference between the training error and the test error.

Then is good generalization always a good thing? Not necessarily.

Looking at the circled region on the left of the figure, the generalization gap is very small, yet the error itself is very high. So a model is only good when it generalizes well and its training error is low as well.

Underfitting vs Overfitting

When a model is trained too closely to the training data, overfitting occurs; when it is trained too little, underfitting occurs.

Cross-validation

Could we avoid overfitting by shrinking the training set and enlarging the test set? That might help in some cases, but in general a model needs plenty of training data to perform well. This is where cross-validation comes in.

Split the training data into n parts. Then train n times, so that each part serves as the validation data exactly once while the remaining parts are used as training data.
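
A minimal k-fold split sketch in numpy (train_model and evaluate are hypothetical placeholders):

import numpy as np

n_data, n_folds = 100, 5
indices = np.random.permutation(n_data)
folds = np.array_split(indices, n_folds)          # n roughly equal folds

for k in range(n_folds):
    val_idx = folds[k]                            # fold k: validation data
    train_idx = np.concatenate([folds[i] for i in range(n_folds) if i != k])
    # train_model(train_idx); evaluate(val_idx)   # hypothetical helpers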

Bias and Variance

If the variance is low, the outputs are consistent; if it is high, the outputs vary a lot from run to run, which makes overfitting more likely.

If the bias is low, the outputs stay close to the true mean; if it is high, the outputs land far away from the mean.

In the process of reducing the cost, note that the cost decomposes into three components: bias, variance, and noise, and the three are in a trade-off relationship.

Reducing the cost means reducing bias, variance, and noise together; but lowering bias tends to raise variance, and when noise is present it is hard to reduce bias and variance at the same time.
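
For reference, the standard decomposition behind this trade-off, assuming targets $t = f(x) + \epsilon$ with zero-mean noise of variance $\sigma^2$ and a learned model $\hat{f}$:

$$\mathbb{E}\big[(t-\hat{f})^2\big] = \underbrace{\big(f-\mathbb{E}[\hat{f}]\big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\big[(\hat{f}-\mathbb{E}[\hat{f}])^2\big]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{noise}}$$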

Bootstrapping

๋œป์€ ์‹ ๋ฐœ๋ˆ. ์‹ ๋ฐœ๋ˆ์€ ๋“ค์–ด์„œ ํ•˜๋Š˜์„ ๋‚ ๊ฒ ๋‹ค๋Š” ํ—ˆ๋ฌด๋งน๋ž‘ํ•œ ์˜๋ฏธ. ํ…Œ์ŠคํŠธ์…‹์ด ๊ณ ์ •๋˜์–ด ์žˆ์„ ๋•Œ ์ด๋ฅผ ์ „๋ถ€์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ ์ƒ˜ํ”Œ๋ง์„ ํ†ตํ•ด ์—ฌ๋Ÿฌ ํ…Œ์ŠคํŠธํ…Ÿ์„ ๋งŒ๋“ค๊ณ  ๋˜ ์ด๋ฅผ ํ†ตํ•ด ์—ฌ๋Ÿฌ ๋ชจ๋ธ๊ณผ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ƒ์„ฑํ•œ๋‹ค. ์ดํ›„ ์ด ๋ชจ๋ธ๋“ค์˜ ๊ฒฐ๊ณผ๊ฐ€ ์ผ์น˜ํ•˜๋Š”์ง€ ๋“ฑ์„ ๋ณด๊ณ  ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ํŒŒ์•…ํ•  ๋•Œ ์‚ฌ์šฉํ•œ๋‹ค.

Bagging vs Boosting

Bagging

Short for bootstrap aggregating. Instead of training on the entire fixed dataset at once, we bootstrap the training data into several sets and train a model on each. This is also commonly called an ensemble.

์‹ค์ œ๋กœ๋„ 100%์˜ ๋ฐ์ดํ„ฐ์…‹์„ ๋งŒ๋“œ๋Š” ๊ฒƒ๋ณด๋‹ค 80%์˜ ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ•ด 5๊ฐœ์˜ ๋ชจ๋ธ์„ ๋งŒ๋“ค๊ณ  ํ‰๊ท ์„ ๊ตฌํ•˜๋Š” ๊ฒƒ์ด ์ผ๋ฐ˜์ ์œผ๋กœ ์„ฑ๋Šฅ์ด ๋” ์ข‹๋‹ค.

Boosting

100๊ฐœ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๋ชจ๋‘ ํ•™์Šตํ•˜๊ณ  ์ด ์ค‘์— 80๊ฐœ์— ๋Œ€ํ•ด์„œ๋งŒ ์ž˜ ์˜ˆ์ธกํ–ˆ๋‹ค๋ฉด ์˜ˆ์ธกํ•˜์ง€ ๋ชปํ•œ 20๊ฐœ์˜ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด์„œ๋งŒ ํ•™์Šตํ•˜๋Š” ๋‘๋ฒˆ์งธ ๋ชจ๋ธ์„ ๋งŒ๋“ ๋‹ค. ์ด๋ ‡๊ฒŒ ์—ฌ๋Ÿฌ๊ฐœ์˜ ๋ชจ๋ธ์„ ๋งŒ๋“ค์–ด์„œ ํ•ฉ์นœ๋‹ค. ํ•˜๋‚˜ํ•˜๋‚˜์˜ ๋ชจ๋ธ์„ sequence ํ•˜๊ฒŒ ์—ฐ๊ฒฐํ•œ๋‹ค (๋…๋ฆฝ์ ์œผ๋กœ ๋ณด๋Š”๊ฒƒ์ด ์•„๋‹˜)

Practical Gradient Descent Methods

Gradient Descent Methods

Stochastic gradient descent

  • ํ•˜๋‚˜์˜ ์ƒ˜ํ”Œ๋กœ๋งŒ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ฐฑ์‹ ํ•œ๋‹ค

Mini-batch gradient descent

  • ๋ช‡๊ฐœ์˜ ์ƒ˜ํ”Œ๋กœ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ฐฑ์‹ ํ•œ๋‹ค

Batch gradient descent

  • Updates using the gradient computed from the entire dataset
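
All three variants share the same update rule and differ only in the batch $\mathcal{B}$ over which the gradient is averaged ($|\mathcal{B}|=1$ for SGD, a mini-batch, or the entire dataset):

$$W_{t+1} = W_t - \eta \cdot \frac{1}{|\mathcal{B}|}\sum_{i \in \mathcal{B}} \nabla_W L_i(W_t)$$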

Batch-size Matters

Batch size is not merely a compromise ("one sample is too few, the full set takes too long, so use something in between"); the batch size itself matters a great deal.

๋ฐฐ์น˜ ์‚ฌ์ด์ฆˆ๊ฐ€ ์ž‘์„์ˆ˜๋ก ์‹คํ—˜์ ์œผ๋กœ ์„ฑ๋Šฅ์ด ์ข‹๋‹ค. ๋ฐฐ์น˜ ์‚ฌ์ด์ฆˆ๊ฐ€ ์ž‘์„์ˆ˜๋ก Flat Minimum์— ๋„๋‹ฌํ•˜๊ธฐ ์‰ฝ๊ณ , ๋ฐฐ์น˜ ์‚ฌ์ด์ฆˆ๊ฐ€ ํด์ˆ˜๋ก Sharp Minimum์— ๋„๋‹ฌํ•˜๊ธฐ ์‰ฝ๋‹ค.

At a sharp minimum, even a small change in the parameter values changes the loss or accuracy a lot, so performance drops when the dataset changes.

Gradient Descent

(Stochastic) GD

The problem is that setting the learning rate is very difficult; it must be neither too large nor too small.

Momentum

Momentum reuses the previous update information. Even when the gradient swings widely from batch to batch, this keeps training moving toward convergence.
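
The standard momentum update ($g_t$: current gradient, $a_t$: accumulated gradient, $\beta$: momentum, $\eta$: learning rate):

$$a_{t+1} = \beta a_t + g_t, \qquad W_{t+1} = W_t - \eta\, a_{t+1}$$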

Nesterov Accelerated Gradient, NAG

a๋ผ๋Š” ์ด์ „์˜ ๊ฐ€์ค‘์น˜ ์ •๋ณด๋งŒํผ ํ•œ step ์ด๋™ํ•˜๊ณ  ๊ทธ ์ž๋ฆฌ์—์„œ ์ƒˆ๋กœ ๊ฐฑ์‹ ๋œ ๊ฐ€์ค‘์น˜ ๋งŒํผ ์ด๋™ํ•œ๋‹ค. ์ฆ‰, ๊ด€์„ฑ์— ์˜ํ•ด์„œ ์ตœ์†Œ์ ์„ ์ง€๋‚˜๋”๋ผ๋„ ์ง€๋‚œ ์‹œ์ ์—์„œ ์ƒˆ๋กœ ๊ฐ€์ค‘์น˜๋ฅผ ๊ตฌํ•ด์„œ ๋”ํ•˜๋ฉด ๋œ๋‹ค๋Š” ๋œป!

๊ธฐ์กด์˜ ๋ชจ๋ฉ˜ํ…€์€ ์ตœ์†Œ์ ์„ ์ง€๋‚˜๋”๋ผ๋„ ๋‹ค์‹œ ์ตœ์†Œ์  ๋ฐฉํ–ฅ์œผ๋กœ ๊ฐ€์ง€ ๋ชปํ•˜๊ณ  ๊ด€์„ฑ ๋•Œ๋ฌธ์— ๋” ๋ฉ€์–ด์กŒ๋‹ค๊ฐ€ ๋‹ค์‹œ ์˜ค๊ฒŒ๋œ๋‹ค. (๋งˆ์น˜ ์ง„์ž์šด๋™์ฒ˜๋Ÿผ) ๊ทธ๋ž˜์„œ ์ˆ˜๋ ดํ•˜๋Š” ์ง€์  ์ฃผ๋ณ€์—๋Š” ๋„๋‹ฌํ•˜์ง€๋งŒ ์ •ํ™•ํžˆ๋Š” ์ˆ˜๋ ดํ•˜์ง€ ๋ชปํ•˜๊ฒŒ ๋œ๋‹ค.

NAG๋Š” ์ด๋Ÿฌํ•œ ์ตœ์†Œ์ ์— ๋” ๋น ๋ฅด๊ฒŒ ๋„๋‹ฌํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ด์ค€๋‹ค.
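
For reference, the lookahead form of the update (standard NAG): the gradient is evaluated at the point already shifted by the momentum step.

$$a_{t+1} = \beta a_t + \nabla L\big(W_t - \eta \beta a_t\big), \qquad W_{t+1} = W_t - \eta\, a_{t+1}$$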

Adagrad

Adaptive Gradient: the step size is scaled differently for each parameter according to how much that parameter has changed. Parameters that have changed only a little are moved more, and parameters that have changed a lot are moved less.

The reasoning: variables that appear often or have changed a lot are likely to be close to their optimum already and should move delicately, while variables that have changed little are likely to need large moves to approach the optimum quickly, so the method moves them faster in the direction that reduces the loss.

G is the sum of the squared gradients accumulated so far, and the epsilon keeps the denominator away from zero. Since squared values keep being added as training proceeds, G only grows; when training runs for a long time the step size becomes so small that the weights barely move any more. This is Adagrad's weakness.
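
For reference, the Adagrad update this paragraph describes ($g_t$: current gradient):

$$G_t = G_{t-1} + g_t^2, \qquad W_{t+1} = W_t - \frac{\eta}{\sqrt{G_t + \epsilon}}\, g_t$$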

Adadelta

AdaGrad์˜ ๋‹จ์ ์„ ๋ณด์™„ํ•˜๊ธฐ ์œ„ํ•ด ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์ด๋‹ค.

G_t is updated through an exponential moving average (EMA). Otherwise all previous g_t values would have to be kept, which causes a resource problem; the EMA was used because it gives a similar result.
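
The EMA in question, in its standard Adadelta form (decay rate $\gamma$):

$$G_t = \gamma G_{t-1} + (1-\gamma)\, g_t^2, \qquad \Delta W_t = \frac{\sqrt{H_{t-1} + \epsilon}}{\sqrt{G_t + \epsilon}}\, g_t, \qquad H_t = \gamma H_{t-1} + (1-\gamma)\, (\Delta W_t)^2$$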

H_t applies an EMA to the squared parameter updates, i.e. to how much the weights themselves have changed.

๋‚˜๋„, ์™œ Ht์˜ ๋ฃจํŠธ๊ฐ’ / Gt ๋กœ ํ•™์Šต๋ฅ ์„ ์ •์˜ํ–ˆ๋Š”์ง€ ์ž˜ ๋ชจ๋ฅด๊ฒ ๋‹ค. ๋…ผ๋ฌธ์„ ์ฝ์–ด๋ด์•ผ ํ•  ๊ฒƒ ๊ฐ™๋‹ค.

Since Adadelta has no learning rate, there are few knobs to turn, so it is rarely used.

RMSProp

๋…ผ๋ฌธ์„ ํ†ตํ•ด์„œ ์ œ์•ˆ๋œ ๊ฑด ์•„๋‹ˆ๊ณ , ๊ฐ•์˜์—์„œ ์†Œ๊ฐœ๋œ ๊ฒƒ์ด๋‹ค. Ht๊ฐ’ ๋Œ€์‹  ์—ํƒ€๋ผ๋Š” Stepsize๊ฐ€ ์ถ”๊ฐ€๋˜์—ˆ๋‹ค.

Adam

Adam uses the EMA of GS (gradient squares) and momentum at the same time.

  • β1 : how much of the momentum to retain

  • β2 : the EMA of gradient-squares information

  • The epsilon ε defaults to 10^(-7) in practice, and tuning this value well is extremely important
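
Putting the pieces together, the standard bias-corrected Adam update:

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2$$

$$W_{t+1} = W_t - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}, \qquad \hat{m}_t = \frac{m_t}{1-\beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1-\beta_2^t}$$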

Regularization

Regularization constrains the model so that it generalizes well. Its purpose is to hinder training, but not hindrance for its own sake: the aim is to make what is learned apply not only to the training data but to the test data as well.

Early Stopping

Stop training at the point where the validation error is lowest.

  • This must not be done with the test error.
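
A self-contained sketch of the idea on toy tensors (the data, patience value, and model are all illustrative):

import torch

torch.manual_seed(0)
x, y = torch.randn(200, 1), torch.randn(200, 1)          # toy training set
x_val, y_val = torch.randn(50, 1), torch.randn(50, 1)    # toy validation set
model = torch.nn.Linear(1, 1)
optm = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = torch.nn.MSELoss()

best_val, patience, bad_epochs = float('inf'), 10, 0
for epoch in range(1000):
    optm.zero_grad()
    loss_fn(model(x), y).backward()
    optm.step()
    with torch.no_grad():
        val_loss = loss_fn(model(x_val), y_val).item()
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0               # new best so far
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                       # no improvement: stop
            break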

Parameter Norm Penalty

This method works on the assumption that the better a function generalizes, the smoother it will be, and penalizes large parameter norms accordingly.
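
In PyTorch the usual handle for this is the optimizer's weight_decay argument, which adds an L2 penalty on the parameters; a minimal sketch:

import torch
import torch.optim as optim

model = torch.nn.Linear(1, 1)
# weight_decay penalizes the L2 norm of the weights during each update
optm = optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)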

Data Augmentation

One of the most important ingredients is data: if the data were infinite, the model's performance would always be good.

When data was scarce, techniques such as ensembles and random forests improved performance; when data is plentiful, a neural network can represent the characteristics of the data well and performs better.

Hence data augmentation: increase the data by transforming it, within the limit where the label does not change.

However, it must not be used where the label could change (e.g. in MNIST, a 6 could be seen as a 9).
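
A typical label-preserving pipeline sketched with torchvision (the specific transforms are illustrative; per the MNIST caveat above, flips would be unsafe for digits):

from torchvision import transforms

# random crops, flips, and color jitter keep the label of a natural image
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])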

Noise Robustness

๋…ธ์ด์ฆˆ๋ฅผ ์‹ ๊ฒฝ๋ง ์ค‘๊ฐ„์ค‘๊ฐ„์— ์ธํ’‹์ด๋‚˜ ๊ฐ€์ค‘์น˜์— ๋„ฃ๊ฒŒ๋˜๋ฉด ์„ฑ๋Šฅ์ด ๋” ์ข‹๊ฒŒ ๋‚˜์˜จ๋‹ค๋Š” ์‹คํ—˜์ ์ธ ๊ฒฐ๊ณผ

Label Smoothing

Techniques that combine images with one another:

  • Mixup : blend two images in a given ratio, and blend their labels in the same ratio

  • Cutout : remove a part of the image

  • Cutmix : when mixing images, do not blend them; instead mix regions, pasting a patch of one image into the other
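
A mixup sketch (alpha and the one-hot label format are illustrative choices):

import torch

def mixup(x, y_onehot, alpha=0.2):
    # draw the mixing ratio from a Beta distribution
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))                     # random pairing
    x_mix = lam * x + (1 - lam) * x[perm]                # blend the images
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]  # blend the labels the same way
    return x_mix, y_mix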

Dropout

๊ฐ๊ฐ์˜ ๋‰ด๋Ÿฐ๋“ค์ด ์กฐ๊ธˆ ๋” robustํ•œ feature๋“ค์„ ์žก๋„๋ก ๋ช‡๊ฐœ์˜ ๋‰ด๋Ÿฐ์„ ๋น„ํ™œ์„ฑํ™” ํ•œ๋‹ค.

Batch Normalization

์‹ ๊ฒฝ๋ง์˜ ๊ฐ๊ฐ์˜ ๋ ˆ์ด์–ด๊ฐ€ ์ฒœ๊ฐœ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๊ฐ€ ์žˆ์„ ๋•Œ, ์ฒœ๊ฐœ์˜ ๊ฐ’์„ ๋ชจ๋“€ ์ •๊ทœํ™”(ํ‰๊ท ์„ ๋นผ์ฃผ๊ณ  ๋ถ„์‚ฐ์œผ๋กœ ๋‚˜๋ˆ„์–ด์ค€๋‹ค)ํ•ด์ค€๋‹ค. ๊ทธ๋Ÿฌ๋ฉด์„œ ๋„คํŠธ์›Œํฌ๊ฐ€ ์ž˜ ํ•™์Šต์ด ๋œ๋‹ค

  • ๋งŽ์€ ๋…ผ๋ฌธ๋“ค์ด ๋™์˜ํ•˜์ง€๋Š” ์•Š๋Š”๋‹ค

  • ํ™•์‹คํ•œ ๊ฒƒ์€ BN์„ ํ™œ์šฉํ•˜๋ฉด ์„ฑ๋Šฅ์ด ํ–ฅ์ƒํ•œ๋‹ค.
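
A sketch of where the layer sits in a simple MLP (BatchNorm1d normalizes each feature over the batch, then applies a learned scale and shift):

import torch.nn as nn

net = nn.Sequential(
    nn.Linear(64, 128),
    nn.BatchNorm1d(128),  # normalize over the batch dimension
    nn.ReLU(),
    nn.Linear(128, 1),
)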

Practice

Regression with Different Optimizers

!pip install matplotlib==3.3.0
  • We will use a feature that matplotlib 3.2 lacks, so install 3.3.

  • In Colab, the runtime must be restarted afterwards.

import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
%matplotlib inline
%config InlineBackend.figure_format='retina'
print ("PyTorch version:[%s]."%(torch.__version__))
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print ("device:[%s]."%(device))
PyTorch version:[1.9.0+cu102].
device:[cuda:0].

Dataset

n_data = 10000
x_numpy = -3+6*np.random.rand(n_data,1)
y_numpy = np.exp(-(x_numpy**2))*np.cos(10*x_numpy) + 3e-2*np.random.randn(n_data,1)
plt.figure(figsize=(8,5))
plt.plot(x_numpy,y_numpy,'r.',ms=2)
plt.show()
x_torch = torch.Tensor(x_numpy).to(device)
y_torch = torch.Tensor(y_numpy).to(device)
print ("Done.")
Done.

x์˜ ๋ฒ”์œ„๋ฅผ -3๋ถ€ํ„ฐ 3๊นŒ์ง€ ์ •ํ•ด์ค€๋‹ค. x๋ฅผ ์ง€์ˆ˜ํ•จ์ˆ˜์™€ cosํ•จ์ˆ˜์˜ ๊ณฑ์— ๋Œ€์ž…ํ•ด y๋ฅผ ์–ป์œผ๋ฉด ์œ„์— fig์ฒ˜๋Ÿผ ๋œ๋‹ค.

Define Model

class Model(nn.Module):
    def __init__(self,name='mlp',xdim=1,hdims=[16,16],ydim=1):
        super(Model, self).__init__()
        self.name = name
        self.xdim = xdim
        self.hdims = hdims
        self.ydim = ydim

        self.layers = []
        prev_hdim = self.xdim
        for hdim in self.hdims:
            self.layers.append(nn.Linear(
                prev_hdim, hdim, bias=True
            ))
            self.layers.append(nn.Tanh())  # activation
            prev_hdim = hdim
        # Final layer (without activation)
        self.layers.append(nn.Linear(prev_hdim,self.ydim,bias=True))

        # Concatenate all layers 
        self.net = nn.Sequential()
        for l_idx,layer in enumerate(self.layers):
            layer_name = "%s_%02d"%(type(layer).__name__.lower(),l_idx)
            self.net.add_module(layer_name,layer)

        self.init_param() # initialize parameters
    
    def init_param(self):
        for m in self.modules():
            if isinstance(m,nn.Conv2d): # init conv
                nn.init.kaiming_normal_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m,nn.Linear): # init dense
                nn.init.kaiming_normal_(m.weight)
                nn.init.zeros_(m.bias)
    
    def forward(self,x):
        return self.net(x)

print ("Done.")        
Done.
  • The model is built by stacking nn.Linear layers, and tanh() is used as the activation function.

LEARNING_RATE = 1e-2
# Instantiate models
model_sgd = Model(name='mlp_sgd',xdim=1,hdims=[64,64],ydim=1).to(device)
model_momentum = Model(name='mlp_momentum',xdim=1,hdims=[64,64],ydim=1).to(device)
model_adam = Model(name='mlp_adam',xdim=1,hdims=[64,64],ydim=1).to(device)
# Optimizers
loss = nn.MSELoss()
optm_sgd = optim.SGD(
    model_sgd.parameters(), lr=LEARNING_RATE
)
optm_momentum = optim.SGD(
    model_momentum.parameters(), lr=LEARNING_RATE, momentum=0.9
)
optm_adam = optim.Adam(
    model_adam.parameters(), lr=LEARNING_RATE
)
print ("Done.")
Done.

๋™์ผํ•œ ๋ฐ์ดํ„ฐ์™€ ๋™์ผํ•œ ๋„คํŠธ์›Œํฌ๋กœ ํ›ˆ๋ จํ•œ๋‹ค. ์ด ๋•Œ์˜ ์˜ตํ‹ฐ๋งˆ์ด์ €๋“ค์˜ ์ฐจ์ด๋ฅผ ๋ณผ ๊ฒƒ์ž„

Check Parameters

np.set_printoptions(precision=3)
n_param = 0
for p_idx,(param_name,param) in enumerate(model_sgd.named_parameters()):
    if param.requires_grad:
        param_numpy = param.detach().cpu().numpy() # to numpy array 
        n_param += len(param_numpy.reshape(-1))
        print ("[%d] name:[%s] shape:[%s]."%(p_idx,param_name,param_numpy.shape))
        print ("    val:%s"%(param_numpy.reshape(-1)[:5]))
print ("Total number of parameters:[%s]."%(format(n_param,',d')))
[0] name:[net.linear_00.weight] shape:[(64, 1)].
    val:[-0.204  2.943  0.396 -1.603 -1.872]
[1] name:[net.linear_00.bias] shape:[(64,)].
    val:[0. 0. 0. 0. 0.]
[2] name:[net.linear_02.weight] shape:[(64, 64)].
    val:[-0.057 -0.296  0.127  0.142 -0.027]
[3] name:[net.linear_02.bias] shape:[(64,)].
    val:[0. 0. 0. 0. 0.]
[4] name:[net.linear_04.weight] shape:[(1, 64)].
    val:[-0.139 -0.354  0.019  0.221  0.045]
[5] name:[net.linear_04.bias] shape:[(1,)].
    val:[0.]
Total number of parameters:[4,353].

Train

MAX_ITER,BATCH_SIZE,PLOT_EVERY = 1e4,64,500

model_sgd.init_param()
model_momentum.init_param()
model_adam.init_param()

model_sgd.train()
model_momentum.train()
model_adam.train()

for it in range(int(MAX_ITER)):
    r_idx = np.random.permutation(n_data)[:BATCH_SIZE]
    batch_x,batch_y = x_torch[r_idx],y_torch[r_idx]
    
    # Update with Adam
    y_pred_adam = model_adam.forward(batch_x)
    loss_adam = loss(y_pred_adam,batch_y)
    optm_adam.zero_grad()
    loss_adam.backward()
    optm_adam.step()

    # Update with Momentum
    y_pred_momentum = model_momentum.forward(batch_x)
    loss_momentum = loss(y_pred_momentum,batch_y)
    optm_momentum.zero_grad()
    loss_momentum.backward()
    optm_momentum.step()

    # Update with SGD
    y_pred_sgd = model_sgd.forward(batch_x)
    loss_sgd = loss(y_pred_sgd,batch_y)
    optm_sgd.zero_grad()
    loss_sgd.backward()
    optm_sgd.step()
    

    # Plot
    if ((it%PLOT_EVERY)==0) or (it==0) or (it==(MAX_ITER-1)):
        with torch.no_grad():
            y_sgd_numpy = model_sgd.forward(x_torch).cpu().detach().numpy()
            y_momentum_numpy = model_momentum.forward(x_torch).cpu().detach().numpy()
            y_adam_numpy = model_adam.forward(x_torch).cpu().detach().numpy()
            
            plt.figure(figsize=(8,4))
            plt.plot(x_numpy,y_numpy,'r.',ms=4,label='GT')
            plt.plot(x_numpy,y_sgd_numpy,'g.',ms=2,label='SGD')
            plt.plot(x_numpy,y_momentum_numpy,'b.',ms=2,label='Momentum')
            plt.plot(x_numpy,y_adam_numpy,'k.',ms=2,label='ADAM')
            plt.title("[%d/%d]"%(it,MAX_ITER),fontsize=15)
            plt.legend(labelcolor='linecolor',loc='upper right',fontsize=15)
            plt.show()

print ("Done.")
    • Shows how to use permutation, and the difference from shuffle

    • Shuffle the indices of all n_data points, then take only the first BATCH_SIZE of them.

  • For each optimizer, feed x into the model (forward) and compute the loss, then run the three steps needed for backpropagation (zero_grad, backward, step). The other two models are handled identically.

Done.

What the results show

  • Adam converged the fastest.

    • Adam had nearly converged after about 2,000 iterations.

    • For convergence, it is important to account for both momentum and the step size.

  • Why did momentum converge faster than SGD (even though it did not converge perfectly)?

    • Using the previous gradients is what separates momentum from SGD.

    • SGD makes its decision from the current batch alone, whereas momentum decides using the information accumulated over past batches.

  • SGD mostly captured only the tall peaks.

    • Because the loss is MSE, points that deviate a lot get corrected strongly, while points that deviate a little receive correspondingly less attention.

    • If outliers are present, this suggests MSE may not be a good choice of loss function.

  • However good the model is, a badly chosen optimizer can keep you from meeting the deadline, so this tool must be chosen carefully.

Notation for the momentum and NAG updates above:

  • $g_t$ : the gradient computed at the current step

  • $a_t$ : the accumulated gradient holding the previous update information

  • $\beta$ : the momentum coefficient

$W_t - \eta\beta a_t$ means: from the current weights, first subtract momentum times the learning rate (take the momentum step). And since $\nabla L$ asks for the derivative at that point, the gradient is evaluated at this shifted, already-updated position.
