๐Ÿšดโ€โ™‚๏ธ
TIL
  • MAIN
  • : TIL?
  • : WIL
  • : Plan
  • : Retrospective
    • 21Y
      • Wait a moment!
      • 9M 2W
      • 9M1W
      • 8M4W
      • 8M3W
      • 8M2W
      • 8M1W
      • 7M4W
      • 7M3W
      • 7M2W
      • 7M1W
      • 6M5W
      • 1H
    • ์ƒˆ์‚ฌ๋žŒ ๋˜๊ธฐ ํ”„๋กœ์ ํŠธ
      • 2ํšŒ์ฐจ
      • 1ํšŒ์ฐจ
  • TIL : ML
    • Paper Analysis
      • BERT
      • Transformer
    • Boostcamp 2st
      • [S]Data Viz
        • (4-3) Seaborn ์‹ฌํ™”
        • (4-2) Seaborn ๊ธฐ์ดˆ
        • (4-1) Seaborn ์†Œ๊ฐœ
        • (3-4) More Tips
        • (3-3) Facet ์‚ฌ์šฉํ•˜๊ธฐ
        • (3-2) Color ์‚ฌ์šฉํ•˜๊ธฐ
        • (3-1) Text ์‚ฌ์šฉํ•˜๊ธฐ
        • (2-3) Scatter Plot ์‚ฌ์šฉํ•˜๊ธฐ
        • (2-2) Line Plot ์‚ฌ์šฉํ•˜๊ธฐ
        • (2-1) Bar Plot ์‚ฌ์šฉํ•˜๊ธฐ
        • (1-3) Python๊ณผ Matplotlib
        • (1-2) ์‹œ๊ฐํ™”์˜ ์š”์†Œ
        • (1-1) Welcome to Visualization (OT)
      • [P]MRC
        • (2๊ฐ•) Extraction-based MRC
        • (1๊ฐ•) MRC Intro & Python Basics
      • [P]KLUE
        • (5๊ฐ•) BERT ๊ธฐ๋ฐ˜ ๋‹จ์ผ ๋ฌธ์žฅ ๋ถ„๋ฅ˜ ๋ชจ๋ธ ํ•™์Šต
        • (4๊ฐ•) ํ•œ๊ตญ์–ด BERT ์–ธ์–ด ๋ชจ๋ธ ํ•™์Šต
        • [NLP] ๋ฌธ์žฅ ๋‚ด ๊ฐœ์ฒด๊ฐ„ ๊ด€๊ณ„ ์ถ”์ถœ
        • (3๊ฐ•) BERT ์–ธ์–ด๋ชจ๋ธ ์†Œ๊ฐœ
        • (2๊ฐ•) ์ž์—ฐ์–ด์˜ ์ „์ฒ˜๋ฆฌ
        • (1๊ฐ•) ์ธ๊ณต์ง€๋Šฅ๊ณผ ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ
      • [U]Stage-CV
      • [U]Stage-NLP
        • 7W Retrospective
        • (10๊ฐ•) Advanced Self-supervised Pre-training Models
        • (09๊ฐ•) Self-supervised Pre-training Models
        • (08๊ฐ•) Transformer (2)
        • (07๊ฐ•) Transformer (1)
        • 6W Retrospective
        • (06๊ฐ•) Beam Search and BLEU score
        • (05๊ฐ•) Sequence to Sequence with Attention
        • (04๊ฐ•) LSTM and GRU
        • (03๊ฐ•) Recurrent Neural Network and Language Modeling
        • (02๊ฐ•) Word Embedding
        • (01๊ฐ•) Intro to NLP, Bag-of-Words
        • [ํ•„์ˆ˜ ๊ณผ์ œ 4] Preprocessing for NMT Model
        • [ํ•„์ˆ˜ ๊ณผ์ œ 3] Subword-level Language Model
        • [ํ•„์ˆ˜ ๊ณผ์ œ2] RNN-based Language Model
        • [์„ ํƒ ๊ณผ์ œ] BERT Fine-tuning with transformers
        • [ํ•„์ˆ˜ ๊ณผ์ œ] Data Preprocessing
      • Mask Wear Image Classification
        • 5W Retrospective
        • Report_Level1_6
        • Performance | Review
        • DAY 11 : HardVoting | MultiLabelClassification
        • DAY 10 : Cutmix
        • DAY 9 : Loss Function
        • DAY 8 : Baseline
        • DAY 7 : Class Imbalance | Stratification
        • DAY 6 : Error Fix
        • DAY 5 : Facenet | Save
        • DAY 4 : VIT | F1_Loss | LrScheduler
        • DAY 3 : DataSet/Lodaer | EfficientNet
        • DAY 2 : Labeling
        • DAY 1 : EDA
        • 2_EDA Analysis
      • [P]Stage-1
        • 4W Retrospective
        • (10๊ฐ•) Experiment Toolkits & Tips
        • (9๊ฐ•) Ensemble
        • (8๊ฐ•) Training & Inference 2
        • (7๊ฐ•) Training & Inference 1
        • (6๊ฐ•) Model 2
        • (5๊ฐ•) Model 1
        • (4๊ฐ•) Data Generation
        • (3๊ฐ•) Dataset
        • (2๊ฐ•) Image Classification & EDA
        • (1๊ฐ•) Competition with AI Stages!
      • [U]Stage-3
        • 3W Retrospective
        • PyTorch
          • (10๊ฐ•) PyTorch Troubleshooting
          • (09๊ฐ•) Hyperparameter Tuning
          • (08๊ฐ•) Multi-GPU ํ•™์Šต
          • (07๊ฐ•) Monitoring tools for PyTorch
          • (06๊ฐ•) ๋ชจ๋ธ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ
          • (05๊ฐ•) Dataset & Dataloader
          • (04๊ฐ•) AutoGrad & Optimizer
          • (03๊ฐ•) PyTorch ํ”„๋กœ์ ํŠธ ๊ตฌ์กฐ ์ดํ•ดํ•˜๊ธฐ
          • (02๊ฐ•) PyTorch Basics
          • (01๊ฐ•) Introduction to PyTorch
      • [U]Stage-2
        • 2W Retrospective
        • DL Basic
          • (10๊ฐ•) Generative Models 2
          • (09๊ฐ•) Generative Models 1
          • (08๊ฐ•) Sequential Models - Transformer
          • (07๊ฐ•) Sequential Models - RNN
          • (06๊ฐ•) Computer Vision Applications
          • (05๊ฐ•) Modern CNN - 1x1 convolution์˜ ์ค‘์š”์„ฑ
          • (04๊ฐ•) Convolution์€ ๋ฌด์—‡์ธ๊ฐ€?
          • (03๊ฐ•) Optimization
          • (02๊ฐ•) ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ - MLP (Multi-Layer Perceptron)
          • (01๊ฐ•) ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ๋ณธ ์šฉ์–ด ์„ค๋ช… - Historical Review
        • Assignment
          • [ํ•„์ˆ˜ ๊ณผ์ œ] Multi-headed Attention Assignment
          • [ํ•„์ˆ˜ ๊ณผ์ œ] LSTM Assignment
          • [ํ•„์ˆ˜ ๊ณผ์ œ] CNN Assignment
          • [ํ•„์ˆ˜ ๊ณผ์ œ] Optimization Assignment
          • [ํ•„์ˆ˜ ๊ณผ์ œ] MLP Assignment
      • [U]Stage-1
        • 1W Retrospective
        • AI Math
          • (AI Math 10๊ฐ•) RNN ์ฒซ๊ฑธ์Œ
          • (AI Math 9๊ฐ•) CNN ์ฒซ๊ฑธ์Œ
          • (AI Math 8๊ฐ•) ๋ฒ ์ด์ฆˆ ํ†ต๊ณ„ํ•™ ๋ง›๋ณด๊ธฐ
          • (AI Math 7๊ฐ•) ํ†ต๊ณ„ํ•™ ๋ง›๋ณด๊ธฐ
          • (AI Math 6๊ฐ•) ํ™•๋ฅ ๋ก  ๋ง›๋ณด๊ธฐ
          • (AI Math 5๊ฐ•) ๋”ฅ๋Ÿฌ๋‹ ํ•™์Šต๋ฐฉ๋ฒ• ์ดํ•ดํ•˜๊ธฐ
          • (AI Math 4๊ฐ•) ๊ฒฝ์‚ฌํ•˜๊ฐ•๋ฒ• - ๋งค์šด๋ง›
          • (AI Math 3๊ฐ•) ๊ฒฝ์‚ฌํ•˜๊ฐ•๋ฒ• - ์ˆœํ•œ๋ง›
          • (AI Math 2๊ฐ•) ํ–‰๋ ฌ์ด ๋ญ์˜ˆ์š”?
          • (AI Math 1๊ฐ•) ๋ฒกํ„ฐ๊ฐ€ ๋ญ์˜ˆ์š”?
        • Python
          • (Python 7-2๊ฐ•) pandas II
          • (Python 7-1๊ฐ•) pandas I
          • (Python 6๊ฐ•) numpy
          • (Python 5-2๊ฐ•) Python data handling
          • (Python 5-1๊ฐ•) File / Exception / Log Handling
          • (Python 4-2๊ฐ•) Module and Project
          • (Python 4-1๊ฐ•) Python Object Oriented Programming
          • (Python 3-2๊ฐ•) Pythonic code
          • (Python 3-1๊ฐ•) Python Data Structure
          • (Python 2-4๊ฐ•) String and advanced function concept
          • (Python 2-3๊ฐ•) Conditionals and Loops
          • (Python 2-2๊ฐ•) Function and Console I/O
          • (Python 2-1๊ฐ•) Variables
          • (Python 1-3๊ฐ•) ํŒŒ์ด์ฌ ์ฝ”๋”ฉ ํ™˜๊ฒฝ
          • (Python 1-2๊ฐ•) ํŒŒ์ด์ฌ ๊ฐœ์š”
          • (Python 1-1๊ฐ•) Basic computer class for newbies
        • Assignment
          • [์„ ํƒ ๊ณผ์ œ 3] Maximum Likelihood Estimate
          • [์„ ํƒ ๊ณผ์ œ 2] Backpropagation
          • [์„ ํƒ ๊ณผ์ œ 1] Gradient Descent
          • [ํ•„์ˆ˜ ๊ณผ์ œ 5] Morsecode
          • [ํ•„์ˆ˜ ๊ณผ์ œ 4] Baseball
          • [ํ•„์ˆ˜ ๊ณผ์ œ 3] Text Processing 2
          • [ํ•„์ˆ˜ ๊ณผ์ œ 2] Text Processing 1
          • [ํ•„์ˆ˜ ๊ณผ์ œ 1] Basic Math
    • ๋”ฅ๋Ÿฌ๋‹ CNN ์™„๋ฒฝ ๊ฐ€์ด๋“œ - Fundamental ํŽธ
      • ์ข…ํ•ฉ ์‹ค์Šต 2 - ์บ๊ธ€ Plant Pathology(๋‚˜๋ฌด์žŽ ๋ณ‘ ์ง„๋‹จ) ๊ฒฝ์—ฐ ๋Œ€ํšŒ
      • ์ข…ํ•ฉ ์‹ค์Šต 1 - 120์ข…์˜ Dog Breed Identification ๋ชจ๋ธ ์ตœ์ ํ™”
      • ์‚ฌ์ „ ํ›ˆ๋ จ ๋ชจ๋ธ์˜ ๋ฏธ์„ธ ์กฐ์ • ํ•™์Šต๊ณผ ๋‹ค์–‘ํ•œ Learning Rate Scheduler์˜ ์ ์šฉ
      • Advanced CNN ๋ชจ๋ธ ํŒŒํ—ค์น˜๊ธฐ - ResNet ์ƒ์„ธ์™€ EfficientNet ๊ฐœ์š”
      • Advanced CNN ๋ชจ๋ธ ํŒŒํ—ค์น˜๊ธฐ - AlexNet, VGGNet, GoogLeNet
      • Albumentation์„ ์ด์šฉํ•œ Augmentation๊ธฐ๋ฒ•๊ณผ Keras Sequence ํ™œ์šฉํ•˜๊ธฐ
      • ์‚ฌ์ „ ํ›ˆ๋ จ CNN ๋ชจ๋ธ์˜ ํ™œ์šฉ๊ณผ Keras Generator ๋ฉ”์ปค๋‹ˆ์ฆ˜ ์ดํ•ด
      • ๋ฐ์ดํ„ฐ ์ฆ๊ฐ•์˜ ์ดํ•ด - Keras ImageDataGenerator ํ™œ์šฉ
      • CNN ๋ชจ๋ธ ๊ตฌํ˜„ ๋ฐ ์„ฑ๋Šฅ ํ–ฅ์ƒ ๊ธฐ๋ณธ ๊ธฐ๋ฒ• ์ ์šฉํ•˜๊ธฐ
    • AI School 1st
    • ํ˜„์—… ์‹ค๋ฌด์ž์—๊ฒŒ ๋ฐฐ์šฐ๋Š” Kaggle ๋จธ์‹ ๋Ÿฌ๋‹ ์ž…๋ฌธ
    • ํŒŒ์ด์ฌ ๋”ฅ๋Ÿฌ๋‹ ํŒŒ์ดํ† ์น˜
  • TIL : Python & Math
    • Do It! ์žฅ๊ณ +๋ถ€ํŠธ์ŠคํŠธ๋žฉ: ํŒŒ์ด์ฌ ์›น๊ฐœ๋ฐœ์˜ ์ •์„
      • Relations - ๋‹ค๋Œ€๋‹ค ๊ด€๊ณ„
      • Relations - ๋‹ค๋Œ€์ผ ๊ด€๊ณ„
      • ํ…œํ”Œ๋ฆฟ ํŒŒ์ผ ๋ชจ๋“ˆํ™” ํ•˜๊ธฐ
      • TDD (Test Driven Development)
      • template tags & ์กฐ๊ฑด๋ฌธ
      • ์ •์  ํŒŒ์ผ(static files) & ๋ฏธ๋””์–ด ํŒŒ์ผ(media files)
      • FBV (Function Based View)์™€ CBV (Class Based View)
      • Django ์ž…๋ฌธํ•˜๊ธฐ
      • ๋ถ€ํŠธ์ŠคํŠธ๋žฉ
      • ํ”„๋ก ํŠธ์—”๋“œ ๊ธฐ์ดˆ๋‹ค์ง€๊ธฐ (HTML, CSS, JS)
      • ๋“ค์–ด๊ฐ€๊ธฐ + ํ™˜๊ฒฝ์„ค์ •
    • Algorithm
      • Programmers
        • Level1
          • ์†Œ์ˆ˜ ๋งŒ๋“ค๊ธฐ
          • ์ˆซ์ž ๋ฌธ์ž์—ด๊ณผ ์˜๋‹จ์–ด
          • ์ž์—ฐ์ˆ˜ ๋’ค์ง‘์–ด ๋ฐฐ์—ด๋กœ ๋งŒ๋“ค๊ธฐ
          • ์ •์ˆ˜ ๋‚ด๋ฆผ์ฐจ์ˆœ์œผ๋กœ ๋ฐฐ์น˜ํ•˜๊ธฐ
          • ์ •์ˆ˜ ์ œ๊ณฑ๊ทผ ํŒ๋ณ„
          • ์ œ์ผ ์ž‘์€ ์ˆ˜ ์ œ๊ฑฐํ•˜๊ธฐ
          • ์ง์‚ฌ๊ฐํ˜• ๋ณ„์ฐ๊ธฐ
          • ์ง์ˆ˜์™€ ํ™€์ˆ˜
          • ์ฒด์œก๋ณต
          • ์ตœ๋Œ€๊ณต์•ฝ์ˆ˜์™€ ์ตœ์†Œ๊ณต๋ฐฐ์ˆ˜
          • ์ฝœ๋ผ์ธ  ์ถ”์ธก
          • ํฌ๋ ˆ์ธ ์ธํ˜•๋ฝ‘๊ธฐ ๊ฒŒ์ž„
          • ํ‚คํŒจ๋“œ ๋ˆ„๋ฅด๊ธฐ
          • ํ‰๊ท  ๊ตฌํ•˜๊ธฐ
          • ํฐ์ผ“๋ชฌ
          • ํ•˜์ƒค๋“œ ์ˆ˜
          • ํ•ธ๋“œํฐ ๋ฒˆํ˜ธ ๊ฐ€๋ฆฌ๊ธฐ
          • ํ–‰๋ ฌ์˜ ๋ง์…ˆ
        • Level2
          • ์ˆซ์ž์˜ ํ‘œํ˜„
          • ์ˆœ์œ„ ๊ฒ€์ƒ‰
          • ์ˆ˜์‹ ์ตœ๋Œ€ํ™”
          • ์†Œ์ˆ˜ ์ฐพ๊ธฐ
          • ์†Œ์ˆ˜ ๋งŒ๋“ค๊ธฐ
          • ์‚ผ๊ฐ ๋‹ฌํŒฝ์ด
          • ๋ฌธ์ž์—ด ์••์ถ•
          • ๋ฉ”๋‰ด ๋ฆฌ๋‰ด์–ผ
          • ๋” ๋งต๊ฒŒ
          • ๋•…๋”ฐ๋จน๊ธฐ
          • ๋ฉ€์ฉกํ•œ ์‚ฌ๊ฐํ˜•
          • ๊ด„ํ˜ธ ํšŒ์ „ํ•˜๊ธฐ
          • ๊ด„ํ˜ธ ๋ณ€ํ™˜
          • ๊ตฌ๋ช…๋ณดํŠธ
          • ๊ธฐ๋Šฅ ๊ฐœ๋ฐœ
          • ๋‰ด์Šค ํด๋Ÿฌ์Šคํ„ฐ๋ง
          • ๋‹ค๋ฆฌ๋ฅผ ์ง€๋‚˜๋Š” ํŠธ๋Ÿญ
          • ๋‹ค์Œ ํฐ ์ˆซ์ž
          • ๊ฒŒ์ž„ ๋งต ์ตœ๋‹จ๊ฑฐ๋ฆฌ
          • ๊ฑฐ๋ฆฌ๋‘๊ธฐ ํ™•์ธํ•˜๊ธฐ
          • ๊ฐ€์žฅ ํฐ ์ •์‚ฌ๊ฐํ˜• ์ฐพ๊ธฐ
          • H-Index
          • JadenCase ๋ฌธ์ž์—ด ๋งŒ๋“ค๊ธฐ
          • N๊ฐœ์˜ ์ตœ์†Œ๊ณต๋ฐฐ์ˆ˜
          • N์ง„์ˆ˜ ๊ฒŒ์ž„
          • ๊ฐ€์žฅ ํฐ ์ˆ˜
          • 124 ๋‚˜๋ผ์˜ ์ˆซ์ž
          • 2๊ฐœ ์ดํ•˜๋กœ ๋‹ค๋ฅธ ๋น„ํŠธ
          • [3์ฐจ] ํŒŒ์ผ๋ช… ์ •๋ ฌ
          • [3์ฐจ] ์••์ถ•
          • ์ค„ ์„œ๋Š” ๋ฐฉ๋ฒ•
          • [3์ฐจ] ๋ฐฉ๊ธˆ ๊ทธ๊ณก
          • ๊ฑฐ๋ฆฌ๋‘๊ธฐ ํ™•์ธํ•˜๊ธฐ
        • Level3
          • ๋งค์นญ ์ ์ˆ˜
          • ์™ธ๋ฒฝ ์ ๊ฒ€
          • ๊ธฐ์ง€๊ตญ ์„ค์น˜
          • ์ˆซ์ž ๊ฒŒ์ž„
          • 110 ์˜ฎ๊ธฐ๊ธฐ
          • ๊ด‘๊ณ  ์ œ๊ฑฐ
          • ๊ธธ ์ฐพ๊ธฐ ๊ฒŒ์ž„
          • ์…”ํ‹€๋ฒ„์Šค
          • ๋‹จ์†์นด๋ฉ”๋ผ
          • ํ‘œ ํŽธ์ง‘
          • N-Queen
          • ์ง•๊ฒ€๋‹ค๋ฆฌ ๊ฑด๋„ˆ๊ธฐ
          • ์ตœ๊ณ ์˜ ์ง‘ํ•ฉ
          • ํ•ฉ์Šน ํƒ์‹œ ์š”๊ธˆ
          • ๊ฑฐ์Šค๋ฆ„๋ˆ
          • ํ•˜๋…ธ์ด์˜ ํƒ‘
          • ๋ฉ€๋ฆฌ ๋›ฐ๊ธฐ
          • ๋ชจ๋‘ 0์œผ๋กœ ๋งŒ๋“ค๊ธฐ
        • Level4
    • Head First Python
    • ๋ฐ์ดํ„ฐ ๋ถ„์„์„ ์œ„ํ•œ SQL
    • ๋‹จ ๋‘ ์žฅ์˜ ๋ฌธ์„œ๋กœ ๋ฐ์ดํ„ฐ ๋ถ„์„๊ณผ ์‹œ๊ฐํ™” ๋ฝ€๊ฐœ๊ธฐ
    • Linear Algebra(Khan Academy)
    • ์ธ๊ณต์ง€๋Šฅ์„ ์œ„ํ•œ ์„ ํ˜•๋Œ€์ˆ˜
    • Statistics110
  • TIL : etc
    • [๋”ฐ๋ฐฐ๋Ÿฐ] Kubernetes
    • [๋”ฐ๋ฐฐ๋Ÿฐ] Docker
      • 2. ๋„์ปค ์„ค์น˜ ์‹ค์Šต 1 - ํ•™์ŠตํŽธ(์ค€๋น„๋ฌผ/์‹ค์Šต ์œ ํ˜• ์†Œ๊ฐœ)
      • 1. ์ปจํ…Œ์ด๋„ˆ์™€ ๋„์ปค์˜ ์ดํ•ด - ์ปจํ…Œ์ด๋„ˆ๋ฅผ ์“ฐ๋Š”์ด์œ  / ์ผ๋ฐ˜ํ”„๋กœ๊ทธ๋žจ๊ณผ ์ปจํ…Œ์ด๋„ˆํ”„๋กœ๊ทธ๋žจ์˜ ์ฐจ์ด์ 
      • 0. ๋“œ๋””์–ด ์ฐพ์•„์˜จ Docker ๊ฐ•์˜! ์™•์ดˆ๋ณด์—์„œ ๋„์ปค ๋งˆ์Šคํ„ฐ๋กœ - OT
    • CoinTrading
      • [๊ฐ€์ƒ ํ™”ํ ์ž๋™ ๋งค๋งค ํ”„๋กœ๊ทธ๋žจ] ๋ฐฑํ…Œ์ŠคํŒ… : ๊ฐ„๋‹จํ•œ ํ…Œ์ŠคํŒ…
    • Gatsby
      • 01 ๊นƒ๋ถ ํฌ๊ธฐ ์„ ์–ธ
  • TIL : Project
    • Mask Wear Image Classification
    • Project. GARIGO
  • 2021 TIL
    • CHANGED
    • JUN
      • 30 Wed
      • 29 Tue
      • 28 Mon
      • 27 Sun
      • 26 Sat
      • 25 Fri
      • 24 Thu
      • 23 Wed
      • 22 Tue
      • 21 Mon
      • 20 Sun
      • 19 Sat
      • 18 Fri
      • 17 Thu
      • 16 Wed
      • 15 Tue
      • 14 Mon
      • 13 Sun
      • 12 Sat
      • 11 Fri
      • 10 Thu
      • 9 Wed
      • 8 Tue
      • 7 Mon
      • 6 Sun
      • 5 Sat
      • 4 Fri
      • 3 Thu
      • 2 Wed
      • 1 Tue
    • MAY
      • 31 Mon
      • 30 Sun
      • 29 Sat
      • 28 Fri
      • 27 Thu
      • 26 Wed
      • 25 Tue
      • 24 Mon
      • 23 Sun
      • 22 Sat
      • 21 Fri
      • 20 Thu
      • 19 Wed
      • 18 Tue
      • 17 Mon
      • 16 Sun
      • 15 Sat
      • 14 Fri
      • 13 Thu
      • 12 Wed
      • 11 Tue
      • 10 Mon
      • 9 Sun
      • 8 Sat
      • 7 Fri
      • 6 Thu
      • 5 Wed
      • 4 Tue
      • 3 Mon
      • 2 Sun
      • 1 Sat
    • APR
      • 30 Fri
      • 29 Thu
      • 28 Wed
      • 27 Tue
      • 26 Mon
      • 25 Sun
      • 24 Sat
      • 23 Fri
      • 22 Thu
      • 21 Wed
      • 20 Tue
      • 19 Mon
      • 18 Sun
      • 17 Sat
      • 16 Fri
      • 15 Thu
      • 14 Wed
      • 13 Tue
      • 12 Mon
      • 11 Sun
      • 10 Sat
      • 9 Fri
      • 8 Thu
      • 7 Wed
      • 6 Tue
      • 5 Mon
      • 4 Sun
      • 3 Sat
      • 2 Fri
      • 1 Thu
    • MAR
      • 31 Wed
      • 30 Tue
      • 29 Mon
      • 28 Sun
      • 27 Sat
      • 26 Fri
      • 25 Thu
      • 24 Wed
      • 23 Tue
      • 22 Mon
      • 21 Sun
      • 20 Sat
      • 19 Fri
      • 18 Thu
      • 17 Wed
      • 16 Tue
      • 15 Mon
      • 14 Sun
      • 13 Sat
      • 12 Fri
      • 11 Thu
      • 10 Wed
      • 9 Tue
      • 8 Mon
      • 7 Sun
      • 6 Sat
      • 5 Fri
      • 4 Thu
      • 3 Wed
      • 2 Tue
      • 1 Mon
    • FEB
      • 28 Sun
      • 27 Sat
      • 26 Fri
      • 25 Thu
      • 24 Wed
      • 23 Tue
      • 22 Mon
      • 21 Sun
      • 20 Sat
      • 19 Fri
      • 18 Thu
      • 17 Wed
      • 16 Tue
      • 15 Mon
      • 14 Sun
      • 13 Sat
      • 12 Fri
      • 11 Thu
      • 10 Wed
      • 9 Tue
      • 8 Mon
      • 7 Sun
      • 6 Sat
      • 5 Fri
      • 4 Thu
      • 3 Wed
      • 2 Tue
      • 1 Mon
    • JAN
      • 31 Sun
      • 30 Sat
      • 29 Fri
      • 28 Thu
      • 27 Wed
      • 26 Tue
      • 25 Mon
      • 24 Sun
      • 23 Sat
      • 22 Fri
      • 21 Thu
      • 20 Wed
      • 19 Tue
      • 18 Mon
      • 17 Sun
      • 16 Sat
      • 15 Fri
      • 14 Thu
      • 13 Wed
      • 12 Tue
      • 11 Mon
      • 10 Sun
      • 9 Sat
      • 8 Fri
      • 7 Thu
      • 6 Wed
      • 5 Tue
      • 4 Mon
      • 3 Sun
      • 2 Sat
      • 1 Fri
  • 2020 TIL
    • DEC
      • 31 Thu
      • 30 Wed
      • 29 Tue
      • 28 Mon
      • 27 Sun
      • 26 Sat
      • 25 Fri
      • 24 Thu
      • 23 Wed
      • 22 Tue
      • 21 Mon
      • 20 Sun
      • 19 Sat
      • 18 Fri
      • 17 Thu
      • 16 Wed
      • 15 Tue
      • 14 Mon
      • 13 Sun
      • 12 Sat
      • 11 Fri
      • 10 Thu
      • 9 Wed
      • 8 Tue
      • 7 Mon
      • 6 Sun
      • 5 Sat
      • 4 Fri
      • 3 Tue
      • 2 Wed
      • 1 Tue
    • NOV
      • 30 Mon
Powered by GitBook
On this page
  • ํ˜„์—… ์‹ค๋ฌด์ž์—๊ฒŒ ๋ฐฐ์šฐ๋Š” Kaggle ๋จธ์‹ ๋Ÿฌ๋‹ ์ž…๋ฌธ
  • K-Fold Cross Validation
  • Feature Engineering - Feature Selection
  • ์ƒ๊ด€ ๋ถ„์„(Correlation Analysis) & regplot()
  • Regression ์•Œ๊ณ ๋ฆฌ์ฆ˜์œผ๋กœ ๋ณด์Šคํ„ด ๋ถ€๋™์‚ฐ ๊ฐ€๊ฒฉ ์˜ˆ์ธกํ•ด๋ณด๊ธฐ (EDA & Feature Selection)
  • Feature Engineering - Feature Normalization
  • Feature Engineering - Feature Generation
  • Ridge & Lasso & ElasticNet Regression
  • ๋ณด์Šคํ„ด ๋ถ€๋™์‚ฐ ๊ฐ€๊ฒฉ ์˜ˆ์ธก ์„ฑ๋Šฅ ํ–ฅ์ƒ์‹œ์ผœ๋ณด๊ธฐ (Feature Generation & Advanced Estimator)

Was this helpful?

  1. 2021 TIL
  2. JUN

7 Mon

ํ˜„์—… ์‹ค๋ฌด์ž์—๊ฒŒ ๋ฐฐ์šฐ๋Š” Kaggle ๋จธ์‹ ๋Ÿฌ๋‹ ์ž…๋ฌธ

K-Fold Cross Validation

  • ๋ฐ์ดํ„ฐ์˜ ๊ฐœ์ˆ˜๊ฐ€ ๋„ˆ๋ฌด ์ž‘์„ ๊ฒฝ์šฐ ํŠธ๋ ˆ์ด๋‹ ๋ฐ์ดํ„ฐ์™€ ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ์˜ ๋ถ„๋ฅ˜ ๋ฐฉ์‹์— ๋”ฐ๋ผ ์„ฑ๋Šฅ ์ธก์ •๊ฒฐ๊ณผ๊ฐ€ ํฌ๊ฒŒ ๋‹ฌ๋ผ์งˆ ์ˆ˜ ์žˆ๋‹ค.

  • ํŠธ๋ ˆ์ด๋‹ ๋ฐ์ดํ„ฐ์— ๊ทน๋‹จ์ ์ธ ๋ถ„ํฌ์˜ ๋ฐ์ดํ„ฐ๊ฐ€ ๋ชฐ๋ ค ์žˆ๋‹ค๋ฉด ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ์˜ ์„ฑ๋Šฅ์ด ์ž˜ ์•ˆ๋‚˜์˜ค๊ฒŒ ๋œ๋‹ค.

>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
2
>>> print(kf)
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in kf.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]

Feature Engineering - Feature Selection

  • ๋„๋ฉ”์ธ ์ง€์‹์ด๋‚˜ ๋ถ„์„์„ ํ†ตํ•ด ์œ ์˜๋ฏธํ•œ ํŠน์ง•๋“ค๋งŒ์„ ์„ ๋ณ„ํ•ด๋‚ด๊ฑฐ๋‚˜ Feature์˜ ํ˜•ํƒœ๋ฅผ ๋”์šฑ ์ ํ•ฉํ•œ ํ˜•ํƒœ๋กœ ๋ณ€๊ฒฝํ•˜๋Š” ๊ฒƒ

  • ์ ์ ˆํ•œ Feature Enginerring์€ ๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์— ํฐ ์˜ํ–ฅ์„ ๋ผ์น  ์ˆ˜ ์žˆ๋‹ค

  • ๋‹ค์Œ๊ณผ ๊ฐ™์ด 3์ข…๋ฅ˜๋กœ ๋‚˜๋‰œ๋‹ค

    • Feature Selection

    • Normalization

    • Feature Generation

Feature Selection

  • ์˜ˆ์ธก๊ฐ’๊ณผ ์—ฐ๊ด€์ด ์—†๋Š” ๋ถˆํ•„์š”ํ•œ ํŠน์ง•์„ ์ œ๊ฑฐํ•ด์„œ ๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ๋”์šฑ ๋†’์ด๋Š” ๊ธฐ๋ฒ•

  • ์ œ๊ฑฐํ•  ํŠน์ง•์„ ์„ ํƒํ•˜๊ธฐ ์œ„ํ•ด ์ƒ๊ด€ ๋ถ„์„๋“ฑ์„ ์ง„ํ–‰ํ•  ์ˆ˜ ์žˆ๋‹ค

์ƒ๊ด€ ๋ถ„์„(Correlation Analysis) & regplot()

์ƒ๊ด€ ๋ถ„์„ ๋˜๋Š” ์ƒ๊ด€ ๊ด€๊ณ„๋Š” ํ™•๋ฅ ๋ก ๊ณผ ํ†ต๊ณ„ํ•™์—์„œ ๋‘ ๋ณ€์ˆ˜๊ฐ„์— ์–ด๋–ค ์„ ํ˜•์  ๋˜๋Š” ๋น„์„ ํ˜•์  ๊ด€๊ณ„๋ฅผ ๊ฐ–๊ณ  ์žˆ๋Š”์ง€์— ๋Œ€ํ•œ ๋ฐฉ๋ฒ•์ด๋‹ค

  • 1์— ๊ฐ€๊นŒ์šด ๊ฐ’ : ๋‘ ๋ณ€์ˆ˜๊ฐ„์˜ ์–‘์˜ ์ƒ๊ด€๊ด€๊ณ„

  • 0์— ๊ฐ€๊นŒ์šด ๊ฐ’ : ๋‘ ๋ณ€์ˆ˜๊ฐ„์˜ ์ƒ๊ด€๊ด€๊ณ„๊ฐ€ ์—†์Œ

  • -1์— ๊ฐ€๊นŒ์šด ๊ฐ’ : ๋‘ ๋ณ€์ˆ˜๊ฐ„์˜ ์Œ์˜ ์ƒ๊ด€๊ด€๊ณ„

๊ตฌํ˜„

  • scikit-learn์„ ์ด์šฉํ•ด ๊ตฌํ˜„ํ•  ์ˆ˜ ์žˆ๋‹ค

import matplotlib.pyplot as plt
import seaborn as sns

corr = df.corr()             # pairwise correlation matrix of the DataFrame
plt.figure(figsize=(10, 10))
sns.heatmap(corr,
            vmax=0.8,
            linewidths=0.01,
            square=True,
            annot=True,
            cmap='YlGnBu')
plt.title('Correlation Matrix')

sns.regplot ์œผ๋กœ Feature๊ฐ„์˜ ๊ฒฝํ–ฅ์„ฑ ์ถœ๋ ฅ

  • sns.regplot(data={dataframe}, x={์ปฌ๋Ÿผ๋ช…}, y={์ปฌ๋Ÿผ๋ช…}) ํ˜•ํƒœ๋ฅผ ์ด์šฉํ•ด์„œ regression line์ด ํฌํ•จ๋œ scatter plot์„ ๊ทธ๋ฆด ์ˆ˜ ์žˆ๋‹ค.
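
A minimal sketch, using seaborn's bundled tips sample dataset (the dataset choice is mine, not the course's):

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset('tips')                  # seaborn's built-in sample dataset
sns.regplot(data=tips, x='total_bill', y='tip')  # scatter plot + regression line
plt.show()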

Regression ์•Œ๊ณ ๋ฆฌ์ฆ˜์œผ๋กœ ๋ณด์Šคํ„ด ๋ถ€๋™์‚ฐ ๊ฐ€๊ฒฉ ์˜ˆ์ธกํ•ด๋ณด๊ธฐ (EDA & Feature Selection)

  • 1970๋…„๋„์˜ ๋ณด์Šคํ„ด ์ง€์—ญ์˜ ๋ถ€๋™์‚ฐ ๊ฐ€๊ฒฉ์„ ์ˆ˜์ง‘ํ•œ ๋ฐ์ดํ„ฐ

  • Feature ๋ฐ์ดํ„ฐ : 13๊ฐœ

  • ๋ฐ์ดํ„ฐ ๊ฐœ์ˆ˜ : 506๊ฐœ

  • Target data : ๋ณด์Šคํ„ด ๋ถ€๋™์‚ฐ ์ง‘๊ฐ’ (๋‹จ์œ„ : $1000)

  • ์‚ฌ์šฉ ์•Œ๊ณ ๋ฆฌ์ฆ˜

    • LinearRegression

  • ์ถ”๊ฐ€์ ์ธ ์ ์šฉ๊ธฐ๋ฒ•

    • Feature Selection

๋ณด์Šคํ„ด ๋ถ€๋™์‚ฐ ๋ฐ์ดํ„ฐ์˜ ํŠน์ง•๋“ค(Features)

  1. CRIM: ๋„์‹œ๋ณ„ ๋ฒ”์ฃ„๋ฐœ์ƒ๋ฅ 

  2. ZN: 25,000ํ‰์„ ๋„˜๋Š” ํ† ์ง€์˜ ๋น„์œจ

  3. INDUS: ๋„์‹œ๋ณ„ ๋น„์ƒ์—… ์ง€๊ตฌ์˜ ๋น„์œ 

  4. CHAS: ์ฐฐ์Šค ๊ฐ•์˜ ๋”๋ฏธ ๋ณ€์ˆ˜(1 = ๊ฐ•์˜ ๊ฒฝ๊ณ„, 0 = ๋‚˜๋จธ์ง€)

  5. NOX: ์ผ์‚ฐํ™”์งˆ์†Œ ๋†๋„

  6. RM: ์ฃผ๊ฑฐํ•  ์ˆ˜ ์žˆ๋Š” ํ‰๊ท  ๋ฐฉ์˜๊ฐœ์ˆ˜

  7. AGE: 1940๋…„ ์ด์ „์— ์ง€์–ด์ง„ ์ฃผํƒ์˜ ๋น„์œจ

  8. DIS: 5๊ฐœ์˜ ๊ณ ์šฉ์ง€์›์„ผํ„ฐ๊นŒ์ง€์˜ ๊ฐ€์ค‘์น˜๊ฐ€ ๊ณ ๋ ค๋œ ๊ฑฐ๋ฆฌ

  9. RAD: ๊ณ ์†๋„๋กœ์˜ ์ ‘๊ทผ ์šฉ์ด์„ฑ์— ๋Œ€ํ•œ ์ง€ํ‘œ

  10. TAX: 10,000๋‹ฌ๋Ÿฌ๋‹น ์žฌ์‚ฐ์„ธ ๋น„์œจ

  11. PTRATIO: ๋„์‹œ๋ณ„ ๊ต์‚ฌ์™€ ํ•™์ƒ์˜ ๋น„์œจ

  12. B: ๋„์‹œ์˜ ํ‘์ธ ๊ฑฐ์ฃผ ๋น„์œ 

  13. LSTAT: ์ €์†Œ๋“์ธต์˜ ๋น„์œจ

์ „์ฒด ํŠน์ง•(Feature)๋ฅผ ์‚ฌ์šฉํ•œ Linear Regression

X = boston_house_data.data
y = boston_house_data.target
type(X)
numpy.ndarray
from sklearn.model_selection import KFold

num_split = 5
kf = KFold(n_splits=num_split)

avg_MSE = 0.0
for train_index, test_index in kf.split(X):
  X_train, X_test = X[train_index], X[test_index]
  y_train, y_test = y[train_index], y[test_index]
  # ์„ ํ˜•ํšŒ๊ท€(Linear Regression) ๋ชจ๋ธ ์„ ์–ธํ•˜๊ธฐ
  lr = LinearRegression()

  # ์„ ํ˜•ํšŒ๊ท€(Linear Regression) ๋ชจ๋ธ ํ•™์Šตํ•˜๊ธฐ
  lr.fit(X_train, y_train)

  # ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์˜ˆ์ธก์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.
  y_pred = lr.predict(X_test)

  # MSE(Mean Squared Error)๋ฅผ ์ธก์ •ํ•ฉ๋‹ˆ๋‹ค.
  avg_MSE = avg_MSE + mean_squared_error(y_test, y_pred)

print('Average MSE :', avg_MSE/num_split)
print('Avergae RMSE :', np.sqrt(avg_MSE/num_split))
Average MSE : 37.13180746769903
Avergae RMSE : 6.093587405436885

์ƒ๊ด€๋ถ„์„(Correlation Analysis)

boston_house_df = pd.DataFrame(boston_house_data.data, columns = boston_house_data.feature_names)
boston_house_df['PRICE'] = y
boston_house_df.head()

|   | CRIM    | ZN   | INDUS | CHAS | NOX   | RM    | AGE  | DIS    | RAD | TAX   | PTRATIO | B      | LSTAT | PRICE |
|---|---------|------|-------|------|-------|-------|------|--------|-----|-------|---------|--------|-------|-------|
| 0 | 0.00632 | 18.0 | 2.31  | 0.0  | 0.538 | 6.575 | 65.2 | 4.0900 | 1.0 | 296.0 | 15.3    | 396.90 | 4.98  | 24.0  |
| 1 | 0.02731 | 0.0  | 7.07  | 0.0  | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8    | 396.90 | 9.14  | 21.6  |
| 2 | 0.02729 | 0.0  | 7.07  | 0.0  | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8    | 392.83 | 4.03  | 34.7  |
| 3 | 0.03237 | 0.0  | 2.18  | 0.0  | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7    | 394.63 | 2.94  | 33.4  |
| 4 | 0.06905 | 0.0  | 2.18  | 0.0  | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7    | 396.90 | 5.33  | 36.2  |

corr = boston_house_df.corr()
plt.figure(figsize=(10, 10));
sns.heatmap(corr,
            vmax=0.8,
            linewidths=0.01,
            square=True,
            annot=True,
            cmap='YlGnBu');
plt.title('Feature Correlation');

regplot์œผ๋กœ ๋ณด๋Š” ์ƒ๊ด€๊ด€๊ณ„

figure, ax_list = plt.subplots(nrows=3, ncols=5)
figure.set_size_inches(20, 20)

# The 13 feature columns (everything except PRICE)
full_column_list = boston_house_df.columns.drop('PRICE').tolist()

for i in range(len(full_column_list)):
  row, col = divmod(i, 5)   # lay the plots out on the 3x5 grid
  sns.regplot(data=boston_house_df, x=full_column_list[i], y='PRICE', ax=ax_list[row][col])
  ax_list[row][col].set_title("regplot " + full_column_list[i])

์œ ์˜๋ฏธํ•œ Feature๋“ค๋งŒ์„ ๋‚จ๊ธฐ๋Š” Feature Selection

print(type(corr))
corr
<class 'pandas.core.frame.DataFrame'>

|         | CRIM      | ZN        | INDUS     | CHAS      | NOX       | RM        | AGE       | DIS       | RAD       | TAX       | PTRATIO   | B         | LSTAT     | PRICE     |
|---------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| CRIM    | 1.000000  | -0.200469 | 0.406583  | -0.055892 | 0.420972  | -0.219247 | 0.352734  | -0.379670 | 0.625505  | 0.582764  | 0.289946  | -0.385064 | 0.455621  | -0.388305 |
| ZN      | -0.200469 | 1.000000  | -0.533828 | -0.042697 | -0.516604 | 0.311991  | -0.569537 | 0.664408  | -0.311948 | -0.314563 | -0.391679 | 0.175520  | -0.412995 | 0.360445  |
| INDUS   | 0.406583  | -0.533828 | 1.000000  | 0.062938  | 0.763651  | -0.391676 | 0.644779  | -0.708027 | 0.595129  | 0.720760  | 0.383248  | -0.356977 | 0.603800  | -0.483725 |
| CHAS    | -0.055892 | -0.042697 | 0.062938  | 1.000000  | 0.091203  | 0.091251  | 0.086518  | -0.099176 | -0.007368 | -0.035587 | -0.121515 | 0.048788  | -0.053929 | 0.175260  |
| NOX     | 0.420972  | -0.516604 | 0.763651  | 0.091203  | 1.000000  | -0.302188 | 0.731470  | -0.769230 | 0.611441  | 0.668023  | 0.188933  | -0.380051 | 0.590879  | -0.427321 |
| RM      | -0.219247 | 0.311991  | -0.391676 | 0.091251  | -0.302188 | 1.000000  | -0.240265 | 0.205246  | -0.209847 | -0.292048 | -0.355501 | 0.128069  | -0.613808 | 0.695360  |
| AGE     | 0.352734  | -0.569537 | 0.644779  | 0.086518  | 0.731470  | -0.240265 | 1.000000  | -0.747881 | 0.456022  | 0.506456  | 0.261515  | -0.273534 | 0.602339  | -0.376955 |
| DIS     | -0.379670 | 0.664408  | -0.708027 | -0.099176 | -0.769230 | 0.205246  | -0.747881 | 1.000000  | -0.494588 | -0.534432 | -0.232471 | 0.291512  | -0.496996 | 0.249929  |
| RAD     | 0.625505  | -0.311948 | 0.595129  | -0.007368 | 0.611441  | -0.209847 | 0.456022  | -0.494588 | 1.000000  | 0.910228  | 0.464741  | -0.444413 | 0.488676  | -0.381626 |
| TAX     | 0.582764  | -0.314563 | 0.720760  | -0.035587 | 0.668023  | -0.292048 | 0.506456  | -0.534432 | 0.910228  | 1.000000  | 0.460853  | -0.441808 | 0.543993  | -0.468536 |
| PTRATIO | 0.289946  | -0.391679 | 0.383248  | -0.121515 | 0.188933  | -0.355501 | 0.261515  | -0.232471 | 0.464741  | 0.460853  | 1.000000  | -0.177383 | 0.374044  | -0.507787 |
| B       | -0.385064 | 0.175520  | -0.356977 | 0.048788  | -0.380051 | 0.128069  | -0.273534 | 0.291512  | -0.444413 | -0.441808 | -0.177383 | 1.000000  | -0.366087 | 0.333461  |
| LSTAT   | 0.455621  | -0.412995 | 0.603800  | -0.053929 | 0.590879  | -0.613808 | 0.602339  | -0.496996 | 0.488676  | 0.543993  | 0.374044  | -0.366087 | 1.000000  | -0.737663 |
| PRICE   | -0.388305 | 0.360445  | -0.483725 | 0.175260  | -0.427321 | 0.695360  | -0.376955 | 0.249929  | -0.381626 | -0.468536 | -0.507787 | 0.333461  | -0.737663 | 1.000000  |

useful_feature_list = corr.query("PRICE > 0.5 or PRICE < -0.5").index.values.tolist()
useful_feature_list.remove('PRICE')   # PRICE trivially correlates with itself
print(useful_feature_list)
['RM', 'PTRATIO', 'LSTAT']
X = boston_house_df.loc[:, useful_feature_list].values
y = boston_house_df.iloc[:, -1].values
print(X.shape)
print(y.shape)
(506, 3)
(506,)

Feature Selection ๊ฒฐ๊ณผ with K-fold

num_split = 5

kf = KFold(n_splits=num_split)

avg_MSE = 0.0

for train_index, test_index in kf.split(X):
  X_train, X_test = X[train_index], X[test_index]
  y_train, y_test = y[train_index], y[test_index]
  # ์„ ํ˜•ํšŒ๊ท€(Linear Regression) ๋ชจ๋ธ ์„ ์–ธํ•˜๊ธฐ
  lr = LinearRegression()

  # ์„ ํ˜•ํšŒ๊ท€(Linear Regression) ๋ชจ๋ธ ํ•™์Šตํ•˜๊ธฐ
  lr.fit(X_train, y_train)

  # ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์˜ˆ์ธก์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.
  y_pred = lr.predict(X_test)

  # MSE(Mean Squared Error)๋ฅผ ์ธก์ •ํ•ฉ๋‹ˆ๋‹ค.
  avg_MSE = avg_MSE + mean_squared_error(y_test, y_pred)

print('Average MSE :', avg_MSE/num_split)
print('Avergae RMSE :', np.sqrt(avg_MSE/num_split))
Average MSE : 34.10008149030686
Avergae RMSE : 5.839527505741099

๊ฒฐ๋ก 

์„ฑ๋Šฅ์ด ๋” ์ข‹์•„์ง„ ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค

  • ์ด์ „

Average MSE : 37.13180746769903
Avergae RMSE : 6.093587405436885
  • ์ดํ›„

Average MSE : 34.10008149030686
Avergae RMSE : 5.839527505741099

Feature Engineering - Feature Normalization

  • Feature๊ฐ’์˜ ๋ฒ”์œ„๋ฅผ ์กฐ์ •ํ•˜๋Š” ๊ธฐ๋ฒ•

  • Feature๋ฅผ ์ •๊ทœํ™” ํ•  ๊ฒฝ์šฐ ๋” ์•ˆ์ •์ ์œผ๋กœ ๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ์„ ํ•™์Šต์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค

  • Min-Max Scaling์„ ํ•  ์ˆ˜๋„ ์žˆ๋‹ค. ์ด ๋•Œ๋Š” ๋ชจ๋“  ๊ฐ’์ด 0์—์„œ 1์‚ฌ์ด์— ์œ„์น˜ํ•˜๊ฒŒ ๋œ๋‹ค.

    • x' = (x - min) / (max- min)

from sklearn import preprocessing

# Standardization: rescale each feature to mean 0, standard deviation 1
standardized_data = preprocessing.StandardScaler().fit_transform(data)

# Min-Max scaling: rescale each feature into the range [0, 1]
normalized_data = preprocessing.MinMaxScaler().fit_transform(data)
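
A quick sanity check of the min-max formula on a toy array (the numbers are mine, not the course's):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[1.0], [5.0], [9.0]])       # toy single-feature column
print(MinMaxScaler().fit_transform(data))    # [[0. ], [0.5], [1. ]] since (5-1)/(9-1) = 0.5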

Feature Engineering - Feature Generation

  • ๊ธฐ์กด์˜ ํŠน์ง•๊ฐ’๋“ค์„ ์กฐํ•ฉํ•ด์„œ ์ƒˆ๋กœ์šด ํŠน์ง•์„ ๋งŒ๋“œ๋Š” ๋ฐฉ์‹

  • ๊ฐ€์žฅ ๋Œ€ํ‘œ์ ์ธ ๋ฐฉ์‹์€ PolynomialFeature ๋ฐฉ๋ฒ•์ด๋‹ค

    • ์„œ๋กœ ๋‹ค๋ฅธ ํŠน์ง•๋“ค๊ฐ„์˜ ๊ณฑ์…ˆ์„ ์ƒˆ๋กœ์šด Feature๋กœ ๋งŒ๋“ ๋‹ค

    • ์˜ˆ๋ฅผ ๋“ค๋ฉด ๋ฒ”์ฃ„์œจ x1๊ณผ ์ €์†Œ๋“์ธต ๋น„์œจ x2๋ฅผ ๊ณฑํ•ด ์ƒˆ๋กœ์šด ํŠน์ง• x1*x2 ๋ฅผ ๋งŒ๋“ ๋‹ค

  • ์•„๋ž˜ ํ•จ์ˆ˜๋Š” ๋ณด์Šคํ„ด ๋ถ€๋™์‚ฐ์— ๋Œ€ํ•œ 13๊ฐœ์˜ ํŠน์ง•์— 91๊ฐœ์˜ ์ƒˆ๋กœ์šด ํŠน์ง•์„ ์ถ”๊ฐ€ํ•˜์—ฌ ์ด 104๊ฐœ์˜ ํŠน์ง•์„ ๋ฐ˜ํ™˜ํ•˜๊ฒŒ ๋œ๋‹ค

from sklearn.datasets import load_boston
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

def load_extended_boston():
    boston = load_boston()

    # Scale the 13 original features into [0, 1] first
    X = MinMaxScaler().fit_transform(boston.data)

    # degree=2 adds every square and pairwise product:
    # 13 original + 13 squares + 78 products = 104 features
    X = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

    return X, boston.target
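
Calling it confirms the expanded shape (a quick check, assuming the function above):

X, y = load_extended_boston()
print(X.shape)   # (506, 104)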

Ridge & Lasso & ElasticNet Regression

์•„๋ž˜ ์‹๋“ค์€ ํ•ด๋‹น ์ถœ์ฒ˜์—์„œ ๊ฐ€์ ธ์˜ด

Ridge Regression

  • A technique that constrains the weights w using L2 regularization.
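
In the form scikit-learn's Ridge minimizes:

\min_w \; \lVert Xw - y \rVert_2^2 + \alpha \lVert w \rVert_2^2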

Lasso Regression

  • A technique that constrains the weights w using L1 regularization.
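
In the form scikit-learn's Lasso minimizes (note the extra 1/(2n) factor on the data term):

\min_w \; \frac{1}{2n_{\text{samples}}} \lVert Xw - y \rVert_2^2 + \alpha \lVert w \rVert_1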

๋น„๊ต

ElasticNet Regression

  • Ridge์™€ Lasso๋ฅผ ๊ฒฐํ•ฉํ•œ ๊ธฐ๋ฒ•

์–ด๋–ค ๊ฒƒ์„ ์จ์•ผํ• ๊นŒ?

  • ์ •๋‹ต์€ ์—†๋‹ค.

  • ์ƒํ™ฉ์— ๋งž๊ฒŒ ์จ์•ผ ํ•˜๋Š” ๊ฒƒ์ด ํ˜„๋‹ต.

  • ์ด ์ƒํ™ฉ์— ๋งž๊ฒŒ ์จ์•ผํ•˜๋Š” ๊ธฐ์ค€์„ ๊ฐ€์ด๋“œ๋กœ ์ œ๊ณตํ•˜๊ณ  ์žˆ๋‹ค

  • Regression ์•ˆ์—์„œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋ถ„๋ฅ˜๋œ๋‹ค

    • ๋ฐ์ดํ„ฐ๊ฐ€ 10๋งŒ๊ฐœ ์ดํ•˜์ธ๊ฐ€? => SGD Regressor

    • ์ „์ฒด feature ์ค‘ ํŠน์ • feature์˜ ์ค‘์š”๋„๊ฐ€ ๋” ํฐ๊ฐ€? => Lasso, ElasticNet

    • ์ „์ฒด feature์˜ ์ค‘์š”๋„๊ฐ€ ๊ณ ๋ฅด๋‹ค => Ridge

      • ์ž˜ ์ž‘๋™์„ ํ•˜์ง€ ์•Š๋Š”๊ฐ€? => Ensemble Regressor

์ฝ”๋“œ๋ ˆ๋ฒจ ๊ตฌํ˜„

from sklearn.linear_model import LinearRegression,
                                Ridge,
                                Lasso,
                                ElasticNet
lr = LinearRegression()
ridge_reg = Ridge()
lasso_reg = Lasso()
elasticnet_Reg = ElasticNet()

ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ

  • ์•Œ๊ณ ๋ฆฌ์ฆ˜์— ์˜ํ•ด ๋ณ€๊ฒฝ๋˜๋Š” ํŒŒ๋ผ๋ฏธํ„ฐ ์™ธ์— ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๋””์ž์ด๋„ˆ๊ฐ€ ์„ค์ •ํ•ด์ค˜์•ผ ํ•˜๋Š” ๊ฐ’์„ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ๋ผ๊ณ  ํ•œ๋‹ค

  • ์ ์ ˆํ•œ ํ•˜์ดํผ ํŒŒ๋ฆฌ๋ฏธํ„ฐ ๊ฐ’์„ ์ •ํ•ด์ฃผ๋Š” ๊ฒƒ๋„ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์˜ ์ค‘์š”ํ•œ ์š”์†Œ ์ค‘ ํ•˜๋‚˜

๋ณด์Šคํ„ด ๋ถ€๋™์‚ฐ ๊ฐ€๊ฒฉ ์˜ˆ์ธก ์„ฑ๋Šฅ ํ–ฅ์ƒ์‹œ์ผœ๋ณด๊ธฐ (Feature Generation & Advanced Estimator)

  • Input data : 104 Dimension (PolynomialFeatures๋ฅผ ์‚ฌ์šฉํ•ด์„œ ํ™•์žฅ๋œ Feature Set)

  • Target data : ๋ณด์Šคํ„ด ๋ถ€๋™์‚ฐ ์ง‘๊ฐ’ (๋‹จ์œ„ : $1000)

  • ์‚ฌ์šฉ ์•Œ๊ณ ๋ฆฌ์ฆ˜

    • LinearRegression

    • Ridge

    • Lasso

    • ElasticNet

  • ์ถ”๊ฐ€์ ์ธ ์ ์šฉ๊ธฐ๋ฒ•

    • Feature Generation (PolynomialFeatures)
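
A minimal sketch of what this comparison might look like, reusing load_extended_boston and the K-fold loop from earlier (my arrangement, not the course's exact code):

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error

X, y = load_extended_boston()   # the 104-feature set built above
kf = KFold(n_splits=5)

for model in [LinearRegression(), Ridge(), Lasso(), ElasticNet()]:
    avg_MSE = 0.0
    for train_index, test_index in kf.split(X):
        model.fit(X[train_index], y[train_index])
        y_pred = model.predict(X[test_index])
        avg_MSE += mean_squared_error(y[test_index], y_pred)
    print(type(model).__name__, 'Average RMSE :', np.sqrt(avg_MSE / 5))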

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html
https://rk1993.tistory.com/entry/Ridge-regression%EC%99%80-Lasso-regression-%EC%89%BD%EA%B2%8C-%EC%9D%B4%ED%95%B4%ED%95%98%EA%B8%B0
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
https://nurilee.com/2020/01/26/data-science-model-summary-linear-ridge-lasso-elasticnet/