2 Tue

TIL

[AI School, 1st cohort] Week 9, Day 2

Big Data: Introduction to Spark I

Definition of big data

  • Data too large to be processed on a single server

    • The definition given by John Rauser, a data scientist at Amazon

    • The focus is on whether a distributed environment is needed

  • Data too large to be processed with existing software

    • "Existing software" means relational databases such as Oracle or MySQL

    • Such data cannot be handled without raising the server's specs (scale-up)

    • Increasing the number of servers instead is called scale-out

  • 4V

    • Volume: the data is large

    • Velocity: how fast the data is processed matters

    • Variety: structured and unstructured data

    • Veracity: the quality of the data

Example of big data: web pages

  • Tens of trillions of web pages or more exist, e.g. Google

  • Crawling them, finding the important pages, and indexing them requires data collection and computation on an enormous scale

  • User search queries and click logs are themselves large-scale data

  • In solving these problems, Google made major contributions to the development of big data technology

Large-scale data processing technology

  • Based on a distributed environment: made up of two or more servers

    • Requires distributed computing and a distributed file system

    • The disks of all the servers are combined and used as a single virtual disk

  • Fault tolerance: the system must keep working even if a few servers fail

  • Scale-out: the system must be easy to expand

ํ•˜๋‘ก์˜ ๋“ฑ์žฅ

  • Doug Cutting์ด ๊ตฌ๊ธ€๋žฉ ๋ฐœํ‘œ ๋…ผ๋ฌธ๋“ค์— ๊ธฐ๋ฐ˜ํ•ด ๋งŒ๋“  ์˜คํ”ˆ์†Œ์Šค ํ”„๋กœ์ ํŠธ

    • Doug Cutting์˜ ์•„๋“ค์˜ ์ฝ”๋ผ๋ฆฌ ์ธํ˜• ์ด๋ฆ„์ด ํ•˜๋‘ก

  • ์ฒ˜์Œ ์‹œ์ž‘์€ Nutch๋ผ๋Š” ์˜คํ”ˆ์†Œ์Šค ๊ฒ€์ƒ‰์—”์ง„์˜ ํ•˜๋ถ€ ํ”„๋กœ์ ํŠธ

  • ํฌ๊ฒŒ ๋‘ ๊ฐœ์˜ ์„œ๋ธŒ ์‹œ์Šคํ…œ์œผ๋กœ ๊ตฌํ˜„๋จ

    • ๋ถ„์‚ฐ ํŒŒ์ผ ์‹œ์Šคํ…œ์ธ HDFS

    • ๋ถ„์‚ฐ ์ปดํ“จํŒ… ์‹œ์Šคํ…œ์ธ MapReduce

      • ์ƒˆ๋กœ์šด ํ”„๋กœ๊ทธ๋ž˜๋ฐ ๋ฐฉ์‹์œผ๋กœ ๋Œ€์šฉ๋Ÿ‰ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ์˜ ํšจ์œจ์„ ๊ทน๋Œ€ํ™”

      • ์ž‘์—…์— ๋”ฐ๋ผ์„œ๋Š” ํ”„๋กœ๊ทธ๋ž˜๋ฐ์ด ๋„ˆ๋ฌด ๋ณต์žกํ•˜๋‹ค => ์„ฑ๋Šฅ์— ์ดˆ์ ์„ ๋งž์ถค

      • ๊ฒฐ๊ตญ Hive ์ฒ˜๋Ÿผ MapReduce๋กœ ๊ตฌํ˜„๋œ SQL ์–ธ์–ด๋“ค์ด ๋‹ค์‹œ ๊ฐ๊ด‘์„ ๋ฐ›๊ฒŒ ๋จ

      • ๊ธฐ๋ณธ์ ์œผ๋กœ ๋ฐฐ์น˜ ์ž‘์—…์— ์ตœ์ ํ™” ๋˜์–ด ์žˆ์–ด์„œ realtime์œผ๋กœ ์ฒ˜๋ฆฌํ•˜๊ธฐ๋Š” ์–ด๋ ต๋‹ค
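
To make the key/value programming model concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases for a word count. It is plain Python, not Hadoop code, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map over every line, then shuffle, then reduce per key."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big servers", "data nodes"])
print(counts)
```

In a real cluster the map and reduce calls run on different servers and the shuffle moves data over the network; even this tiny job needs three separate steps, which hints at why more complex jobs become hard to write.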

ํ•˜๋‘ก์˜ ๋ฐœ์ „

  • ํ•˜๋‘ก 1.0์€ HDFS์œ„์— MapReduce๋ผ๋Š” ๋ถ„์‚ฐ์ปดํ“จํŒ… ์‹œ์ŠคํŒ€์ด ๋„๋Š” ๊ตฌ์กฐ

    • ๋‹ค๋ฅธ ๋ถ„์‚ฐ์ปดํ“จํŒ… ์‹œ์Šคํ…œ์€ ์ง€์›ํ•˜์ง€ ๋ชปํ•จ

  • ํ•˜๋‘ก 2.0์€ ์•„ํ‚คํ…์ฒ˜๊ฐ€ ํฌ๊ฒŒ ๋ณ€๊ฒฝ๋จ

    • ํ•˜๋‘ก์€ ๊ธฐ๋ฐ˜ ๋ถ„์‚ฐ์ฒ˜๋ฆฌ ์‹œ์Šคํ…œ์ด ๋˜๊ณ  ๊ทธ ์œ„์— ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๋ ˆ์ด์–ด๊ฐ€ ์˜ฌ๋ผ๊ฐ€๋Š” ๊ตฌ์กฐ

    • Spark๋Š” ํ•˜๋‘ก 2.0์œ„์—์„œ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๋ ˆ์ด์–ด๋กœ ์‹คํ–‰๋จ

HDFS: the distributed file system

  • Data is stored in blocks

    • The block size is 128 MB

  • Block replication

    • Each block is stored redundantly in three places, which provides fault tolerance

    • For this to be meaningful, there have to be at least three servers

  • Data is stored on data nodes, and the locations of the blocks are stored on the name node

    • If a data node fails, another node holding a replica can be read instead

    • In Hadoop 1.0, a name node failure is fatal

    • Hadoop 2.0 adds a second name node

Distributed computing system

  • Hadoop 1.0

    • Made up of one job tracker and many task trackers

    • The job tracker splits the work and distributes it to the task trackers

  • Hadoop 2.0

    • Roles are split more finely into client, resource manager, node manager, and container
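
The Hadoop 1.0 division of labor can be imitated on one machine with a thread pool. This is only an analogy under stated assumptions: split_job stands in for the job tracker, run_task for a task tracker, and the sum-of-squares workload is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_job(data, n_tasks):
    """'Job tracker' step: cut the input into roughly equal chunks."""
    if not data:
        return []
    size = -(-len(data) // n_tasks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_task(chunk):
    """'Task tracker' step: process one chunk independently."""
    return sum(x * x for x in chunk)


def run_job(data, n_workers=3):
    """Distribute the chunks to workers and merge the partial results."""
    tasks = split_job(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(run_task, tasks))
    return sum(partial_results)


total = run_job(list(range(10)))
print(total)
```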

ํ•˜๋‘ก์„ ์ด์šฉํ•œ ๋ฐ์ดํ„ฐ ์‹œ์Šคํ…œ ๊ตฌ์„ฑ

  • ํ•˜๋‘ก์€ ํ”ํžˆ ์ด์•ผ๊ธฐํ•˜๋Š” Data Warehouse

  • ์›ํ”Œ๋กœ์šฐ(์—ฌ๋Ÿฌ ๊ณณ์œผ๋กœ ๋ฐ์ดํ„ฐ ํ‘ธ์‹œ) ๊ด€๋ฆฌ๋กœ๋Š” Airflow๊ฐ€ ๋Œ€์„ธ

ํ•˜๋‘ก 1.0 vs ํ•˜๋‘ก 2.0

  • ํ•˜๋‘ก 2.0์„ YARN ์ด๋ผ๊ณ  ๋ถ€๋ฆ„

  • YARN์ด๋ผ๋Š” ํ”„๋ ˆ์ž„ ์›Œํฌ ์œ„์—์„œ ๋…์ž์ ์ธ ๋ถ„์‚ฐ ์ปดํ“จํŒ… ์‹œ์Šคํ…œ์ด ์ž‘๋™๋  ์ˆ˜ ์žˆ๋„๋ก ํ•จ

Big Data: Introduction to Spark II

The emergence of Spark

  • Started at UC Berkeley's AMPLab; an Apache open-source project since 2013

  • A second-generation big data technology that follows Hadoop

    • It supports its own distributed environment, but is mostly used as the distributed environment on top of Hadoop 2.0

    • Written in Scala

  • Greatly improves on the shortcomings of MapReduce

    • Very similar to pandas

  • The current version is Spark 3

    • Programmable in Scala, Java, and Python 3

    • Many machine-learning-related improvements, e.g. GPU environments

Spark vs MapReduce

  • MapReduce is disk-based; Spark is memory-based

  • MapReduce runs only on Hadoop; Spark also supports environments other than Hadoop

  • MapReduce programming is key/value-based; Spark resembles pandas

  • Spark supports many styles of computing

    • Batch, streaming, SQL, machine learning, and more

Spark architecture

  • There is a driver program

  • Spark is an application that runs on top of Hadoop 2.0 (or Hadoop 3.0)

Spark programming

  • RDD

    • A low-level programming API, which allows fine-grained control

    • The drawback is that the code becomes more complex

  • DataFrame & Dataset

    • Scala and Java use Datasets

    • Python uses DataFrames (because Python needs no compilation)

    • A high-level programming API that is used more and more

    • Using SparkSQL means using this API

pandas

  • One of the most fundamental modules for data analysis in Python

    • A Python module that lets you do what you would do in Excel

    • Used together with modules such as Matplotlib (visualization) and scikit-learn (machine learning)

  • Best suited to small, structured data

    • The data size is limited to what a single server can handle

    • Parallel processing is not supported

    • These two characteristics are why pandas is not used for parallel processing

  • What it can do

    • Read and save structured data

    • Compute a variety of summary statistics

    • Data cleaning, i.e. data preprocessing

    • Visualization

pandas data structures

  • A DataFrame corresponds to an Excel sheet

  • A Series corresponds to a column of an Excel sheet
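
A small sketch of the sheet/column correspondence, assuming pandas is installed; the column names and values are invented for the example.

```python
import pandas as pd

# A DataFrame plays the role of a spreadsheet sheet.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [90.0, 75.0, None]})

# Selecting one column returns a Series.
scores = df["score"]

# Cleaning and summary statistics, two of the typical pandas tasks.
cleaned = df.dropna(subset=["score"])
mean_score = cleaned["score"].mean()

print(type(df).__name__, type(scores).__name__)
print(len(cleaned), mean_score)
```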

Spark session

  • A Spark program starts by creating a Spark session

  • The features Spark provides are accessed through the Spark session

  • Before Spark 2.0, a different context had to be created for each kind of feature

Spark data structures

  • There are three main data structures

    • RDD

      • The data structure underlying almost everything in Spark

      • Refers to distributed data stored across servers

      • Low-level data

      • Supports both structured and unstructured data

      • An RDD is made up of multiple partitions, which are spread across the servers in the Spark cluster

      • An RDD already lives on the cluster, whereas ordinary Python data has to be converted into an RDD with the parallelize function before it is stored there

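      • Example (of course, there is no reason to put data this small on a cluster)

        from pyspark.sql import SparkSession
        sc = SparkSession.builder.getOrCreate().sparkContext  # RDD entry point
        py_list = [1, 2, 3, 4]
        rdd = sc.parallelize(py_list)  # local list -> distributed RDD
        print(rdd.collect())           # RDD -> local list on the driver

    • DataFrame & Dataset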

      • An RDD has no columns, whereas DataFrames and Datasets do, which is convenient for developers

      • A DataFrame is untyped and a Dataset is typed

        • Python, which has no static types, uses DataFrames

        • Java and Scala, which are statically typed, use Datasets

        • More precisely, the subtle difference is whether the types have to be known before compilation

      • parallelize : data => rdd

      • collect : rdd => data

        • Note that the data being collected has to be small. If it is very large, the driver can run out of memory and fail

Big Data: Spark data structures

Spark session

  • To run Spark, an object called a Spark session must be created first

  • All work on the data runs on the Spark cluster; the Python or Java code only issues commands that manipulate this data. That requires an entry point, which is the Spark session

  • spark = SparkSession.builder.appName(name).config(key, value).getOrCreate()

    • appName : the name of the application

    • config : passes key/value settings of various kinds to the Spark cluster

    • getOrCreate() : creates the Spark session (or returns the existing one)

    • sc = spark.sparkContext

      • sc => RDD operations

      • spark => DataFrame operations

Spark data structures: ways to create a DataFrame

  • Convert an RDD: use the RDD's toDF function

  • Create one from a SQL query

  • Load external data, e.g. .csv / .jdbc

Spark development environment

  • Install and use it on a personal computer

    • Convenient, but a notebook environment and the like have to be installed

    • Otherwise, run programs with spark-submit

  • Use one of the free notebook services

    • Google Colab

    • Databricks community notebooks (they seem a bit slower than Colab)

    • Zeppelin's free notebooks: several languages can be used rather than just one (they seem a bit slower than Colab)

  • Real-world team work

    • Use an EMR cluster on AWS

