24 Wed

(Appendix) Collecting Seoul COVID-19 case data

[1/7] Preparing to collect Seoul COVID-19 case data

Data collection vs. crawling

  • Data collection (scraping): reading the content of a website.

  • Crawling: what a search engine does when it follows hyperlinks and reads the content of web pages.

Installing the required tools

requests : acts as a small browser that fetches a website.

beautifulsoup4 : parses the fetched page.

tqdm : shows progress while reading many pages.

  • Useful for long-running, repetitive work.

[2/7] Checking the robots exclusion standard, copyright, and excessive network requests before collecting

Check whether the page may be collected

  • Robots exclusion standard, robots.txt

    • A convention for keeping robots from accessing a website.

    • It is only a recommendation; even when access is disallowed, a client can technically still reach the page.

  • Copyright

  • Excessive network requests

Checking robots.txt on the Seoul city site

  • Collection is allowed.

  • The sitemap is also listed.

Checking robots.txt on the Naver site

User-agent: *
Disallow: /
Allow : /$ 

  • Naver pages rarely show up in Google searches because everything under the root path / is disallowed.

  • /$ means that the main page itself may be read.
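A robots.txt check can also be done from Python. The sketch below uses the standard-library urllib.robotparser with a small hypothetical rule set (not the Naver or Seoul file) and the placeholder host example.com, so it runs without any network access. The helper name is_allowed is an assumption made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; real sites publish their own robots.txt.
RULES = """\
User-agent: *
Disallow: /private/
"""

def is_allowed(rules_text: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)

allowed_public = is_allowed(RULES, "https://example.com/news/1")
allowed_private = is_allowed(RULES, "https://example.com/private/data")
print(allowed_public, allowed_private)
```

For a live site, RobotFileParser also offers set_url() and read() to download the file before calling can_fetch().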

Checking copyright

  • The Seoul city site uses the KOGL (Korea Open Government License, 공공누리).

    • Free use is possible without separate permission.

Excessive network requests

  • Reading many pages at once can be mistaken for a DDoS attack.

    • So fetch pages at intervals using time.sleep().

How to collect the data

  1. Find the URL of the page to collect.

  2. Access the URL with requests, Python's small browser.

  3. If response.status_code is 200 (OK), the response is normal.

  4. Take only response.text from the response.

  5. Parse the HTML text with bs(response.text, 'html.parser').

  6. Use soup.select to reach the desired tags.

  7. Get the list.

  8. Get the rows from the list.

  9. Collect the rows into a DataFrame.
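The steps above can be sketched as follows. To keep the example runnable offline, a small hypothetical HTML string stands in for response.text; the table id, column names, and variable names are assumptions for illustration only, not the structure of the Seoul page.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for response.text (step 4). A real run would
# obtain it with requests.get(url) after checking response.status_code == 200.
html_text = """
<table id="cases">
  <tr><th>no</th><th>district</th></tr>
  <tr><td>1</td><td>A</td></tr>
  <tr><td>2</td><td>B</td></tr>
</table>
"""

soup = BeautifulSoup(html_text, "html.parser")   # step 5: parse the HTML text
rows = soup.select("table#cases tr")             # steps 6-7: select the row tags
header = [th.get_text(strip=True) for th in rows[0].select("th")]
records = [
    [td.get_text(strip=True) for td in row.select("td")]  # step 8: cells of each row
    for row in rows[1:]
]
df_example = pd.DataFrame(records, columns=header)  # step 9: build the DataFrame
print(df_example)
```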

Where the data lives

  • In the browser's developer tools the content appears inside HTML tags, but read_html cannot read it, since the rows are loaded by a separate request after the page itself.

[3/7] Understanding the browser's Network tab and the JSON format

Data collection

  • Selenium is commonly used to collect data that is not visible in the page source.

  • Knowing how the browser works makes collection possible without Selenium.

  • Developer tools - Network - XHR - URL - Preview shows the data as JSON.

JSON

  • JSON, JavaScript Object Notation

  • An open standard format for transmitting data objects made of attribute-value (key-value) pairs.

  • It is independent of programming language and platform, so many languages can use it.

  • pandas can also read and write JSON.
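As a small illustration of the key-value structure, the sketch below parses a hypothetical JSON string with the standard-library json module. The payload is made up, shaped loosely like the response that appears later in these notes.

```python
import json

# Hypothetical payload for illustration only.
raw = '{"draw": 1, "recordsTotal": 2, "data": [["1", "A"], ["2", "B"]]}'

payload = json.loads(raw)         # JSON text -> Python dict
rows = payload["data"]            # a list of rows, each a list of strings
round_trip = json.dumps(payload)  # Python dict -> JSON text
print(type(payload).__name__, len(rows))
```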

Order of collecting the Seoul COVID-19 case data

  1. Collect data page by page.

  2. Collect all pages.

    • Repeat step 1 in a loop.

  3. Merge the data into one with pd.concat.

  4. Preprocess the data => remove HTML tags.

  5. Save the full data with to_csv.

  6. Read it back with pd.read_csv to check that it was saved correctly.

  7. Collection done, analysis begins.

[4/7] Why read_html can no longer read the data, and the previous collection method

  • As confirmed cases increased, the way the data is published changed.

Previous code

# Import the required libraries.

import pandas as pd
import numpy as np
# Enter the URL to collect from.

url = "http://www.seoul.go.kr/coronaV/coronaStatus.do"
url
'http://www.seoul.go.kr/coronaV/coronaStatus.do'
# Since mid-November, as confirmed cases increased, the way the data is published
# changed, so pandas' read_html can no longer load the case list.
# read_html is still used to get the column names; the data itself is read with requests.
# Load the tables at the URL above with pandas' read_html.

table = pd.read_html(url)
len(table)
6
# read_html reads the table tags on the page.

table[0]

   강남구    강동구   강북구   강서구   관악구   광진구   구로구     금천구   노원구   도봉구   동대문구  동작구   마포구
0  1441     1100    817     1732    1437    947     1117      483     1366    972     1114    1279    1007
1  +13      +5      +4      +3      +3      +3      +5        +1      +2      +2      +1      +6      +2
2  서대문구  서초구   성동구   성북구   송파구   양천구   영등포구   용산구   은평구   종로구   중구    중랑구   기타
3  796      1174    807     1332    1717    1170    1147      864     1308    596     501     1292    3460
4  +1       +2      +4      +4      +7      +4      +3        +5      +9      0       +6      +1      +1

# Go through the tables in the table variable one by one to find the one holding the confirmed-case list.

table[1]

    강남구   강동구    강북구   강서구   관악구    광진구
0   1441    1100     817     1732    1437     947
1   +13     +5       +4      +3      +3       +3
2   구로구   금천구    노원구   도봉구   동대문구  동작구
3   1117    483      1366    972     1114     1279
4   +5      +1       +2      +2      +1       +6
5   마포구   서대문구  서초구   성동구   성북구    송파구
6   1007    796      1174    807     1332     1717
7   +2      +1       +2      +4      +4       +7
8   양천구   영등포구  용산구   은평구   종로구    중구
9   1170    1147     864     1308    596      501
10  +4      +3       +5      +9      0        +6
11  중랑구   기타      NaN     NaN     NaN      NaN
12  1292    3460     NaN     NaN     NaN      NaN
13  +1      +1       NaN     NaN     NaN      NaN

# Preview.

df = table[3]
df.head()

연번  환자  확진일  거주지  여행력  접촉력  퇴원현황

Changed code

import requests
# f-string
url = "https://news.seoul.go.kr/api/27/getCorona19Status/get_status_ajax.php?draw=1"
# url = f"{url}&columns%5B0%5D%5Bdata%5D=0&columns%5B0%5D%5Bname%5D=&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=true&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B1%5D%5Bdata%5D=1&columns%5B1%5D%5Bname%5D=&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=true&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B2%5D%5Bdata%5D=2&columns%5B2%5D%5Bname%5D=&columns%5B2%5D%5Bsearchable%5D=true&columns%5B2%5D%5Borderable%5D=true&columns%5B2%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B2%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B3%5D%5Bdata%5D=3&columns%5B3%5D%5Bname%5D=&columns%5B3%5D%5Bsearchable%5D=true&columns%5B3%5D%5Borderable%5D=true&columns%5B3%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B3%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B4%5D%5Bdata%5D=4&columns%5B4%5D%5Bname%5D=&columns%5B4%5D%5Bsearchable%5D=true&columns%5B4%5D%5Borderable%5D=true&columns%5B4%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B4%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B5%5D%5Bdata%5D=5&columns%5B5%5D%5Bname%5D=&columns%5B5%5D%5Bsearchable%5D=true&columns%5B5%5D%5Borderable%5D=true&columns%5B5%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B5%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B6%5D%5Bdata%5D=6&columns%5B6%5D%5Bname%5D=&columns%5B6%5D%5Bsearchable%5D=true&columns%5B6%5D%5Borderable%5D=true&columns%5B6%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B6%5D%5Bsearch%5D%5Bregex%5D=false&order%5B0%5D%5Bcolumn%5D=0&order%5B0%5D%5Bdir%5D=desc"
url = f"{url}&start=0&length=100"
# "&search%5Bvalue%5D=&search%5Bregex%5D=true&_=1606633538547"
url
'https://news.seoul.go.kr/api/27/getCorona19Status/get_status_ajax.php?draw=1&start=0&length=100'
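  • The URL is very long, so it is split up and assembled with f-strings.

  • The important parameters are draw, start, and length.

    • draw is used here as the page number. The response fields (draw, recordsTotal, recordsFiltered, data) appear to match the DataTables server-side format, where draw is a request counter that the server echoes back.

    • start is the zero-based offset of the first row to return, and length is the number of rows per request.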
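Instead of concatenating the query string by hand, the parameters can be assembled from a dictionary. This is a minimal sketch using the standard-library urllib.parse.urlencode; the helper name build_url is hypothetical, and it only covers the three parameters discussed above.

```python
from urllib.parse import urlencode

BASE_URL = "https://news.seoul.go.kr/api/27/getCorona19Status/get_status_ajax.php"

def build_url(page_no: int, length: int = 100) -> str:
    """Hypothetical helper: build the request URL for a 1-based page number."""
    params = {
        "draw": page_no,
        "start": (page_no - 1) * length,  # offset of the first row on this page
        "length": length,                 # rows per request
    }
    return f"{BASE_URL}?{urlencode(params)}"

example_url = build_url(1)
print(example_url)
```

requests.get also accepts a params= dictionary and performs the same encoding itself.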

response = requests.get(url)
data_json = response.json()
# pd.DataFrame(data_json["data"])
  • requests.get fetches the URL, and response.json() parses the response body as JSON.

records_total = data_json['recordsTotal']
records_total
10976
end_page = round(records_total / 100) + 1
end_page
111
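  • As of 2021-03-23 there are 10,976 records. The formula gives 111 pages, although only ceil(10976 / 100) = 110 pages hold data, so the request for the last page returns an empty list.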
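The page count can be computed exactly with ceiling division. The sketch below compares it with the round(...) + 1 formula used above; the function name page_count is an assumption for illustration.

```python
def page_count(records_total: int, page_size: int = 100) -> int:
    """Number of pages needed to hold records_total rows (ceiling division)."""
    return (records_total + page_size - 1) // page_size

exact = page_count(10976)
from_rounding = round(10976 / 100) + 1
print(exact, from_rounding)
```

With 10,976 records the ceiling gives 110, one fewer than the rounding formula. The extra request is harmless here because an empty page ends the loop, but the exact count avoids it.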

data = data_json["data"]
# data
pd.DataFrame(data)

    0                                 1      2           3         4  5                   6
0   <p class='corona19_no'>30976</p>  99355  2021-03-22  용산구    -  감염경로 조사중      <b class=''></b>
1   <p class='corona19_no'>30975</p>  99325  2021-03-22  도봉구    -  타시도 확진자 접촉   <b class=''></b>
2   <p class='corona19_no'>30974</p>  99409  2021-03-22  영등포구  -  감염경로 조사중      <b class=''></b>
3   <p class='corona19_no'>30973</p>  99415  2021-03-22  광진구    -  감염경로 조사중      <b class=''></b>
4   <p class='corona19_no'>30972</p>  99394  2021-03-22  강북구    -  감염경로 조사중      <b class=''></b>
..  ...                               ...    ...         ...       ...  ...               ...
95  <p class='corona19_no'>30881</p>  99100  2021-03-22  마포구    -  타시도 확진자 접촉   <b class=''></b>
96  <p class='corona19_no'>30880</p>  99095  2021-03-22  마포구    -  감염경로 조사중      <b class=''></b>
97  <p class='corona19_no'>30879</p>  98120  2021-03-20  중구      -  기타 확진자 접촉     <b class=''></b>
98  <p class='corona19_no'>30878</p>  99016  2021-03-21  강북구    -  기타 확진자 접촉     <b class=''></b>
99  <p class='corona19_no'>30877</p>  98966  2021-03-21  마포구    -  기타 확진자 접촉     <b class=''></b>

100 rows × 7 columns

  • Columns 0 to 6 correspond to 연번 (serial number), 환자 (patient number), 확진일 (date confirmed), 거주지 (residence), 여행력 (travel history), 접촉력 (contact history), and 퇴원현황 (discharge status).

  • The values contain HTML tags, so preprocessing is needed.

  • This request returned only 100 rows; a function is needed to repeat it for the remaining pages.

[5/7] How to read the Network tab, and finding and requesting the URL to collect

Making it a function

def get_seoul_covid19_100(page_no):
    """
    page_no : page number; the data for that page is fetched
    start_no : offset of the first row, computed from page_no
    """
    start_no = (page_no - 1) * 100
    url = f"https://news.seoul.go.kr/api/27/getCorona19Status/get_status_ajax.php?draw={page_no}"
    url = f"{url}&order%5B0%5D%5Bdir%5D=desc&start={start_no}&length=100"
    response = requests.get(url)
    data_json = response.json()
    return data_json
# When there is data
# get_seoul_covid19_100(3)
# When there is no data
get_seoul_covid19_100(-1)
{'data': [], 'draw': -1, 'recordsFiltered': 10976, 'recordsTotal': 10976}
  • A page with data is returned normally, and a page without data returns [] instead of raising an error, so exception handling is not strictly required.

    • Even so, adding it is good practice, for example to cover network failures.
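One way to add a light safety net is to validate the shape of the returned dictionary before using it. The helper below is a hypothetical sketch (extract_rows is not part of the original notes) and works on any already-parsed payload, so it runs without a network call.

```python
def extract_rows(payload):
    """Hypothetical helper: return payload["data"] or raise a clear error."""
    if not isinstance(payload, dict) or "data" not in payload:
        raise ValueError("unexpected response shape: no 'data' field")
    rows = payload["data"]
    if not isinstance(rows, list):
        raise ValueError("'data' is not a list")
    return rows

# The empty response shown above passes validation and yields an empty list.
empty = extract_rows(
    {"data": [], "draw": -1, "recordsFiltered": 10976, "recordsTotal": 10976}
)
print(empty)
```

For the request itself, passing timeout= to requests.get and calling response.raise_for_status() are common additions that surface network and HTTP errors early.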

[6/7] Writing a function that collects all the data and looping over every page

Fetching all the data

tqdm

  • Shows the progress of long-running work.

  • It has to be installed separately, with !pip install tqdm.

# !pip install tqdm
# time.sleep is used to pause between requests
import time
# tqdm : to show progress
from tqdm import trange
# # Toggle comment : ctrl + /
# page_list = []
# for page_no in trange(1, 4):
#     one_page = get_seoul_covid19_100(page_no)
#     if len(one_page["data"]) > 0:
#         one_page = pd.DataFrame(one_page["data"])
#         page_list.append(one_page)
#         time.sleep(0.5)
#     else:
#         break
# page_list
# pd.concat(page_list)
  • time.sleep is used so the requests are not mistaken for an attack.

  • trange shows the progress visually.

# Run only a few pages before fetching every page.
page_list = []
# Check that the first 3 pages or so load correctly, then fetch all pages.
# If you collect every page from the start, an error in the middle is hard to track down.
# Fetch a small part first and, if it works, fetch everything.
all_page = 3
# Note: trange(all_page + 1) starts at 0. Page 0 appears to return no rows
# (4 requests yield 300 rows), so pages 1-3 supply the data.
for page_no in trange(all_page + 1):
    one_page = get_seoul_covid19_100(page_no)
    one_page = pd.DataFrame(one_page["data"])
    page_list.append(one_page)
    # Sending too many requests at once puts load on the server.
    # Pause 0.5 seconds between requests to avoid burdening it.
    time.sleep(0.5)
100%|██████████| 4/4 [00:05<00:00,  1.43s/it]
# Check the collected data. page_list itself can be long, so slice it when viewing it directly.
pd.concat(page_list)

    0                                 1      2           3         4  5                          6
0   <p class='corona19_no'>30976</p>  99355  2021-03-22  용산구    -  감염경로 조사중             <b class=''></b>
1   <p class='corona19_no'>30975</p>  99325  2021-03-22  도봉구    -  타시도 확진자 접촉          <b class=''></b>
2   <p class='corona19_no'>30974</p>  99409  2021-03-22  영등포구  -  감염경로 조사중             <b class=''></b>
3   <p class='corona19_no'>30973</p>  99415  2021-03-22  광진구    -  감염경로 조사중             <b class=''></b>
4   <p class='corona19_no'>30972</p>  99394  2021-03-22  강북구    -  감염경로 조사중             <b class=''></b>
..  ...                               ...    ...         ...       ...  ...                      ...
95  <p class='corona19_no'>30681</p>  98407  2021-03-20  성북구    -  감염경로 조사중             <b class=''></b>
96  <p class='corona19_no'>30680</p>  98373  2021-03-20  동대문구  -  기타 확진자 접촉            <b class=''></b>
97  <p class='corona19_no'>30679</p>  98386  2021-03-20  강북구    -  기타 확진자 접촉            <b class=''></b>
98  <p class='corona19_no'>30678</p>  98291  2021-03-20  성북구    -  노원구 소재 공공기관 관련   <b class=''></b>
99  <p class='corona19_no'>30677</p>  98400  2021-03-20  성북구    -  타시도 확진자 접촉          <b class=''></b>

300 rows × 7 columns

def get_multi_page_list(start_page, end_page = 80):
    # Check that the first few pages load correctly before fetching all pages.
    
    page_list = []
    for page_no in trange(start_page, end_page + 1):
        one_page = get_seoul_covid19_100(page_no)
        if len(one_page["data"]) > 0:
            one_page = pd.DataFrame(one_page["data"])
            page_list.append(one_page)
            # Sending too many requests at once puts load on the server.
            # Pause 0.5 seconds between requests to avoid burdening it.
            time.sleep(0.5)
        else:
            # If a page has no data, stop and
            # return the pages collected so far.
            return page_list
    return page_list
  • An empty one_page["data"] ends the loop early and keeps empty results out of page_list. Note that pd.concat raises ValueError when it is given an empty list.
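The same stop-on-empty pattern can be written independently of the network by passing the fetch function in as an argument. The sketch below is an illustrative variant, not the author's function: collect_pages and fake_fetch are hypothetical names, and the stand-in fetcher returns made-up rows so the loop can be tried offline.

```python
import time
from typing import Callable, Dict, List

def collect_pages(fetch: Callable[[int], Dict], start_page: int, end_page: int,
                  delay: float = 0.5) -> List[list]:
    """Call fetch(page_no) for each page and stop at the first empty page."""
    pages = []
    for page_no in range(start_page, end_page + 1):
        rows = fetch(page_no)["data"]
        if not rows:          # empty page: nothing further to collect
            break
        pages.append(rows)
        time.sleep(delay)     # pause between requests
    return pages

# Stand-in fetcher for illustration: pages 1-2 have data, later pages are empty.
def fake_fetch(page_no: int) -> Dict:
    return {"data": [[page_no, "row"]] if 1 <= page_no <= 2 else []}

collected = collect_pages(fake_fetch, 1, 5, delay=0)
print(len(collected))
```

Injecting the fetcher keeps the loop testable; in real use, get_seoul_covid19_100 could be passed as fetch.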

# Unless set otherwise, the number stored in end_page becomes the last page.
end_page
111
# Be sure to check the start page and end page.
start_page = 1
# end_page = 88
page_list = get_multi_page_list(start_page, end_page)
# There is a lot of data, so preview only one item by slicing.
page_list[:1]
 99%|█████████▉| 110/111 [02:53<00:01,  1.42s/it]
  • Output omitted; it is very long.

# Combine everything into a single DataFrame with concat.
df_all = pd.concat(page_list)
df_all.shape
(10976, 7)
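Each page's DataFrame carries its own index starting at 0, so a plain pd.concat repeats index labels across pages. A minimal sketch with two made-up frames shows the difference that ignore_index=True makes; the frame contents are placeholders.

```python
import pandas as pd

# Two tiny stand-in pages; real pages would hold 100 rows each.
page_a = pd.DataFrame([["a1"], ["a2"]])
page_b = pd.DataFrame([["b1"], ["b2"]])

kept = pd.concat([page_a, page_b])                           # index labels: 0, 1, 0, 1
renumbered = pd.concat([page_a, page_b], ignore_index=True)  # index labels: 0, 1, 2, 3
print(list(kept.index), list(renumbered.index))
```

Saving with to_csv(index=False) and reading the file back also produces a fresh 0..n-1 index, so the repeated labels only matter while working with the concatenated frame directly.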
df

연번  환자  확진일  거주지  여행력  접촉력  퇴원현황

# Use the column names of the table read with read_html (table[3]) as the columns of the collected data.
cols = df.columns.tolist()
cols
['연번', '환자', '확진일', '거주지', '여행력', '접촉력', '퇴원현황']
df_all.columns = cols
df_all.head()

   연번                                환자     확진일       거주지    여행력  접촉력              퇴원현황
0  <p class='corona19_no'>30976</p>  99355  2021-03-22  용산구    -      감염경로 조사중     <b class=''></b>
1  <p class='corona19_no'>30975</p>  99325  2021-03-22  도봉구    -      타시도 확진자 접촉  <b class=''></b>
2  <p class='corona19_no'>30974</p>  99409  2021-03-22  영등포구  -      감염경로 조사중     <b class=''></b>
3  <p class='corona19_no'>30973</p>  99415  2021-03-22  광진구    -      감염경로 조사중     <b class=''></b>
4  <p class='corona19_no'>30972</p>  99394  2021-03-22  강북구    -      감염경로 조사중     <b class=''></b>

df_all.shape
(10976, 7)
  • This step merges the data and assigns the column names.

  • There are 10,976 records and 7 columns.

[7/7] Preprocessing, saving, and checking the data

Data preprocessing

연번 (serial number) and 퇴원현황 (discharge status)

import re
def extract_number(num_string):
    # Strip the "corona19" class name first so its digits are not kept,
    # then drop every non-digit character.
    if isinstance(num_string, str):
        num_string = num_string.replace("corona19", "")
        num = re.sub("[^0-9]", "", num_string)
        num = int(num)
        return num
    else:
        return num_string
num_string = "<p class='corona19_no'>7625</p>"
extract_number(num_string)
7625
df_all["연번"] = df_all["연번"].map(extract_number)
df_all["연번"].head()
0    30976
1    30975
2    30974
3    30973
4    30972
Name: 연번, dtype: int64
def extract_hangeul(origin_text):
    # Keep only Hangul syllables.
    subtract_text = re.sub("[^가-힣]", "", origin_text)
    return subtract_text
extract_hangeul("<b class='status1'>퇴원</b>")
'퇴원'
extract_hangeul("<b class='status2'>사망</b>")
'사망'
extract_hangeul("<b class=''></b>")
''
# The values can be cleaned with a regular expression or with str.contains.
# df_all["퇴원현황"] = df_all["퇴원현황"].map(extract_hangeul)
# df_all["퇴원현황"].value_counts()
df_all["퇴원현황"].value_counts()
<b class='status1'>퇴원</b>    8741
<b class='status1'></b>      1524
<b class=''></b>              514
<b class='status2'></b>       104
<b class='status2'>사망</b>      93
Name: 퇴원현황, dtype: int64
df_all.loc[df_all["퇴원현황"].str.contains("퇴원"), "퇴원현황"] = "퇴원"
df_all.loc[df_all["퇴원현황"].str.contains("사망"), "퇴원현황"] = "사망"
df_all.loc[~df_all["퇴원현황"].str.contains("퇴원|사망"), "퇴원현황"] = np.nan
df_all["퇴원현황"].value_counts()
퇴원    8741
사망      93
Name: 퇴원현황, dtype: int64
last_date = df_all.iloc[0]["확진일"]
last_date
'2021-03-22'
# The last confirmed date goes into the file name. The intent is to replace "." with "_"
# so the date is easy to tell apart from the file extension.
# The date here already uses "-", so replace() leaves it unchanged.

date = last_date.replace(".", "_")
date
'2021-03-22'
# Build the file name.
# file_name

file_name = f"seoul-covid19-{date}.csv"
file_name
'seoul-covid19-2021-03-22.csv'
# Save as a csv file.
df_all.to_csv(file_name, index=False)
# Check that it was saved correctly.
pd.read_csv(file_name)

       연번     환자    확진일       거주지    여행력  접촉력              퇴원현황
0      30976  99355  2021-03-22  용산구    -      감염경로 조사중     NaN
1      30975  99325  2021-03-22  도봉구    -      타시도 확진자 접촉  NaN
2      30974  99409  2021-03-22  영등포구  -      감염경로 조사중     NaN
3      30973  99415  2021-03-22  광진구    -      감염경로 조사중     NaN
4      30972  99394  2021-03-22  강북구    -      감염경로 조사중     NaN
...    ...    ...    ...         ...       ...    ...              ...
10971  20005  63500  2021-01-03  구로구    -      감염경로 조사중     사망
10972  20004  63375  2021-01-03  타시도    -      감염경로 조사중     퇴원
10973  20003  64010  2021-01-03  성북구    -      타시도 확진자 접촉  퇴원
10974  20002  64155  2021-01-03  구로구    -      감염경로 조사중     퇴원
10975  20001  64083  2021-01-03  강남구    -      기타 확진자 접촉    퇴원

10976 rows × 7 columns

  • Preprocessing uses re, the regular expression library.

    • <p class='corona19_no'>30976</p> => 30976

    • "corona19" is removed first so that its digits are not picked up, then every character other than [0-9] is replaced with an empty string.
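An alternative that does not depend on removing "corona19" first is to capture only the text between the tags. The two helpers below are a hypothetical sketch (the names are not from the original notes) using the standard-library re module.

```python
import re

def extract_number_from_tag(cell):
    """Hypothetical alternative: capture the digits between '>' and '<'."""
    if not isinstance(cell, str):
        return cell
    match = re.search(r">\s*(\d+)\s*<", cell)
    return int(match.group(1)) if match else None

def strip_tags(cell: str) -> str:
    """Remove anything that looks like an HTML tag and keep the inner text."""
    return re.sub(r"<[^>]+>", "", cell)

serial = extract_number_from_tag("<p class='corona19_no'>30976</p>")
status = strip_tags("<b class='status1'>퇴원</b>")
blank = strip_tags("<b class=''></b>")
print(serial, status, repr(blank))
```

Because the pattern only looks at the tag content, digits inside attribute values such as corona19_no are ignored without a separate replace step.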
