15 Tue
TIL
Programmers AI School, Cohort 1
Week 3, Day 2
I. Getting Started with pandas
Prerequisite: Table
A data structure (container) that stores and manages data in rows and columns.
Rows usually represent entities, and columns represent attributes.
Installing pandas
pip install pandas
Getting started with pandas
Begin with import pandas. By convention, pandas is imported under the short alias pd: import pandas as pd
II. Handling 1-D Data with pandas: Series
Series?
1-D labeled array
You can specify its index.
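A minimal sketch of the two ways to create a Series, with and without an explicit index. The values and labels are made up for illustration.

```python
import pandas as pd

# Without an explicit index, pandas assigns 0, 1, 2, ...
s = pd.Series([1, 4, 9, 16, 25])

# With an explicit index, each value gets a custom label
t = pd.Series([1, 4, 9, 16, 25], index=["a", "b", "c", "d", "e"])

print(s.index.tolist())  # [0, 1, 2, 3, 4]
print(t["c"])            # 9
```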
Series + NumPy
A Series is similar to an ndarray!
This shows how closely pandas mirrors NumPy.
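A short sketch of the ndarray-like behavior: positional indexing, slicing, boolean masks, and NumPy ufuncs all work on a Series. The values are made up; `.iloc` is used for positions because plain `t[1]` on a labeled Series is ambiguous.

```python
import numpy as np
import pandas as pd

t = pd.Series([1, 4, 9, 16, 25], index=["a", "b", "c", "d", "e"])

# Positional indexing and slicing, as on an ndarray
print(t.iloc[1])     # 4
print(t.iloc[1:3])   # labels b and c

# A boolean mask filters elements, like ndarray boolean indexing
print(t[t > t.median()])  # values above the median (9): d and e

# NumPy universal functions apply element-wise and keep the index
print(np.sqrt(t))    # 1.0, 2.0, 3.0, 4.0, 5.0 with labels a..e
```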
Series + dict
A Series is similar to a dict.
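A sketch of the dict-like side, again with made-up values. Note that `in` checks labels (the "keys"), not values.

```python
import pandas as pd

t = pd.Series([1, 4, 9, 16, 25], index=["a", "b", "c", "d", "e"])

# Label lookup works like a dict key lookup
print(t["a"])          # 1

# Assigning to a new label adds an entry, like adding a dict key
t["f"] = 36

# `in` checks labels, not values
print("a" in t)        # True
print(36 in t)         # False: 36 is a value, not a label

# .get() returns a default for a missing label instead of raising
print(t.get("z", 0))   # 0

# A Series can also be built directly from a dict
u = pd.Series({"x": 10, "y": 20})
```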
Naming a Series
A Series has a name attribute.
You can give a Series a name when you first create it.
Example: a Series of three values from NumPy's random function
When printed, it shows not only the dtype but also the Name.
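A sketch of naming a Series. The names "random_nums" and "renamed" are hypothetical, and `np.random.rand(3)` is just one way to get three random values.

```python
import numpy as np
import pandas as pd

# Name the Series at creation time
s = pd.Series(np.random.rand(3), name="random_nums")
print(s.name)   # random_nums

# The name can also be changed afterward through the attribute
s.name = "renamed"

# The printed form ends with "Name: renamed, dtype: float64"
print(s)
```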
III. Handling 2-D Data with pandas: DataFrame
DataFrame?
A 2-D labeled table.
You can also specify its index.
Lists are ill-suited to representing 2-D data such as tables, so a dictionary is used instead.
| | height | weight |
|---|---|---|
| 0 | 1 | 30 |
| 1 | 2 | 40 |
| 2 | 3 | 50 |
| 3 | 4 | 60 |
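The table above can be produced from a dict of lists. A minimal sketch; the alternative index labels in `df2` are hypothetical.

```python
import pandas as pd

# Each dict key becomes a column name; each list becomes that column's values
data = {"height": [1, 2, 3, 4], "weight": [30, 40, 50, 60]}
df = pd.DataFrame(data)
print(df)

# An explicit index can be supplied as well
df2 = pd.DataFrame(data, index=["w", "x", "y", "z"])
```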
From CSV to DataFrame
CSV: Comma-Separated Values
pandas provides a function that loads a CSV file into a DataFrame: .read_csv()
A CSV file consists of values separated by commas (,), and its first line holds the name of each column.
| | Country/Region | Confirmed | Deaths | Recovered | Active | New cases | New deaths | New recovered | Deaths / 100 Cases | Recovered / 100 Cases | Deaths / 100 Recovered | Confirmed last week | 1 week change | 1 week % increase | WHO Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 36263 | 1269 | 25198 | 9796 | 106 | 10 | 18 | 3.50 | 69.49 | 5.04 | 35526 | 737 | 2.07 | Eastern Mediterranean |
| 1 | Albania | 4880 | 144 | 2745 | 1991 | 117 | 6 | 63 | 2.95 | 56.25 | 5.25 | 4171 | 709 | 17.00 | Europe |
| 2 | Algeria | 27973 | 1163 | 18837 | 7973 | 616 | 8 | 749 | 4.16 | 67.34 | 6.17 | 23691 | 4282 | 18.07 | Africa |
| 3 | Andorra | 907 | 52 | 803 | 52 | 10 | 0 | 0 | 5.73 | 88.53 | 6.48 | 884 | 23 | 2.60 | Europe |
| 4 | Angola | 950 | 41 | 242 | 667 | 18 | 1 | 0 | 4.32 | 25.47 | 16.94 | 749 | 201 | 26.84 | Africa |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 182 | West Bank and Gaza | 10621 | 78 | 3752 | 6791 | 152 | 2 | 0 | 0.73 | 35.33 | 2.08 | 8916 | 1705 | 19.12 | Eastern Mediterranean |
| 183 | Western Sahara | 10 | 1 | 8 | 1 | 0 | 0 | 0 | 10.00 | 80.00 | 12.50 | 10 | 0 | 0.00 | Africa |
| 184 | Yemen | 1691 | 483 | 833 | 375 | 10 | 4 | 36 | 28.56 | 49.26 | 57.98 | 1619 | 72 | 4.45 | Eastern Mediterranean |
| 185 | Zambia | 4552 | 140 | 2815 | 1597 | 71 | 1 | 465 | 3.08 | 61.84 | 4.97 | 3326 | 1226 | 36.86 | Africa |
| 186 | Zimbabwe | 2704 | 36 | 542 | 2126 | 192 | 2 | 24 | 1.33 | 20.04 | 6.64 | 1713 | 991 | 57.85 | Africa |
187 rows × 15 columns
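The actual CSV file name isn't shown in this post, so this sketch feeds `read_csv` a small in-memory CSV built from three rows and four columns of the table above. With a real file you would pass its path instead, e.g. `pd.read_csv("path/to/file.csv")` (a placeholder path).

```python
import io
import pandas as pd

# Stand-in for the real file: the first line holds the column names,
# and fields are separated by commas
csv_text = """Country/Region,Confirmed,Deaths,WHO Region
Afghanistan,36263,1269,Eastern Mediterranean
Albania,4880,144,Europe
Algeria,27973,1163,Africa
"""

# read_csv accepts a path or any file-like object
covid = pd.read_csv(io.StringIO(csv_text))
print(covid)
print(covid.shape)   # (3, 4)
```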
Using pandas 1. Viewing only part of the data
head(n): returns the first n rows
| | Country/Region | Confirmed | Deaths | Recovered | Active | New cases | New deaths | New recovered | Deaths / 100 Cases | Recovered / 100 Cases | Deaths / 100 Recovered | Confirmed last week | 1 week change | 1 week % increase | WHO Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 36263 | 1269 | 25198 | 9796 | 106 | 10 | 18 | 3.50 | 69.49 | 5.04 | 35526 | 737 | 2.07 | Eastern Mediterranean |
| 1 | Albania | 4880 | 144 | 2745 | 1991 | 117 | 6 | 63 | 2.95 | 56.25 | 5.25 | 4171 | 709 | 17.00 | Europe |
| 2 | Algeria | 27973 | 1163 | 18837 | 7973 | 616 | 8 | 749 | 4.16 | 67.34 | 6.17 | 23691 | 4282 | 18.07 | Africa |
| 3 | Andorra | 907 | 52 | 803 | 52 | 10 | 0 | 0 | 5.73 | 88.53 | 6.48 | 884 | 23 | 2.60 | Europe |
| 4 | Angola | 950 | 41 | 242 | 667 | 18 | 1 | 0 | 4.32 | 25.47 | 16.94 | 749 | 201 | 26.84 | Africa |
tail(n): returns the last n rows
| | Country/Region | Confirmed | Deaths | Recovered | Active | New cases | New deaths | New recovered | Deaths / 100 Cases | Recovered / 100 Cases | Deaths / 100 Recovered | Confirmed last week | 1 week change | 1 week % increase | WHO Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 182 | West Bank and Gaza | 10621 | 78 | 3752 | 6791 | 152 | 2 | 0 | 0.73 | 35.33 | 2.08 | 8916 | 1705 | 19.12 | Eastern Mediterranean |
| 183 | Western Sahara | 10 | 1 | 8 | 1 | 0 | 0 | 0 | 10.00 | 80.00 | 12.50 | 10 | 0 | 0.00 | Africa |
| 184 | Yemen | 1691 | 483 | 833 | 375 | 10 | 4 | 36 | 28.56 | 49.26 | 57.98 | 1619 | 72 | 4.45 | Eastern Mediterranean |
| 185 | Zambia | 4552 | 140 | 2815 | 1597 | 71 | 1 | 465 | 3.08 | 61.84 | 4.97 | 3326 | 1226 | 36.86 | Africa |
| 186 | Zimbabwe | 2704 | 36 | 542 | 2126 | 192 | 2 | 24 | 1.33 | 20.04 | 6.64 | 1713 | 991 | 57.85 | Africa |
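A sketch of head and tail on a five-row stand-in built from the values above. When `n` is omitted it defaults to 5.

```python
import pandas as pd

# Small stand-in for the covid DataFrame
covid = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
})

print(covid.head())    # first 5 rows (n defaults to 5)
print(covid.head(2))   # Afghanistan, Albania
print(covid.tail(2))   # Andorra, Angola
```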
Using pandas 2. Accessing data
df['column_name'] or df.column_name
The difference is that attribute access does not work for column names that contain a space.
ex) the WHO Region column
O: covid["WHO Region"]
X: covid.WHO Region
Honey tip! Each column of a DataFrame is a "Series"!
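A sketch of both access styles on a three-row stand-in, confirming that a column comes back as a Series.

```python
import pandas as pd

covid = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria"],
    "Confirmed": [36263, 4880, 27973],
    "WHO Region": ["Eastern Mediterranean", "Europe", "Africa"],
})

# Bracket access works for any column name, including ones with spaces
regions = covid["WHO Region"]

# Attribute access only works when the name is a valid Python identifier;
# covid.WHO Region would be a SyntaxError
confirmed = covid.Confirmed

# Each column pulled out of a DataFrame is a Series
print(type(regions))   # <class 'pandas.core.series.Series'>
```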
Using pandas 3. Accessing data with "conditions"
| | Country/Region | Confirmed | Deaths | Recovered | Active | New cases | New deaths | New recovered | Deaths / 100 Cases | Recovered / 100 Cases | Deaths / 100 Recovered | Confirmed last week | 1 week change | 1 week % increase | WHO Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 36263 | 1269 | 25198 | 9796 | 106 | 10 | 18 | 3.50 | 69.49 | 5.04 | 35526 | 737 | 2.07 | Eastern Mediterranean |
| 1 | Albania | 4880 | 144 | 2745 | 1991 | 117 | 6 | 63 | 2.95 | 56.25 | 5.25 | 4171 | 709 | 17.00 | Europe |
| 2 | Algeria | 27973 | 1163 | 18837 | 7973 | 616 | 8 | 749 | 4.16 | 67.34 | 6.17 | 23691 | 4282 | 18.07 | Africa |
| | Country/Region | Confirmed | Deaths | Recovered | Active | New cases | New deaths | New recovered | Deaths / 100 Cases | Recovered / 100 Cases | Deaths / 100 Recovered | Confirmed last week | 1 week change | 1 week % increase | WHO Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | Bangladesh | 226225 | 2965 | 125683 | 97577 | 2772 | 37 | 1801 | 1.31 | 55.56 | 2.36 | 207453 | 18772 | 9.05 | South-East Asia |
| 19 | Bhutan | 99 | 0 | 86 | 13 | 4 | 0 | 1 | 0.00 | 86.87 | 0.00 | 90 | 9 | 10.00 | South-East Asia |
| 27 | Burma | 350 | 6 | 292 | 52 | 0 | 0 | 2 | 1.71 | 83.43 | 2.05 | 341 | 9 | 2.64 | South-East Asia |
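The second table above (rows 13, 19 and 27, all in South-East Asia) looks like the result of a condition on WHO Region. The exact expression isn't shown, so this is a sketch of boolean-mask filtering on a small stand-in frame that keeps the original row labels.

```python
import pandas as pd

covid = pd.DataFrame(
    {
        "Country/Region": ["Afghanistan", "Bangladesh", "Bhutan", "Burma"],
        "Confirmed": [36263, 226225, 99, 350],
        "WHO Region": ["Eastern Mediterranean", "South-East Asia",
                       "South-East Asia", "South-East Asia"],
    },
    index=[0, 13, 19, 27],  # row labels taken from the tables above
)

# The comparison yields a boolean Series (a mask) aligned on the index
mask = covid["WHO Region"] == "South-East Asia"

# Indexing with the mask keeps only the rows where it is True
sea = covid[mask]
print(sea)
```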
Using pandas 4. Accessing data by row
| | Available | Location | Genre |
|---|---|---|---|
| What Is a Bug? | True | 102 | Programming |
| Exciting Physics | False | 215 | Physics |
| Differentiate for Me, Holmes | False | 323 | Math |
Fetching by index label: .loc[row, col]
Fetching by integer position: .iloc[rowidx, colidx]
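A sketch of `.loc` and `.iloc` on the book table above, with the English titles used as index labels.

```python
import pandas as pd

books = pd.DataFrame(
    {
        "Available": [True, False, False],
        "Location": [102, 215, 323],
        "Genre": ["Programming", "Physics", "Math"],
    },
    index=["What Is a Bug?", "Exciting Physics", "Differentiate for Me, Holmes"],
)

# .loc uses labels: row label, column label
print(books.loc["Exciting Physics", "Genre"])   # Physics

# A single row label returns that row as a Series
print(books.loc["What Is a Bug?"])

# .iloc uses integer positions: third row, second column (Location)
print(books.iloc[2, 1])                         # 323

# With .iloc slices, the end position is excluded
print(books.iloc[0:2, 0:2])
```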
Using pandas 5. groupby
Split: divide the DataFrame into groups based on a particular "criterion" (key).
Apply: apply an aggregate function such as sum(), mean(), or median() to condense each group into a single value.
Combine: build a new Series from the applied results (group_key : applied_value).
.groupby()
| | Country/Region | Confirmed | Deaths | Recovered | Active | New cases | New deaths | New recovered | Deaths / 100 Cases | Recovered / 100 Cases | Deaths / 100 Recovered | Confirmed last week | 1 week change | 1 week % increase | WHO Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 36263 | 1269 | 25198 | 9796 | 106 | 10 | 18 | 3.50 | 69.49 | 5.04 | 35526 | 737 | 2.07 | Eastern Mediterranean |
| 1 | Albania | 4880 | 144 | 2745 | 1991 | 117 | 6 | 63 | 2.95 | 56.25 | 5.25 | 4171 | 709 | 17.00 | Europe |
| 2 | Algeria | 27973 | 1163 | 18837 | 7973 | 616 | 8 | 749 | 4.16 | 67.34 | 6.17 | 23691 | 4282 | 18.07 | Africa |
| 3 | Andorra | 907 | 52 | 803 | 52 | 10 | 0 | 0 | 5.73 | 88.53 | 6.48 | 884 | 23 | 2.60 | Europe |
| 4 | Angola | 950 | 41 | 242 | 667 | 18 | 1 | 0 | 4.32 | 25.47 | 16.94 | 749 | 201 | 26.84 | Africa |
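The split-apply-combine steps map onto a single chained call. A minimal sketch on the five rows shown above; summing Confirmed per WHO Region is an illustrative choice, not necessarily what the original notebook computed.

```python
import pandas as pd

covid = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
    "WHO Region": ["Eastern Mediterranean", "Europe", "Africa", "Europe", "Africa"],
})

# Split by WHO Region, apply sum() to Confirmed, combine into a new Series
totals = covid.groupby("WHO Region")["Confirmed"].sum()
print(totals)
# Africa 28923, Eastern Mediterranean 36263, Europe 5787 (groups sorted by key)
```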
Mission:
1. In the covid data, which country has the highest death rate per 100 cases (Deaths / 100 Cases)?
2. In the covid data, print every country with no new cases whose WHO Region is 'Europe'.
Hint: Applying two conditions at once in a single line may raise a warning.
| | Country/Region | Confirmed | Deaths | Recovered | Active | New cases | New deaths | New recovered | Deaths / 100 Cases | Recovered / 100 Cases | Deaths / 100 Recovered | Confirmed last week | 1 week change | 1 week % increase | WHO Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 56 | Estonia | 2034 | 69 | 1923 | 42 | 0 | 0 | 1 | 3.39 | 94.54 | 3.59 | 2021 | 13 | 0.64 | Europe |
| 75 | Holy See | 12 | 0 | 12 | 0 | 0 | 0 | 0 | 0.00 | 100.00 | 0.00 | 12 | 0 | 0.00 | Europe |
| 95 | Latvia | 1219 | 31 | 1045 | 143 | 0 | 0 | 0 | 2.54 | 85.73 | 2.97 | 1192 | 27 | 2.27 | Europe |
| 100 | Liechtenstein | 86 | 1 | 81 | 4 | 0 | 0 | 0 | 1.16 | 94.19 | 1.23 | 86 | 0 | 0.00 | Europe |
| 113 | Monaco | 116 | 4 | 104 | 8 | 0 | 0 | 0 | 3.45 | 89.66 | 3.85 | 109 | 7 | 6.42 | Europe |
| 143 | San Marino | 699 | 42 | 657 | 0 | 0 | 0 | 0 | 6.01 | 93.99 | 6.39 | 699 | 0 | 0.00 | Europe |
| 157 | Spain | 272421 | 28432 | 150376 | 93613 | 0 | 0 | 0 | 10.44 | 55.20 | 18.91 | 264836 | 7585 | 2.86 | Europe |
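One way to approach both missions, sketched on a handful of stand-in rows copied from the tables in this post (the answer on the full dataset is not reproduced here). Chaining two separate boolean filters, e.g. `covid[cond1][cond2]`, is a likely source of the warning the hint mentions (pandas reports that the boolean key will be reindexed); combining both conditions into one mask with `&` avoids it.

```python
import pandas as pd

covid = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Yemen", "Estonia", "Spain"],
    "New cases": [106, 117, 10, 0, 0],
    "Deaths / 100 Cases": [3.50, 2.95, 28.56, 3.39, 10.44],
    "WHO Region": ["Eastern Mediterranean", "Europe", "Eastern Mediterranean",
                   "Europe", "Europe"],
})

# Mission 1: idxmax() returns the row label of the largest value
top = covid.loc[covid["Deaths / 100 Cases"].idxmax(), "Country/Region"]

# Mission 2: one combined mask; each condition needs its own parentheses
no_new_in_europe = covid[(covid["New cases"] == 0) & (covid["WHO Region"] == "Europe")]

print(top)               # Yemen (for these stand-in rows only)
print(no_new_in_europe)  # Estonia and Spain
```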
| | Unnamed: 0 | Date | AveragePrice | Total Volume | 4046 | 4225 | 4770 | Total Bags | Small Bags | Large Bags | XLarge Bags | type | year | region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2015-12-27 | 1.33 | 64236.62 | 1036.74 | 54454.85 | 48.16 | 8696.87 | 8603.62 | 93.25 | 0.0 | conventional | 2015 | Albany |
| 1 | 1 | 2015-12-20 | 1.35 | 54876.98 | 674.28 | 44638.81 | 58.33 | 9505.56 | 9408.07 | 97.49 | 0.0 | conventional | 2015 | Albany |
| 2 | 2 | 2015-12-13 | 0.93 | 118220.22 | 794.70 | 109149.67 | 130.50 | 8145.35 | 8042.21 | 103.14 | 0.0 | conventional | 2015 | Albany |
| 3 | 3 | 2015-12-06 | 1.08 | 78992.15 | 1132.00 | 71976.41 | 72.58 | 5811.16 | 5677.40 | 133.76 | 0.0 | conventional | 2015 | Albany |
| 4 | 4 | 2015-11-29 | 1.28 | 51039.60 | 941.48 | 43838.39 | 75.78 | 6183.95 | 5986.26 | 197.69 | 0.0 | conventional | 2015 | Albany |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 18244 | 7 | 2018-02-04 | 1.63 | 17074.83 | 2046.96 | 1529.20 | 0.00 | 13498.67 | 13066.82 | 431.85 | 0.0 | organic | 2018 | WestTexNewMexico |
| 18245 | 8 | 2018-01-28 | 1.71 | 13888.04 | 1191.70 | 3431.50 | 0.00 | 9264.84 | 8940.04 | 324.80 | 0.0 | organic | 2018 | WestTexNewMexico |
| 18246 | 9 | 2018-01-21 | 1.87 | 13766.76 | 1191.92 | 2452.79 | 727.94 | 9394.11 | 9351.80 | 42.31 | 0.0 | organic | 2018 | WestTexNewMexico |
| 18247 | 10 | 2018-01-14 | 1.93 | 16205.22 | 1527.63 | 2981.04 | 727.01 | 10969.54 | 10919.54 | 50.00 | 0.0 | organic | 2018 | WestTexNewMexico |
| 18248 | 11 | 2018-01-07 | 1.62 | 17489.58 | 2894.77 | 2356.13 | 224.53 | 12014.15 | 11988.14 | 26.01 | 0.0 | organic | 2018 | WestTexNewMexico |
18249 rows × 14 columns
Automata and Compilers
CFG : removing ambiguous grammar
When a context-free grammar makes parsing considerably less efficient, it has to be transformed into a suitable equivalent grammar so that parsing can proceed efficiently. The transformations are as follows.
Removing useless productions
Removing ε-productions
Removing unit (single) productions
Left factoring
Eliminating left recursion
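As an illustration of the second step, here is a minimal sketch of ε-production removal (not the author's implementation). It assumes a grammar stored as a dict from each nonterminal to a list of right-hand sides, with an empty list standing for ε, and it assumes the start symbol never appears on a right-hand side (otherwise a fresh start symbol is usually introduced first). It computes the nullable nonterminals as a fixed point, then adds every variant of each production with nullable symbols dropped.

```python
from itertools import product

# Nonterminal -> list of right-hand sides; each RHS is a list of symbols.
# Dict keys are the nonterminals; any other symbol is a terminal.
# An empty RHS list stands for an epsilon production.
Grammar = dict[str, list[list[str]]]


def nullable_set(g: Grammar) -> set[str]:
    """Fixed point: nonterminals that can derive epsilon."""
    nullable: set[str] = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhss in g.items():
            if lhs in nullable:
                continue
            # all() is True for an empty RHS, so A -> epsilon counts
            if any(all(sym in nullable for sym in rhs) for rhs in rhss):
                nullable.add(lhs)
                changed = True
    return nullable


def remove_epsilon(g: Grammar, start: str) -> Grammar:
    nullable = nullable_set(g)
    result: Grammar = {}
    for lhs, rhss in g.items():
        new_rhss: list[list[str]] = []
        for rhs in rhss:
            # A nullable symbol may be kept or dropped; others are always kept
            choices = [[[sym], []] if sym in nullable else [[sym]] for sym in rhs]
            for combo in product(*choices):
                candidate = [sym for part in combo for sym in part]
                # Skip empty results (epsilon) and duplicates
                if candidate and candidate not in new_rhss:
                    new_rhss.append(candidate)
        result[lhs] = new_rhss
    # Keep S -> epsilon if the language contained the empty string
    if start in nullable:
        result[start].append([])
    return result
```

For example, with S -> AbB, A -> a | ε, B -> b | ε, the nullable set is {A, B} and S becomes S -> AbB | Ab | bB | b.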
I implemented this as follows. (A note before the code: my implementation only covers removing useless productions, ε-productions, and unit productions, and even these are not fully correct. One counterexample I have found so far is a cyclic grammar such as S -> ABC, A -> B | a, B -> C | b, C -> A | c, on which it does not work. Apart from that it seems to work, but I expect there are plenty of other counterexamples that would make it fail. I gave up on left factoring and left-recursion elimination because the code became too messy and felt beyond my current ability.)
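The counterexample above contains a cycle of unit productions (A -> B -> C -> A). A common way to keep unit-production removal from looping on such cycles is to compute, for each nonterminal, the set of nonterminals reachable through unit productions alone (its unit closure) with a visited set, and then copy in the non-unit productions of everything in that closure. A sketch of that approach, not the author's code, assuming a grammar stored as a dict from nonterminal to a list of right-hand sides:

```python
# Nonterminal -> list of right-hand sides (lists of symbols).
# Assumption: a symbol is a nonterminal exactly when it is a key of the dict.
Grammar = dict[str, list[list[str]]]


def is_unit(rhs: list[str], g: Grammar) -> bool:
    """A unit production has a single nonterminal on its right-hand side."""
    return len(rhs) == 1 and rhs[0] in g


def unit_closure(g: Grammar, start: str) -> set[str]:
    """All nonterminals reachable from `start` using unit productions only."""
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for rhs in g[current]:
            # The `seen` check is what stops cycles such as A -> B -> C -> A
            if is_unit(rhs, g) and rhs[0] not in seen:
                seen.add(rhs[0])
                stack.append(rhs[0])
    return seen


def remove_unit(g: Grammar) -> Grammar:
    result: Grammar = {}
    for lhs in g:
        new_rhss: list[list[str]] = []
        # Collect the non-unit productions of every nonterminal in the closure
        for target in sorted(unit_closure(g, lhs)):
            for rhs in g[target]:
                if not is_unit(rhs, g) and rhs not in new_rhss:
                    new_rhss.append(rhs)
        result[lhs] = new_rhss
    return result
```

On the counterexample, A, B and C all have the unit closure {A, B, C}, so each ends up with A -> a | b | c (likewise for B and C), while S -> ABC is left unchanged.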