3 Sun
TIL
[Inflearn] Mastering Data Analysis and Visualization with Just Two Pages of Documentation
- Summarize Data
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
df['w'].value_counts()
Prints how many times each value occurs under the given key. Useful for checking how many rows fall into each category.
species
setosa        50
virginica     50
versicolor    50
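As a minimal sketch of the call above, the following uses a small hand-made frame (a hypothetical stand-in for the iris data, not the dataset from the course) and the real column name `species` in place of the placeholder key `'w'`. The `normalize=True` variant is an addition that the notes do not cover.

```python
import pandas as pd

# Hypothetical miniature stand-in for the iris data used in these notes.
df = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica",
                "versicolor", "virginica", "setosa"],
})

# Occurrences of each distinct value, sorted from most to least frequent.
counts = df["species"].value_counts()

# The same counts expressed as fractions of all rows.
shares = df["species"].value_counts(normalize=True)

print(counts)
print(shares)
```

The result is a Series indexed by the distinct values, so a single count can be looked up with `counts["setosa"]`.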
len(df)
Gives the length of df (its number of rows). The same number is also available from shape.
df['w'].nunique()
Shows the number of unique values.
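A short sketch of these three size checks side by side, on a hypothetical four-row frame (the column names mirror the iris data, the values are made up for illustration):

```python
import pandas as pd

# Hypothetical four-row frame for illustration only.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.7, 6.3],
    "species": ["setosa", "setosa", "virginica", "virginica"],
})

n_rows = len(df)                      # number of rows
shape = df.shape                      # (rows, columns) tuple
n_species = df["species"].nunique()   # distinct non-missing values

print(n_rows, shape, n_species)
```

`len(df)` and `df.shape[0]` agree; `shape` additionally reports the column count.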
df.describe()
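Shows summary statistics for the numeric columns.
[option] include: dtypes to include, exclude: dtypes to exclude
- 'all': every column, regardless of dtype
- np.number: numeric columns
- object: object-dtype columns (older code writes np.object, an alias that recent NumPy releases no longer provide)
- 'category': category-dtype columns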
       sepal_length  sepal_width  petal_length  petal_width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.057333      3.758000     1.199333
std        0.828066     0.435866      1.765298     0.762238
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000

        sepal_length  sepal_width  petal_length  petal_width species
count     150.000000   150.000000    150.000000   150.000000     150
unique           NaN          NaN           NaN          NaN       3
top              NaN          NaN           NaN          NaN  setosa
freq             NaN          NaN           NaN          NaN      50
mean        5.843333     3.057333      3.758000     1.199333     NaN
std         0.828066     0.435866      1.765298     0.762238     NaN
min         4.300000     2.000000      1.000000     0.100000     NaN
25%         5.100000     2.800000      1.600000     0.300000     NaN
50%         5.800000     3.000000      4.350000     1.300000     NaN
75%         6.400000     3.300000      5.100000     1.800000     NaN
max         7.900000     4.400000      6.900000     2.500000     NaN

       species
count      150
unique       3
top     setosa
freq        50
setosa is one of the most frequent species (top reports only one of them), and it appears 50 times.
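The include and exclude options can be sketched as follows. The frame is a hypothetical four-row sample, and the `species` column is built with an explicit `object` dtype as an assumption, so that `include=[object]` selects it regardless of how a given pandas version infers string columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; species is forced to object dtype on purpose.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.7, 6.3],
    "species": pd.Series(
        ["setosa", "setosa", "virginica", "versicolor"], dtype=object
    ),
})

numeric_only = df.describe()                   # default: numeric columns only
everything = df.describe(include="all")        # all columns; unused stats are NaN
numbers = df.describe(include=[np.number])     # explicit numeric selection
objects_only = df.describe(include=[object])   # count / unique / top / freq
no_numbers = df.describe(exclude=[np.number])  # everything except numeric

print(everything)
```

For object columns the summary switches from mean and quantiles to `count`, `unique`, `top`, and `freq`, which is why the combined table mixes numbers and NaN.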
some functions()
      sepal_length  sepal_width  petal_length  petal_width
0.25           5.1          2.8           1.6          0.3
0.75           6.4          3.3           5.1          1.8
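The notes do not show which call produced the table above; its 0.25 and 0.75 index labels are what `DataFrame.quantile` returns for a list of quantiles, so the sketch below assumes that call. The data is a hypothetical five-row frame, and `median` and `mean` are included only as further examples of column-wise summary functions.

```python
import pandas as pd

# Hypothetical numeric frame; values chosen so the quartiles are easy to read.
df = pd.DataFrame({
    "sepal_length": [4.0, 5.0, 6.0, 7.0, 8.0],
    "sepal_width": [2.0, 2.5, 3.0, 3.5, 4.0],
})

# One row per requested quantile, one column per numeric column.
quartiles = df.quantile([0.25, 0.75])

# Other column-wise summaries return one value per column.
medians = df.median()
means = df.mean()

print(quartiles)
```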
Using apply in pandas with lambda (anonymous) functions - Summarize Data
df.apply()
     sepal_length  sepal_width  petal_length  petal_width    species  species_3
0             5.1          3.5           1.4          0.2     setosa        set
1             4.9          3.0           1.4          0.2     setosa        set
2             4.7          3.2           1.3          0.2     setosa        set
3             4.6          3.1           1.5          0.2     setosa        set
4             5.0          3.6           1.4          0.2     setosa        set
...           ...          ...           ...          ...        ...        ...
145           6.7          3.0           5.2          2.3  virginica        vir
146           6.3          2.5           5.0          1.9  virginica        vir
147           6.5          3.0           5.2          2.0  virginica        vir
148           6.2          3.4           5.4          2.3  virginica        vir
149           5.9          3.0           5.1          1.8  virginica        vir
150 rows × 6 columns

     sepal_length  sepal_width  petal_length  petal_width    species  species_3  species-3
0             5.1          3.5           1.4          0.2     setosa        set        osa
1             4.9          3.0           1.4          0.2     setosa        set        osa
2             4.7          3.2           1.3          0.2     setosa        set        osa
3             4.6          3.1           1.5          0.2     setosa        set        osa
4             5.0          3.6           1.4          0.2     setosa        set        osa
...           ...          ...           ...          ...        ...        ...        ...
145           6.7          3.0           5.2          2.3  virginica        vir        ica
146           6.3          2.5           5.0          1.9  virginica        vir        ica
147           6.5          3.0           5.2          2.0  virginica        vir        ica
148           6.2          3.4           5.4          2.3  virginica        vir        ica
149           5.9          3.0           5.1          1.8  virginica        vir        ica
150 rows × 7 columns
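The two derived columns above hold the first and last three characters of `species`. The exact code is not shown in the notes, so the following is a sketch of one way to get them with `apply` and a lambda, on a hypothetical three-row frame. The `.str` slice at the end is an alternative added here for comparison.

```python
import pandas as pd

# Hypothetical frame with one row per species.
df = pd.DataFrame({"species": ["setosa", "versicolor", "virginica"]})

# apply calls the lambda once per element of the column.
df["species_3"] = df["species"].apply(lambda s: s[:3])    # first three characters
df["species-3"] = df["species"].apply(lambda s: s[-3:])   # last three characters

# Equivalent vectorized form using the .str accessor (not from the notes).
first_three_alt = df["species"].str[:3]

print(df)
```

A column name such as `species-3` contains a hyphen, so it can only be reached with bracket syntax (`df["species-3"]`), not attribute access.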
Handling missing values with fillna and dropna - Handling Missing Data
How to handle missing values (Not a ...)
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

       name
0    Alfred
1    Batman
2  Catwoman

     name        toy       born
1  Batman  Batmobile 1940-04-25
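The calls behind these tables are not included in the notes. As an assumption, the sketch below rebuilds the same three-row frame and shows `dropna` variants whose results match the shapes of the tables above: all rows kept, only the `name` column kept, and only the fully populated row kept.

```python
import numpy as np
import pandas as pd

# Same values as the first table above.
df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
})

# Drop a row only if every value in it is missing: nothing is dropped here.
not_all_missing = df.dropna(how="all")

# Drop every column that contains at least one missing value.
complete_columns = df.dropna(axis="columns")

# Default: drop every row that contains at least one missing value.
complete_rows = df.dropna()

print(complete_rows)
```

`dropna` returns a new frame and leaves `df` unchanged, so the three results can be computed from the same original.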
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
3  NaN  3.0 NaN  4

     A    B    C  D
0  0.0  2.0  2.0  0
1  3.0  4.0  2.0  1
2  0.0  1.0  2.0  5
3  0.0  3.0  2.0  4

     A    B    C  D
0  2.5  2.0  2.5  0
1  3.0  4.0  2.5  1
2  2.5  2.5  2.5  5
3  2.5  3.0  2.5  4

       A      B     C      D
0   True  False  True  False
1  False  False  True  False
2   True   True  True  False
3   True  False  True  False
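A sketch of calls that reproduce tables like the ones above, assuming (the notes do not say) that the second table comes from a per-column fill dictionary, the third from filling with the constant 2.5, and the fourth from `isna`:

```python
import numpy as np
import pandas as pd

# Same values as the first table above.
df = pd.DataFrame(
    [[np.nan, 2, np.nan, 0],
     [3, 4, np.nan, 1],
     [np.nan, np.nan, np.nan, 5],
     [np.nan, 3, np.nan, 4]],
    columns=list("ABCD"),
)

# A different replacement value for each column.
per_column = df.fillna(value={"A": 0, "B": 1, "C": 2, "D": 3})

# One constant for every missing cell (2.5 is an assumed value).
constant = df.fillna(2.5)

# Boolean mask: True where a value is missing.
mask = df.isna()

# Missing values per column, a common follow-up check.
missing_per_column = mask.sum()

print(per_column)
```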
Creating new columns with assign; binning and bucketing with qcut - Make New Columns
    A         B
0   1  0.052204
1   2 -1.489858
2   3  0.427285
3   4  1.148815
4   5 -1.301116
5   6  1.739656
6   7  1.000600
7   8 -1.672363
8   9  0.301468
9  10 -0.221703

   A         B      ln_A
0  1  0.052204  0.000000
1  2 -1.489858  0.693147
2  3  0.427285  1.098612
3  4  1.148815  1.386294
4  5 -1.301116  1.609438

   A         B      ln_A
0  1  0.052204  0.000000
1  2 -1.489858  0.693147
2  3  0.427285  1.098612
3  4  1.148815  1.386294
4  5 -1.301116  1.609438
A new column can be created with assign, or by assigning to it directly.
For qcut, n is the number of buckets: the new column is built by splitting the values into n buckets.
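Both ways of adding `ln_A`, followed by a `qcut` bucketing step, might look like the sketch below. The random values in `B` are generated here and will not match the table, and the bucket column name `A_bucket` and its three labels are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
df = pd.DataFrame({"A": range(1, 11), "B": rng.standard_normal(10)})

# assign returns a new DataFrame and leaves df untouched.
with_assign = df.assign(ln_A=lambda x: np.log(x.A))

# Direct assignment adds the column to df in place.
df["ln_A"] = np.log(df.A)

# qcut splits A into 3 quantile-based buckets of roughly equal size.
df["A_bucket"] = pd.qcut(df.A, 3, labels=["low", "mid", "high"])

print(df.head())
```

`assign` suits method chaining because it does not mutate its input, while direct assignment is the shorter form when the frame is meant to change.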