(Python Lecture 7-1) pandas I

210806

Pandas

A representative library for processing structured data.

  • The name pandas comes from "panel data"

  • Integrates with numpy, a high-performance array computation library, to provide powerful "spreadsheet"-style processing

  • Provides indexing, computation functions, preprocessing functions, and more

  • Used for data processing and statistical analysis

Data terminology

  • Entire dataset : Data table, Sample

  • Columns (vertical) : attribute, field, feature, column

  • Rows (horizontal) : instance, tuple, row

  • A single column : Feature vector

  • A single element : data
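
As a rough illustration of how these terms map onto pandas objects, here is a minimal sketch. The table contents and the names `table`, `feature_vector`, `instance`, and `element` are made up for this example and do not come from the lecture.

```python
import pandas as pd

# Hypothetical data table used only to illustrate the terms.
table = pd.DataFrame({"height": [170, 165, 180], "weight": [65, 55, 80]})

feature_vector = table["height"]  # one column (attribute / field / feature)
instance = table.iloc[0]          # one row (instance / tuple)
element = table.at[0, "weight"]   # one element (data)

print(table.shape)  # (number of rows, number of columns)
print(feature_vector.tolist())
print(instance.to_dict())
print(element)
```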

Data loading

import pandas as pd

data_url = "address"  # placeholder: URL of the data file
df_data = pd.read_csv(data_url, sep=r"\s+", header=None)
  • pd.read_csv

    • Loads the CSV data located at the url

    • sep : the delimiter. Here, whitespace is used as the delimiter

    • header : None is passed explicitly to indicate that the file has no header row. The first row is then read as data, and the columns receive default integer labels (0, 1, 2, ...).
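
To see the effect of `sep` and `header=None` without depending on an external URL, the following sketch reads from an in-memory string instead. The text in `raw_text` and the column names `x`, `y`, `z` are assumptions chosen for illustration only.

```python
import io

import pandas as pd

# Hypothetical whitespace-separated data with no header row.
raw_text = "1.0  2.0   3.0\n4.0 5.0 6.0\n"

# sep=r"\s+" splits on runs of whitespace.
# header=None tells pandas that the file has no header row.
df_data = pd.read_csv(io.StringIO(raw_text), sep=r"\s+", header=None)

print(df_data.shape)          # both lines are kept as data rows
print(list(df_data.columns))  # default integer column labels

# Column names can be assigned afterwards.
df_data.columns = ["x", "y", "z"]
print(df_data)
```

If the file did contain a header row, leaving `header` at its default would use that first row as the column names instead.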

Components of Pandas

DataFrame

An object that contains the entire data table.

Series

An object that holds the data for a single column of a DataFrame.
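
The relationship between the two objects can be checked directly. This is a small sketch with made-up data; `frame` and `age` are illustrative names.

```python
import pandas as pd

# Hypothetical two-column table.
frame = pd.DataFrame({"age": [42, 52, 36], "city": ["San Francisco", "Baltimore", "Miami"]})

# Selecting one column of a DataFrame gives back a Series.
age = frame["age"]

print(type(frame).__name__)
print(type(age).__name__)
print(age.name)                       # the Series keeps the column name
print(age.index.equals(frame.index))  # and shares the DataFrame's row index
```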

Series

import numpy as np
import pandas as pd

# Create a Series
>>> dict_data = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
>>> example_obj = pd.Series(dict_data, dtype=np.float32, name="example_data")
>>> example_obj
a    1.0
b    2.0
c    3.0
d    4.0
e    5.0
Name: example_data, dtype: float32

# Access a value by its index label, and assign a value
>>> example_obj["a"]
1.0
>>> example_obj["a"] = 3.2
>>> example_obj["a"]
3.2

# Get the values and the index
>>> example_obj.values
array([3.2, 2. , 3. , 4. , 5. ], dtype=float32)
>>> example_obj.index
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

# Name the values and the index
>>> example_obj.name = "number"
>>> example_obj.index.name = "alphabet"
>>> example_obj
alphabet
a    3.2
b    2.0
c    3.0
d    4.0
e    5.0
Name: number, dtype: float32
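
A Series does not have to be built from a dict. The sketch below, which is an addition and not part of the lecture code, builds a Series from a list plus an explicit `index`, and also shows what happens when `index` is passed together with a dict. The names `from_list` and `partial` are illustrative.

```python
import numpy as np
import pandas as pd

# Values and labels given separately instead of as a dict.
values = [1, 2, 3, 4, 5]
labels = ["a", "b", "c", "d", "e"]
from_list = pd.Series(values, index=labels, dtype=np.float32, name="example_data")
print(from_list)

# With a dict, index= selects and orders entries by label;
# a label missing from the dict ("z" here) becomes NaN.
partial = pd.Series({"a": 1, "b": 2}, index=["b", "a", "z"], dtype=np.float32)
print(partial)
```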

DataFrame

# Create a DataFrame
>>> raw_data = {
...     'first_name': ['Jason', 'Molly', 'Tina'],
...     'last_name': ['Miller', 'Jacobson', 'Ali'],
...     'age': [42, 52, 36],
...     'city': ['San Francisco', 'Baltimore', 'Miami'],
... }
		
>>> df = pd.DataFrame(raw_data)
>>> df
  first_name last_name  age           city
0      Jason    Miller   42  San Francisco
1      Molly  Jacobson   52      Baltimore
2       Tina       Ali   36          Miami
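
When building a DataFrame from a dict, the `columns` argument can pick and order the columns. The following is a minimal sketch using the same sample data. The `debt` column is a hypothetical name that does not exist in the data, included only to show how a missing column is handled.

```python
import pandas as pd

raw_data = {
    "first_name": ["Jason", "Molly", "Tina"],
    "last_name": ["Miller", "Jacobson", "Ali"],
    "age": [42, 52, 36],
    "city": ["San Francisco", "Baltimore", "Miami"],
}

# Keep only some columns, in the order given.
df_subset = pd.DataFrame(raw_data, columns=["age", "city"])
print(df_subset)

# A column name that is not a key of the dict is created and filled with NaN.
df_extra = pd.DataFrame(raw_data, columns=["first_name", "age", "debt"])
print(df_extra)
```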

The rest of the pandas material largely overlaps with the visualization notes written earlier, so it is covered there instead.
