(Python Lecture 7-1) pandas I

210806

Pandas

A representative library that supports processing of structured data

  • The name pandas comes from "panel data"

  • Integrates with numpy, the high-performance array computation library, to provide powerful "spreadsheet"-style processing

  • Provides indexing, computation functions, preprocessing functions, and more

  • Used for data processing and statistical analysis

Data terminology

  • Entire dataset : data table, sample

  • Vertical line (column) : attribute, field, feature, column

  • Horizontal line (row) : instance, tuple, row

  • A single column : feature vector

  • A single element : data (one value)
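The terms above can be mapped onto pandas objects roughly as follows. This is only an illustrative sketch: the table, its column names (`height`, `weight`), and its values are invented and do not come from the lecture.

```python
import pandas as pd

# Hypothetical data table: each row is one instance, each column one feature.
df = pd.DataFrame({"height": [170, 165, 180], "weight": [65, 55, 80]})

feature_vector = df["height"]  # one column (a feature vector)
instance = df.loc[1]           # one row (an instance / tuple)
value = df.at[1, "weight"]     # one element (a single data value)

print(df.shape)                 # (3, 2): 3 instances, 2 features
print(feature_vector.tolist())  # [170, 165, 180]
print(instance.tolist())        # [165, 55]
print(value)                    # 55
```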

Data loading

import pandas as pd

data_url = "address"  # placeholder: put the URL of the dataset here
df_data = pd.read_csv(data_url, sep=r"\s+", header=None)  # split on whitespace, no header row
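  • pd.read_csv

    • Loads the CSV data located at the URL.

    • sep : the delimiter. Here whitespace is chosen as the delimiter (the regular expression \s+ matches one or more whitespace characters).

    • header : set explicitly to None here, which tells pandas that the file has no header row. The first row is then read as data, and the columns are named with integers 0, 1, 2, ... (With the default setting, the first row would be used as the column names.)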
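As a minimal sketch of the options above, the snippet below parses a small whitespace-separated sample from an in-memory string instead of a URL. The sample values and the names `raw_text` and `df_sample` are made up for illustration; only `pd.read_csv`, `sep`, and `header` come from the notes.

```python
import io

import pandas as pd

# Hypothetical whitespace-separated sample with no header row (values are invented).
raw_text = "1.0  2.0   3.0\n4.0 5.0 6.0\n"

# sep=r"\s+" splits on any run of whitespace; header=None keeps the first line as data.
df_sample = pd.read_csv(io.StringIO(raw_text), sep=r"\s+", header=None)

print(df_sample.shape)          # (2, 3)
print(list(df_sample.columns))  # [0, 1, 2]
```

Because no header row is declared, both lines end up as data rows and the columns receive integer labels.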

Components of pandas

DataFrame

An object that holds the entire data table.

Series

An object that holds the data of a single column of a DataFrame.
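The relationship between the two objects can be checked with a short sketch like the one below. The table and its column names (`a`, `b`) are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical two-column table used only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

col = df["a"]  # selecting a single column yields a Series

print(type(df).__name__)           # DataFrame
print(type(col).__name__)          # Series
print(col.name)                    # a
print(col.index.equals(df.index))  # True: the Series shares the table's row index
```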


The remaining pandas material largely overlaps with the visualization notes I wrote earlier, so those notes stand in for it here.
