9 Wed

Introduction to Kaggle Machine Learning, Taught by Industry Practitioners

Introduction to XGBoost

  • eXtreme Gradient Boosting

  • An algorithm that combines the Gradient Boosting algorithm with additional techniques

    • The underlying principle is still Gradient Boosting

  • Widely used by top-ranked Kaggle competitors

  • Parallelized, efficient, and highly optimized

์•™์ƒ๋ธ” ๋Ÿฌ๋‹

  • ์•™์ƒ๋ธ” ๋Ÿฌ๋‹์€ ํฌ๊ฒŒ Bagging ๋ฐฉ์‹๊ณผ Boosting ๋ฐฉ์‹์œผ๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ๋‹ค.

  • Bagging

    • ๋งค๋ฒˆ ๋žœ๋คํ•˜๊ฒŒ ์ƒ˜ํ”Œ์„ ๋ฝ‘์•„์„œ ๋…๋ฆฝ์ ์œผ๋กœ ํ•™์Šต์‹œํ‚จ ๋ถ„๋ฅ˜๊ธฐ๋“ค์˜ ๊ฒฐ๊ณผ๋ฅผ ์ข…ํ•ฉํ•˜๋Š” ๊ฒƒ

    • ๋Œ€ํ‘œ์ ์ธ ๋ฐฉ์‹์œผ๋กœ๋Š” ๋žœ๋ค ํฌ๋ ˆ์ŠคํŠธ๊ฐ€ ์žˆ๋‹ค

  • Boosting

    • ๋งค๋ฒˆ ์ƒ˜ํ”Œ์„ ๋ฝ‘์•„์„œ ํ•™์Šต์‹œํ‚ค๋˜, ๋…๋ฆฝ์ ์ด์ง€ ์•Š๊ณ  ์ˆœ์ฐจ์ ์œผ๋กœ ํ•™์Šต ์‹œํ‚จ๋‹ค

    • ์ด์ „ ๋‹จ๊ณ„์—์„œ ์˜ค์ฐจ๊ฐ€ ํฐ ์ƒ˜ํ”Œ๋“ค์ด ๋‹ค์‹œ ๋ฝ‘ํžˆ๋„๋ก ํ•œ๋‹ค

      • ์˜ค์ฐจ๊ฐ€ ํฐ ์ƒ˜ํ”Œ๋“ค์— ๊ฐ€์ค‘์น˜๋ฅผ ๋ถ€์—ฌํ•ด์„œ ๋ฝ‘ํž‰ ํ™•๋ฅ ์ด ๋†’๋„๋ก ํ•œ๋‹ค

    • ๋Œ€ํ‘œ์ ์ธ ๋ฐฉ์‹์œผ๋กœ๋Š” AdaBoost, XGBoost, GradientBoost ๋“ฑ์ด ์žˆ๋‹ค.

GBM

  • Gradient Boosting Machine

  • Uses the Gradient Descent (GD) algorithm to optimize the model during training: each new tree is fit to the negative gradient of the loss with respect to the current predictions.
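
To make the gradient-descent idea concrete, here is a from-scratch sketch for squared loss L = 0.5 * (y - F)^2, whose negative gradient with respect to the prediction F is simply the residual y - F. Each round fits a depth-1 tree (stump) to the residuals and adds a small step of it (learning_rate) to the model. This is plain gradient boosting on a single feature, not XGBoost itself; XGBoost additionally uses second-order gradient information and regularization on the trees. The function names are hypothetical.

```python
import numpy as np


def fit_stump(x, r):
    """Fit a depth-1 regression tree on a 1-D feature x to targets r (least squares)."""
    best = None
    for t in np.unique(x)[:-1]:                  # candidate split thresholds
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    if best is None:                             # all x identical: predict the mean
        m = r.mean()
        return lambda z: np.full(np.shape(z), m)
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)


def gradient_boost(x, y, n_rounds=100, learning_rate=0.1):
    """Gradient boosting for squared loss L = 0.5 * (y - F)^2."""
    f0 = y.mean()                                # initial constant model
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred                      # negative gradient of L w.r.t. F
        stump = fit_stump(x, residual)           # weak learner fit to the gradient
        pred = pred + learning_rate * stump(x)   # small step in function space
        stumps.append(stump)
    return lambda z: f0 + learning_rate * sum(s(z) for s in stumps)


x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x)
model = gradient_boost(x, y)
mse_baseline = np.mean((y - y.mean()) ** 2)      # constant-mean model
mse_boosted = np.mean((y - model(x)) ** 2)
print(mse_baseline, mse_boosted)
```

Because each stump is a least-squares fit to the current residuals, every step with 0 < learning_rate < 2 can only lower the training error; a small learning_rate trades more rounds for smoother fits.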

XGBoost์˜ ์žฅ์ ๊ณผ ๋‹จ์ 

์žฅ์ 

  • ๋Œ€๋ถ€๋ถ„์˜ ์ƒํ™ฉ์—์„œ ์•ˆ์ •์ ์ด๊ณ  ์ข‹์€ ์„ฑ๋Šฅ

  • Feature Enginerring์„ ๋งŽ์ด ์ ์šฉํ•˜์ง€ ์•Š์•„๋„ ์•ˆ์ •์ ์ธ ์„ฑ๋Šฅ

๋‹จ์ 

  • ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ๊ฐ€ ๋ฐฉ๋Œ€ํ•ด์„œ ํŠœ๋‹ํ•˜๋Š” ๊ฒƒ์ด ์ƒ๋Œ€์ ์œผ๋กœ ์–ด๋ ต๋‹ค

Introduction to the Stroke Prediction Dataset

  • A dataset for predicting whether a person has had a stroke, based on age, gender, hypertension status, and similar attributes

  • Features: 12 dimensions

    • id

    • gender

    • age

    • hypertension: whether the person has hypertension

    • heart_disease: whether the person has heart disease

    • ever_married

    • work_type

    • Residence_type

    • avg_glucose_level

    • bmi: body mass index

    • smoking_status

    • stroke

  • Target value: binary classification

    • stroke: had a stroke

    • not stroke: did not have a stroke

  • Number of samples: 5,110

Predicting Stroke Occurrence with XGBoost - Stroke Prediction Dataset

  • Input data: 11 dimensions

  • Target: stroke

    • Yes: 1

    • No: 0

  • Estimators

    • DecisionTreeClassifier

    • RandomForestClassifier

    • XGBClassifier

  • Additional techniques applied

    • Data cleansing: handling missing values
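
A sketch of one simple missing-value strategy, assuming mean imputation (the notes do not say which method was used). In the Kaggle CSV the missing values are expected in bmi, but that is worth confirming with df.isna().sum() first; fill_missing_with_mean is a hypothetical helper.

```python
import pandas as pd


def fill_missing_with_mean(df: pd.DataFrame, column: str = "bmi") -> pd.DataFrame:
    """Return a copy of df with NaNs in `column` replaced by that column's mean.

    Mean imputation is an assumed strategy; dropping the affected rows with
    df.dropna(subset=[column]) is an equally simple alternative.
    """
    out = df.copy()
    out[column] = out[column].fillna(out[column].mean())
    return out


# Usage:
# print(df.isna().sum())               # count missing values per column
# df = fill_missing_with_mean(df, "bmi")
```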

Removing unnecessary features
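
The notes do not list which features were removed; the row identifier id is the obvious candidate, since it carries no information about stroke risk. A minimal sketch under that assumption (the helper name is hypothetical):

```python
import pandas as pd


def drop_unneeded(df: pd.DataFrame, columns=("id",)) -> pd.DataFrame:
    """Return a copy of df without columns that carry no predictive signal.

    Dropping `id` is an assumption; the original notes do not list the removed features.
    """
    return df.drop(columns=list(columns))


# Usage:
# df = drop_unneeded(df)    # removes the id column
```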

์ƒ๊ด€๊ด€๊ณ„

  • ๊ทธ๋‚˜๋งˆ ๋‚˜์ด๊ฐ€ stroke์™€ ์ œ์ผ ์—ฐ๊ด€์ด ์žˆ๋‹ค
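
A sketch for ranking numeric features by their Pearson correlation with stroke. String columns are left out because they have not been encoded yet at this step; correlation_with_target is a hypothetical helper.

```python
import pandas as pd


def correlation_with_target(df: pd.DataFrame, target: str = "stroke") -> pd.Series:
    """Pearson correlation of each numeric column with the target, strongest (by |r|) first."""
    numeric = df.select_dtypes(include="number")   # skip string/object columns
    corr = numeric.corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Usage:
# print(correlation_with_target(df).head())
```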

Label-encoding string (object) columns
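
A minimal label-encoding sketch using pandas categorical codes: each string column is mapped to integers 0..k-1 in sorted order of its categories (missing values would become -1). sklearn.preprocessing.LabelEncoder is a common alternative; the helper name below is hypothetical.

```python
import pandas as pd


def label_encode_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every string/object column replaced by integer codes.

    Codes follow the sorted order of each column's categories; missing values become -1.
    """
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if pd.api.types.is_object_dtype(s) or pd.api.types.is_string_dtype(s):
            out[col] = s.astype("category").cat.codes
    return out


# Usage:
# df = label_encode_strings(df)   # e.g. gender, ever_married, work_type, ...
```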
