7 Sun

[Python Deep Learning PyTorch] PART 03 Deep Learning

01 Definition of Deep Learning

Deep Learning

  • Not a new kind of model, but a model that evolved from the neural network

  • Because of how they are trained, neural networks overfit heavily and suffer from Gradient Vanishing.

  • To avoid these problems, SVM and Ensemble Learning were widely used instead.

  • Deep learning is a multilayer neural network with two or more hidden layers

  • Deep learning took off in earnest thanks to a property called Graphical Representation Learning

02 What Drove the Development of Deep Learning

  • Algorithms that mitigate overfitting and Gradient Vanishing were developed

  • GPUs became usable for neural network computation, which solved the problem of long training times

03 Types of Deep Learning

  • MLP

  • CNN : widely used in image-related fields

  • RNN : widely used for sequential (time-series) data such as text

04 Algorithms That Drove the Development of Deep Learning - 1

Dropout

  • A technique that randomly drops nodes in a layer during training to obtain a Generalization effect

  • The idea is borrowed from genetic algorithms

  • On MNIST, a handwritten-digit dataset, a network trained with Dropout has a lower Test Error than a network trained without it.

  • Similar in concept to Random Forest in Ensemble Learning

    • The basic idea of Ensemble Learning is to combine diverse models

    • Random Forest obtains diverse models by randomly sampling the data and also randomly sampling the variables

    • If Dropout is viewed as a random selection of variables, the resulting model construction is similar
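The random dropping described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the book's code: the function name `dropout`, the drop probability `p`, and the `rng` argument are assumptions made for the example. It uses the common "inverted" form, in which the surviving values are rescaled during training so that nothing needs to change at evaluation time.

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each value with probability p, rescale the rest."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)          # evaluation: every node stays active
    keep = 1.0 - p
    # Dividing kept values by `keep` leaves the expected activation unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

hidden = [1.0] * 8                        # hypothetical layer output
print(dropout(hidden, p=0.5))             # a different random subset is zeroed on each call
print(dropout(hidden, training=False))    # unchanged at evaluation time
```

Each training step therefore updates a different randomly thinned sub-network, which is where the comparison with Random Forest's random variable selection comes from.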

Activation Functions

  • ReLU

    • Rectified Linear Unit

    • Partly resolves the problems of nonlinear activation functions such as the sigmoid => mitigates Gradient Vanishing

    • f(x) = max(0, x)

    • Since then, many other activation functions have appeared, such as Leaky ReLU, ELU, parametric ReLU, SELU, and SERLU
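To see why ReLU eases Gradient Vanishing, it helps to compare the local gradients of the two functions. The sketch below is a simplification under stated assumptions: it ignores the weights and multiplies the same local gradient `depth` times, and the helper name `chained_grad` is hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)                  # never larger than 0.25

def relu(x):
    return max(0.0, x)                    # f(x) = max(0, x)

def relu_grad(x):
    return 1.0 if x > 0 else 0.0          # exactly 1 for any positive input

def chained_grad(grad_fn, x, depth):
    """Product of `depth` identical local gradients (weights ignored)."""
    return grad_fn(x) ** depth

print(chained_grad(sigmoid_grad, 0.0, 10))   # 0.25 ** 10, roughly 9.5e-07
print(chained_grad(relu_grad, 1.0, 10))      # 1.0
```

Even at the sigmoid's steepest point the gradient shrinks by a factor of four per layer, while an active ReLU passes it through unchanged.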

Batch Normalization

  • Besides overfitting and Gradient Vanishing, neural networks also exhibit a phenomenon called Internal Covariate Shift

    • Training slows down because the Input distribution differs at each layer

    • Batch Normalization prevents this => it normalizes the Input distribution to speed up training

    • The reference linked here may help with understanding (though I did not fully understand it myself)

    • Rough intuition: ReLU always outputs its input unchanged when the input is greater than 0, so the range of these values varies widely; the idea seems to be to normalize them into a consistent range. Rather than using a standard distribution, each layer has its own alpha, beta, gamma......... (T_T)
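As a rough aid to the intuition above, here is a minimal sketch of the training-time computation for a single feature. It follows the standard formulation, in which the normalized value is multiplied by a learnable scale `gamma` and shifted by a learnable `beta`. The function name `batch_norm`, the sample values, and the `eps` default are assumptions for illustration, and the running statistics used at inference time are left out.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n      # biased batch variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

acts = [0.0, 2.0, 10.0, 40.0]       # hypothetical ReLU outputs with a wide range
normed = batch_norm(acts)
print(normed)                        # values now have mean ~0 and variance ~1
print(batch_norm(acts, gamma=2.0, beta=1.0))   # the layer can still rescale and shift
```

Because `gamma` and `beta` are learned, each layer can move away from the plain zero-mean, unit-variance distribution if that turns out to work better.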

Initialization

  • LeCun Initialization

    • Named after Yann LeCun, the creator of CNNs

  • He Initialization

    • Improves on Xavier Initialization, which is less effective with ReLU
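The schemes differ mainly in how the variance of the initial weights depends on the layer size. The sketch below uses the commonly cited normal-distribution variants: variance 1/fan_in for LeCun, 2/(fan_in + fan_out) for Xavier, and 2/fan_in for He. The function names and the choice of the normal rather than the uniform variant are assumptions for illustration.

```python
import math
import random

def init_std(fan_in, fan_out, scheme):
    """Standard deviation of the zero-mean normal used by each scheme."""
    if scheme == "lecun":
        return math.sqrt(1.0 / fan_in)
    if scheme == "xavier":
        return math.sqrt(2.0 / (fan_in + fan_out))
    if scheme == "he":
        return math.sqrt(2.0 / fan_in)     # doubled variance compensates for ReLU zeroing
    raise ValueError(f"unknown scheme: {scheme}")

def init_weights(fan_in, fan_out, scheme, rng=None):
    """Return a fan_out x fan_in weight matrix as nested lists."""
    rng = rng or random.Random(0)
    std = init_std(fan_in, fan_out, scheme)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

for scheme in ("lecun", "xavier", "he"):
    print(scheme, init_std(256, 128, scheme))
```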

Optimizer

  • Many Optimizers exist besides SGD

  • Momentum

    • Moves in the Gradient direction obtained by differentiation, but adds a kind of inertia

    • Converges to the optimum faster than without it; can be understood as taking larger strides

    • Advantage: it can carry the search past a local optimum that is not the global optimum

  • NAG

    • Nesterov Accelerated Gradient

    • A slight variation of Momentum

    • First moves by the momentum term, then computes the gradient at that point and moves again

  • Adagrad

    • Adaptive Gradient

    • Take large steps where we have not been yet and small steps where we have already been

  • RMSProp

    • Addresses a drawback of Adagrad => the longer training runs, the more the accumulated term G keeps growing and the smaller the step size becomes => uses an exponential moving average so that G (the accumulated squared gradients) does not grow without bound

  • Adadelta

    • Adaptive Delta

    • Addresses a drawback of Adagrad

    • When the Gradient becomes too small the updates stop; this method is designed to prevent that

  • Adam

    • Adaptive Moment Estimation

    • The default Optimizer most commonly used in deep learning models

    • Combines the characteristics of RMSProp and Momentum

  • RAdam

    • Rectified Adam

    • Most Optimizers can converge to a poor local minimum rather than the global minimum early in training; RAdam is an Optimizer that corrects this by rectifying the variance of the adaptive learning rate
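The update rules listed above can be compared side by side on a toy problem. The sketch below minimizes f(w) = w**2 for a single scalar parameter. The loss, the function names, and all hyperparameter defaults are assumptions chosen for illustration, and the formulas follow the commonly published form of each method rather than any particular library's implementation.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = w**2 (an assumption for illustration)."""
    return 2.0 * w

def momentum_step(w, s, lr=0.1, mu=0.9):
    s["v"] = mu * s.get("v", 0.0) - lr * grad(w)        # velocity carries inertia
    return w + s["v"]

def nag_step(w, s, lr=0.1, mu=0.9):
    v = s.get("v", 0.0)
    s["v"] = mu * v - lr * grad(w + mu * v)             # gradient at the look-ahead point
    return w + s["v"]

def adagrad_step(w, s, lr=0.5, eps=1e-8):
    g = grad(w)
    s["G"] = s.get("G", 0.0) + g * g                    # running sum only grows
    return w - lr * g / (math.sqrt(s["G"]) + eps)

def rmsprop_step(w, s, lr=0.05, rho=0.9, eps=1e-8):
    g = grad(w)
    s["G"] = rho * s.get("G", 0.0) + (1 - rho) * g * g  # exponential average stays bounded
    return w - lr * g / (math.sqrt(s["G"]) + eps)

def adadelta_step(w, s, rho=0.9, eps=1e-6):
    g = grad(w)
    s["G"] = rho * s.get("G", 0.0) + (1 - rho) * g * g
    dx = -math.sqrt(s.get("D", 0.0) + eps) / math.sqrt(s["G"] + eps) * g
    s["D"] = rho * s.get("D", 0.0) + (1 - rho) * dx * dx  # average of squared updates
    return w + dx

def _moments(w, s, b1, b2):
    """Shared by Adam and RAdam: update and bias-correct both moment estimates."""
    g = grad(w)
    s["t"] = s.get("t", 0) + 1
    s["m"] = b1 * s.get("m", 0.0) + (1 - b1) * g        # Momentum-like first moment
    s["v"] = b2 * s.get("v", 0.0) + (1 - b2) * g * g    # RMSProp-like second moment
    return s["m"] / (1 - b1 ** s["t"]), s["v"] / (1 - b2 ** s["t"])

def adam_step(w, s, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m_hat, v_hat = _moments(w, s, b1, b2)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

def radam_step(w, s, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m_hat, v_hat = _moments(w, s, b1, b2)
    t = s["t"]
    rho_inf = 2.0 / (1 - b2) - 1
    rho_t = rho_inf - 2.0 * t * b2 ** t / (1 - b2 ** t)
    if rho_t <= 4:                                      # too early to trust the variance
        return w - lr * m_hat                           # momentum-only step
    r = math.sqrt((rho_t - 4) * (rho_t - 2) * rho_inf
                  / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
    return w - lr * r * m_hat / (math.sqrt(v_hat) + eps)

def run(step, w0=5.0, steps=200):
    """Apply one update rule repeatedly, starting from w0."""
    w, state = w0, {}
    for _ in range(steps):
        w = step(w, state)
    return w

for step in (momentum_step, nag_step, adagrad_step, rmsprop_step,
             adadelta_step, adam_step, radam_step):
    print(f"{step.__name__:>14}: w after 200 steps = {run(step):.4f}")
```

Three comparisons are worth reading closely. NAG differs from Momentum only in where the gradient is evaluated. RMSProp differs from Adagrad only in replacing the growing sum with an exponential average. RAdam differs from Adam only in the rectification factor `r` and the fallback used during the first few steps.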
