
[ํŒŒ์ด์ฌ ๋”ฅ๋Ÿฌ๋‹ ํŒŒ์ดํ† ์น˜] PART 05 NLP

03 Models

๋ฌธ์ž ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•œ ๋ชจ๋ธ๋ง

  • ๋ฌธ์žฅ์ด๋‚˜ ๋ฌธ๋‹จ๊ณผ ๊ฐ™์€ ๋ฌธ์ž ๋ฐ์ดํ„ฐ๋Š” ์ฃผ๋ณ€ ๋‹จ์–ด์— ๋Œ€ํ•œ ์—ฐ์†์ ์ธ ๊ด€๊ณ„๋ฅผ ๋ชจ๋ธ๋ง์— ์ž˜ ๋‹ด์•„์•ผ ํ•œ๋‹ค.

  • NLP ๋ชจ๋ธ์€ ์–ธ์–ด์  ์•ฝ์†์„ ๋ชจ๋ธ๋ง ํ•˜๋ ค ํ–ˆ๊ณ  Classic Model์€ ํ†ต๊ณ„์ ์œผ๋กœ ์ ‘๊ทผํ•˜๋ ค๊ณ  ํ–ˆ๋‹ค.

RNN

  • Recurrent Neural Network

  • 1๊ฐœ์˜ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด 1๊ฐœ์˜ ๊ฒฐ๊ด๊ฐ’์„ ์˜ˆ์ธกํ•˜๋„๋ก ํ•™์Šต๋˜์–ด ์žˆ๋‹ค

    • ์ด๋ฅผ One To One ๋ฌธ์ œ๋ผ๊ณ  ํ•œ๋‹ค

    • Many to One : ๋Œ“๊ธ€์˜ ์•…ํ”Œ ๊ฐ€๋Šฅ์„ฑ ์ •๋„๋ฅผ ์ธก์ •ํ•˜๋Š” Sentence Classification

    • One to Many : ์‚ฌ์ง„ ์† ๋‚ด์šฉ์„ ์„ค๋ช…ํ•˜๋Š” ๊ธ€์„ ๋งŒ๋“ค์–ด๋‚ด๋Š” Image Captioning

    • Many to Many(token by token) : ๋ฌธ์žฅ์˜ ๋ชจ๋“  Token์— ๋Œ€ํ•œ ํ’ˆ์‚ฌ๋ฅผ ์˜ˆ์ธกํ•˜๋Š” Pos Tagging

    • Many to Mnay(encoder-decoder) : ์ž…๋ ฅ ๋ฌธ์žฅ์— ๋Œ€ํ•œ ๋ฒˆ์—ญ ๋ฌธ์žฅ์„ ๋งŒ๋“ค์–ด๋‚ด์ฃผ๋Š” Translation

  • The most representative model for handling Sequential Data (see the Many-to-One sketch after this list)

  • ๊ฐ ์ •๋ณด๋Š” ์ด์ „ ์ •๋ณด๋ฅผ ์ฐธ๊ณ ํ•จ์œผ๋กœ์จ ๋ฐ์ดํ„ฐ์˜ ์ˆœ์„œ์— ๋Œ€ํ•œ ์ •๋ณด๋ฅผ ๋‹ด์„ ์ˆ˜ ์žˆ๋‹ค๋Š” ์žฅ์ ์ด ์žˆ์–ด ์Œ์„ฑ๊ณผ ๊ฐ™์€ ์—ฐ์†์„ฑ ๋ฐ์ดํ„ฐ๋ฅผ ๋‹ค๋ฃจ๋Š” ๋ฐ ํƒ์›”ํ•˜๋‹ค

  • ์„ฑ๋Šฅ์€ ๋›ฐ์–ด๋‚ฌ์ง€๋งŒ ๋ฐ˜๋ณต์ ์ธ Back Propagation ๋•Œ๋ฌธ์— G.V. ๋ฌธ์ œ๊ฐ€ ์‹ฌํ•˜๋‹ค

  • ๋ฌธ์žฅ์˜ ๊ธธ์ด๊ฐ€ ๊ธธ์–ด์งˆ์ˆ˜๋ก ์„ฑ๋Šฅ๋„ ๋–จ์–ด์ง€๊ณ  ์‹œ๊ฐ„๋„ ์˜ค๋ž˜๊ฑธ๋ฆฐ๋‹ค

    • LSTM was proposed to solve this
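
As an illustration of the Many-to-One case above, here is a minimal PyTorch sketch of an RNN sentence classifier; the vocabulary size, dimensions, and class count are made-up values, not from the book:

```python
import torch
import torch.nn as nn

# Minimal Many-to-One RNN: one prediction per input sentence.
# vocab_size, embed_dim, hidden_dim, num_classes are illustrative values.
class SentenceClassifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):           # (batch, seq_len)
        x = self.embed(token_ids)           # (batch, seq_len, embed_dim)
        _, h_n = self.rnn(x)                # h_n: (1, batch, hidden_dim), last step only
        return self.fc(h_n.squeeze(0))      # one set of class logits per sentence

tokens = torch.randint(0, 10000, (4, 20))   # a batch of 4 sentences, 20 tokens each
logits = SentenceClassifier()(tokens)       # (4, 2)
```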

LSTM

  • Long Short-Term Memory, 1997

  • RNN์˜ ํฐ ๋‹จ์ ์ธ ๋‹จ๊ธฐ ๊ธฐ์–ต๋งŒ ๊ฐ€๋Šฅํ•˜๋‹ค๋Š” ๋ถ€๋ถ„์„ ๊ฐœ์„ 

  • ํ•ต์‹ฌ์€ Cell๊ณผ ๋‹ค์ˆ˜์˜ Gate๋ฅผ ํ†ตํ•œ ์ •๋ณด ํ•„ํ„ฐ๋ง์ด๋‹ค.

    • The current Token's cell state $C_i$ is the sum of the previous cell state $C_{i-1}$ passed through the forget Gate $f$ (deciding how much to forget) and the current token's contribution passed through the input Gate $i$ (deciding how much to take in).

    • This is the process of combining the previous information $h_{i-1}$ with the current token value $x_i$

    • The cell value obtained this way passes through a final Gate, where the information is adjusted once more to produce the final hidden state (the full update is written out below).
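
Putting the three bullets above together, the standard LSTM update can be written out as follows (conventional notation, not taken from the book: $\sigma$ is the sigmoid, $\odot$ the element-wise product, and $[h_{i-1}, x_i]$ the combination of the previous hidden state and the current token):

$$
\begin{aligned}
f_i &= \sigma(W_f[h_{i-1}, x_i] + b_f) && \text{(forget gate)}\\
i_i &= \sigma(W_i[h_{i-1}, x_i] + b_i) && \text{(input gate)}\\
\tilde{C}_i &= \tanh(W_C[h_{i-1}, x_i] + b_C) && \text{(candidate from the current token)}\\
C_i &= f_i \odot C_{i-1} + i_i \odot \tilde{C}_i && \text{(new cell state)}\\
o_i &= \sigma(W_o[h_{i-1}, x_i] + b_o) && \text{(output gate)}\\
h_i &= o_i \odot \tanh(C_i) && \text{(final hidden state)}
\end{aligned}
$$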

๊ทธ ์™ธ์˜ ๋ชจ๋ธ

  • Bi-RNNs

    • The key is to use information from both directions rather than only the conventional left-to-right direction

  • GRUs

    • LSTM์˜ Output์„ ๊ฐ„์†Œํ™” ํ•จ

  • Attention Mechanism

  • Convolutional Neural Network for Text Classification
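
A minimal sketch combining the first two ideas with PyTorch's built-in module; `nn.GRU` with `bidirectional=True` reads the sequence in both directions (the dimensions are illustrative):

```python
import torch
import torch.nn as nn

# Bidirectional GRU: reads the sequence left-to-right and right-to-left,
# concatenating both directions' hidden states (hence 2 * hidden_size outputs).
gru = nn.GRU(input_size=128, hidden_size=256, batch_first=True, bidirectional=True)

x = torch.randn(4, 20, 128)   # (batch, seq_len, features)
output, h_n = gru(x)
print(output.shape)           # (4, 20, 512) -- forward and backward states concatenated
print(h_n.shape)              # (2, 4, 256)  -- one final state per direction
```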

Transformer

  • ๋ณดํ†ต์˜ ์ธ์ฝ”๋”์™€ ๋””์ฝ”๋”๋Š” RNN๋ฅ˜์˜ LSTM์ด๋‚˜ GRU ๋ชจ๋“ˆ์„ ์‚ฌ์šฉํ•˜๊ณ  Attention์„ ์ ์šฉํ•˜๋Š” ๋ฐฉ์‹์„ ์‚ฌ์šฉํ–ˆ๋Š”๋ฐ Transformer๋Š” RNN์„ ์ „ํ˜€ ์“ฐ์ง€ ์•Š๊ณ  ์—ฌ๋Ÿฌ Attention ๋ชจ๋“ˆ์„ ์ด์–ด ๋งŒ๋“ค์—ˆ๋‹ค

  • Also, unlike an RNN that receives Tokens one by one in order, it receives all Tokens at once, which makes training fast

  • Its significance is not only that it dropped RNN-family modules from existing translation models and achieved strong performance with Attention alone; the major research outcome is the wide range of models now being built on the Transformer (a sketch of the core attention operation follows below)
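
A minimal sketch of the scaled dot-product attention that these modules chain together (following the "Attention Is All You Need" formulation; the shapes are illustrative):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (batch, seq_q, seq_k)
    weights = torch.softmax(scores, dim=-1)             # every query attends to all keys at once
    return weights @ v

q = k = v = torch.randn(4, 20, 64)              # self-attention over whole 20-token sequences
out = scaled_dot_product_attention(q, k, v)     # (4, 20, 64)
```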

BERT

  • ๊ตฌ๊ธ€์—์„œ ๋ฐœํ‘œ, 2018

  • Pre-training of Deep Bidirectional Transformers for Language Understanding

  • NLP์—์„œ Pre-trained ๋œ ๋ชจ๋ธ์„ ํ•™์Šตํ•˜๊ณ  ์ด๋ฅผ Fine-tuning ํ•˜๋Š” ๋ชจ๋ธ์˜ ๊ฐ€๋Šฅ์„ฑ๊ณผ ์„ฑ๋Šฅ์„ ๋™์‹œ์— ๋ณด์—ฌ์คŒ
