17 Wed

[AI School Cohort 1] Week 10, DAY 3

NLP : Document Classification I

Document classification

  • The task of taking a text as input and determining which category it belongs to

Types of document classification problems

  • Category and topic classification of documents

  • Email spam classification

  • Sentiment classification : is the text positive or negative?

  • Language identification : which language is the given document written in?

Topic classification

  • For example, assigning a CS paper to the CS topic it covers

Sentiment classification

  • Determine whether a given document is positive or negative

  • Example: deciding whether a movie review is positive or negative

    • Positive : richly, great, awesome, love

    • Negative : pathetic, worst, awful, ridiculously

  • Public reaction to a product

  • What people think of a politician

  • Predicting election results based on sentiment classification

  • Sentiment has emotional (bewilderment, sadness, joy), attitudinal (liking, loving, disliking), and personality (anxious, hostile, friendly) aspects, but these notes cover only the simpler task of detecting a positive or negative attitude.

Document classification : definition

  • A document is usually denoted d

  • The set of all possible classes is C = {c1, c2, ...}

  • Output : the predicted class c ∈ C

Document classification methods - rule-based models

  • Use hand-written rules built from combinations of words

    • spam : black-list, dollars & you have been selected, etc

  • Precision is high, but recall is low

    • Precision is high because the rules are written by people

    • Continually updating the rules helps keep precision up

    • But a great many cases are not covered by any rule

    • It is therefore usually better to rely on rules learned by machine learning than on hand-written ones

    • Machine learning can itself be seen as building complex rules, ones that would be hard for a person to write by hand

  • Snorkel

    • One system of this kind (a weak-supervision framework)

    • Combines rule-based models with machine learning

    • Treats each rule as a labeling function

      • Labels come from applying specific rules rather than from an expert annotating each example

    • Generates probabilistic target values (labels) using a factor graph

      • generative model

      • Uses probabilistic values such as 0.7 or 0.2 instead of hard labels (positive = 1, negative = 0)

      • In effect, this assigns labels to unlabeled data

    • Very useful when labeled data is scarce or the class definition itself is ambiguous => in sentiment classification the classes are often quite ambiguous

    • Labeling documents one by one is hard

    • Writing a few reliable rules is comparatively easy

    • Afterwards, the generated labels can be used to train a variety of models
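
The labeling-function idea can be sketched in a few lines of plain Python. This does not use the Snorkel library: Snorkel fits a generative model over the labeling-function outputs, whereas this sketch simply averages the non-abstaining votes as a stand-in. The function names (`lf_selected`, `lf_dollars`, `lf_meeting`, `soft_label`) and the rules themselves are hypothetical.

```python
# Hypothetical labeling functions for spam detection (illustrative rules only).
SPAM, HAM, ABSTAIN = 1, 0, -1

def lf_selected(text):
    # Rule: the phrase "you have been selected" suggests spam.
    return SPAM if "you have been selected" in text.lower() else ABSTAIN

def lf_dollars(text):
    # Rule: mentioning "dollars" suggests spam.
    return SPAM if "dollars" in text.lower() else ABSTAIN

def lf_meeting(text):
    # Rule: mentioning a "meeting" suggests a legitimate message.
    return HAM if "meeting" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_selected, lf_dollars, lf_meeting]

def soft_label(text):
    """Estimate P(spam) as the mean of the non-abstaining votes.

    Returns None when every rule abstains, i.e. the rules do not cover
    this document (the low-recall side of rule-based labeling).
    This averaging is an assumption standing in for a generative model.
    """
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return None
    return sum(votes) / len(votes)
```

When rules disagree, the result is a value between 0 and 1 rather than a hard label, which is the kind of probabilistic target a downstream model can be trained on.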

Document classification methods - supervised learning

  • input : a document d

  • classes : C

  • training set : m labeled examples (d1, c1), ..., (dm, cm)

  • classifier (output) : y : d -> c

  • A variety of models can be used

    • Naive Bayes

    • Logistic regression

    • Neural networks

    • k-Nearest Neighbors
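
The supervised setup above (a training set of (d, c) pairs in, a classifier y : d -> c out) can be sketched with the simplest model in the list, k-Nearest Neighbors. This is a minimal illustration under stated assumptions: whitespace tokenization, word-count vectors, cosine similarity, and a toy training set made up for the example. The names `bow`, `cosine`, and `train_knn` are hypothetical.

```python
import math
from collections import Counter

def bow(text):
    # Word-count representation of a document (whitespace tokenization assumed).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def train_knn(training_set, k=1):
    """training_set is a list of (document, class) pairs; returns y : d -> c."""
    examples = [(bow(d), c) for d, c in training_set]

    def classify(document):
        query = bow(document)
        # Take the k most similar training documents and vote on the class.
        nearest = sorted(examples, key=lambda ex: cosine(query, ex[0]), reverse=True)[:k]
        return Counter(c for _, c in nearest).most_common(1)[0][0]

    return classify

# Toy training set, invented for illustration.
training_set = [
    ("great awesome movie love it", "pos"),
    ("richly told story great cast", "pos"),
    ("worst movie awful plot", "neg"),
    ("pathetic and ridiculously bad", "neg"),
]
classify = train_knn(training_set, k=1)
```

Calling `classify("awful pathetic plot")` returns one of the classes in C; swapping in another model changes only what `train_knn` does internally, not the input/output contract.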

NLP : Document Classification II

Naive Bayes classifier

  • "Naive Bayes" is the name of the assumption the model makes

    • Hence the name Naive Bayes classifier

  • The model is based on the Bag of Words representation

    • That is, Bag of Words is how the text is represented

  • Think of a document as a collection of words in which order is ignored

  • (0, 5, 0, 3, ... , 0)

    • One element per vocabulary word => a vector with |V| elements

    • Each element's value is the frequency of that word in the document
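
A short sketch of how such a count vector could be built. It assumes lowercase whitespace tokenization and an alphabetically sorted vocabulary; the helper names `build_vocabulary` and `bag_of_words` are made up for this example.

```python
from collections import Counter

def build_vocabulary(documents):
    # Collect every distinct word and fix an order so each word gets one index.
    return sorted({w for doc in documents for w in doc.lower().split()})

def bag_of_words(document, vocabulary):
    # Count each vocabulary word in the document; word order is ignored.
    counts = Counter(document.lower().split())
    return [counts[w] for w in vocabulary]

docs = ["great great movie", "awful movie"]
vocab = build_vocabulary(docs)
vec = bag_of_words("great great movie", vocab)
```

Because only counts are kept, "great great movie" and "movie great great" map to the same vector.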

Naive Bayes classifier - formulation

  • The formula is derived using Bayes' theorem

  • The classifier's goal is to find the class with the highest probability

  • Drop the denominator : P(d) does not depend on the class, so it does not affect which class maximizes the expression

  • Assume d has n features and write it as x1, ..., xn

  • Modeling P(x1, ..., xn | c) jointly over all n features requires a very large number of parameters => hard to get good performance with little training data

  • Bag of Words assumption : the position of a word does not affect its probability

  • Conditional independence assumption : given the class, the features are independent of each other

    • As a formula : P(x1, ..., xn | c) = P(x1 | c) · P(x2 | c) · ... · P(xn | c)

  • Factorizing with the conditional independence assumption leaves only per-class, per-feature terms P(xi | c), so there are far fewer parameters => the model can be trained even with little data

  • In log space, the Naive Bayes classifier is a linear model of the input
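
The steps in this section can be written out as one derivation. The notation is an assumption made for this sketch: \hat{c} is the predicted class, x_1, ..., x_n are the word tokens of d, V is the vocabulary, and count(w, d) is the number of times word w occurs in d.

```latex
\begin{align}
\hat{c} &= \arg\max_{c \in C} P(c \mid d) \\
        &= \arg\max_{c \in C} \frac{P(d \mid c)\, P(c)}{P(d)}
           && \text{(Bayes' theorem)} \\
        &= \arg\max_{c \in C} P(d \mid c)\, P(c)
           && \text{($P(d)$ does not depend on $c$)} \\
        &= \arg\max_{c \in C} P(x_1, \dots, x_n \mid c)\, P(c) \\
        &= \arg\max_{c \in C} P(c) \prod_{i=1}^{n} P(x_i \mid c)
           && \text{(conditional independence)} \\
        &= \arg\max_{c \in C} \Big[ \log P(c) + \sum_{i=1}^{n} \log P(x_i \mid c) \Big] \\
        &= \arg\max_{c \in C} \Big[ \log P(c) + \sum_{w \in V} \mathrm{count}(w, d)\, \log P(w \mid c) \Big]
\end{align}
```

The last line is a bias term plus a weighted sum of the bag-of-words counts, which is why the classifier is linear in its input once logs are taken.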

NLP : Document Classification III

Naive Bayes classifier - training

  • MLE (maximum likelihood estimation)

  • Zero-probability problem

    • With too little data, an estimated probability can be 0

    • The score is a product of the class prior and the feature likelihoods, so a single 0 makes the whole probability 0

    • Laplace smoothing solves this
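
A minimal sketch of training with add-alpha (Laplace) smoothing, assuming a multinomial model over whitespace-tokenized words: P(c) is estimated as the fraction of training documents in class c, and P(w | c) as (count(w, c) + alpha) / (total words in c + alpha · |V|). The function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(training_set, alpha=1.0):
    """Estimate log P(c) and smoothed log P(w | c) from (document, class) pairs.

    alpha must be > 0: with alpha = 0 (plain MLE) an unseen word would get
    probability 0 and its logarithm would be undefined.
    """
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for document, c in training_set:
        class_docs[c] += 1
        for w in document.lower().split():
            word_counts[c][w] += 1
            vocab.add(w)

    total_docs = sum(class_docs.values())
    log_prior = {c: math.log(n / total_docs) for c, n in class_docs.items()}

    log_likelihood = {}
    for c in class_docs:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_likelihood, vocab

def predict(document, model):
    # Score each class in log space; words outside the vocabulary are skipped.
    log_prior, log_likelihood, vocab = model
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in document.lower().split() if w in vocab
        )
    return max(scores, key=scores.get)

# Toy training set, invented for illustration.
training_set = [
    ("great awesome love", "pos"),
    ("richly great", "pos"),
    ("worst awful", "neg"),
    ("pathetic awful ridiculously", "neg"),
]
model = train_naive_bayes(training_set)
```

In this toy data "awful" never appears in a positive document, yet its smoothed probability under "pos" is small but nonzero, so one unseen word no longer wipes out the whole product.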

Naive Bayes classifier - summary

  • Not so naive in practice

  • Good performance and fast, even with little training data

  • The optimal model when the conditional independence assumption actually holds in the data

  • A suitable baseline model for document classification
