Compressing Neural Language Model
by Sparse Word Representation
Korea University,
Department of Computer Science & Radio Communication Engineering
2016010646 ๊น€๋ฒ”์ˆ˜
2016010636 ์ด์ง„ํ˜
Korean Information Processing
Professor Haechang Lim
1
Contents
01. Language Model
  1-1. What is a Language Model?
  1-2. N-grams
  1-3. Standard Neural LM
02. Proposed Model
  2-1. Sparse Representation
  2-2. Embedding Compression
  2-3. Prediction Compression
  2-4. ZRegression NCE
03. Evaluation
  3-1. Dataset
  3-2. Qualitative Analysis
  3-3. Quantitative Analysis
  3-4. Conclusion
2
Language Model
1-1. What is a Language Model?
1-2. N-grams
3
1-3. Standard Neural LM
1. Language Model
4
1-1. What is a Language Model?
Unfortunately, I am an ____________
Language Model
A model that assigns a probability P(w | c) to the next word w, given the context (history) c observed so far.
idiot 0.672
flower 0.115
psycho-pass 0.581
genius 0.336
…
walk 0.016
process 0.052
cancel 0.039
์–ธ์–ด๋ชจ๋ธ์ด๋ž€ ๋ฌด์—‡์ธ๊ฐ€
1. Language Model
5
1-1. What is a Language Model?
LM, key fact 1.
The joint probability of a sentence, i.e., of a word sequence, can be computed from the probabilities of the individual words by the chain rule.
P(W) = P(w_1, w_2, w_3, ..., w_n)
P(w_1, w_2, w_3, ..., w_n) = P(w_1) P(w_2 | w_1) P(w_3 | w_1, w_2) ... P(w_n | w_1, w_2, ..., w_{n-1})
P(the water is so transparent)
= P(the) P(water | the) P(is | the water) P(so | the water is) P(transparent | the water is so)
์–ธ์–ด๋ชจ๋ธ, ์–ด๋–ป๊ฒŒ ํ™œ์šฉํ•˜๋Š”๊ฐ€
1. Language Model
6
1-1. What is a Language Model?
Why is a language model useful?
Machine translation
Spell Correction
Speech Recognition
P(delicious fish) > P( ominous fish )
P( I love you ) > P( I loev you )
P( I saw a van ) > P( eyes awe of an )
1. Language Model
7
1-1. What is a Language Model?
However...
P(the water is so transparent)
= P(the) P(water | the) P(is | the water) P(so | the water is) P(transparent | the water is so)
As sentences grow longer, the chain rule demands an increasingly complex computation.
Estimating each P(w | history) by simply counting occurrences requires far more data than is available.
In P(transparent | the water is so), the history "the water is so" is very unlikely to appear often enough to count reliably.
1. Language Model
8
1-2. N-gram
Markov assumption
A simplifying assumption that the state of all preceding words is already reflected in the most recent word(s).
Then let us simplify the estimate:
P(that | the water is so transparent) ≈ P(that | transparent)
P(that | the water is so transparent) ≈ P(that | so transparent)
…
1. Language Model
9
1-2. N-gram
Markov assumption
The n-gram model starts from the Markov assumption.
N-gram model
A model that uses only the previous n-1 words (w_{i-n+1}, ..., w_{i-1}) as the LM history.
P(w_i | w_1, ..., w_{i-1}) ≈ P(w_i | w_{i-n+1}, ..., w_{i-1})
1. Language Model
10
1-2. N-gram
Limitations
Each word is represented as a one-hot vector.
  As the vocabulary size grows, memory consumption grows inefficiently.
Parameter estimation is purely statistical (count-based).
  There is no way to represent unseen words.
Language has long-distance dependencies.
  N-grams are poorly suited to capturing them.
1. Language Model
11
1-2. N-gram
Addressing the limitations
Each word is represented as a one-hot vector.
  As the vocabulary size grows, memory consumption grows inefficiently.
  → word embedding
Parameter estimation is purely statistical (count-based).
  There is no way to represent unseen words.
  → FFNN, RNN
Language has long-distance dependencies.
  N-grams are poorly suited to capturing them.
  → RNN (LSTM)
1. Language Model
12
1-3. Standard Neural LM
Neural Probabilistic Language Model (Bengio, 2003)
Prediction
Encoding
Word Embedding
1. Language Model
13
1-3. Standard Neural LM
Composed of three stages
Prediction
Encoding
Word Embedding (w_1, w_2, ..., w_{n-1})
Target w_n → loss (minimize)
1. Language Model
14
1-3. Standard Neural LM
Stage 1 – Word Embedding
Prediction
Encoding
Word Embedding
Map each word to a dense vector.
Train a neural model to predict the probability of each word in a sentence.
Method 1) Skip-gram
Method 2) CBOW (Continuous Bag of Words)
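As an aside (not from the slides), a minimal sketch of how skip-gram training pairs are generated from a sentence; the window size and toy sentence are assumptions:

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs used to train skip-gram embeddings."""
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                yield center, tokens[j]

print(list(skipgram_pairs("the water is so transparent".split())))
```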
1. Language Model
15
1-3. Standard Neural LM
Stage 2 – Encoding
Prediction
Encoding
Word Embedding
Map the context to a dense vector.
Use an FFNN, RNN, etc. to overcome the limitations of count-based statistical prediction.
RNN: commonly used to handle long-distance dependencies.
1. Language Model
16
1-3. Standard Neural LM
Stage 3 – Prediction
Prediction
Encoding
Word Embedding
Maximum likelihood estimation
∴ Train so that the log probability of the next word w_i is maximized.
s(h, w_i): scoring function (how well does context h match target word w_i?)
W_i: weight of the neural LM model (C-dimensional; there are V of them)
b_i: bias of the neural LM model (V-dimensional)
h: the vector encoding the context
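For concreteness (not from the slides), a minimal numpy sketch of this prediction step: score every vocabulary word against the encoded context h and normalize with a softmax; C (context/embedding size) and V (vocabulary size) follow the notation above, and the random parameters are placeholders.

```python
import numpy as np

def predict_next_word(h, W, b):
    """s(h, w_i) = W_i . h + b_i for every word, then softmax over the vocabulary.
    h: (C,) context vector, W: (V, C) output weights, b: (V,) output biases."""
    scores = W @ h + b                      # (V,) one score per vocabulary word
    scores -= scores.max()                  # numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()              # P(w_i | h) for all i

V, C = 10000, 200
rng = np.random.default_rng(0)
p = predict_next_word(rng.normal(size=C), rng.normal(size=(V, C)) * 0.01, np.zeros(V))
print(p.shape, p.sum())  # (10000,) 1.0 — note the full pass over all V words
```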
1. Language Model
17
1-3. Standard Neural LM
Stage 3 – Prediction
Prediction
Encoding
Word Embedding
Maximum likelihood estimation
The paper describes this as C-dimensional, but the numbers did not work out; we emailed the first author (Yunchuan Chen), who confirmed that V-dimensional is correct and said a corrected version would soon be posted to arXiv.
b_i: bias of the neural LM model (V-dimensional)
1. Language Model
18
1-3. Standard Neural LM
Stage 3 – Prediction
Prediction
Encoding
Word Embedding
∴ The score must be computed for every single word w_i in the vocabulary V.
Maximum likelihood estimation
1. Language Model
19
1-3. Standard Neural LM
What is the problem?
Time complexity
  Dependent on the vocabulary size.
  Remedies: hierarchical softmax by a Bayesian network; importance sampling (Bengio & Senecal); Noise-Contrastive Estimation, etc.
Memory complexity
  ≈ time complexity: parameters ∝ vocabulary size.
  Remedy: differentiated softmax (Chen et al.) — drawback: only W is compressed, the input stays as is.
1. Language Model
20
Purely statistical parameter estimation — no way to represent unseen words → FFNN, RNN
1-3. Standard Neural LM
What is the problem?
However, even in a neural LM, infrequent words receive very few updates, so they are never trained properly while still adding to memory and time complexity.
Can we represent them more efficiently?
Proposed Model
2-1. Sparse Representation
2-2. Embedding Compression
21
2-3. Prediction Compression
2-4. ZRegression NCE
2. Proposed Model
22
2-1. Sparse Representation
Why do we use an English-English dictionary?
[-] lung disease that is otherwise known as silicosis.
A rare, difficult, unknown word can be defined in terms of frequent, easy-to-understand words!
2. Proposed Model
23
2-1. Sparse Representation
Why do we use an English-English dictionary?
[-] lung disease that is otherwise known as silicosis.
A rare, difficult, unknown word can be defined in terms of frequent, easy-to-understand words!
For infrequent words, instead of blindly growing the vocabulary size, let us define them as a linear combination of frequent words!
2. Proposed Model
24
2-1. Sparse Representation
Embedding structure
V: Vocabulary
B: Base set — 8k of common words
C: Uncommon words
Prevents the complexity from growing in proportion to the vocabulary size.
Reduces the dimension needed when mapping a word to a vector.
The reduction uses sparse vectors, which saves further complexity.
2. Proposed Model
25
2-1. Sparse Representation
Word embedding training
V: Vocabulary
B: Base set — 8k of common words
C: Uncommon words
Embed the entire vocabulary with skip-gram.
Extract the embeddings of the |B| common words.
2. Proposed Model
26
2-1. Sparse Representation
Extracting the common base set
V: Vocabulary
B: Base set — 8k of common words
C: Uncommon words
sparse code x
For the i-th word of B, x is simply its one-hot vector;
for the other words, the sparse representation x needs to be learned.
2. Proposed Model
27
2-1. Sparse Representation
Learning the sparse vectors
V: Vocabulary
B: Base set — 8k of common words
C: Uncommon words
Fitting loss: sparse-coded representation ≈ "true" representation.
l1 regularizer: drives the sum of x toward 1.
Regularization term.
Non-negativity constraint.
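A minimal numpy sketch (my own illustration, not the authors' code) of learning one such sparse code by projected gradient descent: fit U x to the word's true embedding, add a penalty pushing the (non-negative) code to sum to one, and clip to keep x non-negative. The penalty weight, step count, and toy shapes are assumptions.

```python
import numpy as np

def learn_sparse_code(w_true, U, alpha=0.1, steps=2000):
    """Projected gradient descent for non-negative sparse x with U @ x ≈ w_true and sum(x) ≈ 1."""
    lr = 1.0 / (np.linalg.norm(U, ord=2) ** 2)          # safe step size for the fitting term
    x = np.full(U.shape[1], 1.0 / U.shape[1])
    for _ in range(steps):
        grad = U.T @ (U @ x - w_true)                   # gradient of the fitting loss
        grad += alpha * np.sign(x.sum() - 1.0)          # l1 / sum-to-one penalty (x >= 0)
        x = np.clip(x - lr * grad, 0.0, None)           # projection: keep x non-negative
    return x

d, B = 100, 50
rng = np.random.default_rng(1)
U = rng.normal(size=(d, B))                             # toy "common word" embeddings
x = learn_sparse_code(U[:, 3], U)                       # toy target: the embedding of base word 3
print(x.argmax(), round(x.sum(), 2))                    # the code should concentrate on index 3
```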
2. Proposed Model
28
2-1. Sparse Representation
Optimization function
V: Vocabulary
B: Base set — 8k of common words
: L(x)
: R1(x)
: R2(x)
: clip — values that fall outside the allowed range are adjusted back into the range
2. Proposed Model
29
2-1. Sparse Representation
Objective function
V: Vocabulary
B: Base set — 8k of common words
Here, the update keeps the fitting term and the regularization terms at a fixed ratio.
2. Proposed Model
30
2-2. Embedding Compression
Uncommon word representation
B: Base set — 8k of common words
C: Uncommon words
Non-negative sparse code x ∈ ℝ^|B| for each uncommon word.
Here, since U is dense, the resulting embedding is likewise not sparse.
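A small numpy sketch (illustrative, with assumed shapes and indices) of how an uncommon word's dense embedding is recovered from the common-word embedding matrix and its learned sparse code:

```python
import numpy as np

# U: (d, |B|) dense embeddings of the 8k common words (assumed shapes)
# x: (|B|,) learned non-negative sparse code of one uncommon word
d, B = 200, 8000
rng = np.random.default_rng(2)
U = rng.normal(size=(d, B))
x = np.zeros(B)
x[[12, 431, 977]] = [0.6, 0.3, 0.1]      # only a few common words participate; weights sum to 1

rare_embedding = U @ x                    # dense d-dim vector, even though x itself is sparse
print(rare_embedding.shape)               # (200,)
```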
2. Proposed Model
31
2-3. Prediction Compression
Flashback
Prediction
Encoding
Word Embedding
Maximum likelihood estimation
∴ Train so that the log probability of the next word w_i is maximized.
s(h, w_i): scoring function (how well does context h match target word w_i?)
W_i: weight of the neural LM model (C-dimensional; there are V of them)
b_i: bias of the neural LM model (V-dimensional)
h: the vector encoding the context
2. Proposed Model
32
2-3. Prediction Compression
Compressing the output of the neural LM
Let W be the output weights over the whole vocabulary and b the biases.
Then W and b are expressed through the common-word weight D and bias c.
As in the word embedding, the previously computed sparse code x is used to express them.
2. Proposed Model
33
2-3. Prediction Compression
Word Embedding → Prediction
The embeddings U of the common words are known in advance,
→ so the sparse vector x of a rare word can be computed in advance.
For the output weights and biases, however, there is nothing to fit against, so x cannot be obtained the same way.
The goal is to reuse the sparse code x computed during embedding in the prediction layer, expressing the weight and bias of an infrequent word through the parameters of frequent words and the sparse code.
Then, how do we obtain the sparse vector x for the output weights & biases?
2. Proposed Model
34
2-3. Prediction Compression
Compressing the output of the neural LM
Since the context h and the word w have a similar structure, the word embeddings and the output weights should also have a similar structure.
Therefore, the sparse code x used for the word embedding can be reused for W.
2. Proposed Model
35
2-4. ZRegression NCE
Noise-Contrastive Estimation
Gutmann and Hyvärinen, 2012
Non-linear logistic regression
Noise-Contrastive Estimation
Discriminate real data and artificial data (=noise)
Log-density function
Maximize log probability of softmax
2. Proposed Model
36
2-4. ZRegression NCE
Noise-Contrastive Estimation
Source: https://www.tensorflow.org/versions/r0.10/tutorials/word2vec/index.html
Maximum Likelihood Estimation vs. Noise-Contrastive Estimation
2. Proposed Model
37
2-4. ZRegression NCE
NCE vs. MLE
Source: https://www.tensorflow.org/versions/r0.10/tutorials/word2vec/index.html
Maximum Likelihood Estimation:
  Runs the softmax estimation over every word in the vocabulary.
  For a single positive example, it needs the probabilities of all words.
Noise-Contrastive Estimation:
  Runs the softmax estimation only over the k samples from the noise distribution.
  For a single positive example, it generates k noise samples.
2. Proposed Model
38
2-4. ZRegression NCE
Noise-Contrastive Estimation
In MLE, the normalizer Z_h is computed over the entire vocabulary.
Noise-Contrastive Estimation
In NCE, however, Z_h depends on h, which makes it difficult to apply directly.
Following Mnih & Teh (2012), assuming Z_h = 1 generally works well in practice.
2. Proposed Model
39
2-4. ZRegression NCE
Noise-Contrastive Estimation
Noise-Contrastive Estimation
Following Mnih & Teh (2012), assume Z_h = 1.
Log probability that w_i appears in context h; log probability that w_j comes from the noise distribution.
Training raises the probability that the desired word comes from the context and that the remaining words come from the noise.
P_n: the probability of drawing a word as a negative sample (noise).
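A minimal numpy sketch of the NCE objective for one context (my illustration; the uniform noise distribution, k, and all parameter values are assumptions): one observed word is scored against k noise words, and the model is trained as a binary classifier between data and noise with Z_h fixed to 1.

```python
import numpy as np

def nce_loss(h, W, b, target, noise_ids, p_noise):
    """NCE loss for one positive word and k noise samples (Z_h assumed = 1)."""
    k = len(noise_ids)
    ids = np.concatenate(([target], noise_ids))
    log_p_model = W[ids] @ h + b[ids]                  # log P_model(w | h), unnormalized (Z_h = 1)
    log_p_noise = np.log(p_noise[ids])
    # sigmoid(log P_model - log(k * P_noise)) = posterior P(word came from data, not noise)
    logits = log_p_model - np.log(k) - log_p_noise
    post = 1.0 / (1.0 + np.exp(-logits))
    return -(np.log(post[0]) + np.sum(np.log(1.0 - post[1:])))

V, C = 10000, 200
rng = np.random.default_rng(4)
p_noise = np.full(V, 1.0 / V)                          # assumed uniform noise distribution
loss = nce_loss(rng.normal(size=C), rng.normal(size=(V, C)) * 0.01, np.zeros(V),
                target=42, noise_ids=rng.integers(0, V, size=10), p_noise=p_noise)
print(loss)
```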
2. Proposed Model
40
2-4. ZRegression NCE
Noise-Contrastive Estimation
ZRegression
Objective function:
Fixing Z_h = 1 can cause instability → ZRegression
2. Proposed Model
41
2-4. ZRegression NCE
ZRegression NCE
ZRegression
: a regression layer (estimates the normalizer Z_h from the context h)
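A sketch of the idea (assumed parameter names and values, not the authors' code): instead of fixing Z_h = 1, a small regression layer predicts the log-normalizer from the context vector, and the predicted Z_h is used to rescale the scores inside the NCE objective.

```python
import numpy as np

def log_z_regression(h, theta_z, b_z):
    """ZRegression layer: predict log Z_h from the encoded context h."""
    return theta_z @ h + b_z

C = 200
rng = np.random.default_rng(5)
h = rng.normal(size=C)
theta_z, b_z = rng.normal(size=C) * 0.01, 0.0          # trained jointly with the LM
log_Z_h = log_z_regression(h, theta_z, b_z)
# The unnormalized score s(h, w) is then corrected by subtracting log_Z_h
# inside the NCE posterior, instead of assuming Z_h = 1.
print(log_Z_h)
```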
Evaluation
3-1. Dataset
3-2. Qualitative Analysis
42
3-3. Quantitative Analysis
3. Evaluation
43
3-1. Dataset
Wikipedia 2014 data
2014 Wikipedia dump
Preprocessing
1.6 billion running words
Train / Validation / Test split
100M words for the neural LM
All of it for the backoff n-gram
3. Evaluation
44
3-2. Qualitative Analysis
8,000 common words → 2k–24k uncommon words
B: Base set — 8k of common words
C: Uncommon words — 2k to 24k
Uses pre-trained word embeddings
Adam optimizer
Small coefficients
3. Evaluation
45
3-2. Qualitative Analysis
8,000 common words → 2k–24k uncommon words
Singular form → common; plural form → rare.
The top coefficient of an uncommon (rare) word is 0.6 or higher, so it maps meaningfully onto its common base word.
(Table: common words, their coefficients, and the sparse representation of each uncommon/rare word.)
3. Evaluation
46
3-3. Quantitative Analysis
Perplexity measure
N: number of running words in the test corpus
Lower is better: a lower perplexity means better performance.
Encoding via an LSTM-RNN
200-dimensional hidden layer, Adam optimization
How well does the model predict the given word from the given context?
Perplexity of each model
Memory reduction compared to LSTM-z
3. Evaluation
47
3-3. Quantitative Analysis
Perplexity measure
- KN3
Kneser-Ney smoothing technique on a 3-gram LM
- LBL5 (5 preceding words)
Log-bilinear model (Mnih and Hinton, 2007)
- LSTM-s
Standard LSTM-RNN LM
- LSTM-z
Enhanced with ZRegression
- LSTM-z, wb
Compressing both weights and biases in Prediction
- LSTM-z, w
Compressing only weights in prediction
3. Evaluation
48
3-3. Quantitative Analysis
Vocabulary Size × PPL
Even as the vocabulary size grows, memory barely increases for the compressed methods.
At the same time, even with compression the PPL stays lower than KN3 (lower is better).
3. Evaluation
49
3-3. Quantitative Analysis
Method Comparison
Even though KN3 was trained on a larger corpus, the LSTMs perform better.
LSTM-z,w > LSTM-z (roughly 80% memory reduction plus a performance gain)
3. Evaluation
50
3-3. Quantitative Analysis
Why is LSTM-z,w better than LSTM-z,wb?
Compression loses some information, but in exchange improves accuracy on rare words.
For LSTM-z,wb, the loss from compressing the biases outweighs the accuracy gain.
3. Evaluation
51
3-4. Conclusion
Compressing Neural LM by Sparse Word Representation
Sparse linear combinations for rare words drastically reduce the embedding and prediction dimensions, and also address the problem that infrequent words are never trained properly.
They further resolve the problem of memory and time complexity growing with the vocabulary size, keeping memory consumption nearly constant regardless of vocabulary size.
Despite the reduced memory consumption, performance also improves (the PPL value decreases).
Thank you for your attention!
52