A Vietnamese Language Model Based on Recurrent Neural Network

Viet-Trung Tran, Kiem-Hieu Nguyen, Duc-Hanh Bui
Hanoi University of Science and Technology
Friday, October 7, 2016

Outline
Statistical language model
Current state of the art
RNN for Vietnamese language model
Experimental results
Conclusion

Statistical language model
A probability distribution over word sequences
E.g. “go to the airport”
? = P(“airport”|“go to the”)
Applications:
Spelling checkers, smart keyboards
Enhance speech recognition and machine translation
LABAN KEY
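The definition on this slide can be written out with the chain rule, followed by the usual count-based (maximum-likelihood) estimate for the airport example. The symbols w_1, ..., w_n and the count function c(.) are notation introduced here for illustration; they do not appear in the slides.

```latex
P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1}),
\qquad
P(\text{airport} \mid \text{go to the}) \approx
  \frac{c(\text{go to the airport})}{c(\text{go to the})}
```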

Challenges
Meaningful
grammatically correct
understandable
Context-aware
E.g. "I am from Vietnam. My mother tongue is Vietnamese."
Out of vocabulary
Slang, abbreviations, etc.

Common approach
N-gram language model
Katz's back-off: estimates the conditional probability of a word given its history in the n-gram
When a trigram is unavailable, back off to the bigram, then to the unigram
Source: https://en.wikipedia.org/wiki/Katz%27s_back-off_model
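The trigram, bigram, unigram fallback order can be sketched in a few lines. Note that this is not Katz's estimator: it omits discounting and normalization and instead uses a Stupid-Backoff-style score, so the values are not true probabilities. The function names, the toy corpus, and the factor alpha = 0.4 are assumptions made for illustration.

```python
from collections import Counter

def count_ngrams(tokens, max_n=3):
    """Count every n-gram of order 1..max_n, keyed by tuple."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def backoff_score(word, history, counts, total, alpha=0.4):
    """Score `word` after `history`, falling back to shorter histories.

    Simplified Stupid-Backoff-style score (assumption), NOT Katz's
    discounted estimate: relative frequencies are scaled by `alpha`
    at each back-off step and do not sum to one.
    """
    history = tuple(history)
    scale = 1.0
    while history:
        seen = counts[history + (word,)]
        if seen > 0:
            return scale * seen / counts[history]
        history = history[1:]  # drop the oldest word: trigram -> bigram
        scale *= alpha
    return scale * counts[(word,)] / total  # unigram relative frequency

# Toy corpus (hypothetical), whitespace-tokenized.
tokens = "go to the airport then go to the school".split()
counts = count_ngrams(tokens)
total = len(tokens)

seen_trigram = backoff_score("airport", ["to", "the"], counts, total)
backed_off = backoff_score("airport", ["near", "the"], counts, total)
print(seen_trigram, backed_off)
```

The first call finds the trigram directly. The second has never seen "near the airport", so it falls back to the bigram "the airport" and scales the score by alpha.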

N-gram language model
Only sees a few words back
Only predicts words seen in the same context

Deep learning for NLP
Word embedding
(Socher et al., 2013a)
(Mikolov et al., 2013b)

Recurrent neural network for text
Input: go to the
Output: to the school
Probability: P(school | go to the)
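The input/output pairing on this slide is the input sequence shifted by one token, and the network turns its hidden state into a distribution over the vocabulary. Below is a minimal sketch with an untrained Elman-style RNN in plain Python. The weights are random, the layer sizes and function names are assumptions, and the printed probability is therefore meaningless until the model is trained.

```python
import math
import random

def make_training_pair(tokens):
    """Inputs are all tokens but the last; targets are the tokens shifted by one."""
    return tokens[:-1], tokens[1:]

def softmax(scores):
    """Turn raw scores into probabilities that sum to one."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def rnn_next_word_probs(inputs, vocab, hidden_size=4, seed=0):
    """Run an untrained RNN over `inputs`; return P(next word | inputs)."""
    rng = random.Random(seed)
    V = len(vocab)

    def rand(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    W_xh, W_hh, W_hy = rand(hidden_size, V), rand(hidden_size, hidden_size), rand(V, hidden_size)
    h = [0.0] * hidden_size
    for word in inputs:
        x = vocab.index(word)  # a one-hot input selects column x of W_xh
        h = [math.tanh(W_xh[i][x] + sum(W_hh[i][j] * h[j] for j in range(hidden_size)))
             for i in range(hidden_size)]
    scores = [sum(W_hy[k][i] * h[i] for i in range(hidden_size)) for k in range(V)]
    return dict(zip(vocab, softmax(scores)))

tokens = ["go", "to", "the", "school"]
inputs, targets = make_training_pair(tokens)
vocab = sorted(set(tokens))
probs = rnn_next_word_probs(inputs, vocab)
print(inputs, "->", targets)
print("P(school | go to the) =", probs["school"])
```

Training would adjust the three weight matrices so that the probability assigned to each target token rises.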

RNN vs. N-gram
Variable-length word context vs. fixed n-gram context
Personalization through continuous learning
More meaningful text suggestions
Naturally supports phrase and term suggestions

RNN for Vietnamese language model
Character-level language model
{previous characters} -> next characters
Syllable-level language model
{previous syllables} -> next syllables
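The two granularities differ only in how the text is split before the (context, next token) pairs are built. The sketch below assumes that syllables are separated by whitespace, as in standard Vietnamese orthography, and ignores punctuation handling. The function names and the context size are hypothetical.

```python
import unicodedata

def char_tokens(text):
    """Split into characters; NFC keeps each accented letter as one code point."""
    return list(unicodedata.normalize("NFC", text))

def syllable_tokens(text):
    """Split on whitespace, treating each written syllable as one token."""
    return text.split()

def context_target_pairs(tokens, context_size):
    """Build (previous tokens, next token) pairs with a bounded context window."""
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs

sentence = "chào buổi sáng"

syllable_pairs = context_target_pairs(syllable_tokens(sentence), context_size=2)
char_pairs = context_target_pairs(char_tokens(sentence), context_size=5)

print(syllable_pairs)
print(char_pairs[:3])
```

The character-level model has a much smaller vocabulary but needs longer contexts to cover the same span of text.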

LSTM cell
Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
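One time step of a standard LSTM cell (forget, input, and output gates plus a candidate state) can be sketched in plain Python. This follows the common textbook formulation and is not necessarily the exact variant used in the talk. The parameter layout, in which each gate has a weight matrix applied to the concatenation of the previous hidden state and the input, is an assumption.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W v + b for a matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step.

    `params` maps a gate name to (W, b); W acts on [h_prev; x] (assumed layout).
    """
    z = h_prev + x  # list concatenation, not addition
    f = [sigmoid(a) for a in affine(*params["forget"], z)]       # what to keep
    i = [sigmoid(a) for a in affine(*params["input"], z)]        # what to write
    g = [math.tanh(a) for a in affine(*params["candidate"], z)]  # new content
    o = [sigmoid(a) for a in affine(*params["output"], z)]       # what to expose
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c

# All-zero parameters: every gate is 0.5 and the candidate is 0,
# so the cell state is simply halved.
hidden, n_in = 2, 3
zeros = ([[0.0] * (hidden + n_in) for _ in range(hidden)], [0.0] * hidden)
params = {name: zeros for name in ("forget", "input", "candidate", "output")}
h, c = lstm_step([1.0, 0.0, 0.0], [0.0, 0.0], [1.0, -2.0], params)
print(c)  # [0.5, -1.0]
```

The additive update of the cell state `c` is what lets information persist over many steps.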

Stacking multiple layers
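Stacking means that, at each time step, the hidden state of one recurrent layer becomes the input of the layer above it. The sketch below uses plain tanh recurrent layers rather than LSTM cells to stay short. The weights, sizes, and function names are hypothetical.

```python
import math

def rnn_layer_step(x, h_prev, W_x, W_h):
    """One tanh recurrent step: h = tanh(W_x x + W_h h_prev)."""
    return [math.tanh(sum(wx * xi for wx, xi in zip(rx, x)) +
                      sum(wh * hi for wh, hi in zip(rh, h_prev)))
            for rx, rh in zip(W_x, W_h)]

def stacked_step(x, states, layers):
    """Advance every layer by one time step; each layer reads the one below."""
    new_states = []
    layer_input = x
    for (W_x, W_h), h_prev in zip(layers, states):
        h = rnn_layer_step(layer_input, h_prev, W_x, W_h)
        new_states.append(h)
        layer_input = h  # feed this layer's output upward
    return new_states

# Two layers with identity input weights and zero recurrent weights (toy values).
identity = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
layers = [(identity, zero), (identity, zero)]
states = [[0.0, 0.0], [0.0, 0.0]]

states = stacked_step([0.5, -0.5], states, layers)
print(states)
```

With these toy weights the second layer's state is tanh applied twice to the input, which makes the layer-to-layer data flow easy to verify by hand.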

Experiments
1,500 movies, 2,056,308 sentences

Experimental results

Conclusion
First neural language model for Vietnamese
Largest experimental dataset
Future work
Word embedding
Neural net compression
Conversational neural machine translation

Thank you for your attention

Conversational
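Chú hoài linh đẹp trai. Chú hoài linh (English: "Uncle Hoài Linh is handsome. Uncle Hoài Linh")
Chào buổi sáng (English: "Good morning")
chị hát hay wa!! nghe thick a. (English: "You sing so well!! I love listening to it.")
chị khởi my ơi e rất la hâm mộ (English: "Khởi My, I am such a fan")
chú hoài linh thật đẹp zai và chú Trấn thành đẹp qá (English: "Uncle Hoài Linh is really handsome and uncle Trấn Thành is so handsome")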

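lịch sử ghi nhớ năm 1979 (English: "history remembers the year 1979")
tại hội nghị, đồng chí Phạm Ngọc Thủy Võ Văn Kiệt (English: "at the conference, comrade Phạm Ngọc Thủy Võ Văn Kiệt")
tại hội nghị, đồng chí Hồ Chí Minh nói (English: "at the conference, comrade Hồ Chí Minh said")
tại hội nghị, đồng chí Võ Nguyên Giáp và đồng chí Hồ Chí Minh đã ngồi ở (English: "at the conference, comrade Võ Nguyên Giáp and comrade Hồ Chí Minh sat at")
tại đại hội Đảng lần thứ nhất vào năm 1945, (English: "at the first Party Congress in 1945,")
Ngay từ những ngày đầu, Đúng như nhận xét của Giáo sư Nguyễn Văn Linh (English: "From the very first days, Just as Professor Nguyễn Văn Linh observed")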
