Mark Fishel, TartuNLP
MoMo Estonia / AI & ML
January 14, 2019
Natural organic non-GMO
bio-degradable eco-friendly
Language Processing
or NLP yesterday, today and tomorrow
or NLP today, tomorrow and the day after
AI
NLP
● end-user applications
○ translation (neurotolge.ee)
○ text↔speech (neurokone.ee)
○ text mining, information extraction (texta.ee)
○ chat bots
○ world domination, destruction of humanity
○ etc.
● components
● analysis, linguistics
● etc.
NLP
Why?
● NLP makes mistakes!
● in practice: semi-automation, post-editing,
etc.
Why?
1. Step-by-step NLP
● solve separate steps / components
○ via ML, rules, etc.
○ one by one
● put them in a pipeline
○ for that we have to (think we) understand how it works
● …
● profit!
NLP before: step-by-step
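The step-by-step recipe above can be sketched as plain function composition. This is only an illustration under stated assumptions: the three stages (`tokenize`, `lowercase`, `count_words`) and the helper `make_pipeline` are hypothetical stand-ins for real components, which could each be rule-based or learned.

```python
from collections import Counter
from functools import reduce
from typing import Any, Callable, List

Step = Callable[[Any], Any]


def tokenize(text: str) -> List[str]:
    """Rule-based component: split the text on whitespace."""
    return text.split()


def lowercase(tokens: List[str]) -> List[str]:
    """Normalisation component: lowercase every token."""
    return [t.lower() for t in tokens]


def count_words(tokens: List[str]) -> Counter:
    """Analysis component: count how often each token occurs."""
    return Counter(tokens)


def make_pipeline(*steps: Step) -> Step:
    """Chain components so that each one consumes the previous one's output."""
    def run(data: Any) -> Any:
        return reduce(lambda acc, step: step(acc), steps, data)
    return run


# Hypothetical pipeline built from separately solved steps.
pipeline = make_pipeline(tokenize, lowercase, count_words)
result = pipeline("Mul on parem idee . Kas sul on parem idee ?")
print(result["on"], result["mul"])
```

Each stage can be replaced or debugged on its own, which is the appeal of this design; the cost is that a mistake in an early stage is passed on to every later one.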
ET: ?
LV: Vai tev ir labāka ideja?
Statistical Translation
ET: ?
LV: Vai tev ir labāka ideja?
ET: Mul on parem idee.
LV: Man ir labāka ideja.
ET: Kas sul on tõlketekste?
LV: Vai tev ir pārtulkotu tekstu?
ET: Sul peaks olema palju tõlketekste.
LV: Tev jābūt daudz pārtulkotu tekstu.
Statistical Translation
ET: Kas
ET: Kas sul
ET: Kas sul on
ET: Kas sul on parem idee
ET: Kas sul on parem idee?
LV: Vai tev ir labāka ideja?
Statistical Translation
Actual translation:
● segment input
● translate pieces
● reorder
● put in context
● …
ET: Kas sul on parem idee?
LV: Vai tev ir labāka ideja?
ET: Mul on parem idee.
LV: Man ir labāka ideja.
ET: Kas sul on tõlketekste?
LV: Vai tev ir pārtulkotu tekstu?
ET: Sul peaks olema palju tõlketekste.
LV: Tev jābūt daudz pārtulkotu tekstu.
Statistical Translation
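The "segment input, translate pieces" steps can be sketched with a toy phrase table. Everything here is an assumption made for illustration: the table is hand-written from the sentence pairs on the slide, whereas a real statistical system would learn phrases and their probabilities from a large parallel corpus. Reordering and context scoring are left out, so the output simply follows the source order.

```python
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]

# Hypothetical Latvian -> Estonian phrase table, read off the slide's examples.
PHRASE_TABLE: Dict[Phrase, str] = {
    ("vai", "tev", "ir"): "kas sul on",
    ("man", "ir"): "mul on",
    ("labāka", "ideja"): "parem idee",
    ("pārtulkotu", "tekstu"): "tõlketekste",
}


def segment(tokens: List[str], table: Dict[Phrase, str]) -> List[Phrase]:
    """Greedy longest-match segmentation into phrases the table knows."""
    max_len = max(len(p) for p in table)
    pieces: List[Phrase] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in table:
                pieces.append(candidate)
                i += n
                break
        else:
            # Unknown word: keep it as a one-word piece.
            pieces.append((tokens[i],))
            i += 1
    return pieces


def translate(sentence: str, table: Dict[Phrase, str] = PHRASE_TABLE) -> str:
    """Segment the input, then translate each piece (unknown pieces pass through)."""
    tokens = sentence.lower().rstrip("?.!").split()
    pieces = segment(tokens, table)
    return " ".join(table.get(p, " ".join(p)) for p in pieces)


print(translate("Vai tev ir labāka ideja?"))
```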
Text-to-speech
1. text to phonemes
e.g. through → [θru], reason → ['rizən]
2. pronunciation for phonemes (or pairs of
phonemes): e.g.
θr →
3. “glue” pieces together → speech
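A rough sketch of the three steps above, under loud assumptions: the lexicon holds only the two words from the slide (stress marks dropped), and the "recordings" in `UNITS` are made-up placeholder sample lists rather than real audio. The names `LEXICON`, `UNITS`, `text_to_phonemes` and `synthesize` are hypothetical.

```python
from typing import Dict, List

# Step 1 resource: a tiny pronunciation lexicon (only the slide's two words).
LEXICON: Dict[str, List[str]] = {
    "through": ["θ", "r", "u"],
    "reason": ["r", "i", "z", "ə", "n"],
}

# Step 2 resource: one placeholder "recording" per phoneme.
# Real systems store actual audio units; these integers are fake samples.
PHONEMES = sorted({p for phones in LEXICON.values() for p in phones})
UNITS: Dict[str, List[int]] = {p: [i, i] for i, p in enumerate(PHONEMES)}


def text_to_phonemes(text: str) -> List[str]:
    """Step 1: look up each word's phonemes in the lexicon."""
    phonemes: List[str] = []
    for word in text.lower().split():
        phonemes.extend(LEXICON[word])
    return phonemes


def synthesize(text: str) -> List[int]:
    """Steps 2 and 3: fetch a unit per phoneme and glue the units together."""
    samples: List[int] = []
    for phoneme in text_to_phonemes(text):
        samples.extend(UNITS[phoneme])
    return samples


print(text_to_phonemes("through reason"))
print(len(synthesize("through")))
```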
2. End-to-end NLP/ML
● gather input/output examples for the
end-user task
○ (sentence text, speech)
○ (Estonian sentence, English sentence)
● teach end-to-end deep neural black magic
to go from input to output
○ ignore how we think it should be done
● …
● profit!
NLP now: end-to-end
Autoregressive:
● predict output based on (1) input and
(2) already generated partial output
Thank you very …?
Neural Translation
Thank you very much.
Would you like tea or …?
Would you like tea or coffee?
Dear …?
Dear (ladies and gentlemen / mom / …)
They used a state-of-the-art approach
→ …
→ nad …
→ nad kasutasid …
→ nad kasutasid kaasaegset …
→ nad kasutasid kaasaegset meetodit …
→ nad kasutasid kaasaegset meetodit KÕIK.
p(next word | x = "They used a state-of-the-art approach", y = "nad …")
= neural_estimator(x, y)
= { kasutasid: 0.67,
    rakendasid: 0.21,
    kasutavad: 0.04,
    … }
Neural Translation
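The autoregressive loop can be sketched as greedy decoding: at every step, ask the estimator for a distribution over next words and append the most probable one. This is a toy under stated assumptions, not the real system: `neural_estimator` here is a lookup table standing in for a trained network and ignores the input `x`, only the probabilities after "nad" come from the slide (all others are invented), and `<eos>` is a hypothetical end-of-sentence marker.

```python
from typing import Dict, List, Tuple

EOS = "<eos>"  # hypothetical end-of-sentence marker

# Toy stand-in for a trained network: partial output -> next-word probabilities.
# Only the row for ("nad",) uses the slide's numbers; the rest is made up.
TOY_MODEL: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"nad": 0.9, "nemad": 0.1},
    ("nad",): {"kasutasid": 0.67, "rakendasid": 0.21, "kasutavad": 0.04},
    ("nad", "kasutasid"): {"kaasaegset": 0.8, "uut": 0.2},
    ("nad", "kasutasid", "kaasaegset"): {"meetodit": 0.7, "lähenemist": 0.3},
    ("nad", "kasutasid", "kaasaegset", "meetodit"): {EOS: 1.0},
}


def neural_estimator(x: str, y: List[str]) -> Dict[str, float]:
    """Return next-word probabilities given input x and partial output y."""
    # A real model would condition on x; this toy table does not.
    return TOY_MODEL.get(tuple(y), {EOS: 1.0})


def greedy_decode(x: str, max_len: int = 20) -> List[str]:
    """Repeatedly append the most probable next word until the end marker."""
    y: List[str] = []
    for _ in range(max_len):
        probs = neural_estimator(x, y)
        word = max(probs, key=probs.get)
        if word == EOS:
            break
        y.append(word)
    return y


print(" ".join(greedy_decode("They used a state-of-the-art approach")))
```

Greedy choice is the simplest strategy; practical systems often keep several candidate outputs in parallel (beam search) instead of committing to one word per step.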
Speech↔text: same/similar end-to-end approach
NB: needs lots of explicit examples (data)
End-to-end NLP
3. NLP/ML with no data
● explicit data is expensive and wasteful
● what to do for tasks without it?
NLP/ML with no explicit data
Unsupervised Translation
https://aclweb.org/anthology/D18-1549.pdf
Learn from:
A. Tere! Minu nimi on Juhan.
Kui ma eelmisel korral
sellest pildist olen…..
B. We must address this
question as soon as
possible. Why have we
not…..
Task: Translate between English and Estonian
without a single translation example!
Or: translate dog barks / kid speech
Unsupervised Translation
Estonian English
Latvian Swedish
Zero-shot learning
https://arxiv.org/abs/1611.04558
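In the linked paper (arXiv 1611.04558), a single multilingual model is told which language to produce by an artificial token placed at the start of the input sentence. The sketch below only shows that data preparation step and which directions would count as zero-shot. The set of trained directions is a hypothetical example, not taken from the slide, and `tag_source` is an illustrative helper name.

```python
from itertools import permutations
from typing import List, Set, Tuple

LANGS: List[str] = ["et", "en", "lv", "sv"]  # Estonian, English, Latvian, Swedish

# Hypothetical: directions for which parallel training data exists.
TRAINED: Set[Tuple[str, str]] = {
    ("et", "en"), ("en", "et"),
    ("lv", "en"), ("en", "lv"),
    ("sv", "en"), ("en", "sv"),
}


def tag_source(sentence: str, target_lang: str) -> str:
    """Prepend the target-language token that tells the model what to output."""
    return f"<2{target_lang}> {sentence}"


# Zero-shot directions: language pairs the model never saw translated directly.
zero_shot = [pair for pair in permutations(LANGS, 2) if pair not in TRAINED]

print(tag_source("Mul on parem idee.", "lv"))
print(zero_shot)
```

At test time the same tagging is applied to an unseen direction, for example Estonian input tagged with `<2lv>`, and the model is asked to translate without ever having seen Estonian-Latvian examples.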
● style transfer
○ “that’s weird” → “that is strange”
● correcting errors
○ “i biggest your fan” → “I am your biggest fan”
Zero-shot NLP demo
● “data + task understanding” is stable
● “data + end-to-end neural networks” is
cool and promising
● “no data, thing still works” is sexy!
Message to take home
Thanks!
neurotolge.ee
neurokone.ee
livesubs.ee
nlp.cs.ut.ee


Scientists meet Entrepreneurs - AI & Machine Learning, Mark Fishel, Institute of Computer Science, University of Tartu
