Natural Language Processing
in practice
Topics
* Overview of NLP
* Getting Data
* Models & Algorithms
* Building an NLP system
* A practical example
A bit about me
* Lisp programmer
* Architect and research lead at Grammarly
(3+ years of NLP work)
* Teacher at KPI: Operating Systems
* Links:
http://lisp-univ-etc.blogspot.com
http://github.com/vseloved
http://twitter.com/vseloved
A bit about Grammarly
(c) xkcd
The best English language writing
enhancement app:
* Spellcheck
* Grammar check
* Style improvement
* Synonyms and word choice
* Plagiarism check
What is NLP?
Transforming free-form text
into structured data and back
Intersection of Computer Science,
Linguistics, and Software Engineering
Based on Algorithms, Machine
Learning, and Statistics
Popular NLP problems
* Spam Filtering
* Spelling Correction
* Sentiment Analysis
* Question Answering
* Machine Translation
* Text Summarization
* Search (also IR)
http://www.paulgraham.com/spam.html
http://norvig.com/spell-correct.html
(c) gettyimages
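The spelling-correction article linked above builds a corrector from word frequencies and simple character edits. A minimal sketch in that spirit follows. The toy sentence used as a frequency source is made up for illustration, and unlike a full noisy-channel model this version has no error model: it prefers any known word at edit distance 1 over those at distance 2 and breaks ties by frequency alone.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Most frequent known word among: the word itself, then edit-1, then edit-2 candidates."""
    def known(words):
        return {w for w in words if w in counts}

    candidates = (known([word])
                  or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or {word})  # fall back to the input when nothing is known
    return max(candidates, key=lambda w: counts[w])


# Hypothetical toy corpus standing in for real word-frequency data.
counts = Counter("the spelling of the word spelling is hard to spell".split())
print(correct("speling", counts))  # spelling
```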
Levels of NLP
* data & tools
* models
* production-ready systems
Role of Linguistics
NLP Data
structured – semi-structured – unstructured
“Data is ten times more
powerful than algorithms.”
-- Peter Norvig
The Unreasonable
Effectiveness of Data.
http://youtu.be/yvDCzhbjYWs
Kinds of data
* Dictionaries
* Corpora
* User Data
Where to get data?
* Linguistic Data Consortium
http://www.ldc.upenn.edu/
* Google ngrams, book ngrams,
syntactic ngrams
* Wikimedia
* Wordnet
* APIs: Twitter, Wordnik, ...
* University sites: Stanford,
Oxford, CMU, ...
Create your own!
* Linguists
* Crowdsourcing
* By-product
-- Jonathan Zittrain
http://goo.gl/hs4qB
Tools
* analysis tools
* processing tools
* Unix command line
* XML processing
* Map-reduce systems
* R, Python, Lisp
(c) O'Reilly Media
Algorithms
* Dynamic Programming
* Search Algorithms
* Tree Algorithms
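Dynamic programming shows up throughout NLP. A standard illustration (chosen here as an example; the slide names only the technique) is Levenshtein edit distance, which fills a table of prefix-to-prefix distances row by row and keeps only the previous row in memory:

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    # Before processing source[i-1], prev[j] is the distance between source[:i-1] and target[:j].
    prev = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        curr = [i]  # distance from source[:i] to the empty prefix of target
        for j, t_char in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1,                       # delete s_char
                curr[j - 1] + 1,                   # insert t_char
                prev[j - 1] + (s_char != t_char),  # substitute (free if the chars match)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```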
Beyond Algorithms
* CKY constituency parsing
* Noisy channel spelling
correction
* TF-IDF document
classification
* Bayesian filtering
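TF-IDF weights a term by its frequency in a document, discounted by how many documents contain it (idf = log(N / df)). A minimal sketch of TF-IDF document classification, assuming pre-tokenized, non-empty documents and a 1-nearest-neighbour rule over cosine similarity (an illustrative choice; the slide does not say which classifier sits on top of the vectors):

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized doc to {term: tf * idf}, with tf normalized by doc length."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: (c / len(doc)) * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def classify(query, labeled_docs):
    """Return the label of the training doc most cosine-similar to the query."""
    docs = [doc for doc, _ in labeled_docs]
    vectors, idf = tfidf_vectors(docs)
    # Terms never seen in training get idf 0, i.e. they are ignored.
    q = {t: (c / len(query)) * idf.get(t, 0.0) for t, c in Counter(query).items()}
    best = max(range(len(docs)), key=lambda i: cosine(q, vectors[i]))
    return labeled_docs[best][1]


# Hypothetical toy training set.
train = [("cheap pills buy now".split(), "spam"),
         ("meeting agenda for monday".split(), "ham"),
         ("buy cheap watches now".split(), "spam"),
         ("project meeting notes".split(), "ham")]
print(classify("buy cheap pills".split(), train))  # spam
```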
Models
* generative vs discriminative
* statistical vs rule-based
Language Models
Ngrams
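An n-gram language model predicts each word from the n-1 words before it. A bigram sketch, assuming pre-tokenized sentences, hypothetical `<s>`/`</s>` boundary markers, and add-one smoothing (picked for brevity; practical systems often use stronger smoothing such as Kneser-Ney):

```python
import math
from collections import Counter


class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.contexts = Counter()  # how often each token appears as a left context
        self.bigrams = Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence + ["</s>"]
            self.contexts.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Everything that can be predicted: words plus the end marker.
        self.vocab = {t for s in sentences for t in s} | {"</s>"}

    def prob(self, prev, word):
        """P(word | prev) = (count(prev, word) + 1) / (count(prev) + |V|)."""
        return (self.bigrams[(prev, word)] + 1) / (self.contexts[prev] + len(self.vocab))

    def log_prob(self, sentence):
        tokens = ["<s>"] + sentence + ["</s>"]
        return sum(math.log(self.prob(p, w)) for p, w in zip(tokens, tokens[1:]))


lm = BigramLM([["i", "like", "nlp"], ["i", "like", "lisp"]])
print(lm.log_prob(["i", "like", "nlp"]) > lm.log_prob(["nlp", "like", "i"]))  # True
```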
Generative ML models:
* Bayesian inference
(bag-of-words model)
* Hidden Markov model
(sequence model)
* Neural networks
(holistic model)
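The bag-of-words Bayesian model treats a document as an unordered multiset of words and applies Bayes' rule under a conditional-independence assumption. Below is a minimal multinomial Naive Bayes sketch with add-one smoothing. Paul Graham's spam filter, linked earlier, combines per-token probabilities differently, so this is the textbook variant rather than his exact method; the training data is made up.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words features, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.vocab = {w for doc in docs for w in doc}
        return self

    def log_posterior(self, doc, label):
        """log P(label) + sum of log P(word | label), up to a constant shared by all labels."""
        total = sum(self.word_counts[label].values())
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for w in doc:
            score += math.log((self.word_counts[label][w] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, doc):
        return max(self.class_counts, key=lambda label: self.log_posterior(doc, label))


# Hypothetical toy training data.
docs = ["win money now".split(), "cheap money offer".split(),
        "lunch at noon".split(), "see you at lunch".split()]
labels = ["spam", "spam", "ham", "ham"]
nb = NaiveBayes().fit(docs, labels)
print(nb.predict("money offer".split()))  # spam
```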
LM + Domain Model
Discriminative Models
* Heuristic
* Maximum Entropy
* “Advanced” LM Models
Going Into Prod
* Translate real-world requirements
into a measurable goal
* Pre- and post- processing
* Don't trust research results
* Gather user feedback
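Turning a requirement into a measurable goal usually means picking a metric. For instance (an illustrative mapping, not one from the talk), "don't flag correct text" corresponds to precision and "catch most errors" to recall. A small helper, assuming parallel lists of gold and predicted labels:

```python
def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F1 for one class, given parallel label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels: "err" marks a sentence containing a mistake.
gold = ["err", "ok", "err", "ok", "err"]
pred = ["err", "err", "ok", "ok", "err"]
p, r, f = precision_recall_f1(gold, pred, "err")
```

In this example there are 2 true positives, 1 false positive and 1 false negative, so precision, recall and F1 all equal 2/3.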
Practical Example:
Language Detection
Idea
Standard approach:
character LM
Let's try an alternative:
word LM
Data – from Wiktionary
Test data – from Wikipedia
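The idea on the slides is to detect a language with a word model instead of the usual character n-gram model. A minimal sketch, assuming each language is represented by a word list and scored with an add-one-smoothed unigram model. The three tiny corpora below are made-up stand-ins for the Wiktionary data, and the scoring scheme is an assumption rather than the author's actual implementation.

```python
import math
from collections import Counter


class WordLangDetector:
    """Pick the language whose word-unigram model gives the text the highest log-probability."""

    def __init__(self, corpora):
        # corpora: {language: list of words}; an assumed input format.
        self.models = {}
        for lang, words in corpora.items():
            counts = Counter(w.lower() for w in words)
            self.models[lang] = (counts, sum(counts.values()), len(counts))

    def score(self, text, lang):
        counts, total, vocab = self.models[lang]
        # Add-one smoothing, with one extra vocabulary slot for unseen words.
        return sum(math.log((counts[w] + 1) / (total + vocab + 1))
                   for w in text.lower().split())

    def detect(self, text):
        return max(self.models, key=lambda lang: self.score(text, lang))


# Toy corpora (hypothetical; the talk uses Wiktionary data).
corpora = {
    "en": "the cat is on the table and the dog is in the house".split(),
    "de": "der hund ist im haus und die katze ist auf dem tisch".split(),
    "fr": "le chat est sur la table et le chien est dans la maison".split(),
}
detector = WordLangDetector(corpora)
print(detector.detect("the dog and the cat"))  # en
```

A word model is cheap and precise when the input contains common words, but it has nothing to go on for words it has never seen, which is one likely reason character models are the usual default.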
Practical ML System
* Training
* Evaluation
* Production
Thanks!
Questions?
Vsevolod Dyomkin
@vseloved

Vsevolod Dyomkin "Natural language processing in practice"