KYIV 2019
Natural Language Processing with .NET
.NET CONFERENCE #1 IN UKRAINE
.NET LEVEL UP
About me
Sergiy Korzh
25+ years in software development
20 years running my own business
.NET developer since 2004
iForum.ua (technology section)
Projects:
EasyQuery (https://korzh.com/easyquery)
Easy.Report (http://easy.report)
Aistant (https://aistant.com/)
Twitter: @korzhs
LinkedIn: https://www.linkedin.com/in/korzh/
Agenda
1 Introduction to NLP (main tasks and basic concepts)
2 NLP tools for .NET (and beyond)
3 Demos
4 Useful materials and conclusions
Why NLP on .NET?
Because we love .NET, right?
Quick and easy (for simple NLP tasks)
No “glue” code between .NET and other languages
Remarks
“Light” NLP tasks only!
No Deep Learning
Beginner-level topics
NLP Tasks
1 Linguistic
2 Analysis
3 Transformation
4 Generation
NLP Tasks
1 Linguistic
• Segmentation
• Part-of-speech tagging
• Named-entity recognition
• Relation extraction
• Syntactic parsing
• Coreference resolution
• Semantic parsing
NLP Task Examples
2 Analysis
• Spam filtering
• Sentiment analysis
• Text similarity
• Information extraction
NLP Task Examples
3 Transformation
• Machine translation
• Speech-to-text / text-to-speech
• Grammar correction
• Text summarization
NLP Task Examples
4 Generation
• Question answering
• Chatbots
• Story generation
NLP Pipeline
TEXT → Text featurizing (numeric representation) → ML algorithm → RESULT
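A minimal sketch of this pipeline, written in Python for brevity. It assumes a regex tokenizer, bag-of-words counts as the numeric representation, and multinomial Naive Bayes (a classic spam-filter algorithm) as the ML step; these choices, the toy training data, and the names `featurize` and `NaiveBayes` are illustrative assumptions, not code from the talk.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of word characters
    return re.findall(r"\w+", text.lower())


def featurize(text):
    # Numeric representation: bag of words (word -> count)
    return Counter(tokenize(text))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.labels}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(featurize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        features = featurize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.labels:
            total_words = sum(self.word_counts[label].values())
            # log P(label) + sum of count * log P(word | label)
            score = math.log(self.doc_counts[label] / total_docs)
            for word, n in features.items():
                p = (self.word_counts[label][word] + 1) / (total_words + len(self.vocab))
                score += n * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# TEXT -> featurizing -> ML algorithm -> RESULT
model = NaiveBayes().fit(
    ["win a free prize now", "free money click now",
     "meeting at noon tomorrow", "see the project notes"],
    ["spam", "spam", "ham", "ham"],
)
print(model.predict("claim your free prize"))
```

With this toy data, the query containing "free" and "prize" is classified as spam. Swapping either the featurizer or the algorithm leaves the rest of the pipeline unchanged, which is the point of the diagram.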
NLP Pipeline: Classic
(diagram from the AYLIEN blog)
NLP Pipeline: Deep Learning
(diagram from the AYLIEN blog)
NLP concepts: Bag of words
A way to represent text for ML algorithms: each text is treated as an unordered collection of its words, ignoring word order.
Encoding approaches:
• Word frequency
• One-hot encoding
• TF-IDF
• Other metrics
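A small sketch of two of these encodings over a shared, sorted vocabulary: raw word frequency, and a binary presence vector (the document-level form of one-hot encoding, sometimes called multi-hot). The regex tokenizer and the sample sentences are assumptions for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def build_vocabulary(docs):
    # Sorted list of all distinct tokens across the corpus
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def frequency_vector(doc, vocab):
    # Word frequency: how many times each vocabulary word occurs in the doc
    counts = Counter(tokenize(doc))
    return [counts[word] for word in vocab]


def binary_vector(doc, vocab):
    # Presence flags: 1 if the word occurs at all, else 0
    present = set(tokenize(doc))
    return [1 if word in present else 0 for word in vocab]


docs = ["I live in Kyiv", "Kyiv is big and Kyiv is old"]
vocab = build_vocabulary(docs)
print(vocab)
print(frequency_vector(docs[1], vocab))
print(binary_vector(docs[1], vocab))
```

Every document becomes a vector of the same length (the vocabulary size), which is what most ML algorithms expect as input.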
NLP concepts: TF-IDF
For a (word, document) pair, TF-IDF (term frequency times inverse document frequency) measures how important the word is to that document relative to the whole collection: words that are frequent in the document but rare across documents score highest.
Used in all kinds of information retrieval tasks:
• Search
• Text mining
• Stop-word filtering
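A sketch of one common TF-IDF variant, assuming tf = count / document length and idf = log(N / document frequency); libraries differ in smoothing and normalization, so treat these formulas as an illustrative choice. Note how "the", which occurs in every document, gets a score of exactly 0, which is why TF-IDF helps with stop-word filtering.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def tf_idf(docs):
    """Return one {word: score} dict per document.

    tf  = count of the word in the doc / number of tokens in the doc
    idf = log(N / number of docs containing the word)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(word for toks in tokenized for word in set(toks))
    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        scores.append({
            word: (c / len(toks)) * math.log(n_docs / doc_freq[word])
            for word, c in counts.items()
        })
    return scores


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
for doc_scores in tf_idf(docs):
    top = max(doc_scores, key=doc_scores.get)
    print(top, round(doc_scores[top], 3), round(doc_scores["the"], 3))
```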
NLP concepts: N-grams
Word N-grams
An n-gram is a contiguous sequence of n items from a given sample of text.
Word bigrams of “I live in Kyiv” (# marks the start/end of the text):
1. # I
2. I live
3. live in
4. in Kyiv
5. Kyiv #
Character N-grams
Character bigrams of “I live in Kyiv” (_ marks a space, # the start/end of the text):
1. #_
2. _I
3. I_
4. _l
5. li
6. iv
7. ve
8. . . .
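The bigram lists above can be reproduced with a short sketch. The padding scheme follows the slide's examples (a `#` boundary marker, `_` for spaces); padding word n-grams with n-1 markers on each side is an assumption that generalizes the slide's bigram case.

```python
def word_ngrams(text, n, boundary="#"):
    # Pad with n-1 boundary markers so n-grams at the start/end are kept
    pad = [boundary] * (n - 1)
    tokens = pad + text.split() + pad
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n, boundary="#", space="_"):
    # Replace spaces with "_" and pad with "_" plus a boundary marker on each side
    padded = boundary + space + text.replace(" ", space) + space + boundary
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


print(word_ngrams("I live in Kyiv", 2))
print(char_ngrams("I live in Kyiv", 2)[:7])
```

Character n-grams are robust to typos and unseen word forms, which is one reason fastText builds word vectors from them.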
NLP concepts: Word Embeddings
A set of techniques that map words (or phrases) to numeric vectors.
Words with similar meanings get “close” vectors.
Word Vector
man [0.23, 0.56, …]
king [0.34, 0.16, …]
woman [0.41, 0.73, …]
queen [0.09, 0.62, …]
[king] – [man] + [woman] ≈ [queen]
Popular embedding algorithms:
• Word2Vec
• fastText
• GloVe
• . . .
NLP concepts: Language Model
A language model computes the probability of a word given the preceding words in a sequence.
Please give me a … [pen: 0.002, example: 0.0001, hand: 0.08, …]
Where is it used? (Spoiler: almost everywhere!)
• Machine translation
• Error correction
• Speech recognition
• Text generation
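The simplest language model is a count-based bigram model, where P(word | previous word) = count(previous, word) / count(previous, anything). The sketch below assumes a tiny invented corpus and applies no smoothing, so unseen words get probability 0; practical models smooth these counts or use neural networks.

```python
from collections import Counter, defaultdict


def train_bigram_lm(sentences):
    """Count-based bigram model built from whitespace-tokenized sentences."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark sentence start and end
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1
    return counts


def next_word_probs(counts, prev):
    # P(word | prev) for every word seen after prev; empty if prev is unknown
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()} if total else {}


corpus = [
    "please give me a pen",
    "please give me a hand",
    "give me a hand",
    "show me an example",
]
lm = train_bigram_lm(corpus)
print(next_word_probs(lm, "a"))
```

In this toy corpus "a" is followed by "hand" twice and "pen" once, so the model predicts "hand" with probability 2/3 and "pen" with 1/3.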
NLP Tools
1 Online services: Azure Cognitive Services, IBM Watson, Amazon AI Services
2 Python libraries: NLTK, spaCy, scikit-learn, gensim, Pattern
3 .NET libraries: ML.NET, Microsoft.Speech, Microsoft.Recognizers, Catalyst
.NET libs: ML.NET
https://dotnet.microsoft.com/apps/machinelearning-ai/ml-dotnet
Pros:
• Native to .NET (Core)
• Backed by Microsoft
• Highly performant (at least, Microsoft says so)
• Extensible with TensorFlow and more
NLP features:
• Text normalization
• Tokenizing
• N-grams
• Word embeddings
• Stop-word removal
Cons:
• Poor NLP feature set
• English-only (mostly)
• Inconvenient to use outside an ML pipeline
.NET libs: Catalyst
https://github.com/curiosity-ai/catalyst
Pros:
• Native to .NET (Core)
• Inspired by the spaCy library
• Fast tokenizer
• Has pretrained models
• Lets you train your own models (based on the Universal Dependencies project)
NLP features:
• Text normalization
• Tokenizing
• POS tagging
• Word embeddings
• Stop-word removal
Cons:
• Early beta (or even alpha): version 0.0.2795 at the time of the talk
• English-only (mostly)
.NET libs: Microsoft.Recognizers
• Rule-based
• Recognizes numbers, units, dates/times, etc.
• Supports about 10 languages
• Not only for .NET (also JavaScript, Python, Java)
• No support for Russian or Ukrainian
https://github.com/Microsoft/Recognizers-Text/
DEMO 1
Text summarization (extraction-based) using home-brewed NLP
TEXT → Detect language → Break into sentences → Tokenize and get stems → Bag of words → Similarity matrix → PageRank algorithm → Summary (top-rated sentences)
Bag of words (stem counts per sentence):
        sentence1  sentence2  sentence3
stem1   1          3          5
stem2   0          2          4
stem3   3          4          0
stem4   2          0          2
Similarity matrix:
      S1     S2     S3
S1    0      1.21   0.2
S2    1.21   0      3.56
S3    0.2    3.56   0
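The Demo 1 pipeline can be sketched end to end in the spirit of TextRank. This is not the code from the talk's repository: language detection is skipped (English is assumed), `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer, and cosine similarity is an assumed measure (the slide's matrix contains values above 1, so the talk's measure is not plain cosine).

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after ".", "!" or "?" followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def crude_stem(word):
    # Hypothetical, very rough suffix stripper standing in for a real stemmer
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def bag_of_stems(sentence):
    return Counter(crude_stem(w) for w in re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    dot = sum(count * b[stem] for stem, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pagerank(weights, damping=0.85, iterations=50):
    # Weighted PageRank: each node spreads its score along its edges in
    # proportion to edge weight (nodes with no edges spread nothing)
    n = len(weights)
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in weights]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(weights[j][i] / out_weight[j] * scores[j]
                            for j in range(n) if out_weight[j])
            for i in range(n)
        ]
    return scores


def summarize(text, top_n=2):
    sentences = split_sentences(text)
    bags = [bag_of_stems(s) for s in sentences]
    n = len(sentences)
    # Similarity matrix with zeros on the diagonal, as on the slide
    sim = [[0.0 if i == j else cosine(bags[i], bags[j]) for j in range(n)] for i in range(n)]
    scores = pagerank(sim)
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    # Return the top-rated sentences in their original order
    return [sentences[i] for i in sorted(best)]


text = ("Catalyst tokenizes text quickly. ML.NET tokenizes text and builds n-grams. "
        "Catalyst and ML.NET both process text. Cats sleep all day.")
print(summarize(text))
```

The off-topic last sentence shares no stems with the others, so it receives no incoming weight and cannot reach the summary.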
DEMO 2
Text summarization using ML.NET
DEMO 3
Document tagging
(with TF-IDF and Catalyst POS tagging)
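A rough sketch of the TF-IDF half of Demo 3: score each word per document and keep the top-k words as suggested tags. The talk's Catalyst POS-tagging step (presumably used to keep only suitable parts of speech, such as nouns) is not reproduced here; a hypothetical `is_candidate` filter based on a tiny stop-word list stands in for it, and the sample documents are invented.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for POS filtering: a tiny stop-word list
STOP_WORDS = {"the", "a", "an", "and", "of", "on", "in", "is", "for", "with", "to"}


def is_candidate(word):
    return word not in STOP_WORDS and not word.isdigit() and len(word) > 2


def suggest_tags(docs, top_k=3):
    tokenized = [[w for w in re.findall(r"\w+", d.lower()) if is_candidate(w)] for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(w for toks in tokenized for w in set(toks))
    all_tags = []
    for toks in tokenized:
        counts = Counter(toks)
        # TF-IDF score per candidate word in this document
        scores = {w: (c / len(toks)) * math.log(n_docs / doc_freq[w]) for w, c in counts.items()}
        # Highest score first; ties broken alphabetically for stable output
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        all_tags.append(ranked[:top_k])
    return all_tags


docs = [
    "Catalyst adds tagging and tokenizing to .NET apps",
    "ML.NET adds tokenizing and n-gram features to .NET apps",
    "Recognizers parse numbers and dates in many languages",
]
for tags in suggest_tags(docs):
    print(tags)
```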
Useful resources
Universal Dependencies: https://universaldependencies.org/
Lang-uk: http://lang.org.ua/uk/
All source code of this talk: https://github.com/korzh/Korzh.NLP
Math.NET, numerical computation algorithms for .NET: https://www.mathdotnet.com/
List of .NET libraries with some NLP features: http://tiny.cc/dotnet-nlp-libs
Conclusions
We can do NLP on .NET (at least for basic tasks)
ML.NET library: good and reliable, but with limited NLP features
Catalyst library: looks promising, but still has a way to go. Contribute!
Thank you!
Sergiy Korzh
Twitter: @korzhs
LinkedIn: https://www.linkedin.com/in/korzh/
Facebook: https://www.facebook.com/sergiy.korzh
Email: sergiy@korzh.com

.NET Fest 2019. Sergiy Korzh. Natural Language Processing in .NET


Editor's Notes

  • #22: What kind of normalization? How do I get tokens? What kinds of n-grams are supported (word, character)? What kind of word embeddings? English only? How do I add my own stop-word removal?