Practical Natural language processing (NLP)
Overview
1. Intro & overview
2. NLP tasks & tools
3. NLP use case sharing
4. Impact & lessons learned
What is Natural Language Processing? Why do we need it?
Explosive growth of unstructured data
Better understanding of user interaction & behavior
Example NLP use cases
Social media monitoring
Recommendation engine
NLP in finance & credit scoring
Other use cases
Language is complicated…
Buffalo… this is a grammatically correct sentence
Two interpretations of the same sentence
I saw a girl with a telescope
Natural language processing (NLP) tasks
● Stemming
● Stopword removal
● Word segmentation
● Part-of-speech tagging
● Named entity recognition
● Abbreviation handling
● Ambiguous term handling
● Word similarity
● Translation
Python packages: spaCy, NLTK
1. Stemmer
Reduce words to their root form
• Affect / Affection / Affections / Affected / Affecting → Affect
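The idea of reducing word variants to one root can be sketched with a toy suffix stripper. This is a minimal illustration only, not the Porter algorithm or the stemmer in NLTK; the suffix list and the minimum stem length are assumptions chosen to cover the slide's example.

```python
# Toy suffix-stripping stemmer (illustrative only; not the Porter algorithm).
SUFFIXES = ("ions", "ing", "ion", "ed", "s")  # longer suffixes are tried first
MIN_STEM_LENGTH = 3  # assumption: do not strip very short words such as "is"


def toy_stem(word):
    """Lower-case a word and strip the first suffix that leaves a long enough stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= MIN_STEM_LENGTH:
            return stem
    return word


words = ["Affect", "Affection", "Affections", "Affected", "Affecting"]
stems = [toy_stem(w) for w in words]
print(stems)  # every variant maps to "affect"
```

A rule list this small will over-strip many real words, which is why production pipelines usually rely on a tested stemmer or lemmatizer from a library.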
2. Remove stopwords
Remove common words and words that carry little meaning
• I, am, it, she, he, want, do…
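A minimal sketch of stopword removal, assuming a hand-written list: the first seven entries come from the slide, and "a", "as", "the", and "to" are added here purely for the example. Libraries such as NLTK and spaCy ship much longer lists.

```python
import re

# Stopwords from the slide, plus a few extras added for this example only.
STOPWORDS = {"i", "am", "it", "she", "he", "want", "do", "a", "as", "the", "to"}


def remove_stopwords(text):
    """Tokenize on letters and apostrophes, lower-case, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


kept = remove_stopwords("I want a job as a Java developer")
print(kept)  # ['job', 'java', 'developer']
```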
3. Handle ambiguous terms
Words with multiple potential meanings
● “I love Blackberry” → Fruit or mobile phone?
● Java → Programming language or Indonesian island?
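One simple way to pick a sense is to count how many context words overlap with cue words for each candidate meaning. The sketch below assumes hand-picked cue words (hypothetical, not from the deck); a real system would learn them from data or take them from a knowledge base.

```python
import re

# Hypothetical cue words per sense, chosen by hand for illustration.
SENSES = {
    "blackberry": {
        "fruit": {"eat", "jam", "sweet", "pie", "juice"},
        "mobile phone": {"phone", "keyboard", "battery", "email", "screen"},
    },
    "java": {
        "programming language": {"developer", "code", "spring", "software"},
        "Indonesian island": {"island", "indonesia", "travel", "jakarta"},
    },
}


def disambiguate(term, sentence):
    """Return the sense whose cue words overlap most with the sentence."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(cues & context) for sense, cues in SENSES[term].items()}
    best = max(scores, key=scores.get)
    # With no overlapping cue word at all, the term stays unresolved.
    return best if scores[best] > 0 else "unknown"


print(disambiguate("blackberry", "I love Blackberry jam on toast"))
print(disambiguate("java", "Hiring a Java developer to write backend code"))
print(disambiguate("blackberry", "I love Blackberry"))
```

The last call returns "unknown", which mirrors the slide's point: without surrounding context the sentence cannot be resolved.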
4. Handle abbreviations
A shortened form of a word or phrase
● HDB / MIC / NTUC / PM / CV
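Abbreviation handling is often a dictionary lookup applied before other processing. The sketch below is illustrative: the slide does not spell out the abbreviations, so the expansions in the table are assumptions, and ambiguous ones such as PM and MIC are deliberately left out to show that unknown tokens pass through unchanged.

```python
import re

# Assumed expansions for illustration; the slide lists only the short forms.
ABBREVIATIONS = {
    "HDB": "Housing and Development Board",
    "NTUC": "National Trades Union Congress",
    "CV": "curriculum vitae",
}


def expand_abbreviations(text):
    """Replace known upper-case abbreviations; leave unknown ones untouched."""

    def replace(match):
        token = match.group(0)
        return ABBREVIATIONS.get(token, token)

    # Match whole words made of two or more capital letters.
    return re.sub(r"\b[A-Z]{2,}\b", replace, text)


expanded = expand_abbreviations("Send your CV to the HDB office, attention PM")
print(expanded)
```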
5. Named entity recognition (NER)
Python packages: spaCy, NLTK
6. word2vec (word embedding)
e.g. semantic similarity: which words have a meaning similar to a given word or phrase?
Python packages: gensim, TensorFlow
Search: Java developer (which words are semantically similar?)
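Semantic similarity between word embeddings is usually measured with cosine similarity. The sketch below uses hand-made 3-dimensional vectors (an assumption for illustration; a trained word2vec model learns much larger vectors from a corpus) to show how a "most similar words" lookup works.

```python
import math

# Hand-made toy vectors, purely illustrative; not output of a trained model.
TOY_VECTORS = {
    "java": [0.9, 0.8, 0.1],
    "python": [0.8, 0.9, 0.1],
    "developer": [0.7, 0.6, 0.3],
    "nurse": [0.1, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, top_n=2):
    """Rank every other word in the table by cosine similarity to `word`."""
    query = TOY_VECTORS[word]
    scored = [
        (other, cosine(query, vector))
        for other, vector in TOY_VECTORS.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


for neighbor, score in most_similar("java"):
    print(f"{neighbor}: {score:.3f}")
```

With a trained gensim model, the equivalent lookup would be `model.wv.most_similar("java")`, which ranks the vocabulary by the same cosine measure.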
7. Word cloud
Other challenges in text processing
● Language libraries → Malay and Tamil are not yet supported by most service providers
● Spelling mistakes
● Language translation and mapping
● Contextual meaning
● Informal language handling
● Sarcasm
Use case:
Build a job classification engine using NLP
(with human-in-the-loop design)
1a) Business objective
To understand job market demand & supply, location, and the
skill sets needed for each profession & role
1b) Overview
2) Key information extraction
1) Classification
Classifier 1 – MASCO job category
Classifier 2 – MSIC industry
Classifier 3 – NEC field of study
Classifier 4 - SKILL library
• To classify a job post into a MASCO job category (6000+ categories!)
• To classify a job post into an MSIC industry category (300+ categories)
• To classify a job post into an NEC field of study category (100+ categories)
• To extract relevant skills that match the SKILL library (2000+ categories)
6000+ MASCO categories
1c) Problem framing and solution approach
1. Business objective: classify a job post into a MASCO job category (a text and language classification problem)
2. Input data: job title + job description (very sparse, text-based data)
3. Output categories: 6000+ MASCO categories (even the Google NLP API can only categorize into 300+)
4. Selected algorithm/model: word2vec semantic similarity
5. Alternative models: custom deep learning models such as BERT or LSTM (but wait… these take 3~4 months to fine-tune, and then you pray hard for the accuracy!)
6. Other challenges: limited time (< 2 months) and resources (1 data scientist, 1 data engineer)
Let’s test on Google’s world-leading NLP API
2a) Data preparation & standardization
3a) Technology architecture overview
3b) Data engineering & workflow
4a) Text preprocessing
4b) Information extraction
4c) Train word2vec model
4d) TensorBoard to visualize the word2vec model
5a) Classification (exact match)
MASCO job category
NEC field of study
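Exact-match classification can be sketched as a lookup of the normalized job title in a table of known titles. The table below is hypothetical: the titles and the `CATEGORY_*` labels are placeholders, not real MASCO or NEC entries, and the normalization rules are an assumption about what such a step might do.

```python
import re

# Hypothetical lookup table: normalized title -> placeholder category label.
CATEGORY_BY_TITLE = {
    "software developer": "CATEGORY_A",
    "registered nurse": "CATEGORY_B",
    "accountant": "CATEGORY_C",
}


def normalize(title):
    """Lower-case, turn punctuation runs into single spaces, and trim the ends."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", title.lower())
    return cleaned.strip()


def exact_match(title):
    """Return the category for a title, or None when there is no exact match."""
    return CATEGORY_BY_TITLE.get(normalize(title))


print(exact_match("  Software   Developer!! "))  # CATEGORY_A
print(exact_match("Java Developer"))             # None
```

Titles that return `None` here are the ones a semantic-match step would need to handle.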
5b) Classification (semantic match)
MASCO job category
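A semantic match can be sketched by averaging the word vectors of a job title and choosing the category whose averaged vector has the highest cosine similarity. Everything concrete below is an assumption for illustration: the 2-dimensional vectors are hand-made rather than trained, the category names are placeholders, and the 0.9 threshold for sending a post to human review is arbitrary.

```python
import math

# Toy word vectors (hand-made; a trained word2vec model would supply real ones).
WORD_VECTORS = {
    "java": [0.9, 0.1],
    "developer": [0.8, 0.2],
    "software": [0.9, 0.2],
    "engineer": [0.7, 0.3],
    "nurse": [0.1, 0.9],
    "staff": [0.3, 0.7],
    "clinic": [0.2, 0.8],
}
# Placeholder category names, not real MASCO entries.
CATEGORY_NAMES = ["software engineer", "staff nurse"]


def embed(text):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def semantic_match(title, threshold=0.9):
    """Return (category, score); category is None when a human should review."""
    query = embed(title)
    if query is None:
        return None, 0.0
    scored = [(name, cosine(query, embed(name))) for name in CATEGORY_NAMES]
    best_name, best_score = max(scored, key=lambda pair: pair[1])
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


print(semantic_match("Java developer"))
print(semantic_match("staff engineer"))
```

Returning `None` for low-scoring posts is one possible way to connect this step to a human-in-the-loop review queue.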
6a) Evaluation (human-in-the-loop design)
7) Impact & Benefit
1) Faster discovery of job market insights & trends
Reduced data-to-decision time from months to days
2) Automated 90% of manual work
50,000+ job posts auto-classified per month
300+ man-hours saved per month (2 head-count)
8a) Business challenges & lessons learned
● Get buy-in early - Identify all your stakeholders and involve them from project initiation (avoid “I build first, they will come later”)
● Goal & impact oriented - Understand what matters to the organization and identify high-impact, low-hanging-fruit use cases
● Be agile - Start with a simple model, build a first prototype, get feedback, and iterate
8b) Technical challenges & lessons learned
● Be realistic - Data does not always come in the “perfect” structure on your wish list!
● Technology/technical gaps - In your organization, e.g. legacy systems & integration
● Performance - Architecture & pipelines to handle performance demands, e.g. growth in concurrent users
● End-to-end solution mindset - You need knowledge of software infrastructure, development pipelines & deployment!
Questions?
Thank you
lutherteh0204@gmail.com
NLP & NLU
