Introduction to NLP
Sandeep Tammu
Natural language processing
Input: Natural language
Unstructured text, Web pages, Speech
Output: Structured information
Insights from natural language
A transformed version of natural language (e.g. summarization, translation)
Question Answering (Jeopardy! game)
IBM Watson
Clue: On Sept. 1, 1715 Louis XIV died in this city, site of a fabulous palace he built.
Answer: Versailles
Spam Classification
Black Friday Begins ==>>
Mysterical Money link inside..
Killer Mind Control secrets
Let’s meet tomorrow!
Language Technologies
Mostly solved: Spam detection, Part-of-speech tagging, Named Entity Recognition
Making good progress: Sentiment analysis, Coreference resolution, Word sense disambiguation, Parsing, Machine Translation, Information Extraction
Still very hard: Question Answering (QA), Paraphrase detection, Summarization, Dialog
Part-of-speech tagging
Colorless green ideas sleep furiously.
ADJ ADJ NOUN VERB ADV
Named Entity Recognition
● Identifying Person, Location, Organisation
Einstein met with UN officials in Princeton
PERSON ORG LOC
Common entity types:
● Person names
● Organizations (companies, government organisations, committees, etc.)
● Locations (cities, countries, rivers, etc.)
● Date and time expressions
● Other common types: measures (percent, money, weight, etc.), email addresses, Web addresses, street addresses
● Some domain-specific entities: names of drugs, medical conditions, names of ships, bibliographic references, etc.
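Some of the entity types listed above (email addresses, Web addresses, percentages) have a regular enough shape that simple patterns can find them. The sketch below is only an illustration: the three regular expressions and the name find_entities are assumptions made for this example, and person, organization and location names generally need a trained model rather than patterns like these.

```python
import re

# Illustrative patterns only; real-world email and URL formats are much messier.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "URL": re.compile(r"https?://\S+"),
    "PERCENT": re.compile(r"\d+(?:\.\d+)?%"),
}

def find_entities(text):
    """Return (label, matched text) pairs in the order they appear in the text."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    # Sort by start offset so the output follows reading order.
    return [(label, value) for _, label, value in sorted(found)]

print(find_entities("Mail bob@example.com or see https://example.com for 20% off"))
```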
Coreference resolution: Linking entities
Carter told Mubarak he shouldn’t run again.
Sentiment analysis
the camera really takes good images and you would not be
left to desire more.
the camera was already "used"
Worste after receiving product after 4 days long lens zoomin
not working
it supurib
Word Sense disambiguation
I need new batteries for my mouse.
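One classic way to pick a sense is to compare the words around the ambiguous word with a short definition (gloss) of each sense and choose the sense with the most words in common, in the spirit of the simplified Lesk algorithm. The sketch below applies that idea to the sentence above. The two glosses, the stopword list and the function name disambiguate are made up for this illustration and do not come from any real dictionary.

```python
import re

# Hypothetical glosses written for this example, not taken from a real lexicon.
SENSES = {
    "mouse": {
        "animal": "small rodent with a long tail that lives in fields and houses",
        "device": "hand-held computer pointing device that may need batteries or a cable",
    }
}
STOPWORDS = {"i", "a", "the", "for", "my", "with", "that", "or", "and", "in"}

def words(text):
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def disambiguate(word, sentence):
    """Pick the sense whose gloss shares the most words with the sentence."""
    context = words(sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(disambiguate("mouse", "I need new batteries for my mouse."))
```

With these toy glosses the example sentence shares "need" and "batteries" with the device gloss and nothing with the animal gloss, so the device sense wins. Real systems depend heavily on the quality and coverage of the glosses.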
Text Summarization
● Condense longer documents into their most important sentences
● Needs to understand the important keyphrases and concepts in the document
● Two approaches: extractive (select existing sentences) and abstractive (generate new text)
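A minimal sketch of the extractive approach, assuming that a sentence is important when it contains words that are frequent in the document. The stopword list, the sentence splitter and the name summarize are simplifications chosen for this example; this is one basic heuristic, not a state-of-the-art method.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "are", "it", "for", "on"}

def content_words(text):
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def summarize(text, n_sentences=1):
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Return the selected sentences in their original order.
    return [s for s in sentences if s in top]

text = "Spam filters read email. Spam filters block spam. The weather is nice."
print(summarize(text, 1))
```

An abstractive summarizer would instead generate new sentences, which needs a language generation model and is not attempted here.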
Paraphrase detection: Writing in other words
XYZ acquired ABC yesterday
ABC has been taken over by XYZ
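The two sentences above mean the same thing but share very few words, which is part of why paraphrase detection is hard. The sketch below measures that surface overlap with Jaccard similarity over word sets. It is a naive baseline used here to show the problem, not a paraphrase detector, and the function name jaccard is just a label for this example.

```python
def jaccard(a, b):
    """Word-set overlap: |A ∩ B| / |A ∪ B| on lowercased, whitespace-split tokens."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

score = jaccard("XYZ acquired ABC yesterday", "ABC has been taken over by XYZ")
print(round(score, 2))
```

Only "XYZ" and "ABC" are shared out of nine distinct words, so the score is 2/9 (about 0.22) even though the sentences are paraphrases. A useful detector has to know that "acquired" and "taken over by" express the same relation.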
Dialog and conversational systems
Where is Doctor Strange playing in Bangalore?
INOX at 7:30 pm. Would you like to book a ticket?
Yes/ Ok/ No
I am booking a ticket for the movie Doctor Strange tomorrow at 7:30 pm for one person, can you confirm?
Yes.
Challenges in natural language
● Ambiguity: No general rules
Teacher Strikes Idle Kids; Hospitals Are Sued by 7 Foot Doctors
● Non-standard English: want 2 go, b4 u
● Segmentation issues: #customerexperience
● Idioms: meaning is not literal, e.g. dark horse
● Neologisms: new words such as unfriended, BFF
● World knowledge:
Mary and Sue are sisters (of each other). Mary and Sue are mothers (but not of each other).
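The segmentation issue above (#customerexperience) can be approached with greedy longest-match segmentation against a word list. The sketch below assumes a tiny hand-made vocabulary; the names segment and VOCAB are invented for this example, and a greedy match can pick the wrong split when a longer word hides the intended one.

```python
def segment(text, vocabulary):
    """Greedy longest-match segmentation; unknown characters become single tokens."""
    result = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocabulary:
                result.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary word starts here: emit one character and move on.
            result.append(text[i])
            i += 1
    return result

# Toy vocabulary, assumed for illustration only.
VOCAB = {"customer", "custom", "experience", "exp"}
print(segment("customerexperience", VOCAB))
```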
Tokenization
● Issues in English tokenization
Finland’s capital: Finland, Finlands, Finland’s ?
what’re, I’m, isn’t: what are, I am, is not
Hewlett-Packard, state-of-the-art, San Francisco, New York: one token or several?
● Acronyms and abbreviations: m.p.h., PhD.
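A minimal sketch of a regex tokenizer that makes one possible set of choices for the cases above: it keeps hyphenated words and dotted acronyms as single tokens and expands a few contractions. The pattern, the contraction table and the name tokenize are assumptions for this example, not a standard. It also keeps Finland's as one token and lowercases expanded contractions, which other tokenizers may do differently.

```python
import re

# Hypothetical contraction table; a real tokenizer would need a much longer list.
CONTRACTIONS = {"what're": "what are", "i'm": "i am", "isn't": "is not"}

# Order matters: dotted acronyms first, then words that may contain hyphens or
# apostrophes, then digit runs, then any single punctuation character.
TOKEN_RE = re.compile(r"(?:[A-Za-z]\.){2,}|[A-Za-z]+(?:[-'][A-Za-z]+)*|\d+|[^\w\s]")

def tokenize(text):
    # Normalize the typographic apostrophe so the pattern only has to handle "'".
    text = text.replace("\u2019", "'")
    tokens = []
    for token in TOKEN_RE.findall(text):
        expanded = CONTRACTIONS.get(token.lower())
        if expanded:
            tokens.extend(expanded.split())
        else:
            tokens.append(token)
    return tokens

print(tokenize("I'm told m.p.h. isn't state-of-the-art."))
```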
Tokenization
● Other languages: Compound Nouns (German)
Lebensversicherungsgesellschaftsangestellter
‘life insurance company employee’
● Japanese: further complicated, since a single sentence can mix several scripts (kanji, hiragana, katakana, Latin letters)
Lemmatization
● Reduce variations to base form
car, cars, car's, cars' to car
organize, organizes, organizing to organize
● Have to find the correct form in a dictionary
● Does this properly, using a vocabulary and the context in which the word is used
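A toy sketch of dictionary-based lemmatization. The lookup is keyed by word form and part of speech to show why context matters: "saw" maps to "see" as a verb but stays "saw" as a noun. The handful of entries and the name lemmatize are made up for this example; real lemmatizers use full vocabularies and morphological analysis.

```python
# Toy lemma dictionary keyed by (word form, part of speech); entries are illustrative only.
LEMMAS = {
    ("cars", "NOUN"): "car",
    ("organizes", "VERB"): "organize",
    ("organizing", "VERB"): "organize",
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
}

def lemmatize(word, pos):
    # Fall back to the lowercased word when the dictionary has no entry.
    key = (word.lower(), pos)
    return LEMMAS.get(key, word.lower())

print(lemmatize("organizing", "VERB"), lemmatize("saw", "VERB"), lemmatize("saw", "NOUN"))
```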
Stemming
● Reduce words in a crude manner
automate(s), automatic, automation to automat
● Rule-based
SSES to SS, e.g. caresses to caress
● Chops off the ends of words
● The stem is not always a real word, so the end user cannot always interpret it
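The SSES to SS rule above is the first rule of step 1a of the Porter stemmer. The sketch below implements only that step (four suffix rules, first match wins), so it will not reduce "automation" to "automat"; the later Porter steps that handle such suffixes are left out. The function name stem_step_1a is chosen for this example.

```python
# Porter step 1a suffix rules, tried in order; the first matching suffix wins.
STEP_1A = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]

def stem_step_1a(word):
    word = word.lower()
    for suffix, replacement in STEP_1A:
        if word.endswith(suffix):
            # Strip the suffix and append its replacement.
            return word[: len(word) - len(suffix)] + replacement
    return word

for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", stem_step_1a(w))
```

The result for "ponies" is "poni", which illustrates the last bullet: a stem does not have to be a real word.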
Demo!
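NLP in 3 lines: spaCy
● Python package with consistent API
● Very fast, accurate and easy
import spacy  # the slide's original `from spacy.en import English` no longer exists in current spaCy
engine = spacy.load("en_core_web_sm")  # assumes the model is installed: python -m spacy download en_core_web_sm
parsedDoc = engine("document")  # a Doc object with tokens and their annotations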

