Python + NLTK
Natural Language Processing
Brief History of Python
• Invented in the Netherlands in the early 1990s by Guido van Rossum
• Named after Monty Python
• Open source from the beginning
• Used at Google since the company's early days
• Increasingly popular
• Download: https://www.python.org/downloads/
Naming Rules
• Names are case sensitive and cannot start with a number. They can
contain letters, numbers, and underscores.
bob Bob _bob _2_bob_ bob_2 BoB
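• There are some reserved words (keywords) that cannot be used as names.
In Python 3 they are:
False, None, True, and, as, assert, async, await, break,
class, continue, def, del, elif, else, except, finally, for,
from, global, if, import, in, is, lambda, nonlocal, not, or,
pass, raise, return, try, while, with, yield
• print and exec were reserved words in Python 2 only.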
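The naming rules can be checked from Python itself. The sketch below assumes Python 3 and uses the standard `keyword` module together with `str.isidentifier()`; the helper `is_valid_name` and the extra candidates `2bob` and `class` are hypothetical additions for illustration.

```python
import keyword

def is_valid_name(name):
    """Return True if `name` is a legal identifier and not a reserved word."""
    return name.isidentifier() and not keyword.iskeyword(name)

# The names from the slide, plus two deliberately invalid ones.
candidates = ["bob", "Bob", "_bob", "_2_bob_", "bob_2", "BoB", "2bob", "class"]

for name in candidates:
    print(name, is_valid_name(name))

# Names are case sensitive: these are three different names.
bob, Bob, BoB = 1, 2, 3
print(bob, Bob, BoB)
```

`2bob` is rejected because it starts with a digit, and `class` because it is a reserved word.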
Assignment
• You can assign to multiple names at the same time
>>> x, y = 2, 3
>>> x
2
>>> y
3
• This makes it easy to swap values
>>> x, y = y, x
• Assignments can be chained
>>> a = b = x = 2
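One detail of chained assignment is worth a short sketch: all the names end up bound to the same object. With numbers this is invisible, but with a mutable object such as a list a change made through one name shows up through the other. The variable names here are illustrative only.

```python
# Chained assignment binds every name to the same object.
a = b = []
a.append(1)
print(b)        # [1]
print(a is b)   # True

# With immutable values, rebinding one name leaves the other alone.
x = y = 2
x = x + 1
print(x, y)     # 3 2
```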
if, elif, else
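A minimal sketch of how the three branch keywords fit together. The function name `describe` and its categories are hypothetical, chosen only to show the shape of the statement.

```python
def describe(n):
    """Classify a number using an if / elif / else chain."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"

print(describe(-3))
print(describe(0))
print(describe(7))
```

Only the first branch whose condition is true runs; `else` catches everything that the earlier conditions did not.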
A Code Sample (in IDLE)
x = 10 - 5               # A comment.
y = "Hello"              # Another one.
z = 3.45
if z == 3.45 or y == "Hello":
    x = x + 1
    y = y + " World"     # String concat.
print(x)
print(y)
Array (List)
• array = [2, 54, 5, 7, 8, 9]   # a Python list, the usual array-like type
• list1 = list()                # empty list, same as []
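A short sketch of common list operations, reusing the names `array` and `list1` from the slide. The particular operations shown are an illustrative selection, not a complete tour.

```python
array = [2, 54, 5, 7, 8, 9]
list1 = list()               # empty list

# Indexing starts at 0; negative indices count from the end.
list1.append(array[0])
list1.append(array[-1])

print(len(array))    # 6
print(array[1:3])    # [54, 5]
print(list1)         # [2, 9]
```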
NLTK
• NLTK (Natural Language Toolkit) is a widely used Python library
for NLP (Natural Language Processing)
• Install it with: pip install nltk
Tokenization
• Tokenization is a way of separating a piece of text into
smaller units called tokens.
• Example sentence: "Never give up"
• It has 3 tokens: "Never", "give", "up"
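from nltk import word_tokenize, sent_tokenize

# Both tokenizers need NLTK's Punkt data, fetched once with nltk.download()
# (the resource is named 'punkt' or 'punkt_tab', depending on the NLTK version).
sent = ("I will walk 500 miles and I would walk 500 more, just to be the "
        "man who walks a thousand miles to fall down at your door!")
print(word_tokenize(sent))   # list of word and punctuation tokens
print(sent_tokenize(sent))   # list of sentences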
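To make the idea of a token concrete without any downloads, here is a rough regex-based sketch. It is not how NLTK's tokenizers work internally; the functions `simple_word_tokenize` and `simple_sent_tokenize` are hypothetical stand-ins that only approximate the output on simple English text.

```python
import re

def simple_word_tokenize(text):
    """Split text into words and single punctuation marks."""
    # A word (optionally with an internal apostrophe), or one non-space symbol.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

def simple_sent_tokenize(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

text = "Never give up. Try again!"
print(simple_word_tokenize(text))
print(simple_sent_tokenize(text))
```

A rule this simple breaks on abbreviations such as "Dr." or "e.g.", which is one reason to prefer a trained tokenizer like the one NLTK provides.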
Stop words
• Stop words are words that are very common in text
documents and usually carry little meaning on their own
• Examples: a, an, the, you, your, etc.
• Exercise: print all the stop words in English (stopwords.words('english'))
Stop Word Removal
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Requires the 'stopwords' corpus: nltk.download('stopwords')
example_sent = "early symptoms of the coronavirus"
stop_words = set(stopwords.words('english'))   # a set gives fast membership tests
word_tokens = word_tokenize(example_sent)
# Keep only the tokens that are not stop words
filtered_sentence = [w for w in word_tokens if w not in stop_words]
Stop Word Removal (continued)
# The same filter written as an explicit loop
filtered_sentence = []
for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)
print(word_tokens)
print(filtered_sentence)
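NLTK's English stop word list is all lowercase, so a capitalized token such as "The" passes through the filter above unchanged. The sketch below shows one way to handle that by lowercasing each token only for the comparison. It uses a small hand-picked stop list and `str.split()` as stand-ins for NLTK's list and tokenizer; `STOP_WORDS` and `remove_stop_words` are hypothetical names.

```python
# Hypothetical, hand-picked stop list; NLTK's English list is much longer.
STOP_WORDS = {"a", "an", "the", "of", "you", "your"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop stop words, comparing case-insensitively but keeping original case."""
    return [t for t in tokens if t.lower() not in stop_words]

tokens = "Early symptoms of The coronavirus".split()
print(remove_stop_words(tokens))   # ['Early', 'symptoms', 'coronavirus']
```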
Stemming
• Stemming is the process of reducing inflected or derived words to a
common root/base form (the stem).
• Stemming is used in information retrieval systems such as
search engines.
• It is also used to determine domain vocabularies in domain
analysis.
• Examples of words that stem to the root word "like":
-> "likes"
-> "liked"
-> "likely"
-> "liking"
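The idea can be sketched with a toy suffix-stripping function. This is an illustration only and not the algorithm used by real stemmers such as NLTK's `PorterStemmer` (in `nltk.stem`), which apply many more rules. The suffix list and the name `toy_stem` are hypothetical, chosen so that the "like" family collapses to a single stem.

```python
# Hand-picked suffixes, tried in order; an assumption made for this example only.
SUFFIXES = ("ing", "ely", "ed", "es", "e", "s")

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["like", "likes", "liked", "likely", "liking"]:
    print(w, "->", toy_stem(w))
```

Every form maps to the same stem, `lik`. A stem does not have to be a dictionary word; what matters for search and indexing is that related forms end up with the same key.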


Python computer science technology .pptx
