The	Use	of	NLP	to	
Solve	Problems
Annie	Flippo
11/2/2016
Who	am	I?
Annie	Flippo
Sr.	Data	Scientist
AwesomenessTV	/	Dreamworks	Animation	SKG
Slides	at	bit.ly/acflippo-nlp
Who	is	AwesomenessTV?
We’re a digital content provider for
platforms	including	Hulu,	Netflix,	Roku,	
Verizon	&	YouTube.
Business	Problem
Many	systems	managing	videos	
on	different	platforms
Goals
Develop a method to identify the same
or similar assets across systems:
• Show asset relationships
• Generate a unique ID for in-house apps
Why	use	NLP?
Common goals of Natural Language
Processing include:
1. Document	Similarity	(search	engine	query)
2. Topic	Modeling	(Twitter/Blog	Analysis)
3. Sentiment	Analysis	(movie	or	restaurant	reviews)
Data	Processing
Why	perform	text	processing?
• To get rid of the messiness of free-form text
• To group words with the same
meaning
• To convert text to numeric features
• To model on equivalent numeric features
Data	Processing
Titles	and	descriptions	get	scrubbed
• Remove punctuation, non-ASCII characters, and carriage
returns
• Remove stop words (e.g. it, this, and, that)
• Stem
• Lemmatize
• Tokenize
• Vectorize
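The scrubbing steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code behind these slides; the `scrub` function, the sample title, and the short `STOP_WORDS` set are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"it", "this", "and", "that", "the", "a", "an", "of"}

def scrub(text):
    """Drop non-ASCII characters, carriage returns, punctuation, and stop words."""
    text = text.encode("ascii", errors="ignore").decode("ascii")  # drop non-ASCII
    text = re.sub(r"[\r\n]+", " ", text)         # carriage returns / newlines -> space
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # punctuation -> space
    words = text.lower().split()
    return [w for w in words if w not in STOP_WORDS]

# Made-up title containing curly quotes, an en dash, and a carriage return.
title = "Behind the Scenes:\r\nthe \u201cMaking Of\u201d \u2013 Episode 2!"
print(scrub(title))  # ['behind', 'scenes', 'making', 'episode', '2']
```

Stemming, lemmatizing, tokenizing, and vectorizing would then run on the scrubbed words.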
Stemming
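Reduce each word to its stem (which need not be a real word)
Providing, provider, provided
=> provid
(Exact stems depend on the stemmer: the Porter stemmer, for example, reduces “provision” to “provis”.)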
Argue,	argues,	arguing,	argued	=>	argu
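To make the idea concrete, here is a toy suffix-stripping stemmer. It is a sketch written for this example only (the `SUFFIXES` list and `toy_stem` function are assumptions) and is far cruder than a real algorithm such as Porter's, but it reproduces the “provid” and “argu” stems shown above.

```python
# Toy suffix list, checked in order; a real stemmer applies many more rules.
SUFFIXES = ("ing", "ed", "er", "es", "s", "e")

def toy_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print({toy_stem(w) for w in ["Providing", "provider", "provided"]})      # {'provid'}
print({toy_stem(w) for w in ["Argue", "argues", "arguing", "argued"]})   # {'argu'}
```

Note that this toy version leaves “provision” unchanged, since none of its suffixes match.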
Lemmatize
Map each word to its dictionary form (lemma)
Walk,	walking,	walked	=>	walk
Is,	am,	are	=>	be
Begin,	began,	begun	=>	begin
*Nouns	and	verbs	are	lemmatized	differently.
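A lemmatizer needs to know the part of speech, which a small lookup table can illustrate. The table below is a hand-written assumption covering only the words on this slide plus “meeting”; real lemmatizers rely on a full dictionary.

```python
# Hypothetical lookup keyed by (word, part of speech).
LEMMAS = {
    ("walking", "verb"): "walk",
    ("walked", "verb"): "walk",
    ("is", "verb"): "be",
    ("am", "verb"): "be",
    ("are", "verb"): "be",
    ("began", "verb"): "begin",
    ("begun", "verb"): "begin",
    ("meeting", "verb"): "meet",     # "we are meeting today"
    ("meeting", "noun"): "meeting",  # "the meeting ran long"
}

def lemmatize(word, pos):
    """Return the lemma for (word, pos); fall back to the lowercased word."""
    word = word.lower()
    return LEMMAS.get((word, pos), word)

print(lemmatize("Began", "verb"))    # begin
print(lemmatize("meeting", "verb"))  # meet
print(lemmatize("meeting", "noun"))  # meeting
```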
Tokenize
Split text into words and collect the distinct words from a corpus
“The quick brown fox jumped over the lazy dogs”
becomes (after stemming and lemmatizing)
[‘the’,	‘quick’,	‘brown’,	‘fox’,	‘jump’,	‘over’,	‘lazy’, ‘dog’]
Vectorize
Count how often each distinct word
occurs in the text.
“The	quick	brown	fox	jumped over	the	lazy	dogs”
Tokenized	to:
[‘the’,	‘quick’,	‘brown’,	‘fox’,	‘jump’,	‘over’,	‘lazy’,	‘dog’]
Vectorized	to:
[2,	1,	1,	1,	1,	1,	1,	1]
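The tokenize and vectorize steps for this sentence can be sketched with the standard library. The example assumes the words have already been lowercased, stemmed, and lemmatized, so it starts from the normalized word list; the variable names are illustrative.

```python
from collections import Counter

# Words of the example sentence after normalization ("jumped" -> "jump", "dogs" -> "dog").
words = ["the", "quick", "brown", "fox", "jump", "over", "the", "lazy", "dog"]

# Tokenize: distinct words in first-seen order.
vocabulary = list(dict.fromkeys(words))

# Vectorize: how many times each distinct word occurs.
counts = Counter(words)
vector = [counts[w] for w in vocabulary]

print(vocabulary)  # ['the', 'quick', 'brown', 'fox', 'jump', 'over', 'lazy', 'dog']
print(vector)      # [2, 1, 1, 1, 1, 1, 1, 1]
```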
Bag-of-Words	Comparison
Doc	1: “The	quick	brown	fox	jumped	over	the	lazy	dogs”
Doc	2:	“The	quick	fox	ran	away	from	the	dog”
After processing (including stop-word removal), the corpus attribute vector is:
[‘quick’,	‘brown’,	‘fox’,	‘jump’,	‘over’,	‘lazy’,	
‘dog’,	‘run’,	‘away’, ‘from’]
The two documents vectorize to:
Doc	1:	[1,	1,	1,	1,	1,	1,	1,	0,	0,	0]
Doc	2:	[1,	0,	1,	0,	0,	0,	1,	1,	1,	1]
Sentences	are	transformed	into	numeric	vectors!
[Figure: word labels brown, fox, lazy, run, quick, dog]
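The bag-of-words vectors for Doc 1 and Doc 2 can be reproduced with a short sketch. It assumes both documents have already been scrubbed and normalized (stop word “the” removed, “jumped” -> “jump”, “ran” -> “run”, “dogs” -> “dog”); the names `doc1`, `doc2`, and `to_vector` are illustrative.

```python
# Normalized words for each document.
doc1 = ["quick", "brown", "fox", "jump", "over", "lazy", "dog"]
doc2 = ["quick", "fox", "run", "away", "from", "dog"]

# Corpus attribute vector: distinct words across both documents, first-seen order.
vocabulary = list(dict.fromkeys(doc1 + doc2))

def to_vector(words, vocabulary):
    """Count occurrences of each vocabulary word in one document."""
    return [words.count(term) for term in vocabulary]

vec1 = to_vector(doc1, vocabulary)
vec2 = to_vector(doc2, vocabulary)

print(vocabulary)
print(vec1)  # [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(vec2)  # [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
```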
Similarity	Measure
Cosine similarity measures how close 2 numeric vectors are by the cosine of the angle between them, much like a distance measure between 2 points: 1 means the same direction, and 0 means no words in common.
This problem has just been reduced to simple matrix algebra.
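As a worked example, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below applies it to the Doc 1 and Doc 2 vectors from the bag-of-words comparison; the function name is an assumption, and libraries such as scikit-learn offer equivalent routines.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction to compare
    return dot / (norm_a * norm_b)

doc1 = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
doc2 = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]

# 3 shared words; lengths sqrt(7) and sqrt(6); 3 / sqrt(42)
print(round(cosine_similarity(doc1, doc2), 3))  # 0.463
```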
Bi-Gram	Comparison
Because the same words recur across our videos, bag-of-words
similarity produced many false-positive matches.
One solution is to use bi-grams, where each pair of 2 consecutive
words is extracted as one feature:
“The	quick	brown	fox	jumped	over	the	lazy	dogs”
becomes:
[‘the	quick’,	‘quick	brown’,	‘brown	fox’,	‘fox	jump’,	
‘jump	over’,	‘over	the’,	‘the	lazy’,	‘lazy	dog’]
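Bi-gram extraction amounts to pairing each word with the one that follows it. The sketch below assumes the sentence has already been normalized (and, as in the example above, that stop words are kept); the variable names are illustrative.

```python
# Normalized words of the example sentence, stop words kept.
words = ["the", "quick", "brown", "fox", "jump", "over", "the", "lazy", "dog"]

# Pair every word with its successor and join each pair into one feature.
bigrams = [" ".join(pair) for pair in zip(words, words[1:])]

print(bigrams)
# ['the quick', 'quick brown', 'brown fox', 'fox jump',
#  'jump over', 'over the', 'the lazy', 'lazy dog']
```

These bi-gram features can then be counted and compared with cosine similarity in the same way as single words.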
Limitations
Certain	phrases	such	as	“Behind	the	scenes”	are	
found	frequently.		This	creates	an	artificially	high	
similarity	score	even	if	the	videos	are	dissimilar.
Possible	solutions:
• Perform	more	custom	data	scrubbing
• Double-check by matching the durations of the videos
• Have	the	matches	verified	by	a	human
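The duration double-check could look something like the sketch below. The function name, the 0.8 similarity threshold, and the 2-second tolerance are all hypothetical values chosen for illustration, not figures from this talk.

```python
def is_probable_match(similarity, duration_a_sec, duration_b_sec,
                      min_similarity=0.8, max_duration_diff_sec=2.0):
    """Accept a text match only if the two video durations also roughly agree.

    Both thresholds are hypothetical defaults and would need tuning on real data.
    """
    close_in_text = similarity >= min_similarity
    close_in_length = abs(duration_a_sec - duration_b_sec) <= max_duration_diff_sec
    return close_in_text and close_in_length

print(is_probable_match(0.92, 125.0, 125.4))  # True: similar text, similar length
print(is_probable_match(0.92, 125.0, 30.0))   # False: similar text, very different length
```

A pair that passes the text check but fails the duration check may still be related (a trailer or promo, for instance), so it could be routed to a human for review rather than discarded.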
Conclusion
I	use	Natural	Language	Processing	to:
1. Identify	similar	videos	across	platforms
2. Tie	assets	together	where	some	are	identical	videos	
while	others	are	derived	videos	(such	as	trailers	or	
promos).
Thank	You!
Annie	Flippo @ACflippo
Slides	and	code	are	available	at
bit.ly/acflippo-nlp
