How computers understand text content
a presentation for the Auckland content strategy meetup
by Anna Divoli
@annadivoli
Ph.D. in Biomedical Text Mining | Text Analytics Researcher | Head of R&D at Pingar
Who am I?
• 14 years in academia + 4 years in industry
• academically exposed to different disciplines:
biomedicine, bioinformatics,
computational linguistics, information retrieval,
information extraction, semantic technologies,
human-computer interaction, search user interface usability,
knowledge acquisition, visualizations
• lived in different countries:
Greece, UK, US, NZ
• learned English as a second language
(hint: I empathize with computer systems)
Anna Divoli Auckland content strategy meetup Aug 2015
Who are you?
• Marketing?
• Digital content?
• Information Architecture?
• Journalists?
• UX?
• Business Analysis?
• Software Development?
• CS research (incl. “text” people)?
• Other?
What is “text”? Where is it?
www.nailingit.com/images/websites.jpg
www.bu.edu/today/files/2012/10/t_journals1.jpg
web.clarku.edu/offices/its/images/filepile.jpg
www.flickr.com/photos/jlconfor/14191286471
Human – Text Content Interaction
Humans:
Slow, Inconsistent, Expensive
Text content:
Overwhelmingly fast-growing,
Disseminated across multiple sources
NLP ∈ Artificial Intelligence
(Diagram: Machine Learning, NLP, Computational Linguistics, Applied Text Analytics, Storage, Memory, Security, Friendly UIs, Visualizations)
So, what’s in the text?
• Entities
• Facts
• Relations
• Themes/topics
• Opinions & sentiment
• …
+ Time/Location dimensions:
• Trends & paradigm shifts
• Networks
• …
Named Entity Recognition
Find and classify names…
S. Arlington initiated partnership discussions during his visit to
Eureka’s Ltd offices last month.
John Smith went to Washington to see the Smithsonian and also
met up with Virginia for a coffee.
Entity types: People, Locations, Organizations
Methods: lexicon-based (gazetteers)
grammar-based (rule-based)
✓ statistical models (machine learning: algorithms + features)
✓ hybrids
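A minimal Python sketch of the lexicon-based (gazetteer) method. The gazetteer is a hypothetical hand-made list built from the two example sentences, and matching is simply longest-name-first with no overlaps. It also hints at why the ✓ methods matter: a lexicon alone labels “Virginia” as a location even though the sentence is about a person.

```python
import re

# Hypothetical gazetteer: surface name -> entity type. A real system would use
# large curated name lists for each type.
GAZETTEER = {
    "john smith": "PERSON",
    "s. arlington": "PERSON",
    "washington": "LOCATION",
    "virginia": "LOCATION",        # also a first name; a lexicon cannot tell which
    "smithsonian": "ORGANIZATION",
    "eureka's ltd": "ORGANIZATION",
}


def tag_entities(text: str) -> list[tuple[str, str]]:
    """Return (surface text, entity type) pairs in order of appearance."""
    # Lowercase and unify curly apostrophes; for text like this both keep the
    # string length, so match offsets line up with the original `text`.
    lowered = text.lower().replace("\u2019", "'")
    taken = [False] * len(text)
    found = []
    # Longest names first, so a longer entry wins over a shorter one inside it.
    for name in sorted(GAZETTEER, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", lowered):
            if not any(taken[m.start():m.end()]):
                found.append((m.start(), text[m.start():m.end()], GAZETTEER[name]))
                taken[m.start():m.end()] = [True] * (m.end() - m.start())
    return [(surface, label) for _, surface, label in sorted(found)]


print(tag_entities("John Smith went to Washington to see the Smithsonian "
                   "and also met up with Virginia for a coffee."))
```

On the second example sentence this finds John Smith, Washington, the Smithsonian and Virginia, but tags Virginia as LOCATION: exactly the kind of error that context-aware (statistical or hybrid) methods are meant to fix.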
Named Entity Recognition
Find and classify names…
S. Arlington initiated partnership discussions during his visit to
Eureka’s Ltd offices last month.
John Smith went to Washington to see the Smithsonian and also
met up with Virginia for a coffee.
Entity types: People, Locations, Organizations, Dates
Who? Where? When?
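The “When?” layer is a natural fit for the grammar-based (rule-based) method. A minimal sketch, assuming a hypothetical rule set of a few regular expressions: month names with an optional day and year, plus some relative expressions such as “last month”.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Hypothetical rules: "[day] Month [year]" (case-sensitive, so the lowercase
# verb "may" is not mistaken for a month) plus relative expressions (any case).
DATE_RULES = re.compile(
    rf"\b(?:(?:\d{{1,2}} )?(?:{MONTHS})(?: \d{{4}})?"
    r"|(?i:(?:last|next|this) (?:week|month|year)|yesterday|today|tomorrow))\b"
)


def find_dates(text: str) -> list[str]:
    """Return every date expression matched by the rules, left to right."""
    return [m.group(0) for m in DATE_RULES.finditer(text)]


print(find_dates("S. Arlington initiated partnership discussions during his "
                 "visit to Eureka's Ltd offices last month."))
```

Rules like these are precise on the patterns they cover and silent on everything else, which is why they are usually combined with statistical models.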
Disambiguation & Normalization:
Word Sense Disambiguation & Text Normalization
S. Arlington initiated partnership discussions during his visit to
Eureka’s Ltd offices last month.
John Smith went to Washington to see the Smithsonian and also
met up with Virginia for a coffee.
Word Sense Disambiguation: identifying which sense/meaning of a word is used in a sentence when the word has multiple meanings. Watch for synonyms & homonyms. Use context!
Text normalization: transforming text into a single canonical form that it might not have had before.
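“Use context” can be sketched with a simplified Lesk-style overlap: each candidate sense has a signature of words that tend to appear near it, and the sense whose signature shares the most words with the sentence wins. The sense inventory and signatures below are hypothetical and hand-written for the example sentence; Lesk-style methods normally take signatures from dictionary glosses.

```python
import re

STOPWORDS = {"a", "an", "the", "to", "and", "also", "up", "for", "of", "in", "with"}

# Hypothetical sense inventory: target word -> sense -> signature words.
SENSES = {
    "virginia": {
        "PERSON": {"met", "coffee", "lunch", "called", "said", "she", "her", "friend"},
        "US_STATE": {"state", "richmond", "county", "governor", "moved", "lives"},
    },
    "washington": {
        "PERSON": {"george", "president", "general", "born"},
        "CITY": {"dc", "capital", "smithsonian", "museum", "went", "visit"},
        "US_STATE": {"seattle", "olympia", "state", "county"},
    },
}


def disambiguate(sentence: str, target: str) -> str:
    """Pick the sense of `target` whose signature overlaps most with the sentence."""
    target = target.lower()
    context = {w for w in re.findall(r"[a-z]+", sentence.lower())
               if w not in STOPWORDS and w != target}
    overlap = {sense: len(sig & context) for sense, sig in SENSES[target].items()}
    return max(overlap, key=overlap.get)  # on a tie, the first sense listed wins


s = ("John Smith went to Washington to see the Smithsonian "
     "and also met up with Virginia for a coffee.")
print(disambiguate(s, "Virginia"), disambiguate(s, "Washington"))
```

Here “met” and “coffee” pull Virginia towards the person sense, while “Smithsonian” and “went” pull Washington towards the city.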
Word Sense Disambiguation & Text Normalization
S. Arlington initiated partnership discussions during his visit to
Eureka’s Ltd offices last month.
Sam Arlington initiated partnership discussions during his visit to
Eureka offices in July.
John Smith went to Washington to see the Smithsonian and also
met up with Virginia for a coffee.
J. Smith went to Washington DC to see the Smithsonian Institution
and also met up with Virginia Peterson for a coffee.
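A minimal sketch of the normalization step in these examples. It assumes a hypothetical alias table (in practice such mappings come from coreference, knowledge bases or the rest of the document) and an assumed document date in August 2015, which is what turns “last month” into “in July”.

```python
import calendar
import re
from datetime import date

# Hypothetical alias table: variant found in text -> canonical form.
CANONICAL = {
    "S. Arlington": "Sam Arlington",
    "Eureka\u2019s Ltd": "Eureka",
    "Washington": "Washington DC",
    "Smithsonian": "Smithsonian Institution",
    "Virginia": "Virginia Peterson",
}
# One alternation, longest variants first, matched on word boundaries.
ALIASES = re.compile(
    r"\b(?:" + "|".join(re.escape(a) for a in sorted(CANONICAL, key=len, reverse=True)) + r")\b"
)


def previous_month_name(doc_date: date) -> str:
    month = doc_date.month - 1 or 12  # "last month" in January is December
    return calendar.month_name[month]


def normalize(text: str, doc_date: date) -> str:
    # re.sub scans the input once, so replacement text is never matched again.
    text = ALIASES.sub(lambda m: CANONICAL[m.group(0)], text)
    return text.replace("last month", "in " + previous_month_name(doc_date))


print(normalize("S. Arlington initiated partnership discussions during his visit "
                "to Eureka\u2019s Ltd offices last month.", date(2015, 8, 20)))
```

Resolving relative dates needs an anchor date; without one, “last month” cannot be normalized at all.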
Fact & Relationship extraction
S. Arlington initiated partnership discussions during his visit to
Eureka’s Ltd offices last month.
John Smith went to Washington to see the Smithsonian and also
met up with Virginia for a coffee.
What?
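Relation extraction can start from a few hand-written patterns linking two names. A minimal sketch, assuming a hypothetical relation schema (VISITED, TRAVELLED_TO, MET) and a crude capitalized-word pattern for names; real systems typically work on parsed sentences and on entities already found by NER.

```python
import re

# Crude, hypothetical name pattern: capitalized words, allowing initials such
# as "S." and possessives such as "Eureka’s".
NAME = r"[A-Z][\w.\u2019']*(?: [A-Z][\w\u2019']*)*"

# Hypothetical relation schema: one regular expression per relation type.
PATTERNS = [
    ("VISITED", re.compile(rf"(?P<subj>{NAME}) .*?visit to (?P<obj>{NAME})")),
    ("TRAVELLED_TO", re.compile(rf"(?P<subj>{NAME}) went to (?P<obj>{NAME})")),
    ("MET", re.compile(rf"(?P<subj>{NAME}) .*?met up with (?P<obj>{NAME})")),
]


def extract_relations(sentence: str) -> list[tuple[str, str, str]]:
    """Return (subject, relation, object) triples found by the patterns."""
    triples = []
    for relation, pattern in PATTERNS:
        m = pattern.search(sentence)
        if m:
            triples.append((m.group("subj"), relation, m.group("obj")))
    return triples


print(extract_relations("John Smith went to Washington to see the Smithsonian "
                        "and also met up with Virginia for a coffee."))
```

Note how both relations in the second sentence share the subject John Smith, even though “met up with” is several words away from the name.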
Deeper knowledge & Sentiment
S. Arlington initiated partnership discussions during his visit to
Eureka’s Ltd offices last month.
John Smith went to Washington to see the Smithsonian and also
met up with Virginia for a coffee.
How? Why? How do we feel about it?
S. Arlington visited the Eureka’s Ltd offices last month to initiate
partnership discussions.
John Smith was delighted to go to Washington to see the
Smithsonian and also met up with Virginia for a coffee.
Sentiment analysis & opinion mining
Methods:
• Dictionary-based (e.g., LIWC)
• Statistical
• Hybrid
What can be extracted:
• Polarity & strength: Pos | Neu | Neg & score
• Feelings: angry, sad…
• Mood: happy, depressed…
• Aspects: location, cleanliness…
• Who has this sentiment (source): employees, customers…
• What is the target of the sentiment: product, event, person…
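A minimal sketch of the dictionary-based approach with polarity and strength. The mini-lexicon and its scores are hypothetical (not taken from LIWC or any published dictionary), and the score is simply the average strength of the matched words.

```python
import re

# Hypothetical mini-lexicon: word -> sentiment strength in [-3, 3].
LEXICON = {
    "delighted": 3, "love": 3, "happy": 2, "good": 1,
    "sad": -2, "angry": -2, "upset": -2, "crap": -2, "depressed": -3,
}


def polarity(text: str, neutral_band: float = 0.5) -> tuple[str, float]:
    """Return (Pos | Neu | Neg, average strength of the matched lexicon words)."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    score = sum(hits) / len(hits) if hits else 0.0
    if score > neutral_band:
        return "Pos", score
    if score < -neutral_band:
        return "Neg", score
    return "Neu", score


print(polarity("John Smith was delighted to go to Washington to see the "
               "Smithsonian and also met up with Virginia for a coffee."))
```

Given the sarcastic “George W Bush. Love him!”, this scorer returns a confident positive: a pure dictionary lookup has no notion of irony, one reason statistical and hybrid methods are listed alongside it.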
So, what’s in the text?
• Entities
• Facts
• Relations
• Themes/topics → no training or ontologies needed!
can utilize web resources (e.g., Wikipedia)
• Opinions & sentiment
• …
+ Time/Location dimensions:
• Trends & paradigm shifts
• Networks
• …
So, what ELSE is in the text?
• Ambiguity: “I want an apple.”
• Metaphors: “He drowned in a sea of grief.”
• Sarcasm: “George W Bush. Love him!”
• Colloquialism/Slang: “I slept like crap last night.”
• Negation: “I am not sure I want to go to NYC.”
• Hedging: “The results indicate this.”
• Conditional statements: “When it rains I feel sad.”
• Inconsistencies/Bad grammar: “I think your smart.”
• Text speak: “C u l8r @Jacks”
• Anaphora: “John met with Nick. He was upset.”
• Humor: “Did you take a bath today? No. Is one missing?”
Consider: distributed information (dialogue), technical/scientific text,
legal text, creative/poetry…
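Two of the phenomena listed above, negation and hedging, are often handled with simple cue-based heuristics before any sentiment scoring. A minimal sketch, assuming hypothetical cue lists: tokens after a negation cue are prefixed with NOT_ up to the next punctuation mark (a common heuristic in sentiment classification), and hedge cues are simply flagged.

```python
import re

NEGATION_CUES = {"not", "no", "never", "cannot"}            # hypothetical cue list
HEDGE_CUES = {"indicate", "indicates", "suggest", "suggests",
              "may", "might", "possibly", "think"}          # hypothetical cue list
PUNCTUATION = set(".,!?;:")


def mark_negation(text: str) -> list[str]:
    """Prefix tokens inside a negation scope with NOT_; punctuation ends the scope."""
    tokens = re.findall(r"[a-z']+|[.,!?;:]", text.lower())
    marked, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False
            marked.append(tok)
        elif tok in NEGATION_CUES or tok.endswith("n't"):
            negated = True
            marked.append(tok)
        else:
            marked.append("NOT_" + tok if negated else tok)
    return marked


def hedge_cues(text: str) -> list[str]:
    """Return the hedge cue words found in the text."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w in HEDGE_CUES]


print(mark_negation("I am not sure I want to go to NYC."))
print(hedge_cues("The results indicate this."))
```

A downstream scorer can then treat NOT_sure differently from sure; the fixed “until punctuation” scope is a rough approximation of real negation scope.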
Human language!
Eye drops off shelf.
Include your children when
baking cookies.
Turn right here.
John saw the man on the
mountain with a telescope.
He gave her cat food.
They are hunting dogs.
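Structural ambiguity like the telescope sentence can be made concrete by counting parse trees. A minimal sketch, assuming a hypothetical toy grammar in Chomsky normal form that covers only these words; it runs the CYK chart-parsing algorithm with counts instead of true/false entries. Because a prepositional phrase may attach either to the verb phrase or to a noun phrase, one word string gets several trees.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form, covering only these words.
BINARY_RULES = [          # (parent, left child, right child)
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb: saw ... with a telescope
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to a noun: the man with a telescope
    ("PP", "P", "NP"),
]
LEXICON = {
    "john": ["NP"], "saw": ["V"], "the": ["Det"], "a": ["Det"],
    "man": ["N"], "mountain": ["N"], "telescope": ["N"],
    "on": ["P"], "with": ["P"],
}


def count_parses(sentence: str) -> int:
    """Count distinct parse trees for the sentence with the CYK algorithm."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    # chart[i][j][X] = number of trees with root X spanning words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON[word]:
            chart[i][i + 1][tag] += 1
    for length in range(2, n + 1):          # shorter spans are always filled first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):       # split point between the two children
                for parent, left, right in BINARY_RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n]["S"]


print(count_parses("John saw the man with a telescope."))
print(count_parses("John saw the man on the mountain with a telescope."))
```

With this toy grammar the short sentence has 2 parses and the slide's sentence has 5; a parser must choose one, which is where statistics (or world knowledge) come in.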
Examples: Biology…
Looking for: interactions between SAF and viral LTR elements
(SAF is a transcription factor, LTR stands for ‘long terminal repeat’)
(Also: SAF = single and free, LTR = long-term relationship)
Gene names:
tinman, lilliputian, dreadlocks, lush,
cheap date, methuselah, Van Gogh,
maggie, brainiac, grim, reaper,
cleopatra, swiss cheese, fucK, out cold,
ken and barbie, kenny, lava lamp,
hamlet, sonic hedgehog, werewolf, half
pint, drop dead, chardonnay, agnostic,
I’m not dead yet…
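Because many gene names are everyday words, a plain dictionary lookup would fire on ordinary text. One minimal mitigation, sketched here under assumptions, accepts a dictionary match only when a domain cue word appears within a few tokens. The gene list is a subset of the names above; the cue words and window size are hypothetical.

```python
import re

# A few gene names from the list above (lowercased) and hypothetical context cues.
GENE_NAMES = {"tinman", "lush", "grim", "reaper", "cheap date",
              "swiss cheese", "sonic hedgehog", "out cold"}
GENE_CUES = {"gene", "genes", "mutant", "mutants", "expression",
             "allele", "protein", "knockout"}


def find_gene_mentions(text: str, window: int = 5) -> list[str]:
    """Return dictionary gene names that have a cue word within `window` tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    mentions = []
    i = 0
    while i < len(tokens):
        for size in (2, 1):  # try two-word names first
            span = tokens[i:i + size]
            candidate = " ".join(span)
            if len(span) == size and candidate in GENE_NAMES:
                context = tokens[max(0, i - window):i] + tokens[i + size:i + size + window]
                if GENE_CUES.intersection(context):
                    mentions.append(candidate)
                i += size - 1  # skip the rest of a multi-word name
                break
        i += 1
    return mentions


print(find_gene_mentions("I had a cheap date last night and felt grim."))
print(find_gene_mentions("Expression of the sonic hedgehog gene was measured."))
```

The first sentence yields no mentions and the second yields “sonic hedgehog”; a cue-word window is crude, but it shows why domain context is part of recognizing names in specialized text.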
Current State of NLP
• Rule-based systems for high-precision results
• Hybrid systems for more robust performance
(rules + dictionaries/ontologies + statistical models)
• Limitation: specialized systems perform better
(much like humans!)
• Workflows offer a work-around for more generic systems,
e.g., check language → check category → choose model
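A minimal sketch of such a workflow. It assumes hypothetical stopword lists for the language check, hypothetical keyword lists for the category check and a hypothetical registry of specialized models (represented here only by names); real systems would use trained language identifiers and classifiers at each step.

```python
import re

# Hypothetical profiles for a toy language check.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "with"},
    "de": {"der", "die", "das", "und", "ist", "mit"},
}
# Hypothetical keyword lists for a toy category check.
CATEGORIES = {
    "biomedical": {"gene", "protein", "transcription", "expression"},
    "legal": {"contract", "clause", "party", "liability"},
}
# Hypothetical registry of specialized models, keyed by (language, category).
MODELS = {
    ("en", "biomedical"): "en-biomedical-ner",
    ("en", "legal"): "en-legal-ner",
    ("en", "general"): "en-general-ner",
    ("de", "general"): "de-general-ner",
}


def best_match(words: set[str], profiles: dict[str, set[str]]) -> tuple[str, int]:
    """Return the profile key sharing the most words with `words`, and that count."""
    scores = {key: len(vocab & words) for key, vocab in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def choose_model(text: str) -> str:
    words = set(re.findall(r"\w+", text.lower()))
    language, _ = best_match(words, STOPWORDS)       # step 1: check language
    category, hits = best_match(words, CATEGORIES)   # step 2: check category
    if hits == 0:
        category = "general"
    # step 3: the most specialized model available, else the language's general one
    return MODELS.get((language, category)) or MODELS.get((language, "general"), "generic-ner")


print(choose_model("The expression of the tinman gene is required for heart formation."))
```

The fallback chain is the design point: when no specialized model exists for a language and category pair, the workflow degrades to a general model instead of failing.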
Examples of applications
(some are very specialized!)
Content Enrichment
Content Inventory
Content Intelligence
pingar.com/discoveryone/
www.youtube.com/watch?v=i9FnMylGQxw
Take home messages
• Machines can do a lot of consistent, fast information
extraction
• Specialization is needed in several fields, but systems can have
internal workflows
• Big data + statistics = magic!
• Always room for improvement
• Information management AND decisions AND predictions
Time for questions and discussion!
https://xkcd.com/1263/
Speaker notes
• Slide 6 (NLP ∈ Artificial Intelligence): We create and consume text!