NATURAL LANGUAGE UNDERSTANDING
WITH MACHINE LEARNED ANNOTATORS &
DEEP LEARNED ONTOLOGIES AT SCALE
Dr. David Talby
The problem
Who needs to be vaccinated?
Who fits this clinical trial?
Who is at risk for sepsis?
Who is getting meds they’re allergic to?
Who on this protocol did not have this side effect?
At the beginning, there was search
Scalable & robust Indexing pipeline
Tokenizers & analyzers
Synonyms, spellers & Auto-suggest
File formats & header boosting
Rankers, link & reputation boosting
Then there was semantic search
“cheap red prom dresses”
“laptops under $500”
“italian restaurants near me that deliver”
“captain america civil war tonight”
“nba scores”
Dictionary Based Attribute Extraction
Dell - XPS 15.6 4K Ultra HD Touch-Screen Laptop - Intel Core i5 - 8GB Memory - 256GB Solid State Drive - Silver
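A minimal sketch of dictionary-based attribute extraction applied to the product title above. The dictionary contents and the function name extract_attributes are assumptions made for illustration; a real catalog dictionary would be much larger and is not described in the slides.

```python
import re

# Hypothetical attribute dictionaries (illustrative only).
ATTRIBUTE_DICTIONARY = {
    "brand": ["Dell", "HP", "Lenovo"],
    "color": ["Silver", "Black", "White"],
    "processor": ["Intel Core i5", "Intel Core i7"],
}


def extract_attributes(title):
    """Return {attribute: value} for the first dictionary value found per attribute."""
    found = {}
    for attribute, values in ATTRIBUTE_DICTIONARY.items():
        for value in values:
            # Whole-phrase, case-insensitive match against the title.
            pattern = r"\b" + re.escape(value) + r"\b"
            if re.search(pattern, title, re.IGNORECASE):
                found[attribute] = value
                break
    return found


title = ("Dell - XPS 15.6 4K Ultra HD Touch-Screen Laptop - Intel Core i5 - "
         "8GB Memory - 256GB Solid State Drive - Silver")
attributes = extract_attributes(title)
```

This works when values come from a closed list, as with brands or colors. Free-form review text, like the restaurant example that follows, has no such list to look up.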
Machine Learned Attribute Extraction
If you go for the ambience, you'll be disappointed. If you go for good, inexpensive and authentic Mexican food, then you're in the right place.
Then, you need to understand language
Prescribing sick days due to diagnosis of influenza. Positive
Jane complains about flu-like symptoms. Speculative
Jane may be experiencing some sort of flu episode. Possible
Jane’s RIDT came back negative for influenza. Negative
Jane is at high risk for flu if she’s not vaccinated. Conditional
Jane’s older brother had the flu last month. Family history
Jane had a severe case of flu last year. Patient history
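One naive way to assign the labels above is an ordered list of cue phrases, sketched below. The cue lists, the rule order and the function name assertion_status are assumptions chosen to fit these seven sentences; they are not a clinical lexicon and not the method used in the talk.

```python
# Ordered (label, cue phrases) rules; the first rule with a matching cue wins.
# Cue lists are illustrative assumptions only.
RULES = [
    ("Family history", ["brother", "sister", "mother", "father"]),
    ("Patient history", ["last year", "years ago", "history of"]),
    ("Conditional", [" if "]),
    ("Negative", ["negative for", "no signs of", "denies"]),
    ("Possible", ["may be", "might be"]),
    ("Speculative", ["complains about", "-like"]),
]


def assertion_status(sentence):
    """Return the first matching label, or 'Positive' when no cue is found."""
    text = " " + sentence.lower() + " "
    for label, cues in RULES:
        if any(cue in text for cue in cues):
            return label
    return "Positive"


examples = [
    "Prescribing sick days due to diagnosis of influenza.",
    "Jane complains about flu-like symptoms.",
    "Jane may be experiencing some sort of flu episode.",
    "Jane's RIDT came back negative for influenza.",
    "Jane is at high risk for flu if she's not vaccinated.",
    "Jane's older brother had the flu last month.",
    "Jane had a severe case of flu last year.",
]
labels = [assertion_status(s) for s in examples]
```

Substring cues like these are brittle: any sentence phrased differently from the examples falls through to the default label.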
1. Language gets complex and domain specific
Human language is wonderfully nuanced
Joe expressed concerns about the risks of bird flu. Nothing
Joe shows no signs of stroke, except for numbness. Double Negative
Nausea, vomiting and ankle swelling negative. Compound
Patient denies alcohol abuse. Speculative
Allergies: Penicillin, Dust, Sneezing. Compound
(it gets worse – in reality a lot of text isn’t valid English)
Let’s build this!
The input (patient records)
The processing framework
The output
The query engines
SENTENCE DETECTION
SECTION DETECTION
TOKENIZER
LEMMATIZER
STOPWORD REMOVAL
NEGATION DETECTION
CONDITIONAL SCOPE
SPECULATIVE SCOPE
DATE
NUMBER
UNIT
QUANTITY
CONCEPT EXTRACTION
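The stages above can be thought of as annotators chained over a shared document. Below is a minimal sketch of that idea covering four of the stages. The dict-based document, the function names, and the tiny stopword and negation lists are all assumptions for illustration; the slides do not show the actual framework code.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "for", "is", "and"}  # tiny assumed list
NEGATION_CUES = {"no", "not", "denies", "negative"}       # assumed cue words


def detect_sentences(doc):
    # Split after sentence-ending punctuation followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", doc["text"])
    doc["sentences"] = [p.strip() for p in parts if p.strip()]
    return doc


def tokenize(doc):
    # Lowercased alphanumeric tokens, one list per sentence.
    doc["tokens"] = [re.findall(r"[a-z0-9]+", s.lower()) for s in doc["sentences"]]
    return doc


def remove_stopwords(doc):
    doc["content_tokens"] = [
        [t for t in sent if t not in STOPWORDS] for sent in doc["tokens"]
    ]
    return doc


def detect_negation(doc):
    # Flag a sentence as negated when it contains any cue word.
    doc["negated"] = [any(t in NEGATION_CUES for t in sent) for sent in doc["tokens"]]
    return doc


PIPELINE = [detect_sentences, tokenize, remove_stopwords, detect_negation]


def run_pipeline(text):
    """Run every stage in order; each stage adds its annotations to the doc."""
    doc = {"text": text}
    for stage in PIPELINE:
        doc = stage(doc)
    return doc


result = run_pipeline(
    "Patient denies alcohol abuse. Jane had a severe case of flu last year."
)
```

Each stage reads annotations written by earlier stages, so order matters: negation detection here depends on the tokenizer having run first.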
2. You’ll need machine learning early
Machine learned annotators
Grammatical Patterns: If … then …
Direct Inferences: Age < 18 ==> Child
Lookups: RIDT (lab test)
Under-diagnosed conditions: Flu, Depression
Implied by Context: relevant labs normal
Sometimes, it’s easier to just code an annotation’s business logic
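As a sketch of what coding the business logic directly can look like, here are three hand-written annotators for the grammatical pattern, direct inference and lookup examples listed above. The function names and the one-entry lookup table are hypothetical; only the rules themselves ("If … then …", "Age < 18 ==> Child", "RIDT (lab test)") come from the slide.

```python
import re

# Hypothetical lookup table; the single entry is the example from the slide.
LOOKUPS = {"RIDT": "lab test"}


def annotate_age_group(age):
    """Direct inference: Age < 18 ==> Child. Returns None otherwise."""
    return "Child" if age < 18 else None


def annotate_lookups(text):
    """Return (term, category) pairs for lookup terms that appear as whole words."""
    return [
        (term, category)
        for term, category in LOOKUPS.items()
        if re.search(r"\b" + re.escape(term) + r"\b", text)
    ]


def has_conditional_pattern(text):
    """Grammatical pattern: 'if' followed later in the text by 'then'."""
    return re.search(r"\bif\b.+\bthen\b", text, re.IGNORECASE) is not None


child = annotate_age_group(12)
lab_hits = annotate_lookups("Jane's RIDT came back negative for influenza.")
conditional = has_conditional_pattern(
    "If you go for good, inexpensive and authentic Mexican food, "
    "then you're in the right place."
)
```

Rules like these are cheap to write and easy to audit, which is why they suit annotations whose logic is already known.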
But sometimes it’s easier to learn it from examples:
3. Bootstrap and then expand your vocabulary
Expanding & updating ontologies
Word2Vec
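One common way to use word embeddings for vocabulary expansion is to propose terms whose vectors lie close to terms already in the ontology. The sketch below shows that idea with cosine similarity. The 3-dimensional vectors are made-up toy values, not trained Word2Vec embeddings, and the names expand_terms and threshold are assumptions; the slides do not detail the actual procedure.

```python
import math

# Toy vectors invented for illustration; real embeddings would be trained
# on a domain corpus and have far more dimensions.
EMBEDDINGS = {
    "flu": [0.9, 0.1, 0.0],
    "influenza": [0.85, 0.15, 0.05],
    "grippe": [0.8, 0.2, 0.1],
    "sepsis": [0.1, 0.9, 0.2],
    "penicillin": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def expand_terms(seed_terms, threshold=0.95):
    """Return non-seed words whose similarity to any seed meets the threshold."""
    candidates = set()
    for word, vector in EMBEDDINGS.items():
        if word in seed_terms:
            continue
        if any(cosine(vector, EMBEDDINGS[seed]) >= threshold for seed in seed_terms):
            candidates.add(word)
    return candidates


suggested = expand_terms({"flu"})
```

In practice the suggested terms would be candidates for human review rather than automatic additions, since nearby vectors can also be antonyms or merely related words.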
Let’s build this too!
Summary: How
1. Language gets complex and domain specific
2. You’ll need machine learning early
3. Bootstrap & then expand your vocabulary
Summary: Why
Who needs to be vaccinated?
Who fits this clinical trial?
Who is at risk for sepsis?
Thank You!
@davidtalby
