David Talby
@davidtalby
CTO, Atigeo
SEMANTIC NATURAL LANGUAGE UNDERSTANDING
WITH SPARK, UIMA & MACHINE-LEARNED ONTOLOGIES
Claudiu Branzan
@melcutz
Principal Lead, Atigeo
THE PROBLEM
Who needs to
be vaccinated?
Who fits this
clinical trial?
Who is at risk
for sepsis?
Who is getting
meds they’re
allergic to?
Who on this protocol
did not have this
side effect?
AT THE BEGINNING, THERE WAS SEARCH
Scalable & robust Indexing pipeline
Tokenizers & analyzers
Synonyms, spellers & Auto-suggest
File formats & header boosting
Rankers, link & reputation boosting
THEN THERE WAS SEMANTIC SEARCH
“cheap red prom dresses”
“laptops under $500”
“italian restaurants near me that deliver”
“captain america civil war tonight”
“nba scores”
Dictionary Based Attribute Extraction
Dell - XPS 15.6 4K Ultra HD Touch-Screen
Laptop - Intel Core i5 - 8GB Memory -
256GB Solid State Drive - Silver
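A minimal sketch of dictionary-based attribute extraction applied to the product title above. The dictionary contents and the names `ATTRIBUTE_DICTIONARY` and `extract_attributes` are assumptions made for illustration; they are not taken from the talk, and a production system would load its terms from a curated catalog or ontology.

```python
import re

# Hypothetical attribute dictionaries, hard-coded here for illustration only.
ATTRIBUTE_DICTIONARY = {
    "brand": ["dell", "hp", "lenovo"],
    "color": ["silver", "black", "red"],
    "processor": ["intel core i5", "intel core i7"],
    "storage_type": ["solid state drive", "hard drive"],
}


def extract_attributes(title):
    """Return {attribute: term} for the first dictionary term found per attribute."""
    text = title.lower()
    found = {}
    for attribute, terms in ATTRIBUTE_DICTIONARY.items():
        for term in terms:
            # Match whole words only, so "red" does not match inside "stored".
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                found[attribute] = term
                break
    return found


title = ("Dell - XPS 15.6 4K Ultra HD Touch-Screen Laptop - Intel Core i5 - "
         "8GB Memory - 256GB Solid State Drive - Silver")
attributes = extract_attributes(title)
print(attributes)
```

Lookup of this kind works when the attribute vocabulary is closed and known in advance. Free text such as the restaurant review that follows has no such fixed vocabulary.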
Machine Learned Attribute Extraction
If you go for the ambience, you'll be
disappointed. If you go for good,
inexpensive and authentic Mexican food,
then you're in the right place.
AND THEN, YOU NEED TO UNDERSTAND LANGUAGE
Prescribing sick days due to diagnosis of influenza. [Positive]
Jane complains about flu-like symptoms. [Speculative]
Jane may be experiencing some sort of flu episode. [Possible]
Jane’s RIDT came back negative for influenza. [Negative]
Jane is at high risk for flu if she’s not vaccinated. [Conditional]
Jane’s older brother had the flu last month. [Family history]
Jane had a severe case of flu last year. [Patient history]
LANGUAGE GETS COMPLEX & DOMAIN SPECIFIC
Joe expressed concerns about the risks of bird flu. [Nothing]
Joe shows no signs of stroke, except for numbness. [Double Negative]
Nausea, vomiting and ankle swelling negative. [Compound]
(it gets worse – in reality a lot of text isn’t valid English)
Patient denies alcohol abuse. [Speculative]
Allergies: Penicillin, Dust, Sneezing. [Compound]
LET’S BUILD THIS!
The input
(patient records)
The processing
framework
The output
The query engines
SENTENCE DETECTION
SECTION DETECTION
TOKENIZER LEMMATIZER
STOPWORD REMOVAL
NEGATION DETECTION
CONDITIONAL SCOPE
SPECULATIVE SCOPE
DATE NUMBER UNIT QUANTITY
CONCEPT EXTRACTION
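A rough sketch of how a few of the stages listed above could be chained: sentence detection, tokenization, stopword removal, negation detection and concept extraction. It is plain Python written for illustration, not the UIMA annotators used in the talk. The word lists and the function names (`detect_sentences`, `tokenize`, `annotate`) are assumptions, and lemmatization, section detection and the other stages are left out.

```python
import re

# Tiny hypothetical word lists; real pipelines use much larger resources.
STOPWORDS = {"the", "a", "an", "of", "for", "is", "and", "about"}
NEGATION_TRIGGERS = {"no", "not", "denies", "without", "negative"}
SCOPE_TERMINATORS = {"but", "except", "however"}
CONCEPTS = {"influenza", "flu", "stroke", "numbness", "nausea"}


def detect_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase alphanumeric runs; "flu-like" becomes "flu", "like".
    return re.findall(r"[a-z0-9]+", sentence.lower())


def annotate(text):
    """Return (concept, status) pairs, where status is 'negated' or 'affirmed'."""
    annotations = []
    for sentence in detect_sentences(text):
        tokens = [t for t in tokenize(sentence) if t not in STOPWORDS]
        negated = False  # negation scope never crosses a sentence boundary
        for token in tokens:
            if token in NEGATION_TRIGGERS:
                negated = True       # open a negation scope
            elif token in SCOPE_TERMINATORS:
                negated = False      # words like "except" close the scope
            elif token in CONCEPTS:
                annotations.append((token, "negated" if negated else "affirmed"))
    return annotations


note = ("Joe shows no signs of stroke, except for numbness. "
        "Jane complains about flu-like symptoms.")
result = annotate(note)
print(result)
```

A trigger word opens a negation scope that lasts until a terminator or the end of the sentence, which is why "stroke" comes out negated while "numbness" does not. This forward-only scope would miss a trailing cue such as "ankle swelling negative", where the trigger follows the concepts it negates.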
First Demo: Annotators & Assertions
MACHINE LEARNED ANNOTATORS
Grammatical Patterns
If … then …
Direct Inferences
Age < 18 ==> Child
Lookups
RIDT (lab test)
Under-diagnosed conditions
Flu, Depression
Implied by Context
relevant labs normal
Sometimes, it’s easier to just code an annotation’s business logic
But sometimes it’s easier to learn it from examples:
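The contrast can be sketched in a few lines. The first function hand-codes the "Age < 18 ==> Child" inference from the slide. The second part learns an assertion label from labeled sentences using a small multinomial Naive Bayes classifier. The classifier choice, the class name `NaiveBayesAnnotator` and the six training sentences are assumptions made for this sketch; the talk does not state which learning algorithm its demo used.

```python
import math
import re
from collections import Counter, defaultdict


def annotate_child(age):
    # Hand-coded business logic from the slide: Age < 18 ==> Child.
    return "Child" if age < 18 else None


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesAnnotator:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()
        self.vocabulary = set()

    def fit(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for word in tokens(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
            for word in tokens(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training examples, invented for illustration.
EXAMPLES = [
    ("Patient may have influenza", "Speculative"),
    ("Possible flu episode suspected", "Speculative"),
    ("Symptoms might suggest pneumonia", "Speculative"),
    ("Diagnosis of influenza confirmed", "Positive"),
    ("Patient diagnosed with pneumonia", "Positive"),
    ("Flu confirmed by lab test", "Positive"),
]

annotator = NaiveBayesAnnotator().fit(EXAMPLES)
print(annotate_child(12))
print(annotator.predict("Jane may be experiencing some sort of flu episode"))
```

The rule is one line because the logic is known up front. The learned annotator needs no rule for words like "may" or "possible"; it picks them up from the examples, at the cost of needing labeled data.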
Second Demo: Machine Learned Annotator
WHAT ABOUT EXPANDING & UPDATING ONTOLOGIES?
Word2Vec
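One way word embeddings can help expand an ontology is to rank words that are missing from it by their cosine similarity to a known concept, then hand the top candidates to a curator. The sketch below uses hand-written three-dimensional toy vectors in place of a trained Word2Vec model, so the numbers carry no meaning. `EMBEDDINGS`, `suggest_synonyms` and the 0.95 threshold are assumptions made for illustration, not the method shown in the demo.

```python
import math

# Toy vectors standing in for trained Word2Vec embeddings (invented values).
EMBEDDINGS = {
    "influenza": [0.90, 0.10, 0.05],
    "flu": [0.88, 0.12, 0.07],
    "grippe": [0.85, 0.15, 0.10],
    "sepsis": [0.10, 0.90, 0.20],
    "penicillin": [0.05, 0.20, 0.95],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def suggest_synonyms(concept, ontology_terms, threshold=0.95):
    """Rank words outside the ontology by similarity to a known concept."""
    anchor = EMBEDDINGS[concept]
    candidates = [
        (word, cosine(anchor, vector))
        for word, vector in EMBEDDINGS.items()
        if word not in ontology_terms
    ]
    # Keep only close neighbours, most similar first.
    close = [(word, score) for word, score in candidates if score >= threshold]
    return sorted(close, key=lambda pair: pair[1], reverse=True)


ontology = {"influenza", "sepsis", "penicillin"}
suggestions = suggest_synonyms("influenza", ontology)
for word, score in suggestions:
    print(word, round(score, 3))
```

With real embeddings the threshold would need tuning, and suggestions are best treated as candidates for human review, since nearby vectors can be related terms rather than true synonyms.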
LET’S BUILD THIS TOO!
Third Demo: Ontology Enrichment
SUMMARY & APPLICATIONS
Who needs to
be vaccinated?
Who fits this
clinical trial?
Who is at risk
for sepsis?
Who is getting
meds they’re
allergic to?
Who on this protocol
did not have this
side effect?
@Atigeo
@melcutz
@davidtalby
© 2015 Atigeo, Corporation. All rights reserved. Atigeo and the xPatterns logo are trademarks of Atigeo. The information herein is for informational purposes only and represents the current view of Atigeo as of the date of this presentation. Because Atigeo
must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Atigeo, and Atigeo cannot guarantee the accuracy of any information provided after the date of this presentation. ATIGEO MAKES NO WARRANTIES,
EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
APPENDIX
In case the live demo gets cold feet on stage
