Apache cTAKES
NLP in Healthcare
Alex Zbarcea (FannieMae / cTAKES committer)
Episode of Care
medications
imaging
pathology
inpatient services and procedures
outpatient services and procedures
research
EMR notes
Natural Language Processing (NLP)
“A way for computers to analyze, understand and derive
meaning from human language” - algorithmia [1]
[1] - https://blog.algorithmia.com/introduction-natural-language-processing-nlp/
● Feasibility
Big Data / Machine Learning / Apache Projects
● Challenges
Ontology / Specialization / Anonymization
How it works
● Approaches:
Extraction
Generation
● Algorithms:
Rule-based
Machine Learning
● Linguistic annotations:
Penn Treebank [1]
GENIA [2]
corpus
[1] - https://www.clips.uantwerpen.be/pages/mbsp-tags
[2] - https://orbit.nlm.nih.gov/browse-repository/dataset/human-annotated/83-genia-corpus
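A rule-based extractor can be as small as one pattern. The sketch below (Java 17, to match cTAKES's implementation language) pulls drug-plus-dose mentions such as "aspirin 81 mg" out of a note with a single regular expression. The class name `DoseExtractor`, the pattern and the unit list are illustrative assumptions, not cTAKES code. A rule like this is brittle (it takes any word before a number as the drug name), which is why clinical pipelines pair rules with dictionaries and machine-learned models.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoseExtractor {

    // One extracted mention: drug word, numeric amount, unit, and character offsets.
    record Mention(String drug, String amount, String unit, int begin, int end) {}

    // Hypothetical rule: a word, whitespace, a number (optionally decimal), then a dose unit.
    private static final Pattern DOSE = Pattern.compile(
            "\\b([A-Za-z]+)\\s+(\\d+(?:\\.\\d+)?)\\s*(mg|mcg|g|ml|units?)\\b",
            Pattern.CASE_INSENSITIVE);

    static List<Mention> extract(String text) {
        List<Mention> mentions = new ArrayList<>();
        Matcher m = DOSE.matcher(text);
        while (m.find()) {
            // Units are lower-cased so "MG" and "mg" compare equal downstream.
            mentions.add(new Mention(m.group(1), m.group(2), m.group(3).toLowerCase(),
                    m.start(), m.end()));
        }
        return mentions;
    }

    public static void main(String[] args) {
        String note = "Patient started on aspirin 81 mg daily and metoprolol 25 mg twice a day.";
        for (Mention mention : extract(note)) {
            System.out.println(mention);
        }
    }
}
```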
Apache cTAKES: Overview
Input: plain text, CDA
Named entities:
* drug
* disease/disorder
* sign/symptom
* anatomical site
* procedures
Pipeline-based, combining techniques:
● Rule-based
● Machine Learning (ML)
Java, modular
Measurable performance (standard)
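cTAKES is built on Apache UIMA: each annotator reads from and writes to a shared analysis structure, so modules can be added or swapped independently. The following is a minimal stand-alone sketch of that idea, not the UIMA or cTAKES API; `MiniPipeline`, `Document`, `Annotator` and the two rule-based steps are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiniPipeline {

    // A typed span over the document text, loosely analogous to a UIMA annotation.
    record Annotation(String type, int begin, int end) {}

    // Shared container every annotator reads from and writes to (a stand-in for the CAS).
    static final class Document {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();

        Document(String text) {
            this.text = text;
        }

        // Returns a snapshot of all annotations of the given type, in insertion order.
        List<Annotation> select(String type) {
            return annotations.stream().filter(a -> a.type().equals(type)).toList();
        }
    }

    // One pipeline step; steps are independent modules that only share the Document.
    interface Annotator {
        void process(Document doc);
    }

    private static final Pattern SENTENCE = Pattern.compile("[^.!?]+[.!?]?");
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    // Naive rule-based boundary detection: a sentence ends at '.', '!' or '?' (or end of text).
    static final Annotator SENTENCES = doc -> {
        Matcher m = SENTENCE.matcher(doc.text);
        while (m.find()) {
            String s = m.group();
            if (s.isBlank()) {
                continue;
            }
            int lead = s.length() - s.stripLeading().length();
            int trail = s.length() - s.stripTrailing().length();
            doc.annotations.add(new Annotation("Sentence", m.start() + lead, m.end() - trail));
        }
    };

    // Tokenizer that depends on the previous step: it only looks inside Sentence spans.
    static final Annotator TOKENS = doc -> {
        for (Annotation sentence : doc.select("Sentence")) {
            Matcher m = TOKEN.matcher(doc.text).region(sentence.begin(), sentence.end());
            while (m.find()) {
                doc.annotations.add(new Annotation("Token", m.start(), m.end()));
            }
        }
    };

    // Runs the annotators in order over a fresh Document.
    static Document run(String text, List<Annotator> pipeline) {
        Document doc = new Document(text);
        for (Annotator annotator : pipeline) {
            annotator.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Document doc = run("No chest pain. Denies fever.", List.of(SENTENCES, TOKENS));
        for (Annotation a : doc.annotations) {
            System.out.println(a.type() + " '" + doc.text.substring(a.begin(), a.end()) + "'");
        }
    }
}
```

The sentence rule splits on every period, so it would break "2.5 mg" in two; production boundary detectors use trained models for exactly this reason. The point of the sketch is the structure: a machine-learned step could replace `SENTENCES` without touching `TOKENS`.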
cTAKES System
Boundary detection
Tokenization
Normalization (lemma)
Part-of-speech tagging
Shallow parsing
Entity recognition
NLM SPECIALIST NLP Tools, Apache OpenNLP, Apache Lucene
UMLS, SNOMED CT, RxNorm
ICD-10/ICD-9, Mayo Clinic, custom
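The last stage, entity recognition against a terminology, can be illustrated with a toy recognizer that tokenizes, normalizes, and then matches token sequences against a dictionary, tagging each hit with one of the entity types from the overview. Everything here is an assumption for illustration: the five-entry dictionary stands in for UMLS-derived resources, case folding stands in for lemma normalization, and `DictionaryLookup` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DictionaryLookup {

    // The entity categories listed on the overview slide.
    enum SemanticGroup { DRUG, DISORDER, SIGN_SYMPTOM, ANATOMICAL_SITE, PROCEDURE }

    record Token(String text, int begin, int end) {}

    record Entity(String text, SemanticGroup group, int begin, int end) {}

    // Hypothetical toy dictionary keyed by normalized, space-joined tokens.
    static final Map<String, SemanticGroup> DICTIONARY = Map.of(
            "aspirin", SemanticGroup.DRUG,
            "diabetes mellitus", SemanticGroup.DISORDER,
            "chest pain", SemanticGroup.SIGN_SYMPTOM,
            "chest", SemanticGroup.ANATOMICAL_SITE,
            "appendectomy", SemanticGroup.PROCEDURE);

    // Longest term (in tokens) the lookup will try to match.
    static final int MAX_TERM_TOKENS = 3;

    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9]+");

    // Tokenization: alphanumeric runs with their character offsets.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            tokens.add(new Token(m.group(), m.start(), m.end()));
        }
        return tokens;
    }

    // Normalization reduced to case folding; it stands in for real lemmatization.
    static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Entity recognition by greedy longest match: at each token, try the longest span first.
    static List<Entity> lookup(String text) {
        List<Token> tokens = tokenize(text);
        List<Entity> entities = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            for (int n = Math.min(MAX_TERM_TOKENS, tokens.size() - i); n >= 1 && matched == 0; n--) {
                StringBuilder key = new StringBuilder();
                for (int k = i; k < i + n; k++) {
                    if (k > i) {
                        key.append(' ');
                    }
                    key.append(normalize(tokens.get(k).text()));
                }
                SemanticGroup group = DICTIONARY.get(key.toString());
                if (group != null) {
                    int begin = tokens.get(i).begin();
                    int end = tokens.get(i + n - 1).end();
                    entities.add(new Entity(text.substring(begin, end), group, begin, end));
                    matched = n;
                }
            }
            i += Math.max(matched, 1); // skip past a match, or advance one token
        }
        return entities;
    }

    public static void main(String[] args) {
        String note = "Type 2 Diabetes Mellitus; reports chest pain after appendectomy. Takes aspirin.";
        lookup(note).forEach(System.out::println);
    }
}
```

Greedy longest match keeps only "chest pain", so "chest" is not also reported as an anatomical site; whether overlapping spans should be kept is a design decision a real recognizer has to make explicitly.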
Tasks in NLP (cTAKES example)
cTAKES: Pipelines
(e.g. examples/pipeline/ProcessDir.piper)
// This file contains commands and parameters to run the ctakes-examples "Hello World" pipeline
readFiles org/apache/ctakes/examples/notes
// Load a simple token processing pipeline from another pipeline file
load DefaultTokenizerPipeline.piper
// Add non-core annotators
add ContextDependentTokenizerAnnotator
// Collect discovered Entity information for post-run access
collectEntities
https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
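A piper file is line-oriented: comment lines start with `//`, and every other line is a command followed by its arguments. The toy parser below models only what the example shows; it is not cTAKES's own piper reader and ignores any further syntax described on the Piper Files wiki page. `PiperSketch` and `Command` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PiperSketch {

    // One command line: the command word plus its whitespace-separated arguments.
    record Command(String name, List<String> args) {}

    static List<Command> parse(String piperText) {
        List<Command> commands = new ArrayList<>();
        for (String raw : piperText.split("\\R")) {
            String line = raw.strip();
            if (line.isEmpty() || line.startsWith("//")) {
                continue; // skip blank lines and full-line comments
            }
            String[] parts = line.split("\\s+");
            List<String> args = List.copyOf(Arrays.asList(parts).subList(1, parts.length));
            commands.add(new Command(parts[0], args));
        }
        return commands;
    }

    public static void main(String[] args) {
        String piper = """
                // Load a simple token processing pipeline from another pipeline file
                readFiles org/apache/ctakes/examples/notes
                load DefaultTokenizerPipeline.piper
                add ContextDependentTokenizerAnnotator
                collectEntities
                """;
        for (Command c : parse(piper)) {
            System.out.println(c.name() + " " + c.args());
        }
    }
}
```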
cTAKES: Exploring Examples
● Documentation (Confluence [3])
● ctakes-examples
● Classes with a main method
[alex ~/ctakes {trunk %} ]$ grep -nRI --include="*.java" "main(String[] args)" | wc -l
171
[1] - https://builds.apache.org/analysis
[2] - https://builds.apache.org/view/C/view/Apache%20cTAKES/
[3] - https://cwiki.apache.org/confluence/display/CTAKES
● smokingstatus
● coreference
● NegEx
● pipelines
● training
● temporal
● relationextractor
● etc.
● Run on real data (e.g. LibreHealth / OpenEMR)
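Negation detection, in the spirit of the NegEx algorithm, is one of the areas these examples cover: a finding is treated as negated when a trigger phrase such as "denies" appears shortly before it and nothing ends the scope in between. The sketch below is a heavily simplified version of that idea; the trigger list, the terminators, the five-token window and the name `NegationSketch` are illustrative assumptions, not the lexicon or settings cTAKES ships.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class NegationSketch {

    // Illustrative subset of pre-negation trigger phrases (not a real NegEx lexicon).
    static final Set<String> TRIGGERS = Set.of("no", "denies", "without", "negative for");

    // Illustrative scope terminators: punctuation or a contrastive conjunction ends the scope.
    static final Set<String> TERMINATORS = Set.of(".", ";", "but");

    // Maximum number of tokens a trigger may precede the finding by.
    static final int WINDOW = 5;

    static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // True if a trigger occurs within WINDOW tokens before tokens[start]
    // and no terminator lies between the trigger and the finding.
    static boolean isNegated(List<String> tokens, int start) {
        for (int i = start - 1; i >= Math.max(0, start - WINDOW); i--) {
            String tok = lower(tokens.get(i));
            if (TERMINATORS.contains(tok)) {
                return false;
            }
            if (TRIGGERS.contains(tok)) {
                return true;
            }
            // Two-token triggers such as "negative for" end at position i.
            if (i > 0 && TRIGGERS.contains(lower(tokens.get(i - 1)) + " " + tok)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("No", "fever", "but", "reports", "chest", "pain", ".");
        System.out.println("fever negated: " + isNegated(tokens, 1));      // true
        System.out.println("chest pain negated: " + isNegated(tokens, 4)); // false
    }
}
```

Scanning backwards from the finding and stopping at the first terminator is what keeps "No" from leaking past "but" onto "chest pain".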
Apache cTAKES Demo
[1] - https://github.com/azbarcea/ctakes-examples
Apache Software Foundation
● Community
○ Linguist experts
○ Users
○ Developers
● Mature Software Lifecycle
○ Support
○ Issues
○ SCM - Collaboration
○ Jenkins
○ Sonar
○ Distribution
● Popularize
Get involved
(You don’t need to be a software developer)
● Help new users and provide feedback
● Give feedback on required features
● Write or update documentation
● Test the code and report bugs
● Fix bugs
● Write and update the software
● Create artwork
● Extend docs references
● Recommend the project to others
● Gamification
● Volunteer valuable skills
● Learn about communities - the Apache Way
● Requirements Engineering
● Learn about NLP and Healthcare
● Learn what a strong product is about
● Test Automation and Software Engineering
● Develop code with high quality
● Build strong Software Development skills
● Explore your creativity
● Marketing
● Use your time wisely
● Help research community
git: https://github.com/apache/ctakes
wiki: https://cwiki.apache.org/confluence/display/CTAKES
e-mail: https://ctakes.apache.org/mailing-lists.html
Thanks!
Any questions?
You can find me at:
● https://linkedin.com/in/azbarcea
● alexz@apache.org