Healthcare NLP:
Four Essentials to Make the Most of Unstructured Data
© 2018 Health Catalyst
Proprietary. Feel free to share but we would appreciate a Health Catalyst citation.
Four Essentials for Natural Language Processing
The healthcare industry has recently seen a
sharp increase in interest in natural language
processing (NLP).
The unstructured clinical record contains a
wealth of insight into patients that isn’t available
in the structured record.
Additionally, advances in data science
and AI have introduced new techniques
for analyzing text, broadening and
deepening understanding of the patient.
Any organization seeking to leverage
its data to improve outcomes, reduce
cost, and further medical research
needs to consider the wealth of insight
stored in text and how it will create
value from that data using NLP.
The first step in using NLP can be the most
difficult, and many organizations never meet
the initial challenge of making the data
available for analysis.
NLP requires that data engineers transform
unstructured text into a usable format (see
need-to-know aspect #2 below) and place it in a
location where the NLP technology can make
use of it.
This NLP prerequisite can be a complex
process, involving larger data sets and
different technologies than many data
engineers are familiar with.
This presentation outlines four need-to-know
ways to meet and overcome the challenges
of making unstructured text available for
advanced NLP analysis.
It’s focused on the challenges and skillsets
required to build a solid foundation for text
analytics.
Understanding Free Text Is the Foundation
for Healthcare NLP
In my role leading NLP efforts for a
healthcare analytics vendor, I recently
worked on a patient safety surveillance
tool that helps health systems monitor for
potential adverse events.
Examples include the administration of Narcan
to reverse the effects of a painkiller in a
patient who responds poorly to it, and
hospital-acquired pressure ulcers.
While the administration of Narcan is
commonly documented in structured data,
pressure ulcers are often found in
unstructured nursing notes.
To get the necessary data to improve
patient safety, we needed to leverage the
free text of nursing notes.
We found that five of the 33 adverse
events were primarily documented in
unstructured text.
To access and leverage the text data in
the patient safety tool, we needed NLP.
To use the rich information that unstructured
text holds, however, we needed more than
the right NLP tools.
Four Need-to-Know Aspects of Working
with Unstructured Text
To effectively build a data pipeline for text,
and navigate unfamiliar challenges, data
engineers must understand four key points:
1. Text Is Bigger and More Complex
2. Text Comes from Different Data Sources
3. Text Is Stored in Multiple Areas
4. Text User Documentation Patterns Matter
1: Text Is Bigger and More Complex
An average EMR record (such as a medication,
allergy, or diagnosis) runs between 50 and
150 bytes, or 50 to 150 MB per million records.
On the other hand, the average clinical note
record is approximately 150 times as large.
With large health systems storing hundreds of
millions of note records, this scale introduces
data transfer and storage complexities that
many data engineers won’t have previously
confronted.
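To make the scale concrete, the figures above can be turned into a rough estimate. The sketch below is illustrative only: the note count of 300 million is an assumed stand-in for "hundreds of millions," and decimal units (1 TB = 10^12 bytes) are assumed.

```python
# Back-of-the-envelope storage estimate based on the figures on this slide.
STRUCTURED_BYTES = (50, 150)  # low/high bytes per structured EMR record (from the text)
NOTE_SIZE_FACTOR = 150        # a note is roughly 150x a structured record (from the text)
NOTE_COUNT = 300_000_000      # ASSUMPTION: stand-in for "hundreds of millions" of notes


def note_size_range_bytes():
    """Approximate low/high size of a single clinical note, in bytes."""
    low, high = STRUCTURED_BYTES
    return low * NOTE_SIZE_FACTOR, high * NOTE_SIZE_FACTOR


def corpus_size_range_tb(note_count=NOTE_COUNT):
    """Approximate low/high size of the whole note corpus, in decimal terabytes."""
    low, high = note_size_range_bytes()
    return low * note_count / 1e12, high * note_count / 1e12


if __name__ == "__main__":
    low_b, high_b = note_size_range_bytes()
    low_tb, high_tb = corpus_size_range_tb()
    print(f"One note: about {low_b / 1000:.1f} to {high_b / 1000:.1f} KB")
    print(f"{NOTE_COUNT:,} notes: about {low_tb:.2f} to {high_tb:.2f} TB")
```

Under these assumptions a single note is roughly 7.5 to 22.5 KB, and the raw note text alone comes to about 2.25 to 6.75 TB.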
2: Text Comes from Different Data Sources
Experienced data professionals know well that data sources vary widely.
The data model for one vendor is different from another (e.g., from one EMR
to another). With text, the stakes are even higher. A typical data pipeline for
structured data (Figure 1) from an EMR is less complex than an unstructured
data pipeline.
Figure 1: A typical structured data pipeline
Structured data typically involves working with
just SQL and supporting tools (e.g., SSIS or
Informatica).
On the other hand, working with unstructured
text (Figure 2) involves a variety of tools
outside the typical data engineer’s skillset,
including programming languages such as C#
or Python and search engines such as
Elasticsearch and Solr.
On top of this, the transformations required for
text vary significantly based on how it’s stored
in the source.
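As a rough illustration of why these transformations differ by source, the sketch below handles two storage patterns: a note split across several rows that must be rejoined in order, and note text that carries HTML-like markup. Both layouts, the row shape `(note_id, sequence, text)`, and the function names are assumptions made for this example; they are not taken from any particular EMR.

```python
import html
import re
from collections import defaultdict

# Matches HTML-like tags such as <p> or </b>; a simplification for illustration.
TAG_PATTERN = re.compile(r"<[^>]+>")


def reassemble_chunks(rows):
    """Join (note_id, sequence, text) rows into one string per note, in sequence order."""
    chunks = defaultdict(list)
    for note_id, sequence, text in rows:
        chunks[note_id].append((sequence, text))
    return {
        note_id: "".join(text for _, text in sorted(parts, key=lambda part: part[0]))
        for note_id, parts in chunks.items()
    }


def strip_markup(text):
    """Remove HTML-like tags, decode entities, and collapse runs of whitespace."""
    without_tags = TAG_PATTERN.sub(" ", text)
    return " ".join(html.unescape(without_tags).split())


if __name__ == "__main__":
    # Hypothetical rows: note 1 arrives out of order and in two pieces.
    rows = [
        (1, 2, "on left heel.</p>"),
        (1, 1, "<p>Stage 2 pressure ulcer "),
        (2, 1, "Patient resting &amp; comfortable."),
    ]
    for note_id, raw_text in sorted(reassemble_chunks(rows).items()):
        print(note_id, strip_markup(raw_text))
```

A source that stores plain text in a single column needs neither step, which is why the pipeline usually has to be tailored to each source.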
Figure 2: An analytics vendor’s unstructured text pipeline for three EMR vendors
3: Text Is Stored in Multiple Areas
It’s easy to think of text as a monolith—that all
the text in a system lives in one place.
Where text is stored, however, depends on the
type of text and the system in use.
For example, clinical notes, radiology
reports, and pathology reports may exist
in two or three different sets of tables,
depending on the source system.
Location will also vary based on the specific
implementation of that system.
With one vendor’s system, radiology reports
may be in the same table as clinical notes
or in the same tables as results, depending
on the workflow decisions behind the
configuration of the organization’s EMR.
One EMR vendor stores shorter text results
in a separate table from notes and reports,
while another puts results from the
tasking/messaging engine in yet another table.
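One way to cope with text that lives in several tables is to map each source table onto a single common shape before NLP runs. The sketch below does this with an in-memory SQLite database. Every table and column name here is hypothetical; real EMR schemas differ by vendor and by configuration.

```python
import sqlite3

# ASSUMPTION: hypothetical tables and columns standing in for vendor-specific schemas.
SETUP_SQL = """
CREATE TABLE clinical_notes (note_id INTEGER, patient_id INTEGER, note_type TEXT, body TEXT);
CREATE TABLE text_results (result_id INTEGER, patient_id INTEGER, category TEXT, result_text TEXT);
INSERT INTO clinical_notes VALUES (1, 100, 'Physician Progress Note', 'Patient found on floor.');
INSERT INTO text_results VALUES (7, 100, 'Radiology Report', 'No acute fracture.');
"""

# Each SELECT renames source-specific columns to one shared set of column names.
UNIFIED_SQL = """
SELECT 'clinical_notes' AS source_table, note_id AS source_id, patient_id,
       note_type AS document_type, body AS document_text
FROM clinical_notes
UNION ALL
SELECT 'text_results', result_id, patient_id, category, result_text
FROM text_results
ORDER BY source_table, source_id
"""


def build_demo_connection():
    """Create an in-memory database holding the two hypothetical source tables."""
    connection = sqlite3.connect(":memory:")
    connection.executescript(SETUP_SQL)
    return connection


def load_unified_documents(connection):
    """Return every text document from both tables as dicts with the same keys."""
    connection.row_factory = sqlite3.Row
    return [dict(row) for row in connection.execute(UNIFIED_SQL)]


if __name__ == "__main__":
    for document in load_unified_documents(build_demo_connection()):
        print(document)
```

Keeping the source table and source ID on each row makes it possible to trace an NLP result back to where the text came from.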
4: Text User Documentation Patterns Matter
Understanding how users document data
matters. For example, during a recent project to
identify adverse events for patients, we searched
for documentation of in-hospital falls.
The patient safety expert I was working with, a
nurse, had always seen patient falls documented
in nursing progress notes, but we found very
few mentions of any falls in those notes.
After discussions with the health information
management group and nurses at the health
system, we learned that it used a structured-only
documentation methodology for nursing.
The best source for documentation of in-hospital
falls was the physician progress notes.
This insight made a small difference in how
our data scientist searched for falls data, but
it made a significant difference in the results.
Filtering which notes went into the NLP
algorithm improved accuracy, particularly
the sensitivity of the algorithm.
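The effect of choosing which notes feed the algorithm can be measured directly. The sketch below is a toy illustration, not the project's method: the keyword check stands in for a real NLP model, and the notes, note-type names, and encounter labels are invented for the example. It computes sensitivity (true positives divided by all true cases) for different note-type filters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    encounter_id: int
    note_type: str
    text: str


def mentions_fall(text):
    """Toy stand-in for an NLP model: a naive keyword check."""
    lowered = text.lower()
    return "fell" in lowered or "found on floor" in lowered


def flagged_encounters(notes, allowed_types):
    """Encounters with at least one note of an allowed type that mentions a fall."""
    return {
        note.encounter_id
        for note in notes
        if note.note_type in allowed_types and mentions_fall(note.text)
    }


def sensitivity(flagged, true_fall_encounters):
    """Share of true fall encounters that were flagged: TP / (TP + FN)."""
    if not true_fall_encounters:
        raise ValueError("need at least one true fall encounter")
    return len(flagged & true_fall_encounters) / len(true_fall_encounters)


# ASSUMPTION: invented notes and labels, for illustration only.
NOTES = [
    Note(1, "Nursing Progress Note", "Fall precautions in place."),
    Note(1, "Physician Progress Note", "Patient fell while walking to the bathroom."),
    Note(2, "Nursing Progress Note", "Assisted with repositioning."),
    Note(2, "Physician Progress Note", "Patient found on floor next to bed, no injury."),
    Note(3, "Physician Progress Note", "Slid from chair to ground during transfer."),
    Note(4, "Physician Progress Note", "Ambulating independently, no events overnight."),
]
TRUE_FALL_ENCOUNTERS = {1, 2, 3}

if __name__ == "__main__":
    for note_type in ("Nursing Progress Note", "Physician Progress Note"):
        flagged = flagged_encounters(NOTES, {note_type})
        print(note_type, round(sensitivity(flagged, TRUE_FALL_ENCOUNTERS), 2))
```

In this toy data, searching only nursing progress notes finds none of the three falls, while searching physician progress notes finds two of the three. The third is missed because the keyword check does not recognize its wording.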
Understanding the Nuances of Text Makes
Successful NLP Possible
Working with text data is different from working with structured data.
Keep in mind this article’s four lessons:
1. Unstructured text records are significantly
larger than structured records.
2. Data engineers often need to preprocess
text before running NLP, which often
requires tools outside normal data pipelines.
3. Text may be stored in different areas of
source systems or EMRs.
4. Each organization may document text differently.
Data engineers who want to meet the challenges
of text and unlock its rich information will benefit
by starting on a focused project, rather than
taking on too many text tasks at once (a bottom-
up versus a top-down approach).
I recommend starting with a great use case that
aligns with organizational goals.
Using the patient safety scenario from
earlier, if an organization is focused on
improving patient safety, it may find that safety
events are documented in unstructured text,
limiting its ability to identify patient harm.
Starting by pulling text for one type of safety
event (e.g., deep vein thromboses) can help
data engineers form a process.
They can then replicate this process for other
use cases and start pulling the text data and
using NLP tools to reduce patient harm and
transform healthcare more broadly.
Mike Dow
Mike learned of the value of data early in his career. While working at a major EMR vendor in
2001, he led a project to help identify patients who were affected by drug recalls. He continued
his work in various roles at Allscripts, including reporting, data exchange and systems
architecture. From 2006 to 2015, Mike led the technology group at Galen Healthcare Solutions.
While the company and his team grew by 50% annually during this time, they became known for
excellence, earning awards like Best in KLAS for Technical Services and a Best Place to Work by Modern
Healthcare. Mike joined Health Catalyst in 2015 to help with strategic client implementations. He has since
joined the product development team to lead Health Catalyst’s text analytics initiative, making information
previously locked in text notes available to Health Catalyst’s apps and data architects.

Healthcare NLP - Four Essentials to Make the Most of Unstructured Data

  • 1. Healthcare NLP: Four Essentials to Make the Most of Unstructured Data
  • 2. © 2018 Health Catalyst Proprietary. Feel free to share but we would appreciate a Health Catalyst citation. Four Essentials for Natural Language Processing The healthcare industry has recently realized a sharp increase in interest in natural language processing (NLP). The unstructured clinical record contains a wealth of insight into patients that isn’t available in the structured record.
  • 3. © 2018 Health Catalyst Proprietary. Feel free to share but we would appreciate a Health Catalyst citation. Four Essentials for Natural Language Processing Additionally, advances in data science and AI have introduced new techniques for analyzing text, broadening and deepening understanding of the patient. Any organization seeking to leverage their data to improve outcomes, reduce cost, and further medical research needs to consider the wealth of insight stored in text and how they will create value from that data using NLP.
  • 4. © 2018 Health Catalyst Proprietary. Feel free to share but we would appreciate a Health Catalyst citation. Four Essentials for Natural Language Processing The first step in using NLP can be the most difficult, and many organizations never meet the initial challenge of making the data available for analysis. NLP requires that data engineers transform unstructured text into a usable format (see need to know aspect #2 below) and in a location where the NLP technology can make use of it. This NLP pre-requisite can be a complex process, involving larger data sets and different technologies than many data engineers are familiar with.
  • 5. © 2018 Health Catalyst Proprietary. Feel free to share but we would appreciate a Health Catalyst citation. Four Essentials for Natural Language Processing This presentation outlines four need-to-know ways to meet and overcome the challenges of making unstructured text available for advanced NLP analysis. It’s focused on the challenges and skillsets required to build a solid foundation for text analytics.
  • 6. © 2018 Health Catalyst Proprietary. Feel free to share but we would appreciate a Health Catalyst citation. Understanding Free Text Is the Foundation for Healthcare NLP In my role of leading NLP efforts for healthcare analytics vendor, I recently worked on a patient safety surveillance tool that helps health systems monitor for potential adverse events. For example, administering Narcan to reverse the effects of a patient who doesn’t respond well to a pain killer or hospital-acquired pressure ulcers. While the administration of Narcan is commonly documented in structured data, pressure ulcers are often found in unstructured nursing notes.
  • 7. © 2018 Health Catalyst Proprietary. Feel free to share but we would appreciate a Health Catalyst citation. Understanding Free Text Is the Foundation for Healthcare NLP To get the necessary data to improve patient safety, we needed to leverage the free text of nursing notes. We found that five of the 33 adverse events were primarily documented in unstructured text. To access and leverage the text data in the patient safety tool, we needed NLP. We needed more, however, than the right tools for NLP itself to use the rich information unstructured text holds.
  • 8. © 2018 Health Catalyst Proprietary. Feel free to share but we would appreciate a Health Catalyst citation. Four Need-to-Know Aspects of Working with Unstructured Text To effectively build a data pipeline for text, and navigate unfamiliar challenges, data engineers must understand four key points: 1. Text Is Bigger and More Complex 2. Text Comes from Different Data Sources 3. Text Is Stored in Multiple Areas 4. Text User Documentation Patterns Matter
  • 9. © 2018 Health Catalyst Proprietary. Feel free to share but we would appreciate a Health Catalyst citation. Four Need-to-Know Aspects of Working with Unstructured Text 1: Text Is Bigger and More Complex An average EMR record—such as a medication, allergy, or diagnosis, etc.—runs between 50 to 150 bytes, or 50 to 150 MB per million records. On the other hand, the average clinical note record is approximately 150 times as large. With large health systems storing hundreds of millions of note records, this scale introduces data transfer and storage complexities that many data engineers won’t have previously confronted.
  • 10. © 2018 Health Catalyst Proprietary. Feel free to share but we would appreciate a Health Catalyst citation. Four Need-to-Know Aspects of Working with Unstructured Text 2: Text Comes from Different Data Sources Experienced data professionals know well that data sources vary widely. The data model for one vendor is different from another (e.g., from one EMR to another). With text, the stakes are even higher. A typical data pipeline for structured data (Figure 1) from an EMR is less complex than an unstructured data pipeline. Figure 1: A typical structured data pipeline
  • 11. © 2018 Health Catalyst Proprietary. Feel free to share but we would appreciate a Health Catalyst citation. Four Need-to-Know Aspects of Working with Unstructured Text 2: Text Comes from Different Data Sources Structured data typically involves working with just SQL and supporting tools (e.g., SSIS or Informatica). On the other hand, working with unstructured text (Figure 2) involves a variety of tools outside the typical data engineer’s skillset— including programming languages such as C# or Python and search engines such as Elasticsearch and SOLR. On top of this, the transformations required for text vary significantly based on how it’s stored in the source.
  • 12. © 2018 Health Catalyst Proprietary. Feel free to share but we would appreciate a Health Catalyst citation. Four Need-to-Know Aspects of Working with Unstructured Text 2: Text Comes from Different Data Sources Figure 2: An analytics vendor’s unstructured text pipeline for three EMR vendors
  • 13. © 2018 Health Catalyst Proprietary. Feel free to share but we would appreciate a Health Catalyst citation. Four Need-to-Know Aspects of Working with Unstructured Text 3: Text Is Stored in Multiple Areas It’s easy to think of text as a monolith—that all the text in a system lives in one place. Where text is stored, however, depends on the type of text and the system in use. For example, clinical notes, radiology reports, and pathology reports may exist in two or three different sets of tables, depending on the source system.
  • 14. © 2018 Health Catalyst Proprietary. Feel free to share but we would appreciate a Health Catalyst citation. Four Need-to-Know Aspects of Working with Unstructured Text 3: Text Is Stored in Multiple Areas Location will also vary based on the specific implementation of that system. With one vendor’s system, radiology reports may be in the same table as clinical notes or in the same tables as results, depending on the workflow decisions behind the configuration of the organization’s EMR. One EMR vendor stores shorter text results as a separate table from notes and reports, while another will put results from the tasking/messaging engine in another table.
  • 15. © 2018 Health Catalyst Proprietary. Feel free to share but we would appreciate a Health Catalyst citation. Four Need-to-Know Aspects of Working with Unstructured Text 4: Text User Documentation Patterns Matter Understanding how users document data matters. For example, during a recent project to identify adverse events for patients, we searched for documentation of in-hospital falls. The patient safety expert I was working with, a nurse, had always seen patient falls documented in nursing progress notes, but we found very few mentions of any falls in those notes.
  • 16. © 2018 Health Catalyst Proprietary. Feel free to share but we would appreciate a Health Catalyst citation. Four Need-to-Know Aspects of Working with Unstructured Text 4: Text User Documentation Patterns Matter After discussions with the health information management group and nurses at the health system, we learned that it used a structured-only documentation methodology for nursing. The best source for documentation of in-hospital falls was the physician progress notes.
  • 17. Four Need-to-Know Aspects of Working with Unstructured Text. 4: Text User Documentation Patterns Matter. This insight made a small difference in how our data scientist searched for falls data, but it made a significant difference in the results. Filtering which notes went into the NLP algorithm improved accuracy, particularly the sensitivity of the algorithm.
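To make the filtering step concrete, here is a minimal Python sketch that restricts notes to chosen note types before a deliberately naive keyword search, and computes sensitivity as true positives divided by all known positives. The `Note` structure, the note-type labels, and the regular expression are assumptions for illustration, not the method used in the project described here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    patient_id: str
    note_type: str  # hypothetical label, e.g. "Physician Progress Note"
    text: str


# Naive placeholder pattern; a real project would use a proper NLP pipeline
# that handles negation ("no falls") and context ("fall risk").
FALL_PATTERN = re.compile(r"\b(fell|fall|falls|fallen)\b", re.IGNORECASE)


def filter_notes(notes, allowed_types):
    """Keep only notes whose type is in allowed_types (case-insensitive)."""
    allowed = {t.lower() for t in allowed_types}
    return [n for n in notes if n.note_type.lower() in allowed]


def flag_patients(notes):
    """Return the set of patient ids with at least one matching note."""
    return {n.patient_id for n in notes if FALL_PATTERN.search(n.text)}


def sensitivity(flagged, actual):
    """Share of known positive patients that were flagged: TP / (TP + FN)."""
    if not actual:
        raise ValueError("sensitivity is undefined without known positives")
    return len(flagged & actual) / len(actual)
```

A typical use would call `filter_notes` with the note types that clinicians and health information management staff identify as the real home of the documentation, then compare `flag_patients` against a manually reviewed set of patients using `sensitivity`.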
  • 18. Understanding the Nuances of Text Makes Successful NLP Possible. Working with text data is different from working with structured data. Keep in mind this article’s four lessons: (1) Unstructured text records are significantly larger than structured records. (2) Data engineers often need to preprocess text before running NLP, which often requires tools outside normal data pipelines. (3) Text may be stored in different areas of source systems or EMRs. (4) Each organization may document text differently.
  • 19. Understanding the Nuances of Text Makes Successful NLP Possible. Data engineers who want to meet the challenges of text and unlock its rich information will benefit from starting with a focused project rather than taking on too many text tasks at once (a bottom-up versus a top-down approach). I recommend starting with a great use case that aligns with organizational goals. Using the patient safety scenario from earlier, if an organization is focused on improving patient safety, it may find that safety events are documented in unstructured text, limiting its ability to identify patient harm.
  • 20. Understanding the Nuances of Text Makes Successful NLP Possible. Starting by pulling text for one type of safety event (e.g., deep vein thromboses) can help data engineers form a process. They can then replicate this process for other use cases and start pulling the text data and using NLP tools to reduce patient harm and transform healthcare more broadly.
  • 22. More about this topic. Original article, for a more in-depth discussion: “Healthcare NLP: Four Essentials to Make the Most of Unstructured Data”. Related reading: “How Healthcare Text Analytics and Machine Learning Work Together to Improve Patient Outcomes” (Mike Dow, Technical Director, and Levi Thatcher, VP, Data Science); “Text Analytics in Healthcare—Two Promising Frameworks that Meet Its Unique Demands” (Mike Dow, Technical Director); “Regenstrief Institute and Health Catalyst Team to Reveal Hidden Meaning in Clinical Data for Better Patient Care” (Health Catalyst News); “The Top Three Recommendations for Successfully Deploying Predictive Analytics in Healthcare” (Eric Just, Senior VP of Product Development); “Three Approaches to Predictive Analytics in Healthcare” (Health Catalyst Insight).
  • 23. Other Clinical Quality Improvement Resources: additional information is available at www.healthcatalyst.com. Mike Dow: Mike learned of the value of data early in his career. While working at a major EMR vendor in 2001, he led a project to help identify patients who were affected by drug recalls. He continued his work in various roles at Allscripts, including reporting, data exchange, and systems architecture. From 2006 to 2015, Mike led the technology group at Galen Healthcare Solutions. While the company and his team grew by 50% annually during this time, they became known for excellence, earning awards like Best in KLAS for Technical Services and a Best Place to Work by Modern Healthcare. Mike joined Health Catalyst in 2015 to help with strategic client implementations. He has since joined the product development team to lead Health Catalyst’s text analytics initiative, making information previously locked in text notes available to Health Catalyst’s apps and data architects.