Niffler: A DICOM Framework for
Machine Learning and Processing Pipelines.
Pradeeban Kathiravelu, PhD
Emory University,
Atlanta, GA.
2021, October 6th
Introduction
• Real-time execution of machine learning (ML)
pipelines on radiology images is hard
• limited computing resources in clinical
environments
• running them in research clusters?
• efficient data transfer
• processing capabilities.
Our Proposal – Niffler
• An open-source ML framework: https://github.com/Emory-HITI/Niffler
• DICOM Images: PACS ⇨ to process in Research Clusters.
• Real-time and retrospectively on-demand
• Extracts the textual metadata from DICOM headers
• in real-time
• stores in a scalable database.
• Workflows and ML pipelines on images and metadata.
Niffler Modules
• Real-time DICOM extractions (a data stream).
• On-demand DICOM extractions (simplified and
expanded C-FIND, C-MOVE, and C-FIND+C-MOVE).
• DICOM  PNG conversion.
• Workflows module (Niffler service chaining).
• RTA extraction (fetch data from RIS).
• DICOM  NifTi conversion (experimental).
• DICOM Anonymization (experimental).
Niffler On-demand queries
• Niffler provides a flexible one-command retrieval of
several studies.
• all at once in bulk, with pause-and-resume capabilities.
• A CSV file with DICOM headers.
• A config file indicating up to 3 DICOM C-FIND headers to
query from.
• Niffler internally issues several C-FIND and C-MOVE
combinations for each line in the CSV file.
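A minimal sketch of how one query per CSV line could be assembled from such a config file. It only builds the attribute/value pairs and does not issue any C-FIND or C-MOVE request. The key names mirror the sample configuration in this deck; the helper `build_queries`, the `has_header` flag, and the example CSV are hypothetical, and Niffler's own implementation may differ.

```python
import csv
import io

# Hypothetical helper: the key names follow the sample configuration in this
# deck, but this is not Niffler's actual code.
ATTR_SLOTS = [
    ("FirstAttr", "FirstIndex"),
    ("SecondAttr", "SecondIndex"),
    ("ThirdAttr", "ThirdIndex"),
]


def build_queries(config, csv_text, has_header=True):
    """Return one {attribute: value} dict per non-empty CSV row."""
    count = config["NumberOfQueryAttributes"]
    if not 1 <= count <= 3:
        raise ValueError("NumberOfQueryAttributes must be 1, 2, or 3")
    slots = ATTR_SLOTS[:count]
    rows = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(rows, None)  # assumption: the first CSV row is a header
    queries = []
    for row in rows:
        if not row:
            continue  # ignore blank lines
        queries.append(
            {config[attr]: row[config[idx]].strip() for attr, idx in slots}
        )
    return queries


config = {
    "NumberOfQueryAttributes": 2,
    "FirstAttr": "PatientID", "FirstIndex": 0,
    "SecondAttr": "AccessionNumber", "SecondIndex": 2,
}
csv_text = "PatientID,Name,AccessionNumber\n1001,ignored,A-1\n1002,ignored,A-2\n"
queries = build_queries(config, csv_text)
print(queries)
```

Each resulting dict holds the query keys for one CSV line; only the first `NumberOfQueryAttributes` attribute slots are used, so unused slots in the config are ignored.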
Sample configuration (json)
{
  "NifflerSystem": "system.json",
  "StorageFolder": "/opt/data/new-study",
  "FilePath": "{00100020}/{0020000D}/{0020000E}/{00080018}.dcm",
  "CsvFile": "csv/empi.csv",
  "NumberOfQueryAttributes": 1,
  "FirstAttr": "PatientID",
  "FirstIndex": 0,
  "SecondAttr": "AccessionNumber",
  "SecondIndex": 5,
  "ThirdAttr": "StudyDate",
  "ThirdIndex": 3,
  "DateFormat": "%Y%m%d",
  "SendEmail": true,
  "YourEmail": "test@test.test"
}
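The `FilePath` value appears to use DICOM tag numbers as placeholders: (0010,0020) is PatientID, (0020,000D) is StudyInstanceUID, (0020,000E) is SeriesInstanceUID, and (0008,0018) is SOPInstanceUID. The sketch below shows one way such a template could be expanded into a patient/study/series/instance path. The function `expand_file_path` and the example header values are hypothetical, and the slides do not show how Niffler itself performs the expansion.

```python
import re

# Standard DICOM keywords for the tag numbers used in the sample FilePath.
TAG_KEYWORDS = {
    "00100020": "PatientID",
    "0020000D": "StudyInstanceUID",
    "0020000E": "SeriesInstanceUID",
    "00080018": "SOPInstanceUID",
}


def expand_file_path(template, header):
    """Replace each {GGGGEEEE} placeholder with the matching header value."""
    def substitute(match):
        tag = match.group(1).upper()
        return str(header[TAG_KEYWORDS[tag]])

    return re.sub(r"\{([0-9A-Fa-f]{8})\}", substitute, template)


# Made-up header values, for illustration only.
header = {
    "PatientID": "1001",
    "StudyInstanceUID": "1.2.3",
    "SeriesInstanceUID": "1.2.3.4",
    "SOPInstanceUID": "1.2.3.4.5",
}
path = expand_file_path(
    "{00100020}/{0020000D}/{0020000E}/{00080018}.dcm", header
)
print(path)  # 1001/1.2.3/1.2.3.4/1.2.3.4.5.dcm
```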
Capabilities
• Flexible queries that were not possible or easy before.
• “Retrieve all the CT images of female patients from
August 2021” can be a single Niffler extraction based on
StudyDate, Modality, and PatientSex.
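As a rough illustration of the example query above, the following sketch generates a CSV with one row per day of August 2021, each carrying StudyDate, Modality, and PatientSex. The one-row-per-day layout, the header row, and the helper `daily_rows` are assumptions; the slides state only that the extraction is based on these three attributes. The date format matches the `DateFormat` value in the sample configuration.

```python
import csv
import io
from datetime import date, timedelta


def daily_rows(start, end, modality, sex, date_format="%Y%m%d"):
    """Yield one [StudyDate, Modality, PatientSex] row per day in [start, end]."""
    day = start
    while day <= end:
        yield [day.strftime(date_format), modality, sex]
        day += timedelta(days=1)


# Hypothetical layout: a header row followed by one row per day.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["StudyDate", "Modality", "PatientSex"])
rows = list(daily_rows(date(2021, 8, 1), date(2021, 8, 31), "CT", "F"))
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text.splitlines()[:3])
```

A matching config would presumably set `NumberOfQueryAttributes` to 3 and point the three attribute slots at columns 0, 1, and 2.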
Architecture and Prototype Deployment
• Queries to multiple
PACS.
• A Real-Time Listener
and a Retrospective
Extractor.
• ML pipelines and
analytics as containers.
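The speaker notes for this slide describe a Metadata Extractor that stores PHI-free metadata from the DICOM headers in a NoSQL Metadata Store. A minimal sketch of that idea, assuming an allow-list approach: headers are plain dicts, the store is an in-memory stand-in, and the attribute list is purely illustrative. None of these names come from Niffler.

```python
# Hypothetical allow-list: the names are standard DICOM keywords, but which
# attributes a site may retain as PHI-free is a policy decision outside this
# sketch.
KEPT_ATTRIBUTES = (
    "Modality",
    "BodyPartExamined",
    "Manufacturer",
    "ManufacturerModelName",
)


class InMemoryMetadataStore:
    """Stand-in for the NoSQL Metadata Store; it keeps documents in a list."""

    def __init__(self):
        self.documents = []

    def insert(self, document):
        self.documents.append(document)


def extract_metadata(header, kept=KEPT_ATTRIBUTES):
    """Copy only the allow-listed attributes that are present in the header."""
    return {name: header[name] for name in kept if name in header}


store = InMemoryMetadataStore()
headers = [
    {"PatientName": "DOE^JANE", "PatientID": "1001", "Modality": "CT",
     "BodyPartExamined": "CHEST", "Manufacturer": "ExampleVendor"},
    {"PatientName": "ROE^RICHARD", "PatientID": "1002", "Modality": "MR"},
]
for header in headers:
    store.insert(extract_metadata(header))
print(store.documents)
```

An allow-list is used here rather than a deny-list so that an unexpected identifying attribute is dropped by default.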
Evaluation
• 715 Scanners
• 350 GB/day
• A few practical use cases
• Real-time ML Pipelines on DICOM images.
• IVC Filter detection.
• Real-time processing of metadata.
• Calculating Scanner utilization.
• Scanner clock calibration.
Evaluation – Use case 1: IVC Filter Detection
• Pre-trained
models.
• Real-time
Execution.
• 96% accuracy.
Evaluation – Use case 2: Understanding
Operational Efficiency of MRI Systems
• Based on calculated metrics from exam timestamps.
• Evaluate against a clinical data warehouse (CDW).
• Timestamps from CDW more prone to human errors.
• False depiction of exam overlaps.
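The timestamp-based metrics can be illustrated with a small sketch that takes the (start, end) time windows of the exams on one scanner for one day and derives exam durations, total idle time, and the number of overlapping exams. The function `utilization` and the sample times are hypothetical; the slides do not show how Niffler computes these metrics.

```python
from datetime import datetime, timedelta


def utilization(exams):
    """Return (durations, idle, overlaps) for a list of (start, end) datetimes.

    durations: one timedelta per exam, ordered by start time
    idle:      total gap time between consecutive non-overlapping exams
    overlaps:  number of exams that start before an earlier exam has ended
    """
    ordered = sorted(exams)
    if not ordered:
        return [], timedelta(), 0
    durations = [end - start for start, end in ordered]
    idle = timedelta()
    overlaps = 0
    latest_end = ordered[0][1]
    for start, end in ordered[1:]:
        if start < latest_end:
            overlaps += 1  # starts while an earlier exam is still running
        else:
            idle += start - latest_end
        latest_end = max(latest_end, end)
    return durations, idle, overlaps


day = datetime(2021, 8, 2)
exams = [
    (day.replace(hour=8, minute=0), day.replace(hour=8, minute=30)),
    (day.replace(hour=8, minute=45), day.replace(hour=9, minute=15)),
    (day.replace(hour=9, minute=10), day.replace(hour=9, minute=40)),
]
durations, idle, overlaps = utilization(exams)
print(durations, idle, overlaps)
```

In this made-up example the third exam starts five minutes before the second one ends, which is the kind of overlap that erroneous manually entered timestamps would produce.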
Conclusion
• Running the ML pipelines from a research cluster
• Feasibility and efficiency
• On images and their metadata
• Received in real-time and retrospectively from PACS.
Future work
• Niffler + real-time clinical feed ⇨ Live AI inference.
Thank you. Questions?
pradeeban.kathiravelu@emory.edu
linkedin.com/in/kpradeeban/

Editor's Notes

  • #2: Hi everyone, I am Pradeeban Kathiravelu from Emory University. Today I present our research “A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology Images”.
  • #3: Real-time execution of machine learning pipelines on radiology images is challenging, due to limited computing resources in clinical environments. On the other hand, running them in research clusters requires efficient data transfer and processing capabilities.
  • #4: We propose Niffler, an open-source ML framework that runs in research clusters by receiving images in real-time and retrospectively on-demand, using the DICOM networking protocol, from hospitals' PACS. Niffler extracts the metadata from the DICOM headers in real-time and stores the metadata in a scalable database. Niffler enables efficient execution of processing workflows and ML pipelines on images and their metadata. Niffler consists of DICOM listeners for receiving images in real-time, and retrospective DICOM extractors to query and retrieve images on-demand. It also includes a Metadata Extractor that extracts the textual metadata from the retrieved DICOM images. Niffler stores the images in its storage and the metadata in a Metadata Store.
  • #9: The figure depicts the Niffler architecture and prototype deployment. In the standard healthcare system, the radiology department may consist of several PACS, each receiving radiology images from scanners of various modalities. Our current deployment environment consists of 2 PACS from our institutional radiology department, configured to accept DICOM retrieval queries from Niffler. Niffler can receive DICOM data from more PACS at once, with minor configuration changes to the PACS and Niffler.
    In the deployment at our institution, the primary PACS receives data in real-time from the scanners. The radiology department has configured an archival process that periodically copies the images from the primary PACS to a shadow PACS and then cleans up the primary PACS every week. Hence, the shadow PACS stores imaging data of several years, supporting retrospective queries.
    Niffler enables the execution of ML and real-time analytics pipelines as Docker containers on radiology images retrieved from the PACS and the textual metadata of the images. The Real-Time DICOM Listener receives images from the primary PACS continuously as a DICOM imaging stream. The Retrospective DICOM Extractor performs on-demand queries issued by the users on the retrospective data. Niffler executes multiple StoreSCP processes, one for each PACS. It stores the images from each PACS separately in encrypted storage, in a hierarchical folder structure. The Metadata Extractor traverses and queries all the images in the storage, extracts the relevant metadata from the DICOM headers, and stores the PHI-free metadata in a NoSQL database, which we call the Metadata Store.
    The Application Layer facilitates access to the DICOM images from the storage and the respective metadata from the Metadata Store. Thus, it provides unified data-explorer access to both data and metadata. It further provides utility functions such as de-identification, image conversion, and scripts for scanner utilization computation and scanner clock calibration. The ML pipelines run either directly or via the Application Layer, on the images and the metadata. Niffler deletes the images from its storage periodically once the metadata extraction and the ML pipelines on the images complete their execution.
    Subsets of images relevant for a study can be shared with other researchers, typically after processing them: de-identifying the images, converting DICOM images into PNG, or producing the image output and the associated ML inference result. Since the framework supports queries to extract images meeting specific criteria, it limits the amount of duplicated information stored in the research clusters. Without Niffler, a researcher would have to submit multiple queries to the PACS and a clinical data warehouse (CDW), anonymize the collected data, merge the data, and then run the model inference. Niffler supports prospective dynamic cohort and subcohort creation, eliminating the need for duplicate data storage and aggregation, with anonymized model output.
    Through its native support for ML pipeline execution as containers, Niffler provides infrastructure-agnostic execution with seamless scaling and migration. Thus, Niffler minimizes the repetitive and complicated configuration steps while automating the end-to-end process of an ML pipeline.
  • #10: Niffler retrieved data from 715 scanners, up to 350 GB/day, continuously over the past 19 months. We look into a few practical use cases that highlight the performance of Niffler. First, real-time ML pipelines on DICOM images in a research cluster. Second, real-time processing of metadata for operational efficiency, such as computing scanner utilization and calibrating scanner clocks.
  • #11: First, we built an IVC filter detection pipeline as a container to execute on the images retrieved in real-time with Niffler. The pipeline uses the Keras RetinaNet object detection and pre-trained models to determine whether an IVC filter is detected in the subcategories of the images. The Niffler Metadata Extractor applies the filters on modality and body parts to create a DICOM subset. The IVC filter detection container runs its inference on the identified images, including chest X-rays, abdominal radiographs, and spine X-rays. The pipeline draws a bounding box around the filter and outputs a PNG image with the detection box, as the figure shows. The IVC filter detection algorithm classified the test images with a high accuracy of 96% on the images retrieved in real-time with Niffler.
  • #12: As the second use case, we applied the Niffler Metadata Extractor to understand the operational efficiency of individual MRI systems, based on calculated metrics from exam timestamps. These metadata allow measurement of exam duration and system idle time. The figure indicates the calculated exam time windows from one scanner on a particular day, according to Niffler and CDW. Identifying scanner utilization from CDW is more prone to human errors, leading to a false depiction of exam overlap. Niffler accurately identified examination timeframes and idling times of the scanners.
  • #13: Our evaluations of Niffler highlight the feasibility and efficiency of running ML pipelines from a research cluster on the images and metadata received in real-time and retrospectively from PACS. By merging Niffler with a real-time HL7 clinical feed, we can create a live AI inference pipeline that accelerates the development of clinically useful AI algorithms. To our knowledge, this will be the first AI inference pipeline that combines real-time image and clinical data information during AI validation. Thank you for your attention. Please feel free to reach out to me if you have questions or comments.