EXPLOITING AI IN THE SEARCH FOR REAL WORLD
EVIDENCE IN CANCER
Amber Simpson, PhD
Canada Research Chair in Biomedical Computing and Informatics
Associate Professor, Department of Biomedical and Molecular Sciences / School of Computing
Director, Centre for Health Innovation
Senior Investigator, Canadian Cancer Trials Group
Affiliate Member, Vector Institute for AI
amber.simpson@queensu.ca
simpsonlab.org
@profsimsim
Disclosures:
My lab receives funding from CFI, CIHR, SSHRC, NSERC, and NIH.
I am a chartered member of an NIH study section.
• Vanderbilt Postdoc
• Sloan Kettering Faculty (2015)
• Queen’s Faculty (July 2019)
• CHI Director (September 2020)
My Background
• Previously at Memorial Sloan
Kettering Cancer Center
• Attending Computational
Biologist in the Department of
Surgery and Professor at
Cornell University
Complicated Relationship with AI
My Lab
Hassan Muhammad
PhD Student
Travis Williams
Post Doc
Karen Batch
MSc - CS
Fraser Raney
MSc - CS
Minhaj Ansari
MSc - CS
Abbey Kearney
MSc - MBI
Sal Choueib
MSc - CS
Andrew Garven
MSc - DBMS
Mohammad Hamghalam
Post Doc
Jacob Peoples
AI Scientist
LLana James
Post Doc
Jianwei Yue
PhD - CS
Katy Scott
MSc - CS
Poulina Tran
MSc - CS
Natalia Kim
MSc - CS
Jean-Paul Salameh
Med Student
Alex Robbins
MSc - PHS
Ricky Hu
Med Student
Danielle Cutler
Life Sci
Annabelle Suave
MSc - CS
Alan Dimitriev
MSc - CS
Jordan Loewen
Post Doc
Katie Lindale
MSc - TMED
Ramtin Mojtahedi Saffari
PhD - CS
Labs
Computer Science
Health Sciences
CHI - Kingston Health Sciences
Simpson
Lab
Methodology
Development
and Evaluation
Biomarker
Discovery and
Validation
Health Data
Platforms
Crowdsourcing
and
Grand Challenges
Association of Multimodal
Data (Genomics/Imaging)
with Clinical Characteristics
and Outcome, Prognostic
Modeling
Robustness Evaluation
(Repeatability and
Reproducibility)
Data Sharing, Hosting
Segmentation and
Survival Grand
Challenges
Unsupervised and
Supervised Analysis of
Big Biomedical Data,
Contribution of the
design and
implementation of
platforms
Novel Algorithms and
Approaches for
Multimodal Data
Integration
Routine Imaging Contains Predictive & Prognostic Information
Liver CT
Imaging Biomarker Development
Cholangiocarcinoma HAIP Trial
Funded by NCI R01
• Hepatic arterial infusion
chemotherapy for
intrahepatic
cholangiocarcinoma led by
Bill Jarnagin (MSK)
• Correlative studies in
genomics and radiomics
The Prevention of Progression to Pancreatic Cancer Trial (The 3P-C Trial)
Funded by NCI R01
• Multi-center randomized double-blind
placebo controlled trial of patients with
high-risk IPMN led by Peter Allan
(Duke)
• Evaluate the effect of sulindac on the
presence or absence of progression of
IPMN after 3 years of treatment
• Correlative studies in radiomics and
cyst fluid markers
Real World Evidence in Medicine
image source: CHCUK
Can we use AI to derive real-world evidence?
Today’s Talk
Image Segmentation
NLP of Radiology Reports
[Figure: example structured radiology report with per-organ findings and an impression]
Real world
evidence?
The Object Recognition Problem
Given an image, determine what is in the image.
Unsolved for decades, solved recently.
Made self-driving cars a reality.
NVIDIA DRIVE
Open Science Solved the Object Recognition Problem
Visual Object Classes 2012 competition
Given an image, determine what is in the image
10 million images with 1,000 labelled classes
Created ImageNet
Medical Segmentation Decathlon: ImageNet for Medical Images
http://medicaldecathlon.com/
The Medical Segmentation Decathlon, Nature Communications, https://arxiv.org/abs/2106.05735
A large annotated medical image dataset for the development and evaluation of segmentation algorithms, https://arxiv.org/abs/1902.09063
Medical Segmentation Decathlon
Challenge at MICCAI 2018
Develop a semantic segmentation
algorithm that can solve 10
segmentation tasks separately,
without human interaction
Algorithm can learn unseen tasks
ImageNet for medical images
http://medicaldecathlon.com/
Best Algorithm Achieved State-of-the-Art Performance
Surface-based Performance Metrics
Gold Standard
Model prediction
Standard surface metrics
• Maximum surface distance (a.k.a. Hausdorff)
• 95th percentile of surface distances (Hausdorff95)
• Mean surface distance
• Median surface distance
Dice = Degree of Overlap
(Radiologist)
ROI             Dice
Brain tumour    0.906
Liver tumour    0.884
Pancreas mass   0.654
Colon tumour    0.678
Liver           0.983
Pancreas        0.954
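The two families of metrics on this slide can be sketched in a few lines. This is a minimal illustration, not the challenge's evaluation code: the function names, the toy 2D masks, and the nearest-rank percentile are assumptions, and a real evaluation would pass extracted boundary voxels of 3D masks to the distance function.

```python
import math

def dice(pred, truth):
    """Dice overlap between two sets of pixel/voxel coordinates."""
    pred, truth = set(pred), set(truth)
    if not pred and not truth:
        return 1.0  # both empty: treat as perfect agreement
    return 2 * len(pred & truth) / (len(pred) + len(truth))

def directed_distances(a, b):
    """Distance from every point in a to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]

def percentile(values, q):
    """Nearest-rank percentile (one of several common conventions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def hausdorff(a, b, q=100):
    """Symmetric distance; q=100 is the maximum, q=95 is Hausdorff95."""
    return max(percentile(directed_distances(a, b), q),
               percentile(directed_distances(b, a), q))

# Toy example (hypothetical): a 4x4 square and the same square shifted by 1.
truth = {(x, y) for x in range(0, 4) for y in range(4)}
pred = {(x, y) for x in range(1, 5) for y in range(4)}
dice_score = dice(pred, truth)        # 12 shared pixels out of 16 + 16
max_distance = hausdorff(pred, truth)
```

Dice rewards volume overlap, while the distance metrics penalize how far the predicted boundary strays, so the two can disagree on small structures.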
Legacy
Image Segmentation
Poulina Tran
MSc - CS
Danielle Cutler
Life Sci
Go see their posters!
Ramtin Mojtahedi Saffari
PhD - CS
Today’s Talk
Image Segmentation
NLP of Radiology Reports
[Figure: example structured radiology report with per-organ findings and an impression]
Real World
Evidence?
Developing a “Cancer Digital Twin”
Cancer Twin - a digital replica of a cancer patient.
Leveraging NLP and imaging, create a map of disease burden for
machine learning.
Database of response (and progression) rates and mixed response rates
for investigations into tumor heterogeneity.
K. Batch
CS Student
F. Zulkernine
CS
R. Do
Radiologist
Jianwei Yue
PhD Student
Response Evaluation Criteria in Solid Tumors (RECIST)
• published rules assessing disease
burden by imaging
• performed by radiologist,
documented separate from the
radiology report
• oncologist needs reliable,
reproducible methods to assess
treatment response
Eisenhauer, E. A. et al. New response evaluation criteria in solid tumours: Revised RECIST guideline (version 1.1). Eur. J. Cancer 45, 228–247
(2009).
RECIST 1.1
Reference Radiologist:
• Baseline
– identifies measurable tumours (targets)
– provides unidimensional
measurements and records
• Follow ups:
– use strict criteria to categorize:
stable, progression, partial, or
complete response
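As a rough illustration of the "strict criteria", the target-lesion thresholds from the RECIST 1.1 guideline can be written as a small function. This sketch is deliberately simplified: it ignores non-target lesions, new lesions, and the lymph-node short-axis rules, and the function and argument names are assumptions.

```python
def recist_target_response(baseline_sum, nadir_sum, current_sum):
    """Classify target-lesion response from sums of diameters in mm.

    baseline_sum: sum of target diameters at baseline
    nadir_sum:    smallest sum recorded so far (including baseline)
    current_sum:  sum at the current follow-up
    """
    if baseline_sum <= 0:
        raise ValueError("baseline must contain measurable disease")
    if current_sum == 0:
        return "complete response"  # all target lesions have disappeared
    increase = current_sum - nadir_sum
    # Progression: at least 20% above the nadir AND at least 5 mm absolute.
    if increase >= 0.20 * nadir_sum and increase >= 5:
        return "progression"
    # Partial response: at least 30% below the baseline sum.
    if (baseline_sum - current_sum) / baseline_sum >= 0.30:
        return "partial response"
    return "stable"

# Hypothetical follow-ups for a patient with a 100 mm baseline sum.
examples = [
    recist_target_response(100, 100, 0),    # complete response
    recist_target_response(100, 100, 65),   # partial response
    recist_target_response(100, 40, 60),    # progression (measured from nadir)
    recist_target_response(100, 100, 90),   # stable
]
```

Progression is judged against the nadir rather than the baseline, which is why the third example counts as progression even though the sum is still below baseline.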
Limits Our Understanding of Response Rates
RECIST Limits:
• Time consuming
• Only performed on patients enrolled in clinical trials
• Knowledge of response rates across the general cancer population is
very limited
MSK has 700K Structured Reports Back to 2009
CT report for a 74-year-old male with a history of colorectal cancer and prior hepatic resection for liver metastases.
For each patient …
+ treatment,
demographics,
and outcome
time
Evaluate NLP for Mapping Metastatic Disease
• Used standard term frequency-inverse document frequency (TF-IDF)
• 91,665 patients
• 387,359 reports
• 2,219 reports were manually reviewed for presence/absence of mets in
lungs, pleura, liver, spleen, kidneys, adrenals, mesentery and
peritoneum, pelvic organs, and bones and used for training
• 448 reports were used for validation
• The best-performing NLP model was used to generate a final map of
metastatic disease across all patients
Do et al. Patterns of Metastatic Disease in Patients with Cancer Derived from Natural Language Processing of Structured CT Radiology Reports over a 10-year Period, Radiology, 2021.
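To make the TF-IDF step concrete, here is a minimal pure-Python sketch of the weighting itself. The three report snippets are invented for illustration, the tokenizer is a plain whitespace split, and the slide does not say which preprocessing or classifier was applied on top of these features, so none of that is shown.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many reports does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

# Hypothetical report fragments, not taken from the MSK data.
reports = [
    "liver new hypodense lesion suspicious for metastasis",
    "liver no focal lesion",
    "lungs clear no nodule",
]
vectors = tfidf(reports)
```

Terms that appear in few reports (here "metastasis") receive a higher weight than terms shared across reports (here "liver"), which is what lets a downstream classifier focus on the discriminating words.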
Metastases Distribution by Primary Cancers
Frequency of Metastases
Prostate Colorectal Pancreas
Sankey diagrams of patients with prostate, colorectal, and pancreas primary cancers, and their most common first and second
sites of metastatic disease.
Do RKG, Lupton K, Causa Andrieu PI, Luthra A, Taya M, Batch K, Nguyen H, Rahurkar P, Gazit L, Nicholas K, Fong CJ, Gangai N, Schultz N, Zulkernine F, Sevilimedu V, Juluru K, Simpson A, Hricak
H. Patterns of Metastatic Disease in Patients with Cancer Derived from Natural Language Processing of Structured CT Radiology Reports over a 10-year Period. Radiology. 2021;301(1):115–122.
Multi-Report Classification - Bidirectional LSTM
Training
Metric     Lung (n = 5413)   Liver (n = 1943)   Adrenal (n = 2874)
Accuracy   97.97% (±0.38%)   99.23% (±0.39%)    99.72% (±0.19%)
Precision  0.9052 (±0.01)    0.9798 (±0.01)     0.9660 (±0.01)
Recall     0.9366 (±0.01)    0.9873 (±0.00)     0.9803 (±0.01)
F1-Score   0.9206 (±0.01)    0.9835 (±0.01)     0.9731 (±0.01)

Validation
Metric     Lung (n = 1160)   Liver (n = 417)    Adrenal (n = 616)
Accuracy   97.16% (±0.96%)   98.32% (±1.23%)    99.03% (±0.77%)
Precision  0.8404 (±0.02)    0.9661 (±0.02)     0.9375 (±0.02)
Recall     0.9054 (±0.02)    0.9702 (±0.02)     0.9375 (±0.02)
F1-Score   0.8717 (±0.02)    0.9682 (±0.02)     0.9375 (±0.02)
Batch K, Developing a Cancer Digital Twin: Supervised Metastases Detection from Consecutive Structured Radiology Reports, Frontiers in AI, 2022
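For readers less familiar with the metrics in this table, the sketch below shows how precision, recall, and F1 relate. The confusion counts are hypothetical (chosen only because they happen to give 0.9375, not taken from the paper), and the function names are assumptions.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)   # of the reports flagged positive, how many were right
    recall = tp / (tp + fn)      # of the truly positive reports, how many were found
    return precision, recall, f1_score(precision, recall)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 15 true positives, 1 false positive, 1 false negative.
p, r, f1 = precision_recall_f1(tp=15, fp=1, fn=1)

# Plugging in the reported lung validation precision and recall
# gives a value close to the reported F1.
lung_validation_f1 = f1_score(0.8404, 0.9054)
```

Accuracy can stay high even when precision is noticeably lower, because most reports are negative for any single site; precision and recall describe performance on the rarer positive class.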
Today’s Talk
Image Segmentation
NLP of Radiology Reports
[Figure: example structured radiology report with per-organ findings and an impression]
Real World
Evidence?
Metastatic pattern impacts survival in colorectal cancer patients
Real-time survival curves - “real world” data
No manual curation!
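The slide does not say how the survival curves are estimated. Assuming the standard Kaplan-Meier estimator, a curve can be computed from follow-up times and event flags as sketched below; the function name and the five-patient cohort are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each observed event time.

    times:  follow-up time per patient (e.g. months)
    events: 1 if death was observed at that time, 0 if the patient was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical cohort: five patients, two of them censored.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Censored patients still count in the at-risk denominator until their last follow-up, which is what makes the estimator usable on routinely collected data where follow-up lengths differ.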
Next Steps
Add actual images?
Uncover true response rates?
Map cancers of unknown origin back in time? (Jianwei Yue)
Generate hypotheses for evaluating in clinical trials?
Expedite clinical research?
Population Level Health Data -> RWE!
The Province of Ontario
13.2 million people
14 Regional Cancer Centres
151 Hospitals
Slide courtesy of Dr. Alice Wei, UHN
Single payer insurance -> OHIP
Hospital -> not-for-profit private corporations
Health Data Landscape
Cancer Care Ontario/Ontario Health Teams
• Centralization and coordination of care
CCTG (Canadian Cancer Trials Group)
• Designs and administers cancer trials across Canada
• Data science is a priority in the new strategic plan
Ontario Health Data Platform
• Access to episodic health data (demographic, outcome, etc.) from ICES
• Links to primary care and hospitals in the future
CPCSSN (Canadian Primary Care Sentinel Surveillance Network)
CIHI (Canadian Institute for Health Information)
CAC (Centre for Advanced Computing) - Queen’s
In response to COVID-19 …
• Access to health data of 14
million Ontarians
• High-performance cluster
• Data governance by the
Ministry of Health
• Located at Queen’s
Identity and the digital twin
• Even if we can build a cancer digital twin -
should we?
• If you were diagnosed with cancer, would
you want an AI to tell you how long you
have to live?
• How do we address the existential
threats of AI and cancer?
• How do we bring social justice into the
development of AI?
S. Mosurinjohn
Humanist
School of
Religion
L. James
Health Data
Justice Postdoc
Lesson from LLana: the data are already biased, perfect data = myth
Dr. Robyn Rowe (joining July 2022)
• Expert in the areas of national and international Indigenous-led data
governance and sovereignty
• Woven throughout these core areas are intersections of decolonialism,
anti-colonialism, and anti-racism
• Currently an Indigenous scholar at ICES
• Partnership with Native BioData Consortium
https://nativebio.org/
Indigenous Digital Rights and Decolonization in AI PDF
Centre for Health Innovation
Computational
Community
Clinical Community
We have
the answers!
Help!
Inspired by Janet Eary, NCI
The Most Useful Writing Resources I’ve Ever Found
Rhetorical Patterns
Source: https://www.northwestern.edu/climb/resources/written-communication/index.html
https://www.northwestern.edu/climb/resources/written-communication/aims-pages-part-1-the-rhetorical-pattern-of-introductions-in-aims-pages.html
Five Principles for Readable Sentences
Source: https://www.youtube.com/watch?v=rZxaSMzstB8
Things you should know as a new grad student….
Grad School vs Undergrad
• We want you to succeed and picked you from hundreds of other
potential students because we want to invest in you
• We are training you to replace us (move from being a “student” to a
colleague-in-training)
• Your supervisor is paying you from their research grants that they have
worked hard to secure with the expectation that you will produce
something to move the research lab forward
• Your courses are necessary but not the most important aspect of an
MSc (your research is)
Free Your Mind for Creativity
• Lab notebook
• Pomodoro technique
• Kanban (kanbanflow.com/)
• GTD (“Getting Things Done”)
• Bujo (Bullet journaling)
• monday.com
• Outlook, Teams, OneNote integration
• Dropbox, Queen’s OneDrive
• Reference Manager (Mendeley, Zotero, EndNote, etc.)
• Overleaf
Dissertation Boot Camp
Source: https://www.queensu.ca/exph/academic-development/writing-support/dissertation-bootcamp
Questions?
