Building a search engine to find
environmental and phenotypic factors
associated with disease and health
Chirag J Patel

Northeast Big Data Innovation Hub Workshop

02/24/17
chirag@hms.harvard.edu
@chiragjp
www.chiragjpgroup.org
P = G + E
Type 2 Diabetes

Cancer

Alzheimer’s

Gene expression
Phenotype Genome
Variants
Environment
Infectious agents

Nutrients

Pollutants

Drugs
We are great at G investigation!
over 2400 Genome-wide Association Studies (GWAS)

https://www.ebi.ac.uk/gwas/
G
Nothing comparable to elucidate E influence!
E: ???
We lack high-throughput methods
and data to discover new E in P…
until now!
$H^2 = \sigma^2_G / \sigma^2_P$
Heritability ($H^2$) is the proportion of phenotypic
variability attributed to genetic variability in a
population
Indicator of the proportion of phenotypic
differences attributed to G.
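As a rough illustration of the definition above, the sketch below simulates P = G + E and estimates Var(G)/Var(P). It assumes G and E are independent and normally distributed, which the slides do not state; the function name `simulate_heritability` and the variance values are hypothetical.

```python
import random
import statistics


def simulate_heritability(var_g, var_e, n=20_000, seed=0):
    """Simulate P = G + E (G and E independent, an assumption) and
    return the estimated ratio Var(G) / Var(P)."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, var_g ** 0.5) for _ in range(n)]
    e = [rng.gauss(0.0, var_e ** 0.5) for _ in range(n)]
    p = [gi + ei for gi, ei in zip(g, e)]
    return statistics.pvariance(g) / statistics.pvariance(p)


# Illustrative variances only: the theoretical ratio is 1 / (1 + 3) = 0.25
h2 = simulate_heritability(var_g=1.0, var_e=3.0)
print(f"estimated H^2 = {h2:.2f}")
```

With these made-up variances most of the phenotypic variance comes from E, which is the situation the following slides argue is common for complex disease.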
Eye color
Hair curliness
Type-1 diabetes
Height
Schizophrenia
Epilepsy
Graves' disease
Celiac disease
Polycystic ovary syndrome
Attention deficit hyperactivity disorder
Bipolar disorder
Obesity
Alzheimer's disease
Anorexia nervosa
Psoriasis
Bone mineral density
Menarche, age at
Nicotine dependence
Sexual orientation
Alcoholism
Lupus
Rheumatoid arthritis
Crohn's disease
Migraine
Thyroid cancer
Autism
Blood pressure, diastolic
Body mass index
Depression
Coronary artery disease
Insomnia
Menopause, age at
Heart disease
Prostate cancer
QT interval
Breast cancer
Ovarian cancer
Hangover
Stroke
Asthma
Blood pressure, systolic
Hypertension
Osteoarthritis
Parkinson's disease
Longevity
Type-2 diabetes
Gallstone disease
Testicular cancer
Cervical cancer
Sciatica
Bladder cancer
Colon cancer
Lung cancer
Leukemia
Stomach cancer
[Figure: bar chart of heritability for the traits listed above, x-axis from 0 to 100.]
Heritability: Var(G)/Var(Phenotype) Source: SNPedia.com
G estimates for burdensome diseases are low and variable:
massive opportunity for E discovery
Type 2 Diabetes
Heart Disease
Asthma
[Figure: heritability by trait, same chart as on the previous slide, x-axis from 0 to 100.]
Heritability: Var(G)/Var(Phenotype) Source: SNPedia.com
G estimates for complex traits are low and variable:
massive opportunity for high-throughput E discovery
$\sigma^2_E$: Exposome!
Enhance accessibility of clinical and environmental data,
and analytic artificial intelligence tools!
How can we drive discovery of environmental factors (E) in disease phenotypes (P)?
Enhance accessibility of large open data and tools to drive discovery of
environmental factors (E) in disease phenotypes (P)
Greg Cooper, MD, PhD
Pittsburgh
Vasant Honavar, PhD
Penn State
Noémie Elhadad, PhD
Columbia
Chirag Patel, PhD
Harvard
Where do we get disease (P) data?
wearable.com
Where do we get disease P data?

Health record data from your doctor!
• Longitudinal data on millions of patients
• diagnoses, prescriptions, lab reports, notes
• Sitting there in institutional IT infrastructure
• OHDSI provides a unified model to access data across
institutions, enhancing the scientific process!
Noémie Elhadad, PhD
Columbia
Capitalize on digitalized health record data (from around the world)!
High-powered dataset(s) for discovery.
Where do we get environmental (E) data?
Examples of sources of disparate external exposome datasets
available in the Exposome Data Warehouse
• Geological
  • NASA - Cloud and Atmosphere Profiles
  • NOAA Climate Data
• Pollution
  • EPA Air Quality Surveillance Data Mart, or AirData
• Socio-Economic
  • US Census American Community Survey (ACS)
• Epidemiological
  • CDC Wonder, USDA Food Atlas
A key challenge: mashing up the Exposome Data Warehouse with patient
data from OHDSI
[Diagram: exposures as f(location, time). Patient keys: home zipcode, encounter time. Example exposures: PM2.5, income, pollen count. Sources: EPA AirData, American Community Survey, NOAA Climate.]
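The mash-up of exposures as f(location, time) can be sketched as a lookup keyed on home zip code and encounter date. This is a minimal illustration, not the project's pipeline: the field names (`person_id`, `pm25`, `temp_f`), the function `link_exposures`, and all values are invented for the example.

```python
from datetime import date

# Hypothetical exposure table keyed by (zip code, day); values are made up.
exposures = {
    ("02215", date(2016, 1, 1)): {"pm25": 10.0, "temp_f": 35.0},
    ("95376", date(2015, 12, 11)): {"pm25": 20.0, "temp_f": 55.0},
}

# Hypothetical patient encounters: home zip code plus encounter date.
encounters = [
    {"person_id": 1, "zip": "02215", "encounter_date": date(2016, 1, 1)},
    {"person_id": 2, "zip": "95376", "encounter_date": date(2015, 12, 11)},
    {"person_id": 3, "zip": "30212", "encounter_date": date(1998, 3, 5)},
]


def link_exposures(encounters, exposures):
    """Attach exposure values to each encounter via (zip, date)."""
    linked = []
    for enc in encounters:
        key = (enc["zip"], enc["encounter_date"])
        # Encounters without a matching exposure record get None values
        match = exposures.get(key, {})
        row = dict(enc)
        row["pm25"] = match.get("pm25")
        row["temp_f"] = match.get("temp_f")
        linked.append(row)
    return linked


linked = link_exposures(encounters, exposures)
for row in linked:
    print(row["person_id"], row["zip"], row["pm25"], row["temp_f"])
```

Zip codes are kept as strings so that leading zeros (as in 02215) survive the join.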
[Table: one row per individual (individual1, individual2, individual3, ..., individualn; millions of patients). Columns: age, sex, zip, Time(E), E? (yes/no), PM2.5, Pb, Temp, Wind, hh income (K), gini index. Source labels: OHDSI, EPA, NOAA, Census.]
Will it work? yep!
Does temperature (and weather) influence
asthma-related pediatric ER visits?
• Children <= 17 y/o with >= 1 ICD9 code corresponding to 493.*
• N=56K, >84K ER visits
• Weather station data (daily temperature, wind, humidity)
• Case-crossover design (only investigated cases)
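The cohort filter and the case-crossover idea above can be sketched as follows. The slides do not say how referent (control) days were chosen; this sketch assumes a time-stratified scheme (same weekday within the same calendar month), and it assumes ICD9 codes are stored with a decimal point. The names `is_asthma_code` and `referent_days` are hypothetical.

```python
import calendar
import re
from datetime import date

# Matches "493" or "493.x..." (assumes codes are stored with a decimal point)
ASTHMA_ICD9 = re.compile(r"^493(\.\d+)?$")


def is_asthma_code(icd9_code):
    """Return True if the ICD9 code falls under 493.*"""
    return bool(ASTHMA_ICD9.match(icd9_code.strip()))


def referent_days(case_day):
    """Time-stratified referents (an assumed scheme): days in the same
    month and year that share the case day's weekday, excluding the
    case day itself."""
    _, days_in_month = calendar.monthrange(case_day.year, case_day.month)
    month_days = (
        date(case_day.year, case_day.month, d)
        for d in range(1, days_in_month + 1)
    )
    return [
        d for d in month_days
        if d.weekday() == case_day.weekday() and d != case_day
    ]


print(is_asthma_code("493.90"))
print(referent_days(date(2016, 1, 1)))
```

Because each case serves as its own control, fixed characteristics of the child (sex, genetics, home address over a short window) cancel out, and only exposures that vary between the case day and the referent days are compared.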
Yeran Li, PhD (MS, HSPH)
Chirag Lakhani, PhD (HMS)
Yun Wang, PhD (Post-doc, HSPH)
Prevalence of asthma attack varies across the US
[Figure: relative risk of asthma ER visits by mean temperature (F, 20 to 80) and lag (0 to 6 days). Panels: overall temperature effect; lag effect at T=60F; temperature effect at Lag=0.]
Does temperature influence asthma ER visits?: yes!
Relative risk of asthma attack by mean temperature
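The lag axis in these results refers to exposure measured on the days before a visit. A minimal sketch of collecting lagged values for one event is shown below; it only assembles the inputs and does not reproduce the model behind the relative-risk estimates, which the slides do not specify. The function `lagged_values` and the temperature series are made up.

```python
from datetime import date, timedelta


def lagged_values(daily_series, event_day, max_lag=6):
    """Return {lag: value} for lags 0..max_lag days before event_day.
    Days missing from the series map to None."""
    return {
        lag: daily_series.get(event_day - timedelta(days=lag))
        for lag in range(max_lag + 1)
    }


# Made-up daily mean temperatures (F) for ten consecutive days
temps = {date(2016, 1, 1) + timedelta(days=i): 30.0 + i for i in range(10)}

lags = lagged_values(temps, date(2016, 1, 8))
for lag, value in lags.items():
    print(lag, value)
```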
Rates of asthma attacks depend on season?: yes!
[Figure: relative risk (0.9 to 1.1) by temperature (20 to 80), one curve per season: Fall, Spring, Summer, Winter.]
Temperature effects vary at different regions
(use NOAA definition, ER kids)
https://www.ncdc.noaa.gov/monitoring-references/maps/us-climate-regions.php
Region counts by NOAA (number of observations), in the order shown on the slide:
Regions: south, northeast, southeast, central, southwest, wn_central, en_central, west, northwest
Counts: 51052, 78126, 57238, 68719, 15256, 18050, 44551, 22923, 13895
[Figure: risk (0.5 to 2.0) by temperature (20 to 80), one panel per NOAA region. Weather effect: different weather zones by NOAA.]
Rates of asthma attacks dependent on region?: yes!
NSF Northeast Hub Big Data Workshop
Does temperature (and weather) influence asthma-related
ER visits in kids?: the tip of the iceberg!
[Figure: cropped repeat of the overall temperature effect and temperature effect at Lag=0 plots.]
What other scientific questions?
• What is the influence of pollen?
• What is the influence of air pollution?
• What about adults?
Can we replicate the analysis?
• different populations
• using different data
• with different analysts
ExposomeDB (Harvard Medical School): geotemporal database
• Environmental Protection Agency: AirData
• US Census: socioeconomic data
• National Oceanic and Atmospheric Administration (NOAA): Climate Data
Observational Health Data Sciences and Informatics (OHDSI) (Columbia University P&S): network of observational data, OHDSI Node
Causal Analytics (University of Pittsburgh, Penn State University)
Analytics Workshop: hands-on workshops for leveraging work in Aims 1-3
Training Resources: Jupyter notebooks; student internships at Pitt, Harvard, PSU, and Columbia P&S
Aim labels in the diagram: A1, A2, A3, A4, AX (Aim X)
Integrating the Exposome Data Warehouse with OHDSI
and causal modeling tools to drive and demonstrate
discovery.
Many hypotheses are possible to address: useful for
public health & planning!
What is the effect of air pollution levels on disease?
Do adverse weather conditions influence hospital use?
What pharmaceutical drugs lead to adverse health
outcomes?
How does socioeconomic context influence hospital use,
disease rates, and recovery?
We will harness tools in machine learning
to extract signal from noise!
Greg Cooper, MD, PhD
Pittsburgh
Vasant Honavar, PhD
Penn State
[Diagram: Bayesian networks and case-crossover analysis. Nodes: SES (socio-economic), PM2.5 (air pollution), °F (weather), asthma.]
Sci Rep 2016
http://www.ccd.pitt.edu/tools/
Integrating the ExposomeDB with OHDSI and analytics
to drive and demonstrate discovery by the community!
(trainees especially welcome)
• 2-day hands-on workshop in New York or Boston

• remote “exchange” internship program and 2-week
immersion

• dissemination of electronic training resources
Many hypotheses are possible to address.
Can we build a machine learning predictor to estimate E?
[Figure: heritability by trait, same chart as shown earlier in the deck, x-axis from 0 to 100.]
Heritability: Var(G)/Var(Phenotype) Source: SNPedia.com
$\sigma^2_E$: Exposome!
Harvard DBMI
Isaac Kohane

Susanne Churchill

Nathan Palmer

Sunny Alvear

Michal Preminger
Chirag J Patel

chirag@hms.harvard.edu

@chiragjp

www.chiragjpgroup.org
Thanks
RagGroup
Chirag Lakhani
Yeran Li
Shreyas Bhave

Rolando Acosta
Noémie Elhadad (Columbia)
Vasant Honavar (PSU)
Greg Cooper (Pitt)
George Hripcsak (Columbia)
René Baston
Katie Naum
Kathleen McKeown
NIH Common Fund

Big Data to Knowledge
