Reproducibility in human cognitive neuroimaging: a community-driven data sharing framework for provenance information integration and interoperability
Nolan Nichols
Dissertation Defense
Biomedical and Health Informatics
University of Washington
Seattle, WA, USA
December 8, 2014
Outline 
• Introduction 
• Background 
• Research approach 
• Conclusions and future directions 
Outline 
• Introduction 
– Motivation for Research 
– Research Goals
• Background
• Research approach
• Conclusions and future directions
Introduction: Motivation for Research
• Human Cognitive Neuroimaging
• Investigates brain structure and function in normal and neuropsychiatric conditions to improve human health
• Facilitates clinical decision making using imaging and cognitive phenotypes
Introduction: Motivation for Research
• Biomedical Informatics (BMI)
– The interdisciplinary field that studies and pursues the effective use of biomedical data, information, and knowledge for scientific inquiry, problem solving, and decision making, motivated by efforts to improve human health
• Neuroinformatics
– Applies BMI principles to develop techniques and tools for acquiring, sharing, storing, publishing, analyzing, modeling, visualizing, and simulating data across all levels of neuroscience
Introduction: Motivation for Research 
• Neuroinformatics Perspective 
• Research is a process with distinct stages 
• Provenance links together each stage 
Poline et al. (2012), Frontiers in Neuroinformatics 
Introduction: Motivation for Research
• Problem: research is not reproducible
– Ioannidis JPA: Why Most Published Research Findings Are False. PLoS Med 2005
– Donoho D: An invitation to reproducible computational research. Biostatistics 2010
– Yong E: Replication studies: Bad copy. Nature 2012
– Editorial: Reducing our irreproducibility. Nature 2012
– Begley CG: Six red flags for suspect work. Nature 2013
– Collins FS, Tabak LA: Policy: NIH plans to enhance reproducibility. Nature 2014
• Reproducibility issues exist along a spectrum
– Statistical issues
– Computational issues
Introduction: Motivation for Research
Reproducibility Spectrum (axis: Confidence in Findings)
• Replicable: Can different researchers from a different lab obtain consistent results using a different methodology and data?
• Repeatable: Can different researchers from a different lab obtain consistent results using the same methodology?
• Reproducible: Can the same researchers in the same lab obtain consistent results using the same methodology and data?
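The three definitions above differ only in who repeats the study and whether the methodology and data change. Here is a minimal sketch that assumes each follow-up study can be described by three yes/no facts, which is a simplification the slide does not make. Under that assumption, the spectrum reduces to a small decision rule. The function name `classify_study` is hypothetical.

```python
from typing import Optional


def classify_study(same_team: bool, same_method: bool, same_data: bool) -> Optional[str]:
    """Place a follow-up study on the reproducibility spectrum.

    Simplifying assumption: a study is described only by three booleans.
    Labels follow the slide's definitions. The slide does not say whether
    data change for a "repeatable" study, so data are ignored in that case.
    Combinations the slide does not describe return None.
    """
    if same_team and same_method and same_data:
        # Same researchers, same lab, same methodology and data.
        return "reproducible"
    if not same_team and same_method:
        # Different researchers in a different lab, same methodology.
        return "repeatable"
    if not same_team and not same_method and not same_data:
        # Different researchers in a different lab, different methodology and data.
        return "replicable"
    return None


print(classify_study(same_team=True, same_method=True, same_data=True))     # reproducible
print(classify_study(same_team=False, same_method=True, same_data=False))   # repeatable
print(classify_study(same_team=False, same_method=False, same_data=False))  # replicable
```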
Introduction: Motivation for Research
• Statistical issues
– Reporting bias in studies of brain volume (Ioannidis, 2011) and fMRI activation foci (David, 2013)
– Lack of statistical power in neuroscience (Button, 2013)
– Data collection and analysis methods are highly flexible across fMRI studies (Carp, 2012)
• Computational issues
– Lack of shared data, code, and analysis environments
Introduction: Motivation for Research
Adapted from Peng (2011), Science.
• Reusable Research
– Can different researchers from a different lab apply a methodology to process shared data from different researchers in a different lab?
Introduction: Motivation for Research
Barriers to reusable research
• Data management systems are not interoperable
• Data acquisition and analysis methods lack provenance
• Terminologies are not harmonized (e.g., brain atlases, schemas)
Poline et al. (2012), Frontiers in Neuroinformatics
Introduction: Research Goals
• To enhance the reusability of neuroimaging data and workflow code
• To advance an informatics data exchange standard that incorporates provenance as a core concept
• To engage the neuroinformatics community as a partner in the design process
Outline 
• Introduction 
• Background 
– Data exchange 
– Provenance 
– Linked Open Data 
• Research approach 
• Conclusions and future directions 
Background: Data Exchange 
• My goal is to extend existing standards to facilitate data reusability and interoperability
http://xkcd.com/927/
Background: Data Exchange
XCEDE XML Schema
• The experiment hierarchy is composed of six levels of information relevant to neuroimaging data exchange
– Project
– Subject
– Visit
– Study
– Episode
– Acquisition
XML-based Clinical Experiment Data Exchange Schema, Gadde et al. 2012
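One way to see how rigid this hierarchy is: model it as a tree in which each node may only contain nodes from the level directly below it. The sketch below uses the level names listed above. The class, function, and identifier names are illustrative assumptions and are not part of the XCEDE schema.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Level names follow the XCEDE experiment hierarchy listed above; the class,
# function, and identifier names here are illustrative, not part of XCEDE.
LEVELS = ["project", "subject", "visit", "study", "episode", "acquisition"]


@dataclass
class Node:
    level: str
    identifier: str
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child exactly one level below this node and return the child."""
        if LEVELS.index(child.level) != LEVELS.index(self.level) + 1:
            raise ValueError(f"{child.level} cannot be nested directly under {self.level}")
        self.children.append(child)
        return child


def acquisition_paths(node: Node, prefix: Tuple[str, ...] = ()) -> Iterator[str]:
    """Yield the project/.../acquisition path of every acquisition under node."""
    path = prefix + (node.identifier,)
    if node.level == "acquisition":
        yield "/".join(path)
    for child in node.children:
        yield from acquisition_paths(child, path)


project = Node("project", "proj01")
(project.add(Node("subject", "sub01"))
        .add(Node("visit", "visit1"))
        .add(Node("study", "mri"))
        .add(Node("episode", "rest"))
        .add(Node("acquisition", "run1")))
print(list(acquisition_paths(project)))  # ['proj01/sub01/visit1/mri/rest/run1']
```

The nesting check in `add` is the kind of fixed structure that makes a strict hierarchy easy to validate but hard to extend with new kinds of data.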
Background: Data Exchange
W3C PROV Specification Suite
• Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability, or trustworthiness.
– Entity (e.g., files, data, publications)
• a physical, digital, conceptual, or other kind of thing with some fixed aspects
– Activity (e.g., workflow, editing a manuscript)
• something that occurs over a period of time and acts upon or with entities
– Agent (e.g., person, software, organization)
• something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent’s activity
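The three core types constrain which relations may connect which kinds of node. The sketch below encodes a handful of PROV-DM relations as pairs of (subject kind, object kind) and considers only each relation's two primary arguments. It is a simplified teaching aid, not the W3C PROV schema. The names `Kind`, `RELATIONS`, and `is_well_typed` are hypothetical.

```python
from enum import Enum


class Kind(Enum):
    ENTITY = "Entity"
    ACTIVITY = "Activity"
    AGENT = "Agent"


# (subject kind, object kind) for the two primary arguments of a few
# core PROV-DM relations; optional arguments (times, plans) are ignored.
RELATIONS = {
    "used": (Kind.ACTIVITY, Kind.ENTITY),
    "wasGeneratedBy": (Kind.ENTITY, Kind.ACTIVITY),
    "wasDerivedFrom": (Kind.ENTITY, Kind.ENTITY),
    "wasAssociatedWith": (Kind.ACTIVITY, Kind.AGENT),
    "wasAttributedTo": (Kind.ENTITY, Kind.AGENT),
    "actedOnBehalfOf": (Kind.AGENT, Kind.AGENT),
}


def is_well_typed(relation: str, subject: Kind, obj: Kind) -> bool:
    """True if `relation` links a subject and object of the kinds PROV expects."""
    return RELATIONS.get(relation) == (subject, obj)


print(is_well_typed("used", Kind.ACTIVITY, Kind.ENTITY))         # True
print(is_well_typed("wasGeneratedBy", Kind.AGENT, Kind.ENTITY))  # False
```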
Background: Provenance
• PROV is an extensible language to describe:
– Responsibility
– Data Flow
– Process Flow
• An image registration process
– wasAssociatedWith a registration algorithm
– used a native-space anatomical MRI
• A spatially-normalized anatomical MRI
– wasGeneratedBy an image registration process
– wasDerivedFrom a native-space anatomical MRI
– wasAttributedTo a registration algorithm
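The registration example can be written out in PROV-N, the human-readable PROV notation. The sketch below only assembles strings and assumes no PROV library. The `ex:` identifiers are hypothetical, modeling the algorithm as a `prov:SoftwareAgent` is an assumption, and `-` marks an optional argument (a time or a plan) that is left unspecified.

```python
def provn_document(statements, prefixes):
    """Wrap PROV-N statements and prefix declarations in a document block."""
    lines = ["document"]
    lines += [f"  prefix {name} <{iri}>" for name, iri in prefixes.items()]
    lines += [f"  {s}" for s in statements]
    lines.append("endDocument")
    return "\n".join(lines)


# Hypothetical identifiers for the registration example.
native = "ex:t1w_native"            # native-space anatomical MRI
normalized = "ex:t1w_normalized"    # spatially-normalized anatomical MRI
registration = "ex:registration"    # image registration process
algorithm = "ex:reg_algorithm"      # registration algorithm, modeled as a software agent

statements = [
    f"entity({native})",
    f"entity({normalized})",
    f"activity({registration})",
    f"agent({algorithm}, [prov:type='prov:SoftwareAgent'])",
    f"wasAssociatedWith({registration}, {algorithm}, -)",  # '-' = no plan given
    f"used({registration}, {native}, -)",                  # '-' = no time given
    f"wasGeneratedBy({normalized}, {registration}, -)",    # '-' = no time given
    f"wasDerivedFrom({normalized}, {native})",
    f"wasAttributedTo({normalized}, {algorithm})",
]
print(provn_document(statements, {"ex": "http://example.org/"}))
```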
Background: Linked Open Data
Semantic Web and Resource Description Framework
• A language to make statements about unique locations (URLs) on the Web
• For example, at the URL of an anatomical MRI
– ‘is a’ http://neurolex.org/wiki/Nlx_156814
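Assume the slide's ‘is a’ corresponds to `rdf:type`. That is the usual RDF reading, although the slide does not name the predicate. Under that assumption, the example statement can be serialized as a single N-Triples line. The scan URL below is a hypothetical stand-in for "the URL of an anatomical MRI".

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one RDF statement whose three terms are all IRIs as an N-Triples line."""
    for iri in (subject, predicate, obj):
        # Reject a few of the characters N-Triples forbids inside <...> IRIs.
        if any(ch in iri for ch in ' <>"'):
            raise ValueError(f"not a valid IRI for N-Triples: {iri!r}")
    return f"<{subject}> <{predicate}> <{obj}> ."


# Hypothetical URL standing in for the anatomical MRI on the slide.
scan = "http://example.org/study/sub01/anat/t1w.nii.gz"
print(ntriple(scan, RDF_TYPE, "http://neurolex.org/wiki/Nlx_156814"))
```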
Background: Linked Open Data 
Outline 
• Introduction 
• Background 
• Research approach 
– Specific Aims 
– Study Design 
– Phase 1 
– Phase 2 
• Conclusions and future directions 
Research Approach: Specific Aims
• Aim 1: Research and design a framework to represent, access, and query neuroimaging data provenance
• Aim 2: Develop an information system of Web services to compute and discover data provenance from brain imaging workflows
Research Approach: Study Design
• Phase 1 – Scalable Neuroimaging Initiative (SNI)
– West Coast collaboration funded by the National Academies Keck Futures Initiative (NAKFI) on Imaging Science
– I led 15 meetings and 1 face-to-face workshop, and presented preliminary results at 3 conferences
• Phase 2 – Neuroimaging Data Sharing (NIDASH)
– Task force funded and organized by the International Neuroinformatics Coordinating Facility (INCF)
– Over two years, I gathered feedback and redesigned the initial SNI framework through 14 face-to-face workshops, 2 hackathons, and weekly meetings
Research Approach: Study Design 
Research Approach: Study Design
• Aim 1 – Data Exchange
– Phase 1 – SNI: Evaluate metadata standards for data exchange (XCEDE)
– Phase 2 – NIDASH: Extend PROV using concepts from XCEDE (Neuroimaging Data Model)
• Aim 2 – Information System
– Phase 1 – SNI: Demonstrate a system for computational access to data (NiQuery)
– Phase 2 – NIDASH: Redesign NiQuery using a semantic Web service-oriented architecture
Outline 
• Introduction 
• Background 
• Research approach 
– General Approach 
– Phase 1 – SNI 
– Phase 2 – NIDASH 
• Conclusions and future directions 
Research Approach: Phase 1 – SNI
• Scalable Neuroimaging Initiative’s Mission:
– To specify and demonstrate an application programming interface (API) that can support agile exploration of distributed neuroimaging data sources while allowing for heterogeneous and evolving data management systems, ontologies, image data formats, image processing tools, and standard anatomical spaces.
• Aim 1 – Data Exchange:
– Applied XCEDE as a data exchange standard for two neuroimaging databases
• Aim 2 – Information System:
– Implemented a system architecture for remote access to content within neuroimaging data
Research Approach: Phase 1 – SNI
Aim 1
• Queries shipped out to multiple sources
• Links are passed to visualization app
Aim 2
• Extract time series from data remotely
• Browser and plotting all in real time
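The step "queries shipped out to multiple sources" can be sketched as a parallel fan-out over a registry of sources, with the returned links collected for the visualization app. This is a generic pattern under stated assumptions, not NiQuery's actual code. The names `federated_query` and `Source`, the registry, and the example URLs are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A source answers a query string with a list of result links. The registry
# used below is a hypothetical stand-in for databases behind a common API.
Source = Callable[[str], List[str]]


def federated_query(registry: Dict[str, Source], query: str) -> Dict[str, List[str]]:
    """Ship one query to every registered source in parallel and collect the links.

    A source that raises is recorded with an empty list, so one failing
    site does not discard the results from the others.
    """
    results: Dict[str, List[str]] = {}
    with ThreadPoolExecutor(max_workers=max(len(registry), 1)) as pool:
        futures = {name: pool.submit(source, query) for name, source in registry.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                results[name] = []
    return results


registry = {
    "site_a": lambda q: [f"https://site-a.example.org/scans?type={q}"],
    "site_b": lambda q: [],
    "site_c": lambda q: [1 / 0],  # simulates a site that fails
}
print(federated_query(registry, "T1w"))
# {'site_a': ['https://site-a.example.org/scans?type=T1w'], 'site_b': [], 'site_c': []}
```

Collecting results in registry order keeps the output deterministic. A production system would also need per-source timeouts, which this sketch omits.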
Research Approach: Phase 1 – SNI
[Figure: NiQuery architecture (www.niquery.org): web-based applications (App, NIQ); query processing via the Query Integrator and a database registry; a common data exchange layer exposing a common API for each source (Stanford NIMS, Allen Institute ABA, UW XNAT, …)]
• System too slow for real-time access (~30 secs.)
• XCEDE too strict for changing datatype requirements
• Framework doesn’t incorporate formal provenance
NiQuery presented at Neuroinformatics 2012, Munich
Brinkley (2012), Query Integrator. JBI.
Research Approach: Phase 1 – SNI
Lessons learned
• Harmonizing the XCEDE and PROV schemas
– XCEDE has a strict hierarchical structure
– PROV is designed as a graph and is compatible with semantic Web technologies
– A harmonized XCEDE and PROV model could represent the stages of electronic data capture, not just the experiment hierarchy
• Solution 1: Extend PROV to represent XCEDE
• Solution 2: Redesign NiQuery using semantic Web design concepts
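Solution 1 amounts to re-expressing the fixed XCEDE tree as a graph. The minimal sketch below assumes that every level is typed as a `prov:Entity` and that nesting is kept with a hypothetical `ex:partOf` property. Neither choice is taken from XCEDE or NIDM, and all identifiers are illustrative. The triples are plain Python tuples, not serialized RDF.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def chain(*pairs: Tuple[str, str]) -> Dict:
    """Build a single-branch nested record from (level, id) pairs, outermost first."""
    node: Optional[Dict] = None
    for level, ident in reversed(pairs):
        node = {"level": level, "id": ident, "children": [node] if node else []}
    return node


def to_triples(node: Dict, parent: Optional[str] = None) -> List[Triple]:
    """Flatten a nested record into (subject, predicate, object) tuples."""
    subject = f"ex:{node['id']}"
    triples = [(subject, "rdf:type", "prov:Entity"), (subject, "ex:level", node["level"])]
    if parent is not None:
        # 'ex:partOf' is a hypothetical containment property, not a PROV term.
        triples.append((subject, "ex:partOf", parent))
    for child in node["children"]:
        triples.extend(to_triples(child, subject))
    return triples


record = chain(("project", "proj01"), ("subject", "sub01"), ("visit", "visit1"),
               ("study", "mri"), ("episode", "rest"), ("acquisition", "run1"))
graph = to_triples(record)
# Once flattened, the graph can also hold statements that cut across levels,
# such as a derived image pointing back at the acquisition it came from.
graph.append(("ex:t1w_normalized", "prov:wasDerivedFrom", "ex:run1"))
print(len(graph))  # 18
```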
Outline 
• Introduction 
• Background 
• Research approach 
– General Approach 
– Phase 1 – SNI 
– Phase 2 – NIDASH 
• Conclusions and future directions 
Research Approach: Phase 2 – NIDASH
• Neuroimaging Data Sharing Task Force Mission:
– Aiming at reproducibility for the sake of reproducibility and enhanced research.
• Aim 1 – Data Exchange:
– Extend PROV using concepts from XCEDE (Neuroimaging Data Model)
• Aim 2 – Information System:
– Redesign NiQuery using a semantic Web service-oriented architecture
Research Approach: Phase 2 – NIDASH
Neuroimaging Data Model (NIDM)
Research Approach: Phase 2 – NIDASH
• Extensions to PROV using elements from the XCEDE experiment hierarchy, workflow tools, and derived data to create Domain Object Models
• Enables a model bridging information from experiment, workflow provenance, and derived data
Keator et al. 2013
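Bridging experiment, workflow, and derived data is what lets a result be traced back to the scan session it came from. The sketch below runs a lineage query over a toy list of triples. It assumes PROV relations for the workflow steps and a hypothetical `ex:partOf` link into the experiment hierarchy. The identifiers are illustrative, and this is not the NIDM vocabulary or its query interface.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# Toy graph linking derived data, workflow steps, and the experiment record.
# All identifiers, and the ex:partOf property, are hypothetical.
TRIPLES = [
    ("ex:tstat1", "prov:wasGeneratedBy", "ex:glm_fit"),
    ("ex:glm_fit", "prov:used", "ex:bold_preproc"),
    ("ex:bold_preproc", "prov:wasGeneratedBy", "ex:preprocessing"),
    ("ex:preprocessing", "prov:used", "ex:bold_run1"),
    ("ex:bold_run1", "ex:partOf", "ex:visit1"),
]

# Edges that point "upstream", from a result toward what it came from.
FOLLOW = {"prov:wasGeneratedBy", "prov:used", "prov:wasDerivedFrom", "ex:partOf"}


def lineage(start: str, triples: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Breadth-first walk from a result back through everything it depends on."""
    upstream: Dict[str, List[str]] = {}
    for s, p, o in triples:
        if p in FOLLOW:
            upstream.setdefault(s, []).append(o)
    seen: Set[str] = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        for parent in upstream.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


print(lineage("ex:tstat1", TRIPLES))
# ['ex:glm_fit', 'ex:bold_preproc', 'ex:preprocessing', 'ex:bold_run1', 'ex:visit1']
```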
Research Approach: Phase 2 – NIDASH 
NIDM Collaboration
• Meetings on Monday and Wednesday to discuss the previous week’s issues
• Satellite meetings at HBM, SfN, Imaging Genetics, and Neuroinformatics for 1-2 days each
• General workflow to contribute
– Contributors create a “fork” on GitHub (an online version control system)
– Changes to the vocabulary and examples are logged as “commits” in the contributor’s “fork”
– Contributor submits a “pull request” to have changes reviewed
– Discussion takes place online until consensus is reached
Aim 2: Design and Methods 
Web services for brain imaging: Demo Query App 
NIDM Results
• A full description is outside the scope of this talk… but
NIDM Results
• A harmonized model for reporting task-based fMRI across SPM, FSL, and (soon) AFNI
http://nidm.nidash.org/specs/nidm-results.html
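Harmonized reporting across packages has to deal with the fact that each package names the same kind of output differently. The sketch below maps file names to one shared description. The file-name patterns are assumptions about typical SPM and FSL FEAT statistic-map outputs, and AFNI is not covered. The output dictionary is a hypothetical stand-in for a shared model, not NIDM-Results itself.

```python
import re
from typing import Dict, Optional, Union

# Illustrative patterns for common SPM and FSL FEAT statistic-map file names.
# They are assumptions for this sketch, not NIDM-Results terms; AFNI is not covered.
PATTERNS = [
    ("SPM", re.compile(r"spmT_(\d{4})\.(?:nii|img)"), "T"),
    ("SPM", re.compile(r"spmF_(\d{4})\.(?:nii|img)"), "F"),
    ("FSL", re.compile(r"tstat(\d+)\.nii\.gz"), "T"),
    ("FSL", re.compile(r"fstat(\d+)\.nii\.gz"), "F"),
]


def describe(path: str) -> Optional[Dict[str, Union[str, int]]]:
    """Map a software-specific statistic-map file name to one shared description."""
    name = path.rsplit("/", 1)[-1]
    for software, pattern, statistic in PATTERNS:
        match = pattern.fullmatch(name)
        if match:
            return {"software": software, "statistic": statistic, "contrast": int(match.group(1))}
    return None


print(describe("spm_model/spmT_0001.nii"))
# {'software': 'SPM', 'statistic': 'T', 'contrast': 1}
print(describe("model.feat/stats/tstat1.nii.gz"))
# {'software': 'FSL', 'statistic': 'T', 'contrast': 1}
```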
NIDM Results
• All terms are modeled with an identifier, a definition, domain/range, and examples
• Model fitting:
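The four parts of a term (identifier, definition, domain/range, examples) map naturally onto a small record type, and domain/range supports a simple conformance check. The sketch below uses a hypothetical `ex:` term with an illustrative example value. It is not drawn from the NIDM-Results specification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Term:
    identifier: str                # IRI or prefixed name of the property
    definition: str                # human-readable definition
    domain: str                    # class of the things the property describes
    range: str                     # class or datatype of the property's values
    examples: Tuple[str, ...] = ()

    def conforms(self, subject_type: str, value_type: str) -> bool:
        """True if a statement using this term respects its domain and range."""
        return subject_type == self.domain and value_type == self.range


# A hypothetical term, written in the shape the slide describes.
cluster_size = Term(
    identifier="ex:clusterSizeInVoxels",
    definition="Number of voxels in a cluster of a thresholded statistic map.",
    domain="ex:Cluster",
    range="xsd:positiveInteger",
    examples=("ex:cluster_0001 ex:clusterSizeInVoxels 120",),
)
print(cluster_size.conforms("ex:Cluster", "xsd:positiveInteger"))  # True
print(cluster_size.conforms("ex:Peak", "xsd:positiveInteger"))     # False
```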
NIDM Results
Outline 
• Introduction 
• Background 
• Research approach 
• Conclusions and future directions 
– Contributions 
– Implications 
– Future Directions 
Conclusions and future directions
• Collaborative Framework Outcomes
– GitHub is an effective tool for standards development
• Closed 89 issues
• 1,087 commits
• 9 contributors
• 1 publication, specification suite
• Software engineering outcomes
– Implemented in Nipype for workflow management
– Being used to model task fMRI
• Implemented for SPM12 and FSL
– Being incorporated into NeuroVault for automated population of a database to share statistical parametric maps (SPMs)
Acknowledgments 
Committee Members 
James Brinkley (Chair) 
Susan Coldwell (GSR)
Thomas Grabowski 
Nicholas Anderson 
Scalable Neuroimaging Initiative 
UW: Todd Detwiler, Randy Frank 
Stanford: Brian Wandell, Bob 
Dougherty, Gunnar Schaeffer 
Neuroinformatics Community 
Satra Ghosh, Rich Stoner, JB 
Poline, David Keator, Karl 
Helmer, Camille Maumet, Tom 
Nichols, Dan Marcus, Christian 
Haselgrove, Jessica Turner, 
David Kennedy, Jack van Horn… 
and many others! 
Integrated Brain Imaging Center 
Katie Askren, Peter Boord, Elliot 
Collins, Tina Guan, Clark Johnson, 
Tara Madhyastha, Sonya Mehta, 
Todd Richards, Rosalia Tungaraza, 
Kurt Weaver, Karl Woelfer, Liza 
Young… and everyone else! 
