Open science resources for `Big Data' Analyses of the human connectome

Cameron Craddock, PhD
Computational Neuroimaging Lab
Center for Biomedical Imaging and Neuromodulation
Nathan S. Kline Institute for Psychiatric Research
Center for the Developing Brain
Child Mind Institute

The Human Connectome
• The sum total of all of the brain’s
connections
– Structural connections: synapses and
fibers
• Diffusion MRI
– Functional connections: synchronized
physiological activity
• Resting state functional MRI
• Nodes are brain areas
• Edges are connections
Craddock et al. Nature Methods, 2013.
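As a rough illustration of the node-and-edge description above, a functional connectome can be sketched as a correlation matrix between regional time series. This is a minimal sketch on synthetic data. The function name `functional_connectome`, the choice of Pearson correlation as the measure of synchrony, and the array layout are assumptions; the slides do not specify any of them.

```python
import numpy as np


def functional_connectome(timeseries):
    """Return the node-by-node Pearson correlation matrix.

    timeseries: array of shape (n_timepoints, n_nodes), one column per brain area.
    """
    ts = np.asarray(timeseries, dtype=float)
    conn = np.corrcoef(ts, rowvar=False)  # edges: correlation between node pairs
    np.fill_diagonal(conn, 0.0)           # ignore self-connections
    return conn


# Synthetic example: nodes 0 and 1 share a common signal, node 2 is independent.
rng = np.random.default_rng(0)
shared = rng.standard_normal(200)
ts = np.column_stack([
    shared + 0.1 * rng.standard_normal(200),
    shared + 0.1 * rng.standard_normal(200),
    rng.standard_normal(200),
])
conn = functional_connectome(ts)
```

Here `conn[0, 1]` is the edge weight between the two synchronized nodes and is close to 1, while the edges to node 2 stay near 0.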

Discovery science of human brain function
1. Characterizing inter-individual variation in connectomes (Kelly et al.
2012)
2. Identifying biomarkers of disease state, severity, and prognosis
(Craddock 2009)
3. Re-defining mental health in terms of neurophenotypes, e.g. RDoC
(Castellanos 2013)
Data are often shared only in raw form and must be preprocessed to remove
nuisance variation and to be made comparable across individuals and sites.

No consensus on preprocessing
• Lund TE, Madsen KH, Sidaros K, Luo W-L, Nichols TE. Non-white noise in fMRI: Does modelling have an impact? NeuroImage 29 (2006) 54-66.
– Sources of non-white noise in BOLD fMRI: low-frequency drift due to hardware imperfections, oscillatory noise due to respiration and cardiac pulsation, and residual movement artefacts not accounted for by rigid body registration
– Low-frequency drift is often removed by high-pass filtering; other effects are typically modelled as an autoregressive (AR) process
– Proposes an alternative approach: Nuisance Variable Regression (NVR)
• Behzadi Y, Restom K, Liau J, Liu TT. A component based noise correction method (CompCor) for BOLD and perfusion based fMRI. NeuroImage 37 (2007) 90-101.
– Significant principal components are derived from noise regions-of-interest (ROI) in which the time series data are unlikely to be modulated by neural activity
– These components are included as nuisance parameters within general linear models for BOLD and perfusion-based fMRI time series data
This is particularly complicated for “post-hoc” aggregated datasets
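To make the component-based idea concrete, the sketch below extracts principal component time courses from a noise region and regresses them out of every voxel. It is a simplified illustration, not the published CompCor procedure. It assumes a fixed number of components rather than a significance test, a hypothetical function name `compcor_regress`, and data already arranged as a timepoints-by-voxels array. The intercept in the design also removes each voxel's mean.

```python
import numpy as np


def compcor_regress(data, noise_mask, n_components=5):
    """Regress noise-ROI principal components out of every voxel.

    data: (n_timepoints, n_voxels) array; noise_mask: boolean (n_voxels,).
    """
    data = np.asarray(data, dtype=float)
    noise = data[:, noise_mask]
    noise = noise - noise.mean(axis=0)
    # Left singular vectors are the principal component time courses.
    u, _, _ = np.linalg.svd(noise, full_matrices=False)
    components = u[:, :n_components]
    design = np.column_stack([np.ones(len(data)), components])
    beta, *_ = np.linalg.lstsq(design, data, rcond=None)
    return data - design @ beta


# Synthetic example: every voxel carries the same oscillatory nuisance signal.
rng = np.random.default_rng(1)
n_t, n_vox = 100, 20
nuisance = np.sin(np.linspace(0, 6 * np.pi, n_t))
data = 0.1 * rng.standard_normal((n_t, n_vox)) + nuisance[:, None]
noise_mask = np.zeros(n_vox, dtype=bool)
noise_mask[:10] = True  # assumed noise ROI (e.g. white matter or CSF voxels)

cleaned = compcor_regress(data, noise_mask, n_components=1)
corr_before = np.corrcoef(data[:, 15], nuisance)[0, 1]
corr_after = np.corrcoef(cleaned[:, 15], nuisance)[0, 1]
```

Voxel 15 lies outside the noise ROI, yet its correlation with the nuisance signal drops from near 1 to near 0, because the component was estimated elsewhere and removed everywhere.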

The cost of discovery
“Best practice” r-fMRI preprocessing: ~ 2 hours
Discovery dataset: ~1,000 subjects
“Point and click” processing: 2,000 person hours (1 year)
Scripted processing: 2,000 CPU hours (84 days on a single core, down to
minutes when run in parallel)
Different derivatives and analyses add time
Different preprocessing strategies scale time
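The arithmetic behind these figures can be checked in a few lines. The working year of 40 hours per week for 50 weeks and the core counts are assumptions made here for illustration. The 2 hours per subject and 1,000 subjects come from the slide.

```python
import math

HOURS_PER_SUBJECT = 2
N_SUBJECTS = 1000
WORK_HOURS_PER_YEAR = 40 * 50  # assumed: 40 h/week, 50 weeks/year

total_hours = HOURS_PER_SUBJECT * N_SUBJECTS  # 2,000 hours of work


def wall_clock_days(total_cpu_hours, n_cores):
    """Days of wall-clock time if the work splits evenly across cores."""
    return total_cpu_hours / n_cores / 24


person_years = total_hours / WORK_HOURS_PER_YEAR       # manual processing
serial_days = math.ceil(wall_clock_days(total_hours, 1))  # one core, around the clock
parallel_hours = wall_clock_days(total_hours, 1000) * 24  # one subject per core
```

With one subject per core the wall-clock time falls to about 2 hours. Getting down to minutes would additionally require parallelism within each subject's processing.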

Configurable Pipeline for the Analysis of
Connectomes (CPAC)
• Pipeline to automate preprocessing and analysis
of large-scale datasets
• Most cutting edge functional connectivity
preprocessing and analysis algorithms
• Configurable to enable “plurality” – evaluate
different processing parameters and strategies
• Automatically identifies and takes advantage of
parallelism on multi-threaded, multi-core, and
cluster architectures
• “Warm restarts” – only re-compute what has
changed
• Open science – open source
• http://fcp-indi.github.io
Nipype
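The "warm restart" idea can be sketched with a cache keyed on each step's name, its parameters, and the key of its input. This is a toy illustration only and not how C-PAC or Nipype implement it. The class `WarmRestartCache`, the two stand-in steps, and the in-memory storage are all hypothetical.

```python
import hashlib
import json


class WarmRestartCache:
    """Hypothetical cache: a step re-runs only if its name, parameters,
    or upstream input changed."""

    def __init__(self):
        self._results = {}
        self.n_computed = 0  # how many steps actually ran

    def run(self, name, func, params, upstream):
        up_key, up_value = upstream
        blob = json.dumps([name, params, up_key], sort_keys=True)
        key = hashlib.sha256(blob.encode("utf-8")).hexdigest()
        if key not in self._results:
            self._results[key] = func(up_value, **params)
            self.n_computed += 1
        return key, self._results[key]


def smooth(data, fwhm):
    return f"smooth({data}, fwhm={fwhm})"  # stand-in for real smoothing


def bandpass(data, low, high):
    return f"bandpass({data}, {low}-{high} Hz)"  # stand-in for real filtering


cache = WarmRestartCache()


def pipeline(fwhm, high):
    raw = ("raw-scan-id", "raw")
    smoothed = cache.run("smooth", smooth, {"fwhm": fwhm}, raw)
    return cache.run("bandpass", bandpass, {"low": 0.01, "high": high}, smoothed)


pipeline(fwhm=6, high=0.1)
first_run = cache.n_computed       # both steps ran
pipeline(fwhm=6, high=0.1)
unchanged_run = cache.n_computed   # nothing re-ran
pipeline(fwhm=6, high=0.08)
filter_changed = cache.n_computed  # only the filter re-ran
pipeline(fwhm=4, high=0.08)
smooth_changed = cache.n_computed  # smoothing and everything downstream re-ran
```

Including the upstream key in the hash is the design choice that matters: a change early in the pipeline invalidates every step that depends on it, while a change late in the pipeline leaves earlier results untouched.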

Consortium for Reliability and Reproducibility (CoRR)
• 33 datasets acquired with a variety of test-retest
designs
– Intra- and inter-session re-tests
– 1629 subjects
– 3357 anatomical MRI scans
– 5093 resting state fMRI scans
– 1302 diffusion MRI scans
http://fcon_1000.projects.nitrc.org/indi/CoRR/html/index.html

Why not share preprocessed data?
• Make data available to a
wider audience of
researchers
• Evaluate reproducibility
of analysis results
http://preprocessed-connectomes-project.github.io/

ADHD-200 Preprocessed
• 374 ADHD & 598 TDC
– 7-21 years old
• Two functional pipelines
– Athena: FSL & AFNI,
precursor to C-PAC
– NIAK: MINC tools + NIAK
using PSOM pipeline
• Structural pipeline
– Burner: SPM5 based VBM

ADHD-200 Preprocessed (2)
• 9,500 downloads from 49
different users
• Athena preprocessed data
used by winning team of the
ADHD Global competition
• 31 peer reviewed publications,
2 dissertations and 1 patent
– (http://www.mendeley.com/groups/4198361/adhd-200-preprocessed/)
Figure 2. Overview of the ADHD-200 Preprocessed audience.

Beijing DTI Preprocessed
• 180 healthy college
students
• 55 with Verbal,
Performance, and Full IQ
• Preprocessed using FSL
– DTI scalars (FA, MD, etc…)
– Probabilistic Tractography

ABIDE Preprocessed indexed by NDAR
• 539 ASD and 573 typical
– 6 – 64 years old
– Some overlap with controls
from ADHD-200
• 4 Functional Preprocessing
Pipelines
• 4 Preprocessing strategies
– GSR, No-GSR, Filtering, No-
Filtering
• 4 Cortical thickness pipelines
– ANTS, CIVET, Freesurfer,
Mindboggle
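The four preprocessing strategies can be read as the cross of two on/off choices, and crossing them with the four functional pipelines gives the full set of functional derivatives. The sketch below enumerates them. The option names, the reading of "Filtering" as band-pass filtering, and the label format are assumptions. The pipeline names are the four teams this deck lists with R-fMRI analyses.

```python
from itertools import product

FUNCTIONAL_PIPELINES = ["C-PAC", "CCS", "DPARSF", "NIAK"]
OPTIONS = {
    "global_signal_regression": (True, False),
    "bandpass_filtering": (True, False),
}


def enumerate_strategies(options):
    """Return one dict per combination of option values."""
    names = list(options)
    combos = product(*(options[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def label(strategy):
    gsr = "GSR" if strategy["global_signal_regression"] else "No-GSR"
    filt = "Filtering" if strategy["bandpass_filtering"] else "No-Filtering"
    return f"{gsr} + {filt}"


strategies = enumerate_strategies(OPTIONS)
labels = [label(s) for s in strategies]
runs = list(product(FUNCTIONAL_PIPELINES, labels))  # pipeline x strategy
```

Adding a third on/off option to `OPTIONS` would double the number of strategies, which is why configurable pipelines make this kind of comparison practical.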

ABIDE Preprocessed (2)
Team        Tools                     Analyses
C-BRAIN     CIVET, MINC               Cortical Measures
C-PAC       AFNI, ANTs, FSL, Nipype   R-fMRI, VBM, Cortical Measures
CCS         AFNI, Freesurfer, FSL     R-fMRI, VBM, Cortical Measures
DPARSF      DPARSF, REST, SPM         R-fMRI
Mindboggle  Mindboggle                Cortical Measures
NIAK II     MINC, NIAK, PSOM          R-fMRI

Quality Assessment Protocol
• Spatial Measures
– Contrast to Noise Ratio
– Entropy Focus Criterion
– Foreground to Background Energy Ratio
– Smoothness (FWHM)
– % Artifact Voxels
– Signal-to-Noise Ratio
• Temporal Measures
– Standardized DVARS
– Median distance index
– Mean Framewise Displacement (FD)
– # Volumes with FD > 0.2 mm
– % Volumes with FD > 0.2 mm
http://preprocessed-connectomes-project.github.io/quality-assessment-protocol/
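The motion-related temporal measures can be illustrated with a short calculation. The sketch assumes FD here means framewise displacement and uses the common sum-of-absolute-differences formulation with rotations converted to millimetres on a 50 mm sphere. That formulation is swapped in for illustration and may differ from the one the protocol actually uses. The column order of the motion parameters and the function names are also assumptions.

```python
import numpy as np


def framewise_displacement(motion, radius_mm=50.0):
    """Per-volume displacement from six rigid-body motion parameters.

    motion: (n_volumes, 6) array, assumed to hold three translations in mm
    followed by three rotations in radians.
    """
    motion = np.asarray(motion, dtype=float).copy()
    motion[:, 3:] *= radius_mm  # rotation -> arc length on a sphere
    diffs = np.abs(np.diff(motion, axis=0))
    # The first volume has no predecessor, so its FD is set to 0.
    return np.concatenate([[0.0], diffs.sum(axis=1)])


def fd_summary(fd, threshold_mm=0.2):
    n_bad = int(np.sum(fd > threshold_mm))
    return {
        "mean_fd": float(fd.mean()),
        "num_fd": n_bad,
        "perc_fd": 100.0 * n_bad / len(fd),
    }


# Five volumes; the head shifts 0.5 mm at volume 2 and returns at volume 3.
motion = np.zeros((5, 6))
motion[2, 0] = 0.5
fd = framewise_displacement(motion)
summary = fd_summary(fd)
```

A single brief shift produces two high-FD volumes, one for the movement away and one for the return, so in this example 40% of volumes exceed the 0.2 mm threshold.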

Quality Assessment Protocol (2)
• Implemented in Python
• Normative datasets to
help learn thresholds
for quality control
– ABIDE
– CoRR
http://preprocessed-connectomes-project.github.io/quality-assessment-protocol/
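One simple way a normative dataset could inform a quality-control threshold is an outlier fence computed from its distribution. The sketch below uses an upper fence at 1.5 times the interquartile range above the third quartile. That rule is a choice made here for illustration, not the protocol's documented method, and it only suits measures where larger values indicate worse quality. The function names and the numbers are hypothetical.

```python
import numpy as np


def upper_outlier_threshold(normative_values, k=1.5):
    """Upper fence: third quartile plus k times the interquartile range."""
    q1, q3 = np.percentile(normative_values, [25, 75])
    return q3 + k * (q3 - q1)


def flag_scans(values, threshold):
    """Boolean array marking scans whose measure exceeds the threshold."""
    return np.asarray(values, dtype=float) > threshold


# Hypothetical mean-FD values (mm) from a normative sample, then two new scans.
normative_mean_fd = [0.1, 0.2, 0.3, 0.4, 0.5]
threshold = upper_outlier_threshold(normative_mean_fd)
flags = flag_scans([0.3, 0.9], threshold)
```

With these numbers the fence sits at about 0.7 mm, so the second scan is flagged for review and the first is not.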

Regional Brainhacks
• One event that linked 8 cities,
3 countries, 2 continents
– Ann Arbor
– Boston
– Miami
– Montreal
– New York City
– Porto Alegre, Brazil
– Toronto
– Washington DC

Acknowledgements
CPAC Team: Daniel Clark, Steven Giavasis and Michael Milham.
Quality Assessment Protocol: Zarrar Shehzad, Daniel Lurie, Steven Giavasis, and Sang Han Lee.
ABIDE Preprocessed: Pierre Bellec, Yassine Benhajali, Francois Chouinard, Daniel Clark, R.
Cameron Craddock, Alan Evans, Steven Giavasis, Budhachandra Khundrakpam, John Lewis,
Qingyang Li, Zarrar Shehzad, Aimi Watanabe, Ting Xu, Chao-Gan Yan, Zhen Yang, Xinian Zuo, and
the ABIDE consortium.
Brainhack Organizers: Pierre Bellec, Daniel Margulies, Maarten Mennes, Donald McLaren, Satra
Ghosh, Matt Hutchison, Robert Welsh, Scott Peltier, Jonathan Downer, Stephen Strother, Katie
Dunlop, Angie Laird, Lucina Uddin, Benjamin De Leener, Julien Cohen-Adad, Andrew Gerber, Alex
Franco, Caroline Froehlich, Felipe Meneguzzi, John VanMeter, Lei Liew, Ziad Saad, Prantik Kundu.
CPAC-NDAR integration was funded by a contract from NDAR.
ABIDE Preprocessed data is hosted in a Public S3 Bucket provided by AWS.
