Open science resources
for ‘Big Data’ analyses of
the human connectome
Cameron Craddock, PhD
Computational Neuroimaging Lab
Center for Biomedical Imaging and Neuromodulation
Nathan S. Kline Institute for Psychiatric Research
Center for the Developing Brain
Child Mind Institute
The Human Connectome
• The sum total of all of the brain’s
connections
– Structural connections: synapses and
fibers
• Diffusion MRI
– Functional connections: synchronized
physiological activity
• Resting state functional MRI
• Nodes are brain areas
• Edges are connections
Craddock et al. Nature Methods, 2013.
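The node/edge view of a functional connectome can be sketched in a few lines. This sketch makes several assumptions that the slide does not prescribe. It assumes mean resting-state time series have already been extracted for each brain area. It uses Pearson correlation as the edge weight. It binarizes edges at an arbitrary threshold of 0.3. Degree centrality, one of the analyses listed in the editor's notes, is then simply the number of edges attached to each node.

```python
import numpy as np


def functional_connectome(roi_ts, threshold=0.3):
    """Build a functional connectome from ROI time series.

    roi_ts: (n_timepoints, n_rois) array, one column per brain area (node).
    Returns the Pearson correlation matrix (weighted edges) and a binary
    adjacency matrix that keeps only edges with correlation above `threshold`
    (the 0.3 default is an illustrative assumption).
    """
    corr = np.corrcoef(np.asarray(roi_ts, dtype=float), rowvar=False)
    adj = (corr > threshold).astype(int)
    np.fill_diagonal(adj, 0)  # a node is not connected to itself
    return corr, adj


# Toy data: nodes 0 and 1 share a common fluctuation; node 2 is independent.
rng = np.random.default_rng(0)
shared = rng.standard_normal(200)
roi_ts = np.column_stack([
    shared + 0.5 * rng.standard_normal(200),
    shared + 0.5 * rng.standard_normal(200),
    rng.standard_normal(200),
])
corr, adj = functional_connectome(roi_ts)
degree = adj.sum(axis=1)  # degree centrality: number of edges per node
```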
Connectomics is Big Data
Discovery science of human brain function
1. Characterizing inter-individual variation in connectomes (Kelly et al.
2012)
2. Identifying biomarkers of disease state, severity, and prognosis
(Craddock 2009)
3. Re-defining mental health in terms of neurophenotypes, e.g. RDoC
(Castellanos 2013)
Data is often shared only in its raw form – must be preprocessed to remove
nuisance variation and to be made comparable across individuals and sites.
No consensus on preprocessing
[Slide images: first pages of T. E. Lund, K. H. Madsen, K. Sidaros, W.-L. Luo and T. E. Nichols, “Non-white noise in fMRI: Does modelling have an impact?”, NeuroImage 29 (2006) 54–66, which proposes Nuisance Variable Regression (NVR) as an alternative to high-pass filtering plus autoregressive (AR) noise modelling; and Y. Behzadi, K. Restom, J. Liau and T. T. Liu, “A component based noise correction method (CompCor) for BOLD and perfusion based fMRI”, NeuroImage 37 (2007) 90–101, which derives principal components from noise regions of interest and includes them as nuisance regressors in the general linear model.]
This is particularly complicated for “post-hoc” aggregated datasets
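CompCor is one of the nuisance-correction methods cited on this slide. It takes principal components of time series from noise regions of interest, where neural activity is unlikely, and enters them as nuisance regressors. The sketch below is a simplified illustration and not the published implementation. It assumes the noise-ROI voxels have already been selected. It variance-normalizes each voxel and keeps an arbitrary number of components. It may also omit preprocessing steps used in the original paper, such as detrending. `compcor_regressors` and `regress_out` are hypothetical helper names.

```python
import numpy as np


def compcor_regressors(noise_ts, n_components=5):
    """Principal component time courses of a noise ROI.

    noise_ts: (n_timepoints, n_voxels) signals from a region assumed to carry
    no neural activity (e.g. white matter or CSF). n_components is an assumption.
    """
    x = np.asarray(noise_ts, dtype=float)
    x = x - x.mean(axis=0)                 # centre each voxel's time series
    std = x.std(axis=0)
    x = x / np.where(std > 0, std, 1.0)    # variance-normalise each voxel
    # Left singular vectors = orthonormal temporal components, ordered by variance
    u, _, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components]


def regress_out(data, confounds):
    """Ordinary-least-squares removal of confounds (plus an intercept) from data."""
    data = np.asarray(data, dtype=float)
    design = np.column_stack([np.ones(len(confounds)), confounds])
    beta, *_ = np.linalg.lstsq(design, data, rcond=None)
    return data - design @ beta


# Toy example: a slow "physiological" oscillation contaminates both the
# noise ROI and a grey-matter voxel that also carries a neural signal.
rng = np.random.default_rng(1)
t = np.arange(150)
physio = np.sin(2 * np.pi * t / 12.0)
noise_roi = np.outer(physio, rng.uniform(0.5, 2.0, 40)) + 0.1 * rng.standard_normal((150, 40))
neural = rng.standard_normal(150)
gm_voxel = neural + 2.0 * physio
cleaned = regress_out(gm_voxel, compcor_regressors(noise_roi, n_components=3))
```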
A variety of analyses
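Editor's note #7 lists several of these analyses. Two of them are the amplitude of low-frequency fluctuations (ALFF) and its fractional variant (fALFF). The sketch below computes both for a single voxel. It assumes a 0.01–0.08 Hz band, a commonly used range chosen here as an assumption, and applies no normalization across voxels. Published implementations differ in details such as scaling and whether the in-band amplitude is summed or averaged.

```python
import numpy as np


def alff_falff(ts, tr, band=(0.01, 0.08)):
    """ALFF and fALFF for a single voxel's time series.

    ts: 1-D BOLD time series; tr: repetition time in seconds.
    ALFF  = mean amplitude of the spectrum inside the low-frequency band.
    fALFF = summed amplitude inside the band / summed amplitude over all
            non-zero frequencies.
    """
    ts = np.asarray(ts, dtype=float)
    ts = ts - ts.mean()
    freqs = np.fft.rfftfreq(ts.size, d=tr)
    amp = np.abs(np.fft.rfft(ts))          # amplitude = sqrt(power)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    alff = amp[in_band].mean()
    falff = amp[in_band].sum() / amp[freqs > 0].sum()
    return alff, falff


tr = 2.0
t = np.arange(200) * tr
slow = np.sin(2 * np.pi * 0.05 * t)   # 0.05 Hz lies inside the 0.01-0.08 Hz band
fast = np.sin(2 * np.pi * 0.20 * t)   # 0.20 Hz lies outside it
```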
The cost of discovery
“Best practice” r-fMRI preprocessing: ~2 hours per subject
Discovery dataset: ~1,000 subjects
“Point and click” processing: 2,000 person-hours (~1 working year)
Scripted processing: 2,000 CPU hours (~84 days on one CPU, down to
minutes with enough parallelism)
Different derivatives and analyses add time
Different preprocessing strategies multiply time
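The slide's arithmetic can be checked with a short back-of-the-envelope sketch. 1,000 subjects × 2 hours gives 2,000 CPU hours, or about 83.3 days on a single CPU; the slide rounds this to 84. The function name and the assumption that every job takes the same time are illustrative. Running one subject per CPU only brings wall time down to about one job's duration (~2 hours). Reaching minutes would also require parallelism within each subject's pipeline.

```python
import math


def processing_cost(n_subjects=1000, hours_per_subject=2.0, n_strategies=1, n_workers=1):
    """Back-of-the-envelope compute cost for preprocessing a dataset.

    Assumes every subject/strategy job takes the same time and that jobs are
    independent, so they can be spread across `n_workers` CPUs.
    Returns (total CPU hours, days if run serially, wall-clock hours).
    """
    n_jobs = n_subjects * n_strategies
    cpu_hours = n_jobs * hours_per_subject
    serial_days = cpu_hours / 24.0
    # Jobs run in "waves" of n_workers; each wave takes one job's duration.
    wall_hours = math.ceil(n_jobs / n_workers) * hours_per_subject
    return cpu_hours, serial_days, wall_hours
```

For example, `processing_cost(n_strategies=4)` shows four preprocessing strategies multiplying the total to 8,000 CPU hours.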
Configurable Pipeline for the Analysis of
Connectomes (CPAC)
• Pipeline to automate preprocessing and analysis
of large-scale datasets
• State-of-the-art functional connectivity
preprocessing and analysis algorithms
• Configurable to enable “plurality” – evaluate
different processing parameters and strategies
• Automatically identifies and takes advantage of
parallelism on multi-threaded, multi-core, and
cluster architectures
• “Warm restarts” – only re-compute what has
changed
• Open science – open source
• http://fcp-indi.github.io
Nipype
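“Warm restarts” amount to caching: a step is re-run only when its inputs or parameters differ from the previous run. Nipype, which C-PAC builds on, caches node results keyed on a hash of their inputs. The toy class below only illustrates the idea and is not Nipype's or C-PAC's API. `WarmRestartCache` is a hypothetical name. The sketch also assumes inputs and parameters are JSON-serializable, whereas real pipelines hash file contents or timestamps instead.

```python
import hashlib
import json


class WarmRestartCache:
    """Hypothetical sketch: re-run a step only when its inputs or parameters change."""

    def __init__(self):
        self._store = {}   # step name -> (hash of inputs + params, cached result)
        self.runs = 0      # number of times a step actually executed

    @staticmethod
    def _digest(inputs, params):
        payload = json.dumps({"inputs": inputs, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, name, func, inputs, params):
        key = self._digest(inputs, params)
        cached = self._store.get(name)
        if cached is not None and cached[0] == key:
            return cached[1]               # nothing changed: reuse previous output
        result = func(inputs, **params)    # inputs or params changed: recompute
        self.runs += 1
        self._store[name] = (key, result)
        return result
```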
Consortium for Reliability and Reproducibility (CoRR)
• 33 datasets acquired with a variety of different test-retest
designs
– Intra- and inter-session re-tests
– 1629 subjects
– 3357 anatomical MRI scans
– 5093 resting state fMRI scans
– 1302 diffusion MRI scans
http://fcon_1000.projects.nitrc.org/indi/CoRR/html/index.html
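Editor's note #10 suggests choosing the preprocessing that maximizes test-retest reliability, which CoRR's retest data make possible. The intraclass correlation coefficient (ICC) is one common reliability index. The sketch computes the one-way random-effects ICC(1,1), chosen here for simplicity. Other ICC forms may suit a given test-retest design better.

```python
import numpy as np


def icc_1_1(x):
    """One-way random-effects ICC(1,1).

    x: (n_subjects, k_sessions) array of one measure per subject and session,
    e.g. the strength of a single connection from test and retest scans.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    subject_means = x.mean(axis=1)
    grand_mean = x.mean()
    # Between-subject and within-subject mean squares from a one-way ANOVA
    ms_between = k * ((subject_means - grand_mean) ** 2).sum() / (n - 1)
    ms_within = ((x - subject_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```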
Why not share preprocessed data?
• Make data available to a
wider audience of
researchers
• Evaluate reproducibility
of analysis results
http://preprocessed-connectomes-project.github.io/
ADHD-200 Preprocessed
• 374 ADHD & 598 TDC
– 7-21 years old
• Two functional pipelines
– Athena: FSL & AFNI,
precursor to C-PAC
– NIAK: MINC tools + NIAK
using PSOM pipeline
• Structural pipeline
– Burner: SPM5-based VBM
ADHD-200 Preprocessed (2)
• 9,500 downloads from 49
different users
• Athena preprocessed data
used by the winning team of the
ADHD-200 Global Competition
• 31 peer reviewed publications,
2 dissertations and 1 patent
– (http://www.mendeley.com/groups/4198361/adhd-200-preprocessed/)
Figure 2. Overview of the ADHD-200 Preprocessed audience.
Beijing DTI Preprocessed
• 180 healthy college
students
• 55 with Verbal,
Performance, and Full-Scale IQ
• Preprocessed using FSL
– DTI scalars (FA, MD, etc…)
– Probabilistic Tractography
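DTI scalars such as fractional anisotropy (FA) and mean diffusivity (MD) are functions of the diffusion tensor's three eigenvalues. FSL's `dtifit`, for example, writes both maps. The sketch below applies the standard formulas to a single voxel. It assumes the eigenvalues have already been estimated.

```python
import math


def dti_scalars(l1, l2, l3):
    """Fractional anisotropy (FA) and mean diffusivity (MD) from the three
    eigenvalues of a diffusion tensor (MD is in the eigenvalues' units)."""
    md = (l1 + l2 + l3) / 3.0
    spread = math.sqrt((l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2)
    norm = math.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    # FA ranges from 0 (isotropic diffusion) to 1 (diffusion along one axis)
    fa = math.sqrt(1.5) * spread / norm if norm > 0 else 0.0
    return fa, md
```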
ABIDE Preprocessed indexed by NDAR
• 539 ASD and 573 typical controls
– 6 – 64 years old
– Some overlap with controls
from ADHD-200
• 4 functional preprocessing
pipelines
• 4 preprocessing strategies
– With and without global signal regression
(GSR) × with and without band-pass filtering
• 4 cortical thickness pipelines
– ANTs, CIVET, FreeSurfer,
Mindboggle
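Four pipelines, each run under four strategies, give 16 preprocessed versions of every functional scan. The sketch below shows the 2 × 2 strategy grid for one subject. It makes three choices that are ours, not necessarily those of any of the four pipelines: GSR by least-squares regression of the mean signal, an ideal FFT band-pass of 0.01–0.1 Hz, and GSR applied before filtering. The output labels follow a filt/nofilt × global/noglobal pattern for readability.

```python
import itertools

import numpy as np


def global_signal_regression(data):
    """Regress the global (mean across voxels) signal out of every voxel.

    data: (n_timepoints, n_voxels).
    """
    gs = data.mean(axis=1)
    design = np.column_stack([np.ones_like(gs), gs])
    beta, *_ = np.linalg.lstsq(design, data, rcond=None)
    return data - design @ beta


def bandpass(data, tr, low=0.01, high=0.1):
    """Ideal FFT band-pass filter along the time axis (zeroes out-of-band bins)."""
    freqs = np.fft.rfftfreq(data.shape[0], d=tr)
    spec = np.fft.rfft(data, axis=0)
    spec[(freqs < low) | (freqs > high)] = 0
    return np.fft.irfft(spec, n=data.shape[0], axis=0)


def run_strategies(data, tr):
    """Apply all four filtering x GSR combinations to one subject's data."""
    outputs = {}
    for filt, gsr in itertools.product([True, False], repeat=2):
        x = data
        if gsr:
            x = global_signal_regression(x)
        if filt:
            x = bandpass(x, tr)
        name = ("filt" if filt else "nofilt") + "_" + ("global" if gsr else "noglobal")
        outputs[name] = x
    return outputs
```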
ABIDE Preprocessed (2)
Team         Tools                      Analyses
CBRAIN       CIVET, MINC                Cortical Measures
C-PAC        AFNI, ANTs, FSL, Nipype    R-fMRI, VBM, Cortical Measures
CCS          AFNI, FreeSurfer, FSL      R-fMRI, VBM, Cortical Measures
DPARSF       DPARSF, REST, SPM          R-fMRI
Mindboggle   Mindboggle                 Cortical Measures
NIAK II      MINC, NIAK, PSOM           R-fMRI
Quality Assessment Protocol
• Spatial Measures
– Contrast to Noise Ratio
– Entropy Focus Criterion
– Foreground to Background Energy Ratio
– Smoothness (FWHM)
– % Artifact Voxels
– Signal-to-Noise Ratio
• Temporal Measures
– Standardized DVARS
– Median distance index
– Mean Framewise Displacement (FD)
– # Volumes with FD > 0.2 mm
– % Volumes with FD > 0.2 mm
http://preprocessed-connectomes-project.github.io/quality-assessment-protocol/
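The motion measures above all derive from framewise displacement (FD), which is computed from the six rigid-body realignment parameters. The sketch below uses the Power et al. (2012) definition: the sum of absolute frame-to-frame changes, with rotations converted to arc length on a 50 mm sphere. QAP may use a different FD variant. The column order (translations in mm, then rotations in radians) is an assumption, and motion-correction tools differ in their ordering. The function names and dictionary keys are hypothetical.

```python
import numpy as np


def framewise_displacement(motion, radius=50.0):
    """Power et al. (2012)-style framewise displacement (FD) per volume.

    motion: (n_volumes, 6) realignment parameters, assumed ordered as three
    translations in mm followed by three rotations in radians.
    radius: head radius in mm used to turn rotations into arc lengths.
    """
    motion = np.asarray(motion, dtype=float)
    deltas = np.abs(np.diff(motion, axis=0))
    deltas[:, 3:] *= radius                              # radians -> mm on a sphere
    return np.concatenate([[0.0], deltas.sum(axis=1)])   # first volume has FD = 0


def fd_summary(fd, threshold=0.2):
    """Mean FD plus the number and percentage of volumes with FD > threshold (mm)."""
    fd = np.asarray(fd, dtype=float)
    n_over = int((fd > threshold).sum())
    return {
        "mean_fd": float(fd.mean()),
        "n_volumes_over": n_over,
        "percent_volumes_over": 100.0 * n_over / fd.size,
    }
```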
Quality Assessment Protocol (2)
• Implemented in Python
• Normative datasets to
help learn thresholds
for quality control
– ABIDE
– CoRR
http://preprocessed-connectomes-project.github.io/quality-assessment-protocol/
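Normative datasets such as ABIDE and CoRR give a reference distribution for each quality measure. One simple way to turn such a distribution into a quality-control cut-off is to flag scans beyond a chosen percentile. The 95th/5th percentile rule and both function names below are illustrative assumptions, not QAP recommendations. Note that the "bad" direction depends on the measure: mean FD is worse when higher, SNR when lower.

```python
import numpy as np


def learn_threshold(normative_values, percentile=95.0):
    """Cut-off at a percentile of a normative distribution of one QA measure."""
    return float(np.percentile(np.asarray(normative_values, dtype=float), percentile))


def flag_scans(values, threshold, higher_is_worse=True):
    """Boolean mask of scans whose measure falls on the 'bad' side of threshold."""
    values = np.asarray(values, dtype=float)
    return values > threshold if higher_is_worse else values < threshold
```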
Regional Brainhacks
• One event that linked 8 cities,
3 countries, 2 continents
– Ann Arbor
– Boston
– Miami
– Montreal
– New York City
– Porto Alegre, Brazil
– Toronto
– Washington DC
Acknowledgements
CPAC Team: Daniel Clark, Steven Giavasis and Michael Milham.
Quality Assessment Protocol: Zarrar Shehzad, Daniel Lurie, Steven Giavasis, and Sang Han Lee.
ABIDE Preprocessed: Pierre Bellec, Yassine Benhajali, Francois Chouinard, Daniel Clark, R.
Cameron Craddock, Alan Evans, Steven Giavasis, Budhachandra Khundrakpam, John Lewis,
Qingyang Li, Zarrar Shehzad, Aimi Watanabe, Ting Xu, Chao-Gan Yan, Zhen Yang, Xinian Zuo, and
the ABIDE consortium.
Brainhack Organizers: Pierre Bellec, Daniel Margulies, Maarten Mennes, Donald McLauren, Satra
Ghosh, Matt Hutchison, Robert Welsh, Scott Peltier, Jonathan Downer, Stephen Strother, Katie
Dunlop, Angie Laird, Lucina Uddin, Benjamin De Leener, Julien Cohen-Adad, Andrew Gerber, Alex
Franco, Caroline Froehlich, Felipe Meneguzzi, John VanMeter, Lei Liew, Ziad Saad, Prantik Kundu
CPAC-NDAR integration was funded by a contract from NDAR.
ABIDE Preprocessed data is hosted in a Public S3 Bucket provided by AWS.


Editor's Notes

  • #4: Green boxes indicate initiatives in which data is aggregated after it is acquired, rather than centralized initiatives in which data acquisition was coordinated between sites. Since the data collection is not coordinated for these sites, the data is more heterogeneous, being collected with different parameters.
  • #5: The goal of large-scale analyses of connectome data is to map inter-individual variation in phenotype, such as sex, age, and IQ, to variation in the connectome. For clinical populations, we are particularly concerned with identifying connectome-based biomarkers of disease presence, severity, and prognosis, specifically treatment outcomes. Recently, there has been consternation about the ecological validity of psychiatric disease classifications that are based on syndromes defined by the presence of symptoms. This provides the opportunity to redefine psychiatric populations based on the similarity of connectomes, i.e. clustering individuals based on their connectivity profiles.
  • #6: Several different methods have been proposed for preprocessing connectome data to remove nuisance variation that obscures biological variation. Some of these methods have been shown to introduce artifacts that bias results. Rather than a single best practice, a pluralistic approach is needed, in which several different procedures are performed and the results are compared to identify those that are robust across strategies.
  • #7: In addition to the large number of preprocessing strategies, several different analysis methods have been proposed, such as (left to right) (top row) eigenvector and degree centrality, voxel-mirrored homotopic connectivity, fractional amplitude of low-frequency fluctuations, and bootstrap analysis of stable clusters; (bottom row) regional homogeneity, amplitude of low-frequency fluctuations, and multivariate distance matrix regression.
  • #8: Preprocessing large datasets with current tools requires a substantial amount of time, even when automated with scripts. Each additional preprocessing strategy multiplies execution time, and each additional analysis adds to it. What is needed are tools that not only automate preprocessing but also exploit the parallelism inherent in the data and algorithms to achieve high-throughput processing on high-performance computing architectures. We are developing CPAC to this end.
  • #9: The ultimate goal of CPAC is to make high-throughput, state-of-the-art connectome analyses accessible to researchers who lack programming and/or neuroimaging expertise. It is still in alpha, with a beta release expected by mid-2015. It is currently limited to functional connectivity analyses, but we plan to add DTI by the end of 2015.
  • #10: With all of the different options for preprocessing and analysis, how can we determine which is best? One option is to choose the preprocessing that optimizes test-retest reliability, but performing this analysis requires retest data. The recent CoRR initiative addresses this problem by amassing and sharing a very large repository of test-retest datasets.
  • #11: Instead of sharing just raw data, why not share the preprocessed data as well?
  • #12: In 2011, the ADHD-200 consortium assembled a large database of MRI data from individuals with and without ADHD. They inaugurated this initiative with a competition to learn the best classifier of ADHD, hoping to attract data scientists to the data. This was initially unsuccessful because of the resources and knowledge required to preprocess the data. The ADHD-200 Preprocessed initiative was created to remove this barrier to the competition.
  • #13: Since its release, the ADHD-200 Preprocessed initiative has been very successful, with 9,500 downloads, and has been used in several publications, including the entry that won the ADHD-200 Global Competition.
  • #14: The preprocessed connectomes initiative is not restricted to fMRI data: the Beijing DTI Preprocessed initiative contains data from 180 healthy individuals, 55 of whom have IQ scores available.
  • #15: Recently this initiative has been expanded to include the ABIDE dataset, which includes data from individuals with autism. In this initiative, 4 different preprocessing pipelines were used, each of which performed preprocessing using 4 different strategies. We also include 4 different structural processing pipelines.
  • #17: A major challenge with using openly shared neuroimaging data is determining which of the data is usable. This is particularly problematic for preprocessed data, since many of the image features that allow us to evaluate image quality by visual inspection have been removed from the data. There is currently no consensus on which quantitative measures of image quality are the most useful for quality assessment, or on which thresholds can be applied to these measures to differentiate good-quality data from bad. The objective of the QAP is to start addressing these issues.
  • #18: Several quality measures were selected from a literature review and have been calculated for the ABIDE and CoRR datasets. The goal is to build normative distributions of these measures, to be used to start learning the relevance of each measure for assessing quality.
  • #19: For the past three years, the Neuro Bureau has been hosting annual brainhacks to encourage researchers from a variety of different backgrounds to work together in open collaboration on neuroscientific problems. We have hosted events in Leipzig, Paris, and Berlin, and all have been well attended. Projects have ranged from developing web-based methods for visualizing neuroimaging results, to analyses of disease-related cortical thickness differences in autism, to art projects. The international events are hosted annually.
  • #20: Although international brainhack events have been successful at bringing researchers from different countries together to collaborate, they do not incentivize collaboration at the regional scale. In October 2014 we hosted a series of regional brainhacks designed to do just that. We had over 300 participants across 8 cities in 3 countries and 2 continents. In the first year we limited the event to sites within one or two hours of the Eastern Daylight Time zone so that the sites could readily share content. Moving forward, we hope to host these events across time zones.