$\min_x \|y - Ax\|_2^2 + \lambda \|x\|_1$
Irina Rish
Computational Psychiatry and Neuroimaging
IBM T.J. Watson Research Center
Learning About the Brain:
Neuroimaging and Beyond
Collaborators (an incomplete list)
IBM T.J. Watson Research: Guillermo Cecchi, Steve Heisig, Aurelie Lozano
Google: Melissa Carroll
Mt Sinai: Rita Goldstein
Northwestern U.: A. Vania Apkarian
INRIA: Bertrand Thirion
Neurospin/UC Berkeley: JB Poline
MIT: Pouya Bashivan
Purdue: Jean Honorio
Lehigh U.: Katya Scheinberg
SUNY Stony Brook: Dimitris Samaras
St. Johns U.: Genady Grabarnik
USC: Sahil Garg
Brain 2 AI: Brain-inspired AI Algorithms
AI 2 Brain: Mental-State Prediction and Statistical Biomarker Discovery
AI 2 Brain: Mental State Recognition to Improve Mental Function
Health & Productivity: mental-state-sensitive software
monitoring cognitive load, focus/attention; monitoring stress/anxiety
Detecting emotional & cognitive changes to predict response to
different types of input, e.g. music, video, news, ads, emails
(both for mental health and for neuromarketing)
Safety: detecting changes in driver’s alertness level
(drowsiness, microsleeps) to prevent accidents
Computational psychiatry:
data-analytic approach to diagnosis based on objective measurements
(new Research Domain Criteria (RDoC) initiative by NIMH)
“Psychiatric research is in crisis” [Wiecki et al. 2015]
Our current focus: schizophrenia, addiction, Huntington’s, Alzheimer’s, Parkinson’s
Measuring Brain Activity with Functional MRI
Image courtesy of fMRI Research
Center at Columbia University
• Blood-oxygen-level-dependent (BOLD) signal
related to brain activity while subject performs some task in scanner
• 4D ‘brain movie’: a sequence of 3D brain volumes
3D voxels ~ 3x3x3 mm, repetition time (TR) ~1-2 s
• Challenge: high-dimensional, small-sample datasets
10,000 to 100,000 variables (voxels), a few hundred TRs (samples), a few hundred subjects or fewer
Example: Pain Perception
Data from [Baliki, Geha, Apkarian 2008]:
14 healthy subjects presented with painful thermal
stimuli while in fMRI scanner, and asked to rate their
pain level (using a finger-span device).
Which brain areas are “relevant” to pain?
Can we predict pain perception from fMRI?
[PLoS Comp Bio 2012], [SPIE Med.Imaging 2012], [Brain Informatics 2010]
Another Example: Cocaine Addiction and Methylphenidate
[Rish et al, SPIE 2016]
Mechanism: cocaine affects the reward system and may lead to
addiction (cocaine use disorder, or CUD)
MPH (Ritalin), often used to treat ADHD, has a similar chemical
structure and mechanism of action, but a slower rate of clearance
(90 vs 20 min; thus a lower abuse potential).
A therapeutic agent for CUD? A stimulant for a stimulant?
(similar to the nicotine patch, or methadone for heroin addiction)
Resting-state fMRI experiment: MPH vs. placebo, CUD vs. controls
Functional connectivity features (degrees) [Konova et al 2013]
Does MPH normalize CUD brain activity? Yes!
- univariate hypothesis testing [Konova 2013][Goldstein 2010]
- Our ML approach: CUD subjects are harder to discriminate from
controls when under MPH → their brains look more ‘normal’
[Goldstein & Volkow, 2002]
What are we typically looking for in fMRI data?
Question: given a stimulus, mental state or disorder,
find relevant brain areas and/or interactions among them
Traditional GLM approach - voxel-wise ‘activations’ +
univariate stat. tests - ignores voxel interactions!
But, simple and interpretable, and thus still very popular
Alternative: multivariate methods to predict mental states
• cognitive (e.g., viewing a picture, listening to instructions)
• emotional (level of pain, anxiety, happiness)
• disorders (e.g. schizophrenia, ADHD, addiction)
Look for predictive patterns (hint: no black-box models)
Our Goal: Interpretable Machine Learning
$\min_x \|y - Ax\|_2^2 + \lambda \|x\|_1$
Feature Engineering (prior knowledge)
- e.g. network properties
[Rish et al, PLoS One 2013], [Cecchi et al, NIPS 2009]
[Rish et al, SPIE Med.Imaging 2012], [Gheiratmand et al, submitted]
Feature Selection (sparsity)
[Rish et al, SPIE Med.Imaging 2012], [Honorio et al, AISTATS 2012],
[Rish et al, Brain Informatics 2010], [Carroll et al, Neuroimage 2009]
Feature Extraction: Learning Representations
- dictionary learning, deep convnets
[Rish et al, SfN 2011], [Rish et al, ICML 2008],
[Bashivan et al, ICLR 2016], [Garg et al, submitted]
[Figure: biomarkers (predictive features) feed a predictive model that separates patients (+) from controls (-)]
Mental State Prediction via Sparse Regression
y = Ax + noise
- y: measurements (mental states, behavior, tasks or stimuli)
- A: fMRI data (‘encoding’); rows are samples (~500), columns are voxels (~30,000)
- x: unknown parameters (‘signal’)
Issue: high-dimensional, small-sample data
Need to (1) prevent overfitting and (2) find interpretable solution
Solution: embedded variable selection via sparse regression
Find a small number of (jointly) most relevant brain voxels
Sparse Regression Methods
ISSUE: high-dimensional, small-sample problem
- solutions are overfit to data: poor generalization
- difficult to interpret (determine relevant voxels)
APPROACH:
- LASSO: adds ℓ1-norm regularization
- selects relevant voxels (sparse solution = many zero coefficients)
- improving LASSO: Elastic Net - sparsity + grouping of correlated variables
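LASSO (least squares plus an ℓ1 penalty) can be minimized with a few lines of proximal gradient descent (ISTA). The sketch below is illustrative only: the synthetic data, the problem sizes, the choice of λ, and the helper names `lasso_ista` and `objective` are assumptions, not the solver used in the cited work.

```python
import numpy as np

def lasso_ista(A, y, lam, n_iter=500):
    """Minimize ||y - A x||_2^2 + lam * ||x||_1 by proximal gradient (ISTA)."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - 2.0 * A.T @ (A @ x - y) / L                      # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)    # soft-threshold
    return x

def objective(A, y, x, lam):
    return np.sum((y - A @ x) ** 2) + lam * np.sum(np.abs(x))

# Synthetic stand-in for fMRI data: few samples, many "voxels" (assumed sizes).
rng = np.random.default_rng(0)
n, p = 50, 200
A = rng.standard_normal((n, p))
x_true = np.zeros(p)
x_true[:5] = [3.0, -2.0, 1.5, 2.0, -1.0]
y = A @ x_true + 0.1 * rng.standard_normal(n)

# For this objective, x = 0 is optimal once lam >= 2 * max|A^T y|; use half of that.
lam = 0.5 * 2.0 * np.max(np.abs(A.T @ y))
x_hat = lasso_ista(A, y, lam)
print("nonzero coefficients:", np.count_nonzero(x_hat), "of", p)
```

Soft-thresholding sets small coefficients exactly to zero, which is what makes the selected voxels readable as a short list rather than a dense weight map.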
Adding Structure (Grouping) Helps
Elastic Net vs. LASSO:
- Higher prediction accuracy:
0.7-0.8 correlation between actual and predicted pain
ratings and other tasks (e.g., PBAIC competition)
- Better interpretability: voxel clusters (areas) versus
scattered single voxels
- Grouping parameter improves model stability (overlap)
across different runs
Some other ‘structured sparsity’ methods:
• Group LASSO - when groups (e.g. regions) are known
• Fused LASSO – spatial (or temporal) continuity
• Moreover, adding structure in graphical LASSO, etc.
[SPIE Med.Imaging 2012], [Brain Informatics 2010], [Neuroimage 2009]
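For reference, the penalties named on this slide can be written side by side in their textbook forms. This is a sketch under assumptions: the symbols $\lambda_1$, $\lambda_2$ and the group set $G$ are introduced here for illustration, and the exact weighting used in the cited papers may differ (group LASSO, for instance, is often written with per-group weights).

```latex
\begin{align*}
\text{LASSO:} \quad & \min_x \|y - Ax\|_2^2 + \lambda \|x\|_1 \\
\text{Elastic Net:} \quad & \min_x \|y - Ax\|_2^2 + \lambda_1 \|x\|_1 + \lambda_2 \|x\|_2^2 \\
\text{Group LASSO:} \quad & \min_x \|y - Ax\|_2^2 + \lambda \sum_{g \in G} \|x_g\|_2 \\
\text{Fused LASSO:} \quad & \min_x \|y - Ax\|_2^2 + \lambda_1 \|x\|_1 + \lambda_2 \sum_{i} |x_{i+1} - x_i|
\end{align*}
```

The extra $\ell_2$ term in the Elastic Net is what encourages correlated variables to enter the model together, which corresponds to the grouping effect described above.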
Data Driven + Analytical Models = Best Performance
[PLoS Comp Bio 2012]
[Figure: varying pain perception]
Stimulus (temperature) to pain dynamical model
with 3 parameters captures inter-subject variability!
Parameters: pain threshold, anticipation, forgetting
Dynamical model (stimulus to pain) + sparse regression
(fMRI to stimulus) outperforms purely data driven model
(fMRI to pain) – due to highly accurate analytical part!
Looking Beyond Single Sparse Solution:
Local vs Distributed (“Holographic”) Information
[Rish et al, SPIE Med.Imaging 2012]
Simple auditory task (PBAIC): localized, sparse
Sharp transition from the highly relevant first
two solutions (2000 voxels) to the almost
irrelevant rest of the voxels (accuracy < 0.2)
Complex pain experience: distributed/‘holographic’
No such sharp transition; slow linear decay
from the best (on average) 0.65 accuracy (1st
solution) to 0.5 (10th solution) and 0.4 accuracy
(24th solution, 23,000 voxels removed)
Subsequent solutions are indeed
distributed through the brain
Standard GLM Analysis Would Not Reveal Such Behavior!
Exponential decay in relevance measured by
univariate correlation with the task vs. linear
decay for prediction accuracy
Highly predictive solution #25 (0.52 accuracy
vs. 0.67 for the 1st solution) has no voxels with
individual correlation above 0.1!
[SPIE 2012]
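The idea of looking at a sequence of sparse solutions can be sketched as a loop: fit a sparse model, record the voxels it selects, remove them, and fit again on what remains. The code below is a minimal sketch under assumptions: it uses a plain ISTA LASSO on synthetic data, picks λ as a fixed fraction of its maximum, and the names `lasso_ista` and `successive_solutions` are hypothetical. It is not the procedure of the cited paper, and it omits the held-out accuracy evaluation of each solution.

```python
import numpy as np

def lasso_ista(A, y, lam, n_iter=300):
    """Minimize ||y - A x||_2^2 + lam * ||x||_1 by proximal gradient (ISTA)."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - 2.0 * A.T @ (A @ x - y) / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x

def successive_solutions(A, y, n_solutions=3, frac=0.5):
    """Repeatedly fit a sparse model, then remove the voxels it selected."""
    remaining = np.arange(A.shape[1])       # indices of voxels still available
    supports = []
    for _ in range(n_solutions):
        if remaining.size == 0:
            break
        A_rem = A[:, remaining]
        lam = frac * 2.0 * np.max(np.abs(A_rem.T @ y))   # assumed choice of lam
        x = lasso_ista(A_rem, y, lam)
        selected = remaining[x != 0]
        if selected.size == 0:
            break
        supports.append(selected)
        remaining = np.setdiff1d(remaining, selected)    # drop selected voxels
    return supports

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200))          # toy data: 50 samples, 200 "voxels"
y = A[:, :5] @ np.array([3.0, -2.0, 1.5, 2.0, -1.0]) + 0.1 * rng.standard_normal(50)
supports = successive_solutions(A, y, n_solutions=3)
print([len(s) for s in supports])
```

Scoring each support on held-out data would then give an accuracy-versus-solution-index curve of the kind described above.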
Feature Engineering to Predict Schizophrenia
[Cecchi et al, NIPS 2009] [Rish et al, PLoS ONE 2013]
Objective: discover discriminative patterns (biomarkers)
Not localized, but rather a network disease
Functional network features vs local voxel activations:
- Functional network features: degrees, link weights, etc.
- Activation features: univariate correlations of a voxel with the task
Network features significantly outperformed activation features in
(1) significance tests, (2) classification and (3) stability across CV subsets, i.e.
despite ‘normal’ task response, functional connectivity is significantly disrupted
Voxel degrees + GMRF = 86% accuracy
Cross-voxel correlations + SVM = 93% accuracy
[Figure: MR signal (regions M1, V1, PP) → correlation matrix (N x N, N^2 = 2x10^10, color scale -0.5 to 1) → thresholded matrix → extracted network]
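A voxel's degree in a functional network can be computed by correlating all voxel time series, thresholding the correlation matrix, and counting each voxel's surviving links. The sketch below assumes a threshold on the absolute correlation and a cutoff of 0.7; both are illustrative choices, and `voxel_degrees` is a hypothetical helper name.

```python
import numpy as np

def voxel_degrees(ts, threshold=0.7):
    """ts: array of shape (n_timepoints, n_voxels).
    Returns the number of other voxels whose |correlation| exceeds threshold."""
    corr = np.corrcoef(ts, rowvar=False)     # voxel-by-voxel correlation matrix
    adj = np.abs(corr) > threshold           # thresholded (binary) matrix
    np.fill_diagonal(adj, False)             # ignore self-correlation
    return adj.sum(axis=1)

# Toy example: voxel 1 is an affine copy of voxel 0; voxel 2 is uncorrelated with both.
v0 = np.array([1.0, 2.0, 3.0, 4.0])
v1 = 2.0 * v0 + 1.0
v2 = np.array([1.0, -1.0, -1.0, 1.0])
degrees = voxel_degrees(np.column_stack([v0, v1, v2]))
print(degrees)   # [1 1 0]
```

For whole-brain data the full correlation matrix is very large, so a practical pipeline would compute it in blocks; that is left out here.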
Learning Interpretable Whole-Brain Markov Networks
Problem: whole-brain Markov nets, even link-sparse (e.g., learned by glasso),
are hard to interpret; can we identify most relevant nodes (voxels) ?
Hypothesis: often, only a relatively few “important” variables are
interacting with each other, forming clusters.
Proposed approach: group-Lasso penalty for node selection
Objective terms: log-likelihood, link sparsity, node sparsity
(variable-selection prior: block l1/lp norm)
[Honorio et al, AISTATS 2012]
Our method vs standard glasso:
- higher accuracy (log-likelihood)
- better interpretability
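One plausible way to write an objective with these three terms is shown below. This is a sketch under assumptions, not a transcription of the slide's equation: the precision matrix $\Omega$, the sample covariance $\hat{\Sigma}$, the weights $\rho$ and $\tau$, and the exact form of the block norm are all introduced here for illustration; see [Honorio et al, AISTATS 2012] for the exact formulation.

```latex
\max_{\Omega \succ 0} \;
\underbrace{\log\det\Omega - \operatorname{tr}(\hat{\Sigma}\,\Omega)}_{\text{log-likelihood}}
\;-\; \rho \underbrace{\|\Omega\|_1}_{\text{link sparsity}}
\;-\; \tau \underbrace{\sum_{i} \big\| (\Omega_{ij})_{j \neq i} \big\|_p}_{\text{node sparsity}}
```

The last term groups all links of node $i$ together, so driving a group to zero removes the node from the network rather than a single link.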
Markov Network Disruptions in Cocaine Addiction
[Honorio et al, AISTATS 2012]
Visual attention task + monetary reward
We observe in cocaine addiction:
• increased connectivity between the visual
cortex (left) and the prefrontal cortex (right)
• decreased connectivity between the visual
cortex and other brain areas
• relation to prior art:
• visual cortex abnormalities in addiction
observed by [Lee et al 2003]
• prefrontal cortex is involved in decision
making and reward processing; abnormal
monetary processing in PFC reported in
[Goldstein et al, 2009]
[Figure: networks learned by graphical lasso vs. our method, for cocaine addicts and control subjects; blue - positive interactions, red - negative interactions]
Representation Learning with ConvNets: an EEG study
EEG Experiment: working memory
13 subjects, 240 runs (3120 trials)
Samples: a subset of 2670 correctly answered trials
Evaluation: leave-one-subject out (i.e., 13-fold) CV
Feature Extraction: FFT to find spectral power within each
electrode at three frequency bands - theta (4-8Hz),
alpha (8-13Hz), and beta (13-30Hz).
[Bashivan et al, ICLR 2016]
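The spectral feature extraction can be sketched as follows for a single electrode. The band edges come from the slide; the sampling rate, the half-open band intervals, the use of summed squared FFT magnitudes as "power", and the helper name `band_power` are assumptions made for illustration.

```python
import numpy as np

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}   # Hz, as listed above

def band_power(signal, fs, bands=BANDS):
    """Sum of squared FFT magnitudes within each band (lo <= f < hi)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return {name: spectrum[(freqs >= lo) & (freqs < hi)].sum()
            for name, (lo, hi) in bands.items()}

fs = 128                                  # assumed sampling rate in Hz
t = np.arange(fs) / fs                    # one second of samples
signal = np.sin(2 * np.pi * 6 * t)        # synthetic 6 Hz (theta-band) oscillation
powers = band_power(signal, fs)
print(powers)
```

Applying this to every electrode gives three values per electrode per trial, which is the input to the image construction step.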
EEG-images: 3D electrode locations (64) are projected
onto a 2D surface via the distance-preserving Azimuthal
Equidistant Projection.
The topographical activity map within each band is transformed
into an image by interpolating the values between electrodes
on a 32 x 32 mesh.
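The projection step can be sketched for a single electrode as below. It assumes electrode coordinates centered on the head with the z axis pointing to the top of the head (the pole of the projection); the function name is hypothetical, and the interpolation onto the 32 x 32 mesh is not shown.

```python
import math

def azimuthal_equidistant(x, y, z):
    """Project a 3D point onto 2D so that its distance from the origin equals
    its angular distance from the pole (0, 0, 1)."""
    r = math.sqrt(x * x + y * y + z * z)
    elevation = math.asin(z / r)
    azimuth = math.atan2(y, x)
    rho = math.pi / 2 - elevation          # angular distance from the pole
    return rho * math.cos(azimuth), rho * math.sin(azimuth)

top = azimuthal_equidistant(0.0, 0.0, 1.0)    # pole maps to the center
side = azimuthal_equidistant(1.0, 0.0, 0.0)   # a point 90 degrees away from the pole
print(top, side)
```

Distances from the center are preserved, so electrodes keep their relative ordering from the vertex outward in the resulting image.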
• FFT over the complete trial =
single image for each trial
• VGG style ConvNets
[Simonyan & Zisserman, 2015]
• Convolutional layers with 3 x 3
receptive fields
• Various architectures were
explored with different
number of layers
ConvNets Architectures: Single-Frame Approach
ConvNet Configurations (input: 32 x 32 3-channel image)
A           B           C           D
Conv3-32    Conv3-32    Conv3-32    Conv3-32
Conv3-32    Conv3-32    Conv3-32    Conv3-32
-           -           -           Conv3-32
-           -           -           Conv3-32
maxpool     maxpool     maxpool     maxpool
-           Conv3-64    Conv3-64    Conv3-64
-           Conv3-64    Conv3-64    Conv3-64
-           maxpool     maxpool     maxpool
-           -           Conv3-128   Conv3-128
-           -           maxpool     maxpool

Architecture   Number of parameters   Test Error (%)
A              ~10k                   13.05
B              ~65.5k                 13.17
C              ~139.4k                13.91
D              ~158k                  12.39
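The parameter counts in the table can be checked by counting only the weights and biases of the 3 x 3 convolution layers. The sketch assumes the following layer lists, reconstructed from the configuration table: A = 2 x Conv3-32; B = A plus 2 x Conv3-64; C = B plus Conv3-128; D = 4 x Conv3-32, 2 x Conv3-64, Conv3-128. The helper names are hypothetical, and any fully connected layers are ignored.

```python
def conv3_params(c_in, c_out):
    """Weights plus biases of one 3 x 3 convolution layer."""
    return 3 * 3 * c_in * c_out + c_out

# Output channels of each conv layer, per configuration (assumed reading of the table).
CONFIGS = {
    "A": [32, 32],
    "B": [32, 32, 64, 64],
    "C": [32, 32, 64, 64, 128],
    "D": [32, 32, 32, 32, 64, 64, 128],
}

def count_params(widths, c_in=3):
    """Total conv parameters for a stack of layers on a 3-channel input."""
    total = 0
    for c_out in widths:
        total += conv3_params(c_in, c_out)
        c_in = c_out
    return total

counts = {name: count_params(widths) for name, widths in CONFIGS.items()}
print(counts)   # {'A': 10144, 'B': 65568, 'C': 139424, 'D': 157920}
```

These totals agree with the rounded figures of ~10k, ~65.5k, ~139.4k and ~158k, which suggests the reported counts cover the convolutional layers only.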
Better Results with Recurrent ConvNets
Best result of 8.9% error discriminating
among 4 levels of cognitive load
achieved by recurrent Conv Nets with
LSTM + time convolution
[Bashivan et al, ICLR 2016]
• EEG time series for each trial split into 7
windows (0.5 sec each). FFT on each time window
to get an image as before
• Best ConvNet (7-layer) used as C
• All 7 ConvNets share parameters
• Video classification architectures from
[Ng et al, CVPR 2015]:
• Temporal Maxpool: max pool over time frames
• Temporal Convolution: 1D convolution over time
frames
• LSTM: sequence mapping over time frames
• Mixed LSTM/1D-Conv: combination of both LSTM
and 1D-Conv architectures
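Two of the temporal aggregation options listed above can be illustrated on a toy array of per-frame feature vectors. This is a simplified sketch, not the architecture of the cited paper: a single fixed kernel is shared across all features, nothing is learned, and the function names and toy data are assumptions.

```python
import numpy as np

def temporal_maxpool(frames):
    """Element-wise maximum over time; frames has shape (n_frames, n_features)."""
    return frames.max(axis=0)

def temporal_conv1d(frames, kernel):
    """Valid 1D convolution over the time axis (cross-correlation form, as in
    most deep-learning libraries), with one kernel shared across features."""
    k = len(kernel)
    return np.stack([(kernel[:, None] * frames[t:t + k]).sum(axis=0)
                     for t in range(frames.shape[0] - k + 1)])

frames = np.arange(14, dtype=float).reshape(7, 2)   # 7 time windows, 2 toy features
pooled = temporal_maxpool(frames)                   # shape (2,)
conv = temporal_conv1d(frames, np.array([1.0, 0.0, -1.0]))   # shape (5, 2)
print(pooled, conv.shape)
```

Max pooling discards the order of the frames, whereas the 1D convolution (and an LSTM) can respond to how features change from one window to the next.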
Architecture             Test Error (%)   Validation Error (%)   Number of parameters
RBF SVM                  15.34            -                      -
L1-logistic regression   15.32            -                      -
Random Forest            12.59            -                      -
DBN                      14.96            8.37                   1.02 mil
ConvNet+Maxpool          14.80            8.48                   1.21 mil
ConvNet+1D-Conv          11.32            9.28                   441 k
ConvNet+LSTM             10.54            6.10                   1.34 mil
ConvNet+LSTM/1D-Conv     8.89             8.39                   1.62 mil
But What About Interpretability?
Code: https://github.com/pbashivan/EEGLearn
Using the deconvnet of [Zeiler et al] to map features
back to the brain images
Back projections: maps obtained by applying the deconvnet to
a feature map, displaying the structures in the input
image that excite that particular feature map.
Some of these features correspond to well-known
electrophysiological markers of cognitive load.
First-layer features captured widespread theta
(1st stack, kernel 7) and frontal beta activity
(1st stack, kernel 23).
Second- and third-layer features captured frontal theta/beta
(2nd stack, kernel 7; 3rd stack, kernels 60 and 112) as well
as parietal alpha (2nd stack, kernel 29).
Frontal theta and beta activity as well as parietal alpha
are the most prominent markers of cognitive/memory load
in the neuroscience literature [Bashivan et al., 2015; Jensen
et al., 2002; Onton et al., 2005; Tallon-Baudry et al., 1999]
[Figure: input EEG images (top 9 images with highest
feature activations across the training set) for Layer 4, Layer 6, Layer 7]
Summary: Machine Learning in Neuroimaging
“Statistical biomarkers”:
[Cecchi et al, NIPS 2009]
[Rish et al, PLOS One, 2013]
[Carroll et al, Neuroimage 2009]
[Scheinberg&Rish, ECML 2010]
Schizophrenia classification: 86% to 93% accuracy
[Rish et al, Brain Informatics 2010]
[Rish et al, SPIE Med.Imaging 2012]
[Cecchi et al, PLOS Comp Bio 2012]
Cognitive state prediction in videogames: 70-95%
Pain perception: 70-80%, distributed activation patterns
[Honorio et al, AISTATS 2012]
[Rish et al, SPIE Med.Imaging 2016]
Cocaine addiction: distinct Markov network patterns
[Bashivan et al, ICLR 2016]
EEG-cognitive load prediction: 91% w/ recurrent ConvNets
[Figure: a predictive model separating mental disorder (+) from healthy (-)]
Beyond the Scanner: Using ‘Cheaper’ Sensors?
NeuroSky, Muse EEG: EEG, accelerometer
Hexoskin: heart rate, respiration, heart-rate variability
Jawbone UP3: heart rate, respiration, galvanic skin response (GSR),
skin temperature, ambient temperature, accelerometer
So much data, so little inference
What can we actually learn from all these “big personal data”?
Anything about mental states?
Experiments with Muse Wearable EEG
Device detects voltage at different skin locations:
1. TP9 Behind Left Ear
2. FP1 Front Left Forehead
3. FP2 Front Right Forehead
4. TP10 Behind Right Ear
Frequency bands:
Delta – Adult slow wave sleep, continuous attention processes
Theta – Drowsiness, idling, inhibition
Alpha – Relaxed, reflecting
Beta – Alert, busy, anxious, thinking
Gamma – Short term memory usage, using 2 senses at once
Experiment 1: Meditation
• Exploring differences across people engaged in the same activity
• Each session consisted of 7 minutes of closed-eyes meditation
• Clustering mean frequency band vectors
[Figure: clustering of subjects (labels SH, PB, IR, G, HC; women vs. men).
Green: lower than average; red: higher than average.
Lower beta, higher alpha: more relaxed, less anxious.
Higher beta, lower alpha: vice versa.]
Experiment 2: Watching Videos
[Bashivan et al, 2015]
• Can you tell from EEG what kind of movie a person is watching?
• 2 short (~7 min) YouTube videos, in 2 sessions
• Funny (“emotional”) cat videos vs Educational (“rational”) Khan Academy
• Feature extraction from raw EEG data:
• 1st and 2nd order statistics from frequency bands
• ‘Functional networks’ across bands and sensors
• Classifiers:
• SVM (RBF kernel), sparse logistic regression, decision forest,
deep nets (3-layer RBM)
• Best results:
• Individual-level: SVM, 74.5% accuracy
• Group-level: sparse logistic regression, 74% accuracy
• Most-predictive features:
• variances in power within frequency bands, and
correlations cross-bands and cross-channels (‘functional network’)
• Sparse logistic regression predicted best with only 31 out of 441 (7% of all) features
[Figure: sensor-band labels TP9 – HG, FP1 – B, FP2 – A, FP2 – B, TP10 – G, TP10 – HG]
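The feature types listed for this experiment (first- and second-order statistics per channel, plus cross-channel correlations as a simple ‘functional network’) can be sketched as below. The input layout, the toy data and the helper name `summary_features` are assumptions; the sketch does not reproduce the 441 features of the actual study.

```python
import numpy as np

def summary_features(series):
    """series: (n_timepoints, n_channels) band-power time series.
    Returns per-channel means, per-channel variances, and the pairwise
    correlations from the upper triangle of the correlation matrix."""
    means = series.mean(axis=0)
    variances = series.var(axis=0)
    corr = np.corrcoef(series, rowvar=False)
    upper = np.triu_indices(series.shape[1], k=1)
    return np.concatenate([means, variances, corr[upper]])

rng = np.random.default_rng(0)
series = rng.standard_normal((100, 4))   # toy data: 100 time points, 4 sensor-band channels
features = summary_features(series)
print(features.shape)   # (14,)
```

With 4 channels this yields 4 means, 4 variances and 6 pairwise correlations; a sparse classifier can then pick out the few features that matter.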
Text Analytics for Computational Psychiatry
“Language is a window into the brain” - M. Covington
• 93% accuracy discriminating schizophrenics from manics
based on syntactic speech graphs [PLoS One, 2012]
• Nearly 100% accuracy predicting the first
psychotic episode 1-2 years in
advance via coherence and other
features [NPJ Schizophrenia, 2015]
• 88% accuracy discriminating ecstasy and
meth users from controls, using semantic
features such as proximity to ‘empathy’
concept, etc., and graph features
[Neuropsychopharmacology, 2014]
Example: Speech Coherence
Text coherence:
Currently measured as the angle
between vector representations of
consecutive sentences (word vectors
computed by LSA)
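The coherence measure can be sketched as the angle between the vectors of consecutive sentences. The toy word vectors below stand in for LSA vectors, and averaging word vectors into a sentence vector is an assumed way of combining them; the function names are hypothetical.

```python
import math

def angle_between(u, v):
    """Angle in radians between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))   # guard against rounding
    return math.acos(cos)

def sentence_vector(sentence, word_vectors):
    """Average the vectors of known words (assumes at least one known word)."""
    vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def coherence_angles(sentences, word_vectors):
    """Angles between each pair of consecutive sentence vectors."""
    vectors = [sentence_vector(s, word_vectors) for s in sentences]
    return [angle_between(a, b) for a, b in zip(vectors, vectors[1:])]

# Toy 2D "word vectors" (made up for illustration, not LSA output).
toy = {"cat": [1.0, 0.0], "dog": [1.0, 0.0], "tax": [0.0, 1.0]}
angles = coherence_angles(["cat dog", "dog cat", "tax"], toy)
print(angles)
```

A small angle means consecutive sentences stay on topic; a jump toward 90 degrees marks an abrupt change of subject.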
https://www.youtube.com/watch?v=MXzwAXzUwwE
https://www.youtube.com/watch?v=6xx_pwu7n-Y
[Figure: speech coherence for Jenna, sober vs. non-sober: phrase-to-phrase coherence and alternate-phrase coherence]
[Heisig et al, AAAI ws 2014]
Towards “Augmented Human”:
Real-Time Mind-Reading from Cheap Sensors
Sensor data (Sensor 1, Sensor 2, Sensor 3; e.g. the text “I walked into a café ..”):
• Text
• Audio
• Video
• EEG signal
• Temperature
• Heart rate
• Skin conductance
Psycho- and physiological features: voice power spectrum, text topic model, syntactic graph, HRV spectrum
Behavioral phenotype: baselining, change-point detection, predictions
Cheap data + smart analytics (machine learning + graph theory) = behavioral prediction
Brain sciences: psychology + neuroscience
Adult Neurogenesis:
Inspiration for Adaptive Representation Learning
[Garg, Rish, Cecchi, Lozano 2016; submitted]
• Adult neurogenesis is predominant in the dentate gyrus of the hippocampus
and in the olfactory bulb
• Current theories: the hippocampus functions as an autoencoder to evoke
memories; a similar encoding function is suggested in the olfactory bulb
• Our computational model: sparse linear autoencoder (online dictionary learning of
Mairal et al) + dynamic addition (birth) and deletion (death) of hidden nodes
[Images: olfactory bulb, dentate gyrus]
[Figure: data matrix (n samples x p variables) ≈ dictionary (m basis vectors) x sparse representation.
Autoencoder view: input x; hidden nodes c = encoded x; link weights = ‘dictionary’ D; output x’ = reconstructed x.]
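The birth and death of hidden nodes can be illustrated with a loose sketch: encode a batch with the current dictionary, delete atoms that no sample uses, and add new random atoms when reconstruction is poor. This is not the authors' algorithm. The ISTA encoder, the error threshold, the number of new atoms and all function names are assumptions, and the dictionary update step of online dictionary learning is omitted entirely.

```python
import numpy as np

def sparse_codes(D, X, lam=0.1, n_iter=100):
    """ISTA for min_C 0.5 * ||X - D C||_F^2 + lam * ||C||_1 (columns of X are samples)."""
    L = np.linalg.norm(D, 2) ** 2
    C = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        Z = C - D.T @ (D @ C - X) / L
        C = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)
    return C

def birth_death_step(D, X, rng, n_new=2, err_tol=0.5, lam=0.1):
    """One illustrative birth/death pass over the dictionary D (p x m)."""
    C = sparse_codes(D, X, lam)
    used = np.abs(C).sum(axis=1) > 0          # death: drop atoms no sample uses
    D = D[:, used]
    rel_err = np.linalg.norm(X - D @ C[used]) / np.linalg.norm(X)
    if rel_err > err_tol:                     # birth: add random unit-norm atoms
        new = rng.standard_normal((D.shape[0], n_new))
        new /= np.linalg.norm(new, axis=0)
        D = np.hstack([D, new])
    return D

rng = np.random.default_rng(0)
D0 = rng.standard_normal((10, 5))
D0 /= np.linalg.norm(D0, axis=0)              # 5 unit-norm atoms in 10 dimensions
X = rng.standard_normal((10, 20))             # toy batch of 20 samples
D1 = birth_death_step(D0, X, rng)
print(D0.shape, "->", D1.shape)
```

Letting the dictionary size change with the data is what allows the representation to grow when the input distribution shifts and to shrink when atoms stop being useful.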
Brain 2 AI:
Better Adaptation in Non-Stationary Environment
[Figure panels: learned dictionary size; ‘old’ domain reconstruction; ‘new’ domain reconstruction; input: non-stationary visual input]
Outperforms fixed-size autoencoder on non-stationary input:
improved accuracy + more compact representation
Adapts to a new domain without forgetting the old one
(via ‘memory’ matrices, part of Mairal’s original method)
Some Lessons
• Data-driven + analytical models = Superior Performance
• Importance of appropriate domain-specific prior (e.g., group sparsity)
• Feature engineering based on domain knowledge is still important!
• Importance of model stability (reproducibility)
• Model interpretability is key: sparsity, deconvolution - map back to brain
• Importance of exploring solution space beyond the single optimal one
(there are many ways to skin the cat)
• Deep learning faces specific challenges in neuroimaging:
- Datasets are relatively small (e.g., a few thousand samples) – regularize!
- Model interpretability should be incorporated (e.g. deconvolution)
Thank you!
And now… some shameless self-promotion
Come to our workshop tomorrow! (Fri 12/9, Room 114)
Representation Learning in Artificial and Biological Neural Networks
Books:
Sparse Modeling, I. Rish and G. Grabarnik, CRC Press, 2014
Practical Applications of Sparse Modeling, edited by I. Rish, G. Cecchi, A. Lozano, A. Niculescu-Mizil, MIT Press, 2014.
Publication page:
http://researcher.watson.ibm.com/researcher/view_person_pubs.php?person=us-rish&t=1
Code:
https://github.com/pbashivan/EEGLearn
References
[Garg, Rish, Cecchi, Lozano 2016; submitted] S. Garg, I. Rish, G. Cecchi, A. Lozano. Neurogenesis-inspired Dictionary Learning: Online Model Adaptation in
a changing world, submitted to ICLR-2017
[Bashivan et al, ICLR 2016] P. Bashivan, I. Rish, M. Yeasin, N. Codella. Learning Representations from EEG with Deep Recurrent-Convolutional Neural
Networks. ICLR 2016 : International Conference on Learning Representations.
[Bashivan et al, 2015] Mental State Recognition via Wearable EEG, in Proc. of MLINI-2015 workshop at NIPS-2015.
[Heisig et al, 2014] S. Heisig, G. Cecchi, R. Rao and I. Rish. Augmented Human: Human OS for Improved Mental Function. AAAI 2014 Workshop on Cognitive
Computing and Augmented Human Intelligence.
[Neuropsychopharmacology, 2014] A Window into the Intoxicated Mind? Speech as an Index of Psychoactive Drug Effects. Bedi G, Cecchi G A, Fernandez
Slezak D, Carrillo F, Sigman M, de Wit H. Neuropsychopharmacology, 2014
[NPJ 2015] G. Bedi, F. Carrillo, G. A Cecchi, D. F. Slezak, M. Sigman, N. B Mota, S. Ribeiro, D C Javitt, M. Copelli, C M Corcoran. Automated analysis of free
speech predicts psychosis onset in high-risk youths. NPJ Schizophrenia 2015.
[PLoS ONE, 2013] Schizophrenia as a Network Disease: Disruption of Emergent Brain Function in Patients with Auditory Hallucinations, I Rish, G Cecchi, B
Thyreau, B Thirion, M Plaze, M-L Paillere-Martinot, C Martelli, J-L Martinot, J-B Poline. PloS ONE 8(1), e50625, Public Library of Science, 2013.
[PLoS One, 2012] Speech Graphs Provide a Quantitative Measure of Thought Disorder in Psychosis. N.B. Mota, N.A.P. Vasconcelos, N. Lemos, A.C. Pieretti,
O. Kinouchi, G.A. Cecchi, M. Copelli, S. Ribeiro. PLoS One, 2012
[Rish et al, SPIE 2016] I.Rish, P. Bashivan, G. A. Cecchi, R.Z. Goldstein, Evaluating Effects of Methylphenidate on Brain Activity in Cocaine Addiction: A
Machine-Learning Approach. SPIE Medical Imaging, 2016
[SPIE Med.Imaging 2012] Sparse regression analysis of task-relevant information distribution in the brain.
Irina Rish, Guillermo A Cecchi, Kyle Heuton, Marwan N Baliki, A Vania Apkarian, SPIE Medical Imaging, 2012.
[AISTATS 2012] J. Honorio, D. Samaras, I. Rish, G.A. Cecchi. Variable Selection for Gaussian Graphical Models. AISTATS, 2012.
[PLoS Comp Bio 2012] Predictive Dynamics of Human Pain Perception, GA Cecchi, L Huang, J Ali Hashmi, M Baliki, MV Centeno, I Rish, AV Apkarian,
PLoS Comp Bio 8(10), e1002719, Public Library of Science, 2012.
[Brain Informatics 2010] I. Rish, G. Cecchi, M.N. Baliki and A.V. Apkarian. Sparse Regression Models of Pain Perception, in Proc. of Brain Informatics (BI-
2010), Toronto, Canada, August 2010.
[NeuroImage, 2009] Prediction and interpretation of distributed neural activity with sparse models. Melissa K Carroll, Guillermo A Cecchi, Irina Rish, Rahul
Garg, A Ravishankar Rao. NeuroImage 44(1), 112--122, Elsevier, 2009.
[NIPS, 2009] Discriminative network models of schizophrenia, GA Cecchi, I Rish, B Thyreau, B Thirion, M Plaze, M-L Paillere-Martinot, C Martelli, J-L Martinot,
J-B Poline. Advances in Neural Information Processing Systems (NIPS 2009) , pp. 252--260, 2009.
“Psychiatric research is in crisis”
- Wiecki, Poland, Frank, 2015
“Imagine going to a doctor because of chest pain that has been bothering you for a couple of weeks.
The doctor would sit down with you, listen carefully to your description of symptoms, and prescribe
medication to lower blood pressure in case you have a heart condition. After a couple of weeks,
your pain has not subsided. The doctor now prescribes medication against reflux, which finally
seems to help. In this scenario, not a single medical analysis (e.g., electrocardiogram, blood work,
or a gastroscopy) was performed, and medication with potentially severe side effects was
prescribed on a trial-and-error basis.
…This scenario resembles much of contemporary psychiatry diagnosis and treatment.”
Objective measurements??

Learning about the brain: Neuroimaging and Beyond

  • 1. minx ||y - Ax||2 + λ ||x||1 Irina Rish Computational Psychiatry and Neuroimaging IBM T.J. Watson Research Center Learning About the Brain: Neuroimaging and Beyond
  • 2. Collaborators (an incomplete list) IBM T.J. Watson Research: Guillermo Cecchi Steve Heisig Aurelie Lozano Google: Mt Sinai: Northwestern U. Melissa Carroll Rita Goldstein A. Vania Apkarian INRIA: Neurospin/UC Berkeley Bertrand Thirion MIT: Pouya Bashivan Purdue: Jean Honorio Lehigh U. Katya Scheinberg SUNY Stony Brook Dimitris Samaras St. Johns U. Genady GrabarnikJB Poline USC: Sahil Garg
  • 3. AI Brain2 Brain 2 AI: Brain-inspired AI Algorithms AI 2 Brain: Mental-State Prediction and Statistical Biomarker Discovery
  • 4. Mental State Recognition to Improve Mental Function Detecting emotional & cognitive changes to predict response to different types of input, e.g. music, video, news, ads, emails (both for mental health and for neuromarketing) Safety: detecting changes in driver’s alertness level (drowsiness, microsleeps) to prevent accidents Computational psychiatry: data-analytic approach to diagnosis based on objective measurements (new Research Domain Criteria (RDoC) initiative by NIMH) Our current focus: schizophrenia, addiction, Huntington’s, Alzheimer’s, Parkinson’s “Psychiatric research is in crisis” [Wiecki et al. 2015] AI 2 Brain: Health & Productivity: mental-state-sensitive software monitoring cognitive load, focus/attention; monitoring stress/anxiety
  • 5. Measuring Brain Activity with Functional MRI Image courtesy of fMRI Research Center at Columbia University • Blood-oxygen-level-dependent (BOLD) signal related to brain activity while subject performs some task in scanner • 4D ‘brain movie’: a sequence of 3D brain volumes 3D voxels ~ 3x3x3 mm, time repetitions (TR) ~1-2s • Challenge: high-dimensional, small-sample datasets 10,000 to 100,000 variables (voxels), few 100s of TRs (samples), few 100 subjects or less
  • 6. n Data from [Baliki, Geha, Apkarian 2008] 14 healthy subjects presented with painful thermal stimuli while in fMRI scanner, and asked to rate their pain level (using a finger-span device). Example: Pain Perception Which brain areas are “relevant” to pain? Can we predict pain perception from fMRI? [PLoS Comp Bio 2012] , [SPIE Med.Imaging 2012], [Brain Informatics 2010]
  • 7. Another Example: Cocaine Addiction and Methylphenidate [Rish et al, SPIE 2016] Does MPH normalize CUD brain activity? Yes! - univariate hypothesis testing [Konova 2013][Goldstein 2010] - Our ML approach: CUD’s are harder to discriminate from controls when under MPH  their brains look more ‘normal’ A therapeutic agent for CUD? A stimulant for a stimulant? (similarly to nicotine patch and methadone for heroin addiction) Mechanism: cocaine affects the reward and may lead to addiction (cocaine use disorder, or CUD) Resting-state fMRI experiment: MPH vs. placebo, CUD vs. controls Functional connectivity features (degrees) [Konova et al 2013] MPH (Ritalin), often used to treat ADHD, has similar chemical structure and mechanism of action, but slower rate of clearance (90 vs 20 min; thus a lower abuse potential). [Goldstein & Volkow, 2002]
  • 8. What are we typically looking for in fMRI data? Question: given a stimulus, mental state or disorder, find relevant brain areas and/or interactions among them Traditional GLM approach - voxel-wise ‘activations’ + univariate stat. tests - ignores voxel interactions! But, simple and interpretable, and thus still very popular Alternative: multivariate methods to predict mental states • cognitive (e.g., viewing a picture, listening to instructions) • emotional (level of pain, anxiety, happiness) • disorders (e.g. schizophrenia, ADHD, addiction) Look for predictive patterns (hint: no black-box models)
  • 9. Our Goal: Interpretable Machine Learning. Biomarkers = Predictive Features. Feature Engineering (prior knowledge) - e.g. network properties [Rish et al, PLoS One 2013], [Cecchi et al, NIPS 2009], [Rish et al, SPIE Med.Imaging 2012], [Gheiratmand et al, submitted] Feature Selection (sparsity) [Rish et al, SPIE Med.Imaging 2012], [Honorio et al, AISTATS 2012], [Rish et al, Brain Informatics 2010], [Carroll et al, Neuroimage 2009] Feature Extraction: Learning Representations - dictionary learning, deep convnets [Rish et al, SfN 2011], [Rish et al, ICML 2008], [Bashivan et al, ICLR 2016], [Garg et al, submitted] min_x ||y - Ax||2 + λ ||x||1 [Figure: predictive model separating patients from controls]
  • 10. Mental State Prediction via Sparse Regression: y = Ax + noise. A: fMRI data (‘encoding’); rows – samples (~500), columns – voxels (~30,000). x: unknown parameters (‘signal’). y: measurements (mental states, behavior, tasks or stimuli). Issue: high-dimensional, small-sample data; need to (1) prevent overfitting and (2) find an interpretable solution. Solution: embedded variable selection via sparse regression, i.e. find a small number of (jointly) most relevant brain voxels.
  • 11. Sparse Regression Methods. ISSUE: high-dimensional, small-sample problem - solutions are overfit to data: poor generalization - difficult to interpret (determine relevant voxels) APPROACH: - LASSO: adds ℓ1-norm regularization - selects relevant voxels (sparse solution = many zero coefficients) - improving LASSO: Elastic Net - sparsity + grouping of correlated variables
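The LASSO and Elastic Net penalties named on this slide can be illustrated with a minimal proximal-gradient (ISTA) sketch. This is not the authors' implementation: the 0.5 scaling of the squared loss, the parameter names `lam1` and `lam2`, the solver choice, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink every entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def elastic_net_ista(A, y, lam1, lam2=0.0, n_iter=500):
    """Minimize 0.5*||y - A x||_2^2 + lam1*||x||_1 + 0.5*lam2*||x||_2^2.

    lam2 = 0 gives the LASSO; lam2 > 0 adds the Elastic Net ridge term.
    (Hypothetical helper; scaling conventions are an assumption.)
    """
    x = np.zeros(A.shape[1])
    # Lipschitz constant of the gradient of the smooth part
    L = np.linalg.norm(A, 2) ** 2 + lam2
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y) + lam2 * x
        x = soft_threshold(x - grad / L, lam1 / L)
    return x

# Synthetic "few samples, many voxels" problem (made-up sizes)
rng = np.random.default_rng(0)
n, p, k = 50, 200, 5
A = rng.standard_normal((n, p))
x_true = np.zeros(p)
x_true[:k] = 3.0
y = A @ x_true + 0.1 * rng.standard_normal(n)

x_lasso = elastic_net_ista(A, y, lam1=20.0)
x_enet = elastic_net_ista(A, y, lam1=20.0, lam2=1.0)
n_selected = int(np.count_nonzero(x_lasso))  # number of "voxels" kept
```

The soft-thresholding step is what produces exact zeros, which is why the nonzero coefficients can be read as a selected voxel set.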
  • 12. Adding Structure (Grouping) Helps Elastic Net vs. LASSO: - Higher prediction accuracy: 0.7-0.8 correlation between actual and predicted pain ratings and other tasks (e.g., PBAIC competition) - Better interpretability: voxel clusters (areas) versus scattered single voxels - Grouping parameter improves model stability (overlap) across different runs Some other ‘structured sparsity’ methods: • Group LASSO - when groups (e.g. regions) are known • Fused LASSO – spatial (or temporal) continuity • Moreover, adding structure in graphical LASSO, etc. [SPIE Med.Imaging 2012], [Brain Informatics 2010], [Neuroimage 2009]
  • 13. Data Driven + Analytical Models = Best Performance [PLoS Comp Bio 2012] Stimulus (temperature) to pain: a dynamical model with 3 parameters (pain threshold, anticipation, forgetting) captures inter-subject variability! Dynamical model (stimulus to pain) + sparse regression (fMRI to stimulus) outperforms a purely data-driven model (fMRI to pain), due to the highly accurate analytical part! [Figure: varying pain perception]
  • 14. Looking Beyond Single Sparse Solution: Local vs Distributed (“Holographic”) Information Simple auditory task (PBAIC): localized, sparse Complex pain experience: Distributed/‘holographic’ Sharp transition from highly relevant first two solutions (2000 voxels), to almost irrelevant rest of voxels (accuracy < 0.2) No such sharp transition, slow linear decay from best (on average) 0.65 accuracy (1st solution) to 0.5 (10th sol.) and 0.4 accuracy (24th solution, 23,000 voxels removed) [Rish et al, SPIE Med.Imaging 2012]
  • 15. Exponential decay in relevance measured by univariate correlation with the task vs. linear decay for prediction accuracy Highly predictive solution #25 (0.52 accuracy vs. 0.67 of the 1st solution) has no voxels with individual correlation above 0.1! [SPIE 2012] Standard GLM Analysis Would Not Reveal Such Behavior! Subsequent solutions are indeed distributed through the brain [Rish et al, SPIE Med.Imaging 2012]
  • 16. Feature Engineering to Predict Schizophrenia. Objective: discover discriminative patterns (biomarkers). Not localized, but rather a network disease. Functional network features (degrees, link weights, etc.) vs local voxel activations (univariate correlations of a voxel w/ task): network features significantly outperformed activation features in (1) significance tests, (2) classification and (3) stability across CV-subsets, i.e. despite ‘normal’ task-response, functional connectivity is significantly disrupted. Voxel degrees + GMRF = 86% accuracy; cross-voxel correlations + SVM = 93% accuracy [Cecchi et al, NIPS 2009] [Rish et al, PLoS ONE 2013] [Figure: MR signal -> correlation matrix (N^2 = 2x10^10) -> thresholded matrix -> extracted network]
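A minimal sketch of how voxel-degree features might be derived from a thresholded correlation matrix, as the pipeline on this slide suggests. The threshold value, the use of absolute correlation, and the function name `voxel_degrees` are assumptions for illustration; the slide does not state them.

```python
import numpy as np

def voxel_degrees(ts, threshold=0.7):
    """Degree of each voxel in a thresholded correlation graph.

    ts: array of shape (n_timepoints, n_voxels).
    threshold: assumed cutoff on |correlation| (not given in the slides).
    """
    corr = np.corrcoef(ts, rowvar=False)   # voxel-by-voxel correlation matrix
    adj = np.abs(corr) > threshold         # keep only strong links
    np.fill_diagonal(adj, False)           # no self-links
    return adj.sum(axis=0)                 # number of links per voxel

# Toy data: voxels 0-2 share one signal, voxel 3 follows an unrelated one
t = np.arange(100)
s = np.sin(0.3 * t)
ts = np.column_stack([s, 2.0 * s + 1.0, -s, np.cos(1.7 * t)])
degrees = voxel_degrees(ts)
```

For whole-brain data the full N x N matrix is very large, so a real pipeline would likely compute it in blocks; this toy version ignores that.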
  • 17. Learning Interpretable Whole-Brain Markov Networks Problem: whole-brain Markov nets, even link-sparse (e.g., learned by glasso), are hard to interpret; can we identify most relevant nodes (voxels) ? Hypothesis: often, only a relatively few “important” variables are interacting with each other, forming clusters. variable-selection prior: block l1/lp norm log-likelihood link sparsity node sparsity Proposed approach: group-Lasso penalty for node selection [Honorio et al, AISTATS 2012]
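The three labels on this slide (log-likelihood, link sparsity, node sparsity) suggest a penalized maximum-likelihood objective. One way to write such an objective is sketched below, assuming a Gaussian Markov network with precision matrix Ω and sample covariance Σ̂. The symbols ρ and τ, and the exact form of the block norm, are assumptions made here and are not taken from the slide.

```latex
\max_{\Omega \succ 0}\;
\underbrace{\log\det\Omega - \langle \hat{\Sigma}, \Omega \rangle}_{\text{log-likelihood}}
\;-\; \rho\, \underbrace{\|\Omega\|_{1}}_{\text{link sparsity}}
\;-\; \tau\, \underbrace{\sum_{i=1}^{N} \big\| (\omega_{ij})_{j \neq i} \big\|_{p}}_{\text{node sparsity (block } \ell_1/\ell_p \text{ norm)}}
```

Under this reading, the last term sums one ℓp norm per node over that node's off-diagonal links, so driving a node's norm to zero disconnects the node entirely.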
  • 18. Markov Network Disruptions in Cocaine Addiction. Visual attention task + monetary reward [Honorio et al, AISTATS 2012] Our method vs standard glasso: - higher accuracy (log-likelihood) - better interpretability We observe in cocaine addiction: • increased connectivity between the visual cortex (left) and the prefrontal cortex (right) • decreased connectivity between the visual cortex and other brain areas • relation to prior art: visual cortex abnormalities in addiction observed by [Lee et al 2003]; prefrontal cortex is involved in decision making and reward processing; abnormal monetary processing in PFC reported in [Goldstein et al, 2009] [Figure: networks for cocaine addicts vs. control subjects, graphical lasso vs. our method; blue - positive interactions, red - negative interactions]
  • 19. Representation Learning with ConvNets: an EEG study [Bashivan et al, ICLR 2016] EEG experiment: working memory; 13 subjects, 240 runs (3120 trials). Samples: a subset of 2670 correctly answered trials. Evaluation: leave-one-subject-out (i.e., 13-fold) CV. Feature extraction: FFT to find spectral power within each electrode at three frequency bands: theta (4-8Hz), alpha (8-13Hz), and beta (13-30Hz). EEG images: 3D electrode locations (64) are projected onto a 2D surface via the distance-preserving Azimuthal Equidistant Projection. The topographical activity map within each band is transformed into an image by interpolating the values between electrodes on a 32 x 32 mesh.
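The per-electrode band-power step can be sketched as follows. The band edges come from the slide; the sampling rate, the half-open band intervals, the use of summed squared FFT magnitudes, and the function name `band_powers` are assumptions for illustration only.

```python
import numpy as np

# Band edges in Hz, as listed on the slide
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}

def band_powers(signal, fs):
    """Spectral power of one electrode's signal within each band.

    signal: 1D array of samples; fs: sampling rate in Hz (assumed known).
    """
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return {
        name: float(spectrum[(freqs >= lo) & (freqs < hi)].sum())
        for name, (lo, hi) in BANDS.items()
    }

# Toy check: a pure 10 Hz oscillation should land in the alpha band
fs = 128                                  # assumed sampling rate
t = np.arange(0, 2.0, 1.0 / fs)
x = np.sin(2 * np.pi * 10.0 * t)
powers = band_powers(x, fs)
dominant = max(powers, key=powers.get)
```

Running this for all 64 electrodes gives three values per electrode, which would then be interpolated onto the 32 x 32 mesh to form the three image channels.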
  • 20. ConvNets Architectures: Single-Frame Approach • FFT over the complete trial = single image for each trial • VGG style ConvNets [Simonyan & Zisserman, 2015] • Convolutional layers with 3 x 3 receptive fields • Various architectures were explored with different number of layers
    ConvNet Configurations (input: 32 x 32 3-channel image)
      A: Conv3-32, Conv3-32, maxpool
      B: Conv3-32, Conv3-32, maxpool, Conv3-64, Conv3-64, maxpool
      C: Conv3-32, Conv3-32, maxpool, Conv3-64, Conv3-64, maxpool, Conv3-128, maxpool
      D: Conv3-32, Conv3-32, Conv3-32, Conv3-32, maxpool, Conv3-64, Conv3-64, maxpool, Conv3-128, maxpool
    Architecture   Number of parameters   Test Error
      A            ~10k                   13.05
      B            ~65.5k                 13.17
      C            ~139.4k                13.91
      D            ~158k                  12.39
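The parameter counts in the table can be cross-checked with simple arithmetic. The sketch below assumes 3 x 3 kernels with one bias per output channel and counts convolutional layers only; the per-architecture layer lists are a reading of the flattened configuration table, so treat them as an assumption.

```python
def conv_params(in_ch, out_ch, k=3):
    """Weights plus biases of one k x k convolutional layer."""
    return k * k * in_ch * out_ch + out_ch

def count_params(channels, in_ch=3):
    """Total conv parameters for a stack with the given output channels."""
    total = 0
    for out_ch in channels:
        total += conv_params(in_ch, out_ch)
        in_ch = out_ch
    return total

# Output channels of each conv layer, per architecture (assumed reading)
configs = {
    "A": [32, 32],
    "B": [32, 32, 64, 64],
    "C": [32, 32, 64, 64, 128],
    "D": [32, 32, 32, 32, 64, 64, 128],
}
counts = {name: count_params(ch) for name, ch in configs.items()}
# A: 10144, B: 65568, C: 139424, D: 157920
```

These totals round to the ~10k, ~65.5k, ~139.4k and ~158k figures listed for A through D, and D comes out as a 7-layer network.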
  • 21. Better Results with Recurrent ConvNets. Best result of 8.9% error discriminating among 4 levels of cognitive load achieved by recurrent ConvNets with LSTM + time convolution [Bashivan et al, ICLR 2016] • EEG time series for each trial split into 7 windows (0.5 sec). FFT on each time window to get an image as before • Best ConvNet (7-layer) used as C • All 7 ConvNets shared parameters • video classification architectures from [Ng et al, CVPR 2015] • Temporal Maxpool: max pool over time frames • Temporal Convolution: 1D convolution over time frames • LSTM: sequence mapping over time frames • Mixed LSTM/1D-Conv: combination of both LSTM and 1D-Conv architectures
    Architecture             Test Error (%)   Validation Error (%)   Number of parameters
      RBF SVM                15.34            -                      -
      L1-logistic regression 15.32            -                      -
      Random Forest          12.59            -                      -
      DBN                    14.96            8.37                   1.02 mil
      ConvNet+Maxpool        14.80            8.48                   1.21 mil
      ConvNet+1D-Conv        11.32            9.28                   441 k
      ConvNet+LSTM           10.54            6.10                   1.34 mil
      ConvNet+LSTM/1D-Conv   8.89             8.39                   1.62 mil
  • 22. But What About Interpretability? Code: https://github.com/pbashivan/EEGLearn Using deconvnet of [Zeiler et al] to map features back to the brain images. Back projections: maps obtained by deconvnet on the feature map, displaying structures in the input image that excite that particular feature map. Some of these features correspond to well-known electrophysiological markers of cognitive load. First-layer features captured widespread theta (1st stack, kernel 7) and frontal beta activity (1st stack, kernel 23). Second- and third-layer features captured frontal theta/beta (2nd stack, kernel 7; 3rd stack, kernels 60 and 112) as well as parietal alpha (2nd stack, kernel 29). Frontal theta and beta activity as well as parietal alpha are the most prominent markers of cognitive/memory load in the neuroscience literature [Bashivan et al., 2015; Jensen et al., 2002; Onton et al., 2005; Tallon-Baudry et al., 1999] [Figure: input EEG images (top 9 images with highest feature activations across the training set) and back projections for Layer 4, Layer 6, Layer 7]
  • 23. Summary: Machine Learning in Neuroimaging “Statistical biomarkers”: [Cecchi et al, NIPS 2009] [Rish et al, PLOS One, 2013] [Carroll et al, Neuroimage 2009] [Scheinberg&Rish, ECML 2010] Schizophrenia classification: 86% to 93% accuracy [Rish et al, Brain Informatics 2010] [Rish et al, SPIE Med.Imaging 2012] [Cecchi et al, PLOS Comp Bio 2012] Cognitive state prediction in videogames: 70-95% Pain perception: 70-80%, distributed activation patterns [Honorio et al, AISTATS 2012] [Rish et al, SPIE Med.Imaging 2016] Cocaine addiction: distinct Markov network patterns [Bashivan et al, ICLR 2016] EEG-cognitive load prediction: 91% w/ recurrent ConvNets [Figure: predictive model separating mental disorder from healthy]
  • 24. Beyond the Scanner: Using ‘Cheaper’ Sensors? NeuroSky Muse EEG EEG, accelerometer Hexoskin Heart rate Respiration Heart-rate variability Jawbone UP3 Heart rate Respiration Galvanic Skin Response (GSR) Skin temperature Ambient Temperature Accelerometer So much data, so little inference  What can we actually learn from all these “big personal data”? Anything about mental states?
  • 25. Device detects voltage at different skin locations: 1. TP9 Behind Left Ear 2. FP1 Front Left Forehead 3. FP2 Front Right Forehead 4. TP10 Behind Right Ear FP2 FP1 TP10 TP9 Delta – Adult slow wave sleep, continuous attention processes Theta – Drowsiness, idling, inhibition Alpha – Relaxed, reflecting Beta – Alert, busy, anxious, thinking Gamma – Short term memory usage, using 2 senses at once Experiments with Muse Wearable EEG
  • 26. Experiment 1: Meditation • Exploring differences across people engaged in the same activity • Each session consisted of 7 minutes of closed-eyes meditation • Clustering mean frequency band vectors. Lower beta, higher alpha: more relaxed, less anxious. Higher beta, lower alpha: vice versa. [Figure: clustered band vectors, labels: Solutions; SH, PB, IR, G, HC; Women, Men. Green: lower than average, red: higher than average]
  • 27. Experiment 2: Watching Videos [Bashivan et al, 2015] • Can you tell from EEG what kind of movie a person is watching? • 2 short (~7 min) youtube videos, in 2 sessions • Funny (“emotional”) cat videos vs Educational (“rational”) Khan Academy • Feature extraction from raw EEG data: • 1st and 2nd order statistics from frequency bands • ‘Functional networks’ across bands and sensors • Classifiers: • SVM (RBF kernel), sparse logistic regression, decision forest, deep nets (3-layer RBM) • Best results: • Individual-level: SVM, 74.5% accuracy • Group-level: sparse logistic regression, 74% accuracy • Most-predictive features: • variances in power within frequency bands, and correlations cross-bands and cross-channels (‘functional network’) • Sparse logistic regression predicted best with only 31 out of 441 (7% of all) features [Figure labels: TP9 – HG, FP1 – B, FP2 – A, FP2 – B, TP10 – G, TP10 – HG]
  • 28. Text Analytics for Computational Psychiatry “Language is a window into the brain” - M. Covington • 93% accuracy discriminating schizophrenics from manics based on syntactic speech graphs [PLoS One, 2012] • Nearly 100% accuracy predicting 1st psychotic episode 1-2 YEARS in advance via coherence and other features [NPJ 2015] • 88% accuracy discriminating ecstasy and meth users from controls, using semantic features such as proximity to ‘empathy’ concept, etc., and graph features [Neuropsychopharmacology, 2014]
  • 29. Example: Speech Coherence. Text coherence: currently measured as the angle between vector representations of consecutive sentences (word vectors computed by LSA). Sober vs. non-sober speech coherence for Jenna: https://www.youtube.com/watch?v=MXzwAXzUwwE https://www.youtube.com/watch?v=6xx_pwu7n-Y [Figure: phrase-to-phrase coherence and alternate-phrase coherence, non-sober vs. sober] [Heisig et al, AAAI ws 2014]
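The coherence measure on this slide can be sketched as the cosine of the angle between sentence vectors. This assumes the sentence vectors have already been built (for example, by combining LSA word vectors; that step is not shown), and that "alternate-phrase" coherence means comparing sentences two apart. Both the `lag` parameter and the function names are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def coherence(sentence_vectors, lag=1):
    """Cosine similarity between each sentence and the one `lag` steps later.

    lag=1: consecutive sentences (phrase-to-phrase);
    lag=2: assumed reading of "alternate-phrase" coherence.
    """
    return [
        cosine(sentence_vectors[i], sentence_vectors[i + lag])
        for i in range(len(sentence_vectors) - lag)
    ]

# Toy vectors: the first two sentences point the same way, the third does not
vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
first_order = coherence(vectors, lag=1)
second_order = coherence(vectors, lag=2)
```

A drop in these values along a transcript would indicate consecutive sentences drifting apart in topic, which is the signal the sober vs. non-sober comparison looks at.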
  • 30. Sensor 1 Sensor 3 I walked into a café .. Sensor 2 Sensor data  Text  Audio  Video  EEG signal  Temperature  Heart-rate  Skin- conductance Psycho- and physiological Features Voice power spectrum Text topic model Syntactic graph HRV spectrum Cheap data + Smart Analytics: Machine learning+ graph theory = Behavioral prediction Brain sciences: Psychology+ Neuroscience Behavioral Phenotype Baselining Change-point detection Predictions Towards “Augmented Human”: Real-Time Mind-Reading from Cheap Sensors
  • 31. Brain 2 AI: Adult Neurogenesis: Inspiration for Adaptive Representation Learning • Predominant in the dentate gyrus of the hippocampus and in the olfactory bulb • Current theories: the hippocampus functions as an autoencoder to evoke memories; a similar encoding function is suggested in the olfactory bulb • Our computational model: sparse linear autoencoder (online dictionary learning of Mairal et al) + dynamic addition (birth) and deletion (death) of hidden nodes [Garg, Rish, Cecchi, Lozano 2016; submitted] [Figure: olfactory bulb and dentate gyrus; autoencoder with input x, hidden nodes c = encoded x, output x’ = reconstructed x, link weights = ‘dictionary’ D; data (n samples x p variables) ~ dictionary (m basis vectors) x sparse representation]
  • 32. Better Adaptation in Non-Stationary Environment. Outperforms fixed-size autoencoder on non-stationary input: improved accuracy + more compact representation. Adapts to a new domain without forgetting the old one (via ‘memory’ matrices, part of Mairal’s original method) [Figure panels: learned dictionary size; ‘old’ domain reconstruction; ‘new’ domain reconstruction; non-stationary visual input]
  • 33. Some Lessons • Data-driven + analytical models = Superior Performance • Importance of appropriate domain-specific prior (e.g., group sparsity) • Feature engineering based on domain knowledge is still important! • Importance of model stability (reproducibility) • Model interpretability is key: sparsity, deconvolution - map back to brain • Importance of exploring solution space beyond the single optimal one (there are many ways to skin the cat) • Deep learning faces specific challenges in neuroimaging: - Datasets are relatively small (e.g., few 1000 samples): regularize! - Model interpretability should be incorporated (e.g. deconvolution)
  • 34. Thank you! And now… some shameless self-promotion. Come to our workshop tomorrow! (Fri 12/9, Room 114) Representation Learning in Artificial and Biological Neural Networks. Books: Sparse Modeling, I. Rish and G. Grabarnik, CRC Press, 2014; Practical Applications of Sparse Modeling, edited by I. Rish, G. Cecchi, A. Lozano, A. Niculescu-Mizil, MIT Press, 2014. Publication page: http://researcher.watson.ibm.com/researcher/view_person_pubs.php?person=us-rish&t=1 Code: https://github.com/pbashivan/EEGLearn
  • 35. References
    [Garg, Rish, Cecchi, Lozano 2016; submitted] S. Garg, I. Rish, G. Cecchi, A. Lozano. Neurogenesis-inspired Dictionary Learning: Online Model Adaptation in a changing world, submitted to ICLR-2017.
    [Bashivan et al, ICLR 2016] P. Bashivan, I. Rish, M. Yeasin, N. Codella. Learning Representations from EEG with Deep Recurrent-Convolutional Neural Networks. ICLR 2016: International Conference on Learning Representations.
    [Bashivan et al, 2015] Mental State Recognition via Wearable EEG, in Proc. of MLINI-2015 workshop at NIPS-2015.
    [Heisig et al, 2014] S. Heisig, G. Cecchi, R. Rao and I. Rish. Augmented Human: Human OS for Improved Mental Function. AAAI 2014 Workshop on Cognitive Computing and Augmented Human Intelligence.
    [Neuropsychopharmacology, 2014] A Window into the Intoxicated Mind? Speech as an Index of Psychoactive Drug Effects. Bedi G, Cecchi G A, Fernandez Slezak D, Carrillo F, Sigman M, de Wit H. Neuropsychopharmacology, 2014.
    [NPJ 2015] G. Bedi, F. Carrillo, G. A Cecchi, D. F. Slezak, M. Sigman, N. B Mota, S. Ribeiro, D C Javitt, M. Copelli, C M Corcoran. Automated analysis of free speech predicts psychosis onset in high-risk youths. NPJ Schizophrenia 2015.
    [PLoS ONE, 2013] Schizophrenia as a Network Disease: Disruption of Emergent Brain Function in Patients with Auditory Hallucinations, I Rish, G Cecchi, B Thyreau, B Thirion, M Plaze, M-L Paillere-Martinot, C Martelli, J-L Martinot, J-B Poline. PloS ONE 8(1), e50625, Public Library of Science, 2013.
    [PLoS One, 2012] Speech Graphs Provide a Quantitative Measure of Thought Disorder in Psychosis. N.B. Mota, N.A.P. Vasconcelos, N. Lemos, A.C. Pieretti, O. Kinouchi, G.A. Cecchi, M. Copelli, S. Ribeiro. PLoS One, 2012.
    [Rish et al, SPIE 2016] I. Rish, P. Bashivan, G. A. Cecchi, R.Z. Goldstein, Evaluating Effects of Methylphenidate on Brain Activity in Cocaine Addiction: A Machine-Learning Approach. SPIE Medical Imaging, 2016.
    [SPIE Med.Imaging 2012] Sparse regression analysis of task-relevant information distribution in the brain. Irina Rish, Guillermo A Cecchi, Kyle Heuton, Marwan N Baliki, A Vania Apkarian, SPIE Medical Imaging, 2012.
    [AISTATS 2012] J. Honorio, D. Samaras, I. Rish, G.A. Cecchi. Variable Selection for Gaussian Graphical Models. AISTATS, 2012.
    [PLoS Comp Bio 2012] Predictive Dynamics of Human Pain Perception, GA Cecchi, L Huang, J Ali Hashmi, M Baliki, MV Centeno, I Rish, AV Apkarian, PLoS Comp Bio 8(10), e1002719, Public Library of Science, 2012.
    [Brain Informatics 2010] I. Rish, G. Cecchi, M.N. Baliki and A.V. Apkarian. Sparse Regression Models of Pain Perception, in Proc. of Brain Informatics (BI-2010), Toronto, Canada, August 2010.
    [NeuroImage, 2009] Prediction and interpretation of distributed neural activity with sparse models. Melissa K Carroll, Guillermo A Cecchi, Irina Rish, Rahul Garg, A Ravishankar Rao. NeuroImage 44(1), 112--122, Elsevier, 2009.
    [NIPS, 2009] Discriminative network models of schizophrenia, GA Cecchi, I Rish, B Thyreau, B Thirion, M Plaze, M-L Paillere-Martinot, C Martelli, J-L Martinot, J-B Poline. Advances in Neural Information Processing Systems (NIPS 2009), pp. 252--260, 2009.
  • 36. “Psychiatric research is in crisis” - Wiecki, Poland, Frank, 2015 “Imagine going to a doctor because of chest pain that has been bothering you for a couple of weeks. The doctor would sit down with you, listen carefully to your description of symptoms, and prescribe medication to lower blood pressure in case you have a heart condition. After a couple of weeks, your pain has not subsided. The doctor now prescribes medication against reflux, which finally seems to help. In this scenario, not a single medical analysis (e.g., electrocardiogram, blood work, or a gastroscopy) was performed, and medication with potentially severe side effects was prescribed on a trial-and-error basis. …This scenario resembles much of contemporary psychiatry diagnosis and treatment.” Objective measurements??