CSU_comp

Computational Classification
Techniques for Neuroimaging
A Machine Learning Based Approach
Adrian Smith – Undergraduate
Computer Science Department
Sonoma State University

Fundamentals
• Understanding the human
brain has been a central
theme of human history
• By growing our
understanding of the brain,
we improve our ability to
treat diseases (Gur2002)
• Understanding the brain helps us be aware of its limitations
Artist’s Depiction of Neurons
UCI Research
Courtesy of OSA Student Chapter at UCI Art in Science Contest.
Photo by: Ardy Rahman

fMRI Scanning
• Functional Magnetic
Resonance Imaging (fMRI)
allows us to measure localized
brain activity
• This allows one to find
relationships between cognition
and brain activity
• Blood oxygenation is used as an indirect measure of activity (BOLD, or blood-oxygen-level-dependent, imaging)
• This technique produces rich
data, but contains high levels of
noise
CSRB (Keck MRI Center)

Data Collection
• One major advantage of researching fMRI data is its availability from a variety of online sources
• We worked with 1452 brain scans in total, each corresponding to one of 9 categories
• The categories refer to the
image a subject was
observing

Analysis Goals
• Our goal was to predict, given a subject's fMRI scan, which image the subject was observing
• This means differentiating scans
based on the image the subject is
observing
• What is the relationship?
Haxby2001 Stimulus Images

Machine Learning Techniques
• Machine learning is a family of techniques for learning patterns from data
• The field of machine learning is
at the heart of understanding
“Big Data”
• We aimed to use modern
machine learning techniques to
help classify fMRI data.

How does Machine Learning Work?
• Machine learning classification focuses on designing algorithms that are trained to categorize objects
• Each training example pairs some defining characteristics (features) with a label
• The algorithm trains on one set of data and is then tested on a separate set to see how accurately it predicts the labels
• What is the data?
By Antti Ajanki AnAj (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html), CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/)
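The train-then-test idea above can be sketched in a few lines. This is a minimal illustration, not the method used in this project: it swaps in a simple nearest-centroid classifier, and the toy data and label names ("face", "house") are hypothetical. Each row of `X` is one feature vector and the matching entry of `y` is its label.

```python
import numpy as np

def fit_centroids(X_train, y_train):
    """'Training': store one mean feature vector per label."""
    labels = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in labels])
    return labels, centroids

def predict(model, X):
    """Assign each sample the label of its closest centroid (Euclidean distance)."""
    labels, centroids = model
    # Distance from every sample to every centroid: shape (n_samples, n_labels).
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return labels[np.argmin(d, axis=1)]

# Hypothetical toy data: 40 feature vectors of length 5, two labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (20, 5))])
y = np.array(["face"] * 20 + ["house"] * 20)

# Train on one random half, test on the other half.
idx = rng.permutation(len(y))
train, test = idx[:20], idx[20:]
model = fit_centroids(X[train], y[train])
accuracy = np.mean(predict(model, X[test]) == y[test])
print(f"accuracy: {accuracy:.2f}")
```

The key point is that accuracy is measured only on samples the classifier never saw during training.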

Which is active before processing?
Panels: Unprocessed Active, Unprocessed Rest

Which is active after processing?
Panels: Processed Active, Processed Rest

Preprocessing
• We applied masks that came with
the dataset in order to focus on the
Ventral Temporal cortex, our region
of interest
• We then applied a polynomial
detrender, which eliminates
systematic trends, such as signal
increase as the machine warms up
• This was followed by a key step: z-scoring against the rest condition
Graph of Normal Distribution
Public Domain
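The three preprocessing steps can be sketched in plain numpy. This is an assumption-laden sketch, not the pyMVPA code the project actually used: pyMVPA provides its own detrending and z-scoring mappers, and those handle per-run details that this version ignores. The array shapes, the linear polynomial order, the `"rest"` label string and the synthetic data are all hypothetical.

```python
import numpy as np

def apply_mask(volumes, mask):
    """Keep only voxels inside a boolean ROI mask.
    volumes: (n_volumes, x, y, z); mask: boolean (x, y, z) -> (n_volumes, n_roi_voxels)."""
    return volumes[:, mask]

def poly_detrend(X, order=1):
    """Fit and subtract a polynomial in time from every voxel's time series.
    X: (n_volumes, n_voxels). order=1 removes a linear drift (and the mean)."""
    t = np.linspace(-1.0, 1.0, X.shape[0])
    basis = np.vander(t, order + 1)              # columns: t**order, ..., t, 1
    coef, *_ = np.linalg.lstsq(basis, X, rcond=None)
    return X - basis @ coef

def zscore_against_rest(X, labels, rest_label="rest"):
    """Standardize each voxel using the mean and std of the rest volumes only."""
    rest = X[labels == rest_label]
    mu = rest.mean(axis=0)
    sd = rest.std(axis=0)
    sd[sd == 0] = 1.0                            # avoid division by zero for constant voxels
    return (X - mu) / sd

# Hypothetical data: 12 tiny 4x4x4 volumes with a simulated upward drift.
rng = np.random.default_rng(0)
volumes = rng.normal(size=(12, 4, 4, 4))
volumes += np.arange(12)[:, None, None, None] * 0.5
mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 1:3] = True                       # 8-voxel hypothetical ROI
labels = np.array(["rest", "face"] * 6)
X = zscore_against_rest(poly_detrend(apply_mask(volumes, mask)), labels)
print(X.shape)  # (12, 8)
```

Z-scoring against rest (rather than against all volumes) expresses every value as "how far from the resting baseline", which is what makes active volumes stand out.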

Classification
• We now had to decide how to
process the image data
• This meant choosing features
that best represented the data
we sought
• We also tested a variety of classification algorithms, which label images based on the chosen features

Features
• We started with our preprocessed values and then looked at a variety of transforms
• We chose the full vector and the PCA reduced version as our main
features of interest
• Principal Component Analysis (PCA) is a tool for reducing the dimensionality of a dataset
Features computed from one volume: full vector (samples), 50 highest values, histogram, PCA
Example full vector (576 values): [0.5, .01, -.02, 1.5, 2.0, … -3.0]
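The candidate features can be sketched as follows. This is a hypothetical numpy version for illustration only: the bin count and value range of the histogram, the 90% variance threshold for PCA, and the random 100 x 576 data are assumptions, not the project's settings. PCA is implemented directly with an SVD so the example has no dependencies beyond numpy.

```python
import numpy as np

def top_k_values(x, k=50):
    """The k largest voxel values of one volume, in descending order."""
    return np.sort(x)[::-1][:k]

def histogram_feature(x, bins=20, value_range=(-3.0, 3.0)):
    """Counts of voxel values per fixed bin; values outside value_range are not counted."""
    counts, _ = np.histogram(x, bins=bins, range=value_range)
    return counts

def pca_fit(X_train, variance_to_keep=0.90):
    """PCA via SVD: keep the fewest components whose explained variance reaches variance_to_keep."""
    mean = X_train.mean(axis=0)
    _, s, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    return mean, vt[:k]

def pca_transform(model, X):
    """Project (already preprocessed) volumes onto the retained components."""
    mean, components = model
    return (X - mean) @ components.T

# Hypothetical data: 100 preprocessed volumes, 576 ROI voxels each.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 576))
top = top_k_values(X[0])            # 50 values
hist = histogram_feature(X[0])      # 20 bin counts
model = pca_fit(X)
X_pca = pca_transform(model, X)     # 100 rows, fewer than 576 columns
```

Note that the "50 highest values" and histogram features discard where in the brain each value came from, while the full vector and PCA keep that spatial information.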

Experimental Design
• Data was split evenly and randomly into training and test sets
• We used several feature vectors to test each classifier
• We primarily focused on k Nearest Neighbor (kNN) and Support
Vector Machine (SVM) classifiers
• Tests were repeated 15 times and scores averaged
Classification workflow: training features and training labels train the classifier; the trained classifier predicts labels for the testing features; predicted and testing labels are compared to give the accuracy and confusion matrix
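The experimental design can be sketched with scikit-learn, which the project used for classification. This is a sketch under stated assumptions, not the project's script: the 15 repeats and 50/50 split come from the slide, but stratifying by label, k = 5 for kNN, the linear SVM kernel and the synthetic 9-label data are hypothetical choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

def evaluate(make_classifier, X, y, n_repeats=15, seed=0):
    """Repeat a random 50/50 train/test split; return per-run accuracies and the summed confusion matrix."""
    labels = np.unique(y)
    accuracies = []
    total_cm = np.zeros((labels.size, labels.size), dtype=np.int64)
    for run in range(n_repeats):
        # Stratifying by label is an assumption; the slide only says "evenly and randomly".
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.5, random_state=seed + run, stratify=y)
        clf = make_classifier().fit(X_tr, y_tr)
        y_pred = clf.predict(X_te)
        accuracies.append(accuracy_score(y_te, y_pred))
        total_cm += confusion_matrix(y_te, y_pred, labels=labels)
    return np.array(accuracies), total_cm

# Synthetic stand-in for the real features: 9 labels x 40 volumes x 100 features.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(9), 40)
X = rng.normal(size=(9, 100))[y] + rng.normal(scale=2.0, size=(y.size, 100))

classifiers = {
    "kNN": lambda: KNeighborsClassifier(n_neighbors=5),  # k=5 is hypothetical
    "SVM": lambda: SVC(kernel="linear"),                 # kernel choice is hypothetical
}
results = {name: evaluate(make, X, y) for name, make in classifiers.items()}
for name, (acc, cm) in results.items():
    print(f"{name}: mean accuracy {acc.mean():.1%}, std {acc.std():.1%}")
```

Summing the confusion matrices over runs gives a more stable picture of which labels get mixed up than any single split.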

kNN vs. SVM
• SVM performs better than kNN
• Increase in accuracy is likely due to the weakness of kNN when
dealing with high dimensionality
SVM on samples, 90.9% accuracy
kNN on samples, 75.6% accuracy
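One well-known reason kNN struggles in high dimensions is distance concentration: as the number of dimensions grows, the nearest and farthest neighbors of a point end up at almost the same distance, so "nearest" carries little information. The sketch below shows the general effect on uniform random points; the point count and dimensions (including 576, chosen to echo the feature vector length) are hypothetical, and this does not by itself prove that this is why kNN underperformed on this dataset.

```python
import numpy as np

def relative_contrast(n_points, dim, seed=0):
    """(farthest - nearest) / nearest distance from a random query to uniform random points."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    d = np.linalg.norm(points - query, axis=1)
    return (d.max() - d.min()) / d.min()

# The contrast shrinks sharply as the dimension grows.
for dim in (2, 10, 100, 576):
    print(dim, round(relative_contrast(500, dim), 2))
```

SVM, by contrast, learns a separating boundary from the training data rather than relying on raw distances to individual neighbors.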

• We applied PCA to the processed data
• This produced a vector over half the size of our original
• This smaller vector gave more accurate results (92.1% vs. 90.9% with SVM)
Samples vs. PCA
PCA (SVM), 92.1% accuracy
SVM on samples, 90.9% accuracy
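A PCA-then-SVM combination can be sketched as a scikit-learn pipeline. Assumptions: keeping 90% of the variance follows the "SVM on 90% PCA" caption in the extra graphics, while the linear kernel, the single split and the synthetic 576-feature data are hypothetical. Putting PCA inside the pipeline means it is fitted on the training split only, so the test volumes never influence the components.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in: 9 labels x 40 volumes x 576 voxel values.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(9), 40)
X = rng.normal(size=(9, 576))[y] + rng.normal(scale=3.0, size=(y.size, 576))

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

# A float n_components keeps just enough components to explain that fraction of variance.
model = make_pipeline(PCA(n_components=0.90, svd_solver="full"), SVC(kernel="linear"))
model.fit(X_tr, y_tr)
n_kept = model.named_steps["pca"].n_components_
accuracy = model.score(X_te, y_te)
print(f"kept {n_kept} of {X.shape[1]} dimensions; test accuracy {accuracy:.1%}")
```

Fitting PCA on all the data before splitting would leak information about the test set into the features and could inflate the reported accuracy.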

• PCA and SVM in combination gave the best results
after repeated testing
• We achieved an average accuracy of 92.1% across 9 labels, with a standard deviation of 2.0% over the repeated runs
• Our classification methods are effective and
repeatable
• We also gained a variety of insights about the
nature of the data
Classification Results: Accuracy

• We saw several labels that were repeatedly misclassified, and accuracy improved when they were removed
• One area of further study is
investigating whether these
patterns exist between multiple
subjects, and why
PCA (SVM), 92.1% accuracy
Classification Results: Insights
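Finding the labels that are repeatedly misclassified amounts to reading the confusion matrix. The sketch below assumes scikit-learn's convention (rows are true labels, columns are predictions); the 3-label matrix and the label names are made up for illustration and are not this project's results. `drop_labels` is a hypothetical helper for re-running the classification without problem labels.

```python
import numpy as np

def per_label_accuracy(cm):
    """Fraction of each true label's samples that were predicted correctly (rows = true labels)."""
    return np.diag(cm) / cm.sum(axis=1)

def most_confused_pair(cm):
    """(true_index, predicted_index, count) for the largest off-diagonal entry."""
    off = cm.copy()
    np.fill_diagonal(off, 0)
    i, j = np.unravel_index(np.argmax(off), off.shape)
    return int(i), int(j), int(off[i, j])

def drop_labels(X, y, labels_to_drop):
    """Remove every sample whose label is in labels_to_drop."""
    keep = ~np.isin(y, labels_to_drop)
    return X[keep], y[keep]

# Made-up 3-label confusion matrix, 50 test volumes per label.
labels = np.array(["label A", "label B", "label C"])
cm = np.array([[48, 2, 0],
               [1, 40, 9],
               [0, 6, 44]])
print(per_label_accuracy(cm))
i, j, n = most_confused_pair(cm)
print(f"{labels[i]} mistaken for {labels[j]} {n} times")
```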

Future Exploration
• We intend to move towards classifying
across multiple subjects
• This is of utmost importance to clinical
applications of fMRI data
• Multisubject comparison presents
challenges due to the variation in brain
structure
• We intend to build upon previous work on
feature detection and scaling maps
(Gill2014)

Sources
• Gur, R. E., McGrath, C., Chan, R. M., Schroeder, L., Turner, T., Turetsky, B. I., ...
& Gur, R. C. (2002). An fMRI study of facial emotion processing in patients with
schizophrenia. American Journal of Psychiatry.
• Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P.
(2001). Distributed and overlapping representations of faces and objects in
ventral temporal cortex. Science, 293(5539), 2425-2430.
• Gill, G., Bauer, C., & Beichel, R. R. (2014). A method for avoiding overlap of left
and right lungs in shape model guided segmentation of lungs in CT volumes.
Medical physics, 41(10), 101908.
• Dataset: This data was obtained from the OpenfMRI database. Its accession number is ds000105. The original authors of Haxby et al. (2001) hold the copyright of this dataset and made it available under the terms of the Creative Commons Attribution-Share Alike 3.0 license.

Acknowledgments
• Dr. Gurman Gill – Mentor
• OpenfMRI – Source of all data, and amazing example of open
data in science
• pyMVPA – Python toolkit used in preprocessing
• Scikit-learn – Python toolkit used in classification
• Dr. Yaroslav Halchenko – Researcher who provided extensive
aid in understanding and dealing with fMRI data

Extra Graphics
SVM of top 400 values, 30.9% accuracy
SVM on 90% PCA, 92.2% accuracy
