Data-driven Ophthalmology
Introduction
●
The purpose of this presentation is to
provide a light visual literature
review on “big data” or deep
learning / artificial intelligence
solutions to come for
ophthalmology and vision sciences.
– More with an idea to introduce
topics that you might not have
thought of before, without
going too deeply into details
Some of the background needed to
understand this presentation
better is covered in my previous
presentation →
●
The presentation itself is quite dense,
and better suited to being read on
a tablet/desktop rather than as a
slideshow projected somewhere
Shallow introduction for Deep Learning Retinal
Image Analysis
Published on Aug 20, 2016
https://www.slideshare.net/PetteriTeikariPhD/shallow-introduction
-for-deep-learning-retinal-image-analysis
“Old-school” unimodal model
Image classification for retinal pathologies
Ophthalmic IMAGING 2D Fundus → 3D OCT
Examples of color and high-dynamic-range (HDR) disc
photographs of 2 normal controls (a, b and c, d) and 2
glaucoma patients (e, f and g, h). Left column (a, c, e,
and g): color disc photographs; right column (b, d, f,
and h): HDR concept disc photographs.
https://doi.org/10.1155/2017/8209270
Linear-scale adaptive optics (AO)-Optical Coherence Tomography (OCT) volume acquired with three different AO focus depths (RNFL, OPL, and
IS/OS) and combined to display the appearance of retinal layers in AO-OCT images. En face images are projections of the subvolumes shown in
the middle, demonstrating the fine depth-sectioning ability of AO-OCT. (Jonnal et al., 2016)
Optical Coherence Tomography (OCT) and its variants, the de facto standard for eye diagnostics
Multispectral imaging going beyond RGB channels and laser-based OCTs (Figure from Annidis)
Ophthalmic IMAGING (A)SLO and multimodal systems
(2015) https://doi.org/10.1364/BOE.6.001407
(2016) https://doi.org/10.1364/BOE.7.001783
https://doi.org/10.1007/s00417-016-3361-7
Fundus autofluorescence, microperimetry and
hyperreflective intraretinal spots (HRS) analysis using OCT
Ophthalmic IMAGING Functional Imaging
http://dx.doi.org/10.1167/iovs.16-21389
http://dx.doi.org/10.1167/iovs.16-20598
Model of the retinal vasculature represented by a binary tree. The
vessels bifurcate in a dichotomous manner except for the
precapillaries, which are the point of origin of four capillaries. Adapted
from Takahashi et al. (2009)
http://dx.doi.org/10.1111/aos.13365
http://dx.doi.org/10.1080/02713683.2016.1217544
KEYWORDS: Hyperspectral retinal camera, primary open-angle glaucoma, retinal oxygen saturation
http://dx.doi.org/10.1167/iovs.13-12124
The average arteriolar (left) and venular (right) OD values at each given (5-nm) imaged wavelength
from 500 to 600 nm for all of the volunteers.
In summary, this article has described a novel hyperspectral prototype for
spectral imaging of the retina that can potentially be used in the future to
acquire retinal vessel blood oxygen saturation values. By considering the
limitations of ocular imaging encountered by other retinal oximetry studies,
namely longer acquisition and exposure times, flash exposure, and limited
wavelength intervals, this new instrument may be promising for acquiring more
refined and faster nonflash retinal oximetry measurements in vivo that can
potentially be applied to human retinal vascular disease.
Ophthalmic IMAGING portable imaging
Human Factor and Usability Testing of a
Binocular OCT System - EASE Study
Reena Chopra 1, Padraig J. Mulholland 1,2, Adam M. Dubis 1, Roger S. Anderson 1,2, Pearse A. Keane 1
1 NIHR Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS
Foundation Trust and UCL Institute of Ophthalmology, London, United Kingdom; 2 Optometry
and Vision Science Research Group, School of Biomedical Sciences, Ulster University,
Coleraine, Northern Ireland, United Kingdom
Automated quantitative pupillometry using the Binocular OCT
Purpose: A prototype binocular optical coherence
tomography (OCT) device has recently been developed that
performs ‘whole-eye’ OCT imaging in an automated manner
(Envision Diagnostics, Inc. USA). The inclusion of ‘smart
technology’ such as customizable display screens and voice
recognition also permits the quantitative assessment of
visual acuity (VA), visual fields, ocular motility, and
pupillometry (Fig. 1). As this device will primarily be used in
elderly and visually impaired populations, we performed
prospective usability testing of an early prototype with a
view to predicting function in a clinical setting and to
identifying any potential user errors – EASE Study
(ClinicalTrials.gov Identifier: NCT02822612).
ARVO 2017 Annual Meeting Abstracts
Session 516: Advancements in OCT
Ophthalmologica 2017;238:89-99. https://doi.org/10.1159/000475773
http://dx.doi.org/10.15761/NFO.1000102
Fundus Photography in the 21st Century—A Review of Recent Technological Advances
and Their Implications for Worldwide Healthcare
Panwar Nishtha, Huang Philemon, Lee Jiaying, Keane Pearse A., Chuan Tjin Swee, Richhariya Ashutosh, Teoh
Stephen, Lim Tock Han, and Agrawal Rupesh. Telemedicine and e-Health. March 2016, 22(3): 198-208.
https://doi.org/10.1089/tmj.2015.0068
iCam, 3nethra, CenterVue, iOptics EasyScan, Topcon TRC-NW8FPLUS, Zeiss Visucam 200, Kowa Nonmyd7, Canon CR-2, Oculus Imagecam,
iExaminer, PanOptic, Volk Pictor, VersaCam, JedMed Horus Scope, Optomed Smartscope, Kowa Genesis-D, Riester, Ocular Cellscope, PEEK,
dEye
Retinal Layer Segmentation Pathological retinas still challenging
https://arxiv.org/abs/1704.02161
https://arxiv.org/abs/1707.04931
Branch Residual U-Network (BRU-net)
https://doi.org/10.1364/BOE.8.003292
https://doi.org/10.1364/BOE.8.001926
Voxeleron Awarded NIH SBIR
Grant for Device-independent
Retinal OCT Image Analysis
Software
February 8, 2017 Daniel Russakoff
Voxeleron will collaborate with Professor Pablo
Villoslada of UCSF/IDIBAPS and Dr. Pearse Keane of
Moorfields Eye Hospital to validate the algorithms
and ensure clinical utility.
https://www.voxeleron.com/orion/
Vascular segmentation
http://dx.doi.org/10.1136/bmjophth-2016-000032
https://doi.org/10.1007/978-3-319-59876-5_56
https://doi.org/10.1007/s10916-017-0719-2
https://arxiv.org/abs/1704.03743
Other Retinal segmentation & Detection
Christos Bergeles, Adam M. Dubis, Benjamin Davidson,
Melissa Kasilian, Angelos Kalitzeos, Joseph Carroll, Alfredo
Dubra, Michel Michaelides, and Sebastien Ourselin
Biomedical Optics Express Vol. 8, Issue 6, pp. 3081-3094
(2017) https://doi.org/10.1364/BOE.8.003081
https://arxiv.org/abs/1706.03008
(2017) https://doi.org/10.1109/ISBI.2017.7950704
Suman Sedai, Ruwan Tennakoon, Pallab Roy, Khoa Cao and Rahil Garnavi
IBM Research - Australia, Melbourne, VIC, Australia
The first stage provides a coarse localization of the fovea; the second stage produces an accurate
segmentation of the fovea region.
We present an algorithm that automatically detects cones in
AOSLO split-detection images without supervision. Our
algorithm is among the first to use machine learning to
develop and use a photoreceptor model on the fly. Compared
to Cunefare et al. (2016), specifically, the approach presented
here can tackle both densely and sparsely populated
photoreceptor images, as it is independent of the spatial
arrangement of cones. Further, it introduces contrast
enhancement filters, which improve the quality of low signal-to-
noise ratio (SNR) images.
Optic disc and Cup segmentation or detection
https://arxiv.org/abs/1704.00979
Visual comparison of the predicted results and the correct
segmentation on RIM-ONE v.3 for the optic disc (a)-(c), (g)-(i)
and cup (d)-(f), (j)-(l). In (d)-(f), (j)-(l), the region of the optic disc is
shown as the input image.
https://doi.org/10.1109/TPAMI.2016.2577031
https://arxiv.org/abs/1707.06397
We propose a simple yet
effective method, termed
Deep Descriptor
Transforming (DDT), for
evaluating the correlations of
descriptors and then
obtaining the category-
consistent regions, which can
accurately locate the common
object in a set of unlabeled
images, i.e., unsupervised
object discovery.
IMAGE CLASSIFICATION #1
July–August, 2017 Volume 1, Issue 4, Pages 322–327
Cecilia S. Lee, MD, Doug M. Baughman, BS, Aaron Y. Lee, MD, MSCI
Department of Ophthalmology, University of Washington School of Medicine,
Seattle, Washington. http://dx.doi.org/10.1016/j.oret.2016.12.009
Examples of identification of pathology by the deep learning algorithm. Optical coherence
tomography images showing age-related macular degeneration (AMD) pathology
(A, B, C) are used as input images, and hotspots (D, E, F) are identified using an occlusion
test from the deep learning algorithm. The intensity of the color is determined by the drop
in the probability of being labeled AMD when occluded.
An occlusion test (Zeiler and Fergus, 2014) was performed to identify the
areas contributing most to the neural network's assigning the category of
AMD. A blank 20 × 20-pixel box was systematically moved across every
possible position in the image and the probabilities were recorded. The
highest drop in the probability represents the region of interest that
contributed the highest importance to the deep learning algorithm.
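A minimal sketch of that occlusion test, assuming a generic `predict_fn` that maps an image to a vector of class probabilities (the function name and the stride parameter are hypothetical; the study slid a 20 × 20 box over every position):

```python
import numpy as np

def occlusion_map(image, predict_fn, label_idx, box=20, stride=10):
    """Slide a blank box over the image and record the drop in the
    predicted probability of class `label_idx` (Zeiler & Fergus style).
    `image` is an HxWxC float array; `predict_fn` returns probabilities."""
    h, w = image.shape[:2]
    baseline = predict_fn(image)[label_idx]
    heat = np.zeros(((h - box) // stride + 1, (w - box) // stride + 1))
    for i, y in enumerate(range(0, h - box + 1, stride)):
        for j, x in enumerate(range(0, w - box + 1, stride)):
            occluded = image.copy()
            occluded[y:y + box, x:x + box] = 0.0  # blank out one patch
            heat[i, j] = baseline - predict_fn(occluded)[label_idx]
    return heat  # largest drops mark the regions the classifier relies on
```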
Varun Gulshan, PhD1; Lily Peng, MD, PhD1; Marc Coram, PhD1; et al
JAMA. 2016;316(22):2402-2410. doi:10.1001/jama.2016.17216
Validation Set Performance for All-Cause Referable Diabetic
Retinopathy in the EyePACS-1 Data Set (9946 Images) Performance of
the algorithm (black curve) and ophthalmologists (colored circles) for all-cause
referable diabetic retinopathy, defined as moderate or worse diabetic
retinopathy, diabetic macular edema, or ungradable image. The black
diamonds highlight the performance of the algorithm at the high-sensitivity
and high-specificity operating points.
IMAGE CLASSIFICATION #2
Stefanos Apostolopoulos, Carlos Ciller, Sandro I. De Zanet,
Sebastian Wolf, Raphael Sznitman
https://arxiv.org/abs/1610.03628
Ahmed ElTanboly, Marwa Ismail, Ahmed Shalaby, Andy Switala, Ayman El-Baz, Shlomit
Schaal, Georgy Gimel’farb, Magdi El-Azab
First published: 17 March 2017
DOI: 10.1002/mp.12071
https://doi.org/10.1146/annurev-bioeng-071516-044442
IMAGE Quality in image classification
Image Restoration: From Sparse and Low-rank Priors to Deep Priors
Learning Deep CNN Denoiser Prior for Image Restoration
Lei Zhang, Wangmeng Zuo
The Hong Kong Polytechnic University, Harbin Institute of Technology
Example performance of quality-resilient networks on various quality
distortions. This table shows the class prediction for an image under several
different types of distortions (from top to bottom: clean, Gaussian noise, and
Gaussian blur). The original VGG16 network (M_clean) fails on distorted images.
Networks fine-tuned on different types of distortions perform well on that
particular distortion, but not on other distortion types (M_noise and M_blur). Our
mixture-of-experts based model (M_mix) performs well over all distortion types as
well as on the original clean image.
https://arxiv.org/abs/1703.08119
https://arxiv.org/abs/1611.05760
State-of-the-art image classification networks like VGG-16 perform poorly on blurred input (left),
when using model weights trained on high-quality sharp image datasets (center). While
they often make erroneous predictions in terms of the most likely classes for a blurred image, they
do so with lower confidence—producing distributions that are higher-entropy than those for sharp
images. This drop in performance is largely an artifact of being trained without any
blurred examples. We find that fine-tuning the model on a mix of blurred and sharp images
for just a few epochs allows it to perform well on both sharp and blurred inputs (right).
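A minimal sketch of that remedy, assuming a PyTorch/torchvision setup (the kernel size, sigma range, and learning rate are illustrative, not the paper's exact recipe):

```python
import torch
import torchvision
from torchvision import transforms

# Each training image is blurred with probability 0.5, so the network
# sees a mix of sharp and Gaussian-blurred inputs during fine-tuning.
train_tf = transforms.Compose([
    transforms.RandomApply(
        [transforms.GaussianBlur(kernel_size=9, sigma=(0.5, 3.0))], p=0.5),
    transforms.ToTensor(),
])

# Fine-tune a pretrained classifier for a few epochs at a low learning rate.
model = torchvision.models.vgg16(weights="IMAGENET1K_V1")
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
```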
IMAGE Restoration enhancement
Deep Bilateral Learning for Real-Time Image Enhancement
MICHAËL GHARBI, MIT CSAIL; JIAWEN CHEN, Google Research; JONATHAN T. BARRON,
Google Research; SAMUEL W. HASINOFF, Google Research; FRÉDO DURAND, MIT
CSAIL / Inria, Université Côte d’Azur, http://dx.doi.org/10.1145/3072959.3073592
https://arxiv.org/abs/1707.02880
Our novel neural network architecture can reproduce sophisticated
image enhancements with inference running in real time at full HD
resolution on mobile devices. It can not only be used to dramatically
accelerate reference implementations, but can also learn subjective
effects from human retouching.
Image Restoration: From Sparse and Low-rank Priors to Deep Priors
Lei Zhang, Wangmeng Zuo
The Hong Kong Polytechnic University, Harbin Institute of Technology
https://arxiv.org/abs/1704.03264
Kai Zhang ; Wangmeng Zuo ; Yunjin Chen ; Deyu Meng ; Lei Zhang
https://doi.org/10.1109/TIP.2017.2662206
An example showing the capacity of our proposed model for three different tasks (denoising, super-resolution, JPEG image deblocking). The input image is composed of noisy images with noise level 15
(upper left) and 25 (lower left), bicubically interpolated low-resolution images with upscaling factors 2 (upper middle) and 3 (lower middle), and JPEG images with quality factors 10 (upper right) and 30 (lower right).
Note that the white lines in the input image are just used for distinguishing the six regions, and the residual image is normalized into the range of [0, 1] for visualization. Even though the input image is corrupted with
different distortions in different regions, the restored image looks natural and does not have obvious artifacts.
IMAGE CLASSIFICATION Jointly with image restoration
https://arxiv.org/abs/1706.04284
https://arxiv.org/abs/1701.06487
(a) The whole ground truth image 0051x4 from the DIV2K dataset. We show the
comparison of the zoom-in region between: (b) the ground truth; (c) the noisy image
with i.i.d. Gaussian noise of zero mean and σ = 30; (d) the denoised image by BM3D;
the denoising result of our proposed denoising network (e) without the guidance of
high-level vision information; (f) with the guidance of high-level vision information
Our experimental results demonstrate that the proposed architecture
not only yields superior image denoising results preserving fine
details, but also overcomes the performance degradation of different
high-level vision tasks, e.g., image classification and semantic
segmentation, due to image noise or artifacts caused by conventional
denoising approaches such as over-smoothing.
We propose a novel end-to-end differentiable architecture for joint denoising,
deblurring, and classification that makes classification robust to realistic noise and
blur. The proposed architecture dramatically improves the accuracy of a
classification network in low light and other challenging conditions,
outperforming alternative approaches such as retraining the network on noisy and
blurry images and preprocessing raw sensor inputs with conventional denoising
and deblurring algorithms
UNCERTAINTY in image enhancement
https://arxiv.org/abs/1705.00664
In this work, we investigate the value of uncertainty modelling in 3D super-
resolution with convolutional neural networks (CNNs). However, the highly ill-
posed nature of such problems results in inevitable ambiguity in the learning of
networks. We propose to account for intrinsic uncertainty through a per-patch
heteroscedastic noise model and for parameter uncertainty through approximate
Bayesian inference in the form of variational dropout. We demonstrate through
experiments on both healthy and pathological brains the potential utility of such
an uncertainty measure in the risk assessment of the super-resolved images for
subsequent clinical use.
This paper proposes a new implementation of a supervised image quality
enhancement method referred to as Bayesian image quality transfer (IQT) via
CNNs. This involves two key innovations in CNN-based models: 1) we extend
the subpixel CNNs previously limited to 2D images to 3D volumes,
outperforming previous models in accuracy and speed on a DTI SR task; 2)
we devise new architectures enabling estimates of different components of
the uncertainty in the SR mapping.
Data-driven Ophthalmology
Sparsity and Model compressibility
We thoroughly explored the granularity of sparsity with experiments on the detailed
accuracy-density relationship. Due to the advantage of index saving, coarse-grained
pruning is able to achieve a higher model compression ratio, which is desirable for mobile
implementation. We also analyzed the hardware implementation advantages and show
that coarse-grained sparsity saves ∼2× output memory access compared with fine-
grained sparsity, and ∼3× compared with a dense implementation. Given the advantages
of simplicity and efficiency from a hardware perspective, coarse-grained sparsity enables
more efficient hardware architecture design of deep neural networks.
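A toy numpy sketch of the two granularities being compared (magnitude-based pruning; the shapes and the density parameter are illustrative):

```python
import numpy as np

def fine_grained_prune(w, density):
    """Keep the largest-magnitude individual weights: every surviving
    weight needs its own index, hence the larger index overhead."""
    k = max(1, int(w.size * density))
    thresh = np.sort(np.abs(w), axis=None)[-k]
    return w * (np.abs(w) >= thresh)

def coarse_grained_prune(w, density):
    """Keep whole output filters (w: [out, in, kh, kw]): one index per
    filter, so the sparse format is much cheaper to store and fetch."""
    norms = np.abs(w).reshape(w.shape[0], -1).sum(axis=1)
    keep = np.argsort(norms)[-max(1, int(w.shape[0] * density)):]
    mask = np.zeros(w.shape[0], dtype=bool)
    mask[keep] = True
    return w * mask[:, None, None, None]
```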
Towards multimodal models
Combining structural and functional data
Future of OCT and retinal biomarkers
From Schmidt-Erfurth et al. (2016): “The therapeutic efficacy of VEGF inhibition in combination with the potential of
OCT-based quantitative biomarkers to guide individualized treatment may shift the medical need from CNV treatment
towards other and/or additional treatment modalities. Future therapeutic approaches will likely focus on early and/or
disease-modifying interventions aiming to protect the functional and structural integrity of the morphologic complex
that is primarily affected in AMD, i.e. the choriocapillary - RPE – photoreceptor unit. Obviously, new biomarkers
tailored towards early detection of the specific changes in this functional unit will be required as well as follow-up
features defining the optimal therapeutic goal during extended therapy, i.e. life-long in neovascular AMD. Three novel
additions to the OCT armamentarium are particularly promising in their capability to identify the biomarkers of the
future:”
Polarization-sensitive OCT | OCT angiography | Adaptive optics imaging
“this modality is particularly appropriate to highlight early
features during the pathophysiological development of
neovascular AMD
Findings from studies using adaptive optics implied that
decreased photoreceptor function in early AMD may be
possible, suggesting that eyes with pseudodrusen appearance
may experience decreased retinal (particularly scotopic) function
in AMD independent of CNV or RPE atrophy.”
“...the specific patterns of RPE plasticity
including RPE atrophy, hypertrophy, and
migration can be assessed and quantified.
Moreover, polarization-sensitive OCT allows
precise quantification of RPE-driven disease
at the early stage of drusen”,
“Angiographic OCT with its potential
to capture choriocapillary, RPE, and
neuroretinal features provides novel
types of biomarkers identifying
disease pathophysiology rather than
late consecutive features during
advanced neovascular AMD.”
Schlanitz et al. (2011)
zmpbmt.meduniwien.ac.at
See also Leitgeb et al. (2014)
Zayit-Soudry et al. (2013)
Multimodal models in general in medicine
https://dx.doi.org/10.1097%2FWCO.0000000000000460
Imaging plus X: multimodal models of neurodegenerative disease
Neil P. Oxtoby and Daniel C. Alexander, for the EuroPOND consortium
Old paradigm disease progression models. (a) The hypothetical
model of Jack et al. (2010), which illustrates qualitative sigmoid evolution in
AD of scalar biomarkers such as CSF Aβ level, cognitive test scores and
hippocampal volume or atrophy. The lack of quantitative information prevents
direct diagnostic usage. (b) A traditional longitudinal model of AD
atrophy (Scahill et al., 2002), built by binning individuals a priori into ‘mild’,
‘moderate’ and ‘severe’ classes based on cognitive test scores. The model
can potentially match new individuals to the same stages using imaging data,
but must exclude cognitive scores to avoid circularity. AD, Alzheimer's disease.
The temporally continuous self-modelling regression approach of Jedynak et al. (2012).
The model shows the characteristic trajectories of a diverse set of biomarkers against a
common continuous disease stage variable learned from the ADNI and PAQUID (Personnes
Agées Quid) data sets. The model can potentially estimate the disease stage of a new
patient by identifying the position along the trajectory set that best matches their data.
ADNI, Alzheimer's disease neuroimaging initiative.
We have reviewed data-driven model-based analyses of neurodegenerative disease. We have argued the
potential for generative data-driven models to take centre stage in the study and management of
neurodegenerative diseases if we are to generate new avenues for disease understanding in the earliest,
preclinical stages. This is necessitated by the challenges in monitoring any neurological disease over its
full time course, coupled with overlapping phenotypes and lack of a single biomarker that is dynamic
across the full disease time course.
The main focus of development and application to date has been in Alzheimer's disease, but various efforts
including the EuroPOND project are expanding the application to other dementias, multiple sclerosis, prion
diseases, normal ageing and development, and even non-brain applications. These techniques have the
potential for widespread impact in realising precision medicine across many such domains.
Retina as deep learning network
LIGHT → Photoreceptor layer → Horizontal cells → Bipolar cells → Amacrine cells → Ganglion cell layer → BRAIN
DL Layer 1 → DL Layer 2 → DL Layer 3 → DL Layer 4 → DL Layer 5
With enough data, we can use a densely
connected (i.e. every layer is connected to
every other layer) feedforward network (or
even a recurrent one), not having to constrain the
network, as all the modulatory pathways are
not well known
https://arxiv.org/abs/1608.06993; Cited by 29
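A minimal PyTorch sketch of such dense connectivity, in the spirit of DenseNet (arXiv:1608.06993); the channel counts are illustrative:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Every layer receives the concatenation of all earlier feature maps:
    analogous to leaving all the lateral/modulatory connections in place
    and letting training decide which ones matter."""
    def __init__(self, in_ch=16, growth=12, n_layers=5):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(
                nn.BatchNorm2d(in_ch + i * growth),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_ch + i * growth, growth, 3, padding=1))
            for i in range(n_layers)])

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)
```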
Joint training of all layers with layer-wise targets derived from ERG and pupillometry
OPN4
https://arxiv.org/abs/1409.5185; Cited by 292
For example, glaucoma affects ganglion cell function, whereas retinitis pigmentosa affects photoreceptors
DL - Deep learning
OPN4 - Melanopsin (ipRGC)
Retina (and V1) as deep learning network
DOI: 10.13140/RG.2.2.27438.72003 12/2016, Conference: NIPS 2016 Workshop -
Brains and Bits: Neuroscience Meets Machine Learning,
Riccardo Volpi, Istituto Italiano di Tecnologia; Matteo Zanotto; Diego Sona; Vittorio Murino
International Work-Conference on the Interplay Between Natural and Artificial Computation
IWINAC 2017: Natural and Artificial Computation for Biomedicine and Neuroscience pp 464-472
Towards a Deep Learning Model of Retina: Retinal Neural Encoding of
Color Flash Patterns
Antonio Lozano, Javier Garrigós, J. Javier Martínez, J. Manuel Ferrández, Eduardo Fernández
https://doi.org/10.1007/978-3-319-59740-9_46
https://arxiv.org/abs/1702.01825
Visualizing the internal activity of a CNN
in response to a natural scene stimulus.
(A-C) Time series of the CNN activity
(averaged over space) for the first
convolutional layer (8 units, A), the
second convolutional layer (16 units, B),
and the final predicted response for an
example cell (C, cyan trace). The
recorded (true) response is shown below
the model prediction (C, gray trace) for
comparison. (D) Spatial activation of
example CNN filters at a particular time
point. The selected stimulus frame (top,
grayscale) is represented by parallel
pathways encoding spatial information
in the first (purple) and second (green)
convolutional layers (a subset of the
activation maps is shown for brevity). (E)
Autocorrelation of the temporal activity
in (A-C). The correlation in the recorded
firing rates is shown in gray
https://doi.org/10.1101/120956
Furthermore, the composite nonlinear computation performed by retinal
circuitry corresponds to a boolean OR function applied to bipolar cell feature
detectors. Our general computational framework may aid in extracting
principles of nonlinear hierarchical sensory processing across diverse
modalities from limited data.
https://arxiv.org/abs/1706.06208
Retina Model synthesis as Deep learning architecture
Indirect inference on retinal circuit: hard to record every intermediate step in humans
INPUT
Light
OUTPUT
Pupil size
McDougal and Gamlin 2008
AUXILIARY OUTPUT
functional MRI (fMRI)
Temporal transfer functions for the postreceptoral cone
pathways. Spitschan et al. (2016). See also Hung et al. (2016). The original responses from the achromatic luminance experiments and their
derived PCA waveforms. The results of the component analysis illustrate that the
pupil response can be described quite well as a linear sum of a sustained and a
transient component. - Young et al. (1993)
Maynard et al. (2015)
INTERMEDIATE
OUTPUT
Electroretinography (ERG)
(left) Proposed neural pathways and synaptic mechanisms underlying ipRGC
influence on light adaptation (right) M1 ipRGCs modulate the light-adapted
ERG b-wave via D4 dopamine receptors – Prigge et al. (2016)
Multifocal Electroretinogram (UC Davis)
The relative spectral sensitivities of the five
photoreceptors in the human retina, including S-, M-,
L-cones, rods, and ipRGCs (A), LED spectral
distributions (B), and LED chromaticities in 1964
CIE 10° space (C). - Cao et al. (2015)
Deep learning framework for phototransduction studies, and clinical diagnosis decision support systems
Retina Model synthesis Photoreceptor contributions #1: ERG
INPUT
Light
OUTPUT
Pupil size ?
Not done in the study by
Allen et al. (2016)
INTERMEDIATE OUTPUT
Electroretinography (ERG)
Vary the light parameters (intensity, wavelength, modulation) to probe what the 'normal' responses are
either in visual processing/phototransduction in 'basic science' paradigms, or alternatively employ the light
parameters that best discriminate between retinal pathologies.
Note! In an optimally constructed model with more parameters (more explicit retinal circuitry), one could infer all possible outcomes
(pathological or not) from the framework. But in practice we are limited to the data available.
For example, if glaucoma is shown to be detected well using PLR, we could extend that dataset by using the same protocol while
simultaneously recording ERG, visual fields, etc., and then have a more complete model, and then have “good” predictive power with
ERG and visual field alone if PLR is not possible to do.
Rod and cone ERGs over mesopic irradiances. Allen et al. (2016)
Stimulus design and quantification. The output of a three-primary LED light source (peak emission at 354, 460, and 600 nm) was used to generate four spectra, with precise excitation of melanopsin, rod, SWS, and LWS opsins. Allen et al. (2016)
Normalized b-wave amplitudes (G), implicit times (H), and OP amplitudes (I) for light-adapted cone ERGs in Opn1mwR mice for pairs of rod-divergent stimuli (black filled circles are rod/mel-low and gray open circles are rod/mel-high) with stimulus intensity quantified in terms of rod effective photons/cm2/s. - Allen et al. (2016)
We now have the 'pure photoreceptor' response (well,
you know Ray), and if these responses are normal but
the PLR is abnormal, we could assume that the problem is
downstream, giving hints about the given pathology.
ERG Methodological background #1
Bingyao Tan; Erik Mason; Benjamin MacLellan; Kostadinka K. Bizheva
IOVS March 2017, Vol.58, 1673-1681. doi:10.1167/iovs.17-21543
Comparison of the changes in the total axial retinal blood flow (RBF) and the ERG b-wave
magnitude resulting from 200-ms single flash and 1-second, 10 Hz, 20% duty cycle flicker stimuli of
the same illumination intensity. (A) Representative ERG traces. The pink and gray shaded areas mark
the duration of the visual stimuli. Original time recordings of the total axial RBF in response to the
single flash and flicker stimuli.
Pedro Monsalve; Giacinto Triolo; Jonathon Toft-Nielsen; Jorge Bohorquez; Amanda D. Henderson;
Rafael Delgado; Edward Miskiel; Ozcan Ozdamar; William J. Feuer; Vittorio Porciatti
Translational Vision Science & Technology May 2017, Vol.6, 5.
doi:10.1167/tvst.6.3.5
A new PERG method with increased dynamic range allows recording of retinal
ganglion cell function in advanced stages of optic nerve disorders. It also
quantifies the response decline during the test, an autoregulatory
adaptation to metabolic challenge that decreases with age and presence of
disease.
Here we describe a new method for steady-state PERG recording in humans
based on a visual display unit built with Light-Emitting Diode (LED) technology,
skin electrodes, and optimized signal processing to quantify response
adaptation (dubbed PERGx as a contraction of PERGnext). We show that,
compared to a validated method, the PERGx has a very high signal-to-noise ratio
(SNR); this suggests that meaningful responses can be recorded in advanced
stages of diseases such as nonarteritic ischemic optic neuropathy (NAION).
PERGx temporal dynamics and intrinsic variability in a representative
normal subject. (A) The amplitude of PERGx samples (blue circles, 16
consecutive partial averages of 64 epochs each over 2 minutes) progressively
declined (adapted) with a slope of −0.031 μV/sample (R2 = 0.48), whereas the
PERGx phase (red circles) was stationary. (B) Polar diagram displaying combined
amplitude and phase of PERG samples (open black circles) and noise samples
(open grey triangles). The PERG amplitude (1.65 μV) is represented by the
length of vector connecting the origin of the axes with the cluster centroid. The
PERG phase (63.6°) is represented by the angle Φ between the vector and the
x-axis.
ERG Methodological background #2
https://doi.org/10.1007/s10633-017-9593-y
Discrete Wavelet Transform (DWT) analysis applied to the mfERG response from a control (left)
and a patient (right). Top: graphical representation of the 2F-mfERG M-sequence used here
(MOFOFO), with frames displaced in time in order to better correspond visually to the recorded
response. The original signal from one hexagon of the mfERG (waveform inside box on top) can
be decomposed into many frequency levels, depending on the length of the time series. The first
level (1211 Hz) corresponds to high frequencies (noise), while the highest level (11 Hz)
corresponds to the lowest frequencies. For each frequency level, the vertical lines represent
individual wavelet coefficients. For each level, the variance between these coefficients is
computed and subjected to further analysis as the WVA (wavelet variance). Legend: DC direct
component; IC1 first induced component; IC2 second induced component
The entire process of retinal visual processing involves
the phototransduction cascade with different groups of
cells and circuits from the photoreceptors to the
ganglion cells. Thus, electrical signals produced by
different biological structures contribute to the retinal
response of the mfERG that is recorded from the cornea
[Hood et al. (2002); Luo et al. (2011)]
. In the standard mfERG, amplitude
and implicit time are often analyzed [Hood et al. (2012)]
.
Early glaucoma Dilru C Amarasekera BS, Arthur F Resende MD, Michael Waisbourd MD, Sanjeev Puri MD, Marlene R Moster MD,
Lisa A Hark PhD, L Jay Katz MD, Scott J Fudemberg MD, Anand V Mantravadi MD
First published: 20 July 2017 DOI: 10.1111/ceo.13006
Unreliable test results were excluded.
Abbreviations: ss-PERG = Steady-State Pattern Electroretinogram; SD-tVEP = Short-Duration
transient Visual Evoked Potentials; Lc = Low Contrast; Hc = High Contrast; SNR = Signal-to-
Noise Ratio.
Electrophysiological techniques thus play a valuable role in a diagnostic environment dominated by
highly effective tools such as OCT via the addition of an objective functional perspective to the
diagnosis of glaucoma. Although the use of PERG and VEP as a measure of retinal ganglion cell and
visual pathway dysfunction has been established, few studies have measured the potential clinical
utility of the novel rapid testing platform of ss-PERG and SD-tVEP in patients with glaucoma.
ss-PERG was effectively able to discern between glaucomatous and healthy eyes. The diagnostic
ability of ss-PERG was superior to that of SD-tVEP. ss-PERG may thus have a role as a clinically useful
electrophysiological diagnostic tool.
Retina Model synthesis Photoreceptor contributions #2: PLR
INPUT
Light
OUTPUT
Pupil size
INTERMEDIATE OUTPUT
Electroretinography (ERG)?
ERG not done this time
Experimental design. (A, Left) L, M, and S cones and melanopsin-containing
ipRGCs mediate vision at daytime light levels. (Center) Photoreceptor spectral
sensitivities. (Right) Physiological measurements of ipRGCs find excitatory L
and M cone inputs and inhibitory S-cone inputs (12). (B) A digital spectral
integrator produces sinusoidal photoreceptor-directed modulations that pass
through an artificial pupil into the pharmacologically dilated left eye. The
consensual pupil response of the right eye is recorded. (C) Photoreceptor-
directed modulations. Balanced changes in the spectrum of light around a
background spectrum nominally isolate targeted photoreceptors. -
Spitschan et al. (2014)
Group PLR data are well fit by the two-component linear filter model. (A) The mean response across all subjects
(01–16) is shown at 0.05 and 0.5 Hz, for L+M-, melanopsin-, and S-cone-directed modulations. Fit values are
derived from those found for subject 01, with only amplitude parameters adjusted (Table S2). This is because the
average data are available at only two temporal frequencies and do not sufficiently constrain all parameters of the
model. To obtain the average data plotted, amplitudes and phases were averaged separately (i.e., average
amplitude obtained without consideration of phase, average phase obtained without consideration of amplitude).
The model was fit to the data as plotted. (B) Polar-plot representations of the group data with model fit points,
following conventions as in Fig 3. The data are normalized separately for each temporal frequency. Error bars (± 2
SEM across subjects) are smaller than the plot points for the data. -Spitschan et al. (2014)
Now as we are feeding in more data, we are in theory learning how the light parameters should be designed to have the best photoreceptor response isolation, and have representations for the corresponding ERG and PLR responses.
It would also help if all the studies were from humans :P
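The "designed light parameters" idea can be made concrete with the silent-substitution principle behind these photoreceptor-directed modulations: solve a small linear system for primary modulations that excite one receptor class while silencing the others. A hedged numpy sketch with placeholder numbers (the real receptor × primary matrix comes from measured spectral sensitivities and LED spectra):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 5))  # rows: L, M, S cones, melanopsin; cols: 5 LED primaries
target = np.array([0.0, 0.0, 0.0, 1.0])  # silence the cones, drive melanopsin

# Minimum-norm primary modulation that nominally isolates melanopsin
delta_primaries, *_ = np.linalg.lstsq(A, target, rcond=None)
print(A @ delta_primaries)  # ~[0, 0, 0, 1]: nominal photoreceptor isolation
```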
Retina Model synthesis further downstream
INPUT
Light
OUTPUT
Pupil size
INTERMEDIATE OUTPUT
Electroretinography (ERG)
“KNOWN BEHAVIOR”
Auxiliary OUTPUT
dLGN
Build on top of previous models. We “know” how a specific light stimulus is processed by the retina (ERG), and how
this is reflected in pupil behavior (PLR) via the olivary pretectal nucleus (OPN). So, using the same parameters, record the
activity of the LGN, for example, which is nice at least for basic science, not necessarily for pathology screening.
A: LED spectral power densities and in vivo photoreceptor spectral sensitivity
(normalised). The output of blue and yellow LEDs was adjusted to produce
equivalent effects on rods (black line). By contrast, the blue LED, always
appeared brighter for melanopsin (green line). B: Protocol 1. Melanopsin-
isolating steps in dLGN and retina, respectively) presentations of the blue LED
were interleaved with 210 or 180 sec of the (dLGN and retina, respectively)
yellow to produce a ‘step’ visible only to melanopsin. C: Protocol 2. Irradiance
slowly ramped up (0.5 ND per 200 seconds) before remaining at a steady state
for 10 seconds. D: The effective change in photon flux for melanopsin (green)
and rods (black) across a full repeat of Protocol 2. Settings of ND filter at the
point of each melanopsin isolating step are provided above.
- Davis et al. (2015)
INTERMEDIATE OUTPUT #2
Ganglion cell firing rates
Responses to melanopsin-isolating steps and gradual irradiance ramps in retina.
- Davis et al. (2015)
Responses to melanopsin-steps in the dLGN.
- Davis et al. (2015)
Retina Model synthesis
INPUT
Light
INTERMEDIATE OUTPUT
Electroretinography (ERG)
OUTPUT
Pupil size
INTERMEDIATE OUTPUT #2
Ganglion cell firing rates
Auxiliary OUTPUT
dLGN
So now we know how the retina works in a data-driven deep learning sense (no explicit modelling of the retina in the biological
sense). We can heuristically cheat and define the connections as defined from the literature.
So as we feed in data from studies, the interactions between blocks are “automagically” quantified by adjusting the
convolutional weights in the deep learning model. At some point, if we have enough data, we could also start to relax the
circuit constraints and hypothesize that there could be recurrent feedback from dLGN to OPN (controlling pupil size), and do
'blind causality analysis' (Nikola is probably an expert on that)
https://arxiv.org/abs/1601.03610
We have proposed a novel framework for
causal analysis in time-series which does not
require any assumptions about the statistical
relationships among the variables of the study,
i.e., it is model-free.
Our results show that Twitter data polarity
does indeed have a causal impact on the
stock market prices of the examined
companies. Hence, we believe social media
data could represent a valuable source of
information for understanding the dynamics
of stock market movements
http://dx.doi.org/10.1534/genetics.114.165704
Retina Model synthesis Pathologies?
INPUT
Light
INTERMEDIATE OUTPUT
Electroretinography (ERG)
OUTPUT
Pupil size
INTERMEDIATE OUTPUT #2
Ganglion cell firingrates
Auxiliary OUTPUT
dLGN
In the case of glaucoma, one would expect the peripheral retina to be destroyed first
(A) Schematic diagram showing the flash stimulation sequence of
the slow-sequence (slow flickering stimulation, MOOO) multifocal
electroretinogram (mfERG). (B) The first-order kernel of the slow-
sequence mfERG from the central (rings 1 to 2) and peripheral (rings
3 to 6) regions - Chan et al. (2011).
Overlapping visual field test-region layout and luminance
characteristics of the multifocal pupillographic objective
perimetry stimuli for all protocols. - Carle et al. (2014)
Now we can define normal and pathological cases as classes, as you would in typical image classification tasks ('dogs',
'cats', etc.), but instead of just using a single image (whether it be fundus or OCT (SD/SS/AO/Angiography)), we can
combine both the image and the behavioral response for better quantification of the retinal pathology.
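A hedged PyTorch sketch of that idea: a two-branch network that late-fuses an imaging branch with a functional time-series branch (e.g. a PLR trace); all layer sizes are illustrative, not a validated architecture:

```python
import torch
import torch.nn as nn

class MultimodalNet(nn.Module):
    """Fuse an image (fundus/OCT slice) with a 1-D functional response."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.img = nn.Sequential(                      # tiny CNN image branch
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fn = nn.GRU(input_size=1, hidden_size=32, batch_first=True)
        self.head = nn.Linear(32 + 32, n_classes)      # late fusion

    def forward(self, image, response):                # response: [B, T, 1]
        _, h = self.fn(response)
        return self.head(torch.cat([self.img(image), h[-1]], dim=1))
```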
Retina Model synthesis VISUAL FIELD
An old-school psychophysical functional measure that patients often find stressful
https://doi.org/10.1016/j.ophtha.2017.04.021
De Moraes CG, Hood DC, Thenappan A, Girkin CA, Medeiros FA,
Weinreb RN, Zangwill LM, Liebmann JM.
Central visual field damage seen on the 10-2 test is
often missed with the 24-2 strategy in all groups. This
finding has implications for the diagnosis of glaucoma
and classification of severity.
JAMA Ophthalmol. 2017;135(7):783-788.
doi: 10.1001/jamaophthalmol.2017.1659
JAMA Ophthalmol. 2017;135(7):742-747.
doi: 10.1001/jamaophthalmol.2017.1396
A deep-learning based automatic
glaucoma identification
ARVO 2017: 320 Visual Fields, Vision Function, Psychophysics I
Serife Seda S. Kucur, Mathias Abegg, Sebastian Wolf, Raphael Sznitman.
ARTORG Center, University of Bern, Bern, Switzerland; Department of Ophthalmology,
Inselspital Bern, Bern, Switzerland.
The inherent local and global characteristics of visual fields (VFs)
can be exploited in a strong data-driven sense and could provide
better understanding of VFs with regards to glaucoma. Ultimately,
this may help to efficiently automatize the diagnosis process.
Our hypothesis is that alternative representations of raw VFs, in
terms of different spatial scales, could be learned by computers
using machine learning techniques towards an effective
automatized glaucoma identification task. Accordingly, we present
a Convolutional Neural Network (CNN)-based approach for
classification of VFs as being glaucomatous or non-glaucomatous.
Conclusions: These results support the fact that processing VFs
through a CNN generates different representation of data in
terms of its hidden characteristics and patterns that are efficient
to discriminate between glaucomatous and non-glaucomatous
VFs in an automated way. The performance could be further
improved with a different CNN architecture. The trained CNNs
have the potential to be utilized for glaucoma progression
analysis as well
https://doi.org/10.1016/j.ophtha.2017.01.027
http://dx.doi.org/10.1097/IJG.0000000000000710
Retina Model synthesis beyond retinopathies #1
What to diagnose from the eye, e.g. neurodegenerative diseases such as Alzheimer’s disease
Is the Eye an Extension of the Brain in
Central Nervous System Disease?
Lies De Groef 1,2 and Maria Francesca Cordeiro 1,3,4
Journal of Ocular Pharmacology and Therapeutics. June 2017,
https://doi.org/10.1089/jop.2016.0180
1 Glaucoma and Retinal Neurodegenerative Disease Research Group, Institute of Ophthalmology, University
College London, London, United Kingdom.
2 Neural Circuit Development and Regeneration Research Group, Department of Biology, University of Leuven,
Leuven, Belgium.
3 Western Eye Hospital, Imperial College Healthcare NHS Trust, London, United Kingdom.
4 ICORG, Department of Surgery and Cancer, Imperial College London, London, United Kingdom.
Compilation of examples to illustrate the concept ‘‘the eye as a window to the brain’’.
Typical ocular diseases, such as uveitis, glaucoma, and AMD, have in common several
pathological mechanisms with CNS diseases, for example, MS and AD. Both in vivo and post
mortem examinations of the eye can therefore be used to study the disease mechanisms
underlying these pathologies in the eye and brain. (1) fluorescein angiography; (2)
intraocular pressure measurement (copyright iCare, TonoLab); (3) optical coherence
tomography scan; (4) confocal scanning laser ophthalmoscopy imaging of curcumin-labeled
protein aggregates; (5) retinal oximetry; (6) ZO-1 tight junction immunostaining on
wholemounted retina; (7) transmission electron microscopy image of trabecular meshwork;
(8) Iba-1 microglia immunostaining on retinal section; (9) Brn3a retinal ganglion cell
immunostaining on wholemounted retina; (10) β-amyloid immunostaining on retinal section;
and (11) concanavalin A vessel labeling on wholemounted retina. AD, Alzheimer’s; AMD,
age-related macular degeneration; MS, multiple sclerosis
Front Aging Neurosci. 2017; 9: 214.
Published online 2017 Jul 6. doi: 10.3389/fnagi.2017.00214
The Role of Microglia in Retinal Neurodegeneration:
Alzheimer's Disease, Parkinson, and Glaucoma
Ana I. Ramirez,1,2 Rosa de Hoz,1,2 Elena Salobrar-Garcia,1,3 Juan J. Salazar,1,2 Blanca Rojas,1,3 Daniel Ajoy,1
Inés López-Cuenca,1 Pilar Rojas,1,4 Alberto Triviño,1,3 and José M. Ramírez1,3,*
Front Neurol. 2017; 8: 162.
Published online 2017 May 4. doi: 10.3389/fneur.2017.00162
Retinal Ganglion Cells and Circadian Rhythms in
Alzheimer’s Disease, Parkinson’s Disease, and
Beyond
Chiara La Morgia,1,2,* Fred N. Ross-Cisneros,3 Alfredo A. Sadun,3,4 and Valerio Carelli1,2
Summary of circadian rhythm
abnormalities in AD, PD, and HD.
AD, Alzheimer’s disease; PD, Parkinson’s disease;
HD, Huntington’s disease; IV, intra-daily variability;
IS, inter-daily stability; RA, relative amplitude; BP,
blood pressure; HR, heart rate.
Schematic representation of the hypothetical events
associated with the neuroinflammation in AD (A),
PD (B), and glaucoma (C). AD, Alzheimer's Disease; PD,
Parkinson's Disease; ILM, inner limiting membrane; NFL,
nerve fiber layer; GCL, ganglion cell layer; IPL, inner
plexiform layer; INL, inner nuclear layer; OPL, outer
plexiform layer; ONL, outer nuclear layer; OLM, outer
limiting membrane; PL, photoreceptor layer; RPE, retinal
pigment epithelium; BM, Bruch membrane; C, choroid;
Aβ, beta-amyloid; pTau, phosphorylated tau.
Health Economics for Medical
Startups | Background
Business Models focus
●
Often technical founders focus too much on the technology, and do not achieve
product-market fit
– In medical startups, it is often very useful to do proper health economics
calculations to sell your idea to customers and investors.
●
In other words, how much can your solution make healthcare more efficient
economically while improving the quality of care for the patient.
– Another common problem in the long run is reimbursement, as in most
countries the patient does not fully pay for the healthcare that they
receive, and market access is complicated by varying regulations/policies in
each country.
http://startupheretoronto.com
www.smi-online.co.uk
Business Models Innovations on the model
https://hbr.org/2016/10
Healx: A Case Study
Informed by our business model framework, we advised (and Cambridge
Judge Business School’s business accelerator supported) the tech venture
Healx, which focuses on the treatment of patients with rare diseases in the
emerging field of personalized medicine. A big challenge for pharmaceutical
companies in this domain is that rare-disease markets are very small, so
companies usually have to charge astronomical prices. (One drug, Soliris,
used in the treatment of paroxysmal nocturnal hemoglobinuria, costs about
$500,000 per patient-year.)
Enter Healx, with a platform that leverages big data technology and analytics
across multiple databases owned by various organizations within global life
sciences and health care to efficiently match treatments to rare-disease patients.
Its initial business model hit three of our six key features. First, Healx’s value
proposition was about asset sharing (for example, making available clinical-trial
databases that record the effectiveness of most drugs across therapeutic areas
and diseases, including rare ones). Second, the business promised
more personalization by revealing drugs with high potential for treating the rare
diseases covered. Finally, Healx’s model would, in theory, create a collaborative
ecosystem by bringing together big pharma (which has the treatment and trial
data) and health care providers (which have data about effectiveness and
incompatibility reactions and also personal genome descriptions).
https://healx.io/
More recently, Healx has developed a machine-learning algorithm that can use a
patient’s biological information not only to match drugs to disease symptoms but
also to predict exactly which drug will achieve what level of effectiveness for
that particular patient. The latest version of its business model
brings personalization to the maximum possible level and adds agility, because
the treating clinician—armed with the biological data and the algorithm—can make
better treatment decisions directly with the patient and doesn’t have to rely on
fixed rules of thumb about which of the few available off-label drugs to use. In
this way, Healx is able to support decentralized, real-time, accurate decision
making.
This version of the Healx model has even more transformation potential—it
exhibits four of the six features; it has already generated revenue from
customers; and in the long term it could empower patients by giving them much
more information before they consult a medical practitioner. Although it is still
too early to tell whether that potential will be realized, Healx is clearly a venture
to watch. It has earned a number of prizes (including the 2015 Life Science
Business of the Year and the 2016 Graduate Business of the Year in the
Cambridge cluster) and sizable investments from several global funds.
LOSS Function performance quantification
●
In medical studies, the ROC curve and especially Area Under the Curve (AUC) is used as an
easy scalar to describe the performance of the classifier.
TensorFlow allows direct
optimization of ROC
http://dx.doi.org/10.1093/bib/bbr008
http://arxiv.org/abs/1605.06652
Conclusion: The AUC is an unreliable
measure of screening performance because
in practice the standard deviation of a
screening or diagnostic test in affected and
unaffected individuals can differ. The
problem is avoided by not using the AUC at all,
and instead specifying detection rates
(DRs) for given false positive rates (FPRs) or
FPRs for given DRs.
http://dx.doi.org/10.1177/0969141313517497
http://tflearn.org/objectives/
Mozer, Michael C. "Optimizing classifier
performance via an approximation to
the Wilcoxon-Mann-Whitney statistic."
(2003). aaai.org/Papers
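A minimal sketch of the Wilcoxon-Mann-Whitney style relaxation these references build on: replace the non-differentiable indicator 1[s_pos > s_neg] over all positive/negative score pairs with a sigmoid, giving a surrogate that gradient descent can optimize directly (TensorFlow; the beta parameter is illustrative):

```python
import tensorflow as tf

def soft_auc_loss(scores_pos, scores_neg, beta=10.0):
    """Differentiable surrogate for 1 - AUC: mean sigmoid over all
    pairwise margins between positive and negative example scores."""
    diff = scores_pos[:, None] - scores_neg[None, :]  # all pos/neg pairs
    return 1.0 - tf.reduce_mean(tf.sigmoid(beta * diff))
```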
Front Public Health. 2015; 3: 57.
Published online 2015 Apr 20.
doi: 10.3389/fpubh.2015.00057
PMCID: PMC4403252
Threshold-Free Measures for
Assessing the Performance of
Medical Screening Tests
HEALTH ECONOMICAL Loss function
wikipedia.org
Analogies from churn prediction?
http://dx.doi.org/10.1186/s40165-015-0014-6
“Nevertheless, current state-of-the-art classification algorithms are not well
aligned with commercial goals, in the sense that, the models miss to include
the real financial costs and benefits during the training and evaluation phases.
In the case of churn, evaluating a model based on a traditional measure such
as accuracy or predictive power, does not yield to the best results when
measured by the actual financial cost, ie. investment per subscriber on a
loyalty campaign and the financial impact of failing to detect a real churner
versus wrongly predicting a non-churner as a churner”
What are the economic costs of each block in the contingency table, and how could one optimize for medical economics?
- It is more expensive to have false negatives, as patients will not be diagnosed, both in terms of economic cost and reduced quality of life for the patients
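A hedged numpy sketch of such a health-economic loss: price each cell of the contingency table (the monetary values below are placeholders) and pick the operating threshold that minimizes expected cost rather than maximizing accuracy or AUC:

```python
import numpy as np

COST = {"TP": 50.0, "FP": 200.0, "TN": 0.0, "FN": 5000.0}  # placeholder EUR/patient

def expected_cost(y_true, y_score, thresh):
    """Average per-patient cost at a given decision threshold."""
    y_pred = y_score >= thresh
    tp = np.sum(y_pred & (y_true == 1)); fp = np.sum(y_pred & (y_true == 0))
    fn = np.sum(~y_pred & (y_true == 1)); tn = np.sum(~y_pred & (y_true == 0))
    return (COST["TP"] * tp + COST["FP"] * fp +
            COST["FN"] * fn + COST["TN"] * tn) / len(y_true)

# e.g. best = min(np.linspace(0, 1, 101), key=lambda t: expected_cost(y, s, t))
```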
Health economics models
https://dx.doi.org/10.3310/hta11410
Screening in UK for Glaucoma, NHS Setting
Published: Ann Intern Med. 2013;159(7):484-489
DOI: 10.7326/0003-4819-159-6-201309170-00686
Estimate of needed duration and number of subjects by Steve Kymes needed for
proper health economical study for glaucoma screening program. Presented by John
Boland at “Should we screen for glaucoma?” session at World Glaucoma Congress
2017 in Helsinki, Finland.
Indian J Ophthalmol. 2011 Jan; 59(Suppl1): S24–S30.
doi: 10.4103/0301-4738.73684 PMCID: PMC3038514
Cost-effectiveness of screening for open angle
glaucoma in developed countries
Anja Tuulonen
Clin Ophthalmol. 2017; 11: 337–346.
doi: 10.2147/OPTH.S120398 PMCID: PMC5317344
Cost and detection rate of glaucoma screening
with imaging devices in a primary care center
Alfonso Anton,1,2,3,4 Monica Fallon,3,5 Francesc Cots,2 María A Sebastian,6
Antonio Morilla-Grasa,4 Sergi Mojal,3 and Xavier Castells2
RISK STRATIFICATION & Screening
Target screening for high-risk cases (family history, age, ethnicity, gender)
https://doi.org/10.1016/j.ajo.2017.05.017
(2016) https://doi.org/10.1109/TMI.2016.2608782
We introduce a novel Bayesian nonparametric model that uses
the concept of disease trajectories for disease subtype
identification. We investigate several models with our
algorithm, and show that one with age, pack years (a measure of
cigarette exposure), and smoking status as predictors gives the
best compromise between estimated predictive performance
and model complexity.
https://arxiv.org/abs/1705.07674
The proposed risk score incorporates both the patients’ non-stationary temporal
physiological information and their individual baseline co-variates in order to accurately
describe the patients’ physiological trajectories.
Aaron Zalewski ; William Long ; Alistair E. W. Johnson ; Roger G. Mark ; Li-wei H. Lehman
Date of Conference: 16-19 Feb. 2017, https://doi.org/10.1109/BHI.2017.7897302
https://arxiv.org/abs/1704.08797
RISK factors
For example for Glaucoma
“Overview of ethnicity and race” by M. Roy Wilson (United States)
at Risk Profiling symposium at World Glaucoma Congress 2017, Helsinki, Finland
http://dx.doi.org/10.1001/jamaophthalmol.2015.1478
http://dx.doi.org/10.1126/science.aam7935
“Doctor AI” Systems | Introduction
AI Doctor
https://arxiv.org/abs/1512.03542
http://arxiv.org/abs/1602.00357
http://arxiv.org/abs/1511.02554
Longitudinal analysis → try to diagnose pathologies as early as possible.
Incorporate disease progression measurements and treatment interventions for
optimal personalized treatment.
Feature engineering remains a major bottleneck when creating predictive systems from electronic
medical records. At present, an important missing element is detecting predictive regular clinical motifs
from irregular episodic records. We present Deepr (short for Deep record), a new end-to-end deep
learning system that learns to extract features from medical records and predicts future risk
automatically. Deepr transforms a record into a sequence of discrete elements separated by coded
time gaps and hospital transfers. On top of the sequence is a convolutional neural net that detects and
combines predictive local clinical motifs to stratify the risk. Deepr permits transparent inspection and
visualization of its inner working. We validate Deepr on hospital data to predict unplanned readmission
after discharge. Deepr achieves superior accuracy compared to traditional techniques, detects
meaningful clinical motifs, and uncovers the underlying structure of the disease and intervention
space.
http://arxiv.org/abs/1607.07519
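A hedged PyTorch sketch of the Deepr idea as described above (embedded code sequence → 1-D convolution over local "clinical motifs" → pooled risk score); the vocabulary and layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class DeeprSketch(nn.Module):
    """Medical codes (diagnoses, procedures, coded time-gap tokens) are
    embedded and scanned by a 1-D convolution; max-pooling over time keeps
    the strongest motif responses, which a linear layer turns into a risk."""
    def __init__(self, vocab=1000, emb=64, motifs=100):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb, padding_idx=0)
        self.conv = nn.Conv1d(emb, motifs, kernel_size=3, padding=1)
        self.out = nn.Linear(motifs, 1)   # e.g. risk of unplanned readmission

    def forward(self, codes):                     # codes: [B, T] int tokens
        x = self.embed(codes).transpose(1, 2)     # -> [B, emb, T]
        x = torch.relu(self.conv(x)).max(dim=2).values  # motif pooling
        return torch.sigmoid(self.out(x))
```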
Condition dynamics Long short-term memory (LSTM)
C memory of LSTM
x diagnoses (feature vector)
p procedures, medications
f illness "forgetting" (curing or toxicity)
m planned/unplanned admission flag
h weighted "illness pooling"
i input gate (new information updated to memory)
o output gate (disease state)
http://arxiv.org/abs/1511.03677
https://arxiv.org/abs/1510.07641
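For reference, the vanilla LSTM cell that these clinical variants modify (DeepCare reinterprets the gates with the domain-specific quantities listed above; the standard equations are):

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory update)}\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```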
Condition dynamics always missing data in clinical time series
TREATING MISSING DATA Various options (a combined sketch of options 1-3 follows after the links below)
1. ZERO-IMPUTATION Set the value to zero when data is missing
2. FORWARD-FILLING Use the previous observed values
3. MISSINGNESS Treat the missing value as a signal, as the lack of a value
measured e.g. in an ICU can carry information itself (Lipton et al. 2016)
4. BAYESIAN STATE-SPACE MODELING to fill in the missing data (Luttinen et al. 2016, BayesPy package)
5. GENERATIVE MODELING Train the deep network to generate the missing samples (Im et al. 2016, RNN GAN; see also github: sequence_gan)
http://arxiv.org/abs/1606.01865
https://arxiv.org/abs/1606.04130
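A combined numpy sketch of options 1-3 above (zero-imputation for leading gaps, forward-filling, and an appended missingness mask in the spirit of Lipton et al. 2016); the array layout is illustrative:

```python
import numpy as np

def impute_with_missingness(x):
    """x: [timesteps, features] with NaN marking missing entries.
    Returns [timesteps, 2*features]: filled values + missingness mask."""
    mask = np.isnan(x).astype(float)            # option 3: missingness signal
    filled = x.copy()
    for t in range(1, len(filled)):             # option 2: forward-filling
        gap = np.isnan(filled[t])
        filled[t, gap] = filled[t - 1, gap]
    filled = np.nan_to_num(filled, nan=0.0)     # option 1: zero for leading gaps
    return np.concatenate([filled, mask], axis=1)
```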
Condition dynamics-based Individualized treatment
●
Schmidt-Erfurth and Waldstein (2016): There is a critical unmet medical need to identify, characterize, and
validate biomarkers that could provide solid guidance for an efficient individualized treatment with regards to
optimal functional outcome and disease management. Such biomarkers would enable the treating physician to
tailor personalized treatment to each patient's individual disease and need, in order to provide adequate disease
control, minimize recurrence and neurosensory damage, and limit the number of invasive and costly
interventions.
Relationship between initial visual acuity, visual acuity
change and final visual acuity during therapy of
neovascular age-related macular degeneration (i.e., the
ceiling effect). The interpolation curves illustrate final
visual acuity levels dependent on baseline visual acuity
in the controlled trials CATT and IVAN as well as in the
real-world UK neovascular AMD database study
Role of subretinal fluid as a treatment-modifying imaging
biomarker. In patients with subretinal fluid at baseline (blue
graphs), antiangiogenic therapy leads to identical visual
acuity outcomes, regardless of treatment regimen (monthly
versus every 12 weeks dosing). In contrast, patients without
subretinal fluid at baseline (red graphs) demonstrate
unfavourable outcomes if treatment was not administered
on a monthly basis.
Pigment-epithelial detachment as risk factor for vision loss
during individualized dosing. In the VIEW studies, patients
received continuous anti-VEGF therapy during the first 48
weeks. At 52 weeks, a discontinuous, “as-needed” dosing
regimen was introduced. Only in a precisely defined patient
population, i.e. eyes with pigment-epithelial detachments
developing secondary intraretinal cystoid fluid (IRC, red graph),
did the reactive dosing regimen lead to pronounced vision loss.
Future therapeutic approaches will likely focus on early and/or disease modifying interventions aiming to protect
the functional and structural integrity of the morphologic complex that is primarily affected in AMD, i.e. the
choriocapillaris-RPE-photoreceptor unit.
Multimodal innovative imaging technologies, such as PS-OCT, OCT angiography, and adaptive optics allow access
to yet unidentified biomarkers representing the origin of neovascular AMD as well as functionally relevant
therapeutic aims. Improved big-data applicability and reproducibility aided by computerized OCT analysis will likely
allow personalized antiangiogenic therapy with minimal interventions, while providing maximum disease
control, using advanced imaging software and hardware. It is the responsibility of the scientific and clinical community
to follow the open path of advanced imaging together with
ophthalmologists, biologists, physicists, and computer scientists in an efficient interdisciplinary approach.
Condition dynamics risk factors for glaucoma progression
https://doi.org/10.1016/j.ajo.2017.06.003
To determine the intraocular and systemic risk factor differences
between a cohort of rapid glaucoma disease progressors and non-
rapid disease progressors.
Conclusion: Cardiovascular disease is an important risk factor for
rapid glaucoma disease progression irrespective of IOP control.
Condition dynamics Disease progression #1
Clin Ophthalmol. 2017; 11: 1015–1020. May 23.
doi: 10.2147/OPTH.S116265 PMCID: PMC5449101
Automated retinal imaging and trend
analysis – a tool for health monitoring
Karin Roesch, Tristan Swedish, and Ramesh Raskar
MIT Media Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
The future of health diagnostics. Current diagnostics are based on a
“snapshot” in time and limited data points. In the future, large datasets
acquired over time through constant monitoring will be analyzed to
establish baselines and trends, enabling preventative interventions.
Knowing when a feature occurred is key. For example, the microaneurysm
(MA) population is dynamic, and changes occur in a matter of months. For
diabetic retinopathy (DR), it has been established that MAs
are the earliest lesions visible. Additionally, MA turnover rates
are indicative of early-stage DR as well as the likelihood of DR
progression to macular edema.
Po-Hsiang Chiu, George Hripcsak
Department of Biomedical Informatics, Columbia University, 622 W. 168th Street, New York, NY, USA
https://doi.org/10.1016/j.jbi.2017.04.009
Learning statistical models of phenotypes using noisy
labeled training data
Vibhu Agarwal, Tanya Podchiyska, Juan M Banda, Veena Goel, Tiffany I Leung, Evan P Minty, Timothy E Sweeney, Elsie Gyang, Nigam H Shah
J Am Med Inform Assoc (2016) 23 (6): 1166-1173.
DOI: https://doi.org/10.1093/jamia/ocw028
Condition dynamics Disease progression #2
Hrvoje Bogunović; Alessio Montuoro; Magdalena Baratsits;
Maria G. Karantonis; Sebastian M. Waldstein; Ferdinand Schlanitz;
Ursula Schmidt-Erfurth
Investigative Ophthalmology & Visual Science June 2017,
Vol.58, BIO141-BIO150. DOI: 10.1167/iovs.17-21789
Observations at baseline and the first follow-up are used for predicting
drusen regression in the future, for example, the following 1-year period.
Examples of drusen thickness maps and the drusen regression prediction within 1-year
period. Last column shows true positives (green), false positives (orange), and false negatives
(blue). Each row represents one example eye.
http://dx.doi.org/10.1001/jamaophthalmol.2016.5111
http://dx.doi.org/10.1002/sim.7300
Application of our approach using linear mixed models to Alzheimer’s Disease Neuroimaging Initiative data with
bootstrapped 95% CI including boxplots of neocortical Aβ burden (standard uptake value ratio (SUVR)) for each
diagnosis group, separately for amyloid–β positive and negative individuals. It takes 24.47 years to progress from an
SUVR of 0.79 to 1.01. This is equivalent to a rate of 0.009 increase in SUVR per year. Similarly, it takes 10.76 years
to progress from an SUVR of 0.73 to 0.79. See the text for further details. HC, healthy control; MCI, mild cognitively
impaired; AD, Alzheimer’s disease
Text Analysis | Introduction
Condition dynamics Natural Language processing (NLP)
http://arxiv.org/abs/1602.05568
http://arxiv.org/abs/1602.03686
http://homepages.inf.ed.ac.uk/ballison/pdf/lrec_skipgrams.pdf
http://www.bioscience.ai/schedule
http://arxiv.org/abs/1508.04112
Text analysis for clinical notes #1
http://dx.doi.org/10.3233/978-1-61499-753-5-201
Medical Text Classification using Convolutional Neural
Networks
Mark Hughes, Irene Li, Spyros Kotoulas, Toyotaro Suzumura (Submitted on
22 Apr 2017). https://arxiv.org/abs/1704.06841
We present an approach to automatically classify clinical text at a sentence level. We are
using deep convolutional neural networks to represent complex features. We train the
network on a dataset providing a broad categorization of health information. Through a
detailed evaluation, we demonstrate that our method outperforms several approaches
widely used in natural language processing tasks by about 15%.
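As a rough sketch of the sentence-level text-CNN idea (not the authors' exact architecture; vocabulary size, filter widths, and class count below are placeholders):

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Embed tokens, convolve with several filter widths, max-pool over time, classify."""
    def __init__(self, vocab=10000, emb=128, n_classes=5, widths=(3, 4, 5), n_filters=100):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.convs = nn.ModuleList(nn.Conv1d(emb, n_filters, w) for w in widths)
        self.fc = nn.Linear(n_filters * len(widths), n_classes)

    def forward(self, tokens):                    # tokens: (batch, seq_len) integer ids
        x = self.emb(tokens).transpose(1, 2)      # (batch, emb, seq_len)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))  # (batch, n_classes) logits

logits = TextCNN()(torch.randint(0, 10000, (8, 40)))  # 8 sentences of 40 tokens
```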
Text analysis for clinical notes #2
13 April 2017. https://doi.org/10.1109/BHI.2017.7897302
https://doi.org/10.1016/j.jbi.2017.07.006
We proposed the first models based on recurrent neural networks (more
specifically Long Short-Term Memory - LSTM) for classifying relations from
clinical notes.
We also evaluated the impact of word embeddings on the performance of
LSTM models and showed that medical domain word embeddings help
improve relation classification. These results support the use of LSTM
models for classifying relations between medical concepts, as they show
comparable performance to previously published systems while requiring
no manual feature engineering.
In this work, we explore the use of Hierarchical Dirichlet Processes (HDP)
as a Bayesian nonparametric framework to infer patients' states of health
by combining multiple sources of data. In particular, we employ HDP to
combine clinical time series and text from the nursing progress notes in a
probabilistic topic modeling framework for patient risk stratification.
iDoctor: Personalized and professionalized medical
recommendations based on hybrid matrix factorization
Future Generation Computer Systems
Volume 66, January 2017, Pages 30-35
https://doi.org/10.1016/j.future.2015.12.001
Personalized Medicine | Introduction
Precision / personalized medicine #1
re-work.co/blog
http://dx.doi.org/10.1101/070490
“For the first time, we demonstrate that DLNN trained on a large pharmacogenomic data set can effectively
predict the therapeutic response of specific drugs in specific cancer types, from a large panel of both drugs and
cancer cell lines. These findings serve as a proof of concept for the application of DLNN to predict therapeutic
responsiveness, a milestone in precision medicine.”
http://dx.doi.org/10.1056/NEJMp1500523
http://dx.doi.org/10.3389%2Ffpsyt.2016.00034
http://dx.doi.org/10.1016/j.media.2016.06.024
Precision / personalized medicine #2
We introduce an IoT driven architecture and discuss how non-
invasive, affordable, unobtrusive sensing using mobile phones,
wearables and nearables is making physiological and pathological
data collection from human body possible in thus far unimaginable
ways. We also introduce breakthrough technologies in the form
of exosomes and 3D organ printing that have the potential to disrupt
the future healthcare landscape.
http://dx.doi.org/10.1007/978-3-319-42141-4_9
https://doi.org/10.1109/TMM.2016.2614225
To facilitate the intensive computation required for interactive analytics, we design an efficient
sparse principal component analysis (SPCA) solver based on a variance reduced stochastic
gradient technique. The benefits of our method are demonstrated by analyzing two different
EHR patient cohorts, a public and a private dataset containing EHRs of 101 767 and 223 076
patients, respectively. Our evaluations show that PHENOTREE can detect clinically meaningful
hierarchical phenotypes.
http://dx.doi.org/10.3390/ijms17091555
Precision / personalized medicine #3
Multimorbidity space and dynamic disease progression.
http://dx.doi.org/10.1038/nrg.2016.87
The co-occurrence of diseases can inform the underlying network biology of shared and multifunctional genes and pathways. In addition,
comorbidities help to elucidate the effects of external exposures, such as diet, lifestyle and patient care. With worldwide health transaction data
now often being collected electronically, disease co-occurrences are starting to be quantitatively characterized.
Linking network dynamics to the real-life, non-ideal patient in whom diseases co-occur and interact provides a valuable basis for generating
hypotheses on molecular disease mechanisms, and provides knowledge that can facilitate drug repurposing and the development of targeted
therapeutic strategies.
Example Clinical AI Pipelines
Glaucoma decision support tools
Old-school methods for multimodal and structural features
Development of machine learning models
for diagnosis of glaucoma
Seong Jae Kim, Kyong Jin Cho, Sejong Oh
Published: May 23, 2017.
https://doi.org/10.1371/journal.pone.0177726
We used 100 cases of data as a test dataset and 399 cases
of data as a training and validation dataset. To develop the
glaucoma prediction model, we considered four machine
learning algorithms: C5.0, random forest (RF), support vector
machine (SVM), and k-nearest neighbor (KNN).
Color-fundus and red-free fundus photography (A),
peripapillary RNFL thickness measured by SD-OCT (B),
and automated 30–2 visual field test (C). The presence of
a tigroid fundus and peripapillary atrophy was observed, and
there was a decrease in the RNFL thickness on the
peripapillary RNFL thickness scan. In the visual field test,
the abnormalities were judged to be of no clinical
significance.
Computers in Biology and Medicine
Volume 8, Issue 1, January 1978, Pages 25-40
Glaucoma consultation by computer
Sholom Weiss, Casimir A. Kulikowski, Aran Safir
https://doi.org/10.1016/0010-4825(78)90011-2
Automated detection of glaucoma using structural and
non structural features
SpringerPlus December 2016, 5:1519
Anum A. Salam, Tehmina Khalil, M. Usman Akram, Amina Jameel, Imran Basit
First Online: 09 September 2016
https://doi.org/10.1186/s40064-016-3175-4
Tensor Networks Inspiration from quantum networks #1
Supervised Learning with Quantum-
Inspired Tensor Networks
E. Miles Stoudenmire, David J. Schwab last revised 18 May 2017
https://arxiv.org/abs/1605.05775
Deep Learning and Quantum Entanglement:
Fundamental Connections with Implications to
Network Design
Yoav Levine, David Yakira, Nadav Cohen, Amnon Shashua last revised 10 Apr 2017
https://arxiv.org/abs/1704.01552
Neural networks for computing best
rank-one approximations of tensors
and its applications
Maolin Che, Andrzej Cichocki, Yimin Wei. 22 May 2017
https://doi.org/10.1016/j.neucom.2017.04.058
This paper presents the neural dynamical network
to compute a best rank-one approximation of a
real-valued tensor. We implement the neural
network model by the ordinary differential
equations (ODE), which is a class of continuous-
time recurrent neural network. Finally, we
generalize the proposed neural networks to the
computation of the restricted singular values and
the associated restricted singular vectors of real-
valued tensors. We illustrate and validate
theoretical results via numerical simulations.
Keywords: Neural network, Ordinary differential equations, Lyapunov function, Lyapunov stability theory, Rank-one tensor, Best rank-one
approximation, Z-eigenpair, Symmetric-definite tensor pair, H-eigenpair, The local maximal generalized eigenpair, The local minimal
generalized eigenpair, Generalized tensor eigenpair, Local optimal rank-one approximation, Restricted singular value, Restricted singular
vector
We theoretically analyze convolutional arithmetic circuit (ConvACs), and empirically validate
our findings on more common ConvNets which involve ReLU activations and max pooling.
Beyond the results described above, the description of a deep convolutional network in well-
defined graph-theoretic tools and the formal connection to quantum entanglement, are two
interdisciplinary bridges that are brought forth by this work.
Neural-network representation of the
many-body ground states.
convolutional neural networks, can constitute the
basis of more advanced NQS and therefore have
the potential for increasing their expressive
power.
Tensor Networks Inspiration from quantum networks #2
Low-Rank Tensor Networks for Dimensionality
Reduction and Large-Scale Optimization
Problems: Perspectives and Challenges PART 1
A. Cichocki, N. Lee, I.V. Oseledets, A.-H. Phan, Q. Zhao, D. Mandic last revised 19 Jul
2017 (this version, v2)
https://arxiv.org/abs/1609.00893
Tensor Networks for Dimensionality Reduction and
Large-scale Optimization: Part 2 Applications and
Future Perspectives
A. Cichocki, N. Lee, I.V. Oseledets, A.-H. Phan, Q. Zhao, D. Mandic Foundations and
Trends® in Machine Learning (2017): Vol. 9: No. 6, pp 431-673.
http://dx.doi.org/10.1561/2200000067
“Tensor decompositions and tensor network algorithms require sophisticated software libraries, which are being rapidly
developed. The TT Toolbox, developed by Oseledets and coworkers, (http://github.com/oseledets/TT-Toolbox) for
MATLAB and (http://github.com/oseledets/ttpy) for PYTHON is currently the most complete software for the TT
(MPS/MPO) and QTT networks. The TT toolbox supports advanced applications, which rely on solving sets of linear
equations (including the AMEn algorithm), symmetric eigenvalue decomposition (EVD), and inverse/pseudoinverse of
huge matrices.”
Keywords: Tensor networks, Function-related tensors, CP decomposition, Tucker models, tensor train (TT) decompositions, matrix product states (MPS), matrix product operators (MPO), basic tensor operations, multiway component analysis, multilinear blind source
separation, tensor completion, linear/multilinear dimensionality reduction, large-scale optimization problems, symmetric eigenvalue decomposition (EVD), PCA/SVD, huge systems of linear equations, pseudo-inverse of very large matrices, Lasso and Canonical
Correlation Analysis (CCA)
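For intuition about what such toolboxes compute, below is a bare-bones TT-SVD in NumPy (sequential SVDs with a fixed maximum rank; real applications should use the libraries above, which add truncation tolerances, rounding, and solvers).

```python
import numpy as np

def tt_svd(tensor, max_rank=8):
    """Decompose a dense tensor into tensor-train (MPS) cores via sequential SVD."""
    shape = tensor.shape
    cores, r_prev = [], 1
    mat = tensor.reshape(r_prev * shape[0], -1)
    for k in range(len(shape) - 1):
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(S))                      # hard rank truncation
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        mat = (S[:r, None] * Vt[:r]).reshape(r * shape[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, shape[-1], 1))
    return cores

cores = tt_svd(np.random.rand(4, 5, 6, 7))
print([c.shape for c in cores])  # [(1, 4, 4), (4, 5, 8), (8, 6, 7), (7, 7, 1)]
```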
Tensor Networks in Healthcare
SCH: INT: Collaborative Research: High-throughput Phenotyping on Electronic Health
Records using Multi-Tensor Factorization
Jimeng Sun, Bradley Malin, Joshua Denny, Joydeep Ghosh, Abel Kho
Funding Source: NSF Smart Connect Health Integrated Grant: Award Number 1418511
http://www.sunlab.org/research/phenotyping/
Techniques
Task 1: Phenotyping Generation: How to turn EHR data into meaningful clinical concepts (phenotypes)?
Task 2: Phenotyping Refinement: How to incorporate feedback to ensure the generated phenotypes are clinically meaningful?
Task 3: Phenotyping Adaptation: How to port phenotypes from one institution to another?
Applications
App 1: Cohort Construction: Validate that the generated phenotypes recover some existing phenotypes (from PheKB)
App 2: GWAS: Develop genome-wide association studies using the generated phenotypes (as target or control variables)
App 3: Predictive Modeling: Use generated phenotypes as features to facilitate predictive modeling https://arxiv.org/abs/1704.03141
Tensor Networks in Industry
Animashree Anandkumar
Associate Professor (with tenure)
University of California Irvine
I am a faculty at CS department within ICS at University of California Irvine since December 2016. Before that I was
a faculty at EECS department at UCIrvine since August 2010. I am a member of the center for pervasive
communications and computing (CPCC).
I am currently a principal scientist at Amazon Web Services (AWS) and on leave from UCI.
My research focus is in the high-dimensional learning of probabilistic graphical models and latent variable models.
Broadly I am interested in machine learning, high-dimensional statistics, tensor methods, statistical physics,
information theory and signal processing.
https://youtu.be/gEFaLKzrKYc?t=6m52s
https://youtu.be/KmvZu9qJNzg?t=7m15s
https://youtu.be/B4YvhcGaafw?t=5m40s
https://www.oreilly.com/ideas/lets-build-open-source-tensor-
libraries-for-data-science
“Model Refinement” Techniques
UNCERTAINTY ANALYSIS
’Layperson’ background
development at internet giants like Google and Facebook.
https://www.wired.com/2016/12/uber-buys-mysterious-startup-make-ai-company/
UNCERTAINTY ANALYSIS
In practice for retinal imaging
https://doi.org/10.1101/084210
Here we propose to estimate the uncertainty of DNNs in medical
diagnosis based on a recent theoretical insight on the link between
dropout networks and approximate Bayesian inference. Using the example
of detecting diabetic retinopathy (DR) from fundus photographs, we
show that uncertainty informed decision referral improves diagnostic
performance. Experiments across different networks, tasks and datasets
showed robust generalization.
Depending on network capacity and task/dataset difficulty, we surpass
85% sensitivity and 80% specificity as recommended by the NHS when
referring 0%-20% of the most uncertain decisions for further inspection.
We analyse causes of uncertainty by relating intuitions from 2D
visualizations to the high-dimensional image space, showing that it is in
particular the difficult decisions that the networks consider uncertain.
bioRxiv preprint first posted online
Oct. 28, 2016
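The underlying recipe is easy to sketch: keep dropout stochastic at test time, run several forward passes, and treat the spread of the softmax outputs as uncertainty. A minimal PyTorch sketch; the toy model and the 20% referral fraction are placeholders (the fraction matches the 0%-20% range quoted above).

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model, x, n_samples=50):
    """Monte Carlo dropout: sample predictions with dropout left active."""
    model.train()  # keeps nn.Dropout stochastic; beware of BatchNorm side effects
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=1) for _ in range(n_samples)])
    mean = probs.mean(dim=0)                           # predictive mean per class
    uncertainty = probs.std(dim=0).max(dim=1).values   # one scalar per input
    return mean, uncertainty

# stand-in for a DR-grading network
model = nn.Sequential(nn.Flatten(), nn.Linear(64, 32), nn.ReLU(),
                      nn.Dropout(0.5), nn.Linear(32, 2))
mean, unc = mc_dropout_predict(model, torch.randn(4, 64))
referred = unc.argsort(descending=True)[: int(0.2 * len(unc))]  # refer most uncertain 20%
```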
Interpretability | Background
Visualizing disease Clinicians want answers
Mitigating the resistance from clinical community, put effort in explaining the diagnosis
Roth et al. (2015)
Ribeiro et al. (2016), Baskaran et al. (2012)
Clinical Heuristic Glaucoma decision tree
"Clinicians need the data-driven model predictions to align with their domain knowledge"
Dr. Jenna Wiens @ NIPS 2016, “NIPS 2016 Workshop on Machine Learning for Health”
http://www.nipsml4hc.ws/jenna-wiens
Essentially, the causal decision tree now becomes a “hard-
to-interpret” deep learning model. How to communicate
this paradigm shift to clinicians?
Visualization state-of-the-art techniques in General
DOI: 10.1111/cgf.13210
An example of modeling with visual analytics. BaobabView [Van den Elzen and van Wijk (2011)] uses a tree-like interactive view to support a manually controlled decision tree construction process.
An example of model selection. Squares [Ren et al. (2017)] uses small multiples
composed of grids of different colors and visual textures to display the
distribution of probabilities in classification
© VADER Lab at ASU 2017.
All rights for the techniques and
images belong to their respective
owners.
Visualization high-dimensional visualization #1
Shusen Liu ; Dan Maljovec ; Bei Wang ; Peer-Timo Bremer ; Valerio Pascucci
(2016) https://doi.org/10.1109/TVCG.2016.2640960
Dominik Sacha ; Leishi Zhang ; Michael Sedlmair ; John A. Lee ; Jaakko Peltonen ;
Daniel Weiskopf ; Stephen C. North ; Daniel A. Keim
(2016) https://doi.org/10.1109/TVCG.2016.2598495
Visualization high-dimensional visualization #2
http://dx.doi.org/10.1111/cgf.13237
Dimensionality reduction provides a scalable alternative to create visualizations
(projections) that enable insight into the structure of such datasets. However, applying
dimensionality reduction independently for each dataset in a sequence may introduce
unnecessary variability in the resulting sequence of projections, which makes tracking
the evolution of the data significantly more challenging. We show that this issue
affects t-SNE, a widely used dimensionality reduction technique. In this context, we
propose dynamic t-SNE, an adaptation of t-SNE that introduces a controllable trade-
off between temporal coherence and projection reliability. Our evaluation in two
time-dependent datasets shows that dynamic t-SNE eliminates unnecessary temporal
variability and encourages smooth changes between projections.
https://doi.org/10.2312/eurovisshort.20161164
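Dynamic t-SNE adds an explicit temporal-coherence term to the t-SNE objective; with off-the-shelf scikit-learn one can only approximate the idea by warm-starting each frame from the previous projection, as in this illustrative sketch.

```python
import numpy as np
from sklearn.manifold import TSNE

def coherent_tsne_sequence(datasets, perplexity=30.0):
    """Project a sequence of datasets (same rows over time), seeding each step
    with the previous embedding to damp arbitrary frame-to-frame jumps.
    This only approximates dynamic t-SNE's explicit coherence penalty."""
    projections, init = [], "pca"
    for X in datasets:
        emb = TSNE(n_components=2, perplexity=perplexity,
                   init=init, random_state=0).fit_transform(X)
        projections.append(emb)
        init = emb  # warm-start the next time step
    return projections

frames = [np.random.rand(100, 20) + 0.05 * t for t in range(5)]
projections = coherent_tsne_sequence(frames)
```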
Visualization ”unboxing” ConvNet black box #1
https://arxiv.org/abs/1311.2901; Cited by 2,133 articles
https://doi.org/10.1109/TVCG.2016.2598838
To enable a more intuitive exploration process, we are open-sourcing the Embedding Projector, a
web application for interactive visualization and analysis of high-dimensional data recently
shown as an A.I. Experiment, as part of TensorFlow. We are also releasing a standalone version
at projector.tensorflow.org, where users can visualize their high-dimensional data without the
need to install and run TensorFlow.
Visualization ”unboxing” ConvNet black box #2
HILDA’17, Chicago, IL, USA
http://dx.doi.org/10.1145/3077257.3077260
https://arxiv.org/abs/1704.01942
“ACTIVIS has been deployed on Facebook’s machine learning platform. We present case studies with
Facebook researchers and engineers, and usage scenarios of how ACTIVIS may work with different models.”
Minsuk Kahng is with Georgia Tech; Pierre Andrews is with Facebook; Aditya Kalro is with Facebook; Duen Horng (Polo) Chau.
DARVIZ: deep abstract representation, visualization,
and verification of deep learning models
ICSE-NIER '17 Proceedings of the 39th International Conference on Software Engineering:
New Ideas and Emerging Results Track. https://doi.org/10.1109/ICSE-NIER.2017.13
ShapeShop: Towards Understanding Deep Learning
Representations via Interactive Experimentation
CHI EA '17 Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors
in Computing Systems https://doi.org/10.1145/3027063.3053103
Visualization ”unboxing” recurrent/Sequence black box #1
https://arxiv.org/abs/1705.08153
Uninterpretable examples. Left: Illustration of an arbitrary set of parameters for
an LSTM trained on the MIT-BIH dataset. Numbers indicate different connections
for the input weight vector (rectangle) and the hidden layer weight matrix (square).
Right: The memory values c for arbitrary units in the LSTM trained on the MIT-
BIH data
LSTM hidden unit outputs compared to wavelet coefficients. The top of each column is the
original sample that was correctly classified using the respective LSTM model. The following
two pairs of rows are the cherry-picked pairs of wavelet coefficients and hidden unit outputs
that are roughly similar. The type of wavelet coefficient and the specific hidden unit are
indicated above each plot. The Daubechies wavelet coefficients are 108 time steps long
(instead of 216) because it makes use of the discrete wavelet transform. The wavelet
coefficients were computed using the PyWavelets package in Python
The sample saliencies for the ECG data using different techniques depicted in each column.
The occlusion width is the number of time steps that are occluded per instance. All the
samples shown have a length of 216 time steps (x-axis) and were correctly classified by the
model. The importance of each input step is shown on a scale of 0 to 1, with 1 being the
most important. The type of ECG signal is indicated on the left with LBBB – left bundle
branch block beat, RBBB – right bundle branch block beat, Paced – paced beat, and V-fib –
ventricular fibrillation.
Class mode visualizations. The optimized class modes for the ECG data (left) and the MNIST data (right). Here the input is optimized with respect to each class in order to find the most likely input for each class. The class for each plot is indicated on the left of the image. This technique did not yield interpretable results.
Visualization Medical deep learning models #1
https://arxiv.org/abs/1707.02485
Overall illustration of MDNet. We use a bladder image with its diagnostic report as an example. The
image model generates an image feature to pass to LSTM in the form of a task tuple and a Conv
feature embedding (for the attention model) computed by the AAS module (defined in the method).
LSTM executes prediction tasks according to the specified image feature type
The illustration of class-specific attention. From top to bottom: test images, pathologist annotations, and class attention maps. Like the pathologist annotations, the attention maps are most activated in urothelial regions, largely ignoring stromal or background regions. Best viewed in color.
http://dx.doi.org/10.1016/j.oret.2016.12.009
An occlusion test (Zeiler and Fergus, 2014) was performed to identify the areas
contributing most to the neural network's assigning the category of AMD. A blank 20
× 20-pixel box was systematically moved across every possible position in the image
and the probabilities were recorded. The highest drop in the probability represents
the region of interest that contributed the highest importance to the deep learning
algorithm.
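A sketch of that occlusion test in NumPy (a stride parameter is added for speed, whereas the description above slides the box over every position; the probability function is a placeholder for the trained network):

```python
import numpy as np

def occlusion_map(predict_proba, image, box=20, stride=10):
    """Slide a blank box over the image and record the drop in the model's
    probability for the class of interest (Zeiler-and-Fergus-style occlusion)."""
    h, w = image.shape[:2]
    baseline = predict_proba(image)
    heat = np.zeros(((h - box) // stride + 1, (w - box) // stride + 1))
    for i, y in enumerate(range(0, h - box + 1, stride)):
        for j, x in enumerate(range(0, w - box + 1, stride)):
            occluded = image.copy()
            occluded[y:y + box, x:x + box] = 0.0  # blank 20x20 patch
            heat[i, j] = baseline - predict_proba(occluded)
    return heat  # largest values mark regions the model relied on most

heat = occlusion_map(lambda img: float(img.mean()), np.random.rand(128, 128))
```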
Visualization Medical deep learning models #2
https://arxiv.org/abs/1703.10757
Inspired by Zhou et al. (2016), we present in
this section the idea of generating the
Regression Activation Maps (RAM) of an
input image to localize the discriminative region
towards the regression outcomes. It is known
that the convolutional units of each layer of a
CNN act as visual concept detectors, identifying
low-level concepts like textures or materials up to
high-level concepts like objects or scenes.
Deeper into the network, the units become
increasingly discriminative. However, the fully-
connected layers will make it difficult to
identify the importance of different units for
identifying the output labels (regression values,
in our networks). Instead, using global
average pooling (GAP) and the linear
output unit, we can directly visualize the region
of interest (ROI) that are most discriminative
for a given regression value. As we use
regression for the purpose of classification,
each single RAM obtained for each single
image explicitly depicts the ROI at different
clinical levels.
In this work, we provided a deep learning model that
includes regression activation maps layer (RAM). The
RAM layer can provide the robust interpretability of the
proposed detection model by monitoring the
pathogenesis, so that the proposed model can be taken
as an assistant for clinicians.
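The RAM/CAM computation itself reduces to a weighted sum of the final convolutional feature maps, with the weights taken from the linear output unit; a minimal sketch with illustrative array shapes:

```python
import numpy as np

def activation_map(conv_maps, output_weights):
    """GAP-based visualization (CAM/RAM style).
    conv_maps: (channels, H, W) activations of the last conv layer.
    output_weights: (channels,) weights from the GAP features to the output unit."""
    ram = np.tensordot(output_weights, conv_maps, axes=([0], [0]))  # (H, W)
    ram -= ram.min()
    return ram / (ram.max() + 1e-8)  # normalize to [0, 1] for overlaying on the image

ram = activation_map(np.random.rand(64, 16, 16), np.random.rand(64))
```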
Interpretability to EHR Mining and decision making #1
https://youtu.be/co3lTOSgFlA
The source code of RETAIN is publicly available at https://github.com/mp2893/retain
Model Interpretation for Heart Failure Prediction: We demonstrate the interpretability of RETAIN by
studying its behavior in the HF prediction task. We choose a HF patient from the test set and calculate the contribution of
the variables (medical codes in this case) for making the binary prediction. Figure 3a is the visualization of the
contributions of the variables in each visit. The patient suffered from skin problems, skin disorder (SD), benign neoplasm
(BN), excision of skin lesion (ESL), for some time before showing symptoms of HF, cardiac dysrhythmia (CD), heart valve
disease (HVD) and coronary atherosclerosis (CA), then being diagnosed with HF at the end. We can see that skin-related
codes from the earlier visits made little contribution to HF prediction as expected. RETAIN properly puts much attention
to the HF-related codes that occurred in recent visits.
Interpretability to EHR Mining and decision making #2
GRAM: Graph-based Attention Model for Healthcare
Representation Learning
Edward Choi, Mohammad Taha Bahadori, Le Song, Walter F. Stewart, Jimeng Sun‘
last revised 1 Apr 2017 (this version, v3)
https://arxiv.org/abs/1611.07012
“Deep learning methods exhibit promising performance for predictive modeling in healthcare, but
two important challenges remain:
- Data insufficiency: Often in healthcare predictive modeling, the sample size is insufficient for deep learning methods to achieve satisfactory results.
- Interpretation: The representations learned by deep learning methods should align with medical knowledge.
To address these challenges, we propose a GRaph-based Attention Model, GRAM, that
supplements electronic health records (EHR) with hierarchical information inherent to medical
ontologies.”
https://jkulas12.github.io/GRAM_Visualization/
Datasets | Primer
Dataset Size How many samples?
The more the better, but there are obvious problems with obtaining huge medical datasets
https://arxiv.org/abs/1511.06348
(A) The number of misclassified images on each body part class and (B) of total
misclassified ones on whole body in increasing number of training data sets.
Classification accuracy results according to increasing size of training data sets
There is a rule-of-thumb (#1) stating that one should have
10x the number of samples as parameters in the
network (for a more formal approach, see VC dimension);
for example, the ResNet (He et al. 2015) in the
ILSVRC2015 challenge had around 1.7M parameters,
thus requiring 17M images with this rule-of-thumb.
https://www.researchgate.net/post/What_is_the_minimum_sample_size_required_
to_train_a_Deep_Learning_model-CNN
Dataset Size How many samples?
More is always better if you train higher-capacity models
https://arxiv.org/abs/1707.02968
Since 2012, there have been significant advances in
representation capabilities of the models and
computational capabilities of GPUs. But the size of the
biggest dataset has surprisingly remained constant. What
will happen if we increase the dataset size by 10× or
100×?
Our experiments yield some surprising (and some
expected) findings:
Better Representation Learning Helps! Our first observation is that large-scale
data helps in representation learning as evidenced by improvement in performance
on each and every vision task we study. This suggests that collection of a larger-
scale dataset to study pretraining may greatly benefit the field. Our findings also
suggest a bright future for unsupervised or self-supervised [10, 42]
representation learning approaches. It seems the scale of data can overpower
noise in the label space.
Performance increases linearly with orders of magnitude of training data!
Perhaps the most surprising element of our finding is the relationship between
performance on vision tasks and the amount of training data (log-scale) used for
representation learning. We find that this relationship is still linear! Even with
300M training images, we do not observe any plateauing effect for the tasks
studied.
Capacity is Crucial: We also observe that to fully exploit 300M images, one needs
higher capacity models. For example, in case of ResNet-50 the gain on COCO
object detection is much smaller (1.87%) compared to (3%) when using ResNet-
152.
Training with Long-tail: Our data has quite a long tail and yet the representation
learning seems to work. This long-tail does not seem to adversely affect the
stochastic training of ConvNets (training still converges).
New state of the art results: Finally, our paper presents new state-of-the-art
results on several benchmarks using the models learned from JFT-300M. For
example, a single model (without any bells and whistles) can now achieve 37.4 AP
as compared to 34.3 AP on the COCO detection benchmark.
Dataset Size data augmentation #1
Images from:
ftp://ftp.dca.fee.unicamp.br/pub/docs/vonzuben/ia353_1s15/topico10_IA353_1s2015.pdf |
Wu et al. (2015)
Synthetically increase the number of training samples by distorting them in ways expected from the dataset (random
xy-shifts, left-right flips, added Gaussian noise, blur, etc.) → this has been shown to reduce overfitting (a minimal sketch follows below).
As noted in the previous slides on image quality, it is useful to train the model with various image quality levels
Köhler et al. (2013)
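One plausible augmentation pipeline along these lines, assuming a recent torchvision; the specific transforms and magnitudes are illustrative, not taken from any of the cited papers.

```python
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),                            # left-right flips
    transforms.RandomAffine(degrees=10, translate=(0.05, 0.05)),  # small xy-shifts/rotations
    transforms.ColorJitter(brightness=0.2, contrast=0.2),         # illumination variability
    transforms.GaussianBlur(kernel_size=5),                       # quality degradation
    transforms.Lambda(lambda t: t + 0.01 * torch.randn_like(t)),  # additive Gaussian noise
])

img = torch.rand(3, 256, 256)                  # stand-in for a fundus photograph
augmented = [augment(img) for _ in range(8)]   # eight distorted copies of one sample
```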
The most successful convolutional architectures are developed starting from ImageNet, a large
scale collection of images of object categories downloaded from the Web. These images are
very different from the situated and embodied visual experience of robots deployed in
unconstrained settings. To reduce the gap between these two visual experiences, this paper
proposes a simple yet effective data augmentation layer that zooms in on the object of interest
and simulates the object detection outcome of a robot vision system. The layer, which can be used
with any convolutional deep architecture, leads to an increase in object recognition performance
of up to 7% in experiments performed over three different benchmark databases.
https://arxiv.org/abs/1705.02139
Dataset Size data augmentation #2
Apply domain-specific perturbations
Dataset Augmentation in Feature Space
Terrance DeVries, Graham W. Taylor(Submitted on 17 Feb 2017)
https://arxiv.org/abs/1702.05538
Dreaming More Data: Class-dependent Distributions over
Diffeomorphisms for Learned Data Augmentation
Søren Hauberg, Oren Freifeld, Anders Boesen Lindbo Larsen, John Fisher, Lars
Hansen ; Proceedings of the 19th International Conference on Artificial Intelligence and Statistics,
PMLR 51:342-350, 2016.
http://proceedings.mlr.press/v51/hauberg16.html
Our approach is, however, not limited to MNIST:
●
Image alignment and registration is a routine task in many medical imaging tasks,
such as the analysis of MRI.
●
We make similar observations for time-series data such as acoustic signals. Here
dynamic time warping (DTW) is often used as preprocessing to remove
differences in the temporal speed of individual signals.
●
Mesh alignment is also a standard pre-processing step in the analysis of three-
dimensional meshes. As deep models are beginning to appear for three-
dimensional data it would be interesting to combine them with learned
augmentation schemes.
https://doi.org/10.1016/j.neucom.2016.12.025
In this paper, we propose five data augmentation methods dedicated to face images,
including landmark perturbation and four synthesis methods (hairstyles, glasses, poses,
illuminations). The proposed methods effectively enlarge the training dataset, which
alleviates the impacts of misalignment, pose variance, illumination changes and partial
occlusions, as well as the overfitting during training.
Dataset Size Generative synthetic data
Augmentation through generative adversarial models (GAN)
the CVPR 2017 awards are out! The two winners are
Densely Connected Convolutional Networks by Facebook and
Improving the Realism of Synthetic Images
https://arxiv.org/abs/1612.07828
https://machinelearning.apple.com/2017/07/07/GAN.html
https://github.com/wayaai/SimGAN
https://arxiv.org/abs/1706.02071
https://github.com/val-iisc/deligan
TextureGAN: Controlling Deep Image Synthesis with
Texture Patches
Wenqi Xian, Patsorn Sangkloy, Jingwan Lu, Chen Fang, Fisher Yu, James Hays
(Submitted on 9 Jun 2017)
https://arxiv.org/abs/1706.02823
Dataset Size semi-supervised training #1
Jointly use labeled and unlabeled data
https://arxiv.org/abs/1705.08850
Our empirical results show that using the tangents of the data manifold (as estimated by the
generator of the GAN) to inject invariances in the classifier improves the performance on semi-
supervised learning tasks.
https://arxiv.org/abs/1706.00400
N. Siddharth, Brooks Paige, Jan-Willem Van de Meent, Alban Desmaison, Frank Wood,
Noah D. Goodman, Pushmeet Kohli, Philip H.S. Torr
Here we are interested in learning disentangled representations that encode
distinct aspects of the data into separate variables. We propose to learn such
representations using model architectures that generalize from standard
Variational autoencoders (VAEs) employing a general graphical model structure
in the encoder and decoder. This allows us to train partially-specified models
that make relatively strong assumptions about a subset of interpretable
variables and rely on the flexibility of neural networks to learn representations
for the remaining variables. We further define a general objective for semi-
supervised learning in this model class, which can be approximated using an
importance sampling procedure.
Dataset Size semi-supervised training #2
https://arxiv.org/abs/1707.03631
https://arxiv.org/abs/1705.09783
In this work, we present a semi-supervised learning framework that
uses generated data to boost task performance. Under this
framework, we characterize the properties of various generators
and theoretically prove that a complementary (i.e. bad) generator
improves generalization. Empirically our proposed method improves
the performance of image classification on several benchmark
datasets.
Our proposed method, adversarial dropout, can be viewed from the
dropout and from the adversarial training perspectives. Our
proposed adversarial dropout can be interpreted as dropout masks
whose direction is counter-optimized, adversarially, to the model’s
label assignment. However, it should be noted that adversarial
dropout and traditional adversarial training with additive
perturbation are different, because adversarial dropout induces a
sparse structure in the neural network while the latter does not
change the neural network directly.
Dataset Size Active learning and “smart” labeling #1
When labeling is very time-consuming, active learning can help us choose which unlabeled samples to label (a minimal sketch follows below)
Active Learning and Proofreading
for Delineation of Curvilinear
Structures
Mosinska, Agata Justyna; Tarnawski, Jakub; Fua, Pascal
Presented at: MICCAI, Quebec City, Canada, September 10-14, 2017
https://infoscience.epfl.ch/record/229472
https://arxiv.org/abs/1704.07433
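A minimal uncertainty-sampling loop (a generic least-confidence strategy with scikit-learn, not the boundary-based MICCAI method above; data shapes are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def uncertainty_sampling(X_pool, X_seed, y_seed, n_queries=10):
    """Pick the unlabeled samples the current model is least confident about;
    an annotator would label exactly these next."""
    model = LogisticRegression(max_iter=1000).fit(X_seed, y_seed)
    confidence = model.predict_proba(X_pool).max(axis=1)  # certainty of the top class
    return np.argsort(confidence)[:n_queries]             # indices to send for labeling

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(500, 16))                              # unlabeled pool
X_seed, y_seed = rng.normal(size=(20, 16)), np.tile([0, 1], 10)  # small labeled seed set
to_label = uncertainty_sampling(X_pool, X_seed, y_seed)
```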
Dataset Size Transfer learning
Leveraging features learned from bigger non-medical datasets
Our approach fine-tunes a pre-trained convolutional neural network (CNN),
GoogLeNet. The fine-tuned CNN could effectively identify pathologies in
comparison to classical learning. Our algorithm aims to demonstrate that
models trained on non-medical images can be fine-tuned for classifying OCT
images with limited training data.
Biomedical Optics Express Vol. 8, Issue 2, pp. 579-592 (2017)
https://doi.org/10.1364/BOE.8.000579
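A generic fine-tuning recipe of this kind in PyTorch; ResNet-18 stands in for the backbone here, and the class count and freeze-everything policy are placeholders rather than the paper's exact setup.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)       # ImageNet-pretrained backbone
for p in model.parameters():
    p.requires_grad = False                    # freeze the pretrained features
model.fc = nn.Linear(model.fc.in_features, 4)  # new head for OCT classes (placeholder count)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 4, (8,))  # dummy batch
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```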
International Workshop on Large-Scale Annotation of Biomedical Data and Expert Label Synthesis
International Workshop on Deep Learning in Medical Image Analysis
LABELS 2016, DLMIA 2016: Deep Learning and Data Labeling for Medical Applications pp 188-196
Understanding the Mechanisms of Deep
Transfer Learning for Medical Images
https://doi.org/10.1007/978-3-319-46976-8_20
Hariharan Ravishankar, Prasad Sudhakar, Rahul Venkataramani, Sheshadri Thiruvenkadam, Pavan Annangi, Narayanan
Babu, Vivek Vaidya
Deep Learning and Convolutional Neural Networks for Medical Image Computing Pp 181-193
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)
On the Necessity of Fine-Tuned Convolutional Neural Networks for
Medical Imaging
https://doi.org/10.1007/978-3-319-42999-1_11
Nima Tajbakhsh, Jae Y. Shin, Suryakanth R. Gurudu, R. Todd Hurst, Christopher B. Kendall,
Michael B. Gotway, Jianming Liang
In this paper, we studied the necessity of fine-tuning and the effective level
of knowledge transfer to 4 medical imaging applications. Our experiments
demonstrated medical imaging applications were conducive to transfer
learning and that fine-tuned CNNs were necessary to achieve high
performance particularly with limited training datasets. We also showed that
the desired level of fine-tuning differed from one application to another.
While deeper levels of fine-tuning were suitable for polyp and PE detection,
intermediate fine-tuning worked the best for interface segmentation and
colonoscopy frame classification. Our findings led us to conclude that layer-
wise fine-tuning is a practical way to reach the best performance based on
the amount of available data.
Dataset Quality Beyond
A giant with feet of clay: on the validity of the
data that feed machine learning in medicine
Federico Cabitza, Davide Ciucci, Raffaele Rasoini last revised 26 Jun 2017
https://arxiv.org/abs/1706.06838
We point out how uncertainty is so ingrained in medicine that it
biases also the representation of clinical phenomena, that is the
very input of ML models, thus undermining the clinical
significance of their output. Recognizing this can motivate both
medical doctors, in taking more responsibility in the
development and use of these decision aids, and the
researchers, in pursuing different ways to assess the value of
these systems. In so doing, both designers and users could take
this intrinsic characteristic of medicine more seriously and
consider alternative approaches that do not "sweep uncertainty
under the rug" within an objectivist fiction, which everyone can
come up by believing as true.
5 Garbage in, Gospel out
The question of the quality of medical records and of the data
extracted from them is still understudied
[Cabitza and Batini, 2016; Stetson et al. 2012], let alone in
regard to machine learning projects [Feldman et al. 2017]. The
assumption that medical data could support secondary uses
has been challenged since almost 25 years ago, and also
strongly so, e.g., by Reiser 1991, who described several cases
of erroneous, missing and ambiguous data, and by
Burnum (1989), who provocatively wrote that “all medical
record information should be regarded as suspect; much of it is
fiction” (p. 484).
JAMA. Published online July 20, 2017. doi: 10.1001/jama.2017.7797
https://doi.org/10.1177/0272989X12465490
Conclusions: Our exploratory analysis method reveals
unexpected effects. It indicates that, despite the original
study detecting no significant average effect, computer-
aided detection (CAD) helped the less discriminating
readers but hindered the more discriminating readers.
Such differential effects, although subtle, may be clinically
significant and important for improving both computer
algorithms and protocols for their use. They should be
assessed when evaluating CAD and similar warning
systems.
EXTRA
Neuroscience
RESOURCES
Retina | Background
RETINA
A schematic view of the retina showing the organization of different neuronal populations and
their synaptic connections. Rods and cones are confined to the photoreceptor layer. Light
detected by rods and cones is processed and signalled to retinal ganglion cells (RGCs) through
horizontal, amacrine and bipolar cells. RGCs are the only output neurons from the
retina to the brain. A subset of RGCs (4–5% of the total number of RGCs) are intrinsically
photosensitive RGCs (ipRGCs) containing the photopigment melanopsin. There are at least
five subtypes of ipRGCs (M1–M5) with different morphological and electrophysiological
properties, which show widespread projection patterns throughout the brain.
LeGates et al. (2014):
“Light as a central modulator of circadian rhythms, sleep and affect”
Retinal circuits. (a) The cellular and synaptic (i.e., plexiform) layers of the retina. Some of the various
cell types composing the five classes of neurons are shown: rod and cone photoreceptors, horizontal
cells (HCs), ON and OFF cone bipolar cells (BCs), rod BCs, AII and wide-field (WF) amacrine cells
(ACs), and ON and OFF ganglion cells (GCs). The ON and OFF BC axon terminals and GC dendrites
stratify in separate halves of the inner plexiform layer. (b) Several cell types from panel a, redrawn to
illustrate how rod signals pass through the inner retina. Excitatory (+) and inhibitory (−) synapses are
shown. A gap junction (denoted by the resistor symbol) allows bidirectional current flow between AII
ACs and ON cone BCs. The AII AC splits the ON rod BC signal into ON and OFF components using
either electrical (gap junction, ON) or chemical (glycinergic, OFF) synapses. Note that in daylight
conditions, cone-mediated drive to the AII influences the OFF pathway as follows: cone → ON cone
BC → AII AC → OFF BC and GC.
Example of circuit switching. (a) The
excitatory input to an ON ganglion cell (GC) is
driven by both rod and cone circuits. The rod
circuits actually signal via the cone bipolar cell
terminal. The inhibition from the surround is
mediated by a wide-field amacrine cell (WF AC)
driven exclusively by cone circuits. (b) When the
rod circuit is active, the ON GC has a receptive field
with an excitatory center component only. When
the cone circuit is active, the inhibitory surround
component switches on.
Synaptic motifs. (a) From the perspective of a bipolar cell (pipette
attached), inhibition arising from amacrine cells (ACs) occurs via multiple
synaptic motifs. Excitatory (+) and inhibitory (−) synapses are indicated;
feedback and feedforward synapses can occur in both ON and OFF
systems, and crossover inhibition acts between ON and OFF systems. The
illustrated circuit is an ON → OFF inhibitory one, but the opposite pattern
(OFF → ON) could also occur. (b) From the perspective of a ganglion cell
(GC) (pipette attached), inhibition from ACs occurs via multiple synaptic
motifs. This panel follows the same conventions as used in panel a.
Note! Melanopsin-containing retinal ganglion cells (pRGC, ipRGC,
mRGC, the same thing) were discovered only recently in 2002 by
Berson et al. [Cited by 1956], thus you might find them missing from
textbook versions of retinal circuits
Initially they were thought to contribute mainly to sleep/alertness
and circadian rhythm regulation, but recently it has been shown that
they contribute to image-forming vision as well.
RETINA response characteristics: Spectral #1
SPECTRAL PROPERTIES
Teikari thesis (2012)
Enezi et al. 2011.
Stockmann And Sharpe (2000), CVRL
Govardovskii et al. 2000
van de Kraats and van Norren 2007
Walraven 2003 CIE Report
“For environmental light”
“At retinal level
if you would not have
ocular media”
The absorbance spectrum of an exemplary vertebrate rhodopsin (λmax
~ 500 nm), considered as a sum of absorbance bands, indicated by
alpha (a), beta (b), gamma (g), sigma (s) and epsilon (e) normalized to
the peak absorbance of the alpha-band (after
Stavenga and van Barneveld 1975, from Stavenga 2010).
The sidelobe on the short-wave side
comes from the beta band (see
template from Govardovskii et al. 2000)
Self-screening effect changes the
width/peak of the absorption spectrum. (A)
Percentage absorption spectra of various concentrations of
photopigment (OD - optical density in log units). (B) An
illustration of self-screening at various photoreceptor lengths.
The human rod photoreceptor is ~25 µm (Pugh and Lamb 2000)
and the cone photoreceptor ~13 µm (Baylor et al. 1984). The
longest known photoreceptor has been found in the dragonfly, the
length being 1,100 µm (Labhart and Nilsson 1995).
“The human crystalline lens
strongly absorbs blue light
and UV”
V′(λ) is the spectral sensitivity for night vision, and V(λ) for
daytime vision. Not shown is mesopic vision VM(λ), which is a
nonlinear combination of daytime and night vision operating in
dim-light color vision.
Quantally defined daytime sensitivity
(2º central vision, Sharpe et al., 2005):
V*(λ) = [1.891·l(λ) + m(λ)]/2.80361
where l(λ) is the long-wavelength ('red') cone sensitivity,
and m(λ) the medium-wavelength ('green') cone sensitivity
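As a sketch, this combination can be evaluated directly from tabulated cone fundamentals; the arrays below are placeholders, and the real quantal l(λ) and m(λ) tables are available from CVRL (www.cvrl.org).

```python
import numpy as np

def v_star(lbar, mbar):
    """V*(lambda) = [1.891*l(lambda) + m(lambda)] / 2.80361 (Sharpe et al., 2005)."""
    return (1.891 * lbar + mbar) / 2.80361

wavelengths = np.arange(390, 831, 1)           # nm grid used by CVRL tabulations
lbar = np.ones_like(wavelengths, dtype=float)  # placeholder: replace with the real
mbar = np.ones_like(wavelengths, dtype=float)  # quantal cone fundamentals
sensitivity = v_star(lbar, mbar)
```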
Note!
Melanopsin and S-cones do not seem to
contribute to central vision luminance perception
vs.
RGB Luminance
Stockman, A., & Sharpe, L. T. (2008).
Spectral sensitivity In The Senses: A
Comprehensive Reference, Volume 2: Vision
II (pp. 87-100)
Without the crystalline lens (aphakic eye), visual sensitivity would extend to ultraviolet (Goodeve et al., 1942)
RETINA response characteristics: Spectral #2
Dominance of L cones over S cones across species. Measured S cone proportion is shown for a
variety of animals. For some animals, two measurements at different locations on the retina are shown.
Large variation in L cone proportion indicates dorso-ventral asymmetries, like those discussed in
Szél et al. (2000).
Science  10 Jun 2011:Vol. 332, Issue 6035, pp. 1307-1312 DOI: 10.1126/science.1200172
For a short wavelength–sensitive pigment, although
its noise literally disappears at λmax < 400 nm (Fig. C),
nonspecific light absorption by proteins, peaking at
~280 nm, becomes a limiting factor. These
considerations probably explain, at least partially, why
the λmax values of native visual pigments are
confined to the narrow bandwidth of ~360 to
620 nm, limiting color vision accordingly.
Predicted thermal-noise rate constant
as a function of λmax. Black circles,
rhodopsins; red squares, cone
pigments.
http://dx.doi.org/10.1016/S0896-6273(00)80845-4
Present-day vertebrates vary enormously in the sophistication of their color vision, the density and spatial distribution of
cone classes, and the number and absorption maxima of their cone pigments (38, 30, 31 and 100). At one extreme, most
mammals have only three pigments: the two ancestral cone pigments and rhodopsin. At the other evolutionary extreme,
chickens possess six pigments: four cone pigments, one rhodopsin, and a pineal visual pigment, pinopsin.
In this evolutionary comparison, humans and their closest primate relatives represent an intermediate level of
complexity. Humans have four visual pigments (1999): a single member of the <500 nm family of cone pigments (the
blue or short-wave pigment, with an absorption maximum at ~425 nm), two highly homologous members of the >500 nm
family (the green or middle-wave pigment, and red or long-wave pigment, with absorption maxima at ~530 and ~560 nm,
respectively), and rhodopsin. The presence of only a single gene encoding a >500 nm pigment in almost all New World
primates, and in all nonprimate mammals studied to date, places the red/green visual pigment gene duplication in the Old
World primate lineage at ~30–40 million years ago, shortly after the geologic split between Africa and South America
(Jacobs 1993)
DOI:10.1098/rstb.2009.0050
The spectral tuning of vertebrate opsins will also be influenced by their evolutionary
history (Goldsmith 1990). Melanopsin acts as a bistable pigment able to regenerate
(recycle) its chromophore (11-cis-retinal) using all-trans-retinal and long-wavelength light
in a manner reminiscent of the invertebrate photopigments (Melyan et al. 2005).
In this regard melanopsin may be unique among mammalian photopigments in
forming a stable association with all-trans-retinal.
Interestingly, the melanopsins appear to share some of the key characteristics of an
invertebrate-like signal transduction pathway. Both pRGCs and cells transfected
with melanopsin show depolarizing responses to light and display chromophore
bistability/tristability (Emanuel et al. 2015), another feature of the invertebrate
photopigments. Amino acid sequence features of the melanopsin protein result in delayed
deactivation and temporal integration of the light signal (Mure et al., 2016).
RETINA response characteristics: Spectral #3
Transducing intermediate pigment states
Schematic representation of the photochemical invertebrate rhodopsin
cycle of the blowfly (Calliphora). Rhodopsin R excited by light absorption converts
to bathorhodopsin B. Thermal decay via lumirhodopsin L to metarhodopsin M
follows. The back reaction proceeds via putative intermediates K and possibly
N. Time constants of the conversion steps are indicated (Kruizinga et al. 1983).
Vertebrate rhodopsin intermediates. (A) Decay of the activated Meta II state
to Meta III. Illumination of rhodopsin’s dark state (λmax = 500 nm) produces the
Meta I/Meta II photoproduct equilibrium. By applying a second illumination, the
decay product Meta III of the second pathway can be converted back to Meta
I/Meta II (again consisting mostly of Meta II), while the decay products of the first
pathway, opsin and all-trans retinal, remain largely unreactive. (B, Bovine)
rhodopsin transduction. Activation of rhodopsin is achieved by light-dependent
isomerization of the chromophore and subsequent thermal relaxation of the
receptor on the millisecond time scale to the active receptor conformation
(Bartl and Vogel 2007).
A State Model for Tristable Melanopsin. (A) State diagram of
melanopsin (top) based on parameters measured biochemically from
purified pigment (Matsuyama et al., 2012). Shown are melanopsin (R),
metamelanopsin (M), and extramelanopsin (E) with chromophores
designated. Below are plotted the relative photosensitivities (i.e., products of
the extinction coefficients and quantum efficiencies) of these states as a
function of wavelength. (B) Predicted equilibrium fraction of each pigment
state as a function of wavelength. Lines show the R state (black), M state
(blue), and E state (red). Emanuel et al. 2015
Photoreversal of vertebrate rhodopsin (Williams 1964). Both the test flash
and the bleaching light consisted of long wavelengths primarily absorbed by
rhodopsin. The blue, photoregenerating flash contained wavelengths
absorbed by the longer-lived intermediates of the bleaching process. This
photoreversal might in practice enhance the Blue Light Hazard (Grimm et al. 2000).
Regeneration of pigment to a responsive state by a
second illumination occurs both with 'invertebrate'-like melanopsin
and vertebrate rhodopsin.
DOI: 10.1042/bj3301201
http://dx.doi.org/10.1016/j.visres.2005.12.017 | Cited by 26
Time courses of amounts of photolysis products in goldfish cones
normalized to bleached visual pigment.
Decomposition of final T- and L-spectra
of rod outer segments at 1800 s
postbleach (noisy curves) into
components. It reveals, in addition to
dehydroretinal and P480, a generation
of a small amount of dehydroretinol.
The sum of RAL, ROL, and P480 (bold
curves) provides a good approximation
of the experimental spectra.
RETINA response characteristics: Spectral #4
http://dx.doi.org/10.1038/13185
Genetic and psychophysical results from the latter class indicated
that limited red–green discrimination can be achieved with
pigments that have the same peak wavelength sensitivity and that
differ only in optical density.
Types of color blindness with their prevalence
faculty.montgomerycollege.edu
http://dx.doi.org/10.1016/j.visres.2011.08.016
sensationalcolor.com/understanding-color
www.npr.org/2014/11/16
http://www.bbc.co.uk/news/entertainment-arts-27884975
By Colin Schultz | smithsonian.com | August 20, 2012
RETINA response characteristics: Spectral #5
A tiny group of people can see ‘invisible’ colours (“tetrachromacy” - four cones,
instead of three cones) that no-one else can perceive, discovers David Robson.
How do they do it?
“Jordan’s “acid test” involved coloured discs showing different mixtures of
pigment, such as a green made of yellow and blue. The mixtures were
too subtle for most people to notice: almost all people would see the same
shade of olive green, but each combination should give out a subtly different
spectrum of light that would be perceptible to someone with a fourth cone.
Sure enough, Jordan’s subject was able to differentiate between the different
mixtures each time. “
http://www.bbc.com/future/story/20140905-the-women-with-super-human-vision
While tetrachromacy is so rare that it makes headlines every time a new case
emerges, it might come as a surprise that women with four cone types in their
retinas are actually more common than we think. Researchers estimate that they
represent as much as 12% of the female population (4). So why aren’t we
surrounded by women with extraordinary colour vision? Researchers have found
that only a small fraction of women who possess an extra cone type actually get to
enjoy more colours. So what does it take to be a true tetrachromat? How does the
human retina come to produce four cone types, and why does it only concern
women? More importantly, why don’t all women fulfil their genetic potential? And
how do we find the special women who do?
theneurosphere.com/2015/12/17
[4] Jordan, G. et al. (2010). The dimensionality of color vision in carriers of anomalous trichromacy.
 Journal of Vision, 10.
Tetrachromats are rare enough, but Concetta Antico is particularly remarkable, since, as an artist, she is able to
give us a rare view into that world. “Her artwork might tap into a structure that all of us can appreciate,” says
Kimberly Jameson at the University of California, Irvine, who has studied Antico extensively. It’s even possible
that she might suggest ways for more people to see the same way.
RETINA response characteristics: Intensity
Illumination levels. Typical ambient light levels are compared with photopic luminance (log cd m⁻²), pupil diameter (mm), photopic and scotopic retinal illuminance (log photopic and scotopic trolands, respectively) and visual function. The scotopic, mesopic and photopic regions are defined according to whether rods alone, rods and cones, or cones alone operate. The conversion from photopic to scotopic values assumed a standard CIE D65 white illuminant. (Stockman and Sharpe 2006)
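A concrete anchor for these units: retinal illuminance in trolands is, by definition, luminance (cd m⁻²) multiplied by pupil area (mm²). A minimal sketch of the conversion; the unit definition is standard, but the function name and example values are mine:

    import math

    def trolands(luminance_cd_m2: float, pupil_diameter_mm: float) -> float:
        # Retinal illuminance in (photopic) trolands:
        # T = L [cd/m^2] * pupil area [mm^2]
        pupil_area_mm2 = math.pi * (pupil_diameter_mm / 2.0) ** 2
        return luminance_cd_m2 * pupil_area_mm2

    # Example: a ~100 cd/m^2 display viewed through a ~4 mm pupil
    print(math.log10(trolands(100.0, 4.0)))  # ~3.1 log photopic trolands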
How these four separate mechanisms — photopigment
depletion, pupil contraction, cellular adaptation and response
compression — coordinate luminance adaptation is not yet
known. However, Peter Kaiser and Robert Boynton provide a
quantitative illustration of how the four principal processes
might interact, as shown below.
http://www.handprint.com/HP/WCL/color4.html
Top right: Spectral response of the eye for point sources. Peak cone sensitivity is over 200 times lower than peak rod sensitivity. Relative sensitivities of S, L and M cones are shown within the photopic mode; by combining their inputs, the brain creates colors. Bottom left: Exposed to low-light conditions in full photopic mode, cone sensitivity increases 30-100 times within ~10 minutes, reaching its maximum sensitivity level (the darker it is, the faster the transition from cone to rod function; in near-complete darkness, the cones shut down almost instantly). At the cone-rod break, rods become dominant, gaining in sensitivity some 200-1000 times over peak cone sensitivity within the next ~20 minutes (individual sensitivity varies within the shown approximate range: by a factor of ~3 and ~10 for the cones and rods, respectively). In the process, peak sensitivity shifts from ~555 nm (photopic) to ~507 nm (scotopic), and the response range shifts from ~400-730 nm to ~370-650 nm. Dark-to-light eye adaptation takes considerably less time: only about 7 minutes.
Maximum sensitivity level, after ~10 min in darkness; maximum bright-light
cone sensitivity is 30-100 times lower.
http://www.telescope-optics.net/eye_spectral_response.htm
RETINA response characteristics: Circuit
(A) Time course of the Early Receptor Potential (ERP) and Late Receptor Potential (LRP) in monkey retina compared to the ERG a-wave (redrawn from Brown and Murakami 1964). (B) Intensity dependence of the human ERP, illustrating the log-linear relationship between light intensity and pigment-level responses, and the non-linear relationship between light intensity and the a-wave response (redrawn from Debecker and Zanen 1975). Graph from Teikari thesis (2012).
The cells of the retina and their response to a spot light flash. The photoreceptors are the rods and
cones in which a negative receptor potential is elicited. This drives the bipolar cell to become either
depolarized or hyperpolarized. The amacrine cell has a negative feedback effect. The ganglion cell
fires an action pulse so that the resulting spike train is proportional to the light stimulus level. (bem.fi)
The classical photoreceptors, cones and rods, are not designed to encode absolute light levels (unlike melanopsin RGCs), and non-linearity is introduced into visual processing very early, already at the retinal level. The pigment conformational change (cis to trans) is linear in relation to light intensity, but the photoreceptor response (a-wave) is already nonlinear.
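A standard way to describe this early compressive nonlinearity is the Naka-Rushton saturation function routinely fitted to ERG intensity-response series. A hedged sketch contrasting it with the linear pigment stage; the equation is the textbook form, while the parameter values are invented for illustration:

    # Linear pigment isomerization vs saturating (Naka-Rushton) response:
    # R/Rmax = I^n / (I^n + sigma^n). Parameter values are illustrative only.
    def naka_rushton(intensity, r_max=1.0, sigma=100.0, n=1.0):
        return r_max * intensity**n / (intensity**n + sigma**n)

    for i in (10.0, 100.0, 1000.0):
        linear = i / 1000.0  # pigment activation ~ proportional to intensity
        print(f"I={i:7.1f}  linear={linear:.3f}  response={naka_rushton(i):.3f}")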
The dependence of the b-wave amplitude (solid squares) and the a-wave amplitude (open squares) on the log intensity of the light stimulus. The data points describe the mean ± 2 SD of the responses obtained in the dark-adapted state from 40 eyes of 20 volunteers with normal vision.
The relationship between the b-wave amplitude and the a-wave amplitude obtained from responses evoked in the dark-adapted state. The continuous line describes the mean relationship, while the 2 dashed lines bound the normal range (mean ± 2 SD). Open and solid triangles represent normal ERG data obtained, respectively, from papers by Berson and Weleber. Data from 2 patients are also illustrated; one patient suffers from high myopia (open circles), while the other complained of nyctalopia (solid circles).
Relationship between the amplitudes of the b-wave and the a-wave as a useful index for evaluating the electroretinogram. I. Perlman. Br J Ophthalmol 1983;67:443-448. doi:10.1136/bjo.67.7.443. Cited by 69
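Perlman's index amounts to checking a measured (a-wave, b-wave) amplitude pair against the normal band of b-wave predicted from a-wave, mean ± 2 SD. A minimal sketch of such a check; the linear band parameters below are placeholders, not the published fit:

    # Flag b-wave amplitudes falling outside the normal band (mean +/- 2 SD
    # of b predicted from a). Slope/intercept/SD here are invented numbers.
    def b_wave_in_normal_range(a_uV, b_uV, slope=2.0, intercept=0.0, sd_uV=40.0):
        predicted_b = slope * a_uV + intercept
        return abs(b_uV - predicted_b) <= 2.0 * sd_uV

    print(b_wave_in_normal_range(a_uV=150.0, b_uV=310.0))  # True: inside band
    print(b_wave_in_normal_range(a_uV=150.0, b_uV=150.0))  # False: b-wave too small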
Retina advanced processing
(2010) http://dx.doi.org/10.1016/j.neuron.2009.12.009, Cited by 266
Computations Performed by the Retina and Their Underlying Microcircuits:
(A) Detection of dim light flashes in the rod-to-rod bipolar pathway. Each photoreceptor output is sent through a band-pass temporal filter followed by a thresholding operation before summation by the rod bipolar cell (see the code sketch after this list).
(B) Sensitivity to texture motion. The bipolar
cells have biphasic dynamics and thus
respond transiently. Only the depolarized
bipolar cells communicate to the ganglion cell,
because of rectification in synaptic
transmission.
(C) Detection of differential motion. Polyaxonal amacrine cells in the periphery are excited by
the same motion-sensitive circuit and send
inhibitory inputs to the center. If motion in the
periphery is synchronous with that in the
center, the excitatory transients will coincide
with the inhibitory ones, and firing is
suppressed.
(D) Detection of approaching motion. The
circuit that generates this approach sensitivity is
composed of excitation from OFF bipolar cells
and inhibition from amacrine cells that are
activated by ON bipolar cells, at least partly via
gap junction coupling. Importantly, these inputs
are nonlinearly rectified before integration by the
ganglion cell.
(E) Rapid encoding of spatial structures with spike latencies. The responses result from a circuit that combines synaptic inputs from both ON and OFF bipolar cells whose signals are individually rectified. The timing differences in the responses follow from a delay (∆t) in the ON pathway.
(F) Switching circuit. A control signal selectively
gates one of two potential input signals. (Right) In
the retina, such a control signal is driven by certain
wide-field amacrine cells (A1), which are activated
during rapid image shifts in the periphery. Their
activation leads to a suppression of OFF bipolar
signals and, through a putative local amacrine cell
(A2), to disinhibition of ON bipolar signals.
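The motif in (A) (band-pass filter, threshold, sum) recurs in (B), (D) and (E) as nonlinear subunit pooling. A toy sketch of the pattern; the filter kernel, threshold and signals are assumptions for illustration only:

    # Filter-threshold-sum motif: each subunit is band-pass filtered and
    # rectified before the pooling cell sums them. All parameters invented.
    import numpy as np

    def bandpass(signal, kernel=np.array([0.25, 0.5, -0.75])):
        return np.convolve(signal, kernel, mode="same")  # crude biphasic filter

    def pooled_response(subunit_signals, threshold=0.1):
        rectified = [np.maximum(bandpass(s) - threshold, 0.0)
                     for s in subunit_signals]
        return np.sum(rectified, axis=0)  # nonlinear subunit pooling

    rng = np.random.default_rng(0)
    subunits = [rng.normal(0.0, 0.05, 100) for _ in range(4)]
    subunits[2][50] += 1.0                      # a dim "flash" on one subunit
    print(pooled_response(subunits).argmax())   # peak lands near sample 50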
Journal of Vision May 2008, Vol.8, 15
doi: 10.1167/8.5.15
Cited by 42
Basic data from Hofer, Singer, et al. (2005). (Top
panels) Schematics of retinal mosaics used for
the 5 observers. These are subsets of the full
regions characterized for each observer. For
each observer, the imaging and densitometry
data were insufficient to assign a class or exact
location to some cones. These parameters were
filled in according to the procedure described by
Hofer, Singer, et al. (2005). In the schematics, L
cones are colored red, M cones green, and S
cones blue. L:M ratios of mosaics used: HS 1:3.1; YY 1.2:1; AP 1.3:1; MD 1.6:1; BS 14.7:1.
Roorda (2011): Instead of eliciting three classes of response generated from the stimulation of the
three cone classes, they found that subjects demanded as many as 7 color categories. Analysis
of the responses suggested that the color appearance generated by a single cone is more a function of
how it is situated with respect to other cones rather than by its spectral subtype. Cones that are in a
position to provide strong chromatic cues generate colored percepts, whereas cones that are not in a
good position to do so generate achromatic, or white percepts. Given the random arrangement of the
three cone classes in the retina, it is sensible that the visual system would develop in this way to best
handle the dual role that retina has in conveying both spatial and color vision.
An adaptive optics system was used to measure and correct for aberrations in the optics of individual observers. This enabled resolution of individual cones in acquired fundus images.
Science Advances 14 Sep 2016:
Vol. 2, no. 9, e1600797
Retina already recurrent as well
Published: May 3, 2011
http://dx.doi.org/10.1371/journal.pbio.1001058
Published: May 3, 2011 | http://dx.doi.org/10.1371/journal.pbio.1001057
A conceptual model of positive feedback in the outer retina. (A)
Diagram depicting the differential spread of positive and negative feedback within
an HC. The top bar denotes the illumination pattern. A cone depolarized in darkness
will release glutamate, activating AMPA receptors (AMPARs), causing depolarization
and Ca2+ influx. The rise in Ca2+ is restricted to the specific dendrite that contacts
the cone, and the resulting positive feedback is localized to that cone. The
depolarization spreads electrotonically through the HC, resulting in negative
feedback from all of the dendrites. (B) Model simulations of the effect of feedback
on synaptic release from a linear array of cones exposed to a dark spot on a non-
saturating light background (see Methods). The positive feedback signal (blue)
is localized to HC dendrites in contact with dark cones while the negative
feedback signal (red) electrotonically spreads through the HCs. Traces show
simulated cone release with no feedback (green), with negative feedback, (red), and
with equally weighted negative and positive feedback (blue).
Spatial circuitry models in dark- and light-adapted conditions. A: in
dark-adapted conditions, OFF bipolar cells receive wide spatial inhibition from wide-
field GABAergic amacrine cells. Coupling between both AII and other glycinergic
amacrine cells likely contribute to increasing the wide spatial spread of glycinergic
signals to OFF bipolar cells. B: in light-adapted conditions, OFF bipolar cells receive
spatially narrow glycinergic input, likely due to uncoupling of AII and other glycinergic
amacrine cells. Light stimuli distant from the bipolar cell likely activate serial inhibitory
connections between GABAergic amacrine cells, which would shorten spatial
GABAergic signals to OFF bipolar cells. C: functional schematic of changing bipolar
cell center-surround sizes. In dark-adapted conditions, OFF bipolar cells receive
wide and strong inhibition, so their inhibitory surrounds are large. If 2 small spots
of light are presented to the retina, spot A stimulates excitatory output from the
center of one OFF bipolar cell, whereas spot B stimulates surround inhibitory
connections to that same cell. Overall output is reduced in this instance due to the
addition of inhibitory input. In light-adapted conditions, OFF bipolar cells receive
narrow and weaker inhibition, so their inhibitory surrounds are small. In these
conditions, spot B does not stimulate the inhibitory surround, and there is no
reduction in excitatory bipolar cell output from spot A. Thus the bipolar cell output in the light-adapted case is stronger.
doi:10.1152/jn.00948.2015
Neuroscience Deep Learning | Background
fMRI + EEG + behavioral multimodal data
http://dx.doi.org/10.1016/j.neuroimage.2015.12.030
Specifically, we show how combining either EEG or fMRI with a behavioral model can perform substantially better than a behavioral-data-only model in both generative and predictive modeling analyses. We then show how a trivariate model – a model including EEG, fMRI, and behavioral data – outperforms bivariate models in both generative and predictive modeling analyses.
Graphical diagram derived from Turner et al. (2016) [see previous slides for EEG+fMRI]: Observable data are represented as gray boxes, whereas unknown (latent) variables are represented as empty circles. The orange plate represents the behavioral data/model, the green plate represents the EEG data/model, and the blue plate represents the fMRI data/model. The method allows for any behavioral model to be combined with multiple neural measures.
Generative Deep Network: improve the existing generative models
MEG Visual processing with deep learning
http://dx.doi.org/10.1016/j.neuroimage.2016.03.063
Magnetoencephalography (MEG)
Image set and single-image decoding. (A) The stimulus
set comprised 48 indoor scene images differing in the size of the
space depicted (small vs. large), as well as clutter, contrast, and
luminance level; here each experimental factor combination is
exemplified by one image. The image set was based on
behaviorally validated images of scenes differing in size and
clutter level, de-correlating factors size and clutter explicitly by
experimental design (Park et al., 2015).
The deep neural network architecture “AlexNet” was implemented
following Krizhevsky et al. (2012). We chose this particular
architecture because it was the best-performing model in object classification in the ImageNet 2012 competition (Russakovsky et al., 2014).
The deep scene model accounts for more of the MEG size
signal than other models. (A) We combined representational
similarity with partial correlation analysis to determine which
computational models explained emerging representations of
scene size in the brain.
Together our data provide a first description of an electrophysiological signal for layout
processing in humans and suggest that deep neural networks are a promising
framework to investigate how spatial layout representations emerge in the human brain.
Future studies using image sets optimized to drive low- and high-level visual cortex equally are necessary to test whether layer-specific representations in deep neural networks can be mapped in both time and space onto processing stages in the human brain.
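The analysis named above, representational similarity with partial correlation, reduces to correlating a model RDM with the MEG RDM while partialling out competing model RDMs. A minimal numpy sketch under that reading; the data and variable names are placeholders, not the study's pipeline:

    # RSA with partial correlation: correlate a model RDM with the MEG RDM
    # after regressing out a competing model RDM. Random placeholder data.
    import numpy as np

    def residualize(y, x):
        # Residuals of y after least-squares regression on x (with intercept)
        X = np.column_stack([np.ones_like(x), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return y - X @ beta

    def partial_corr(meg_rdm, model_rdm, competing_rdm):
        return np.corrcoef(residualize(meg_rdm, competing_rdm),
                           residualize(model_rdm, competing_rdm))[0, 1]

    rng = np.random.default_rng(1)
    n = 48 * 47 // 2                      # upper triangle of a 48-image RDM
    deep_scene, gist, meg = rng.normal(size=(3, n))
    meg = meg + 0.5 * deep_scene          # deep scene model partly explanatory
    print(partial_corr(meg, deep_scene, gist))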
Sidenote! AlexNet was indeed revolutionary in its time, but the 2015 ImageNet winner, Microsoft's ResNet, achieved a top-5 error (3.57%) below the estimated human error (~5%) on the classification task.
Brain Circuit feed-forward vs recurrent
a | Feedforward network. The
diagram shows a multilayer perceptron,
consisting of three sequential layers of
neurons (represented by circles), in which
every neuron from each layer is
connected to every neuron of the next
layer. In this network, inputs are
sequentially processed layer by layer in a
unidirectional fashion, from the input
layer on the left, to the ‘hidden’ layer in
the middle, to the output layer on the
right. The simple addition of synaptic
weights in the output layer results in the
generation of selective responses. The
computation is an emergent property of
the activity of the entire network.
b | Recurrent network: an
example of an attractor (feedback)
neural network in which four
pyramidal neurons (blue) are
connected to themselves through
recurrent axons (thin lines) with
synaptic weights (wij) that change
owing to a learning rule. The
network receives an external set of
inputs (top connections) and
generates an output (bottom
arrows). In networks with recurrent
and symmetric connectivity the
activity becomes ‘attracted’ to
particular stable patterns.
http://dx.doi.org/10.1038/nrn3962, Cited by 49
Nature Reviews Neuroscience 11, 615-627
(September 2010)
doi:10.1038/nrn2886
The feedforward network as a model of
information processing in the brain. a | A
schematic of hierarchical processing in the visual
systems of primates. Similar schematic models
have also been described for other sensory and
motor areas. b | Each module in part a can be
considered as a recurrent network of excitatory
and inhibitory neurons. Each of the rectangular
boxes represents a recurrent random network. The
hierarchical structure of the brain is conceived
here as a network of recurrent networks
with forward and backward excitatory
connections. So far, only the feedforward part
(shown in black) of such a network of networks
has been investigated in a systematic manner.
Recurrent excitation and inhibition within one
group and excitatory synapses that do not
contribute to the feedforward hierarchy of
subsequent groups (shown in grey) have not been
considered yet.
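The contrast between the two wiring schemes can be written out in a few lines. A toy sketch; the sizes, nonlinearity and iteration count are arbitrary choices of mine:

    # a) Feedforward: each weight matrix is applied once, layer by layer.
    # b) Recurrent: one (here symmetric) matrix is iterated; with symmetric
    #    connectivity the state tends toward stable "attractor" patterns.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=8)                             # external input

    W1, W2 = rng.normal(size=(5, 8)), rng.normal(size=(3, 5))
    y_feedforward = np.tanh(W2 @ np.tanh(W1 @ x))      # one pass through layers

    W = rng.normal(size=(8, 8)); W = (W + W.T) / 2.0   # symmetric recurrent weights
    h = np.zeros(8)
    for _ in range(50):
        h = np.tanh(W @ h + x)                         # input re-injected each step
    print(y_feedforward, h[:3])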
Residual variants: state-of-the-art deep feedforward networks
https://arxiv.org/abs/1512.03385; Cited by 578
https://arxiv.org/abs/1603.05027
https://arxiv.org/abs/1602.07261
https://arxiv.org/abs/1602.07360
http://dx.doi.org/10.1007/978-3-319-46976-8_19
Skip connections
https://arxiv.org/abs/1604.08671
The framework of the proposed DEGREE network. The recurrent residual network recovers sub-bands of the HR image features iteratively, and edge features are utilized as guidance in image super-resolution (SR) for preserving sharp details.
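The common core of the residual variants cited above is the identity skip connection, output = input + F(input). A minimal sketch; F and the layer sizes are toy choices, not any specific paper's architecture:

    # Minimal residual block with an identity shortcut: y = x + F(x).
    # F here is a toy linear-ReLU-linear transform; sizes are arbitrary.
    import numpy as np

    def residual_block(x, W1, W2):
        f = W2 @ np.maximum(W1 @ x, 0.0)   # residual branch F(x)
        return x + f                       # identity skip connection

    rng = np.random.default_rng(0)
    x = rng.normal(size=16)
    W1 = rng.normal(size=(16, 16)) * 0.1
    W2 = rng.normal(size=(16, 16)) * 0.1
    print(residual_block(x, W1, W2)[:4])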
Circuit design deep networks vs Human brain #1
Center for Data Science, New York University
Department of Brain and Cognitive Sciences, MIT
Department of Psychology and Center for Brain Science, Harvard University
Center for Brains Minds and Machines
https://arxiv.org/abs/1604.00289; Cited by 13
Science 11 Dec 2015:
Vol. 350, Issue 6266, pp. 1332-1338
DOI: 10.1126/science.aab3050; Cited by 70
Circuit design deep networks vs Human brain #2
https://arxiv.org/abs/1604.03640
Center for Brains, Minds and Machines, McGovern Institute, MIT
How similar is an ultra-deep residual network to the primate cortex? A notable difference is the depth. While a residual network can have as many as 1202 layers, biological systems seem to have two orders of magnitude fewer. In fact, there are about half a dozen areas in the ventral stream of visual cortex from the retina to the Inferior Temporal cortex. Notice that it takes on the order of 10 ms for neural activity to propagate from one area to the next. The evolutionary advantage of having fewer layers is apparent: it supports rapid (100 ms from image onset to meaningful information in the IT neural population) visual recognition, which is a key ability of human and non-human primates. It is intriguingly possible to account for this discrepancy by taking into account recurrent connections within each visual area. Areas in visual cortex comprise six different layers with lateral and feedback connections, which are believed to mediate some attentional effects and even learning (such as backpropagation). "Unrolling" in time the recurrent computations carried out by the visual cortex provides an equivalent "ultra-deep" feedforward network, which might represent a more appropriate comparison with the state-of-the-art computer vision models.
In addition, we conjecture that the effectiveness of recent "ultra-deep" neural networks primarily comes from the fact that they can efficiently model the recurrent computations required by the recognition task. We show compelling evidence for this conjecture by demonstrating that 1. a deep residual network is formally equivalent to a shallow RNN; 2. such an RNN with weight sharing, and thus with orders of magnitude fewer parameters (depending on the unrolling depth), can retain most of the performance of the corresponding deep residual network. Furthermore, we generalize such an RNN into a class of models that are more biologically plausible models of cortex and show their effectiveness on CIFAR-10.
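Point 1 of the conjecture, that a residual network is formally a shallow RNN unrolled in time, can be made concrete by reusing one residual transform across all "layers". A sketch of that equivalence, not the authors' code; all values are toy:

    # ResNet <-> shallow-RNN equivalence: unrolling h_{t+1} = h_t + F(h_t)
    # for T steps with ONE shared F gives a T-layer residual network whose
    # blocks all share weights. Toy F, toy sizes.
    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(16, 16)) * 0.02   # single weight matrix, reused

    def F(h):
        return W @ np.maximum(h, 0.0)      # shared residual branch

    h = rng.normal(size=16)                # input state
    for _ in range(30):                    # unrolling depth = network depth
        h = h + F(h)                       # identical update to a residual block
    print(h[:4])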
The transition matrices used in the paper. "BN" denotes Batch Normalization and "Conv" denotes convolution. A deconvolution layer (denoted by "Deconv") is used [34] as a transition function from a spatially small state to a spatially large one. BRCx2/BRDx2 denotes a BN-ReLU-Conv/Deconv-BN-ReLU-Conv/Deconv pipeline (similar to a residual module). There is always a 2x2 subsampling/upsampling between nearby states (e.g., V1/h1: 32x32, V2/h2: 16x16, V4/h3: 8x8, IT: 4x4). Stride 2 (convolution) or upsampling 2 (deconvolution) is used in transition functions to match the spatial sizes of input and output states. The intermediate feature sizes of transition functions BRCx2/BRDx2 or BRCx3/BRDx3 are chosen to be the average feature size of the input and output states. "+I" denotes an identity shortcut mapping. The design of transition functions could be an interesting topic for future research.
Circuit design deep networks vs Human brain #3
HYPOTHESIS & THEORY ARTICLE
Front. Comput. Neurosci., 14 September 2016 | http://dx.doi.org/10.3389/fncom.2016.00094; Cited by 5
Putative differences between conventional and brain-like neural network
designs. (A) In conventional deep learning, supervised training is based on externally-supplied,
labeled data. (B) In the brain, supervised training of networks can still occur via gradient descent
on an error signal, but this error signal must arise from internally generated cost functions.
(C) Internally generated cost functions and error-driven training of cortical deep networks form
part of a larger architecture containing several specialized systems. Although the trainable cortical
areas are schematized as feedforward neural networks here, LSTMs or other types of recurrent
networks may be a more accurate analogy, and many neuronal and network properties such as
spiking, dendritic computation, neuromodulation, adaptation and homeostatic plasticity, timing-
dependent plasticity, direct electrical connections, transient synaptic dynamics,
excitatory/inhibitory balance, spontaneous oscillatory activity, axonal conduction delays (
Izhikevich, 2006) and others, will influence what and how such networks learn.
This article is part of the Research Topic: Artificial Neural Networks as Models of Neural Information Processing
Machine learning and neuroscience speak different languages today. Brain science has
discovered a dazzling array of brain areas (Solari and Stoner, 2011), cell types, molecules,
cellular states, and mechanisms for computation and information storage. Machine
learning, in contrast, has largely focused on instantiations of a single principle: function
optimization.
We will argue here, however, that neuroscience and machine learning are again ripe for
convergence. Three aspects of machine learning are particularly important in the context
of this paper.
Hypothesis 1 – The Brain Optimizes Cost Functions
Hypothesis 2 – Cost Functions Are Diverse across Areas and Change over
Development
Hypothesis 3 – Specialized Systems Allow Efficient Solution of Key Computational
Problems
Machine learning may be equally transformed by neuroscience. Within the brain, a
myriad of subsystems and layers work together to produce an agent that exhibits general
intelligence.
Hypothesis 1 – Existence of Cost Functions
Hypothesis 2 – Biological Fine-structure of Cost Functions
Hypothesis 3 – Embedding within a Pre-structured Architecture
Hypothesis 1 – Did Evolution Separate Cost Functions from Optimization Algorithms?
We hypothesize that the brain also acquired such a separation between
optimization mechanisms and cost functions. When did the division
between cost functions and optimization algorithms occur? How is this
separation implemented? How did innovations in cost functions and
optimization algorithms evolve? And how do our own cost functions and
learning algorithms differ from those of other animals?
Data-driven Ophthalmology

  • 2. Introduction ● Purpose of this presentation is to provide a light visual literature review on “big data” or deep learning / artificial intelligence solutions to come for ophthalmology and vision sciences. – More with an idea to introduce topics that you might have not thought of before without going to deeply to details Some of the background in order to understand this presentation better are covered in my previous presentation → ● Presentation itself is quite dense, and better suitable to be read from a tablet/desktop rather than as a slideshow projected somewhere Shallow introduction for Deep Learning Retinal Image Analysis Published on Aug 20, 2016 https://www.slideshare.net/PetteriTeikariPhD/shallow-introduction -for-deep-learning-retinal-image-analysis
  • 4. Ophthalmic IMAGING 2D Fundus 3D OCT→ Examples of color and high-dynamic-range (HDR) disc photographs of 2 normal controls (a, b and c, d) and 2 glaucoma patients (e, f and g, h). Left column (a, c, e, and g) color disc photograph and right column (b, d, f, and h) high-dynamic-range concept disc photograph. https://doi.org/10.1155/2017/8209270 Linear-scale adaptive optics (AO)-Optical Coherence Tomography (OCT) volume acquired with three different AO focus depths (RNFL, OPL, and IS/OS) and combined for displaying appearance of retinal layers in AO-OCT images. En face images are projections of subvolumes shown in the middle, demonstrating the fine-depth sectioning ability of AO-OCT. (Jonnal et al., 2016) Optical Coherence Tomography (OCT) and its variants, the de facto standard for eye diagnostics Multispectral imaging going beyond RGB channels and laser-based OCTs (Figure from Annidis)
  • 5. Ophthalmic IMAGING (A)SLO and multimodal systems (2015) https://doi.org/10.1364/BOE.6.001407 (2016) https://doi.org/10.1364/BOE.7.001783 https://doi.org/10.1007/s00417-016-3361-7 Fundus autofluorescence, microperimetry and hyperreflective intraretinal spor (HRS) analysis using OCT
  • 6. Ophthalmic IMAGING Functional Imaging http://dx.doi.org/10.1167/iovs.16-21389 http://dx.doi.org/10.1167/iovs.16-20598 Model of the retinal vasculature represented by a binary tree. The vessels bifurcate in a dichotomous manner except for the precapillaries, which are point of origin of four capillaries. Adapted from Takahashi et al. (2009) http://dx.doi.org/10.1111/aos.13365 http://dx.doi.org/10.1080/02713683.2016.1217544 KEYWORDS: Hyperspectral retinal camera, primary open-angle glaucoma, retinal oxygen saturation http://dx.doi.org/10.1167/iovs.13-12124 The average arteriolar (left) and venular (right) OD values at each given (5-nm) imaged wavelength from 500 to 600 nm for all of the volunteers. In summary, this article has described a novel hyperspectral prototype for spectral imaging of the retina that can potentially be used in the future to acquire retinal vessel blood oxygen saturation values. By considering the limitations of ocular imaging encountered by other retinal oximetry studies, namely longer acquisition and exposure times, flash exposure, and limited wavelength intervals, this new instrument may be promising in acquiring more refined and faster measurements of nonflash exposure retinal oximetry measurements in vivo that can potentially be applied to human retinal vascular disease.
  • 7. Ophthalmic IMAGING portable imaging Human Factor and Usability Testing of a Binocular OCT System - EASE Study Reena Chopra1 , Padraig J. Mulholland1, 2 , Adam M. Dubis1 , Roger S. Anderson1, 2 , Pearse A. Keane1 1 NIHR Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology, London, United Kingdom; 2 Optometry and Vision Science Research Group, School of Biomedical Sciences, Ulster University, Coleraine, Northern Ireland, United Kingdom Automated quantitative pupillometry using the Binocular OCT Purpose: A prototype binocular optical coherence tomography (OCT) device has recently been developed that performs ‘whole-eye’ OCT imaging in an automated manner (Envision Diagnostics, Inc. USA). The inclusion of ‘smart technology’ such as customizable display screens and voice recognition also permits the quantitative assessment of visual acuity (VA), visual fields, ocular motility, and pupillometry (Fig. 1). As this device will primarily be used in elderly and visually impaired populations, we performed prospective usability testing of an early prototype with a view to predicting function in a clinical setting, and to identify any potential user errors – EASE Study (ClinicalTrials.gov Identfier: NCT02822612). ARVO 2017 Annual Meeting Abstracts Session 516: Advancements in OCT Ophthalmologica 2017;238:89-99https://doi.org/10.1159/000475773 http://dx.doi.org/10.15761/NFO.1000102 Fundus Photography in the 21st Century—A Review of Recent Technological Advances and Their Implications for Worldwide Healthcare Panwar Nishtha, Huang Philemon, Lee Jiaying, Keane Pearse A., Chuan Tjin Swee, Richhariya Ashutosh, Teoh Stephen, Lim Tock Han, and Agrawal Rupesh. Telemedicine and e-Health. March 2016, 22(3): 198-208. https://doi.org/10.1089/tmj.2015.0068 iCam, 3nethra, CenterVue, iOptics EasyScan, Topcon TRC-NW8FPLUS, Zeiss Visucam 200, Kowa Nonmyd7, Canon CR-2, Oculus Imagecam, iExaminer, PanOptic, Volk Pictor, VersaCam, JedMed Horus Scope, Optomed Smartscope, Kowa Genesis-D, Riester, Ocular Cellscope, PEEK, dEye
  • 8. Retinal Layer Segmentation Pathological retina challenging still https://arxiv.org/abs/1704.02161 https://arxiv.org/abs/1707.04931 Branch Residual U-Network (BRU-net) https://doi.org/10.1364/BOE.8.003292 https://doi.org/10.1364/BOE.8.001926 Voxeleron Awarded NIH SBIR Grant for Device-independent Retinal OCT Image Analysis Software February 8, 2017 Daniel Russakoff Voxeleron will collaborate with Professor Pablo Villoslada of UCSF/IDIBAPS and Dr. Pearse Keane of Moorfields Eye Hospital to validate the algorithms and ensure clinical utility. in the choriocapillaris is shown. https://www.voxeleron.com/orion/
  • 10. Other Retinal segmentation & Detection Christos Bergeles, Adam M. Dubis, Benjamin Davidson, Melissa Kasilian, Angelos Kalitzeos, Joseph Carroll, Alfredo Dubra, Michel Michaelides, and Sebastien Ourselin Biomedical Optics Express Vol. 8, Issue 6, pp. 3081-3094 (2017) https://doi.org/10.1364/BOE.8.003081 https://arxiv.org/abs/1706.03008 (2017) https://doi.org/10.1109/ISBI.2017.7950704 Suman Sedai, Ruwan Tennakoon, Pallab Roy Khoa Cao and Rahil Garnavi IBM Research - Australia, Melbourne, VIC, Australia localization of the fovea, second stage produces an accurate segmentation of the fovea region. We present an algorithm that automatically detects cones in AOSLO split-detection images without supervision. Our algorithm is among the first that use machine learning to develop and use a photoreceptor model on-the-fly. Comparing to Cunefare et al. (2016), specifically, the approach presented here can tackle both densely and sparsely populated photoreceptor images as it is independent of the spatial arrangement of cones. Further, it introduces contrast enhancement filters, which improve the quality of low signal-to- noise ratio (SNR) images. m
  • 11. Optic disc and Cup segmentation or detection https://arxiv.org/abs/1704.00979 Visual comparison of the predicted results and correct segmentation on RIM-ONE v.3 for the optic disc (a)-(c), (g)-(i) and cup (d)-(f), (j)-(l). On (d)-(f), (j)-(l) region of the optic disc is shown as an input image. https://doi.org/10.1109/TPAMI.2016.2577031 https://arxiv.org/abs/1707.06397 We propose a simple yet effective method, termed Deep Descriptor Transforming (DDT), for evaluating the correlations of descriptors and then obtaining the category- consistent regions, which can accurately locate the common object in a set of unlabeled images, i.e., unsupervised object discovery.
  • 12. IMAGE CLASSIFICATION #1 July–August, 2017 Volume 1, Issue 4, Pages 322–327 Cecilia S. Lee, MD, Doug M. Baughman, BS, Aaron Y. Lee, MD, MSCI Department of Ophthalmology, University of Washington School of Medicine, Seattle, Washington. http://dx.doi.org/10.1016/j.oret.2016.12.009 Examples of identification of pathology by the deep learning algorithm. Optical coherence tomography images showing age-related macular degeneration (AMD) pathology (A, B, C) are used as input images, and hotspots (D, E, F) are identified using an occlusion test from the deep learning algorithm. The intensity of the color is determined by the drop in the probability of being labeled AMD when occluded. An occlusion test (Zeiler and Fergus, 2016) was performed to identify the areas contributing most to the neural network's assigning the category of AMD. A blank 20 × 20-pixel box was systematically moved across every possible position in the image and the probabilities were recorded. The highest drop in the probability represents the region of interest that contributed the highest importance to the deep learning algorithm. Varun Gulshan, PhD1; Lily Peng, MD, PhD1; Marc Coram, PhD1; et al JAMA. 2016;316(22):2402-2410. doi:10.1001/jama.2016.17216 Validation Set Performance for All-Cause Referable Diabetic Retinopathy in the EyePACS-1 Data Set (9946 Images) Performance of the algorithm (black curve) and ophthalmologists (colored circles) for all-cause referable diabetic retinopathy, defined as moderate or worse diabetic retinopathy, diabetic macular edema, or ungradable image. The black diamonds highlight the performance of the algorithm at the high-sensitivity and high-specificity operating points.
  • 13. IMAGE CLASSIFICATION #2 Stefanos Apostolopoulos, Carlos Ciller, Sandro I. De Zanet, Sebastian Wolf, Raphael Sznitman https://arxiv.org/abs/1610.03628 Ahmed ElTanboly,Marwa Ismail, Ahmed Shalaby, Andy Switala, Ayman El-Baz, Shlomit Schaal, Georgy Gimel’farb,Magdi El-Azab First published: 17 March 2017 DOI: 10.1002/mp.12071 https://doi.org/10.1146/annurev-bioeng-071516-044442
  • 14. IMAGE Quality in image classification Image Restoration: From Sparse and Low-rank Priors to Deep Priors Learning Deep CNN Denoiser Prior for Image Restoration Lei Zhang,, Wangmeng Zuo The Hong Kong Polytechnic University, Harbin Institute of Technology CLEAN GAUSSIAN NOISE GAUSSIAN BLUR Example performance of quality resilient networks on various quality distortions. This table shows the class prediction for an image under several different types of distortions (from top to bottom: clean, Gaussian noise and Gaussian blur). The original VGG16 network (Mclean ) fails on distorted images. Networks fine-tuned on different types of distortions perform well on that particular distortion, but not on other distortion types (Mnoise and Mblur ). Our mixture of experts based model (Mmix ) performs well over all distortion types as well as the original clean image. https://arxiv.org/abs/1703.08119 https://arxiv.org/abs/1611.05760 State-of-the-art image classification networks like VGG-16 perform poorly on blurred input (left), when using model weights trained on high-quality sharp image datasets (center). However, while they often make erroneous predictions in terms of the most likely classes for a blurred image, they do so with lower confidence—producing distributions that are higher-entropy than those for sharp images. However, this drop in performance is largely an artifact of being trained without any blurred examples. We find that by fine-tuning the model on a mix of blurred and sharp images for just a few epochs, allows it to perform well on both sharp and blurred inputs (right).
  • 15. IMAGE Restoration enhancement Deep Bilateral Learning for Real-Time Image Enhancement MICHAËL GHARBI, MIT CSAIL; JIAWEN CHEN, Google Research; JONATHAN T. BARRON, Google Research; SAMUEL W. HASINOFF, Google Research; FRÉDO DURAND, MIT CSAIL / Inria, Université Côte d’Azur, http://dx.doi.org/10.1145/3072959.3073592 https://arxiv.org/abs/1707.02880 Our novel neural network architecture can reproduce sophisticated image enhancements with inference running in real time at full HD resolution on mobile devices. It can not only be used to dramatically accelerate reference implementations, but can also learn subjective effects from human retouching. Image Restoration: From Sparse and Low-rank Priors to Deep Priors Lei Zhang,, Wangmeng Zuo The Hong Kong Polytechnic University, Harbin Institute of Technology https://arxiv.org/abs/1704.03264 Kai Zhang ; Wangmeng Zuo ; Yunjin Chen ; Deyu Meng ; Lei Zhang https://doi.org/10.1109/TIP.2017.2662206 An example to show the capacity of our proposed model for three different tasks (denoising, super-resolution, JPEG image deblocking). The input image is composed by noisy images with noise level 15 (upper left) and 25 (lower left), bicubically interpolated low-resolution images with upscaling factor 2 (upper middle) and 3 (lower middle), JPEG images with quality factor 10 (upper right) and 30 (lower right). Note that the white lines in the input image are just used for distinguishing the six regions, and the residual image is normalized into the range of [0, 1] for visualization. Even the input image is corrupted with different distortions in different regions, the restored image looks natural and does not have obvious artifacts.
  • 16. IMAGE CLASSIFICATION Jointly with image restoration https://arxiv.org/abs/1706.04284 https://arxiv.org/abs/1701.06487 (a) The whole ground truth image 0051x4 from DIV2K dataset. We show the comparison of the zoom-in region between: (b) the ground truth; (c) the noisy image with i.i.d. Gaussian noise of zero mean and σ = 30; (d) the denoised image by BM3D ; the denoising result of our proposed denoising network (e) without the guidance of high-level vision information; (f) with the guidance of high-level vision information Our experimental results demonstrate that the proposed architecture not only yields superior image denoising results preserving fine details, but also overcomes the performance degradation of different high-level vision tasks, e.g., image classification and semantic segmentation, due to image noise or artifacts caused by conventional denoising approaches such as over-smoothing. We propose a novel end-to-end differentiable architecture for joint denoising, deblurring, and classification that makes classification robust to realistic noise and blur. The proposed architecture dramatically improves the accuracy of a classification network in low light and other challenging conditions, outperforming alternative approaches such as retraining the network on noisy and blurry images and preprocessing raw sensor inputs with conventional denoising and deblurring algorithms
  • 17. UNCERTAINTY in image enhancement https://arxiv.org/abs/1705.00664 In this work, we investigate the value of uncertainty modelling in 3D super- resolution with convolutional neural networks (CNNs). However, the highly ill- posed nature of such problems results in inevitable ambiguity in the learning of networks. We propose to account for intrinsic uncertainty through a per-patch heteroscedastic noise model and for parameter uncertainty through approximate Bayesian inference in the form of variational dropout. We demonstrate through experiments on both healthy and pathological brains the potential utility of such an uncertainty measure in the risk assessment of the super-resolved images for subsequent clinical use. This paper proposes a new implementation of supervised image quality enhancement method referred as Bayesian image quality transfer (IQT). via CNNs. This involves two key innovations in CNN-based models: 1) we extend the subpixel CNNs previously limited to 2D images, to 3D volumes, outperforming previous models in accuracy and speed on a DTI SR task; 2) we devise new architectures enabling estimates of different components of the uncertainty in the SR mapping
  • 19. Sparsity and Model compressability We thoroughly explored the granularity of sparsity with experiments on detailed accuracy-density relationship. Due to the advantage of index saving, coarse-grained pruning is able to achieve a higher model compression ratio, which is desirable for mobile implementation. We also analyzed the hardware implementation advantages and show that coarse-grained sparsity saves 2× output∼ memory access compared with fine- grained sparsity, and ∼ 3× compared with dense implementation. Given the advantages of simplicity and efficiency from a hardware perspective, coarse-grained sparsity enables more efficient hardware architecture design of deep neural networks.
  • 20. Towards multimodal models Combining structuralandfunctionaldata
  • 21. Future of OCT and retinal biomarkers From Schmidt-Erfurth et al. (2016): “The therapeutic efficacy of VEGF inhibition in combination with the potential of OCT-based quantitative biomarkers to guide individualized treatment may shift the medical need from CNV treatment towards other and/or additional treatment modalities. Future therapeutic approaches will likely focus on early and/or disease-modifying interventions aiming to protect the functional and structural integrity of the morphologic complex that is primarily affected in AMD, i.e. the choriocapillary - RPE – photoreceptor unit. Obviously, new biomarkers tailored towards early detection of the specific changes in this functional unit will be required as well as follow-up features defining the optimal therapeutic goal during extended therapy, i.e. life-long in neovascular AMD. Three novel additions to the OCT armamentarium are particularly promising in their capability to identify the biomarkers of the future:” Polarization-sensitive OCT OCT angiography Adaptiveopticsimaging “this modality is particularly appropriate to highlight early features during the pathophysiological development of neovascular AMD Findings from studies using adaptive optics implied that decreased photoreceptor function in early AMD may be possible, suggesting that eyes with pseudodrusen appearance may experience decreased retinal (particularly scotopic) function in AMD independent of CNV or RPE atrophy.” “...the specific patterns of RPE plasticity including RPE atrophy, hypertrophy, and migration can be assessed and quantified). Moreover, polarization-sensitiv e OCT allows precise quantification of RPE-driven disease at the early stage of drusen”, “Angiographic OCT with its potential to capture choriocapillary, RPE, and neuroretinal fetures provides novel types of biomarkers identifying disease pathophysiology rather than late consecutive features during advanced neovascular AMD.”” Schlanitz et al. (2011) zmpbmt.meduniwien.ac.at See also Leitgeb et al. (2014) Zayit-Soudry et al. (2013)
  • 22. Multimodal models in general in medicine https://dx.doi.org/10.1097%2FWCO.0000000000000460 Imaging plus X: multimodal models of neurodegenerative disease Neil P. Oxtoby and Daniel C. Alexander, for the EuroPOND consortium Old paradigm disease progression models. (a) It shows the hypothetical model of Jack et al. (2010), which illustrates qualitative sigmoid evolution in AD of scalar biomarkers such as CSF Aβ level, cognitive test scores and hippocampal volume or atrophy. The lack of quantitative information prevents direct diagnostic usage. (b) It shows a traditional longitudinal model of AD atrophy Scahill et al. (2002) by binning individuals a-priori into ‘mild’, ‘moderate’ and ‘severe’ classes based on cognitive test scores. The model can potentially match new individuals to the same stages using imaging data, but must exclude cognitive scores to avoid circularity. AD, Alzheimer's disease. The temporally continuous self-modelling regression approach of Jedynak et al. (2012). The model shows the characteristic trajectories of a diverse set of biomarkers against a common continuous disease stage variable learned from the ADNI and PAQUID (Personnes Agées Quid) data sets. The model can potentially estimate the disease stage of a new patient by identifying the position along the trajectory set that best matches their data. ADNI, Alzheimer's disease neuroimaging initiative. We have reviewed data-driven model-based analyses of neurodegenerative disease. We have argued the potential for generative data-driven models to take centre stage in the study and management of neurodegenerative diseases if we are to generate new avenues for disease understanding in the earliest, preclinical stages. This is necessitated by the challenges in monitoring any neurological disease over its full time course, coupled with overlapping phenotypes and lack of a single biomarker that is dynamic across the full disease time course. The main focus of development and application to date has been in Alzheimer's disease, but various efforts including the EuroPOND project are expanding the application to other dementias, multiple-sclerosis, prion diseases, normal ageing and development, and even non-brain applications. These techniques have the potential for widespread impact in realising precision medicine across many such domains.
  • 23. Retina as deep learning network Photoreceptor layer Horizontal Cells BipolarCells AmacrineCells GanglionCell layer DL Layer1 DLLayer2 DL Layer3 DL Layer4 DLLayer5 LIGHT BRAIN With enough data, we can do densely connected (i.e. every layer is connected to every other layer) feedforward network (or even recurrent) not having to constrain the network as all the modulatory pathways are notwell known https://arxiv.org/abs/1608.06993; Cited by 29 Joint training of alllayers with layer-wise targets derived from ERGand pupillometry OPN4 https://arxiv.org/abs/1409.5185;Citedby292  Forexample, glaucoma affectsganglion cell function, whereas retinitis pigmentosa affects photoreceptors DL-Deeplearning OPN4- Melanopsin (ipRGC)
  • 24. Retina (and V1) as deep learning network DOI: 10.13140/RG.2.2.27438.72003 12/2016, Conference: NIPS 2016 Workshop - Brains and Bits: Neuroscience Meets Machine Learning, Riccardo Volpi, Istituto Italiano di Tecnologia; Matteo Zanotto; Diego Sona,: Vittorio Murino International Work-Conference on the Interplay Between Natural and Artificial Computation IWINAC 2017: Natural and Artificial Computation for Biomedicine and Neuroscience pp 464-472 Towards a Deep Learning Model of Retina: Retinal Neural Encoding of Color Flash Patterns Antonio Lozano. Javier Garrigós, J. Javier Martínez, J. Manuel Ferrández, Eduardo Fernández https://doi.org/10.1007/978-3-319-59740-9_46 https://arxiv.org/abs/1702.01825 Visualizing the internal activity of a CNN in response to a natural scene stimulus. (A-C) Time series of the CNN activity (averaged over space) for the first convolutional layer (8 units, A), the second convolutional layer (16 units, B), and the final predicted response for an example cell (C, cyan trace). The recorded (true) response is shown below the model prediction (C, gray trace) for comparison. (D) Spatial activation of example CNN filters at a particular time point. The selected stimulus frame (top, grayscale) is represented by parallel pathways encoding spatial information in the first (purple) and second (green) convolutional layers (a subset of the activation maps is shown for brevity). (E) Autocorrelation of the temporal activity in (A-C). The correlation in the recorded firing rates is shown in gray https://doi.org/10.1101/120956 Furthermore, the composite nonlinear computation performed by retinal circuitry corresponds to a boolean OR function applied to bipolar cell feature detectors. Our general computational framework may aid in extracting principles of nonlinear hierarchical sensory processing across diverse modalities from limited data. https://arxiv.org/abs/1706.06208
  • 25. Retina Model synthesis as Deep learning architecture Indirectinferenceonretinalcircuit: Hardtorecordeveryintermediatestepin humans INPUT Light OUTPUT Pupil size McDougal and Gamlin 2008 AUXILIARY OUTPUT functional MRI (fMRI) Temporal transfer functions for the postreceptoral cone pathways.Spitschanet al. (2016). Seealso Hung etal. (2016).The original responses from the achromatic luminance experiments and their derived PCA waveforms. The results of the component analysis illustrate that the pupil response can be described quite well as a linear sum of a sustained and a transientcomponent. - Young etal. (1993) Maynard et al. (2015) INTERMEDIATE OUTPUT Electroretinography (ERG) (left) Proposed neural pathways andsynapticmechanisms underlying ipRGC influence on light adaptation (right) M1 ipRGCs modulate the light-adapted ERG b-wave viaD4dopaminereceptors– Priggeetal. (2016) Multifocal Electroretinogram (UC Davis) The relative spectral sensitivities of the five photoreceptors in human retina, including S-, M-, L-cones, rods, and ipRGCs (A), LED spectral distributions (B), and LED chromaticities in 1964 CIE 10°space(C).- Cao etal. (2015) Deeplearningframework forphototransduction studies, and clinicaldiagnosisdecisionsupportsystems
  • 26. Retina Model synthesis Photoreceptor contributions #1: ERG INPUT Light OUTPUT Pupil size ? Not done in the study by Allen et al. (2016)  INTERMEDIATE OUTPUT Electroretinography (ERG) Vary the light parameters (intensity, wavelight, modulation) to probe what are the 'normal' responses either in visual processing/phototransduction in 'basic science' paradigms, or alternatively employ light parametersthatbestdiscriminate between retinalpathologies. Note! In optimally constructed model with more parameters (more explicit retinal circuitry), one could infer all possible outcomes (pathologicalornot)fromtheframework.Butinpracticewearelimitedtothedataavailable. For example if glaucoma is shown to be detected well using PLR, we could extend that dataset with using same protocol and simultaneously record ERG, visual fields, etc, and then have more complete model, and then have “good” predictive power with ERGandvisualfieldaloneifPLRisnotpossible todo. Rod and cone ERGsover mesopic irradiances. Allenetal.(2016)  Stimulusdesignand quantification. The output of athree-primaryLED light source (peak emission at 354, 460, and 600 nm) wasused to generatefour spectra, with precise excitation of melanopsin, rod, SWS, and LWSopsins. Allenetal.(2016)  Normalized b-wave amplitudes(G), implicit times(H), and OP amplitudes (I) for light-adapted cone ERGsin Opn1mwR mice for pairsof rod-divergent stimuli(blackfilled circles are rod/mel- lowand grayopen circles are rod/mel-high) withstimulusintensityquantified intermsof rod effective photons/cm2/s. - Allen et al. (2016)  We now have the 'pure photoreceptor' response (well, you know Ray), and if these responses are normal but PLR abnormal, we could assume that the problem is downstream giving hints about the given pathology
• 27. ERG Methodological background #1 Bingyao Tan; Erik Mason; Benjamin MacLellan; Kostadinka K. Bizheva. IOVS March 2017, Vol.58, 1673-1681. doi:10.1167/iovs.17-21543. Comparison of the changes in the total axial retinal blood flow (RBF) and the ERG b-wave magnitude resulting from 200-ms single flash and 1-second, 10 Hz, 20% duty cycle flicker stimuli of the same illumination intensity. (A) Representative ERG traces. The pink and gray shaded areas mark the duration of the visual stimuli. Original time recordings of the total axial RBF in response to the single flash and flicker stimuli. Pedro Monsalve; Giacinto Triolo; Jonathon Toft-Nielsen; Jorge Bohorquez; Amanda D. Henderson; Rafael Delgado; Edward Miskiel; Ozcan Ozdamar; William J. Feuer; Vittorio Porciatti. Translational Vision Science & Technology May 2017, Vol.6, 5. doi:10.1167/tvst.6.3.5. A new PERG method with increased dynamic range allows recording of retinal ganglion cell function in advanced stages of optic nerve disorders. It also quantifies the response decline during the test, an autoregulatory adaptation to metabolic challenge that decreases with age and presence of disease. Here we describe a new method for steady-state PERG recording in humans based on a visual display unit built with Light-Emitting Diode (LED) technology, skin electrodes, and optimized signal processing to quantify response adaptation (dubbed PERGx as a contraction of PERGnext). We show that, compared to a validated method, the PERGx has a very high signal-to-noise ratio (SNR); this suggests that meaningful responses can be recorded in advanced stages of diseases such as nonarteritic ischemic optic neuropathy (NAION). PERGx temporal dynamics and intrinsic variability in a representative normal subject. (A) The amplitude of PERGx samples (blue circles, 16 consecutive partial averages of 64 epochs each over 2 minutes) progressively declined (adapted) with a slope of −0.031 μV/sample (R2 = 0.48), whereas the PERGx phase (red circles) was stationary. (B) Polar diagram displaying combined amplitude and phase of PERG samples (open black circles) and noise samples (open grey triangles). The PERG amplitude (1.65 μV) is represented by the length of the vector connecting the origin of the axes with the cluster centroid. The PERG phase (63.6°) is represented by the angle Φ between the vector and the x-axis.
• 28. ERG Methodological background #2 https://doi.org/10.1007/s10633-017-9593-y Discrete Wavelet Transform (DWT) analysis applied to the mfERG response from a control (left) and a patient (right). Top: graphical representation of the 2F-mfERG M-sequence used here (MOFOFO), with frames displaced in time in order to better correspond visually to the recorded response. The original signal from one hexagon of the mfERG (waveform inside box on top) can be decomposed into many frequency levels, depending on the length of the time series. The first level (1211 Hz) corresponds to high frequencies (noise), while the highest level (11 Hz) corresponds to the lowest frequencies. For each frequency level, the vertical lines represent individual wavelet coefficients. For each level, the variance between these coefficients is computed and subjected to further analysis as the WVA (wavelet variance). Legend: DC direct component; IC1 first induced component; IC2 second induced component. The entire process of retinal visual processing involves the phototransduction cascade with different groups of cells and circuits from the photoreceptors to the ganglion cells. Thus, electrical signals produced by different biological structures contribute to the retinal response of the mfERG that is recorded from the cornea [Hood et al. (2002); Luo et al. (2011)]. In the standard mfERG, amplitude and implicit time are often analyzed [Hood et al. (2012)]. Early glaucoma: Dilru C Amarasekera BS, Arthur F Resende MD, Michael Waisbourd MD, Sanjeev Puri MD, Marlene R Moster MD, Lisa A Hark PhD, L Jay Katz MD, Scott J Fudemberg MD, Anand V Mantravadi MD. First published: 20 July 2017, DOI: 10.1111/ceo.13006. Unreliable test results were excluded. Abbreviations: ss-PERG = Steady-State Pattern Electroretinogram; SD-tVEP = Short-Duration transient Visual Evoked Potentials; Lc = Low Contrast; Hc = High Contrast; SNR = Signal-to-Noise Ratio. Electrophysiological techniques thus play a valuable role in a diagnostic environment dominated by highly effective tools such as OCT via the addition of an objective functional perspective to the diagnosis of glaucoma. Although the use of PERG and VEP as a measure of retinal ganglion cell and visual pathway dysfunction has been established, few studies have measured the potential clinical utility of the novel rapid testing platform of ss-PERG and SD-tVEP in patients with glaucoma. ss-PERG was effectively able to discern between glaucomatous and healthy eyes. The diagnostic ability of ss-PERG was superior to that of SD-tVEP. ss-PERG may thus have a role as a clinically useful electrophysiological diagnostic tool.
• 29. Retina Model synthesis Photoreceptor contributions #2: PLR INPUT Light OUTPUT Pupil size INTERMEDIATE OUTPUT Electroretinography (ERG)? ERG not done this time. Experimental design. (A, Left) L, M, and S cones and melanopsin-containing ipRGCs mediate vision at daytime light levels. (Center) Photoreceptor spectral sensitivities. (Right) Physiological measurements of ipRGCs find excitatory L and M cone inputs and inhibitory S-cone inputs (12). (B) A digital spectral integrator produces sinusoidal photoreceptor-directed modulations that pass through an artificial pupil into the pharmacologically dilated left eye. The consensual pupil response of the right eye is recorded. (C) Photoreceptor-directed modulations. Balanced changes in the spectrum of light around a background spectrum nominally isolate targeted photoreceptors. - Spitschan et al. (2014). Group PLR data are well fit by the two-component linear filter model. (A) The mean response across all subjects (01–16) is shown at 0.05 and 0.5 Hz, for L+M-, melanopsin-, and S-cone-directed modulations. Fit values are derived from those found for subject 01, with only amplitude parameters adjusted (Table S2). This is because the average data are available at only two temporal frequencies and do not sufficiently constrain all parameters of the model. To obtain the average data plotted, amplitudes and phases were averaged separately (i.e., average amplitude obtained without consideration of phase, average phase obtained without consideration of amplitude). The model was fit to the data as plotted. (B) Polar-plot representations of the group data with model fit points, following conventions as in Fig 3. The data are normalized separately for each temporal frequency. Error bars (± 2 SEM across subjects) are smaller than the plot points for the data. - Spitschan et al. (2014). Now as we feed in more data, we are in theory learning how the light parameters should be designed to achieve the best photoreceptor response isolation, and to have representations for the corresponding ERG and PLR responses. It would also help if all the studies were from humans :P
• 30. Retina Model synthesis further downstream INPUT Light OUTPUT Pupil size INTERMEDIATE OUTPUT Electroretinography (ERG) "KNOWN BEHAVIOR" Auxiliary OUTPUT dLGN. Build on top of previous models. We "know" how a specific light stimulus is processed by the retina (ERG), and how this is reflected in pupil behavior (PLR) via the olivary pretectal nucleus (OPN). So, using the same parameters, record the activity of the dLGN for example, which is nice at least for basic science, not necessarily for pathology screening. A: LED spectral power densities and in vivo photoreceptor spectral sensitivity (normalised). The output of blue and yellow LEDs was adjusted to produce equivalent effects on rods (black line). By contrast, the blue LED always appeared brighter for melanopsin (green line). B: Protocol 1. Melanopsin-isolating steps (in dLGN and retina, respectively): presentations of the blue LED were interleaved with 210 or 180 sec of the (dLGN and retina, respectively) yellow to produce a 'step' visible only to melanopsin. C: Protocol 2. Irradiance slowly ramped up (0.5 ND per 200 seconds) before remaining at a steady state for 10 seconds. D: The effective change in photon flux for melanopsin (green) and rods (black) across a full repeat of Protocol 2. Settings of the ND filter at the point of each melanopsin-isolating step are provided above. - Davis et al. (2015). INTERMEDIATE OUTPUT #2 Ganglion cell firing rates. Responses to melanopsin-isolating steps and gradual irradiance ramps in retina. - Davis et al. (2015). Responses to melanopsin steps in the dLGN. - Davis et al. (2015)
• 31. Retina Model synthesis INPUT Light INTERMEDIATE OUTPUT Electroretinography (ERG) OUTPUT Pupil size INTERMEDIATE OUTPUT #2 Ganglion cell firing rates Auxiliary OUTPUT dLGN. So now we know how the retina works in a data-driven deep learning sense (no explicit modelling of the retina in the biological sense). We can heuristically cheat and define the connections as described in the literature. So as we feed in data from studies, the interactions between blocks are "automagically" quantified by adjusting the convolutional weights in the deep learning model (see the sketch below). At some point, if we have enough data, we could also start to relax the circuit constraints and hypothesize that there could be recurrent feedback from the dLGN to the OPN (controlling pupil size), and do 'blind causality analysis' (Nikola probably an expert on that). https://arxiv.org/abs/1601.03610 We have proposed a novel framework for causal analysis in time-series which does not require any assumptions about the statistical relationships among the variables of the study, i.e., it is model-free. Our results show that Twitter data polarity does indeed have a causal impact on the stock market prices of the examined companies. Hence, we believe social media data could represent a valuable source of information for understanding the dynamics of stock market movements. http://www.slideshare.net http://dx.doi.org/10.1534/genetics.114.165704
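As a hedged illustration of the block diagram above, here is a minimal multi-head sketch in tf.keras; all layer sizes, the 200-step stimulus encoding, and the loss weights are my own assumptions, not a published architecture. A shared "retinal circuitry" trunk maps the light stimulus to an ERG head, a pupil-size head, and an auxiliary dLGN head, so that pooled studies jointly adjust the shared weights:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Light stimulus encoded as a time series of (intensity, wavelength, modulation).
stimulus = layers.Input(shape=(200, 3), name="light")

# Shared "retinal circuitry" trunk: temporal convolutions over the stimulus.
x = layers.Conv1D(16, 9, padding="same", activation="relu")(stimulus)
x = layers.Conv1D(32, 9, padding="same", activation="relu")(x)

erg = layers.Conv1D(1, 9, padding="same", name="erg")(x)          # intermediate output: ERG trace
rgc = layers.Conv1D(8, 9, padding="same", activation="relu")(x)   # ganglion-cell firing-rate block
plr = layers.Dense(1, name="pupil")(layers.GlobalAveragePooling1D()(rgc))   # output: pupil size
dlgn = layers.Dense(1, name="dlgn")(layers.GlobalAveragePooling1D()(rgc))   # auxiliary output: dLGN

model = Model(stimulus, [erg, plr, dlgn])
# Per-head loss weights let a study that lacks a given recording zero out that head.
model.compile(optimizer="adam", loss="mse",
              loss_weights={"erg": 1.0, "pupil": 1.0, "dlgn": 0.5})
```

Studies that never recorded a given output can be trained with that head's loss weight set to zero, which is one pragmatic way to pool heterogeneous datasets.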
• 32. Retina Model synthesis Pathologies? INPUT Light INTERMEDIATE OUTPUT Electroretinography (ERG) OUTPUT Pupil size INTERMEDIATE OUTPUT #2 Ganglion cell firing rates Auxiliary OUTPUT dLGN. In the case of glaucoma, one would expect that the peripheral retina gets destroyed first. (A) Schematic diagram showing the flash stimulation sequence of the slow-sequence (slow flickering stimulation, MOOO) multifocal electroretinogram (mfERG). (B) The first-order kernel of the slow-sequence mfERG from the central (rings 1 to 2) and peripheral (rings 3 to 6) regions. - Chan et al. (2011). Overlapping visual field test-region layout and luminance characteristics of the multifocal pupillographic objective perimetry stimuli for all protocols. - Carle et al. (2014). Now we can define normal and pathologies as classes as you would in typical image classification tasks ('dogs', 'cats', etc.), but instead of just using a single image (whether fundus or OCT (SD/SS/AO/Angiography)), we can combine both the image and the behavioral response for better quantification of the retinal pathology, as in the fusion sketch below.
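A minimal sketch of the image-plus-behavior fusion suggested above, assuming a fundus image and a pupillometry trace as the two inputs; shapes, layer sizes, and the three-class output are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

image = layers.Input(shape=(224, 224, 3), name="fundus")   # or an OCT slice
trace = layers.Input(shape=(500, 1), name="plr_trace")     # behavioral response over time

ix = layers.Conv2D(32, 3, activation="relu")(image)
ix = layers.MaxPooling2D(4)(ix)
ix = layers.Conv2D(64, 3, activation="relu")(ix)
ix = layers.GlobalAveragePooling2D()(ix)                   # image feature vector

tx = layers.Conv1D(16, 11, activation="relu")(trace)
tx = layers.GlobalAveragePooling1D()(tx)                   # behavioral feature vector

merged = layers.concatenate([ix, tx])
out = layers.Dense(3, activation="softmax", name="pathology")(merged)  # e.g. normal/glaucoma/other
model = Model([image, trace], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```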
• 33. Retina Model synthesis VISUAL FIELD An old-school psychophysical functional measure that patients often find stressful. https://doi.org/10.1016/j.ophtha.2017.04.021 De Moraes CG, Hood DC, Thenappan A, Girkin CA, Medeiros FA, Weinreb RN, Zangwill LM, Liebmann JM. Central visual field damage seen on the 10-2 test is often missed with the 24-2 strategy in all groups. This finding has implications for the diagnosis of glaucoma and classification of severity. JAMA Ophthalmol. 2017;135(7):783-788. doi: 10.1001/jamaophthalmol.2017.1659 JAMA Ophthalmol. 2017;135(7):742-747. doi: 10.1001/jamaophthalmol.2017.1396 A deep-learning based automatic glaucoma identification. ARVO 2017: 320 Visual Fields, Vision Function, Psychophysics I. Serife Seda S. Kucur, Mathias Abegg, Sebastian Wolf, Raphael Sznitman. ARTORG Center, University of Bern, Bern, Switzerland; Department of Ophthalmology, Inselspital Bern, Bern, Switzerland. The inherent local and global characteristics of visual fields (VFs) can be exploited in a strong data-driven sense and could provide better understanding of VFs with regards to glaucoma. Ultimately, this may help to efficiently automatize the diagnosis process. Our hypothesis is that alternative representations of raw VFs, in terms of different spatial scales, could be learned by computers using machine learning techniques towards an effective automatized glaucoma identification task. Accordingly, we present a Convolutional Neural Network (CNN)-based approach for classification of VFs as being glaucomatous or non-glaucomatous. Conclusions: These results support the fact that processing VFs through a CNN generates a different representation of the data in terms of its hidden characteristics and patterns that is efficient for discriminating between glaucomatous and non-glaucomatous VFs in an automated way. The performance could be further improved with a different CNN architecture. The trained CNNs have the potential to be utilized for glaucoma progression analysis as well. https://doi.org/10.1016/j.ophtha.2017.01.027 http://dx.doi.org/10.1097/IJG.0000000000000710
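As a sketch of the Kucur et al. abstract above (their exact architecture is not given here, so the 8x9 grid encoding of a 24-2 field and all layer sizes are assumptions), a visual field can be treated as a tiny image for a small CNN:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(8, 9, 1)),          # 24-2 threshold sensitivities laid out as a grid
    layers.Conv2D(16, 3, padding="same", activation="relu"),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.Flatten(),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # P(glaucomatous)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
```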
• 34. Retina Model synthesis beyond retinopathies #1 What to diagnose from the eye, e.g. neurodegenerative diseases such as Alzheimer's disease. Is the Eye an Extension of the Brain in Central Nervous System Disease? Lies De Groef1,2 and Maria Francesca Cordeiro1,3,4. Journal of Ocular Pharmacology and Therapeutics. June 2017, https://doi.org/10.1089/jop.2016.0180 1 Glaucoma and Retinal Neurodegenerative Disease Research Group, Institute of Ophthalmology, University College London, London, United Kingdom. 2 Neural Circuit Development and Regeneration Research Group, Department of Biology, University of Leuven, Leuven, Belgium. 3 Western Eye Hospital, Imperial College Healthcare NHS Trust, London, United Kingdom. 4 ICORG, Department of Surgery and Cancer, Imperial College London, London, United Kingdom. Compilation of examples to illustrate the concept "the eye as a window to the brain". Typical ocular diseases, such as uveitis, glaucoma, and AMD, share several pathological mechanisms with CNS diseases, for example, MS and AD. Both in vivo and post mortem examinations of the eye can therefore be used to study the disease mechanisms underlying these pathologies in the eye and brain. (1) fluorescein angiography; (2) intraocular pressure measurement (copyright iCare, TonoLab); (3) optical coherence tomography scan; (4) confocal scanning laser ophthalmoscopy imaging of curcumin-labeled protein aggregates; (5) retinal oximetry; (6) ZO-1 tight junction immunostaining on wholemounted retina; (7) transmission electron microscopy image of trabecular meshwork; (8) Iba-1 microglia immunostaining on retinal section; (9) Brn3a retinal ganglion cell immunostaining on wholemounted retina; (10) β-amyloid immunostaining on retinal section; and (11) concanavalin A vessel labeling on wholemounted retina. AD, Alzheimer's disease; AMD, age-related macular degeneration; MS, multiple sclerosis. Front Aging Neurosci. 2017; 9: 214. Published online 2017 Jul 6. doi: 10.3389/fnagi.2017.00214 The Role of Microglia in Retinal Neurodegeneration: Alzheimer's Disease, Parkinson, and Glaucoma. Ana I. Ramirez, Rosa de Hoz, Elena Salobrar-Garcia, Juan J. Salazar, Blanca Rojas, Daniel Ajoy, Inés López-Cuenca, Pilar Rojas, Alberto Triviño, and José M. Ramírez. Front Neurol. 2017; 8: 162. Published online 2017 May 4. doi: 10.3389/fneur.2017.00162 Retinal Ganglion Cells and Circadian Rhythms in Alzheimer's Disease, Parkinson's Disease, and Beyond. Chiara La Morgia, Fred N. Ross-Cisneros, Alfredo A. Sadun, and Valerio Carelli. Summary of circadian rhythm abnormalities in AD, PD, and HD. AD, Alzheimer's disease; PD, Parkinson's disease; HD, Huntington's disease; IV, intra-daily variability; IS, inter-daily stability; RA, relative amplitude; BP, blood pressure; HR, heart rate. Schematic representation of the hypothetical events associated with neuroinflammation in AD (A), PD (B), and glaucoma (C). AD, Alzheimer's disease; PD, Parkinson's disease; ILM, inner limiting membrane; NFL, nerve fiber layer; GCL, ganglion cell layer; IPL, inner plexiform layer; INL, inner nuclear layer; OPL, outer plexiform layer; ONL, outer nuclear layer; OLM, outer limiting membrane; PL, photoreceptor layer; RPE, retinal pigment epithelium; BM, Bruch membrane; C, choroid; Aβ, beta-amyloid; pTau, phosphorylated tau.
  • 35. Health Economics for Medical Startups | Background
• 36. Business Models focus ● Often technical founders focus too much on the technology and do not achieve product-market fit. – In medical startups, it is often very useful to do proper health-economics calculations to sell your idea to customers and investors. ● In other words, how much can your solution make healthcare more efficient economically while improving the quality of care for the patient? – Another common problem in the long run is reimbursement: in most countries, the patient does not fully pay for the healthcare they receive, and market access is complicated by varying regulations/policies in each country. http://startupheretoronto.com www.smi-online.co.uk
  • 37. Business Models Innovations on the model https://hbr.org/2016/10 Healx: A Case Study Informed by our business model framework, we advised (and Cambridge Judge Business School’s business accelerator supported) the tech venture Healx, which focuses on the treatment of patients with rare diseases in the emerging field of personalized medicine. A big challenge for pharmaceutical companies in this domain is that rare-disease markets are very small, so companies usually have to charge astronomical prices. (One drug, Soliris, used in the treatment of paroxysmal nocturnal hemoglobinuria, costs about $500,000 per patient-year.) Enter Healx, with a platform that leverages big data technology and analytics across multiple databases owned by various organizations within global life sciences and health care to efficiently match treatments to rare-disease patients. Its initial business model hit three of our six key features. First, Healx’s value proposition was about asset sharing (for example, making available clinical-trial databases that record the effectiveness of most drugs across therapeutic areas and diseases, including rare ones). Second, the business promised more personalization by revealing drugs with high potential for treating the rare diseases covered. Finally, Healx’s model would, in theory, create a collaborative ecosystem by bringing together big pharma (which has the treatment and trial data) and health care providers (which have data about effectiveness and incompatibility reactions and also personal genome descriptions). https://healx.io/ More recently, Healx has developed a machine-learning algorithm that can use a patient’s biological information not only to match drugs to disease symptoms but also to predict exactly which drug will achieve what level of effectiveness for that particular patient. The latest version of its business model brings personalization to the maximum possible level and adds agility, because the treating clinician—armed with the biological data and the algorithm—can make better treatment decisions directly with the patient and doesn’t have to rely on fixed rules of thumb about which of the few available off-label drugs to use. In this way, Healx is able to support decentralized, real-time, accurate decision making. This version of the Healx model has even more transformation potential—it exhibits four of the six features; it has already generated revenue from customers; and in the long term it could empower patients by giving them much more information before they consult a medical practitioner. Although it is still too early to tell whether that potential will be realized, Healx is clearly a venture to watch. It has earned a number of prizes (including the 2015 Life Science Business of the Year and the 2016 Graduate Business of the Year in the Cambridge cluster) and sizable investments from several global funds.
• 38. Loss Function performance quantification ● In medical studies, the ROC curve and especially the Area Under the Curve (AUC) is used as an easy scalar to describe the performance of a classifier. TensorFlow allows direct optimization of ROC-based objectives (see the surrogate-loss sketch below). http://dx.doi.org/10.1093/bib/bbr008 http://arxiv.org/abs/1605.06652 Conclusion: The AUC is an unreliable measure of screening performance because in practice the standard deviation of a screening or diagnostic test in affected and unaffected individuals can differ. The problem is avoided by not using AUC at all, and instead specifying detection rates (DRs) for given false positive rates (FPRs) or FPRs for given DRs. http://dx.doi.org/10.1177/0969141313517497 http://tflearn.org/objectives/ Mozer, Michael C. "Optimizing classifier performance via an approximation to the Wilcoxon-Mann-Whitney statistic." (2003). aaai.org/Papers Front Public Health. 2015; 3: 57. Published online 2015 Apr 20. doi: 10.3389/fpubh.2015.00057 PMCID: PMC4403252 Threshold-Free Measures for Assessing the Performance of Medical Screening Tests
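A hedged sketch of what "direct optimization of ROC" can look like in practice, in the spirit of the Mozer reference above: the non-differentiable indicator 1[s_pos > s_neg] inside the Wilcoxon-Mann-Whitney statistic is replaced by a smooth sigmoid, giving a trainable surrogate for 1 - AUC (the temperature value is an assumption, and each batch must contain both classes):

```python
import tensorflow as tf

def soft_auc_loss(y_true, y_pred, temperature=0.1):
    """1 - soft AUC over a batch; minimizing pushes positive scores above negative ones.
    Assumes flat score/label tensors and at least one sample of each class per batch."""
    pos = tf.boolean_mask(y_pred, tf.equal(y_true, 1))
    neg = tf.boolean_mask(y_pred, tf.equal(y_true, 0))
    diff = tf.expand_dims(pos, 1) - tf.expand_dims(neg, 0)  # all pos-neg score differences
    return 1.0 - tf.reduce_mean(tf.sigmoid(diff / temperature))
```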
• 39. HEALTH ECONOMIC Loss function wikipedia.org Analogies from churn prediction? http://dx.doi.org/10.1186/s40165-015-0014-6 "Nevertheless, current state-of-the-art classification algorithms are not well aligned with commercial goals, in the sense that, the models miss to include the real financial costs and benefits during the training and evaluation phases. In the case of churn, evaluating a model based on a traditional measure such as accuracy or predictive power, does not yield to the best results when measured by the actual financial cost, ie. investment per subscriber on a loyalty campaign and the financial impact of failing to detect a real churner versus wrongly predicting a non-churner as a churner" What are the economic costs of each block in the contingency table, and how do we optimize for medical economics? It is more expensive to have false negatives, as patients will not be diagnosed, both in terms of economic cost and reduced quality of life for patients; a hedged loss sketch follows below.
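The point above can be made concrete with a cost-sensitive loss: price each cell of the contingency table and minimize expected cost instead of plain cross-entropy. The cost values below are placeholders, not actual health-economic figures:

```python
import tensorflow as tf

COST_FN = 10.0  # missed diagnosis: treatment delay plus lost quality of life
COST_FP = 1.0   # unnecessary referral / follow-up examination
COST_TP = 0.5   # cost of a confirmed-case workup
COST_TN = 0.0

def economic_loss(y_true, y_prob):
    """Expected monetary cost of decisions; differentiable in the predicted probability."""
    y_true = tf.cast(y_true, y_prob.dtype)
    cost = (y_true * y_prob * COST_TP
            + y_true * (1.0 - y_prob) * COST_FN
            + (1.0 - y_true) * y_prob * COST_FP
            + (1.0 - y_true) * (1.0 - y_prob) * COST_TN)
    return tf.reduce_mean(cost)
```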
• 40. Health economics models https://dx.doi.org/10.3310/hta11410 Screening in the UK for glaucoma, NHS setting. Published: Ann Intern Med. 2013;159(7):484-489, DOI: 10.7326/0003-4819-159-6-201309170-00686. Estimate by Steve Kymes of the duration and number of subjects needed for a proper health-economic study of a glaucoma screening program. Presented by John Boland at the "Should we screen for glaucoma?" session at the World Glaucoma Congress 2017 in Helsinki, Finland. Indian J Ophthalmol. 2011 Jan; 59(Suppl1): S24–S30. doi: 10.4103/0301-4738.73684 PMCID: PMC3038514 Cost-effectiveness of screening for open angle glaucoma in developed countries. Anja Tuulonen. Clin Ophthalmol. 2017; 11: 337–346. doi: 10.2147/OPTH.S120398 PMCID: PMC5317344 Cost and detection rate of glaucoma screening with imaging devices in a primary care center. Alfonso Anton, Monica Fallon, Francesc Cots, María A Sebastian, Antonio Morilla-Grasa, Sergi Mojal, and Xavier Castells
• 41. RISK STRATIFICATION & Screening Target screening for high-risk cases (family history, age, ethnicity, gender) https://doi.org/10.1016/j.ajo.2017.05.017 (2016) https://doi.org/10.1109/TMI.2016.2608782 We introduce a novel Bayesian nonparametric model that uses the concept of disease trajectories for disease subtype identification. We investigate several models with our algorithm, and show that one with age, pack years (a measure of cigarette exposure), and smoking status as predictors gives the best compromise between estimated predictive performance and model complexity. https://arxiv.org/abs/1705.07674 The proposed risk score incorporates both the patients' non-stationary temporal physiological information and their individual baseline covariates in order to accurately describe the patients' physiological trajectories. Aaron Zalewski; William Long; Alistair E. W. Johnson; Roger G. Mark; Li-wei H. Lehman. Date of Conference: 16-19 Feb. 2017, https://doi.org/10.1109/BHI.2017.7897302 https://arxiv.org/abs/1704.08797
• 42. RISK factors For example, for glaucoma: "Overview of ethnicity and race" by M. Roy Wilson (United States) at the Risk Profiling symposium, World Glaucoma Congress 2017, Helsinki, Finland. http://dx.doi.org/10.1001/jamaophthalmol.2015.1478 http://dx.doi.org/10.1126/science.aam7935
  • 43. “Doctor AI” Systems | Introduction
  • 44. AI Doctor https://arxiv.org/abs/1512.03542 http://arxiv.org/abs/1602.00357 http://arxiv.org/abs/1511.02554 Longitudinal analysis → try to diagnose pathologies as early as possible. Incorporate disease progression measurements and treatment interventions for optimal personalized treatment. Feature engineering remains a major bottleneck when creating predictive systems from electronic medical records. At present, an important missing element is detecting predictive regular clinical motifs from irregular episodic records. We present Deepr (short for Deep record), a new end-to-end deep learning system that learns to extract features from medical records and predicts future risk automatically. Deepr transforms a record into a sequence of discrete elements separated by coded time gaps and hospital transfers. On top of the sequence is a convolutional neural net that detects and combines predictive local clinical motifs to stratify the risk. Deepr permits transparent inspection and visualization of its inner working. We validate Deepr on hospital data to predict unplanned readmission after discharge. Deepr achieves superior accuracy compared to traditional techniques, detects meaningful clinical motifs, and uncovers the underlying structure of the disease and intervention space. http://arxiv.org/abs/1607.07519
• 45. Condition dynamics Long short-term memory (LSTM). Legend (a minimal sketch of this setup follows below):
C: memory of LSTM
x: diagnoses (feature vector)
p: procedures, medications
f: illness "forgetting" (curing or toxicity)
m: planned/unplanned admission flag
h: weighted "illness pooling"
i: input gate (new information updated to memory)
o: output gate (disease state)
http://arxiv.org/abs/1511.03677 https://arxiv.org/abs/1510.07641
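A minimal sketch matching the legend above (vocabulary size, visit cap, and hidden size are assumptions): the LSTM consumes one multi-hot code vector per visit and emits a disease-state risk, with the forget gate playing the "curing or toxicity" role:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

n_codes, max_visits = 2000, 50                    # medical-code vocabulary, visit cap
model = models.Sequential([
    layers.Input(shape=(max_visits, n_codes)),    # one multi-hot code vector x per visit
    layers.Masking(mask_value=0.0),               # skip zero-padded visits
    layers.LSTM(128),                             # cell state c = "illness memory", f = forgetting
    layers.Dense(1, activation="sigmoid"),        # o -> current disease-state risk
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```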
• 46. Condition dynamics There are always missing data in clinical time series. TREATING MISSING DATA, various options (a minimal pandas sketch follows below):
1. ZERO-IMPUTATION: set to zero when data are missing
2. FORWARD-FILLING: carry forward the previous values
3. MISSINGNESS: treat the missing value itself as a signal, as the lack of a measured value e.g. in an ICU can carry information in itself (Lipton et al. 2016)
4. BAYESIAN STATE-SPACE MODELING to fill in the missing data (Luttinen et al. 2016, BayesPy package)
5. GENERATIVE MODELING: train the deep network to generate the missing samples (Im et al. 2016, RNN GAN; see also github: sequence_gan)
http://arxiv.org/abs/1606.01865 https://arxiv.org/abs/1606.04130
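The first three options in a few lines of pandas (the `ts` DataFrame and its columns are hypothetical):

```python
import numpy as np
import pandas as pd

ts = pd.DataFrame({"hr": [72, np.nan, np.nan, 88],
                   "lactate": [np.nan, 2.1, np.nan, np.nan]})

zero_imputed = ts.fillna(0.0)            # 1. zero-imputation
forward_filled = ts.ffill()              # 2. forward-filling: carry the last observed value
missing_mask = ts.isna().astype(float)   # 3. missingness as a signal in its own right
# Concatenating values and mask lets the model learn from absence itself:
model_input = pd.concat([forward_filled.fillna(0.0),
                         missing_mask.add_suffix("_missing")], axis=1)
```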
• 47. Condition dynamics -based Individualized treatment ● Schmidt-Erfurth and Waldstein (2016): There is a critical unmet medical need to identify, characterize, and validate biomarkers that could provide solid guidance for efficient individualized treatment with regard to optimal functional outcome and disease management. Such biomarkers would enable the treating physician to tailor personalized treatment to each patient's individual disease and need, in order to provide adequate disease control, minimize recurrence and neurosensory damage, and limit the number of invasive and costly interventions. Relationship between initial visual acuity, visual acuity change and final visual acuity during therapy of neovascular age-related macular degeneration (i.e., the ceiling effect). The interpolation curves illustrate final visual acuity levels dependent on baseline visual acuity in the controlled trials CATT and IVAN as well as in the real-world UK neovascular AMD database study. Role of subretinal fluid as a treatment-modifying imaging biomarker. In patients with subretinal fluid at baseline (blue graphs), antiangiogenic therapy leads to identical visual acuity outcomes, regardless of treatment regimen (monthly versus every-12-weeks dosing). In contrast, patients without subretinal fluid at baseline (red graphs) demonstrate unfavourable outcomes if treatment was not administered on a monthly basis. Pigment-epithelial detachment as a risk factor for vision loss during individualized dosing. In the VIEW studies, patients received continuous anti-VEGF therapy during the first 48 weeks. At 52 weeks, a discontinuous, "as-needed" dosing regimen was introduced. Only in a precisely defined patient population, i.e. eyes with pigment-epithelial detachments developing secondary intraretinal cystoid fluid (IRC, red graph), did the reactive dosing regimen lead to pronounced vision loss. Future therapeutic approaches will likely focus on early and/or disease-modifying interventions aiming to protect the functional and structural integrity of the morphologic complex that is primarily affected in AMD, i.e. the choriocapillaris-RPE-photoreceptor unit. Multimodal innovative imaging technologies, such as PS-OCT, OCT angiography, and adaptive optics, allow access to yet unidentified biomarkers representing the origin of neovascular AMD as well as functionally relevant therapeutic aims. Improved big-data applicability and reproducibility, aided by computerized OCT analysis, will likely allow personalized antiangiogenic therapy with minimal interventions while providing maximum disease control, using advanced imaging software and hardware. It is the responsibility of the scientific and clinical community to follow the open path of advanced imaging together with ophthalmologists, biologists, physicists, and computer scientists in an efficient collaborative and interdisciplinary approach.
• 48. Condition dynamics risk factors for glaucoma progression https://doi.org/10.1016/j.ajo.2017.06.003 To determine the intraocular and systemic risk factor differences between a cohort of rapid glaucoma disease progressors and non-rapid disease progressors. Conclusion: Cardiovascular disease is an important risk factor for rapid glaucoma disease progression irrespective of IOP control.
• 49. Condition dynamics Disease progression #1 Clin Ophthalmol. 2017; 11: 1015–1020. May 23. doi: 10.2147/OPTH.S116265 PMCID: PMC5449101 Automated retinal imaging and trend analysis – a tool for health monitoring. Karin Roesch, Tristan Swedish, and Ramesh Raskar, MIT Media Lab, Massachusetts Institute of Technology, Cambridge, MA, USA. The future of health diagnostics: current diagnostics are based on a "snapshot" in time and limited data points. In the future, large datasets acquired over time through constant monitoring will be analyzed to establish baselines and trends, enabling preventative interventions. Knowing when a feature occurred is key. For diabetic retinopathy (DR), it has been established that microaneurysms (MAs) are the earliest lesions visible. The MA population is dynamic, with changes occurring in a matter of months, and MA turnover rates are indicative of early-stage DR as well as of the likelihood of DR progression to macular edema. Po-Hsiang Chiu, George Hripcsak, Department of Biomedical Informatics, Columbia University, 622 W. 168th Street, New York, NY, USA https://doi.org/10.1016/j.jbi.2017.04.009 Learning statistical models of phenotypes using noisy labeled training data. Vibhu Agarwal, Tanya Podchiyska, Juan M Banda, Veena Goel, Tiffany I Leung, Evan P Minty, Timothy E Sweeney, Elsie Gyang, Nigam H Shah. J Am Med Inform Assoc (2016) 23 (6): 1166-1173. DOI: https://doi.org/10.1093/jamia/ocw028
  • 50. Condition dynamics Disease progression #2 Hrvoje Bogunović; Alessio Montuoro; Magdalena Baratsits; Maria G. Karantonis; Sebastian M. Waldstein; Ferdinand Schlanitz; Ursula Schmidt-Erfurth Investigative Ophthalmology & Visual Science June 2017, Vol.58, BIO141-BIO150. DOI: 10.1167/iovs.17-21789 Observations at baseline and the first follow-up are used for predicting drusen regression in the future, for example, the following 1-year period. Examples of drusen thickness maps and the drusen regression prediction within 1-year period. Last column shows true positives (green), false positives (orange), and false negatives (blue). Each row represents one example eye. http://dx.doi.org/10.1001/jamaophthalmol.2016.5111 http://dx.doi.org/10.1002/sim.7300 Application of our approach using linear mixed models to Alzheimer’s Disease Neuroimaging Initiative data with bootstrapped 95% CI including boxplots of neocortical Aβ burden (standard uptake value ratio (SUVR)) for each diagnosis group, separately for amyloid–β positive and negative individuals. It takes 24.47 years to progress from an SUVR of 0.79 to 1.01. This is equivalent to a rate of 0.009 increase in SUVR per year. Similarly, it takes 10.76 years to progress from an SUVR of 0.73 to 0.79. See the text for further details. HC, healthy control; MCI, mild cognitively impaired; AD, Alzheimer’s disease
  • 51. Text Analysis | Introduction
  • 52. Condition dynamics Natural Language processing (NLP) http://arxiv.org/abs/1602.05568 http://arxiv.org/abs/1602.03686 http://homepages.inf.ed.ac.uk/ballison/pdf/lrec_skipgrams.pdf http://www.bioscience.ai/schedule http://arxiv.org/abs/1508.04112
  • 53. Text analysis for clinical notes #1 http://dx.doi.org/10.3233/978-1-61499-753-5-201 Medical Text Classification using Convolutional Neural Networks Mark Hughes, Irene Li, Spyros Kotoulas, Toyotaro Suzumura (Submitted on 22 Apr 2017). https://arxiv.org/abs/1704.06841 We present an approach to automatically classify clinical text at a sentence level. We are using deep convolutional neural networks to represent complex features. We train the network on a dataset providing a broad categorization of health information. Through a detailed evaluation, we demonstrate that our method outperforms several approaches widely used in natural language processing tasks by about 15%.
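A hedged sketch of sentence-level classification with a 1D ConvNet over word embeddings, in the spirit of the Hughes et al. paper above; vocabulary size, embedding dimension, sequence length, and class count are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(100,), dtype="int32"),        # token ids, padded to length 100
    layers.Embedding(input_dim=20000, output_dim=128),
    layers.Conv1D(128, 5, activation="relu"),         # n-gram feature detectors
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),           # clinical sentence category
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```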
  • 54. Text analysis for clinical notes #2 13 April 2017. https://doi.org/10.1109/BHI.2017.7897302 https://doi.org/10.1016/j.jbi.2017.07.006 We proposed the first models based on recurrent neural networks (more specifically Long Short-Term Memory - LSTM) for classifying relations from clinical notes. We also evaluated the impact of word embedding on the performance of LSTM models and showed that medical domain word embedding help improve the relation classification. These results support the use of LSTM models for classifying relations between medical concepts, as they show comparable performance to previously published systems while requiring no manual feature engineering. In this work, we explore the use of Hierarchical Dirichlet Processes (HDP) as a Bayesian nonparametric framework to infer patients' states of health by combining multiple sources of data. In particular, we employ HDP to combine clinical time series and text from the nursing progress notes in a probabilistic topic modeling framework for patient risk stratification iDoctor: Personalized and professionalized medical recommendations based on hybrid matrix factorization Future Generation Computer Systems Volume 66, January 2017, Pages 30-35 https://doi.org/10.1016/j.future.2015.12.001
  • 55. Personalized Medicine | Introduction
  • 56. Precision / personalized medicine #1 re-work.co/blog http://dx.doi.org/10.1101/070490 “For the first time, we demonstrate that DLNN trained on a large pharmacogenomic data set can effectively predict the therapeutic response of specific drugs in specific cancer types, from a large panel of both drugs and cancer cell lines. These findings serve as a proof of concept for the application of DLNN to predict therapeutic responsiveness, a milestone in precision medicine.” http://dx.doi.org/10.1056/NEJMp1500523 http://dx.doi.org/10.3389%2Ffpsyt.2016.00034 http://dx.doi.org/10.1016/j.media.2016.06.024
• 57. Precision / personalized medicine #2 We introduce an IoT-driven architecture and discuss how non-invasive, affordable, unobtrusive sensing using mobile phones, wearables and nearables is making physiological and pathological data collection from the human body possible in thus far unimaginable ways. We also introduce breakthrough technologies in the form of exosomes and 3D organ printing that have the potential to disrupt the future healthcare landscape. http://dx.doi.org/10.1007/978-3-319-42141-4_9 https://doi.org/10.1109/TMM.2016.2614225 To facilitate the intensive computation required for interactive analytics, we design an efficient sparse principal component analysis (SPCA) solver based on a variance-reduced stochastic gradient technique. The benefits of our method are demonstrated by analyzing two different EHR patient cohorts, a public and a private dataset containing EHRs of 101 767 and 223 076 patients, respectively. Our evaluations show that PHENOTREE can detect clinically meaningful hierarchical phenotypes. http://dx.doi.org/10.3390/ijms17091555
  • 58. Precision / personalized medicine #3 Multimorbidity space and dynamic disease progression. http://dx.doi.org/10.1038/nrg.2016.87 The co-occurrence of diseases can inform the underlying network biology of shared and multifunctional genes and pathways. In addition, comorbidities help to elucidate the effects of external exposures, such as diet, lifestyle and patient care. With worldwide health transaction data now often being collected electronically, disease co-occurrences are starting to be quantitatively characterized. Linking network dynamics to the real-life, non-ideal patient in whom diseases co-occur and interact provides a valuable basis for generating hypotheses on molecular disease mechanisms, and provides knowledge that can facilitate drug repurposing and the development of targeted therapeutic strategies.
  • 59. Example Clinical AI Pipelines
• 60. Glaucoma decision support tools Old-school methods for multimodal and structural features. Development of machine learning models for diagnosis of glaucoma. Seong Jae Kim, Kyong Jin Cho, Sejong Oh. Published: May 23, 2017. https://doi.org/10.1371/journal.pone.0177726 We used 100 cases of data as a test dataset and 399 cases of data as a training and validation dataset. To develop the glaucoma prediction model, we considered four machine learning algorithms: C5.0, random forest (RF), support vector machine (SVM), and k-nearest neighbor (KNN). Color-fundus and red-free fundus photography (A), peripapillary RNFL thickness measured by SD-OCT (B), and automated 30–2 visual field test (C). The presence of a tigroid fundus and peripapillary atrophy was observed, and there was a decrease in the RNFL thickness on the peripapillary RNFL thickness scan. In the visual field test, the abnormalities were judged to be of no clinical significance. Computers in Biology and Medicine, Volume 8, Issue 1, January 1978, Pages 25-40, Glaucoma consultation by computer, Sholom Weiss, Casimir A. Kulikowski, Aran Safir https://doi.org/10.1016/0010-4825(78)90011-2 Automated detection of glaucoma using structural and non-structural features. SpringerPlus December 2016, 5:1519. Anum A. Salam, Tehmina Khalil, M. Usman Akram, Amina Jameel, Imran Basit. First Online: 09 September 2016 https://doi.org/10.1186/s40064-016-3175-4
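A sketch of the Kim et al. workflow above using scikit-learn stand-ins (C5.0 has no direct scikit-learn equivalent, so only RF, SVM, and KNN are compared; the feature matrix is a random placeholder for the 399-case training set):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Placeholder for 399 training cases with e.g. RNFL-thickness and visual-field features.
X, y = np.random.rand(399, 12), np.random.randint(0, 2, 399)

for name, clf in [("RF", RandomForestClassifier(n_estimators=200)),
                  ("SVM", SVC(kernel="rbf", probability=True)),
                  ("KNN", KNeighborsClassifier(n_neighbors=5))]:
    scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC = {scores.mean():.3f} +/- {scores.std():.3f}")
```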
• 61. Tensor Networks Inspiration from quantum networks #1 Supervised Learning with Quantum-Inspired Tensor Networks. E. Miles Stoudenmire, David J. Schwab, last revised 18 May 2017 https://arxiv.org/abs/1605.05775 Deep Learning and Quantum Entanglement: Fundamental Connections with Implications to Network Design. Yoav Levine, David Yakira, Nadav Cohen, Amnon Shashua, last revised 10 Apr 2017 https://arxiv.org/abs/1704.01552 Neural networks for computing best rank-one approximations of tensors and its applications. Maolin Che, Andrzej Cichocki, Yimin Wei. 22 May 2017 https://doi.org/10.1016/j.neucom.2017.04.058 This paper presents the neural dynamical network to compute a best rank-one approximation of a real-valued tensor. We implement the neural network model by the ordinary differential equations (ODE), which is a class of continuous-time recurrent neural network. Finally, we generalize the proposed neural networks to the computation of the restricted singular values and the associated restricted singular vectors of real-valued tensors. We illustrate and validate theoretical results via numerical simulations. Keywords: Neural network, Ordinary differential equations, Lyapunov function, Lyapunov stability theory, Rank-one tensor, Best rank-one approximation, Z-eigenpair, Symmetric-definite tensor pair, H-eigenpair, The local maximal generalized eigenpair, The local minimal generalized eigenpair, Generalized tensor eigenpair, Local optimal rank-one approximation, Restricted singular value, Restricted singular vector. We theoretically analyze convolutional arithmetic circuits (ConvACs), and empirically validate our findings on more common ConvNets which involve ReLU activations and max pooling. Beyond the results described above, the description of a deep convolutional network in well-defined graph-theoretic tools and the formal connection to quantum entanglement are two interdisciplinary bridges that are brought forth by this work. Neural-network representation of the many-body ground states. Convolutional neural networks can constitute the basis of more advanced NQS and therefore have the potential for increasing their expressive power.
• 62. Tensor Networks Inspiration from quantum networks #2 Low-Rank Tensor Networks for Dimensionality Reduction and Large-Scale Optimization Problems: Perspectives and Challenges PART 1. A. Cichocki, N. Lee, I.V. Oseledets, A.-H. Phan, Q. Zhao, D. Mandic, last revised 19 Jul 2017 (this version, v2) https://arxiv.org/abs/1609.00893 Tensor Networks for Dimensionality Reduction and Large-scale Optimization: Part 2 Applications and Future Perspectives. A. Cichocki, N. Lee, I.V. Oseledets, A.-H. Phan, Q. Zhao, D. Mandic. Foundations and Trends® in Machine Learning (2017): Vol. 9: No. 6, pp 431-673. http://dx.doi.org/10.1561/2200000067 "Tensor decompositions and tensor network algorithms require sophisticated software libraries, which are being rapidly developed. The TT Toolbox, developed by Oseledets and coworkers (http://github.com/oseledets/TT-Toolbox for MATLAB and http://github.com/oseledets/ttpy for PYTHON), is currently the most complete software for the TT (MPS/MPO) and QTT networks. The TT toolbox supports advanced applications, which rely on solving sets of linear equations (including the AMEn algorithm), symmetric eigenvalue decomposition (EVD), and inverse/pseudoinverse of huge matrices." Keywords: Tensor networks, Function-related tensors, CP decomposition, Tucker models, tensor train (TT) decompositions, matrix product states (MPS), matrix product operators (MPO), basic tensor operations, multiway component analysis, multilinear blind source separation, tensor completion, linear/multilinear dimensionality reduction, large-scale optimization problems, symmetric eigenvalue decomposition (EVD), PCA/SVD, huge systems of linear equations, pseudo-inverse of very large matrices, Lasso and Canonical Correlation Analysis (CCA)
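Independent of any particular toolbox, the core TT-SVD idea behind these libraries fits in a short NumPy sketch: peel off one 3-way core at a time with truncated SVDs (a fixed `max_rank` per bond is a simplifying assumption; real implementations truncate by error tolerance):

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Return TT cores G_1..G_d (each r_prev x n_k x r_k) with bond ranks capped at max_rank."""
    shape = tensor.shape
    cores, r_prev = [], 1
    mat = tensor.reshape(shape[0], -1)
    for k in range(len(shape) - 1):
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(S))
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        mat = (S[:r, None] * Vt[:r]).reshape(r * shape[k + 1], -1)  # carry the remainder forward
        r_prev = r
    cores.append(mat.reshape(r_prev, shape[-1], 1))
    return cores

cores = tt_svd(np.random.rand(4, 5, 6, 7), max_rank=3)  # 4 small cores instead of a 4-way tensor
```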
• 63. Tensor Networks in Healthcare SCH: INT: Collaborative Research: High-throughput Phenotyping on Electronic Health Records using Multi-Tensor Factorization. Jimeng Sun, Bradley Malin, Joshua Denny, Joydeep Ghosh, Abel Kho. Funding Source: NSF Smart Connect Health Integrated Grant: Award Number 1418511 http://www.sunlab.org/research/phenotyping/ Techniques Task 1: Phenotype Generation: How to turn EHR data into meaningful clinical concepts (phenotypes)? Task 2: Phenotype Refinement: How to incorporate feedback to ensure the generated phenotypes are clinically meaningful? Task 3: Phenotype Adaptation: How to port phenotypes from one institution to another? Applications App 1: Cohort Construction: Validate that the generated phenotypes recover some existing phenotypes (from PheKB). App 2: GWAS: Develop genome-wide association studies using the generated phenotypes (as target or control variables). App 3: Predictive modeling: Use generated phenotypes as features to facilitate predictive modeling https://arxiv.org/abs/1704.03141
• 64. Tensor Networks in Industry Animashree Anandkumar, Associate Professor (with tenure), University of California, Irvine. I have been a faculty member in the CS department within ICS at the University of California, Irvine since December 2016. Before that I was on the faculty of the EECS department at UC Irvine from August 2010. I am a member of the Center for Pervasive Communications and Computing (CPCC). I am currently a principal scientist at Amazon Web Services (AWS) and on leave from UCI. My research focus is on high-dimensional learning of probabilistic graphical models and latent variable models. Broadly I am interested in machine learning, high-dimensional statistics, tensor methods, statistical physics, information theory and signal processing. https://youtu.be/gEFaLKzrKYc?t=6m52s https://youtu.be/KmvZu9qJNzg?t=7m15s https://youtu.be/B4YvhcGaafw?t=5m40s https://www.oreilly.com/ideas/lets-build-open-source-tensor-libraries-for-data-science
• 66. UNCERTAINTY ANALYSIS 'Layperson' background: development at internet giants like Google and Facebook. https://www.wired.com/2016/12/uber-buys-mysterious-startup-make-ai-company/
• 67. UNCERTAINTY ANALYSIS In practice for retinal imaging https://doi.org/10.1101/084210 Here we propose to estimate the uncertainty of DNNs in medical diagnosis based on a recent theoretical insight on the link between dropout networks and approximate Bayesian inference. Using the example of detecting diabetic retinopathy (DR) from fundus photographs, we show that uncertainty-informed decision referral improves diagnostic performance. Experiments across different networks, tasks and datasets showed robust generalization. Depending on network capacity and task/dataset difficulty, we surpass 85% sensitivity and 80% specificity as recommended by the NHS when referring 0%-20% of the most uncertain decisions for further inspection. We analyse causes of uncertainty by relating intuitions from 2D visualizations to the high-dimensional image space, showing that it is in particular the difficult decisions that the networks consider uncertain. bioRxiv preprint first posted online Oct. 28, 2016
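A hedged sketch of the dropout-as-approximate-Bayesian-inference recipe used in the paper above: keep dropout active at test time and read uncertainty off the spread of repeated stochastic forward passes (`model` is any Keras network containing Dropout layers; T=50 passes is an arbitrary choice):

```python
import numpy as np

def mc_dropout_predict(model, x, T=50):
    """Mean prediction and its standard deviation over T dropout-active forward passes."""
    preds = np.stack([model(x, training=True).numpy() for _ in range(T)])
    return preds.mean(axis=0), preds.std(axis=0)

# Decision referral: send the most uncertain fraction to a human grader, e.g.
# mean, std = mc_dropout_predict(model, fundus_batch)
# refer = std.ravel() > np.quantile(std, 0.8)   # refer the 20% most uncertain cases
```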
• 69. Visualizing disease Clinicians want answers. Mitigating the resistance from the clinical community: put effort into explaining the diagnosis. Roth et al. (2015); Ribeiro et al. (2016); Baskaran et al. (2012). Clinical heuristic glaucoma decision tree. "Clinicians need the data-driven model predictions to align with their domain knowledge" - Dr. Jenna Wiens @ NIPS 2016, "NIPS 2016 Workshop on Machine Learning for Health" http://www.nipsml4hc.ws/jenna-wiens Essentially, the causal decision tree now becomes a "hard-to-interpret" deep learning model. How to communicate this paradigm shift to clinicians?
  • 70. Visualization state-of-the-art techniques in General DOI: 10.1111/cgf.13210 An example of modeling with visual analytics. BaobabView [Van den Elzen and van Wijk (2011)] uses a tree-like interactive view to support a manually controlled decision tree construction process An example of model selection. Squares [Ren et al. (2017)] uses small multiples composed of grids of different colors and visual textures to display the distribution of probabilities in classification © VADER Lab at ASU 2017. All rights for the techniques and images belong to their respective owners.
  • 71. Visualization high-dimensional visualization #1 Shusen Liu ; Dan Maljovec ; Bei Wang ; Peer-Timo Bremer ; Valerio Pascucci (2016) https://doi.org/10.1109/TVCG.2016.2640960 Dominik Sacha ; Leishi Zhang ; Michael Sedlmair ; John A. Lee ; Jaakko Peltonen ; Daniel Weiskopf ; Stephen C. North ; Daniel A. Keim (2016) https://doi.org/10.1109/TVCG.2016.2598495
• 72. Visualization high-dimensional visualization #2 http://dx.doi.org/10.1111/cgf.13237 Dimensionality reduction provides a scalable alternative to create visualizations (projections) that enable insight into the structure of such datasets. However, applying dimensionality reduction independently for each dataset in a sequence may introduce unnecessary variability in the resulting sequence of projections, which makes tracking the evolution of the data significantly more challenging. We show that this issue affects t-SNE, a widely used dimensionality reduction technique. In this context, we propose dynamic t-SNE, an adaptation of t-SNE that introduces a controllable trade-off between temporal coherence and projection reliability. Our evaluation in two time-dependent datasets shows that dynamic t-SNE eliminates unnecessary temporal variability and encourages smooth changes between projections. https://doi.org/10.2312/eurovisshort.20161164
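The problem dynamic t-SNE addresses is easy to reproduce with standard scikit-learn t-SNE: projecting each time step independently yields projections that can jump even when the data barely change (the synthetic `snapshots` below are purely illustrative):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
base = rng.normal(size=(300, 50))
# Three nearly identical snapshots of the same high-dimensional data.
snapshots = [base + 0.01 * t * rng.normal(size=base.shape) for t in range(3)]

projections = [TSNE(n_components=2, random_state=0).fit_transform(X) for X in snapshots]
# Even with a fixed seed, small input changes can reorder clusters between frames;
# dynamic t-SNE adds a temporal-coherence penalty to suppress exactly this jitter.
```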
  • 73. Visualization ”unboxing” ConvNet black box #1 https://arxiv.org/abs/1311.2901; Cited by 2,133 articles https://doi.org/10.1109/TVCG.2016.2598838 To enable a more intuitive exploration process, we are open-sourcing the Embedding Projector, a web application for interactive visualization and analysis of high-dimensional data recently shown as an A.I. Experiment, as part of TensorFlow. We are also releasing a standalone version at projector.tensorflow.org, where users can visualize their high-dimensional data without the need to install and run TensorFlow.
  • 74. Visualization ”unboxing” ConvNet black box #2 HILDA’17, Chicago, IL, USA http://dx.doi.org/10.1145/3077257.3077260 https://arxiv.org/abs/1704.01942 “ACTIVIS has been deployed on Facebook’s machine learning platform. We present case studies with Facebook researchers and engineers, and usage scenarios of how ACTIVIS may work with different models.” Minsuk Kahng is with Georgia Tech; Pierre Andrews is with Facebook; Aditya Kalro is with Facebook; Duen Horng (Polo) Chau. DARVIZ: deep abstract representation, visualization, and verification of deep learning models ICSE-NIER '17 Proceedings of the 39th International Conference on Software Engineering: New Ideas and Emerging Results Track. https://doi.org/10.1109/ICSE-NIER.2017.13 ShapeShop: Towards Understanding Deep Learning Representations via Interactive Experimentation CHI EA '17 Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems https://doi.org/10.1145/3027063.3053103
• 75. Visualization ”unboxing” recurrent/Sequence black box #1 https://arxiv.org/abs/1705.08153 Uninterpretable examples. Left: Illustration of an arbitrary set of parameters for an LSTM trained on the MIT-BIH dataset. Numbers indicate different connections for the input weight vector (rectangle) and the hidden layer weight matrix (square). Right: The memory values c for arbitrary units in the LSTM trained on the MIT-BIH data. LSTM hidden unit outputs compared to wavelet coefficients. The top of each column is the original sample that was correctly classified using the respective LSTM model. The following two pairs of rows are the cherry-picked pairs of wavelet coefficients and hidden unit outputs that are roughly similar. The type of wavelet coefficient and the specific hidden unit are indicated above each plot. The Daubechies wavelet coefficients are 108 time steps long (instead of 216) because they make use of the discrete wavelet transform. The wavelet coefficients were computed using the PyWavelets package in Python. The sample saliencies for the ECG data using different techniques are depicted in each column. The occlusion width is the number of time steps that are occluded per instance. All the samples shown have a length of 216 time steps (x-axis) and were correctly classified by the model. The importance of each input step is shown on a scale of 0 to 1, with 1 being the most important. The type of ECG signal is indicated on the left with LBBB – left bundle branch block beat, RBBB – right bundle branch block beat, Paced – paced beat, and V-fib – ventricular fibrillation. Class mode visualizations. The optimized class modes for the ECG data (left) and the MNIST data (right). Here the input is optimized with respect to each class in order to find the most likely input for each class. The class for each plot is indicated on the left of the image. This technique did not yield interpretable results.
• 76. Visualization Medical deep learning models #1 https://arxiv.org/abs/1707.02485 Overall illustration of MDNet. We use a bladder image with its diagnostic report as an example. The image model generates an image feature to pass to LSTM in the form of a task tuple and a Conv feature embedding (for the attention model) computed by the AAS module (defined in the method). LSTM executes prediction tasks according to the specified image feature type. The illustration of class-specific attention. From top to bottom: test images, pathologist annotations, and class attention maps. Like the pathologist annotations, the attention maps are most activated in urothelial regions, largely ignoring stromal or background regions. Best viewed in color. http://dx.doi.org/10.1016/j.oret.2016.12.009 An occlusion test (Zeiler and Fergus, 2014) was performed to identify the areas contributing most to the neural network's assigning the category of AMD. A blank 20 × 20-pixel box was systematically moved across every possible position in the image and the probabilities were recorded. The highest drop in probability marks the region of interest that contributed the most importance to the deep learning algorithm.
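The occlusion test described above takes only a few lines; this sketch follows the text's 20-pixel blank box, while the stride and the zero fill value are simplifying assumptions:

```python
import numpy as np

def occlusion_map(model, image, box=20, stride=10):
    """Heatmap of the probability drop when each image region is blanked out."""
    h, w, _ = image.shape
    p_ref = float(model(image[None])[0, 0])          # unoccluded AMD probability
    heat = np.zeros(((h - box) // stride + 1, (w - box) // stride + 1))
    for i, y in enumerate(range(0, h - box + 1, stride)):
        for j, x in enumerate(range(0, w - box + 1, stride)):
            occluded = image.copy()
            occluded[y:y + box, x:x + box, :] = 0.0  # blank 20x20 box
            heat[i, j] = p_ref - float(model(occluded[None])[0, 0])
    return heat  # large values mark regions the network relied on
```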
• 77. Visualization Medical deep learning models #2 https://arxiv.org/abs/1703.10757 Inspired by Zhou et al. (2016), we present in this section the idea of generating the Regression Activation Maps (RAM) of an input image to localize the discriminative region towards the regression outcomes. It is known that the convolutional units of each layer of a CNN act as visual concept detectors, identifying low-level concepts like textures or materials up to high-level concepts like objects or scenes. Deeper into the network, the units become increasingly discriminative. However, fully-connected layers make it difficult to identify the importance of different units for identifying the output labels (regression values, in our networks). Instead, using global average pooling (GAP) and the linear output unit, we can directly visualize the region of interest (ROI) that is most discriminative for a given regression value. As we use regression for the purpose of classification, each single RAM obtained for each single image explicitly depicts the ROI at a different clinical level. In this work, we provided a deep learning model that includes a regression activation map (RAM) layer. The RAM layer can provide robust interpretability of the proposed detection model by monitoring the pathogenesis, so that the proposed model can serve as an assistant for clinicians.
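Because RAM/CAM relies on global average pooling feeding a single linear unit, the map is just a weighted sum of the last conv layer's feature maps. A hedged sketch (the layer names are hypothetical placeholders for whatever the actual model uses):

```python
import numpy as np
import tensorflow as tf

def regression_activation_map(model, image, last_conv="last_conv", out_layer="ram_out"):
    """Weighted sum of final conv feature maps, weighted by the GAP->output weights."""
    feat_model = tf.keras.Model(model.input,
                                [model.get_layer(last_conv).output, model.output])
    fmaps, _ = feat_model(image[None])
    fmaps = fmaps[0].numpy()                                  # (H, W, C) feature maps
    w = model.get_layer(out_layer).get_weights()[0].ravel()   # (C,) linear weights
    ram = np.tensordot(fmaps, w, axes=([2], [0]))             # sum over channels
    return np.maximum(ram, 0)                                 # keep positive contributions
```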
  • 78. Interpretability to EHR Mining and decision making #1 https://youtu.be/co3lTOSgFlA The source code of RETAIN is publicly available at https://github.com/mp2893/retain Model Interpretation for Heart Failure Prediction We demonstrate the interpretability of RETAIN by studying its behavior in the HF prediction task. We choose a HF patient from the test set and calculate the contribution of the variables (medical codes in this case) for making the binary prediction. Figure 3a is the visualization of the contributions of the variables in each visit. The patient suffered from skin problems, skin disorder (SD), benign neoplasm (BN), excision of skin lesion (ESL), for some time before showing symptoms of HF, cardiac dysrhythmia (CD), heart valve disease (HVD) and coronary atherosclerosis (CA), then being diagnosed with HF at the end. We can see that skin-related codes from the earlier visits made little contribution to HF prediction as expected. RETAIN properly puts much attention to the HF-related codes that occurred in recent visits.
  • 79. Interpretability to EHR Mining and decision making #2 GRAM: Graph-based Attention Model for Healthcare Representation Learning. Edward Choi, Mohammad Taha Bahadori, Le Song, Walter F. Stewart, Jimeng Sun. Last revised 1 Apr 2017 (this version, v3) https://arxiv.org/abs/1611.07012 “Deep learning methods exhibit promising performance for predictive modeling in healthcare, but two important challenges remain: - Data insufficiency: Often in healthcare predictive modeling, the sample size is insufficient for deep learning methods to achieve satisfactory results. - Interpretation: The representations learned by deep learning methods should align with medical knowledge. To address these challenges, we propose a GRaph-based Attention Model, GRAM, that supplements electronic health records (EHR) with hierarchical information inherent to medical ontologies.” https://jkulas12.github.io/GRAM_Visualization/
  • 81. Dataset Size How many samples? The more the better, but there are obvious problems with obtaining huge medical datasets https://arxiv.org/abs/1511.06348 (A) The number of misclassified images for each body-part class and (B) the total number of misclassified images over the whole body, as the number of training data sets increases. Classification accuracy results for increasing training-set sizes. There is a rule of thumb stating that one should have 10x as many samples as parameters in the network (for a more formal approach, see VC dimension); for example, the ResNet (He et al. 2015) in the ILSVRC2015 challenge had around 1.7M parameters, thus requiring 17M images by this rule of thumb. https://www.researchgate.net/post/What_is_the_minimum_sample_size_required_to_train_a_Deep_Learning_model-CNN
  • 82. Dataset Size How many samples? More is always better if you train higher-capacity models https://arxiv.org/abs/1707.02968 Since 2012, there have been significant advances in the representation capabilities of models and the computational capabilities of GPUs. But the size of the biggest dataset has surprisingly remained constant. What will happen if we increase the dataset size by 10× or 100×? Our experiments yield some surprising (and some expected) findings: Better representation learning helps! Our first observation is that large-scale data helps in representation learning, as evidenced by improved performance on each and every vision task we study. This suggests that collecting a larger-scale dataset to study pretraining may greatly benefit the field. Our findings also suggest a bright future for unsupervised or self-supervised [10, 42] representation learning approaches. It seems the scale of data can overpower noise in the label space. Performance increases linearly with orders of magnitude of training data! Perhaps the most surprising element of our findings is the relationship between performance on vision tasks and the amount of training data (log-scale) used for representation learning. We find that this relationship is still linear! Even with 300M training images, we do not observe any plateauing effect for the tasks studied. Capacity is crucial: We also observe that to fully exploit 300M images, one needs higher-capacity models. For example, with ResNet-50 the gain on COCO object detection is much smaller (1.87%) compared to the gain (3%) when using ResNet-152. Training with a long tail: Our data has quite a long tail and yet the representation learning seems to work. This long tail does not seem to adversely affect the stochastic training of ConvNets (training still converges). New state-of-the-art results: Finally, our paper presents new state-of-the-art results on several benchmarks using the models learned from JFT-300M. For example, a single model (without any bells and whistles) can now achieve 37.4 AP, compared to 34.3 AP, on the COCO detection benchmark.
  • 83. Dataset Size data augmentation #1 Images from: ftp://ftp.dca.fee.unicamp.br/pub/docs/vonzuben/ia353_1s15/topico10_IA353_1s2015.pdf | Wu et al. (2015) Synthetically increase the number of training samples by distorting them in ways expected from the dataset (random xy-shifts, left-right flips, added Gaussian noise, blur, etc.) → this has been shown to reduce overfitting; see the sketch after this slide. As noted in the previous slides on image quality, it is useful to train the model with various image-quality levels, Köhler et al. (2013). The most successful convolutional architectures are developed starting from ImageNet, a large-scale collection of images of object categories downloaded from the Web. These images are very different from the situated and embodied visual experience of robots deployed in unconstrained settings. To reduce the gap between these two visual experiences, this paper proposes a simple yet effective data augmentation layer that zooms in on the object of interest and simulates the object-detection outcome of a robot vision system. The layer, which can be used with any convolutional deep architecture, brings an increase in object recognition performance of up to 7% in experiments performed over three different benchmark databases. https://arxiv.org/abs/1705.02139
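A minimal numpy/scipy sketch of the classic perturbations listed above, for an (H, W, C) float image scaled to [0, 1]:

import numpy as np
from scipy.ndimage import shift, gaussian_filter

def augment(image, rng=None):
    # Random label-preserving perturbations: xy-shift, left-right flip,
    # mild Gaussian blur, and additive Gaussian noise.
    if rng is None:
        rng = np.random.default_rng()
    out = shift(image, (rng.integers(-5, 6), rng.integers(-5, 6), 0), mode="nearest")
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # left-right flip
    out = gaussian_filter(out, sigma=(rng.uniform(0, 1), rng.uniform(0, 1), 0))
    out = out + rng.normal(0.0, 0.01, out.shape)  # additive Gaussian noise
    return np.clip(out, 0.0, 1.0)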
  • 84. Dataset Size data augmentation #2 Apply domain-specific perturbations. Dataset Augmentation in Feature Space. Terrance DeVries, Graham W. Taylor (Submitted on 17 Feb 2017) https://arxiv.org/abs/1702.05538 Dreaming More Data: Class-dependent Distributions over Diffeomorphisms for Learned Data Augmentation. Søren Hauberg, Oren Freifeld, Anders Boesen Lindbo Larsen, John Fisher, Lars Hansen; Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:342-350, 2016. http://proceedings.mlr.press/v51/hauberg16.html Our approach is, however, not limited to MNIST: ● Image alignment and registration is a routine task in many medical imaging pipelines, such as the analysis of MRI. ● We make similar observations for time-series data such as acoustic signals. Here dynamic time warping (DTW) is often used as preprocessing to remove differences in the temporal speed of individual signals. ● Mesh alignment is also a standard pre-processing step in the analysis of three-dimensional meshes. As deep models are beginning to appear for three-dimensional data, it would be interesting to combine them with learned augmentation schemes. https://doi.org/10.1016/j.neucom.2016.12.025 In this paper, we propose five data augmentation methods dedicated to face images, including landmark perturbation and four synthesis methods (hairstyles, glasses, poses, illuminations). The proposed methods effectively enlarge the training dataset, which alleviates the impact of misalignment, pose variance, illumination changes and partial occlusions, as well as overfitting during training.
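The feature-space idea from DeVries & Taylor is compact enough to sketch: extrapolate each feature vector away from its same-class nearest neighbour. A sketch under the assumption that features have already been extracted by some encoder:

import numpy as np

def extrapolate_features(feats, labels, lam=0.5):
    # For each feature vector c_i, find its same-class nearest neighbour c_j
    # and generate c' = (c_i - c_j) * lam + c_i (extrapolation in feature space).
    new_feats, new_labels = [], []
    for i, c_i in enumerate(feats):
        same = np.where(labels == labels[i])[0]
        same = same[same != i]
        if len(same) == 0:
            continue
        d = np.linalg.norm(feats[same] - c_i, axis=1)
        c_j = feats[same[np.argmin(d)]]  # nearest same-class neighbour
        new_feats.append((c_i - c_j) * lam + c_i)
        new_labels.append(labels[i])
    return np.array(new_feats), np.array(new_labels)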
  • 85. Dataset Size Generative synthetic data Augmentation through generative adversarial networks (GANs) The CVPR 2017 awards are out! The two winners are Densely Connected Convolutional Networks by Facebook and Improving the Realism of Synthetic Images by Apple. https://arxiv.org/abs/1612.07828 https://machinelearning.apple.com/2017/07/07/GAN.html https://github.com/wayaai/SimGAN https://arxiv.org/abs/1706.02071 https://github.com/val-iisc/deligan TextureGAN: Controlling Deep Image Synthesis with Texture Patches. Wenqi Xian, Patsorn Sangkloy, Jingwan Lu, Chen Fang, Fisher Yu, James Hays (Submitted on 9 Jun 2017) https://arxiv.org/abs/1706.02823
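The SimGAN recipe behind the Apple paper trains a refiner network with an adversarial realism term plus a self-regularization term that keeps the refined image close to the synthetic input, so annotations carry over. A hedged PyTorch sketch of that refiner objective (the discriminator here is assumed to output one logit per image, 1 = real):

import torch
import torch.nn.functional as F

def refiner_loss(discriminator, synthetic, refined, lam=0.5):
    # Adversarial term: fool the discriminator into calling refined images real.
    pred = discriminator(refined)
    adv = F.binary_cross_entropy_with_logits(pred, torch.ones_like(pred))
    # Self-regularization: preserve the annotation-bearing content of the input.
    self_reg = F.l1_loss(refined, synthetic)
    return adv + lam * self_reg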
  • 86. Dataset Size semi-supervised training #1 Jointly use labeled and unlabeled data https://arxiv.org/abs/1705.08850 Our empirical results show that using the tangents of the data manifold (as estimated by the generator of the GAN) to inject invariances into the classifier improves performance on semi-supervised learning tasks. https://arxiv.org/abs/1706.00400 N. Siddharth, Brooks Paige, Jan-Willem Van de Meent, Alban Desmaison, Frank Wood, Noah D. Goodman, Pushmeet Kohli, Philip H.S. Torr Here we are interested in learning disentangled representations that encode distinct aspects of the data into separate variables. We propose to learn such representations using model architectures that generalize from standard variational autoencoders (VAEs), employing a general graphical model structure in the encoder and decoder. This allows us to train partially-specified models that make relatively strong assumptions about a subset of interpretable variables and rely on the flexibility of neural networks to learn representations for the remaining variables. We further define a general objective for semi-supervised learning in this model class, which can be approximated using an importance sampling procedure.
  • 87. Dataset Size semi-supervised training #2 https://arxiv.org/abs/1707.03631 https://arxiv.org/abs/1705.09783 In this work, we present a semi-supervised learning framework that uses generated data to boost task performance. Under this framework, we characterize the properties of various generators and theoretically prove that a complementary (i.e., bad) generator improves generalization. Empirically, our proposed method improves the performance of image classification on several benchmark datasets. Our proposed method, adversarial dropout, can be viewed both from the dropout and from the adversarial-training perspectives. Adversarial dropout can be interpreted as dropout masks whose direction is counter-optimized, adversarially, against the model's label assignment. However, it should be noted that adversarial dropout and traditional adversarial training with additive perturbation are different, because adversarial dropout induces a sparse structure in the neural network, while the latter does not change the network directly.
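The methods above go beyond it, but the simplest semi-supervised baseline worth keeping in mind is pseudo-labeling: train on the labeled set, adopt the model's confident predictions on unlabeled data as extra labels, retrain. A minimal sketch assuming a scikit-learn-style classifier:

import numpy as np

def pseudo_label(model, X_lab, y_lab, X_unlab, threshold=0.95, rounds=3):
    # Iteratively grow the training set with confident pseudo-labels.
    X, y = X_lab, y_lab
    for _ in range(rounds):
        model.fit(X, y)
        proba = model.predict_proba(X_unlab)
        keep = proba.max(axis=1) >= threshold  # only very confident samples
        if not keep.any():
            break
        X = np.vstack([X_lab, X_unlab[keep]])
        y = np.concatenate([y_lab, proba[keep].argmax(axis=1)])
    return model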
  • 88. Dataset Size Active learning and “smart” labeling #1 When labeling is very time-consuming, active learning can help us choose which unlabeled samples to label. Active Learning and Proofreading for Delineation of Curvilinear Structures. Mosinska, Agata Justyna; Tarnawski, Jakub; Fua, Pascal. Presented at: MICCAI, Quebec City, Canada, September 10-14, 2017 https://infoscience.epfl.ch/record/229472 https://arxiv.org/abs/1704.07433
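The simplest active-learning criterion is uncertainty sampling: query the pool samples the current model is least sure about. A minimal sketch with a scikit-learn-style classifier:

import numpy as np

def least_confident_query(model, X_pool, n_queries=10):
    # Pick the samples whose most probable class has the lowest predicted
    # probability and hand them to the expert for labeling.
    proba = model.predict_proba(X_pool)
    uncertainty = 1.0 - proba.max(axis=1)
    return np.argsort(uncertainty)[-n_queries:]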
  • 89. Dataset Size Transfer learning Leveraging features learned from bigger non-medical datasets. Our approach fine-tunes a pre-trained convolutional neural network (CNN), GoogLeNet. The fine-tuned CNN could effectively identify pathologies in comparison to classical learning. Our algorithm aims to demonstrate that models trained on non-medical images can be fine-tuned for classifying OCT images with limited training data. Biomedical Optics Express Vol. 8, Issue 2, pp. 579-592 (2017) https://doi.org/10.1364/BOE.8.000579 International Workshop on Large-Scale Annotation of Biomedical Data and Expert Label Synthesis; International Workshop on Deep Learning in Medical Image Analysis. LABELS 2016, DLMIA 2016: Deep Learning and Data Labeling for Medical Applications, pp 188-196. Understanding the Mechanisms of Deep Transfer Learning for Medical Images https://doi.org/10.1007/978-3-319-46976-8_20 Hariharan Ravishankar, Prasad Sudhakar, Rahul Venkataramani, Sheshadri Thiruvenkadam, Pavan Annangi, Narayanan Babu, Vivek Vaidya. Deep Learning and Convolutional Neural Networks for Medical Image Computing, pp 181-193. Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR). On the Necessity of Fine-Tuned Convolutional Neural Networks for Medical Imaging https://doi.org/10.1007/978-3-319-42999-1_11 Nima Tajbakhsh, Jae Y. Shin, Suryakanth R. Gurudu, R. Todd Hurst, Christopher B. Kendall, Michael B. Gotway, Jianming Liang. In this paper, we studied the necessity of fine-tuning and the effective level of knowledge transfer in 4 medical imaging applications. Our experiments demonstrated that medical imaging applications were conducive to transfer learning and that fine-tuned CNNs were necessary to achieve high performance, particularly with limited training datasets. We also showed that the desired level of fine-tuning differed from one application to another. While deeper levels of fine-tuning were suitable for polyp and PE detection, intermediate fine-tuning worked best for interface segmentation and colonoscopy frame classification. Our findings led us to conclude that layer-wise fine-tuning is a practical way to reach the best performance based on the amount of available data.
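In PyTorch/torchvision terms, layer-wise fine-tuning amounts to freezing a prefix of the pretrained network and retraining the rest; how deep the frozen prefix goes is exactly the knob Tajbakhsh et al. tune per application. A hedged sketch (ResNet-18 used for illustration rather than GoogLeNet):

import torch.nn as nn
from torchvision import models

def build_finetune_model(n_classes, n_frozen_children=6):
    # Load an ImageNet-pretrained network, freeze the first few child modules
    # (generic early features) and replace the classifier head for the new task.
    net = models.resnet18(pretrained=True)
    for child in list(net.children())[:n_frozen_children]:
        for p in child.parameters():
            p.requires_grad = False
    net.fc = nn.Linear(net.fc.in_features, n_classes)  # new task-specific head
    return net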
  • 90. Dataset Quality Beyond A giant with feet of clay: on the validity of the data that feed machine learning in medicine. Federico Cabitza, Davide Ciucci, Raffaele Rasoini. Last revised 26 Jun 2017 https://arxiv.org/abs/1706.06838 We point out how uncertainty is so ingrained in medicine that it also biases the representation of clinical phenomena, that is, the very input of ML models, thus undermining the clinical significance of their output. Recognizing this can motivate both medical doctors, in taking more responsibility in the development and use of these decision aids, and researchers, in pursuing different ways to assess the value of these systems. In so doing, both designers and users could take this intrinsic characteristic of medicine more seriously and consider alternative approaches that do not "sweep uncertainty under the rug" within an objectivist fiction that everyone can conveniently believe to be true. Garbage in, Gospel out: The question of the quality of the medical record, and of the data extracted from it, is still understudied [Cabitza and Batini, 2016; Stetson et al. 2012], let alone in regard to machine learning projects [Feldman et al. 2017]. The assumption that medical data could support secondary uses has been challenged for almost 25 years, and strongly so, e.g., by Reiser (1991), who described several cases of erroneous, missing and ambiguous data, and by Burnum (1989), who provocatively wrote that "all medical record information should be regarded as suspect; much of it is fiction" (p. 484). JAMA. Published online July 20, 2017. doi:10.1001/jama.2017.7797 https://doi.org/10.1177/0272989X12465490 Conclusions: Our exploratory analysis method reveals unexpected effects. It indicates that, despite the original study detecting no significant average effect, computer-aided detection (CAD) helped the less discriminating readers but hindered the more discriminating readers. Such differential effects, although subtle, may be clinically significant and important for improving both computer algorithms and protocols for their use. They should be assessed when evaluating CAD and similar warning systems.
  • 93. RETINA A schematic view of the retina showing the organization of different neuronal populations and their synaptic connections. Rods and cones are confined to the photoreceptor layer. Light detected by rods and cones is processed and signalled to retinal ganglion cells (RGCs) through horizontal, amacrine and bipolar cells. RGCs are the only output neurons from the retina to the brain. A subset of RGCs (4–5% of the total number of RGCs) are intrinsically photosensitive RGCs (ipRGCs) containing the photopigment melanopsin. There are at least five subtypes of ipRGCs (M1–M5) with different morphological and electrophysiological properties, which show widespread projection patterns throughout the brain. LeGates et al. (2014): “Light as a central modulator of circadian rhythms, sleep and affect” Retinal circuits. (a) The cellular and synaptic (i.e., plexiform) layers of the retina. Some of the various cell types composing the five classes of neurons are shown: rod and cone photoreceptors, horizontal cells (HCs), ON and OFF cone bipolar cells (BCs), rod BCs, AII and wide-field (WF) amacrine cells (ACs), and ON and OFF ganglion cells (GCs). The ON and OFF BC axon terminals and GC dendrites stratify in separate halves of the inner plexiform layer. (b) Several cell types from panel a, redrawn to illustrate how rod signals pass through the inner retina. Excitatory (+) and inhibitory (−) synapses are shown. A gap junction (denoted by the resistor symbol) allows bidirectional current flow between AII ACs and ON cone BCs. The AII AC splits the ON rod BC signal into ON and OFF components using either electrical (gap junction, ON) or chemical (glycinergic, OFF) synapses. Note that in daylight conditions, cone-mediated drive to the AII influences the OFF pathway as follows: cone → ON cone BC → AII AC → OFF BC and GC. Example of circuit switching. (a) The excitatory input to an ON ganglion cell (GC) is driven by both rod and cone circuits. The rod circuits actually signal via the cone bipolar cell terminal. The inhibition from the surround is mediated by a wide-field amacrine cell (WF AC) driven exclusively by cone circuits. (b) When the rod circuit is active, the ON GC has a receptive field with an excitatory center component only. When the cone circuit is active, the inhibitory surround component switches on. Synaptic motifs. (a) From the perspective of a bipolar cell (pipette attached), inhibition arising from amacrine cells (ACs) occurs via multiple synaptic motifs. Excitatory (+) and inhibitory (−) synapses are indicated; feedback and feedforward synapses can occur in both ON and OFF systems, and crossover inhibition acts between ON and OFF systems. The illustrated circuit is an ON → OFF inhibitory one, but the opposite pattern (OFF → ON) could also occur. (b) From the perspective of a ganglion cell (GC) (pipette attached), inhibition from ACs occurs via multiple synaptic motifs. This panel follows the same conventions as used in panel a. Note! Melanopsin-containing retinal ganglion cells (pRGC, ipRGC, mRGC: the same thing) were discovered only recently, in 2002, by Berson et al. [Cited by 1956], thus you might find them missing from textbook versions of retinal circuits. Initially they were thought to contribute mainly to sleep/alertness and circadian rhythm regulation, but they have since been shown to contribute to image-forming vision as well.
  • 94. RETINA response characteristics: Spectral #1 SPECTRAL PROPERTIES Teikari thesis (2012), Enezi et al. 2011, Stockman and Sharpe (2000), CVRL, Govardovskii et al. 2000, van de Kraats and van Norren 2007, Walraven 2003 CIE Report. “For environmental light” / “At the retinal level, if you did not have the ocular media”. The absorbance spectrum of an exemplary vertebrate rhodopsin (λmax ≈ 500 nm), considered as a sum of absorbance bands, indicated by alpha (α), beta (β), gamma (γ), sigma (σ) and epsilon (ε), normalized to the peak absorbance of the alpha band (after Stavenga and van Barneveld 1975, from Stavenga 2010). The sidelobe on the short-wave side comes from the beta band (see the template from Govardovskii et al. 2000). The self-screening effect changes the width/peak of the absorption spectrum. (A) Percentage absorption spectra for various concentrations of photopigment (OD = optical density in log units). (B) An illustration of self-screening at various photoreceptor lengths. The human rod photoreceptor is ~25 μm long (Pugh and Lamb 2000) and the cone photoreceptor ~13 μm (Baylor et al. 1984). The longest known photoreceptor has been found in the dragonfly, with a length of 1,100 μm (Labhart and Nilsson 1995). “The human crystalline lens strongly absorbs blue light and UV.” V′(λ) is the spectral sensitivity for night vision, and V(λ) for daytime vision. Not shown is the mesopic sensitivity VM(λ), a nonlinear combination of daytime and night vision operating in dim-light color vision. Quantally defined daytime sensitivity (2° central vision, Sharpe et al., 2005): V*(λ) = [1.891·l(λ) + m(λ)] / 2.80361, where l(λ) is the long-wavelength ('red') cone sensitivity and m(λ) the medium-wavelength ('green') cone sensitivity. Note! Melanopsin and S-cones do not seem to contribute to central-vision luminance perception vs. RGB luminance. Stockman, A., & Sharpe, L. T. (2008). Spectral sensitivity. In The Senses: A Comprehensive Reference, Volume 2: Vision II (pp. 87-100). Goodeve et al., 1942: Without the crystalline lens (aphakic eye), visual sensitivity would extend into the ultraviolet.
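The quantal luminous efficiency formula above is trivial to evaluate once the Stockman-Sharpe cone fundamentals have been downloaded from CVRL; a minimal sketch with hypothetical input arrays:

import numpy as np

def v_star(l_bar, m_bar):
    # Quantal 2-deg luminous efficiency V*(lambda), Sharpe et al. (2005):
    # V*(lambda) = [1.891 * l(lambda) + m(lambda)] / 2.80361
    # l_bar, m_bar: L- and M-cone quantal sensitivities sampled at the same
    # wavelengths (e.g., the CVRL cone fundamentals).
    return (1.891 * np.asarray(l_bar) + np.asarray(m_bar)) / 2.80361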
  • 95. RETINA response characteristics: Spectral #2 Dominance of L cones over S cones across species. Measured S cone proportion is shown for a variety of animals. For some animals, two measurements at different locations on the retina are shown. Large variation in L cone proportion indicates dorso-ventral asymmetries, like those discussed in Szél et al. (2000). Science 10 Jun 2011: Vol. 332, Issue 6035, pp. 1307-1312. DOI: 10.1126/science.1200172 For a short wavelength-sensitive pigment, although its noise literally disappears at λmax < 400 nm (Fig. C), nonspecific light absorption by proteins, peaking at ~280 nm, becomes a limiting factor. These considerations probably explain, at least partially, why the λmax values of native visual pigments are confined to the narrow bandwidth of ~360 to 620 nm, limiting color vision accordingly. Predicted thermal-noise rate constant as a function of λmax. Black circles, rhodopsins; red squares, cone pigments. http://dx.doi.org/10.1016/S0896-6273(00)80845-4 Present-day vertebrates vary enormously in the sophistication of their color vision, the density and spatial distribution of cone classes, and the number and absorption maxima of their cone pigments (38, 30, 31 and 100). At one extreme, most mammals have only three pigments: the two ancestral cone pigments and rhodopsin. At the other evolutionary extreme, chickens possess six pigments: four cone pigments, one rhodopsin, and a pineal visual pigment, pinopsin. In this evolutionary comparison, humans and their closest primate relatives represent an intermediate level of complexity. Humans have four visual pigments (1999): a single member of the <500 nm family of cone pigments (the blue or short-wave pigment, with an absorption maximum at ~425 nm), two highly homologous members of the >500 nm family (the green or middle-wave pigment and the red or long-wave pigment, with absorption maxima at ~530 and ~560 nm, respectively), and rhodopsin. The presence of only a single gene encoding a >500 nm pigment in almost all New World primates, and in all nonprimate mammals studied to date, places the red/green visual pigment gene duplication in the Old World primate lineage at ~30-40 million years ago, shortly after the geologic split between Africa and South America (Jacobs 1993). DOI: 10.1098/rstb.2009.0050 The spectral tuning of vertebrate opsins will also be influenced by their evolutionary history (Goldsmith 1990). Melanopsin acts as a bistable pigment able to regenerate (recycle) its chromophore (11-cis-retinal) using all-trans-retinal and long-wavelength light, in a manner reminiscent of the invertebrate photopigments (Melyan et al. 2005). In this regard melanopsin may be unique among mammalian photopigments in forming a stable association with all-trans-retinal. Interestingly, the melanopsins appear to share some of the key characteristics of an invertebrate-like signal transduction pathway. Both pRGCs and cells transfected with melanopsin show depolarizing responses to light and display chromophore bistability/tristability (Emanuel et al. 2015), another feature of the invertebrate photopigments. Amino acid sequence features of the melanopsin protein result in delayed deactivation and temporal integration of the light signal (Mure et al., 2016).
  • 96. RETINA response characteristics: Spectral #3 Transducing intermediate pigment states. Schematic representation of the photochemical invertebrate rhodopsin cycle of the blowfly (Calliphora). Rhodopsin R, excited by light absorption, converts to bathorhodopsin B. Thermal decay via lumirhodopsin L to metarhodopsin M follows. The back reaction proceeds via putative intermediates K and possibly N. Time constants of the conversion steps are indicated (Kruizinga et al. 1983). Vertebrate rhodopsin intermediates. (A) Decay of the activated Meta II state to Meta III. Illumination of rhodopsin's dark state (λmax = 500 nm) produces the Meta I/Meta II photoproduct equilibrium. By applying a second illumination, the decay product Meta III of the second pathway can be converted back to Meta I/Meta II (again consisting mostly of Meta II), while the decay products of the first pathway, opsin and all-trans retinal, remain largely unreactive. (B) Bovine rhodopsin transduction. Activation of rhodopsin is achieved by light-dependent isomerization of the chromophore and subsequent thermal relaxation of the receptor on the millisecond time scale to the active receptor conformation (Bartl and Vogel 2007). A State Model for Tristable Melanopsin. (A) State diagram of melanopsin (top) based on parameters measured biochemically from purified pigment (Matsuyama et al., 2012). Shown are melanopsin (R), metamelanopsin (M), and extramelanopsin (E) with chromophores designated. Below are plotted the relative photosensitivities (i.e., products of the extinction coefficients and quantum efficiencies) of these states as a function of wavelength. (B) Predicted equilibrium fraction of each pigment state as a function of wavelength. Lines show the R state (black), M state (blue), and E state (red). (Emanuel et al. 2015) Photoreversal of vertebrate rhodopsin (Williams 1964). Both the test flash and the bleaching light consisted of long wavelengths primarily absorbed by rhodopsin. The blue, photoregenerating flash contained wavelengths absorbed by the longer-lived intermediates of the bleaching process. This photoreversal might in practice enhance the blue-light hazard (Grimm et al. 2000). Regeneration of pigment to the responsive state by a second illumination occurs both with 'invertebrate'-like melanopsin and vertebrate rhodopsin. DOI: 10.1042/bj3301201 http://dx.doi.org/10.1016/j.visres.2005.12.017 | Cited by 26 Time courses of the amounts of photolysis products in goldfish cones, normalized to bleached visual pigment. Decomposition of the final T- and L-spectra of rod outer segments at 1800 s post-bleach (noisy curves) into components. It reveals, in addition to dehydroretinal and P480, generation of a small amount of dehydroretinol. The sum of RAL, ROL, and P480 (bold curves) provides a good approximation of the experimental spectra.
  • 97. RETINA response characteristics: Spectral #4 http://dx.doi.org/10.1038/13185 Genetic and psychophysical results from the latter class indicated that limited red–green discrimination can be achieved with pigments that have the same peak wavelength sensitivity and that differ only in optical density. Types of color blindness with their prevalence faculty.montgomerycollege.edu http://dx.doi.org/10.1016/j.visres.2011.08.016 sensationalcolor.com/understanding-color www.npr.org/2014/11/16 http://www.bbc.co.uk/news/entertainment-arts-27884975 By Colin Schultz | smithsonian.com | August 20, 2012
  • 98. RETINA response characteristics: Spectral #5 A tiny group of people can see ‘invisible’ colours (“tetrachromacy”: four cone types instead of three) that no-one else can perceive, discovers David Robson. How do they do it? “Jordan’s ‘acid test’ involved coloured discs showing different mixtures of pigment, such as a green made of yellow and blue. The mixtures were too subtle for most people to notice: almost all people would see the same shade of olive green, but each combination should give out a subtly different spectrum of light that would be perceptible to someone with a fourth cone. Sure enough, Jordan’s subject was able to differentiate between the different mixtures each time.” http://www.bbc.com/future/story/20140905-the-women-with-super-human-vision While tetrachromacy is so rare that it makes headlines every time a new case emerges, it might come as a surprise that women with four cone types in their retinas are actually more common than we think. Researchers estimate that they represent as much as 12% of the female population (4). So why aren’t we surrounded by women with extraordinary colour vision? Researchers have found that only a small fraction of women who possess an extra cone type actually get to enjoy more colours. So what does it take to be a true tetrachromat? How does the human retina come to produce four cone types, and why does it only concern women? More importantly, why don’t all women fulfil their genetic potential? And how do we find the special women who do? theneurosphere.com/2015/12/17 [4] Jordan, G. et al. (2010). The dimensionality of color vision in carriers of anomalous trichromacy. Journal of Vision, 10. Tetrachromats are rare enough, but Concetta Antico is particularly remarkable, since, as an artist, she is able to give us a rare view into that world. “Her artwork might tap into a structure that all of us can appreciate,” says Kimberly Jameson at the University of California, Irvine, who has studied Antico extensively. It’s even possible that she might suggest ways for more people to see the same way.
  • 99. RETINA response characteristics: Intensity Illumination levels. Typical ambient light levels are compared with photopic luminance (log cd·m⁻²), pupil diameter (mm), photopic and scotopic retinal illuminance (log photopic and scotopic trolands, respectively) and visual function. The scotopic, mesopic and photopic regions are defined according to whether rods alone, rods and cones, or cones alone operate. The conversion from photopic to scotopic values assumed a standard white CIE D65 illumination. (Stockman and Sharpe 2006) How these four separate mechanisms (photopigment depletion, pupil contraction, cellular adaptation and response compression) coordinate luminance adaptation is not yet known. However, Peter Kaiser and Robert Boynton provide a quantitative illustration of how the four principal processes might interact, as shown below. http://www.handprint.com/HP/WCL/color4.html Top right: Spectral response of the eye for point sources. Peak cone sensitivity is over 200 times lower than peak rod sensitivity. Relative sensitivities of S, L and M cones are shown within the photopic mode; by combining their inputs, the brain creates colors. Bottom left: Exposed to low-light conditions in full photopic mode, cone sensitivity increases 30-100 times within ~10 minutes, reaching its maximum sensitivity level (the darker it is, the faster the transition from cone to rod function; in near-complete darkness, the cones shut down almost instantly). At the cone-rod break, rods become dominant, gaining in sensitivity some 200-1000 times over peak cone sensitivity within the next ~20 minutes (individual sensitivity varies within the shown approximate range: by a factor of ~3 and ~10 for the cones and rods, respectively). In the process, peak sensitivity shifts from ~555 nm (photopic) to ~507 nm (scotopic). The response range shifts from ~400-730 nm to ~370-650 nm, respectively. Dark-to-light eye adaptation takes considerably less time: only about 7 minutes. (a) Maximum sensitivity level, after ~10 min in darkness; maximum bright-light cone sensitivity is 30-100 times lower. http://www.telescope-optics.net/eye_spectral_response.htm
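Several of the quantities in this chart are linked by the standard troland conversion: retinal illuminance T (in photopic trolands) is luminance times pupil area. A minimal sketch:

import math

def trolands(luminance_cd_m2, pupil_diameter_mm):
    # T = L * A, with L the luminance in cd/m^2 and A the pupil area in mm^2.
    area_mm2 = math.pi * (pupil_diameter_mm / 2.0) ** 2
    return luminance_cd_m2 * area_mm2

# Example: 100 cd/m^2 seen through a 3 mm pupil is about 707 trolands.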
  • 100. RETINA response characteristics: Circuit (A) Time course of the Early Receptor Potential (ERP) and Late Receptor Potential (LRP) in monkey retina compared to the ERG a-wave (redrawn from Brown and Murakami 1964). (B) Intensity dependence of the human ERP, illustrating the log-linear relationship between light intensity and pigment-level responses, and the non-linear relationship between light intensity and the a-wave response (redrawn from Debecker and Zanen 1975). Graph from Teikari thesis (2012). The cells of the retina and their response to a spot light flash. The photoreceptors are the rods and cones, in which a negative receptor potential is elicited. This drives the bipolar cell to become either depolarized or hyperpolarized. The amacrine cell has a negative feedback effect. The ganglion cell fires an action pulse so that the resulting spike train is proportional to the light stimulus level. (bem.fi) The classical photoreceptors, cones and rods, are not designed to encode absolute light levels (unlike melanopsin RGCs), and non-linearity is introduced very early in visual processing, already at the retinal level. The pigment conformational change (cis to trans) is linear in relation to light intensity, but the photoreceptor response is already nonlinear (a-wave). The dependence of the b-wave amplitude (solid squares) and the a-wave amplitude (open squares) on the log intensity of the light stimulus. The data points describe the mean ± 2 SD of the responses obtained in the dark-adapted state from 40 eyes of 20 volunteers with normal vision. The relationship between the b-wave amplitude and the a-wave amplitude obtained from responses evoked in the dark-adapted state. The continuous line describes the mean relationship, while the 2 dashed lines bound the normal range (mean ± 2 SD). Open and solid triangles represent normal ERG data obtained, respectively, from papers by Berson and Weleber. Data from 2 patients are also illustrated; one patient suffers from high myopia (open circles), while the other complained of nyctalopia (solid circles). Relationship between the amplitudes of the b wave and the a wave as a useful index for evaluating the electroretinogram. I. Perlman, Br J Ophthalmol 1983;67:443-448. doi:10.1136/bjo.67.7.443. Cited by 69
  • 101. Retina advanced processing (2010) http://dx.doi.org/10.1016/j.neuron.2009.12.009, Cited by 266 (A) Detection of dim light flashes in the rod-to-rod bipolar pathway. Each photoreceptor output is sent through a band-pass temporal filter followed by a thresholding operation before summation by the rod bipolar cell. Computations Performed by the Retina and Their Underlying Microcircuits. (B) Sensitivity to texture motion. The bipolar cells have biphasic dynamics and thus respond transiently. Only the depolarized bipolar cells communicate to the ganglion cell, because of rectification in synaptic transmission. (C) Detection of differential motion. Polyaxonal amacrine cells in the periphery are excited by the same motion-sensitive circuit and send inhibitory inputs to the center. If motion in the periphery is synchronous with that in the center, the excitatory transients will coincide with the inhibitory ones, and firing is suppressed. (D) Detection of approaching motion. The circuit that generates this approach sensitivity is composed of excitation from OFF bipolar cells and inhibition from amacrine cells that are activated by ON bipolar cells, at least partly via gap junction coupling. Importantly, these inputs are nonlinearly rectified before integration by the ganglion cell. (E) Rapid encoding of spatial structures with spike latencies. The responses result from a circuit that combines synaptic inputs from both ON and OFF bipolar cells whose signals are individually rectified. The timing differences in the responses follow from a delay (∆t) in the ON pathway. (F) Switching circuit. A control signal selectively gates one of two potential input signals. (Right) In the retina, such a control signal is driven by certain wide-field amacrine cells (A1), which are activated during rapid image shifts in the periphery. Their activation leads to a suppression of OFF bipolar signals and, through a putative local amacrine cell (A2), to disinhibition of ON bipolar signals. Journal of Vision May 2008, Vol. 8, 15. doi: 10.1167/8.5.15 Cited by 42 Basic data from Hofer, Singer, et al. (2005). (Top panels) Schematics of retinal mosaics used for the 5 observers. These are subsets of the full regions characterized for each observer. For each observer, the imaging and densitometry data were insufficient to assign a class or exact location to some cones. These parameters were filled in according to the procedure described by Hofer, Singer, et al. (2005). In the schematics, L cones are colored red, M cones green, and S cones blue. L:M ratios of mosaics used: HS 1:3.1; YY 1.2:1; AP 1.3:1; MD 1.6:1; BS 14.7:1. Roorda (2011): Instead of eliciting three classes of response generated from the stimulation of the three cone classes, they found that subjects demanded as many as 7 color categories. Analysis of the responses suggested that the color appearance generated by a single cone is more a function of how it is situated with respect to other cones rather than of its spectral subtype. Cones that are in a position to provide strong chromatic cues generate colored percepts, whereas cones that are not in a good position to do so generate achromatic, or white, percepts. Given the random arrangement of the three cone classes in the retina, it is sensible that the visual system would develop in this way to best handle the dual role that the retina has in conveying both spatial and color vision. An adaptive optics system was used to measure and correct for aberrations in the optics of individual observers.
This enabled resolution of individual cones in acquired fundus images. Science Advances 14 Sep 2016: Vol. 2, no. 9, e1600797
  • 102. Retina already recurrent as well Published: May 3, 2011 http://dx.doi.org/10.1371/journal.pbio.1001058 Published: May 3, 2011 | http://dx.doi.org/10.1371/journal.pbio.1001057 A conceptual model of positive feedback in the outer retina. (A) Diagram depicting the differential spread of positive and negative feedback within an HC. The top bar denotes the illumination pattern. A cone depolarized in darkness will release glutamate, activating AMPA receptors (AMPARs), causing depolarization and Ca2+ influx. The rise in Ca2+ is restricted to the specific dendrite that contacts the cone, and the resulting positive feedback is localized to that cone. The depolarization spreads electrotonically through the HC, resulting in negative feedback from all of the dendrites. (B) Model simulations of the effect of feedback on synaptic release from a linear array of cones exposed to a dark spot on a non-saturating light background (see Methods). The positive feedback signal (blue) is localized to HC dendrites in contact with dark cones, while the negative feedback signal (red) spreads electrotonically through the HCs. Traces show simulated cone release with no feedback (green), with negative feedback (red), and with equally weighted negative and positive feedback (blue). Spatial circuitry models in dark- and light-adapted conditions. A: in dark-adapted conditions, OFF bipolar cells receive wide spatial inhibition from wide-field GABAergic amacrine cells. Coupling between both AII and other glycinergic amacrine cells likely contributes to increasing the wide spatial spread of glycinergic signals to OFF bipolar cells. B: in light-adapted conditions, OFF bipolar cells receive spatially narrow glycinergic input, likely due to uncoupling of AII and other glycinergic amacrine cells. Light stimuli distant from the bipolar cell likely activate serial inhibitory connections between GABAergic amacrine cells, which would shorten spatial GABAergic signals to OFF bipolar cells. C: functional schematic of changing bipolar cell center-surround sizes. In dark-adapted conditions, OFF bipolar cells receive wide and strong inhibition, so their inhibitory surrounds are large. If 2 small spots of light are presented to the retina, spot A stimulates excitatory output from the center of one OFF bipolar cell, whereas spot B stimulates surround inhibitory connections to that same cell. Overall output is reduced in this instance due to the addition of inhibitory input. In light-adapted conditions, OFF bipolar cells receive narrow and weaker inhibition, so their inhibitory surrounds are small. In these conditions, spot B does not stimulate the inhibitory surround, and there is no reduction in excitatory bipolar cell output from spot A. Thus the bipolar cell output in the light-adapted case is stronger. doi:10.1152/jn.00948.2015
  • 103. Neuroscience Deep Learning | Background
  • 104. fMRI+EEG+Behavioral data multimodal data http://dx.doi.org/10.1016/j.neuroimage.2015.12.030 Specifically, we show how combining either EEG or fMRI with a behavioral model can perform substantially better than a behavioral-data-only model in both generative and predictive modeling analyses. We then show how a trivariate model – a model including EEG, fMRI, and behavioral data – outperforms bivariate models in both generative and predictive modeling analyses. Graphical diagram derived from Turner et al. (2016) [see previous slides for EEG+fMRI]: Observable data are represented as gray boxes, whereas unknown (latent) variables are represented as empty circles. The orange plate represents the behavioral data/model, the green plate represents the EEG data/model, and the blue plate represents the fMRI data/model. The method allows for any behavioral model to be combined with multiple neural measures. Generative Deep Network: improve the existing generative models.
  • 105. MEG Visual processing with deep learning http://dx.doi.org/10.1016/j.neuroimage.2016.03.063 Magnetoencephalography (MEG) Image set and single-image decoding. (A) The stimulus set comprised 48 indoor scene images differing in the size of the space depicted (small vs. large), as well as clutter, contrast, and luminance level; here each experimental factor combination is exemplified by one image. The image set was based on behaviorally validated images of scenes differing in size and clutter level, de-correlating the factors size and clutter explicitly by experimental design (Park et al., 2015). The deep neural network architecture “AlexNet” was implemented following Krizhevsky et al. (2012). We chose this particular architecture because it was the best performing model in object classification in the ImageNet 2012 competition (Russakovsky et al., 2014). Supplementary Movie 1. The deep scene model accounts for more of the MEG size signal than other models. (A) We combined representational similarity with partial correlation analysis to determine which computational models explained emerging representations of scene size in the brain. Together our data provide a first description of an electrophysiological signal for layout processing in humans and suggest that deep neural networks are a promising framework to investigate how spatial layout representations emerge in the human brain. Future studies using image sets optimized to drive low- and high-level visual cortex equally are necessary to test whether layer-specific representations in deep neural networks can be mapped in both time and space onto processing stages in the human brain. Sidenote! AlexNet was indeed revolutionary at the time, but the 2015 winner, ResNet from Microsoft, surpassed human-level performance on the ImageNet classification task.
  • 106. Brain Circuit feed-forward vs recurrent a | Feedforward network. The diagram shows a multilayer perceptron, consisting of three sequential layers of neurons (represented by circles), in which every neuron from each layer is connected to every neuron of the next layer. In this network, inputs are sequentially processed layer by layer in a unidirectional fashion, from the input layer on the left, to the ‘hidden’ layer in the middle, to the output layer on the right. The simple addition of synaptic weights in the output layer results in the generation of selective responses. The computation is an emergent property of the activity of the entire network. b | Recurrent network: an example of an attractor (feedback) neural network in which four pyramidal neurons (blue) are connected to themselves through recurrent axons (thin lines) with synaptic weights (wij) that change owing to a learning rule. The network receives an external set of inputs (top connections) and generates an output (bottom arrows). In networks with recurrent and symmetric connectivity the activity becomes ‘attracted’ to particular stable patterns. http://dx.doi.org/10.1038/nrn3962, Cited by 49 Nature Reviews Neuroscience 11, 615-627 (September 2010) doi:10.1038/nrn2886 The feedforward network as a model of information processing in the brain. a | A schematic of hierarchical processing in the visual systems of primates. Similar schematic models have also been described for other sensory and motor areas. b | Each module in part a can be considered as a recurrent network of excitatory and inhibitory neurons. Each of the rectangular boxes represents a recurrent random network. The hierarchical structure of the brain is conceived here as a network of recurrent networks with forward and backward excitatory connections. So far, only the feedforward part (shown in black) of such a network of networks has been investigated in a systematic manner. Recurrent excitation and inhibition within one group and excitatory synapses that do not contribute to the feedforward hierarchy of subsequent groups (shown in grey) have not been considered yet
  • 107. Residual variants state-of-the-art Deep feedforward network https://arxiv.org/abs/1512.03385; Cited by 578 https://arxiv.org/abs/1603.05027 https://arxiv.org/abs/1602.07261 https://arxiv.org/abs/1602.07360 http://dx.doi.org/10.1007/978-3-319-46976-8_19 Skip connections https://arxiv.org/abs/1604.08671 The framework of the proposed DEGREE network. The recurrent residual network recovers sub-bands of the HR image features iteratively, and edge features are utilized as guidance in image SR to preserve sharp details.
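The common building block behind all of these variants is the identity-shortcut residual module; a minimal PyTorch sketch of the pre-activation form:

import torch.nn as nn

class ResidualBlock(nn.Module):
    # Pre-activation residual block (He et al., 2016): y = x + F(x),
    # where F is BN-ReLU-Conv-BN-ReLU-Conv.
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        )

    def forward(self, x):
        return x + self.body(x)  # identity skip connection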
  • 108. Circuit design deep networks vs Human brain #1 Center for Data Science, New York University; Department of Brain and Cognitive Sciences, MIT; Department of Psychology and Center for Brain Science, Harvard University; Center for Brains, Minds and Machines https://arxiv.org/abs/1604.00289; Cited by 13 Science 11 Dec 2015: Vol. 350, Issue 6266, pp. 1332-1338. DOI: 10.1126/science.aab3050; Cited by 70
  • 109. Circuit design deep networks vs Human brain #2 https://arxiv.org/abs/1604.03640 Center for Brains, Minds and Machines, McGovern Institute, MIT How similar is an ultra-deep residual network to the primate cortex? A notable difference is the depth. While a residual network can have as many as 1202 layers, biological systems seem to have two orders of magnitude fewer. In fact, there are about half a dozen areas in the ventral stream of visual cortex from the retina to the inferior temporal (IT) cortex. Notice that it takes on the order of 10 ms for neural activity to propagate from one area to another. The evolutionary advantage of having fewer layers is apparent: it supports rapid (100 ms from image onset to meaningful information in the IT neural population) visual recognition, which is a key ability of human and non-human primates. It is intriguingly possible to account for this discrepancy by taking into account recurrent connections within each visual area. Areas in visual cortex comprise six different layers with lateral and feedback connections, which are believed to mediate some attentional effects and even learning (such as backpropagation). “Unrolling” in time the recurrent computations carried out by the visual cortex provides an equivalent “ultra-deep” feedforward network, which might represent a more appropriate comparison with state-of-the-art computer vision models. In addition, we conjecture that the effectiveness of recent “ultra-deep” neural networks primarily comes from the fact that they can efficiently model the recurrent computations required by the recognition task. We show compelling evidence for this conjecture by demonstrating that 1. a deep residual network is formally equivalent to a shallow RNN; 2. such an RNN with weight sharing, thus with orders of magnitude fewer parameters (depending on the unrolling depth), can retain most of the performance of the corresponding deep residual network. Furthermore, we generalize such an RNN into a class of models that are more biologically-plausible models of cortex and show their effectiveness on CIFAR-10. The transition matrices used in the paper: “BN” denotes Batch Normalization and “Conv” denotes convolution. A deconvolution layer (denoted by “Deconv”) [34] is used as a transition function from a spatially small state to a spatially large one. BRCx2/BRDx2 denotes a BN-ReLU-Conv/Deconv-BN-ReLU-Conv/Deconv pipeline (similar to a residual module). There is always a 2x2 subsampling/upsampling between nearby states (e.g., V1/h1: 32x32, V2/h2: 16x16, V4/h3: 8x8, IT: 4x4). Stride 2 (convolution) or upsampling 2 (deconvolution) is used in transition functions to match the spatial sizes of input and output states. The intermediate feature sizes of transition functions BRCx2/BRDx2 or BRCx3/BRDx3 are chosen to be the average feature size of input and output states. “+I” denotes an identity shortcut mapping. The design of transition functions could be an interesting topic for future research.
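The claimed equivalence is easy to see in code: unrolling one residual transition K times with shared weights gives a K-layer residual network whose layers share parameters. A minimal PyTorch sketch:

import torch.nn as nn

class SharedResidualRNN(nn.Module):
    # h_{t+1} = h_t + F(h_t), with one shared transition function F.
    # Unrolled for `steps` iterations, this is formally a deep residual
    # network with weight sharing across layers (cf. Liao & Poggio).
    def __init__(self, channels, steps):
        super().__init__()
        self.steps = steps
        self.f = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        )

    def forward(self, h):
        for _ in range(self.steps):  # recurrence in "time" plays the role of depth
            h = h + self.f(h)
        return h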
  • 110. Circuit design deep networks vs Human brain #3 HYPOTHESIS & THEORY ARTICLE Front. Comput. Neurosci., 14 September 2016 | http://dx.doi.org/10.3389/fncom.2016.00094; Cited by 5 Putative differences between conventional and brain-like neural network designs. (A) In conventional deep learning, supervised training is based on externally-supplied, labeled data. (B) In the brain, supervised training of networks can still occur via gradient descent on an error signal, but this error signal must arise from internally generated cost functions. (C) Internally generated cost functions and error-driven training of cortical deep networks form part of a larger architecture containing several specialized systems. Although the trainable cortical areas are schematized as feedforward neural networks here, LSTMs or other types of recurrent networks may be a more accurate analogy, and many neuronal and network properties such as spiking, dendritic computation, neuromodulation, adaptation and homeostatic plasticity, timing-dependent plasticity, direct electrical connections, transient synaptic dynamics, excitatory/inhibitory balance, spontaneous oscillatory activity, axonal conduction delays (Izhikevich, 2006) and others will influence what and how such networks learn. This article is part of the research topic “Artificial Neural Networks as Models of Neural Information Processing”. Machine learning and neuroscience speak different languages today. Brain science has discovered a dazzling array of brain areas (Solari and Stoner, 2011), cell types, molecules, cellular states, and mechanisms for computation and information storage. Machine learning, in contrast, has largely focused on instantiations of a single principle: function optimization. We will argue here, however, that neuroscience and machine learning are again ripe for convergence. Three aspects of machine learning are particularly important in the context of this paper. Hypothesis 1 – The Brain Optimizes Cost Functions. Hypothesis 2 – Cost Functions Are Diverse across Areas and Change over Development. Hypothesis 3 – Specialized Systems Allow Efficient Solution of Key Computational Problems. Machine learning may be equally transformed by neuroscience. Within the brain, a myriad of subsystems and layers work together to produce an agent that exhibits general intelligence. Hypothesis 1 – Existence of Cost Functions. Hypothesis 2 – Biological Fine-structure of Cost Functions. Hypothesis 3 – Embedding within a Pre-structured Architecture. Hypothesis 1 – Did Evolution Separate Cost Functions from Optimization Algorithms? We hypothesize that the brain also acquired such a separation between optimization mechanisms and cost functions. When did the division between cost functions and optimization algorithms occur? How is this separation implemented? How did innovations in cost functions and optimization algorithms evolve? And how do our own cost functions and learning algorithms differ from those of other animals?