Deep neural network with GANs pre-training for tuberculosis type classification based on CT scans

Presenter: Behzad Shomali
Supervisor: Prof. Rouhani
2022, Jun 11
Outline
• Abstract
• Background
• Challenges
• Proposed method
• Experiments
• Conclusion
Outline
• Abstract
• Background
• Challenges
• Proposed method
• Experiments
• Conclusion
Abstract
• Tuberculosis (TB) is an airborne disease that affects people's lungs
• Goal: predict the TB type of each infected chest CT scan
• Train a GAN on multiple datasets
• Discriminator: comprehends the structure of a CT scan
• Generator: can produce photo-realistic images
• The discriminator is treated as a pre-trained model
• Fine-tune the discriminator on the primary dataset
Generative Adversarial Network (GAN)
• First introduced in 2014
• Consists of two separate networks:
• Discriminator
• Generator
• In a minimax game:
• The generator learns to make fakes that look real
• The discriminator learns to distinguish real from fake
Generative Adversarial Network (GAN)
• Discriminative networks: X → Y (features → class)
• Generative networks: ξ, [Y] → X (noise, [class] → features)
Images source: https://www.thispersondoesnotexist.com/
Outline
• Abstract
• Background
• Challenges
• Proposed method
• Experiments
• Conclusion
Background
• Deep learning shows outstanding performance in various fields
• Convolutional neural networks (CNNs)
• Convolution operation: maps & processes data in a new space
• Kernel: automatically extracts information and patterns
• Approaches for dealing with volumetric data:
• 2D
• 2.5D
• 3D
Video source: https://youtu.be/2J8bDkALBic
2D, 2.5D, and 3D approaches
Outline
• Abstract
• Background
• Challenges
• Proposed method
• Experiments
• Conclusion
Challenges
• Gradient vanishing
• Mode collapse
• Imbalanced & small dataset
• Output quality vs. diversity
• Checkerboard artifacts
Gradient vanishing
• A problem with gradient-based learning methods
• Gradients of the loss function approach zero
• Prevents the network from efficiently updating its parameters
• The deeper the network, the more prone it is to gradient vanishing
• Mainly caused by:
• Certain activation functions
• The nature of some cost functions
Sigmoid
• Some activation functions, like Sigmoid:
• Squish a large input space into a small output space
• Large changes in input cause only small changes in output
• The derivative for very small/large inputs is insignificant
• The effect is intensified by multiplying by the learning rate and partial derivatives (a numeric sketch follows)
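A quick numeric illustration of this saturation (a minimal NumPy sketch, not part of the original slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # derivative peaks at 0.25 when x = 0

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}   sigmoid'(x) = {sigmoid_grad(x):.6f}")
# x =   0.0   sigmoid'(x) = 0.250000
# x =  10.0   sigmoid'(x) = 0.000045  -> gradients effectively vanish
```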
ReLU
• Remedy: replacing Sigmoid with something like ReLU
• With a range of [0, +∞)
• Suffers from:
• Zero gradient for negative inputs
• Zero pixels while upsampling
LeakyReLU
• Remedy: replacing ReLU with LeakyReLU
• Keeps ReLU's benefits
• Multiplies negative inputs by a small coefficient (compared with ReLU in the sketch below)
• A very effective hyperparameter in my experience
• Reduces the likelihood of gradient vanishing
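A minimal sketch contrasting the two gradients on negative inputs; the coefficient `alpha` is the hyperparameter mentioned above (0.2 is a common GAN choice, assumed here, not taken from the slides):

```python
import numpy as np

def relu_grad(x):
    return (x > 0).astype(float)        # zero gradient for x < 0: dead units

def leaky_relu_grad(x, alpha=0.2):      # alpha assumed; tune per experiment
    return np.where(x > 0, 1.0, alpha)  # small but nonzero gradient for x < 0

x = np.array([-3.0, -0.5, 0.5, 3.0])
print(relu_grad(x))        # [0.  0.  1.  1. ]
print(leaky_relu_grad(x))  # [0.2 0.2 1.  1. ]
```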
Gradient vanishing
• Although using LeakyReLU helped with the gradient vanishing problem, it was not completely solved!
• We looked elsewhere for the source of the problem: the cost function!
Binary Cross-entropy Cost (BCE)

$$J = -\frac{1}{m}\sum_{i=1}^{m}\Big[\,y^{(i)}\log\big(h(x^{(i)})\big) + \big(1-y^{(i)}\big)\log\big(1-h(x^{(i)})\big)\Big]$$

• Traditionally used for training GANs
• It is prone to:
• Gradient vanishing
• Mode collapse
• When the discriminator dominates:
• Gradients have an insignificant value
• No valuable feedback → gradient vanishing (maybe!) (a numeric sketch follows)
Figure: difference between distributions
Figures source: DeepLearning.AI
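A small numeric sketch of the "no valuable feedback" point (not from the slides): for a fake sample, the minimax generator loss is log(1 − σ(z)), where z is the discriminator's logit; its gradient w.r.t. z is −σ(z), which vanishes once a dominant discriminator confidently pushes z far negative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient magnitude of log(1 - sigmoid(z)) w.r.t. the logit z is sigmoid(z).
# A dominant discriminator scores fakes with z << 0, so the gradient that
# flows back to the generator shrinks toward zero:
for z in [0.0, -2.0, -5.0, -10.0]:
    print(f"logit = {z:6.1f}   |dL/dz| = {sigmoid(z):.6f}")
# logit =  -10.0   |dL/dz| = 0.000045  -> no valuable feedback
```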
Wasserstein Distance
• The remedy is substituting BCE with the Wasserstein Distance
• But why is it better than BCE?
Weng, Lilian. "From GAN to WGAN." arXiv preprint arXiv:1904.08994 (2019).
Wasserstein Distance
Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein generative adversarial networks." International Conference on Machine Learning. PMLR, 2017.
1-Lipschitz continuity (1-L)
• A differentiable function is 1-L if and only if its gradients have a norm of at most 1 everywhere
• The critic must be 1-L continuous
• Then W-Loss validly approximates the Earth Mover's Distance (EMD)
Figures source: DeepLearning.AI
GIF source: https://en.wikipedia.org
1-Lipschitz continuity (1-L)
Weight clipping:
• Hard constraint
• Clip weights to a fixed interval
• Done after updating parameters
• Limits the ability of the model
• Its innovators believe it is terrible!
Gradient penalty:
• Soft constraint
• Penalize the norm of gradients w.r.t. the input
• Done by adding a regularization term
• Alternative to weight clipping
• Impossible to check every point in space
• Needs an interpolation (see the sketch below)
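A minimal PyTorch sketch of the gradient penalty (the WGAN-GP regularizer of Gulrajani et al., 2017, which the Conclusion refers to); the `critic` argument and the penalty weight are assumptions:

```python
import torch

def gradient_penalty(critic, real, fake, device="cpu"):
    """Soft 1-L constraint: penalize ||grad critic(x_hat)|| away from 1,
    evaluated on random interpolations between real and fake samples."""
    batch_size = real.size(0)
    # One interpolation coefficient per sample, broadcast over remaining dims
    eps = torch.rand(batch_size, *([1] * (real.dim() - 1)), device=device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(x_hat)
    grads = torch.autograd.grad(
        outputs=scores, inputs=x_hat,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]
    grad_norm = grads.view(batch_size, -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()

# Typical usage (lambda_gp is a hyperparameter, commonly around 10):
# critic_loss = fake_scores.mean() - real_scores.mean() \
#               + lambda_gp * gradient_penalty(critic, real, fake)
```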
Mode collapse
• The generator is only able to produce a small subset of modes
• Complete mode collapse:
• The generator maps several different input z values to the same output point
• Very rare
• Partial mode collapse:
• The generator makes multiple images that contain the same texture themes
• Most common
Left figure source: Metz, Luke, et al. "Unrolled generative adversarial networks." arXiv preprint arXiv:1611.02163 (2016).
Right figure source: Goodfellow, Ian. "NIPS 2016 tutorial: Generative adversarial networks." arXiv preprint arXiv:1701.00160 (2016).
Imbalanced & small dataset
• Weighted loss
• Undersampling
• Data duplication
• Data augmentation
Imbalanced & small dataset
• Weighted loss
• Assign a weight to each class (w_c ∝ 1 / # of samples in class c)
• The smaller the weight, the less contribution to the learning process
• Reduces the bias toward over-represented classes
• Undersampling
• Eliminate samples from majority classes
• Use the output of the K-means algorithm as a heuristic
• Get help from the pre-trained model for the new representation (see the sketch below)
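A sketch of both ideas, assuming scikit-learn is available; the "keep the sample closest to each centroid" rule is one hypothetical reading of the K-means heuristic, not a confirmed detail of the original pipeline:

```python
import numpy as np
from sklearn.cluster import KMeans

def class_weights(labels):
    # w_c proportional to 1 / (# of samples in class c), as on the slide;
    # normalized so the weights average to 1 across classes
    counts = np.bincount(labels)
    w = 1.0 / counts
    return w / w.sum() * len(counts)

def undersample_majority(features, n_keep, seed=0):
    """Hypothetical heuristic: cluster the majority class in the pre-trained
    model's feature space and keep the sample closest to each centroid."""
    km = KMeans(n_clusters=n_keep, n_init=10, random_state=seed).fit(features)
    keep = []
    for c in range(n_keep):
        idx = np.where(km.labels_ == c)[0]
        d = np.linalg.norm(features[idx] - km.cluster_centers_[c], axis=1)
        keep.append(idx[d.argmin()])
    return np.array(keep)
```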
Imbalanced & small dataset
• Data duplication
• In fine-tuning, we duplicated the data instead of undersampling
• Duplicate each class by a proper ratio
• Data augmentation (see the sketch below)
• Horizontal flipping
• Zooming
• Random rotation by [−20, −10, −5, 5, 10, 20] degrees
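A SciPy sketch of such an augmentation pipeline; only the rotation angles come from the slide, while the zoom range and application probabilities are assumptions:

```python
import numpy as np
from scipy.ndimage import rotate, zoom

RNG = np.random.default_rng(0)
ANGLES = [-20, -10, -5, 5, 10, 20]  # degrees, as listed on the slide

def augment(volume):
    """One random augmentation of a 3D CT volume (depth, height, width)."""
    v = volume
    if RNG.random() < 0.5:
        v = v[:, :, ::-1]                   # horizontal flip
    if RNG.random() < 0.5:
        f = RNG.uniform(1.0, 1.15)          # assumed in-plane zoom range
        v = zoom(v, (1, f, f), order=1)
        h, w = volume.shape[1:]             # center-crop back to original size
        y0 = (v.shape[1] - h) // 2
        x0 = (v.shape[2] - w) // 2
        v = v[:, y0:y0 + h, x0:x0 + w]
    angle = RNG.choice(ANGLES)
    return rotate(v, angle, axes=(1, 2), reshape=False, order=1)
```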
Output quality vs. diversity
All images have been produced using StyleGAN2.ipynb
Output quality vs. diversity
• The truncation trick is a latent sampling procedure
• Sample from a truncated normal distribution
• The thinner the distribution, the better the output quality and the lower the diversity
• Used μ = 0, σ = 0.5
• Truncated with (see the sketch below):
• Upper bound: μ + 2σ
• Lower bound: μ − 2σ
Figure: normal density truncated at μ − 2σ and μ + 2σ, covering ~95% of the mass
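A sketch of the sampling step with SciPy's `truncnorm`, using the slide's μ = 0, σ = 0.5 and ±2σ bounds; the latent dimension is an assumption:

```python
from scipy.stats import truncnorm

def truncated_latents(n, dim, mu=0.0, sigma=0.5):
    """Truncation trick: sample z from N(mu, sigma^2) truncated to
    [mu - 2*sigma, mu + 2*sigma]. truncnorm takes bounds in sigma units."""
    a, b = -2.0, 2.0
    return truncnorm.rvs(a, b, loc=mu, scale=sigma, size=(n, dim))

z = truncated_latents(4, 512)   # 512 is an assumed latent dimension
```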
Checkerboard artifacts
Figure source: Odena, Augustus, Vincent Dumoulin, and Chris Olah. "Deconvolution and checkerboard artifacts." Distill (2016).
Checkerboard artifacts
• A strange checkerboard pattern of artifacts
• Uneven overlap:
• Caused by a kernel size that is not divisible by the stride (avoided when kernel size % stride = 0)
• Effect: putting more "paint" in some pixels than others
• Remedies:
• Use better upsampling layers
• Separate upsampling from convolution (see the sketch below):
• Upsampling: low resolution → higher resolution
• Convolution: compute features
Figure source: Li, Yangyang, et al. "RADet: Refine feature pyramid network and multi-layer attention network for arbitrary-oriented object detection of remote sensing images." Remote Sensing 12.3 (2020): 389.
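A PyTorch sketch of the two alternatives; the channel counts and kernel sizes are illustrative, not from the slides:

```python
import torch.nn as nn

# Prone to checkerboard artifacts: kernel size 3 is not divisible by stride 2
transposed = nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1)

# The slide's remedy: separate upsampling (resize) from feature computation
upsample_then_conv = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="nearest"),  # low res -> higher res
    nn.Conv2d(64, 32, kernel_size=3, padding=1),  # compute features
)
```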
Outline
• Abstract
• Background
• Challenges
• Proposed method
• Experiments
• Conclusion
Proposed method
Pre-train → Fine-tune
Preprocess
• Rotate by 90°
• Select 128 slices:
• Eliminate initial/last slices + zoom
• Zoom across the z-axis
• Resize to 128 × 128 × 128
• Normalize (see the sketch below):
• Set the HU window to W: 1400, L: −300
• Scale pixels to [−1, 1]
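A sketch of the windowing, scaling, and resizing steps under stated assumptions (the step order, interpolation, and slice-selection details are not specified on the slide):

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess(volume_hu):
    """Window W=1400, L=-300  ->  clip HU to [L - W/2, L + W/2] = [-1000, 400],
    scale to [-1, 1], then resize the volume to 128 x 128 x 128."""
    lo, hi = -300 - 700, -300 + 700
    v = np.clip(volume_hu, lo, hi)
    v = 2.0 * (v - lo) / (hi - lo) - 1.0        # scale to [-1, 1]
    factors = [128 / s for s in v.shape]        # per-axis resize factors
    return zoom(v, factors, order=1)
```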
Networks structure
• The two networks have ~950 K and ~1.1 M parameters
Pre-train
• Adversarial training is a mechanism to improve model robustness
• As the training goes on:
• The generator produces more realistic images
• The discriminator learns to detect increasingly realistic fakes
• The discriminator will:
• Learn the structure of a CT
• Extract robust features from a CT scan of a lung
Pre-train
• Trained for 32,000 batches
• Batch size: 6
• Two mini-batches (size: 3 + 3):
• A batch of fake images
• A batch of genuine images
• RMSprop optimizer
• Learning rate: 5e-5
• n_critic: 5 (see the sketch below)
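A minimal runnable sketch of the WGAN critic/generator schedule with the hyperparameters above; the tiny linear networks and random tensors are toy stand-ins for the 3D CNNs and CT batches, and the gradient penalty term is omitted for brevity:

```python
import torch
import torch.nn as nn

# Toy stand-ins so the loop runs end-to-end
critic = nn.Sequential(nn.Linear(16, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))
generator = nn.Sequential(nn.Linear(8, 64), nn.LeakyReLU(0.2), nn.Linear(64, 16))

opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)   # lr from the slide
opt_g = torch.optim.RMSprop(generator.parameters(), lr=5e-5)
n_critic = 5                                                # from the slide

for step in range(100):                     # the real run: 32,000 batches
    real = torch.randn(6, 16)               # batch size 6, toy data
    # --- critic update (every step) ---
    fake = generator(torch.randn(6, 8)).detach()
    loss_c = critic(fake).mean() - critic(real).mean()  # + gradient penalty in WGAN-GP
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    # --- generator update (once per n_critic critic steps) ---
    if step % n_critic == 0:
        fake = generator(torch.randn(6, 8))
        loss_g = -critic(fake).mean()
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```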
Fine-tune
• Knowledge learned from the source dataset is transferred to the target dataset
• Use the weights of the pre-trained model as initialization weights:
• Speeds up the training process
• Overcomes the issue of small datasets
• Only updated the dense layers
• Added a Batch Normalization layer
• Fine-tuned 40% of the total parameters
• RMSprop optimizer
• Learning rate: 8e-4
• Trained for 100 epochs (batch size = 8)
• Used label smoothing (0.15) as a regularizer (see the sketch below)
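A sketch of this fine-tuning setup: a frozen feature extractor standing in for the pre-trained discriminator's convolutional stack, a new dense head with batch normalization, label smoothing 0.15, and RMSprop at 8e-4. The layer sizes and the 2D toy input are assumptions:

```python
import torch
import torch.nn as nn

# Stand-in for the pre-trained discriminator's (frozen) convolutional stack
features = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.LeakyReLU(0.2),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
for p in features.parameters():
    p.requires_grad = False                     # only dense layers are updated

head = nn.Sequential(nn.Linear(8, 32), nn.BatchNorm1d(32),  # added BN layer
                     nn.LeakyReLU(0.2), nn.Linear(32, 5))   # 5 TB classes

model = nn.Sequential(features, head)
opt = torch.optim.RMSprop(head.parameters(), lr=8e-4)       # lr from the slide
loss_fn = nn.CrossEntropyLoss(label_smoothing=0.15)         # slide's regularizer

x = torch.randn(8, 1, 32, 32)                   # batch size 8, toy 2D input
loss = loss_fn(model(x), torch.randint(0, 5, (8,)))
```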
Outline
• Abstract
• Background
• Challenges
• Proposed method
• Experiments
• Conclusion
Datasets
MosMed COVID-19 Chest CT (external dataset):
• Contains 1110 samples from different patients
• With size of 512 × 512 × depth
• Depths varied, with a median of 41
Morozov, S. P., et al. "MosMedData: Chest CT scans with COVID-19 related findings dataset." arXiv preprint arXiv:2005.06465 (2020).

ImageCLEFmed Tuberculosis (primary dataset):
• 917 training and 421 test samples
• 5 TB types (classes)
• With size of 512 × 512 × depth
• Depths varied, with a median of 128
Kozlovski, Serge, et al. "Overview of ImageCLEFtuberculosis 2021 - CT-based tuberculosis type classification." CLEF2021 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Bucharest, Romania. 2021.
Experiments
Pre-training phase:
• We have achieved the goal!
• The generator's output can be used as a data augmentation tool in future work
Fine-tuning phase:
• Test-time augmentation
• 4 inference steps:
• +1: the original image
• +3: the augmentation techniques used in training
• Used the last 10 models saved during training as an ensemble
• → a total of 40 predictions per CT
• Aggregating the results (see the sketch below):
• Pick the most frequently predicted label
• Pick the label with the highest mean Softmax output
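A sketch of the two aggregation rules, assuming each model is a callable that returns a softmax vector for a CT volume:

```python
import numpy as np

def ensemble_predict(models, augmentations, ct):
    """Test-time augmentation + ensembling as described above:
    4 views (original + 3 augmentations) x 10 saved models = 40 predictions."""
    views = [ct] + [aug(ct) for aug in augmentations]        # 1 + 3 views
    probs = np.stack([m(v) for m in models for v in views])  # (40, n_classes)

    majority = np.bincount(probs.argmax(axis=1)).argmax()    # most frequent label
    mean_softmax = probs.mean(axis=0).argmax()               # highest mean softmax
    return majority, mean_softmax
```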
Results
• Despite all the effort, the results were not so promising!
• Got an accuracy of 40% and 28% on validation and test data
• However, the low scores are not a big surprise!
• CTs contain more than one lesion type
• Neither a human nor a machine can find the rationale behind them!
• CT scans are not the best practice used by experts to diagnose TB
• We assume there was probably an issue in the implementation!
Outline
• Abstract
• Background
• Challenges
• Proposed method
• Experiments
• Conclusion
Conclusion
• To have a robust Generator, first we need a robust Critic!
• WGAN-GP significantly improved the performance of the GAN
• Using batch normalization in the final classifier plays an important role
Questions
Deep neural network with GANs pre-training for tuberculosis type classification based on CT scans
