Deep neural network with GANs pre-training for tuberculosis type classification based on CT scans

Presenter: Behzad Shomali
Supervisor: Prof. Rouhani
2022, Jun 11
Outline
• Abstract
• Background
• Challenges
• Proposed method
• Experiments
• Conclusion
Outline
• Abstract
• Background
• Challenges
• Proposed method
• Experiments
• Conclusion
Abstract
• Tuberculosis (TB) is an airborne disease that affects people's lungs
• Goal: predict the TB type of each infected chest CT scan
• Train a GAN on multiple datasets
• Discriminator: comprehends the structure of a CT scan
• Generator: can produce photo-realistic images
• The discriminator is treated as a pre-trained model
• Fine-tune the discriminator on the primary dataset
Generative Adversarial Network (GAN)
• First introduced in 2014
• Consists of two separate networks:
• Discriminator
• Generator
• In a minimax game:
• The generator learns to make fakes that look real
• The discriminator learns to distinguish real from fake
Generative Adversarial Network (GAN)
• Discriminative networks: X → Y (features → class)
• Generative networks: ξ, [Y] → X (noise, [class] → features)
Images source: https://www.thispersondoesnotexist.com/
Outline
• Abstract
• Background
• Challenges
• Proposed method
• Experiments
• Conclusion
Background
• Deep learning shows outstanding performance in various fields
• Convolutional neural networks (CNNs)
• Convolution operation: maps & processes data in a new space
• Kernel: automatically extracts information and patterns
• Approaches for dealing with volumetric data:
• 2D
• 2.5D
• 3D
Video source: https://youtu.be/2J8bDkALBic
2D, 2.5D, and 3D approaches
Outline
• Abstract
• Background
• Challenges
• Proposed method
• Experiments
• Conclusion
Challenges
• Gradient vanishing
• Mode collapse
• Imbalanced & small dataset
• Output quality vs. diversity
• Checkerboard artifacts
Gradient vanishing
• A problem with gradient-based learning methods
• Gradients of the loss function approach zero
• Prevents the network from efficiently updating its parameters
• The deeper the network, the more prone it is to gradient vanishing
• Mainly caused by:
• Certain activation functions
• The nature of some cost functions
Sigmoid
• Some activation functions, like Sigmoid:
• Squish a large input space into a small output space
• Large changes in input cause only small changes in output
• The derivative for very small/large inputs is insignificant
• The effect is intensified by multiplying by the learning rate and partial derivatives (a numeric sketch follows)
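A quick numeric illustration of this saturation (a minimal NumPy sketch, not part of the original slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # derivative peaks at 0.25 when x = 0

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}   sigmoid'(x) = {sigmoid_grad(x):.6f}")
# x =   0.0   sigmoid'(x) = 0.250000
# x =  10.0   sigmoid'(x) = 0.000045  -> gradients effectively vanish
```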
ReLU
• Remedy: replacing Sigmoid with something like ReLU
• With a range of [0, +∞)
• Suffers from:
• Zero gradient for negative inputs
• Zero pixels while upsampling
LeakyReLU
• Remedy: replacing ReLU with LeakyReLU
• Keeps ReLU's benefits
• Multiplies negative inputs by a small coefficient (compared with ReLU in the sketch below)
• A very effective hyperparameter in my experience
• Reduces the likelihood of gradient vanishing
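A minimal sketch contrasting the two gradients on negative inputs; the coefficient `alpha` is the hyperparameter mentioned above (0.2 is a common GAN choice, assumed here, not taken from the slides):

```python
import numpy as np

def relu_grad(x):
    return (x > 0).astype(float)        # zero gradient for x < 0: dead units

def leaky_relu_grad(x, alpha=0.2):      # alpha assumed; tune per experiment
    return np.where(x > 0, 1.0, alpha)  # small but nonzero gradient for x < 0

x = np.array([-3.0, -0.5, 0.5, 3.0])
print(relu_grad(x))        # [0.  0.  1.  1. ]
print(leaky_relu_grad(x))  # [0.2 0.2 1.  1. ]
```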
Gradient vanishing
• Although using LeakyReLU helped with the gradient vanishing problem, it was not completely solved!
• We looked elsewhere for the source of the problem: the cost function!
Binary Cross-entropy Cost (BCE)

$$J = -\frac{1}{m}\sum_{i=1}^{m}\Big[\,y^{(i)}\log\big(h(x^{(i)})\big) + \big(1-y^{(i)}\big)\log\big(1-h(x^{(i)})\big)\Big]$$

• Traditionally used for training GANs
• It is prone to:
• Gradient vanishing
• Mode collapse
• When the discriminator dominates:
• Gradients have an insignificant value
• No valuable feedback → gradient vanishing (maybe!) (a numeric sketch follows)
Figure: difference between distributions
Figures source: DeepLearning.AI
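A small numeric sketch of the "no valuable feedback" point (not from the slides): for a fake sample, the minimax generator loss is log(1 − σ(z)), where z is the discriminator's logit; its gradient w.r.t. z is −σ(z), which vanishes once a dominant discriminator confidently pushes z far negative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient magnitude of log(1 - sigmoid(z)) w.r.t. the logit z is sigmoid(z).
# A dominant discriminator scores fakes with z << 0, so the gradient that
# flows back to the generator shrinks toward zero:
for z in [0.0, -2.0, -5.0, -10.0]:
    print(f"logit = {z:6.1f}   |dL/dz| = {sigmoid(z):.6f}")
# logit =  -10.0   |dL/dz| = 0.000045  -> no valuable feedback
```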
Wasserstein Distance
• The remedy is substituting BCE with the Wasserstein Distance
• But why is it better than BCE?
Weng, Lilian. "From GAN to WGAN." arXiv preprint arXiv:1904.08994 (2019).
Wasserstein Distance
Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein generative adversarial networks." International Conference on Machine Learning. PMLR, 2017.
1-Lipschitz continuity (1-L)
• A differentiable function is 1-L if and only if its gradients have a norm of at most 1 everywhere
• The critic must be 1-L continuous
• Then W-Loss validly approximates the Earth Mover's Distance (EMD)
Figures source: DeepLearning.AI
GIF source: https://en.wikipedia.org
1-Lipschitz continuity (1-L)
Weight clipping:
• Hard constraint
• Clip weights to a fixed interval
• Done after updating parameters
• Limits the ability of the model
• Its innovators believe it is terrible!
Gradient penalty:
• Soft constraint
• Penalize the norm of gradients w.r.t. the input
• Done by adding a regularization term
• Alternative to weight clipping
• Impossible to check every point in space
• Needs an interpolation (see the sketch below)
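A minimal PyTorch sketch of the gradient penalty (the WGAN-GP regularizer of Gulrajani et al., 2017, which the Conclusion refers to); the `critic` argument and the penalty weight are assumptions:

```python
import torch

def gradient_penalty(critic, real, fake, device="cpu"):
    """Soft 1-L constraint: penalize ||grad critic(x_hat)|| away from 1,
    evaluated on random interpolations between real and fake samples."""
    batch_size = real.size(0)
    # One interpolation coefficient per sample, broadcast over remaining dims
    eps = torch.rand(batch_size, *([1] * (real.dim() - 1)), device=device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(x_hat)
    grads = torch.autograd.grad(
        outputs=scores, inputs=x_hat,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]
    grad_norm = grads.view(batch_size, -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()

# Typical usage (lambda_gp is a hyperparameter, commonly around 10):
# critic_loss = fake_scores.mean() - real_scores.mean() \
#               + lambda_gp * gradient_penalty(critic, real, fake)
```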
Mode collapse
• The generator is only able to produce a small subset of modes
• Complete mode collapse:
• The generator maps several different input z values to the same output point
• Very rare
• Partial mode collapse:
• The generator makes multiple images that contain the same texture themes
• Most common
Left figure source: Metz, Luke, et al. "Unrolled generative adversarial networks." arXiv preprint arXiv:1611.02163 (2016).
Right figure source: Goodfellow, Ian. "NIPS 2016 tutorial: Generative adversarial networks." arXiv preprint arXiv:1701.00160 (2016).
Imbalanced & small dataset
• Weighted loss
• Undersampling
• Data duplication
• Data augmentation
Imbalanced & small dataset
• Weighted loss
• Assign a weight to each class (w_c ∝ 1 / # of samples in class c)
• The smaller the weight, the less contribution to the learning process
• Reduces the bias toward over-represented classes
• Undersampling
• Eliminate samples from majority classes
• Use the output of the K-means algorithm as a heuristic
• Get help from the pre-trained model for the new representation (see the sketch below)
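A sketch of both ideas, assuming scikit-learn is available; the "keep the sample closest to each centroid" rule is one hypothetical reading of the K-means heuristic, not a confirmed detail of the original pipeline:

```python
import numpy as np
from sklearn.cluster import KMeans

def class_weights(labels):
    # w_c proportional to 1 / (# of samples in class c), as on the slide;
    # normalized so the weights average to 1 across classes
    counts = np.bincount(labels)
    w = 1.0 / counts
    return w / w.sum() * len(counts)

def undersample_majority(features, n_keep, seed=0):
    """Hypothetical heuristic: cluster the majority class in the pre-trained
    model's feature space and keep the sample closest to each centroid."""
    km = KMeans(n_clusters=n_keep, n_init=10, random_state=seed).fit(features)
    keep = []
    for c in range(n_keep):
        idx = np.where(km.labels_ == c)[0]
        d = np.linalg.norm(features[idx] - km.cluster_centers_[c], axis=1)
        keep.append(idx[d.argmin()])
    return np.array(keep)
```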
Imbalanced & small dataset
• Data duplication
• In fine-tuning, we duplicated the data instead of undersampling
• Duplicate each class by a proper ratio
• Data augmentation (see the sketch below)
• Horizontal flipping
• Zooming
• Random rotation by [−20, −10, −5, 5, 10, 20] degrees
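A SciPy sketch of such an augmentation pipeline; only the rotation angles come from the slide, while the zoom range and application probabilities are assumptions:

```python
import numpy as np
from scipy.ndimage import rotate, zoom

RNG = np.random.default_rng(0)
ANGLES = [-20, -10, -5, 5, 10, 20]  # degrees, as listed on the slide

def augment(volume):
    """One random augmentation of a 3D CT volume (depth, height, width)."""
    v = volume
    if RNG.random() < 0.5:
        v = v[:, :, ::-1]                   # horizontal flip
    if RNG.random() < 0.5:
        f = RNG.uniform(1.0, 1.15)          # assumed in-plane zoom range
        v = zoom(v, (1, f, f), order=1)
        h, w = volume.shape[1:]             # center-crop back to original size
        y0 = (v.shape[1] - h) // 2
        x0 = (v.shape[2] - w) // 2
        v = v[:, y0:y0 + h, x0:x0 + w]
    angle = RNG.choice(ANGLES)
    return rotate(v, angle, axes=(1, 2), reshape=False, order=1)
```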
Output quality vs. diversity
All images have been produced using StyleGAN2.ipynb
Output quality vs. diversity
• The truncation trick is a latent sampling procedure
• Sample from a truncated normal distribution
• The thinner the distribution, the better the output quality and the lower the diversity
• Used μ = 0, σ = 0.5
• Truncated with (see the sketch below):
• Upper bound: μ + 2σ
• Lower bound: μ − 2σ
Figure: normal density truncated at μ − 2σ and μ + 2σ, covering ~95% of the mass
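A sketch of the sampling step with SciPy's `truncnorm`, using the slide's μ = 0, σ = 0.5 and ±2σ bounds; the latent dimension is an assumption:

```python
from scipy.stats import truncnorm

def truncated_latents(n, dim, mu=0.0, sigma=0.5):
    """Truncation trick: sample z from N(mu, sigma^2) truncated to
    [mu - 2*sigma, mu + 2*sigma]. truncnorm takes bounds in sigma units."""
    a, b = -2.0, 2.0
    return truncnorm.rvs(a, b, loc=mu, scale=sigma, size=(n, dim))

z = truncated_latents(4, 512)   # 512 is an assumed latent dimension
```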
Checkerboard artifacts
Figure source: Odena, Augustus, Vincent Dumoulin, and Chris Olah. "Deconvolution and checkerboard artifacts." Distill (2016).
Checkerboard artifacts
• A strange checkerboard pattern of artifacts
• Uneven overlap:
• Caused by a kernel size that is not divisible by the stride (avoided when kernel size % stride = 0)
• Effect: putting more "paint" in some pixels than others
• Remedies:
• Use better upsampling layers
• Separate upsampling from convolution (see the sketch below):
• Upsampling: low resolution → higher resolution
• Convolution: compute features
Figure source: Li, Yangyang, et al. "RADet: Refine feature pyramid network and multi-layer attention network for arbitrary-oriented object detection of remote sensing images." Remote Sensing 12.3 (2020): 389.
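A PyTorch sketch of the two alternatives; the channel counts and kernel sizes are illustrative, not from the slides:

```python
import torch.nn as nn

# Prone to checkerboard artifacts: kernel size 3 is not divisible by stride 2
transposed = nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1)

# The slide's remedy: separate upsampling (resize) from feature computation
upsample_then_conv = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="nearest"),  # low res -> higher res
    nn.Conv2d(64, 32, kernel_size=3, padding=1),  # compute features
)
```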
Outline
• Abstract
• Background
• Challenges
• Proposed method
• Experiments
• Conclusion
Proposed method
Pre-train → Fine-tune
Preprocess
• Rotate by 90°
• Select 128 slices:
• Eliminate initial/last slices + zoom
• Zoom across the z-axis
• Resize to 128 × 128 × 128
• Normalize (see the sketch below):
• Set the HU window to W: 1400, L: −300
• Scale pixels to [−1, 1]
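A sketch of the windowing, scaling, and resizing steps under stated assumptions (the step order, interpolation, and slice-selection details are not specified on the slide):

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess(volume_hu):
    """Window W=1400, L=-300  ->  clip HU to [L - W/2, L + W/2] = [-1000, 400],
    scale to [-1, 1], then resize the volume to 128 x 128 x 128."""
    lo, hi = -300 - 700, -300 + 700
    v = np.clip(volume_hu, lo, hi)
    v = 2.0 * (v - lo) / (hi - lo) - 1.0        # scale to [-1, 1]
    factors = [128 / s for s in v.shape]        # per-axis resize factors
    return zoom(v, factors, order=1)
```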
Networks structure
• The two networks have ~950 K and ~1.1 M parameters
Pre-train
• Adversarial training is a mechanism to improve model robustness
• As the training goes on:
• The generator produces more realistic images
• The discriminator learns to detect increasingly realistic fakes
• The discriminator will:
• Learn the structure of a CT
• Extract robust features from a CT scan of a lung
Pre-train
• Trained for 32,000 batches
• Batch size: 6
• Two mini-batches (size: 3 + 3):
• A batch of fake images
• A batch of genuine images
• RMSprop optimizer
• Learning rate: 5e-5
• n_critic: 5 (see the sketch below)
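A minimal runnable sketch of the WGAN critic/generator schedule with the hyperparameters above; the tiny linear networks and random tensors are toy stand-ins for the 3D CNNs and CT batches, and the gradient penalty term is omitted for brevity:

```python
import torch
import torch.nn as nn

# Toy stand-ins so the loop runs end-to-end
critic = nn.Sequential(nn.Linear(16, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))
generator = nn.Sequential(nn.Linear(8, 64), nn.LeakyReLU(0.2), nn.Linear(64, 16))

opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)   # lr from the slide
opt_g = torch.optim.RMSprop(generator.parameters(), lr=5e-5)
n_critic = 5                                                # from the slide

for step in range(100):                     # the real run: 32,000 batches
    real = torch.randn(6, 16)               # batch size 6, toy data
    # --- critic update (every step) ---
    fake = generator(torch.randn(6, 8)).detach()
    loss_c = critic(fake).mean() - critic(real).mean()  # + gradient penalty in WGAN-GP
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    # --- generator update (once per n_critic critic steps) ---
    if step % n_critic == 0:
        fake = generator(torch.randn(6, 8))
        loss_g = -critic(fake).mean()
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```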
Fine-tune
• Knowledge learned from the source dataset is transferred to the target dataset
• Use the weights of the pre-trained model as initialization weights:
• Speeds up the training process
• Overcomes the issue of small datasets
• Only updated the dense layers
• Added a Batch Normalization layer
• Fine-tuned 40% of the total parameters
• RMSprop optimizer
• Learning rate: 8e-4
• Trained for 100 epochs (batch size = 8)
• Used label smoothing (0.15) as a regularizer (see the sketch below)
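A sketch of this fine-tuning setup: a frozen feature extractor standing in for the pre-trained discriminator's convolutional stack, a new dense head with batch normalization, label smoothing 0.15, and RMSprop at 8e-4. The layer sizes and the 2D toy input are assumptions:

```python
import torch
import torch.nn as nn

# Stand-in for the pre-trained discriminator's (frozen) convolutional stack
features = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.LeakyReLU(0.2),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
for p in features.parameters():
    p.requires_grad = False                     # only dense layers are updated

head = nn.Sequential(nn.Linear(8, 32), nn.BatchNorm1d(32),  # added BN layer
                     nn.LeakyReLU(0.2), nn.Linear(32, 5))   # 5 TB classes

model = nn.Sequential(features, head)
opt = torch.optim.RMSprop(head.parameters(), lr=8e-4)       # lr from the slide
loss_fn = nn.CrossEntropyLoss(label_smoothing=0.15)         # slide's regularizer

x = torch.randn(8, 1, 32, 32)                   # batch size 8, toy 2D input
loss = loss_fn(model(x), torch.randint(0, 5, (8,)))
```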
Outline
• Abstract
• Background
• Challenges
• Proposed method
• Experiments
• Conclusion
Datasets
MosMed COVID-19 Chest CT (external dataset):
• Contains 1110 samples from different patients
• With size of 512 × 512 × depth
• Depths varied, with a median of 41
Morozov, S. P., et al. "MosMedData: Chest CT scans with COVID-19 related findings dataset." arXiv preprint arXiv:2005.06465 (2020).

ImageCLEFmed Tuberculosis (primary dataset):
• 917 training and 421 test samples
• 5 TB types (classes)
• With size of 512 × 512 × depth
• Depths varied, with a median of 128
Kozlovski, Serge, et al. "Overview of ImageCLEFtuberculosis 2021 - CT-based tuberculosis type classification." CLEF2021 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Bucharest, Romania. 2021.
Experiments
Pre-training phase:
• We have achieved the goal!
• The generator's output can be used as a data augmentation tool in future work
Fine-tuning phase:
• Test-time augmentation
• 4 inference steps:
• +1: the original image
• +3: the augmentation techniques used in training
• Used the last 10 models saved during training as an ensemble
• → a total of 40 predictions per CT
• Aggregating the results (see the sketch below):
• Pick the most frequently predicted label
• Pick the label with the highest mean Softmax output
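A sketch of the two aggregation rules, assuming each model is a callable that returns a softmax vector for a CT volume:

```python
import numpy as np

def ensemble_predict(models, augmentations, ct):
    """Test-time augmentation + ensembling as described above:
    4 views (original + 3 augmentations) x 10 saved models = 40 predictions."""
    views = [ct] + [aug(ct) for aug in augmentations]        # 1 + 3 views
    probs = np.stack([m(v) for m in models for v in views])  # (40, n_classes)

    majority = np.bincount(probs.argmax(axis=1)).argmax()    # most frequent label
    mean_softmax = probs.mean(axis=0).argmax()               # highest mean softmax
    return majority, mean_softmax
```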
Results
• Despite all the effort, the results were not so promising!
• Got an accuracy of 40% and 28% on validation and test data
• However, the low scores are not a big surprise!
• CTs contain more than one lesion type
• Neither a human nor a machine can find the rationale behind them!
• CT scans are not the best practice used by experts to diagnose TB
• We assume there was probably an issue in the implementation!
Outline
• Abstract
• Background
• Challenges
• Proposed method
• Experiments
• Conclusion
Conclusion
• To have a robust Generator, first we need a robust Critic!
• WGAN-GP significantly improved the performance of the GAN
• Using batch normalization in the final classifier plays an important role
Questions
Deep neural network with GANs pre-training for tuberculosis type classification based on CT scans
