Medical data management: COVID-19 detection using cough recordings, chest X-rays classification and generation

University of Milano-Bicocca
Master's Degree in Data Science
Digital Signal and Image Management
Academic Year 2022-2023
Authors:
Giorgio CARBONE, student ID no. 811974
Gianluca CAVALLARO, student ID no. 826049
Remo MARCONZINI, student ID no. 883256

PROCESSING OF ONE-DIMENSIONAL SIGNALS

Dataset:
 Crowdsourced dataset
 Recordings collected between April 1st, 2020 and December 1st, 2020
 34,434 recordings and their metadata
• One .json file for each recording
• One .csv file containing all metadata
 Most relevant attributes
• uuid → Name of the recording
• cough_detected → Probability that the recording contains a cough sound
• status → Self-reported health condition
Example values and attribute domains:
Attribute               Value / domain
uuid                    00039425-7f3a-42aa-ac13-834aaa2b6b92
document                2020-04-13T21:30:59.801831+00:00
cough_detected          0.9754
age                     [0, …, 99, NaN]
gender                  [Male, Female, NaN]
respiratory_condition   [True, False, NaN]
fever_muscle_pain       [True, False, NaN]
status                  [Healthy, Symptomatic, COVID-19, NaN]

Data Cleaning
 Removing rows with unknown status
 Filter for recordings with cough_detected > 0.8
• Value recommended by the authors
 Number of recordings after cleaning: 12119
 Recordings distribution:
• Healthy: 9167
• Symptomatic: 2339
• COVID-19: 613
 The dataset is imbalanced
Class         N° recordings
Healthy       9167
Symptomatic   2339
COVID-19      613
Total         12119
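The two cleaning rules can be sketched in a few lines. This is a minimal illustration, not the authors' code: the rows are made-up examples, and only the column names (uuid, cough_detected, status) and the 0.8 threshold come from the slides.

```python
# Hypothetical metadata rows: column names follow the slides, values are invented.
rows = [
    {"uuid": "a", "cough_detected": 0.9754, "status": "healthy"},
    {"uuid": "b", "cough_detected": 0.35, "status": "COVID-19"},
    {"uuid": "c", "cough_detected": 0.91, "status": None},
    {"uuid": "d", "cough_detected": 0.88, "status": "COVID-19"},
]

COUGH_THRESHOLD = 0.8  # value recommended by the dataset authors


def clean(rows, threshold=COUGH_THRESHOLD):
    """Keep rows with a known status and cough_detected strictly above the threshold."""
    return [
        r for r in rows
        if r["status"] is not None and r["cough_detected"] > threshold
    ]


cleaned = clean(rows)
kept_ids = [r["uuid"] for r in cleaned]
```

Row "b" is dropped because the cough probability is too low, row "c" because its status is unknown.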

Preprocessing
 Noise reduction
• Spectral gating using noisereduce
 Silence removal
• To maintain only relevant audio patterns
• Silence > 1s is removed
• 0.5s of silence maintained at the beginning
and the end of the recording
 Length standardization
• Audio features need fixed dimensions
• Trade-off between information loss (truncation) and amount of sparse values (padding)
Cumulative distribution of recording durations:
Duration   N° recordings
< 2s       1439
< 3s       3461
< 4s       5826
< 5s       7892
< 6s       9468
< 7s       10680
< 8s       11470
< 9s       11941
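A minimal sketch of length standardization, assuming a 6-second target (the duration reported on the feature-extraction slide) and zero-padding at the end of shorter recordings. The padding strategy and the function name `fix_length` are assumptions; the slides do not say how the authors padded.

```python
def fix_length(samples, sample_rate, target_seconds=6.0):
    """Zero-pad or truncate a 1-D list of samples to exactly target_seconds."""
    target_len = int(round(sample_rate * target_seconds))
    if len(samples) >= target_len:
        return samples[:target_len]          # longer clips lose their tail
    return samples + [0.0] * (target_len - len(samples))  # shorter clips get zeros


# Share of recordings shorter than 6 s, from the table above.
share_under_6s = 9468 / 12119

# Toy check with a tiny sample rate so the lists stay small.
short_clip = fix_length([0.1] * 10, sample_rate=4)
long_clip = fix_length([0.1] * 100, sample_rate=4)
```

With a 6-second target, roughly 78% of the recordings are padded rather than truncated, which is the trade-off the slide refers to.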

Noise reduction
Figure panels: Original recording, Noise reduction

Silence removal
Figure panels: Noise reduction, Silence removal

Class imbalance problem
 Binary classification problem
• COVID-19 Positive vs. COVID-19 Negative
• 613 recordings vs. 11506 recordings
 Data augmentation to deal with class imbalance
• Generation of synthetic audio tracks
belonging to the minority class
 Data augmentation on raw signal
• Time Stretch
• Pitch Shift
• Shift
• Gain
Class         Binary class   N° recordings   N° recordings (binary)
Healthy       Negative       9167            11506
Symptomatic   Negative       2339
COVID-19      Positive       613             613
Total                        12119
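As an illustration of augmentation on the raw signal, the sketch below implements the two simplest transforms from the list, Gain and Shift, in plain Python. The parameter ranges (`max_gain_db`, `max_shift_fraction`) and the circular shift are assumptions for the example, not the authors' settings; Time Stretch and Pitch Shift need a signal-processing library and are not shown.

```python
import random


def apply_gain(samples, gain_db):
    """Scale the amplitude by a gain expressed in decibels."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]


def apply_shift(samples, shift):
    """Circularly shift the signal by `shift` samples (positive = delay)."""
    if not samples:
        return []
    k = shift % len(samples)
    return samples[-k:] + samples[:-k] if k else list(samples)


def augment(samples, rng, max_gain_db=6.0, max_shift_fraction=0.25):
    """Apply a random gain and a random shift; the ranges are illustrative."""
    gain_db = rng.uniform(-max_gain_db, max_gain_db)
    max_shift = int(len(samples) * max_shift_fraction)
    shift = rng.randint(-max_shift, max_shift)
    return apply_shift(apply_gain(samples, gain_db), shift)


rng = random.Random(0)  # fixed seed for reproducibility
track = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
augmented = augment(track, rng)
```

Each call produces a new synthetic track of the same length, so it can be repeated on minority-class recordings until the desired class ratio is reached.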

Data augmentation
Figure panels: Preprocessed track, Augmented track

Feature extraction
 Cough sounds contain more energy in lower
frequencies
 MFCCs are a suitable representation for
cough recordings
• 15 MFCCs per frame
 Audio samples have a duration of 6 seconds
• MFCC matrices 15x259
 Also MFCC-∆ and MFCC-∆∆ were considered
• Features dimension 3x15x259
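The 15x259 shape can be checked with simple frame arithmetic. The sketch assumes a sampling rate of 22,050 Hz, a hop length of 512 samples and centered frames; these values are not stated in the slides (they happen to be the defaults of the librosa library), so treat them as assumptions that are merely consistent with the reported shape.

```python
def n_frames(n_samples, hop_length, center=True, n_fft=2048):
    """Number of short-time analysis frames for a signal of n_samples samples."""
    if center:
        # With centered frames the signal is padded by n_fft // 2 on both sides.
        return 1 + n_samples // hop_length
    return 1 + (n_samples - n_fft) // hop_length


SAMPLE_RATE = 22050  # assumption, not stated in the slides
HOP_LENGTH = 512     # assumption, not stated in the slides
N_MFCC = 15          # from the slides
DURATION_S = 6       # from the slides

frames = n_frames(DURATION_S * SAMPLE_RATE, HOP_LENGTH)
# MFCC, MFCC-delta and MFCC-delta-delta stacked as three channels.
feature_shape = (3, N_MFCC, frames)
```

Under these assumptions `frames` is 259, matching the 15x259 matrices and the 3x15x259 feature tensor.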

Network architecture
 Convolution layer, 64 filters, kernel size
3x3, ReLU activation function, input shape
259x15x3
 Max pooling layer, pool size 2x2
 Convolution layer, 32 filters, kernel size
2x2, ReLU activation function
 Batch normalization layer
 Flatten layer
 Fully connected layer, 256 units, ReLU
activation function
 Dropout layer, rate 0.5
 Fully connected layer, 128 units, ReLU
activation function
 Dropout layer, rate 0.3
 Output layer, 1 neuron, Sigmoid activation

Training & Results
 Standard procedure with augmentation only on the training set:
• Rebalanced training set (positive:negative = 1:3)
• Unbalanced validation and test sets
 Very poor results on validation and test sets
 The model fails to recognize actual positive recordings
Metric      Val    Test
Loss        3.80   3.81
Accuracy    0.91   0.89
Precision   0.07   0.04
Recall      0.07   0.05
AUC         0.48   0.52
Figure: confusion matrix on test set
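High accuracy combined with near-zero precision and recall is typical of a strongly imbalanced problem. The sketch below computes the three metrics from confusion-matrix counts and evaluates a trivial classifier that always predicts "negative", using the class counts of the full dataset (613 positives, 11506 negatives). That the test set has a similar class ratio is an assumption.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


POSITIVES, NEGATIVES = 613, 11506  # class counts from the slides

# A classifier that labels every recording as negative.
acc, prec, rec = metrics(tp=0, fp=0, fn=POSITIVES, tn=NEGATIVES)
```

The always-negative baseline reaches about 0.95 accuracy with zero recall, so an accuracy around 0.9 says little here; precision, recall and AUC are the informative numbers.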

Training & Results
 Procedure followed in various papers:
• Data augmentation on the full dataset, before splitting
 Much better performance
 Questions:
• Is the classifier recognizing the positives or the augmented audio?
• Is this approach reliable for evaluating real audio?
Metric      Val    Test
Loss        0.42   0.41
Accuracy    0.94   0.94
Precision   0.96   0.95
Recall      0.79   0.81
AUC         0.91   0.92
Figure: confusion matrix on test set
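One way to probe the questions above is to split by original recording first and augment only the training positives, so that no variant of a test recording is ever seen during training. The sketch below is a hypothetical helper (`split_then_augment`, its parameters and the toy data are all invented for illustration), not the procedure used in the slides.

```python
import random


def split_then_augment(recordings, augment, test_fraction=0.2, copies=3, seed=0):
    """Split by recording id first, then augment only the training positives."""
    rng = random.Random(seed)
    ids = sorted(recordings)
    rng.shuffle(ids)
    n_test = int(len(ids) * test_fraction)
    test_ids, train_ids = ids[:n_test], ids[n_test:]

    train = [(rid, recordings[rid]["label"], recordings[rid]["signal"])
             for rid in train_ids]
    for rid in train_ids:
        if recordings[rid]["label"] == 1:  # minority (positive) class
            for _ in range(copies):
                train.append((rid, 1, augment(recordings[rid]["signal"], rng)))

    # The test set contains only real, unaugmented recordings.
    test = [(rid, recordings[rid]["label"], recordings[rid]["signal"])
            for rid in test_ids]
    return train, test


# Toy data: 20 recordings, every fifth one positive.
toy = {f"rec{i}": {"label": int(i % 5 == 0), "signal": [float(i)]}
       for i in range(20)}
jitter = lambda signal, rng: [x + rng.uniform(-0.01, 0.01) for x in signal]
train, test = split_then_augment(toy, jitter)
train_ids = {rid for rid, _, _ in train}
test_ids = {rid for rid, _, _ in test}
```

When augmentation is applied before splitting, augmented copies of one source recording can land in both sets, which can inflate test metrics without improving recognition of unseen real audio.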

PROCESSING OF BI-DIMENSIONAL SIGNALS

Dataset: COVIDx CXR-3
 Created by the COVID-Net team
 8 different data sources
 Last release: 06/02/2022
 2 different datasets:
• Training set
• Test set
 3 classes: COVID-19, Pneumonia, Normal
 Two .txt files (train, test) containing metadata
• Patient ID
• File name
• Class
• Data source
Example metadata record:
Patient ID    101
File name     pneumocystis-jirovecii-pneumonia-3-1.jpg
Class         pneumonia
Data source   cohen

Data exploration
 Training set: 29,404 CXR images
• COVID-19: 15,774 images
• Normal (no pathology): 8,085 images
• Pneumonia: 5,545 images
 Test set: 400 CXR images
• COVID-19: 200 images
• Normal (no pathology): 100 images
• Pneumonia: 100 images
 The dataset is imbalanced
Figure panels: Training set distribution, Test set distribution

Images Exploration
Figure panels: CXR «Normal», CXR «COVID-19», CXR «Pneumonia»
 Images are 1024x1024 pixels with 3 channels
 Only Posterior-Anterior (PA) CXR
 Many images contain:
• Noise
• Undesirable parts
 Preliminary operations:
• Resizing to 112x112x3, to reduce the computational cost
• Data splitting
• Data normalization

Image Pre-Processing
 Image enhancement:
• Techniques used to improve the interpretability of the information in images
• For both radiologists and automated systems
 Pre-processing:
• Removal of textual information commonly embedded in CXR images
Figure panels: Noisy CXR image, Common textual items

Improved Adaptive Gamma Correction
 Adaptive Gamma Correction (AGC)
• AGC is a tool for image contrast enhancement
• It relates the gamma parameter to the cumulative distribution function (CDF) of the pixel gray levels
• Works well for most dimmed images, but fails for globally bright images
 Improved Adaptive Gamma Correction
• New AGC algorithm
• Enhances bright images through the use of negative images
• Enhances dimmed images through gamma correction modulated by a truncated CDF
Figure: flowchart of the Improved AGC tool
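The idea can be sketched on a flat list of 8-bit gray levels. This is a deliberately simplified illustration, not the published algorithm: it uses the plain histogram CDF (no weighting of the distribution), and the bright/dim decision rule (`bright_threshold` on the mean intensity) and the truncation value `tau` are hypothetical parameters chosen for the example.

```python
def agc(pixels, tau=0.0, levels=256):
    """CDF-driven gamma correction: gray level l gets gamma = max(tau, 1 - cdf(l))."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / len(pixels))
    lmax = levels - 1
    # Look-up table; level 0 is kept at 0 to avoid the 0 ** 0 corner case.
    lut = [0] + [round(lmax * (l / lmax) ** max(tau, 1.0 - cdf[l]))
                 for l in range(1, levels)]
    return [lut[p] for p in pixels]


def improved_agc(pixels, bright_threshold=0.5, tau=0.5, levels=256):
    """Bright images: AGC on the negative image. Dim images: truncated-CDF AGC."""
    lmax = levels - 1
    mean = sum(pixels) / (len(pixels) * lmax)
    if mean > bright_threshold:
        negative = [lmax - p for p in pixels]
        return [lmax - p for p in agc(negative, tau=0.0, levels=levels)]
    return agc(pixels, tau=tau, levels=levels)


dim = [10, 20, 30, 40, 50, 60]
bright = [200, 210, 220, 230, 240, 250]
dim_out = improved_agc(dim)
bright_out = improved_agc(bright)
```

Because gamma never exceeds 1, dim images can only get brighter; bright images are darkened by enhancing their negative and inverting the result. Truncating the CDF keeps gamma from dropping toward 0, which limits over-enhancement of dim images.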

Improved Adaptive Gamma Correction
Figure panels: No AGC applied, AGC applied (too bright), AGC applied (too dim)

Pre-Processing:
 The CXR images were cropped
• Top 8% of the image removed, where textual information is commonly embedded
• Central crop, to center the cropped image
Figure: some pre-processing examples
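The cropping step can be expressed as a small box computation. The sketch assumes that the central crop produces the largest square that fits in what remains after removing the top band; the slides only state the 8% figure, so the square shape and the function name `crop_box` are assumptions.

```python
def crop_box(height, width, top_fraction=0.08):
    """Return (top, left, bottom, right) after removing the top band
    and taking the largest centered square from the remaining region."""
    top = int(round(height * top_fraction))
    remaining_h = height - top
    side = min(remaining_h, width)
    y0 = top + (remaining_h - side) // 2  # center vertically in the remainder
    x0 = (width - side) // 2              # center horizontally
    return y0, x0, y0 + side, x0 + side


box = crop_box(1024, 1024)
```

For a 1024x1024 image this removes the top 82 rows and 41 columns on each side, leaving a 942x942 square that is then resized to the network input size.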

Class imbalance problem
 Different techniques explored to handle unbalanced classes
• Under-sampling of the dataset: rebalancing with respect to the least populated class
• Class weights: higher weights assigned to samples from underrepresented classes
• Over-sampling of the dataset: data augmentation on minority classes (position-based data augmentation, GAN)
Classes     Nr. images
COVID-19    15,774
Pneumonia   5,545
Normal      8,085
Total       29,404
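To make the first two options concrete, the sketch below computes class weights with the common "balanced" heuristic, weight = n_samples / (n_classes * n_class), and the dataset size obtained by under-sampling to the least populated class. The counts are those in the table above; the weighting formula is an assumption, since the slides do not say which one was used.

```python
counts = {"COVID-19": 15774, "Pneumonia": 5545, "Normal": 8085}


def balanced_class_weights(counts):
    """weight_c = n_samples / (n_classes * n_c): rarer classes get larger weights."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {c: total / (n_classes * n) for c, n in counts.items()}


weights = balanced_class_weights(counts)

# Under-sampling keeps as many images per class as the smallest class has.
undersampled_size = len(counts) * min(counts.values())
```

With these counts Pneumonia gets a weight of about 1.77, Normal about 1.21 and COVID-19 about 0.62, while under-sampling would shrink the training set from 29,404 to 16,635 images.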

Data Augmentation
 Data augmentation was adopted to balance the classes, in two configurations:
• Applied after under-sampling (performed on all classes)
• Applied to enlarge the minority classes (not performed on the most populated class)
 The following types of augmentation were used:
• Translation (± 10% in x and y directions)
• Rotation (± 10°)
• Horizontal flip, zoom (± 15%)
• Intensity shift (± 10%)
Figure: some augmentation examples

CNN: Network Architecture
 Input layer (112x112x3)
 2 convolutional blocks, with:
• Convolutional layers
• Batch Normalization layers
• ReLU
 2 convolutional blocks with: Convolutional layer, ReLU
 2 Max Pooling layers
 2 Dropout layers (rate 0.2)
 Output of the feature extractor is passed to a Flatten layer
 Fully connected layer (128 neurons), ReLU
 Dropout layer (rate 0.5)
 Output layer, 3 neurons, Softmax activation function
Parameters        Value
Max epochs        50
Optimizer         Adam
Learning rate     0.0001 (fixed)
Batch size        32
Steps per epoch   1035
Params: 2,416,611 (trainable: 2,416,451; non-trainable: 160)

Over-Sampling with Positional Augmentation: Results
 The solution that produced the best results was the one:
• without pre-processing
• with over-sampling of minority classes through positional augmentation
Figure: confusion matrix on test set

Figure panels: Under-sampling, Class-Weights, AC-GAN Augmentation

Figure panels: Image Enhancement, Image Processing

Explainable AI: Class Activation Heat-Map
 We developed an explainability algorithm based on Gradient-weighted Class Activation Mapping (Grad-CAM)
 It provides a visual output of the image regions that most influence the predictions of the proposed CNN models
 Grad-CAM uses the gradients of any target concept, flowing into the final convolutional layer, to produce a coarse localization map highlighting the regions of the image that are important for predicting the concept
Figure panels: COVID-19 CXR with activation map, Pneumonia CXR with activation map
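The core Grad-CAM computation is short once the feature maps of the last convolutional layer and the gradients of the class score with respect to them are available. The sketch below shows that step on tiny nested lists; extracting activations and gradients from a real model is framework-specific and is not shown, and the toy values are invented.

```python
def grad_cam(activations, gradients):
    """activations, gradients: K feature maps of size H x W (nested lists).
    gradients[k][i][j] is d(class score) / d(activations[k][i][j])."""
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weights: global average of the gradients over each feature map.
    alphas = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for alpha, fmap in zip(alphas, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * fmap[i][j]
    # ReLU keeps only the regions that push the class score up.
    return [[max(0.0, v) for v in row] for row in cam]


# Two 2x2 feature maps: the first supports the class, the second opposes it.
activations = [[[1.0, 0.0], [0.0, 0.0]],
               [[0.0, 0.0], [0.0, 2.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]],
             [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(activations, gradients)
```

The resulting coarse map is normally upsampled to the input resolution and overlaid on the X-ray as a heat-map.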

SYNTHETIC CHEST X-RAY IMAGES GENERATION USING AC-GAN

Conditional Generation of Synthetic Chest X-Ray Images
 Objectives:
• Train an AC-GAN to synthesize chest X-ray images
• Conditional generation of X-rays of healthy, COVID-19 and pneumonia patients
• Data augmentation on the class-imbalanced COVIDx dataset to improve classification performance
 Dataset → COVIDx
• Simple image pre-processing → 112x112 resizing and [0,1] pixel scaling
• Data augmentation → shearing and zooming
Figure panels: Normal, COVID-19, Pneumonia

Auxiliary Classifier Generative Adversarial Network (AC-GAN)
 AC-GAN → extension of the GAN architecture
 The generator is class-conditional, as in cGANs
• Input → randomly sampled 100-dimensional noise vector and a class label
• Output → a 112x112x3 image generated conditionally on the label
• The classes → coded as integers (0, 1, 2)
 The discriminator → comes with an auxiliary classifier
• Trained to reconstruct the class label of the input image
• Input → 112x112x3 image (real or synthesized)
• Output → predicts its source (real/fake) and class (0, 1, 2)

Generator
1. Two inputs:
   a. random 100-dimensional noise vector
   b. integer class label c (0, 1, 2)
2. Class label → embedding layer → dense layer → 7 × 7 × 1
3. Noise vector → dense layer → 7 × 7 × 1024
4. These two tensors are then concatenated → 7 × 7 × 1025
5. Four transposed convolutional layers (kernel size = 5, stride = 2) → 112 × 112 × 3
• The first three are paired with batch normalization and a Rectified Linear Unit (ReLU) activation
• The last one uses a tanh activation
6. Output: fake image with size 112 × 112 × 3
Generator diagram (layer → output shape):
Noise vector (100) → Dense 7 * 7 * 1024 → ReLU → Reshape → 7 x 7 x 1024
Class label (1) → Embedding (100) → Dense 7 * 7 → 7 x 7 x 1
Concatenate → 7 x 7 x 1025
5x5 Conv2DTranspose → Batch Normalization → ReLU → 14 x 14 x 512
5x5 Conv2DTranspose → Batch Normalization → ReLU → 28 x 28 x 256
5x5 Conv2DTranspose → Batch Normalization → ReLU → 56 x 56 x 128
5x5 Conv2DTranspose → Tanh activation → 112 x 112 x 3 (fake image)
Diagram annotation: N(μ = 0, σ = 0.02)
Params: 22,303,108 (trainable: 22,301,316; non-trainable: 1,792)
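The reported generator parameter counts can be reproduced from the layer description with plain arithmetic. The sketch assumes bias terms in every dense and transposed-convolution layer and four parameters per channel in each batch-normalization layer (scale and offset trainable, moving mean and variance non-trainable); these conventions are assumptions, but they reproduce the reported totals exactly.

```python
def dense(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv2d_transpose(kernel, c_in, c_out):
    """Weights plus biases of a transposed convolution with a square kernel."""
    return kernel * kernel * c_in * c_out + c_out


def batch_norm(channels):
    """(trainable, non-trainable): scale/offset vs. moving mean/variance."""
    return 2 * channels, 2 * channels


trainable, non_trainable = 0, 0
trainable += 3 * 100                   # embedding: 3 classes -> 100 dimensions
trainable += dense(100, 7 * 7)         # label branch -> 7 x 7 x 1
trainable += dense(100, 7 * 7 * 1024)  # noise branch -> 7 x 7 x 1024

channels = [1025, 512, 256, 128, 3]    # after concatenation, then four layers
for i, (c_in, c_out) in enumerate(zip(channels, channels[1:])):
    trainable += conv2d_transpose(5, c_in, c_out)
    if i < 3:                          # only the first three layers use batch norm
        t, nt = batch_norm(c_out)
        trainable += t
        non_trainable += nt

total = trainable + non_trainable
output_size = 7 * 2 ** 4               # four stride-2 layers: 7 -> 112
```

Most of the parameters sit in the first transposed convolution (1025 → 512 channels), which accounts for about 13.1 million of the 22.3 million.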

Discriminator
1. Input: 112 × 112 × 3 image → from the dataset (real) or synthetic (fake)
2. Four blocks:
• Sequence of: convolutional layer, batch normalization layer, LeakyReLU activation (slope = 0.2) and dropout layer (p = 0.5)
• Image size: 112 × 112 × 3 → 7 × 7 × 512
3. The tensor is flattened → fed into two dense layers
4. First dense layer + sigmoid activation
• Binary classifier → outputs the probability that the image comes from the original dataset ("real") rather than from the generator ("fake")
5. Second dense layer + softmax activation
• Multiclass classifier → outputs a 1D tensor with the probability of each class
Discriminator diagram (layer → output shape):
Input layer: real / fake image → 112 x 112 x 3
Each block: 3x3 Conv2D (stride 2) → Batch Normalization → LeakyReLU → Dropout
Feature map sizes: 112 x 112 x 3 → 56 x 56 x 64 → 28 x 28 x 128 → 14 x 14 x 256 → 7 x 7 x 512
Flatten → 25088
Source output: Dense 1 → Sigmoid activation → FAKE 0 / REAL 1
Auxiliary output: Dense 3 → Softmax activation → COVID-19 0, NORMAL 1, PNEUMONIA 2
Params: 1,672,900 (trainable: 1,670,916; non-trainable: 1,984)

Training and regularization
 Adam optimizer → for both the generator and the discriminator
 Two loss functions, one for each output layer of the discriminator
• First output layer → binary cross-entropy loss (source loss 𝑳𝒔)
• Second output layer → sparse categorical cross-entropy loss (auxiliary classifier loss 𝑳𝒄)
 Minimize the overall loss 𝑳 = 𝑳𝒔 + 𝑳𝒄 → during both generator and discriminator training
 Label flipping (generator training) → all the generated fake (0) images are passed to the discriminator labelled as real (1)
 Label smoothing (discriminator training) → applied to the binary labels describing the origin of the image (0/fake, 1/real) as a regularization method
Parameters        Value
Max epochs        388
Optimizer         Adam
Learning rate     0.0002 (fixed)
Adam 𝜷𝟏           0.5 (fixed)
Batch size        64
Steps per epoch   460
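The loss combination can be written out for a single image. This is a sketch under stated assumptions: the smoothing amount (0.1) and its symmetric application are hypothetical, since the slides do not give the value, and the probabilities are invented. The last line checks that 460 steps per epoch is consistent with the 29,404 COVIDx training images and a batch size of 64.

```python
import math

EPS = 1e-7  # numerical guard against log(0)


def binary_cross_entropy(target, prob):
    """Source loss L_s for one image; target may be a smoothed label."""
    prob = min(max(prob, EPS), 1 - EPS)
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


def sparse_categorical_cross_entropy(class_index, probs):
    """Auxiliary classifier loss L_c for one image with an integer class label."""
    return -math.log(max(probs[class_index], EPS))


def smooth(label, amount=0.1):
    """Hypothetical symmetric label smoothing: 1 -> 0.9, 0 -> 0.1."""
    return label * (1 - amount) + (1 - label) * amount


def acgan_loss(source_target, source_prob, class_index, class_probs):
    """Overall loss L = L_s + L_c."""
    return (binary_cross_entropy(source_target, source_prob)
            + sparse_categorical_cross_entropy(class_index, class_probs))


# Discriminator outputs for one generated image of class 1 (invented values).
p_real, p_class = 0.2, [0.1, 0.8, 0.1]

d_loss = acgan_loss(smooth(0), p_real, 1, p_class)  # discriminator: fake = 0, smoothed
g_loss = acgan_loss(1, p_real, 1, p_class)          # generator: label flipped to real = 1

steps_per_epoch = math.ceil(29404 / 64)
```

The same discriminator output yields a low loss for the discriminator and a high one for the generator: flipping the label is what turns "being detected as fake" into a training signal for the generator.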

Figure panels: Auxiliary Loss 𝑳𝒄, Source Loss 𝑳𝒔 and Total Loss 𝑳 for generator and discriminator (training and testing); discriminator accuracy (overall, real, fake)
Choosing the best AC-GAN model weights for data augmentation
1. First selection of candidate models based on:
• ↑ visual quality (qualitative evaluation of sample images generated during each epoch)
• ↓ generator loss
• ↓ discriminator accuracy in correctly classifying fake images as fake
2. Trained a classifier on synthetic images only → evaluated its classification accuracy on real COVIDx images
• epoch 288 → best model
3. Quality evaluation of the generated images
• ↓ FID, ↓ Intra-FID and ↑ Inception Score (IS) → computed with InceptionV3
4. 2D t-SNE embedding visualization of generated and real images
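As a reminder of what the Inception Score measures, the sketch below computes it from per-image class probabilities: IS = exp(mean over images of KL(p(y|x) || p(y))). In practice the probabilities come from a pretrained InceptionV3 network; here they are toy three-class distributions invented for illustration.

```python
import math


def inception_score(pred_dists):
    """IS from a list of per-image class-probability vectors p(y|x)."""
    n = len(pred_dists)
    k = len(pred_dists[0])
    # Marginal class distribution p(y) over all generated images.
    marginal = [sum(p[c] for p in pred_dists) / n for c in range(k)]
    kl_sum = 0.0
    for p in pred_dists:
        kl_sum += sum(pc * math.log(pc / marginal[c])
                      for c, pc in enumerate(p) if pc > 0)
    return math.exp(kl_sum / n)


# Confident predictions spread evenly over the classes: best case.
confident_and_diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Every image gets the same flat prediction: worst case.
uninformative = [[1 / 3, 1 / 3, 1 / 3]] * 3

best = inception_score(confident_and_diverse)
worst = inception_score(uninformative)
```

The score ranges from 1 (uninformative or collapsed outputs) up to the number of classes of the scoring network, so higher is better, whereas FID is a distance between real and generated feature statistics and lower is better.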

Evaluation
Metric                                 Value
Generator loss 𝑳                       0.44
Discriminator accuracy (fake images)   0.13
Qualitative appearance                 Realistic
CNN accuracy (on real images)          0.63
Figure panels: Real t-SNE, Synthetic t-SNE
Metric        Our AC-GAN        Paper AC-GAN [6]
IS ↑          2.71 (± 1.70)     2.51 (± 0.12)
FID ↓         123.26 (± 0.02)   50.67 (± 8.13)
Intra FID ↓   136 (± 0.02)

Real and synthetic chest X-ray samples
Figure: real and fake samples for the Normal, Pneumonia and COVID-19 classes

Bibliography
1. Fakhry, A., Jiang, X., Xiao, J., Chaudhari, G., Han, A., & Khanzada, A. (2021). Virufy: A
multi-branch deep learning network for automated detection of COVID-19.
2. Hamdi, S., Oussalah, M., Moussaoui, A., & Saidi, M. (2022). Attention-based hybrid CNN-LSTM and
spectral data augmentation for COVID-19 diagnosis from cough sound. Journal of Intelligent
Information Systems, 59(2), 367-389.
3. Mahanta, S. K., Kaushik, D., Van Truong, H., Jain, S., & Guha, K. (2021, December). Covid-19
diagnosis from cough acoustics using convnets and data augmentation. In 2021 First
International Conference on Advances in Computing and Future Communication Technologies
(ICACFCT) (pp. 33-38). IEEE.
4. COUGHVID: A cough based COVID-19 fast screening project. https://c4science.ch/diffusion/10770/
5. Orlandic, L., Teijeiro, T., & Atienza, D. (2021). The COUGHVID crowdsourcing dataset, a corpus
for the study of large-scale cough analysis algorithms. Scientific Data, 8(1), 156.
6. Odena, A., Olah, C., & Shlens, J. (2017). Conditional Image Synthesis With Auxiliary Classifier
GANs (arXiv:1610.09585). arXiv. https://doi.org/10.48550/arXiv.1610.09585
7. Christi Florence, C. (2021). Detection of Pneumonia in Chest X-Ray Images Using Deep Transfer
Learning and Data Augmentation With Auxiliary Classifier Generative Adversarial Network. 14.

8. Karbhari, Y., Basu, A., Geem, Z. W., Han, G.-T., & Sarkar, R. (2021). Generation of Synthetic
Chest X-ray Images and Detection of COVID-19: A Deep Learning Based Approach. Diagnostics,
11(5), Article 5. https://doi.org/10.3390/diagnostics11050895
9. DeVries, T., Romero, A., Pineda, L., Taylor, G. W., & Drozdzal, M. (2019). On the Evaluation of
Conditional GANs (arXiv:1907.08175). arXiv. https://doi.org/10.48550/arXiv.1907.08175
10. Borji, A. (2018). Pros and Cons of GAN Evaluation Measures (arXiv:1802.03446). arXiv.
https://doi.org/10.48550/arXiv.1802.03446
11. Goel S, Kipp A, Goel N, et al. (November 22, 2022) COVID-19 vs. Influenza: A Chest X-ray
Comparison. Cureus 14(11): e31794. doi:10.7759/cureus.31794
12. Kim, S.-H.; Wi, Y.M.; Lim, S.; Han, K.-T.; Bae, I.-G. Differences in Clinical Characteristics
and Chest Images between Coronavirus Disease 2019 and Influenza-Associated Pneumonia.
Diagnostics 2021, 11, 261. https://doi.org/10.3390/diagnostics11020261

13. Wang, L., Lin, Z.Q. & Wong, A. COVID-Net: a tailored deep convolutional neural network design
for detection of COVID-19 cases from chest X-ray images. Sci Rep 10, 19549 (2020).
https://doi.org/10.1038/s41598-020-76550-z
14. Gang Cao, Lihui Huang, Huawei Tian, Xianglin Huang, Yongbin Wang, Ruicong Zhi, Contrast
enhancement of brightness-distorted images by improved adaptive gamma correction, Computers &
Electrical Engineering, Volume 66, 2018, Pages 569-582, ISSN 0045-7906,
https://doi.org/10.1016/j.compeleceng.2017.09.012
15. Ait Nasser, A.; Akhloufi, M.A. A Review of Recent Advances in Deep Learning Models for Chest
Disease Detection Using Radiography. Diagnostics 2023, 13, 159.
https://doi.org/10.3390/diagnostics13010159
16. Huang, W., Song, G., Li, M., Hu, W., Xie, K. (2013). Adaptive Weight Optimization for
Classification of Imbalanced Data. In: Sun, C., Fang, F., Zhou, ZH., Yang, W., Liu, ZY. (eds)
Intelligence Science and Big Data Engineering. IScIDE 2013. Lecture Notes in Computer Science,
vol 8261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-42057-3_69
17. Elshennawy, N.M.; Ibrahim, D.M. Deep-Pneumonia Framework Using Deep Learning Models Based on
Chest X-Ray Images. Diagnostics 2020, 10, 649. https://doi.org/10.3390/diagnostics10090649
18. Chetoui, M.; Akhloufi, M.A.; Yousefi, B.; Bouattane, E.M. Explainable COVID-19 Detection on
Chest X-rays Using an End-to-End Deep Convolutional Neural Network Architecture. Big Data Cogn.
