IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 13, No. 3, September 2024, pp. 3273~3282
ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i3.pp3273-3282
Journal homepage: http://ijai.iaescore.com
Transfer learning scenarios on deep learning for ultrasound-
based image segmentation
Didik Bani Unggul, Nur Iriawan, Heri Kuswanto
Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
Article Info
Article history:
Received Jan 3, 2024
Revised Feb 21, 2024
Accepted Mar 6, 2024
ABSTRACT
Deep learning coupled with transfer learning, which involves reusing a
pretrained model's network structure and parameter values, offers a rapid
and accurate solution for image segmentation. Differing approaches exist in
updating transferred parameters during training. In some studies, parameters
remain frozen or untrainable (referred to as TL-S1), while in others, they act
as trainable initial values updated from the first iteration (TL-S2). We
introduce a new state-of-the-art transfer learning scenario (TL-S3), where
parameters initially remain unchanged and update only after a specified
cutoff time. Our research focuses on comparing the performance of these
scenarios, a dimension yet unexplored in the literature. We simulate on three
architectures (Dense-UNet-121, Dense-UNet-169, and Dense-UNet-201)
using an ultrasound-based dataset with the left ventricular wall as the region
of interest. The results reveal that the TL-S3 consistently outperforms the
previous state-of-the-art scenarios, i.e., TL-S1 and TL-S2, achieving correct
classification ratios (CCR) above 0.99 during training with noticeable
performance spikes post-cutoff. Notably, two out of three top-performing
models in the validation data also originate from TL-S3. Finally, the best
model is the Dense-UNet-121 with TL-S3 and a 20% cutoff, which achieves the
highest CCR on the training (0.9950), validation (0.9699), and testing (0.9695) data.
Keywords:
Deep learning
Image segmentation
Left ventricular wall
Transfer learning
Ultrasound
This is an open access article under the CC BY-SA license.
Corresponding Author:
Nur Iriawan
Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember
Kampus ITS Sukolilo, Surabaya, Indonesia
Email: nur_i@statistika.its.ac.id
1. INTRODUCTION
Image segmentation is a crucial task in image and video processing. This process involves dividing
the image into multiple segments or objects by assigning class labels to each pixel [1]. Its applications are
widespread and encompass medical imaging [2]–[4], remote sensing [5]–[7], and the development of
autonomous vehicles [8]–[10]. Among various segmentation methods, deep learning emerges as a promising
approach [11]–[13]. Deep learning models decompose complex mappings into a sequence of simpler ones, each described by
a different layer [14]. The input is presented in a visible layer, and subsequent hidden layers extract increasingly abstract
features from it. The refinement of these layers is driven by the results of the training process rather than
manual intervention [15]. With a large number of layers, these models can accurately represent input features and
effectively perform complex tasks such as image segmentation, natural language processing, or stock price
prediction [16]. This advantage sets deep learning apart from traditional machine learning methods, which
still rely on domain expertise for feature extraction.
Deep learning implementation, however, requires a large amount of training data and can take a long
time to train [17]. This presents difficulties, particularly in the medical domain, where labeled datasets
are scarce [18]. To overcome this issue, transfer learning can be coupled with the deep learning approach [17].
This process involves reusing pre-trained network components, such as the structure and parameter values.
More precisely, the network is typically divided into two parts: the part receiving transfer learning and
the part not receiving it. The first part, which leverages transfer learning, is structurally identical to the
corresponding part of the pre-trained model and receives its parameter values. The source model is typically
trained on a larger dataset, which may be related or entirely different. The second part consists of
non-transferred layers, whose parameter values are initialized from scratch and updated during training.
Furthermore, variations exist in how parameter values are handled in layers affected by transfer
learning. These values can be "frozen" (non-trainable) and maintained in that state, or they can be "unfrozen"
(trainable) and updated as the training progresses. Some studies treat them as non-trainable parameters
[19]–[22]. On the other hand, some researchers use the transferred values only for initialization and update
them from the first training iteration onward [23], [24]. Unfortunately, to the best of our knowledge, no
research has evaluated the effectiveness of these two scenarios simultaneously. The majority of the articles
only contrasted one scenario of transfer learning with a model that did not employ transfer learning [19],
[21]. Furthermore, many applications only construct a transfer learning model without comparing it with
any other models [18]. This leads to a gap in knowledge that requires research. Therefore, this study aims to
compare those two parameter update scenarios, as well as introduce a new state-of-the-art transfer learning
scenario. This scenario involves updating the newly transferred parameter values only after a specific time
point is reached.
Dense-UNet, a deep learning architecture that hybridizes U-Net [4] and DenseNet [25], was
employed in this investigation. This architecture was implemented to limit the number of model parameters,
maximize information flow between network layers, and address vanishing gradient concerns due to its
feature reuse and dense connections at each stage [26]. The encoder and the decoder are the two primary
components of this architecture in general. The encoder, also known as the contraction path, is responsible
for applying transfer learning from a pre-trained model and extracting features. The second component,
known as the expanding path or decoder, is responsible for reconstructing features and restoring spatial resolution
through the use of upsampling operators [4], [27]. These two paths are connected via skip connections, in
which the feature maps from the encoder are bypassed and concatenated with the decoder results at specific
positions [28].
The simulation will be conducted on an ultrasound-based cardiac assessment dataset. Ultrasound,
known for its accessibility, affordability, and absence of radiation exposure, addresses key healthcare concerns
[29]. However, ultrasound images suffer from increased noise and decreased contrast, which makes certain
cardiac features difficult to identify and interpret [30]. Therefore, automatic segmentation
is urgently required for assistance in identifying the region of interest in ultrasound-based images.
Nevertheless, in contrast to other non-invasive imaging modalities like magnetic resonance imaging (MRI)
and computed tomography scan (CT-scan), research on automatic segmentation in ultrasound, particularly
utilizing deep learning, has been very limited in recent years [31]. To overcome this problem, we employ a
publicly available dataset from Hamad Medical Corporation, Qatar University, and Tampere University
known as the HMC-QU dataset, accessible at https://www.kaggle.com/datasets/aysendegerli/hmcqu-dataset.
This dataset encompasses ultrasound-based assessments featuring diverse patients and viewpoint types.
Furthermore, the ground truth is supplied, with the left ventricular wall (LVW) serving as the region of
interest (ROI). This is essential to us because LVW movement and structure analysis serves as an early
indicator of various heart problems, including myocardial infarction and hypertrophic cardiomyopathy [30],
[32]. This dataset has been used in several earlier investigations, either for segmentation or for the
identification of structural and movement anomalies [33]–[37]. While deep learning remains the dominant
option, none of these studies has explored the use of transfer learning to the extent that we propose.
Therefore, our research provides practical benefits for the development of ultrasound-based cardiac image
processing in addition to theoretical benefits for deep learning transfer learning scenarios.
2. METHOD
2.1. Dense-UNet architecture
Dense-UNet is a modified U-Net architecture that incorporates dense blocks and transition layers
into its structure, drawing inspiration from the DenseNet architecture introduced by [25]. The main distinction
between standard blocks and dense blocks lies in their layer-to-layer connections. Each layer in a dense block
receives the feature maps of all preceding layers via concatenation [25]. This feature reuse
minimizes the addition of redundant features in each layer, consequently reducing the required number of parameters.
However, it necessitates that the dimensions of feature maps remain unchanged due to concatenation-based
merging. This limitation impedes the implementation of a pooling procedure, which is generally resolved by
adding a transition layer. In the original configuration, this transition layer consists of 2×2 average pooling
preceded by 1×1 convolution.
Figure 1 illustrates the structure of the nine-stage Dense-UNet. A 7×7 convolution is employed in
the first step to process the input dimensions from 224×224 to 112×112. This process continues with the first
transition layer, leading us to the first dense block in the second stage. Within a dense block, layer
configurations include batch normalization (BN), rectified linear unit (ReLU) activation, 3×3 convolution,
another BN, ReLU activation, and 1×1 convolution. This sequence is repeated several times depending on
the architectural construction. Subsequently, the second transition layer guides us to the third stage (second
dense block). We will continue this process until we reach the fourth dense block in the fifth stage when we
have 7×7 feature maps. The next step involves starting 2×2 upsampling and concatenating the result with the
final feature maps from the fourth stage. Their results will serve as the input for the fifth dense block, which
has the same layer configuration as its mirrored version (the third dense block). This pattern continues until we
reach the ninth stage, concluding with a sigmoid activation layer and a resulting output of 224×224.
Determining how many layers are present in each dense block is another crucial factor. The number
of layers in this study, ranging from stage one to stage five, follows the DenseNet-121, DenseNet-169, and
DenseNet-201 structure of the original DenseNet versions [25]. The sixth to ninth stages replicate this
structure by mirroring the number of layers. Under these conditions, the three Dense-UNet architectures in
this study are named Dense-UNet-121, Dense-UNet-169, and Dense-UNet-201.
Figure 1. Dense-UNet structure with nine processing stages
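As a rough illustration of these building blocks, the following Keras sketch shows one dense-block layer, a transition layer, and the start of the encoder. The layer ordering follows the description above, but the growth rate and filter counts are illustrative assumptions rather than the values used in the paper.

import tensorflow as tf
from tensorflow.keras import layers

GROWTH_RATE = 32  # assumed for illustration; not stated explicitly in the paper

def dense_layer(x):
    # One composite layer inside a dense block, in the order described above:
    # BN -> ReLU -> 3x3 conv -> BN -> ReLU -> 1x1 conv, then concatenation.
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(4 * GROWTH_RATE, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(GROWTH_RATE, 1, padding="same")(y)
    # Dense connectivity: each layer receives the feature maps of all earlier layers.
    return layers.Concatenate()([x, y])

def dense_block(x, num_layers):
    for _ in range(num_layers):
        x = dense_layer(x)
    return x

def transition_layer(x):
    # 1x1 convolution followed by 2x2 average pooling, halving the spatial size.
    x = layers.Conv2D(int(x.shape[-1]) // 2, 1, padding="same")(x)
    return layers.AveragePooling2D(pool_size=2)(x)

# Encoder start as in the text: 224x224 input, 7x7 convolution down to 112x112,
# first transition layer, then the first dense block (6 layers for Dense-UNet-121).
inputs = tf.keras.Input(shape=(224, 224, 3))
x = layers.Conv2D(64, 7, strides=2, padding="same")(inputs)   # stage 1: 112x112
x = transition_layer(x)                                        # down to 56x56
x = dense_block(x, num_layers=6)                               # stage 2

# In the decoder, each stage upsamples and concatenates with the mirrored encoder
# feature maps (a skip connection), e.g.:
# x = layers.Concatenate()([layers.UpSampling2D(size=2)(x), encoder_skip])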
2.2. Transfer learning
Transfer learning is a learning paradigm that seeks to enhance model performance by applying
knowledge acquired from a learning task in one domain (the source domain) to a
different domain (the target domain). Addressing data inadequacy in the target domain is one of the main
benefits of this method [17], [38], [39]. It mitigates this issue by relaxing the assumption that training and
testing data must originate from the same domain. Transfer learning in deep learning refers to pre-training
a network on a source domain, frequently a larger dataset like ImageNet [40]. This process leads to a model
with optimized parameter values representing previously acquired knowledge. These parameter values are
subsequently transferred to another network created particularly for the target dataset. Notably, the two
networks are often dissimilar. As a result, the new model incorporates layers that receive the transfer learning
results alongside layers that do not. In the Dense-UNet architecture discussed earlier, we can determine that
the portion designated for transfer learning is the initial half known as the encoder, encompassing the first to
fifth stages. If the architecture has 𝑀 layers and the encoder consists of 𝐾 layers (𝐾 < 𝑀), the first 𝐾 layers
will receive the parameter values from the pre-trained model. Furthermore, the other layers will be initialized
using either fixed or random numbers [41]. After that, there are different scenarios for how we handle the
transferred parameters:
‒ Scenario 1: freeze scenario (TL-S1). Transferred parameters are regarded as untrainable: their values
remain unchanged throughout training, essentially freezing them. Layers not undergoing transfer
learning are initialized and updated from the first iteration until the completion of training.
‒ Scenario 2: unfreeze scenario (TL-S2). In this scenario, the transferred parameter values act as
initializations and are updated throughout training. Thus, all parameters are deemed trainable,
regardless of whether they are in layers with or without transfer learning. The initialization procedure is
where the differences arise: non-transferred layers begin with Glorot uniform initialization, whereas other
layers start with values from a pre-trained model.
‒ Scenario 3: freeze-unfreeze scenario (TL-S3). Parameters in layers affected by transfer learning will
remain unchanged for an initial portion of the training process. In other words, only the parameters in
layers not affected by transfer learning will be updated, while those influenced by transfer learning will be
frozen. After reaching a pre-defined epoch threshold, the transfer learning layer is unfrozen, and training
continues across all layers. The transfer learning cutoff will be explored at various stages, including 20%,
40%, 60%, and 80% of the total training epochs. This exploration will clarify how the timing of the
transition impacts the final outcome.
In this study, we will simulate the three scenarios that are depicted in Figure 2.
Figure 2. Three parameter updating scenarios in transfer learning
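To make the three update policies concrete, a minimal Keras-style sketch is given below. The helper names (set_encoder_trainable, compile_and_fit) and the assumption that transferred layers carry an "encoder" name prefix are illustrative, not the authors' code; the epoch count, batch size, and learning rate follow the setup described later in sections 2.3 and 2.5.

import tensorflow as tf

EPOCHS = 100          # total training epochs (section 2.5)
CUTOFF = 0.20         # TL-S3 unfreezing cutoff as a fraction of total epochs

def set_encoder_trainable(model, trainable):
    # Freeze or unfreeze the layers that received transferred parameters.
    # Assumption: those layers are identifiable by an "encoder" name prefix.
    for layer in model.layers:
        if layer.name.startswith("encoder"):
            layer.trainable = trainable

def compile_and_fit(model, x, y, epochs):
    # Recompiling after changing `trainable` flags is required in Keras.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-6),
                  loss="binary_crossentropy")
    model.fit(x, y, batch_size=10, epochs=epochs)

def train(model, x, y, scenario):
    if scenario == "TL-S1":        # freeze: transferred layers never update
        set_encoder_trainable(model, False)
        compile_and_fit(model, x, y, EPOCHS)
    elif scenario == "TL-S2":      # unfreeze: all layers trainable from epoch 1
        set_encoder_trainable(model, True)
        compile_and_fit(model, x, y, EPOCHS)
    elif scenario == "TL-S3":      # freeze first, unfreeze after the cutoff
        cutoff_epoch = int(CUTOFF * EPOCHS)
        set_encoder_trainable(model, False)
        compile_and_fit(model, x, y, cutoff_epoch)
        set_encoder_trainable(model, True)
        compile_and_fit(model, x, y, EPOCHS - cutoff_epoch)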
2.3. Optimization technique
In architectures like Dense-UNet, “trainable parameters” encompass weights and biases in
convolution layers, as well as scale and shift parameters in BN layers. During training, these parameters are
optimized, commencing with initial values generated by Glorot uniform initialization [42]. This technique
uses a uniform distribution with an interval limit of [−𝑎, 𝑎], where 𝑎 is calculated employing (1). The values
of 𝑛𝑖𝑛 and 𝑛𝑜𝑢𝑡 represent the number of input and output units of the layer, respectively.
𝑎 = √6 / √(𝑛𝑖𝑛 + 𝑛𝑜𝑢𝑡) (1)
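As a small worked example of (1), assuming the common convention that a convolution's fan-in and fan-out count the receptive field size times the channel numbers:

import numpy as np

def glorot_uniform_limit(n_in, n_out):
    # Bound a of the uniform interval [-a, a] from (1).
    return np.sqrt(6.0) / np.sqrt(n_in + n_out)

# Example: a 3x3 convolution kernel with 64 input and 32 output channels,
# so n_in = 3*3*64 = 576 and n_out = 3*3*32 = 288, giving a ≈ 0.0833.
a = glorot_uniform_limit(3 * 3 * 64, 3 * 3 * 32)
weights = np.random.uniform(-a, a, size=(3, 3, 64, 32))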
Next, the adaptive moment (Adam) technique proposed in [43] will be utilized for updating the
initial values iteratively. This method updates parameter values using bias-corrected estimates of the gradients' first
and second moments. Algorithm 1 illustrates the procedure. The first component that must be
calculated is the gradient of the loss function with respect to the model parameters, denoted by 𝑔𝑡 where 𝑡 is
the index of iteration performed. The binary cross-entropy loss function as in (2) was selected to suit the
binary classification task.
𝐿𝑖 = −[𝑐𝑖 log(𝑝(𝑐𝑖)) + (1 − 𝑐𝑖) log(1 − 𝑝(𝑐𝑖))] (2)
Algorithm 1. Adam optimization
Require: 𝛽1, 𝛽2, 𝜂, 𝜀: hyperparameters
Require: 𝑓𝑡(𝜃𝑡−1): stochastic objective function as in (2) with trainable parameters 𝜃𝑡−1 at time-step 𝑡 − 1
Require: 𝜃0: initial parameter vector generated from Glorot uniform initialization
𝑚0 ← 0 (initialize 1st moment vector)
𝑣0 ← 0 (initialize 2nd moment vector)
𝑡 ← 0 (initialize time-step)
While 𝜃𝑡 not converged do
  𝑡 ← 𝑡 + 1
  Get gradients w.r.t. the stochastic objective at time-step 𝑡 using (3): 𝑔𝑡 = ∇𝜃 𝑓𝑡(𝜃𝑡−1) (3)
  Update biased 1st moment estimate using (4): 𝑚𝑡 = 𝛽1𝑚𝑡−1 + (1 − 𝛽1)𝑔𝑡 (4)
  Update biased 2nd moment estimate using (5): 𝑣𝑡 = 𝛽2𝑣𝑡−1 + (1 − 𝛽2)𝑔𝑡² (5)
  Compute bias-corrected 1st moment estimate using (6): 𝑚̂𝑡 = 𝑚𝑡 / (1 − (𝛽1)^𝑡) (6)
  Compute bias-corrected 2nd moment estimate using (7): 𝑣̂𝑡 = 𝑣𝑡 / (1 − (𝛽2)^𝑡) (7)
  Update parameters using (8): 𝜃𝑡 = 𝜃𝑡−1 − 𝜂 𝑚̂𝑡 / (√𝑣̂𝑡 + 𝜀) (8)
End while
Return 𝜃𝑡 (resulting parameters)
The loss for the 𝑖-th pixel, denoted as 𝐿𝑖, is defined for 𝑖 = 1, ..., 𝑁, with 𝑁 representing the total
number of pixels in the output image. The actual class of the 𝑖-th pixel is denoted by 𝑐𝑖 ∈ {0,1}, in which
𝑐𝑖 = 0 is for the background and 𝑐𝑖 = 1 is for the ROI. Lastly, 𝑝(𝑐𝑖) is the predicted probability of belonging
to class 𝑐𝑖 calculated by the model. After finding 𝑔𝑡, we can calculate the exponentially weighted
moving averages of the gradient (𝑚𝑡) and the squared gradient (𝑣𝑡). This step requires configuring the
hyperparameters 𝛽1, 𝛽2 ∈ [0,1) as the exponential decay rates for the moment estimates. We then
utilize the bias-corrected versions of 𝑚𝑡 and 𝑣𝑡 along with 𝜂 and 𝜀 to update the parameter values from 𝜃𝑡−1 to
𝜃𝑡. We set the hyperparameter values at 𝛽1 = 0.9, 𝛽2 = 0.999, 𝜂 = 10⁻⁶, and 𝜀 = 10⁻⁸.
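A minimal NumPy sketch of one Adam update following (3)-(8) with the stated hyperparameter values is shown below; in the actual experiments this corresponds to the Adam optimizer built into Keras.

import numpy as np

beta1, beta2, eta, eps = 0.9, 0.999, 1e-6, 1e-8

def adam_step(theta, grad, m, v, t):
    m = beta1 * m + (1 - beta1) * grad                     # biased 1st moment, (4)
    v = beta2 * v + (1 - beta2) * grad ** 2                # biased 2nd moment, (5)
    m_hat = m / (1 - beta1 ** t)                           # bias correction, (6)
    v_hat = v / (1 - beta2 ** t)                           # bias correction, (7)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)   # parameter update, (8)
    return theta, m, v

# Usage sketch: parameters start from Glorot initialization, moments from zeros,
# and grad stands in for the gradient g_t of the loss (2) at time-step t.
theta, m, v = np.zeros(10), np.zeros(10), np.zeros(10)
grad = np.random.randn(10)
theta, m, v = adam_step(theta, grad, m, v, t=1)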
2.4. Evaluation metric
This study utilizes the correct classification ratio (CCR) to evaluate the model performances. This
metric is defined in (9). 𝐺𝑇𝑗 represents the ground truth area for class 𝑗, while 𝑆𝑒𝑔𝑗 depicts the model's
corresponding segmentation area. Class 𝑗 = 0 is designated for the background (non-ROI) and 𝑗 = 1 is for
the LVW area (ROI). |𝐺𝑇𝑗 ∩ 𝑆𝑒𝑔𝑗| denotes the number of pixels from class 𝑗 that are accurately classified
by the model. |𝐺𝑇| is measured by counting the pixels in the union of the 𝐺𝑇0 and 𝐺𝑇1
areas. The CCR varies between 0 and 1: a CCR of one means all pixels are correctly categorized and the
segmentation precisely matches the ground truth, while a decreasing CCR indicates deteriorating
segmentation results.
CCR = ∑ⱼ₌₀¹ |𝐺𝑇𝑗 ∩ 𝑆𝑒𝑔𝑗| / |𝐺𝑇| (9)
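A minimal NumPy sketch of (9), assuming the ground truth and the predicted segmentation are binary {0, 1} masks of equal size:

import numpy as np

def correct_classification_ratio(gt, seg):
    # |GT_1 ∩ Seg_1|: ROI pixels predicted as ROI; |GT_0 ∩ Seg_0|: background as background.
    correct_roi = np.logical_and(gt == 1, seg == 1).sum()
    correct_background = np.logical_and(gt == 0, seg == 0).sum()
    return (correct_roi + correct_background) / gt.size    # divide by |GT|

# Example with a 224x224 mask pair: a slightly shifted prediction.
gt = np.zeros((224, 224), dtype=int);  gt[100:150, 80:120] = 1
seg = np.zeros((224, 224), dtype=int); seg[102:152, 82:122] = 1
print(correct_classification_ratio(gt, seg))   # close to, but below, 1.0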
2.5. Dataset and experimental setup
An online echocardiogram dataset made available by HMC-QU was employed in this study.
Specifically, we concentrated on a subset of the dataset comprising 109 echocardiogram video recordings
with 224×224-pixel ground truth available. These videos presented the apical 4-chamber view at a resolution of
636×422 pixels. They had a frame rate of 25 frames per second and ranged in duration from 1 to 3 seconds.
The videos were randomly divided into training, validation, and testing sets (80%:10%:10%),
resulting in 87 training videos, 11 for validation, and 11 for testing. The images were then extracted and fed
into preprocessing. They were center-cropped to 422×422 pixels and resized to 224×224 pixels. The red,
green, and blue channels' color intensities were then extracted into three matrices. The matrix elements,
initially ranging from 0 to 255, were normalized to a 0-to-1 range, serving as input for the deep learning
architecture.
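A sketch of this preprocessing pipeline, assuming each frame is loaded as an H×W×3 uint8 array (422×636 for these videos) and using OpenCV for resizing:

import numpy as np
import cv2

def preprocess_frame(frame):
    # frame: H x W x 3 uint8 array, e.g. 422 x 636 x 3 for the HMC-QU videos.
    h, w, _ = frame.shape
    side = min(h, w)                                       # 422 here
    x0, y0 = (w - side) // 2, (h - side) // 2
    cropped = frame[y0:y0 + side, x0:x0 + side]            # center crop to 422 x 422
    resized = cv2.resize(cropped, (224, 224), interpolation=cv2.INTER_AREA)
    # The three color channels' 0-255 intensities are scaled to the 0-1 range.
    return resized.astype(np.float32) / 255.0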
The training utilized a batch size of 10, with 10 images selected at random for each iteration. Each
epoch concluded after processing all images, and this procedure was repeated for 100 epochs. The
experiment was conducted on Google Colab using an NVIDIA V100 GPU, with Python 3 and the Keras
framework chosen for their effectiveness and ease of execution.
3. RESULTS AND DISCUSSION
Figure 3 displays examples of the ultrasound images processed during this study. Randomly selected
original images are presented in the first row, and the ground truth for those images
is displayed in the second row. The combined result, which shows the location of the LVW area
as determined by the ground truth, can be examined in the third row. The models use the ground truth as a reference
during training to effectively learn and recognize LVW characteristics.
Figure 3. Some examples of ultrasound-based images and their ground truth (mask)
Table 1 summarizes the training durations (in seconds), loss values, and CCR for the three
Dense-UNet architectures with various transfer learning scenarios. Notably, across all architectures, the
suggested third scenario (TL-S3) consistently outperforms models without transfer learning (NoTL), TL-S1,
and TL-S2. The models under TL-S3 achieve a remarkable CCR exceeding 0.99, a level not attained by
models from other scenarios. Furthermore, TL-S3 models demonstrate far lower losses than the others, with
reductions ranging from 82% to 97%.
Table 1. The performance results of Dense-UNet models
Architecture | Transfer learning scenario | Training loss | Training CCR | Validation loss | Validation CCR | Duration (s)
Dense-UNet-121 | No TL | 0.2338 | 0.9772 | 0.2461 | 0.9685 | 2,983
Dense-UNet-121 | TL-S1 | 0.0918 | 0.9815 | 0.1166 | 0.9685 | 1,955
Dense-UNet-121 | TL-S2 | 0.1509 | 0.9849 | 0.1793 | 0.9679 | 2,966
Dense-UNet-121 | TL-S3 20%-F | 0.0095 | 0.9950 | 0.2063 | 0.9699 | 2,857
Dense-UNet-121 | TL-S3 40%-F | 0.0106 | 0.9948 | 0.1765 | 0.9694 | 2,666
Dense-UNet-121 | TL-S3 60%-F | 0.0169 | 0.9925 | 0.1369 | 0.9677 | 2,455
Dense-UNet-121 | TL-S3 80%-F | 0.0169 | 0.9925 | 0.1282 | 0.9692 | 2,250
Dense-UNet-169 | No TL | 0.2491 | 0.9777 | 0.2610 | 0.9679 | 3,849
Dense-UNet-169 | TL-S1 | 0.1639 | 0.9780 | 0.1763 | 0.9694 | 2,383
Dense-UNet-169 | TL-S2 | 0.1555 | 0.9854 | 0.1886 | 0.9674 | 3,838
Dense-UNet-169 | TL-S3 20%-F | 0.0095 | 0.9950 | 0.2074 | 0.9678 | 2,752
Dense-UNet-169 | TL-S3 40%-F | 0.0101 | 0.9949 | 0.2038 | 0.9678 | 2,552
Dense-UNet-169 | TL-S3 60%-F | 0.0110 | 0.9946 | 0.1823 | 0.9689 | 2,373
Dense-UNet-169 | TL-S3 80%-F | 0.0198 | 0.9913 | 0.1321 | 0.9667 | 2,170
Dense-UNet-201 | No TL | 0.1391 | 0.9833 | 0.1648 | 0.9694 | 4,874
Dense-UNet-201 | TL-S1 | 0.2200 | 0.9746 | 0.2282 | 0.9683 | 3,112
Dense-UNet-201 | TL-S2 | 0.2872 | 0.9791 | 0.3048 | 0.9669 | 4,893
Dense-UNet-201 | TL-S3 20%-F | 0.0100 | 0.9949 | 0.1905 | 0.9684 | 4,668
Dense-UNet-201 | TL-S3 40%-F | 0.0108 | 0.9947 | 0.1691 | 0.9680 | 4,327
Dense-UNet-201 | TL-S3 60%-F | 0.0115 | 0.9945 | 0.1804 | 0.9687 | 3,989
Dense-UNet-201 | TL-S3 80%-F | 0.0204 | 0.9910 | 0.1085 | 0.9679 | 3,707
No TL: without transfer learning; TL-S1: transfer learning scenario 1 (freeze); TL-S2: transfer learning scenario 2 (unfreeze); TL-S3 20%-F, 40%-F, 60%-F, 80%-F: transfer learning scenario 3 (freeze-unfreeze with unfreezing cutoffs at 20%, 40%, 60%, and 80% of the total epochs, respectively).
Investigation reveals that when transfer learning parameters are unfrozen after the cutoff, the TL-S3
models perform noticeably better. TL-S3 20%-F models, for example, exhibit better performance spikes after
20 epochs, whereas TL-S3 40%-F models show better performance surges after 40 epochs. The TL-S3
60%-F and TL-S3 80%-F models also exhibit this pattern. The learning curve provides a visual
representation of this circumstance, with Figures 4(a) to 4(c) representing CCR and Figures 5(a) to 5(c)
representing loss. It validates the hypothesis that temporarily freezing transfer learning layers enables the
model to adapt to the current case's characteristics without disrupting the robust feature extraction of pre-
trained layers. After the non-transfer learning layer stabilizes, unfreezing the transfer learning layer boosts
performance by iteratively updating all parameters. This performance jump occurs shortly after the cutoff.
Figure 4. Learning curve for CCR values: (a) Dense-UNet-121, (b) Dense-UNet-169, and (c) Dense-UNet-201
Figure 5. Learning curve for loss values: (a) Dense-UNet-121, (b) Dense-UNet-169, and (c) Dense-UNet-201
The average CCR increase for TL-S3 models during the twenty epochs following the cutoff was 0.0216, compared
to 0.0048 for the other scenarios over the same interval. This approximately five-fold difference highlights the
advantage of the TL-S3 scenario. We find that the TL-S3 20%-F scenario corresponds to the best-performing model
during training in each Dense-UNet architecture. Dense-UNet-121, 169, and 201 with this scenario had CCR
values of 0.9950, 0.9950, and 0.9949, respectively, placing them among the top three in terms of both CCR
and loss. With a CCR of 0.9699, the Dense-UNet-121 model with TL-S3 20%-F also leads in validation.
Dense-UNet-121 with TL-S3 40%-F and Dense-UNet-169 with TL-S1 are the second and third-best models,
respectively, with CCR values of 0.9694. TL-S3 scenario models were able to maintain two of the top three
positions in this instance. Then, a different testing dataset was employed to further evaluate these three
models, which were determined to be the best options. Once more, the model with the greatest CCR of
0.9695 was Dense-UNet-121 with TL-S3 20%-F. It performed better than Dense-UNet-121 with TL-S3
40%-F and Dense-UNet-169 with TL-S1, which had CCR values of 0.9685 and 0.9681, respectively. The
results demonstrate the strong segmentation capabilities of Dense-UNet-121, confirming its superior
performance with TL-S3 20%-F. It continuously achieves the greatest CCR (0.9950, 0.9699, and 0.9695,
respectively) across training, validation, and testing datasets.
When comparing models with and without transfer learning, models with transfer learning generally
demonstrate faster training times. Dense-UNet-201 TL-S2 is an exception, taking 19 seconds longer than
Dense-UNet-201 without transfer learning. In all other cases, transfer learning consistently speeds up the
training process. Second, we anticipated that TL-S1 would show the fastest training time, since TL-S1
requires fewer trainable parameters
than TL-S2 and TL-S3. Nevertheless, our results indicate that this hypothesis holds only when
comparing TL-S1 and TL-S2. Interestingly, some of the models in the TL-S3 scenario required less training
time than those in TL-S1. This result provides an interesting novel perspective to our investigation, indicating
that the special parameter update approach employed by TL-S3 may help enhance the effectiveness of
training. We additionally discover that among TL-S3 models, the training period varies depending on the
cutoff position selection. The earlier the transition from non-trainable (freeze) to trainable (unfreeze) status
occurs, the longer the training duration. This condition is attributed to the increasing proportion of epochs
with a full-scale trainable parameter set. In terms of processing time, our best model, the Dense-UNet-121
with TL-S3 20%-F, also performed well. With 2,857 seconds of duration, it is faster than 52% of other
models.
Lastly, Figure 6 provides a visualization of segmentation on the testing data with our best model. The
original images are displayed in the top row, and a comparison of the ROI contour generated by the model
(red line) and the ground truth (blue line) is presented in the bottom row. This figure illustrates how the
model can segment data from a new dataset that was not utilized during training.
Figure 6. Segmentation results produced by the best model
4. CONCLUSION
This study provides several important conclusions. Firstly, during training, the TL-S3 scenario
consistently outperforms other scenarios, achieving CCRs over 0.99 and losses under 0.0205. This
superiority is explained by TL-S3's learning curve exhibiting a performance increase after surpassing the
freezing cutoff. The average CCR increase in the 20 epochs post-cutoff is 0.0216, roughly five times higher
than in the other scenarios. Furthermore, the excellence of TL-S3 extends to the validation process, securing top positions in terms
of the highest CCR. In summary, the Dense-UNet-121 model with TL-S3 20%-F is deemed the best,
achieving a training duration of 2,857 seconds and attaining the highest CCR values for training, validation,
and testing data (0.9950, 0.9699, and 0.9695, respectively). This study establishes opportunities for further
research on the TL-S3 scenario by raising two crucial issues: first, determining the optimal transition point
from 'untrainable' to 'trainable' status, and second, exploring how distinct training parameter adjustments can
be made for each layer impacted by transfer learning. These investigations are expected to enhance the
robustness and performance of the deep learning model with transfer learning.
ACKNOWLEDGEMENTS
The research presented in this paper was supported by the Department of Statistics, Institut Teknologi
Sepuluh Nopember, and the Indonesia Endowment Fund for Education Agency under scholarship no.
KET-438/LPDP.4/2022.
REFERENCES
[1] R. Szeliski, Computer vision: algorithms and applications. Cham: Springer, 2022.
[2] R. Ranjbarzadeh, A. Caputo, E. B. Tirkolaee, S. J. Ghoushchi, and M. Bendechache, “Brain tumor segmentation of MRI images: a
comprehensive review on the application of artificial intelligence tools,” Computers in Biology and Medicine, vol. 152, 2023, doi:
10.1016/j.compbiomed.2022.106405.
[3] N. Salpea, P. Tzouveli, and D. Kollias, “Medical image segmentation: a review of modern architectures,” in Computer Vision –
ECCV 2022 Workshops, 2023, pp. 691–708, doi: 10.1007/978-3-031-25082-8_47.
[4] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: convolutional networks for biomedical image segmentation,” in Medical Image
Computing and Computer-Assisted Intervention – MICCAI 2015, Cham: Springer, 2015, pp. 234–241, doi: 10.1007/978-3-319-
24574-4_28.
[5] S. M. Azimi, C. Henry, L. Sommer, A. Schumann, and E. Vig, “SkyScapes – fine-grained semantic understanding of aerial
scenes,” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 7392–7402, doi:
10.1109/ICCV.2019.00749.
[6] P. Bhadoria, S. Agrawal, and R. Pandey, “Image segmentation techniques for remote sensing satellite images,” IOP Conference
Series: Materials Science and Engineering, vol. 993, no. 1, pp. 1–17, 2020, doi: 10.1088/1757-899X/993/1/012050.
[7] B. E. -Zahouani et al., “Remote sensing imagery segmentation in object-based analysis: a review of methods, optimization, and
quality evaluation over the past 20 years,” Remote Sensing Applications: Society and Environment, vol. 32, 2023, doi:
10.1016/j.rsase.2023.101031.
[8] D. Feng et al., “Deep multi-modal object detection and semantic segmentation for autonomous driving: datasets, methods, and
challenges,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 3, pp. 1341–1360, 2021, doi:
10.1109/TITS.2020.2972974.
[9] D. -V. Giurgi, T. J. -Laurain, M. Devanne, and J. -P. Lauffenburger, “Real-time road detection implementation of UNet
architecture for autonomous driving,” in 2022 IEEE 14th Image, Video, and Multidimensional Signal Processing Workshop
(IVMSP), 2022, pp. 1–5, doi: 10.1109/IVMSP54334.2022.9816237.
[10] L. Lizhou and Z. Yong, “A closer look at U-net for road detection,” in Tenth International Conference on Digital Image
Processing (ICDIP 2018), 2018, doi: 10.1117/12.2503282.
[11] M. Aljabri and M. AlGhamdi, “A review on the use of deep learning for medical images segmentation,” Neurocomputing, vol.
506, pp. 311–335, 2022, doi: 10.1016/j.neucom.2022.07.070.
[12] B. Sistaninejhad, H. Rasi, and P. Nayeri, “A review paper about deep learning for medical image analysis,” Computational and
Mathematical Methods in Medicine, vol. 2023, pp. 1–10, 2023, doi: 10.1155/2023/7091301.
[13] S. M. Khaniabadi, H. Ibrahim, I. A. Huqqani, F. M. Khaniabadi, H. A. M. Sakim, and S. S. Teoh, “Comparative review on
traditional and deep learning methods for medical image segmentation,” in 2023 IEEE 14th Control and System Graduate
Research Colloquium (ICSGRC), 2023, pp. 45–50, doi: 10.1109/ICSGRC57744.2023.10215402.
[14] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning. Cambridge, Massachusetts: MIT Press, 2016.
[15] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015, doi: 10.1038/nature14539.
[16] E. Alaros, M. Marjani, D. A. Shafiq, and D. Asirvatham, “Predicting consumption intention of consumer relationship
management users using deep learning techniques: a review,” Indonesian Journal of Science and Technology, vol. 8, no. 2, pp.
307–328, 2023, doi: 10.17509/ijost.v8i2.55814.
[17] C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu, “A survey on deep transfer learning,” in Artificial Neural Networks and
Machine Learning – ICANN 2018, Cham: Springer, 2018, pp. 270–279, doi: 10.1007/978-3-030-01424-7_27.
[18] P. Kora et al., “Transfer learning techniques for medical image analysis: a review,” Biocybernetics and Biomedical Engineering,
vol. 42, no. 1, pp. 79–107, 2022, doi: 10.1016/j.bbe.2021.11.004.
[19] A. A. Pravitasari et al., “UNet-VGG16 with transfer learning for MRI-based brain tumor segmentation,” Telkomnika
(Telecommunication Computing Electronics and Control), vol. 18, no. 3, pp. 1310–1318, 2020, doi:
10.12928/TELKOMNIKA.v18i3.14753.
[20] M. Oquab, L. Bottou, I. Laptev, and J. Sivic, “Learning and transferring mid-level image representations using convolutional
neural networks,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1717–1724, doi:
10.1109/CVPR.2014.222.
[21] D. A. Rasyid, G. H. Huang, and N. Iriawan, “Segmentation of low-grade gliomas using U-Net VGG16 with transfer learning,” in
2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence), 2021, pp. 393–398, doi:
10.1109/Confluence51648.2021.9377093.
[22] O. T. Bişkin, İ. Kırbaş, and A. Çelik, “A fast and time-efficient glitch classification method: a deep learning-based visual feature
extractor for machine learning algorithms,” Astronomy and Computing, vol. 42, 2023, doi: 10.1016/j.ascom.2022.100683.
[23] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, “How transferable are features in deep neural networks?,” Advances in Neural
Information Processing Systems, vol. 4, pp. 3320–3328, 2014.
[24] Z. Yang, J. Yue, Z. Li, and L. Zhu, “Vegetable image retrieval with fine-tuning VGG model and image hash,” IFAC-
PapersOnLine, vol. 51, no. 17, pp. 280–285, 2018, doi: 10.1016/j.ifacol.2018.08.175.
[25] G. Huang, Z. Liu, L. V. D. Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2261–2269, doi: 10.1109/CVPR.2017.243.
[26] Y. Cao, S. Liu, Y. Peng, and J. Li, “DenseUNet: densely connected UNet for electron microscopy image segmentation,” IET
Image Processing, vol. 14, no. 12, pp. 2682–2689, 2020, doi: 10.1049/iet-ipr.2019.1527.
[27] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in 2015 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), 2015, vol. 39, no. 4, pp. 3431–3440, doi: 10.1109/CVPR.2015.7298965.
[28] S. Cai, Y. Tian, H. Lui, H. Zeng, Y. Wu, and G. Chen, “Dense-UNet: a novel multiphoton in vivo cellular image segmentation
model based on a convolutional neural network,” Quantitative Imaging in Medicine and Surgery, vol. 10, no. 6, pp. 1275–1285,
2020, doi: 10.21037/QIMS-19-1090.
[29] J. E. -Taraboulsi, C. P. Cabrera, C. Roney, and N. Aung, “Deep neural network architectures for cardiac image segmentation,”
Artificial Intelligence in the Life Sciences, vol. 4, pp. 1–19, 2023, doi: 10.1016/j.ailsci.2023.100083.
[30] A. Degerli et al., “Early detection of myocardial infarction in low-quality echocardiography,” IEEE Access, vol. 9, pp. 34442–
34453, 2021, doi: 10.1109/ACCESS.2021.3059595.
[31] C. Chen et al., “Deep learning for cardiac image segmentation: a review,” Frontiers in Cardiovascular Medicine, vol. 7, pp. 1–33,
2020, doi: 10.3389/fcvm.2020.00025.
[32] J. A. U. -Moral et al., “Contrast-enhanced echocardiographic measurement of left ventricular wall thickness in hypertrophic
cardiomyopathy: comparison with standard echocardiography and cardiac magnetic resonance,” Journal of the American Society
of Echocardiography, vol. 33, no. 9, pp. 1106–1115, 2020, doi: 10.1016/j.echo.2020.04.009.
[33] O. Hamila et al., “Fully automated 2D and 3D convolutional neural networks pipeline for video segmentation and myocardial
infarction detection in echocardiography,” Multimedia Tools and Applications, vol. 81, no. 26, pp. 37417–37439, 2022, doi:
10.1007/s11042-021-11579-4.
[34] G. Sanjeevi, U. Gopalakrishnan, R. K. Pathinarupothi, and T. Madathil, “Automatic diagnostic tool for detection of regional wall
motion abnormality from echocardiogram,” Journal of Medical Systems, vol. 47, no. 1, 2023, doi: 10.1007/s10916-023-01911-w.
[35] I. Adalioglu, M. Ahishali, A. Degerli, S. Kiranyaz, and M. Gabbouj, “SAF-Net: self-attention fusion network for myocardial
infarction detection using multi-view echocardiography,” in Computing in Cardiology, 2023, pp. 1–4, doi:
10.22489/CinC.2023.240.
[36] Y. Li, W. Lu, P. Monkam, Z. Zhu, W. Wu, and M. Liu, “LVSnake: accurate and robust left ventricle contour localization for
myocardial infarction detection,” Biomedical Signal Processing and Control, vol. 85, 2023, doi: 10.1016/j.bspc.2023.105076.
[37] A. Degerli, S. Kiranyaz, T. Hamid, R. Mazhar, and M. Gabbouj, “Early myocardial infarction detection over multi-view
echocardiography,” Biomedical Signal Processing and Control, vol. 87, pp. 1–12, 2024, doi: 10.1016/j.bspc.2023.105448.
[38] A. Hosna, E. Merry, J. Gyalmo, Z. Alom, Z. Aung, and M. A. Azim, “Transfer learning: a friendly introduction,” Journal of Big
Data, vol. 9, no. 1, pp. 1–19, 2022, doi: 10.1186/s40537-022-00652-w.
[39] A. H. Zim et al., “Smart manufacturing with transfer learning under limited data: towards data-driven intelligences,” Materials
Today Communications, vol. 37, 2023, doi: 10.1016/j.mtcomm.2023.107357.
[40] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,”
Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017, doi: 10.1145/3065386.
[41] H. Li, M. Krček, and G. Perin, “A comparison of weight initializers in deep learning-based side-channel analysis,” in Applied
Cryptography and Network Security Workshops, Cham: Springer, 2020, pp. 126–143, doi: 10.1007/978-3-030-61638-0_8.
[42] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” Journal of Machine
Learning Research, vol. 9, pp. 249–256, 2010.
[43] D. P. Kingma and J. L. Ba, “Adam: a method for stochastic optimization,” Arxiv-Computer Science, vol. 1, pp. 1–15, 2015.
BIOGRAPHIES OF AUTHORS
Didik Bani Unggul earned his Bachelor of Science in Statistics from Universitas
Indonesia, graduating in 2020. Currently, he is pursuing a master's degree in statistics at
Institut Teknologi Sepuluh Nopember in Surabaya, Indonesia. Actively involved in projects at
the Laboratory of Computational Statistics and Data Science, his research areas of interest
include deep learning, biomedical image processing, and computational statistics. He can be
contacted at email: 6003212005@student.its.ac.id or didikbaniunggul@gmail.com.
Nur Iriawan received a bachelor's degree in statistics from the Institut Teknologi
Sepuluh Nopember (ITS) Surabaya, a master's degree in computer science from the University
of Maryland, USA, and a Ph.D. in statistics from Curtin University of Technology, Australia.
He is a professor at the Department of Statistics, Faculty of Science and Data Analytics, ITS,
Surabaya. He also serves as the head of the Laboratory of Computational Statistics and Data
Science. He has supervised and co-supervised over 20 master's and 10 Ph.D. students. He has
authored or co-authored more than 60 Scopus-indexed articles, with an H-index of 12 and over 1,000
citations. His research interests encompass stochastic processes, statistical computations, and
Bayesian models. He can be contacted at email: nur_i@statistika.its.ac.id.
Heri Kuswanto holds a Statistics B.Sc. (2003) and M.Sc. (2005) from Institut
Teknologi Sepuluh Nopember, Indonesia, and a Dr.rer.pol. in statistics (econometrics) from
Leibniz Hannover University, Germany (2009). He further pursued a postdoctoral degree at
Laval University, Canada, focusing on the calibration of ensemble weather forecasts in 2010.
Currently a professor in statistics at ITS, he also serves as the Director of Graduate Program
and Academic Development. His academic career includes appointments as the Head of
Climate Change Research Group. His research spans weather forecast, solar radiation
management, computational statistics, time series forecasting, econometrics, machine
learning, and advanced data analysis. He also received awards such as the Harvard Residency
Program on Solar Geoengineering and DAAD Scholarship for Doctoral research in Germany.
He can be contacted at email: heri_k@statistika.its.ac.id.

More Related Content

PDF
78-ijsrr-d-2250.ebn-f.pdf
PDF
Using Deep Learning and Transfer Learning for Pneumonia Detection
PDF
TEST-COST-SENSITIVE CONVOLUTIONAL NEURAL NETWORKS WITH EXPERT BRANCHES
PDF
IRJET- A Survey on Medical Image Interpretation for Predicting Pneumonia
PDF
A review deep learning for medical image segmentation using multi modality fu...
PDF
A New Deep Convolutional Neural Network Learning Model for Covid-19 Diagnosis
PDF
Dilated Inception U-Net for Nuclei Segmentation in Multi-Organ Histology Images
PDF
Lung Cancer Detection using transfer learning.pptx.pdf
78-ijsrr-d-2250.ebn-f.pdf
Using Deep Learning and Transfer Learning for Pneumonia Detection
TEST-COST-SENSITIVE CONVOLUTIONAL NEURAL NETWORKS WITH EXPERT BRANCHES
IRJET- A Survey on Medical Image Interpretation for Predicting Pneumonia
A review deep learning for medical image segmentation using multi modality fu...
A New Deep Convolutional Neural Network Learning Model for Covid-19 Diagnosis
Dilated Inception U-Net for Nuclei Segmentation in Multi-Organ Histology Images
Lung Cancer Detection using transfer learning.pptx.pdf

Similar to Transfer learning scenarios on deep learning for ultrasound based image segmentation (20)

PDF
Employing deep learning for lung sounds classification
PPTX
Batch -13.pptx lung cancer detection using transfer learning
PDF
A Survey of Convolutional Neural Network Architectures for Deep Learning via ...
PDF
Performance evaluation of transfer learning based deep convolutional neural n...
PDF
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
PDF
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
PDF
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
PDF
AN INVESTIGATION INTO DETECTING PNEUMONIA THROUGH IMAGE PROCESSING AND OBJECT...
PDF
AN INVESTIGATION INTO DETECTING PNEUMONIA THROUGH IMAGE PROCESSING AND OBJECT...
PDF
AN INVESTIGATION INTO DETECTING PNEUMONIA THROUGH IMAGE PROCESSING AND OBJECT...
PDF
AN INVESTIGATION INTO DETECTING PNEUMONIA THROUGH IMAGE PROCESSING AND OBJECT...
PDF
End-to-end deep auto-encoder for segmenting a moving object with limited tra...
PDF
Deep Learning-based Diagnosis of Pneumonia using X-Ray Scans
PDF
Convolutional auto-encoded extreme learning machine for incremental learning ...
PDF
An Innovative Deep Learning Framework Integrating Transfer- Learning And Extr...
PPTX
Deep Learning in Computer Vision
PDF
Performance Comparison Analysis for Medical Images Using Deep Learning Approa...
DOCX
Chapter 1.docx
PDF
UNet-VGG16 with transfer learning for MRI-based brain tumor segmentation
PDF
Role of Advanced Machine Learning Techniques and Deep Learning Approach Based...
Employing deep learning for lung sounds classification
Batch -13.pptx lung cancer detection using transfer learning
A Survey of Convolutional Neural Network Architectures for Deep Learning via ...
Performance evaluation of transfer learning based deep convolutional neural n...
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
AN INVESTIGATION INTO DETECTING PNEUMONIA THROUGH IMAGE PROCESSING AND OBJECT...
AN INVESTIGATION INTO DETECTING PNEUMONIA THROUGH IMAGE PROCESSING AND OBJECT...
AN INVESTIGATION INTO DETECTING PNEUMONIA THROUGH IMAGE PROCESSING AND OBJECT...
AN INVESTIGATION INTO DETECTING PNEUMONIA THROUGH IMAGE PROCESSING AND OBJECT...
End-to-end deep auto-encoder for segmenting a moving object with limited tra...
Deep Learning-based Diagnosis of Pneumonia using X-Ray Scans
Convolutional auto-encoded extreme learning machine for incremental learning ...
An Innovative Deep Learning Framework Integrating Transfer- Learning And Extr...
Deep Learning in Computer Vision
Performance Comparison Analysis for Medical Images Using Deep Learning Approa...
Chapter 1.docx
UNet-VGG16 with transfer learning for MRI-based brain tumor segmentation
Role of Advanced Machine Learning Techniques and Deep Learning Approach Based...
Ad

More from IAESIJAI (20)

PDF
A comparative study of natural language inference in Swahili using monolingua...
PDF
Abstractive summarization using multilingual text-to-text transfer transforme...
PDF
Enhancing emotion recognition model for a student engagement use case through...
PDF
Automatic detection of dress-code surveillance in a university using YOLO alg...
PDF
Hindi spoken digit analysis for native and non-native speakers
PDF
Two-dimensional Klein-Gordon and Sine-Gordon numerical solutions based on dee...
PDF
Improved convolutional neural networks for aircraft type classification in re...
PDF
Primary phase Alzheimer's disease detection using ensemble learning model
PDF
Deep learning-based techniques for video enhancement, compression and restora...
PDF
Hybrid model detection and classification of lung cancer
PDF
Adaptive kernel integration in visual geometry group 16 for enhanced classifi...
PDF
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
PDF
Enhancing fall detection and classification using Jarratt‐butterfly optimizat...
PDF
Deep ensemble learning with uncertainty aware prediction ranking for cervical...
PDF
Event detection in soccer matches through audio classification using transfer...
PDF
Detecting road damage utilizing retinaNet and mobileNet models on edge devices
PDF
Optimizing deep learning models from multi-objective perspective via Bayesian...
PDF
Squeeze-excitation half U-Net and synthetic minority oversampling technique o...
PDF
A novel scalable deep ensemble learning framework for big data classification...
PDF
Exploring DenseNet architectures with particle swarm optimization: efficient ...
A comparative study of natural language inference in Swahili using monolingua...
Abstractive summarization using multilingual text-to-text transfer transforme...
Enhancing emotion recognition model for a student engagement use case through...
Automatic detection of dress-code surveillance in a university using YOLO alg...
Hindi spoken digit analysis for native and non-native speakers
Two-dimensional Klein-Gordon and Sine-Gordon numerical solutions based on dee...
Improved convolutional neural networks for aircraft type classification in re...
Primary phase Alzheimer's disease detection using ensemble learning model
Deep learning-based techniques for video enhancement, compression and restora...
Hybrid model detection and classification of lung cancer
Adaptive kernel integration in visual geometry group 16 for enhanced classifi...
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
Enhancing fall detection and classification using Jarratt‐butterfly optimizat...
Deep ensemble learning with uncertainty aware prediction ranking for cervical...
Event detection in soccer matches through audio classification using transfer...
Detecting road damage utilizing retinaNet and mobileNet models on edge devices
Optimizing deep learning models from multi-objective perspective via Bayesian...
Squeeze-excitation half U-Net and synthetic minority oversampling technique o...
A novel scalable deep ensemble learning framework for big data classification...
Exploring DenseNet architectures with particle swarm optimization: efficient ...
Ad

Recently uploaded (20)

PDF
Encapsulation theory and applications.pdf
PPTX
Big Data Technologies - Introduction.pptx
PDF
NewMind AI Weekly Chronicles - August'25-Week II
PPTX
Machine Learning_overview_presentation.pptx
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Unlocking AI with Model Context Protocol (MCP)
PPTX
A Presentation on Artificial Intelligence
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PPTX
sap open course for s4hana steps from ECC to s4
PDF
A comparative analysis of optical character recognition models for extracting...
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Assigned Numbers - 2025 - Bluetooth® Document
PPTX
MYSQL Presentation for SQL database connectivity
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Network Security Unit 5.pdf for BCA BBA.
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Encapsulation theory and applications.pdf
Big Data Technologies - Introduction.pptx
NewMind AI Weekly Chronicles - August'25-Week II
Machine Learning_overview_presentation.pptx
Advanced methodologies resolving dimensionality complications for autism neur...
Reach Out and Touch Someone: Haptics and Empathic Computing
Per capita expenditure prediction using model stacking based on satellite ima...
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Unlocking AI with Model Context Protocol (MCP)
A Presentation on Artificial Intelligence
MIND Revenue Release Quarter 2 2025 Press Release
sap open course for s4hana steps from ECC to s4
A comparative analysis of optical character recognition models for extracting...
Spectral efficient network and resource selection model in 5G networks
Diabetes mellitus diagnosis method based random forest with bat algorithm
Assigned Numbers - 2025 - Bluetooth® Document
MYSQL Presentation for SQL database connectivity
Chapter 3 Spatial Domain Image Processing.pdf
Network Security Unit 5.pdf for BCA BBA.
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf

Transfer learning scenarios on deep learning for ultrasound based image segmentation

  • 1. IAES International Journal of Artificial Intelligence (IJ-AI) Vol. 13, No. 3, September 2024, pp. 3273~3282 ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i3.pp3273-3282  3273 Journal homepage: http://ijai.iaescore.com Transfer learning scenarios on deep learning for ultrasound- based image segmentation Didik Bani Unggul, Nur Iriawan, Heri Kuswanto Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia Article Info ABSTRACT Article history: Received Jan 3, 2024 Revised Feb 21, 2024 Accepted Mar 6, 2024 Deep learning coupled with transfer learning, which involves reusing a pretrained model's network structure and parameter values, offers a rapid and accurate solution for image segmentation. Differing approaches exist in updating transferred parameters during training. In some studies, parameters remain frozen or untrainable (referred to as TL-S1), while in others, they act as trainable initial values updated from the first iteration (TL-S2). We introduce a new state-of-the-art transfer learning scenario (TL-S3), where parameters initially remain unchanged and update only after a specified cutoff time. Our research focuses on comparing the performance of these scenarios, a dimension yet unexplored in the literature. We simulate on three architectures (Dense-UNet-121, Dense-UNet-169, and Dense-UNet-201) using an ultrasound-based dataset with the left ventricular wall as the region of interest. The results reveal that the TL-S3 consistently outperforms the previous state-of-the-art scenarios, i.e., TL-S1 and TL-S2, achieving correct classification ratios (CCR) above 0.99 during training with noticeable performance spikes post-cutoff. Notably, two out of three top-performing models in the validation data also originate from TL-S3. Finally, the best model is the Dense-UNet-121 with TL-S3 and a 20% cutoff. It achieves the highest CCR for training 0.9950, validation 0.9699, and testing data 0.9695, confirming its excellence. Keywords: Deep learning Image segmentation Left ventricular wall Transfer learning Ultrasound This is an open access article under the CC BY-SA license. Corresponding Author: Nur Iriawan Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember Kampus ITS Sukolilo, Surabaya, Indonesia Email: nur_i@statistika.its.ac.id 1. INTRODUCTION Image segmentation is a crucial task in image and video processing. This process involves dividing the image into multiple segments or objects by assigning class labels to each pixel [1]. Its applications are widespread and encompass medical imaging [2]–[4], remote sensing [5]–[7], and the development of autonomous vehicles [8]–[10]. Amid various segmentation methods, deep learning emerges as a promising approach [11]–[13]. They decompose complex mappings into a sequence of simpler ones, each described by different layers [14]. The input is presented in a visible layer, and subsequent hidden layers extract abstract features from it. The refinement of these layers is driven by the results of the training process, rather than manual intervention [15]. With a large number of layers, they can accurately represent input features and effectively perform complex tasks like image segmentation, natural language processing, or stock price prediction [16]. Due to this benefit, deep learning is better than traditional machine learning methods, which still rely on domain expertise for feature extraction. 
Deep learning implementation, however, requires a large amount of training data and may require a while to complete [17]. This presents difficulties, particularly in the medical domain where labeled datasets
  • 2.  ISSN: 2252-8938 Int J Artif Intell, Vol. 13, No. 3, September 2024: 3273-3282 3274 are scarce [18]. To overcome this issue, transfer learning can be coupled with deep learning approach [17]. Reusing pre-trained network components, such as the structure and parameter values, is part of this process. To be more precise, the network is typically divided into two parts: the part receiving transfer learning and the part not receiving it. The first, leveraging transfer learning, will be structurally identical with parameter values transferred from a pre-trained model. The source model is typically trained on a larger dataset, which may be related or entirely different. The next section is a non-transferred layer, meaning its parameter values are initialized and updated during training. Furthermore, variations exist in how parameter values are handled in layers affected by transfer learning. These values can be "frozen" (non-trainable) and maintained in that state, or they can be "unfrozen" (trainable) and updated as the training progresses. Some studies treat them as non-trainable parameters [19]–[22]. On the other hand, some researchers utilize transfer learning values for initialization and updating them immediately in the first training iteration [23], [24]. Unfortunately, to the best of our knowledge, no research has evaluated the effectiveness of these two scenarios simultaneously. The majority of the articles only contrasted one scenario of transfer learning with a model that did not employ transfer learning [19], [21]. Furthermore, a lot of applications only construct a transfer learning model without contrasting it with any other models [18]. This leads to a gap in knowledge that requires research. Therefore, this study aims to compare those two parameter update scenarios, as well as introduce a new state-of-the-art transfer learning scenario. This scenario involves updating the newly transferred parameter values only after a specific time point is reached. Dense-UNet, a deep learning architecture that hybridizes Unet [4] and DenseNet [25], was employed in this investigation. This architecture was implemented to limit the number of model parameters, maximize information flow between network layers, and address vanishing gradient concerns due to its feature reuse and dense connections at each stage [26]. The encoder and the decoder are the two primary components of this architecture in general. The encoder, also known as the contraction path, is responsible for applying transfer learning from a pre-trained model and extracting features. The second component, known as the expanding path or decoder, is amid reconfiguring features and boosting spatial resolution through the use of upsampling operators [4], [27]. These two paths are connected via skip connections, in which the feature maps from the encoder are bypassed and concatenated with the decoder results at specific positions [28]. The simulation will be conducted on an ultrasound-based cardiac assessment dataset. Ultrasound, known for its accessibility, affordability, and absence of radiation exposure, addresses key healthcare concerns [29]. However, due to increased noise and decreased contrast, observing certain cardiac features can be challenging, as they are typically difficult to determine and interpret [30]. Therefore, automatic segmentation is urgently required for assistance in identifying the region of interest in ultrasound-based images. 
Nevertheless, in contrast to other non-invasive imaging modalities such as magnetic resonance imaging (MRI) and computed tomography (CT), research on automatic segmentation in ultrasound, particularly using deep learning, has been very limited in recent years [31]. To overcome this problem, we employ a publicly available dataset from Hamad Medical Corporation, Qatar University, and Tampere University known as the HMC-QU dataset, accessible at https://www.kaggle.com/datasets/aysendegerli/hmcqu-dataset. This dataset encompasses ultrasound-based assessments featuring diverse patients and viewpoint types. Furthermore, the ground truth is supplied, with the left ventricular wall (LVW) serving as the region of interest (ROI). This is essential to us because the analysis of LVW movement and structure serves as an early indicator of various heart problems, including myocardial infarction and hypertrophic cardiomyopathy [30], [32]. This dataset has been used in several earlier investigations, either for segmentation or for the identification of structural and movement anomalies [33]–[37]. While deep learning remains the dominant option, none of these studies has explored the use of transfer learning to the extent that we propose. Therefore, our research provides practical benefits for the development of ultrasound-based cardiac image processing in addition to theoretical benefits for transfer learning scenarios in deep learning.

2. METHOD
2.1. Dense-UNet architecture
Dense-UNet is a modified U-Net architecture that incorporates dense blocks and transition layers into its structure, drawing inspiration from the DenseNet architecture introduced by [25]. The main distinction between standard blocks and dense blocks lies in their layer-to-layer connections. Each layer in a dense block obtains the feature maps from every layer before it via concatenation [25]. This feature reuse minimizes the addition of excessive features in each layer, consequently reducing the required parameters. However, it requires that the dimensions of the feature maps remain unchanged because of the concatenation-based merging. This limitation impedes the use of a pooling procedure, which is generally resolved by
adding a transition layer. In the original configuration, this transition layer consists of 2×2 average pooling preceded by a 1×1 convolution.
Figure 1 illustrates the structure of the nine-stage Dense-UNet. A 7×7 convolution is employed in the first stage to reduce the input from 224×224 to 112×112. The process continues with the first transition layer, leading to the first dense block in the second stage. Within a dense block, each layer consists of batch normalization (BN), rectified linear unit (ReLU) activation, a 3×3 convolution, another BN, ReLU activation, and a 1×1 convolution. This sequence is repeated several times depending on the architectural configuration. Subsequently, the second transition layer leads to the third stage (second dense block). The process continues until the fourth dense block in the fifth stage, at which point the feature maps are 7×7. The next step is a 2×2 upsampling whose result is concatenated with the final feature maps from the fourth stage. This concatenation serves as the input for the fifth dense block, which has the same layer configuration as its mirrored counterpart (the third dense block). The pattern continues until the ninth stage, which concludes with a sigmoid activation layer and produces a 224×224 output. Determining how many layers are present in each dense block is another crucial factor. In this study, the number of layers from stage one to stage five follows the DenseNet-121, DenseNet-169, and DenseNet-201 structures of the original DenseNet versions [25]. The sixth to ninth stages replicate this structure by mirroring the number of layers. Under these conditions, the three Dense-UNet architectures in this study are named Dense-UNet-121, Dense-UNet-169, and Dense-UNet-201.

Figure 1. Dense-UNet structure with nine processing stages

2.2. Transfer learning
Transfer learning is a learning paradigm that attempts to enhance model performance by applying knowledge acquired from a learning task in one domain (the source domain) to a different domain (the target domain). Addressing data inadequacy in the target domain is one of the many benefits of this method [17], [38], [39]. It mitigates this issue by relaxing the assumption that training and testing data must originate from the same domain. In deep learning, transfer learning refers to pre-training a network on a source domain, frequently a larger dataset such as ImageNet [40]. This process yields a model whose optimized parameter values represent previously acquired knowledge. These parameter values are subsequently transferred to another network created specifically for the target dataset. Notably, the two networks are often dissimilar. As a result, the new model contains layers that receive the transfer learning results alongside layers that do not. In the Dense-UNet architecture discussed earlier, the portion designated for transfer learning is the initial half, known as the encoder, encompassing the first to fifth stages. If the architecture has M layers and the encoder consists of K layers (K < M), the first K layers receive the parameter values from the pre-trained model, while the remaining layers are initialized using either fixed or random numbers [41].
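As an illustration of the building blocks in section 2.1 and of the weight transfer just described, the following sketch is written with TensorFlow/Keras, the framework used in this study. It is an illustrative sketch rather than the authors' code: the growth rate, compression factor, and function names are our assumptions, the composite-layer ordering simply follows the description above, and ImageNet weights are used only as the typical source dataset mentioned in this section.

import tensorflow as tf
from tensorflow.keras import layers

def dense_layer(x, growth_rate=32):
    # One composite layer of a dense block, following the ordering described
    # in section 2.1: BN, ReLU, 3x3 convolution, BN, ReLU, 1x1 convolution.
    y = layers.BatchNormalization()(x)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(growth_rate, 3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(growth_rate, 1, padding="same", use_bias=False)(y)
    # Dense connectivity: concatenate with the input so that every later
    # layer can reuse the feature maps of every earlier layer.
    return layers.Concatenate()([x, y])

def dense_block(x, num_layers, growth_rate=32):
    for _ in range(num_layers):
        x = dense_layer(x, growth_rate)
    return x

def transition_layer(x, compression=0.5):
    # 1x1 convolution followed by 2x2 average pooling between dense blocks,
    # which halves the spatial resolution.
    filters = int(x.shape[-1] * compression)
    x = layers.Conv2D(filters, 1, use_bias=False)(x)
    return layers.AveragePooling2D(pool_size=2)(x)

# A pre-trained DenseNet-121 (here with ImageNet weights) can supply the
# parameter values for the first K encoder layers of Dense-UNet-121.
pretrained_encoder = tf.keras.applications.DenseNet121(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
K = len(pretrained_encoder.layers)   # number of layers receiving the transfer

Stacking such blocks with the layer counts of DenseNet-121, -169, or -201 in the encoder, and mirroring them with 2×2 upsampling and skip concatenations in stages six to nine, reproduces the nine-stage layout of Figure 1.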
After that, there are different scenarios for how the transferred parameters are handled:
‒ Scenario 1: freeze scenario (TL-S1). The transferred parameters are regarded as untrainable; their values remain unchanged throughout training, essentially freezing them. Layers not receiving transfer learning are initialized and updated from the first iteration until the completion of training.
‒ Scenario 2: unfreeze scenario (TL-S2). In this scenario, the transferred parameter values act only as initializations and are updated from the start of training. Thus, all parameters are deemed trainable,
regardless of whether they are in layers with or without transfer learning. The difference lies in the initialization procedure: non-transferred layers begin with Glorot uniform initialization, whereas the transferred layers start with values from the pre-trained model.
‒ Scenario 3: freeze-unfreeze scenario (TL-S3). Parameters in the layers affected by transfer learning remain unchanged for an initial portion of the training process. In other words, only the parameters in layers not affected by transfer learning are updated, while those influenced by transfer learning stay frozen. After a pre-defined epoch threshold is reached, the transferred layers are unfrozen and training continues across all layers. The cutoff is explored at several positions, namely 20%, 40%, 60%, and 80% of the total training epochs, to clarify how the timing of the transition affects the final outcome.
In this study, we simulate the three scenarios depicted in Figure 2; a minimal implementation sketch of these scenarios is given after (2).

Figure 2. Three parameter updating scenarios in transfer learning

2.3. Optimization technique
In architectures like Dense-UNet, "trainable parameters" encompass the weights and biases in convolution layers, as well as the scale and shift parameters in BN layers. During training, these parameters are optimized, starting from initial values generated by Glorot uniform initialization [42]. This technique draws from a uniform distribution on the interval [−a, a], where a is calculated using (1), and n_in and n_out represent the number of input and output units of the layer, respectively.

a = √6 / √(n_in + n_out) (1)

Next, the adaptive moment estimation (Adam) technique proposed in [43] is utilized to update the initial values iteratively. This method updates the parameter values using bias-corrected estimates of the first and second moments of the gradients. Algorithm 1 illustrates the procedure. The first component that must be calculated is the gradient of the loss function with respect to the model parameters, denoted by g_t, where t is the iteration index. The binary cross-entropy loss function in (2) was selected to suit the binary classification task.

L_i = −[c_i log(p(c_i)) + (1 − c_i) log(1 − p(c_i))] (2)
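Before detailing the Adam update in Algorithm 1, the following sketch illustrates how the three scenarios can be expressed in Keras by toggling the trainable flag of the transferred layers. This is an illustrative sketch, not the study's actual code: the function names are ours, the argument model is assumed to be a Dense-UNet whose first K layers carry the transferred encoder weights, and TL-S3 is realized with the common two-phase fine-tuning recipe (freeze, train, unfreeze, re-compile, continue).

import tensorflow as tf

def set_transferred_layers_trainable(model, K, trainable):
    # The first K layers hold the parameter values copied from the
    # pre-trained encoder; the remaining (decoder) layers stay trainable.
    for layer in model.layers[:K]:
        layer.trainable = trainable

def compile_model(model):
    # Adam with the hyperparameter values stated in this section and the
    # binary cross-entropy loss in (2); per-pixel accuracy is tracked as a
    # proxy for the CCR in (9).
    adam = tf.keras.optimizers.Adam(learning_rate=1e-6, beta_1=0.9,
                                    beta_2=0.999, epsilon=1e-8)
    model.compile(optimizer=adam, loss="binary_crossentropy",
                  metrics=["accuracy"])

def train_tl_s3(model, K, cutoff_epoch, total_epochs, train_data, val_data):
    x_train, y_train = train_data
    # Phase 1 (frozen): only the non-transferred layers are updated.
    set_transferred_layers_trainable(model, K, False)
    compile_model(model)
    model.fit(x_train, y_train, batch_size=10, epochs=cutoff_epoch,
              validation_data=val_data)
    # Phase 2 (unfrozen): all parameters become trainable after the cutoff.
    set_transferred_layers_trainable(model, K, True)
    compile_model(model)   # re-compile so the trainable change takes effect
    model.fit(x_train, y_train, batch_size=10, initial_epoch=cutoff_epoch,
              epochs=total_epochs, validation_data=val_data)

# TL-S1 corresponds to running only the frozen phase for all epochs, TL-S2 to
# running only the unfrozen phase from epoch 0, and TL-S3 20%-F to
# train_tl_s3(model, K, cutoff_epoch=20, total_epochs=100, ...).

Note that re-compiling is required for Keras to register the changed trainable flags, and that this sketch resets the optimizer state at the cutoff, which is one possible design choice rather than a detail prescribed by the paper.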
Algorithm 1. Adam optimization
Require: β1, β2, η, ε: hyperparameters
Require: f_t(θ_{t−1}): stochastic objective function as in (2) with trainable parameters θ_{t−1} at time-step t − 1
Require: θ_0: initial parameter vector generated from Glorot uniform initialization
m_0 ← 0 (initialize 1st moment vector)
v_0 ← 0 (initialize 2nd moment vector)
t ← 0 (initialize time-step)
While θ_t not converged do
  t ← t + 1
  Get the gradients w.r.t. the stochastic objective at time-step t using (3):
    g_t = ∇_θ f_t(θ_{t−1}) (3)
  Update the biased 1st moment estimate using (4):
    m_t = β1 m_{t−1} + (1 − β1) g_t (4)
  Update the biased 2nd moment estimate using (5):
    v_t = β2 v_{t−1} + (1 − β2) g_t² (5)
  Compute the bias-corrected 1st moment estimate using (6):
    m̂_t = m_t / (1 − β1^t) (6)
  Compute the bias-corrected 2nd moment estimate using (7):
    v̂_t = v_t / (1 − β2^t) (7)
  Update the parameters using (8):
    θ_t = θ_{t−1} − η m̂_t / (√v̂_t + ε) (8)
End while
Return θ_t (resulting parameters)

The loss for the i-th pixel, denoted as L_i, is defined for i = 1, ..., N, with N representing the total number of pixels in the output image. The actual class of the i-th pixel is denoted by c_i ∈ {0, 1}, in which c_i = 0 is the background and c_i = 1 is the ROI. Lastly, p(c_i) is the predicted probability of belonging to class c_i calculated by the model. After finding g_t, we can calculate the exponentially weighted moving averages of the gradient (m_t) and the squared gradient (v_t). This step requires configuring the hyperparameters β1, β2 ∈ [0, 1), the exponential decay rates for the moment estimates. We then use the bias-corrected versions of m_t and v_t, along with η and ε, to update the parameter values from θ_{t−1} to θ_t. The hyperparameter values are set at β1 = 0.9, β2 = 0.999, η = 10⁻⁶, and ε = 10⁻⁸.

2.4. Evaluation metric
This study uses the correct classification ratio (CCR) to evaluate model performance, as defined in (9). GT_j represents the ground truth area for class j, while Seg_j is the model's corresponding segmentation area. Class j = 0 is designated for the background (non-ROI) and j = 1 for the LVW area (ROI). |GT_j ∩ Seg_j| denotes the number of pixels of class j that are accurately classified by the model, and |GT| is the number of pixels in the union of the GT_0 and GT_1 areas. The CCR varies between 0 and 1: a value of one means that all pixels are correctly classified and the segmentation exactly matches the ground truth, while a decreasing CCR indicates deteriorating segmentation results.

CCR = (Σ_{j=0}^{1} |GT_j ∩ Seg_j|) / |GT| (9)

2.5. Dataset and experimental setup
An online echocardiography dataset made available by HMC-QU was employed in this study. Specifically, we concentrated on a subset of 109 echocardiogram video recordings for which 224×224-pixel ground truth is available. These videos present the apical 4-chamber view at a resolution of 636×422 pixels, with a frame rate of 25 frames per second and durations of 1 to 3 seconds. The videos were randomly divided into training, validation, and testing sets (80%:10%:10%), resulting in 87 training videos, 11 validation videos, and 11 testing videos. The images were then extracted and fed into preprocessing: they were center-cropped to 422×422 pixels and resized to 224×224 pixels, and the red, green, and blue color intensities were then extracted into three matrices.
The matrix elements, initially ranging from 0 to 255, were normalized to a 0-to-1 range, serving as input for the deep learning architecture.
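As an illustration of this preprocessing and of the CCR in (9), the sketch below shows one possible implementation in Python with NumPy and OpenCV. The function names, the use of OpenCV for resizing, and the assumption that masks are binary arrays are ours, not taken from the paper's code.

import cv2
import numpy as np

def preprocess_frame(frame):
    # Center-crop a 636x422 frame to 422x422, resize it to 224x224, and
    # scale the red, green, and blue intensities from [0, 255] to [0, 1].
    h, w = frame.shape[:2]              # expected (422, 636)
    left = (w - h) // 2
    cropped = frame[:, left:left + h]   # 422x422 center crop
    resized = cv2.resize(cropped, (224, 224), interpolation=cv2.INTER_AREA)
    return resized.astype(np.float32) / 255.0

def correct_classification_ratio(gt, seg):
    # CCR as in (9): the number of pixels whose predicted class matches the
    # ground truth, summed over background (j = 0) and ROI (j = 1), divided
    # by the total number of pixels |GT|.
    gt = np.asarray(gt).astype(int)
    seg = np.asarray(seg).astype(int)
    correct = sum(np.sum((gt == j) & (seg == j)) for j in (0, 1))
    return correct / gt.size

For a perfect segmentation the two masks coincide and the function returns 1; every misclassified pixel lowers the ratio toward 0.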
The training used a batch size of 10, with 10 images selected at random for each iteration. Each epoch concluded after all images had been processed, and this procedure was repeated for 100 epochs. The experiment was conducted on Google Colab using an NVIDIA V100 GPU, with Python 3 and the Keras framework chosen for their effectiveness and ease of execution.

3. RESULTS AND DISCUSSION
Figure 3 displays examples of the ultrasound images processed in this study. Randomly selected original images are presented in the first row, and the ground truth for those images is displayed in the second row. The combined result, which shows the location of the LVW area determined by the ground truth, can be examined in the third row. The models use the ground truth as a reference during training to learn and recognize LVW characteristics effectively.

Figure 3. Some examples of ultrasound-based images and their ground truth (mask)

Table 1 summarizes the training durations (in seconds), loss values, and CCR for the three Dense-UNet architectures under the various transfer learning scenarios. Notably, across all architectures, the proposed third scenario (TL-S3) consistently outperforms the models without transfer learning (No TL), TL-S1, and TL-S2. The models under TL-S3 achieve a remarkable CCR exceeding 0.99, a level not attained by models from the other scenarios. Furthermore, TL-S3 models show far lower losses than the others, with reductions ranging from 82% to 97%.

Table 1. The performance results of Dense-UNet models
Architecture    | Scenario     | Train loss | Train CCR | Val. loss | Val. CCR | Duration (s)
Dense-UNet-121  | No TL        | 0.2338     | 0.9772    | 0.2461    | 0.9685   | 2,983
Dense-UNet-121  | TL-S1        | 0.0918     | 0.9815    | 0.1166    | 0.9685   | 1,955
Dense-UNet-121  | TL-S2        | 0.1509     | 0.9849    | 0.1793    | 0.9679   | 2,966
Dense-UNet-121  | TL-S3 20%-F  | 0.0095     | 0.9950    | 0.2063    | 0.9699   | 2,857
Dense-UNet-121  | TL-S3 40%-F  | 0.0106     | 0.9948    | 0.1765    | 0.9694   | 2,666
Dense-UNet-121  | TL-S3 60%-F  | 0.0169     | 0.9925    | 0.1369    | 0.9677   | 2,455
Dense-UNet-121  | TL-S3 80%-F  | 0.0169     | 0.9925    | 0.1282    | 0.9692   | 2,250
Dense-UNet-169  | No TL        | 0.2491     | 0.9777    | 0.2610    | 0.9679   | 3,849
Dense-UNet-169  | TL-S1        | 0.1639     | 0.9780    | 0.1763    | 0.9694   | 2,383
Dense-UNet-169  | TL-S2        | 0.1555     | 0.9854    | 0.1886    | 0.9674   | 3,838
Dense-UNet-169  | TL-S3 20%-F  | 0.0095     | 0.9950    | 0.2074    | 0.9678   | 2,752
Dense-UNet-169  | TL-S3 40%-F  | 0.0101     | 0.9949    | 0.2038    | 0.9678   | 2,552
Dense-UNet-169  | TL-S3 60%-F  | 0.0110     | 0.9946    | 0.1823    | 0.9689   | 2,373
Dense-UNet-169  | TL-S3 80%-F  | 0.0198     | 0.9913    | 0.1321    | 0.9667   | 2,170
Dense-UNet-201  | No TL        | 0.1391     | 0.9833    | 0.1648    | 0.9694   | 4,874
Dense-UNet-201  | TL-S1        | 0.2200     | 0.9746    | 0.2282    | 0.9683   | 3,112
Dense-UNet-201  | TL-S2        | 0.2872     | 0.9791    | 0.3048    | 0.9669   | 4,893
Dense-UNet-201  | TL-S3 20%-F  | 0.0100     | 0.9949    | 0.1905    | 0.9684   | 4,668
Dense-UNet-201  | TL-S3 40%-F  | 0.0108     | 0.9947    | 0.1691    | 0.9680   | 4,327
Dense-UNet-201  | TL-S3 60%-F  | 0.0115     | 0.9945    | 0.1804    | 0.9687   | 3,989
Dense-UNet-201  | TL-S3 80%-F  | 0.0204     | 0.9910    | 0.1085    | 0.9679   | 3,707
No TL: without transfer learning; TL-S1: transfer learning scenario 1 (freeze scenario); TL-S2: transfer learning scenario 2 (unfreeze scenario); TL-S3 20%, 40%, 60%, 80%-F: transfer learning scenario 3 (freeze-unfreeze scenario with non-freezing start cutoffs at 20%, 40%, 60%, and 80% of the total epochs, respectively).
Investigation reveals that the TL-S3 models perform noticeably better once the transferred parameters are unfrozen after the cutoff. TL-S3 20%-F models, for example, exhibit performance spikes after 20 epochs, whereas TL-S3 40%-F models show such surges after 40 epochs. The TL-S3 60%-F and TL-S3 80%-F models exhibit the same pattern. The learning curves provide a visual representation of this behavior, with Figures 4(a) to 4(c) showing the CCR and Figures 5(a) to 5(c) showing the loss. This validates the hypothesis that temporarily freezing the transferred layers enables the model to adapt to the characteristics of the current case without disrupting the robust feature extraction of the pre-trained layers. After the non-transferred layers stabilize, unfreezing the transferred layers boosts performance by iteratively updating all parameters, and this performance jump occurs shortly after the cutoff.

Figure 4. Learning curve for CCR values: (a) Dense-UNet-121, (b) Dense-UNet-169, and (c) Dense-UNet-201

Figure 5. Learning curve for loss values: (a) Dense-UNet-121, (b) Dense-UNet-169, and (c) Dense-UNet-201

The average CCR increase for TL-S3 models over the twenty epochs after the cutoff was 0.0216, compared with 0.0048 for the other scenarios. This roughly five-fold difference highlights how preferable the TL-S3 scenario is. We find that the TL-S3 20%-F scenario corresponds to the best-performing model during training for each Dense-UNet architecture. Dense-UNet-121, 169, and 201 with this scenario reached CCR values of 0.9950, 0.9950, and 0.9949, respectively, placing them among the top three in terms of both CCR and loss. With a CCR of 0.9699, the Dense-UNet-121 model with TL-S3 20%-F also leads in validation. Dense-UNet-121 with TL-S3 40%-F and Dense-UNet-169 with TL-S1 are the second- and third-best models, with CCR values of 0.9694. Models from the TL-S3 scenario thus hold two of the top three positions. A separate testing dataset was then employed to further evaluate these three models. Once more, the model with the highest CCR, 0.9695, was Dense-UNet-121 with TL-S3 20%-F. It performed better than Dense-UNet-121 with TL-S3 40%-F and Dense-UNet-169 with TL-S1, which achieved CCR values of 0.9685 and 0.9681, respectively. These results demonstrate the strong segmentation capability of Dense-UNet-121 and confirm its superior performance with TL-S3 20%-F: it consistently achieves the highest CCR (0.9950, 0.9699, and 0.9695) across the training, validation, and testing datasets.
When comparing models with and without transfer learning, models with transfer learning generally show faster training times. Dense-UNet-201 TL-S2 is an exception, taking 19 seconds longer than
Dense-UNet-201 without transfer learning. In all other cases, transfer learning consistently shortens the training process. Second, we anticipated that TL-S1 would show the fastest training time, since TL-S1 has fewer trainable parameters than TL-S2 and TL-S3. Nevertheless, our results indicate that this hypothesis holds only when comparing TL-S1 and TL-S2. Interestingly, some of the models in the TL-S3 scenario required less training time than those in TL-S1. This result adds a novel perspective to our investigation, indicating that the staged parameter update approach employed by TL-S3 may also enhance training efficiency. We additionally find that, among the TL-S3 models, the training time varies with the choice of cutoff position: the earlier the transition from non-trainable (frozen) to trainable (unfrozen) status occurs, the longer the training duration. This is attributed to the increasing proportion of epochs trained with the full set of trainable parameters. In terms of processing time, our best model, Dense-UNet-121 with TL-S3 20%-F, also performed well: with a duration of 2,857 seconds, it is faster than 52% of the other models.
Lastly, Figure 6 visualizes the segmentation of the testing data with our best model. The original images are displayed in the top row, and a comparison of the ROI contour generated by the model (red line) and the ground truth (blue line) is presented in the bottom row. This figure illustrates how the model can segment data from a new dataset that was not used during training.

Figure 6. Segmentation results produced by the best model

4. CONCLUSION
This study provides several important conclusions. Firstly, during training, the TL-S3 scenario consistently outperforms the other scenarios, achieving CCRs above 0.99 and losses below 0.0205. This superiority is explained by the TL-S3 learning curves, which exhibit a performance increase after the freezing cutoff is passed. The average CCR increase in the 20 epochs after the cutoff is 0.0216, roughly five times higher than in the other scenarios. Furthermore, the excellence of TL-S3 extends to the validation process, where it secures top positions in terms of the highest CCR. In summary, the Dense-UNet-121 model with TL-S3 20%-F is deemed the best, achieving a training duration of 2,857 seconds and attaining the highest CCR values for the training, validation, and testing data (0.9950, 0.9699, and 0.9695, respectively). This study opens opportunities for further research on the TL-S3 scenario by raising two crucial issues: first, determining the optimal transition point from 'untrainable' to 'trainable' status, and second, exploring how distinct training parameter adjustments can be made for each layer affected by transfer learning. These investigations are expected to enhance the robustness and performance of deep learning models with transfer learning.

ACKNOWLEDGEMENTS
The research presented in this paper was supported by the Department of Statistics, Institut Teknologi Sepuluh Nopember, and the Indonesia Endowment Fund for Education Agency under scholarship no. KET-438/LPDP.4/2022.

REFERENCES
[1] R. Szeliski, Computer vision: algorithms and applications. Cham: Springer, 2022.
[2] R. Ranjbarzadeh, A. Caputo, E. B. Tirkolaee, S. J. Ghoushchi, and M.
Bendechache, “Brain tumor segmentation of MRI images: a comprehensive review on the application of artificial intelligence tools,” Computers in Biology and Medicine, vol. 152, 2023, doi: 10.1016/j.compbiomed.2022.106405.
[3] N. Salpea, P. Tzouveli, and D. Kollias, “Medical image segmentation: a review of modern architectures,” in Computer Vision – ECCV 2022 Workshops, 2023, pp. 691–708, doi: 10.1007/978-3-031-25082-8_47.
[4] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Cham: Springer, 2015, pp. 234–241, doi: 10.1007/978-3-319-24574-4_28.
[5] S. M. Azimi, C. Henry, L. Sommer, A. Schumann, and E. Vig, “SkyScapes - fine-grained semantic understanding of aerial scenes,” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 7392–7402, doi: 10.1109/ICCV.2019.00749.
[6] P. Bhadoria, S. Agrawal, and R. Pandey, “Image segmentation techniques for remote sensing satellite images,” IOP Conference Series: Materials Science and Engineering, vol. 993, no. 1, pp. 1–17, 2020, doi: 10.1088/1757-899X/993/1/012050.
[7] B. E. -Zahouani et al., “Remote sensing imagery segmentation in object-based analysis: a review of methods, optimization, and quality evaluation over the past 20 years,” Remote Sensing Applications: Society and Environment, vol. 32, 2023, doi: 10.1016/j.rsase.2023.101031.
[8] D. Feng et al., “Deep multi-modal object detection and semantic segmentation for autonomous driving: datasets, methods, and challenges,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 3, pp. 1341–1360, 2021, doi: 10.1109/TITS.2020.2972974.
[9] D. -V. Giurgi, T. J. -Laurain, M. Devanne, and J. -P. Lauffenburger, “Real-time road detection implementation of UNet architecture for autonomous driving,” in 2022 IEEE 14th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP), 2022, pp. 1–5, doi: 10.1109/IVMSP54334.2022.9816237.
[10] L. Lizhou and Z. Yong, “A closer look at U-net for road detection,” in Tenth International Conference on Digital Image Processing (ICDIP 2018), 2018, doi: 10.1117/12.2503282.
[11] M. Aljabri and M. AlGhamdi, “A review on the use of deep learning for medical images segmentation,” Neurocomputing, vol. 506, pp. 311–335, 2022, doi: 10.1016/j.neucom.2022.07.070.
[12] B. Sistaninejhad, H. Rasi, and P. Nayeri, “A review paper about deep learning for medical image analysis,” Computational and Mathematical Methods in Medicine, vol. 2023, pp. 1–10, 2023, doi: 10.1155/2023/7091301.
[13] S. M. Khaniabadi, H. Ibrahim, I. A. Huqqani, F. M. Khaniabadi, H. A. M. Sakim, and S. S. Teoh, “Comparative review on traditional and deep learning methods for medical image segmentation,” in 2023 IEEE 14th Control and System Graduate Research Colloquium (ICSGRC), 2023, pp. 45–50, doi: 10.1109/ICSGRC57744.2023.10215402.
[14] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning. Cambridge, Massachusetts: MIT Press, 2016.
[15] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015, doi: 10.1038/nature14539.
[16] E. Alaros, M. Marjani, D. A. Shafiq, and D. Asirvatham, “Predicting consumption intention of consumer relationship management users using deep learning techniques: a review,” Indonesian Journal of Science and Technology, vol. 8, no. 2, pp. 307–328, 2023, doi: 10.17509/ijost.v8i2.55814.
[17] C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C.
Liu, “A survey on deep transfer learning,” in Artificial Neural Networks and Machine Learning – ICANN 2018, Cham: Springer, 2018, pp. 270–279, doi: 10.1007/978-3-030-01424-7_27. [18] P. Kora et al., “Transfer learning techniques for medical image analysis: a review,” Biocybernetics and Biomedical Engineering, vol. 42, no. 1, pp. 79–107, 2022, doi: 10.1016/j.bbe.2021.11.004. [19] A. A. Pravitasari et al., “UNet-VGG16 with transfer learning for MRI-based brain tumor segmentation,” Telkomnika (Telecommunication Computing Electronics and Control), vol. 18, no. 3, pp. 1310–1318, 2020, doi: 10.12928/TELKOMNIKA.v18i3.14753. [20] M. Oquab, L. Bottou, I. Laptev, and J. Sivic, “Learning and transferring mid-level image representations using convolutional neural networks,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1717–1724, doi: 10.1109/CVPR.2014.222. [21] D. A. Rasyid, G. H. Huang, and N. Iriawan, “Segmentation of low-grade gliomas using U-Net VGG16 with transfer learning,” in 2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence), 2021, pp. 393–398, doi: 10.1109/Confluence51648.2021.9377093. [22] O. T. Bişkin, İ. Kırbaş, and A. Çelik, “A fast and time-efficient glitch classification method: a deep learning-based visual feature extractor for machine learning algorithms,” Astronomy and Computing, vol. 42, 2023, doi: 10.1016/j.ascom.2022.100683. [23] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, “How transferable are features in deep neural networks?,” Advances in Neural Information Processing Systems, vol. 4, pp. 3320–3328, 2014. [24] Z. Yang, J. Yue, Z. Li, and L. Zhu, “Vegetable image retrieval with fine-tuning VGG model and image hash,” IFAC- PapersOnLine, vol. 51, no. 17, pp. 280–285, 2018, doi: 10.1016/j.ifacol.2018.08.175. [25] G. Huang, Z. Liu, L. V. D. Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2261–2269, doi: 10.1109/CVPR.2017.243. [26] Y. Cao, S. Liu, Y. Peng, and J. Li, “DenseUNet: densely connected UNet for electron microscopy image segmentation,” IET Image Processing, vol. 14, no. 12, pp. 2682–2689, 2020, doi: 10.1049/iet-ipr.2019.1527. [27] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, vol. 39, no. 4, pp. 3431–3440, doi: 10.1109/CVPR.2015.7298965. [28] S. Cai, Y. Tian, H. Lui, H. Zeng, Y. Wu, and G. Chen, “Dense-UNet: a novel multiphoton in vivo cellular image segmentation model based on a convolutional neural network,” Quantitative Imaging in Medicine and Surgery, vol. 10, no. 6, pp. 1275–1285, 2020, doi: 10.21037/QIMS-19-1090. [29] J. E. -Taraboulsi, C. P. Cabrera, C. Roney, and N. Aung, “Deep neural network architectures for cardiac image segmentation,” Artificial Intelligence in the Life Sciences, vol. 4, pp. 1–19, 2023, doi: 10.1016/j.ailsci.2023.100083. [30] A. Degerli et al., “Early detection of myocardial infarction in low-quality echocardiography,” IEEE Access, vol. 9, pp. 34442– 34453, 2021, doi: 10.1109/ACCESS.2021.3059595. [31] C. Chen et al., “Deep learning for cardiac image segmentation: a review,” Frontiers in Cardiovascular Medicine, vol. 7, pp. 1–33, 2020, doi: 10.3389/fcvm.2020.00025. [32] J. A. U. 
-Moral et al., “Contrast-enhanced echocardiographic measurement of left ventricular wall thickness in hypertrophic cardiomyopathy: comparison with standard echocardiography and cardiac magnetic resonance,” Journal of the American Society of Echocardiography, vol. 33, no. 9, pp. 1106–1115, 2020, doi: 10.1016/j.echo.2020.04.009. [33] O. Hamila et al., “Fully automated 2D and 3D convolutional neural networks pipeline for video segmentation and myocardial infarction detection in echocardiography,” Multimedia Tools and Applications, vol. 81, no. 26, pp. 37417–37439, 2022, doi: 10.1007/s11042-021-11579-4. [34] G. Sanjeevi, U. Gopalakrishnan, R. K. Pathinarupothi, and T. Madathil, “Automatic diagnostic tool for detection of regional wall motion abnormality from echocardiogram,” Journal of Medical Systems, vol. 47, no. 1, 2023, doi: 10.1007/s10916-023-01911-w.
[35] I. Adalioglu, M. Ahishali, A. Degerli, S. Kiranyaz, and M. Gabbouj, “SAF-Net: self-attention fusion network for myocardial infarction detection using multi-view echocardiography,” in Computing in Cardiology, 2023, pp. 1–4, doi: 10.22489/CinC.2023.240.
[36] Y. Li, W. Lu, P. Monkam, Z. Zhu, W. Wu, and M. Liu, “LVSnake: accurate and robust left ventricle contour localization for myocardial infarction detection,” Biomedical Signal Processing and Control, vol. 85, 2023, doi: 10.1016/j.bspc.2023.105076.
[37] A. Degerli, S. Kiranyaz, T. Hamid, R. Mazhar, and M. Gabbouj, “Early myocardial infarction detection over multi-view echocardiography,” Biomedical Signal Processing and Control, vol. 87, pp. 1–12, 2024, doi: 10.1016/j.bspc.2023.105448.
[38] A. Hosna, E. Merry, J. Gyalmo, Z. Alom, Z. Aung, and M. A. Azim, “Transfer learning: a friendly introduction,” Journal of Big Data, vol. 9, no. 1, pp. 1–19, 2022, doi: 10.1186/s40537-022-00652-w.
[39] A. H. Zim et al., “Smart manufacturing with transfer learning under limited data: towards data-driven intelligences,” Materials Today Communications, vol. 37, 2023, doi: 10.1016/j.mtcomm.2023.107357.
[40] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017, doi: 10.1145/3065386.
[41] H. Li, M. Krček, and G. Perin, “A comparison of weight initializers in deep learning-based side-channel analysis,” in Applied Cryptography and Network Security Workshops, Cham: Springer, 2020, pp. 126–143, doi: 10.1007/978-3-030-61638-0_8.
[42] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” Journal of Machine Learning Research, vol. 9, pp. 249–256, 2010.
[43] D. P. Kingma and J. L. Ba, “Adam: a method for stochastic optimization,” Arxiv-Computer Science, vol. 1, pp. 1–15, 2015.

BIOGRAPHIES OF AUTHORS

Didik Bani Unggul earned his Bachelor of Science in Statistics from Universitas Indonesia, graduating in 2020. He is currently pursuing a master's degree in statistics at Institut Teknologi Sepuluh Nopember in Surabaya, Indonesia. Actively involved in projects at the Laboratory of Computational Statistics and Data Science, his research interests include deep learning, biomedical image processing, and computational statistics. He can be contacted at email: 6003212005@student.its.ac.id or didikbaniunggul@gmail.com.

Nur Iriawan received a bachelor's degree in statistics from the Institut Teknologi Sepuluh Nopember (ITS) Surabaya, a master's degree in computer science from the University of Maryland, USA, and a Ph.D. in statistics from Curtin University of Technology, Australia. He is a professor at the Department of Statistics, Faculty of Science and Data Analytics, ITS, Surabaya, and also serves as the head of the Laboratory of Computational Statistics and Data Science. He has supervised and co-supervised over 20 master's and 10 Ph.D. students. He has authored or co-authored more than 60 Scopus-indexed articles, with an H-index of 12 and over 1,000 citations. His research interests encompass stochastic processes, statistical computations, and Bayesian models. He can be contacted at email: nur_i@statistika.its.ac.id.

Heri Kuswanto holds a Statistics B.Sc. (2003) and M.Sc. (2005) from Institut Teknologi Sepuluh Nopember, Indonesia, and a Dr.rer.pol. in statistics (econometrics) from Leibniz Hannover University, Germany (2009).
In 2010, he pursued postdoctoral research at Laval University, Canada, focusing on the calibration of ensemble weather forecasts. Currently a professor of statistics at ITS, he also serves as the Director of Graduate Program and Academic Development. His academic career includes an appointment as the Head of the Climate Change Research Group. His research spans weather forecasting, solar radiation management, computational statistics, time series forecasting, econometrics, machine learning, and advanced data analysis. He has also received awards such as the Harvard Residency Program on Solar Geoengineering and a DAAD scholarship for doctoral research in Germany. He can be contacted at email: heri_k@statistika.its.ac.id.