Intelligent Software Defect Prediction
Xiao-Yuan Jing · Haowen Chen · Baowen Xu
Xiao-Yuan Jing
School of Computer Science
Wuhan University
Wuhan, Hubei, China
Baowen Xu
Computer Science & Technology
Nanjing University
Nanjing, Jiangsu, China
Haowen Chen
School of Computer Science
Wuhan University
Wuhan, Hubei, China
ISBN 978-981-99-2841-5 ISBN 978-981-99-2842-2 (eBook)
https://doi.org/10.1007/978-981-99-2842-2
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore
Pte Ltd. 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Paper in this product is recyclable.
Preface
With the increasing complexity and interdependency of software, software products may suffer from low quality, high cost, poor maintainability, and the occurrence of defects. A software defect usually produces incorrect or unexpected results and causes the software to behave in unintended ways. Software defect prediction (SDP) is one of the most active research fields in software engineering and plays an important role in software quality assurance. Guided by SDP results, developers can subsequently locate and repair defects under a reasonable resource allocation, which helps to reduce maintenance cost.
The early task of SDP is performed within a single project. Developers can make
use of the well-labeled historical data of the currently maintained project to build the
model and predict the defect-proneness of the remaining instances. This process is
called within-project defect prediction (WPDP). However, the annotation of defect data (i.e., defective or defect-free) is time-consuming and costly, which makes it a hard task for practitioners during the development or maintenance cycle. To solve this problem, researchers have considered introducing other projects with sufficient historical data to conduct cross-project defect prediction (CPDP), which has received extensive attention in recent years. As a special case of CPDP, heterogeneous defect prediction (HDP) refers to the scenario in which training and test data have different metrics, which relaxes the restriction that source and target projects share the same metrics. Besides, there also exist other research questions of SDP to be further studied, such as cross-version defect prediction, just-in-time (JIT) defect prediction, and effort-aware JIT defect prediction.
In the past few decades, more and more researchers have paid attention to SDP, and many intelligent SDP techniques have been presented. To obtain high-quality representations of defect data, machine learning techniques such as dictionary learning, semi-supervised learning, multi-view learning, and deep learning have been applied to solve SDP problems. Besides, transfer learning techniques are also used to eliminate the divergence between different project data in the CPDP scenario. The combination with machine learning techniques is therefore conducive to improving prediction efficiency and accuracy, which promotes significant progress in intelligent SDP research.
We drafted this book to provide a comprehensive picture of the current state of SDP research rather than to improve or compare existing SDP approaches. More specifically, this book introduces a range of machine learning-based SDP approaches proposed for different scenarios (i.e., WPDP, CPDP, and HDP). Besides, this book also provides deep insight into the performance of current SDP approaches and the lessons learned for further SDP research.
This book is mainly intended for graduate students, researchers who work in or have interest in the areas of SDP, and developers who are responsible for software maintenance.
Wuhan, China Xiao-Yuan Jing
December, 2022 Haowen Chen
Acknowledgments
We thank Li Zhiqiang, Wu Fei, Wang Tiejian, Zhang Zhiwu, and Sun Ying
from Wuhan University for their contributions to this research. We would like to
express our heartfelt gratitude to Professor Baowen Xu and his team from Nanjing
University for their selfless technical assistance in the compilation of this book. We
are also thankful for the invaluable help and support provided by Professor Xiaoyuan Xie from Wuhan University, whose advice and guidance were crucial to the successful completion of this book. We sincerely appreciate the unwavering support provided by Nanjing University, Wuhan University, and Nanjing University of Posts and Telecommunications, as well as the editing suggestions provided by Kamesh and Wei Zhu from Springer Publishing House. We thank all of them from the bottom of our hearts for their support and guidance throughout the compilation of this book. Finally, we would like to express our heartfelt appreciation to two master students, Hanwei and Xiuting Huang, who participated in the editing process and made indelible contributions to the compilation of this book.
Contents
1 Introduction .................................................................. 1
1.1 Software Quality Assurance ............................................ 1
1.2 Software Defect Prediction ............................................. 2
1.3 Research Directions of SDP ............................................ 3
1.3.1 Within-Project Defect Prediction (WPDP) .................... 3
1.3.2 Cross-Project Defect Prediction (CPDP) ...................... 4
1.3.3 Heterogeneous Defect Prediction (HDP) ...................... 4
1.3.4 Other Research Questions of SDP ............................. 5
1.4 Notations and Corresponding Descriptions ............................ 7
1.5 Structure of This Book.................................................. 8
References ..................................................................... 9
2 Machine Learning Techniques for Intelligent SDP....................... 13
2.1 Transfer Learning ....................................................... 13
2.2 Deep Learning........................................................... 14
2.3 Other Techniques........................................................ 15
2.3.1 Dictionary Learning ............................................ 15
2.3.2 Semi-Supervised Learning ..................................... 15
2.3.3 Multi-View Learning ........................................... 16
References ..................................................................... 16
3 Within-Project Defect Prediction .......................................... 19
3.1 Basic WPDP............................................................. 19
3.1.1 Dictionary Learning Based Software Defect Prediction ...... 19
3.1.2 Collaborative Representation Classification Based
Software Defect Prediction..................................... 26
3.2 Semi-supervised WPDP ................................................ 28
3.2.1 Sample-Based Software Defect Prediction with
Active and Semi-supervised Learning ......................... 28
References ..................................................................... 33
4 Cross-Project Defect Prediction ............................................ 35
4.1 Basic CPDP ............................................................. 36
4.1.1 Manifold Embedded Distribution Adaptation ................. 36
4.2 Class Imbalance Problem in CPDP .................................... 46
4.2.1 An Improved SDA Based Defect Prediction Framework ..... 46
4.3 Semi-Supervised CPDP................................................. 54
4.3.1 Cost-Sensitive Kernelized Semi-supervised
Dictionary Learning ............................................ 54
References ..................................................................... 61
5 Heterogeneous Defect Prediction .......................................... 65
5.1 Basic HDP............................................................... 66
5.1.1 Unified Metric Representation and CCA-Based
Transfer Learning ............................................... 66
5.2 Class Imbalance Problem in HDP...................................... 83
5.2.1 Cost-Sensitive Transfer Kernel Canonical
Correlation Analysis ............................................ 83
5.2.2 Other Solutions ................................................. 104
5.3 Multiple Sources and Privacy Preservation Problems in HDP........ 104
5.3.1 Multi-Source Selection Based Manifold
Discriminant Alignment ........................................ 104
5.3.2 Sparse Representation Based Double Obfuscation
Algorithm ....................................................... 109
References ..................................................................... 133
6 An Empirical Study on HDP Approaches ................................ 139
6.1 Goal Question Metric (GQM) Based Research Methodology ........ 139
6.1.1 Major Challenges ............................................... 139
6.1.2 Review of Research Status ..................................... 140
6.1.3 Analysis on Research Status ................................... 141
6.1.4 Research Goal................................................... 144
6.1.5 Research Questions ............................................. 145
6.1.6 Evaluation Metrics.............................................. 145
6.2 Experiments ............................................................. 147
6.2.1 Datasets ......................................................... 147
6.2.2 SDP Approaches for Comparisons............................. 149
6.2.3 Experimental Design ........................................... 150
6.2.4 Experimental Results ........................................... 151
6.3 Discussions .............................................................. 160
References ..................................................................... 168
7 Other Research Questions of SDP ......................................... 171
7.1 Cross-Version Defect Prediction ....................................... 171
7.1.1 Methodology .................................................... 171
7.1.2 Experiments ..................................................... 173
7.1.3 Discussions...................................................... 175
7.2 Just-in-Time Defect Prediction ......................................... 175
7.2.1 Methodology .................................................... 175
7.2.2 Experiments ..................................................... 179
7.2.3 Discussions...................................................... 187
7.3 Effort-Aware Just-in-Time Defect Prediction.......................... 188
7.3.1 Methodology .................................................... 188
7.3.2 Experiments ..................................................... 191
7.3.3 Discussions...................................................... 196
References ..................................................................... 198
8 Conclusion .................................................................... 203
8.1 Conclusion .............................................................. 203
Chapter 1
Introduction
1.1 Software Quality Assurance
With the increasing pressure of expediting software projects that are continually growing in size and complexity to meet rapidly changing business needs, quality assurance activities such as fault prediction models have become extremely important. The main purpose of a fault prediction model is the effective allocation or prioritization of quality assurance effort (test effort and code inspection effort). Construction of these prediction models is mostly dependent on historical or previous software project data, referred to as a dataset.
However, a prevalent problem in data mining is the skewness of a dataset, and fault prediction datasets are no exception. In most datasets the majority of instances are clean (not faulty), whereas conventional learning methods are primarily designed for balanced data. Common classifiers such as Neural Networks (NN), Support Vector Machines (SVM), and decision trees work best toward optimizing their objective functions, which leads to the maximum overall accuracy, i.e., the ratio of correctly predicted instances to the total number of instances. Using an imbalanced dataset to train a classifier will most likely produce a classifier that tends to over-predict the majority class and has a lower probability of predicting the minority (faulty) modules; when it does predict the minority class, it often has a higher error rate than for the majority class. This degrades the prediction performance of classifiers, and in machine learning this issue is known as learning from imbalanced datasets. Several methods have been proposed in machine learning for dealing with the class imbalance issue, such as random over- and under-sampling, creating synthetic data, applying cleaning techniques for data sampling, and cluster-based sampling. Despite a significant amount of literature on imbalanced datasets in machine learning, very few studies have tackled the issue in the area of fault prediction. The first of such studies, by Kamei et al. [1], showed that
sampling techniques improved the prediction performance of linear and logistic models, whilst the other two models (neural network and classification tree) did not perform better after the sampling techniques were applied.
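As an illustration of the rebalancing idea discussed above, the following minimal sketch randomly over-samples the faulty (minority) class of a training set until a desired class ratio is reached. The metric matrix X, label vector y, and the target ratio are hypothetical placeholders; dedicated libraries such as imbalanced-learn offer more elaborate variants.

```python
import numpy as np

def random_oversample(X, y, minority_label=1, target_ratio=1.0, seed=0):
    """Randomly duplicate minority-class rows until
    len(minority) / len(majority) reaches target_ratio."""
    rng = np.random.default_rng(seed)
    minority_idx = np.where(y == minority_label)[0]
    majority_idx = np.where(y != minority_label)[0]
    n_needed = int(target_ratio * len(majority_idx)) - len(minority_idx)
    if n_needed <= 0:
        return X, y  # already balanced enough
    extra = rng.choice(minority_idx, size=n_needed, replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])

# Toy example: 90 clean modules and 10 faulty modules described by 5 metrics.
X = np.random.rand(100, 5)
y = np.array([0] * 90 + [1] * 10)
X_bal, y_bal = random_oversample(X, y)
print(np.bincount(y_bal))  # roughly equal class counts after resampling
```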
Interestingly, sampling techniques applied to datasets during fault prediction are mostly evaluated in terms of Accuracy, AUC, F1-measure, and Geometric Mean Accuracy, to name a few; however, these measures ignore the effort needed to fix faults, that is, they do not distinguish between a predicted fault in a small module and a predicted fault in a large module. Nickerson et al. [2] conclude that accuracy, or its inverse error rate, should never be used to evaluate the performance of classifiers on imbalanced datasets. Chawla et al. [3] likewise conclude that simple predictive accuracy might not be appropriate for an imbalanced dataset. The goal of this research is to improve the prediction performance of fault-prone module prediction models by applying over- and under-sampling approaches to rebalance the numbers of fault-prone and non-fault-prone modules in the training dataset, and to find the distribution or proportion of faulty and non-faulty modules that yields the best performance. The experiment focuses on the use of Norm(Popt), an effort-aware measure proposed by Kamei et al. [4], to evaluate the effect of over/under sampling on prediction models and to find out whether over/under sampling remains effective in a more realistic setting.
1.2 Software Defect Prediction
A defect is a flaw in a component or system which can cause it to fail to perform its desired function, for example an incorrect statement or data definition. A defect, if encountered during execution, may cause a failure of the system or a component. Defect prediction helps in identifying vulnerabilities in the project plan in terms of lack of resources, improperly defined timelines, predictable defects, etc. It can help organizations obtain large benefits without slipping planned schedules or overrunning budget estimates, and it helps in adjusting parameters in order to meet schedule variations. Methods used to estimate software defects include regression, genetic programming, clustering, neural networks, the statistical technique of discriminant analysis, dictionary learning, hybrid attribute selection, classification, attribute selection and instance filtering, Bayesian belief networks, K-means clustering, and association rule mining. In the domain of software defect prediction, many prediction models have been developed, and they mostly fall into two classes. One class operates in the later period of the software life cycle (the testing phase): having obtained defect data, it predicts how many defects remain in the software based on these data. Models in this class include the capture-recapture based model, the neural network based model, and measurement methods based on the scale and complexity of the source code. The other class, which is applied before the software development phase, aims to predict the number of defects that will arise during the software development process by analyzing defect data from previous projects. Published models in this class include the phase based model proposed by Gaffney and Davis, the Ada programming defect prediction model proposed by Agresti and Evanco, the early prediction model proposed by the USA Rome Laboratory, the software development early prediction method proposed by Carol Smidts at the University of Maryland, and the early fuzzy neural network based model. However, these methods suffer from a number of serious theoretical and practical problems. Software development is an extremely complicated process, and defects relate to many factors. To measure exactly, one would have to consider as many correlated factors as possible, which makes the model more complicated; to keep the model solvable, one has to simplify it, which in turn fails to yield a convincing answer. The neural network based prediction model, for instance, has many problems in training and in verifying the sample collection. Software testing in many organizations is still at a primitive stage, so much software can hardly provide the requested defect counts, which brings certain difficulties to sample collection. Moreover, early models consider the uncertain factors in the software development process inadequately and depend heavily on data factors. Therefore, many of these methods are difficult to apply in practice.
1.3 Research Directions of SDP
1.3.1 Within-Project Defect Prediction (WPDP)
In WPDP, some defect data from a project are used as the training set to build the prediction model, and the remaining data of the same project are used as the test set to evaluate its performance. At present, researchers mainly use machine learning algorithms to construct within-project defect prediction models; in addition, how to optimize the data structure and extract effective features is also a focus of current research. Some important research works are summarized below. Elish et al. [5] use support vector machines (SVM) to conduct defect prediction and compare its predictive performance with eight statistical and machine learning models on four NASA datasets. Lu et al. [6] leverage active learning to predict defects, and they also use feature compression techniques to perform feature reduction on the defect data. Li et al. [7] propose a novel semi-supervised learning method, ACoForest, which can sample the modules that are most helpful for learning. Rodriguez et al. [8] compare different methods for handling data preprocessing problems, such as sampling methods, cost-sensitive methods, ensemble methods, and hybrid methods; their experimental results show that these methods can effectively improve the accuracy of defect prediction once the class imbalance is addressed. Seiffert et al. [9] analyze 11 different algorithms and seven different data sampling techniques and find that class imbalance and data noise have a negative impact on prediction performance.
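To make the WPDP workflow concrete, the sketch below trains an SVM classifier (in the spirit of Elish et al. [5]) on the labeled part of a project and predicts the defect-proneness of the remaining modules. The metric matrix and labels here are synthetic placeholders rather than a real NASA dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Hypothetical within-project data: 200 modules, 20 static code metrics.
X = np.random.rand(200, 20)
y = np.random.randint(0, 2, size=200)  # 1 = defective, 0 = defect-free

# Use part of the project's labeled history for training, the rest as the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```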
1.3.2 Cross-Project Defect Prediction (CPDP)
When data are insufficient or non-existent for building quality defect predictors,
software engineers can use data from other organizations or projects. This is called
cross-project defect prediction (CPDP). Acquiring data from other sources is a
non-trivial task when data owners are concerned about confidentiality. In practice,
extracting project data from organizations is often difficult due to the business
sensitivity associated with the data. For example, at a keynote address at ESEM’11,
Elaine Weyuker doubted that she would ever be able to release the AT&T data she used
to build defect predictors [10]. Due to similar privacy concerns, we were only able to
add seven records from two years of work to our NASA-wide software cost metrics
repository [11]. In a personal communication, Barry Boehm stated that he was able
to publish less than 200 cost estimation records even after 30 years of COCOMO
effort. To enable sharing, we must assure confidentiality. In our view, confidentiality
is the next grand challenge for CPDP in software engineering. In previous work, we
allowed data owners to generate minimized and obfuscated versions of their original
data. Our MORPH algorithm [12] reflects on the boundary between an instance
and its nearest instance of another class, and MORPH’s restricted mutation policy
never pushes an instance across that boundary. MORPH can be usefully combined
with the CLIFF data minimization algorithm [13]. CLIFF is an instance selector
that returns a subset of instances that best predict for the target class. Previously
we reported that this combination of CLIFF and MORPH resulted in 7/10 defect
datasets studied retaining high privacy scores, while remaining useful for CPDP
[13]. This is a startling result since research by Grechanik et al. [14] and Brickell
et al. [15] showed that standard privacy methods increase privacy while decreasing
data mining efficacy. While useful, CLIFF and MORPH only considered a single-
party scenario where each data owner privatized their data individually without
considering privatized data from others. This resulted in privatized data that were
directly proportional in size (number of instances) to the original data. Therefore, in
a case where the size of the original data is small enough, any minimization might
be meaningless, but if the size of the original data is large, minimization may not be
enough to matter in practice.
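The following sketch illustrates the MORPH-style idea described above in a simplified form: each instance is nudged toward or away from its nearest unlike-class neighbor by a small random fraction, so the perturbation stays short of the class boundary. It is an assumption-laden approximation for illustration, not a faithful reimplementation of the published algorithm [12].

```python
import numpy as np

def morph_like(X, y, gamma=0.15, seed=0):
    """Privatize numeric instances by moving each one a bounded random
    fraction (at most gamma) of the way toward or away from its nearest
    unlike-class neighbor, so no instance crosses the class boundary."""
    rng = np.random.default_rng(seed)
    X_priv = X.copy()
    for i in range(len(X)):
        unlike = X[y != y[i]]
        if len(unlike) == 0:
            continue
        nearest = unlike[np.argmin(np.linalg.norm(unlike - X[i], axis=1))]
        direction = 1.0 if rng.random() < 0.5 else -1.0   # toward or away
        step = rng.uniform(0, gamma)                       # restricted mutation
        X_priv[i] = X[i] + direction * step * (nearest - X[i])
    return X_priv
```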
1.3.3 Heterogeneous Defect Prediction (HDP)
Existing CPDP approaches are based on the underlying assumption that both
source and target project data should exhibit the same data distribution or are
drawn from the same feature space (i.e., the same software metrics). When the
distribution of the data changes, or when the metrics features for source and target
projects are different, one cannot expect the resulting prediction performance to be
satisfactory. We consider these scenarios as Heterogeneous Cross-Project Defect
Prediction (HCPDP). Most software defect datasets are imbalanced, which means the number of defective modules is usually much smaller than that of defect-free modules. The imbalanced nature of the data can cause poor prediction performance: the probability of detecting defective modules can be low even though the overall accuracy is high. Without taking this issue into account, the effectiveness of software defect prediction in many real-world tasks would be greatly reduced.
Recently, some researchers have noticed the importance of these problems in
software defect prediction. For example, Nam et al. [16] used metric selection and metric matching to select similar metrics for building a prediction model with a heterogeneous metric set. They discarded dissimilar metrics, which may contain
useful information for training. Jing et al. [17] introduced Canonical Correlation
Analysis (CCA) into HCPDP, by constructing the common correlation space to
associate cross-project data. Then, one can simply project the source and target
project data into this space for defect prediction. However, like previous CPDP methods, this approach did not take the class imbalance problem of software defect datasets into account.
Ryu et al. [18] designed the Value-Cognitive Boosting with Support Vector Machine
(VCB-SVM) algorithm which exploited sampling techniques to solve the class
imbalance issue for cross-project environments. Nevertheless, the sampling strategy alters the distribution of the original data and may discard some potentially useful samples that could be important for the prediction process. Therefore, these
methods are not good solutions for addressing the class imbalance issue under
heterogeneous cross-project environments.
1.3.4 Other Research Questions of SDP
1.3.4.1 Cross-Version Defect Prediction
Cross-Version Defect Prediction (CVDP) is a practical scenario in which the classification model is trained on the historical data of a prior version and then used to predict the defect labels of the modules of the current version. Bennin et al. [19] evaluated
the defect prediction performance of 11 basic classification models in IVDP and
CVDP scenarios with an effort-aware indicator. They conducted experiments on 25
projects (each one has two versions with process metrics) and found that the optimal
models for the two defect prediction scenarios are not identical due to different data
as the training set. However, the performance differences of the 11 models are not
significant in both scenarios. Premraj et al. [20] investigated the impacts of code and
network metrics on the defect prediction performance of six classification models.
They considered three scenarios, including IVDP, CVDP, and CPDP. CPDP uses
the defect data of another project as the training set. Experiments on three projects
(each with two versions) suggested that the network metrics are better than the code
metrics in most cases. Holschuh et al. [21] explored the performance of CVDP
on a large software system by collecting four types of metrics. The experiments
on six projects (each with three versions) showed that the overall performance
is unsatisfactory. Monden et al. [22] evaluated the cost effectiveness of defect prediction on three classification models by comparing seven test effort allocation strategies. The results on one project with five versions revealed that the reduction of test effort relied on the appropriate test strategy. Khoshgoftaar et al. [23] studied the performance of six classification models on one project with four versions and found that the CART model with least absolute deviation performed the best. Zhao et al. [24]
investigated the relationship between the context-based cohesion metrics and the
defect-proneness in IVDP and CVDP scenarios. They conducted CVDP study on
four projects with total 19 versions and found that context-based cohesion metrics
had negative impacts on defect prediction performance but can be complementary to
non-context-based metrics. Yang et al. [25] surveyed the impacts of code, process,
and slice-based cohesion metrics on defect prediction performance in IVDP, CVDP,
and CPDP scenarios. They conducted CVDP study on one project with seven
versions and found that slice-based cohesion metrics had adverse impacts on defect
prediction performance but can be complementary to the commonly used metrics.
Wang et al. [26] explored the performance of their proposed semantic metrics on
defect prediction in CVDP and CPDP scenarios. The experiments on ten projects
with 26 versions showed the superiority of the semantic metrics compared with
traditional CK metrics and AST metrics.
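A minimal CVDP experiment in the sense described above can be set up as follows: a model is fitted on the module metrics of version N-1 and evaluated on version N. The version data below are hypothetical placeholders, and any classifier could be substituted for logistic regression.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical metric matrices and labels for two consecutive releases.
X_prev, y_prev = np.random.rand(300, 20), np.random.randint(0, 2, 300)  # version N-1
X_curr, y_curr = np.random.rand(250, 20), np.random.randint(0, 2, 250)  # version N

clf = LogisticRegression(max_iter=1000, class_weight="balanced")
clf.fit(X_prev, y_prev)                    # train on the prior version
scores = clf.predict_proba(X_curr)[:, 1]   # defect-proneness of the current version
print("AUC on current version:", roc_auc_score(y_curr, scores))
```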
1.3.4.2 Just-in-Time Defect Prediction
Just-in-time defect prediction aims to predict if a particular file involved in a commit
(i.e., a change) is buggy or not. Traditional just-in-time defect prediction techniques
typically involve the following steps:
1. Training data extraction. For each change, label it as buggy or clean by mining the project's revision history and issue tracking system. A buggy change contains one or more bugs, while a clean change contains none.
2. Feature extraction. Extract the values of various features from each change. Many different features have been used in past change classification studies.
3. Model learning. Build a model by using a classification algorithm based on the labeled changes and their corresponding features.
4. Model application. For a new change, extract the values of the various features and input them into the learned model to predict whether the change is buggy or clean.
A minimal code sketch of these four steps is given below.
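The sketch assumes the change-level features and buggy/clean labels have already been mined from the version-control and issue-tracking systems; the feature set (lines added, lines deleted, touched files, developer experience) is only a hypothetical example of commonly used change metrics.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Steps 1-2: hypothetical mined change-level features and buggy/clean labels.
X_changes = np.random.rand(500, 4)          # [added, deleted, n_files, experience]
y_changes = np.random.randint(0, 2, 500)    # 1 = buggy change, 0 = clean change

# Step 3: model learning on the labeled historical changes.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_changes, y_changes)

# Step 4: model application to a newly submitted change.
new_change = np.array([[120, 30, 3, 0.7]])  # same feature order as above
print("Predicted buggy probability:", model.predict_proba(new_change)[0, 1])
```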
The studies by Kamei et al. [32] are a great source of inspiration for our work. They
proposed a just-in-time quality assurance technique that predicts defects at commit-
level trying to reduce the effort of a reviewer. Later on, they also evaluated how
just-in-time models perform in the context of cross-project defect prediction [19].
Findings report good accuracy for the models not only in terms of both precision
and recall but also in terms of saved inspection effort. Our work is complementary
to these papers. In particular, we start from their basis of detecting defective commits
and complement this model with the attributes necessary to filter only those files that
are defect-prone and should be more thoroughly reviewed. Yang et al. [25] proposed
the usage of alternative techniques for just-in-time quality assurance, such as cached
history, deep learning, and textual analysis, reporting promising results. We did not
investigate these further in the current chapter, but studies can be designed and
carried out to determine if and how these techniques can be used within the model
we present in this chapter to further increase its accuracy.
1.3.4.3 Effort-Aware Defect Prediction
Traditional SDP models based on some binary classification algorithms are not
sufficient for software testing in practice, since they do not distinguish between a
module with many defects or high defect density (i.e., the number of defects divided by the lines of source code) and a module with a small number of defects or low defect density. Clearly, such modules require different amounts of effort to inspect and fix, yet they
are considered equal and allocated the same testing resources. Therefore, Mende et
al. [27] proposed effort-aware defect prediction (EADP) models to rank software
modules based on the possibility of these modules being defective, their predicted
number of defects, or defect density. Generally, EADP models are constructed by
using learning to rank techniques [28]. These techniques can be grouped into three
categories, that is, the pointwise approach, the pairwise approach, and the listwise
approach [29–31]. There exists a vast variety of learning to rank algorithms in the literature. It is thus important to empirically and statistically compare the impact and effectiveness of different learning to rank algorithms for EADP. To the best of our knowledge, few prior studies [32–36] evaluated and compared the existing learning to rank algorithms for EADP, and most of them considered only a few learning to rank algorithms across a small number of datasets. For example, previous studies [34–36] used at most five EADP models and few datasets: Jiang et al. [34] investigated the performance of only five classification-based pointwise algorithms for EADP on two NASA datasets, and Nguyen et al. [36] investigated three regression-based pointwise algorithms and two pairwise algorithms for EADP on five Eclipse CVS datasets.
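As a toy illustration of the pointwise approach, the sketch below trains a regressor on defect counts and then ranks modules by predicted defect density (predicted defects per line of code) so that inspection effort is spent on the most cost-effective modules first. The data are synthetic placeholders and the regressor choice is arbitrary.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical training data: module metrics, lines of code, and defect counts.
X_train = np.random.rand(400, 15)
loc_train = np.random.randint(50, 2000, 400)
defects_train = np.random.poisson(1.0, 400)

reg = GradientBoostingRegressor(random_state=0)
reg.fit(X_train, defects_train)        # pointwise: predict defect counts directly

# Rank new modules by predicted defect density = predicted defects / LOC.
X_new = np.random.rand(50, 15)
loc_new = np.random.randint(50, 2000, 50)
density = reg.predict(X_new) / loc_new
ranking = np.argsort(-density)         # inspect high-density modules first
print("Inspection order (module indices):", ranking[:10])
```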
1.4 Notations and Corresponding Descriptions
We briefly introduce some of the symbols and abbreviations that appear in this book, as listed in Table 1.1. Symbols that are not listed in the table are described in detail in the corresponding text.
Table 1.1 Symbols and corresponding descriptions

Symbol/Abbreviation        Description
SDP                        Software defect prediction
WPDP                       Within-project defect prediction
HCCDP                      Heterogeneous cross-company defect prediction
CPDP                       Cross-project defect prediction
HDP                        Heterogeneous defect prediction
CCA                        Canonical correlation analysis
TKCCA                      Transfer kernel canonical correlation analysis
CTKCCA                     Cost-sensitive transfer kernel canonical correlation analysis
GQM                        Goal Question Metric
ROC                        Receiver operating characteristic
MDA                        Manifold embedded distribution adaptation
SDA                        Subclass discriminant analysis
⇒                          The left side of "⇒" represents the source company data and the right side represents the target company data
a = [a1, a2, ..., an]      a is a vector, and ai is its ith component
‖a‖                        The length of a vector
∈                          An element belongs to a set
tr(·)                      The trace of a matrix
1.5 Structure of This Book
In the second chapter of this book, several common learning algorithms and their
applications in software defect prediction are briefly introduced, including deep
learning, transfer learning, dictionary learning, semi-supervised learning, and multi-
view learning.
In Chap. 3, we mainly discuss within-project defect prediction. We first introduce basic WPDP, including dictionary learning based software defect prediction and collaborative representation classification based software defect prediction, and then introduce sample-based software defect prediction with active and semi-supervised learning, which belongs to semi-supervised WPDP.
In Chap. 4, we expound several methodologies for cross-project defect prediction. For basic CPDP, we introduce manifold embedded distribution adaptation; for the class imbalance problem in CPDP, we propose an improved SDA based defect prediction framework; finally, for semi-supervised CPDP, we introduce cost-sensitive kernelized semi-supervised dictionary learning.
In Chap. 5, we introduce heterogeneous defect prediction (HDP). We first explain the unified metric representation and CCA-based transfer learning in basic HDP; then, for the class imbalance problem in HDP, we introduce cost-sensitive transfer kernel canonical correlation analysis. Finally, regarding the multiple sources and privacy preservation problems in HDP, we introduce multi-source selection based manifold discriminant alignment and the sparse representation based double obfuscation algorithm.
In Chap. 6, an empirical study on HDP approaches is presented, organized around a Goal Question Metric (GQM) based research methodology and a series of experiments.
Finally, in Chap. 7 of this book, we discuss other research questions of SDP,
mainly including the following aspects: cross-version defect prediction, just-in-time
defect prediction and effort-aware just-in-time defect prediction.
References
1. Kamei Y, Monden A, Matsumoto S, Kakimoto T, Matsumoto KI (2007) The Effects of Over and Under Sampling on Fault-prone Module Detection. In: Proceedings of the First International Symposium on Empirical Software Engineering and Measurement, pp 196–204. https://doi.org/10.1109/ESEM.2007.28
2. Nickerson A, Japkowicz N, Milios EE (2001) Using Unsupervised Learning to Guide Resampling in Imbalanced Data Sets. In: Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics. http://www.gatsby.ucl.ac.uk/aistats/aistats2001/files/nickerson155.ps
3. Chawla NV (2010) Data Mining for Imbalanced Datasets: An Overview. In: Data Mining and Knowledge Discovery Handbook, pp 875–886. https://doi.org/10.1007/978-0-387-09823-4_45
4. Kamei Y, Matsumoto S, Monden A, Matsumoto K, Adams B, Hassan AE (2010) Revisiting common bug prediction findings using effort-aware models. In: Proceedings of the 26th IEEE International Conference on Software Maintenance, pp 1–10. https://doi.org/10.1109/ICSM.2010.5609530
5. Elish KO, Elish MO (2008) Predicting defect-prone software modules using support vector machines. J Syst Softw 81(5):649–660. https://doi.org/10.1016/j.jss.2007.07.040
6. Lu H, Kocaguneli E, Cukic B (2014) Defect Prediction between Software Versions with Active Learning and Dimensionality Reduction. In: Proceedings of the 25th IEEE International Symposium on Software Reliability Engineering, pp 312–322. https://doi.org/10.1109/ISSRE.2014.35
7. Li M, Zhang H, Wu R, Zhou Z (2012) Sample-based software defect prediction with active and semi-supervised learning. Autom Softw Eng 19(2):201–230. https://doi.org/10.1007/s10515-011-0092-1
8. Rodríguez D, Herraiz I, Harrison R, Dolado JJ, Riquelme JC (2014) Preliminary comparison of techniques for dealing with imbalance in software defect prediction. In: Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering, pp 43:1–43:10. https://doi.org/10.1145/2601248.2601294
9. Seiffert C, Khoshgoftaar TM, Hulse JV, Folleco A (2007) An Empirical Study of the Classification Performance of Learners on Imbalanced and Noisy Software Quality Data. In: Proceedings of the IEEE International Conference on Information Reuse and Integration, pp 651–658. https://doi.org/10.1109/IRI.2007.4296694
10. Weyuker EJ, Ostrand TJ, Bell RM (2008) Do too many cooks spoil the broth? Using the number of developers to enhance defect prediction models. Empir Softw Eng 13(5):539–559. https://doi.org/10.1007/s10664-008-9082-8
11. Menzies T, El-Rawas O, Hihn J, Feather MS, Madachy RJ, Boehm BW (2007) The business case for automated software engineering. In: Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2007), pp 303–312. https://doi.org/10.1145/1321631.1321676
12. Peters F, Menzies T (2012) Privacy and utility for defect prediction: Experiments with MORPH. In: Proceedings of the 34th International Conference on Software Engineering, pp 189–199. https://doi.org/10.1109/ICSE.2012.6227194
13. Peters F, Menzies T, Gong L, Zhang H (2013) Balancing Privacy and Utility in Cross-Company Defect Prediction. IEEE Trans Software Eng 39(8):1054–1068. https://doi.org/10.1109/TSE.2013.6
14. Grechanik M, Csallner C, Fu C, Xie Q (2010) Is Data Privacy Always Good for Software Testing? In: Proceedings of the IEEE 21st International Symposium on Software Reliability Engineering, pp 368–377. https://doi.org/10.1109/ISSRE.2010.13
15. Brickell J, Shmatikov V (2008) The cost of privacy: destruction of data-mining utility in anonymized data publishing. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 70–78. https://doi.org/10.1145/1401890.1401904
16. Nam J, Kim S (2015) Heterogeneous defect prediction. In: Proceedings of the 10th Joint Meeting on Foundations of Software Engineering, pp 508–519. https://doi.org/10.1145/2786805.2786814
17. Jing X, Wu F, Dong X, Qi F, Xu B (2015) Heterogeneous cross-company defect prediction by unified metric representation and CCA-based transfer learning. In: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, pp 496–507. https://doi.org/10.1145/2786805.2786813
18. Ryu D, Choi O, Baik J (2016) Value-cognitive boosting with a support vector machine for cross-project defect prediction. Empir Softw Eng 21(1):43–71. https://doi.org/10.1007/s10664-014-9346-4
19. Bennin KE, Toda K, Kamei Y, Keung J, Monden A, Ubayashi N (2016) Empirical Evaluation of Cross-Release Effort-Aware Defect Prediction Models. In: Proceedings of the 2016 IEEE International Conference on Software Quality, Reliability and Security, pp 214–221. https://doi.org/10.1109/QRS.2016.33
20. Premraj R, Herzig K (2011) Network Versus Code Metrics to Predict Defects: A Replication Study. In: Proceedings of the 5th International Symposium on Empirical Software Engineering and Measurement, pp 215–224. https://doi.org/10.1109/ESEM.2011.30
21. Holschuh T, Pauser M, Herzig K, Zimmermann T, Premraj R, Zeller A (2009) Predicting defects in SAP Java code: An experience report. In: Proceedings of the 31st International Conference on Software Engineering, pp 172–181. https://doi.org/10.1109/ICSE-COMPANION.2009.5070975
22. Monden A, Hayashi T, Shinoda S, Shirai K, Yoshida J, Barker M, Matsumoto K (2013) Assessing the Cost Effectiveness of Fault Prediction in Acceptance Testing. IEEE Trans Softw Eng 39(10):1345–1357. https://doi.org/10.1109/TSE.2013.21
23. Khoshgoftaar TM, Seliya N (2003) Fault Prediction Modeling for Software Quality Estimation: Comparing Commonly Used Techniques. Empir Softw Eng 8(3):255–283. https://doi.org/10.1023/A:1024424811345
24. Zhao Y, Yang Y, Lu H, Liu J, Leung H, Wu Y, Zhou Y, Xu B (2017) Understanding the value of considering client usage context in package cohesion for fault-proneness prediction. Autom Softw Eng 24(2):393–453. https://doi.org/10.1007/s10515-016-0198-6
25. Yang Y, Zhou Y, Lu H, Chen L, Chen Z, Xu B, Leung HKN, Zhang Z (2015) Are Slice-Based Cohesion Metrics Actually Useful in Effort-Aware Post-Release Fault-Proneness Prediction? An Empirical Study. IEEE Trans Softw Eng 41(4):331–357. https://doi.org/10.1109/TSE.2014.2370048
26. Wang S, Liu T, Tan L (2016) Automatically learning semantic features for defect prediction. In: Proceedings of the 38th International Conference on Software Engineering, pp 297–308. https://doi.org/10.1145/2884781.2884804
27. Mende T, Koschke R (2010) Effort-Aware Defect Prediction Models. In: Proceedings of the 14th European Conference on Software Maintenance and Reengineering, pp 107–116. https://doi.org/10.1109/CSMR.2010.18
28. Wang F, Huang J, Ma Y (2018) A Top-k Learning to Rank Approach to Cross-Project Software Defect Prediction. In: Proceedings of the 25th Asia-Pacific Software Engineering Conference, pp 335–344. https://doi.org/10.1109/APSEC.2018.00048
29. Shi Z, Keung J, Bennin KE, Zhang X (2018) Comparing learning to rank techniques in hybrid bug localization. Appl Soft Comput 62:636–648. https://doi.org/10.1016/j.asoc.2017.10.048
30. Liu T (2010) Learning to rank for information retrieval. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, p 904. https://doi.org/10.1145/1835449.1835676
31. Yu X, Li Q, Liu J (2019) Scalable and parallel sequential pattern mining using spark. World Wide Web 22(1):295–324. https://doi.org/10.1007/s11280-018-0566-1
32. Bennin KE, Toda K, Kamei Y, Keung J, Monden A, Ubayashi N (2016) Empirical Evaluation of Cross-Release Effort-Aware Defect Prediction Models. In: Proceedings of the 2016 IEEE International Conference on Software Quality, Reliability and Security, pp 214–221. https://doi.org/10.1109/QRS.2016.33
33. Yang X, Wen W (2018) Ridge and Lasso Regression Models for Cross-Version Defect Prediction. IEEE Trans Reliab 67(3):885–896. https://doi.org/10.1109/TR.2018.2847353
34. Jiang Y, Cukic B, Ma Y (2008) Techniques for evaluating fault prediction models. Empir Softw Eng 13(5):561–595. https://doi.org/10.1007/s10664-008-9079-3
35. Mende T, Koschke R (2009) Revisiting the evaluation of defect prediction models. In: Proceedings of the 5th International Workshop on Predictive Models in Software Engineering, p 7. https://doi.org/10.1145/1540438.1540448
36. Nguyen TT, An TQ, Hai VT, Phuong TM (2014) Similarity-based and rank-based defect prediction. In: Proceedings of the 2014 International Conference on Advanced Technologies for Communications (ATC 2014), pp 321–325.
Chapter 2
Machine Learning Techniques
for Intelligent SDP
Abstract In this chapter, several common learning algorithms and their applica-
tions in software defect prediction are briefly introduced, including deep learning,
transfer learning, dictionary learning, semi-supervised learning, and multi-view
learning.
2.1 Transfer Learning
In many real world applications, it is expensive or impossible to recollect the needed
training data and rebuild the models. It would be nice to reduce the need and
effort to recollect the training data. In such cases, transfer learning (TL) between
task domains would be desirable. Transfer learning exploits the knowledge gained
from a previous task to improve generalization on another related task. Transfer
learning can be useful when there is not enough labeled data for the new problem
or when the computational cost of training a model from scratch is too high.
Traditional data mining and machine learning algorithms make predictions on the
future data using statistical models that are trained on previously collected labeled
or unlabeled training data. Most of them assume that the distributions of the labeled
and unlabeled data are the same. Transfer learning (TL), in contrast, allows the
domains, tasks, and distributions used in training and testing to be different. It is
used to improve a learner from one domain by transferring information from a
related domain. Research on transfer learning has attracted more and more attention
since 1995. Today, transfer learning methods appear in several top venues, most
notably in data mining and in applications of machine learning and data mining.
Due to its strong domain adaptation ability, researchers have introduced TL techniques into cross-project or heterogeneous defect prediction in recent years.
The application of TL in cross-project defect prediction (CPDP) aims to reduce
the distribution difference between source and target data. For example, Nam
et al. [1] proposed a new CPDP method called TCA+, which extends transfer
component analysis (TCA) by introducing a set of rules for selecting an appropriate
normalization method to obtain better CPDP performance. Krishna and Menzies [2]
introduced a baseline method named Bellwether for cross-project defect prediction
based on existing CPDP methods. For heterogeneous defect prediction (HDP),
TL techniques are applied not only to reduce the distribution difference between
source and target data but also to eliminate the heterogeneity of metrics between
source and target projects. Jing et al. [3] proposed an HDP method named CCA+,
which uses the canonical correlation analysis (CCA) technique and the unified
metric representation (UMR) to find the latent common feature space between the
source and target projects. Specifically, the UMR is made of three kinds of metrics,
including the common metrics of the source and target data, source-specific metrics,
and target-specific metrics. Based on UMR, the transfer learning method based on
CCA is introduced to find common metrics by maximizing the canonical correlation
coefficient between source and target data.
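To give a flavor of how CCA can relate heterogeneous source and target metrics, the sketch below projects both projects into a shared low-dimensional space where their canonical correlation is maximized and then trains a classifier on the projected source data. It is only a simplified stand-in for the CCA+ method described above: the row-by-row pairing of source and target instances and the data themselves are artificial assumptions made for illustration, not part of the original method.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LogisticRegression

# Hypothetical heterogeneous data: a source project with 26 metrics and a
# target project with 17 different metrics, artificially paired row-by-row.
n = 200
X_src = np.random.rand(n, 26)
X_tgt = np.random.rand(n, 17)
y_src = np.random.randint(0, 2, n)

cca = CCA(n_components=5)
cca.fit(X_src, X_tgt)                        # learn maximally correlated projections
Z_src, Z_tgt = cca.transform(X_src, X_tgt)   # both projects in the common space

clf = LogisticRegression(max_iter=1000).fit(Z_src, y_src)  # train on projected source
pred_tgt = clf.predict(Z_tgt)                # predict defect-proneness for the target
```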
2.2 Deep Learning
Deep learning (DL) is an extension of prior work on neural networks where the
“deep” refers to the use of multiple layers in the network. In the 1960s and 1970s,
it was found that very simple neural nets can be poor classifiers unless they are
extended with (a) extra layers between inputs and outputs and (b) a nonlinear
activation function controlling links from inputs to a hidden layer (which can be
very wide) to an output layer. Essentially, deep learning is a modern variation
on the above which is concerned with a potentially unbounded number of layers
of bounded size. In the last century, most neural networks used the “sigmoid”
activation function f(x) = 1/(1 + e^(-x)), which was subpar to other learners in several tasks. It was only when the ReLU activation function f(x) = max(0, x) was
introduced by Nair and Hinton [4] that their performance increased dramatically,
and they became popular.
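For reference, the two activation functions mentioned above can be written in a few lines; this is standard textbook material rather than anything specific to this book.

```python
import numpy as np

def sigmoid(x):
    """Classic squashing activation: f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), relu(x))
```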
With its strong representation learning ability, deep learning technology has
quickly gained favor in the field of software engineering. In software defect
prediction (SDP), researchers began to use DL techniques to extract deep features
of defect data. Wang et al. [6] first introduced the Deep Belief Network (DBN) [5] to learn semantic features, which are then used by classical learners to perform defect prediction. In this approach, for each source code file, they extract tokens,
disregarding ones that do not affect the semantics of the code, such as variable
names. These tokens are vectorized and given unique numbers, forming a vector
of integers for each source file. Wen et al. [7] utilized Recurrent Neural Network
(RNN) to encode features from sequence data automatically. They propose a novel
approach called FENCES, which extracts six types of change sequences covering
different aspects of software changes via fine-grained change analysis. It approaches
defect prediction by mapping it to a sequence labeling problem solvable by RNN.
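The token handling described for the DBN-based approach can be approximated as below: semantic-bearing tokens from each source file are mapped to unique integers, producing an integer vector per file that a downstream network (or any classifier) can consume. The tokenization rule and vocabulary here are deliberately simplistic assumptions, not the authors' actual implementation.

```python
# Map each file's tokens (e.g., AST node types or method calls) to unique integers.
files_tokens = [
    ["if", "for", "methodCall", "return"],
    ["while", "methodCall", "methodCall", "throw"],
]

vocab = {}

def encode(tokens):
    """Assign each distinct token a stable integer id and encode the file."""
    ids = []
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab) + 1   # reserve 0 for padding
        ids.append(vocab[tok])
    return ids

encoded = [encode(toks) for toks in files_tokens]
print(encoded)   # e.g., [[1, 2, 3, 4], [5, 3, 3, 6]]
```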
2.3 Other Techniques
2.3.1 Dictionary Learning
Both sparse representation and dictionary learning have been successfully applied
to many application fields, including image clustering, compressed sensing as
well as image classification tasks. In sparse representation based classification,
the dictionary for sparse coding could be predefined. For example, Wright et al.
[8] directly used the training samples of all classes as the dictionary to code the
query face image and classified the query face image by evaluating which class
leads to the minimal reconstruction error. However, the dictionary in their method
may not be effective enough to represent the query images due to the uncertain
and noisy information in the original training images. In addition, the number of
atoms of dictionary that is made up of image samples can also be very large, which
increases the coding complexity. Dictionary learning (DL) aims to learn from the
training samples’ space where the given signal could be well represented or coded
for processing. Most DL methods attempt to learn a common dictionary shared by
all classes as well as a classifier of coefficients for classification.
Usually, the dictionary can be constructed by directly using the original training
samples, whereas the original samples have much redundancy and noise, which are
adverse to prediction. For the purpose of further improving the classification ability,
DL techniques have been adopted in SDP tasks recently to represent project modules
well. For example, Jing et al. [14] are the first to apply the DL technology to the field
of software defect prediction and proposed a cost-sensitive discriminative dictionary
learning (CDDL) approach. Specifically, CDDL introduces misclassification costs
and builds the over-complete dictionary for software project modules.
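The sketch below shows a generic dictionary learning step on defect data using scikit-learn: an over-complete dictionary is learned from the module metrics and each module is re-expressed by its sparse code, which can then feed any classifier. It only mirrors the general idea of sparse coding, not the cost-sensitive CDDL formulation summarized above, and the data are synthetic placeholders.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression

X = np.random.rand(150, 20)            # hypothetical module metrics
y = np.random.randint(0, 2, 150)       # defect labels

# Learn a 40-atom (over-complete) dictionary and sparse codes for the modules.
dico = DictionaryLearning(n_components=40, transform_algorithm="lasso_lars",
                          transform_alpha=0.1, random_state=0)
codes = dico.fit_transform(X)

clf = LogisticRegression(max_iter=1000).fit(codes, y)  # classify in the sparse-code space
```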
2.3.2 Semi-Supervised Learning
Due to the lack of labeled data, Semi-Supervised Learning (SSL) has always been
a hot topic in machine learning. A myriad of SSL methods have been proposed.
For example, co-training is a well-known disagreement-based SSL method, which
trains different learners to exploit unlabeled data. Pseudo-label style methods
label unlabeled data with pseudo labels. Graph-based methods aim to construct a
similarity graph, through which label information propagates to unlabeled nodes.
Local smoothness regularization-based methods represent another widely recog-
nized category of semi-supervised learning (SSL) techniques, which leverage the
inherent structure of the data to improve learning accuracy. Different methods apply
different regularizers, such as Laplacian regularization, manifold regularization, and
virtual adversarial regularization. For example, Miyato et al. [11] proposed a smooth
regularization method called virtual adversarial training, which enables the model
to output a smooth label distribution for local perturbations of a given input. There
are other popular methods as well, for example, the Ladder Network.
Since large amounts of unlabeled data exist in software projects, many SSL techniques have
been considered in SDP tasks. Wang et al. [9] proposed a non-negative sparse-based
semiboost learning approach for software defect prediction. Benefiting from the idea
of semi-supervised learning, this approach is capable of exploiting both labeled and
unlabeled data and is formulated in a boosting framework. Besides, Zhang et al. [10]
used a graph-based semi-supervised learning technique to predict software defects.
This approach utilizes not only the few labeled data but also abundant unlabeled data
to improve the generalization capability.
2.3.3 Multi-View Learning
Representation learning is a prerequisite step in many multi-view learning tasks.
In recent years, a variety of classical multi-view representation learning methods
have been proposed. These methods follow the previously presented taxonomy,
that is, joint representation, alignment representation, as well as shared and specific
representation. For example, based on Markov network, Chen et al. [12] presented a
large-margin predictive multi-view subspace learning method, which joins features
learned from multiple views. Jing et al. [13] proposed an intra-view and inter-view
supervised correlation analysis method for image classification, in which CCA was
applied to align multi-view features.
Deep multi-view representation learning works also follow the joint repre-
sentation, alignment representation, as well as shared and specific representation
classification paradigm. For example, Kan et al. [14] proposed a multi-view deep
network for cross-view classification. This network first extracts view-specific
features with a sub-network and then concatenates and feeds these features into
a common network, which is designed to project them into one uniform space.
Harwath et al. [15] presented an unsupervised audiovisual matchmap neural net-
work, which applies similarity metric and pairwise ranking criterion to align visual
objects and spoken words. Hu et al. [16] introduced a sharable and individual multi-
view deep metric learning method. It leverages view-specific networks to extract
individual features from each view and employs a common network to extract
shared features from all views.
References
1. Nam, Jaechang and Pan, Sinno Jialin and Kim, Sunghun. Transfer defect learning. 35th
international conference on software engineering (ICSE), 382–391, 2013.
2. Krishna, Rahul and Menzies, Tim. Bellwethers: A baseline method for transfer learning. IEEE
Transactions on Software Engineering, 45(11):1081–1105, 2018.
3. Jing, Xiaoyuan and Wu, Fei and Dong, Xiwei and Qi, Fumin and Xu, Baowen. Heterogeneous
cross-company defect prediction by unified metric representation and CCA-based transfer
learning. In Proceedings of the 2015 10th joint meeting on foundations of software engineering,
pages 496–507, 2015.
4. Nair, Vinod and Hinton, Geoffrey E. Rectified linear units improve restricted Boltzmann
machines. In ICML’10, 2010.
5. Hinton, Geoffrey E. Deep belief networks. Scholarpedia, 4(5):5947, 2009.
6. Wang, Song and Liu, Taiyue and Tan, Lin. Automatically learning semantic features for defect
prediction. In IEEE/ACM 38th International Conference on Software Engineering (ICSE),
pages 297–308, 2016.
7. Wen, Ming and Wu, Rongxin and Cheung, Shing-Chi. How well do change sequences
predict defects? sequence learning from software changes. IEEE Transactions on Software
Engineering, 46(11):1155–1175, 2018.
8. Wright, John and Yang, Allen Y and Ganesh, Arvind and Sastry, S Shankar and Ma, Yi. Robust
face recognition via sparse representation. IEEE transactions on pattern analysis and machine
intelligence, 31(2):210–227, 2008.
9. Wang, Tiejian and Zhang, Zhiwu and Jing, Xiaoyuan and Liu, Yanli. Non-negative sparse-
based SemiBoost for software defect prediction. Software Testing, Verification and Reliability,
26(7):498–515, 2016.
10. Zhang, Zhi-Wu and Jing, Xiao-Yuan and Wang, Tie-Jian. Label propagation based semi-
supervised learning for software defect prediction. Automated Software Engineering,
24(7):47–69, 2017.
11. Miyato, Takeru and Maeda, Shin-ichi and Koyama, Masanori and Ishii, Shin. Virtual
adversarial training: a regularization method for supervised and semi-supervised learning.
IEEE transactions on pattern analysis and machine intelligence, 41(8):1979–1993, 2018.
12. Chen, Ning and Zhu, Jun and Sun, Fuchun and Xing, Eric Poe. Large-margin predictive latent
subspace learning for multiview data analysis. IEEE transactions on pattern analysis and
machine intelligence, 34(12):2365–2378, 2012.
13. Jing, Xiao-Yuan and Hu, Rui-Min and Zhu, Yang-Ping and Wu, Shan-Shan and Liang, Chao
and Yang, Jing-Yu. Intra-view and inter-view supervised correlation analysis for multi-view
feature learning. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 1882–
1889, 2014.
14. Kan, Meina and Shan, Shiguang and Chen, Xilin. Multi-view deep network for cross-view
classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pages 4847–4855, 2016.
15. Harwath, David and Torralba, Antonio and Glass, James. Unsupervised learning of spoken
language with visual context. In Advances in Neural Information Processing Systems, pages
1858–1866, 2016.
16. Hu, Junlin and Lu, Jiwen and Tan, Yap-Peng. Sharable and individual multi-view metric
learning. IEEE transactions on pattern analysis and machine intelligence, 40(9):2281–2288,
2017.
Chapter 3
Within-Project Defect Prediction
Abstract In order to improve the quality of a software system, software defect
prediction aims to automatically identify defective software modules for efficient
software testing. To predict software defects, classification methods based on static
code attributes have attracted a great deal of attention. In recent years, machine
learning techniques have been applied to defect prediction. Since there exists
similarity among different software modules, one software module can be approximately
represented by a small proportion of other modules, and the representation coefficients
over the pre-defined dictionary, which consists of historical software module data,
are generally sparse. We propose a cost-sensitive
discriminative dictionary learning (CDDL) approach for software defect classifica-
tion and prediction. The widely used datasets from NASA projects are employed
as test data to evaluate the performance of all compared methods. Experimental
results show that CDDL outperforms several representative state-of-the-art defect
prediction methods.
3.1 Basic WPDP
3.1.1 Dictionary Learning Based Software Defect Prediction
3.1.1.1 Methodology
To fully exploit the discriminative information of training samples for improving the
performance of classification, we design a supervised dictionary learning approach,
which learns a dictionary that can represent the given software module more
effectively. Moreover, the supervised dictionary learning can also reduce both
the number of dictionary atoms and the sparse coding complexity. Instead of
learning a shared dictionary for all classes, we learn a structured dictionary .D =
[D1, . . . , Di, . . . , Dc], where .Di is the class-specified sub-dictionary associated
with class i, and c is the total number of classes. We use the reconstruction error
to do classification with such a dictionary D, as the SRC method does.
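To illustrate this classification rule, here is a small numpy sketch (our own illustration, not the authors' code) of assigning a module to the class whose sub-dictionary yields the smallest reconstruction error; for brevity the coding step uses plain least squares as a stand-in for the sparse coding that CDDL actually learns.

import numpy as np

def classify_by_reconstruction(y, sub_dictionaries):
    """Return the index of the class whose sub-dictionary reconstructs y best."""
    errors = []
    for D_i in sub_dictionaries:
        # code y over the i-th sub-dictionary (least-squares stand-in for sparse coding)
        x_i, *_ = np.linalg.lstsq(D_i, y, rcond=None)
        errors.append(np.linalg.norm(y - D_i @ x_i))
    return int(np.argmin(errors))

# Toy usage with random 10-dimensional module features and 5 atoms per class.
rng = np.random.default_rng(0)
D1, D2 = rng.standard_normal((10, 5)), rng.standard_normal((10, 5))
label = classify_by_reconstruction(rng.standard_normal(10), [D1, D2])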
Suppose that A = [A_1, . . . , A_i, . . . , A_c] is the set of training samples (labeled
software modules), A_i is the subset of the training samples from class i, and X =
[X_1, . . . , X_i, . . . , X_c] is the coding coefficient matrix of A over D, that is, A ≈ DX,
where X_i is the sub-matrix containing the coding coefficients of A_i over D. We
require that D should have not only powerful reconstruction capability of A but
also powerful discriminative capability of the classes in A. Thus, we propose the cost-
sensitive discriminative dictionary learning (CDDL) model as follows:
J(D, X) = \arg\min_{(D,X)} \{ r(A, D, X) + \lambda \|X\|_1 \}    (3.1)

where r(A, D, X) is the discriminative fidelity term, \|X\|_1 is the sparsity constraint,
and λ is a balance factor.
Let X_i = [X_i^1; X_i^2; . . . ; X_i^c], where X_i^j is the coding coefficient matrix of A_i over
the sub-dictionary D_j. Denote the representation of D_k to A_i as R_k = D_k X_i^k. First
of all, the dictionary D should be able to represent A_i well, and therefore A_i ≈
D X_i = D_1 X_i^1 + · · · + D_i X_i^i + · · · + D_c X_i^c. Secondly, since D_i is associated with the
ith class, it is expected that A_i should be well represented by D_i (not by D_j, j ≠ i),
which means that both \|A_i - D_i X_i^i\|_F^2 and \|D_j X_i^j\|_F^2 should be minimized. Thus the
discriminative fidelity term is

r(A, D, X) = \sum_{i=1}^{c} r(A_i, D, X_i)
           = \sum_{i=1}^{c} \left( \|A_i - D X_i\|_F^2 + \|A_i - D_i X_i^i\|_F^2 + \sum_{j=1, j \neq i}^{c} \|D_j X_i^j\|_F^2 \right)    (3.2)
An intuitive explanation of the three terms in r(A_i, D, X_i) is shown in Fig. 3.1. In
software defect prediction, there are two kinds of modules: the defective modules
and the defective-free modules. Figure 3.1a shows that if we only minimize
\|A_i - D X_i\|_F^2 on the total dictionary D, R_i may deviate much from A_i, so that the
sub-dictionary D_i cannot represent A_i well. In order to achieve both powerful
reconstruction capability and powerful discriminative capability, we add two further
terms, \|A_i - D_i X_i^i\|_F^2 (which minimizes the reconstruction error on the sub-dictionary
of the module's own class) and \|D_j X_i^j\|_F^2 (which minimizes the reconstruction term on
the sub-dictionary of the other class); both of them should also be minimized.
Figure 3.1b shows that the proposed discriminative fidelity term can overcome the
problem in Fig. 3.1a.
Fig. 3.1 Illustration of the discriminative fidelity term
As previously stated, misclassifying defective-free modules leads to increased
development cost, and misclassifying defective ones is related to risk cost.
Cost-sensitive learning can incorporate the different misclassification costs into the
classification process. In this section, we emphasize the risk cost, so we add
the penalty factor cost(i, j) to increase the punishment when a defective software
module is predicted as a defective-free software module. As a result, cost-sensitive
dictionary learning makes the prediction inclined to classify a module as a defective
one and generates a dictionary for classification with minimum misclassification
cost. The discriminative fidelity term with penalty factors is
r(A, D, X) = \sum_{i=1}^{c} r(A_i, D, X_i)
           = \sum_{i=1}^{c} \left[ \|A_i - D X_i\|_F^2 + \|A_i - D_i X_i^i\|_F^2 + \sum_{j=1}^{c} cost(i, j) \|D_j X_i^j\|_F^2 \right]    (3.3)
Since there are only two classes in software defect prediction (the defective class and
the defective-free class), that is, c = 2, the model of cost-sensitive discriminative
dictionary learning is

J(D, X) = \arg\min_{(D,X)} \left\{ \sum_{i=1}^{2} \left[ \|A_i - D X_i\|_F^2 + \|A_i - D_i X_i^i\|_F^2 + \sum_{j=1}^{2} cost(i, j) \|D_j X_i^j\|_F^2 \right] + \lambda \|X\|_1 \right\}    (3.4)
where the cost matrix is shown in Table 3.1.

Table 3.1 Cost matrix for CDDL

                          Predicts defective one   Predicts defective-free one
Actually defective        0                        cost(1, 2)
Actually defective-free   cost(2, 1)               0
The CDDL objective function in Formula 3.4 can be divided into two sub-
problems: updating X by fixing D and updating D by fixing X. The optimization
procedure is iteratively implemented to obtain the desired discriminative dictionary D
and the corresponding coefficient matrix X. At first, suppose that D is fixed; the
objective function in Formula 3.4 is then reduced to a sparse coding problem to compute
X = [X_1, X_2]. Here X_1 and X_2 are calculated one by one: we calculate X_1 with
X_2 fixed and then compute X_2 with X_1 fixed. Thus, Formula 3.4 is rewritten as

J(X_i) = \arg\min_{X_i} \left\{ \|A_i - D X_i\|_F^2 + \|A_i - D_i X_i^i\|_F^2 + \sum_{j=1}^{2} cost(i, j) \|D_j X_i^j\|_F^2 + \lambda \|X_i\|_1 \right\}    (3.5)
Formula 3.5 can be solved by using the IPM algorithm in [1]. When X is fixed,
we in turn update D_1 and D_2: when we calculate D_1, D_2 is fixed, and when we compute
D_2, D_1 is fixed. Thus Formula 3.4 is rewritten as

J(D_i) = \arg\min_{D_i} \left\{ \left\| A - D_i X^i - \sum_{j=1, j \neq i}^{2} D_j X^j \right\|_F^2 + \|A_i - D_i X_i^i\|_F^2 + \sum_{j=1}^{2} cost(i, j) \|D_j X_i^j\|_F^2 \right\}    (3.6)

where X^i is the coding coefficient matrix of A over D_i. Formula 3.6 is a quadratic
programming problem, and we can solve it by using the algorithm in [2].
By utilizing the PCA technique, we are able to initialize the sub-dictionary for
each class. Given the low data dimension of software defect prediction, PCA can
create a fully initialized sub-dictionary for every class. This means that all sub-
dictionaries have an equal number of atoms, which is generally equivalent to the
data dimension.
The algorithm of CDDL converges since its two alternating optimizations are
both convex. Figure 3.2 illustrates the convergence of the algorithm.
3.1.1.2 Experiments
To evaluate our CDDL approach, we conduct some experiments. For all selected
datasets, we use the 1:1 random division to obtain the training and testing sets for
all compared methods. The random division may affect the prediction performance;
therefore, we repeat the random division and prediction 20 times and report the
average prediction results in the following discussions.
Fig. 3.2 Convergence of the realization algorithm of CDDL on four NASA benchmark datasets.
(a) CM1 dataset. (b) KC1 dataset. (c) MW1 dataset. (d) PC1 dataset. Each panel plots the total
objective function value against the iteration number (1–30)
In our approach, in order to emphasize the risk cost, the parameters cost(1, 2) and
cost(2, 1) are set as 1:5. For various projects, users can select a different cost ratio
of cost(1, 2) to cost(2, 1) [3]. The balance parameter λ is determined by searching a
wide range of values and choosing the one that yields the best F-measure value.
We compare the proposed CDDL approach with several representative methods
presented in the last five years, including support vector machine (SVM)
[4], compressed C4.5 decision tree (CC4.5) [5], weighted Naïve Bayes (NB) [6],
coding based ensemble learning (CEL) [7], and cost-sensitive boosting neural
network (CBNN) [8]. In this section, we present the detailed experimental results of
our CDDL approach and the compared methods.
3.1.1.3 Discussions
Table 3.2 shows the Pd and Pf values of our approach and other compared methods
on 10 NASA datasets. For each dataset, Pd and Pf values of all methods are the
mean values calculated from the results of 20 runs. The results of Pf values suggest
that in spite of not acquiring the best Pf values on most datasets, CDDL can achieve
Table 3.2 Experimental results: Pd and Pf comparisons on NASA’s ten datasets
Dataset M SVM CC4.5 NB CEL CBNN CDDL
CM1 Pd 0.15 0.26 0.44 0.43 0.59 0.74
Pf 0.04 0.11 0.18 0.15 0.29 0.37
JM1 Pd 0.53 0.37 0.14 0.32 0.54 0.68
Pf 0.45 0.17 0.32 0.14 0.29 0.35
KC1 Pd 0.19 0.40 0.31 0.37 0.69 0.81
Pf 0.02 0.12 0.06 0.13 0.30 0.37
KC3 Pd 0.33 0.41 0.46 0.29 0.51 0.71
Pf 0.08 0.16 0.21 0.12 0.25 0.34
MC2 Pd 0.51 0.64 0.35 0.56 0.79 0.83
Pf 0.24 0.49 0.09 0.38 0.54 0.29
MW1 Pd 0.21 0.29 0.49 0.25 0.61 0.79
Pf 0.04 0.09 0.19 0.11 0.25 0.25
PC1 Pd 0.66 0.38 0.36 0.46 0.54 0.86
Pf 0.19 0.09 0.11 0.13 0.17 0.29
PC3 Pd 0.64 0.34 0.28 0.41 0.65 0.77
Pf 0.41 0.08 0.09 0.13 0.25 0.28
PC4 Pd 0.72 0.49 0.39 0.48 0.66 0.89
Pf 0.16 0.07 0.13 0.06 0.18 0.28
PC5 Pd 0.71 0.50 0.32 0.37 0.79 0.84
Pf 0.22 0.02 0.14 0.13 0.08 0.06
Table 3.3 Average Pd values on the ten NASA datasets

         SVM   CC4.5  NB    CEL   CBNN  CDDL
Average  0.47  0.41   0.35  0.39  0.64  0.79
comparatively better results in contrast with other methods. We can also observe
that the Pd values of CDDL, which are presented with boldface, are higher than
the corresponding values of all other methods. CDDL achieves the highest Pd
values on all datasets. The results indicate that the proposed CDDL approach takes
the misclassification costs into consideration, which makes the prediction tend to
classify the defective-free modules as the defective ones in order to obtain higher
Pd values.
We calculate the average Pd values over the 10 NASA datasets in Table 3.3. The
average Pd value of our approach is higher than that of all other methods, and
CDDL improves the average Pd value by at least 0.15 (= 0.79 − 0.64).
Table 3.4 shows the F-measure values of our approach and the compared methods
on 10 NASA datasets. In Table 3.4, the F-measure values of CDDL are better than
those of the other methods on all datasets, which means that our proposed approach
outperforms the other methods and achieves the desired prediction effect. According
to the average F-measure values shown in Table 3.4, CDDL improves the average
F-measure value at
Table 3.4 F-measure values on ten NASA datasets

Datasets  SVM   CC4.5  NB    CEL   CBNN  CDDL
CM1       0.20  0.25   0.32  0.27  0.33  0.38
JM1       0.29  0.34   0.33  0.33  0.38  0.40
KC1       0.29  0.39   0.38  0.36  0.41  0.47
KC3       0.38  0.38   0.38  0.33  0.38  0.44
MC2       0.52  0.48   0.45  0.49  0.56  0.63
MW1       0.27  0.27   0.31  0.27  0.33  0.38
PC1       0.35  0.32   0.28  0.32  0.32  0.41
PC3       0.28  0.29   0.29  0.36  0.38  0.42
PC4       0.47  0.49   0.36  0.48  0.46  0.55
PC5       0.16  0.48   0.33  0.36  0.37  0.59
Average   0.32  0.37   0.34  0.35  0.39  0.47
Table 3.5 P-values between CDDL and other compared methods on ten NASA datasets

Dataset  CDDL vs. SVM   CDDL vs. CC4.5  CDDL vs. NB    CDDL vs. CEL   CDDL vs. CBNN
CM1      1.23 × 10−8    3.51 × 10−6     4.24 × 10−4    1.80 × 10−4    1.01 × 10−4
JM1      7.51 × 10−18   2.33 × 10−13    1.27 × 10−14   1.58 × 10−13   0.0564
KC1      1.20 × 10−14   1.23 × 10−9     8.38 × 10−13   2.80 × 10−11   9.69 × 10−6
KC3      0.0265         0.0089          3.22 × 10−4    1.61 × 10−4    4.24 × 10−4
MC2      1.26 × 10−4    2.61 × 10−5     1.13 × 10−8    7.58 × 10−6    1.01 × 10−4
MW1      1.14 × 10−3    2.31 × 10−4     1.10 × 10−3    1.84 × 10−5    2.20 × 10−3
PC1      2.64 × 10−4    2.41 × 10−5     1.60 × 10−8    1.69 × 10−5    1.68 × 10−8
PC3      7.79 × 10−14   7.73 × 10−9     1.04 × 10−8    4.03 × 10−5    4.31 × 10−5
PC4      7.32 × 10−8    7.26 × 10−4     2.81 × 10−16   4.26 × 10−6    1.75 × 10−10
PC5      3.01 × 10−18   7.00 × 10−9     1.90 × 10−14   1.30 × 10−12   2.13 × 10−11
least by 0.08 (= 0.47 − 0.39). To sum up, Tables 3.3 and 3.4 show that our approach
achieves the best Pd and F-measure values.
To statistically analyze the F-measure results given in Table 3.4, we conduct a
statistical test, namely McNemar's test [9]. This test can assess the statistical significance
of the difference between CDDL and each of the other methods. Here, McNemar's test uses
a significance level of 0.05: if the p-value is below 0.05, the performance difference between
the two compared methods is considered statistically significant. Table 3.5 shows the
p-values between CDDL and the other compared methods on the 10 NASA datasets, where
only one value is slightly above 0.05. According to Table 3.5, the proposed approach
indeed makes a significant difference in comparison with the other methods for software
defect prediction.
3.1.2 Collaborative Representation Classification Based
Software Defect Prediction
3.1.2.1 Methodology
Figure 3.3 shows the flowchart of defect prediction in our approach, which includes
three steps. The first step is the Laplace sampling process for the defective-free modules
to construct the training dataset. Second, the prediction model is trained by
using the CRC based learner. Finally, the CRC based predictor classifies whether
new modules are defective or defective-free. In metric based software defect
prediction, the number of defective-free modules is much larger than that of
defective ones, that is, the class imbalance problem may occur. In this section, we
conduct Laplace score sampling for the training samples, which solves the class
imbalance problem effectively.
Sparse representation classification (SRC) represents a testing sample collabo-
ratively by samples of all classes. In SRC, there are enough training samples for
each class so that the dictionary is over-complete. Unfortunately, the number of
defective modules is usually quite small. If we use such an under-complete dictionary
to represent a defective module, the representation error may be large and the
classification will be unstable. Fortunately, one fact in software defect prediction
is that software modules share similarities: some samples from one class may be
very helpful to represent the testing samples of other classes. In CRC, this "lack of
samples" problem is solved by taking the software modules from the other class as
possible samples of each class.
The main idea of the CRC technique is that the information of a signal can be
collaboratively represented by a linear combination of a few elementary signals.
We utilize A = [A_1, A_2] ∈ R^{m×n} to denote the set of training samples processed by
Laplace sampling, and y denotes a testing sample. In order to collaboratively
represent the query sample using A with a low computational burden, we use the
regularized least square method as follows:

\hat{X} = \arg\min_{X} \left\{ \|y - A X\|_2^2 + \lambda \|X\|_2^2 \right\}    (3.7)
Fig. 3.3 CRC based software defect prediction flowchart: Laplace sampling of the software defect
database yields the training instances, the CRC based learner builds the prediction model, and the
CRC based predictor (CRC_RLS) classifies test instances as defective or defective-free
where .λ is the regularization parameter. The role of the regularization term is
twofold. First, it makes the least square solution stable. Second, it introduces a
certain amount of “sparsity” to the solution .X̂ while this sparsity is much weaker
than that by .l1-norm.
The solution of the collaborative representation with regularized least square in
Eq. 3.7 can be easily and analytically derived as

\hat{X} = \left( A^T A + \lambda I \right)^{-1} A^T y    (3.8)
Let P = (A^T A + \lambda I)^{-1} A^T. Clearly, P is independent of y, so it can be pre-
calculated as a projection matrix. Hence, a query sample y can be simply projected
onto P via Py, which makes the collaborative representation very fast.
After training the CRC based learner, we can use the collaborative representation
classification with regularized least square (CRC_RLS) algorithm to do prediction.
For a test sample y, we code y over A and get \hat{X}. In addition to the class-
specific representation residual \|y - A_i \hat{X}_i\|_2, where \hat{X}_i is the coefficient vector
associated with class i (i = 1, 2), the l_2-norm "sparsity" \|\hat{X}_i\|_2 can also bring
some discrimination information for classification. Thus we use both of them in
classification and calculate the regularized residual of each class by using r_i =
\|y - A_i \hat{X}_i\|_2 / \|\hat{X}_i\|_2. The test sample y is assigned to the ith class corresponding
to the smallest regularized residual r_i.
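As a concrete illustration of Eqs. 3.7 and 3.8 and the regularized-residual rule, the following numpy sketch (an assumption-laden toy, not the chapter's implementation) precomputes the projection matrix P and classifies a query module.

import numpy as np

def crc_rls_fit(A, lam):
    """A: (d, n) training matrix whose columns are modules. Returns projection P (Eq. 3.8)."""
    d, n = A.shape
    return np.linalg.inv(A.T @ A + lam * np.eye(n)) @ A.T          # (n, d)

def crc_rls_predict(y, A, P, class_index):
    """class_index: length-n array giving the class (0/1) of each column of A."""
    x_hat = P @ y                                                  # collaborative code of y
    residuals = []
    for c in (0, 1):
        idx = np.where(class_index == c)[0]
        x_c = x_hat[idx]
        r_c = np.linalg.norm(y - A[:, idx] @ x_c) / (np.linalg.norm(x_c) + 1e-12)
        residuals.append(r_c)
    return int(np.argmin(residuals))                               # smallest regularized residual

# Toy usage: 20 training modules with 15 metrics each, two classes.
rng = np.random.default_rng(1)
A = rng.standard_normal((15, 20)); labels = np.array([0] * 10 + [1] * 10)
P = crc_rls_fit(A, lam=0.5)
pred = crc_rls_predict(rng.standard_normal(15), A, P, labels)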
3.1.2.2 Experiments
In the experiment, ten datasets from NASA Metrics Data Program are taken as the
test data. We compare the proposed approach with several representative software
defect prediction methods, including Compressed C4.5 decision tree (CC4.5),
weighted Naïve Bayes (NB), cost-sensitive boosting neural network (CBNN), and
coding based ensemble learning (CEL).
3.1.2.3 Discussions
We use recall (Pd), false positive rate (Pf), precision (Pre), and F-measure as
prediction accuracy evaluation indexes. A good prediction model should achieve
high values of both recall and precision. However, there exists a trade-off between
precision and recall. F-measure is the harmonic mean of precision and recall.
Note that these quality indexes are commonly used in the field of software defect
prediction. Table 3.6 shows the average Pd, Pf, Pre, and F-measure values of our
CSDP approach and other compared methods on ten NASA datasets, where each
value is the mean of 20 random runs. Our approach can acquire better prediction
Table 3.6 Average Pd, Pf, Pre, and F-measure values of 20 random runs on ten NASA datasets

Evaluation indexes  CC4.5  NB     CEL    CBNN   CSDP
Pd                  0.408  0.354  0.394  0.637  0.745
Pf                  0.140  0.152  0.148  0.260  0.211
Pre                 0.342  0.347  0.324  0.288  0.343
F-measure           0.371  0.342  0.354  0.390  0.465
accuracy than the other methods. In particular, our approach improves the average Pd at
least by 16.95% (= (0.745 − 0.637)/0.637) and the average F-measure at least by
19.23% (= (0.465 − 0.390)/0.390).
3.2 Semi-supervised WPDP
3.2.1 Sample-Based Software Defect Prediction with Active
and Semi-supervised Learning
3.2.1.1 Methodology
Software defect prediction, which aims to predict whether a particular software
module contains any defects, can be cast into a classification problem in machine
learning, where software metrics are extracted from each software module to form
an example with manually assigned labels defective (having one or more defects)
and non-defective (no defects). A classifier is then learned from these training
examples in the purpose of predicting the defect-proneness of unknown software
modules. In this section, we propose a sample-based defect prediction approach
which does not rely on the assumption that the current project has the same defect
characteristics as the historical projects.
Given a newly finished project, unlike the previous studies that leverage the
modules in historical projects for classifier learning, sample-based defect prediction
manages to sample a small portion of modules for extensive testing in order
to reliably label the sampled modules, while the defect-proneness of unsampled
modules remains unknown. Then, a classifier is constructed based on the sample of
software modules (the labeled data) and expected to provide accurate predictions
for the unsampled modules (unlabeled data). Here, conventional machine learners
(e.g., logistic regression, decision tree, Naive Bayes, etc.) can be applied to the
classification.
In practice, modern software systems often consist of hundreds or even thousands
of modules. An organization is usually not able to afford extensive testing for
all modules especially when time and resources are limited. In this case, the
organization can only manage to sample a small percentage of modules and test
them for defect-proneness. A classifier would then have to be learned from a small training
set with the defect-proneness labels. Thus, the key for the sample-based defect
prediction to be cost-effective is to learn a well-performing classifier while keeping
the sample size small.
To improve the performance of sample-based defect prediction, we propose to
apply semi-supervised learning for classifier construction, which firstly learns an
initial classifier from a small sample of labeled training set and refines it by further
exploiting a larger number of available unlabeled data.
In semi-supervised learning, an effective paradigm is known as disagreement-
based semi-supervised learning, where multiple learners are trained for the same
task and the disagreements among the learners are exploited during learning. In
this paradigm, unlabeled data can be regarded as a special information exchange
“platform.” If one learner is much more confident on a disagreed unlabeled example
than other learner(s), then this learner will teach other(s) with this example; if
all learners are comparably confident on a disagreed unlabeled example, then this
example may be selected for query. Many well-known disagreement-based semi-
supervised learning methods have been developed.
In this study, we apply CoForest for defect prediction. It works based on a
well-known ensemble learning algorithm named random forest [10] to tackle the
problems of determining the most confident examples to label and producing the
final hypothesis. The pseudocode of CoForest can be found in Li and Zhou [12]. Briefly,
it works as follows. Let L denote the labeled dataset and U denote the unlabeled
dataset. First, N random trees are initiated from the training sets bootstrap-sampled
from the labeled dataset L for creating a random forest. Then, in each learning
iteration, each random tree is refined with the original labeled examples L and the
newly labeled examples .L' selected by its concomitant ensemble (i.e., the ensemble
of the other random trees except for the current tree). The learning process iterates
until certain stopping criterion is reached. Finally, the prediction is made based
on the majority voting from the ensemble of random trees. Note that in this way,
CoForest is able to exploit the advantage of both semi-supervised learning and
ensemble learning simultaneously, as suggested in Xu et al. [11].
In CoForest, the stopping criterion is essential to guarantee a good performance.
Li and Zhou [12] derived a stopping criterion based on the theoretical findings
in Angluin and Laird [13]. By enforcing the worst case generalization error of a
random tree in the current round to be less than that in the preceded round, they
derived that semi-supervised learning process will be beneficial if the following
condition is satisfied
\frac{\hat{e}_{i,t}}{\hat{e}_{i,t-1}} < \frac{W_{i,t-1}}{W_{i,t}} < 1    (3.9)
where \hat{e}_{i,t} and \hat{e}_{i,t-1} denote the estimated classification errors of the i-th random tree
in the t-th and (t − 1)-th rounds, respectively, W_{i,t} and W_{i,t-1} denote the total
weights of its newly labeled sets L'_{i,t} and L'_{i,t-1} in the t-th and (t − 1)-th rounds,
respectively, and i ∈ {1, 2, . . . , N}. For detailed information on the derivation,
please refer to Li and Zhou [12].
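As a small illustration, the stopping condition of Formula 3.9 can be checked per tree with a helper like the one below (a sketch; the variable names are ours).

def keep_refining(err_t, err_prev, weight_t, weight_prev):
    """Return True if another semi-supervised round still satisfies Formula 3.9
    for this random tree (err_*: estimated errors; weight_*: total weights of the
    newly labeled sets in the current and previous rounds)."""
    if err_prev <= 0 or weight_t <= 0:
        return False
    return (err_t / err_prev) < (weight_prev / weight_t) < 1.0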
The CoForest has been successfully applied to the domain of computer-aided
medical diagnosis, where conducting a large amount of routine examinations places
heavy burden on medical experts. The CoForest algorithm was applied to help learn
hypothesis from diagnosed and undiagnosed samples in order to assist the medical
experts in making diagnosis.
Although a random sample can be used to approximate the properties of all the
software modules in the current project, random sampling is apparently not data-
efficient, since it neglects the "needs" of the learners for achieving good performance
and hence may contain redundant information that the learner has already captured
during the learning process. Intuitively, if a learner is trained using the data that it
needs most for improving its performance, it may require less labeled data than a
learner trained without caring about its needs for learning. Put another way, if the
same number of labeled data is used, the learner that is trained using the labeled
data it needs most would achieve better performance than the learner that is trained
without caring about its needs for learning.
Active learning, which is another major approach for learning in the presence of a
large number of unlabeled data, aims to achieve good performance by learning with
as few labeled data as possible. It assumes that the learner has some control over
the data sampling process by allowing the learner to actively select and query the
label of some informative unlabeled example which, if the label were known, may
contribute the most to improving the prediction accuracy. Since active learning
and semi-supervised learning exploit the merit of unlabeled data from different
perspectives, they have been further combined to achieve better performance in
image retrieval [86], Email spam detection [39], etc. Recently, Wang and Zhou [68]
analytically showed that combining active learning and semi-supervised learning is
beneficial in exploiting unlabeled data.
In this study, we extend CoForest to incorporate the idea of active learning into
the sample-based defect prediction. We propose a novel active semi-supervised
learning method called ACoForest, which leverages the advantages from both
disagreement-based active learning and semi-supervised learning. In detail, let L
and U denote the labeled set and unlabeled set, respectively. Similar to CoForest,
ACoForest is firstly initiated by constructing a random forest with N random trees
over L. Then, ACoForest iteratively exploits the unlabeled data via both active
learning and semi-supervised learning. In each iteration, ACoForest firstly labels
all the unlabeled examples and computes the degree of agreement of the ensemble
on each unlabeled example. Then, it reversely ranks all the unlabeled data according
to the degree of agreement and selects the M top-most disagreed unlabeled data to
query their labels from the user. These unlabeled data as well as their corresponding
labels are then used to augment L. After that, ACoForest exploits the remaining
unlabeled data just as CoForest does.
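The active-selection step can be sketched as follows, using scikit-learn's RandomForestClassifier purely as a stand-in ensemble (this is an illustrative approximation of ACoForest's disagreement ranking, not the authors' implementation): the predictions of the individual trees are collected, and the M unlabeled modules on which the trees agree least are returned for querying.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def most_disagreed(forest, X_unlabeled, M):
    # votes[t, i] is the class predicted by tree t for unlabeled example i
    votes = np.array([tree.predict(X_unlabeled) for tree in forest.estimators_])
    pos_fraction = votes.mean(axis=0)          # fraction of trees voting for class 1
    agreement = np.abs(pos_fraction - 0.5)     # 0 means maximal disagreement
    return np.argsort(agreement)[:M]           # indices of the M most disagreed examples

# Toy usage: 30 labeled and 200 unlabeled modules with 10 metrics each.
rng = np.random.default_rng(2)
X_l, y_l = rng.standard_normal((30, 10)), rng.integers(0, 2, 30)
X_u = rng.standard_normal((200, 10))
forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_l, y_l)
query_idx = most_disagreed(forest, X_u, M=10)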
3.2.1.2 Experiments
To evaluate the effectiveness of sample-based defect prediction methods, we
perform experiments using datasets available at the PROMISE website. We have
collected the Eclipse, Lucene, and Xalan datasets.
The Eclipse datasets contain 198 attributes, including code complexity metrics
(such as LOC, cyclomatic complexity, number of classes, etc.) and metrics about
abstract syntax trees (such as number of blocks, number of if statements, method
references, etc.) (Zimmermann et al. 2007). The Eclipse defect data was collected
by mining Eclipse’s bug databases and version archives.
In this study, we experiment with Eclipse 2.0 and 3.0. To show the generality of
the results, we use the package-level data for Eclipse 3.0 and the file-level data for
Eclipse 2.0. We also choose two Eclipse components: JDT.Core and SWT in Eclipse
3.0 to evaluate the defect prediction performance for smaller Eclipse projects. We
only examine the pre-release defects, which are defects reported in the last six
months before release.
The Lucene dataset we use contains metric and defect data for 340 source
files in Apache Lucene v2.4. The Xalan dataset contains metric and defect data
for 229 source files in Apache Xalan v2.6. Both datasets contain 20 attributes,
including code complexity metrics (e.g., average cyclomatic complexity), object-
oriented metrics (e.g., depth of inheritance tree), and program dependency metrics
(e.g., number of dependent classes).
Having collected the data, we then apply the three methods described above
to construct defect prediction models from a small sample of modules and use them
to predict the defect-proneness of unsampled modules. We evaluate the performance
of all the methods in terms of precision (P), recall (R), F-measure (F), and
Balance measure (B), which are defined as follows:
P = \frac{tp}{tp + fp}    (3.10)

R = \frac{tp}{tp + fn}    (3.11)

F = \frac{2PR}{P + R}    (3.12)

B = 1 - \sqrt{\frac{1}{2}\left[\left(\frac{fn}{tp + fn}\right)^2 + \left(\frac{fp}{tn + fp}\right)^2\right]}    (3.13)
where tp, fp, tn, and fn are the number of defective modules that are predicted as
defective, the number of non-defective modules that are predicted as defective, the
number of non-defective modules that are predicted as non-defective, and the number
of defective modules that are predicted as non-defective, respectively.
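For reference, the four measures translate directly into code; the small helper below (a sketch with hypothetical names, assuming the denominators are non-zero) mirrors Eqs. 3.10–3.13.

import math

def evaluate(tp, fp, tn, fn):
    """Compute precision, recall, F-measure, and Balance from the confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    balance = 1 - math.sqrt(((fn / (tp + fn)) ** 2 + (fp / (tn + fp)) ** 2) / 2)
    return precision, recall, f_measure, balance

# Example: 40 defective modules found, 10 missed, 20 false alarms, 130 true negatives.
print(evaluate(tp=40, fp=20, tn=130, fn=10))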
3.2.1.3 Discussions
Our experiments show that a smaller sample can achieve similar defect prediction
performance as larger samples do. The sample can serve as an initial labeled training
set that represents the underlying data distribution of the entire dataset. Thus, if there
are no sufficient historical datasets for building an effective defect prediction model
for a new project, we can randomly sample a small percentage of modules to test,
obtain their defect status (defective or non-defective), and then use the collected
sample to build a defect prediction model for this project.
Our experiments also show that, in general, sampling with semi-supervised
learning and active learning can achieve better prediction performance than sam-
pling with conventional machine learning techniques. A sample may contain much
information that a conventional machine learner has already learned well but may
contain little information that the learner needs for improving the current prediction
accuracy. The proposed CoForest and ACoForest learners take the needs for learning
into account and obtain information needed for improving performance from the
unsampled modules.
Both CoForest and ACoForest methods work well for sample-based defect
prediction. ACoForest also supports the active selection of the modules—it can
actively suggest to the QA team which modules should be chosen in order to increase the
prediction performance. Thus, in order to apply ACoForest, interaction with test
engineers is required. If such interaction is allowed (which implies that more time
and effort are available), we can apply the ACoForest method. If such interaction is
not allowed due to limited time and resources, we can apply the CoForest method.
In our approach, we draw a random sample from the population of modules.
To ensure proper statistical inference and to ensure the cost effectiveness of the
proposed method, the population size should be large enough. Therefore, the
proposed method is suitable for large-scale software systems.
The simple random sampling method requires that each individual in a sample
be collected entirely by chance with the same probability. Selection bias may
be introduced if the module sample is collected simply by convenience, or from a
single developer/team. The selection bias can lead to non-sampling errors (errors
caused by human rather than sampling) and should be avoided.
The defect data for a sample can be collected through quality assurance activities
such as software testing, static program checking, and code inspection. As the
sample will be used for prediction, these activities should be carefully carried out so
that most of the defects can be discovered. Incorrect sample data may lead to incorrect
estimates of the population.
In our experiments, we used the public defect dataset available in the PROMISE
repository. Although this dataset has been used by many other studies [14–18], our
results may be under threat if the dataset is seriously flawed (e.g., there were
major problems in bug data collection and recording). Also, all the data used are
collected from open source projects. It is desirable to replicate the experiments on
industrial, in-house developed projects to further evaluate their validity. This will be
our important future work.
References
1. Rosasco L, Verri A, Santoro M, Mosci S, Villa S (2009) Iterative projection methods for
structured sparsity regularization
2. Yang M, Zhang L, Yang J, Zhang D (2010) Metaface learning for sparse representation based
face recognition. In: Proceedings of the International Conference on Image Processing, pp
1601–1604. https://doi.org/10.1109/ICIP.2010.5652363
3. Jiang Y, Cukic B, Menzies T (2008) Cost curve evaluation of fault prediction models. In:
Proceedings of the 19th International Symposium on Software Reliability Engineering, pp
197–206. https://doi.org/10.1109/ISSRE.2008.54
4. Elish KO, Elish MO (2008) Predicting defect-prone software modules using support vector
machines. J Syst Softw 81(5):649–660. https://doi.org/10.1016/j.jss.2007.07.040
5. Wang J, Shen B, Chen Y (2012) Compressed C4.5 models for software defect prediction. In:
Proceedings of the 2012 12th International Conference on Quality Software, pp 13–16.
https://doi.org/10.1109/QSIC.2012.19
6. Wang T, Li W-h (2010) Naive Bayes software defect prediction model. In: Proceedings of the
2010 International Conference on Computational Intelligence and Software Engineering
7. Sun Z, Song Q, Zhu X (2012) Using coding-based ensemble learning to improve software
defect prediction. IEEE Trans Syst Man Cybern Part C 42(6):1806–1817.
https://doi.org/10.1109/TSMCC.2012.2226152
8. Zheng J (2010) Cost-sensitive boosting neural networks for software defect prediction. Expert
Syst Appl 37(6):4537–4543. https://doi.org/10.1016/j.eswa.2009.12.056
9. Yambor WS, Draper BA, Beveridge JR (2002) Analyzing PCA-based face recognition
algorithms: eigenvector selection and distance measures. In: Empirical Evaluation Methods in
Computer Vision, pp 39–60. World Scientific
10. Breiman L (2001) Random forests. Mach Learn 45:5–32
11. Xu J-M, Fumera G, Roli F, Zhou Z-H (2009) Training SpamAssassin with active
semi-supervised learning. In: Proceedings of the 6th Conference on Email and Anti-Spam
(CEAS'09), pp 1–8
12. Li M, Zhou Z-H (2007) Improve computer-aided diagnosis with machine learning techniques
using undiagnosed samples. IEEE Trans Syst Man Cybern Part A Syst Humans 37(6):1088–1098
13. Angluin D, Laird P (1988) Learning from noisy examples. Mach Learn 2:343–370
14. Koru AG, Liu H (2005) Building effective defect-prediction models in practice. IEEE Softw
22(6):23–29
15. Menzies T, Greenwald J, Frank A (2006) Data mining static code attributes to learn defect
predictors. IEEE Trans Softw Eng 33(1):2–13
16. Zhang H, Nelson A, Menzies T (2010) On the value of learning from defect dense components
for software defect prediction. In: Proceedings of the 6th International Conference on
Predictive Models in Software Engineering, pp 1–9
17. Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B (2009) Cross-project defect
prediction: a large scale experiment on data vs. domain vs. process. In: Proceedings of the
7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT
Symposium on the Foundations of Software Engineering, pp 91–100
18. Kim S, Zimmermann T, Whitehead Jr EJ, Zeller A (2007) Predicting faults from cached history.
In: 29th International Conference on Software Engineering (ICSE'07), pp 489–498
Chapter 4
Cross-Project Defect Prediction
Abstract The challenge of CPDP methods is the distribution difference between
the data from different projects. Transfer learning can transfer the knowledge from
the source domain to the target domain with the aim to minimize the domain
difference between different domains. However, most existing methods reduce the
distribution discrepancy in the original feature space, where the features are high-
dimensional and nonlinear, which makes it hard to reduce the distribution distance
between different projects. In this chapter, we propose a manifold embedded
distribution adaptation (MDA) approach to narrow the distribution gap in the manifold
feature subspace. For cross-project SDP, we found that the class imbalanced source
usually leads to misclassification of defective instances. However, only one work has
paid attention to this cross-project class imbalance problem. Subclass discriminant
analysis (SDA), an effective feature learning method, is introduced to solve the
problems. It can learn features with more powerful classification ability from
original metrics. Within-project and cross-project class imbalance problems greatly
affect prediction performance, and we provide a unified and effective prediction
framework for both problems. We refer to CPDP in this scenario as cross-project
semi-supervised defect prediction (CSDP). Although some within-project semi-
supervised defect prediction (WSDP) methods have been developed in recent years,
there still exists much room for improvement on prediction performance. We aim
to provide a unified and effective solution for both CSDP and WSDP problems.
We introduce the semi-supervised dictionary learning technique and propose a
cost-sensitive kernelized semi-supervised dictionary learning (CKSDL) approach.
CKSDL can make full use of the limited labeled defect data and a large amount of
unlabeled data in the kernel space.
4.1 Basic CPDP
4.1.1 Manifold Embedded Distribution Adaptation
4.1.1.1 Methodology
MDA consists of two processes. First, MDA performs manifold feature learning
to accommodate the feature distortion problem in the transformation process due
to the nonlinear distribution of the high-dimensional data. Since the features from
different projects in manifold have some similar geometrical structures and intrinsic
representations, MDA can effectively exploit latent information from different
projects by performing manifold feature learning. Second, to address the challenge
of distribution difference from different projects, MDA performs joint distribution
adaptation and considers both the importance of marginal and conditional distribu-
tions. Manifold feature learning and joint distribution adaptation are combined so as to
make the most of both.
Manifold Feature Learning
Manifold feature learning tries to discover the intrinsic geometry of the manifold
and project the data onto a lower-dimensional space that preserves some properties
of the manifold, such as distances, angles, or local neighborhoods. Manifold feature
learning can be useful for data visualization, clustering, classification, and other
tasks that benefit from reducing the complexity and noise of the data. Manifold
feature learning plays an important role since the features in manifold space usually
have a good geometric structure to avoid feature distortion. We map data from
different projects into a common latent space while keeping the geometry structures
of the input manifold; simultaneously the captured connections are mapped onto
the common latent space. MDA learns the mapping function .g(·) in the Grassmann
manifold .G (dk). .dk is the dimension of the subspaces of the different project data.
We utilize the geodesic flow kernel (GFK) [1] to learn .g(·) for its computational
efficiency.
G can be regarded as a collection of all d_k-dimensional subspaces. Principal com-
ponent analysis [1] is performed on the source project and the target project to obtain
two corresponding orthogonal subspaces. Each original subspace corresponds to
one point in G. The geodesic flow between two points draws a path between the two
subspaces. Constructing a geodesic flow from two points is equivalent to transforming the
original features into an infinite-dimensional feature space. The new features of the data
can be represented as z = g(x). From [1], the inner product of transformed features
gives rise to a positive semi-definite GFK:

\langle z_i, z_j \rangle = x_i^T G x_j    (4.1)
where G is a positive semi-definite matrix. The original data can be transformed into
the Grassmann manifold with z = g(x) = \sqrt{G} x. Here \sqrt{G} is just an expression form and
cannot be computed directly, where x is an instance from the source or target project
and X = [X_S, X_T]. This square root can be calculated by the Denman–Beavers algorithm
[2]. We use Z = \sqrt{G} X as the manifold feature representation in the following
sections.
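A compact sketch of this mapping step is shown below; it assumes the GFK matrix G has already been learned and uses scipy's matrix square root in place of the Denman–Beavers iteration mentioned above (function names are ours).

import numpy as np
from scipy.linalg import sqrtm

def manifold_features(G, X):
    """G: (d, d) positive semi-definite GFK matrix; X: (d, m+n) stacked source/target data."""
    G_sqrt = np.real(sqrtm(G))      # discard tiny imaginary parts from numerical error
    return G_sqrt @ X               # Z = sqrt(G) X, the manifold feature representation

# Toy usage with a random PSD matrix standing in for the GFK.
rng = np.random.default_rng(3)
B = rng.standard_normal((6, 6)); G = B @ B.T
Z = manifold_features(G, rng.standard_normal((6, 12)))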
Joint Distribution Adaptation
Distribution adaptation reduces the distribution difference between different
projects by minimizing the predefined distance measures. In the CPDP situation, the
source and target projects have different data with different marginal and conditional
distributions. To significantly reduce the distribution difference between different
projects, joint distribution adaptation for CPDP considers the marginal and the
conditional distribution adaptation at the same time. The distance between .S and .T
can be represented as follows:
.
d(S, T ) =(1 − μ)d (P (zS) , P (zT ))
+ μd (P (yS | zS) , P (yT | zT ))
(4.2)
where P(z_S) and P(z_T) denote the marginal distributions, and P(y_S | z_S) and
P(y_T | z_T) denote the conditional distributions. d(P(z_S), P(z_T)) denotes the marginal
distribution adaptation, and d(P(y_S | z_S), P(y_T | z_T)) denotes the conditional
distribution adaptation. μ ∈ [0, 1] is a parameter that adjusts the importance of
the two kinds of distributions. The label set y_T is not available in advance, which makes
the calculation of the term P(y_T | z_T) infeasible. We follow the method in [3] and use
P(z_T | y_T) as an approximation of P(y_T | z_T); thus we use a base classifier trained
on S to obtain pseudo labels for z_T. To calculate the divergence between the source
and target data distributions, the maximum mean discrepancy (MMD) is applied.
MMD is an effective measure and has been widely used in many methods. Then
Formula 4.2 is formulated as follows:

d(S, T) = (1 - \mu) \left\| \frac{1}{m} \sum_{z_i \in S} z_i - \frac{1}{n} \sum_{z_j \in T} z_j \right\|_{\mathcal{H}}^2 + \mu \sum_{c=1}^{C} \left\| \frac{1}{m_c} \sum_{z_i \in S_c} z_i - \frac{1}{n_c} \sum_{z_j \in T_c} z_j \right\|_{\mathcal{H}}^2

where C denotes the number of classes of the label in CPDP, S_c and T_c denote
the instances with the cth class label, m_c and n_c denote the number of instances of
the cth class label from the source and target projects, and \mathcal{H} denotes the reproducing
kernel Hilbert space induced by the mapping in manifold space. Using matrix tricks,
we obtain the following formula:
\min_A \; A^T Z \left( (1 - \mu) M_0 + \mu M_c \right) Z^T A + \lambda \|A\|_F^2 \quad \text{s.t.} \quad A^T Z H Z^T A = I    (4.3)
where Z = [Z_S, Z_T] combines the source project and target project data, and M_0 and M_c
denote the MMD matrices. A denotes the transformation matrix, and \|A\|_F^2 is its
Frobenius norm. I ∈ R^{(m+n)×(m+n)} is an identity matrix, and H = I − (1/(m+n))\mathbf{1}
is the centering matrix, similar to this work [4]. The constraint
condition ensures that A^T Z preserves the inner attributes of the original data. The
parameter λ is a regularization term. The MMD matrices can be calculated by the
following formulas:
(M_0)_{ij} = \begin{cases} \frac{1}{m} & \text{if } z_i, z_j \in S \\ \frac{1}{n} & \text{if } z_i, z_j \in T \\ -\frac{1}{mn} & \text{otherwise} \end{cases}    (4.4)

(M_c)_{ij} = \begin{cases} \frac{1}{m_c} & \text{if } z_i, z_j \in S_c \\ \frac{1}{n_c} & \text{if } z_i, z_j \in T_c \\ -\frac{1}{m_c n_c} & \text{if } z_i \in S_c, z_j \in T_c \text{ or } z_i \in T_c, z_j \in S_c \\ 0 & \text{otherwise} \end{cases}    (4.5)
Furthermore, we build a Laplacian matrix by linking nearby instances, in order to better
exploit the similarity relationships when learning the transformation matrix. It makes
similar instances stay close to each other in the shared space. We define the inter-project
similarity matrix as

W_{ij} = \begin{cases} \mathrm{sim}(z_i, z_j) & \text{if } z_i \in N(z_j) \text{ or } z_j \in N(z_i) \\ 0 & \text{otherwise} \end{cases}    (4.6)

where sim(·, ·) is a similarity measure between two instances and N(z_i) denotes the
nearest neighbors of z_i. Then we introduce the Laplacian matrix L = D − W, with
D_{ii} = \sum_{j=1}^{m+n} W_{ij}. The final regularization function can be formulated as

R_L = \sum_{i,j=1}^{m+n} (z_i - z_j)^2 W_{ij} = \sum_{i,j=1}^{m+n} z_i L_{ij} z_j = Z L Z^T    (4.7)
The objective function of our MDA approach is

\min_A \; A^T Z \left( (1 - \mu) M_0 + \mu M_c + \beta L \right) Z^T A + \lambda \|A\|_F^2 \quad \text{s.t.} \quad A^T Z H Z^T A = I    (4.8)
Algorithm 1 The pseudocode of MDA
Input: Data matrices S = {X_S, Y_S} and T = {X_T} from the source and target projects; hyperparameters λ, β, μ.
Output: Prediction labels of the target project data.
1: Learn the GFK G to obtain the feature representations in manifold space as in Gong et al. [1].
2: Get the transformed data representation Z = \sqrt{G} X in the manifold subspace.
3: Train a basic classifier using S, then get the pseudo labels of the target data.
4: Compute the MMD matrices M_0 and M_c by 4.4 and 4.5.
5: Use matrix tricks to rewrite 4.2 and obtain the joint distribution distance representation 4.3.
6: Construct the Laplacian matrix L = D − W by 4.6 and 4.7.
7: Construct the objective function 4.8 by incorporating 4.3 and 4.7.
8: Construct the Lagrange function 4.9 for problem 4.8.
9: Solve the generalized eigendecomposition problem in 4.10 and obtain the d_l smallest eigenvalues.
10: Construct the transformation matrix A from the eigenvectors corresponding to the d_l smallest eigenvalues.
11: Construct the transformed data of the source and target data by using A.
12: Obtain the prediction labels of the target data by using the logistic regression (LR) classifier.
where β is a balance parameter. To solve Formula 4.8, we denote the Lagrange multipliers as
Φ = (Φ_1, Φ_2, . . . , Φ_{d_l}); then we rewrite Formula 4.8 as

L_A = A^T Z \left( (1 - \mu) M_0 + \mu M_c + \beta L \right) Z^T A + \lambda \|A\|_F^2 + \left( I - A^T Z H Z^T A \right) \Phi    (4.9)
Setting the derivative with respect to A to zero, ∂L_A/∂A = 0, we compute the eigenvectors
of the generalized eigenvector problem

\left( Z \left( (1 - \mu) M_0 + \mu M_c + \beta L \right) Z^T + \lambda I \right) A = Z H Z^T A \Phi    (4.10)

Finally, by solving Formula 4.10, we find its d_l smallest eigenvectors, and the
optimal transformation matrix A is obtained. Algorithm 1 presents the details of
MDA.
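Steps 8–11 of Algorithm 1 reduce to one generalized eigendecomposition; the sketch below (our illustration, with a small ridge added to keep the right-hand matrix positive definite, an assumption not stated in the text) uses scipy.linalg.eigh to recover the transformation matrix A.

import numpy as np
from scipy.linalg import eigh

def solve_transformation(Z, M0, Mc, L, H, mu, beta, lam, d_l):
    """Solve Eq. 4.10 and return the transformation matrix A and the transformed data."""
    left = Z @ ((1 - mu) * M0 + mu * Mc + beta * L) @ Z.T + lam * np.eye(Z.shape[0])
    right = Z @ H @ Z.T + 1e-6 * np.eye(Z.shape[0])   # ridge: numerical safeguard (assumption)
    # symmetrize to guard against round-off before calling the symmetric solver
    left, right = (left + left.T) / 2, (right + right.T) / 2
    eigvals, eigvecs = eigh(left, right)              # eigenvalues in ascending order
    A = eigvecs[:, :d_l]                              # keep the d_l smallest ones
    return A, A.T @ Z                                 # transformed source + target data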
4.1.1.2 Experiments
In an experiment, we employ 20 publicly projects from three defect datasets,
including AEEEM, NASA, and PROMISE datasets. In the experiment, we employ
three commonly used measures in defect prediction experiment to evaluate the
effectiveness of the MDA approach. The measures we used include F-measure, G-
measure, and AUC. We compared MDA with defect prediction models including
CamargoCruz [5], TCA + [6], CKSDL [7], CTKCCA [8], HYDRA [9], and
ManualDown [10]. For the comparison with CPDP methods, we specifically looked
at the CamargoCruz, ManualDown, CKSDL, and HYDRA methods. Herbold et al.
[11] evaluated 24 methods for CPDP. CamargoCruz method [5] always performs
40 4 Cross-Project Defect Prediction
the best in CPDP methods in [11]. ManualDown is an unsupervised method that
is suggested as the baseline model for comparison in [10] when developing a new
CPDP method. CKSDL [7] is an effective semi-supervised CPDP method. HYDRA
is an ensemble learning method that utilizes multiple source projects for CPDP.
Comparison with transfer learning methods: there are several successful CPDP
models based on transfer learning for comparison. TCA+ is an effective transfer
method based on the transfer component analysis technique [6]. CTKCCA [8] is one
of the state-of-the-art CPDP methods and is also a feature-based transfer learning
method. A total of 20 projects from the AEEEM, NASA, and PROMISE datasets act as
experimental data. We organize the cross-project setting as in previous studies [7, 8, 11]
and conduct the CPDP experiments. We consider two experimental settings to evaluate
MDA:
Case 1 One-to-one CPDP (MDA-O). Following the previous CPDP methods in
related work, we adopt the one-to-one setting of CPDP. For a given dataset, we use one
project of the dataset as the target project in turn, and each of the other projects of the
dataset is treated as the source project separately to conduct the cross-project prediction.
For example, if the project EQ in the AEEEM dataset is selected as the target project,
the remaining projects in the AEEEM dataset (JDT, LC, ML, and PDE) are separately
set as the training project once, and we get four groups of prediction results for EQ,
that is, JDT⇒EQ, LC⇒EQ, ML⇒EQ, and PDE⇒EQ, where the left side of ⇒ represents
the source project and the right side of ⇒ denotes the target project. Then the mean
performance of these four predictions for the target project EQ is reported in this
section. Finally, we report the mean prediction results of multiple cross-project pairs
for each target project.
Case 2 Many-to-one CPDP (MDA-M). For a given dataset, one project of this
dataset is selected as the target project, and all of the other projects of the dataset are
used as the source projects for one prediction. For example, if the project
EQ in AEEEM is selected as the target project, JDT, LC, ML, and PDE are all selected
as source projects. In other words, the cross-project prediction case is JDT, LC, ML,
PDE ⇒ EQ (Table 4.1).
In our method, we have several parameters. The parameters β and λ in Formula 4.8 are
set as β = 0.1 and λ = 0.1. The balance factor μ controls the weights of the two kinds
of distributions. Due to the different distributions of different datasets, the value
of μ varies across datasets. Taking MW1⇒CM1 (source project ⇒ target project)
as an example, we run MDA with 11 different values in the range 0, 0.1, 0.2, . . . ,
1, and finally we set the value of μ to 0.6. Tables 4.1 and 4.2 report the comparison
results of MDA versus the baselines on the AEEEM, NASA, and PROMISE datasets.
The values in bold denote the best performance.
These results show that MDA outperforms all baselines on the three indicators in
most cases. Comparison with CPDP methods: compared with the four CPDP methods
(CamargoCruz, ManualDown, CKSDL, and HYDRA), MDA-O achieves the best
average performance on all three indicators. Both
MDA-O and MDA-M perform better than the compared methods on most projects.
MDA-O improves the result at least by 24.4% ((0.4876 − 0.3919)/0.3919) in terms
Table 4.1 Comparison results in terms of F-measure on each project

Project      CamargoCruz  ManualDown  CKSDL   TCA+    CTKCCA  HYDRA   MDA-M   MDA-O
EQ           0.6592       0.6742      0.2709  0.4112  0.3530  0.5926  0.6531  0.6534
JDT          0.4732       0.3976      0.3522  0.4093  0.3495  0.5385  0.4638  0.5754
LC           0.2448       0.2046      0.3467  0.3631  0.3326  0.3774  0.2263  0.6637
ML           0.3238       0.2581      0.3642  0.3581  0.3530  0.5385  0.3252  0.6381
PDE          0.3249       0.3009      0.3507  0.4209  0.3495  0.2000  0.3168  0.6630
CM1          0.2663       0.2602      0.2127  0.3298  0.3309  0.2206  0.2846  0.3360
MW1          0.2326       0.2191      0.1850  0.3123  0.3345  0.3429  0.2223  0.2276
PC1          0.1809       0.1990      0.2241  0.3268  0.3544  0.3684  0.2196  0.2173
PC3          0.2924       0.2814      0.1780  0.4006  0.4490  0.3529  0.3333  0.3265
PC4          0.3273       0.3358      0.1589  0.3464  0.4467  0.6087  0.3572  0.3468
ant1.7       0.4582       0.4853      0.3497  0.4390  0.3177  0.3774  0.4780  0.4668
camel1.6     0.3420       0.3333      0.4614  0.3986  0.2404  0.1734  0.3486  0.3527
ivy2.0       0.3477       0.3188      0.3037  0.4510  0.2961  0.4400  0.2961  0.3281
jedit4.1     0.3992       0.2843      0.3028  0.1444  0.3588  0.4203  0.5300  0.4791
lucene2.4    0.4022       0.6454      0.2953  0.4441  0.3749  0.3273  0.6430  0.6599
poi3.0       0.3713       0.5729      0.2895  0.4117  0.4040  0.3333  0.6904  0.7486
synapse1.2   0.4056       0.4933      0.2583  0.3669  0.4099  0.5000  0.5307  0.5402
velocity1.6  0.4635       0.5609      0.2696  0.4598  0.4156  0.3447  0.5722  0.5576
xalan2.6     0.5186       0.6225      0.2652  0.4261  0.3967  0.3723  0.6861  0.6679
xerces1.3    0.3000       0.2279      0.3378  0.4033  0.3839  0.3200  0.2884  0.3038
average      0.3667       0.3838      0.2888  0.3812  0.3625  0.3919  0.4233  0.4876
Table 4.2 Comparison results in terms of AUC on each project

Project       CamargoCruz  ManualDown  CKSDL   TCA+    CTKCCA  HYDRA   MDA-M   MDA-O
EQ            0.7406       0.7137      0.5567  0.6572  0.6437  0.7666  0.7050  0.7874
JDT           0.7359       0.6212      0.6028  0.5606  0.6430  0.7394  0.7622  0.7640
LC            0.7159       0.5902      0.5660  0.6631  0.6456  0.7337  0.6610  0.7650
ML            0.7065       0.5690      0.5940  0.6164  0.6437  0.7394  0.7099  0.7502
PDE           0.6964       0.6343      0.5787  0.6628  0.6430  0.6532  0.6898  0.7301
CM1           0.7380       0.6932      0.5901  0.6274  0.6413  0.7392  0.7736  0.7845
MW1           0.7547       0.6593      0.5401  0.5885  0.6337  0.6921  0.7099  0.7716
PC1           0.6819       0.6631      0.5768  0.6602  0.6422  0.7334  0.7412  0.7452
PC3           0.7223       0.6833      0.5502  0.6461  0.6837  0.7645  0.7999  0.7823
PC4           0.7456       0.6919      0.5339  0.5759  0.6887  0.7675  0.7889  0.7725
ant1.7        0.6732       0.6947      0.5644  0.6442  0.5842  0.7331  0.8032  0.7661
camel1.6      0.5743       0.5611      0.5771  0.5794  0.5595  0.6838  0.6097  0.6064
ivy2.0        0.6797       0.7119      0.5969  0.7088  0.5516  0.7797  0.8246  0.7820
jedit4.1      0.6198       0.4613      0.6152  0.6439  0.6484  0.6763  0.7427  0.7350
lucene2.4     0.6284       0.5980      0.5855  0.5911  0.6647  0.5746  0.6116  0.6357
poi3.0        0.6154       0.6611      0.5371  0.6235  0.6867  0.6935  0.6847  0.7249
synapse1.2    0.6518       0.5823      0.5556  0.6211  0.6602  0.6762  0.6955  0.6805
velocity1.6   0.5990       0.6395      0.6093  0.6010  0.6569  0.6550  0.7149  0.6890
xalan2.6      0.5884       0.5988      0.5707  0.6821  0.6578  0.6743  0.7633  0.7454
xerces1.3     0.6092       0.4873      0.5838  0.6207  0.6392  0.6290  0.6263  0.6254
average       0.6739       0.6258      0.5742  0.6287  0.6409  0.7057  0.7209  0.7322
MDA-O improves the result by at least 24.4% ((0.4876 − 0.3919)/0.3919) in terms of average F-measure, 4.6% ((0.6474 − 0.6191)/0.6191) in terms of average G-measure, and 3.8% ((0.7322 − 0.7057)/0.7057) in terms of average AUC against the four CPDP baselines. Comparison with transfer learning methods: MDA-O achieves satisfactory performance on each project, and both MDA-O and MDA-M perform better than the compared methods on the average results. MDA-O achieves improvements of 27.9% and 34.5% in terms of F-measure, 11.2% and 16.3% in terms of G-measure, and 16.4% and 14.2% in terms of AUC over the two transfer learning baselines on average prediction performance. Compared with many-to-one CPDP (MDA-M), MDA-O achieves better average results and slightly outperforms MDA-M on the overall prediction performance in terms of G-measure and AUC. Specifically, on the AEEEM dataset, the results of MDA-O on JDT, LC, ML, and PDE are better than those of MDA-M. The reasons may be the following: first, using all of the other projects except the target project as the source may introduce redundant information; second, the distribution differences among the data from different projects may be large, since the source and target data differ in distribution and the data from multiple source projects also differ from one another. In brief, MDA achieves improvements in terms of the three measures over the three benchmark datasets against all six baselines (four classical CPDP methods and two transfer learning based CPDP methods). This shows the feasibility and effectiveness of MDA, which improves the performance of CPDP. The three indicators used throughout this comparison are computed as sketched below.
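The following sketch follows definitions commonly used in the defect prediction literature (F-measure as the harmonic mean of precision and recall; G-measure as the harmonic mean of the probability of detection and one minus the probability of false alarm; AUC as the area under the ROC curve); the exact formulations used in this chapter may differ in detail.

```python
# Common SDP definitions of the three indicators (an assumption, not
# necessarily the exact formulations used in this chapter).
from sklearn.metrics import confusion_matrix, roc_auc_score

def f_measure(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

def g_measure(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    pd_rate = tp / (tp + fn) if (tp + fn) else 0.0   # probability of detection
    pf_rate = fp / (fp + tn) if (fp + tn) else 0.0   # probability of false alarm
    denom = pd_rate + (1 - pf_rate)
    return 2 * pd_rate * (1 - pf_rate) / denom if denom else 0.0

def auc(y_true, y_score):
    # y_score: predicted defect-proneness scores or probabilities
    return roc_auc_score(y_true, y_score)
```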
4.1.1.3 Discussions
Does Manifold Feature Learning Influence the Prediction Performance of MDA? To investigate the effectiveness of the manifold feature learning in MDA, we run MDA and DA (MDA without manifold feature learning, which we call DA) on the three datasets. In this section, the results of MDA reported in the tables and figures are obtained under the setting of Case 1 in the Experiments section. Figure 4.1 shows the performance of DA and MDA on the three datasets. From the results, the performance of MDA with manifold feature learning is improved on all datasets. These results indicate that the features in the manifold subspace facilitate distribution adaptation. Manifold feature learning plays an important role in avoiding feature distortion and exploring the geometric structure of the data. MDA can achieve comparable results without manifold learning on a few projects, while adding manifold learning obtains better performance overall. From Fig. 4.1, the F-measure and G-measure values of DA are slightly better than those of MDA on MW1. The reason can be attributed to the limited number of instances in MW1: compared with the other projects, MW1 contains fewer instances, and manifold feature learning may play only a limited role on a dataset with a small number of instances. The ablation is sketched below.
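The ablation can be organized as in the following sketch, which assumes (hypothetically) that run_mda exposes a switch for disabling the manifold feature learning step; load_project and f_measure are the same stand-ins as before.

```python
# Hypothetical sketch of the MDA-versus-DA ablation under the one-to-one
# (Case 1) setting: DA is the same pipeline with manifold learning disabled.
def ablate_manifold(projects, load_project, run_mda, f_measure):
    comparison = {}
    for target in projects:
        X_t, y_t = load_project(target)
        per_variant = {}
        for variant, use_manifold in (("MDA", True), ("DA", False)):
            scores = []
            for source in projects:
                if source == target:
                    continue
                X_s, y_s = load_project(source)
                y_pred = run_mda(X_s, y_s, X_t, use_manifold=use_manifold)
                scores.append(f_measure(y_t, y_pred))
            per_variant[variant] = sum(scores) / len(scores)
        comparison[target] = per_variant   # e.g. DA slightly higher on MW1
    return comparison
```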
Do Different Distributions Influence the Prediction Performance of MDA? The balance factor μ reflects the importance of the different distributions. We discuss the impact of different values of μ on MDA in this section. We tune 11 different parameter values
Intelligent Software Defect Prediction 1st Edition Xiaoyuan Jing
  • 7. Xiao-Yuan Jing • Haowen Chen • Baowen Xu Intelligent Software Defect Prediction
  • 8. Xiao-Yuan Jing School of Computer Science Wuhan University Wuhan, Hubei, China Baowen Xu Computer Science & Technology Nanjing University Nanjing, Jiangsu, China Haowen Chen School of Computer Science Wuhan University Wuhan, Hubei, China ISBN 978-981-99-2841-5 ISBN 978-981-99-2842-2 (eBook) https://doi.org/10.1007/978-981-99-2842-2 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore Paper in this product is recyclable.
  • 9. Preface With the increase of complexity and dependency of software, the software product may suffer from low quality, high cost, hard-to-maintain, and even the occurrence of defects. Software defect usually produces incorrect or unexpected results and behaviors in unintended ways. Software defect prediction (SDP) is one of the most active research fields in software engineering and plays an important role in software quality assurance. According to the feedback of SDP, developers can subsequently conduct defect location and repair under reasonable resource allocation, which is helpful in reducing the maintenance cost. The early task of SDP is performed within a single project. Developers can make use of the well-labeled historical data of the currently maintained project to build the model and predict the defect-proneness of the remaining instances. This process is called within-project defect prediction (WPDP). However, the annotation for defect data (i.e., defective or defective-free) is time-consuming and high-cost, which is a hard task for practitioners in the development or maintenance cycle. To solve this problem, researchers consider introducing other projects with sufficient historical data to conduct the cross-project defect prediction (CPDP) which has received extensive attention in recent years. As the special case of CPDP, heterogeneous defect prediction (HDP) refers to the scenario that training and test data have different metrics, which can relax the restriction on source and target projects’ metrics. Besides, there also exist other research questions of SDP to be further studied, such as cross-version defect prediction, just-in-time (JIT) defect prediction, and effort-aware JIT defect prediction. In the past few decades, more and more researchers pay attention to SDP and a lot of intelligent SDP techniques have been presented. In order to obtain the high-quality representations of defect data, a lot of machine learning techniques such as dictionary learning, semi-supervised learning, multi-view learning, and deep learning are applied to solve SDP problems. Besides, transfer learning techniques are also used to eliminate the divergence between different project data in CPDP scenario. Therefore, the combination with machine learning techniques is conducive to improving the prediction efficiency and accuracy, which can promote the research of intelligent SDP to make significant progress. v
  • 10. vi Preface We propose to draft this book to provide a comprehensive picture of the current state of SDP researches instead of improving and comparing existing SDP approaches. More specifically, this book introduces a range of machine learning- based SDP approaches proposed for different scenarios (i.e., WPDP, CPDP, and HDP). Besides, this book also provides deep insight into current SDP approaches’ performance and learned lessons for further SDP researches. This book is mainly applicable to graduate students, researchers who work in or have interests in the areas of SDP, and the developers who are responsible for software maintenance. Wuhan, China Xiao-Yuan Jing December, 2022 Haowen Chen
  • 11. Acknowledgments We thank Li Zhiqiang, Wu Fei, Wang Tiejian, Zhang Zhiwu, and Sun Ying from Wuhan University for their contributions to this research. We would like to express our heartfelt gratitude to Professor Baowen Xu and his team from Nanjing University for their selfless technical assistance in the compilation of this book. We are so thankful for the invaluable help and support provided by Professor Xiaoyuan Xie from Wuhan University, whose valuable advice and guidance was crucial to the successful completion of this book. We wanted to express our sincere appreciation for the unwavering support provided by Nanjing University, Wuhan University, and Nanjing University of Posts and Telecommunications, as well as the editing suggestions provided by Kamesh and Wei Zhu from Springer Publishing House. We just wanted to thank you from the bottom of our hearts for your unwavering support and guidance throughout the compilation of this book. Finally, we would like to express our heartfelt appreciation to two master students, Hanwei and Xiuting Huang, who participated in the editing process and made indelible contributions to the compilation of this book. vii
  • 12. Contents 1 Introduction .................................................................. 1 1.1 Software Quality Assurance ............................................ 1 1.2 Software Defect Prediction ............................................. 2 1.3 Research Directions of SDP ............................................ 3 1.3.1 Within-Project Defect Prediction (WPDP) .................... 3 1.3.2 Cross-Project Defect Prediction (CPDP) ...................... 4 1.3.3 Heterogeneous Defect Prediction (HDP) ...................... 4 1.3.4 Other Research Questions of SDP ............................. 5 1.4 Notations and Corresponding Descriptions ............................ 7 1.5 Structure of This Book.................................................. 8 References ..................................................................... 9 2 Machine Learning Techniques for Intelligent SDP....................... 13 2.1 Transfer Learning ....................................................... 13 2.2 Deep Learning........................................................... 14 2.3 Other Techniques........................................................ 15 2.3.1 Dictionary Learning ............................................ 15 2.3.2 Semi-Supervised Learning ..................................... 15 2.3.3 Multi-View Learning ........................................... 16 References ..................................................................... 16 3 Within-Project Defect Prediction .......................................... 19 3.1 Basic WPDP............................................................. 19 3.1.1 Dictionary Learning Based Software Defect Prediction ...... 19 3.1.2 Collaborative Representation Classification Based Software Defect Prediction..................................... 26 3.2 Semi-supervised WPDP ................................................ 28 3.2.1 Sample-Based Software Defect Prediction with Active and Semi-supervised Learning ......................... 28 References ..................................................................... 33 ix
  • 13. x Contents 4 Cross-Project Defect Prediction ............................................ 35 4.1 Basic CPDP ............................................................. 36 4.1.1 Manifold Embedded Distribution Adaptation ................. 36 4.2 Class Imbalance Problem in CPDP .................................... 46 4.2.1 An Improved SDA Based Defect Prediction Framework ..... 46 4.3 Semi-Supervised CPDP................................................. 54 4.3.1 Cost-Sensitive Kernelized Semi-supervised Dictionary Learning ............................................ 54 References ..................................................................... 61 5 Heterogeneous Defect Prediction .......................................... 65 5.1 Basic HDP............................................................... 66 5.1.1 Unified Metric Representation and CCA-Based Transfer Learning ............................................... 66 5.2 Class Imbalance Problem in HDP...................................... 83 5.2.1 Cost-Sensitive Transfer Kernel Canonical Correlation Analysis ............................................ 83 5.2.2 Other Solutions ................................................. 104 5.3 Multiple Sources and Privacy Preservation Problems in HDP........ 104 5.3.1 Multi-Source Selection Based Manifold Discriminant Alignment ........................................ 104 5.3.2 Sparse Representation Based Double Obfuscation Algorithm ....................................................... 109 References ..................................................................... 133 6 An Empirical Study on HDP Approaches ................................ 139 6.1 Goal Question Metric (GQM) Based Research Methodology ........ 139 6.1.1 Major Challenges ............................................... 139 6.1.2 Review of Research Status ..................................... 140 6.1.3 Analysis on Research Status ................................... 141 6.1.4 Research Goal................................................... 144 6.1.5 Research Questions ............................................. 145 6.1.6 Evaluation Metrics.............................................. 145 6.2 Experiments ............................................................. 147 6.2.1 Datasets ......................................................... 147 6.2.2 SDP Approaches for Comparisons............................. 149 6.2.3 Experimental Design ........................................... 150 6.2.4 Experimental Results ........................................... 151 6.3 Discussions .............................................................. 160 References ..................................................................... 168 7 Other Research Questions of SDP ......................................... 171 7.1 Cross-Version Defect Prediction ....................................... 171 7.1.1 Methodology .................................................... 171 7.1.2 Experiments ..................................................... 173 7.1.3 Discussions...................................................... 175 7.2 Just-in-Time Defect Prediction ......................................... 175
  • 14. Contents xi 7.2.1 Methodology .................................................... 175 7.2.2 Experiments ..................................................... 179 7.2.3 Discussions...................................................... 187 7.3 Effort-Aware Just-in-Time Defect Prediction.......................... 188 7.3.1 Methodology .................................................... 188 7.3.2 Experiments ..................................................... 191 7.3.3 Discussions...................................................... 196 References ..................................................................... 198 8 Conclusion .................................................................... 203 8.1 Conclusion .............................................................. 203
  • 15. Chapter 1 Introduction 1.1 Software Quality Assurance With the increasing pressure of expediting software projects that are always growing in size and complexity to meet rapidly changing business needs, quality assurance activities such as fault prediction models have become extremely important. The main purpose of a fault prediction model is the effective allocation or prioritization of quality assurance effort (test effort and code inspection effort). Construction of these prediction models is mostly dependent on historical or previous software project data referred to as a dataset. However, a prevalent problem in data mining is the skewness of a dataset, and fault prediction datasets are not excluded from this phenomenon. In most datasets the majority of the instances are clean or not faulty, while conventional learning methods are primarily designed for balanced datasets. Common classifiers such as Neural Networks (NN), Support Vector Machines (SVM), and decision trees work best toward optimizing their objective functions, which leads to the maximum overall accuracy, that is, the ratio of correctly predicted instances to the total number of instances. Training a classifier on an imbalanced dataset will most likely generate a classifier that tends to over-predict the presence of the majority class and has a lower probability of predicting the minority or faulty modules. When the model predicts the minority class, it often has a higher error rate compared to predictions for the majority class. This affects the prediction performance of classifiers, and in machine learning this issue is known as learning from imbalanced datasets. Several methods have been proposed in machine learning for dealing with the class imbalance issue, such as random over- and under-sampling, creating synthetic data, application of cleaning techniques for data sampling, and cluster-based sampling. Despite a significant amount of literature in machine learning on imbalanced datasets, very few studies have tackled the issue in the area of fault prediction. The first of such studies by Kamei et al. [1] showed that © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 X.-Y. Jing et al., Intelligent Software Defect Prediction, https://doi.org/10.1007/978-981-99-2842-2_1 1
  • 16. 2 1 Introduction sampling techniques improved the prediction performance of linear and logistic models, whereas the other two models (neural network and classification tree) did not perform better after the sampling techniques were applied. Interestingly, sampling techniques applied to datasets during fault prediction are mostly evaluated in terms of Accuracy, AUC, F1-measure, and Geometric Mean Accuracy, to name a few; however, these measures ignore the effort needed to fix faults, that is, they do not distinguish between a predicted fault in a small module and a predicted fault in a large module. Nickerson et al. [2] conclude that to evaluate the performance of classifiers on imbalanced datasets, accuracy or its inverse, the error rate, should never be used. Chawla et al. [3] also allude to the conclusion that simple predictive accuracy might not be appropriate for an imbalanced dataset. The goal of this research is to improve the prediction performance of fault-prone module prediction models by applying over- and under-sampling approaches to rebalance the numbers of fault-prone and non-fault-prone modules in the training dataset, and to find the distribution or proportion of faulty and non-faulty modules that results in the best performance. The experiment focuses on the use of Norm(Popt), an effort-aware measure proposed by Kamei et al. [4], to evaluate the effect of over/under sampling on prediction models and to find out whether over/under sampling is still effective in a more realistic setting. 1.2 Software Defect Prediction A defect is a flaw in a component or system which can cause it to fail to perform its desired function, that is, an incorrect statement or data definition. A defect, if encountered during execution, may cause a failure of the system or a component. Defect prediction helps in identifying the vulnerabilities in the project plan in terms of lack of resources, improperly defined timelines, predictable defects, etc. It can help organizations to gain substantial profits without schedule delays or budget overruns, and it helps in modifying the parameters in order to cope with schedule variations. The methods to estimate software defects include regression, genetic programming, clustering, neural networks, the statistical technique of discriminant analysis, dictionary learning approaches, hybrid attribute selection approaches, classification, attribute selection and instance filtering, Bayesian belief networks, K-means clustering, and association rule mining. In the domain of software defect prediction, many software defect prediction models have been developed. These models mostly fall into two classes: one class works in the later period of the software life cycle (testing phase) and, having obtained defect data, predicts how many defects remain in the software with these data. Models in this class include: the capture-recapture method based model, the neural network based model, and measure methods based on the scale and complexity of source code. The other class, which operates before the software development phase, aims to predict the number of defects that will arise during the software development process by analyzing defect data from previous projects. Presently, published models in this class include: the phase based
  • 17. 1.3 Research Directions of SDP 3 model proposed by Gaffney and Davis, the Ada programming defect prediction model proposed by Agresti and Evanco, the early prediction model proposed by the USA Rome laboratory, the early software development prediction method proposed by Carol Smidts at the University of Maryland, and the early fuzzy neural network based model. However, there are a number of serious theoretical and practical problems in these methods. Software development is an extremely complicated process, and defects relate to many factors. To measure exactly, one would have to consider as many correlated factors as possible, but this would make the model more complicated; to keep the model solvable, one would have to simplify it, which in turn cannot produce a convincing answer. The neural network based prediction model, for instance, has many problems in training and in verifying the sample collection. Software testing in many organizations is still at an early stage, so many projects can hardly provide the requested number of defects, which brings certain difficulties to sample collection. Early models consider the uncertain factors in the software development process inadequately and, in addition, depend heavily on data factors. Therefore, many methods are difficult to apply. 1.3 Research Directions of SDP 1.3.1 Within-Project Defect Prediction (WPDP) Some defect data in the same project are used as the training set to build the prediction model, and the remaining small amount of data is used as the test set to evaluate the performance. At present, researchers mainly use machine learning algorithms to construct the defect prediction model for within-project defect prediction. In addition, how to optimize the data structure and extract effective features is also a focus of current research. Some important research works are summarized below. Elish et al. [5] use the support vector machine (SVM) to conduct defect prediction and compare its predictive performance with eight statistical and machine learning models on four NASA datasets. Lu et al. [6] leverage active learning to predict defects, and they also use feature compression techniques to perform feature reduction on defect data. Li et al. [7] propose a novel semi-supervised learning method, ACoForest, which can sample the prediction modules that are most helpful for learning. Rodriguez et al. [8] compare different methods for different data preprocessing problems, such as sampling methods, cost-sensitive methods, integration methods, and hybrid methods. The final experimental results show that these methods can effectively improve the accuracy of defect prediction after handling the class imbalance. Seiffert et al. [9] analyze 11 different algorithms and seven different data sampling techniques and find that class imbalance and data noise have a negative impact on prediction performance.
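To make the rebalancing idea studied in the works above concrete, the following sketch shows how a defect dataset could be rebalanced by randomly over-sampling the defective class before training a classifier. It is a minimal illustration rather than the procedure of any cited study: the dataset is synthetic, and logistic regression merely stands in for whatever learner is used.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def rebalance(X, y, minority_label=1, ratio=1.0, seed=0):
        # Randomly over-sample the minority (defective) class until the
        # minority/majority proportion reaches `ratio`.
        rng = np.random.default_rng(seed)
        minority = np.flatnonzero(y == minority_label)
        majority = np.flatnonzero(y != minority_label)
        target = int(ratio * len(majority))
        if len(minority) >= target:          # already at or above the desired ratio
            return X, y
        extra = rng.choice(minority, size=target - len(minority), replace=True)
        keep = np.concatenate([majority, minority, extra])
        rng.shuffle(keep)
        return X[keep], y[keep]

    # Synthetic stand-in for a real defect dataset: 200 modules, 20 static code metrics.
    X = np.random.rand(200, 20)
    y = (np.random.rand(200) < 0.15).astype(int)   # roughly 15% defective modules
    X_bal, y_bal = rebalance(X, y, ratio=1.0)
    clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
    print(y_bal.mean(), clf.predict(X[:5]))

Under-sampling the defective-free majority follows the same pattern with the roles of the two index sets swapped; the ratio argument controls the resulting proportion of defective to defective-free modules in the training set.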
  • 18. 4 1 Introduction 1.3.2 Cross-Project Defect Prediction (CPDP) When data are insufficient or non-existent for building quality defect predictors, software engineers can use data from other organizations or projects. This is called cross-project defect prediction (CPDP). Acquiring data from other sources is a non-trivial task when data owners are concerned about confidentiality. In practice, extracting project data from organizations is often difficult due to the business sensitivity associated with the data. For example, at a keynote address at ESEM’11, Elaine Weyuker doubted that she will ever be able to release the AT&T data she used to build defect predictors [10]. Due to similar privacy concerns, we were only able to add seven records from two years of work to our NASA-wide software cost metrics repository [11]. In a personal communication, Barry Boehm stated that he was able to publish less than 200 cost estimation records even after 30 years of COCOMO effort. To enable sharing, we must assure confidentiality. In our view, confidentiality is the next grand challenge for CPDP in software engineering. In previous work, we allowed data owners to generate minimized and obfuscated versions of their original data. Our MORPH algorithm [12] reflects on the boundary between an instance and its nearest instance of another class, and MORPH’s restricted mutation policy never pushes an instance across that boundary. MORPH can be usefully combined with the CLIFF data minimization algorithm [13]. CLIFF is an instance selector that returns a subset of instances that best predict for the target class. Previously we reported that this combination of CLIFF and MORPH resulted in 7/10 defect datasets studied retaining high privacy scores, while remaining useful for CPDP [13]. This is a startling result since research by Grechanik et al. [14] and Brickell et al. [15] showed that standard privacy methods increase privacy while decreasing data mining efficacy. While useful CLIFF and MORPH only considered a single- party scenario where each data owner privatized their data individually without considering privatized data from others. This resulted in privatized data that were directly proportional in size (number of instances) to the original data. Therefore, in a case where the size of the original data is small enough, any minimization might be meaningless, but if the size of the original data is large, minimization may not be enough to matter in practice. 1.3.3 Heterogeneous Defect Prediction (HDP) Existing CPDP approaches are based on the underlying assumption that both source and target project data should exhibit the same data distribution or are drawn from the same feature space (i.e., the same software metrics). When the distribution of the data changes, or when the metrics features for source and target projects are different, one cannot expect the resulting prediction performance to be satisfactory. We consider these scenarios as Heterogeneous Cross-Project Defect Prediction (HCPDP). Mostly, the software defect datasets are imbalanced, which
  • 19. 1.3 Research Directions of SDP 5 means the number of defective modules is usually much smaller than that of the defective-free modules. The imbalanced nature of data can cause poor prediction performance. That is, the probability of detecting defective modules can be low even while the overall performance appears high. Without taking this issue into account, the effectiveness of software defect prediction in many real-world tasks would be greatly reduced. Recently, some researchers have noticed the importance of these problems in software defect prediction. For example, Nam et al. [16] used metric selection and metric matching to select similar metrics for building a prediction model with heterogeneous metric sets. They discarded dissimilar metrics, which may contain useful information for training. Jing et al. [17] introduced Canonical Correlation Analysis (CCA) into HCPDP, by constructing the common correlation space to associate cross-project data. Then, one can simply project the source and target project data into this space for defect prediction. Like previous CPDP methods, however, this approach did not take the class imbalance problem of software defect datasets into account. Ryu et al. [18] designed the Value-Cognitive Boosting with Support Vector Machine (VCB-SVM) algorithm, which exploited sampling techniques to solve the class imbalance issue for cross-project environments. Nevertheless, the sampling strategy alters the distribution of the original data and may discard some potentially useful samples that could be important for the prediction process. Therefore, these methods are not good solutions for addressing the class imbalance issue under heterogeneous cross-project environments. 1.3.4 Other Research Questions of SDP 1.3.4.1 Cross-Version Defect Prediction Cross-Version Defect Prediction (CVDP) is a practical scenario in which the classification model is trained on the historical data of a prior version and then predicts the defect labels of the modules of the current version. Bennin et al. [19] evaluated the defect prediction performance of 11 basic classification models in IVDP and CVDP scenarios with an effort-aware indicator. They conducted experiments on 25 projects (each with two versions and process metrics) and found that the optimal models for the two defect prediction scenarios are not identical due to the different data used as the training set. However, the performance differences of the 11 models are not significant in either scenario. Premraj et al. [20] investigated the impacts of code and network metrics on the defect prediction performance of six classification models. They considered three scenarios, including IVDP, CVDP, and CPDP. CPDP uses the defect data of another project as the training set. Experiments on three projects (each with two versions) suggested that the network metrics are better than the code metrics in most cases. Holschuh et al. [21] explored the performance of CVDP on a large software system by collecting four types of metrics. The experiments on six projects (each with three versions) showed that the overall performance is unsatisfactory. Monden et al. [22] evaluated the cost effectiveness of defect
  • 20. 6 1 Introduction prediction on three classification models by comparing seven test effort allocation strategies. The results on one project with five versions revealed that the reduction of test effort relied on the appropriate test strategy. Khoshgoftaar et al. [23] studied the performance of six classification models on one project with four versions and found that the CART model with least absolute deviation performed the best. Zhao et al. [24] investigated the relationship between the context-based cohesion metrics and the defect-proneness in IVDP and CVDP scenarios. They conducted a CVDP study on four projects with a total of 19 versions and found that context-based cohesion metrics had negative impacts on defect prediction performance but can be complementary to non-context-based metrics. Yang et al. [25] surveyed the impacts of code, process, and slice-based cohesion metrics on defect prediction performance in IVDP, CVDP, and CPDP scenarios. They conducted a CVDP study on one project with seven versions and found that slice-based cohesion metrics had adverse impacts on defect prediction performance but can be complementary to the commonly used metrics. Wang et al. [26] explored the performance of their proposed semantic metrics on defect prediction in CVDP and CPDP scenarios. The experiments on ten projects with 26 versions showed the superiority of the semantic metrics compared with traditional CK metrics and AST metrics. 1.3.4.2 Just-in-Time Defect Prediction Just-in-time defect prediction aims to predict whether a particular file involved in a commit (i.e., a change) is buggy or not. Traditional just-in-time defect prediction techniques typically follow four steps. (1) Training data extraction: for each change, label it as buggy or clean by mining a project's revision history and issue tracking system; a buggy change contains one or more bugs, while a clean change has no bug. (2) Feature extraction: extract the values of various features from each change; many different features have been used in past change classification studies. (3) Model learning: build a model by using a classification algorithm based on the labeled changes and their corresponding features. (4) Model application: for a new change, extract the values of various features and input them to the learned model to predict whether the change is buggy or clean. The studies by Kamei et al. [32] are a great source of inspiration for our work. They proposed a just-in-time quality assurance technique that predicts defects at commit level, trying to reduce the effort of a reviewer. Later on, they also evaluated how just-in-time models perform in the context of cross-project defect prediction [19]. Findings report good accuracy for the models, not only in terms of precision and recall but also in terms of saved inspection effort. Our work is complementary to these papers. In particular, we start from their basis of detecting defective commits and complement this model with the attributes necessary to filter only those files that are defect-prone and should be more thoroughly reviewed. Yang et al. [25] proposed
  • 21. 1.4 Notations and Corresponding Descriptions 7 the usage of alternative techniques for just-in-time quality assurance, such as cached history, deep learning, and textual analysis, reporting promising results. We did not investigate these further in the current chapter, but studies can be designed and carried out to determine if and how these techniques can be used within the model we present in this chapter to further increase its accuracy. 1.3.4.3 Effort-Aware Defect Prediction Traditional SDP models based on binary classification algorithms are not sufficient for software testing in practice, since they do not distinguish between a module with many defects or high defect density (i.e., number of defects divided by lines of source code) and a module with a small number of defects or low defect density. Clearly, the two modules require different amounts of effort to inspect and fix, yet they are considered equal and allocated the same testing resources. Therefore, Mende et al. [27] proposed effort-aware defect prediction (EADP) models to rank software modules based on the possibility of these modules being defective, their predicted number of defects, or defect density. Generally, EADP models are constructed by using learning to rank techniques [28]. These techniques can be grouped into three categories, that is, the pointwise approach, the pairwise approach, and the listwise approach [29–31]. There exists a vast variety of learning to rank algorithms in the literature. It is thus important to empirically and statistically compare the impact and effectiveness of different learning to rank algorithms for EADP. To the best of our knowledge, few prior studies [32–36] evaluated and compared the existing learning to rank algorithms for EADP. Most of these studies, however, conducted their study with few learning to rank algorithms across a small number of datasets. Previous studies [34–36] used at most five EADP models and few datasets. For example, Jiang et al. [34] investigated the performance of only five classification-based pointwise algorithms for EADP on two NASA datasets. Nguyen et al. [36] investigated three regression-based pointwise algorithms and two pairwise algorithms for EADP on five Eclipse CVS datasets. 1.4 Notations and Corresponding Descriptions We briefly introduce some of the symbols and abbreviations that appear in this book, as listed in Table 1.1. Some are listed in the table; those that are not listed will be described in detail in the corresponding text.
  • 22. 8 1 Introduction Table 1.1 Symbols and corresponding descriptions
Symbol/Abbreviation: Description
SDP: Software Defect Prediction
WPDP: Within-Project Defect Prediction
HCCDP: Heterogeneous cross-company defect prediction
CPDP: Cross-Project Defect Prediction
HDP: Heterogeneous Defect Prediction
CCA: Canonical correlation analysis
TKCCA: Transfer kernel canonical correlation analysis
CTKCCA: Cost-sensitive transfer kernel canonical correlation analysis
GQM: Goal Question Metric
ROC: Receiver operating characteristic
MDA: Manifold embedded distribution adaptation
SDA: Subclass discriminant analysis
⇒: The left side of "⇒" represents the source company data and the right side represents the target company data
a = [a1, a2, . . ., an]: a is a vector, and ai is its ith component
‖a‖: The length (norm) of a vector
∈: An element belongs to a set
tr(·): The trace of a matrix
1.5 Structure of This Book In the second chapter of this book, several common learning algorithms and their applications in software defect prediction are briefly introduced, including deep learning, transfer learning, dictionary learning, semi-supervised learning, and multi-view learning. In Chap. 3, we mainly discuss within-project defect prediction: we first introduce basic WPDP, including dictionary learning based software defect prediction and collaborative representation classification based software defect prediction, and then introduce sample-based software defect prediction with active and semi-supervised learning, which belongs to semi-supervised WPDP. In Chap. 4, we expound some methodologies on cross-project defect prediction: for basic CPDP, we introduce manifold embedded distribution adaptation; for the class imbalance problem in CPDP, we propose an improved SDA based defect prediction framework; finally, for semi-supervised CPDP, we introduce cost-sensitive kernelized semi-supervised dictionary learning. In Chap. 5, we introduce Heterogeneous Defect Prediction (HDP), first explaining unified metric representation and CCA-based transfer learning in basic HDP; then, for the class imbalance problem in HDP, we introduce cost-sensitive transfer kernel canonical correlation analysis. Finally, regarding the multiple sources and privacy preservation problems in HDP, we introduce multi-source selection based
  • 23. References 9 manifold discriminant alignment and sparse representation based double obfusca- tion algorithm. In Chap. 6, an empirical study on HDP approaches is introduced, including heterogeneous defect prediction and Goal Question Metric (GQM) based research methodology. Finally, in Chap. 7 of this book, we discuss other research questions of SDP, mainly including the following aspects: cross-version defect prediction, just-in-time defect prediction and effort-aware just-in-time defect prediction. References 1. Kamei Y, Monden A, Matsumoto S, Kakimoto T, Matsumoto KI (2007) The Effects of Over and Under Sampling on Fault-prone Module Detection. In: Proceedings of the First International Symposium on Empirical Software Engineering and Measurement, pp 196–204. https://doi.org/10.1109/ESEM.2007.28 2. Nickerson A, Japkowicz N, Milios EE (2001) Using Unsupervised Learning to Guide Resampling in Imbalanced Data Sets. In: Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics. http://www.gatsby.ucl.ac.uk/aistats/aistats2001/files/ nickerson155.ps 3. Chawla NV (2010) Data Mining for Imbalanced Datasets: An Overview. In: Proceedings of the Data Mining and Knowledge Discovery Handbook, pp 875–886. https://doi.org/10.1007/ 978-0-387-09823-4_45 4. Kamei Y, Matsumoto S, Monden A, Matsumoto K, Adams B, Hassan AE (2010) Revisiting common bug prediction findings using effort-aware models. In: Proceedings of the 26th IEEE International Conference on Software Maintenance, pp 1–10. https://doi.org/10.1109/ICSM. 2010.5609530 5. Elish KO, Elish MO (2008) Predicting defect-prone software modules using support vector machines. J Syst Softw 81(5):649–660. https://doi.org/10.1016/j.jss.2007.07.040 6. Lu H, Kocaguneli E, Cukic B (2014) Defect Prediction between Software Versions with Active Learning and Dimensionality Reduction. In: Proceedings of the 25th IEEE International Symposium on Software Reliability Engineering, pp 312–322. https://doi.org/10.1109/ISSRE. 2014.35 7. Li M, Zhang H, Wu R, Zhou Z (2012) Sample-based software defect prediction with active and semi-supervised learning. Autom Softw Eng 19(2):201–230. https://doi.org/10.1007/s10515- 011-0092-1 8. Rodríguez D, Herraiz I, Harrison R, Dolado JJ, Riquelme JC (2014) Preliminary comparison of techniques for dealing with imbalance in software defect prediction. In: Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering, pp 43:1–43:10. https://doi.org/10.1145/2601248.2601294 9. Seiffert C, Khoshgoftaar TM, Hulse JV, Folleco A (2007) An Empirical Study of the Classification Performance of Learners on Imbalanced and Noisy Software Quality Data. In: Proceedings of the IEEE International Conference on Information Reuse and Integration, pp 651–658. https://doi.org/10.1109/IRI.2007.4296694 10. Weyuker EJ, Ostrand TJ, Bell RM (2008) Do too many cooks spoil the broth? Using the number of developers to enhance defect prediction models. Empir Softw Eng 13(5):539–559. https:// doi.org/10.1007/s10664-008-9082-8 11. Menzies T, El-Rawas O, Hihn J, Feather MS, Madachy RJ, Boehm BW (2007) The business case for automated software engineering. In: Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering ASE 2007, pp 303–312. https://doi.org/10. 1145/1321631.1321676
  • 24. 10 1 Introduction 12. Peters F, Menzies T (2012) Privacy and utility for defect prediction: Experiments with MORPH. In: Proceedings of the 34th International Conference on Software Engineering, pp 189–199. https://doi.org/10.1109/ICSE.2012.6227194 13. Peters F, Menzies T, Gong L, Zhang H (2013) Balancing Privacy and Utility in Cross-Company Defect Prediction. IEEE Trans Software Eng 39(8):1054–1068. https://doi.org/10.1109/TSE. 2013.6 14. Grechanik M, Csallner C, Fu C, Xie Q (2010) Is Data Privacy Always Good for Software Testing?. In: Proceedings of the IEEE 21st International Symposium on Software Reliability Engineering, pp 368–377. https://doi.org/10.1109/ISSRE.2010.13 15. Brickell J, Shmatikov V (2008) The cost of privacy: destruction of data-mining utility in anonymized data publishing. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 70–78. https://doi.org/10.1145/ 1401890.1401904 16. Nam J, Kim S (2015) Heterogeneous defect prediction. In: Proceedings of the 10th Joint Meet- ing on Foundations of Software Engineering, pp 508–519. https://doi.org/10.1145/2786805. 2786814 17. Jing X, Wu F, Dong X, Qi F, Xu B (2015) Heterogeneous cross-company defect prediction by unified metric representation and CCA-based transfer learning. In: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, pp 496–507. https://doi.org/10. 1145/2786805.2786813 18. Ryu D, Choi O, Baik J (2016) Value-cognitive boosting with a support vector machine for cross-project defect prediction. Empir Softw Eng 21(1):43–71. https://doi.org/10.1007/ s10664-014-9346-4 19. Bennin KE, Toda K, Kamei Y, Keung J, Monden A, Ubayashi N (2016) Empirical Evaluation of Cross-Release Effort-Aware Defect Prediction Models. In: Proceedings of the 2016 IEEE International Conference on Software Quality, pp 214–221. https://doi.org/10.1109/QRS.2016. 33 20. Premraj R, Herzig K (2011) Network Versus Code Metrics to Predict Defects: A Replication Study. In: Proceedings of the 5th International Symposium on Empirical Software Engineering and Measurement, pp 215–224. https://doi.org/10.1109/ESEM.2011.30 21. Holschuh T, Pauser M, Herzig K, Zimmermann T, Premraj R, Zeller A (2009) Predicting defects in SAP Java code: An experience report. In: Proceedings of the 31st International Con- ference on Software Engineering, pp 172–181. https://doi.org/10.1109/ICSE-COMPANION. 2009.5070975 22. Monden A, Hayashi T, Shinoda S, Shirai K, Yoshida J, Barker M, Matsumoto K (2013) Assessing the Cost Effectiveness of Fault Prediction in Acceptance Testing. IEEE Trans Softw Eng 39(10):1345–1357. https://doi.org/10.1109/TSE.2013.21 23. Khoshgoftaar TM, Seliya N (2003) Fault Prediction Modeling for Software Quality Estimation: Comparing Commonly Used Techniques Empir. Softw Eng 8(3):255–283. https://doi.org/10. 1023/A:1024424811345 24. Zhao Y, Yang Y, Lu H, Liu J, Leung H, Wu Y, Zhou Y, Xu B (2017) Understanding the value of considering client usage context in package cohesion for fault-proneness prediction Autom. Softw Eng 24(2):393–453. https://doi.org/10.1007/s10515-016-0198-6 25. Yang Y, Zhou Y, Lu H, Chen L, Chen Z, Xu B, Leung HKN, Zhang Z (2015) Are Slice-Based Cohesion Metrics Actually Useful in Effort-Aware Post-Release Fault-Proneness Prediction? 
An Empirical Study IEEE Trans. Softw Eng 41(4):331–357. https://doi.org/10.1109/TSE.2014. 2370048 26. Wang S, Liu T, Tan L (2016) Automatically learning semantic features for defect prediction. In: Proceedings of the 38th International Conference on Software Engineering, pp 297–308. https://doi.org/10.1145/2884781.2884804 27. Mende T, Koschke R (2010) Effort-Aware Defect Prediction Models. In: Proceedings of the 14th European Conference on Software Maintenance and Reengineering, pp 107–116. https:// doi.org/10.1109/CSMR.2010.18
  • 25. References 11 28. Wang F, Huang J, Ma Y (2018) A Top-k Learning to Rank Approach to Cross-Project Software Defect Prediction. In: Proceedings of the 25th Asia-Pacific Software Engineering Conference, pp 335–344. https://doi.org/10.1109/APSEC.2018.00048 29. Shi Z, Keung J, Bennin KE, Zhang X (2018) Comparing learning to rank techniques in hybrid bug localization. Appl Soft Comput 62636-648. https://doi.org/10.1016/j.asoc.2017.10.048 30. Liu T (2010) Learning to rank for information retrieval. In: Proceedings of the Proceeding of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 904. https://doi.org/10.1145/1835449.1835676 31. Yu X, Li Q, Liu J (2019) Scalable and parallel sequential pattern mining using spark. World Wide Web 22(1):295–324. https://doi.org/10.1007/s11280-018-0566-1 32. Bennin KE, Toda K, Kamei Y, Keung J, Monden A, Ubayashi N (2016) Empirical Evaluation of Cross-Release Effort-Aware Defect Prediction Models. In: Proceedings of the 2016 IEEE International Conference on Software Quality, pp 214–221. https://doi.org/10.1109/QRS.2016. 33 33. Yang X, Wen W (2018) Ridge and Lasso Regression Models for Cross-Version Defect Prediction. IEEE Trans Reliab 67(3):885–896. https://doi.org/10.1109/TR.2018.2847353 34. Jiang Y, Cukic B, Ma Y (2008) Techniques for evaluating fault prediction models. Empir Softw Eng 13(5):561–595. https://doi.org/10.1007/s10664-008-9079-3 35. Mende T, Koschke R (2009) Revisiting the evaluation of defect prediction models. In: Proceedings of the 5th International Workshop on Predictive Models in Software Engineering, pp 7. https://doi.org/10.1145/1540438.1540448 36. Nguyen TT, An TQ, Hai VT, Phuong TM (2014) Similarity-based and rank-based defect prediction. In: Proceedings of the 2014 International Conference on Advanced Technologies for Communications (ATC 2014), pp 321–325.
  • 26. Chapter 2 Machine Learning Techniques for Intelligent SDP Abstract In this chapter, several common learning algorithms and their applications in software defect prediction are briefly introduced, including deep learning, transfer learning, dictionary learning, semi-supervised learning, and multi-view learning. 2.1 Transfer Learning In many real-world applications, it is expensive or impossible to recollect the needed training data and rebuild the models. It would be desirable to reduce the need and effort to recollect the training data. In such cases, transfer learning (TL) between task domains is useful. Transfer learning exploits the knowledge gained from a previous task to improve generalization on another related task. Transfer learning can be useful when there is not enough labeled data for the new problem or when the computational cost of training a model from scratch is too high. Traditional data mining and machine learning algorithms make predictions on future data using statistical models that are trained on previously collected labeled or unlabeled training data. Most of them assume that the distributions of the labeled and unlabeled data are the same. Transfer learning (TL), in contrast, allows the domains, tasks, and distributions used in training and testing to be different. It is used to improve a learner from one domain by transferring information from a related domain. Research on transfer learning has attracted more and more attention since 1995. Today, transfer learning methods appear in several top venues, most notably in data mining and applications of machine learning and data mining. Due to their strong ability of domain adaptation, researchers have introduced TL techniques into cross-project or heterogeneous defect prediction in recent years. The application of TL in cross-project defect prediction (CPDP) aims to reduce the distribution difference between source and target data. For example, Nam et al. [1] proposed a new CPDP method called TCA+, which extends transfer component analysis (TCA) by introducing a set of rules for selecting an appropriate normalization method to obtain better CPDP performance. Krishna and Menzies [2] introduced a baseline method named Bellwether for cross-project defect prediction
  • 27. 14 2 Machine Learning Techniques for Intelligent SDP based on existing CPDP methods. For heterogeneous defect prediction (HDP), TL techniques are applied not only to reduce the distribution difference between source and target data but also to eliminate the heterogeneity of metrics between source and target projects. Jing et al. [3] proposed an HDP method named CCA+, which uses the canonical correlation analysis (CCA) technique and the unified metric representation (UMR) to find the latent common feature space between the source and target projects. Specifically, the UMR is made up of three kinds of metrics, including the common metrics of the source and target data, source-specific metrics, and target-specific metrics. Based on the UMR, the transfer learning method based on CCA is introduced to find common metrics by maximizing the canonical correlation coefficient between source and target data. 2.2 Deep Learning Deep learning (DL) is an extension of prior work on neural networks where the "deep" refers to the use of multiple layers in the network. In the 1960s and 1970s, it was found that very simple neural nets can be poor classifiers unless they are extended with (a) extra layers between inputs and outputs and (b) a nonlinear activation function controlling links from inputs to a hidden layer (which can be very wide) to an output layer. Essentially, deep learning is a modern variation on the above which is concerned with a potentially unbounded number of layers of bounded size. In the last century, most neural networks used the "sigmoid" activation function f(x) = 1/(1 + e^{-x}), which made them subpar compared to other learners in several tasks. It was only when the ReLU activation function f(x) = max(0, x) was introduced by Nair and Hinton [4] that their performance increased dramatically, and they became popular. With its strong representation learning ability, deep learning technology has quickly gained favor in the field of software engineering. In software defect prediction (SDP), researchers began to use DL techniques to extract deep features of defect data. Wang et al. [6] first introduced the Deep Belief Network (DBN) [5] to learn semantic features and then used classical learners to perform defect prediction. In this approach, for each file in the source code, they extract tokens, disregarding ones that do not affect the semantics of the code, such as variable names. These tokens are vectorized and given unique numbers, forming a vector of integers for each source file. Wen et al. [7] utilized a Recurrent Neural Network (RNN) to encode features from sequence data automatically. They proposed a novel approach called FENCES, which extracts six types of change sequences covering different aspects of software changes via fine-grained change analysis. It approaches defect prediction by mapping it to a sequence labeling problem solvable by an RNN.
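As a rough illustration of the token-based idea described above, the sketch below maps source-code tokens to integer ids and passes the resulting vectors through a single ReLU layer. It is not the DBN of Wang et al. or the RNN of Wen et al.; the token lists, vocabulary, dimensions, and random weights are placeholders.

    import numpy as np

    def vectorize(token_lists, max_len=20):
        # Map each token to a unique integer id and pad every file to max_len.
        vocab = {}
        def tid(tok):
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1      # 0 is reserved for padding
            return vocab[tok]
        mat = np.zeros((len(token_lists), max_len), dtype=int)
        for i, toks in enumerate(token_lists):
            ids = [tid(t) for t in toks][:max_len]
            mat[i, :len(ids)] = ids
        return mat, vocab

    def relu_layer(x, w, b):
        # ReLU activation f(x) = max(0, x) applied to a linear transformation.
        return np.maximum(0.0, x @ w + b)

    files = [["if", "(", "x", ">", "0", ")", "return"],
             ["for", "(", "i", "=", "0", ";", "i", "<", "n", ")"]]
    ids, vocab = vectorize(files)
    rng = np.random.default_rng(0)
    hidden = relu_layer(ids.astype(float), rng.normal(size=(20, 8)), np.zeros(8))
    print(ids.shape, hidden.shape)

In a real pipeline, the integer vectors would feed a trained deep model rather than random weights, and the learned deep features would then be passed to a classifier for defect prediction.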
  • 28. 2.3 Other Techniques 15 2.3 Other Techniques 2.3.1 Dictionary Learning Both sparse representation and dictionary learning have been successfully applied in many application fields, including image clustering, compressed sensing, and image classification tasks. In sparse representation based classification, the dictionary for sparse coding could be predefined. For example, Wright et al. [8] directly used the training samples of all classes as the dictionary to code the query face image and classified the query face image by evaluating which class leads to the minimal reconstruction error. However, the dictionary in this method may not be effective enough to represent the query images due to the uncertain and noisy information in the original training images. In addition, the number of atoms in a dictionary made up of image samples can also be very large, which increases the coding complexity. Dictionary learning (DL) aims to learn, from the training samples, a space in which a given signal can be well represented or coded for processing. Most DL methods attempt to learn a common dictionary shared by all classes as well as a classifier of coefficients for classification. Usually, the dictionary can be constructed by directly using the original training samples; however, the original samples contain much redundancy and noise, which are adverse to prediction. For the purpose of further improving the classification ability, DL techniques have recently been adopted in SDP tasks to represent project modules well. For example, Jing et al. [14] were the first to apply DL technology to the field of software defect prediction, proposing a cost-sensitive discriminative dictionary learning (CDDL) approach. Specifically, CDDL introduces misclassification costs and builds an over-complete dictionary for software project modules. 2.3.2 Semi-Supervised Learning Due to the lack of labeled data, Semi-Supervised Learning (SSL) has always been a hot topic in machine learning. A myriad of SSL methods have been proposed. For example, co-training is a well-known disagreement-based SSL method, which trains different learners to exploit unlabeled data. Pseudo-label style methods label unlabeled data with pseudo labels. Graph-based methods aim to construct a similarity graph, through which label information propagates to unlabeled nodes. Local smoothness regularization-based methods represent another widely recognized category of SSL techniques, which leverage the inherent structure of the data to improve learning accuracy. Different methods apply different regularizers, such as Laplacian regularization, manifold regularization, and virtual adversarial regularization. For example, Miyato et al. [11] proposed a smooth regularization method called virtual adversarial training, which enables the model
  • 29. 16 2 Machine Learning Techniques for Intelligent SDP to output a smooth label distribution for local perturbations of a given input. There are other popular methods as well, for example, the Ladder Network. Since large amounts of unlabeled data exist in software projects, many SSL techniques have been considered in SDP tasks. Wang et al. [9] proposed a non-negative sparse-based semiboost learning approach for software defect prediction. Benefiting from the idea of semi-supervised learning, this approach is capable of exploiting both labeled and unlabeled data and is formulated in a boosting framework. Besides, Zhang et al. [10] used a graph-based semi-supervised learning technique to predict software defects. This approach utilizes not only the few labeled data but also the abundant unlabeled data to improve the generalization capability. 2.3.3 Multi-View Learning Representation learning is a prerequisite step in many multi-view learning tasks. In recent years, a variety of classical multi-view representation learning methods have been proposed. These methods follow the previously presented taxonomy, that is, joint representation, alignment representation, as well as shared and specific representation. For example, based on a Markov network, Chen et al. [12] presented a large-margin predictive multi-view subspace learning method, which joins features learned from multiple views. Jing et al. [13] proposed an intra-view and inter-view supervised correlation analysis method for image classification, in which CCA was applied to align multi-view features. Deep multi-view representation learning works also follow the joint representation, alignment representation, as well as shared and specific representation classification paradigm. For example, Kan et al. [14] proposed a multi-view deep network for cross-view classification. This network first extracts view-specific features with a sub-network and then concatenates and feeds these features into a common network, which is designed to project them into one uniform space. Harwath et al. [15] presented an unsupervised audiovisual matchmap neural network, which applies a similarity metric and a pairwise ranking criterion to align visual objects and spoken words. Hu et al. [16] introduced a sharable and individual multi-view deep metric learning method. It leverages view-specific networks to extract individual features from each view and employs a common network to extract shared features from all views. References 1. Nam, Jaechang and Pan, Sinno Jialin and Kim, Sunghun. Transfer defect learning. 35th international conference on software engineering (ICSE), 382–391, 2013. 2. Krishna, Rahul and Menzies, Tim. Bellwethers: A baseline method for transfer learning. IEEE Transactions on Software Engineering, 45(11):1081–1105, 2018.
  • 30. References 17 3. Jing, Xiaoyuan and Wu, Fei and Dong, Xiwei and Qi, Fumin and Xu, Baowen. Heterogeneous cross-company defect prediction by unified metric representation and CCA-based transfer learning. In Proceedings of the 2015 10th joint meeting on foundations of software engineering, pages 496–507, 2015. 4. Nair, Vinod and Hinton, Geoffrey E. Rectified linear units improve restricted Boltzmann machines. In Icml’10, 2010. 5. Hinton, Geoffrey E. Deep belief networks. Scholarpedia, 4(5):5947, 2009. 6. Wang, Song and Liu, Taiyue and Tan, Lin. Automatically learning semantic features for defect prediction. In IEEE/ACM 38th International Conference on Software Engineering (ICSE), pages 297–308, 2016. 7. Wen, Ming and Wu, Rongxin and Cheung, Shing-Chi. How well do change sequences predict defects? sequence learning from software changes. IEEE Transactions on Software Engineering, 46(11):1155–1175, 2018. 8. Wright, John and Yang, Allen Y and Ganesh, Arvind and Sastry, S Shankar and Ma, Yi. Robust face recognition via sparse representation. IEEE transactions on pattern analysis and machine intelligence, 31(2):210–227, 2008. 9. Wang, Tiejian and Zhang, Zhiwu and Jing, Xiaoyuan and Liu, Yanli. Non-negative sparse- based SemiBoost for software defect prediction. Software Testing, Verification and Reliability, 26(7):498–515, 2016. 10. Zhang, Zhi-Wu and Jing, Xiao-Yuan and Wang, Tie-Jian. Label propagation based semi- supervised learning for software defect prediction. Automated Software Engineering, 24(7):47–69, 2017. 11. Miyato, Takeru and Maeda, Shin-ichi and Koyama, Masanori and Ishii, Shin. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE transactions on pattern analysis and machine intelligence, 41(8):1979–1993, 2018. 12. Chen, Ning and Zhu, Jun and Sun, Fuchun and Xing, Eric Poe. Large-margin predictive latent subspace learning for multiview data analysis. IEEE transactions on pattern analysis and machine intelligence, 34(12):2365–2378, 2012. 13. Jing, Xiao-Yuan and Hu, Rui-Min and Zhu, Yang-Ping and Wu, Shan-Shan and Liang, Chao and Yang, Jing-Yu. Intra-view and inter-view supervised correlation analysis for multi-view feature learning. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 1882– 1889, 2014. 14. Kan, Meina and Shan, Shiguang and Chen, Xilin. Multi-view deep network for cross-view classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4847–4855, 2016. 15. Harwath, David and Torralba, Antonio and Glass, James. Unsupervised learning of spoken language with visual context. In Advances in Neural Information Processing Systems, pages 1858–1866, 2016. 16. Hu, Junlin and Lu, Jiwen and Tan, Yap-Peng. Sharable and individual multi-view metric learning. IEEE transactions on pattern analysis and machine intelligence, 40(9):2281–2288, 2017.
  • 31. Chapter 3 Within-Project Defect Prediction Abstract In order to improve the quality of a software system, software defect prediction aims to automatically identify defective software modules for efficient software testing. To predict software defects, classification methods based on static code attributes have attracted a great deal of attention. In recent years, machine learning techniques have been applied to defect prediction. Because there is similarity among different software modules, one software module can be approximately represented by a small proportion of other modules, and the representation coefficients over the pre-defined dictionary, which consists of historical software module data, are generally sparse. We propose a cost-sensitive discriminative dictionary learning (CDDL) approach for software defect classification and prediction. The widely used datasets from NASA projects are employed as test data to evaluate the performance of all compared methods. Experimental results show that CDDL outperforms several representative state-of-the-art defect prediction methods. 3.1 Basic WPDP 3.1.1 Dictionary Learning Based Software Defect Prediction 3.1.1.1 Methodology To fully exploit the discriminative information of training samples for improving the performance of classification, we design a supervised dictionary learning approach, which learns a dictionary that can represent the given software module more effectively. Moreover, supervised dictionary learning can also reduce both the number of dictionary atoms and the sparse coding complexity. Instead of learning a shared dictionary for all classes, we learn a structured dictionary D = [D_1, . . ., D_i, . . ., D_c], where D_i is the class-specific sub-dictionary associated with class i, and c is the total number of classes. We use the reconstruction error to do classification with such a dictionary D, as the SRC method does. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 X.-Y. Jing et al., Intelligent Software Defect Prediction, https://doi.org/10.1007/978-981-99-2842-2_3 19
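Before turning to the CDDL formulation, the following sketch illustrates the reconstruction-error classification rule mentioned above. It is a simplified stand-in: the coding is done by ordinary least squares rather than the l1-regularized sparse coding used by SRC and CDDL, and the sub-dictionaries and the query module are random placeholders.

    import numpy as np

    def classify_by_reconstruction(sample, sub_dicts):
        # Assign `sample` to the class whose sub-dictionary reconstructs it
        # with the smallest error.
        errors = []
        for D in sub_dicts:                      # each D: (n_features, n_atoms)
            x, *_ = np.linalg.lstsq(D, sample, rcond=None)
            errors.append(np.linalg.norm(sample - D @ x))
        return int(np.argmin(errors)), errors

    rng = np.random.default_rng(1)
    D_defective = rng.normal(size=(20, 10))       # toy sub-dictionary, defective class
    D_defective_free = rng.normal(size=(20, 10))  # toy sub-dictionary, defective-free class
    module = rng.normal(size=20)                  # one software module (20 metrics)
    label, errs = classify_by_reconstruction(module, [D_defective, D_defective_free])
    print(label, errs)

The design choice is the one described in the text: the structured dictionary keeps one sub-dictionary per class, and the class label of a query module is decided by which sub-dictionary yields the minimal reconstruction error.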
  • 32. 20 3 Within-Project Defect Prediction Suppose that A = [A_1, . . ., A_i, . . ., A_c] is the set of training samples (labeled software modules), A_i is the subset of the training samples from class i, and X = [X_1, . . ., X_i, . . ., X_c] is the coding coefficient matrix of A over D, that is, A ≈ DX, where X_i is the sub-matrix containing the coding coefficients of A_i over D. We require that D should have not only powerful reconstruction capability of A but also powerful discriminative capability of the classes in A. Thus, we propose the cost-sensitive discriminative dictionary learning (CDDL) model as follows:

J_{(D,X)} = \arg\min_{(D,X)} \{ r(A, D, X) + \lambda \|X\|_1 \}   (3.1)

where r(A, D, X) is the discriminative fidelity term, \|X\|_1 is the sparsity constraint, and \lambda is a balance factor. Let X_i = [X_i^1, X_i^2, . . ., X_i^c], where X_i^j is the coding coefficient matrix of A_i over the sub-dictionary D_j. Denote the representation of D_k to A_i as R_k = D_k X_i^k. First of all, the dictionary D should be able to represent A_i well, and therefore A_i ≈ D X_i = D_1 X_i^1 + · · · + D_i X_i^i + · · · + D_c X_i^c. Secondly, since D_i is associated with the ith class, it is expected that A_i should be well represented by D_i (not by D_j, j ≠ i), which means that both \|A_i - D_i X_i^i\|_F^2 and \|D_j X_i^j\|_F^2 should be minimized. Thus the discriminative fidelity term is

r(A, D, X) = \sum_{i=1}^{c} r(A_i, D, X_i) = \sum_{i=1}^{c} \left( \|A_i - D X_i\|_F^2 + \|A_i - D_i X_i^i\|_F^2 + \sum_{j=1, j \neq i}^{c} \|D_j X_i^j\|_F^2 \right)   (3.2)

An intuitive explanation of the three terms in r(A_i, D, X_i) is shown in Fig. 3.1. In software defect prediction, there are two kinds of modules: the defective modules and the defective-free modules. Fig. 3.1 Illustration of the discriminative fidelity term. Figure 3.1a shows that if we only minimize
  • 33. 3.1 Basic WPDP 21 the term \|A_i - D X_i\|_F^2 on the total dictionary D, R_i may deviate much from A_i, so that the sub-dictionary D_i could not represent A_i well. In order to achieve both powerful reconstruction capability and powerful discriminative capability, we add two more parts: \|A_i - D_i X_i^i\|_F^2 (which minimizes the reconstruction error on the sub-dictionary of its own class) and \|D_j X_i^j\|_F^2 (which minimizes the reconstruction term on the sub-dictionary of the other class); both of them should also be minimized. Figure 3.1b shows that the proposed discriminative fidelity term can overcome the problem in Fig. 3.1a. As previously stated, misclassifying defective-free modules increases the development cost, while misclassifying defective ones is related to the risk cost. Cost-sensitive learning can incorporate the different misclassification costs into the classification process. In this section, we emphasize the risk cost, so we add the penalty factor cost(i, j) to increase the punishment when a defective software module is predicted as a defective-free one. As a result, cost-sensitive dictionary learning makes the prediction incline toward classifying a module as defective and generates a dictionary for classification with minimum misclassification cost. The discriminative fidelity term with penalty factors is

r(A, D, X) = \sum_{i=1}^{c} r(A_i, D, X_i) = \sum_{i=1}^{c} \left[ \|A_i - D X_i\|_F^2 + \|A_i - D_i X_i^i\|_F^2 + \sum_{j=1}^{c} cost(i, j) \|D_j X_i^j\|_F^2 \right]   (3.3)

Since there are only two classes in software defect prediction (the defective class and the defective-free class), that is, c = 2, the model of cost-sensitive discriminative dictionary learning is

J_{(D,X)} = \arg\min_{(D,X)} \left\{ \sum_{i=1}^{2} \left[ \|A_i - D X_i\|_F^2 + \|A_i - D_i X_i^i\|_F^2 + \sum_{j=1}^{2} cost(i, j) \|D_j X_i^j\|_F^2 \right] + \lambda \|X\|_1 \right\}   (3.4)

where the cost matrix is shown in Table 3.1.
Table 3.1 Cost matrix for CDDL
                           Predicts defective one    Predicts defective-free one
Actually defective                  0                       cost(1, 2)
Actually defective-free         cost(2, 1)                      0
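The sketch below shows one way the cost-sensitive discriminative fidelity term of Formula 3.3 could be evaluated in the two-class case. The dictionary sizes and the cost values are assumptions for illustration only; here the heavier penalty is placed on coding a defective module with the defective-free sub-dictionary.

    import numpy as np

    def fidelity_term(A, D, X, cost):
        # A, D, X are lists indexed by class; X[i][j] holds the coefficients of
        # A[i] over the sub-dictionary D[j]; `cost` is the 2x2 matrix of Table 3.1.
        total = 0.0
        for i in range(2):
            D_all = np.hstack(D)                 # D = [D_1, D_2]
            X_all = np.vstack(X[i])              # coefficients of A_i over the whole D
            total += np.linalg.norm(A[i] - D_all @ X_all, 'fro') ** 2
            total += np.linalg.norm(A[i] - D[i] @ X[i][i], 'fro') ** 2
            for j in range(2):
                if j != i:
                    total += cost[i][j] * np.linalg.norm(D[j] @ X[i][j], 'fro') ** 2
        return total

    rng = np.random.default_rng(0)
    A = [rng.normal(size=(20, 30)), rng.normal(size=(20, 60))]   # defective / defective-free modules
    D = [rng.normal(size=(20, 20)), rng.normal(size=(20, 20))]   # sub-dictionaries (e.g., PCA-initialized)
    X = [[rng.normal(size=(20, A[i].shape[1])) for _ in range(2)] for i in range(2)]
    cost = [[0, 5], [1, 0]]                                      # assumed penalty values
    print(fidelity_term(A, D, X, cost))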
  • 34. 22 3 Within-Project Defect Prediction The CDDL objective function in Formula 3.4 can be divided into two sub- problems: updating X by fixing D and updating D by fixing X. The optimization procedure is iteratively implemented for the desired discriminative dictionary D and corresponding coefficient matrix X. At first, suppose that D is fixed, the objective function in formula is reduced to a sparse coding problem to compute .X = [X1, X2]. Here .X1 and .X2 are calculated one by one. We calculate .X1 with fixed .X2 and then compute .X2 with fixed .X1. Thus, formula is rewritten as . J(Xi) = arg min (Xi) ‖Ai − DXi‖2 F + Ai − DiXi i 2 F + 2 j=1 cost(i, j) Dj X j i 2 F + λ ‖Xi‖1 ⎫ ⎬ ⎭ (3.5) Formula 3.5 can be solved by using the IPM algorithm in [1]. When X is fixed, we in turn update .D1 and .D2. When we calculate .D1, .D2 is fixed, then we compute .D2, .D1 is fixed. Thus Formula 3.4 is rewritten as . J(Di) = arg min (Di) ⎧ ⎪ ⎨ ⎪ ⎩ −DiXi − 2 j=1 j/=i Dj Xj 2 F + Ai − DiXi i 2 F + 2 j=1 cos(i, j) Dj X j i 2 F ⎫ ⎬ ⎭ (3.6) where .Xi is the coding coefficient matrix of A over .Di. Formula 3.6 is a quadratic programming problem, and we can solve it by using the algorithm in [2]. By utilizing the PCA technique, we are able to initialize the sub-dictionary for each class. Given the low data dimension of software defect prediction, PCA can create a fully initialized sub-dictionary for every class. This means that all sub- dictionaries have an equal number of atoms, which is generally equivalent to the data dimension. The algorithm of CDDL converges since its two alternative optimizations are both convex. Figure 3.2 illustrates the convergence of the algorithm. 3.1.1.2 Experiments To evaluate our CDDL approach, we conduct some experiments. For all selected datasets, we use the 1:1 random division to obtain the training and testing sets for all compared methods. The random division treatment may affect the prediction performance. Therefore, we use the random division, perform prediction 20 times, and report the average prediction results in the following discussions.
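The PCA-based initialisation of the sub-dictionaries described above can be sketched as follows; the alternating optimisation itself is only outlined in comments, since the text delegates the coding step to the IPM algorithm of [1] and the dictionary update to the quadratic-programming solver of [2]. The toy data shapes and function names below are our own assumptions.

```python
import numpy as np

def init_sub_dictionary(Ai):
    """PCA initialisation of one sub-dictionary: the atoms are the principal
    directions of class i's training samples, so every class starts with a full
    sub-dictionary (one atom per data dimension when enough samples exist)."""
    centered = Ai - Ai.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(centered, full_matrices=False)
    return U                                  # (m, min(m, n_i)) orthonormal atoms

# toy data: 8 metrics, 30 defective and 150 defective-free modules
A = [np.random.randn(8, 30), np.random.randn(8, 150)]
D = [init_sub_dictionary(Ai) for Ai in A]

# Alternating optimisation then proceeds as in the text:
#   repeat until the objective value stops decreasing (cf. Fig. 3.2):
#     1. fix D and solve Formula 3.5 for X1, then X2 (an l1-regularised coding step);
#     2. fix X and solve Formula 3.6 for D1, then D2 (a quadratic programme).
```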
  • 35. 3.1 Basic WPDP 23 Fig. 3.2 Convergence of the realization algorithm of CDDL on four NASA benchmark datasets (plots of the total objective function value against the iteration number). (a) CM1 dataset. (b) KC1 dataset. (c) MW1 dataset. (d) PC1 dataset. In our approach, in order to emphasize the risk cost, the parameters cost(1,2) and cost(2,1) are set at a ratio of 1:5. For various projects, users can select a different ratio of cost(1,2) to cost(2,1) [3]. The remaining parameter is determined by searching a wide range of values and choosing the one that yields the best F-measure value. We compare the proposed CDDL approach with several representative methods, mostly presented in the last five years, including support vector machine (SVM) [4], Compressed C4.5 decision tree (CC4.5) [5], weighted Naïve Bayes (NB) [6], coding based ensemble learning (CEL) [7], and cost-sensitive boosting neural network (CBNN) [8]. In this section, we present the detailed experimental results of our CDDL approach and the compared methods. 3.1.1.3 Discussions Table 3.2 shows the Pd and Pf values of our approach and the compared methods on 10 NASA datasets. For each dataset, the Pd and Pf values of all methods are the mean values calculated from the results of 20 runs. The Pf results suggest that, in spite of not acquiring the best Pf values on most datasets, CDDL can achieve
  • 36. 24 3 Within-Project Defect Prediction Table 3.2 Experimental results: Pd and Pf comparisons on NASA’s ten datasets Dataset M SVM CC4.5 NB CEL CBNN CDDL CM1 Pd 0.15 0.26 0.44 0.43 0.59 0.74 Pf 0.04 0.11 0.18 0.15 0.29 0.37 JM1 Pd 0.53 0.37 0.14 0.32 0.54 0.68 Pf 0.45 0.17 0.32 0.14 0.29 0.35 KC1 Pd 0.19 0.40 0.31 0.37 0.69 0.81 Pf 0.02 0.12 0.06 0.13 0.30 0.37 KC3 Pd 0.33 0.41 0.46 0.29 0.51 0.71 Pf 0.08 0.16 0.21 0.12 0.25 0.34 MC2 Pd 0.51 0.64 0.35 0.56 0.79 0.83 Pf 0.24 0.49 0.09 0.38 0.54 0.29 MW1 Pd 0.21 0.29 0.49 0.25 0.61 0.79 Pf 0.04 0.09 0.19 0.11 0.25 0.25 PC1 Pd 0.66 0.38 0.36 0.46 0.54 0.86 Pf 0.19 0.09 0.11 0.13 0.17 0.29 PC3 Pd 0.64 0.34 0.28 0.41 0.65 0.77 Pf 0.41 0.08 0.09 0.13 0.25 0.28 PC4 Pd 0.72 0.49 0.39 0.48 0.66 0.89 Pf 0.16 0.07 0.13 0.06 0.18 0.28 PC5 Pd 0.71 0.50 0.32 0.37 0.79 0.84 Pf 0.22 0.02 0.14 0.13 0.08 0.06 Table 3.3 Average Pd value of 10 NASA datasets SVM CC4.5 NB CEL CBNN CDDL Average .0.47 .0.41 .0.35 .0.39 .0.64 .0.79 comparatively better results in contrast with other methods. We can also observe that the Pd values of CDDL, which are presented with boldface, are higher than the corresponding values of all other methods. CDDL achieves the highest Pd values on all datasets. The results indicate that the proposed CDDL approach takes the misclassification costs into consideration, which makes the prediction tend to classify the defective-free modules as the defective ones in order to obtain higher Pd values. We calculate the average Pd values of 10 NASA datasets in Table 3.3. As compared with other methods, the average Pd value of our approach is higher in contrast with other related methods, and CDDL improves the average Pd value at least by .0.15(= 0.79 − 0.64). Table 3.4 shows the F-measure values of our approach and the compared methods on 10 NASA datasets. In Table 3.4, F-measure values of CDDL are better than other methods on all datasets, which means that our proposed approach outperforms other methods and achieves the ideal prediction effects. According to the average F- measure values shown in Table 3.4, CDDL improves the average F-measure value at
  • 37. 3.1 Basic WPDP 25 Table 3.4 F-measure values on ten NASA datasets Datasets SVM CC4.5 NB CEL CBNN CDDL CM1 .0.20 .0.25 .0.32 .0.27 .0.33 .0.38 JM1 .0.29 .0.34 .0.33 .0.33 .0.38 .0.40 KC1 .0.29 .0.39 .0.38 .0.36 .0.41 .0.47 KC3 .0.38 .0.38 .0.38 .0.33 .0.38 .0.44 MC2 .0.52 .0.48 .0.45 .0.49 .0.56 .0.63 MW1 .0.27 .0.27 .0.31 .0.27 .0.33 .0.38 PC1 .0.35 .0.32 .0.28 .0.32 .0.32 .0.41 PC3 .0.28 .0.29 .0.29 .0.36 .0.38 .0.42 PC4 .0.47 .0.49 .0.36 .0.48 .0.46 .0.55 PC5 .0.16 .0.48 .0.33 .0.36 .0.37 .0.59 Average .0.32 .0.37 .0.34 .0.35 .0.39 .0.47 Table 3.5 P -values between CDDL and other compared methods on ten NASA datasets .CDDL Dataset .s .SVM .CC4.5 .NB .CEL .CBNN .CM1 .1.23 × 10−8 .3.51 × 10−6 .4.24 × 10−4 .1.80 × 10−4 .1.01 × 10−4 .JM1 .7.51 × 10−18 .2.33 × 10−13 .1.27 × 10−14 .1.58 × 10−13 .0.0564 .KC1 .1.20 × 10−14 .1.23 × 10−9 .8.38 × 10−13 .2.80 × 10−11 .9.69 × 10−6 .KC3 .0.0265 .0.0089 .3.22 × 10−4 .1.61 × 10−4 .4.24 × 10−4 .MC2 .1.26 × 10−4 .2.61 × 10−5 .1.13 × 10−8 .7.58 × 10−6 .1.01 × 10−4 .MW1 .1.14 × 10−3 .2.31 × 10−4 .1.10 × 10−3 .1.84 × 10−5 .2.20 × 10−3 .PC1 .2.64 × 10−4 .2.41 × 10−5 .1.60 × 10−8 .1.69 × 10−5 .1.68 × 10−8 .PC3 .7.79 × 10−14 .7.73 × 10−9 .1.04 × 10−8 .4.03 × 10−5 .4.31 × 10−5 .PC4 .7.32 × 10−8 .7.26 × 10−4 .2.81 × 10−16 .4.26 × 10−6 .1.75 × 10−10 .PC5 .3.01 × 10−18 .7.00 × 10−9 .1.90 × 10−14 .1.30 × 10−12 .2.13 × 10−11 least by (.0.47−0.39 = 0.08). To sum up, Tables 3.3 and 3.4 show that our approach has the best achievement in the Pd and F-measure values. To statistically analyze the F-measure results given in Table 3.4, we conduct a sta- tistical test, that is, Mcnemar’s test [9]. This test can provide statistical significance between CDDL and other methods. Here, the Mcnemar’s test uses a significance level of 0.05. If the p-value is below 0.05, the performance difference between two compared methods is considered to be statistically significant. Table 3.5 shows the p-values between CDDL and other compared methods on 10 NASA datasets, where only one value is slightly above 0.05. According to Table 3.5, the proposed approach indeed makes a significant difference in comparison with other methods for software defect prediction.
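The text names McNemar's test but does not show the computation. A minimal sketch of the exact (binomial) form of the test, applied to the per-module disagreements of two predictors, could look like the following; whether the book used the exact or the chi-square variant is not stated, so that choice is an assumption here.

```python
from scipy.stats import binom

def mcnemar_p_value(n01, n10):
    """Exact McNemar test on the discordant pairs.

    n01: modules the first method classifies correctly and the second misclassifies
    n10: the opposite case
    Returns the two-sided p-value; a value below 0.05 marks a significant difference.
    """
    n = n01 + n10
    if n == 0:
        return 1.0
    p = 2.0 * binom.cdf(min(n01, n10), n, 0.5)   # double the smaller tail probability
    return min(p, 1.0)
```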
  • 38. 26 3 Within-Project Defect Prediction 3.1.2 Collaborative Representation Classification Based Software Defect Prediction 3.1.2.1 Methodology Figure 3.3 shows the flowchart of defect prediction in our approach, which includes three steps. The first step is a Laplace sampling process over the defective-free modules to construct the training dataset. Second, the prediction model is trained by using the CRC based learner. Finally, the CRC based predictor classifies whether new modules are defective or defective-free. In metric based software defect prediction, the number of defective-free modules is much larger than that of defective ones, that is, the class imbalance problem may occur. In this section, we conduct Laplace score sampling on the training samples, which alleviates the class imbalance problem effectively. Sparse representation classification (SRC) represents a testing sample collaboratively by samples of all classes. In SRC, there are enough training samples for each class, so the dictionary is over-complete. Unfortunately, the number of defective modules is usually small. If we use such an under-complete dictionary to represent a defective module, the representation error may be large and the classification will be unstable. Fortunately, one fact in software defect prediction is that software modules share similarities: samples from one class may be very helpful for representing a testing sample of the other class. In CRC, this "lack of samples" problem is solved by taking the software modules from the other class as possible samples of each class. The main idea of the CRC technique is that the information of a signal can be collaboratively represented by a linear combination of a few elementary signals. We utilize A = [A1, A2] ∈ R^{m×n} to denote the set of training samples obtained after Laplace sampling, and y denotes a testing sample. In order to collaboratively represent the query sample using A with low computational burden, we use the regularized least squares method as follows: X̂ = arg min_X ‖y − A·X‖²₂ + λ‖X‖²₂ (3.7) Fig. 3.3 CRC based software defect prediction flowchart (software defect database → Laplace sampling → training instances → CRC based learner → CRC_RLS prediction on test instances → prediction results: defective/defective-free)
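The closed-form solution of this coding problem and the regularized-residual decision rule are derived in the next paragraphs; anticipating them, a compact NumPy sketch of the CRC based learner and predictor from Fig. 3.3 could look like the following. The data layout, the default value of the regularization parameter λ, and the function names are our own assumptions, not the book's code.

```python
import numpy as np

def crc_fit(A, lam=0.001):
    """CRC based learner: pre-compute the projection matrix P = (A^T A + lam*I)^-1 A^T,
    so that coding a new module is a single matrix-vector product."""
    n = A.shape[1]                                   # A is (m metrics, n training modules)
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T)

def crc_predict(y, A, P, labels):
    """CRC based predictor: code y over all training modules, then assign the class
    with the smallest regularized residual ||y - A_i x_i|| / ||x_i||."""
    x = P @ y
    best_class, best_r = None, np.inf
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        r = np.linalg.norm(y - A[:, idx] @ x[idx]) / (np.linalg.norm(x[idx]) + 1e-12)
        if r < best_r:
            best_class, best_r = c, r
    return best_class
```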
  • 39. 3.1 Basic WPDP 27 where .λ is the regularization parameter. The role of the regularization term is twofold. First, it makes the least square solution stable. Second, it introduces a certain amount of “sparsity” to the solution .X̂ while this sparsity is much weaker than that by .l1-norm. The solution of collaborative representation with regularized least square in Eq. 3.7 can be easily and analytically derived as .X̂ = AT A + λ · I −1 AT y (3.8) Let .P = (AT A+λ·I)−1AT . Clearly, P is independent of y so that it can be pre- calculated as a projection matrix. Hence, a query sample y can be simply projected onto P via Py, which makes the collaborative representation very fast. After training the CRC based learner, we can use the collaborative representation classification with regularized least square .CRCRLS algorithm to do prediction. For a test sample y, we code y over A and get .X̂. In addition to the class specific representation residual . y − Ai · X̂i 2 , where .X̂i is the coefficient vector associated with class i (.i = 1, 2), the .l2-norm “sparsity” . X̂i 2 can also bring some discrimination information for classification. Thus we use both of them in classification and calculate the regularized residual of each class by using .ri = y − Ai · X̂i 2 / X̂i 2 . The test sample y is assigned to the ith class corresponding to the smallest regularized residual .ri. 3.1.2.2 Experiments In the experiment, ten datasets from NASA Metrics Data Program are taken as the test data. We compare the proposed approach with several representative software defect prediction methods, including Compressed C4.5 decision tree (CC4.5), weighted Naïve Bayes (NB), cost-sensitive boosting neural network (CBNN), and coding based ensemble learning (CEL). 3.1.2.3 Discussions We use recall (Pd), false positive rate (Pf), precision (Pre), and F-measure as prediction accuracy evaluation indexes. A good prediction model desires to achieve high value of recall rate and precision. However, there exists trade-off between precision and recall. F-measure is the harmonic mean of precision and recall rate. Note that these quality indexes are commonly used in the field of software defect prediction. Table 3.6 shows the average Pd, Pf, Pre, and F-measure values of our CSDP approach and other compared methods on ten NASA datasets, where each value is the mean of 20 random runs. Our approach can acquire better prediction
  • 40. 28 3 Within-Project Defect Prediction Table 3.6 Average Pd, Pf, Pre, and F-measure values of 20 random runs on ten NASA datasets Evaluation Prediction methods indexes CC4.5 NB CEL CBNN CSDP Pd .0.408 .0.354 .0.394 .0.637 .0.745 Pf .0.140 .0.152 .0.148 .0.260 .0.211 Pre .0.342 .0.347 .0.324 .0.288 .0.343 F-measure .0.371 .0.342 .0.354 .0.390 .0.465 accuracy than other methods. In particular, our approach improves the average Pd at least by 16.95% (.= (0.745 − 0.637)/0.637) and the average F-measure at least by 19.23% (.= (0.465 − 0.390)/0.390). 3.2 Semi-supervised WPDP 3.2.1 Sample-Based Software Defect Prediction with Active and Semi-supervised Learning 3.2.1.1 Methodology Software defect prediction, which aims to predict whether a particular software module contains any defects, can be cast into a classification problem in machine learning, where software metrics are extracted from each software module to form an example with manually assigned labels defective (having one or more defects) and non-defective (no defects). A classifier is then learned from these training examples in the purpose of predicting the defect-proneness of unknown software modules. In this section, we propose a sample-based defect prediction approach which does not rely on the assumption that the current project has the same defect characteristics as the historical projects. Given a newly finished project, unlike the previous studies that leverage the modules in historical projects for classifier learning, sample-based defect prediction manages to sample a small portion of modules for extensive testing in order to reliably label the sampled modules, while the defect-proneness of unsampled modules remains unknown. Then, a classifier is constructed based on the sample of software modules (the labeled data) and expected to provide accurate predictions for the unsampled modules (unlabeled data). Here, conventional machine learners (e.g., logistic regression, decision tree, Naive Bayes, etc.) can be applied to the classification. In practice, modern software systems often consist of hundreds or even thousands of modules. An organization is usually not able to afford extensive testing for all modules especially when time and resources are limited. In this case, the organization can only manage to sample a small percentage of modules and test them for defect-proneness. Classifier would have to be learned from a small training
  • 41. 3.2 Semi-supervised WPDP 29 set with the defect-proneness labels. Thus, the key for the sample-based defect prediction to be cost-effective is to learn a well-performing classifier while keeping the sample size small. To improve the performance of sample-based defect prediction, we propose to apply semi-supervised learning for classifier construction, which firstly learns an initial classifier from a small sample of labeled training set and refines it by further exploiting a larger number of available unlabeled data. In semi-supervised learning, an effective paradigm is known as disagreement- based semi-supervised learning, where multiple learners are trained for the same task and the disagreements among the learners are exploited during learning. In this paradigm, unlabeled data can be regarded as a special information exchange “platform.” If one learner is much more confident on a disagreed unlabeled example than other learner(s), then this learner will teach other(s) with this example; if all learners are comparably confident on a disagreed unlabeled example, then this example may be selected for query. Many well-known disagreement-based semi- supervised learning methods have been developed. In this study, we apply CoForest for defect prediction. It works based on a well-known ensemble learning algorithm named random forest [10] to tackle the problems of determining the most confident examples to label and producing the final hypothesis. The pseudocode of CoForest is presented in Table 3.1. Briefly, it works as follows. Let L denote the labeled dataset and U denote the unlabeled dataset. First, N random trees are initiated from the training sets bootstrap-sampled from the labeled dataset L for creating a random forest. Then, in each learning iteration, each random tree is refined with the original labeled examples L and the newly labeled examples .L' selected by its concomitant ensemble (i.e., the ensemble of the other random trees except for the current tree). The learning process iterates until certain stopping criterion is reached. Finally, the prediction is made based on the majority voting from the ensemble of random trees. Note that in this way, CoForest is able to exploit the advantage of both semi-supervised learning and ensemble learning simultaneously, as suggested in Xu et al. [11]. In CoForest, the stopping criterion is essential to guarantee a good performance Li and Zhou [12] derived a stopping criterion based on the theoretical findings in Angluin and Laird [13]. By enforcing the worst case generalization error of a random tree in the current round to be less than that in the preceded round, they derived that semi-supervised learning process will be beneficial if the following condition is satisfied . êi,t êi,t−1 Wi,t−1 Wi,t 1 (3.9) where .êi,t and .êi,t−1 denote the estimated classification error of the i-th random tree in the t-th and (.t − 1)-th round, respectively, and .Wi,t and .Wi,t−1 denote the total weights of its newly labeled sets L .i, t and L.i, t − 1 in the t-th and (.t − 1)-th round, respectively, and .i ∈ {1, 2, . . . , N}. For detailed information on the derivation, please refer to Li and Zhou [12].
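A much-simplified sketch of the CoForest loop described above is given below. It keeps the two essential ingredients (bootstrapped random trees, and re-training each tree on the unlabeled examples that its concomitant ensemble labels confidently) but replaces the error and weight bookkeeping behind the stopping criterion of Formula 3.9 with a fixed confidence threshold and an iteration cap. It assumes NumPy arrays and 0/1 labels and should be read as an illustration, not the published algorithm.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def coforest(X_l, y_l, X_u, n_trees=6, conf_threshold=0.75, max_iter=10, seed=0):
    """Illustrative CoForest: X_l, y_l are the labeled sample, X_u the unlabeled modules."""
    rng = np.random.RandomState(seed)

    def make_tree():
        return DecisionTreeClassifier(max_features='sqrt',
                                      random_state=rng.randint(10**6))

    n = len(X_l)
    bootstraps = [rng.randint(0, n, n) for _ in range(n_trees)]     # bootstrap samples of L
    trees = [make_tree().fit(X_l[b], y_l[b]) for b in bootstraps]
    for _ in range(max_iter):
        for i in range(n_trees):
            others = [t for j, t in enumerate(trees) if j != i]     # concomitant ensemble
            votes = np.mean([t.predict(X_u) for t in others], axis=0)
            conf = np.maximum(votes, 1.0 - votes)                   # agreement on each example
            pick = conf > conf_threshold                            # newly labeled set L'_i
            if pick.any():
                X_aug = np.vstack([X_l, X_u[pick]])
                y_aug = np.concatenate([y_l, (votes[pick] > 0.5).astype(int)])
                trees[i] = make_tree().fit(X_aug, y_aug)            # refine tree i on L + L'_i
    return trees

def coforest_predict(trees, X):
    """Final hypothesis: majority vote of the refined random trees."""
    return (np.mean([t.predict(X) for t in trees], axis=0) > 0.5).astype(int)
```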
  • 42. 30 3 Within-Project Defect Prediction The CoForest has been successfully applied to the domain of computer-aided medical diagnosis, where conducting a large amount of routine examinations places heavy burden on medical experts. The CoForest algorithm was applied to help learn hypothesis from diagnosed and undiagnosed samples in order to assist the medical experts in making diagnosis. Although a random sample can be used to approximate the properties of all the software modules in the current projects, a random sample is apparently not data- efficient since random sample neglects the “needs” of the learners for achieving good performance and hence may contain redundant information that the learner has already captured during the learning process. Intuitively, if a learner is trained using the data that the learner needs most for improving its performance, it may require less labeled data than the learners trained without caring its needs for learning. Put it another way, if the same number of labeled data is used, the learner that is trained using the labeled data it needs most would achieve better performance than the learner that is trained without caring its needs for learning. Active learning, which is another major approach for learning in presence of a large number of unlabeled data, aims to achieve good performance by learning with as few labeled data as possible. It assumes that the learner has some control over the data sampling process by allowing the learner to actively select and query the label of some informative unlabeled example which, if the labels are known, may contribute the most for improving the prediction accuracy. Since active learning and semi-supervised learning exploit the merit of unlabeled data from different perspective, they have been further combined to achieve better performance in image retrieval [86], Email spam detection [39], etc. Recently, Wang and Zhou [68] analytically showed that combining active learning and semi-supervised learning is beneficial in exploiting unlabeled data. In this study, we extend CoForest to incorporate the idea of active learning into the sample-based defect prediction. We propose a novel active semi-supervised learning method called ACoForest, which leverages the advantages from both disagreement-based active learning and semi-supervised learning. In detail, let L and U denote the labeled set and unlabeled set, respectively. Similar to CoForest, ACoForest is firstly initiated by constructing a random forest with N random trees over L. Then, ACoForest iteratively exploits the unlabeled data via both active learning and semi-supervised learning. In each iteration, ACoForest firstly labels all the unlabeled examples and computes the degree of agreement of the ensemble on each unlabeled example. Then, it reversely ranks all the unlabeled data according to the degree of agreement and selects the M top-most disagreed unlabeled data to query their labels from the user. These unlabeled data as well as their corresponding labels are then used to augment L. After that, ACoForest exploits the remaining unlabeled data just as CoForest does.
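The active-selection step that distinguishes ACoForest from CoForest is easy to isolate. A sketch under the same simplified setting as the CoForest code above (an ensemble of fitted trees, 0/1 labels) is shown below; the function name and the default query budget M are ours.

```python
import numpy as np

def select_most_disagreed(trees, X_u, M=10):
    """Rank unlabeled modules by ensemble disagreement and return the indices of the
    M most disagreed ones; these are the modules whose labels ACoForest queries,
    i.e. the modules suggested to the QA team for testing next."""
    votes = np.mean([t.predict(X_u) for t in trees], axis=0)   # fraction voting 'defective'
    disagreement = 1.0 - np.abs(2.0 * votes - 1.0)             # 0 = unanimous, 1 = even split
    return np.argsort(-disagreement)[:M]
```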
  • 43. 3.2 Semi-supervised WPDP 31 3.2.1.2 Experiments To evaluate the effectiveness of sample-based defect prediction methods, we perform experiments using datasets available at the PROMISE website. We have collected the Eclipse, Lucene, and Xalan datasets. The Eclipse datasets contain 198 attributes, including code complexity metrics (such as LOC, cyclomatic complexity, number of classes, etc.) and metrics about abstract syntax trees (such as number of blocks, number of if statements, method references, etc.) (Zimmermann et al. 2007). The Eclipse defect data was collected by mining Eclipse’s bug databases and version archives. In this study, we experiment with Eclipse 2.0 and 3.0. To show the generality of the results, we use the package-level data for Eclipse 3.0 and the file-level data for Eclipse 2.0. We also choose two Eclipse components: JDT.Core and SWT in Eclipse 3.0 to evaluate the defect prediction performance for smaller Eclipse projects. We only examine the pre-release defects, which are defects reported in the last six months before release. The Lucene dataset we use contains metric and defect data for 340 source files in Apache Lucene v2.4. The Xalan dataset contains metric and defect data for 229 source files in Apache Xalan v2.6. Both datasets contain 20 attributes, including code complexity metrics (e.g., average cyclomatic complexity), object- oriented metrics (e.g., depth of inheritance tree), and program dependency metrics (e.g., number of dependent classes). Having collected the data, we then apply the three methods described in Sect. 2 to construct defect prediction models from a small sample of modules and use them to predict defect-proneness of unsampled modules. We evaluate the performance of all the methods in terms of precision (P), recall (R), F-measure (F), and Balancemeasure (B), which are defined as follows: .P = tp tp + fp (3.10) .R = tp tp + f n (3.11) .F = 2PR P + R (3.12) .B = 1 − 1 2 f n tp + f n !2 + fp tn + fp !2 (3.13) where tp, fp, tn, f n are the number of defective modules that are predicted as defective, the number non-defective modules that are predicted as defective, the number non-defective modules that are predicted as non-defective, and the number defective module that are predicted as non-defective, respectively.
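For reference, the four measures can be computed directly from the confusion-matrix counts. The sketch below follows Eqs. 3.10 through 3.13, with the Balance measure written in its usual form B = 1 − sqrt(((fn/(tp+fn))² + (fp/(tn+fp))²)/2), which is what the equation above appears to state once the extraction damage is undone.

```python
import math

def prf_balance(tp, fp, tn, fn):
    """Precision, recall, F-measure and Balance (Eqs. 3.10-3.13)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    miss_rate = fn / (tp + fn) if tp + fn else 0.0     # 1 - recall
    pf = fp / (tn + fp) if tn + fp else 0.0            # false positive rate
    b = 1.0 - math.sqrt((miss_rate ** 2 + pf ** 2) / 2.0)
    return p, r, f, b
```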
  • 44. 32 3 Within-Project Defect Prediction 3.2.1.3 Discussions Our experiments show that a smaller sample can achieve similar defect prediction performance as larger samples do. The sample can serve as an initial labeled training set that represents the underlying data distribution of the entire dataset. Thus if there is no sufficient historical datasets for building an effective defect prediction model for a new project, we can randomly sample a small percentage of modules to test, obtain their defect status (defective or non-defective), and then use the collected sample to build a defect prediction for this project. Our experiments also show that, in general, sampling with semi-supervised learning and active learning can achieve better prediction performance than sam- pling with conventional machine learning techniques. A sample may contain much information that a conventional machine learner has already learned well but may contain little information that the learner needs for improving the current prediction accuracy. The proposed CoForest and ACoForest learners take the needs for learning into account and obtain information needed for improving performance from the unsampled modules. Both CoForest and ACoForest methods work well for sample-based defect prediction. ACoForest also supports the active selection of the modules—it can actively suggest the QA team which modules to be chosen in order to increase the prediction performance. Thus in order to apply ACoForest, interactions with test engineers are required. If such interactions is allowed (which implies that more time and efforts are allowed), we can apply the ACoForest method. If such interaction is not allowed due to limited time and resources, we can apply the CoForest method. In our approach, we draw a random sample from the population of modules. To ensure proper statistical inference and to ensure the cost effectiveness of the proposed method, the population size should be large enough. Therefore, the proposed method is suitable for large-scale software systems. The simple random sampling method requires that each individual in a sample to be collected entirely by chance with the same probability. Selection bias may be introduced if the module sample is collected simply by convenience, or from a single developer/team. The selection bias can lead to non-sampling errors (errors caused by human rather than sampling) and should be avoided. The defect data for a sample can be collected through quality assurance activities such as software testing, static program checking, and code inspection. As the sample will be used for prediction, these activities should be carefully carried out so that most of defects can be discovered. Incorrect sample data may lead to incorrect estimates of the population. In our experiments, we used the public defect dataset available at the PROMISE dataset. Although this dataset has been used by many other studies [14–18], our results may be under threat if the dataset is seriously flawed (e.g., there were major problems in bug data collection and recording). Also, all the data used are collected from open source projects. It is desirable to replicate the experiments on industrial, in-house developed projects to further evaluate their validity. This will be our important future work.
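The sampling step this discussion revolves around is a plain simple random sample of the modules; for completeness, one way to draw it (assuming the modules are simply indexed 0..n−1) is sketched below. The function name and seed handling are ours.

```python
import numpy as np

def draw_module_sample(n_modules, sample_fraction, seed=0):
    """Simple random sample of modules to test and label: every module is selected
    entirely by chance with the same probability, which avoids selection bias."""
    rng = np.random.RandomState(seed)
    size = max(1, int(round(sample_fraction * n_modules)))
    return rng.choice(n_modules, size=size, replace=False)
```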
  • 45. References 33 References 1. Rosasco L, Verri A, Santoro M, Mosci S, Villa S (2009) Iterative Projection Methods for Structured Sparsity Regularization. 2. Yang M, Zhang L, Yang J, Zhang D (2010) Metaface learning for sparse representation based face recognition. In: Proceedings of the International Conference on Image Processing, pp 1601–1604. https://doi.org/10.1109/ICIP.2010.5652363 3. Jiang Y, Cukic B, Menzies T (2008) Cost Curve Evaluation of Fault Prediction Models. In: Proceedings of the 19th International Symposium on Software Reliability Engineering, pp 197–206. https://doi.org/10.1109/ISSRE.2008.54 4. Elish KOEaMO (2008) Predicting defect-prone software modules using support vector machines. J Syst Softw 81(5):649–660. https://doi.org/10.1016/j.jss.2007.07.040 5. Wang J, Shen B, Chen Y (2012) Compressed C4.5 Models for Software Defect Prediction. In: Proceedings of the 2012 12th International Conference on Quality Software, pp 13–16. https:// doi.org/10.1109/QSIC.2012.19 6. Wei-hua WTaL (2010) Naive Bayes Software Defect Prediction Model. In: Proceedings of the 2010 International Conference on Computational Intelligence and Software Engineering 7. Sun Z, Song Q, Zhu X (2012) Using Coding-Based Ensemble Learning to Improve Software Defect Prediction. IEEE Trans Syst Man Cybern Part C 42(6):1806–1817. https://doi.org/10. 1109/TSMCC.2012.2226152 8. Zheng J (2010) Cost-sensitive boosting neural networks for software defect prediction. Expert Syst Appl 37(6):4537–4543. https://doi.org/10.1016/j.eswa.2009.12.056 9. Yambor WS, Draper BA, Beveridge, JR (2002) Analyzing PCA-based face recognition algorithms: Eigenvector selection and distance measures. Empirical Evaluation Methods in Computer Vision, pp 39–60. World Scientific 10. Breiman L (2001) Random forests. Mach Learn 45:5–32 11. Xu J-M, Fumera G, Roli F, Zhou Z-H, et al. (2009) Training SpamAssassin with active semi-supervised learning. In: Proceedings of the 6th Conference on Email and Anti-Spam (CEAS’09), pp 1–8 12. Li M, Zhou Z-H (2007). Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE Trans Syst Man Cybern Part A Syst Humans 37(6): 1088– 1098 13. Angluin D, Laird P (1988) Learning from noisy examples. Mach Learn 2: 343–370 14. Koru AG, Liu H (2005) Building effective defect-prediction models in practice. IEEE Softw 22(6): 23–29 15. Menzies T, Greenwald J, Frank A (2006) Data mining static code attributes to learn defect predictors. IEEE Trans Softw Eng 33(1):2–13 16. Zhang H, Nelson A, Menzies T (2010) On the value of learning from defect dense components for software defect prediction. In: Proceedings of the 6th International Conference on Predictive Models in Software Engineering, pp 1–9. 17. Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B (2009) Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp 91–100 18. Kim S, Zimmermann T, Whitehead Jr EJ, Zeller A (2007) Predicting faults from cached history. In: 29th International Conference on Software Engineering (ICSE’07). IEEE, pp 489–498
  • 46. Chapter 4 Cross-Project Defect Prediction Abstract The challenge of CPDP methods is the distribution difference between the data from different projects. Transfer learning can transfer the knowledge from the source domain to the target domain with the aim to minimize the domain difference between different domains. However, most existing methods reduce the distribution discrepancy in the original feature space, where the features are high- dimensional and nonlinear, which makes it hard to reduce the distribution distance between different projects. In this chapter, we proposed a manifold embedded distribution adaptation (MDA) approach to narrow the distribution gap in manifold feature subspace. For cross-project SDP, we found that the class imbalanced source usually leads to misclassification of defective instances. However, only one work has paid attention to this cross-project class imbalance problem. Subclass discriminant analysis (SDA), an effective feature learning method, is introduced to solve the problems. It can learn features with more powerful classification ability from original metrics. Within-project and cross-project class imbalance problems greatly affect prediction performance, and we provide a unified and effective prediction framework for both problems. We call CPDP in this scenario as cross-project semi-supervised defect prediction (CSDP). Although some within-project semi- supervised defect prediction (WSDP) methods have been developed in recent years, there still exists much room for improvement on prediction performance. We aim to provide a unified and effective solution for both CSDP and WSDP problems. We introduce the semi-supervised dictionary learning technique and propose a cost-sensitive kernelized semi-supervised dictionary learning (CKSDL) approach. CKSDL can make full use of the limited labeled defect data and a large amount of unlabeled data in the kernel space. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 X.-Y. Jing et al., Intelligent Software Defect Prediction, https://doi.org/10.1007/978-981-99-2842-2_4 35
  • 47. 36 4 Cross-Project Defect Prediction 4.1 Basic CPDP 4.1.1 Manifold Embedded Distribution Adaptation 4.1.1.1 Methodology MDA consists of two processes. First, MDA performs manifold feature learning to accommodate the feature distortion problem in the transformation process due to the nonlinear distribution of the high-dimensional data. Since the features from different projects in manifold have some similar geometrical structures and intrinsic representations, MDA can effectively exploit latent information from different projects by performing manifold feature learning. Second, to address the challenge of distribution difference from different projects, MDA performs joint distribution adaptation and considers both the importance of marginal and conditional distribu- tions. Both manifold feature learning and joint distribution adaptation are used to make the most of them. Manifold Feature Learning Manifold feature learning tries to discover the intrinsic geometry of the manifold and project the data onto a lower-dimensional space that preserves some properties of the manifold, such as distances, angles, or local neighborhoods. Manifold feature learning can be useful for data visualization, clustering, classification, and other tasks that benefit from reducing the complexity and noise of the data. Manifold feature learning plays an important role since the features in manifold space usually have a good geometric structure to avoid feature distortion. We map data from different projects into a common latent space while keeping the geometry structures of the input manifold; simultaneously the captured connections are mapped onto the common latent space. MDA learns the mapping function .g(·) in the Grassmann manifold .G (dk). .dk is the dimension of the subspaces of the different project data. We utilize the geodesic flow kernel (GFK) [1] to learn .g(·) for its computational efficiency. .G can be regarded as a collection of all.dk-dimensional subspaces. Principal com- ponent analysis [1] is performed on the source project and the target project to obtain two corresponding orthogonal subspaces. Each original subspace corresponds to one point in .G. The geodesic flow between two points can draw a path for the two subspaces. Constructing a geodesic flow from two points equals to transforming the original features into an infinite dimensional feature space. The new features of data can be represented as .z = g(x). From [1], the inner product of transformed features gives rise to a positive semi-definite GFK: . zi, zj = xT i Gxj (4.1)
  • 48. 4.1 Basic CPDP 37 where.G is a positive semi-definite matrix. The original data can be transformed into Grassmann manifold with .z = g(x) = √ Gx. . √ G is just an expression form and cannot be computed directly, where x is an instance from source or target project, and.X = XS, XT . This square root can be calculated by Denman–Beavers algorithm [2]. We use .Z = √ GX as the manifold feature representation in the following sections. Joint Distribution Adaptation Distribution adaptation reduces the distribution difference between different projects by minimizing the predefined distance measures. In the CPDP situation, the source and target projects have different data with different marginal and conditional distributions. To significantly reduce the distribution difference between different projects, joint distribution adaptation for CPDP considers the marginal and the conditional distribution adaptation at the same time. The distance between .S and .T can be represented as follows: . d(S, T ) =(1 − μ)d (P (zS) , P (zT )) + μd (P (yS | zS) , P (yT | zT )) (4.2) where .P(zS) and .P(zT ) denote the marginal distribution. .d(P(yS|zS) and .P(yT |zT )) denote conditional distribution. .d(P(zS), P(zT )) denotes the marginal distribution adaptation, and .d(P(yS|zS), P(yT |zT )) denotes the conditional distribution adaptation. .μ ∈ [0, 1] is a parameter that adjusts the importance of two kinds of distributions. The label set .yT is not available in advance which leads to the calculation of the term .P(yT |zT ) infeasible. We follow the method in [3] and .P(zT |yT ) approximately equal to .P(yT |zT ). Thus we use a base classifier trained on S and then obtain the labels of .zT . To calculate the divergence between source and target data distributions, the maximum mean discrepancy (MMD) is applied. MMD is an effective method and has been widely used in many methods. Then Formula 4.2 is formulated as follows: Where C denotes the number of classes of the label in CPDP. .Sc and .Tc denote the instances with the cth class label. .mc and .nc denote the number of instances of the cth class label from source and target projects. .Hdenotes the reproducing kernel Hilbert space induced by the mapping in manifold space. Using the matrix tricks, we obtain the following formula: . min AT Z ((1 − μ)M0 + μMc) ZT A + λ‖A‖2 F s.t.AT ZHZT A = I (4.3) where .Z = ZS, ZT combines the source project and target project data. .M0 and .Mc denote the MMD matrices that we obtained. A denotes the transformation matrix, .‖A‖2 F is the Frobenius norm. .I ∈ R(m+n)×(m+n) is an identity matrix, and .H = I − (1/(m + n))1 is the centering matrix similar to this work [4]. The constraint
  • 49. 38 4 Cross-Project Defect Prediction condition ensures that .AT Z preserves the inner attributes of the original data. The parameter .λ is a regularization term. The MMD matrix can be calculated by the following formulas: . (M0)ij = ⎧ ⎨ ⎩ 1 m if zi, zj ∈ S 1 n if zi, zj ∈ T − 1 mn otherwise (4.4) . (Mc)ij = ⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ 1 mc if zi, zj ∈ Sc 1 nc if zi, zj ∈ Tc − 1 mcnc , 1 mc zi ∈ SC, zj ∈ Tc 1 nc zi ∈ Tc, zj ∈ Sc 0 otherwise (4.5) Furthermore, we build a Laplacian matrix by linking nearby instances to use a similar relationship better when learning transform matrices. It makes similar instances stay close to each other in the shared space. We call the inter-project similarity matrix as . Wij = sim zi, zj zi ∈ N zj or zj ∈ N (zi) 0 otherwise (4.6) where .sim(·, ·) is a similarity matrix representing the similarity of two instances. .N(zi) denotes the NNs of .zi. Then we introduce the Laplacian matrix .L = D − W, .Dii= m+n j=1 Wij . The final regularization function can be formulated as . RL = m+n i,j=1 zi − zj 2 Wij = m+n i,j=1 ziLij zj = ZLZT (4.7) The objective function of our MDA approach . min AT Z ((1 − μ)M0 + μMc + βL) ZT A + λ‖A‖2 F s.t. AT ZHZT A = I (4.8)
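Putting the pieces of this derivation together, a NumPy/SciPy sketch of the adaptation step might look as follows. The two MMD matrices are built in the standard outer-product form, which we assume matches Eqs. 4.4 and 4.5 up to the normalisation constants lost in extraction; L is the Laplacian of Formula 4.7 built from the nearest-neighbour similarity matrix W; and the transformation matrix A is obtained from the generalized eigenproblem that the next page derives (Formula 4.10). The small ridge added to the right-hand side is our own numerical safeguard, not part of the book's formulation.

```python
import numpy as np
from scipy.linalg import eigh

def mmd_matrices(y_s, y_t_pseudo):
    """Marginal (M0) and summed conditional (Mc) MMD matrices over [source; target]."""
    m, n = len(y_s), len(y_t_pseudo)
    e0 = np.concatenate([np.full(m, 1.0 / m), np.full(n, -1.0 / n)])
    M0 = np.outer(e0, e0)
    Mc = np.zeros((m + n, m + n))
    for c in np.unique(y_s):
        src = np.where(y_s == c)[0]
        tgt = m + np.where(y_t_pseudo == c)[0]
        if len(src) and len(tgt):
            e = np.zeros(m + n)
            e[src], e[tgt] = 1.0 / len(src), -1.0 / len(tgt)
            Mc += np.outer(e, e)                 # class-wise term of the conditional MMD
    return M0, Mc

def mda_projection(Z, M0, Mc, L, mu, beta, lam, dl):
    """Transformation matrix A from the generalized eigenproblem of Formula 4.10.
    Z is the (d x N) manifold feature matrix of source and target data."""
    d, N = Z.shape
    H = np.eye(N) - np.ones((N, N)) / N                      # centering matrix
    left = Z @ ((1 - mu) * M0 + mu * Mc + beta * L) @ Z.T + lam * np.eye(d)
    right = Z @ H @ Z.T + 1e-6 * np.eye(d)                   # ridge keeps it positive definite
    vals, vecs = eigh(left, right)                           # generalized symmetric eigensolver
    return vecs[:, np.argsort(vals)[:dl]]                    # eigenvectors of the dl smallest values
```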
  • 50. 4.1 Basic CPDP 39 Algorithm 1 The pseudo of MDA Input: Data matrices.S = XS,.YS and.T = {XT } from source and target projects, hyperparameters .λ, β, μ. Output: output result 1: Learn the GFK G to obtain the feature representations in manifold space by Gong et al. [1]. 2: Get the transformed data representation .Z = √ GX in manifold subspace. 3: Train a basic classifier using S, then get the pseudo label of target data. 4: Compute MMD matrices .M0 and .Mc by 4.5 and 4.6. 5: Using matrix tricks to rewrite 4.3 and obtain the joint distribution distance representation 4.4. 6: Construct a Laplacian matrix .L = D − W by 4.7 and 4.8. 7: Construct objective function 4.9 by incorporating 4.4 and 4.10. 8: Construct Lagrange function 4.10 for problem 4.9. 9: Solve the generalised eigendecomposition problem in 4.11 and get dl eigenvalues. 10: Construct a transformation matrix A by the eigenvectors with respect to the dl eigenvalues. 11: Construct the transformed data of the source and target data by using A. 12: Obtain the prediction label of target data by using the logistic regression (LR) classifier. where .β is a balance parameter. To solve 4.8, we denote the Lagrange multipliers as .Ф = Ф1, Ф2, . . . , Фdl , then we rewrite formula 4.8 as . LA =AT Z ((1 − μ)M0 + μMc + βL) ZT A + λ‖A‖2 F + I − AT ZHZT A Ф (4.9) Set the derivative of A to 0, .∂LA/∂A = 0. Then we compute the eigenvectors for the generalized eigenvector problem . Z ((1 − μ)M0 + μMc + βL) ZT + λI A = ZHZT AФ (4.10) Finally, by solving formula 4.10, we find its dl small eigenvectors, and the optimal transformation matrix A is obtained. Algorithm 1 presents the details of MDA. 4.1.1.2 Experiments In an experiment, we employ 20 publicly projects from three defect datasets, including AEEEM, NASA, and PROMISE datasets. In the experiment, we employ three commonly used measures in defect prediction experiment to evaluate the effectiveness of the MDA approach. The measures we used include F-measure, G- measure, and AUC. We compared MDA with defect prediction models including CamargoCruz [5], TCA + [6], CKSDL [7], CTKCCA [8], HYDRA [9], and ManualDown [10]. For the comparison with CPDP methods, we specifically looked at the CamargoCruz, ManualDown, CKSDL, and HYDRA methods. Herbold et al. [11] evaluated 24 methods for CPDP. CamargoCruz method [5] always performs
  • 51. 40 4 Cross-Project Defect Prediction the best in CPDP methods in [11]. ManualDown is an unsupervised method that is suggested as the baseline model for comparison in [10] when developing a new CPDP method. CKSDL [7] is an effective semi-supervised CPDP method. HYDRA is an ensemble learning method that utilizes multiple source projects for CPDP. Comparison with transfer learning methods: there are several successful CPDP models based on transfer learning for comparison. TCA + is an effective transfer methods based on transfer component analysis technology [6]. CTKCCA [8] is one of the state-of-the-art CPDP methods and also is a feature-based transfer learning method. A total of 20 projects from AEEEM, NASA, and PROMISE datasets act as experiment data. We organize the cross-project setting as previous studies [7, 8, 11] and conduct CPDP experiment. We perform two experiment settings to evaluate MDA: Case 1 One-to-one CPDP (MDA-O). Following the pervious CPDP methods in related work, we adopt one-to-one setting of CPDP. For a given dataset, we use one project of the dataset as the target project in turn, and each of the other projects of dataset is treated as source project separately to conduct the cross-project prediction. For example, if the project EQ in AEEEM dataset is selected as the target project, the remaining projects in AEEEM dataset (JDT, LC, ML, and PDE) are separately set as a training project once, and we get four groups of prediction results for EQ, that is, JDTEQ, LCEQ, ML EQ, and PDE EQ, where the left side of .⇒ represents the source project and the right side of denotes the target project. Then the mean performance of these four predictions for the target project EQ is reported in the section. Finally, we report the mean prediction results of multiple cross-project pairs for each target project. Case 2 Many-to-one CPDP (MDA-M). For a given dataset, one project of this dataset is selected as the target project, and all of other projects of the dataset are used as the source projects for one time of prediction. For example, if the project EQ in AEEEM is selected as target project, JDT, LC, ML, and PDE are all selected as source projects. In other words, the cross-project prediction case is JDT, LC, ML, PDE EQ (Table 4.1). In our method, we have several parameters. The parameters .β and .λ in (4) are set as .β = 0.1 and .λ = 0.1. The balance factor .μ controls the weights of two kinds of distributions. Due to the different distributions from different datasets, the value of .μ varies for different datasets. Taking MW1CM1 (source project target project) as an example, we run MDA on 11 different values in the range of 0, 0.1, 0.2, . . . , 1 and finally, we set the value of .μ as 0.6. Table 4.2 report the values of F-measure, G-measure, and AUC for MDA versus baselines on AEEEM dataset, NASA dataset, and PROMISE dataset. The values in bold denote the best performance. These results show that MDA outperforms all baselines on three indicators in most cases. Comparison with CPDP methods: compared with four CPDP methods (CamargoCruz, ManualDown, CKSDL, and HYDRA), MDA-O achieves the best performance values of three indicators on average prediction performance. Both MDA-O and MDA-M perform better than compared methods on most projects. MDA-O improves the result at least by 24.4% ((.0.4876 − 0.3919)/0.3919) in terms
  • 52. 4.1 Basic CPDP 41 Table 4.1 Comparison results in terms of F-measure on each project Project CamargoCruz ManualDown CKSDL TCA + CTKCCA HYDRA MDA-M MDA-O EQ . 0.6592 . 0.6742 . 0.2709 . 0.4112 . 0.3530 . 0.5926 . 0.6531 . 0.6534 JDT . 0.4732 . 0.3976 . 0.3522 . 0.4093 . 0.3495 . 0.5385 . 0.4638 . 0.5754 LC . 0.2448 . 0.2046 . 0.3467 . 0.3631 . 0.3326 . 0.3774 . 0.2263 . 0.6637 ML . 0.3238 . 0.2581 . 0.3642 . 0.3581 . 0.3530 . 0.5385 . 0.3252 . 0.6381 PDE . 0.3249 . 0.3009 . 0.3507 . 0.4209 . 0.3495 . 0.2000 . 0.3168 . 0.6630 CM1 . 0.2663 . 0.2602 . 0.2127 . 0.3298 . 0.3309 . 0.2206 . 0.2846 . 0.3360 MW1 . 0.2326 . 0.2191 . 0.1850 . 0.3123 . 0.3345 . 0.3429 . 0.2223 . 0.2276 PC1 . 0.1809 . 0.1990 . 0.2241 . 0.3268 . 0.3544 . 0.3684 . 0.2196 . 0.2173 PC3 . 0.2924 . 0.2814 . 0.1780 . 0.4006 . 0.4490 . 0.3529 . 0.3333 . 0.3265 PC4 . 0.3273 . 0.3358 . 0.1589 . 0.3464 . 0.4467 . 0.6087 . 0.3572 . 0.3468 ant1.7 . 0.4582 . 0.4853 . 0.3497 . 0.4390 . 0.3177 . 0.3774 . 0.4780 . 0.4668 camel1.6 . 0.3420 . 0.3333 . 0.4614 . 0.3986 . 0.2404 . 0.1734 . 0.3486 . 0.3527 ivy2.0 . 0.3477 . 0.3188 . 0.3037 . 0.4510 . 0.2961 . 0.4400 . 0.2961 . 0.3281 jedit4.1 . 0.3992 . 0.2843 . 0.3028 . 0.1444 . 0.3588 . 0.4203 . 0.5300 . 0.4791 lucene2.4 . 0.4022 . 0.6454 . 0.2953 . 0.4441 . 0.3749 . 0.3273 . 0.6430 . 0.6599 poi3.0 . 0.3713 . 0.5729 . 0.2895 . 0.4117 . 0.4040 . 0.3333 . 0.6904 . 0.7486 synapse1.2 . 0.4056 . 0.4933 . 0.2583 . 0.3669 . 0.4099 . 0.5000 . 0.5307 . 0.5402 velocity1.6 . 0.4635 . 0.5609 . 0.2696 . 0.4598 . 0.4156 . 0.3447 . 0.5722 . 0.5576 xalan2.6 . 0.5186 . 0.6225 . 0.2652 . 0.4261 . 0.3967 . 0.3723 . 0.6861 . 0.6679 xerces1.3 . 0.3000 . 0.2279 . 0.3378 . 0.4033 . 0.3839 . 0.3200 . 0.2884 . 0.3038 average . 0.3667 . 0.3838 . 0.2888 . 0.3812 . 0.3625 . 0.3919 . 0.4233 . 0.4876
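The per-project averages reported in Table 4.1 follow the one-to-one protocol of Case 1 described above; a sketch of that evaluation loop is given below. Here `run_cpdp` stands in for training MDA on the source project and predicting the target, and `score` for the chosen indicator; both are placeholders supplied by the caller, not functions from the book.

```python
import numpy as np

def one_to_one_cpdp(projects, run_cpdp, score):
    """Case 1 protocol: each project of a dataset is the target in turn, every other
    project of the same dataset serves once as the source, and the per-target scores
    are averaged. `projects` maps name -> (X, y); `run_cpdp(Xs, ys, Xt)` returns
    predicted labels for the target; `score(y_true, y_pred)` is the indicator."""
    averages = {}
    for target, (Xt, yt) in projects.items():
        scores = [score(yt, run_cpdp(Xs, ys, Xt))
                  for source, (Xs, ys) in projects.items() if source != target]
        averages[target] = float(np.mean(scores))
    return averages
```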
  • 53. 42 4 Cross-Project Defect Prediction Table 4.2 Comparison results in terms of AUC on each project Project CamargoCruz ManualDown CKSDL TCA + CTKCCA HYDRA MDA-M MDA-O EQ . 0.7406 . 0.7137 . 0.5567 . 0.6572 . 0.6437 . 0.7666 . 0.7050 . 0.7874 JDT . 0.7359 . 0.6212 . 0.6028 . 0.5606 . 0.6430 . 0.7394 . 0.7622 . 0.7640 LC . 0.7159 . 0.5902 . 0.5660 . 0.6631 . 0.6456 . 0.7337 . 0.6610 . 0.7650 ML . 0.7065 . 0.5690 . 0.5940 . 0.6164 . 0.6437 . 0.7394 . 0.7099 . 0.7502 PDE . 0.6964 . 0.6343 . 0.5787 . 0.6628 . 0.6430 . 0.6532 . 0.6898 . 0.7301 CM1 . 0.7380 . 0.6932 . 0.5901 . 0.6274 . 0.6413 . 0.7392 . 0.7736 . 0.7845 MW1 . 0.7547 . 0.6593 . 0.5401 . 0.5885 . 0.6337 . 0.6921 . 0.7099 . 0.7716 PC1 . 0.6819 . 0.6631 . 0.5768 . 0.6602 . 0.6422 . 0.7334 . 0.7412 . 0.7452 PC3 . 0.7223 . 0.6833 . 0.5502 . 0.6461 . 0.6837 . 0.7645 . 0.7999 . 0.7823 PC4 . 0.7456 . 0.6919 . 0.5339 . 0.5759 . 0.6887 . 0.7675 . 0.7889 . 0.7725 ant1.7 . 0.6732 . 0.6947 . 0.5644 . 0.6442 . 0.5842 . 0.7331 . 0.8032 . 0.7661 camel1.6 . 0.5743 . 0.5611 . 0.5771 . 0.5794 . 0.5595 . 0.6838 . 0.6097 . 0.6064 ivy2.0 . 0.6797 . 0.7119 . 0.5969 . 0.7088 . 0.5516 . 0.7797 . 0.8246 . 0.7820 jedit4.1 . 0.6198 . 0.4613 . 0.6152 . 0.6439 . 0.6484 . 0.6763 . 0.7427 . 0.7350 lucene2.4 . 0.6284 . 0.5980 . 0.5855 . 0.5911 . 0.6647 . 0.5746 . 0.6116 . 0.6357 poi3.0 . 0.6154 . 0.6611 . 0.5371 . 0.6235 . 0.6867 . 0.6935 . 0.6847 . 0.7249 synapse1.2 . 0.6518 . 0.5823 . 0.5556 . 0.6211 . 0.6602 . 0.6762 . 0.6955 . 0.6805 velocity1.6 . 0.5990 . 0.6395 . 0.6093 . 0.6010 . 0.6569 . 0.6550 . 0.7149 . 0.6890 xalan2.6 . 0.5884 . 0.5988 . 0.5707 . 0.6821 . 0.6578 . 0.6743 . 0.7633 . 0.7454 xerces1.3 . 0.6092 . 0.4873 . 0.5838 . 0.6207 . 0.6392 . 0.6290 . 0.6263 . 0.6254 average . 0.6739 . 0.6258 . 0.5742 . 0.6287 . 0.6409 . 0.7057 . 0.7209 . 0.73
  • 54. 4.1 Basic CPDP 43 of average F-measure value, 4.6% (.0.6474 − 0.6191)/0.6191 in terms of average G-measure value, and 3.8% ((.0.7322 − 0.7057)/0.7057) in terms of AUC value against with four CPDP baselines. Comparison with transfer learning methods: we can see that MDA-O achieves satisfying performance on each project results. Both MDA-O and MDA-M perform better than compared methods on average results. MDA-O achieves improvements of 27.9 and 34.5% in terms of F-measure, of 11.2 and 16.3% in terms of G-measure, of 16.4 and 14.2% in terms of AUC against with the baselines on average prediction performance. Comparison with many- to-one CPDP (MDA-M), MDA-O achieves better average results than MDA-M. MDA-O slightly outperforms than MDA-M on the overall prediction performance in terms of G-measure and AUC. Specially, on AEEEM dataset, the results of MDA-O on JDT, LC, ML, and PDE are better than MDAM. The reasons may be divided into the following aspects: Firstly, using all of other projects except target project as source project may contain some redundant information. Secondly, the distribution differences of the data from different projects may be large. The data from source project and target project has distribution difference, and the data from multiple source projects also has distribution difference. In brief, MDA achieves improvements in terms of three measures over three benchmark datasets among all six baselines (four classical CPDP methods and two CPDP methods based on transfer learning). This shows that the feasibility and effectiveness of MDA, which facilitates the performance of CPDP. 4.1.1.3 Discussions Does Manifold Feature Learning Influence the Prediction Performance of MDA In order to investigate the effectiveness of manifold feature learning of MDA, we run MDA and DA (MDA without manifold feature learning, we called it DA) on three datasets. In this section, the results of MDA reported in the tables and figures are under the setting of case 1 in Section Experiments. Figure 4.1 shows the performance of DA and MDA on three datasets. From the results, the performance of MDA with manifold feature learning is improved on all datasets. These results indicate that the features in the manifold subspace facilitate distribution adaptation. Manifold feature learning plays an important role in avoiding feature distortion and exploring geometric structure of data. MDA can realize approximate results without manifold learning on few projects, while adding manifold learning can obtain better performance. From Fig. 4.1, the F-measure and G-measure values of DA are slightly better than MDA on MW1. The reasons can be attributed as follows: The number of instances on MW1 is limited. Compared with other projects, the number of instances on MW1 is smaller. Manifold feature learning may play a limited role on the dataset which has small number of instances. Do Different Distributions Influence the Prediction Performance of MDA μ can evaluate the importance of different distributions. We discuss the impacts of different values of MDA in this section. We tune 11 different parameter values
  • 61. other prisoners want to kill us. The ringleader is Major Baylay. He gave a letter to the Turks and said we wrote it. He thought the Commandant would hang us. But the Commandant was very kind to us and gave us a house to ourselves and locked the door so that Baylay could not get at us. We were very happy until Baylay started poisoning our food. Then we the Commandant said we could cook our own food and now he leaves the door open and we are in terror lest Major Baylay comes and kills us he did come one day and tried to entice us into the garden and he now sends the doctor to give us poison the doctor pretends it is medicine but we know better. Will you please write to the Commandant and ask him to lock the door. “Your obedient servants, “C. W. Hill. E. H. Jones.” Such was the case that was laid before the two official Turkish doctors in Yozgad, Major Osman and Captain Suhbi Fahri, by the principal officials of the prisoners’ camp on the morning of April 13th, 1918. We knew nothing of the medical attainments of Major Osman or Captain Suhbi Fahri, but we calculated that if the officers in charge of a camp of German prisoners in England made similar statements about two prisoners to the local English doctors, and told them (as the Turks were told) that the German doctor in the camp was trying to conceal the true state of affairs with a view to keeping the two men from the horrors of an English asylum, it ought to create an atmosphere most favourable to malingerers. In Yozgad we had the additional advantage that the Turkish doctors were very jealous of O’Farrell, whose medical skill had created a great impression amongst the local officials, and were only too delighted at a chance of proving him wrong. But the outstanding merit of the scheme was that it avoided implicating O’Farrell. We would face the Constantinople specialists purely on the recommendation of the Turks, and O’Farrell’s disagreement with the local doctors would make him perfectly safe if we were found out. Also O’Farrell’s whole attitude towards us, his fellow-prisoners, would help us to deceive the specialists, because it would be a strong argument against the theory that we were malingering, for it would be natural to suppose
  • 62. that the English doctor would seek to help rather than hinder us to leave Yozgad. The Turks are not sufficiently conversant with Poker to recognize a bluff of the second degree. The Spook had promised the Commandant to place us under control and make us seem mad when the doctors visited us. It succeeded to perfection, for we had left no stone unturned to deceive the Turks. We were unshaven, unwashed, and looked utterly disreputable. For over three weeks we had been living on a very short ration of dry bread and tea. For the last three days we had eaten next to nothing, and by the 13th April we were literally starving. We sat up all night on the 12th, that our eyes might be dull when the doctors came, and we took heavy doses of phenacetin at frequent intervals, to slow down our pulses. All night we kept the windows and doors shut, and the stove red-hot and roaring, and smoked hard, so that by morning the atmosphere was indescribable. We scattered filth about the room, which had already remained a week unswept, and strewed it with slop-pails, empty tins, torn paper, and clothing. Near the door we upset a bucket of dirty water; in the centre of the floor was a heap of soiled linen, and close beside it what looked like the remains of a morning meal. Over all we sprinkled a precious bottle of Elliman’s Embrocation, adding a new odour to the awful atmosphere. An hour before the doctors were due, Hill began smoking strong plug tobacco, which always makes him sick. The Turks, being Turks, were ninety minutes late. Hill kept puffing valiantly at his pipe, and by the time they arrived he had the horrible, greeny-yellow hue that is known to those who go down to the sea in ships. It was a lovely spring morning outside. The snow had gone. The countryside, fresh from the rains, was bathed in sunlight, and a fine fresh breeze was blowing. We heard Moïse and the doctors coming up our stairs, laughing and chatting together. Captain Suhbi Fahri, still talking, opened the door of our room—and stopped in the middle of a sentence. It takes a pretty vile atmosphere to astonish a Turk, but the specimen of “fug” we had so laboriously prepared took
his breath away. The two doctors stood at the door and talked in whispers to Moïse. Hill, with a British warm up to his ears and a balaclava on his tousled head, sat huddled motionless over the red-hot stove, warming his hands. On the other side of the stove I wrote furiously, dashing off sheet after sheet of manuscript and hurling them on to the floor.
Their examination of us was a farce. If their minds were not already made up before they entered, the state of our room and our appearance completely satisfied them. Major Osman never left the door. Captain Suhbi Fahri tiptoed silently round the room, peering into our scientist-trapping slop-pails and cag-heaps, until he got behind my chair, when I whirled round on him in a frightened fury, and he retreated suddenly to the door again. Neither of them sought to investigate our reflexes—the test we feared most of all—but they contented themselves with a few questions which were put through Moïse in whispers, and translated to us by him. They began with me.
Major Osman. “What are you writing?”
Self (nervously). “It is not finished yet.”
The question was repeated several times; each time I answered in the same words, and immediately began writing again.
Major Osman. “What is it?”
Self. “A plan.” (Back to my writing. More whispering between the doctors at the door.)
Major Osman. “What plan?”
Self. “A scheme.”
Major Osman. “What scheme?”
Self. “A scheme to divide up England at the end of the war. A scheme for the abolition of England! Go away! You are bothering me.” (More whispering at the door.)
Major Osman. “Why do you want to do that?”
Self. “Because the English hate us.”
Major Osman. “Your father is English. Does he hate you?”
Self. “Yes. He has not written to me for a long time. He puts poison in my parcels. He is in league with Major Baylay. It is all Major Baylay’s doing.”
[“THE MELANCHOLIC”—C. W. HILL. Photo by Savony]
I grew more and more excited, and burst into a torrent of talk about my good friend Baylay’s “enmity,” waving my arms and raving furiously. The two doctors looked on aghast, and I noticed Captain Suhbi Fahri changed his grip on his silver-headed cane to the thin end. It took them quite a time to quieten me down again. At last I
gathered up my scattered manuscript and resumed my writing. Hill had never moved or paid the slightest attention to the pandemonium. They turned to him.
Major Osman. “Why are you keeping the room so hot? It is a warm day.”
(Moïse had to call Hill by name and repeat the question several times before Hill appeared to realize that he was being addressed. Then he raised a starving, grey-green, woebegone face to his questioners.)
“Cold,” he said, and huddled an inch nearer the stove.
“Why don’t you go out?” asked Major Osman.
“Baylay,” said Hill, without lifting his head.
“Why don’t you sweep the floor?”
“Poison in dust.”
“Why is there poison in the dust?”
“Baylay,” said the monotonous voice again.
“Is there anything you want?” Major Osman asked.
Hill lifted his head once more. “Please tell the Commandant to lock the door and you go away,” then he turned his back on his questioners.
The two doctors, followed by Moïse, tiptoed down the stairs. We heard the outer gate clang, listened carefully to make sure they had gone, and then let loose the laughter we had bottled up so long. For both the Turkish doctors had clearly been scared out of their wits by us.
Moïse came back later with our certificates of lunacy. They were imposing documents, written in a beautiful hand, and each decorated with two enormous seals. The following is a translation as it was written out by the Pimple at our request:—
“HILL. This officer is in a very calm condition, thinking. His face is long, not very fat. Breath heavy. He has been seen very thinking. He gave very short answers. There is no (? life) in his answers. There is a nervousness in his present condition. He states that his life is in danger and he wants the door to be locked because a Major is going to kill him. By his answers and by the fact he is not taking any food,
it seems that he is suffering from melancholia. We beg to report that it is necessary he be sent to Constantinople for treatment and observation and a final examination by a specialist.”
“JONES. This officer appears to be a furious. Weak constitution. His hands were shaking and was busy writing when we went to see him. When asked what he was writing he answered that it was a plan for the abolition of England because the English were his enemies; even his father was on their part because he was not sending letters. His life is in danger. A Major wants to kill him and has put poison in his meat. That is why he is not eating. He requested nobody may be allowed to come and the door may be locked. According to the statement of the orderly and other officers this officer has been over-studying spiritualism. He says that the doctor was giving him poison instead of medicine. According to his answers and his present condition he seems to suffer from a derangement in his brains. We beg to report that it is necessary to send him to Constantinople for observation and treatment.”
Both reports were signed and sealed by
“Major Osman, Bacteriologist in charge of Infectious Diseases at Yozgad.”
“Captain Suhbi Fahri, District Doctor in charge of Infectious Diseases at Yozgad.”
“Your control,” said Moïse to us, “was wonderful—marvellous. Your very expressions had altered. The doctors said your looks were ‘very bad, treacherous, haine.’ You, Jones, have a fixed delusion—(idée fixe)—and Hill has melancholia, they say. They have ordered that a sentry be posted to prevent your committing suicide and that you and your room be thoroughly cleaned, by force if necessary. Do you remember the doctors’ visit?”
[“THE FURIOUS.”—E. H. JONES. Photo by Annan]
Our memories, we said, were utterly blank, and we got the Pimple to relate what had occurred.
“It was truly a glorious exhibition of the power of our Spook,” the Pimple ended, “and the Commandant is greatly pleased. I trust you suffer no ill-effects?”
We were only very tired, and very anxious that the doctors’ suggestions as to cleaning up should be carried out. Sentries were called in. Our bedding and possessions were moved to a clean room, and we were led out into the yard and made to bathe in the horse-trough. Then we slept the sleep of the successful conspirator till evening.
CHAPTER XXII
HOW THE SPOOK CORRESPONDED WITH THE TURKISH WAR OFFICE AND GOT A REPLY
I woke at sunset to find Doc. O’Farrell bending over me.
“Doctors been here?” he asked in a hoarse whisper.
I nodded.
“And what’s the result?”
“Did you see the sentry at the door?” I asked.
“Don’t tell me you’re found out,” Doc. moaned, “or I’ll never forgive myself.”
“All right, Doc. dear! The sentry’s there to prevent us committing suicide!”
Doc. stared a moment, and then doubled up with laughter that had to be silent because of the Turk outside.
“Like to see the medical reports?” I asked, handing him the Pimple’s translation.
He began to read. At the first sentence he burst into a loud guffaw, and thrust the reports hastily out of sight. Luckily the gamekeeper at the door paid no attention. The Doc. apologized for his indiscretion and managed to read the rest in silence.
“Think we’ve a chance?” Hill asked, as he finished.
“Ye’re a pair of unmitigated blackguards,” said the Doc., “an’ I’m sorry for the leech that’s up against you. There’s only one thing needed to beat the best specialist in Berlin or anywhere else, but as you both aim at getting to England you can’t do it.”
“What is that?” we asked.
“One of ye commit suicide!” said the Doc., laughing.
“By Jove! That’s a good idea!” I cried. “We’ll both try it.”
“Don’t be a fool!” he began sharply, then—seeing the merriment in our eyes—“Oh! be natural! Be natural an’ you’ll bamboozle Æsculapius himself.”
He dodged the pillow Hill threw at him and clattered down the stairs chuckling to himself. Within five minutes of his going we decided to hang ourselves—“within limits”—on the way to Constantinople.
A little later the Pimple arrived, with the compliments and thanks of the Commandant to the Spook, and would the Spook be so kind as to dictate a telegram about us to the War Office? The Spook was most obliging, and somewhere amongst the Turkish archives at Constantinople the following telegram reposes:
“For over a year two officer prisoners here have spent much time in study of spiritualism and telepathy, and have shown increasing signs of mental derangement which recently have become very noticeable. I therefore summoned our military doctors Major Osman and Captain Suhbi Fahri who after examination diagnosed melancholia in the case of Hill and fixed delusion in the case of Jones and advised their despatch to Constantinople for observation and treatment. Doctors warn me these two officers may commit suicide or violence. I respectfully request I may be allowed to send them as soon as possible. Transport will be available in a few days when prisoners from Changri arrive. If permitted I shall send them with necessary escort under charge of my Interpreter who can watch and look after them en route and give any further information required by the specialists. Until his return may I have the services of the Changri Interpreter? My report together with the report of the doctors, follows by post. Submitted for favour of urgent orders.”
This spook-telegram was sent by the Commandant on 14th April, 1918, at 5 p.m. The same night the Spook dictated a report on our case, of a character so useful to the Constantinople specialists that Kiazim was thanked for it by his superiors at headquarters. The spook-report (which should also be among the Constantinople archives) is as follows:
  • 71. “In reference to my wire of 14th April I beg to report as follows: As will be seen from the enclosed medical reports written by Major Osman and Captain Suhbi Fahri, the Military Medical Officers of Yozgad, there are two officers in this camp who are suffering from grave mental disease. The doctors recommend their despatch to Constantinople for observation and treatment, and I beg to urge that this be done as early as possible, as the doctors warn me they may commit suicide or violence, and I am anxious to avoid any such trouble in this camp. “In addition to the information contained in the medical reports I beg to submit the following facts for guidance and consideration. The two officers are Lieut. Hill and Lieut. Jones. The former came here with the prisoners from Katia. The latter from Kut-el-Amara. I have made enquiries about both. I find Lieut. Hill has always been a remarkably silent and solitary man. He has the reputation of never speaking unless spoken to, and then only answers in monosyllables. During his stay here he has been growing more and more morose and gloomy. Lieut. Jones is regarded by his fellow-prisoners as eccentric and peculiar. I myself have noticed an increasing slovenliness in his dress since he came here. I learn that he has done a number of little things which caused his comrades to regard him as peculiar. For instance, sixteen months ago he spent a week sliding down the stairs in his house and calling himself the ‘Toboggan King.’ On another occasion when receiving a parcel from England in this office he expressed disgust at the ‘rubbish’ which was sent him, and drawing out a pocket-knife he slashed into ribbons a valuable waterproof sheet which had been included in his parcel. This was about a year ago.[45] Such appears to be the reputation of these two officers in the camp. “About eighteen months ago a number of officers began to take up spiritualism. Among these Jones was prominent. He asserted he was in communication with the dead and for some time he even published the news he thus obtained. I do not know when Hill began, but he also was a keen spiritualist. They have both spent a great deal of their time in this pursuit. Whether or not this has
  • 72. anything to do with their present condition I cannot say. Many other officers did the same and I saw no reason to interfere as I considered it a legitimate amusement. “These two officers also appear to have studied what they call ‘telepathy,’ and about two or three months ago they gave an exhibition of thought-reading, part of which my Interpreter saw and which considerably surprised their fellow-officers. Later Hill and Jones asserted they were in communication (telepathic) with people in Europe and elsewhere as well as with the dead. Early in March, as I reported to you in my letter of the 18th March, Jones and Hill were found guilty on a charge of attempting to communicate with some person in Yozgad whose name they refused to give, and as I reported, I confined them in a separate house and forbade any intercourse with the rest of the camp. I allowed them to have their food sent in from Major Baylay’s house, which is near. “While in confinement these two officers appear to have got the idea that their comrades in the camp disliked them, and this idea developed into delusion and terror that they were going to be murdered. Their condition became so grave that I called in the two medical officers, who had no hesitation, after examining them, in recommending their despatch to Constantinople. “Meantime, until their departure, by the advice of Major Osman and Captain Suhbi Fahri, I have posted a special guard over the patients to prevent them from doing themselves or others any harm. “With regard to the journey, as reported in my telegram I beg leave to send them under charge of my Interpreter with a sufficient escort, as the sufferers are accustomed to him and he will be able to understand their wants, and especially because knowing all they have done he may be of assistance to the specialists in their enquiry. Until his return I would like the services of the Changri Interpreter, but if necessary, for a short time, I could communicate any orders that may be necessary direct as several British officers here know a little Turkish.” The report was posted on the 15th April. On the 16th the Commandant received from Constantinople the following telegram in
answer to the Spook’s wire:
“Number 887. 15th April. Urgent. Very important. Answer to your cipher wire No. 77. Under your proposed arrangement send to the Hospital of Haidar Pasha the two English Officers who have to be under observation. Communicate with the Commandant Changri.—Kemal.”
“Hurrah!” said Moïse, when he brought us the news, “the Spook has controlled Constantinople!”
CHAPTER XXIII
IN WHICH THE SPOOK PERSUADES MOÏSE TO VOLUNTEER FOR ACTIVE SERVICE
The telegram from Kemal Pasha, ordering us to be sent to Constantinople, arrived on the 16th April. The prisoners from Changri, bringing with them the Interpreter who was to take the place of the Pimple, reached Yozgad on the 24th. Hill and I left for Angora on the 26th.
The Spook explained that though we would probably read AAA’s thoughts and discover the position of the third clue as soon as we got to Constantinople, it was essential for our safety that the Constantinople specialists should, for a time, think us slightly deranged and in need of a course of treatment. Therefore it behoved Moïse to endeavour to bring this about by reporting to the Constantinople authorities the things which the Spook would tell him to report, and learning his lesson carefully.
“What will happen to the mediums,” the Pimple asked, “if the specialists do not think them slightly deranged?”
“Jail, mon petit cheri chou!” said the Spook. “Jail for malingering, and they will not return to Yozgad to continue our experiments. You must play your part.”
The Pimple’s part, the Spook explained, was to observe and note carefully everything the mediums said and did. At the request of the Spook, as soon as the Yozgad doctors had declared us mad, the Commandant publicly ordered Moïse to make notes of our behaviour, for the benefit of the doctors at the Haidar Pasha hospital. The
  • 75. Spook declared that from now on the mediums would be kept “under control” so as to appear mad, for control being a species of hypnotism the oftener we were placed in that condition the easier it would be for the Spook to impose its will on us in Constantinople to deceive the specialists. Thus, while the Turks thought the Spook was practising on us, making us appear mad, we were really practising our madness on the Turks. Doc. O’Farrell visited us every day. The Turks thought he too was “under control” and that he was puzzled by our symptoms. In point of fact he was coaching us very carefully in what things were fit and proper for a “melancholic” and “a furious” to do and say, for we had decided to adhere to the two distinct types of madness diagnosed by the Yozgad doctors. What he secretly taught us each morning, the Spook made us do “under control” each evening, when it was duly noted down by the Pimple. These notes were revised and corrected by the Spook at regular intervals. In this way we piled up a goodly store of evidence as to our insanity. Every evening, after the rest of the camp had been locked up, we held séances, and at every séance the poor Pimple was put through his lesson. Over and over again he was made to recite to the spook- board what he had to say to the Constantinople doctors. It made a strange picture: Moïse, leaning over the piece of tin that was his Delphic oracle, told his tale as he would tell it at Haidar Pasha. His face used to be lined with anxiety lest he should go wrong and incur the wrath of the Unknown. Hill and I, pale and thin with starvation, and the strain of our long deception, sat motionless (and, as Moïse thought, unconscious), with our fingers resting on the glass and every sense strained to detect the slightest error in the Pimple’s story or in his tone or manner of telling it. And when the mistakes came (as to begin with they did with some frequency), the glass would bang out the Spook’s wrath with every sign of anger and there would follow the trembling apologies and stammered emendations of the unhappy Interpreter. Hill and I had got beyond the stage of wanting to laugh, for we were working now at our last hope. It was absolutely essential that the Pimple’s story should be without flaw.