Automating Software Development Using Artificial Intelligence (AI)

http://www.sqrlab.ca
Faculty of Science (Computer Science)
University of Ontario Institute of Technology
Oshawa, Ontario, Canada
Automating Software Development Using AI
Jeremy Bradbury, PhD
jeremy.bradbury@uoit.ca

Software Quality Research Lab
© 2018 J.S. Bradbury
Gabrielle Peres Dias, Michael Miljanovic, Luisa Rojas Garcia, Kevin Jalbert, Mark Green, David Kelk, Joseph Heron

Defining AI
• What is artificial intelligence?
  • An evolving concept that means different things to different people
• What are examples of AI?
  • Metaheuristic search techniques (e.g., genetic algorithms)
  • Machine learning (e.g., support vector machines)
  • Deep learning (and neural networks)

Metaheuristic Definition
“…a metaheuristic can be seen as a general algorithmic framework which can be applied to different optimization problems with relatively few modifications to make them adapted to a specific problem.” [MHN17]
[MHN17] Metaheuristic Network. Website: http://www.metaheuristics.net/ (Last accessed: Oct. 17, 2017).

Example metaheuristic techniques include hill climbing, particle swarm optimization, genetic algorithms (GAs)…

Metaheuristic Strategies [ZBB10]
SOLUTION CONSTRUCTION
• Ant Colony Optimization
• Greedy Randomized Adaptive Search Procedure (GRASP)
SOLUTION MODIFICATION
• Hill Climbing
• Tabu Search
• Simulated Annealing
SOLUTION RECOMBINATION
• Genetic Algorithms
[ZBB10] Gunther Zapfel, Roland Braune, Michael Bogl. “Metaheuristic Search Concepts: A Tutorial with Applications to Production and Logistics.” 2010.

Local vs. Global Search
Local Search
• Scope = local
• Strategy = making iterative local changes
• Solution = local optima
Global Search
• Scope = global
• Strategy = making iterative changes over the entire solution space
• Solution = global optima
Image: "Location_of_Cape_Verde_in_the_globe.svg" derivative work by Luan and original by Eddo is licensed under CC BY-SA 3.0
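
To make the contrast concrete, the following Java sketch runs hill climbing (a local search) on a small made-up landscape and compares it with an exhaustive scan that stands in for a global search. The class name, the LANDSCAPE values, and both method names are assumptions introduced for illustration; they do not come from the slides.

```java
// Sketch only: the landscape values and all names are made up for illustration.
class LocalVsGlobalSearch {
    // A one-dimensional landscape with a small peak (index 2) and a larger peak (index 7).
    static final int[] LANDSCAPE = {1, 3, 5, 4, 2, 6, 8, 9, 7};

    // Local search (hill climbing): move to the best neighbour until no neighbour is better.
    static int hillClimb(int start) {
        int current = start;
        while (true) {
            int best = current;
            if (current > 0 && LANDSCAPE[current - 1] > LANDSCAPE[best]) {
                best = current - 1;
            }
            if (current < LANDSCAPE.length - 1 && LANDSCAPE[current + 1] > LANDSCAPE[best]) {
                best = current + 1;
            }
            if (best == current) {
                return current; // no neighbour improves: a local optimum
            }
            current = best;
        }
    }

    // Stand-in for a global search: examine the entire solution space.
    static int exhaustiveSearch() {
        int best = 0;
        for (int i = 1; i < LANDSCAPE.length; i++) {
            if (LANDSCAPE[i] > LANDSCAPE[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("Hill climbing from index 0 stops at index " + hillClimb(0));
        System.out.println("Exhaustive search finds index " + exhaustiveSearch());
    }
}
```

Starting from index 0, hill climbing stops at index 2 (a local optimum), while the exhaustive scan reaches index 7 (the global optimum). Real global search techniques avoid scanning everything; the scan is used here only because the landscape is tiny.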

The Machine Learning (ML) Process
• Machine Learning (ML) techniques can generally be applied to tasks (problems) as follows [Fla12]:
[Diagram: Training Data (features) → ML Algorithm → Model; Data → Model → Output (grouping, grading)]
[Fla12] Peter Flach. “Machine Learning: The Art and Science of Algorithms that Make Sense of Data.” 2012.
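
The process above can be sketched in a few lines of Java. This is a minimal illustration, assuming a nearest-centroid classifier as the ML algorithm (chosen only for brevity; the slides do not name one here) and made-up two-dimensional training data. All class and method names are hypothetical.

```java
// Sketch only: nearest centroid is a stand-in algorithm chosen for brevity; the data is made up.
class NearestCentroidSketch {
    // "ML Algorithm": learn one centroid (mean feature vector) per class from labelled training data.
    // Assumes every class label in [0, classCount) has at least one training example.
    static double[][] train(double[][] features, int[] labels, int classCount) {
        int dims = features[0].length;
        double[][] centroids = new double[classCount][dims];
        int[] counts = new int[classCount];
        for (int i = 0; i < features.length; i++) {
            counts[labels[i]]++;
            for (int d = 0; d < dims; d++) {
                centroids[labels[i]][d] += features[i][d];
            }
        }
        for (int c = 0; c < classCount; c++) {
            for (int d = 0; d < dims; d++) {
                centroids[c][d] /= counts[c];
            }
        }
        return centroids;
    }

    // "Model": map a new data point to an output (the class whose centroid is closest).
    static int predict(double[][] centroids, double[] point) {
        int best = 0;
        double bestDistance = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double distance = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centroids[c][d];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = c;
            }
        }
        return best;
    }

    // Hypothetical training data: two examples per class.
    static double[][] exampleModel() {
        double[][] trainingData = {{1, 1}, {2, 1}, {8, 9}, {9, 8}};
        int[] labels = {0, 0, 1, 1};
        return train(trainingData, labels, 2);
    }

    public static void main(String[] args) {
        double[][] model = exampleModel();
        System.out.println("Point (2, 2) is assigned class " + predict(model, new double[] {2, 2}));
        System.out.println("Point (7, 8) is assigned class " + predict(model, new double[] {7, 8}));
    }
}
```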

“Learning is not magic, it is the use of data collected during use to improve future performance.”
– David Parnas [Par17]
[Par17] David L. Parnas. “The Real Risks of Artificial Intelligence,” Communications of the ACM 60(10), pages 27-31, 2017.
Photo from https://alchetron.com/David-Parnas and is licensed under CC BY-SA 3.0

Machine Learning Applications [Gol16]
• CLASSIFICATION
• CLUSTERING
• PREDICTION
• REGRESSION
• OPTIMIZATION
[Gol16] Sunila Gollapudi. “Practical Machine Learning.” 2016.

Machine Learning Methods
• Naïve Bayes
• Averaged One-Dependence Estimators (AODE)
• Bayesian Belief Network (BBN)
• Support Vector Machine (SVM)
• Linear Discriminant Analysis (LDA)
• Classification & Regression Tree (CART)
• Random Forest
• K-Means Clustering
• Expectation Maximization (EM)
• …

Challenges with ML
• Overfitting
  • When your model (target function) is tailored too closely to past (training) data and doesn’t generalise to future data points
  • You have trained your model too well!
• Underfitting
  • When your model is not trained well enough to model the past (training) data and does not generalise to future data points either
  • You haven’t trained your model well enough!

[Diagram: Machine Learning ⊃ Representation Learning ⊃ Deep Learning]

Representation Learning
“a set of methods that allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification.” [LBH15]
[LBH15] Yann LeCun, Yoshua Bengio, Geoffrey Hinton (2015). Deep learning. Nature, 521(7553), 436–444.

Deep Learning
“…are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level.” [LBH15]

Deep Learning
[Diagram: Input Layer → Hidden Layers → Output Layer]
• Not constrained by traditional machine learning’s limitations with respect to processing raw data (which requires expertise and domain knowledge)

Implementations of Deep Learning
• Multiple levels/layers of representation learning can be implemented in various ways including:
  • Deep neural networks
  • Deep convolutional neural networks (ConvNets)
  • Recurrent neural networks (RNNs)
  • Deep belief networks
• A good framework to use for deep learning is TensorFlow – https://www.tensorflow.org/

AI & SE – Understanding the Relationship
[Diagram: Artificial Intelligence and Software Engineering]


[Diagram: Artificial Intelligence, Software Engineering, and their combination AI + SE]
How can AI be applied to SE?

• Automation of software development activities including the creation of software artifacts (e.g., software test generation)
• Recommendation systems that help software developers improve their performance (e.g., recommended code for review)
• The software development problems that can be addressed with AI are those that can be reframed in terms of optimization, classification, prediction…

• There are already several vibrant research communities conducting work in this area:
  • International Symposiums on Search-based Software Engineering (SSBSE), 2009-2018
  • International Workshops on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE), 2012-2016, 2018

EXAMPLE #1: Automatic Bug Repair

What do we mean by concurrency bugs?
• There are many different kinds of concurrency bugs
• We focus on two of the most common kinds – data races and deadlocks

What do we mean by bug repair?
• We view bug repair as a source code modification that fixes a concurrency bug while minimizing the effect on performance

Automatic Repair of Concurrency Bugs
• Several SBSE approaches have been proposed to fix bugs in single-threaded programs [LDFW12, Arc11]
  • Genetic programming is used to evolve patches, while testing evaluates fitness
• These techniques cannot be applied directly to fix concurrency bugs due to the nondeterministic nature of thread scheduling
• We adapt this work to handle concurrency bugs by modifying the fitness function and its evaluation
[LDFW12] C. Le Goues, M. Dewey-Vogt, S. Forrest, and W. Weimer, “A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each,” in Proc. of ICSE 2012, Jun. 2012.
[Arc11] A. Arcuri, “Evolutionary repair of faulty software,” in Applied Soft. Computing, vol. 11, 2011, pp. 3494–3514.

Automatic Repair of Concurrency (ARC)
PHASE 1:
• Repairing Deadlocks and Data Races
PHASE 2:
• Optimizing the Performance of Repaired Source Code
[Diagram: Buggy Java program → PHASE 1 → Java program that exhibits bug-free behavior → PHASE 2 → Java program that exhibits bug-free behavior and is performance optimized]

ARC PHASE 1
Repairing Deadlocks and Data Races
INPUT
1. Java program with concurrency bugs
2. Set of JUnit tests
The test suite is the oracle (hence, the approach is only as good as the tests!)

ARC PHASE 1
1. Initialize/update population
• Create the population for the genetic algorithm (GA)
• The first generation is a set of copies of the original buggy program
• Subsequent generations will be updated based on the GA (described in future steps)

ARC PHASE 1
2. Generate mutants
• Use mutation operators to generate mutants for all members of the population
• The generated mutants are optimized using the static analysis tool Chord [NA07]
  • Allows mutation operators to target specific shared classes, methods and variables when generating mutants
[NA07] Naik, M., Aiken, A.: Conditional must not aliasing for static race detection. In: Proc. of ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages (POPL 2007), pp. 327–338, Jan. 2007.

Mutation Operator Description¹ | Acronym
Add a synchronized block around a statement | ASAT
Add the synchronized keyword to the method header | ASIM
Add a synchronized block around a method | ASM
Change the order of two synchronized blocks | CSO
Expand synchronized region after | EXSA
Expand synchronized region before | EXSB
Remove synchronized statement around a synchronized statement | RSAS
Remove synchronization around a variable | RSAV
Remove synchronized keyword in method header | RSIM
Remove synchronization block around method | RSM
Shrink synchronization block after | SHSA
Shrink synchronization block before | SHSB
¹ All mutation operators are written in the TXL source transformation language – http://www.txl.ca.

Program P:
    obj.write(var1);
    synchronized(lock) {
        myHash.remove(var1);
    }
Program P’ (after applying EXSB, expand synchronized region before):
    synchronized(lock) {
        obj.write(var1);
        myHash.remove(var1);
    }

ARC PHASE 1
3. Apply mutation to an individual in population
• During execution the GA selects a type of mutation (i.e., a mutation operator); the selection is random in the first generation
• From the set of mutations created for that operator, a random instance is used

ARC PHASE 1
4. Evaluate individuals
• A fitness function is used to evaluate the mutants selected in the previous step
• The mutated individual is retained only if its fitness improves
functional fitness(P) = (s × sw) + (t × tw)
where:
s = # of successful executions
sw = success weighting (high)
t = # of timeout executions
tw = timeout weighting (low)
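
As a worked illustration of the formula above, the Java sketch below computes the functional fitness and applies the "keep only if improved" rule. The concrete weight values (100 and 1) are assumptions: the slides say only that the success weighting is high and the timeout weighting is low. The class and method names are hypothetical.

```java
// Sketch only: the weight values are assumed; the slides give only "high" and "low".
class FunctionalFitness {
    static final double SUCCESS_WEIGHT = 100.0; // sw (assumed value)
    static final double TIMEOUT_WEIGHT = 1.0;   // tw (assumed value)

    // functional fitness(P) = (s × sw) + (t × tw)
    static double fitness(int successfulExecutions, int timeoutExecutions) {
        return successfulExecutions * SUCCESS_WEIGHT + timeoutExecutions * TIMEOUT_WEIGHT;
    }

    // A mutated individual is kept only if its fitness is strictly better than before.
    static boolean keepMutant(double fitnessBefore, double fitnessAfter) {
        return fitnessAfter > fitnessBefore;
    }

    public static void main(String[] args) {
        double before = fitness(6, 2); // 6 successes, 2 timeouts
        double after = fitness(9, 1);  // 9 successes, 1 timeout
        System.out.println("Fitness before mutation: " + before);
        System.out.println("Fitness after mutation:  " + after);
        System.out.println("Keep mutant? " + keepMutant(before, after));
    }
}
```

With these assumed weights, 6 successes and 2 timeouts score 602, while 9 successes and 1 timeout score 901, so the mutant would be kept.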

ARC PHASE 1
4. Evaluate individuals
• In order to evaluate the fitness function for a given individual we need to evaluate the function over many different interleavings/executions
• We use IBM's ConTest [EFN+02], which instruments the program with noise, to ensure that many interleavings are evaluated
[EFN+02] Edelstein, O., Farchi, E., Nir, Y., Ratsaby, G., Ur, S.: Multithreaded Java program test generation. IBM Systems Journal 41(1), 111–125 (2002)

ARC PHASE 1
4. Check terminating condition
• An individual that produces 100% successful executions is a potential fix
• However, we perform an additional step to increase confidence that the individual is in fact correct
  • We evaluate the individual with ConTest again using a safety multiplier (e.g., 20) to increase the number of interleavings explored – a fix is only accepted if we achieved 100% success for this additional evaluation
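
The two-stage acceptance check can be sketched as follows. This is an assumption-laden illustration: ConTest is not invoked, and the hypothetical `successfulRuns` function stands in for "run the test suite this many times and report how many runs succeeded". The base run count of 10 is also made up.

```java
import java.util.function.IntUnaryOperator;

// Sketch only: successfulRuns is a hypothetical stand-in for executing the tests under ConTest.
class TerminationCheck {
    // successfulRuns maps "number of executions requested" to "number that succeeded".
    static boolean isAcceptedFix(IntUnaryOperator successfulRuns, int baseRuns, int safetyMultiplier) {
        // Stage 1: the normal evaluation must be 100% successful to be a potential fix.
        if (successfulRuns.applyAsInt(baseRuns) < baseRuns) {
            return false;
        }
        // Stage 2: re-evaluate with many more executions; accept only on 100% success again.
        int confirmationRuns = baseRuns * safetyMultiplier;
        return successfulRuns.applyAsInt(confirmationRuns) == confirmationRuns;
    }

    public static void main(String[] args) {
        IntUnaryOperator alwaysPasses = runs -> runs;
        // A candidate whose bug shows up only once more interleavings are explored.
        IntUnaryOperator failsUnderMoreRuns = runs -> runs <= 10 ? runs : runs - 1;
        System.out.println("Always passes: accepted = " + isAcceptedFix(alwaysPasses, 10, 20));
        System.out.println("Fails under more runs: accepted = " + isAcceptedFix(failsUnderMoreRuns, 10, 20));
    }
}
```

The second candidate passes the normal evaluation but is rejected in the confirmation stage, which is the situation the safety multiplier is meant to catch.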

ARC PHASE 1
4. Check terminating condition
• Our approach will not always find a successful solution that repairs all of the concurrency bugs in a program
• If after a user-defined number of generations a solution is not reached our algorithm will terminate

ARC PHASE 1
5. Replace weakest individuals (Optional)
• To encourage individuals to explore more fruitful areas of the state space we can replace individuals
  • We can restart with the original population
  • We can replace a portion (e.g., 10%) of underperforming individuals with random high-performance individuals or with the original program

ARC PHASE 1
6. Calculate operator weighting
• We leverage historic information from previous generations to weight the operators and increase the likelihood that useful operators are selected first/more frequently
  • Strategy 1: Weight based on % of deadlocks/data races uncovered
  • Strategy 2: Weight based on a mutation operator’s fitness function success

ARC PHASE 1
• Strategy 1: Weight based on % of deadlocks/data races uncovered
  • For example, some operators are geared towards fixing deadlocks, others data races and some both.
  • We increase the likelihood that specific operators are selected based on the number of deadlocks and data races in our historic evaluations

ARC PHASE 1
• Strategy 2: Weight based on a mutation operator’s fitness function success
  • For example, operators that have historically increased the fitness are weighted in proportion to their success
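
One way to read Strategy 2 is as roulette-wheel selection over operators, with each operator's weight proportional to how often it improved fitness in the past. The Java sketch below illustrates that reading; it is not ARC's implementation. The improvement counts, the uniform fallback when no history exists, and all names are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Sketch only: the counts, class name, and method names are assumptions for illustration.
class OperatorWeighting {
    // Hypothetical history: how often each operator improved fitness in earlier generations.
    static Map<String, Integer> exampleHistory() {
        Map<String, Integer> improvements = new LinkedHashMap<>();
        improvements.put("ASAT", 6);
        improvements.put("EXSB", 3);
        improvements.put("RSAS", 1);
        return improvements;
    }

    // Convert improvement counts into selection weights that sum to 1.
    // With no history at all, fall back to uniform weights (an assumed convention).
    static Map<String, Double> weights(Map<String, Integer> improvements) {
        int total = 0;
        for (int count : improvements.values()) {
            total += count;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : improvements.entrySet()) {
            double weight = total == 0
                    ? 1.0 / improvements.size()
                    : (double) entry.getValue() / total;
            result.put(entry.getKey(), weight);
        }
        return result;
    }

    // Roulette-wheel selection: roll is a number in [0, 1).
    static String select(Map<String, Double> weights, double roll) {
        double cumulative = 0.0;
        String last = null;
        for (Map.Entry<String, Double> entry : weights.entrySet()) {
            cumulative += entry.getValue();
            last = entry.getKey();
            if (roll < cumulative) {
                return last;
            }
        }
        return last; // guards against floating-point rounding at the top of the range
    }

    public static void main(String[] args) {
        Map<String, Double> operatorWeights = weights(exampleHistory());
        Random random = new Random(42); // fixed seed so repeated runs behave the same
        for (int i = 0; i < 5; i++) {
            System.out.println(select(operatorWeights, random.nextDouble()));
        }
    }
}
```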

ARC PHASE 2
Optimizing Repaired Source Code
• ARC may introduce unnecessary synchronization during Phase 1.
• If Phase 1 is successful, an optional second phase attempts to improve the running time of the program-under-repair by shrinking and removing unnecessary synchronization blocks
• A new non-functional fitness function and a subset of the mutation operators (e.g., RSAS, SHSA) are used

Evaluation – Setup
• We selected a set of 8 programs from the IBM Concurrency Benchmark [EU04] that have deadlock or data race bugs
  • 6 programs that exhibit bugs ARC was designed to fix
  • 2 programs that ARC was not designed to fix (sanity check)
• Each program was analyzed using 5 executions of ARC
[EU04] Eytani, Y., Ur, S.: Compiling a benchmark of documented multi-threaded bugs. In Proc. of Work. on Parallel and Distributed Sys.: Testing, Analysis, and Debugging (PADTAD 2004), 2004.

Evaluation – Results¹
Program | Bug Type | Bug Repaired? | # Generations to Repair Bug (Avg.) | Time Required to Repair Bug (Avg.)
account | Data Race | ✔ | 5.0 | 08m 08s
accounts | Data Race | ✔ | 1.0 | 44m 00s
bubblesort2 | Data Race | ✔ | 2.2 | 1h 40m 20s
deadlock | Deadlock | ✔ | 1.0 | 02m 12s
lottery | Data Race | ✔ | 2.4 | 38m 00s
pingpong | Data Race | ✔ | 1.0 | 12m 32s
airline | Data Race | ✖ | - | -
buffer | Data Race | ✖ | - | -
¹ Our evaluation was conducted on a Linux PC with a 2.33 GHz processor and 4 gigabytes of RAM.

Challenges & Future Work
• Flexibility – ARC is currently only capable of fixing deadlocks and data races
  • We plan to explore other mutation operators that will increase the kinds of bugs that can be fixed
• Readability [FLW12] – automatic repair always has the potential to decrease the readability and maintainability of the source code
  • We have not studied the readability of the fixes produced by ARC
[FLW12] Zachary P. Fry, Bryan Landau, and Westley Weimer. “A Human Study of Patch Maintainability.” In Proc. of the International Symposium on Software Testing and Analysis (ISSTA), 177–187, 2012.

EXAMPLE #2: Predicting Future Code Changes With Historic Commit Data

Software Projects
• Software projects are often developed by teams or groups of people contributing through a version control system
• Over time, software projects evolve
  • Introduce new functionality, remove functionality, fix bugs, optimize code, etc.
• Popular git-based version control platforms include GitHub and Bitbucket
  • Host both open and closed source software

What is a Commit?
“A commit, or "revision", is an individual change to a file (or set of files) … that allows you to keep record of what changes were made when and by who. Commits usually contain a commit message which is a brief description of what changes were made.”¹
¹ https://help.github.com/articles/github-glossary/

Research Questions
• Can historic commit data be used to predict future changes in software projects?
  • Using Machine Learning (ML) for prediction
• What is the impact of the following factors on the performance of the prediction?
  • Sampling range
  • Feature set
  • Balanced sampling

Research Questions
• If the answer to the first research question is yes (we can predict future code changes), the benefits include:
  • Providing developers with insight into future development
  • Providing project managers with a new tool for resource allocation

Overview of Prediction Approach
[Diagram: project versions V1.0, V1.01, ..., V2.0, ..., Vn from GitHub → Candidate Features → Machine Learning (SVM, RF) → Predict Vn+1]
SVM = Support Vector Machine
RF = Random Forest

Experiment Setup
• Collected the data from 23 projects including:
  • ACRA/acra
  • google/blockly-android
  • apache/storm
• Projects were all: Java, open source, hosted on GitHub, with a longer development history (1+ year)
• All data was extracted from GitHub repositories and stored locally for the experiments.

Experiment Setup
• The features used to predict whether future code changes will occur in a given unit of code (method, class) include:
  • Committer
  • Method Signature
  • Filename
  • Overall change frequency
  • Short-term change frequency
  • Method Length
  • Changed in previous 5 commits? (true/false)
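
The history-based features in this list could be derived along the following lines. This Java sketch is only a guess at plausible definitions: the slides do not define "overall" or "short-term" change frequency, so the window-based formulas, the example history, and all names here are assumptions.

```java
// Sketch only: the feature definitions below are assumed, not taken from the slides.
class ChangeHistoryFeatures {
    // history[i] is true if the method was changed in commit i (oldest commit first).
    // Returns the fraction of the most recent `window` commits that changed the method.
    static double changeFrequency(boolean[] history, int window) {
        int from = Math.max(0, history.length - window);
        int considered = history.length - from;
        if (considered == 0) {
            return 0.0;
        }
        int changes = 0;
        for (int i = from; i < history.length; i++) {
            if (history[i]) {
                changes++;
            }
        }
        return (double) changes / considered;
    }

    // Overall change frequency: the same measure over the full history.
    static double overallChangeFrequency(boolean[] history) {
        return changeFrequency(history, history.length);
    }

    // "Changed in previous N commits?" as a true/false feature.
    static boolean changedInPreviousCommits(boolean[] history, int n) {
        return changeFrequency(history, n) > 0.0;
    }

    public static void main(String[] args) {
        boolean[] history = {true, true, false, true, false, false, false, false, false, false};
        System.out.println("Overall change frequency: " + overallChangeFrequency(history));
        System.out.println("Short-term (last 5) change frequency: " + changeFrequency(history, 5));
        System.out.println("Changed in previous 5 commits? " + changedInPreviousCommits(history, 5));
    }
}
```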

Experiment Setup
• Utilized two different machine learning algorithms:
  • Support Vector Machine (SVM)
    • Supervised learning
    • Applications in classification and regression
    • Can be used for multi-class tasks
  • Random Forest (RF)
    • Supervised learning method that works by constructing multiple decision trees
    • Applications in classification and regression
    • Ensemble learning method

Experiment Setup
• Conducted 6 separate experiments
  • One experiment for each factor using each machine learning technique
[Diagram: {Sampling range, Feature set, Balanced sampling} × {Support Vector Machine, Random Forest}]

Experiment Setup
• Measured performance in terms of:
  • Accuracy
  • Precision
  • Recall
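
For reference, the three measures can be computed from the counts of true/false positives and negatives, as in the Java sketch below. The example counts are made up, and returning 0 when a denominator is zero is an assumed convention rather than anything stated in the slides.

```java
// Sketch only: example counts are made up; zero-denominator handling is an assumed convention.
class PredictionMetrics {
    // Accuracy: fraction of all predictions that were correct.
    static double accuracy(int tp, int fp, int tn, int fn) {
        int total = tp + fp + tn + fn;
        return total == 0 ? 0.0 : (double) (tp + tn) / total;
    }

    // Precision: of the units predicted to change, the fraction that actually changed.
    static double precision(int tp, int fp) {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    // Recall: of the units that actually changed, the fraction that were predicted to change.
    static double recall(int tp, int fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    public static void main(String[] args) {
        int tp = 30, fp = 10, tn = 50, fn = 10; // hypothetical prediction outcomes
        System.out.println("Accuracy:  " + accuracy(tp, fp, tn, fn));
        System.out.println("Precision: " + precision(tp, fp));
        System.out.println("Recall:    " + recall(tp, fn));
    }
}
```

With these hypothetical counts the accuracy is 0.8 and both precision and recall are 0.75.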

Experiment 1: Impact of Sampling Range
• Does sampling range impact predictive capabilities of the approach?
• For each project, 8 values for the range size are tested

Experiment 1: SVM Results for acra
• Performed well with some of the range values
• Performance varied across the range values

Experiment 1: Discussion
• Performance variations
  • Large variation for recall
  • Smaller for precision and accuracy
• Smaller projects performed better
• Factor is impactful

Discussion
• Predicting changes within a project is possible
• Both machine learning algorithms are capable of providing predictions
• The Sample Window Range and Feature Set proved most influential
• Project-dependent factors require further investigation


[Diagram: Artificial Intelligence, Software Engineering, and their combination AI + SE]
How can SE be applied to AI?

Criticisms of Deep Learning
• “Deep learning and AI in general ignore too much of the brain’s biology in favor of brute-force computing.” [MIT-DL]
• “Google’s attitude is: lots of data makes up for everything” – viewpoint of Jeff Hawkins, founder of Palm Computing, on Google’s approach to deep learning [MIT-DL]
• Concerns about bias and comprehension of deep learning algorithms also increase as the algorithms get more complex and the data gets bigger
[MIT-DL] https://www.technologyreview.com/s/513696/deep-learning/

Can Software Engineering help with this?

Can SE help solve AI’s black box?
