Data-driven software engineering @Microsoft 
Michaela Greiler
Data-driven software engineering @Microsoft 
•How can we optimize the testing process? 
•Do code reviews make a difference? 
•Is coding velocity and quality always a tradeoff? 
•What’s the optimal way to organize work on a large team? 
MSR Redmond/TSE: 
Michaela Greiler, Jacek Czerwonka, Wolfram Schulte, Suresh Thummalapenta
MSR Redmond: 
Christian Bird, Kathryn McKinley, Nachi Nagappan, Thomas Zimmermann
MSR Cambridge: Brendan Murphy, Kim Herzig
[Chart: "Code Coverage trigger of Checkins", November 2010 – October 2013, 0–100% scale, with series: % completely covered, % somewhat covered, % not covered.]
Reviewer recommendation: Does experience matter?
Can we change with what we can measure? 
Michaela Greiler
YES
YES 
that’s the danger!
What is measured?
[Bar chart: number of bugs per engineer (Carl, Lisa, Rob, Danny), scale 0–8.]
What is changed?
[Bar chart: number of bugs per engineer (Carl, Lisa, Rob, Danny), scale 0–2.5.]
Code Quality
SOCIO TECHNICAL CONGRUENCE 
“Design and programming are human activities; forget that and all is lost” – Bjarne Stroustrup
So should we go without any measurements?
No!
Lessons learned:
Data Collection
Interpretation
Usage
Garbage!
•What is CodeMine? What data does CodeMine have?
GQM vs. opportunistic data collection
•Easily available ≠ what’s needed 
•Determine the needed data 
•Find proxy measures if needed 
•Know the analysis before collecting the data 
Otherwise, data is not usable for the intended purpose 
•Goal – Question – Metric
•Check for completeness, cleanliness/noise, and usefulness
•Data background 
•How was data generated? 
•Why was it generated? 
•Who consumes the data? 
•What about outliers? 
•How was the data processed?
Interpretation needs domain knowledge
Tools, processes, practices and policies.
Release schedule: milestones (M1, M2, Beta) over time
Engineers: What roles exist? Who does what? Responsibilities?
Organization of code bases
Team structure and culture.
You cannot compare 1:1
Engineers want to understand the nitty-gritty 
•How do you calculate the recommended reviewers? 
•Why was that person recommended? 
•Why is Lisa not recommended?
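The deck does not show how the recommender works internally. Purely as an illustration of the kind of explainable heuristic engineers can interrogate, here is a minimal sketch that ranks reviewers by how often they previously edited or reviewed the changed files; all names, data, and the scoring rule are assumptions, not the actual Microsoft recommender.

```python
from collections import Counter

def recommend_reviewers(changed_files, activity_history, top_n=3):
    """Rank candidate reviewers by prior activity on the changed files.

    activity_history: hypothetical mapping from file path to the list of
    people who previously edited or reviewed that file.
    """
    scores = Counter()
    for path in changed_files:
        for person in activity_history.get(path, []):
            scores[person] += 1
    # The score doubles as the explanation:
    # "recommended because they touched these files N times before".
    return scores.most_common(top_n)

# Hypothetical example data
history = {
    "parser.cs": ["Lisa", "Carl", "Lisa"],
    "lexer.cs": ["Rob", "Lisa"],
}
print(recommend_reviewers(["parser.cs", "lexer.cs"], history))
# [('Lisa', 3), ('Carl', 1), ('Rob', 1)]
```

A heuristic this simple can answer "Why was that person recommended?" and "Why is Lisa not recommended?" directly from the activity counts.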
Simplicity first 
Files without bugs: main contributor made > 50% of all edits
Files with bugs: main contributor made < 60% of all edits
Ownership metric:
Proportion of all edits made by the contributor with the most edits
Reporting vs. Prediction 
Comprehension vs. automation
If you can do it with a decision tree… do it…
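The ownership metric above can be computed directly from per-file edit counts. A minimal sketch with illustrative data (the thresholds on the slide would then be applied to the returned proportion):

```python
def ownership(edit_counts):
    """Proportion of all edits made by the contributor with the most edits."""
    total = sum(edit_counts.values())
    return max(edit_counts.values()) / total if total else 0.0

# Hypothetical edit counts for one file
edits = {"Carl": 12, "Lisa": 5, "Rob": 3}
print(round(ownership(edits), 2))  # 0.6 -> the main contributor made 60% of all edits
```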
Iterative process with very close involvement of product teams and domain experts. 
It’s a dialog 
It’s a back and forth
Mixed Method Research 
Is a research approach or methodology 
•for questions that call for real-life contextual understandings; 
•employing rigorous quantitative research assessing magnitude and frequency of constructs and 
•rigorous qualitative research exploring the meaning and understanding of constructs.
Dr. Margaret-Anne Storey
Professor of Computer Science, University of Victoria
All methods are inherently flawed! 
Generalizability 
Precision 
Realism 
Dr. Arie van Deursen
Professor of Software Engineering, Delft University of Technology
Foundations of Mixed Methods Research
Designing Social Inquiry
Qualitative Research: Mixed Method Research 
•Interviews 
•Observations 
•Focus groups 
•Contextual Inquiry 
•Grounded Theory 
•…
A Grounded Theory Study
Systematic procedure to discover a theory from (qualitative) data
S. Adolph, W. Hall, Ph. Kruchten. Using grounded theory to study the experience of software development. Empirical Software Engineering, 2011.
B. Glaser and J. Holton. Remodeling grounded theory. Forum Qualitative Res., 2004.
Glaser and Strauss
Deductive versus inductive
A deductive approach is concerned with developing a hypothesis (or hypotheses) based on existing theory, and then designing a research strategy to test the hypothesis (Wilson, 2010, p. 7).
An inductive approach starts with observations. Theories emerge towards the end of the research and as a result of careful examination of patterns in observations (Goddard and Melville, 2004).
Theory 
Hypotheses 
Observation 
Confirm/Reject 
Observation 
Patterns 
Theory
All models are wrong but some are useful 
(George E. P. Box)
Theo: Test Effectiveness Optimization from History 
Kim Herzig*, Michaela Greiler+, Jacek Czerwonka+, Brendan Murphy* 
*Microsoft Research, Cambridge 
+Microsoft Corporation, US
Improving Development Processes 
Product / Service: legacy changes, new product features, technology changes
Development Environment: Speed, Cost, Quality / Risk (should be well balanced)
Microsoft aims for shorter release cycles 
Empirical data to support & drive decisions 
• Speed up development processes (e.g. code velocity) 
• More frequent releases 
• Maintaining / increasing product quality 
Joint effort by MSR & product teams 
• MSR Cambridge: Brendan Murphy, Kim Herzig 
• TSE Redmond: Jacek Czerwonka, Michaela Greiler 
• MSR Redmond: Tom Zimmermann, Chris Bird, Nachi Nagappan 
• Windows, Windows Phone, Office, Dynamics product teams
Software Testing for Windows 
[Branch diagram: multiple component branches feed multiple area branches, which feed a development branch and finally winmain (the main branch); each integration step passes a quality gate, from component testing at the lowest level, through system & component testing, to system testing before winmain.]
Software testing is very expensive
• Thousands of test suites executed, millions of test cases executed
• On different branches, architectures, languages, etc.
• We tend to repeat the same tests over and over again
• Too many false alarms (failures due to test and infrastructure issues)
• Each test failure slows down product development
• Aims to find code issues as early as possible
• At the cost of slower product development
Actual problem 
Current process aims for maximal protection 
{Simplified illustration}
Software Testing for Office 
Software testing is very expensive
• Thousands of test suites executed, millions of test cases executed
• On different branches, architectures, languages, etc.
• We tend to repeat the same tests over and over again
• Too many false alarms (failures due to test and infrastructure issues)
• Each test failure slows down product development
• Aims to find code issues as early as possible
• At the cost of slower product development
Actual problem 
Current process aims for maximal protection 
Dev Inner Loop 
BVT and CVT on main
Dog food 
Different 
• Branching structure 
• Development process 
• Testing process 
• Release schedules 
• … 
{Simplified illustration}
Goal 
Reduce the number of test executions … 
… without sacrificing code quality 
Dynamic, self-adaptive optimization model
Solution 
Reduce the number of test executions … 
•Run every test at least once before integrating a code change into the main branch (e.g., winmain).
•We eventually find all code issues but take the risk of finding them later (on higher-level branches).
… without sacrificing code quality 
High cost, unknown value: $$$$$
High cost, low value: $$$$
Low cost, low value: $
Low cost, good value: $$
How likely is a test to 1) cause false positives or 2) find code issues?
Analyze historic data:
- Test events
- Builds
- Code integrations
Analyze past test results:
- Passing tests, false alarms, detected code issues
(A minimal sketch of this estimation follows below.)
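As a rough sketch of how the false-alarm and defect-finding probabilities could be estimated from such historic data (the record format below is an assumption for illustration, not Theo's actual data model):

```python
from collections import defaultdict

def estimate_test_probabilities(executions):
    """Estimate per-test probabilities from historic test executions.

    executions: iterable of dicts with hypothetical fields
      'test' (name), 'failed' (bool), 'caused_bug_report' (bool).
    Returns {test: (prob_false_positive, prob_true_positive)}.
    """
    runs = defaultdict(int)
    false_pos = defaultdict(int)
    true_pos = defaultdict(int)
    for e in executions:
        t = e["test"]
        runs[t] += 1
        if e["failed"]:
            if e["caused_bug_report"]:
                true_pos[t] += 1   # failure pointed to a real code issue
            else:
                false_pos[t] += 1  # failure due to test or infrastructure issues
    return {t: (false_pos[t] / runs[t], true_pos[t] / runs[t]) for t in runs}
```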
Bug finding capabilities change with context
Solution
Using a cost function to model risk:
Cost_Execution > Cost_Skip ? suspend test : execute test
Cost_Execution = Cost_Machine/Time × Time_Execution + "cost of a potential false alarm"
              = Cost_Machine/Time × Time_Execution + (Prob_FP × Cost_Developer/Time × Time_Triage)
Cost_Skip = "potential cost of finding the defect later"
          = Prob_TP × Cost_Developer/Time × Time_Freeze_Branch × #Developers_Branch
Test 
Costto run a test. 
Valueof output.
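Read as code, the decision rule might look like the sketch below; the variable names mirror the cost formula above, and all concrete numbers are placeholders rather than the values used by Theo.

```python
def should_execute(prob_fp, prob_tp,
                   machine_cost_per_hour, exec_hours,
                   dev_cost_per_hour, triage_hours,
                   freeze_hours, developers_on_branch):
    """Execute the test only if executing is no more costly than skipping it."""
    cost_execution = (machine_cost_per_hour * exec_hours             # machine time
                      + prob_fp * dev_cost_per_hour * triage_hours)  # expected false-alarm triage
    cost_skip = (prob_tp * dev_cost_per_hour                         # expected cost of finding
                 * freeze_hours * developers_on_branch)              # the defect later
    return cost_execution <= cost_skip  # False -> suspend the test

# Hypothetical numbers: a flaky test that almost never finds real issues
print(should_execute(prob_fp=0.3, prob_tp=0.001,
                     machine_cost_per_hour=2, exec_hours=0.5,
                     dev_cost_per_hour=100, triage_hours=1,
                     freeze_hours=4, developers_on_branch=50))  # False -> suspend
```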
Current Results 
Simulated on Windows 8.1 development period (BVT only)
Dynamic, Self-Adaptive
Decision points are connected to each other
Skipping tests influences the risk factors of higher-level branches
We re-enable tests if code quality drops (e.g., in a different milestone)
[Chart: relative test reduction rate (0%–70%) over time during Windows 8.1 development, with the training period marked.]
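The deck describes the self-adaptive behaviour only at this level. Purely as an illustration of the re-enabling idea, a minimal, assumed sketch:

```python
def adapt_skip_list(skipped_tests, escaped_defects_per_test, tolerated_escapes=0):
    """Re-enable a skipped test once it is associated with more escaped defects
    (i.e., issues found later, on a higher-level branch) than we tolerate."""
    still_skipped, re_enabled = [], []
    for test in skipped_tests:
        if escaped_defects_per_test.get(test, 0) > tolerated_escapes:
            re_enabled.append(test)
        else:
            still_skipped.append(test)
    return still_skipped, re_enabled
```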
Bug Finding Performance of Tests
How many test executions fail?
[Chart: number of failed test executions vs. total number of test executions, per branch level.]
How many of the failed test executions result in bug reports?
[Chart: failed executions per branch level, split into false positives (FP), test-unspecific true positives (TP), and test-specific true positives (TP).]
Impact on Development Process 
Secondary Improvements 
•Machine setup: we may lower the number of machines allocated to the testing process
•Developer satisfaction: removing false test failures increases confidence in the testing process
…the speed improvement is hard to estimate through simulation
“We used the data […] to cut a bunch of bad content and are running a much leaner BVT system […] we’re panning out to scale about 4x and run in well under 2 hours” (Jason Means, Windows BVT PM)
Michaela Greiler 
@mgreiler 
www.michaelagreiler.com 
http://research.microsoft.com/en-us/projects/tse/
