Software Analytics:
Reflection and Path Forward
Dr. Dongmei Zhang
Data, Knowledge, and Intelligence
(DKI) Group
Microsoft Research Asia
Prof. Tao Xie
School of Computer Science
Peking University
Outline
‱ Origin and early research
‱ Community building
‱ New research topics
‱ Reflections
05/20/2022 MSR 2022 2
Origin and Early Research
Software Analytics Group at MSRA, founded in May 2009
Software Analytics Research
Utilize a data-driven approach to help create high-quality, user-friendly,
and efficiently developed and operated software and services
Information Visualization
Analysis Algorithms
Large-scale Computing
Vertical
Horizontal
https://www.microsoft.com/en-us/research/group/software-analytics/
http://research.microsoft.com/en-us/news/features/softwareanalytics-052013.aspx
Prof. Tao Xie’s
Visit at MSRA SA
Defining Software Analytics
Software analytics is to enable software practitioners to perform data
exploration and analysis in order to obtain insightful and actionable
information for data-driven tasks around software and services.
D. Zhang, Y. Dang, J. Lou, S. Han, H. Zhang, and Tao Xie. Software Analytics as a Learning Case in Practice: Approaches and Experiences. In MALETS 2011.
Six dimensions
‱ Research topics
‱ Technology pillars
‱ Target audience
‱ Connection to practice
‱ Output
‱ Input
Research topics – the trinity view
‱ Covering major areas of the software domain
‱ Throughout the entire development cycle
‱ Enabling practitioners to obtain insights
Software Users
Software Development Process
Software System
Input - data sources
Runtime traces
Program logs
System events
Perf counters
...
Usage log
User surveys
Online forum posts
Blog & Twitter
...
Source code
Bug history
Check-in history
Test cases
...
Output – insightful information
‱ Conveys meaningful and useful understanding or knowledge towards completing the target task
‱ Not easily attainable by directly investigating raw data without the aid of analytics technologies
‱ Examples
  ‱ It is easy to count the number of re-opened bugs, but how do we find the primary reasons for these re-opened bugs?
  ‱ When the availability of an online service drops below a threshold, how do we localize the problem?
Output – actionable information
‱ Enables software practitioners to come up with concrete solutions towards completing the target task
‱ Examples
  ‱ Why were bugs re-opened?
    ‱ A list of bug groups, each sharing the same reason for re-opening
  ‱ Why did the availability of online services drop?
    ‱ A list of problematic areas with associated confidence values
  ‱ Which part of my code should be refactored?
    ‱ A list of cloned code snippets easily explored from different perspectives
Technology pillars
Software Users
Software Development Process
Software System
Information Visualization
Analysis Algorithms
Large-scale Computing
Vertical
Horizontal
Target audience – software practitioners
Developer
Tester
Program Manager
Usability engineer
Designer
Support engineer
Management personnel
Operation engineer
Connection to practice
‱ Software Analytics is naturally tied to software development practice
‱ Getting real
Real Data
Real Problems
Real Users
Real Tools
Early projects
StackMine – Performance debugging in the large via mining millions of stack traces
Scalable code clone analysis
Data exploration for Customer Experience Improvement Program (CEIP)
Performance Debugging in the Large via Mining Millions of Stack Traces
S. Han, Y. Dang, S. Ge, D. Zhang, and T. Xie, ICSE 2012
Comprehending Performance from Real-World Execution Traces: A Device-Driver Case
X. Yu, S. Han, D. Zhang, and T. Xie, ASPLOS 2014
Selected as the representative paper of 2012, one of 20 representative
papers (one paper per year)
Community Building
Building Upon Rich Work by the Communities
FSE/SDP Workshop on the Future of Software Engineering Research (FoSER 2010)
...
MSR 2012 Keynote
SoftMine 2013 Keynote
CCCF/IEEE Software 2013 Articles
Shonan Meeting 2013
Tutorials/Tech Briefings at ICSE/FSE/ASE...
‱ [ASE 11 Tutorial] Zhang & Xie. xSA: eXtreme Software Analytics - Marriage of eXtreme Computing and Software Analytics
‱ [CSEE&T 12 Tutorial] Zhang, Dang, Han & Xie. Teaching and Training for Software Analytics
‱ [ICSE 12 SEIP Mini Tutorial] Zhang & Xie. Software Analytics in Practice: Mini Tutorial
‱ [ICSE 13 Tutorial] Zhang & Xie. Software Analytics: Achievements and Challenges
‱ [FSE 14 Tutorial] Zhang & Xie. Software Analytics: Achievements and Challenges
Community Building by Others
IEEE Software 2013 Special Issue
Dagstuhl Seminar 2014
International Workshop on Software Analytics (SWAN) 2015, 2016, 2017, 2018
...
Expanding Community
...
Beyond SE Communities: ASPLOS 2021 Keynote
ASPLOS is the premier forum for interdisciplinary systems research, intersecting computer architecture, hardware
and emerging technologies, programming languages and compilers, operating systems, and networking.
New Research Topic (1)
Cloud Intelligence
Cloud Services
‱ The shift to cloud is becoming mainstream
‱ The critical role of cloud computing platforms has been reinforced by COVID-19
Cloud shift proportion by category
Category                        2018   2019   2020   2021   2022
System infrastructure            11%    13%    16%    19%    22%
Infrastructure software          13%    15%    17%    18%    20%
Application software             34%    36%    38%    39%    40%
Business process outsourcing     27%    28%    29%    29%    30%
Total                            19%    21%    24%    26%    28%
Source: Gartner (August 2018)
Worldwide public cloud services end-user spending forecast (Millions of USD)
Segment            2019      2020      2021      2022
BPaaS            45,212    44,741    47,521    50,336
PaaS             37,512    43,823    55,486    68,964
SaaS            102,064   101,480   117,773   138,261
IaaS             44,457    51,421    65,264    82,225
DaaS                616     1,204     1,945     2,542
Total Market    242,696   257,549   304,990   362,263
Source: Gartner (November 2020)
Note: Totals may not add up due to rounding.
Focusing on Cloud Computing
‱ Large room for improvement in cloud computing platforms
‱ Software Analytics is the digital transformation of the software industry
‱ Cloud intelligence
  ‱ Software Analytics focusing on cloud computing
‱ Re-emergence of AI
‱ Making impact is key
Cloud Intelligence
Using AI/ML technologies to effectively and efficiently design, build and
operate complex cloud services at scale
Customers
Engineering
Services
‱ AI for System
  Designing and building high-quality services with better reliability, performance, and efficiency
‱ AI for Customers
  Improving customer satisfaction with intelligence and better user experiences
‱ AI for DevOps
  Achieving high productivity in DevOps via empowering engineers with intelligent tooling
‱ Cloud Intelligence Workshop
  ‱ @ AAAI 2020
  ‱ @ ICSE 2021
  ‱ @ SysML 2022
‱ Program Chair
  Jian Zhang, Microsoft Azure
‱ Steering Committee
  Rama Akkiraju, IBM
  Ricardo Bianchini, Microsoft Research
  Mike Dahlin, Google
  Marcus Fontoura, Microsoft Azure
  Ahmed E. Hassan, Queen’s University
  Michael Lyu, Chinese University of Hong Kong
  Erik Meijer, Facebook
  Tao Xie, Peking University
  Dongmei Zhang, Microsoft Research
  Yuanyuan Zhou, UCSD
Related Efforts
‱ AIOps by Gartner
  “Put simply, AIOps is the application of machine learning (ML) and data science to IT operations problems. AIOps platforms combine big data and ML functionality to enhance and partially replace all primary IT operations functions, including availability and performance monitoring, event correlation and analysis, and IT service management and automation.”
‱ AIOps extended
  AIOps: Real-world Challenges and Research Innovations
  Yingnong Dang, Qingwei Lin, Peng Huang
  Technical Briefing, ICSE 2019
Scenarios
Service
  Service health measuring (KPI)
  ‱ Availability / reliability
  ‱ Performance
  ‱ Security
  Anomalous behavior detection
  ‱ KPI (overall, component)
  ‱ Resource (overhead / leak)
  Health prediction
  ‱ Infrastructure (e.g., power, cooling)
  ‱ HW, SW failure
  ‱ Workload
  ‱ System capacity
  Auto-recovery/adjustment/healing
  ‱ Recovery option optimization
  ‱ Auto healing
Engineering
  Programming
  ‱ API/code suggestion
  ‱ Code defect, smell, code review
  ‱ Test coverage, test selection
  CI/CD
  ‱ Integration testing and strategy
  ‱ Rollout risk assessment and strategy
  Auto-triage & diagnosis
  ‱ Auto-triage (investigation owner)
  ‱ Diagnosis intelligence
  Repair/mitigation decision
  ‱ Solution recommendation
  ‱ Decision support
Customer
  Customer behavior understanding
  ‱ Usage experience
  ‱ Customer churn
  Proactive customer engagement
  ‱ Service auto-scale (up/down)
  ‱ Engaging before reporting
  Intelligent customer support
  ‱ Self-serve
  ‱ Efficient communication
  ‱ Intelligent suggestion/hints
Problems and Challenges
Detection
  Problems
  ‱ Time-series anomaly detection
  ‱ Log-based anomaly detection
  ‱ Multi-dimensional change detection
  ‱ ...
  Challenges: diverse requirements, noisy data, high dimensions, lack of labeled data, ...
Diagnosis
  Problems
  ‱ Log pattern mining
  ‱ Correlation analysis
  ‱ Dependency graph diagnosis
  ‱ ...
  Challenges: diverse causes, complex service dependency, scattered knowledge, ...
Optimization
  Problems
  ‱ Multi-constraint/objective optimization
  ‱ DL-based combinatorial search
  ‱ Optimization under prediction uncertainty
  ‱ ...
  Challenges: huge problem space, large-scale data, complex constraints and tradeoffs, ...
Prediction
  Problems
  ‱ Context/dependency-aware prediction
  ‱ Automated feature engineering
  ‱ Extremely-imbalanced data prediction
  ‱ ...
  Challenges: highly imbalanced classes, fast system evolution, unpredictable behavior changes, ...
Disk Failure Prediction in Cloud Computing Platform
Improving Service Availability of Cloud Systems by Predicting Disk Error, Y. Xu, K. Sui, R. Yao, H. Zhang, Q. Lin, Y. Dang, P. Li, K. Jiang, W. Zhang, J. Lou, M. Chintalapati, D. Zhang, USENIX ATC 2018.
NTAM: Neighborhood-Temporal Attention Model for Disk Failure Prediction in Cloud Platforms, C. Luo, P. Zhao, B. Qiao, Y. Wu, H. Zhang, W. Wu, W. Lu, Y. Dang, S. Rajmohan, Q. Lin, D. Zhang, The Web Conference 2021.
Virtual Machine (VM) Availability and Disk Failures
‱ Hardware issues are among the top reasons for VMs going down and rebooting
‱ Disk failures are the largest contributor to hardware issues
Source: https://www.backblaze.com/blog/hard-drive-stats-for-2018/
Source: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/08/a7-narayanan.pdf
SSD Annualized Failure Rates
Binary Classification Problem
The training set is a collection of $N$ training samples, denoted as
$$D = \{(X_1, y_1), (X_2, y_2), \ldots, (X_N, y_N)\}$$
$X_i$ represents the corresponding disk $d_i$'s own status data and neighborhood information,
i.e., $X_i = A_i \cup B_i$, where $A_i \in \mathbb{R}^{h \times n}$ represents $d_i$'s own status data, and $B_i$ is a subset of the union
of all $A_i$.
$y_i \in \{0, 1\}$ is the label
$y_i = 1$ means that the corresponding disk will fail in the near future
$y_i = 0$ means 'healthy'
Loss function
$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \cdot \log \hat{y}_i + (1 - y_i) \cdot \log(1 - \hat{y}_i) \right]$$
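As a minimal sketch of the loss above, the following Python function computes the mean binary cross-entropy over a batch of labels and predictions. It assumes that the hatted y_i is the model's predicted failure probability for disk d_i; the function name and the `eps` clipping constant are illustrative choices and do not come from the slides.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy: -(1/N) * sum(y*log(p) + (1-y)*log(1-p)).

    y_true: labels in {0, 1} (1 = disk will fail soon, 0 = healthy).
    y_pred: predicted failure probabilities in [0, 1] (assumed meaning of y-hat).
    eps:    illustrative clipping constant that keeps log() away from 0.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


if __name__ == "__main__":
    labels = [1, 0, 0, 1]
    confident = [0.9, 0.1, 0.2, 0.8]
    uncertain = [0.6, 0.4, 0.5, 0.5]
    # Predictions closer to the labels yield a smaller loss.
    print(binary_cross_entropy(labels, confident))
    print(binary_cross_entropy(labels, uncertain))
```

With a heavily imbalanced disk population, an unweighted loss of this form is dominated by the healthy class, which is one reason the data-sampling step discussed later in the deck matters.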
Related Work
‱ Traditional machine learning based approaches
  ‱ Support Vector Machine (SVM) [MSST 2013]
  ‱ Decision Tree (DT) [DSN 2014]
  ‱ Random Forest (RF) [DSN 2018]
  ‱ Gradient Boosting Decision Tree (GBDT) [Ph.D. Dissertation, UCLA 2017]
  ‱ Regularized Greedy Forest (RGF) [KDD 2016]
  ‱ Cloud Disk Error Forecasting (CDEF) [USENIX ATC 2018]
‱ Deep learning based approaches
  ‱ Recurrent Neural Network (RNN) [IEEE Transactions on Computers 2016]
  ‱ Long Short-Term Memory (LSTM) [ICDM 2018]
  ‱ Temporal Convolution Neural Network (TCNN) [DAC 2019]
  ‱ Convolution Neural Network with Long Short-Term Memory (CNN+LSTM) [FAST 2020]
  ‱ Neighborhood-Temporal Attention Model (NTAM) [Web Conference 2021]
Observation (1)
‱ VMs can be impacted before disks completely fail
‱ Disk errors occur before a disk completely fails
‱ Disk errors are often reflected by system-level signals such as OS events
Name                        Description
Timestamp                   The timestamp $t$ at which the feature vector was recorded.
Disk ID                     The unique ID of disk $d_i$.
Node ID                     The unique ID of the computing server (i.e., node) that $d_i$ is associated with.
SMART attributes            The SMART attributes of $d_i$ recorded at $t$, providing information such as the Current Pending Sector Count, Seek Error Rate, Soft Read Error Rate, etc.
System-related attributes   OS events such as paging error, file system error, device reset, telemetry loss, etc.
Driver-related attributes   Gathered from the disk driver, with information on Flush Count, IO Latency, Controller Reset, etc.
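The table above describes one feature vector per disk and timestamp. As an illustrative sketch only, such a record could be held in a small Python dataclass. The class name, field names, attribute keys, and the `to_vector` helper are all hypothetical and are not taken from the NTAM or CDEF papers.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class DiskFeatureRecord:
    """One feature vector for disk d_i at timestamp t (hypothetical layout)."""

    timestamp: datetime
    disk_id: str
    node_id: str
    # Attribute name -> value; the keys used below are made-up examples.
    smart: Dict[str, float] = field(default_factory=dict)
    system: Dict[str, float] = field(default_factory=dict)
    driver: Dict[str, float] = field(default_factory=dict)

    def to_vector(self, feature_order: List[str]) -> List[float]:
        """Flatten the three attribute groups into a fixed-order numeric vector.

        Attributes missing from the record are filled with 0.0, which is an
        assumption made for this sketch rather than a documented policy.
        """
        merged = {**self.smart, **self.system, **self.driver}
        return [float(merged.get(name, 0.0)) for name in feature_order]


if __name__ == "__main__":
    record = DiskFeatureRecord(
        timestamp=datetime(2022, 5, 20, 12, 0),
        disk_id="disk-001",
        node_id="node-17",
        smart={"current_pending_sector_count": 3},
        system={"file_system_error": 1},
        driver={"io_latency_ms": 12.5},
    )
    order = ["current_pending_sector_count", "file_system_error", "io_latency_ms"]
    print(record.to_vector(order))
```

Keeping the three attribute groups separate until the final flattening step makes it easy to see which signal source (SMART, OS, or driver) each feature came from.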
Observation (2)
‱ A disk’s health status may be impacted by its neighboring disks
‱ Incorporate each disk’s own status together with its neighborhood information
Figure 2: The architecture of the neighborhood-aware component underlying NTAM.
Observation (3)
‱ Extremely imbalanced disk population
‱ Data enhancement via Temporal Progressive Sampling (TPS)
Figure 4: The design of the Temporal Progressive Sampling (TPS) method.
Neighborhood-Temporal Attention Model (NTAM)
‱ Neighborhood-aware component
  To effectively incorporate neighborhood information
‱ Temporal component
  To better capture temporal information
‱ Decision component
  To decide whether the corresponding disk will fail in the near future
Figure 1: Overview of the Neighborhood-Temporal Attention Model (NTAM).
Diagram labels: failure probability; temporal-encoded vector; neighbor-encoded vectors; disk $A_i$ and neighbors $B_i$.
New Research Topic (2)
AI & Software Engineering
RAISE 2013 Keynote & Vision Statement
SIGSOFT Webinar 2019
IEEE Software 2020 Special Issue
Making IntelliTest More Intelligent
Pex journey [ASE 2014]
Pex shipped as IntelliTest in Visual Studio Enterprise Edition since 2015
Self-learning (data driven)
Thummalapenta, Xie, Tillmann, de Halleux, and Schulte. MSeqGen: Object-Oriented Unit-Test Generation via Mining Source Code. ESEC/FSE 2009.
ICSE 2020 Technical Briefing
Programming is not easy, even for an easy task
A question: write a SQL statement for “top 2 selling brands in each year”,
given a table with three columns: “sales”, “brand”, and “year”.
-- `sales_data` is a placeholder table name; the slide does not name the table.
-- A brand is kept when at most two brands (itself included) sold at least as much that year.
SELECT e1.brand AS brand, e1.year AS year
FROM (SELECT SUM(sales) AS salesum, year, brand
      FROM sales_data GROUP BY year, brand) AS e1
LEFT OUTER JOIN (SELECT SUM(sales) AS salesum, year, brand
      FROM sales_data GROUP BY year, brand) AS e2
  ON (e1.year = e2.year AND e1.salesum <= e2.salesum)
GROUP BY e1.brand, e1.year
HAVING COUNT(*) <= 2
ORDER BY year;
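One hedged way to check an answer to this question is to run a candidate query against a small in-memory SQLite table. In the sketch below, the table name `sales_data`, the sample rows, and the function name are hypothetical. The query uses a correlated subquery, a formulation chosen here for illustration rather than taken from the slides: a brand is kept when fewer than two other brands outsold it in the same year.

```python
import sqlite3

# Hypothetical schema following the three columns named in the question.
SCHEMA = "CREATE TABLE sales_data (sales INTEGER, brand TEXT, year INTEGER)"

TOP_TWO_QUERY = """
WITH totals AS (
    SELECT year, brand, SUM(sales) AS total
    FROM sales_data
    GROUP BY year, brand
)
SELECT t1.year, t1.brand, t1.total
FROM totals AS t1
WHERE (
    SELECT COUNT(*)
    FROM totals AS t2
    WHERE t2.year = t1.year AND t2.total > t1.total
) < 2
ORDER BY t1.year, t1.total DESC
"""


def top_two_brands_per_year(rows):
    """Return (year, brand, total) for the top two brands per year (ties kept)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.executemany(
            "INSERT INTO sales_data (sales, brand, year) VALUES (?, ?, ?)", rows
        )
        return conn.execute(TOP_TWO_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Made-up sample rows: (sales, brand, year).
    sample_rows = [
        (10, "A", 2020), (5, "A", 2020), (12, "B", 2020), (3, "C", 2020),
        (7, "A", 2021), (9, "B", 2021), (20, "C", 2021),
    ]
    for row in top_two_brands_per_year(sample_rows):
        print(row)
```

Even for this small task, the query needs an aggregation step plus a per-year ranking step, which illustrates why natural-language-to-SQL assistance is attractive.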
NL2Regex, NL2SQL, ...
Zhong, Guo, Yang, Peng, Xie, Lou, Liu and Zhang. SemRegex: A Semantics-Based Approach for Generating Regular Expressions from Natural Language Specifications. EMNLP 2018.
Guo, Liu, Lou, Li, Liu, Xie, and Liu. Benchmarking Meaning Representations in Neural Semantic Parsing. EMNLP 2020.
Dong, Sun, Liu, Lou, and Zhang. Data-Anonymous Encoding for Text-to-SQL Generation. EMNLP 2019.
Conversational Interface for
NL to Data Analysis/Visualization in Excel
aiXcoder
Within one month of aiXcoder 2.0 going online (the current version is 4.0), downloads exceeded 130K
So far, 2C (to consumer): 300K users
2B (to business): major banks and IT companies
https://aixcoder.com/en/
aiXcoder L and Next
Billion-scale model parameters NL2Code
New Trend: Big Pre-trained Model + Task Adaptation
Can GPT-3 program?
Reflections
Data Driven vs. Problem Driven
AI + Human Intelligence
Making Impact in Practice
‱ Finding the critical scenario
‱ Closing the loop
‱ End-to-end and fast iteration
Perspective Potential Impact
Problem Applicability
Assumption Problem validity
Constraint
Formulation and solution
Requirement
Evaluation Usefulness in practice
Technology readiness framework
Takeaways
‱ Software Analytics: digital transformation of the software industry
‱ Thriving community
‱ New research topics
  ‱ Cloud Intelligence
  ‱ AI and Software Engineering
‱ Reflections
  ‱ Data driven vs. problem driven
  ‱ AI + human intelligence
  ‱ Making impact in practice
‱ WE ARE HIRING!
Acknowledgement
Sincere thank-you to all the academic collaborators, colleagues and
partners in Microsoft, and our talented intern students for the
collaboration and partnership over the years!
Thanks!

MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection and Path Forward

  • 1. Software Analytics: Reflection and Path Forward Dr. Dongmei Zhang Data, Knowledge, and Intelligence (DKI) Group Microsoft Research Asia Prof. Tao Xie School of Computer Science Peking University
  • 2. Outline ‱ Origin and early research ‱ Community building ‱ New research topics ‱ Reflections 05/20/2022 MSR 2022 2
  • 3. Origin and Early Research 05/20/2022 MSR 2022 3
  • 4. 05/20/2022 MSR 2022 4 Software Analytics Group at MSRA, founded in May 2009
  • 5. Software Analytics Research Utilize data-driven approach to help create high quality, user friendly, and efficiently developed and operated software and services 05/20/2022 MSR 2022 5 Information Visualization Analysis Algorithms Large-scale Computing Vertical Horizontal https://www.microsoft.com/en-us/research/group/software-analytics/ http://research.microsoft.com/en-us/news/features/softwareanalytics-052013.aspx
  • 6. Prof. Tao Xie’s Visit at MSRA SA 05/20/2022 MSR 2022 6
  • 7. Defining Software Analytics Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services. 05/20/2022 MSR 2022 7 D. Zhang, Y. Dang, J. Lou, S. Han, H. Zhang, and Tao Xie. Software Analytics as a Learning Case in Practice: Approaches and Experiences. In MALETS 2011.
  • 8. Six dimensions 05/20/2022 MSR 2022 8 Research Topics Technology Pillars Target Audience Connection to Practice Output Input
  • 9. Research topics – the trinity view 05/20/2022 MSR 2022 9 ‱ Covering major areas of software domain ‱ Throughout entire development cycle ‱ Enabling practitioners to obtain insights Software Users Software Development Process Software System
  • 10. Input - data sources 05/20/2022 MSR 2022 10 Runtime traces Program logs System events Perf counters 
 Usage log User surveys Online forum posts Blog & Twitter 
 Source code Bug history Check-in history Test cases 

  • 11. Output – insightful information ‱ Conveys meaningful and useful understanding or knowledge towards completing the target task ‱ Not easily attainable via directly investigating raw data without aid of analytics technologies ‱ Examples ‱ It is easy to count the number of re-opened bugs, but how to find out the primary reasons for these re-opened bugs? ‱ When the availability of an online service drops below a threshold, how to localize the problem? 05/20/2022 MSR 2022 11
  • 12. Output – actionable information ‱ Enables software practitioners to come up with concrete solutions towards completing the target task ‱ Examples ‱ Why bugs were re-opened? ‱ A list of bug groups each with the same reason of re-opening ‱ Why availability of online services dropped? ‱ A list of problematic areas with associated confidence values ‱ Which part of my code should be refactored? ‱ A list of cloned code snippets easily explored from different perspectives 05/20/2022 MSR 2022 12
  • 13. Technology pillars 05/20/2022 MSR 2022 13 Software Users Software Development Process Software System Information Visualization Analysis Algorithms Large-scale Computing Vertical Horizontal Technology pillars
  • 14. Target audience – software practitioners: Developer, Tester, Program Manager, Usability engineer, Designer, Support engineer, Management personnel, Operation engineer
  • 15. Connection to practice • Software Analytics is naturally tied with software development practice • Getting real: Real Data, Real Problems, Real Users, Real Tools
  • 16. Early projects • StackMine – Performance debugging in the large via mining millions of stack traces • Scalable code clone analysis • Data exploration for Customer Experience Improvement Program (CEIP)
  • 17. Performance Debugging in the Large via Mining Millions of Stack Traces. S. Han, Y. Dang, D. Zhang, and T. Xie, ICSE 2012. Comprehending Performance from Real-World Execution Traces: A Device-Driver Case. X. Yu, S. Han, D. Zhang, and T. Xie, ASPLOS 2014.
  • 18. Performance Debugging in the Large via Mining Millions of Stack Traces. S. Han, Y. Dang, D. Zhang, and T. Xie, ICSE 2012. Comprehending Performance from Real-World Execution Traces: A Device-Driver Case. X. Yu, S. Han, D. Zhang, and T. Xie, ASPLOS 2014 as representative paper in 2012, 1 of 20 representative papers (one paper a year)
  • 20. Building Upon Rich Work by the Communities FSE/SDP Workshop on the Future of Software Engineering Research (FoSER 2010) ...
  • 23. CCCF/IEEE Software 2013 Articles
  • 25. Tutorials/Tech Briefings at ICSE/FSE/ASE... • [ASE 11 Tutorial] Zhang & Xie. xSA: eXtreme Software Analytics - Marriage of eXtreme Computing and Software Analytics • [CSEE&T 12 Tutorial] Zhang, Dang, Han & Xie. Teaching and Training for Software Analytics • [ICSE 12 SEIP Mini Tutorial] Zhang & Xie. Software Analytics in Practice: Mini Tutorial • [ICSE 13 Tutorial] Zhang & Xie. Software Analytics: Achievements and Challenges • [FSE 14 Tutorial] Zhang & Xie. Software Analytics: Achievements and Challenges
  • 26. Community Building by Others • IEEE Software 2013 Special Issue • Dagstuhl Seminar 2014 • International Workshop on Software Analytics (SWAN) 2015, 2016, 2017, 2018 ...
  • 28. Beyond SE Communities: ASPLOS 2021 Keynote ASPLOS is the premier forum for interdisciplinary systems research, intersecting computer architecture, hardware and emerging technologies, programming languages and compilers, operating systems, and networking.
  • 29. New Research Topic (1) Cloud Intelligence
  • 30. Cloud Services • Shift to cloud becoming mainstream • Critical role of cloud computing platforms fortified by COVID-19
      Cloud shift proportion by category. Source: Gartner (August 2018)
      Category                        2018   2019   2020   2021   2022
      System Infrastructure            11%    13%    16%    19%    22%
      Infrastructure software          13%    15%    17%    18%    20%
      Application software             34%    36%    38%    39%    40%
      Business process outsourcing     27%    28%    29%    29%    30%
      Total                            19%    21%    24%    26%    28%
      Worldwide public cloud services end-user spending forecast (Millions of USD). Source: Gartner (November 2020). Note: Totals may not add up due to rounding.
      Segment            2019       2020       2021       2022
      BPaaS            45,212     44,741     47,521     50,336
      PaaS             37,512     43,823     55,486     68,964
      SaaS            102,064    101,480    117,773    138,261
      IaaS             44,457     51,421     65,264     82,225
      DaaS                616      1,204      1,945      2,542
      Total Market    242,696    257,549    304,990    362,263
  • 31. Focusing on Cloud Computing • Huge space for improvement for cloud computing platforms • Software Analytics is the digital transformation of software industry • Cloud intelligence • Software Analytics focusing on cloud computing • Re-emergence of AI • Making impact is key
  • 32. Cloud Intelligence Using AI/ML technologies to effectively and efficiently design, build and operate complex cloud services at scale (diagram: Customers, Engineering, Services) • AI for System: Designing and building high-quality services with better reliability, performance, and efficiency • AI for Customers: Improving customer satisfaction with intelligence and better user experiences • AI for DevOps: Achieving high productivity in DevOps via empowering engineers with intelligent tooling
  • 33. Related Efforts • Cloud Intelligence Workshop: @ AAAI 2020, @ ICSE 2021, @ SysML 2022 • Program Chair: Jian Zhang, Microsoft Azure • Steering Committee: Rama Akkiraju, IBM; Ricardo Bianchini, Microsoft Research; Mike Dahlin, Google; Marcus Fontoura, Microsoft Azure; Ahmed E. Hassan, Queen’s University; Michael Lyu, Chinese University of Hong Kong; Erik Meijer, Facebook; Tao Xie, Peking University; Dongmei Zhang, Microsoft Research; Yuanyuan Zhou, UCSD • AIOps by Gartner: “Put simply, AIOps is the application of machine learning (ML) and data science to IT operations problems. AIOps platforms combine big data and ML functionality to enhance and partially replace all primary IT operations functions, including availability and performance monitoring, event correlation and analysis, and IT service management and automation.” • AIOps extended: AIOps: Real-world Challenges and Research Innovations. Yingnong Dang, Qingwei Lin, Peng Huang. Technical Briefing, ICSE 2019
  • 34. Scenarios (columns: Service, Engineering, Customer) • Service health measuring (KPI): Availability / reliability; Performance; Security • Anomalous behavior detection: KPI (Overall, component); Resource (overhead / leak) • Health prediction: Infrastructure (e.g., power, cooling); HW, SW Failure; Workload; System capacity • Auto-recovery/adjustment/healing: Recovery option optimization; Auto healing • Programming: API/code suggestion; Code defect, smell, code review; Test coverage, test selection • CI/CD: Integration testing and strategy; Rollout risk assessment and strategy • Auto-triage & diagnosis: Auto-triage (investigation owner); Diagnosis intelligence • Repair/mitigation decision: Solution recommendation; Decision support • Customer behavior understanding: Usage experience; Customer churn • Proactive customer engagement: Service auto-scale (up/down); Engaging before reporting • Intelligent customer support: Self-serve; Efficient communication; Intelligent suggestion/hints
  • 35. Problems and Challenges
      Detection. Problems: Time-series anomaly detection; Log-based anomaly detection; Multi-dimensional change detection; ... Challenges: Diverse requirements, noisy data, high dimensions, lack of labeled data, ...
      Diagnosis. Problems: Log pattern mining; Correlation analysis; Dependency graph diagnosis; ... Challenges: Diverse causes, complex service dependency, scattered knowledge
      Prediction. Problems: Context/dependency-aware prediction; Automated feature engineering; Extremely-imbalanced data prediction; ... Challenges: Highly imbalanced class, fast system evolution, unpredictable behavior changes, ...
      Optimization. Problems: Multi-constraint/objective optimization; DL-based combinatorial search; Optimization under prediction uncertainty; ... Challenges: Huge problem space, large scale data, complex constraints and tradeoffs, ...
  • 36. Disk Failure Prediction in Cloud Computing Platform Improving Service Availability of Cloud Systems by Predicting Disk Error, Y. Xu, K. Sui, R. Yao, H. Zhang, Q. Lin, Y. Dang, P. Li, K. Jiang, W. Zhang, J. Lou, M. Chintalapati, D. Zhang, USENIX ATC 2018. NTAM: Neighborhood-Temporal Attention Model for Disk Failure Prediction in Cloud Platforms, C. Luo, P. Zhao, B. Qiao, Y. Wu, H. Zhang, W. Wu, W. Lu, Y. Dang, S. Rajmohan, Q. Lin, D. Zhang, the Web Conference 2021.
  • 37. Virtual Machine (VM) Availability and Disk Failures • Hardware issues are one of the top reasons of VM going down and VM reboot • Disk failures contribute most to the hardware issues Source: https://www.backblaze.com/blog/hard-drive-stats-for-2018/ Source: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/08/a7-narayanan.pdf (figure: SSD Annualized Failure Rates)
  • 38. Binary Classification Problem The training set is a collection of $N$ training samples, denoted as $D = \{(X_1, y_1), (X_2, y_2), \ldots, (X_N, y_N)\}$. $X_i$ represents the corresponding disk $d_i$’s own status data and neighborhood information, i.e., $X_i = A_i \cup B_i$, where $A_i \in \mathbb{R}^{h \times n}$ represents $d_i$’s own status data, and $B_i$ is a subset of unions of all $A_i$. $y_i \in \{0, 1\}$ is the label: $y_i = 1$ means that the corresponding disk will fail in near future; $y_i = 0$ means ‘healthy’. Loss function: $L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \cdot \log \hat{y}_i + (1 - y_i) \cdot \log(1 - \hat{y}_i) \right]$
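The loss on this slide is the standard binary cross-entropy averaged over the N training samples. A minimal sketch of how it can be evaluated is given below; the function name and the clipping constant `eps` are assumptions made for illustration (clipping only guards against log(0)) and are not taken from the slides or the NTAM paper.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average binary cross-entropy L over N samples.

    y_true: labels y_i in {0, 1} (1 = disk will fail in near future).
    y_pred: predicted failure probabilities y_hat_i in [0, 1].
    eps:    hypothetical clipping constant to avoid log(0).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep the probability strictly inside (0, 1)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


if __name__ == "__main__":
    # One failing disk and one healthy disk, scored by a fairly confident model.
    print(binary_cross_entropy([1, 0], [0.9, 0.1]))
```

Confident correct predictions drive the loss towards 0, while an uninformative prediction of 0.5 for every disk gives log 2.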
  • 39. Related Work • Traditional machine learning based approaches: Support Vector Machine (SVM) [MSST 2013]; Decision Tree (DT) [DSN 2014]; Random Forest (RF) [DSN 2018]; Gradient Boosting Decision Tree (GBDT) [Ph.D. Dissertation, UCLA 2017]; Regularized Greedy Forest (RGF) [KDD 2016]; Cloud Disk Error Forecasting (CDEF) [USENIX ATC 2018] • Deep Learning based approaches: Recurrent Neural Network (RNN) [IEEE Transactions on Computers 2016]; Long Short-Term Memory (LSTM) [ICDM 2018]; Temporal Convolution Neural Network (TCNN) [DAC 2019]; Convolution Neural Network with Long Short-Term Memory (CNN+LSTM) [FAST 2020]; Neighborhood-Temporal Attention Model (NTAM) [Web Conference 2021]
  • 40. Observations (1) • VMs can be impacted before disks completely fail • Disk errors occur before disk completely fails • Disk errors often reflected by system-level signals such as OS events
      Name: Description
      Timestamp: The timestamp $t$ of the feature vector recorded.
      Disk ID: The unique ID of disk $d_i$.
      Node ID: The unique ID of each computing server (i.e. node) $d_i$ is associated with.
      SMART Attributes: The SMART attributes of $d_i$ recorded at $t$, providing information such as the Current Pending Sector Count, Seek Error Rate, Soft Read Error Rate, etc.
      System-related attributes: OS events such as paging error, file system error, device reset, telemetry loss, etc.
      Driver-related attributes: Gathered from disk driver with information on Flush Count, IO Latency, Controller Reset, etc.
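The feature table above can be read as one record per disk per timestamp. The sketch below shows one possible way to hold such a record and flatten it into a numeric vector; the class name, field names, attribute keys, and the choice to fill missing attributes with 0.0 are all hypothetical and are not described in the slides.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class DiskFeatureRecord:
    """One observation of disk d_i at time t (hypothetical layout)."""
    timestamp: datetime
    disk_id: str
    node_id: str
    smart: Dict[str, float] = field(default_factory=dict)          # SMART attributes
    system_events: Dict[str, float] = field(default_factory=dict)  # OS event counts
    driver_stats: Dict[str, float] = field(default_factory=dict)   # disk driver signals

    def to_vector(self, smart_keys: List[str], system_keys: List[str],
                  driver_keys: List[str]) -> List[float]:
        """Flatten the three attribute groups in a fixed key order.

        Missing attributes default to 0.0, which is an assumption of this sketch.
        """
        return (
            [float(self.smart.get(k, 0.0)) for k in smart_keys]
            + [float(self.system_events.get(k, 0.0)) for k in system_keys]
            + [float(self.driver_stats.get(k, 0.0)) for k in driver_keys]
        )


if __name__ == "__main__":
    record = DiskFeatureRecord(
        timestamp=datetime(2022, 5, 20),
        disk_id="disk-1",
        node_id="node-7",
        smart={"seek_error_rate": 0.2},
        system_events={"paging_error": 3},
    )
    print(record.to_vector(["seek_error_rate"], ["paging_error"], ["io_latency"]))
```

Fixing the key order up front keeps vectors from different disks comparable, which matters when the signals come from separate sources (SMART, OS events, driver).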
  • 41. Observation (2) • A disk’s health status may be impacted by its neighboring disks • Incorporating individual disk’s status and its neighborhood info Figure 2: The architecture of the neighborhood-aware component underlying NTAM.
  • 42. Observation (3) • Extremely imbalanced disk population • Data enhancement via Temporal Progressive Sampling (TPS) Figure 4: The design of the Temporal Progressive Sampling (TPS) method.
  • 43. Neighborhood-Temporal Attention Model (NTAM) • Neighborhood-aware component: To effectively incorporate neighborhood information • Temporal component: To better capture temporal information • Decision component: Decide whether the corresponding disk will fail in near future or not (diagram labels: Failure probability, Temporal-encoded vector, Neighbor-encoded vectors, Disk Ai & Neighbors Bi) Figure 1: Overview of Neighborhood-Temporal Attention Model (NTAM).
  • 44. New Research Topic (2) AI & Software Engineering
  • 45. RAISE 2013 Keynote & Vision Statement
  • 47. IEEE Software 2020 Special Issue
  • 48. Making IntelliTest More Intelligent • Pex journey [ASE 2014] • Pex shipped as IntelliTest in Visual Studio Enterprise Edition since 2015 • Self-learning (data driven) Thummalapenta, Xie, Tillmann, de Halleux, and Schulte. MSeqGen: Object-Oriented Unit-Test Generation via Mining Source Code. ESEC/FSE 2009.
  • 49. ICSE 2020 Technical Briefing
  • 50. Programming is not easy, even for an easy task. A question: writing a SQL statement for “top 2 selling brands in each year” given a table of three columns “sales”, “Brand”, and “year”. SELECT e1.brand AS brand, e1.Year as year FROM table e1=(select sum(sale) as salesum, year, brand, group by year, brand ) LEFT OUTER JOIN table e2=(select sum(sale) as salesum, year, brand, group by year, brand) ON (e1.year = e2.year AND e1.salesum >= e2.salesum) GROUP BY e1.brand, e1.year HAVING COUNT(*) <= 2 ORDER BY year;
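The statement on this slide is not valid SQL as transcribed (for example, the subqueries have no FROM clause), which fits the slide's point that even a simple task is easy to get wrong. As a rough illustration only, the sketch below expresses the same self-join idea in runnable form using Python's built-in `sqlite3` module. The table name `sales_table`, the function name, and the sample rows are hypothetical, and the comparison is written as `e1.salesum <= e2.salesum` so that `COUNT(*)` is the rank of a brand within its year.

```python
import sqlite3


def top2_brands_per_year(rows):
    """Return (year, brand, total_sales) for the two best-selling brands per year.

    rows: iterable of (sales, brand, year) tuples; the table layout is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_table (sales REAL, brand TEXT, year INTEGER)")
    conn.executemany("INSERT INTO sales_table VALUES (?, ?, ?)", rows)
    query = """
        WITH totals AS (
            SELECT year, brand, SUM(sales) AS salesum
            FROM sales_table
            GROUP BY year, brand
        )
        SELECT e1.year, e1.brand, e1.salesum
        FROM totals AS e1
        JOIN totals AS e2
          ON e1.year = e2.year AND e1.salesum <= e2.salesum
        GROUP BY e1.year, e1.brand, e1.salesum
        HAVING COUNT(*) <= 2            -- at most two brands sell at least as much
        ORDER BY e1.year, e1.salesum DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        (100, "A", 2020), (30, "A", 2020), (50, "B", 2020), (70, "C", 2020),
        (10, "A", 2021), (20, "B", 2021), (5, "C", 2021),
    ]
    for row in top2_brands_per_year(sample):
        print(row)
```

With this formulation, brands that tie for second place are all excluded, because each of them counts three brands selling at least as much; a window function such as `RANK()` would be one alternative if ties need different handling.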
  • 51. NL2Regex, NL2SQL, ... Zhong, Guo, Yang, Peng, Xie, Lou, Liu and Zhang. SemRegex: A Semantics-Based Approach for Generating Regular Expressions from Natural Language Specifications. EMNLP 2018. Dong, Sun, Liu, Lou, and Zhang. Data-Anonymous Encoding for Text-to-SQL Generation. EMNLP 2019. Guo, Liu, Lou, Li, Liu, Xie, and Liu. Benchmarking Meaning Representations in Neural Semantic Parsing. EMNLP 2020. Conversational Interface for
  • 52. NL to Data Analysis/Visualization in Excel
  • 53. aiXcoder • One month after aiXcoder 2.0 went online (currently 4.0): more than 130K downloads • So far: 2C: 300K users; 2B: major banks/IT companies https://aixcoder.com/en/
  • 54. aiXcoder L and Next • Billion-scale model parameters • NL2Code
  • 55. New Trend: Big Pre-trained Model + Task Adaptation GPT-3 can program?
  • 57. Data Driven vs. Problem Driven
  • 58. AI + Human Intelligence
  • 59. Making Impact in Practice • Finding the critical scenario • Closing the loop • End-to-end and fast iteration (diagram, Technology readiness framework: Perspective Potential Impact Problem Applicability Assumption Problem validity Constraint Formulation and solution Requirement Evaluation Usefulness in practice)
  • 60. Takeaways • Software Analytics: digital transformation of software industry • Thriving community • New research topics: Cloud Intelligence; AI and Software Engineering • Reflections: Data driven vs. problem driven; AI + human intelligence; Making impact in practice • WE ARE HIRING!
  • 61. Acknowledgement Sincere thank-you to all the academic collaborators, colleagues and partners in Microsoft, and our talented intern students for the collaboration and partnership over the years!