MAC6912 - Ambientes de Desenvolvimento de Software (Software Development Environments)
Professor Marco Aurélio Gerosa 
Ana Paula Oliveira Bertholdo
•4 papers: 
–Analytics for SW Development 
(Buse & Zimmermann, 2010) 
–SW Analytics as a Learning Case in Practice: Approaches and Experiences 
(Zhang et al., 2011) 
–Analyze This! 145 Questions for Data Scientists in SW Engineering 
(Begel & Zimmermann, 2014) 
–What’s Next in SW Analytics 
(Hassan et al., 2013)
•Software engineering is a data rich activity. 
•Artifacts of a project’s development 
–automation, efficiency, and granularity. 
•Projects can be measured throughout their life-cycle.
•SW development continues to be risky and unpredictable. 
•It is not unusual for major development efforts to experience large delays or failures.
•Substantial disconnect between 
–(A) the information needed by project managers to make good decisions and 
–(B) the information currently delivered by existing tools. 
–At its root: 
•Problem: real-world information needs of project managers are not well understood by the research community. 
•Research has ignored the needs of managers and has instead focused on the information needs of developers.
•When data needs are not met… 
–tools are unavailable 
–too difficult to use 
–too difficult to interpret or 
–they simply do not present useful or actionable information 
•Managers must primarily rely on past experience and intuition for critical decision making.
•The data-centric style of decision making is known as analytics. 
•The idea is to leverage large amounts of data into real and actionable insights.
Figure 1: Analytical Questions. The researchers distinguish questions of information, which can be directly measured, from questions of insight, which arise from careful analysis and provide managers with a basis for action.
•Transition isn’t easy! 
•Insight necessarily requires 
–knowledge of the domain coupled with the 
–ability to identify patterns involving multiple indicators.
•Managers may be too busy or may simply lack the quantitative skills or analytic expertise to fully leverage advanced analytical applications. 
•One possibility is that tools should be created with this in mind. 
•Another possibility is the addition of an analytic professional to the software development team.
Software Analytics
•Conclusion 
–All resources, especially talent, are always constrained. 
–This alludes to the importance of careful and deliberate decision making by the managers of software projects. 
–The observation that software projects continue to be risky and unpredictable despite being highly measurable implies that more analytic information should be leveraged toward decision making. 
–In this paper, the researchers 
•described how software analytics can help managers move from low-level measurements to high-level insights about complex projects. 
•advocated more research into the information needs and decision process of managers. 
•discussed how the complexity of software development suggests that dedicated analytic professionals with both quantitative skills and domain knowledge might provide great benefit to future projects.
•Researchers (Microsoft Research Asia) advocate that when applying analytic technologies in practice one should: 
–(1) incorporate a broad spectrum of domain knowledge and expertise, 
•e.g., management, machine learning, large-scale data processing and computing, and information visualization; and 
–(2) investigate how practitioners take actions on the produced information, and provide effective support for such information-based action taking.
–Various analytic technologies 
•(data mining, machine learning, and information visualization). 
–Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information. 
–Insightful information 
•meaningful and useful understanding or knowledge towards performing the target task. 
–Actionable information 
•upon which software practitioners can come up with concrete solutions towards completing the target task.
•Developing a software analytic project typically goes through iterations of the life cycle of four phases: 
1) task definition, 
2) data preparation, 
3) analytic-technology development, and 
4) deployment and feedback gathering.
•Task definition is to define the target task to be assisted by software analytics 
–pull model: StackMine -> performance analysis 
–push model: XIAO -> refactoring and defect detection
•Data preparation is to collect data to be analyzed. 
–2 types of infrastructure support: existing ones in industry and in-house ones. 
–StackMine -> existing Microsoft infrastructure support. 
–XIAO -> in-house code analysis.
•Analytic-technology development is to develop problem formulation, algorithms, and systems to explore, understand, and get insights from the data. 
–The SA team needs to acquire deep knowledge about the data (including its format and semantics) and target tasks. 
–The time this knowledge-acquisition process takes may be non-trivial.
•Deployment and feedback gathering involves two typical scenarios. 
–1: the researchers have obtained some insightful information from the data and they ask domain experts to review and verify. 
–2: the researchers ask domain experts to use the analytic tools to obtain insights by themselves. 
•“the more the customers use the tools, the “smarter” the tools become.”
•Domain knowledge and expertise are strongly needed in successfully developing a software analytic project for technology transfer. 
•Types of domain knowledge: 
–Specific application domain knowledge (customers). 
–Common application domain knowledge (family of SW applications). 
–Data domain knowledge (data preparation).
•Types of expertise: 
–Task expertise 
•work with the customers to learn the workflow. 
–Management expertise 
•good management and communication skills to interact with the customers and manage the team. 
–Machine learning expertise. 
•to develop machine learning algorithms and tools (not just in a black-box way). 
–Large-scale data processing/computing expertise. 
•to design and implement scalable data processing tools and learning tools. 
–Information visualization expertise. 
•to design and implement good user interfaces and visualization for presenting analysis results.
•Conclusion: 
–What do developers think about your result? 
–Is it applicable in their context? 
–How much would it help them in their daily work?
•Results from 2 surveys related to data science applied to SW engineering. 
•1st Survey: 
–questions that SW engineers would like data scientists to investigate about SW, SW processes and practices, and SW engineers. 
•2nd Survey: 
–SW engineers rate 145 questions and identify the most important ones to work on first.
•Businesses of all types commonly use analytics to better reach and understand their customers. 
•Many software engineering researchers have argued for more use of data for decision-making. 
•The demand for data scientists in software projects will grow rapidly. 
•Harvard Business Review named data scientist the most desired job of the 21st century. 
•By 2018, the U.S. may face a shortage of as many as 190,000 people with analytical expertise and of 1.5 million managers and analysts with the skills to make data-driven decisions, according to a report by the McKinsey Global Institute.
•Research goal: 
–Present a ranked list of questions that SW engineers want to have answered by data scientists. 
–The list was deployed among professional SW engineers at Microsoft.
•The research: 
–provides a catalog of 145 questions that software engineers would like to ask data scientists about software. 
–ranks the questions by importance (and opposition) to help researchers, practitioners, and educators focus their efforts on topics of importance to industry. 
–calls to action to other industry companies and to the academic community to replicate its methods and grow the body of knowledge from this start (technical report).
•Initial survey: 
–2 pilot surveys to 25 and 75 Microsoft engineers. 
–The pilot demonstrated the need to seed the survey with data analytics questions. 
•What impact does code quality have on our ability to monetize a software service? 
–1,500 SW engineers in September 2012. 
–36.5% developers, 38.9% testers, 22.7% program managers.
•Rating Survey: 
–Split Questionnaire Survey Design 
•Component blocks 
–607 responses (2500 engineers) 
–16,705 ratings 
–Multiple-choice format 
–29.3% developers, 30.1% testers, and 40.5% program managers.
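A split questionnaire design, as named above, gives each respondent only a subset of the question blocks, so that nobody has to rate all 145 questions. The sketch below illustrates the idea. The block count (16) and the number of blocks per respondent (3) are illustrative assumptions, not figures taken from these slides, and `make_blocks` and `assign_blocks` are hypothetical helper names. The response rate is computed by reading the slide's "2500 engineers" as the number of engineers invited, which is also an assumption.

```python
import random

def make_blocks(question_ids, n_blocks):
    """Partition the question ids into n_blocks component blocks of near-equal size."""
    return [question_ids[i::n_blocks] for i in range(n_blocks)]

def assign_blocks(blocks, blocks_per_respondent, rng):
    """Pick a random subset of blocks for one respondent and return its questions."""
    chosen = rng.sample(range(len(blocks)), blocks_per_respondent)
    return [q for b in chosen for q in blocks[b]]

questions = list(range(1, 146))                # the 145 catalog questions, numbered 1..145
blocks = make_blocks(questions, n_blocks=16)   # assumption: 16 component blocks
rng = random.Random(0)                         # fixed seed so the sketch is repeatable
one_respondent = assign_blocks(blocks, 3, rng) # assumption: 3 blocks per respondent

# Figures from the slide: 607 responses, 2500 engineers, 16,705 ratings.
response_rate = 607 / 2500
ratings_per_response = 16705 / 607

print(f"questions shown to one respondent: {len(one_respondent)}")
print(f"response rate: {response_rate:.1%}")
print(f"average ratings per response: {ratings_per_response:.1f}")
```

Under these assumed parameters each respondent sees 27 or 28 questions, which is in line with the average number of ratings per response implied by the slide's totals.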
•Of the questions with the most opposition, the top five are about the fear that respondents had of being ranked and rated.
Catalog of 145 questions is relevant for: 
•Research: 
–the descriptive questions outline opportunities to collaborate with industry and 
–influence their software development processes, practices, and tools. 
•Practice: 
•the list of questions identifies particular data to collect and analyze to find answers, 
•as well as the need to build collection and analysis tools at industrial scale. 
•Education: 
•the questions provide guidance on what analytical techniques to teach in courses for future data scientists, 
•as well as providing instruction on topics of importance to industry (which students always appreciate).
•Conclusion 
–Researchers hope that this paper will inspire similar research projects. 
–In order to facilitate replication of this work for additional engineering disciplines and companies, they provide the full text of both surveys as well as the 145 questions in a technical report. 
–With the growing demand for data scientists, more research is needed to better understand how people make decisions in software projects and what data and tools they need. 
–There is also a need to increase the data literacy of future software engineers. 
–Lastly, we need to think more about the consumer of analyses and not just the producers of them (data scientists, empirical researchers).
•6 established experts in SW analytics 
•What is the most important aspect of this field?
•1) SW analytics should go beyond developers. 
•2) Analytics should prove its relevance to practitioners. 
•3) Mere numbers aren’t enough. 
•4) 3 questions for analytics. 
•5) Opportunities for natural SW analytics. 
•6) Assistance from information analysts.
•SW analytics should go beyond developers 
–SA focuses on helping individual developers with coding and bug-fixing decisions 
•by mining developer-oriented repositories such as version control systems and bug trackers. 
–SA needs to service a project’s various stakeholders 
•marketing, sales, support teams, not just developers.
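As an illustration of the developer-oriented repository mining described above, the sketch below counts how often each file changes and how often it changes in a bug-fixing commit. The log text is a hand-written sample in a simplified, made-up format (a `commit <id> <message>` line followed by the touched file paths), and treating messages that start with "fix" as bug fixes is an assumed heuristic, not a method from the papers.

```python
from collections import Counter

# Hypothetical, hand-written sample; not the output of any particular tool.
SAMPLE_LOG = """\
commit a1 fix: null check in parser
src/parser.py
tests/test_parser.py

commit b2 feature: add export
src/export.py
src/parser.py

commit c3 fix: crash on empty input
src/parser.py
"""

def count_changes(log_text):
    """Return (changes per file, bug-fix changes per file) for the sample format."""
    total = Counter()
    fixes = Counter()
    is_fix = False
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("commit "):
            # Everything after the commit id is the commit message.
            message = line.split(" ", 2)[2]
            # Assumed heuristic: a message starting with "fix" marks a bug fix.
            is_fix = message.lower().startswith("fix")
        else:
            total[line] += 1
            if is_fix:
                fixes[line] += 1
    return total, fixes

total, fixes = count_changes(SAMPLE_LOG)
for path, n in total.most_common():
    print(f"{path}: {n} changes, {fixes[path]} in bug-fix commits")
```

Counts like these answer developer questions (which file is fragile?); the argument on this slide is that the same data should also be connected to what marketing, sales, and support teams need to know.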
•SW analytics should go beyond developers 
–Artifacts and knowledge across a project’s various facets. 
–Importance of a piece of code and its impact on user satisfaction and revenue. 
•Marketers -> field usage data. 
•Sales staff -> inherent value that customers associate with each feature.
•Proving relevance to practitioners 
–Future -> layers of context are taken into consideration: 
•Domain of SW development 
–nonfunctional requirements, environments, tools, idioms, and so on. 
•Domain of the software itself 
–databases, applications, and so on. 
•Context of the overall software project 
–Requirements, glossary, architecture, community, and so on.
•Proving relevance to practitioners 
–Software analytics has to prove its relevance by showing its cost effectiveness versus the alternative, which is doing nothing. 
•Doing nothing can be amazingly efficient. 
•We need to evaluate these techniques with practitioners in mind. 
•More meaningful and less superficial software analytics.
•Mere numbers aren’t enough 
–Numbers and equations are important to capture relations in the data. 
–For practical use, they must be accompanied by interpretation and visualization. 
–It’s a transfer from the quantitative domain to the qualitative domain. 
–More research is needed on: 
•how to bring the message out of the software analytics to those who make decisions based on them.
•3 questions for analytics: 
•1) How much better is my model performing than a simple strategy, such as guessing? 
•2) How practically significant are the results? 
–effect sizes 
•3) How sensitive are the results to small changes in one or more of the inputs? 
–uncertain data
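The three questions above can each be turned into a small check. The sketch below is one possible reading, under stated assumptions: the data (per-file churn and defect labels), the threshold "model", and the function names are all hypothetical; the simple strategy is a majority-class guess; the effect size is Cohen's d with a pooled standard deviation; and sensitivity is measured as the fraction of predictions that flip when inputs are jittered.

```python
import random
import statistics

def accuracy(predicted, actual):
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def majority_baseline(actual):
    """Question 1: the simple strategy of always guessing the most common label."""
    most_common = max(set(actual), key=actual.count)
    return [most_common] * len(actual)

def cohens_d(group_a, group_b):
    """Question 2: standardized mean difference using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled

def sensitivity(model, inputs, noise, trials, rng):
    """Question 3: fraction of predictions that flip when inputs move by up to +/- noise."""
    base = [model(x) for x in inputs]
    flips = 0
    for _ in range(trials):
        jittered = [model(x + rng.uniform(-noise, noise)) for x in inputs]
        flips += sum(b != j for b, j in zip(base, jittered))
    return flips / (trials * len(inputs))

# Hypothetical data: lines changed per file and whether a defect was later found.
churn = [10, 20, 30, 45, 55, 60, 80, 90, 15, 70]
defective = [False, False, False, True, False, True, True, True, False, True]

def model(lines_changed):
    """Hypothetical model: flag a file as defect-prone above 50 changed lines."""
    return lines_changed > 50

model_acc = accuracy([model(c) for c in churn], defective)
baseline_acc = accuracy(majority_baseline(defective), defective)
d = cohens_d([c for c, bad in zip(churn, defective) if bad],
             [c for c, bad in zip(churn, defective) if not bad])
flip_rate = sensitivity(model, churn, noise=10, trials=200, rng=random.Random(0))

print(f"model accuracy {model_acc:.2f} vs. baseline {baseline_acc:.2f}")
print(f"effect size (Cohen's d): {d:.2f}")
print(f"predictions that flip under +/-10 lines of noise: {flip_rate:.1%}")
```

Reporting all three numbers together is the point: a model that beats guessing may still have a negligible effect size, or may change its answer when the input data is only slightly uncertain.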
•Opportunities for natural SW analytics 
–using models from statistical natural language processing for a new kind of analytics. 
–What most people write and say, most of the time, is highly repeatable and predictable. 
–Devices like Google Translate and Siri. 
–Code is no different. 
•most everyday code is simple and highly predictable. 
–Able to adapt standard n-gram models from statistical NLP to code, and train them on hundreds of millions of LOC. 
–Code is actually between 8 and 16 times more predictable than English.
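To make "predictable" concrete, the sketch below trains a bigram model (the n = 2 case of an n-gram model) on a token stream and measures cross-entropy in bits per token; lower values mean the next token is easier to guess. This is a minimal illustration under assumptions: the whitespace tokenizer, the add-one smoothing, the toy training text, and the function names are choices made here, and the sketch does not reproduce the corpus-scale comparison with English cited above.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count contexts and bigrams; the vocabulary is every token seen in training."""
    contexts = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = set(tokens)
    return contexts, bigrams, vocab

def cross_entropy(model, tokens):
    """Average negative log2 probability per token under add-one smoothing."""
    contexts, bigrams, vocab = model
    v = len(vocab) + 1  # one extra slot reserves probability mass for unseen tokens
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        total -= math.log2(p)
    return total / (len(tokens) - 1)

# Toy training corpus: one loop header and body, repeated (hypothetical data).
line = "for i in range ( n ) : total += i".split()
model = train_bigram(line * 20)

familiar = cross_entropy(model, line)          # token order seen in training
scrambled = cross_entropy(model, line[::-1])   # same tokens in reversed order

print(f"familiar order:  {familiar:.2f} bits/token")
print(f"scrambled order: {scrambled:.2f} bits/token")
```

The familiar ordering scores far fewer bits per token than the scrambled one, which is the same kind of measurement, at toy scale, that underlies the claim that everyday code is repetitive.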
•Wanted: Assistance from Information Analysts 
–Mission Impossible and TV Series 24 
•Field agents -> heroes -> developers 
•We shouldn’t neglect the information analysts (Chloe on 24) 
•Information Analysts -> provide critical information 
–such as the backgrounds, strengths, and weaknesses of the people, places, and eventualities faced by the field agents. 
–Without the information analysts, it’s hard to imagine a successful mission. 
–Information analysts = real heroes.
•Wanted: Assistance from Information Analysts 
–Developers have to figure out all the necessary information about 
•what and where and how to change the software by themselves. 
–We need to provide the services of information analysts to developers 
•and assist them in making the right decisions. 
–SW analytics can continually provide contextual information based on developers’ current tasks. 
–Decent information visualization and computer-human interaction technologies 
•can help present this information efficiently.
•Papers discuss: 
–Context! 
–Relevance for practitioners. 
–New ways of conducting SW analytics. 
–Importance of new studies. 
–Addition of an analytic professional to the software development team.
•Video: 
–https://www.youtube.com/watch?v=nO6X0azR0nw 
IEEE Software editor in chief Forrest Shull speaks with Tim Menzies about the growing importance of software analytics. From IEEE Software's July/August 2013 issue.
[1] Buse & Zimmermann: Analytics for Software Development (FoSER 2010). 
[2] Zhang et al.: Software Analytics as a Learning Case in Practice: Approaches and Experiences (MALETS 2011). 
[3] Begel & Zimmermann: Analyze This! 145 Questions for Data Scientists in Software Engineering (ICSE 2014). 
[4] Hassan, Hindle, Runeson, Shepperd, Devanbu, & Kim: What’s Next in Software Analytics (IEEE Software 2013).
