Googling for Software Development:
What Developers Search For and
What They Find
MSR 2021
Andre Hora
Developers often search for
software resources on the web
They may spend ~20% of
their time on the web
Code examples
Novel technologies
Bug-fixes
Documentation
etc.
Stack Overflow: 50M users/month
W3Schools: 2.5B pageviews/year
Over 85% of their traffic comes from web search engines [alexa.com]
What do developers search for
and what do they find?
Search Queries: understand real-world search queries and developers’ needs
Search Results: detect where search engines find software resources and explore the results
Study Design
Step 1: Selecting the Websites
1. stackoverflow.com
2. w3schools.com
3. geeksforgeeks.org
4. tutorialspoint.com
5. programcreek.com
Step 2: Collecting the Search Queries
1.3M distinct queries
Step 3: What Developers Search For
• RQ1: Query content
• RQ2: Query size & keywords
• RQ3: Query structure
• RQ4: Query similarity
Step 4: What Developers Find (Search API)
• RQ5: Result resources
• RQ6: Result variation
Results
RQ1: Query Content
What is the content of the search queries?
Developers’ queries typically provide references to
technologies, such as programming languages (30%),
software technologies (24.5%), and web frameworks (5%)
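As a rough illustration of how queries might be tagged with the kind of technology they reference, the sketch below matches query words against small term lists. The category names follow the slide, but the term lists and the function names `categories_in` and `category_share` are assumptions made for this example, not the study's taxonomy or tooling.

```python
from collections import Counter

# Hypothetical vocabulary: the category names follow the slide, but the
# term lists are illustrative assumptions, not the study's taxonomy.
TECH_TERMS = {
    "programming language": {"python", "java", "javascript", "c++"},
    "web framework": {"django", "flask", "react", "angular"},
}


def categories_in(query: str) -> set[str]:
    """Return every category that has at least one term in the query."""
    words = set(query.lower().split())
    return {cat for cat, terms in TECH_TERMS.items() if words & terms}


def category_share(queries: list[str]) -> dict[str, float]:
    """Percentage of queries that reference each category."""
    if not queries:
        return {}
    counts = Counter(cat for q in queries for cat in categories_in(q))
    return {cat: 100 * n / len(queries) for cat, n in counts.items()}
```

A query can fall into several categories at once, so the percentages need not sum to 100.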
RQ2: Query Size & Keywords
What is the size of the search queries? Where are the keywords located?
[Figure: query size and keyword position; values shown on the slide: 49.2, 65.2, 48.7, and 3]
Developers’ queries are short: 3 words, on
the median. Keywords are more likely to be
the first than the last word in the query
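The two measurements above can be sketched in a few lines: query size as a word count, and keyword position as the slot (first, middle, or last) where a known keyword appears. The keyword set and the helper names below are hypothetical; the slides do not list the study's keywords.

```python
from statistics import median

# Hypothetical keyword list; the study's keyword set is not on the slides.
KEYWORDS = {"python", "java", "javascript"}


def query_size(query: str) -> int:
    """Number of whitespace-separated words in the query."""
    return len(query.split())


def keyword_positions(query: str) -> set[str]:
    """Return which of 'first', 'middle', 'last' hold a keyword."""
    words = query.lower().split()
    last = len(words) - 1
    positions = set()
    for i, word in enumerate(words):
        if word not in KEYWORDS:
            continue
        if i == 0:
            positions.add("first")
        if i == last:
            positions.add("last")
        if 0 < i < last:
            positions.add("middle")
    return positions


queries = ["python email parser", "email parser python", "sort list in java 8"]
median_size = median(query_size(q) for q in queries)
```

A one-word query that is itself a keyword counts as both first and last under this definition, which is a simplification.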
RQ3: Query Structure
As in general web search, developers tend to exclude function words from their queries, which are mostly composed of content words (80.3%).
RQ4: Query Similarity
Most developers’ queries are similar to one another: while 40% have no similar counterpart, 60% have at least one similar peer, and 8% have 10 or more.
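For the RQ3 figure, one simple way to estimate the share of content words is to count every word that is not in a function-word list. The list below is a deliberately small, illustrative assumption (articles, prepositions, conjunctions, and a couple of auxiliaries); the slides do not say which list or tagger the study used.

```python
# Illustrative, deliberately small list of English function words.
# This is an assumption for the sketch, not the study's word list.
FUNCTION_WORDS = {
    "a", "an", "the", "in", "on", "of", "to", "for",
    "with", "from", "and", "or", "is", "are",
}


def content_word_share(queries: list[str]) -> float:
    """Percentage of all query words that are not function words."""
    words = [w for q in queries for w in q.lower().split()]
    if not words:
        return 0.0
    content = [w for w in words if w not in FUNCTION_WORDS]
    return 100 * len(content) / len(words)
```

A part-of-speech tagger would classify words more reliably than a fixed list, at the cost of an extra dependency.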
RQ5: Result Resources
Where does Google find software resources?
Google finds software resources mostly on Stack Overflow (11%), YouTube (6%), and W3Schools (5%). However, the
results may vary according to the type of query (keyword or general).
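Percentages like the ones above can be obtained by grouping result links by domain. The sketch below does this with the standard library only; the helper names are hypothetical, and the slides do not describe how the study normalized hosts, so stripping a leading `www.` is an assumption.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Host name of a URL, lowercased, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def domain_share(result_urls: list[str]) -> dict[str, float]:
    """Percentage of result links per domain, most frequent first."""
    counts = Counter(domain_of(u) for u in result_urls)
    total = sum(counts.values())
    return {d: 100 * n / total for d, n in counts.most_common()}
```

Running `domain_share` separately on the top 1, top 3, and top 10 links of each query would show how concentrated each domain is at different ranks.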
RQ6: Result Variation
How do Google results vary for similar queries?
Word swap
Query 1: python email parser
Query 2: email parser python
Word removal
Query 1: python email parser
Query 2: email parser
Synonymous word
Query 1: python email parser
Query 2: py email parser
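The three kinds of similar query shown above can be told apart with simple word-level checks. The sketch below is one possible reading of those categories, not the study's procedure: a word swap is treated as any reordering of the same words, a word removal as dropping exactly one word, and the synonym table is hypothetical apart from the py/python pair taken from the slide.

```python
from collections import Counter

# Hypothetical synonym table; only the py/python pair comes from the slide.
SYNONYMS = {frozenset({"py", "python"})}


def classify(q1: str, q2: str) -> str:
    """Label how q2 differs from q1, using word-level comparisons."""
    a, b = q1.lower().split(), q2.lower().split()
    if a == b:
        return "identical"
    # Same words, different order.
    if Counter(a) == Counter(b):
        return "word swap"
    # One query equals the other with exactly one word dropped.
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    if len(longer) == len(shorter) + 1 and any(
        longer[:i] + longer[i + 1:] == shorter for i in range(len(longer))
    ):
        return "word removal"
    # Same length, exactly one position differs, and the pair is a synonym.
    if len(a) == len(b):
        diffs = [(x, y) for x, y in zip(a, b) if x != y]
        if len(diffs) == 1 and frozenset(diffs[0]) in SYNONYMS:
            return "synonymous word"
    return "other"
```

Applied to the slide's examples, `classify` returns "word swap", "word removal", and "synonymous word" respectively.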
The links and order of the top 10 Google search results are very
likely to change due to similar queries, whereas the top 1 is much
less affected. However, overall, the intersection of links due to
similar queries is high, at least 70% in most cases.
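To make the three comparisons in this finding concrete (same top 1, same order, and link intersection), here is a minimal sketch that compares two ranked result lists. The function name and the choice of denominator for the intersection percentage are assumptions; the slides do not give the study's exact formula.

```python
def compare_results(r1: list[str], r2: list[str], k: int = 10) -> dict:
    """Compare the top-k links returned for two similar queries."""
    top1, top2 = r1[:k], r2[:k]
    shared = set(top1) & set(top2)
    # Assumption: normalize by the larger of the two result sets.
    denom = max(len(set(top1)), len(set(top2)))
    return {
        "same_top1": bool(top1) and bool(top2) and top1[0] == top2[0],
        "same_order": top1 == top2,
        "intersection_pct": 100 * len(shared) / denom if denom else 0.0,
    }
```

Two lists can share most links while differing in order, which matches the pattern described above: the ranking changes often, yet the overlap stays high.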
Takeaways
1. Developers’ queries are likely to include key contexts (e.g., technologies)
2. Developers’ queries share characteristics with general ones: both are short and tend to omit function words
3. Google finds software resources mostly on Stack Overflow (11%), with an over-concentration in the top 1 results (28%)
4. YouTube is a prominent source for Google (mostly top 3 results of general queries)
5. Performing minor changes to queries does not broadly affect the top 1 search result or the overall top 10 (however, there are exceptions!)
