Are FLOSS Developers Committing to CVS/SVN as much as they are Talking in Mailing Lists?
Challenges for Integrating Data from Multiple Repositories




                              Sulayman K. Sowe, I. Samoladas, I. Stamelos, L. Angelis
                                  Dept. of Informatics, Aristotle University, Greece.
                                                sksowe@csd.auth.gr

               3rd International Workshop on Public Data about Software Development (WoPDaSD)
                                        10th September 2008, Milan, Italy.




This research is partially sponsored by the FLOSSMetrics project (Ref. No. FP6-IST5-033547), http://flossmetrics.org/
and the SQO-OSS project (Ref. No. FP6-IST-5-033331), http://www.sqo-oss.eu/

                                          WoPDaSD                                                                       ~.1
In this presentation...
  ➲   Nomadic life of FLOSS developers
         Motivation for this research:
         Research hypothesis

  ➲   Methodology in brief
         Data & Source
         Identification of developers from SVN & Lists

  ➲   Results & Discussion

  ➲   Summary & conclusion
         Ongoing research
Nomadic life of FLOSS developers




➲    Like the Fulani nomads of the West African plains,
     FLOSS developers are not bound to a single territory
     and are free to:
     participate in other projects or communities,
     use and reuse software/bits of code from other projects,
     suggest, argue for or against requirements, specs., etc. in
      projects where they have little or no commit rights,
     use different identities (usernames, email addresses), etc.


Motivation for this research
➲   Why research FLOSS developers, or "nomads"?
          Understand the collaborative nature of FLOSS development in
          terms of developer participation (code commits and email postings)
          in multiple repositories: SVN and mailing lists.
➲   Research hypothesis:
        If mailing lists are the main communication veins in most projects,
        then CVS/SVN is a collection of arteries. Thus,
          FLOSS developers both code and participate in list discussions:
           H0: “FLOSS developers contribute equally to the code
               repository and mailing lists”; alternative
           H1: “FLOSS developers contribute more to the code repository
               than to mailing lists”.




Methodology…Data & Source
➲   Retrieve data for 14 projects from the FLOSSMetrics
    retrieval system
       Mailing list data dumps (.sql file format)
       SVN data dumps (.sql file format)




Initial (Raw) Data
 ➲   How many SVN committers and mailing list posters are there in each project?




[Figure panels: SVN commits; ML (mailing list) posts]
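A minimal Python sketch of the raw counts above, assuming the SQL dumps have already been queried into simple `(project, identity)` pairs, one pair per commit or per message. The pair format and the function name `count_participants` are hypothetical; the slides do not show the dump schema.

```python
from collections import defaultdict

def count_participants(commits, posts):
    """Count distinct SVN committers and mailing-list posters per project.

    Hypothetical input format:
      commits: iterable of (project, committer_id) pairs, one per commit
      posts:   iterable of (project, poster_id) pairs, one per message
    Returns {project: (n_committers, n_posters)}.
    """
    committers = defaultdict(set)
    posters = defaultdict(set)
    for project, who in commits:
        committers[project].add(who)  # a set ignores repeated commits
    for project, who in posts:
        posters[project].add(who)
    projects = sorted(set(committers) | set(posters))
    # A project missing from one side simply gets a count of 0 there.
    return {p: (len(committers[p]), len(posters[p])) for p in projects}

# Illustrative data only.
commits = [("projA", "alice"), ("projA", "alice"), ("projA", "bob"), ("projB", "carol")]
posts = [("projA", "alice@example.org"), ("projB", "dave@example.org"),
         ("projB", "carol@example.org")]
print(count_participants(commits, posts))
```

Note that these raw counts treat an SVN login and an e-mail address as different people; resolving that is the identification problem discussed next.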




Methodology…Identification of developers
 ➲   The main problem in studying developers'
     activities in multiple repositories is
     identification:
     ➲   Is committer A in the SVN repository of
         project X the same person as poster A
         in the mailing lists of project X?
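The slides do not say how committers and posters were matched. As a hedged illustration only, one simple heuristic normalizes each SVN login, each poster's display name and each e-mail local part, then treats a shared key as a candidate match. The function names and the heuristic itself are assumptions, not the authors' procedure; real data would need manual checks for false positives (common names) and false negatives (unrelated aliases).

```python
import re

def normalize(token):
    """Lower-case and keep only letters and digits (hypothetical rule)."""
    return re.sub(r"[^a-z0-9]", "", token.lower())

def candidate_keys(name=None, email=None):
    """Keys derived from a name and/or the local part of an e-mail address."""
    keys = set()
    if name:
        keys.add(normalize(name))
    if email and "@" in email:
        keys.add(normalize(email.split("@", 1)[0]))
    keys.discard("")
    return keys

def match_identities(committers, posters):
    """Return (svn_login, poster_email) pairs sharing at least one key.

    committers: iterable of SVN logins
    posters:    iterable of (display_name, email) tuples
    """
    index = {}
    for login in committers:
        for key in candidate_keys(name=login):
            index.setdefault(key, set()).add(login)
    matches = set()
    for display_name, email in posters:
        for key in candidate_keys(name=display_name, email=email):
            for login in index.get(key, ()):
                matches.add((login, email))
    return matches

# Illustrative data only.
svn_logins = ["jsmith", "mary_jones"]
list_posters = [("John Smith", "jsmith@example.org"),
                ("Mary Jones", "mjones@example.org"),
                ("Bob", "bob@example.org")]
print(match_identities(svn_logins, list_posters))
```

Here "jsmith" matches through the e-mail local part and "mary_jones" through the display name, while "Bob" has no SVN counterpart.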




Results & Discussion…1
➲       The query result for each project gave us developer co-occurrence in both SVN
        and the mailing lists.
➲       N = 486 for all 14 projects.
            Percentage of developers in both repositories:
              In 8 projects = 57.14%
              In 4 projects = 90.11%
              In 2 projects = 80.21%
➲       What is going on in ibatis and turbine?




Results & Discussion...2
➲       Distribution of commits & posts
        Commits dominate posts
        Mean commits per developer > mean posts per developer
        Developers commit more to SVN than they post to mailing lists,
         EXCEPT in ibatis and turbine.
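The mean comparison can be sketched as follows, assuming per-developer `(commits, posts)` pairs for the developers found in both repositories. The data layout and the function name `compare_means` are hypothetical.

```python
from statistics import mean

def compare_means(per_dev):
    """Compare mean commits and mean posts per developer, project by project.

    per_dev: {project: [(commits, posts), ...]}, one pair per developer
             (hypothetical layout).
    Returns {project: (mean_commits, mean_posts, commits_dominate)}.
    """
    out = {}
    for project, rows in per_dev.items():
        mc = mean(c for c, _ in rows)
        mp = mean(p for _, p in rows)
        out[project] = (mc, mp, mc > mp)
    return out

# Illustrative data only: projB mimics a project where posts dominate.
result = compare_means({"projA": [(10, 2), (4, 4)], "projB": [(1, 8), (3, 4)]})
print(result)
```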




Results & Discussion...3
➲ Relationship between commits and posts
➲   The overall correlation between commits and posts is statistically significant
    (marked * for p < 0.05).
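The slide does not name the correlation coefficient. As an assumption, a rank-based coefficient such as Spearman's rho suits skewed count data like commits and posts; below is a stdlib-only sketch that averages ranks for ties. It computes only rho; a p-value would need a further test (for example a permutation test), which is not shown.

```python
from statistics import mean

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks.

    Raises ZeroDivisionError if either sequence is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Illustrative per-developer commits and posts.
print(spearman_rho([12, 3, 7, 40], [5, 1, 2, 9]))
```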




Results & Discussion...4
➲       Developers' contribution in terms of commits and posts
        A Wilcoxon signed-rank test applied to mean values shows an almost 50-50 split
         between projects where commits = posts (green) and commits > posts (yellow).
         Only the turbine project shows otherwise.
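A stdlib-only sketch of a two-sided Wilcoxon signed-rank test on paired samples, using the normal approximation with a tie correction and no continuity correction. How the authors paired values (the slide says "mean values") is not detailed, so the inputs here are simply two equal-length sequences, e.g. commits and posts per developer; for small samples an exact test, such as the one offered by `scipy.stats.wilcoxon`, is preferable.

```python
import math

def _average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (w_plus, z, p), where w_plus is the rank sum of positive x - y.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = _average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    # Tie correction: each group of t tied |d| lowers the variance by (t^3 - t)/48.
    counts = {}
    for d in diffs:
        counts[abs(d)] = counts.get(abs(d), 0) + 1
    tie_adj = sum(t ** 3 - t for t in counts.values()) / 48
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_adj
    z = (w_plus - mu) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, z, p

# Illustrative commits vs posts for six developers.
commits = [10, 12, 9, 15, 20, 7]
posts = [2, 3, 1, 4, 5, 1]
print(wilcoxon_signed_rank(commits, posts))
```

With these made-up numbers every developer commits more than they post, so W+ takes its maximum value of 21 and the test rejects equality at the 5% level.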




Summary & conclusion
➲   FLOSS developers are coding as much as they are
    talking. They contribute equally to code repositories
    and mailing lists: H0 supported.
➲   However, in almost all the projects, developers made
    more commits than posts: H1 supported.

➲   Why are turbine and ibatis outliers?
         Perhaps the highly prolific developer is making more posts than commits,
          in a ratio of 4:1.
         Perhaps something is peculiar about the composition of Apache-related projects.
➲   Ongoing aspects of this research
          Automate the data collection and identification process.
          Analyze a total of 60 or more projects from the FLOSSMetrics (FM) retrieval system.
          Add a quality dimension to the committers variable:
            Categorize commits: modifications, deletions, additions, code-related,
             documentation (reports, readme, etc.)
            Time scale/sliding frames: the evolution of commits and posts over a
             given period.
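For the planned time-scale analysis, one possible sketch counts commits and posts in fixed-length windows that advance by a fixed step. The window and step lengths, and the function name `sliding_counts`, are illustrative assumptions rather than choices made in this research.

```python
from datetime import date, timedelta

def sliding_counts(commit_dates, post_dates, start, end, window_days=30, step_days=7):
    """Count commits and posts in windows [t, t + window) advanced by step.

    Only windows that end on or before `end` are produced (hypothetical rule).
    Returns a list of (window_start, n_commits, n_posts).
    """
    window = timedelta(days=window_days)
    step = timedelta(days=step_days)
    rows = []
    t = start
    while t + window <= end:
        n_commits = sum(1 for d in commit_dates if t <= d < t + window)
        n_posts = sum(1 for d in post_dates if t <= d < t + window)
        rows.append((t, n_commits, n_posts))
        t += step
    return rows

# Illustrative dates only.
commit_dates = [date(2008, 1, 1), date(2008, 1, 5), date(2008, 1, 20)]
post_dates = [date(2008, 1, 3)]
for row in sliding_counts(commit_dates, post_dates, date(2008, 1, 1),
                          date(2008, 1, 25), window_days=10, step_days=7):
    print(row)
```

Overlapping windows (step shorter than window) smooth short bursts of activity; setting the step equal to the window gives plain non-overlapping periods instead.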



Thank you for your attention
        Questions?
         Comments
Suggestions for improvement




