2. 2
Outline
Introduction
① What is Mining Software Repositories?
② What are the specificities of Berger-Levrault regarding MSR?
③ How do we develop GitProjectHealth?
④ What can you do with it? (demo)
⑤ What is next?
4. 4
What is Mining Software Repositories?
Mining Software Repositories
Definitions
• Today's collaborative development relies on Git Social Platforms (GSP) [Dabbish et al. 2012]
• They are server implementations of Git with built-in social features
• They contain valuable historical information about a software's development
Fig - Discussing a Pull Request in GitHub
Fig - Commit distribution over time (GitHub)
5. 5
Definitions - 2
• Mining Software Repositories (MSR) provides methods and tools to extract data
from these platforms [Hassan 2008].
• Among other things, it allows us to:
• Study the impact of code smells [Steffen et al. 2010, Palomba et al. 2014]
• Explore developers' code review practices [Bacchelli et al. 2013]
• Predict classes prone to change and defects [Bacchelli et al. 2010]
• Retro-analyse an entire development process [Mockus et al. 2000]
6. 6
Existing tools
• General data mining
• PyDriller: Python tool for mining commits
• Git-Miner: Pharo tool, based on CLI-GitMiner
• Specific data mining
• Javapers: Java library, mostly for analysing Java files in Git repositories (leveraging Spoon)
• ModelMine: retrieves UML models from a project's artefacts
• Data storage
• Pandora: focuses on long-term storage of Git Social Platform data
• Software Heritage: archive of Git repositories from GSPs
• Computing metrics
• LinearB: production and deployment metrics, positions your data relative to other companies
7. 7
Section ②
What are the specificities of Berger-Levrault regarding MSR?
8. 8
What are the specificities of Berger-Levrault regarding MSR?
Industrial context
A quick word on Berger-Levrault
• Berger-Levrault is
• an international group of software vendors
• with diverse sectors of activity (education, health, public administration, etc.)
• The group has acquired various software vendors over the past 30 years,
• from different countries (France, Spain, Canada, Morocco, etc.);
• working with different technologies (Java, C#, TypeScript, Dart, etc.);
• and different Git Social Platforms (GitLab, Bitbucket, Azure DevOps, etc.).
9. 9
A quick word on Berger-Levrault - 2
• We use Atlassian's Jira as our project management system to manage:
• tickets (bugs, features, hotfixes, etc.)
• sprints (Agile development)
• releases (delivery dates, software testing, etc.).
Fig - Kanban view of a sprint in Jira
10. 10
A quick word on Berger-Levrault - 3
• Our Jira and Git Social Platform environments are connected
Fig - Linking Jira tickets to commit and merge activity in GitLab
11. 14
The specificities of Berger-Levrault
Git Social Platforms and project management systems
• How to mine from different Git Social Platforms (GSP)?
• How to implement MSR metrics efficiently?
• How to connect GSP data to Jira efficiently?
We use Model-Driven Engineering (MDE) with Pharo-Moose
12. 15
Section ③
How we develop our solution with MDE
The design of GitProjectHealth
13. 16
GitProjectHealth - MSR with MDE in Pharo
Our MSR solution
GitProjectHealth
GitProjectHealth (GPH) is a framework to extract and analyse data from Git Social
Platforms using Model-Driven Engineering (MDE).
It is a tool for general data mining and metrics computing.
• GitProjectHealth's contributions are:
• A unified model for all Git Social Platforms
• A framework to build custom metrics from the model
• Metamodel connectors to extend any analysis to other platforms (e.g., Jira)
github.com/moosetechnology/GitProjectHealth
14. 17
Main features
GitProjectHealth
Key Features:
• Data extraction and import:
Extracts data from major Git Social Platforms.
Imports a model of specific Git entities.
• Visualization and metrics:
Visualizes data and computes metrics.
• Model connection:
Connects models (e.g., GitLab and Jira).
15. 18
GitProjectHealth
Git Model
Fig - Simplified metamodel of a Git Social Platform in GitProjectHealth
16. 19
1. Naming decisions
Merge Request vs Pull Request
17. 20
2. New relations
Commits link to a User, not author
18. 21
3. Concepts at fine granularity
Modeling down to the changed line of code
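As a rough illustration of the three modelling decisions (one name for merge and pull requests, commits related to a user entity, and changes modelled down to the line), a unified model could be sketched as follows. This is a Python sketch and not GPH's actual Pharo metamodel: every class and attribute name is hypothetical, and reading "User, not author" as "an entity rather than a free-text string" is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class User:
    username: str


@dataclass
class LineChange:
    # Finest granularity: one added or removed line in a file.
    line_number: int
    kind: str  # "added" or "removed"
    content: str


@dataclass
class Diff:
    file_path: str
    changes: List[LineChange] = field(default_factory=list)


@dataclass
class Commit:
    sha: str
    message: str
    creator: User  # relation to a User entity (assumed reading of the slide)
    diffs: List[Diff] = field(default_factory=list)


@dataclass
class MergeRequest:
    # One concept for GitLab "merge requests" and GitHub "pull requests".
    title: str
    author: User
    commits: List[Commit] = field(default_factory=list)


def changed_lines(mr: MergeRequest) -> int:
    """Count every changed line reachable from a merge request."""
    return sum(len(d.changes) for c in mr.commits for d in c.diffs)


# Hypothetical sample data, only to show navigation through the model.
alice = User("alice")
commit = Commit(
    sha="abc123",
    message="Fix null check",
    creator=alice,
    diffs=[Diff("src/App.java", [
        LineChange(10, "removed", "if (x) {"),
        LineChange(10, "added", "if (x != null) {"),
    ])],
)
mr = MergeRequest("Fix null check", alice, [commit])
print(changed_lines(mr))
```

Because every level is an explicit entity, a query can walk from a merge request to its commits, diffs, and individual lines without parsing raw patch text.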
19. 22
GitProjectHealth
API and Importer
Fig - APIs and GSP importers related to our Git model
Our APIs are independent open-source projects in Pharo.
Anyone can access them via github/Evref-BL
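The importer idea can be pictured as small functions that translate platform-specific payloads into the same model entity. The sketch below is in Python and assumes a heavily reduced subset of the fields returned by the GitLab and GitHub REST APIs; the `MergeRequest` class and the function names are hypothetical and do not reflect GPH's Pharo API.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class MergeRequest:
    platform: str
    number: int
    title: str
    author: str


def import_gitlab_merge_request(payload: Dict[str, Any]) -> MergeRequest:
    # GitLab numbers a merge request inside a project with "iid".
    return MergeRequest("gitlab", payload["iid"], payload["title"],
                        payload["author"]["username"])


def import_github_pull_request(payload: Dict[str, Any]) -> MergeRequest:
    # GitHub calls the same concept a pull request, numbered by "number".
    return MergeRequest("github", payload["number"], payload["title"],
                        payload["user"]["login"])


# Hypothetical, trimmed-down payloads for illustration only.
gitlab_payload = {"iid": 7, "title": "Add export", "author": {"username": "alice"}}
github_payload = {"number": 12, "title": "Add export", "user": {"login": "bob"}}

from_gitlab = import_gitlab_merge_request(gitlab_payload)
from_github = import_github_pull_request(github_payload)
print(from_gitlab)
print(from_github)
```

Keeping one importer per platform means that metrics written against the model never need to know which platform the data came from.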
20. 23
GitProjectHealth
Metamodel connection: Jira - Git Model
Fig - A Moose model containing two sub-models (Git model and Jira model), linked by the GPH-Jira model connector
The connector accesses all the entities and relations of its two sub-models (Git and Jira models)
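One way to picture such a connector is a function that sees entities from both sub-models and creates the links between them. The Python sketch below assumes that merge request titles mention a Jira issue key such as `ABC-123`. The slides do not say how GPH matches entities, so the matching rule, class names, and fields are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JiraIssue:
    key: str
    issue_type: str


@dataclass
class MergeRequest:
    title: str
    issue: Optional[JiraIssue] = None  # link created by the connector


# Assumed key format: uppercase project key, a dash, then a number.
JIRA_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def connect(merge_requests: List[MergeRequest], issues: List[JiraIssue]) -> int:
    """Link each merge request to the Jira issue named in its title.

    Returns the number of links created.
    """
    by_key = {issue.key: issue for issue in issues}
    linked = 0
    for mr in merge_requests:
        match = JIRA_KEY.search(mr.title)
        if match and match.group() in by_key:
            mr.issue = by_key[match.group()]
            linked += 1
    return linked


# Hypothetical sample data.
issues = [JiraIssue("WEHR-42", "Bug"), JiraIssue("WEHR-43", "Feature")]
mrs = [MergeRequest("WEHR-42 Fix login"), MergeRequest("Refactor build scripts")]
print(connect(mrs, issues))
```

Once the links exist, an analysis can cross the boundary between the two sub-models, for example grouping merge requests by the type of the issue they address.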
25. 28
Using GPH at Berger-Levrault
Computing MSR Metrics
• We built a metric framework within GPH
• Metrics are either project-centric or user-centric
• For each metric,
• it loads entities from a time period (i.e., between two dates)
• it calculates the metric over a time window (i.e., a day, a week, a month, or a year).
• 47 metrics are implemented so far.
Fig - UML representation of the metrics in GitProjectHealth
Fig - Running all metrics in GitProjectHealth from a playground (simplified)
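The two steps a metric follows (load the entities of a period, then compute the value per time window) can be sketched in Python. This is an illustrative sketch only: GPH's metrics are written in Pharo, and the function name, the `(author, date)` commit tuples, and the choice of a weekly window are assumptions made for the example.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple

Commit = Tuple[str, date]  # (author, commit date), hypothetical shape


def commits_per_week(commits: Iterable[Commit],
                     since: date, until: date) -> Dict[Tuple[int, int], int]:
    """Count commits per ISO (year, week) inside [since, until]."""
    # Step 1: keep only the entities that fall inside the period.
    in_period = [day for _, day in commits if since <= day <= until]
    # Step 2: aggregate over the time window, here one ISO week.
    counts = Counter(tuple(day.isocalendar()[:2]) for day in in_period)
    return dict(counts)


# Hypothetical sample data.
commits = [
    ("alice", date(2024, 9, 2)),
    ("alice", date(2024, 9, 3)),
    ("bob", date(2024, 9, 10)),
    ("bob", date(2024, 10, 1)),  # outside the period, ignored
]
weekly = commits_per_week(commits, date(2024, 9, 1), date(2024, 9, 30))
print(weekly)
```

A user-centric variant would filter on the author before counting, while a project-centric one keeps all commits of the project.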
29. 32
Perspectives
Future Work
• Addressing limitations
• Ease the maintenance of a global metamodel by investigating the generation of
GSP sub-metamodels from their OpenAPI specifications
• Discuss the purpose of the measures and consider which measures correlate with
a healthy project.
• Evaluate GPH against existing tools (PyDriller, Git-Miner, etc.)
• Evolution
• From the GPH model to the source code model: navigating from a repository to its Famix model
• Build usable knowledge maps by detecting parts of the repository that have
become unknown to developers.
31. 34
Citations
Resources
Bibliography
[Steffen et al. 2010] Steffen M. Olbrich, Daniela Cruzes, and Dag I. K. Sjøberg. 2010. Are all code smells harmful? A study of God Classes and Brain Classes in the evolution of three open source systems. In 26th IEEE International Conference on Software Maintenance (ICSM 2010), September 12-18, 2010, Timisoara, Romania. 1-10.
[Palomba et al. 2014] Fabio Palomba, Gabriele Bavota, Massimiliano Di Penta, Rocco Oliveto, and Andrea De Lucia. 2014. Do They Really Smell Bad? A Study on Developers’ Perception of Bad Code Smells. In Proc. of the 30th International Conference on Software Maintenance and Evolution. 101-110.
[Bacchelli et al. 2013] Alberto Bacchelli and Christian Bird. 2013. Expectations, outcomes, and challenges of modern code review. In Proc. of the 35th International Conference on Software Engineering. 712-721.
[Bacchelli et al. 2010] Alberto Bacchelli, Marco D’Ambros, and Michele Lanza. 2010. Are popular classes more defect prone? In International Conference on Fundamental Approaches to Software Engineering. Springer, 59-73.
[Mockus et al. 2000] Audris Mockus, Roy T. Fielding, and James Herbsleb. 2000. A case study of open source software development: the Apache server. In Proc. of the 22nd International Conference on Software Engineering. ACM, 263-272.
[Hassan 2008] A. E. Hassan. 2008. The road ahead for Mining Software Repositories. In 2008 Frontiers of Software Maintenance, Beijing, China. 48-57. doi: 10.1109/FOSM.2008.4659248.
[Dabbish et al. 2012] L. Dabbish, C. Stuart, J. Tsay, and J. Herbsleb. 2012. Social coding in GitHub: transparency and collaboration in an open software repository. In Proc. of the ACM 2012 Conference on Computer Supported Cooperative Work. 1277-1286.
33. 36
Open-Source Analysis (GitHub)
• Analyzed 30 days of activity from Eclipse, Microsoft, and MooseTechnology.
• Analyzed ~4457 entities over 30 days
• Visualizations: daily commit distributions, user activity
Industrial Case at Berger-Levrault (GitLab)
• Connected the Jira model with the Git model to analyze merge request distributions across different issue types.
• 27% of Merge Requests linked to bug-related Jira issues.
Fig - A user’s commit activity by day, during the month of September 2024, on moosetechnology
Fig - Issue occurrences in September 2024 for WeHR