Talk from IWST 2025: Directing Generative AI for Pharo Documentation
PDF: https://archive.esug.org/ESUG2025/iwst-day2/iwst-208-zara-directing-generative-ai-pharo-docs.pdf
Directing Generative AI
for Pharo Documentation
How can we effectively use AI to help us write documentation?
Pascal Zaragoza
Nicolas Hlad
ESUG 2025
Context: Documentation in Pharo 12
Why documentation matters
§ ~58% of a developer's time is spent on code comprehension [1].
§ Bad documentation = more time lost.
§ Good documentation = less time lost.
Code documentation in Pharo
§ Package-, class- and method-level comments.
§ Class-Responsibility-Collaborator (CRC) definitions.
Problem
Package documentation
§ Only 16.7% of packages have comments.
§ 81.1% of classes have comments.
§ 41.9% of methods have comments.
§ Most package comments are very short (60.3% < 100 characters).
Conclusion: there is a strong need for improved and scalable documentation practices in Pharo.
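Class- and method-level coverage figures like the ones above can be reproduced with Pharo's reflective API. A minimal sketch; package-comment APIs vary across Pharo versions, so only class and method coverage are counted here, and the AST-based method check is approximate:

    "Count comment coverage over all classes and methods in the image.
     The AST check sees any comment in a method, not only a leading one."
    | classes methods |
    classes := Smalltalk globals allClasses.
    methods := classes flatCollect: [ :c | c methods ].
    Transcript
        show: 'Classes with comments: ';
        show: ((classes count: [ :c | c hasComment ]) / classes size * 100.0) printString; cr;
        show: 'Methods with comments: ';
        show: ((methods count: [ :m | m ast comments isNotEmpty ]) / methods size * 100.0) printString; cr.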
Overview of the Comment Generation Approach
Goal: improve Pharo package documentation using LLMs.
Method: Retrieval-Augmented Generation (RAG).
Focus: evaluate how different information sources affect the quality of the generated comments.
3-step process (sketched below):
§ Generate a model representation of the package (using Moose).
§ Extract/retrieve the relevant data from the model.
§ Generate the comment via the LLM.
https://github.com/pzaragoza93/AutoCodeDocumentator
[Figure: pipeline overview. 1) model generation with Moose; 2) extraction strategy; 3) comment generation from the assembled prompts by the LLM (mistral-small-2503).]
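As a shape for the whole pipeline, a hypothetical driver in Pharo; PackageDocumentator and its three selectors are illustrative names for the steps, not the actual AutoCodeDocumentator API:

    "Hypothetical 3-step driver; class and selector names are illustrative only."
    | model context comment |
    model := PackageDocumentator buildModelFor: 'MyPackage'.           "1) Moose model generation"
    context := PackageDocumentator extract: #commentBased from: model. "2) extraction strategy"
    comment := PackageDocumentator generateCommentFrom: context.       "3) LLM comment generation"
    Transcript show: comment; cr.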
Overview of the Comment Generation Approach
[Figure: the assembled prompt is sent to the LLM service (mistral-small-2503), which returns the generated comment.]
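The slides do not spell out the LLM call itself. One plausible shape, using Pharo's Zinc HTTP client against an OpenAI-style chat endpoint; the URL, token, and JSON payload shape are assumptions, not part of the talk:

    "Send a prompt to a chat-completion endpoint; URL and payload shape are assumed."
    | payload response |
    payload := '{"model": "mistral-small-2503",
      "messages": [{"role": "user", "content": "Write a CRC-style package comment."}]}'.
    response := ZnClient new
        url: 'https://llm.example.org/v1/chat/completions'; "hypothetical endpoint"
        headerAt: 'Authorization' put: 'Bearer <token>';
        entity: (ZnEntity with: payload type: ZnMimeType applicationJson);
        post.
    Transcript show: response; cr.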
Strategy 1 – Naive Extraction
Input: full source code of each class (.st files).
Process:
§ Summarize each class's responsibilities, collaborators, and key implementation details.
§ Use the LLM to generate a CRC-based package comment from the class summaries.
Pros:
§ Rich context.
§ Can infer detailed responsibilities and interactions.
Cons:
§ Risk of hallucinations (e.g., non-existent classes).
§ Computationally expensive due to the large context size.
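The talk gathers sources through the Moose model; as a rough plain-Pharo stand-in, the same per-class input can be collected reflectively (the package name is only an example):

    "Collect the full source of every class in a package: the raw input of Strategy 1."
    | package sources |
    package := PackageOrganizer default packageNamed: 'Zinc-HTTP'. "example package"
    sources := package definedClasses collect: [ :cls |
        String streamContents: [ :out |
            out nextPutAll: cls definitionString; cr.
            cls methods do: [ :m | out nextPutAll: m sourceCode; cr; cr ] ] ].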
Strategy 2 – Comment-Based Extraction
Input: existing class comments only.
Process:
§ Aggregate the class comments.
§ Generate the package-level CRC comment with the LLM.
Pros:
§ Leverages human-authored summaries.
§ Lower risk of hallucination.
Cons:
§ Limited by comment coverage (incomplete or missing comments).
§ Misses undocumented class behaviors and dependencies.
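A plain-Pharo sketch of the comment aggregation, again bypassing the Moose model for brevity (package name is illustrative):

    "Aggregate existing class comments: the raw input of Strategy 2."
    | package context |
    package := PackageOrganizer default packageNamed: 'Zinc-HTTP'. "example package"
    context := String streamContents: [ :out |
        (package definedClasses select: [ :cls | cls hasComment ]) do: [ :cls |
            out nextPutAll: cls name; nextPutAll: ': '; nextPutAll: cls comment; cr ] ].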
Strategy 3 – Comment & Outgoing Reference Extraction
Input: class comments + method-level outgoing references.
Process:
§ Extract collaborators through reference analysis.
§ Combine them with the existing class comments for CRC-based comment generation.
Pros:
§ Balances authored insights with structural dependency data.
§ Better captures inter-class collaboration context.
Cons:
§ Depends on the accuracy of reference extraction and structure parsing.
§ Limited by comment coverage (incomplete or missing comments).
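Outgoing references can be approximated in plain Pharo from compiled-method literals; this is a coarse stand-in for the model-based reference analysis (Famix reference queries would be more precise, and binding protocols differ between Pharo versions):

    "Approximate each class's outgoing references via its methods' literals."
    | package collaborators |
    package := PackageOrganizer default packageNamed: 'Zinc-HTTP'. "example package"
    collaborators := Dictionary new.
    package definedClasses do: [ :cls |
        | refs |
        refs := Set new.
        cls methods do: [ :m |
            m literals do: [ :lit |
                (lit isVariableBinding and: [ lit value isClass ])
                    ifTrue: [ refs add: lit value name ] ] ].
        collaborators at: cls name put: refs ].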
Experimentation
Purpose: assess the impact of the different LLM strategies on package comment generation.
Strategies tested:
§ Naive (source-code based)
§ Comment-based
§ Comment + dependency-based
Focus: identify strengths and weaknesses across strategies.
Research questions
§ RQ1: Impact on CRC structure quality?
§ RQ2: Accuracy of responsibility descriptions?
§ RQ3: Accuracy of collaborator descriptions?
§ RQ4: Overall quality vs. original comments?
§ RQ5: Effect of package size on comment quality?
Evaluation Dataset
Dataset: 21 Pharo packages
§ Grouped by size: small, medium, large (7 each).
Filtering:
§ Only packages with existing comments were included.
§ Test and baseline packages were excluded.
Each package was evaluated with all 3 strategies → 63 generated comments.
Large language model: mistral-small-2503
§ Apache 2.0 license
[Figure: evaluation pipeline. Packages 1…N are filtered down to packages 1…21; comment generation (LLM: mistral-small-2503) produces, per package, one comment for each of strategies 1–3; comment evaluation then yields evaluations 1…63.]
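The 63 comments follow directly from the design (21 packages × 3 strategies). A sketch of the generation loop, where packagesAfterFiltering and generate:using: are hypothetical stand-ins for the pipeline entry points:

    "21 packages x 3 strategies = 63 generated comments; selectors are illustrative."
    | packages comments |
    packages := self packagesAfterFiltering. "hypothetical accessor for the 21 packages"
    comments := OrderedCollection new.
    packages do: [ :pkg |
        #(naive commentBased commentAndReferences) do: [ :strategy |
            comments add: (AutoCodeDocumentator generate: pkg using: strategy) ] ].
    self assert: comments size = 63.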
Evaluation Method
Review process:
§ 6 Pharo users in 3 groups.
§ Each user reviewed 7 packages and their 3 generated comments (21 comments per group).
Manual scoring used 12 questions across 4 categories (3 questions per category):
§ CRC Structure (RQ1)
§ Responsibility Accuracy (RQ2)
§ Collaborator Accuracy (RQ3)
§ Comparison to Original (RQ4)
Scale: 7-point Likert (strongly disagree to strongly agree).
Table 1: List of questions, their category, and the question ID used in the questionnaire.
https://github.com/pzaragoza93/label-studio-pharo-evaluation
Results regarding RQ1–4
Comparison between strategies across the 12 statements:
§ No strategy offers significantly better results (RQ1, 2, 3, 4).
§ All strategies generate comments that are preferred over the existing comments.
Table 2: Average Likert score for each question across all 3 strategies.
Results regarding RQ5
Comparison of results between package sizes (small, medium, large):
§ Overall, small packages receive higher scores.
§ Small packages get clearer comments.
§ For small packages, collaborators are well mentioned and no key collaborators are missed.
§ Comments generated for small packages are rated more useful than the existing comments.
Table 3: Average Likert score for each question across the three package sizes.
Conclusion, Limitations, and Future Directions
Limitations
§ Limited number of evaluations per comment.
§ Needs more work on prompt tuning and document structure.
§ Weak solution for identifying collaborators.
Conclusions
§ Generated comments are more complete, clear, and useful than some human-written comments.
→ Perhaps use them when no comment exists?
Future Directions
§ Use heuristics to identify collaborators and GenAI to describe these collaborations.
§ Adapt to existing dynamic comment features (e.g., examples).
§ Automate a pipeline for comment suggestions in existing Pharo projects.