1
Directing Generative AI
for Pharo Documentation
How can we effectively use AI to help us write
documentation?
Pascal Zaragoza
Nicolas Hlad
ESUG 2025
2
Contents
01 Context (slides 3–5)
02 Our Approach (slides 6–13)
03 Experimentation (slides 14–19)
04 Results (slides 20–22)
05 Conclusion (slides 23–25)
3
Context
ESUG 2025
4
4
4
Context: Documentation in Pharo 12
Why documentation matters
§ ~58% of a developer’s time is spent on code comprehension [1].
§ Bad documentation = more time lost
§ Good documentation = less time lost
Code documentation in Pharo
§ Package-, class-, and method-level comments
§ Class-Responsibility-Collaborator (CRC) definitions
5
Problem
Package documentation
§ Only 16.7% of packages have comments.
§ 81.1% of classes have comments.
§ 41.9% of methods have comments.
§ Most package comments are very short (60.3% < 100 characters).
Conclusion: There is a strong need for improved and scalable documentation practices in Pharo.
6
Our Approach Towards Generating Package Comments
7
Overview of the Comment Generation Approach
Goal: Improve Pharo package documentation using LLMs.
Method: Retrieval-Augmented Generation (RAG).
Focus: Evaluate how different information sources affect generated comment quality.
3-Step Process:
§ Generate a model representation of the package (using Moose).
§ Data extraction/retrieval from the model.
§ Comment generation via LLM.
https://github.com/pzaragoza93/AutoCodeDocumentator
[Figure: pipeline — 1) model generation (Moose model), 2) extraction strategy (prompts), 3) comment generation with mistral-small-2503, producing the comment]
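The three steps above can be sketched in Python (an illustrative stand-in only: the actual AutoCodeDocumentator is a Pharo tool built on Moose, and every name below — `build_model`, `extract`, `make_prompt`, the example class — is an invented assumption):

```python
# Sketch of the 3-step RAG pipeline (hypothetical names; the real tool,
# AutoCodeDocumentator, is implemented in Pharo with Moose).

def build_model(sources):
    """Step 1: model generation -- map class name -> (comment, source)."""
    return dict(sources)

def extract(model, strategy):
    """Step 2: extraction strategy -- choose what goes into the prompt."""
    if strategy == "naive":      # full source code of each class
        return "\n\n".join(src for _, src in model.values())
    if strategy == "comments":   # existing class comments only
        return "\n".join(c for c, _ in model.values() if c)
    raise ValueError(strategy)

def make_prompt(context):
    """Step 3: comment generation -- this prompt would be sent to the LLM
    (mistral-small-2503 in this work); here we only build it."""
    return ("Write a package comment in CRC form "
            "(Class, Responsibility, Collaborator) based on:\n" + context)

model = build_model({"OrderedCollection": ("I am a growable array.",
                                           "add: anObject ...")})
prompt = make_prompt(extract(model, "comments"))
print("CRC" in prompt)  # True
```

Swapping the `strategy` argument is all that distinguishes the three extraction strategies evaluated next.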
8
Example Prompts
9
Example Prompts
10
Overview of the Comment Generation Approach
https://github.com/pzaragoza93/AutoCodeDocumentator
[Figure: the prompt is sent to the LLM service (mistral-small-2503), which returns the generated comment]
11
Strategy 1 – Naive Extraction
Input: Full source code of each class (.st files).
Process:
§ Summarize class responsibilities, collaborators, and key implementations.
§ Use the LLM to generate a CRC-based package comment from the class summaries.
Pros:
§ Rich context.
§ Can infer detailed responsibilities and interactions.
Cons:
§ Risk of hallucinations (e.g., non-existent classes).
§ Computationally expensive due to large context size.
https://github.com/pzaragoza93/AutoCodeDocumentator
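The cost concern can be made concrete. The sketch below compares context sizes using a plain word count as a crude token proxy (an assumption — real LLM tokenizers count differently, and the class sources are invented):

```python
# Rough context-size comparison for the naive strategy (word count as a
# crude token proxy -- an assumption; real tokenizers differ; classes invented).

classes = {
    "PackageAnalyzer": {
        "comment": "I analyze a package and list its classes.",
        "source": "analyze\n  ^ self classes collect: [ :c | c name ]\n" * 50,
    },
    "CommentWriter": {
        "comment": "I write CRC comments.",
        "source": "write: aClass\n  ^ aClass comment\n" * 50,
    },
}

def words(text):
    return len(text.split())

naive_ctx = sum(words(c["source"]) for c in classes.values())    # full code
comment_ctx = sum(words(c["comment"]) for c in classes.values()) # comments only
print(naive_ctx > 10 * comment_ctx)  # True: naive context is far larger
```

Even in this toy package, feeding full sources inflates the prompt by more than an order of magnitude over comments alone, which motivates Strategy 2.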
12
Strategy 2 – Comment-Based Extraction
Input: Existing class comments only.
Process:
§ Aggregate class comments.
§ Generate a package-level CRC comment using the LLM.
Pros:
§ Leverages human-authored summaries.
§ Lower risk of hallucination.
Cons:
§ Limited by comment coverage (incomplete/missing comments).
§ Misses undocumented class behaviors or dependencies.
https://github.com/pzaragoza93/AutoCodeDocumentator
13
Strategy 3 – Comment & Outgoing Reference Extraction
Input: Class comments + method-level outgoing references.
Process:
§ Extract collaborators through reference analysis.
§ Combine with existing class comments for CRC-based comment generation.
Pros:
§ Balances authored insights with structural dependency data.
§ Better handles inter-class collaboration context.
Cons:
§ Dependent on reference accuracy and structure parsing.
§ Limited by comment coverage (incomplete/missing comments).
https://github.com/pzaragoza93/AutoCodeDocumentator
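Collaborator extraction via outgoing references can be approximated as below — a toy sketch only, since the actual approach queries outgoing references in a Moose model rather than regex-scanning source; the package and class names are invented:

```python
import re

# Toy sketch of Strategy 3: find collaborators by scanning method bodies for
# references to other known classes, then pair them with existing comments.
# (The real tool queries a Moose model; package and class names invented.)

package = {
    "Invoice": {
        "comment": "I represent a bill sent to a Customer.",
        "methods": ["total\n  ^ lines sum: [ :l | l amount ]",
                    "customer\n  ^ Customer forId: customerId"],
    },
    "Customer": {
        "comment": "I hold customer identity and address.",
        "methods": ["name\n  ^ name"],
    },
}

known = set(package)

def collaborators(cls):
    """Classes referenced in method bodies (Smalltalk globals start uppercase)."""
    refs = set()
    for body in package[cls]["methods"]:
        refs.update(re.findall(r"\b[A-Z]\w*\b", body))
    return sorted((refs & known) - {cls})

# Context handed to the LLM: (class, authored comment, structural collaborators)
context = [(c, package[c]["comment"], collaborators(c)) for c in package]
print(collaborators("Invoice"))  # ['Customer']
```

Restricting matches to classes actually in the model (`refs & known`) is what keeps this strategy's hallucination risk below the naive one's: the LLM is only told about collaborators that exist.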
14
Experimentation
15
Experimentation
Purpose: Assess the impact of different LLM strategies on package comment generation.
Strategies Tested:
§ Naive (source-code based)
§ Comment-based
§ Comment + Dependency-based
Focus: Identify strengths and weaknesses across strategies.
16
Research questions
§ RQ1: Impact on CRC structure quality?
§ RQ2: Accuracy of responsibility descriptions?
§ RQ3: Accuracy of collaborator descriptions?
§ RQ4: Overall quality vs. original comments?
§ RQ5: Effect of package size on comment quality?
17
Evaluation Dataset
Dataset: 21 Pharo packages
§ Grouped by size: Small, Medium, Large (7 each)
Filtering:
§ Only packages with existing comments included.
§ Excluded test and baseline packages.
Each package: Evaluated with all 3 strategies → 63 generated comments.
Large Language Model: mistral-small-2503
§ Apache 2.0 licence
[Figure: N packages → filter → 21 packages → comment generation (LLM: mistral-small-2503) → 3 comments per package (Strategies 1–3) → comment evaluation → 63 evaluations]
18
Evaluation Method
Review Process:
§ 6 Pharo users in 3 groups.
§ Each user reviewed 7 packages and their 3 generated comments (21 comments per group).
Manual Scoring using 12 questions across 4 categories (3 questions per category):
§ CRC Structure (RQ 1)
§ Responsibility Accuracy (RQ 2)
§ Collaborator Accuracy (RQ 3)
§ Comparison to Original (RQ 4)
Scale: 7-point Likert (strongly disagree to strongly agree)
Table 1: List of questions, their category, and the question ID used in the questionnaire.
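Aggregating the 7-point Likert answers into the per-question averages reported in the results tables is straightforward; the sketch below uses made-up scores (the real reviewer data is not reproduced here):

```python
# Sketch of the Likert score aggregation (made-up answers; 7-point scale,
# 1 = strongly disagree ... 7 = strongly agree).

from statistics import mean

# answers[strategy][question_id] = list of reviewer scores (1..7)
answers = {
    "naive":    {"Q1": [5, 6, 4], "Q2": [6, 6, 5]},
    "comments": {"Q1": [5, 5, 5], "Q2": [6, 7, 5]},
}

averages = {s: {q: round(mean(v), 2) for q, v in qs.items()}
            for s, qs in answers.items()}
print(averages["naive"]["Q1"])  # 5.0
```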
19
Evaluation Method
Review Process:
§ 6 Pharo users in 3 groups.
§ Each user reviewed 7 packages and their 3 generated comments (21 comments per group).
Manual Scoring using 12 questions across 4 categories (3 questions per category):
§ CRC Structure (RQ 1)
§ Responsibility Accuracy (RQ 2)
§ Collaborator Accuracy (RQ 3)
§ Comparison to Original (RQ 4)
Scale: 7-point Likert (strongly disagree to strongly agree)
https://github.com/pzaragoza93/label-studio-pharo-evaluation
20
Results
21
Results regarding RQ 1–4
Comparison between strategies across the 12 different statements:
§ No strategy offers significantly better results (RQ 1, 2, 3, 4).
§ All strategies generate comments that are preferred over the existing comments.
Table 2: Average Likert score for each question across all 3 strategies.
22
Results regarding RQ 5
Comparison of results between different package sizes (small, medium, large):
§ Overall, small packages receive higher scores.
§ Small packages get clearer comments.
§ For smaller packages, collaborators are well mentioned and no key collaborators are missed.
§ For smaller packages, the generated comments are rated more useful than the existing comments.
Table 3: Average Likert score for each question across the three package sizes.
23
Conclusion
24
Conclusion, Limitations, and Future Directions
Limitations
§ Limited number of evaluations per comment.
§ Needs more work on prompt tuning and document structure.
§ Weak solution for identifying collaborators.
Conclusions:
§ Generated comments are more complete, clear, and useful than some human-written comments.
→ Perhaps use them when no comment exists?
Future Directions
§ Use heuristics for identifying collaborators and GenAI for describing these collaborations.
§ Adapt to existing dynamic comment features (e.g., examples).
§ Automate a pipeline for comment suggestion in existing Pharo projects.
26
References
27
Some stats