University of Mannheim | IE686 LLMs and Agents | Project Topic Presentation | Version 13.03.2025
Data and Web Science Group
Project Topic Presentation
IE686 Large Language Models and Agents
General Information
• Teams of 5 students work together on an applied project to
solve a problem with LLM agents
– Teams write a 12-page report about their project and present their
results during a presentation at the end of the semester
– 3 ECTS (70% written project report, 30% presentation of results)
• How to find a topic?
– We will propose a set of topic areas today
– You prepare a ranked list of your top three preferences and enter
them in this Google Form
– You can also propose a topic area of your own choosing
– I will match you to the topic areas based on your preferences
– If you already have other students you want to work with, please
submit only a single Google Form as a (sub-)team!
Project Proposal
• We propose a set of topic areas that should be covered
• You define your project in one of these areas
• You are free to choose which APIs/benchmarks/models
you wish to use
• To make sure projects remain feasible in the allotted time,
each team will prepare a project proposal outlining what
they intend to do
• More information about the proposals next week!
Course Schedule
Day Topic
13.02 Lecture: Introduction to Language Models
20.02 Lecture: Instruction Tuning and RLHF
27.02 Lecture: Prompt Engineering and Efficient Adaptation
06.03 Lecture: LLM Agents and Tool Use
13.03 Exercise: Introduction to LangGraph 1
20.03 Project: Introduction to Student Projects
27.03 Exercise: Introduction to LangGraph 2
03.04 Exercise: Introduction to AutoGen
10.04 Project: Project Coaching
30.04 Project: Project Coaching
08.05 Project: Project Coaching
15.05 Project: Project Coaching
22.05 Project: Project Coaching
28.05 Project: Presentation of Project Results
Topic 1: Question Answering
• Task: Build a (Multi-)Agent QA system.
– Create an agent-based environment for QA challenges
– Make use of RAG, search engines, etc.
– Identify problems with each approach
• Dataset/APIs:
– Use existing benchmarks and relevant metrics
– https://www.aicrowd.com/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024
• Evaluation:
– Relevant evaluation metrics from the benchmarks used
– Compare different setups
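To make "compare different setups" concrete, here is a minimal sketch that scores two QA setups by exact-match accuracy. Everything in it is an illustrative assumption: the tiny corpus, the two questions, the token-overlap retriever, and the `closed_book` / `with_retrieval` functions, which only stand in for real LLM calls. A real project would use the metrics defined by the chosen benchmark.

```python
import re

# Made-up mini corpus and QA pairs, purely for illustration.
CORPUS = [
    "Mannheim is a city in Baden-Wuerttemberg.",
    "LangGraph is a library for building agent workflows.",
]
DATASET = [
    ("Which city is in Baden-Wuerttemberg?", "Mannheim"),
    ("Which library is for building agent workflows?", "LangGraph"),
]


def tokens(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, corpus=CORPUS):
    """Return the passage with the largest token overlap with the question."""
    q = tokens(question)
    return max(corpus, key=lambda passage: len(q & tokens(passage)))


def closed_book(question):
    """Setup A: placeholder for an LLM answering without any context."""
    return "unknown"


def with_retrieval(question):
    """Setup B: placeholder for an LLM reading the retrieved passage.
    This stub simply returns the first word of that passage."""
    return retrieve(question).split()[0]


def exact_match(prediction, gold):
    """Compare answers after lowercasing and stripping punctuation."""
    def norm(s):
        return " ".join(re.findall(r"[a-z0-9]+", s.lower()))
    return norm(prediction) == norm(gold)


def compare_setups(setups, dataset=DATASET):
    """Map each setup name to its exact-match accuracy on the dataset."""
    return {
        name: sum(exact_match(fn(q), gold) for q, gold in dataset) / len(dataset)
        for name, fn in setups.items()
    }


if __name__ == "__main__":
    print(compare_setups({"closed_book": closed_book, "with_retrieval": with_retrieval}))
```

Keeping the metric and the dataset fixed while swapping only the answer function is what makes the numbers comparable across setups.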
Topic 2: Multi-Agent Gaming
• Task: Build a Multi-Agent gaming application.
– Have teams of agents cooperate to solve a game or play against
each other
– A team last semester played Codenames with good success
– Can be exploratory, or you can use an existing agent benchmark
• Dataset/APIs:
– https://github.com/THUDM/AgentBench
– https://github.com/microsoft/SmartPlay
• Evaluation:
– Existing evaluation when using an agent benchmark
– Otherwise depends on the game; could be win rate, Elo rating, …
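For games without a ready-made benchmark, win rate and Elo can be computed directly from match results. The sketch below uses the standard Elo update; the K-factor of 32 and the starting rating of 1500 are conventional choices assumed here, not values given in the slides.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a, rating_b, score_a, k=32):
    """Return both new ratings after one game.

    score_a is 1.0 if A won, 0.5 for a draw, 0.0 if A lost.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


def win_rate(scores):
    """Fraction of games won outright (draws do not count as wins)."""
    return sum(1 for s in scores if s == 1.0) / len(scores)


if __name__ == "__main__":
    agent, opponent = 1500.0, 1500.0
    results = [1.0, 1.0, 0.0, 0.5]  # made-up outcomes from the agent's view
    for score in results:
        agent, opponent = update_elo(agent, opponent, score)
    print(f"win rate: {win_rate(results):.2f}, agent Elo: {agent:.1f}")
```

Win rate is easy to read but ignores opponent strength; Elo accounts for it, which matters once several agent variants play each other.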
Topic 3: Online Shopping Assistant
• Task: Build a (Multi-)Agent system that supports a user in
making shopping decisions, e.g. for a new TV.
– Search for and present relevant products based on the user's wishes
– Present results in a structured format or directly perform the transaction
• Dataset/APIs:
– Use existing benchmarks and relevant metrics
– https://github.com/THUDM/AgentBench
– https://webarena.dev/
– https://webshop-pnlp.github.io/
• Evaluation:
– Relevant evaluation metrics from the benchmarks used
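As a rough illustration of "search, filter by the user's wishes, present in a structured format", the sketch below filters a made-up TV catalog and emits JSON. The catalog, the field names, and the ranking rule (rating first, then price) are assumptions for the example; in a real system an agent would obtain products from a shop API or a benchmark environment.

```python
import json

# Made-up catalog; a real agent would fetch this from a shop or benchmark.
CATALOG = [
    {"name": "TV A", "price": 499.0, "screen_inches": 55, "rating": 4.2},
    {"name": "TV B", "price": 899.0, "screen_inches": 65, "rating": 4.7},
    {"name": "TV C", "price": 399.0, "screen_inches": 43, "rating": 4.5},
    {"name": "TV D", "price": 549.0, "screen_inches": 55, "rating": 4.6},
]


def recommend(catalog, max_price, min_screen_inches, top_k=2):
    """Keep products that satisfy the hard constraints, best rated first."""
    matches = [
        p for p in catalog
        if p["price"] <= max_price and p["screen_inches"] >= min_screen_inches
    ]
    # Higher rating first; cheaper product wins ties.
    matches.sort(key=lambda p: (-p["rating"], p["price"]))
    return matches[:top_k]


def present(products):
    """Serialize the recommendations as structured JSON for the user or UI."""
    return json.dumps({"recommendations": products}, indent=2)


if __name__ == "__main__":
    print(present(recommend(CATALOG, max_price=600, min_screen_inches=50)))
```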
Topic 4: Job Hunting Agents
• Task: Build a (Multi-)Agent application to search for job
postings relevant to a user.
– Start with a user query
– Search for relevant postings
– Extract relevant facts about the jobs
– Present to the user in a structured, easy-to-browse format
• Dataset/APIs:
– Select 2 job providers/APIs to work with
– https://apislist.com/category/28/jobs
• Evaluation:
– Exploratory topic; likely no relevant benchmarks exist
– Human-based evaluation by the group members
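The four task steps can be sketched as a small pipeline over two providers. Both provider functions below are hypothetical stubs with made-up data and deliberately different field names; real providers would be called over HTTP, and the fact extraction would typically be done by an LLM rather than by fixed field mappings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobPosting:
    title: str
    company: str
    location: str
    source: str


def search_provider_a(query):
    """Hypothetical provider A (stub data instead of a real API call)."""
    data = [
        {"title": "Data Scientist", "company": "Acme", "city": "Mannheim"},
        {"title": "Backend Engineer", "company": "Initech", "city": "Berlin"},
    ]
    return [d for d in data if query.lower() in d["title"].lower()]


def search_provider_b(query):
    """Hypothetical provider B with a different response schema."""
    data = [
        {"position": "Senior Data Scientist", "employer": "Globex", "place": "Remote"},
        {"position": "Data Scientist", "employer": "Acme", "place": "Mannheim"},
    ]
    return [d for d in data if query.lower() in d["position"].lower()]


def find_jobs(query):
    """Search both providers, map to one schema, and drop duplicates."""
    postings = [
        JobPosting(r["title"], r["company"], r["city"], "A")
        for r in search_provider_a(query)
    ]
    postings += [
        JobPosting(r["position"], r["employer"], r["place"], "B")
        for r in search_provider_b(query)
    ]
    seen, unique = set(), []
    for p in postings:
        key = (p.title.lower(), p.company.lower(), p.location.lower())
        if key not in seen:  # same job listed by both providers
            seen.add(key)
            unique.append(p)
    return unique


def as_table(postings):
    """Render postings as a fixed-width text table."""
    header = f"{'Title':<24}{'Company':<10}{'Location':<10}Source"
    rows = [f"{p.title:<24}{p.company:<10}{p.location:<10}{p.source}" for p in postings]
    return "\n".join([header] + rows)


if __name__ == "__main__":
    print(as_table(find_jobs("data scientist")))
```

Normalizing to a single `JobPosting` schema early keeps deduplication and presentation independent of any one provider's response format.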
Topic 5: Browser-Interaction
• Task: Build a (Multi-)Agent system that can perform various
tasks on websites.
– Site search/link following/extraction of relevant info
– Identify problems with each approach
• Dataset/APIs:
– Use existing benchmarks and relevant metrics
– https://github.com/THUDM/AgentBench
– https://webarena.dev/
• Evaluation:
– Relevant evaluation metrics from the benchmarks used
• Additional references:
– https://arxiv.org/abs/2401.01614
– https://arxiv.org/abs/2401.13919v4
– https://openreview.net/pdf?id=9JQtrumvg8
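Link following and extraction, two of the sub-tasks named above, can be prototyped with the standard library alone. The sketch below parses anchors from an HTML string and picks the link an agent might follow next; the keyword-matching rule in `pick_link` is a simplifying assumption standing in for an LLM's decision, and the example HTML is made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect (absolute_url, anchor_text) pairs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the anchor currently being read
        self._text = []     # text fragments inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def pick_link(links, keyword):
    """Return the first URL whose anchor text mentions the keyword, else None."""
    for url, text in links:
        if keyword.lower() in text.lower():
            return url
    return None


if __name__ == "__main__":
    page = '<p><a href="/teaching">Teaching</a> <a href="https://example.org/x">Other</a></p>'
    links = extract_links(page, "https://example.com/")
    print(links)
    print(pick_link(links, "teach"))
```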
Topic 6: Text-to-SQL
• Task: Build a (Multi-)Agent system that converts natural
language queries to SQL.
– Convert query
– Query database
– Refine based on result/errors
• Dataset/APIs:
– https://paperswithcode.com/task/text-to-sql
– https://github.com/THUDM/AgentBench
• Evaluation:
– Relevant evaluation metrics from the benchmarks used
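The convert / query / refine loop can be sketched with an in-memory SQLite database. `fake_llm` below is a hypothetical stand-in for a model call: it returns a query with a deliberate typo first and a corrected one once it receives an error message. The schema and rows are made up for the example.

```python
import sqlite3

SCHEMA = "CREATE TABLE students (name TEXT, ects INTEGER)"


def demo_db():
    """Create a small in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO students VALUES (?, ?)", [("Ada", 120), ("Linus", 60)])
    return conn


def fake_llm(question, schema, error=None):
    """Hypothetical LLM stub: the first attempt contains a typo on purpose."""
    if error is None:
        return "SELECT nme FROM students WHERE ects > 100"
    return "SELECT name FROM students WHERE ects > 100"


def answer_with_refinement(conn, question, schema, generate=fake_llm, max_attempts=3):
    """Generate SQL, execute it, and feed database errors back to the generator."""
    error = None
    for attempt in range(1, max_attempts + 1):
        sql = generate(question, schema, error)
        try:
            rows = conn.execute(sql).fetchall()
            return sql, rows, attempt
        except sqlite3.Error as exc:
            error = str(exc)  # e.g. "no such column: nme"
    raise RuntimeError(f"no valid query after {max_attempts} attempts: {error}")


if __name__ == "__main__":
    sql, rows, attempts = answer_with_refinement(
        demo_db(), "Who has more than 100 ECTS?", SCHEMA
    )
    print(attempts, sql, rows)
```

This loop only catches queries that fail to execute. A query that runs but returns the wrong rows needs a separate check, for example comparing results against the benchmark's gold answers.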
Topic 7: Software Developer
• Task: Build a (Multi-)Agent system for software engineering.
– Generate code solutions based on descriptions or tests
– Generate and run tests on solutions
– Identify potential discrepancies in behavior of code solutions
– Recommend final solution
• Dataset/APIs/Language:
– https://evalplus.github.io/
– Python or Java
• Evaluation:
– Relevant evaluation metrics from the benchmarks used
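The generate / test / compare / recommend steps can be sketched as follows. The two candidate solutions and the test cases are hard-coded placeholders for what coder and tester agents would produce, and `exec` is used only for brevity: model-generated code should be run in a sandbox.

```python
# Placeholder outputs of a "coder" agent: two candidate solutions for add(x, y).
CANDIDATES = {
    "a": "def add(x, y):\n    return x + y\n",
    "b": "def add(x, y):\n    return x - y\n",
}
# Placeholder outputs of a "tester" agent: (arguments, expected result).
TESTS = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]


def load_function(source, name):
    """Compile candidate source and return the named function.
    exec is unsafe for untrusted code; use a sandbox in a real system."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


def safe_call(fn, args):
    """Call fn and return a comparable string, even if it raises."""
    try:
        return repr(fn(*args))
    except Exception as exc:
        return f"raised {type(exc).__name__}"


def count_passed(fn, tests):
    return sum(safe_call(fn, args) == repr(expected) for args, expected in tests)


def discrepancies(fns, inputs):
    """Inputs on which the candidates disagree with each other."""
    differing = []
    for args in inputs:
        outputs = {label: safe_call(fn, args) for label, fn in fns.items()}
        if len(set(outputs.values())) > 1:
            differing.append((args, outputs))
    return differing


def recommend(candidates, tests, name="add"):
    """Return the best candidate label, all scores, and disagreeing inputs."""
    fns = {label: load_function(src, name) for label, src in candidates.items()}
    scores = {label: count_passed(fn, tests) for label, fn in fns.items()}
    best = max(scores, key=scores.get)
    return best, scores, discrepancies(fns, [args for args, _ in tests])


if __name__ == "__main__":
    print(recommend(CANDIDATES, TESTS))
```

Inputs where candidates disagree are useful even without expected outputs, since at least one candidate must be wrong there.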
Time to Choose!
• Decide on a ranked list of three topic areas, by yourself or
with your team if you already have someone to work with
• Fill out this Google Form with your preferences
– by Tuesday, 18th March (end of day)
– If you already have a team, please fill out only a single form with your
team's preferences
• We will make the assignments and present the teams in next week's
session
• More information about the project work, coaching sessions,
project proposal, and final report in next week's session!