Topic 1: Question Answering
Data and Web Science Group
• Task: Build a (Multi-)Agent QA system.
– Create an agent-based environment for QA challenges
– Make use of RAG, search engines, etc.
– Identify problems with each approach
• Dataset/APIs:
– Use existing benchmarks and relevant metrics
– https://www.aicrowd.com/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024
• Evaluation:
– Use the relevant evaluation metrics from the chosen benchmarks
– Compare different setups
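To make "compare different setups" concrete, here is a minimal sketch of scoring two QA setups with exact match and token-level F1, two metrics commonly used in QA evaluation. This is an assumption for illustration: the chosen benchmark may define its own scoring, which should take precedence. The setup names (`llm_only`, `rag`) and the toy questions are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall against the gold answer."""
    pred, ref = normalize(prediction), normalize(gold)
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions: list[str], golds: list[str]) -> dict[str, float]:
    """Average both metrics over a list of (prediction, gold) pairs."""
    n = len(golds)
    pairs = list(zip(predictions, golds))
    return {
        "exact_match": sum(exact_match(p, g) for p, g in pairs) / n,
        "f1": sum(token_f1(p, g) for p, g in pairs) / n,
    }


# Hypothetical toy data: gold answers and the outputs of two setups.
golds = ["Paris", "1969"]
setups = {
    "llm_only": ["The city of Paris", "1968"],
    "rag": ["Paris", "1969"],
}

for name, preds in setups.items():
    print(name, evaluate(preds, golds))
```

Running every setup through the same `evaluate` function on the same questions keeps the comparison fair. The per-question scores can also be inspected to identify where each approach fails.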
Topic 2: Multi-Agent Gaming
• Task: Build a Multi-Agent gaming application.
– Have teams of agents cooperate to solve a game or play against
each other
– A team last semester played Codenames with good success
– Can be exploratory, or you can use an existing agent benchmark
• Dataset/APIs:
– https://github.com/THUDM/AgentBench
– https://github.com/microsoft/SmartPlay
• Evaluation:
– Existing evaluation when using agent benchmark
– Otherwise depends on the game; could be win rate, Elo ranking,