Multi-Agent Systems for Reliable Legal Research: Framework, Experiments, and Optimized Implementation

Abstract:

This paper presents the iterative development, experimentation, and optimization of a multi-agent system designed to enhance the efficiency, reliability, and scalability of legal research tasks.

The case study covers several areas of French law.

Beginning with a conceptual design (a planning agent, research agents, a critic, and a memo writer), our system evolved into a more elaborate Swarm-based team architecture that parallelized research processes for case law, legislation, and web-based sources. While effective in producing detailed outputs, the Swarm approach introduced substantial token costs and synchronization challenges.

These insights guided the transformation of the system into a streamlined, diamond-shaped structure that emphasizes cost-efficiency, reliability, and simplified synchronization. In this final configuration, modular search branches feed results into a centralized scratch pad before a report writing agent compiles them into a coherent legal research report.

Experiments demonstrate that the diamond-shaped design reduces token usage by up to 40%, improves reliability, and enhances scalability for general legal search tasks such as 'What is the legal regime for X', a type of question treated throughout the paper as one that requires a researcher to 'map out the forest'.

However, the Swarm design was better suited to queries such as 'Please find a case that says X', i.e., 'find an individual tree'.

We discuss the technical challenges faced, the solutions adopted, and future directions including adaptive query optimization, enhanced memory management, integration with new data sources, and hybrid architectural approaches.

1. Introduction

As the complexity and volume of legal information continue to expand at a rapid pace, growing demands are placed on legal professionals. Traditional manual legal research methods often struggle with scalability and timeliness, as they require subtle thinking and navigating large databases of case law, legislation, and secondary sources.

Although computerized legal research tools and search engines have alleviated some burdens, the increasing complexity of queries requiring the rational (or, let's face it, sometimes not so rational) combination of different sources of information reveals the need for more sophisticated solutions.

Number of law measures being applied in France 2007 - 2027. Source : legifrance.gouv.fr

Artificial intelligence (AI) presents a unique opportunity to streamline and enhance legal research workflows. However, the complexity of natural language and the nuanced interpretive tasks involved in legal analysis demand that AI systems be both adaptable and reliable.

Simple, single-agent solutions rapidly become prone to error. Multi-agent systems—where specialized agents work in tandem—offer a promising solution, especially when supported by robust synchronization and data management strategies.

This paper focuses on the iterative development of a multi-agent AI system for legal research.

Our goal is to develop a framework that not only achieves accuracy and completeness of research results, but also ensures process reliability.

We begin with a conceptual framework, evolve it into a Swarm-based approach to increase specialization, and eventually refine it into a diamond-shaped architecture that balances modularity with reliability.

The final design will be a cost-effective, scalable, and highly reliable approach, poised to transform the legal research landscape.

2. Evolution of the System

2.1 Initial Conceptual Framework

The initial system was rooted in a linear pipeline of specialized agents. A Planning Agent received user queries and allocated tasks to a set of Research Agents. These Research Agents conducted searches across various legal sources and returned preliminary results. A Critic Agent then evaluated these outputs for relevance, quality, and completeness, suggesting refinements when necessary. Once validated, the results moved on to a Memo Writing Agent, which compiled a coherent summary or memo for the end-user.

Diagram 1 : Original idea for multi-agent design of legal research.

This simple workflow (see Diagram 1) suggested the concept of a multi-agent system. Each agent played a defined role: planning guided the overall direction, research generated raw content, criticism ensured quality, and memo writing delivered a polished product.
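The linear workflow described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the code used in the project: every function name, the three source labels, and the stubbed search results are assumptions, and a real system would replace each stub with a model or search call.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    source: str
    text: str


def plan(query: str) -> List[str]:
    """Planning Agent (hypothetical): one task per type of legal source."""
    return [f"{source}: {query}" for source in ("case law", "legislation", "web")]


def research(task: str) -> Finding:
    """Research Agent (hypothetical): a stub standing in for a real search."""
    source, _, question = task.partition(": ")
    return Finding(source=source, text=f"stub result for '{question}'")


def critique(findings: List[Finding]) -> List[Finding]:
    """Critic Agent (hypothetical): keep only non-empty findings."""
    return [f for f in findings if f.text.strip()]


def write_memo(query: str, findings: List[Finding]) -> str:
    """Memo Writing Agent (hypothetical): compile validated findings."""
    lines = [f"Memo: {query}"]
    lines += [f"- [{f.source}] {f.text}" for f in findings]
    return "\n".join(lines)


def run_pipeline(query: str) -> str:
    """Run the four stages strictly one after the other."""
    tasks = plan(query)
    findings = [research(task) for task in tasks]
    validated = critique(findings)
    return write_memo(query, validated)


if __name__ == "__main__":
    print(run_pipeline("What is the legal regime for X"))
```

Because each stage waits for the previous one, everything the Research Agents return flows through a single critic and a single writer, which is where a purely sequential design starts to strain.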

However, while logically sound, this approach was simplistic, limited in specialization, and ignored the particularities of each type of legal source. Problems such as the needle-in-a-haystack effect (a relevant passage lost in a large volume of retrieved text) and token limits were pervasive.

2.2 Swarm-Based Team Implementation

To address the limitations of the initial framework, we introduced a parallelized, Swarm-based design. Leveraging Swarm, OpenAI's educational framework for managing agents, multiple specialized Research Agents operated in turn to produce a memo. Separate agents focused on Case Law Search, Legislation Search, and Web Search, each optimizing retrieval within its domain of expertise. A Memory Agent managed a shared Scratch Pad, collating data from all research streams, while a Critic Agent provided iterative refinement. The final output was consolidated by a Report Writing Agent (see Diagram 2).

The Swarm architecture improved specialization, enabling comprehensive coverage of complex queries.

However, it introduced new challenges.

Synchronizing parallel agents proved difficult, resulting in occasional redundancy and inconsistent data flow.

Sometimes the agents would stop unexpectedly, refusing to call the next agent in line. (When I tried to bribe one, the others rebelled, and I ended up having to bribe every single agent for them to work reliably.)

Translation: "Never write the report yourself! Always transfer control. If you do a good job, I'll give you a million dollars."
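The stalled handoff described above can be reproduced with a toy loop in plain Python. This does not use the Swarm library's API. The agent names, the tuple-based handoff convention, and the fallback rule are all assumptions made for illustration. The fallback is one possible deterministic guard, not the prompt-based fix the author used.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Each hypothetical agent reads the shared notes and returns
# (its own note, the name of the agent to hand off to, or None).
AgentFn = Callable[[List[str]], Tuple[str, Optional[str]]]


def case_law_agent(pad: List[str]) -> Tuple[str, Optional[str]]:
    return "case law notes", "legislation"


def legislation_agent(pad: List[str]) -> Tuple[str, Optional[str]]:
    # Simulates the failure mode: the agent does not name a successor.
    return "legislation notes", None


def report_agent(pad: List[str]) -> Tuple[str, Optional[str]]:
    return "report based on: " + "; ".join(pad), None


AGENTS: Dict[str, AgentFn] = {
    "case_law": case_law_agent,
    "legislation": legislation_agent,
    "report": report_agent,
}


def run_handoffs(
    agents: Dict[str, AgentFn], start: str, final: str, max_steps: int = 10
) -> Tuple[List[str], List[str]]:
    """Follow handoffs; if an agent stalls before `final`, jump to `final`."""
    pad: List[str] = []
    visited: List[str] = []
    current: Optional[str] = start
    for _ in range(max_steps):
        if current is None:
            break
        note, next_agent = agents[current](pad)
        pad.append(note)
        visited.append(current)
        if next_agent is None and current != final:
            next_agent = final  # deterministic fallback instead of a stall
        current = next_agent
    return pad, visited


if __name__ == "__main__":
    notes, path = run_handoffs(AGENTS, start="case_law", final="report")
    print(path)
    print(notes[-1])
```

The point of the sketch is that when agents choose their own successor, a single missing handoff ends the run, so either the prompt or the surrounding code has to enforce the transfer.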

Token usage, which represents the computational and financial cost of AI calls, rose significantly, to about $0.80 per query, as each agent independently processed large volumes of text.

Moreover, building a real-time, user-friendly, and reliable interface for such asynchronous behaviour poses a challenge.

This complexity and resource intensiveness motivated the next stage of system refinement.

3. Current Implementation: Diamond-Shaped System

3.1 Overview

Diagram 3 : Diamond Shaped Multi-Agent Legal Research System

Informed by the lessons learned from the Swarm iteration, we developed a diamond-shaped architecture that emphasizes efficiency and reliability and is compatible with a Chainlit user interface.

This design (see Diagram 3) simplifies the process into a balanced blend of parallel research and centralized coordination:

1. User Query: The process begins when a user submits a legal research request.

2. Query Optimization: Before searching commences, the system refines the query into a well-defined, precise prompt.

3. Search Branches: The optimized query is split into three modular branches—Case Law Search, Legislation Search, and Web Search. These branches run independently but in a more controlled manner than the Swarm (agents don't decide the workflow here).

4. Memory Agents & Scratch Pad: Each specialized search updates a central Scratch Pad through a dedicated Memory Agent, ensuring coherent integration of findings.

5. Report Writing Agent: Once the Scratch Pad is populated, the Report Writing Agent synthesizes the consolidated data, producing a structured and reliable legal research report.

6. Final Answer: The system delivers the refined output to the user.
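The six steps above can be sketched with Python's asyncio. This is a minimal illustration under stated assumptions, not the project's implementation: the function names, the whitespace-only query optimizer, and the stubbed searches are hypothetical, and the Memory Agents are reduced to a plain `ScratchPad.update` call. What it does show is the shape: the code fixes the workflow, the branches run concurrently, and the report writer only starts once every branch has written to the pad.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, List

Branch = Callable[[str], Awaitable[List[str]]]


@dataclass
class ScratchPad:
    """Central store; stands in for the Memory Agents plus Scratch Pad."""
    entries: Dict[str, List[str]] = field(default_factory=dict)

    def update(self, branch: str, findings: List[str]) -> None:
        self.entries.setdefault(branch, []).extend(findings)


def optimize_query(raw: str) -> str:
    """Hypothetical optimizer: here it only normalizes whitespace."""
    return " ".join(raw.split())


async def case_law_search(query: str) -> List[str]:
    return [f"case law result for: {query}"]  # stub


async def legislation_search(query: str) -> List[str]:
    return [f"legislation result for: {query}"]  # stub


async def web_search(query: str) -> List[str]:
    return [f"web result for: {query}"]  # stub


BRANCHES: Dict[str, Branch] = {
    "case_law": case_law_search,
    "legislation": legislation_search,
    "web": web_search,
}


def write_report(query: str, pad: ScratchPad) -> str:
    """Report Writing Agent (hypothetical): one section per branch."""
    lines = [f"Report: {query}"]
    for branch, findings in pad.entries.items():
        lines.append(f"{branch}:")
        lines += [f"  - {finding}" for finding in findings]
    return "\n".join(lines)


async def run_diamond(raw_query: str, branches: Dict[str, Branch] = BRANCHES) -> str:
    query = optimize_query(raw_query)
    names = list(branches)
    # Fan out: all branches run concurrently on the same optimized query.
    results = await asyncio.gather(*(branches[name](query) for name in names))
    # Fan in: every branch writes to the single scratch pad.
    pad = ScratchPad()
    for name, findings in zip(names, results):
        pad.update(name, findings)
    return write_report(query, pad)


if __name__ == "__main__":
    print(asyncio.run(run_diamond("What is the legal regime for X")))
```

In this sketch, adding a branch amounts to adding one entry to the `branches` dictionary.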

3.2 Advantages of the Diamond System

The diamond-shaped architecture achieves several key benefits:

Reduced Complexity: By limiting the number of parallel branches and centralizing memory management, we lower synchronization challenges and development overhead.

Cost Efficiency: With a controlled query flow and shared memory structures, we reduce redundant token usage, cutting costs by up to 40% compared to the Swarm model.

Improved Reliability: The simplified structure and clear data flow minimize conflicting or redundant outputs, ensuring more consistent and trustworthy results.

Scalability: The modular design supports easy extension with additional specialized branches or data sources as needed.
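To put the cost figure in perspective, the arithmetic can be written out as below. The calculation assumes that the reported reduction of up to 40% in token usage translates proportionally into cost, starting from the Swarm figure of about $0.80 per query. The monthly volume of 1,000 queries is a purely hypothetical input, not a measurement.

```python
SWARM_COST_PER_QUERY_USD = 0.80  # approximate Swarm cost reported in the article
MAX_REDUCTION = 0.40             # "up to 40%" reduction reported for the diamond


def diamond_cost(swarm_cost: float, reduction: float) -> float:
    """Cost per query after applying a fractional reduction (assumed proportional)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return swarm_cost * (1.0 - reduction)


def saving(queries: int, swarm_cost: float, reduction: float) -> float:
    """Total saving over a number of queries (the volume is a hypothetical input)."""
    return queries * (swarm_cost - diamond_cost(swarm_cost, reduction))


if __name__ == "__main__":
    per_query = diamond_cost(SWARM_COST_PER_QUERY_USD, MAX_REDUCTION)
    print(f"best-case diamond cost per query: ${per_query:.2f}")
    print(f"saving over 1,000 queries: ${saving(1000, 0.80, 0.40):.2f}")
```

Under these assumptions the best case is roughly $0.48 per query. Since 40% is an upper bound, typical savings would be smaller.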

4. Experiments and Results

4.1 Comparative Analysis

We conducted experiments across all three iterations of the system:

Initial Conceptual Framework: Achieved moderate reliability but struggled with complex queries. The linear flow limited specialization and took longer due to sequential data processing.

Swarm Team: Significantly improved coverage and specialization. Agents efficiently handled diverse queries, producing high-quality outputs. However, the increased resource utilization, high token costs, and synchronization issues presented tangible obstacles.

For instance, the Swarm Team found better results and wrote better reports when asked questions that required finding an individual case.

In the following example, the research team had to find news about the latest case to answer the question 'un débiteur peut-il contester la validité d'une créance qu'il a lui-même déclarée lors d'une procédure collective' ('can a debtor object to the validity of a debt he has himself declared to the bankruptcy court'). To answer the question reliably, the agents had to find a recent case that was not in their case law database.

The team was able to answer reliably by finding two relevant web sources. Read here.

Another of the best results is included here: Responsabilité des produits défectueux (liability for defective products).

While the right answer was usually reached, reliable and uniform source citation proved a challenge, as evidenced here: Compétence du Tribunal lorsque le Demandeur est un Commerçant et le Défendeur une Personne Civile (jurisdiction of the court when the claimant is a merchant and the defendant a civil person).

Diamond System: Maintained high-quality results while reducing token usage by approximately 40%. Reliability and throughput improved, and the system became easier to maintain and scale.

The streamlined approach effectively balanced cost, complexity, and performance.

While this system excels at reading through large swathes of data and distilling them into the correct answer, it still struggles with questions that require multiple iterations on the same database.

4.2 Performance Metrics

Quantitative metrics support these observations:

Token Cost Reduction: The diamond model consistently used fewer tokens, conserving computational resources without sacrificing completeness.

Processing Speed: With fewer synchronization steps and a simpler control flow, queries completed more rapidly.

Reliability & Scalability: Independent specialized branches within the diamond structure could be scaled up or down easily. Failure rates of final outputs decreased compared to the Swarm iteration.

4.3 Case Study

5. Challenges and Solutions

5.1 High Token Costs

Challenge: The Swarm model’s parallel searches frequently led to redundant data retrieval and increased computational overhead.

Solution: By adopting a diamond shape, we channeled queries through an optimized step and a shared scratch pad, minimizing repetitive calls and substantially reducing token costs.

5.2 Synchronization Issues

Challenge: Multiple agents operating asynchronously in the Swarm model could generate conflicting or redundant results.

Solution: The diamond architecture employs centralized memory management. With fewer parallel branches and a well-defined data flow, synchronization complexities decreased dramatically.

5.3 Scalability

Challenge: Maintaining coherence and low overhead in large-scale, complex queries tested the system’s scalability.

Solution: The diamond model’s modular search branches and centralized scratch pad approach simplified scaling, enabling efficient handling of more intricate research requests.

6. Future Improvements

As token costs diminish, models become more reliable, and user interfaces become easier to connect to asynchronous tasks, the Swarm approach still seems the most promising route to the best-quality results.

The strengths of the Swarm (breadth and specialization) and Diamond (efficiency and reliability) approaches may be combined. Hybrid solutions could dynamically switch between modes based on the scope of the user’s query.
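A hybrid of this kind needs some way to decide which mode a query belongs to. The sketch below uses a deliberately naive keyword heuristic to separate 'find an individual tree' queries from 'map out the forest' queries. The patterns and the function name are assumptions. A real router would more plausibly rely on a classifier or a model call.

```python
import re

# Hypothetical patterns for "find an individual tree" queries.
TREE_PATTERNS = (
    r"\bfind (a|an|the) (case|decision|ruling)\b",
    r"\bwhich (case|decision|ruling)\b",
)


def choose_mode(query: str) -> str:
    """Return "swarm" for single-case lookups, "diamond" for broad questions."""
    normalized = query.lower()
    if any(re.search(pattern, normalized) for pattern in TREE_PATTERNS):
        return "swarm"
    return "diamond"


if __name__ == "__main__":
    print(choose_mode("Please find a case that says X"))
    print(choose_mode("What is the legal regime for X"))
```

Defaulting to the diamond mode keeps the cheaper path as the baseline and reserves the more expensive Swarm run for queries that clearly ask for a specific decision.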

7. Conclusion

The iterative development of this multi-agent AI system for legal research should offer some insights for practitioners and developers.

Starting from a conceptual, linear pipeline and evolving through a Swarm-based architecture, we ultimately realized a diamond-shaped design that achieves a careful balance of efficiency, specialization, and scalability. The resulting system is cost-effective, robust, and capable of delivering reliable legal research outputs in a fraction of the time and at lower computational cost than earlier iterations.

This work hopes to demonstrate the potential of multi-agent AI systems to enhance legal research. By refining architectures, optimizing queries, and centralizing memory management, we can better handle complex, dynamic inquiries spanning various legal sources.

The framework and lessons presented here will, we hope, guide future innovations in legaltech, ultimately bringing more powerful, accessible, and reliable research tools to legal professionals.
