ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning

ComoRAG Overview

πŸ“– Paper Information

This is the official implementation of the paper:

ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning

Citation:

@article{wang2025comorag,
  title={ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning},
  author={Wang, Juyuan and Zhao, Rongchen and Wei, Wei and Wang, Yufeng and Yu, Mo and Zhou, Jie and Xu, Jin and Xu, Liyan},
  journal={arXiv preprint arXiv:2508.10419},
  year={2025}
}

Project Introduction

ComoRAG is a retrieval-augmented generation (RAG) framework designed for long-document and multi-document tasks, including question answering, information extraction, and knowledge graph construction. It integrates large language models, embedding techniques, graph-based reasoning, and evaluation methodologies, making it suitable for both academic research and real-world applications.

πŸ”₯ What makes ComoRAG different?

Narrative comprehension over long stories and novels is hard due to intricate plotlines and evolving character and entity relations. LLMs struggle with extended contexts and their cost, so retrieval remains crucial. Classic RAG, however, is often stateless and single-step, and misses the dynamic nature of long-range, interconnected reasoning.

ComoRAG takes a cognition-inspired approach: narrative reasoning is not one-shot, but a dynamic, evolving interplay between new evidence acquisition and consolidation of past knowledge β€” analogous to memory processes in the brain. 🧠

  • πŸ” Iterative Reasoning Cycles: When hitting an impasse, ComoRAG launches cycles that interact with a dynamic memory workspace.
  • πŸ•΅οΈ Probing Queries: Each cycle generates targeted probes to explore new evidence paths.
  • 🧳 Global Memory Pool: Newly retrieved evidence is integrated into a shared memory pool to progressively build coherent context for the query.

🚀 Benchmarks & Gains: On four challenging long-context narrative benchmarks (200K+ tokens), ComoRAG consistently outperforms strong RAG baselines, with relative gains of up to 11% over the strongest one. It shines in particular on complex queries that require global comprehension, enabling principled, cognitively motivated, stateful retrieval-based reasoning. 📈

Key idea in one line: Reason β†’ Probe β†’ Retrieve β†’ Consolidate β†’ Resolve. 🧩
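
The loop can be pictured as a small control flow. The sketch below is purely illustrative; every class and method name in it is a hypothetical placeholder, not the actual ComoRAG API:

# Illustrative sketch of the Reason -> Probe -> Retrieve -> Consolidate -> Resolve loop.
# All names (retriever, llm, try_answer, ...) are hypothetical placeholders, not the real ComoRAG API.
def answer_query(query, retriever, llm, max_iterations=5):
    memory_pool = []  # global memory pool shared across reasoning cycles

    for _ in range(max_iterations):
        # Reason: try to resolve the query from the evidence consolidated so far
        answer, resolved = llm.try_answer(query, memory_pool)
        if resolved:
            return answer  # Resolve: enough coherent context has been built

        # Probe: generate targeted probing queries to explore new evidence paths
        probes = llm.generate_probes(query, memory_pool)

        # Retrieve: gather new evidence for each probe
        new_evidence = [doc for probe in probes for doc in retriever.search(probe)]

        # Consolidate: integrate the new evidence with past knowledge in the memory pool
        memory_pool = llm.consolidate(memory_pool, new_evidence)

    # Fall back to the best answer obtainable from the current memory state
    return llm.try_answer(query, memory_pool)[0]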


Key Features ✨

  • 🧠 Support for various LLMs and local/remote embedding models
  • πŸ•ΈοΈ Graph-augmented retrieval and reasoning
  • πŸ”§ Flexible data preprocessing and chunking
  • πŸ“Š Multiple evaluation metrics (F1, EM, etc.)
  • 🧱 Modular and extensible design

Directory Structure πŸ“‚

ComoRAG/
β”œβ”€β”€ main_openai.py                       # Main program using OpenAI API
β”œβ”€β”€ main_vllm.py                         # Main program using local vLLM server
β”œβ”€β”€ script/                              # Data processing and evaluation scripts
β”‚   β”œβ”€β”€ chunk_doc_corpus.py              # Document chunking script
β”‚   └── eval_qa.py                       # QA evaluation script
β”œβ”€β”€ dataset/                             # Dataset directory
β”‚   └── ...
β”œβ”€β”€ src/comorag/                        # Core code
β”‚   β”œβ”€β”€ ComoRAG.py                       # Main class and core logic
β”‚   β”œβ”€β”€ utils/                           # Utility modules
β”‚   β”œβ”€β”€ embedding_model/                 # Embedding model related
β”‚   β”œβ”€β”€ llm/                             # LLM related
β”‚   β”œβ”€β”€ prompts/                         # Prompt templates
β”‚   β”œβ”€β”€ information_extraction/          # Information extraction
β”‚   └── rerank.py, embedding_store.py    # Other core modules
β”œβ”€β”€ requirements.txt                     # Dependencies
└── README.md                            # Project documentation

Installation & Environment πŸ› οΈ

  1. 🐍 Python version: Python 3.10 or above is recommended
  2. 📦 Install dependencies:
pip install -r requirements.txt
  3. 🔑 Environment variables: set your OpenAI API key or local LLM/embedding paths as needed
  4. ⚙️ GPU (optional but recommended): many dependencies in requirements.txt support CUDA 12.x

Data Preparation & Format πŸ“„

  • πŸ“š Corpus file corpus.jsonl: Each line is a document, with fields like id, doc_id, title, contents
  • ❓ QA file qas.jsonl: Each line is a question, with fields like id, question, golden_answers

Example:

corpus.jsonl:

{"id": 0, "doc_id": 1, "title": "...", "contents": "..."}

qas.jsonl:

{"id": "1", "question": "...", "golden_answers": ["..."]}

Quick Start ⚑

Method 1: Using OpenAI API (main_openai.py) πŸš€

  1. Configure dataset path and model parameters in the script:
config = BaseConfig(
    llm_base_url='https://api.example.com/v1',  # OpenAI API
    llm_name='gpt-4o-mini',
    dataset='cinderella',
    embedding_model_name='/path/to/your/embedding/model',
    embedding_batch_size=32,
    need_cluster=True,  # Enable Semantic/Episodic enhancement
    output_dir='result/cinderella',
    save_dir='outputs/cinderella',
    max_meta_loop_max_iterations=5,
    is_mc=False,  # Multiple-choice?
    max_tokens_ver=2000,  # Veridical layer tokens
    max_tokens_sem=2000,  # Semantic layer tokens
    max_tokens_epi=2000   # Episodic layer tokens
)
  2. Run the main program ▶️:
python main_openai.py

Method 2: Using Local vLLM Server (main_vllm.py) ⚑

1. Start vLLM Server πŸš€

First, start the vLLM OpenAI-compatible API server:

# Method 1: Using vllm serve command
vllm serve /path/to/your/model \
  --tensor-parallel-size 1 \
  --max-model-len 4096 \
  --gpu-memory-utilization 0.95

# Method 2: Using python -m vllm.entrypoints.openai.api_server
python -m vllm.entrypoints.openai.api_server \
  --model /path/to/your/model \
  --served-model-name your-model-name \
  --tensor-parallel-size 1 \
  --max-model-len 32768 \
  --dtype auto

Parameter descriptions:

  • --model: Model path (e.g., /path/to/your/model)
  • --tensor-parallel-size: Number of GPU parallel processes
  • --max-model-len: Maximum model length
  • --gpu-memory-utilization: GPU memory utilization rate

2. Configure main_vllm.py πŸ“

Modify the configuration in main_vllm.py:

# vLLM server configuration
vllm_base_url = 'http://localhost:8000/v1'  # vLLM server address
served_model_name = '/path/to/your/model'    # Must match the model name served by vLLM (defaults to the model path)

config = BaseConfig(
    llm_base_url=vllm_base_url,
    llm_name=served_model_name,
    llm_api_key="your-api-key-here",  # Any value, local server doesn't need real API key
    dataset='cinderella',
    embedding_model_name='/path/to/your/embedding/model',
    embedding_batch_size=4,
    need_cluster=True,
    output_dir='result/cinderella_vllm',
    save_dir='outputs/cinderella_vllm',
    max_meta_loop_max_iterations=5,
    is_mc=False,
    max_tokens_ver=2000,
    max_tokens_sem=2000,
    max_tokens_epi=2000
)

3. Run the Program ▢️

python main_vllm.py

4. Check Server Status πŸ”

Ensure the vLLM server is running properly:

# Check if port is occupied
netstat -tlnp | grep 8000

# Test API connection
curl http://localhost:8000/v1/models
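
The same check can be done from Python with the OpenAI client; the API key can be any string since the local vLLM server does not validate it:

from openai import OpenAI

# Point the OpenAI client at the local vLLM server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# List the models the server is serving
for model in client.models.list().data:
    print(model.id)

# Send a minimal chat request (replace the model name with the one you serve)
response = client.chat.completions.create(
    model="/path/to/your/model",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=16,
)
print(response.choices[0].message.content)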

Comparison of Two Methods πŸ“Š

Feature     OpenAI API (main_openai.py)     vLLM Local (main_vllm.py)
Cost        Pay per token                   One-time model download
Speed       Network latency                 Local inference, faster
Privacy     Data sent to the cloud          Fully local processing
Setup       Simple, just an API key         Requires GPU and model files
Stability   Network dependent               Local control
📁 Results are saved under result/<dataset>/<subset>/

Main Modules

  • πŸ›οΈ ComoRAG.py: The main class, responsible for retrieval, graph construction, reasoning, and QA
  • 🧰 utils/: Configuration, logging, embedding, clustering, summarization, memory, agents, and other utilities
  • 🧲 embedding_model/: Embedding model adaptation and loading
  • πŸ€– llm/: LLM adaptation
  • πŸ—’οΈ prompts/: Prompt template management
  • πŸ“¦ embedding_store.py: Embedding vector storage and retrieval

Data Processing & Evaluation Scripts πŸ§ͺ

  • βœ‚οΈ script/chunk_doc_corpus.py: Document chunking, supports token/word/sentence/recursive methods
  • πŸ“ˆ script/eval_qa.py: Automatic QA result evaluation, supports EM, F1, and other metrics

Example usage:

Chunking documents βœ‚οΈ:

python script/chunk_doc_corpus.py \
  --input_path dataset/<name>/<subset>/corpus.jsonl \
  --output_path dataset/<name>/<subset>/corpus_chunked.jsonl \
  --chunk_by token \
  --chunk_size 512 \
  --tokenizer_name_or_path /path/to/your/tokenizer
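
For intuition, token-based chunking generally follows the pattern below (an illustrative sketch with a Hugging Face tokenizer; chunk_doc_corpus.py may differ in details such as overlap handling):

from transformers import AutoTokenizer

def chunk_by_token(text, tokenizer, chunk_size=512):
    # Encode the document, split the token ids into fixed-size windows,
    # and decode each window back into a text chunk
    ids = tokenizer.encode(text, add_special_tokens=False)
    return [
        tokenizer.decode(ids[i:i + chunk_size])
        for i in range(0, len(ids), chunk_size)
    ]

tokenizer = AutoTokenizer.from_pretrained("/path/to/your/tokenizer")
chunks = chunk_by_token("some long document text ...", tokenizer, chunk_size=512)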

Evaluate QA results πŸ“Š:

python script/eval_qa.py /path/to/result/<dataset>/<subset>

This produces evaluation output files such as details and results.json.
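
For reference, EM and token-level F1 are usually computed along these lines (a standard sketch; eval_qa.py may apply its own normalization and aggregation):

import re
import string
from collections import Counter

def normalize(text):
    # Lowercase, strip punctuation and articles, collapse whitespace
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, golden_answers):
    return float(any(normalize(prediction) == normalize(g) for g in golden_answers))

def f1_score(prediction, golden_answers):
    # Token-level F1 against the best-matching gold answer
    pred_tokens = normalize(prediction).split()
    best = 0.0
    for gold in golden_answers:
        gold_tokens = normalize(gold).split()
        common = Counter(pred_tokens) & Counter(gold_tokens)
        overlap = sum(common.values())
        if overlap == 0:
            continue
        precision = overlap / len(pred_tokens)
        recall = overlap / len(gold_tokens)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best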


Contact & Contribution 🀝

For questions or suggestions, feel free to submit an Issue or PR.


Acknowledgement πŸ™

Our implementation builds on the HippoRAG repository as skeleton code.
