This is the official implementation of the paper:
ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning
Citation:
@article{wang2025comorag,
  title={ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning},
  author={Wang, Juyuan and Zhao, Rongchen and Wei, Wei and Wang, Yufeng and Yu, Mo and Zhou, Jie and Xu, Jin and Xu, Liyan},
  journal={arXiv preprint arXiv:2508.10419},
  year={2025}
}
ComoRAG is a retrieval-augmented generation (RAG) framework designed for long-document and multi-document tasks, including question answering, information extraction, and knowledge graph construction. It integrates large language models, embedding techniques, graph-based reasoning, and evaluation methodologies, making it suitable for both academic research and real-world applications.
What makes ComoRAG different?
Narrative comprehension on long stories and novels is hard due to intricate plotlines and evolving character/entity relations. LLMs struggle with extended context and cost, so retrieval stays crucial. However, classic RAG is often stateless and single-step, missing the dynamic nature of long-range, interconnected reasoning.
ComoRAG takes a cognition-inspired approach: narrative reasoning is not one-shot, but a dynamic, evolving interplay between new evidence acquisition and consolidation of past knowledge, analogous to memory processes in the brain.
- Iterative Reasoning Cycles: When hitting an impasse, ComoRAG launches cycles that interact with a dynamic memory workspace.
- Probing Queries: Each cycle generates targeted probes to explore new evidence paths.
- Global Memory Pool: Newly retrieved evidence is integrated into a shared memory pool to progressively build coherent context for the query.
Benchmarks & Gains: On four challenging long-context narrative benchmarks (200K+ tokens), ComoRAG outperforms strong RAG baselines with consistent relative gains of up to 11% over the strongest baseline. It particularly shines on complex queries requiring global comprehension, enabling principled, cognitively motivated, stateful retrieval-based reasoning.
Key idea in one line: Reason → Probe → Retrieve → Consolidate → Resolve.
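A minimal sketch of one such reasoning cycle is below. The four helper callables (try_answer, generate_probes, retrieve, consolidate) are hypothetical placeholders for illustration, not the actual API in src/comorag/ComoRAG.py:
# Illustrative sketch of the Reason -> Probe -> Retrieve -> Consolidate -> Resolve loop.
# The callables passed in are hypothetical placeholders, not the real ComoRAG interfaces.
def comorag_loop(query, try_answer, generate_probes, retrieve, consolidate, max_iterations=5):
    memory_pool = []  # global memory workspace shared across cycles
    for _ in range(max_iterations):
        answer, resolved = try_answer(query, memory_pool)      # Reason over current memory
        if resolved:
            return answer                                      # Resolve: enough evidence found
        for probe in generate_probes(query, memory_pool):      # Probe: targeted sub-queries
            evidence = retrieve(probe)                         # Retrieve new evidence paths
            memory_pool = consolidate(memory_pool, evidence)   # Consolidate into the pool
    return try_answer(query, memory_pool)[0]                   # best-effort answer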
- Support for various LLMs and local/remote embedding models
- Graph-augmented retrieval and reasoning
- Flexible data preprocessing and chunking
- Multiple evaluation metrics (F1, EM, etc.)
- Modular and extensible design
ComoRAG/
├── main_openai.py                     # Main program using OpenAI API
├── main_vllm.py                       # Main program using local vLLM server
├── script/                            # Data processing and evaluation scripts
│   ├── chunk_doc_corpus.py            # Document chunking script
│   └── eval_qa.py                     # QA evaluation script
├── dataset/                           # Dataset directory
│   └── ...
├── src/comorag/                       # Core code
│   ├── ComoRAG.py                     # Main class and core logic
│   ├── utils/                         # Utility modules
│   ├── embedding_model/               # Embedding model related
│   ├── llm/                           # LLM related
│   ├── prompts/                       # Prompt templates
│   ├── information_extraction/        # Information extraction
│   └── rerank.py, embedding_store.py  # Other core modules
├── requirements.txt                   # Dependencies
└── README.md                          # Project documentation
- Python version: Python 3.10 or above recommended
- Install dependencies:
pip install -r requirements.txt
- Environment variables: Set your OpenAI API key or local LLM/embedding paths as needed
- GPU (optional but recommended): CUDA 12.x is supported by many dependencies in requirements.txt
- Corpus file corpus.jsonl: each line is a document, with fields id, doc_id, title, contents
- QA file qas.jsonl: each line is a question, with fields id, question, golden_answers
Example:
corpus.jsonl:
{"id": 0, "doc_id": 1, "title": "...", "contents": "..."}
qas.jsonl:
{"id": "1", "question": "...", "golden_answers": ["..."]}
- Configure dataset path and model parameters in the script:
config = BaseConfig(
llm_base_url='https://api.example.com/v1', # OpenAI API
llm_name='gpt-4o-mini',
dataset='cinderella',
embedding_model_name='/path/to/your/embedding/model',
embedding_batch_size=32,
need_cluster=True, # Enable Semantic/Episodic enhancement
output_dir='result/cinderella',
save_dir='outputs/cinderella',
max_meta_loop_max_iterations=5,
is_mc=False, # Multiple-choice?
max_tokens_ver=2000, # Veridical layer tokens
max_tokens_sem=2000, # Semantic layer tokens
max_tokens_epi=2000 # Episodic layer tokens
)
- Run the main program:
python main_openai.py
First, start the vLLM OpenAI-compatible API server:
# Method 1: Using vllm serve command
vllm serve /path/to/your/model \
--tensor-parallel-size 1 \
--max-model-len 4096 \
--gpu-memory-utilization 0.95
# Method 2: Using python -m vllm.entrypoints.openai.api_server
python -m vllm.entrypoints.openai.api_server \
--model /path/to/your/model \
--served-model-name your-model-name \
--tensor-parallel-size 1 \
--max-model-len 32768 \
--dtype auto
Parameter descriptions:
- --model: Model path (e.g., /path/to/your/model)
- --tensor-parallel-size: Number of GPUs used for tensor parallelism
- --max-model-len: Maximum model context length
- --gpu-memory-utilization: Fraction of GPU memory to use
Modify the configuration in main_vllm.py:
# vLLM server configuration
vllm_base_url = 'http://localhost:8000/v1' # vLLM server address
served_model_name = '/path/to/your/model' # Model path
config = BaseConfig(
llm_base_url=vllm_base_url,
llm_name=served_model_name,
llm_api_key="your-api-key-here", # Any value, local server doesn't need real API key
dataset='cinderella',
embedding_model_name='/path/to/your/embedding/model',
embedding_batch_size=4,
need_cluster=True,
output_dir='result/cinderella_vllm',
save_dir='outputs/cinderella_vllm',
max_meta_loop_max_iterations=5,
is_mc=False,
max_tokens_ver=2000,
max_tokens_sem=2000,
max_tokens_epi=2000
)
Then run the main program:
python main_vllm.py
Ensure the vLLM server is running properly:
# Check if port is occupied
netstat -tlnp | grep 8000
# Test API connection
curl http://localhost:8000/v1/models
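A quick connectivity check is also possible with the OpenAI Python client; the model name below is a placeholder (use the model path or the name passed via --served-model-name):
from openai import OpenAI

# Point the OpenAI client at the local vLLM server; any non-empty API key is accepted.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

print([m.id for m in client.models.list()])  # should list the served model

resp = client.chat.completions.create(
    model="/path/to/your/model",  # or the name passed via --served-model-name
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=16,
)
print(resp.choices[0].message.content)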
| Feature | OpenAI API (main_openai.py) | vLLM Local (main_vllm.py) |
|---|---|---|
| Cost | Pay per token | One-time model download |
| Speed | Network latency | Local inference, faster |
| Privacy | Data sent to cloud | Completely local processing |
| Setup | Simple, just an API key | Requires GPU and model files |
| Stability | Network dependent | Local control |
- Results will be saved under result/<dataset>/<subset>/
- ComoRAG.py: The main class, responsible for retrieval, graph construction, reasoning, and QA
- utils/: Configuration, logging, embedding, clustering, summarization, memory, agents, and other utilities
- embedding_model/: Embedding model adaptation and loading
- llm/: LLM adaptation
- prompts/: Prompt template management
- embedding_store.py: Embedding vector storage and retrieval
- script/chunk_doc_corpus.py: Document chunking; supports token/word/sentence/recursive methods
- script/eval_qa.py: Automatic QA result evaluation; supports EM, F1, and other metrics
Example usage:
Chunking documents:
python script/chunk_doc_corpus.py \
--input_path dataset/<name>/<subset>/corpus.jsonl \
--output_path dataset/<name>/<subset>/corpus_chunked.jsonl \
--chunk_by token \
--chunk_size 512 \
--tokenizer_name_or_path /path/to/your/tokenizer
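Conceptually, token-based chunking does roughly the following. This is a minimal sketch, not the actual script; the input/output paths and the chunk id scheme are placeholders:
import json
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/path/to/your/tokenizer")
chunk_size = 512

with open("corpus.jsonl", encoding="utf-8") as fin, \
     open("corpus_chunked.jsonl", "w", encoding="utf-8") as fout:
    for line in fin:
        doc = json.loads(line)
        token_ids = tokenizer.encode(doc["contents"], add_special_tokens=False)
        # Split the token ids into fixed-size windows and decode each window back to text
        for i, start in enumerate(range(0, len(token_ids), chunk_size)):
            chunk_text = tokenizer.decode(token_ids[start:start + chunk_size])
            chunk = {**doc, "id": f'{doc["id"]}_{i}', "contents": chunk_text}
            fout.write(json.dumps(chunk, ensure_ascii=False) + "\n")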
Evaluate QA results:
python script/eval_qa.py /path/to/result/<dataset>/<subset>
This produces files such as results.json containing the detailed evaluation results.
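For reference, EM and F1 follow the standard normalized-answer definitions; the sketch below is illustrative (script/eval_qa.py may differ in normalization details, and scores against multiple golden_answers are typically aggregated with max):
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    # lowercase, drop punctuation and articles, collapse whitespace
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred: str, gold: str) -> float:
    return float(normalize(pred) == normalize(gold))

def f1(pred: str, gold: str) -> float:
    pred_toks, gold_toks = normalize(pred).split(), normalize(gold).split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the glass slipper", "A glass slipper"))  # 1.0
print(f1("a glass shoe", "A glass slipper"))                # partial token overlap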
For questions or suggestions, feel free to submit an Issue or PR.
Our implementation uses the HippoRAG repository as skeleton code.