Building a multimodal RAG pipeline over PDFs with ColQwen2

By Artur Gromenkov, Computer Vision/Machine Learning Software Engineer

A multimodal Retrieval-Augmented Generation (RAG) pipeline can be built over PDFs with ColQwen2, which embeds PDF pages directly as images, so no OCR or layout detection is needed. The page embeddings are stored in a Weaviate vector database; at query time the text query is embedded with the same ColQwen2 model to retrieve the most relevant pages, and Qwen2.5-VL generates an answer from the question and the retrieved page images.
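As a rough sketch of the indexing step, the snippet below renders PDF pages to images and embeds each page with ColQwen2. The colpali-engine library, the pdf2image renderer, the report.pdf input, and the vidore/colqwen2-v1.0 checkpoint are illustrative assumptions here, not necessarily what the tutorial uses.

```python
import torch
from pdf2image import convert_from_path
from colpali_engine.models import ColQwen2, ColQwen2Processor

# Render each PDF page as an image -- no OCR, layout detection, or chunking.
pages = convert_from_path("report.pdf")  # hypothetical input file; requires poppler

# Load a ColQwen2 checkpoint (checkpoint name is an assumption).
model = ColQwen2.from_pretrained(
    "vidore/colqwen2-v1.0",
    torch_dtype=torch.bfloat16,
    device_map="auto",
).eval()
processor = ColQwen2Processor.from_pretrained("vidore/colqwen2-v1.0")

# Each page becomes a set of patch-level vectors (late interaction),
# rather than a single pooled embedding.
batch = processor.process_images(pages).to(model.device)
with torch.no_grad():
    page_embeddings = model(**batch)  # (num_pages, num_patches, dim)
```

Because ColQwen2 is a late-interaction model, every page is represented by many patch-level vectors instead of one pooled vector, which lets retrieval match query tokens against fine-grained visual regions of the page.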

Multimodal RAG over PDFs with ColQwen2, without any OCR, layout detection, or chunking. In this tutorial, my colleague Tobias Christiani and I show you how to build a multimodal RAG pipeline over PDFs (rough code sketches of the steps follow below):

• Embed screenshots of PDF pages as images with ColQwen2, a multimodal late-interaction model
• Store the page embeddings in a Weaviate vector database
• Embed the text query with ColQwen2 at query time to retrieve the relevant pages from the database
• Generate an answer with the vision-language model Qwen2.5-VL

GitHub: https://lnkd.in/eBAgxJ-b
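To make the storage and query-time steps concrete, here is an outline of the Weaviate side, continuing from the `model`, `processor`, `pages`, and `page_embeddings` defined in the indexing sketch above. The collection name "PdfPages", the named vector "colqwen", and the example question are placeholders; multi-vector (ColBERT-style) indexes require Weaviate 1.29+ with a recent v4 Python client, and the exact collection-config calls vary by client version, so treat this as a sketch rather than the tutorial's actual code.

```python
import weaviate

client = weaviate.connect_to_local()

# Store one object per page; the ColQwen2 multi-vector goes under the named
# vector "colqwen". Assumes the "PdfPages" collection was already created with
# a self-provided multi-vector index (see Weaviate's multi-vector docs).
pages_collection = client.collections.get("PdfPages")
with pages_collection.batch.dynamic() as batch:
    for i, emb in enumerate(page_embeddings):
        batch.add_object(
            properties={"page_number": i},
            vector={"colqwen": emb.cpu().float().tolist()},
        )

# At query time, embed the question with the same ColQwen2 model and let
# Weaviate do late-interaction (MaxSim) matching against the page vectors.
question = "What does the revenue chart show?"  # placeholder query
query_batch = processor.process_queries([question]).to(model.device)
with torch.no_grad():
    query_embedding = model(**query_batch)[0]  # (num_query_tokens, dim)

response = pages_collection.query.near_vector(
    near_vector=query_embedding.cpu().float().tolist(),  # list of token vectors
    target_vector="colqwen",
    limit=3,
)
top_page = int(response.objects[0].properties["page_number"])
client.close()
```

The best-matching page image is then passed to Qwen2.5-VL together with the question, following the usual Transformers chat-template pattern for that model; the 7B Instruct checkpoint below is just one option.

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

vlm = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
vlm_processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

# Hand the retrieved page screenshot to the VLM as visual context.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": pages[top_page]},
        {"type": "text", "text": question},
    ],
}]
prompt = vlm_processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = vlm_processor(
    text=[prompt], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(vlm.device)

output_ids = vlm.generate(**inputs, max_new_tokens=256)
answer = vlm_processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```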
