From Docs to Markdown in a Click: How I Hooked Up Tines with Microsoft’s MarkItDown
Hey folks — so I’ve been tinkering with something I think you’ll appreciate if you’ve ever been stuck parsing PDFs, PowerPoints, or... ugh... .doc files in an automation pipeline.
Let me walk you through how I got Tines talking to Microsoft’s MarkItDown — and why it’s now my go-to for file conversion when working with LLMs.
So, why even bother?
You know how LLMs love Markdown, right? It’s clean, structured, and way easier to chunk into digestible bits for analysis or summarization. But most of the source content we get? Yeah, it’s not in Markdown.
Think enterprise-y stuff: DOCX, PPT, PDFs, even the occasional .zip full of... who knows what.
What’s nice is, Microsoft quietly dropped this nifty Python package — MarkItDown — that converts all that into Markdown. Seriously, it handles PowerPoint, Excel, DOCX, PDFs, HTML, audio files, even ZIPs. But here’s the kicker: I didn’t want to run it manually. I wanted to hook it right into Tines.
The Setup
Alright, so here’s the basic idea: spin up a container running MarkItDown behind a Tines HTTP Tunnel. This way, I can send files directly from Tines (through the tunnel), and get back nice, clean Markdown. That’s it.
Let’s break it down real quick:
1. Docker Compose
Two services:
tines_tunnel: connects to the Tines SaaS platform
markitdown: a custom Flask API wrapper I built around the MarkItDown library.
docker-compose.yml
They live on the same Docker network, so the tunnel can reach MarkItDown internally by its service name, with no host port mappings to fuss with.
2. The Dockerfile
Now here’s where it gets interesting. I needed a bunch of tools installed to support all the file types — especially legacy formats like .ppt, .doc, and .xls.
So I baked all that into a single image:
It ends by launching app.py, which exposes a simple /convert endpoint that takes in a base64-encoded file, a URI, or raw HTML, and spits back Markdown.
Super lightweight and super effective.
3. The Flask API (app.py)
This part’s kinda fun. It handles:
Base64-encoded files (great for Tines HTTP Actions)
File URIs
HTML strings
Even remote URLs
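To make those four input types concrete, here's a minimal sketch of how a handler might turn a request payload into something a converter can open. It is not the actual app.py: the field names (file_base64, filename, html, uri, url) and the helper names are assumptions made up for illustration.

```python
import base64
import os
import tempfile


def _write_temp(data: bytes, suffix: str) -> str:
    # delete=False so the file outlives this function; the caller removes it.
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        return tmp.name


def resolve_source(payload: dict) -> str:
    """Turn a request payload into a local path or URL string.

    All field names here are hypothetical; the real endpoint may differ.
    """
    if "file_base64" in payload:
        # Keep the original extension so the converter can pick a parser.
        suffix = os.path.splitext(payload.get("filename", ""))[1]
        data = base64.b64decode(payload["file_base64"], validate=True)
        return _write_temp(data, suffix)
    if "html" in payload:
        return _write_temp(payload["html"].encode("utf-8"), ".html")
    for key in ("uri", "url"):
        if key in payload:
            # Passed through unchanged; which schemes the converter accepts
            # depends on the library version.
            return payload[key]
    raise ValueError("payload needs one of: file_base64, html, uri, url")
```

The returned string would then go to something like MarkItDown().convert(source), whose result carries the Markdown in text_content.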
What’s cool is, I added some logic to automatically convert legacy formats (like .doc, .ppt, .xls) into their modern siblings using LibreOffice behind the scenes. No special prep needed. Just send the file and boom — it’ll upgrade and convert it.
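Roughly, that upgrade step could look like the sketch below. It assumes LibreOffice's soffice binary is on the PATH inside the container and uses its headless --convert-to mode; the function names, the extension map, and the timeout are my own illustration, not the code from the repo.

```python
import os
import subprocess
from typing import List, Optional

# Legacy extension -> modern target format (illustrative mapping).
LEGACY_TARGETS = {".doc": "docx", ".ppt": "pptx", ".xls": "xlsx"}


def libreoffice_command(path: str, outdir: str) -> Optional[List[str]]:
    """Return the soffice command that upgrades a legacy file, or None."""
    ext = os.path.splitext(path)[1].lower()
    target = LEGACY_TARGETS.get(ext)
    if target is None:
        return None
    return ["soffice", "--headless", "--convert-to", target,
            "--outdir", outdir, path]


def upgrade_legacy(path: str, outdir: str, timeout: int = 120) -> str:
    """Convert a legacy file and return the new path.

    Files that are not in LEGACY_TARGETS pass through untouched.
    """
    cmd = libreoffice_command(path, outdir)
    if cmd is None:
        return path
    # check=True raises CalledProcessError if soffice exits non-zero.
    subprocess.run(cmd, check=True, timeout=timeout)
    stem, ext = os.path.splitext(os.path.basename(path))
    return os.path.join(outdir, stem + "." + LEGACY_TARGETS[ext.lower()])
```

Splitting the command builder from the subprocess call keeps the mapping easy to test without LibreOffice installed.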
There’s even validation for ZIPs to make sure they aren’t corrupt before we try to parse them. That alone saved me from hours of debugging when a partner sent over a junk file.
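A ZIP sanity check like that can be done with nothing but Python's standard zipfile module. This is a sketch of one way to do it, with a made-up function name, and not necessarily how the repo does it.

```python
import io
import zipfile
import zlib
from typing import Optional


def zip_problem(data: bytes) -> Optional[str]:
    """Describe what is wrong with the archive, or return None if it looks fine."""
    buf = io.BytesIO(data)
    if not zipfile.is_zipfile(buf):
        return "not a ZIP archive"
    try:
        with zipfile.ZipFile(buf) as zf:
            # testzip() reads every member and returns the name of the
            # first one that fails its CRC check, or None if all pass.
            bad = zf.testzip()
    except (zipfile.BadZipFile, zlib.error) as exc:
        return f"unreadable archive: {exc}"
    if bad is not None:
        return f"corrupt member: {bad}"
    return None
```

Running this before conversion lets the API answer with a clear error message instead of a stack trace from deep inside a parser.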
4. Calling It from Tines
Inside Tines, I use a regular HTTP Request action with a JSON body like:
The result is clean Markdown and optional metadata (like title and author). It’s perfect for feeding into OpenAI or any downstream logic — summaries, classifications, whatever.
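If you want to poke at the endpoint outside of Tines, a body for the base64 case could be built like this. The field names (filename, file_base64) are assumptions for illustration, so match them to whatever your own /convert endpoint expects.

```python
import base64
import json


def build_convert_body(filename: str, data: bytes) -> str:
    """Serialize a file into a JSON body for a /convert-style endpoint.

    "filename" and "file_base64" are hypothetical field names.
    """
    return json.dumps({
        "filename": filename,
        # JSON cannot carry raw bytes, so the file travels as base64 text.
        "file_base64": base64.b64encode(data).decode("ascii"),
    })
```

The same shape is what the HTTP Request action's payload would carry, with the file contents base64-encoded upstream in the story.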
Final Thoughts
Look — converting files to Markdown used to be annoying. Too many formats, too many edge cases. Now? It’s just an API call away. And pairing this with Tines makes it feel automatic in the best way possible.
I’ll probably polish this up more and throw it into a public Story Library template soon. But if you're curious now and want to set it up — happy to share the repo or walk you through it.
Catch ya in the next build. 👋
CODE
https://github.com/seefor/tines/blob/main/MarkItDown-Demo.json