ThreatIngestor - Lightweight CLI Based IOC Aggregator

Urvesh Thakkar

Senior Security Engineer @ Arctic Wolf | DFIR, Threat Hunting & Intel | CHFI | eTHP | DCPLA | CTIA | ECIH | CND | CCSE | Aspiring Music Producer @ URU | Ex-Informatica | President @ BeFojji OpSec India

Published Jul 22, 2025

If you're looking to aggregate threat data from multiple open feeds, extract meaningful IOCs, and feed them directly into your detection ecosystem, ThreatIngestor is a simple and powerful way to do it.

It pulls content from a bunch of different sources, such as Twitter accounts, RSS feeds, and XML sitemaps, and extracts IOCs (Indicators of Compromise) from it: malicious IPs, domains, hashes, YARA rules, etc.

Think of it as your threat intel middleman that quietly keeps feeding your security stack with the latest IOCs.

It’s open source, it's lightweight, and with just a few lines of config you can operationalize your own threat intel pipeline, tailored to your environment. It doesn’t come with any fancy, modern-looking GUI or some slick, Jarvis-themed tech console; it’s all based on the CLI (hackers won't complain, as they love a black terminal with a green font). In fact, that’s exactly what makes it lightweight and scalable, especially if you’re working with limited infrastructure or deploying on low-compute machines.

ThreatIngestor Project - https://github.com/InQuest/threatingestor

Let's Get Started!

The build I used for deployment -

OS: Ubuntu 24.10, running as a droplet (cloud machine) on DigitalOcean
Memory: 2 GB
Disk: 60 GB

Output for command "neofetch" for displaying System Info

First things first -

Would you cook a dish without making sure you have the right spices at hand? Imagine a Biryani without the proper spices in it! Okay, layman examples aside, before we actually start using ThreatIngestor, let's ensure that our system has the required dependencies.

Run the following commands:

The above will update your OS and install the required packages like Python3 and git (if not pre-installed).

Now, on Debian/Ubuntu systems with Python ≥3.11, PEP 668 protection is enabled, which prevents pip from installing packages system-wide to avoid breaking the OS. While there are ways to bypass it, doing so is not recommended. Hence, I will be using a virtual environment (venv).

Fire the command below to install Python VENV package -

Create and activate a virtual environment -

Once the venv is activated, you will see the (threatingestor-env) prefix in your prompt, as shown above.

Install ThreatIngestor inside the virtualenv -

The CONFIG magic!

Okay! So, you’ve got ThreatIngestor installed inside your virtual environment, neat and clean. But now comes the part where the actual orchestration happens: the config file.

This is where you tell ThreatIngestor:

what feeds to ingest from (RSS, sitemap, Twitter, VirusTotal, etc.)
what kind of data to look for (IP, hash, domain, YARA rule, etc.)
and where to dump that output (CSV, Redis, Elasticsearch, etc.)

Here’s a sample config file I used for testing (and yes, it works like a charm):

Save the above file as config.yml in your current working directory (from where we will run the ThreatIngestor tool).

👉 You can find an example config in the official repo: config.example.yml in InQuest/ThreatIngestor on GitHub
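If you prefer to generate the config from a script (handy when you deploy to several machines), here is a minimal sketch. It is not the exact file I used: the key names (general, sources, operators, module, feed_type) are written from my recollection of the project's example config, so verify them against config.example.yml. The feed URL is a placeholder, and write_config is a hypothetical helper name.

```python
from pathlib import Path

# Assumption: key names follow the project's example config; verify before use.
# Assumption: the URL below is a placeholder, not a real feed.
SAMPLE_CONFIG = """\
general:
  daemon: false
  sleep: 900
  state_path: state.db

sources:
  - name: example-rss-feed
    module: rss
    url: https://example.com/feed.xml
    feed_type: messy

operators:
  - name: csv-output
    module: csv
    filename: output.csv
"""


def write_config(path: str = "config.yml") -> Path:
    """Write the sample config to disk and return the path written."""
    target = Path(path)
    target.write_text(SAMPLE_CONFIG, encoding="utf-8")
    return target
```

Calling write_config() with no arguments drops config.yml into the current working directory, which is where the tool will be run from.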

Let's Understand the Config

🔹Sources - Each feed you want to ingest from is defined as its own entry in the sources section. For each one, you’ll choose a module (e.g., RSS or sitemap) and provide the URL. Some modules, like RSS, also require a feed type, which depends on the content:

one value if the feed contains general blog content (like Malwarebytes)
another if it’s clean and IOC-focused

🔹Operators - These are your output engines. In our case, we’re just dumping everything to output.csv. But you could also use other operators:

one to push to an ELK stack
one if you want a stream of IOCs
one to auto-publish to your threat intelligence sharing instance

🔹State File (state.db) - Think of this as ThreatIngestor’s brain. It remembers the last seen post or IOC from each source to avoid duplicates. That’s why even if you delete your CSV, the next run won’t regenerate old data unless you also delete state.db.

To start fresh:
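If you'd rather do that reset from a script, here is a minimal Python sketch. It assumes the state file is called state.db and sits in your working directory, as in this walkthrough. reset_run is a hypothetical helper name, not something ThreatIngestor ships.

```python
from pathlib import Path


def reset_run(state_path="state.db", csv_path=None):
    """Delete the state file (and optionally the old CSV).

    Returns the names of the files that were actually removed.
    """
    removed = []
    for candidate in (state_path, csv_path):
        if candidate is None:
            continue
        target = Path(candidate)
        if target.exists():
            target.unlink()
            removed.append(target.name)
    return removed
```

For example, reset_run("state.db", "output.csv") clears both files, so the next run re-ingests everything the feeds still expose.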

The output.csv file is neatly structured. It’ll look something like:

Column A - Type of IOC (e.g. hash, URL, etc.)
Column B - The actual IOC
Column C - Reference link/feed link from which it was collected

You can plug this into anything, really.
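As a starting point, here is a minimal sketch of loading that CSV in Python. It assumes the three-column layout described above and no header row; rows with fewer than three columns are skipped. Indicator, load_indicators and group_by_type are names I made up for illustration.

```python
import csv
from collections import defaultdict
from typing import NamedTuple


class Indicator(NamedTuple):
    ioc_type: str   # Column A
    value: str      # Column B
    reference: str  # Column C


def load_indicators(path):
    """Read (type, IOC, reference) rows; skip rows that are too short."""
    indicators = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) < 3:
                continue
            indicators.append(
                Indicator(row[0].strip(), row[1].strip(), row[2].strip())
            )
    return indicators


def group_by_type(indicators):
    """Return {ioc_type: set of unique IOC values}."""
    grouped = defaultdict(set)
    for item in indicators:
        grouped[item.ioc_type].add(item.value)
    return dict(grouped)
```

Something like group_by_type(load_indicators("output.csv")) gives you one de-duplicated set per IOC type, which is usually the shape downstream tools want.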

Some interesting ways to use this data -

Alright, now that we’ve got a juicy CSV full of artifacts, what next? Here are a few practical, operational ways to integrate this into your security workflow:

1. Import to MISP (Malware Information Sharing Platform)

You can create a custom script or use MISP’s built-in CSV import tool to bring in IOCs from output.csv. This works great if you want to use ThreatIngestor as your external feed harvester.
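For the custom script route, here is a minimal sketch that reshapes the CSV into a payload resembling MISP's event JSON. Several things here are assumptions: the type labels in Column A (Hash, IPAddress, Domain, URL), the mapping to MISP attribute types, and the function names. Which fields your MISP instance requires (category, distribution, and so on) depends on its setup, and actually sending the payload is left out.

```python
import csv

# Assumption: Column A uses labels like these; adjust to what your CSV contains.
SIMPLE_TYPES = {"ipaddress": "ip-dst", "domain": "domain", "url": "url"}
# Hashes are told apart by their hex length.
HASH_TYPES_BY_LENGTH = {32: "md5", 40: "sha1", 64: "sha256"}


def to_misp_attribute(ioc_type, value, reference):
    """Map one CSV row to a MISP-style attribute dict, or None if unmapped."""
    kind = ioc_type.strip().lower()
    if kind == "hash":
        misp_type = HASH_TYPES_BY_LENGTH.get(len(value))
    else:
        misp_type = SIMPLE_TYPES.get(kind)
    if misp_type is None:
        return None
    return {"type": misp_type, "value": value, "comment": reference, "to_ids": True}


def csv_to_misp_event(path, info="ThreatIngestor import"):
    """Build an event-shaped dict from the ThreatIngestor CSV."""
    attributes = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) < 3:
                continue
            attribute = to_misp_attribute(row[0], row[1].strip(), row[2].strip())
            if attribute is not None:
                attributes.append(attribute)
    return {"Event": {"info": info, "Attribute": attributes}}
```

The resulting dict can be serialized with json.dumps and handed to whatever client you use to talk to your MISP instance.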

2. Feed into SIEM or any other security tool in your arsenal

You can tail or batch import this CSV into tools like:

Splunk (using a file monitor)
Elastic/Logstash (via filebeat or CSV import)
Graylog (CSV ingestion pipeline)
EDR / XDR feeds through API by creating a SOAR playbook
Any other use-case of your choice!
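Many file monitors and log shippers are happier with one JSON object per line than with raw CSV. Here is a minimal sketch of that conversion, assuming the three-column layout described earlier. The field names (ingested_at, ioc_type, ioc, reference) and the function name are my own choices, so rename them to match your SIEM's schema.

```python
import csv
import json
from datetime import datetime, timezone


def csv_to_ndjson(csv_path, ndjson_path):
    """Append one JSON object per IOC row; return the number of events written."""
    # One timestamp per batch, in UTC, so the SIEM can tell runs apart.
    ingested_at = datetime.now(timezone.utc).isoformat()
    written = 0
    with open(csv_path, newline="", encoding="utf-8") as source, \
            open(ndjson_path, "a", encoding="utf-8") as sink:
        for row in csv.reader(source):
            if len(row) < 3:
                continue
            event = {
                "ingested_at": ingested_at,
                "ioc_type": row[0].strip(),
                "ioc": row[1].strip(),
                "reference": row[2].strip(),
            }
            sink.write(json.dumps(event) + "\n")
            written += 1
    return written
```

The output file is opened in append mode on purpose, so a tailing agent keeps seeing new lines instead of a file that gets rewritten on every run.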

3. Use with a Custom Python App or Enrichment Service

Parse the CSV and build your own threat scoring engine, IOC enrichment dashboard, or correlation script. Sky's the limit here.
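To make that concrete, here is a deliberately naive scoring sketch: an IOC that shows up under more distinct reference links gets a higher score. This is just an illustration of the idea, not a recommended scoring model, and score_indicators is a hypothetical name. It again assumes the three-column CSV layout with no header row.

```python
import csv
from collections import defaultdict


def score_indicators(path):
    """Score each IOC by the number of distinct references that mention it."""
    references = defaultdict(set)
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) < 3:
                continue
            key = (row[0].strip(), row[1].strip())
            references[key].add(row[2].strip())
    scored = [
        (ioc_type, value, len(refs))
        for (ioc_type, value), refs in references.items()
    ]
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scored, key=lambda item: (-item[2], item[0], item[1]))
```

From here you could swap the reference count for anything you like: feed reputation weights, age of the IOC, or hits from an enrichment API.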

Did you know? The awesome folks at InQuest Labs are using this exact tool in their backend at: 🔗 https://labs.inquest.net/iocdb

It’s real, scalable, and field-proven. If you're an org just starting out or want to build your own mini-Threat Intelligence Platform (TIP), this is an amazing foundation.

Maybe I'll dive into "how to build your own IOC feed portal like InQuest" in the next blog post? Let me know your thoughts!
