https://digitalis.io
info@digitalis.io
Security Information and
Event Management with Kafka,
Kafka Connect, KSQL and Logstash
Jason Bell
Kafka DevOps Engineer
ABOUT
Working with Kafka since 2014, in development, support and now DevOps.
Author of Machine Learning: Hands-On for Developers and Technical
Professionals, published by Wiley.
©2020 digitalis.io Ltd. Do not distribute without
consent.
https://digitalis.io
What is SIEM?
SIEM adoption was originally driven by the Payment Card
Industry Data Security Standard (PCI DSS).
Data can come from various sources such as
firewalls, anti-virus, login information and intrusion
prevention systems.
For example: a user makes 20 failed login attempts.
Has the user actually forgotten their password? Let’s class this as
a low-priority event: the user may simply have forgotten it and retried.
A user makes 140 failed login attempts in five
minutes. This is more than likely a brute-force
attack and needs investigating.
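The two login examples can be told apart with a sliding-window count. The sketch below is only an illustration: the threshold of 100 failures, the function name and the spacing of the sample timestamps are assumptions made here, not values from the talk.

```python
from collections import deque

# Hypothetical values, chosen only to separate the two examples above.
WINDOW_SECONDS = 5 * 60
BRUTE_FORCE_THRESHOLD = 100


def classify_failed_logins(timestamps, window=WINDOW_SECONDS,
                           threshold=BRUTE_FORCE_THRESHOLD):
    """Return "high" if any sliding window holds at least `threshold` failures.

    `timestamps` are failure times in seconds, in ascending order.
    """
    recent = deque()
    for ts in timestamps:
        recent.append(ts)
        # Drop failures that have fallen out of the window ending at `ts`.
        while ts - recent[0] >= window:
            recent.popleft()
        if len(recent) >= threshold:
            return "high"
    return "low"


# Assumed spacing: 20 failures spread over an hour (one every three minutes).
forgetful_user = [i * 180 for i in range(20)]
# 140 failures inside five minutes (one every two seconds).
attacker = [i * 2 for i in range(140)]

print(classify_failed_logins(forgetful_user))
print(classify_failed_logins(attacker))
```

In a real deployment the window and threshold would be tuned per system; the point is only that the rate of failures matters as well as the count.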
Enterprise SIEM Problems
“Virtually every regulatory compliance regime or
standard such as GDPR, ISO 27001, PCI DSS,
HIPAA, FERPA, Sarbanes-Oxley (SOX), FISMA,
and SOC 2 have some requirements of log
management to preserve audit trails of activity that
addresses the CIA (Confidentiality, Integrity, and
Availability) triad.”
https://digitalis.io/blog/kafka/apache-kafka-and-regulatory-compliance/
● Large Volumes of Data.
● Variety of log formats - RFC5424, RFC3164, Windows Events and other
bespoke log formats from network devices.
● Regulatory compliance.
● High Availability Requirements
● Downstream sometimes cannot keep up at peak times – 9am, DDoS events
● Multiple consumers of data and connectivity to them
○ routing, transforming, filtering
Why use Kafka?
● High Availability
● Scalable
● High Throughput
● Rich Ecosystem
● ksqlDB for Implementing Logic for Routing/Filtering/Transforming
● Buffering of data during high peak volumes – a shock absorber.
Kafka SIEM Architecture
Data Flows and Components
Topic and Outbound Data Flows
Data Ingestion
● Non-repudiation - fingerprinting source logs
● Transformation to JSON
● Non-standard syslog formats - bespoke grokking
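As a rough illustration of fingerprinting a source log, the sketch below hashes the raw line before any parsing and stores the digest next to it. The talk does not say which hash is used or where it is computed, so SHA-256, the function names and the sample line are assumptions; in this pipeline the step would more plausibly live in Logstash, which ships a fingerprint filter.

```python
import hashlib
import json


def fingerprint(raw_line: str) -> str:
    """Hex SHA-256 digest of the raw log line, taken before any parsing."""
    return hashlib.sha256(raw_line.encode("utf-8")).hexdigest()


def to_event(raw_line: str) -> str:
    """Wrap a raw line in a JSON event carrying the original text and its digest."""
    event = {"rawdata": raw_line, "fingerprint": fingerprint(raw_line)}
    return json.dumps(event, sort_keys=True)


def is_untampered(event_json: str) -> bool:
    """Recompute the digest downstream and compare it with the stored one."""
    event = json.loads(event_json)
    return fingerprint(event["rawdata"]) == event["fingerprint"]


# Hypothetical sample line, for illustration only.
sample = "<134>1 2019-04-29T14:03:37.000000Z testhost httpd - - - failed to login"
event = to_event(sample)
print(is_untampered(event))
```

A plain hash only shows that the stored line has not changed since the digest was taken. Stronger non-repudiation guarantees would need a keyed hash or a signature, which is outside this sketch.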
Logstash - Input
Logstash Input – All Types
input {
  udp {
    host => "0.0.0.0"
    port => 5140
    type => "rfc5424"
    tags => ["rfc5424"]
  }
  tcp {
    host => "0.0.0.0"
    port => 5140
    type => "rfc5424"
    tags => ["rfc5424"]
  }
  syslog {
    port => 5150
    type => "rfc3164"
    tags => ["rfc3164"]
  }
}
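To exercise the UDP input on port 5140, a test client can emit a single RFC 5424 line. This is a minimal sketch and not part of the original deck: the function names, the default host 127.0.0.1 and the sample field values are assumptions.

```python
import socket
from datetime import datetime, timezone

FACILITY_LOCAL0 = 16  # syslog facility code for local0
SEVERITY_INFO = 6     # syslog severity code for informational


def rfc5424_message(hostname, app_name, text,
                    facility=FACILITY_LOCAL0, severity=SEVERITY_INFO, when=None):
    """Build one RFC 5424 line with nil ("-") PROCID, MSGID and structured data."""
    pri = facility * 8 + severity
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    timestamp = when.strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    # <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
    return f"<{pri}>1 {timestamp} {hostname} {app_name} - - - {text}"


def send_udp(message, host="127.0.0.1", port=5140):
    """Send one datagram to the Logstash UDP input (host is an assumption)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (host, port))
```

Calling send_udp(rfc5424_message("testhost", "sshd", "password check failed for user admin")) would deliver one event to the udp input, assuming Logstash is listening locally.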
Logstash - Filtering
Logstash Filter – RFC3164
filter {
  if [type] == "rfc3164" {
    # rename and remove fields
    mutate {
      remove_field => [ "@version", "@timestamp" ]
      rename => { "host" => "client_addr" }
      rename => { "logsource" => "host" }
      rename => { "severity_label" => "severity" }
      rename => { "facility_label" => "facility" }
    }
  }
}
Logstash Filter – RFC5424
filter {
  if [type] == "rfc5424" {
    # parse RFC5424 log
    grok {
      patterns_dir => "/etc/logstash/patterns"
      match => [ "message", "%{SYSLOG}" ]
      tag_on_failure => [ "_grokparsefailure_syslog" ]
    }
    # rename fields and remove unneeded ones
    mutate {
      rename => { "syslog_facility" => "facility" }
      rename => { "syslog_severity" => "severity" }
      # message_syslog contains message content + extra data
      replace => { "message" => "%{message_syslog}" }
      remove_field => [ "@version", "facility_label",
                        "@timestamp", "message_content", "message_syslog" ]
      rename => { "program" => "ident" }
      rename => { "timestamp_source" => "timestamp" }
      rename => { "host" => "client_addr" }
      rename => { "host_source" => "host" }
    }
  }
}
Logstash Filter – RFC JSON
{
  "host": "testhost",
  "ident": "info",
  "message": "01070417:6: AUDIT - user admin - RAW: httpd(pam_audit): User=admin tty=(unknown) host=10.234.254.90 failed to login after 1 attempt….",
  "priority": "info",
  "facility": "local0",
  "client_addr": "10.234.254.90",
  "bucket": "2019042913",
  "evt_id": "33a3a040-6a7f-11e9-a8be-0050568115fd",
  "extradata": "[ ]",
  "fingerprint": "73dd765f55a1791b667bd6160235e3f6",
  "rawdata": ".....",
  "pid": "-",
  "msgid": "-",
  "timestamp": "2019-04-29T14:03:37.000000Z"
}
Logstash - Output
output {
  if "syslog_rfc5424" in [tags] {
    kafka {
      codec => json
      topic_id => "syslog_rfc5424"
      bootstrap_servers => "{{ confluent_ksql_bootstrap_servers }}"
      security_protocol => "SSL"
      ssl_key_password => "{{ logstash_ssl_key_password }}"
      ssl_keystore_location => "/etc/logstash/logstash.keystore.jks"
      ssl_keystore_password => "{{ logstash_ssl_keystore_password }}"
      ssl_truststore_location => "/etc/logstash/logstash.truststore.jks"
      ssl_truststore_password => "{{ logstash_ssl_truststore_password }}"
      compression_type => "snappy"
      acks => "1"
      retries => "3"
      retry_backoff_ms => "500"
      request_timeout_ms => "2000"
      batch_size => "32768"
      ssl_endpoint_identification_algorithm => "https"
      ssl_keystore_type => "jks"
    }
  }
}
Topic Filtering and Routing
● Some downstream systems are not interested in INFO events:
too much data.
● Some are only interested in Windows events, for
example.
CREATE STREAM syslog_rfc3164 (client_addr VARCHAR, host VARCHAR, timestamp VARCHAR, severity VARCHAR,
  message VARCHAR, facility VARCHAR, type VARCHAR, priority VARCHAR)
  WITH (KAFKA_TOPIC='syslog_rfc3164', VALUE_FORMAT='JSON');

CREATE STREAM auth_rfc3164 WITH (KAFKA_TOPIC='syslog_auth', VALUE_FORMAT='JSON') AS
  SELECT * FROM syslog_rfc3164
  WHERE message LIKE '%password check failed for user%'
     OR message LIKE '%An account failed to log on.%'
     OR message LIKE '%%0xC000006D';

CREATE STREAM syslog_rfc5424 (facility VARCHAR, message VARCHAR, pid VARCHAR, type VARCHAR, timestamp VARCHAR,
  ident VARCHAR, client_addr VARCHAR, host VARCHAR, msgid VARCHAR, extradata VARCHAR, priority VARCHAR)
  WITH (KAFKA_TOPIC='syslog_rfc5424', VALUE_FORMAT='JSON');

CREATE STREAM auth_rfc5424 WITH (KAFKA_TOPIC='syslog_auth', VALUE_FORMAT='JSON') AS
  SELECT * FROM syslog_rfc5424
  WHERE message LIKE '%password check failed%'
     OR extradata LIKE '%|309|%'
     OR message LIKE '%An account failed to log on.%'
     OR message LIKE '%%0xC000006D';
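The routing predicate in the RFC3164 statement can be tried out offline before it is deployed. The sketch below is an assumption-laden stand-in, not ksqlDB itself: it translates the three LIKE patterns to regular expressions using standard SQL LIKE semantics (% for any run of characters, _ for a single character), and the helper names are invented for this example.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a compiled regex (no ESCAPE support)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


# The three message patterns from the auth_rfc3164 statement.
AUTH_PATTERNS = [
    "%password check failed for user%",
    "%An account failed to log on.%",
    "%%0xC000006D",
]
AUTH_REGEXES = [like_to_regex(p) for p in AUTH_PATTERNS]


def is_auth_failure(message):
    """True if the message would be routed to the syslog_auth topic."""
    return any(rx.fullmatch(message) for rx in AUTH_REGEXES)


print(is_auth_failure("pam: password check failed for user (root)"))
print(is_auth_failure("session opened for user admin"))
```

Under these semantics the last pattern has no trailing wildcard, so it only matches messages that end in 0xC000006D. Whether that is intended depends on the shape of the source events.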
Destinations and Sinks
● Use existing connectors
● Build your own connectors
Splunk HTTP Sink in
Kafka Connect
{
  "name": "syslog-sink-splunk",
  "config": {
    "connector.class": "SplunkHECSinkConnector",
    "tasks.max": "{{ tasks_max }}",
    "topics": "{{ topics }}",
    "splunk.endpoint.uri": "{{ splunk_endpoint_uri }}",
    "splunk.hec.token": "{{ splunk_hec_token }}",
    "splunk.index": "{{ splunk_index }}",
    "splunk.channelid": "{{ splunk_channelid }}",
    "splunk.sourcetype": "{{ splunk_sourcetype }}",
    "splunk.http.loglevel": "{{ splunk_http_loglevel }}",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "{{ splunk_value_converter_schemas_enable }}",
    "errors.tolerance": "{{ splunk_errors_tolerance }}",
    "errors.deadletterqueue.topic.name": "{{ errors_deadletterqueue_topic_name }}",
    "errors.deadletterqueue.topic.replication.factor": "{{ errors_deadletterqueue_topic_replication_factor }}"
  }
}
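A definition like this is normally registered by POSTing it to the Kafka Connect REST API at /connectors. The sketch below only builds the request and refuses to do so while any {{ ... }} template value is still unrendered. The helper names are hypothetical, and http://localhost:8083 (the default Connect REST port) stands in for whatever worker address a deployment actually uses.

```python
import json
import re
import urllib.request

PLACEHOLDER = re.compile(r"\{\{\s*\w+\s*\}\}")


def unrendered_placeholders(connector):
    """Return config keys whose values still contain a {{ ... }} placeholder."""
    return sorted(k for k, v in connector["config"].items()
                  if PLACEHOLDER.search(str(v)))


def build_create_request(connect_url, connector):
    """Build (but do not send) the POST /connectors request."""
    leftover = unrendered_placeholders(connector)
    if leftover:
        raise ValueError(f"unrendered template values: {leftover}")
    return urllib.request.Request(
        url=connect_url.rstrip("/") + "/connectors",
        data=json.dumps(connector).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Illustrative values only; a real config carries the full set of keys above.
rendered = {
    "name": "syslog-sink-splunk",
    "config": {"tasks.max": "3", "topics": "syslog_auth"},
}
request = build_create_request("http://localhost:8083", rendered)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would submit it. That step is left out here so the sketch can run without a Connect worker.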
Testing
● Process 3TB/day data volumes.
● Prove the solution can scale horizontally.
Testing: Process 3TB/day data volumes.
● 3TB/day ≈ 35MB/second (3 × 10^12 bytes over 86,400 seconds)
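The conversion between a daily volume and a sustained rate is simple arithmetic, sketched below. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); with binary units the figures shift by a few percent. The function names are made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def mb_per_second(tb, bytes_per_tb=10**12, bytes_per_mb=10**6):
    """Sustained rate needed to move `tb` terabytes in one day."""
    return tb * bytes_per_tb / SECONDS_PER_DAY / bytes_per_mb


def tb_per_day(mb_s, bytes_per_tb=10**12, bytes_per_mb=10**6):
    """Daily volume produced by a sustained rate of `mb_s` megabytes per second."""
    return mb_s * bytes_per_mb * SECONDS_PER_DAY / bytes_per_tb


print(round(mb_per_second(3), 1))
print(round(tb_per_day(20), 2), round(tb_per_day(40), 2))
```

Under these assumptions the 3TB/day target sits inside the 20 to 40 MB/second load range quoted for the injectors, which covers roughly 1.7 to 3.5 TB/day.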
• 400 threads were set up in the Thread Group to simulate
400 servers sending the logs.
• Six load injectors were set up, totalling 2,400 threads
(simulated servers), to generate between 20MB/second and
40MB/second of load against the endpoint.
• The load was injected over a five-day period at a sustained
rate to ascertain the performance characteristics of each
component over a prolonged duration.
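The load-test figures can be cross-checked with a few lines of arithmetic. This sketch assumes the load is spread evenly across threads and uses decimal units; neither assumption is stated in the talk, and the function names are invented here.

```python
INJECTORS = 6
THREADS_PER_INJECTOR = 400
TEST_DAYS = 5
SECONDS_PER_DAY = 86_400

total_threads = INJECTORS * THREADS_PER_INJECTOR


def per_thread_kb_per_second(total_mb_per_second):
    """Average rate each simulated server must send, in kB/second."""
    return total_mb_per_second * 1000 / total_threads


def total_tb(total_mb_per_second, days=TEST_DAYS):
    """Total volume injected over the test at a sustained rate, in TB."""
    return total_mb_per_second * SECONDS_PER_DAY * days / 1_000_000


print(total_threads)
print(round(per_thread_kb_per_second(20), 1), round(per_thread_kb_per_second(40), 1))
print(round(total_tb(20), 2), round(total_tb(40), 2))
```

Each simulated server therefore only needs to send on the order of 10 kB/second, while the full five-day run moves several terabytes through every component.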
Carry on the conversation:
• Website: https://digitalis.io
• Reddit: https://reddit.com/users/digitalis_io
• Twitter: @digitalis_io
Any Questions?

More Related Content

PPTX
Stream me to the Cloud (and back) with Confluent & MongoDB
PDF
Transforming Financial Services with Event Streaming Data
PDF
Confluent x imply: Build the last mile to value for data streaming applications
PDF
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
PDF
Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...
PDF
Event-Streaming verstehen in unter 10 Min
PDF
Data reply sneak peek: real time decision engines
PPTX
Seamless Guest Experience with Kafka Streams
Stream me to the Cloud (and back) with Confluent & MongoDB
Transforming Financial Services with Event Streaming Data
Confluent x imply: Build the last mile to value for data streaming applications
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...
Event-Streaming verstehen in unter 10 Min
Data reply sneak peek: real time decision engines
Seamless Guest Experience with Kafka Streams

What's hot (20)

PPTX
Financial Event Sourcing at Enterprise Scale
PDF
Lead confluent HQ Dec 2019
PDF
Battle Tested Event-Driven Patterns for your Microservices Architecture - Dev...
PPTX
Modernizing your Application Architecture with Microservices
PDF
How Apache Kafka helps to create Data Culture – How to Cross the Kafka Chasm
PDF
Apache Kafka® Use Cases for Financial Services
PDF
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
PDF
batbern43 Events - Lessons learnt building an Enterprise Data Bus
PDF
Apache Kafka for Cybersecurity and SIEM / SOAR Modernization
PDF
Risk Management in Retail with Stream Processing (Daniel Jagielski, Virtuslab...
PPTX
Check Out our Rich Python Portfolio: Leaders in Python & Django‎
PPTX
Python Automation With Gauge + Selenium + API + Jenkins
PDF
Application Modernization Using Event Streaming Architecture (David Wadden, V...
PDF
Confluent Messaging Modernization Forum
PDF
Pivoting event streaming, from PROJECTS to a PLATFORM
PDF
Building a Secure, Tamper-Proof & Scalable Blockchain on Top of Apache Kafka ...
PPTX
Digital Transformation Mindset - More Than Just Technology
PDF
Battle Tested Event-Driven Patterns for your Microservices Architecture
PDF
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
PDF
Generali connection platform_full
Financial Event Sourcing at Enterprise Scale
Lead confluent HQ Dec 2019
Battle Tested Event-Driven Patterns for your Microservices Architecture - Dev...
Modernizing your Application Architecture with Microservices
How Apache Kafka helps to create Data Culture – How to Cross the Kafka Chasm
Apache Kafka® Use Cases for Financial Services
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
batbern43 Events - Lessons learnt building an Enterprise Data Bus
Apache Kafka for Cybersecurity and SIEM / SOAR Modernization
Risk Management in Retail with Stream Processing (Daniel Jagielski, Virtuslab...
Check Out our Rich Python Portfolio: Leaders in Python & Django‎
Python Automation With Gauge + Selenium + API + Jenkins
Application Modernization Using Event Streaming Architecture (David Wadden, V...
Confluent Messaging Modernization Forum
Pivoting event streaming, from PROJECTS to a PLATFORM
Building a Secure, Tamper-Proof & Scalable Blockchain on Top of Apache Kafka ...
Digital Transformation Mindset - More Than Just Technology
Battle Tested Event-Driven Patterns for your Microservices Architecture
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Generali connection platform_full
Ad

Similar to Security Information and Event Management with Kafka, Kafka Connect, KSQL and Logstash (20)

PPTX
Ivanti for msp
PDF
Log Analytics for Distributed Microservices
PPTX
Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...
PPTX
Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...
PPTX
Are Your Appliance Security Solutions Ready For 2048-bit SSL Certificates ?
PPTX
Meeting rooms are talking. Are you listening
PDF
TENDENCIAS DE SEGURIDAD PARA AMBIENTES EN LA NUBE
PPTX
Meeting rooms are talking! are you listening?
PDF
Zabbix – Powerful enterprise grade monitoring driven by Open Source by Wolfga...
PDF
Logstash and Maxmind: not just for GEOIP anymore
PDF
DDS Secure Intro
PDF
Elk its big log season
DOCX
Case StudyAutomotive - SSLVPN case study DIGIPASS BY VA
PPTX
The Boring Security Talk
PDF
Webzurich - The State of Web Security in Switzerland
PDF
Building Event-Driven Microservices using Kafka Streams (Stathis Souris, Thou...
PPTX
Web Application Debugging Webinar
PPTX
Introduction to ThousandEyes
PDF
Sydney Identity Summit: Addressing the New Threat Landscape with Continuous S...
PPTX
CrikeyCon VI - The Boring Security Talk
Ivanti for msp
Log Analytics for Distributed Microservices
Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...
Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...
Are Your Appliance Security Solutions Ready For 2048-bit SSL Certificates ?
Meeting rooms are talking. Are you listening
TENDENCIAS DE SEGURIDAD PARA AMBIENTES EN LA NUBE
Meeting rooms are talking! are you listening?
Zabbix – Powerful enterprise grade monitoring driven by Open Source by Wolfga...
Logstash and Maxmind: not just for GEOIP anymore
DDS Secure Intro
Elk its big log season
Case StudyAutomotive - SSLVPN case study DIGIPASS BY VA
The Boring Security Talk
Webzurich - The State of Web Security in Switzerland
Building Event-Driven Microservices using Kafka Streams (Stathis Souris, Thou...
Web Application Debugging Webinar
Introduction to ThousandEyes
Sydney Identity Summit: Addressing the New Threat Landscape with Continuous S...
CrikeyCon VI - The Boring Security Talk
Ad

More from confluent (20)

PDF
Stream Processing Handson Workshop - Flink SQL Hands-on Workshop (Korean)
PPTX
Webinar Think Right - Shift Left - 19-03-2025.pptx
PDF
Migration, backup and restore made easy using Kannika
PDF
Five Things You Need to Know About Data Streaming in 2025
PDF
Data in Motion Tour Seoul 2024 - Keynote
PDF
Data in Motion Tour Seoul 2024 - Roadmap Demo
PDF
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
PDF
Confluent per il settore FSI: Accelerare l'Innovazione con il Data Streaming...
PDF
Data in Motion Tour 2024 Riyadh, Saudi Arabia
PDF
Build a Real-Time Decision Support Application for Financial Market Traders w...
PDF
Strumenti e Strategie di Stream Governance con Confluent Platform
PDF
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
PDF
Building Real-Time Gen AI Applications with SingleStore and Confluent
PDF
Unlocking value with event-driven architecture by Confluent
PDF
Il Data Streaming per un’AI real-time di nuova generazione
PDF
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
PDF
Break data silos with real-time connectivity using Confluent Cloud Connectors
PDF
Building API data products on top of your real-time data infrastructure
PDF
Speed Wins: From Kafka to APIs in Minutes
PDF
Evolving Data Governance for the Real-time Streaming and AI Era
Stream Processing Handson Workshop - Flink SQL Hands-on Workshop (Korean)
Webinar Think Right - Shift Left - 19-03-2025.pptx
Migration, backup and restore made easy using Kannika
Five Things You Need to Know About Data Streaming in 2025
Data in Motion Tour Seoul 2024 - Keynote
Data in Motion Tour Seoul 2024 - Roadmap Demo
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
Confluent per il settore FSI: Accelerare l'Innovazione con il Data Streaming...
Data in Motion Tour 2024 Riyadh, Saudi Arabia
Build a Real-Time Decision Support Application for Financial Market Traders w...
Strumenti e Strategie di Stream Governance con Confluent Platform
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
Building Real-Time Gen AI Applications with SingleStore and Confluent
Unlocking value with event-driven architecture by Confluent
Il Data Streaming per un’AI real-time di nuova generazione
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Break data silos with real-time connectivity using Confluent Cloud Connectors
Building API data products on top of your real-time data infrastructure
Speed Wins: From Kafka to APIs in Minutes
Evolving Data Governance for the Real-time Streaming and AI Era

Recently uploaded (20)

PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PPTX
Programs and apps: productivity, graphics, security and other tools
PPTX
MYSQL Presentation for SQL database connectivity
PPTX
Spectroscopy.pptx food analysis technology
PDF
Approach and Philosophy of On baking technology
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PPTX
Big Data Technologies - Introduction.pptx
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PDF
Machine learning based COVID-19 study performance prediction
PDF
NewMind AI Weekly Chronicles - August'25-Week II
PDF
MIND Revenue Release Quarter 2 2025 Press Release
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Programs and apps: productivity, graphics, security and other tools
MYSQL Presentation for SQL database connectivity
Spectroscopy.pptx food analysis technology
Approach and Philosophy of On baking technology
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Building Integrated photovoltaic BIPV_UPV.pdf
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
The AUB Centre for AI in Media Proposal.docx
Diabetes mellitus diagnosis method based random forest with bat algorithm
“AI and Expert System Decision Support & Business Intelligence Systems”
Big Data Technologies - Introduction.pptx
20250228 LYD VKU AI Blended-Learning.pptx
Advanced methodologies resolving dimensionality complications for autism neur...
Unlocking AI with Model Context Protocol (MCP)
Per capita expenditure prediction using model stacking based on satellite ima...
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
Machine learning based COVID-19 study performance prediction
NewMind AI Weekly Chronicles - August'25-Week II
MIND Revenue Release Quarter 2 2025 Press Release

Security Information and Event Management with Kafka, Kafka Connect, KSQL and Logstash

  • 1. https://digitalis.io info@digitalis.io Security Information and Event Management with Kafka, Kafka Connect, KSQL and Logstash
  • 2. https://digitalis.io 2 Jason Bell ABOUT Working with Kafka since 2014, in development, support and now DevOps. Author of Machine Learning: Hands on for Developers and Technical Professionals, published by Wiley. Kafka DevOps Engineer
  • 3. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io What is SIEM?
  • 4. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io SIEM adoption originally driven from Payment Card Industry Data Security Standard (PCI DSS).
  • 5. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Data can come from various sources such as firewalls, anti-virus, login information and intrusion prevention systems.
  • 6. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io For example: A user does 20 failed login attempts. Has the user actually forgotten? Let’s class this as a low priority event. The user may have just forgotten their password and retried.
  • 7. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io A user does 140 failed login attempts in five minutes. This is more than likely a brute force attack and needs investigating.
  • 8. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Enterprise SIEM Problems
  • 9. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Enterprise SIEM Problems ● Large Volumes of Data.
  • 10. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Enterprise SIEM Problems ● Large Volumes of Data. ● Variety of log formats - RFC5424, RFC3164, Windows Events and other bespoke log formats from network devices.
  • 11. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Enterprise SIEM Problems ● Large Volumes of Data. ● Variety of log formats - RFC5424, RFC3164, Windows Events and other bespoke log formats from network devices. ● Regulatory compliance.
  • 12. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io “Virtually every regulatory compliance regime or standard such as GDPR, ISO 27001, PCI DSS, HIPAA, FERPA, Sarbanes-Oxley (SOX), FISMA, and SOC 2 have some requirements of log management to preserve audit trails of activity that addresses the CIA (Confidentiality, Integrity, and Availability) triad.” https://digitalis.io/blog/kafka/apache-kafka-and-regulatory-compliance/
  • 13. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Enterprise SIEM Problems ● Large Volumes of Data. ● Variety of log formats - RFC5424, RFC3164, Windows Events and other bespoke log formats from network devices. ● Regulatory compliance. ● High Availability Requirements
  • 14. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Enterprise SIEM Problems ● Large Volumes of Data. ● Variety of log formats - RFC5424, RFC3164, Windows Events and other bespoke log formats from network devices. ● Regulatory compliance. ● High Availability Requirements ● Downstream sometimes cannot keep up at peak times – 9am, DDoS events
  • 15. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Enterprise SIEM Problems ● Large Volumes of Data. ● Variety of log formats - RFC5424, RFC3164, Windows Events and other bespoke log formats from network devices. ● Regulatory compliance. ● High Availability Requirements ● Downstream sometimes cannot keep up at peak times – 9am, DDoS events ● Multiple consumers of data and connectivity to them ○ routing, transforming, filtering
  • 16. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Why use Kafka?
  • 17. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Why Kafka? ● High Availability
  • 18. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Why Kafka? ● High Availability ● Scalable
  • 19. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Why Kafka? ● High Availability ● Scalable ● High Throughput
  • 20. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Why Kafka? ● High Availability ● Scalable ● High Throughput ● Rich Ecosystem
  • 21. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Why Kafka? ● High Availability ● Scalable ● High Throughput ● Rich Ecosystem ● ksqlDB for Implementing Logic for Routing/Filtering/Transforming
  • 22. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Why Kafka? ● High Availability ● Scalable ● High Throughput ● Rich Ecosystem ● ksqlDB for Implementing Logic for Routing/Filtering/Transforming ● Buffering of data during high peak volumes – a shock absorber.
  • 23. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Kafka SIEM Architecture
  • 24. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io
  • 25. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Data Flows and Components
  • 26. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Topic and Outbound Data Flows
  • 27. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Data Ingestion
  • 28. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Data Ingestion ● Non-repudiation - fingerprinting source logs ● Transformation to JSON ● Non-standard syslog formats - bespoke grokking
  • 29. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io Logstash - Input
  • 30. ©2020 digitalis.io Ltd. Do not distribute without consent. https://digitalis.io TODO: Insert Logstash In->Filter-Out diagram
  • 31. Logstash Input – All Types
      input {
        udp {
          host => "0.0.0.0"
          port => 5140
          type => "rfc5424"
          tags => ["rfc5424"]
        }
        tcp {
          host => "0.0.0.0"
          port => 5140
          type => "rfc5424"
          tags => ["rfc5424"]
        }
        syslog {
          port => 5150
          type => "rfc3164"
          tags => ["rfc3164"]
        }
      }
  • 32. Logstash - Filtering
  • 33. Logstash Filter – RFC3164
      filter {
        if [type] == "rfc3164" {
          # rename and remove fields
          mutate {
            remove_field => [ "@version", "@timestamp" ]
            rename => { "host" => "client_addr" }
            rename => { "logsource" => "host" }
            rename => { "severity_label" => "severity" }
            rename => { "facility_label" => "facility" }
          }
        }
      }
  • 34. Logstash Filter – RFC5424
      filter {
        if [type] == "rfc5424" {
          # parse RFC5424 log (SYSLOG is a custom pattern loaded from patterns_dir)
          grok {
            patterns_dir => "/etc/logstash/patterns"
            match => [ "message", "%{SYSLOG}" ]
            tag_on_failure => [ "_grokparsefailure_syslog" ]
          }
          # rename fields and remove unneeded ones
          mutate {
            rename => { "syslog_facility" => "facility" }
            rename => { "syslog_severity" => "severity" }
            # message_syslog contains message content + extra data
            replace => { "message" => "%{message_syslog}" }
            remove_field => [ "@version", "facility_label", "@timestamp", "message_content", "message_syslog" ]
            rename => { "program" => "ident" }
            rename => { "timestamp_source" => "timestamp" }
            rename => { "host" => "client_addr" }
            rename => { "host_source" => "host" }
          }
        }
      }
  • 35. Logstash Filter – RFC JSON
  • 36.
      {
        "host": "testhost",
        "ident": "info",
        "message": "01070417:6: AUDIT - user admin - RAW: httpd(pam_audit): User=admin tty=(unknown) host=10.234.254.90 failed to login after 1 attempt….",
        "priority": "info",
        "facility": "local0",
        "client_addr": "10.234.254.90",
        "bucket": "2019042913",
        "evt_id": "33a3a040-6a7f-11e9-a8be-0050568115fd",
        "extradata": "[ ]",
        "fingerprint": "73dd765f55a1791b667bd6160235e3f6",
        "rawdata": ".....",
        "pid": "-",
        "msgid": "-",
        "timestamp": "2019-04-29T14:03:37.000000Z"
      }
  • 37. Logstash - Output
  • 38.
      output {
        # NOTE: the inputs shown on slide 31 tag events "rfc5424"; the tag tested
        # here must match whatever tag the pipeline actually sets.
        if "syslog_rfc5424" in [tags] {
          kafka {
            codec => json
            topic_id => "syslog_rfc5424"
            bootstrap_servers => "{{ confluent_ksql_bootstrap_servers }}"
            security_protocol => "SSL"
            ssl_key_password => "{{ logstash_ssl_key_password }}"
            ssl_keystore_location => "/etc/logstash/logstash.keystore.jks"
            ssl_keystore_password => "{{ logstash_ssl_keystore_password }}"
            ssl_truststore_location => "/etc/logstash/logstash.truststore.jks"
            ssl_truststore_password => "{{ logstash_ssl_truststore_password }}"
            compression_type => "snappy"
            acks => "1"
            retries => "3"
            retry_backoff_ms => "500"
            request_timeout_ms => "2000"
            batch_size => "32768"
            ssl_endpoint_identification_algorithm => "https"
            ssl_keystore_type => "jks"
          }
        }
      }
  • 39. Topic Filtering and Routing
  • 41. Filter / Routing ● Some downstream systems are not interested in INFO - too much data ● Some are only interested in Windows events for example.
  • 42.
      create stream syslog_rfc3164 (client_addr varchar, host varchar, timestamp varchar,
          severity varchar, message varchar, facility varchar, type varchar, priority varchar)
        with (KAFKA_TOPIC='syslog_rfc3164', VALUE_FORMAT='JSON');

      create stream auth_rfc3164
        with (KAFKA_TOPIC='syslog_auth', VALUE_FORMAT='JSON') AS
        SELECT * FROM syslog_rfc3164
        WHERE message LIKE '%password check failed for user%'
           OR message LIKE '%An account failed to log on.%'
           OR message LIKE '%%0xC000006D';

      create stream syslog_rfc5424 (facility varchar, message varchar, pid varchar, type varchar,
          timestamp varchar, ident varchar, client_addr varchar, host varchar, msgid varchar,
          extradata varchar, priority varchar)
        with (KAFKA_TOPIC='syslog_rfc5424', VALUE_FORMAT='JSON');

      create stream auth_rfc5424
        with (KAFKA_TOPIC='syslog_auth', VALUE_FORMAT='JSON') AS
        SELECT * FROM syslog_rfc5424
        WHERE message LIKE '%password check failed%'
           OR extradata LIKE '%|309|%'
           OR message LIKE '%An account failed to log on.%'
           OR message LIKE '%%0xC000006D';
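To make the routing rule easier to reason about, the WHERE clause of the `auth_rfc5424` stream can be mirrored as a plain predicate. This is an illustrative Python sketch, not part of the deployed pipeline; the function name `is_auth_failure` is hypothetical. In a LIKE pattern `%` matches any sequence of characters, so `'%text%'` is a substring test and a pattern with only leading wildcards is a suffix test.

```python
def is_auth_failure(event: dict) -> bool:
    """Mirror the auth_rfc5424 WHERE clause for a single decoded JSON event."""
    message = event.get("message") or ""
    extradata = event.get("extradata") or ""
    return (
        "password check failed" in message           # LIKE '%password check failed%'
        or "|309|" in extradata                      # LIKE '%|309|%'
        or "An account failed to log on." in message # LIKE '%An account failed to log on.%'
        or message.endswith("0xC000006D")            # LIKE '%%0xC000006D' (suffix match)
    )
```

Events for which the predicate is true correspond to the records that the stream writes to the `syslog_auth` topic; everything else stays only in the source topic.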
  • 43. Destinations and Sinks
  • 45. Destinations and Sinks ● Use existing connectors ● Build your own connectors
  • 46. Splunk HTTP Sink in Kafka Connect
  • 47.
      {
        "name": "syslog-sink-splunk",
        "config": {
          "connector.class": "SplunkHECSinkConnector",
          "tasks.max": "{{ tasks_max }}",
          "topics": "{{ topics }}",
          "splunk.endpoint.uri": "{{ splunk_endpoint_uri }}",
          "splunk.hec.token": "{{ splunk_hec_token }}",
          "splunk.index": "{{ splunk_index }}",
          "splunk.channelid": "{{ splunk_channelid }}",
          "splunk.sourcetype": "{{ splunk_sourcetype }}",
          "splunk.http.loglevel": "{{ splunk_http_loglevel }}",
          "value.converter": "org.apache.kafka.connect.json.JsonConverter",
          "value.converter.schemas.enable": "{{ splunk_value_converter_schemas_enable }}",
          "errors.tolerance": "{{ splunk_errors_tolerance }}",
          "errors.deadletterqueue.topic.name": "{{ errors_deadletterqueue_topic_name }}",
          "errors.deadletterqueue.topic.replication.factor": "{{ errors_deadletterqueue_topic_replication_factor }}"
        }
      }
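A connector definition of this shape is normally submitted to a Kafka Connect worker through its REST interface with a POST to `/connectors`. The sketch below only builds that request using the Python standard library; the helper name `build_create_connector_request` and the worker URL in the usage note are assumptions, and the templated `{{ ... }}` values would need to be rendered before submission.

```python
import json
import urllib.request


def build_create_connector_request(connect_url: str, definition: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /connectors request for a Connect worker."""
    body = json.dumps(definition).encode("utf-8")
    return urllib.request.Request(
        url=connect_url.rstrip("/") + "/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it, for example against a hypothetical worker at `http://localhost:8083`.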
  • 48. Testing
  • 50. Testing ● Process 3TB/day data volumes. ● Prove the solution can scale horizontally.
  • 51. Testing: Process 3TB/day data volumes. ● 3TB/day = 33MB/second
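The daily-volume to per-second conversion can be checked with a few lines of Python. This is a sketch assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the slides do not state which unit convention is used, and the function name `mb_per_second` is illustrative. Under that assumption 3TB/day works out to a little under 35MB/second, in the same range as the rounded figure on the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def mb_per_second(tb_per_day: float,
                  bytes_per_tb: int = 10**12,
                  bytes_per_mb: int = 10**6) -> float:
    """Convert a daily volume in TB to a sustained rate in MB/second."""
    return tb_per_day * bytes_per_tb / bytes_per_mb / SECONDS_PER_DAY


rate = mb_per_second(3)  # sustained rate needed for 3 TB/day
```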
  • 53. • 400 threads were set up in the Thread Group to simulate 400 servers sending the logs.
  • 54. • 6 load injectors were set up, totalling 2400 threads (simulated servers), in order to generate between 20MB/second and 40MB/second of load against the endpoint from the injectors.
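As a rough sanity check on these numbers, the load each simulated server must produce can be derived from the totals. The sketch below assumes the load is spread evenly across threads and uses decimal megabytes; both are assumptions, and the function name is illustrative.

```python
def per_thread_bytes_per_second(total_mb_per_second: float,
                                injectors: int = 6,
                                threads_per_injector: int = 400) -> float:
    """Bytes/second each simulated server sends if load is spread evenly."""
    threads = injectors * threads_per_injector  # 6 x 400 = 2400 simulated servers
    return total_mb_per_second * 1_000_000 / threads


low = per_thread_bytes_per_second(20)   # lower bound of the target range
high = per_thread_bytes_per_second(40)  # upper bound of the target range
```

Under these assumptions each simulated server sends on the order of 8 to 17 kilobytes per second.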
  • 55. • The load was injected at a sustained rate over a 5-day period to ascertain the performance characteristics of each component over a prolonged duration.
  • 56. Testing
  • 57. Carry on the conversation: • Website: https://digitalis.io • Reddit: https://reddit.com/users/digitalis_io • Twitter: @digitalis_io
  • 58. Any Questions?