Protecting the Web
at a scale using
consul and ELK
vaLentin chernoZemski • 07.Nov.2017
SiteGround • https://siteground.com
Who am I:
Валентин Черноземски / 33 / Bulgaria
SiteGround … 12 years
What I do
Computers for food and fun since I was 15
What I like
Science, art, history, science, *ing things
Dislike problems at work … :)
A problem of a hosting company
We host sites, sites get hacked
• brute-force attacks
• sql injection attacks
• command execution
• remote file disclosure
• unauthorized file upload
• … you name it
Malicious activity
• Spam
• Scam
• Phishing
• Malware
• Global sadness
• No silver bullet
Scale of the problem
● Managing more than 12 000 servers
● Hosting more than 1 000 000 web sites
● 60% WordPress, Drupal, Joomla!, Magento
● More than 120 000 000 malicious requests/day
● 10 000 malicious hits per server/day
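The per-server figure follows from the two totals above. A quick arithmetic check, using only the numbers quoted on this slide (the per-second rate is derived here and is not a figure from the talk):

```python
# Figures quoted on the slide.
servers = 12_000
malicious_requests_per_day = 120_000_000

# Average load per server and across the whole fleet.
per_server_per_day = malicious_requests_per_day / servers
fleet_per_second = malicious_requests_per_day / 86_400  # seconds in a day

print(per_server_per_day)        # 10000.0
print(round(fleet_per_second))   # 1389
```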
What did we need?
● Reliably detect ongoing attacks
● Block them before they succeed or reach other sites
● Avoid false positives as much as possible
● Make customers happy again
System requirements
● Scalable: handle the number of events
● Available: tolerate failures
● Easy to maintain: managed as code (git)
● Easy to extend: plug & play
● Data center aware: “web scale”
Layers TL;DR;
● Attack detection
● Data collection
● Data analysis
● Data distribution
● Data visualisation
Attack detection
Sensors and logs
• mod_security
• naxsi
• in-house
• authentication logs
• play-with-its-logs
Service
• apache
• nginx
• WordPress, Drupal, Magento, Joomla
• FTP, SMTP, POP3, IMAP, SSH
• plug-your-system-here
Data collection
Data collection layer
● filebeat: ship
● logstash: enrich
● elasticsearch: store
● kibana: visualise
Filebeat
● Read lines
● Send them over the wire
● As JSON
● In “real time” (see GOTCHA)
● Payload is encrypted
● Payload is signed and authenticated
● Fault tolerant with exponential backoff on failure
https://www.elastic.co/products/filebeat
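The "exponential backoff on failure" behaviour can be sketched in a few lines. This is a minimal illustration of the general technique, not Filebeat's implementation: the initial delay, factor, cap, and the `send` callable are all assumptions made for the example.

```python
import itertools
import time


def backoff_delays(initial=1.0, factor=2.0, maximum=60.0):
    """Yield retry delays in seconds, multiplying each time up to a cap."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * factor, maximum)


def send_with_backoff(send, payload, max_attempts=5, sleep=time.sleep):
    """Call send(payload); on ConnectionError wait, then retry.

    `send` is a hypothetical callable standing in for the network write.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload)
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            sleep(next(delays))


if __name__ == "__main__":
    # First eight delays with the assumed defaults.
    print(list(itertools.islice(backoff_delays(), 8)))
    # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Capping the delay keeps a long outage from pushing retries arbitrarily far apart, which matters when the goal is near-real-time shipping.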
Logstash
● Read events over the wire
● Real time
● Authenticate them
● Decrypt them
● Normalize and enrich them with grok filters
● Send them to backend for storage
● Support multiple storage backends
● Fault tolerant in case of storage backend outage
https://www.elastic.co/products/logstash
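Grok filters are essentially named regular-expression captures that turn a raw log line into a structured event. The sketch below shows the same idea with Python's `re` module rather than Logstash itself; the log line format and the field names (`src_ip`, `event_type`, and so on) are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical authentication-log format, for illustration only.
FAILED_LOGIN = re.compile(
    r"(?P<timestamp>\S+) (?P<host>\S+) (?P<service>\w+): "
    r"failed login for (?P<user>\S+) from "
    r"(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})"
)


def normalize(line):
    """Return a structured event dict, or None if the line does not match."""
    match = FAILED_LOGIN.match(line)
    if match is None:
        return None
    event = match.groupdict()           # normalize: named fields
    event["event_type"] = "failed_login"  # enrich: add a derived field
    return event


if __name__ == "__main__":
    line = "2017-11-07T10:00:00Z web42 ftp: failed login for admin from 203.0.113.7"
    print(normalize(line))
```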
Elasticsearch
● Search engine
● Store and index arbitrary “documents”
● Scalable
● Highly available
● Fault tolerant by design
● Query large data sets
● Allows us to get meaning from vast amounts of data
https://www.elastic.co/products/elasticsearch
Kibana
● A picture is worth a thousand words
● Visualise data stored in elasticsearch
● Visualisers
● Dashboards
● Time series databases
● Graph explorer
https://www.elastic.co/products/kibana
Protecting the Web at a scale using consul and Elk / Valentin Chernozemski (SiteGround)
Data analysis
Block list generation
● Uses data stored in elasticsearch as input
● Custom rules
● Produces ip and network blocklists as output
● A lot of room for improvement
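The slides do not describe the custom rules, so the following is only a sketch of what threshold-based blocklist generation could look like. The event shape (a dict with a `src_ip` field), both thresholds, and the /24 aggregation are assumptions, and the code handles IPv4 only.

```python
import ipaddress
from collections import Counter, defaultdict


def build_blocklist(events, ip_threshold=20, net_threshold=8):
    """Turn attack events into a sorted list of blocked IPv4 networks.

    events: iterable of dicts with a "src_ip" key (hypothetical shape).
    An IP is blocked after ip_threshold hits; a whole /24 is blocked
    once net_threshold distinct blocked IPs fall inside it.
    """
    hits = Counter(e["src_ip"] for e in events)
    blocked_ips = {ip for ip, n in hits.items() if n >= ip_threshold}

    # Group blocked IPs by their enclosing /24.
    per_net = defaultdict(set)
    for ip in blocked_ips:
        net = ipaddress.ip_network(f"{ip}/24", strict=False)
        per_net[net].add(ip)
    networks = {net for net, ips in per_net.items() if len(ips) >= net_threshold}

    # Keep single IPs only when no blocked network already covers them.
    singles = {
        ipaddress.ip_network(f"{ip}/32")
        for ip in blocked_ips
        if not any(ipaddress.ip_address(ip) in net for net in networks)
    }
    return [str(n) for n in sorted(networks | singles)]
```

In the pipeline described here, the `events` input would come from an Elasticsearch query and the output would be written to the consul KV store.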
Distribution in 60 seconds
Data distribution layer
● consul: central KV store and services catalog
● consul-replicate: cross-DC consul data distribution
● consul-template: handle dynamic configurations
consul
● Simplify service discovery and service configuration
● Join all machines in a cluster
● Failure detection on top of GOSSIP
● Built-in KV store: master -> slave
● Distributed locking in KV
● Event message distribution on top of GOSSIP
● Easy to query: DNS, HTTP, CLI
● Concept to use your DC as a database
https://www.consul.io/
consul security
● Clear threat model
● Encrypted communication (pre shared key)
● Authenticated membership (signed SSL certificate)
● Per client token(s)
● Per token ACL roles
● Per endpoint ACL roles
● Does not require extra privileges
https://www.consul.io/docs/internals/security.html
consul scalability
● Highly available: multiple servers, raft, built-in failover
● Highly scalable: consistent or stale responses
● Data center aware: members, states
● Services aware: registered services, service states
● Low net overhead: GOSSIP
https://www.consul.io/docs/guides/index.html
consul watches
● Key watches: /role/nginx/reload
● Key prefix watches: /role/ipset/blocked
● Service(s) watches: “logstash service registered on node X”
● Checks watches: “elasticsearch failing on node Y”
● Nodes watches: “example.com joined cluster”
● Event watches: “eventname” {payload < 100 bytes}
https://www.consul.io/docs/agent/watches.html
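A key prefix watch is built on top of consul's blocking queries against the KV HTTP endpoint. The sketch below only builds the query URL and decodes a KV response body (values are returned base64-encoded); it makes no network calls. The agent address is the usual local default, while the helper names and the wait value are assumptions.

```python
import base64
import json
from urllib.parse import quote, urlencode


def kv_prefix_url(prefix, index=0, wait="55s", agent="http://127.0.0.1:8500"):
    """Build a blocking-query URL for every key under a KV prefix.

    Passing the last seen index makes the agent hold the request
    until something under the prefix changes or `wait` elapses.
    """
    query = urlencode({"recurse": "true", "index": index, "wait": wait})
    return f"{agent}/v1/kv/{quote(prefix.lstrip('/'))}?{query}"


def decode_kv(body):
    """Decode a KV response body into {key: value-as-text}."""
    entries = json.loads(body)
    return {
        e["Key"]: base64.b64decode(e["Value"] or "").decode()
        for e in entries
    }


if __name__ == "__main__":
    print(kv_prefix_url("/role/ipset/blocked"))
```

A real watcher would loop: fetch the URL, read the new index from the `X-Consul-Index` response header, act on the decoded entries, and query again with that index.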
consul in production?
● Cluster size: 1900 and growing
● DCs: 6 and growing
● Easy to integrate via common tools, HTTP, CLI
● It complements ansible, puppet, etc.
● Awesome docs
● Community driven
https://www.consul.io/intro/vs/index.html
consul-replicate
● Replicate specific KV prefixes
● Skip prefixes that are not important
● As easy as a Go binary
This is your master dc | This KV is important | Sync it
● Running on each consul server node
● Only one instance active at a time - HA with “consul lock”
https://github.com/hashicorp/consul-replicate/
consul-template
● Read / monitor arbitrary consul data (from KV, service catalog, etc.)
● Parse Go template format
{{range service "logstash"}}
hosts: "{{.Address}}"
{{end}}
● Output is rendered on disk, e.g. /etc/filebeat/filebeat.yml
● Execute a post-render handler, e.g. service filebeat reload
https://github.com/hashicorp/consul-template
consul-template + service discovery
● /etc/filebeat/filebeat.yml: logstash nodes
● /etc/logstash/output.conf: elasticsearch nodes
● /etc/nginx/captcha.conf: captcha nodes
● Node with service/role X added/removed/failed
● Render configs & reload services on state change
https://github.com/hashicorp/consul-template
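The render-then-reload cycle can be illustrated without consul-template itself. This sketch mimics it in Python for the Filebeat case: given the addresses of the currently healthy logstash nodes, render the output section and trigger a reload only when the rendered text changed. The port 5044, the function names, and the `reload` callable are assumptions for the example.

```python
def render_filebeat_output(nodes, port=5044):
    """Render the logstash output section for a set of node addresses."""
    hosts = ", ".join(f'"{addr}:{port}"' for addr in sorted(nodes))
    return f"output.logstash:\n  hosts: [{hosts}]\n"


def apply_if_changed(current, nodes, reload):
    """Return the new config text; call reload() only on a real change.

    `reload` is a hypothetical callable, e.g. one that runs
    "service filebeat reload".
    """
    rendered = render_filebeat_output(nodes)
    if rendered != current:
        reload()
    return rendered


if __name__ == "__main__":
    print(render_filebeat_output(["10.0.0.2", "10.0.0.1"]), end="")
```

Sorting the addresses keeps the output stable, so a reordered catalog response alone does not cause a reload.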
consul-template + kv block distribution
● /etc/sysconfig/ipset: blocked networks
● /etc/nginx/blacklist.conf: blocked networks
https://github.com/hashicorp/consul-template
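As a rough idea of what the two rendered files might contain, the sketch below turns a list of blocked networks into nginx `deny` lines and into ipset restore-format lines. The set name `blocked` and the `hash:net` set type are assumptions, and the talk does not show the actual templates. Each entry is validated first so that a malformed KV value cannot inject arbitrary configuration.

```python
import ipaddress


def _validated(networks):
    """Normalize entries to CIDR strings; raises ValueError on bad input."""
    return [str(ipaddress.ip_network(n)) for n in networks]


def render_nginx_blacklist(networks):
    """One nginx `deny` directive per blocked network."""
    return "".join(f"deny {n};\n" for n in _validated(networks))


def render_ipset_restore(networks, set_name="blocked"):
    """Lines in the format read by `ipset restore`."""
    lines = [f"create {set_name} hash:net"]
    lines += [f"add {set_name} {n}" for n in _validated(networks)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    nets = ["203.0.113.0/24", "198.51.100.1/32"]
    print(render_nginx_blacklist(nets), end="")
    print(render_ipset_restore(nets), end="")
```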
Results
Deployment results
● More than 12 000 servers protected
● More than 1 000 000 web sites protected
● Global attack detection latency: ~5 seconds
● Global blocklist distribution: 60 seconds (by design)
Webapps bruteforce filtering rate
● WordPress, Drupal, Joomla etc. bruteforce attacks rendered useless
Filtered malicious http(s) requests
● 120 million / 24 hours
● 800 million / 7 days
● 3 billion / 30 days
Services bruteforce filtering rate
● ftp, smtp, imap etc. bruteforce attacks rendered useless
Failed logins rate
● ~52 failed logins / second (on the whole infrastructure)
● ~8 failed logins / second / per service
● ~0.00066 failed logins / second / service / per server
● ~0.00000874 failed logins / second / service / per domain
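The per-server and per-domain rates are easier to read once converted back into intervals. A small arithmetic sketch using only the figures on this slide and the server count quoted earlier; the derived interval and domain count are computed here and are not stated in the talk:

```python
# Figures quoted on the slides.
per_service = 8                 # failed logins / second / service
servers = 12_000
per_domain = 0.00000874         # failed logins / second / service / domain

# Rate per service on a single server.
per_server = per_service / servers
print(f"{per_server:.5f}")      # 0.00067

# The same rate expressed as an interval between failed logins.
minutes_between = servers / per_service / 60
print(minutes_between)          # 25.0

# Number of domains implied by the per-domain figure.
implied_domains = per_service / per_domain
print(round(implied_domains, -3))
```

In other words, on average each service on each server sees roughly one failed login every 25 minutes, which is too slow for a brute-force attack to be practical.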
So what is so cool?
● Scalable
● Available
● Fault tolerant
● Self assembled - built in service discovery
● Event based
● Easy to manage
● Easy to extend
Notes
TODO
● Make system real time
● filebeat tuning
● more logstash ingest nodes
● more elasticsearch index nodes for faster indexing
● Plug in more sensors and correlations
● Block attackers earlier
● Blocklist sharing and automated abuse reports to ISPs
Gotchas
● consul ARP traffic - https://www.youtube.com/watch?v=LUgE-sM5L4A
● filebeat realtime log shipping configuration is not that straightforward
● elasticsearch is feeling much better on top of SSDs … for obvious reasons
● A machine with role X exports a set of services [Y,Z]
● Services can later be referenced by consul-template
● consul GOSSIP events payload size is limited (< 100 bytes)
Why don’t you?
● The problem can be solved in many ways
● No single solution solves every case
● System fits our stack and requirements
● Security is a process, systems are adaptive and evolve
● Qrator, Cloudflare, Sucuri etc. … our system does not replace them but rather complements them
EOF
Tear me apart now :)
Questions?
valentin <-[o]-> siteground.com
