Open source tools for optimizing
your peering infrastructure
@ DE-CIX TechMeeting 2018-06-06
by Daniel Czerwonk
• Software / Network Engineer at Mauve Mailorder Software
• Head of Network Freifunk Essen e.V.
• AS44821 (Mauve), AS206356 (Freifunk Essen e.V.),
AS202739 (routing-rocks)
• birdwatcher and bio-routing contributor
• Twitter: @dan_nrw
• GitHub: https://github.com/czerwonk
• LinkedIn: https://www.linkedin.com/in/czerwonk/
Who is this guy? About me…
Our journey starts late 2016
A new networking setup is about to
be built
But before that:
Let’s talk about monitoring…
• Very small operations team
• Freifunk Essen should demand even less ops effort
• Identify trends/anomalies early
• Capacity planning (beware of retention)
• Source for alerting
• Starting point for traffic engineering, etc.
• Basis for post-mortems (in case of an outage)
• Dashboard to give a quick overview when needed
Why is monitoring important for me?
So, let’s build a monitoring system…
• Prometheus to collect metrics
• Grafana to visualize metrics
• Alertmanager with Pushover integration for alerting
• Everything Ansible managed
What I wanted…
• Bird routing daemon
• JunOS running on a few EX series switches
• Host metrics from bare metal software router machines (statistics, resources)
• External network latencies (RIPE ATLAS, etc.)
What I wanted to scrape?
What I found…
In 2016…
Metric             Solution                                    Problem
bird               -                                           no exporter available
JunOS              snmp_exporter                               complex configuration, bad performance
Host metrics       node_exporter                               -
Network latencies  blackbox_exporter with external probe VMs   bad coverage, only one request per scrape
• Official Prometheus project
• On Linux hosts (e.g. Routers)
• Network interface metrics
• Resource consumption: CPU load, RAM usage, Disk space
• Interrupts / context switches
• License: Apache 2.0
• Source: https://github.com/prometheus/node_exporter
node_exporter
At least we got the host metrics covered.
And the rest?
I had to solve that…
So I started to write some
exporters…
• Performance is a key feature
• Need for concurrent processing
• Single binary / no dependencies
• Easy installation via go get …
• Existing client API for Prometheus
• Love writing code in golang in my spare time
Which programming language?
I chose golang:
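The concurrency point above can be sketched with the standard library alone. This is a minimal illustration, not code from any of the exporters: `collectAll`, `render`, the `probe` callback and the metric name `example_probe_value` are all hypothetical, and a real exporter would use the Prometheus Go client instead of formatting text by hand.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// sample is one metric value for one target.
type sample struct {
	target string
	value  float64
}

// collectAll runs probe for every target in its own goroutine and waits
// for all of them. probe is a hypothetical stand-in for the per-device
// work an exporter does (socket query, SSH call, ICMP probe).
func collectAll(targets []string, probe func(string) float64) []sample {
	results := make([]sample, len(targets))
	var wg sync.WaitGroup
	for i, t := range targets {
		wg.Add(1)
		go func(i int, t string) {
			defer wg.Done()
			// Each goroutine writes to its own slice index, so no lock is needed.
			results[i] = sample{target: t, value: probe(t)}
		}(i, t)
	}
	wg.Wait()
	return results
}

// render formats the samples in the Prometheus text exposition format,
// sorted by target so the output is stable.
func render(name string, samples []sample) string {
	sort.Slice(samples, func(a, b int) bool { return samples[a].target < samples[b].target })
	var sb strings.Builder
	fmt.Fprintf(&sb, "# TYPE %s gauge\n", name)
	for _, s := range samples {
		fmt.Fprintf(&sb, "%s{target=%q} %g\n", name, s.target, s.value)
	}
	return sb.String()
}

func main() {
	// Dummy probe: returns the length of the target name.
	probe := func(t string) float64 { return float64(len(t)) }
	fmt.Print(render("example_probe_value", collectAll([]string{"r2", "r1"}, probe)))
}
```

Because every target is probed in parallel, one slow device delays the scrape by its own latency only, rather than by the sum of all latencies.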
Milestones to an exporter suite
[Timeline figure, 2016 to 2018. Entries: bird_exporter (Bird 1.x); atlas_exporter (RIPE ATLAS); RIPE Labs article; junos_exporter (Juniper JunOS using SNMP); support for bird 2.x; SNMP replaced by SSH; ping_exporter (ICMP probing); mikrotik-exporter (RouterOS)]
• Started late 2016
• Communicates with bird via socket
• Bird 1.x and 2.x supported
• Protocols: BGP, OSPFv2, OSPFv3, Kernel, Static, Device, Direct
• License: MIT
• Source: https://github.com/czerwonk/bird_exporter
bird_exporter
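To make the idea of turning bird's protocol list into metrics more concrete, here is a rough sketch. It assumes the column layout of a BIRD 1.x `show protocols` listing (name, proto, table, state, since, info) and that any framing added by the control socket has already been stripped. The sample output and the helpers `parseProtocolLine` and `countUp` are made up for illustration and are not taken from bird_exporter.

```go
package main

import (
	"fmt"
	"strings"
)

// protocol holds the fields of one listing line that this sketch cares about.
type protocol struct {
	Name  string
	Proto string
	Up    bool
}

// parseProtocolLine parses one line of the assumed layout
// "name proto table state since info". It returns false for the
// header line and for lines with too few columns.
func parseProtocolLine(line string) (protocol, bool) {
	f := strings.Fields(line)
	if len(f) < 4 || (f[0] == "name" && f[1] == "proto") {
		return protocol{}, false
	}
	return protocol{Name: f[0], Proto: f[1], Up: f[3] == "up"}, true
}

// countUp counts protocols of the given type whose state is "up".
func countUp(output, proto string) int {
	n := 0
	for _, line := range strings.Split(output, "\n") {
		if p, ok := parseProtocolLine(line); ok && p.Proto == proto && p.Up {
			n++
		}
	}
	return n
}

func main() {
	// Hypothetical sample listing.
	output := `name     proto    table    state  since       info
kernel1  Kernel   master   up     2018-06-01
upstream BGP      master   up     2018-06-01  Established
peer1    BGP      master   start  2018-06-02  Active`
	fmt.Println(countUp(output, "BGP")) // prints 1
}
```

`countUp(output, "BGP")` computes roughly what a query such as `count(bird_protocol_up{proto="BGP"} == 1)` returns on the Prometheus side.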
bird_exporter
bird_protocol_prefix_import_count{proto=~"BGP|OSPFv3",ip_version="6"}
count(bird_protocol_up{proto="BGP"} == 1)
• BGP session state metrics
• BGP message counts (received, sent, withdrawn, etc.)
• Prefix counts for all supported protocols (imported, exported, filtered, etc.)
• OSPFv2/OSPFv3 neighbour counts
• Protocol uptime
bird_exporter - Features
• Started early 2018
• Replacement for RRD based smokeping
• For ICMP, also a replacement for blackbox_exporter, which lacks loss
detection
• Based on go-ping by Digineo: https://github.com/digineo/go-ping
• License: MIT
• Source: https://github.com/czerwonk/ping_exporter
ping_exporter
ping_exporter
ping_rtt_mean_ms{ip_version="6"}
ping_loss_percent{ip_version="4"}
• Sends and aggregates multiple ICMP ECHO requests
• Roundtrip metrics (current, best, worst)
• Simple way to detect loss
• Supports multiple targets
• DNS refresh ensures the correct IP is measured when DNS is changed
• Only ICMP support at the moment
• Warning: ICMP is not user traffic so keep that in mind when trying to interpret these
metrics
ping_exporter - Features
• Started early 2017
• Metrics by requesting measurement results from RIPE ATLAS
• Useful to get an outside view from various other networks
• License: LGPL3 (since the binding used is under this license)
• Source: https://github.com/czerwonk/atlas_exporter
• More info:
https://labs.ripe.net/Members/daniel_czerwonk/using-ripe-atlas-measurement-results-in-prometheus-with-atlas_exporter
atlas_exporter
atlas_exporter
avg(atlas_ping_avg_latency{ip_version="4"}) by (asn)
avg(atlas_traceroute_hops{ip_version="4"}) by (asn)
• Ping (success, min/max/avg latency, dups, size)
• Traceroute (success, hop count, rtt)
• NTP (delay, derivation, ntp version)
• DNS (success, rtt)
• HTTP (return code, rtt, http version, header size, body size)
• SSL Certificates (alert, rtt)
atlas_exporter - Features
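For readers less familiar with PromQL, the `avg(...) by (asn)` queries shown for atlas_exporter group the per-probe values by the probe's ASN and average each group. A small sketch of that computation, using made-up sample data with ASNs from the documentation range and a hypothetical `avgByASN` helper:

```go
package main

import (
	"fmt"
	"sort"
)

// probeResult is one latency measurement reported by a probe located
// in the given autonomous system.
type probeResult struct {
	ASN       string
	LatencyMs float64
}

// avgByASN groups results by ASN and returns the mean latency per group,
// which is what avg(...) by (asn) computes on the Prometheus side.
func avgByASN(results []probeResult) map[string]float64 {
	sum := map[string]float64{}
	count := map[string]int{}
	for _, r := range results {
		sum[r.ASN] += r.LatencyMs
		count[r.ASN]++
	}
	avg := make(map[string]float64, len(sum))
	for asn, s := range sum {
		avg[asn] = s / float64(count[asn])
	}
	return avg
}

func main() {
	// Made-up sample data; 64496 and 64497 are documentation ASNs.
	avg := avgByASN([]probeResult{
		{ASN: "64496", LatencyMs: 10},
		{ASN: "64496", LatencyMs: 20},
		{ASN: "64497", LatencyMs: 8},
	})
	asns := make([]string, 0, len(avg))
	for asn := range avg {
		asns = append(asns, asn)
	}
	sort.Strings(asns)
	for _, asn := range asns {
		fmt.Printf("asn=%s avg_latency_ms=%g\n", asn, avg[asn])
	}
}
```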
• Started late 2017
• snmp_exporter did not perform as required
• First implementation using a simple set of SNMP OIDs
• Early 2018: reimplementation using SSH and XML RPC representation
• Alternative to Juniper's OpenNTI, since telemetry is only supported on newer
versions of JunOS and hardware
• License: MIT
• Source: https://github.com/czerwonk/junos_exporter
junos_exporter
• Interfaces (bytes transmitted/received, errors, drops)
• Routes (per table, by protocol)
• Alarms (count)
• BGP (message count, prefix counts per peer, session state)
• OSPFv2, OSPFv3 (number of neighbours)
• Interface diagnostics (optical signals)
• ISIS (number of adjacencies, total number of routers)
• Environment (temperatures)
• Routing engine statistics
junos_exporter - Features
• Contribution to existing project
• Offered only interface and resource metrics at that point
• Added several other features
• License: BSD3
• Source: https://github.com/nshttpd/mikrotik-exporter
mikrotik-exporter
• Interface metrics (RX bytes, TX bytes, drops, errors, etc.)
• BGP session states
• BGP message counts (updates, withdraws)
• DHCP leases
• DHCPv6 bindings
• Optical diagnostics
• IPv4/IPv6 pool counts
• System resources (memory, CPU load, etc.)
• Prefix counts per protocol (in RIB)
mikrotik-exporter - Features
Dashboard examples
How to combine several exporters?
Mauve Network Overview
Mauve Routing
Alerting
When and how?
How to alert?
What the SRE book has taught us:
https://landing.google.com/sre/book/chapters/monitoring-distributed-systems.html
How to alert? A few examples…
Port saturation:
Upstream session down:
Thank you for your attention.
Special thanks to everyone who contributed to my projects!

