Monitoring With Riemann
About
(def about-me
  {:name      "Abhishek Anand Amralkar"
   :shortname "@aaa"
   :from      "Talentica Software Pvt. Ltd"
   :social    {:blog    "https://medium.com/@aamralkar"
               :twitter "https://twitter.com/aamralkar"
               :github  "https://github.com/abhishekamralkar"}})
Agenda
● What is Riemann?
● Why Riemann?
● Concepts of Riemann
○ Indexes
○ Streams
○ Events
● Configuration walkthrough
Why Monitoring?
● To understand business risks.
● To plan for capacity.
● To catch issues early in infrastructure and applications.
● To comply with the SLAs promised to customers.
● To know which part of the infrastructure or application is broken before a
customer has to tell us.
Why Monitoring?
● To make sure the application is stable, performs within its SLA, and is
available across the globe.
● Systems are getting complex. Humans need sleep.
What needs to be Monitored?
● Cloud
● Infrastructure
● Operating Systems
● Business Applications
How Many Types of Monitoring?
● Push
● Pull
● Customers: the good ones, who email or chat with support to say,
“Dear XYZ, your application's foobar module is not working.
Please check.”
Challenges in Distributed Systems Monitoring?
● Multiple distributed machines emitting hundreds of thousands of metrics every
second.
● Which metrics need to be monitored?
● Which metrics should trigger alerts?
● How frequently should alerts be sent?
Challenges in Distributed Systems Monitoring?
● Storing the useful metrics.
● Real-time metrics, so that we know when a problem occurs.
● Real-time alerting when something in the system fails.
● Sending metrics to enterprise monitoring solutions is costly.
● Informative dashboards across all environments.
What Is Riemann?
In short: “Riemann is an event aggregator.”
● A monitoring tool that aggregates events from servers and from the
applications running on those servers.
● Riemann uses a powerful stream processing language, written in Clojure, to
aggregate the events.
Why Riemann?
● A low-latency event processing and monitoring engine.
● Can process millions of events per second.
● Built around events and event stream processing, which makes it highly
adaptable and well suited to dynamic, distributed systems.
● The Riemann configuration file is a Clojure program.
● Riemann is highly configurable.
Why Riemann?
● Full control over your infrastructure and application monitoring and alerting.
● Can monitor anything.
● Comes with its own instrumentation to measure its own performance.
● Can send alerts by email, chat, SMS, and more.
● Can connect to backend time series databases such as InfluxDB and Graphite
to store metrics as historical data.
Riemann in our case
Riemann Events
● Events are Riemann's base construct.
● Riemann receives events and processes them.
● Events are structs (records) that Riemann treats as immutable maps.
● Event fields are referenced by keywords in the config, such as :host, :service, and :tags.
● Apart from the standard fields, events can also carry custom fields.
What Does an Event Look Like? A Clojure map (immutable,
of course)
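As a rough illustration of the slide above, an event can be pictured as the plain Clojure map below. This is only a sketch: the host name, the values, and the custom :datacenter field are made up for the example, while :host, :service, :state, :metric, :time, :ttl, :tags, and :description are Riemann's standard event fields.

```clojure
;; A hypothetical event written as a plain Clojure map.
;; All values are illustrative assumptions, not real data.
(def example-event
  {:host        "web-01.example.com"   ; assumed host name
   :service     "cpu"
   :state       "ok"
   :metric      0.42
   :time        1700000000             ; Unix epoch seconds
   :ttl         60                     ; seconds the event stays valid
   :tags        ["production" "web"]
   :description "CPU utilisation sample"
   :datacenter  "eu-west"})            ; custom (non-standard) field

;; Keywords act as functions of maps, so fields are read like this:
(:host example-event)
(:datacenter example-event)
```

Because the map is immutable, "changing" a field (for example with assoc) returns a new map and leaves the original event untouched.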
Riemann Streams
● Streams are Clojure functions that we can define.
● Streams are defined in the streams section of the Riemann config file.
● Streams can have child streams.
● Events are passed to streams for aggregation, modification, and alerting.
● A Riemann config can have as many streams as you like.
● Riemann comes with a powerful stream processing language.
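A minimal sketch of a streams section, assuming a stock Riemann config where prn, where, and info are available. The log messages and the choice of the "critical" state are assumptions made for illustration.

```clojure
(streams
  ;; A top-level stream: prn prints every incoming event.
  prn

  ;; A stream with a child stream: only events whose :state is
  ;; "critical" are passed on to the child, which logs them.
  (where (state "critical")
    #(info "critical event:" %)))
```

Each top-level form inside (streams ...) receives every event, and a stream such as where decides which events its children see.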
Riemann Indexes
● A table of the current state of all services tracked by Riemann.
● Each event is uniquely indexed by its host and service. The index just keeps
track of the most recent event for a given (host, service) pair.
● Indexed events can have a TTL (time to live).
● Events in the index expire after their TTL.
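The index and TTL behaviour described above might be wired up roughly as follows. This is a sketch that closely follows the shape of a default Riemann config. The 5-second expiry scan, the 60-second default TTL, and the log message are assumed values.

```clojure
;; Scan the index for expired events every 5 seconds (assumed interval).
(periodically-expire 5)

(let [index (index)]
  (streams
    ;; Events that arrive without a :ttl get 60 seconds by default,
    ;; then are stored in the index under their (host, service) pair.
    (default :ttl 60
      index)

    ;; When an indexed event outlives its TTL, an event with
    ;; :state "expired" flows through the streams; log it here.
    (expired
      #(info "expired:" %))))
```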
Riemann Config
● Logging is set up with a valid Clojure expression: a call to the init function
in the logging namespace, which takes a map {:key value}.
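The call described here usually looks like the line below. The log file path is an assumption and depends on how Riemann is installed.

```clojure
;; Call init in the logging namespace with a map of options.
;; The path is an assumed example; adjust it for your installation.
(logging/init {:file "/var/log/riemann/riemann.log"})
```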
Riemann Config
● Clojure's let creates a lexical scope. It takes a vector of one or more bindings (in our case, the
Riemann host), followed by function calls such as tcp-server and udp-server.
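A sketch of the let form this slide describes, assuming Riemann should listen on the loopback address. The address is an example value.

```clojure
;; Bind the listen address once, then reuse it for each server.
(let [host "127.0.0.1"]          ; assumed address
  (tcp-server {:host host})
  (udp-server {:host host}))
```

Binding host in one place means that changing the listen address requires a single edit.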
Built-in Filtering Streams
● where
● match
● tagged
● tagged-all
● expired
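The filtering streams listed above could be used roughly as follows. This is a sketch, not a recommended config: the service names, tags, threshold, and the custom :datacenter field are all assumptions made for illustration.

```clojure
(streams
  ;; where: standard fields are referenced by bare name in the predicate.
  ;; The bare `metric` guards against events that carry no metric.
  (where (and (service "cpu") metric (> metric 0.9))
    #(info "high cpu:" %))

  ;; where with a custom field: read it from the bound `event`.
  (where (= (:datacenter event) "eu-west")
    #(info "eu-west event:" %))

  ;; match: applies a function (here :service) to the event and
  ;; compares the result with a value, in this case a regex.
  (match :service #"^zookeeper"
    #(info "zookeeper event:" %))

  ;; tagged: events carrying the given tag.
  (tagged "production"
    #(info "production event:" %))

  ;; tagged-all: events carrying every tag in the list.
  (tagged-all ["production" "web"]
    #(info "production web event:" %))

  ;; expired: events whose state is "expired".
  (expired
    #(info "expired:" %)))
```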
Rate-Limiting Streams
● rollup
● throttle
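To make the difference between the two concrete, here is a hedged sketch. The mailer settings, the email addresses, the states, and the limits are all example assumptions.

```clojure
;; Hypothetical mail setup; both addresses are placeholders.
(def email (mailer {:from "riemann@example.com"}))

(streams
  ;; rollup: send at most 5 emails per 3600 seconds. Events beyond
  ;; the limit are accumulated and delivered together as a summary.
  (where (state "error")
    (rollup 5 3600
      (email "ops@example.com")))

  ;; throttle: pass at most 1 event every 60 seconds and drop the rest.
  (where (state "warning")
    (throttle 1 60
      (email "ops@example.com"))))
```

The design choice is whether surplus events matter: rollup keeps them and batches them, while throttle discards them.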
Coalesce
The coalesce stream remembers the last events from each host and service, and
sends them all as a vector to its children. We can map that vector of events to a
single event--the one with the largest metric--using folds/maximum. Then we just
set the service and host, since this event pertains to the system as a whole.
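The paragraph above can be sketched as the config fragment below. The "cpu" service filter and the "max cpu" service name are assumptions chosen for the example.

```clojure
(let [index (index)]
  (streams
    (where (service "cpu")            ; assumed service name
      ;; coalesce emits a vector holding the latest event per
      ;; (host, service) pair.
      (coalesce
        ;; smap maps that vector to one event: the one with the
        ;; largest metric.
        (smap folds/maximum
          ;; The result describes the whole system, so clear the
          ;; host and give it its own service name before indexing.
          (with {:host nil :service "max cpu"}
            index))))))
```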
Event Grouping
● moving-time-window
● moving-event-window
● fixed-time-window
● fixed-event-window
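A brief sketch of two of these windowing streams, each combined with a fold. The "api latency" service, the derived service names, and the window sizes are assumptions made for illustration.

```clojure
(let [index (index)]
  (streams
    (where (service "api latency")    ; assumed service name
      ;; fixed-time-window: collect events into disjoint 10-second
      ;; windows and index the mean metric of each window.
      (fixed-time-window 10
        (smap folds/mean
          (with :service "api latency mean 10s"
            index)))

      ;; moving-event-window: on every event, look at the last 5
      ;; events and index the largest metric among them.
      (moving-event-window 5
        (smap folds/maximum
          (with :service "api latency max last 5"
            index))))))
```

Each window stream passes a vector of events to its children, which is why a fold such as folds/mean is applied through smap.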
Time for Action
● We will demonstrate how to send ZooKeeper metrics to Riemann and
InfluxDB.
● We will also show the Riemann and Grafana dashboards for those metrics.
Thank You All!
Questions If Any?

Editor's Notes

  • #15: The event is the base construct of Riemann. Events flow into Riemann and can be processed, counted, collected, manipulated, or exported to other systems. A Riemann event is a struct that Riemann treats as an immutable map. Inside our Riemann configuration, we’ll generally refer to an event field using keywords. Remember that keywords are often used to identify the key in a key/value pair in a map and that our event is an immutable map. We identify keywords by their : prefix. So, the host field would be referenced as :host. A Riemann event can also be supplemented with optional custom fields. You can configure additional fields when you create the event, or you can add additional fields to the event as it is being processed; for example, you could add a field containing a summary or derived metrics to an event.
  • #17: Each arriving event is added to one or more streams. You define streams in the (streams ...) section of your Riemann configuration. Streams are functions you can pass events to for aggregation, modification, or escalation. Streams can also have child streams that they can pass events to. This allows for filtering or partitioning of the event stream, such as by only selecting events from specific hosts or services. You can think of streams like plumbing in the real world. Events enter the plumbing system, flow through pipes and tunnels, collect in tanks and dams, and are filtered by grates and drains. You can have as many streams as you like and Riemann provides a powerful stream processing language that allows you to select the events relevant to a specific stream. For example, you could select events from a specific host or service that meets some other criteria. Like your plumbing, though, streams are designed for events to flow through them and for limited or no state to be retained. For many purposes, however, we do need to retain some state. To manage this state Riemann has the index.
  • #19: The Riemann index is a sort of copy of the last event for each host and service. It also acts as a cache. When an indexed event expires, Riemann sends a synthetic event with state "expired".
  • #23: Where takes a predicate, which is a special expression for matching events. After the predicate, where takes any number of child streams, each of which will receive events which the predicate matched. For example, we could email only events which have state "error". The where stream provides some syntactic sugar to allow you to access your event fields. In a where stream you can refer to "standard" fields like host, service, description, metric, and ttl by name. If you need to refer to another field you need to reference the full field name, (:field_name event).
  • #24: rollup will allow a few events to pass through readily. Then it starts to accumulate events, rolling them up into a list which is submitted at the end of a given time interval. Let's define a new stream for alerting the operations team, which sends only five emails per hour (3600 seconds). We'll receive the first four events immediately--and at the end of the hour, a single email with a summary of all the rest.
  • #27: moving-time-window forwards the last n seconds of events; moving-event-window forwards the last n events; fixed-time-window forwards events from disjoint n-second windows; fixed-event-window forwards disjoint sequences of n events.