Miguel Pérez Colino // @mmmmmmpc
CLOUD OPERATIONS WITH STREAMING
ANALYTICS USING BIG DATA TOOLS
DataWorks Summit Sydney 2017
Miguel Pérez Colino
Senior Design Product Manager, ISBU - Red Hat
miguel@redhat.com / @mmmmmmpc
Suneel Marthi
Senior Principal Software Engineer - Red Hat
smarthi@redhat.com / @suneelmarthi
THE PROBLEM
Cloud Deployments
Act as one single thing …
… and need to be managed and operated as one
Source: https://commons.wikimedia.org/wiki/File:Auklet_flock_Shumagins_1986.jpg
Cloud Deployments
They really do scale ...
https://www.cncf.io/blog/2016/08/23/deploying-1000-nodes-of-openshift-on-the-cncf-cluster-part-1/
● Higher scalability
● More workloads per physical machine (multi-tenant)
● Network and storage are also software-defined
● Containers and microservices provide more granularity
THE CHALLENGE
Questions to solve
● Who is the user?
● What is their problem?
● How do other people solve this problem?
● How can we better solve the problem?
● What would the end result look/feel like?
[DESIGN THINKING]
THE BEST WAY TO HAVE A GOOD IDEA
IS TO HAVE LOTS OF IDEAS.
Who is the user? (Personas)
● Cloud Ops
● Developer
● Security Ops
● Monitoring
● Service Designer
● Marketing
● IT Manager
● Infrastructure Architect?
Customers’ issues are mostly “Day 2” → Operations
● Operate OpenStack
● Operate OpenShift
○ Platform Ops
○ Developer logs
Logs → issue detection + root causes + forensics
Personae
Data types: Logs, Config, Telemetry, App debug info, Events
● Monitoring: provides events, consumes logs
● Cloud Ops: root cause analysis
● Developer: app analysis and debug
● Security Engineer: security analysis, audits
● Marketing: access to stats
● Service Designer, IT Manager: access to aggregated data, e.g. SLA, usage
What are these problems?
● Data aggregation
○ Ingestion
○ Transport
● Data Model → Common Data Model
● Correlation
○ With external sources (Events / Metrics / Config …)
○ Add more Information types to the solution
● Coherency (Data format and Enrichment)
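The common data model and enrichment items above can be pictured as a small normalisation step. This is only an illustrative sketch: the field names are borrowed from the alert records shown later in this deck, while `LogRecord`, `normalize` and the `ROLE_BY_HOST_PREFIX` lookup are hypothetical names introduced here.

```python
from dataclasses import dataclass, asdict

# Hypothetical enrichment table: host name prefix -> role in the deployment.
ROLE_BY_HOST_PREFIX = {
    "overcloud-compute": "compute",
    "overcloud-controller": "controller",
}


@dataclass
class LogRecord:
    """Common shape that every source is mapped into (assumed field set)."""
    timestamp: str
    service: str
    severity: int
    hostname: str
    body: str
    role: str = "unknown"  # filled in by enrichment


def normalize(raw: dict) -> LogRecord:
    """Coerce a loosely typed source record into the common model and enrich it."""
    record = LogRecord(
        timestamp=raw["timestamp"],
        service=raw["service"],
        severity=int(raw["severity"]),  # sources may send severity as a string
        hostname=raw["hostname"],
        body=raw.get("body", ""),
    )
    for prefix, role in ROLE_BY_HOST_PREFIX.items():
        if record.hostname.startswith(prefix):
            record.role = role
            break
    return record


example = normalize({
    "timestamp": "2017-04-28 12:48:20.321",
    "service": "Nova",
    "severity": "3",
    "hostname": "overcloud-compute-1",
})
print(asdict(example))
```

Keeping coercion and enrichment in one place means every downstream stage can rely on the same types, whichever source produced the record.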
Data (What)
Data + Information flow in Log Aggregation
Generate → Ingest → Collect → Process → Store → Query → View
Derived from: http://www.dataintensive.info/
Personae (Motivation)
That need Log Aggregation
Cloud Ops (Apps): “I want to proactively know about active or potential degradation of service”
Cloud Ops (OpenStack): “User reports that their VM request failed and returned error”
Developer (OpenShift): “My recent commit resulted in Jenkins test failure”
Cloud Suite User: “Application (multi-tiered) launched from CloudForms returns error”
Situational Awareness (Why)
Or the need for it!
Source: https://en.wikipedia.org/wiki/Situation_awareness
THE SOLUTION
Focus on One Persona and Use Case
“Oscar the OpenStack Operator”
Log Aggregation
● Monitoring: provides events, consumes logs
● Cloud Ops: root cause analysis
● Developer: app analysis and debug
● Security Engineer: security analysis, audits
● User / Marketing: access to stats
● Service Designer, IT Manager: access to aggregated data, e.g. SLA, usage
Prototyped User Experience
Creating User Interface Mockups
Implementation
Red Hat’s containerized solution with EFK stack
Fluentd, Elastic, Kibana
Create → Ingest → Collect → Process → Store → Query → View
Implementation
KEEDIO’s containerized solution with a Big Data toolset
Create → Ingest → Collect → Process → Store → Query → View
● Create: Rsyslog
● Ingest: Flume / NiFi
● Collect: Kafka
● Process: Spark / Flink
● Store / Query: SOLR / Cassandra, HDFS (tier 2)
● View: PatternFly
Implementation: Generation
Rsyslog
What?
● Open-source software used for forwarding log messages in a network.
● Implements the syslog protocol
Why?
● Fast system for log processing.
● High performance, low footprint, included in the OS
● Inputs from a wide variety of sources
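The syslog protocol that Rsyslog implements packs facility and severity into a single priority value, PRI = facility × 8 + severity (RFC 5424). A minimal sketch of decoding it follows; the function name `decode_pri` is our own and not part of Rsyslog.

```python
def decode_pri(pri: int) -> tuple[int, int]:
    """Split a syslog PRI value into (facility, severity) per RFC 5424."""
    if not 0 <= pri <= 191:
        raise ValueError("PRI must be between 0 and 191")
    return divmod(pri, 8)


facility, severity = decode_pri(191)
print(facility, severity)  # 23 7 -> facility local7, severity debug
```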
Implementation: Ingestion
Apache NiFi
What?
● Reliable system to process and distribute data
● Language: Java
Why?
● Graphical management
● Clusterable
● Data provenance
● Many sources and destinations
Use Case: Ingestion
Apache NiFi
Easily customize “tagging” and processing rules via Graphical User Interface
Review steps with data provenance
“Like having an IDE and a Debugger for data processing rules.”
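The idea of tagging rules whose steps can be reviewed afterwards can be pictured without NiFi at all. The following is a plain-Python illustration and not NiFi's API: the rule names, the `tags` list and the `provenance` list are all hypothetical.

```python
import re

# Hypothetical tagging rules: (rule name, regex on the message, tag to add).
RULES = [
    ("tag-nova", re.compile(r"\bnova\."), "service:nova"),
    ("tag-neutron", re.compile(r"\bneutron\."), "service:neutron"),
    ("tag-error", re.compile(r"\bERROR\b"), "level:error"),
]


def apply_rules(message: str) -> dict:
    """Tag a message and record which rule added which tag, in order."""
    event = {"message": message, "tags": [], "provenance": []}
    for name, pattern, tag in RULES:
        if pattern.search(message):
            event["tags"].append(tag)
            event["provenance"].append(f"{name} added {tag}")
    return event


event = apply_rules("10253 ERROR nova.compute.manager Instance failed network setup")
print(event["tags"])
print(event["provenance"])
```

The provenance list plays the role of the "debugger": for any event it shows which rules fired and in what order.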
Implementation: Collect
Apache Kafka
What?
● Open-source distributed messaging system
● Languages: Java & Scala
Why?
● High throughput and low latency
● Clusterable, with load balancing and asynchronous send
● Handles real-time data feeds
● Customizable data retention on disk
● Multiple consumers can read the same data
● “Rewind and Replay”
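"Multiple consumers on the same data" and "Rewind and Replay" both follow from keeping records in an append-only log and letting each consumer track its own offset. The toy model below only illustrates that idea; it is not the Kafka client API, the class and method names are hypothetical, and unlike Kafka it never expires old records.

```python
class TopicLog:
    """Toy append-only log; consumers track their own read offsets."""

    def __init__(self) -> None:
        self._records: list[str] = []
        self._offsets: dict[str, int] = {}

    def append(self, record: str) -> int:
        """Store a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer: str) -> list[str]:
        """Return everything the consumer has not read yet and advance its offset."""
        start = self._offsets.get(consumer, 0)
        self._offsets[consumer] = len(self._records)
        return self._records[start:]

    def seek(self, consumer: str, offset: int) -> None:
        """Rewind (or fast-forward) a consumer to an offset within the log."""
        self._offsets[consumer] = max(0, min(offset, len(self._records)))


log = TopicLog()
for line in ["vm requested", "port requested", "timeout"]:
    log.append(line)

print(log.poll("alerting"))  # all three records
print(log.poll("archiver"))  # the same three records, read independently
log.seek("alerting", 1)
print(log.poll("alerting"))  # replay from offset 1
```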
Implementation: Process
Apache Flink
What?
● Open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming apps.
● Languages: Java, Scala
Why?
● Streaming-first, continuous processing
● Fault-tolerant, stateful computations
● Scalable and performant: high throughput, low latency
● Advanced filtering capabilities (CEP)
Use Case: Collect + Process
Apache Kafka + Flink
● Long retention periods in the queue let new post-processing targets work on earlier events
● Only the right info is sent to the right target
● Detect anomalies and trigger alerts
Use Case: Collect + Process
Apache Kafka + Flink
● Different storage targets with filtered, post-processed output
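Sending filtered output to different targets amounts to a routing table of predicates. The sketch below shows the idea in plain Python rather than as a Flink job; the target names and the predicates are hypothetical, and the severity threshold assumes syslog numbering, where lower numbers are more severe.

```python
from collections import defaultdict

# Hypothetical routing table: target name -> predicate over an event.
ROUTES = {
    "search_index": lambda e: e["severity"] <= 4,  # warnings and worse
    "archive": lambda e: True,                     # keep everything
    "network_team": lambda e: e["service"] == "Neutron",
}


def route(events):
    """Fan events out to every target whose predicate accepts them."""
    targets = defaultdict(list)
    for event in events:
        for name, accepts in ROUTES.items():
            if accepts(event):
                targets[name].append(event)
    return dict(targets)


sample = [
    {"service": "Nova", "severity": 3},
    {"service": "Neutron", "severity": 6},
]
routed = route(sample)
print({name: len(items) for name, items in routed.items()})
```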
Use Case: Collect + Process
Apache Kafka + Flink
● Alerts sent to Kafka. A listener can enable all kinds of alerts
Alert Listener → Telegram, E-Mail
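A listener of this kind decodes each alert message and hands it to every configured channel. The sketch below leaves out the Kafka consumer and the real Telegram and e-mail clients entirely; `make_listener` and the two stand-in channel functions are hypothetical.

```python
import json


def make_listener(channels):
    """Build a listener that decodes an alert and passes it to each channel."""
    def on_message(raw: str) -> int:
        alert = json.loads(raw)
        for send in channels:
            send(alert)
        return len(channels)
    return on_message


sent = []  # stand-in for real delivery


def to_telegram(alert: dict) -> None:
    sent.append(("telegram", alert["service"]))


def to_email(alert: dict) -> None:
    sent.append(("email", alert["service"]))


listener = make_listener([to_telegram, to_email])
listener('{"service": "Nova", "severity": "3"}')
print(sent)
```

Adding a new alert channel then means adding one function to the list, without touching the processing job that produces the alerts.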
Implementation: Store + Query
Apache Cassandra
What?
● Open source NoSQL database, <key, value> based
● Language: Java
Why?
● Fault tolerant
● Decentralized & scalable
● Proven, with high performance
● Flexible data model
Implementation: View
PatternFly
What?
● Open source responsive framework for front ends
● Built with: JavaScript, Bootstrap, AngularJS 1
Why?
● Easy to implement new interfaces
● Includes capabilities for graphs (D3.js + C3.js)
● Natively responsive (mobile / tablet)
● Well supported and extended (used in most Red Hat products)
Implementation
Infrastructure
Deployment
Deployment: View
PatternFly
USE CASE EXAMPLE (CEP)
Use Case: OpenStack Timeouts
Network timeout is 30 secs by default
1. Request of VM
2. Request of vPort (Virtual NIC)
3. vPort generated in more than 30 secs → Timeout!
4. Error generating VM
5. No error generating vPort
Detecting this requires correlating both logs
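The correlation described in these steps can be sketched as pairing a Nova error with a Neutron port creation that took longer than the timeout and happened close to it in time. This is a plain-Python illustration and not Flink CEP; the event dictionaries, the `correlate` function and the 10-second window are assumptions made for the example.

```python
from datetime import datetime, timedelta

TIMEOUT_SECS = 30.0             # default network timeout quoted above
WINDOW = timedelta(seconds=10)  # assumed correlation window


def correlate(events):
    """Pair each Nova error with a slow Neutron port creation near it in time."""
    nova_errors = [
        e for e in events if e["service"] == "nova" and e["level"] == "ERROR"
    ]
    slow_ports = [
        e for e in events
        if e["service"] == "neutron" and e.get("duration", 0.0) > TIMEOUT_SECS
    ]
    alerts = []
    for err in nova_errors:
        for port in slow_ports:
            if abs(port["time"] - err["time"]) <= WINDOW:
                alerts.append((err, port))
    return alerts


events = [
    {"service": "nova", "level": "ERROR",
     "time": datetime(2016, 12, 5, 10, 28, 14)},
    {"service": "neutron", "level": "INFO",
     "time": datetime(2016, 12, 5, 10, 28, 16), "duration": 32.589028},
    {"service": "neutron", "level": "INFO",
     "time": datetime(2016, 12, 5, 10, 28, 17), "duration": 0.4},
]
print(len(correlate(events)))  # 1
```

Neither event is alarming on its own: the Neutron entry is only an INFO line, which is why the pattern has to look at both services together.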
Use Case: OpenStack Timeouts
What we see ...
Error in Nova
2016-12-05 10:28:14.292 10253 ERROR nova.compute.manager [req-190de497-d90f-48e0-91ea-f1f1c0877704 688ae4039aad471fbab98da1b1e1fcb6 e21be8c7ab34490386508bbd0c58f511 - - -] Instance failed network setup after 1 attempt(s)
2016-12-05 10:28:14.292 10253 ERROR nova.compute.manager ConnectTimeout: Request to https://[::1]:9696/v2.0/ports.json timed out
Info in Neutron
2016-12-05 10:28:16.878 13187 INFO neutron.wsgi [req-827495e1-2ae2-41c1-b51b-2eda57f4ba1d 688ae4039aad471fbab98da1b1e1fcb6 e21be8c7ab34490386508bbd0c58f511 - - -] ::1 - - [05/Dec/2016 10:28:16] "POST /v2.0/ports.json HTTP/1.1" 201 900 32.589028
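Before the two entries can be correlated they have to be turned into fields. A minimal parsing sketch follows, assuming the layout visible in the lines above (timestamp, process id, level, logger, then free text) and reading the trailing number of the neutron.wsgi line as the request duration in seconds; the function names are hypothetical.

```python
import re
from datetime import datetime

HEAD = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) (?P<rest>.*)$"
)


def parse(line: str) -> dict:
    """Split a log line into timestamp, pid, level, logger and the remaining text."""
    match = HEAD.match(line)
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    fields = match.groupdict()
    fields["ts"] = datetime.strptime(fields["ts"], "%Y-%m-%d %H:%M:%S.%f")
    fields["pid"] = int(fields["pid"])
    return fields


def request_seconds(rest: str) -> float:
    """Read the trailing number of a neutron.wsgi access line as seconds."""
    return float(rest.rsplit(" ", 1)[-1])


nova = parse(
    "2016-12-05 10:28:14.292 10253 ERROR nova.compute.manager "
    "ConnectTimeout: Request to https://[::1]:9696/v2.0/ports.json timed out"
)
neutron = parse(
    "2016-12-05 10:28:16.878 13187 INFO neutron.wsgi "
    "[req-827495e1-2ae2-41c1-b51b-2eda57f4ba1d 688ae4039aad471fbab98da1b1e1fcb6 "
    "e21be8c7ab34490386508bbd0c58f511 - - -] ::1 - - [05/Dec/2016 10:28:16] "
    '"POST /v2.0/ports.json HTTP/1.1" 201 900 32.589028'
)
print(nova["level"], neutron["logger"], request_seconds(neutron["rest"]))
```

The parsed duration of about 32.6 seconds is what places the Neutron request beyond the 30-second timeout, even though Neutron itself logged it as a success (HTTP 201).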
Use Case: OpenStack Timeouts
Both lines detected and correlated, and an alert generated. → Alert sent to Kafka
ErrorAlert:
Nova-3-2017-04-28 12:48:20.321
Neutron-6-2017-04-28 12:48:23.123
{"severity":"3","body":"[ Generating synthetic log CEP_ID=67c8c1cc3d48c3987aee13dce5cf35a1]","spriority":"191","hostname":"overcloud-compute-1","protocol":"TCP","port":"7790","sender":"/192.168.1.16","service":"Nova","id":"c1318482-11a1-41cd-949e-5195c54767e5","facility":"23","timestamp":"2017-04-28 12:48:20.321"}
{"severity":"6","body":"[ Generating synthetic log CEP_ID=67c8c1cc3d48c3987aee13dce5cf35a1]","spriority":"191","hostname":"overcloud-controller-1","protocol":"TCP","port":"7793","sender":"/192.168.1.13","service":"Neutron","id":"e617d049-7e40-4114-8727-c6c41140567e","facility":"23","timestamp":"2017-04-28 12:48:23.123"}
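Both records carry the same CEP_ID in their body, so a consumer of the alert topic can regroup them. The sketch below does that with the standard library only; the two sample strings are trimmed copies of the records above (fewer fields), and `group_by_cep_id` is a hypothetical helper.

```python
import json
import re
from collections import defaultdict

CEP_ID = re.compile(r"CEP_ID=([0-9a-f]+)")


def group_by_cep_id(messages):
    """Group alert records by the CEP_ID found in their body."""
    groups = defaultdict(list)
    for raw in messages:
        record = json.loads(raw)
        found = CEP_ID.search(record["body"])
        if found:
            groups[found.group(1)].append(
                (record["service"], int(record["severity"]))
            )
    return dict(groups)


messages = [
    '{"severity":"3","body":"[ Generating synthetic log '
    'CEP_ID=67c8c1cc3d48c3987aee13dce5cf35a1]",'
    '"hostname":"overcloud-compute-1","service":"Nova",'
    '"timestamp":"2017-04-28 12:48:20.321"}',
    '{"severity":"6","body":"[ Generating synthetic log '
    'CEP_ID=67c8c1cc3d48c3987aee13dce5cf35a1]",'
    '"hostname":"overcloud-controller-1","service":"Neutron",'
    '"timestamp":"2017-04-28 12:48:23.123"}',
]
print(group_by_cep_id(messages))
```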
Use Case: OpenStack Timeouts
Both lines detected and correlated, and an alert generated. → Alert routed to Telegram
THANK YOU
plus.google.com/+RedHat
linkedin.com/company/red-hat
youtube.com/user/RedHatVideos
facebook.com/redhatinc
twitter.com/RedHatNews
BACKUP SLIDES
Deployment
