© 2016 All Rights Reserved CONFIDENTIAL #GALAXZ16
#GALAXZ16
OwnIT Through Proactive Monitoring
Quis custodiet ipsos custodes?
Who will monitor the monitors themselves?
@jstanley232
Jason Stanley
Enterprise Monitoring Engineer @Secure_24
jstanley734@gmail.com
Github.com/jstanley23
Zenoss Community Forums/IRC: jstanley
Secure-24 has 15 years of experience delivering managed IT operations, application
hosting and cloud services to enterprises worldwide. We manage SAP, Oracle, Hyperion,
JD Edwards, and other mission-critical applications across all industries and for
businesses of every size. Our industry-leading client satisfaction rates result from
lowering IT operational costs and our relentless focus on superior service and support.
Zenoss is the primary monitoring tool
for infrastructure, client devices and
applications.
Replaced other
monitoring platforms
with Zenoss
• Oracle Enterprise Manager
• SolarWinds
• Nimsoft
• Nagios
• Tidal
Primary Zenoss environment
• Zenoss 4.2.5 RPS 538
• 100+ ZenPacks
• 9k+ devices
• 1.7m+ data points
• Dedicated servers
• 3 dedicated Hubs
• 16 dedicated multi-tenant collectors
• 9 customer dedicated collectors
Monitoring from within
Zenoss provides a lot of built-in self-monitoring, and additional ZenPacks extend it.
• Zenoss daemons
› Processes
› Heartbeats
• Zenoss Toolbox scans
• Tracebacks and exceptions
• ZenPacks
› ZenPacks.zenoss.MySqlMonitor
› ZenPacks.zenoss.RabbitMQ
› ZenPacks.zenoss.Memcached
Daemon monitoring
Built-in methods
• Process
› Most daemon processes are already added
› Polls every 3 minutes
› Monitors CPU, memory, and process count
• /Status/Heartbeat
› Takes longer to generate an event than process monitoring
› Can signify issues with the daemon or the hub
• Note:
› Verify that new daemons are added to process monitoring
› Heartbeats only cover the same Zenoss instance
Zenoss ZenPacks
• ZenPacks.zenoss.MySqlMonitor *
› Critical to monitor up/down
› Primary internal use is graphs and trending
• ZenPacks.zenoss.RabbitMQ *
› Critical to monitor up/down
› Primary internal use is graphs and trending
• ZenPacks.zenoss.Memcached
› Can be monitored internally for up/down
› Can cause a negative user experience if down
* Should also be monitored externally
Zenoss Toolbox Scans and Exception Events
https://github.com/zenoss/zenoss.toolbox
• Set up the scans in crontab and forget about them
• All toolbox scans now create events!
• Warning:
› Do not run zencatalogscan -f until zenrelationscan and findposkeyerror come back clean.
Exceptions and tracebacks
• Modelers, datasources and templates can error out
• Check your events for sneaky errors:
› Message: traceback
› Message: exception
• TALES exceptions arrive as a single event under the hub's full name.
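As a rough illustration of the "check your events" step, the filter below scans a list of events for the two keywords on this slide. It assumes each event is a dict with optional `summary` and `message` keys, which is a hypothetical shape; adapt it to whatever your event query actually returns. In practice the same keywords can simply be used as an Event Console filter.

```python
import re
from typing import Dict, Iterable, List

# The two keywords from the slide, matched case-insensitively as substrings,
# so "Exception", "exceptions" and "Traceback (most recent call last)" all match.
ERROR_PATTERN = re.compile(r"traceback|exception", re.IGNORECASE)


def find_sneaky_errors(events: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the events whose summary or message mentions a traceback or exception.

    Assumes each event is a dict with optional "summary" and "message" keys;
    this shape is hypothetical, so adapt it to your own event query results.
    """
    flagged = []
    for event in events:
        text = f"{event.get('summary', '')} {event.get('message', '')}"
        if ERROR_PATTERN.search(text):
            flagged.append(event)
    return flagged


if __name__ == "__main__":
    sample = [
        {"evid": "a1", "summary": "Disk usage high", "message": ""},
        {"evid": "b2", "summary": "Modeler failed",
         "message": "Traceback (most recent call last): ..."},
        {"evid": "c3", "summary": "TALES Exception in template", "message": ""},
    ]
    for event in find_sneaky_errors(sample):
        print(event["evid"], event["summary"])
```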
Event Monitoring
Event flow in Zenoss is one of the more important aspects of the tool. Without events, you will not be alerted to any issues in your environments. For this reason, we place special emphasis on monitoring this aspect.
Monitoring from afar
We focus on monitoring Zenoss event flow from a remote location, so that if Zenoss goes down, we still get alerted.
• Zenoss webserver
• RabbitMQ
› rawevents
› zenevents
› signal
• zeneventserver
• Synthetic event checks
› zeneventd
  - Event processing and transforms
› zeneventserver
  - Changing event state
Web (HTTP) checks
Both zenwebserver and zeneventserver can be monitored with a simple HTTP check.
• zenwebserver
› HTTP check on port 8080 to the Dashboard URL, matching the page with a regex
  - /zport/dmd/Dashboard
• zeneventserver
› HTTP check on port 8084 to hit the zeneventserver API
  - /zeneventserver/api/1.0/events
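A minimal sketch of these two checks in Python, using only the standard library. The host name, the basic-auth credentials and the Dashboard regex are placeholders for illustration, not values from the talk; pick a string you actually expect on your own Dashboard page. Fetching and evaluating are split so the pass/fail logic can be exercised without a live Zenoss.

```python
import base64
import re
import urllib.error
import urllib.request
from typing import Optional, Tuple

ZENOSS_HOST = "zenoss.example.com"  # hypothetical host name


def evaluate_response(status: int, body: str, pattern: Optional[str]) -> Tuple[bool, str]:
    """Pass only on HTTP 200 and, if a regex is given, when the body matches it."""
    if status != 200:
        return False, f"unexpected HTTP status {status}"
    if pattern is not None and re.search(pattern, body) is None:
        return False, f"pattern {pattern!r} not found in response"
    return True, "OK"


def fetch(url: str, user: str, password: str, timeout: float = 10.0) -> Tuple[int, str]:
    """GET a URL with HTTP basic auth and return (status, body)."""
    request = urllib.request.Request(url)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:  # 4xx/5xx: the server answered, but badly
        return err.code, ""


def check(url: str, pattern: Optional[str], user: str, password: str) -> Tuple[bool, str]:
    """Run one HTTP check, treating connection errors and timeouts as failures."""
    try:
        status, body = fetch(url, user, password)
    except OSError as err:  # URLError and socket timeouts are both OSError subclasses
        return False, f"connection failed: {err}"
    return evaluate_response(status, body, pattern)


if __name__ == "__main__":
    checks = {
        # zenwebserver: Dashboard on port 8080; the regex is a placeholder.
        "zenwebserver": (f"http://{ZENOSS_HOST}:8080/zport/dmd/Dashboard", r"Dashboard"),
        # zeneventserver: event API on port 8084; any HTTP 200 counts as up here.
        "zeneventserver": (f"http://{ZENOSS_HOST}:8084/zeneventserver/api/1.0/events", None),
    }
    for name, (url, pattern) in checks.items():
        ok, detail = check(url, pattern, "monitor_user", "secret")  # hypothetical credentials
        print(f"{name}: {'OK' if ok else 'CRITICAL'} - {detail}")
```

Because the Dashboard sits behind login, a successful regex match also exercises authentication, which is why the speaker notes describe this check as covering LDAP as well.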
RabbitMQ
It is very important to monitor the RabbitMQ queues. If something goes wrong with RabbitMQ, event processing in Zenoss is compromised.
For this reason, we monitor the queues remotely and alert on any queue depth above a threshold.*
* Set this threshold according to your environment.
• We consider three queues the most important:
› rawevents
  - Where raw events from the collectors are sent
› zenevents
  - After zeneventd processes events, they are sent here for zeneventserver
› signal
  - Events that match a trigger and need to be processed by a notification are sent here for zenactiond to process.
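The queue check might look something like the sketch below. It reads queue depths from the RabbitMQ management plugin's HTTP API (`GET /api/queues/{vhost}/{name}`, whose JSON includes a `messages` count). It assumes the management plugin is enabled and reachable on port 15672, that Zenoss uses the `/zenoss` vhost, and that the full queue names are `zenoss.queues.zep.rawevents` and so on; verify all of these with `rabbitmqctl list_queues -p /zenoss name messages` on your own install. The thresholds are placeholders, and the author's script may gather the counts differently.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Assumed vhost and queue names; verify them on your own install.
VHOST = "/zenoss"
THRESHOLDS = {  # placeholder thresholds; tune them for your environment
    "zenoss.queues.zep.rawevents": 5000,
    "zenoss.queues.zep.zenevents": 5000,
    "zenoss.queues.zep.signal": 1000,
}


def queue_depth(host: str, queue: str, user: str, password: str,
                port: int = 15672, timeout: float = 10.0) -> int:
    """Return the 'messages' count for one queue via the management HTTP API."""
    vhost = urllib.parse.quote(VHOST, safe="")  # "/zenoss" -> "%2Fzenoss"
    name = urllib.parse.quote(queue, safe="")
    request = urllib.request.Request(f"http://{host}:{port}/api/queues/{vhost}/{name}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return int(json.load(response).get("messages", 0))


def over_threshold(depths: Dict[str, int], thresholds: Dict[str, int]) -> List[str]:
    """Return one alert line per queue that is missing or above its threshold."""
    alerts = []
    for queue, limit in thresholds.items():
        depth = depths.get(queue)
        if depth is None:
            alerts.append(f"{queue}: no data")
        elif depth > limit:
            alerts.append(f"{queue}: {depth} messages (threshold {limit})")
    return alerts


if __name__ == "__main__":
    host = "rabbit.example.com"  # hypothetical RabbitMQ host
    depths: Dict[str, int] = {}
    for queue in THRESHOLDS:
        try:
            depths[queue] = queue_depth(host, queue, "monitor_user", "secret")
        except OSError as err:  # unreachable host, HTTP error, timeout
            print(f"{queue}: query failed: {err}")
    problems = over_threshold(depths, THRESHOLDS)
    print("CRITICAL: " + "; ".join(problems) if problems else "OK: all queues below threshold")
```

Treating a missing count as an alert is a deliberate choice: if RabbitMQ itself is down, silence should not read as "all clear".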
Synthetic Checks
• Pre-existing event check
› Checks the functionality of zeneventserver by:
  - Acknowledging a pre-existing event *
  - Un-acknowledging a pre-existing event *
› Verifies that the following are up and running:
  - ZenDS
  - zeneventserver
  - zenwebserver
› Uses only a single event; if that event is closed, a new one must be created
  - The script can create the event for you and provide the event ID to use
• New event check
› Checks the Zenoss event process by:
  - Opening a new event
  - Finding the new event
  - Verifying the event was modified by a transform
  - Closing the event
  - Verifying the event was closed
› Verifies that the following are up and running:
  - ZenDS
  - zenwebserver
  - zeneventd
  - zeneventserver
› Creates a new event on every run
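Both synthetic checks can be sketched against the Zenoss JSON API. This sketch assumes the `EventsRouter` endpoint at `/zport/dmd/evconsole_router` and its `acknowledge`, `unacknowledge`, `add_event`, `query` and `close` methods, modeled on the Zenoss 4.x JSON API examples; argument names, numeric event states and result shapes may differ in your version, so treat them as assumptions. `TRANSFORM_MARKER` and the `/Status/SyntheticCheck` event class are hypothetical: you would create that class with a transform that writes the marker into the event message. Router calls go through an injectable `call` function so each step can be tested without a live server.

```python
import base64
import itertools
import json
import time
import urllib.request
import uuid
from typing import Any, Callable, Dict, List

# Assumptions to verify against your Zenoss version's JSON API documentation.
ROUTER_PATH = "/zport/dmd/evconsole_router"  # EventsRouter endpoint
TRANSFORM_MARKER = "synthetic-check-transformed"  # hypothetical text set by a test transform
CLOSED_STATES = {3, "Closed"}  # closed state may come back numeric or as a label
ALL_STATES = [0, 1, 2, 3, 4, 5]  # New through Aged, so closed events are still returned

Call = Callable[[str, List[Dict[str, Any]]], Dict[str, Any]]


def make_router_call(base_url: str, user: str, password: str) -> Call:
    """Build a function that sends one EventsRouter JSON-RPC request per call."""
    tids = itertools.count(1)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def call(method: str, data: List[Dict[str, Any]]) -> Dict[str, Any]:
        payload = {"action": "EventsRouter", "method": method,
                   "data": data, "type": "rpc", "tid": next(tids)}
        request = urllib.request.Request(
            base_url + ROUTER_PATH, data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json",
                     "Authorization": f"Basic {token}"})
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response).get("result", {})

    return call


def preexisting_event_check(call: Call, evid: str) -> bool:
    """Acknowledge and then un-acknowledge an existing event; both must succeed."""
    acked = call("acknowledge", [{"evids": [evid]}])
    unacked = call("unacknowledge", [{"evids": [evid]}])
    return bool(acked.get("success")) and bool(unacked.get("success"))


def find_events(call: Call, summary: str) -> List[Dict[str, Any]]:
    """Query events by summary across all event states."""
    params = {"summary": summary, "eventState": ALL_STATES}
    return call("query", [{"params": params, "limit": 10}]).get("events", [])


def new_event_check(call: Call, device: str, evclass: str = "/Status/SyntheticCheck",
                    retries: int = 5, delay: float = 2.0) -> str:
    """Open, find, verify, close and re-verify a brand-new event.

    Returns "OK" or a short description of the first step that failed.
    """
    summary = f"synthetic event check {uuid.uuid4()}"  # unique, so every run is a new event
    opened = call("add_event", [{"summary": summary, "device": device, "component": "",
                                 "severity": "Info", "evclasskey": "", "evclass": evclass}])
    if not opened.get("success"):
        return "could not open a new event"
    events: List[Dict[str, Any]] = []
    for _ in range(retries):  # zeneventd processes events asynchronously, so poll
        events = find_events(call, summary)
        if events:
            break
        time.sleep(delay)
    if not events:
        return "new event never appeared"
    event = events[0]
    if TRANSFORM_MARKER not in str(event.get("message", "")):
        return "event was not modified by the transform"
    if not call("close", [{"evids": [event["evid"]]}]).get("success"):
        return "could not close the event"
    closed = find_events(call, summary)
    if not closed or closed[0].get("eventState") not in CLOSED_STATES:
        return "event was not closed"
    return "OK"
```

A single pass might be `new_event_check(make_router_call("https://zenoss.example.com", "monitor_user", "secret"), "zenoss-host")` with hypothetical host and credentials. On a Nagios server like the one described in the speaker notes, "OK" would map to exit status 0 and anything else to 2 (CRITICAL).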
Takeaways
The script we use for monitoring, along with documentation on how to use it, can be found on the community wiki and on GitHub:
http://wiki.zenoss.org/Monitoring_Zenoss
https://github.com/jstanley23/MonitoringZenoss
Question me this

Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring


Editor's Notes

  • #2: In this breakout session we will discuss monitoring some important aspects of Zenoss. Who am I? My work covers maintaining the health of the monitoring infrastructure and developing ZenPacks that add functionality to Zenoss, build out monitoring for new devices and applications, and extend the API to work with our other internal systems. What I hope you take from this: how to monitor your Zenoss instance and why it matters, so that you are the first to know about issues, or at least to get you thinking about monitoring your instance. Before we get into that, I would like to give you some background on Secure-24, our environment, and what we use Zenoss for.
  • #3: www.secure-24.com Secure-24 has 15 years of experience delivering managed IT operations, application hosting and cloud services. We manage: - Oracle E-Business Suite - PeopleSoft - JD Edwards - Hyperion - SAP - other critical applications
  • #4: Devices we monitor with Zenoss: - Cisco UCS and HP ProLiant servers - Networking: Cisco, Juniper, F5, Riverbed and Radware devices - EMC and NetApp storage - Windows and Linux - SAP, Hyperion, Oracle, MySQL, PeopleSoft, Progress DB - Microsoft applications: Exchange, SQL Server, SharePoint, Lync - Along with Citrix and VMware View
  • #5: We used to have a variety of monitoring tools, and different teams used different applications: SolarWinds for networking, Nagios for Linux and Windows servers, and Tidal/OEM for applications. We have moved away from these other monitoring platforms to focus on Zenoss as our primary tool, taking anything we liked from our older tools and adding that functionality into Zenoss.
  • #8: Zenoss daemon monitoring is set up out of the box in two different forms: heartbeats and process monitoring. Heartbeats are sent from the daemon to zenhub and then passed on to zeneventserver. If heartbeats stop coming in, a /Status/Heartbeat event is created. Monitoring daemons by process works like all other process monitoring: you can monitor CPU, memory, count and up/down status. I find this very useful when deploying new ZenPacks. For example, when I deploy a new ZenPack that has a new zenpython datasource, I like to watch the memory usage of the daemon over time to make sure it does not have a memory leak or other performance issues. You will generally get a process down event before a heartbeat event.
  • #9: MySQL and RabbitMQ are both critical applications, so monitoring them at least for up/down is a must. If either of them goes down, you will not get any events. Think about setting up an external monitoring server.
  • #12: Here is where we start talking about setting up an external monitoring instance. This can be another Zenoss instance, another monitoring product or a simple server running scheduled scripts. We migrated from Nagios to Zenoss, and since we already had Nagios servers up and running and integrated with ServiceNow, we simply used them to perform our external monitoring.
  • #13: We started with some basic HTTP checks, designed to perform simple monitoring of the two things we cared about most at the time: the web interface (users being able to log in and use Zenoss) and events (users being able to view events in the Event Console). The first check is a simple HTTP check to the Dashboard page that verifies a string on the page. This allows us to monitor zenwebserver (nginx/Zope) and LDAP authentication. The second is an HTTP check directly to zeneventserver, using its API to get a list of events. This allows us to monitor zeneventserver and make sure it is accepting connections. This was a good start, but not ideal.
  • #14: It is very important to monitor the Rabbit queues: if something happens to RabbitMQ, event processing will not work. We wanted to start monitoring for the symptoms of the issues we were having, and one of the common symptoms was Rabbit queues backing up. Zenoss has 3 major queues it uses to process events: events come in from the collectors and are placed into the rawevents queue by zenhub; zeneventd then processes the messages in rawevents, applies any transforms, and places the event into the zenevents queue; zeneventserver processes the messages in zenevents and runs them through any triggers, and if an event matches a trigger, it is placed into the signal queue; zenactiond then processes any messages in signal using the proper notification method. So you can have several different kinds of issues, but a symptom common to all of them is backed-up Rabbit queues. With each of these issues you will see a backup of messages in RabbitMQ: deadlocks in ZenDS; the zenactiond daemon being down or overwhelmed; a new, poorly performing transform added to the environment. We use a script to connect to RabbitMQ and pull the current message count in these 3 queues, and if any count is higher than its set threshold, it alerts. And we do it remotely. But this wasn't enough: we wanted to know as soon as zeneventserver was having an issue, which led us to create some synthetic checks.
  • #15: We took that a step further and created a new check that more closely follows the event flow process.
  • #16: When I was here last year at GalaxZ15, I really liked the technical discussions, take aways from those discussions and talks outside of the break outs. So, when Zenoss asked me to speak this year, I wanted to make a point of giving something to the community that they could use and take back with them. The script that I mentioned today that performs the Rabbit queue monitoring and synthetic event checks can be found on the community wiki and in github. My hope is that people will find this information and script useful in some way. If not using it out right to monitor their Zenoss instance, then at least able to give them ideas on how to monitor their instance in a proactive way. I am open to any feedback you have about this session and the script. Feel free to post comments on the wiki or bugs/feature requests on github. And I believe you can leave feedback on this session in the GalaxZ app. I would love to hear your thoughts on both topics.
  • #18: Questions?