Object, Measure Thyself
Greg Opaczewski – Orbitz Worldwide
Michael Ducy – BMC Software
Open Source
• ERMA Project:
http://launchpad.net/erma
• Graphite Project:
http://launchpad.net/graphite
Complex Environment
$10.8 Billion in Gross Bookings in 2007
Myths of Instrumentation
• No Time For Instrumentation
• No Value ($) in Instrumentation
• Instrumentation Causes Bugs
Myth: No Time For
Instrumentation
ERMA
Extremely Reusable Monitoring API
TransactionMonitor monitor =
    new TransactionMonitor("HotelService.purchase");
try {
    response = hotelSupplier.reserve(hotel);
    monitor.succeeded();
} catch (ServiceException e) {
    monitor.failedDueTo(e);
    throw e;
} finally {
    monitor.done();
}
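The lifecycle described in the speaker notes for this slide (stopwatch starts at construction, outcome is recorded, done() stops the stopwatch and hands the monitor off) can be sketched as below. This is a minimal illustration only: SketchMonitor, its fields, and the PROCESSED list are hypothetical names invented here, not ERMA's actual classes or API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for illustration; this is NOT ERMA's TransactionMonitor.
class SketchMonitor {
    // Stands in for the hand-off to a processing engine: finished monitors land here.
    static final List<SketchMonitor> PROCESSED =
            Collections.synchronizedList(new ArrayList<>());

    final String name;
    private final long startNanos;   // the stopwatch starts at construction
    private long latencyNanos = -1;  // stays -1 until done() is called
    private boolean failed = true;   // treated as failed until succeeded() is called
    private Throwable cause;

    SketchMonitor(String name) {
        this.name = name;
        this.startNanos = System.nanoTime();
    }

    void succeeded() {
        failed = false;
        cause = null;
    }

    void failedDueTo(Throwable t) {
        failed = true;
        cause = t;
    }

    // Stops the stopwatch and hands the monitor off for processing.
    void done() {
        latencyNanos = System.nanoTime() - startNanos;
        PROCESSED.add(this);
    }

    boolean isFailed() { return failed; }
    Throwable getCause() { return cause; }
    long getLatencyNanos() { return latencyNanos; }

    public static void main(String[] args) {
        SketchMonitor monitor = new SketchMonitor("HotelService.purchase");
        try {
            Thread.sleep(5); // placeholder for the business logic being measured
            monitor.succeeded();
        } catch (InterruptedException e) {
            monitor.failedDueTo(e);
            Thread.currentThread().interrupt();
        } finally {
            monitor.done();
        }
        System.out.println(monitor.name + " failed=" + monitor.isFailed()
                + " latencyNanos=" + monitor.getLatencyNanos());
    }
}
```

Defaulting to "failed" until succeeded() is called is a choice made for this sketch, so that a code path that forgets to record an outcome is not counted as a success.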
ERMA
Self-Instrumentation by:
• Hooks – Interceptors and Listeners
• Abstraction – Abstract the details away
from developers
• AOP – Aspect Oriented Programming
Frameworks - Hooks
• Spring Framework
Frameworks - Abstraction
Self-Instrumentation by:
• Aspect Oriented Programming (AOP)
<aop:config>
  <aop:aspect id="transactionMonitorActionAspect"
      ref="transactionMonitorActionAdvice">
    <aop:pointcut id="transactionMonitorActionPointcut"
        expression="target(org.springframework.webflow.execution.Action)
        and args(context)"/>
    <aop:around pointcut-ref="transactionMonitorActionPointcut"
        method="invoke"/>
  </aop:aspect>
</aop:config>
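The speaker notes mention that a dynamic proxy is created for each Action object and that each call costs one extra method invocation through the proxy. The general idea can be sketched with the JDK's own java.lang.reflect.Proxy, as below. This is a sketch under stated assumptions, not the Spring or ERMA implementation: the Action interface, the monitored helper, and the EVENTS list are hypothetical names used only for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class ProxyMonitoringSketch {
    // Hypothetical interface standing in for a monitored component.
    interface Action {
        String execute(String context);
    }

    // Records "name:outcome" strings instead of sending events anywhere.
    static final List<String> EVENTS = new ArrayList<>();

    // Wraps target in a JDK dynamic proxy that records the outcome of every call.
    @SuppressWarnings("unchecked")
    static <T> T monitored(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            String name = iface.getSimpleName() + "." + method.getName();
            try {
                Object result = method.invoke(target, args);
                EVENTS.add(name + ":succeeded");
                return result;
            } catch (InvocationTargetException e) {
                EVENTS.add(name + ":failed");
                throw e.getCause(); // rethrow the exception thrown by the target
            }
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        Action real = ctx -> "handled " + ctx;
        Action action = monitored(real, Action.class);
        System.out.println(action.execute("search"));
        System.out.println(EVENTS);
    }
}
```

The business object stays free of monitoring code; only the wiring that creates the proxy knows about it, which is the property the XML configuration above is after.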
Myth: No Time For
Instrumentation
Myth: No Value ($) in
Instrumentation
Event Aggregation
Event Aggregation
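The speaker notes for these slides say the event processor computes summary statistics such as average, standard deviation, and % fail before forwarding them. A minimal sketch of that kind of aggregation follows. The Event and Summary classes and the summarize method are hypothetical names for illustration, and the choice of population standard deviation is an assumption; the notes do not say which form the real processor uses.

```java
import java.util.ArrayList;
import java.util.List;

class AggregationSketch {
    // Hypothetical event shape: one latency sample plus a failure flag.
    static final class Event {
        final double latencyMs;
        final boolean failed;

        Event(double latencyMs, boolean failed) {
            this.latencyMs = latencyMs;
            this.failed = failed;
        }
    }

    // Hypothetical summary of one aggregation interval.
    static final class Summary {
        final int count;
        final double mean;
        final double stdDev;
        final double percentFailed;

        Summary(int count, double mean, double stdDev, double percentFailed) {
            this.count = count;
            this.mean = mean;
            this.stdDev = stdDev;
            this.percentFailed = percentFailed;
        }
    }

    static Summary summarize(List<Event> events) {
        int n = events.size();
        if (n == 0) {
            return new Summary(0, 0.0, 0.0, 0.0);
        }
        double sum = 0.0;
        int failures = 0;
        for (Event e : events) {
            sum += e.latencyMs;
            if (e.failed) {
                failures++;
            }
        }
        double mean = sum / n;
        double squaredDeviations = 0.0;
        for (Event e : events) {
            double d = e.latencyMs - mean;
            squaredDeviations += d * d;
        }
        double stdDev = Math.sqrt(squaredDeviations / n); // population form (assumption)
        return new Summary(n, mean, stdDev, 100.0 * failures / n);
    }

    public static void main(String[] args) {
        List<Event> events = new ArrayList<>();
        events.add(new Event(10, false));
        events.add(new Event(20, false));
        events.add(new Event(30, true));
        events.add(new Event(40, false));
        Summary s = summarize(events);
        System.out.println("count=" + s.count + " mean=" + s.mean
                + " stdDev=" + s.stdDev + " percentFailed=" + s.percentFailed);
    }
}
```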
Storage and Visualization: Graphite
Graphite
Graphite
Graphite Demo
Value to the Business
• Fixing Production Problems Fast
• Capacity Planning
• Business Product teams rely on ERMA
data
Myth: No Value ($) in
Instrumentation
Myth: Instrumentation Causes
Bugs
Avoid Boilerplate
@Monitored
public interface HotelService {
void purchase(Itinerary itinerary);
void cancel(Itinerary itinerary);
}
Avoid Boilerplate
public interface HotelService {
@Monitored(includeArguments = true)
void purchase(Itinerary itinerary);
void cancel(Itinerary itinerary);
}
Uncovers Bugs
• Allows you to baseline across builds
• MASF and SPC
• Event Pattern Monitoring
Baselining
• Compare present performance vs.
historical performance
• Validate testing via theoretical models
MASF and SPC
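The idea of checking whether current performance falls within historical bounds can be illustrated with a basic control-limit test: compute the mean and standard deviation of a historical baseline and flag a new value that lies more than k standard deviations away. This is a deliberately simplified, generic sketch of Statistical Process Control, not MASF and not the models used at Orbitz; the class and method names are hypothetical.

```java
class ControlLimitSketch {
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation (divides by n - 1); requires at least two values.
    static double stdDev(double[] xs) {
        double m = mean(xs);
        double squaredDeviations = 0.0;
        for (double x : xs) {
            squaredDeviations += (x - m) * (x - m);
        }
        return Math.sqrt(squaredDeviations / (xs.length - 1));
    }

    // True when current lies within mean +/- k standard deviations of the baseline.
    static boolean withinLimits(double[] baseline, double current, double k) {
        return Math.abs(current - mean(baseline)) <= k * stdDev(baseline);
    }

    public static void main(String[] args) {
        double[] baselineLatencyMs = {100, 102, 98, 101, 99};
        System.out.println("103 ms within limits: "
                + withinLimits(baselineLatencyMs, 103, 3.0));
        System.out.println("110 ms within limits: "
                + withinLimits(baselineLatencyMs, 110, 3.0));
    }
}
```

The multiplier k = 3 is the conventional three-sigma choice; a real deployment would pick limits per metric and per time window.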
Need for Abstraction
abstraction
Webapp
Travel Business Services
Switching Services
Transaction Services
Suppliers
Event Pattern Monitoring
wl|httpIn.shop.search.air.redirect_searchFailure
wl|AirSearchExecuteAction.search
wl|com.orbitz.ojf.OJFClient.getInternal
wl|jiniOut_ShopService_createResultSet
tbs-shop|jiniIn_ShopService_createResultSet
tbs-shop|jiniOut_LowFareSearchService_execute
air-search|jiniIn_LowFareSearchService_execute
air-search|com.orbitz.afo.lib.SearchFilter
air-search|com.orbitz.afo.lib.LowFareSearchServiceImpl.execute
air-search|jiniOut_AirportLookupService_findLocationByIATACode
market|jiniIn_LocationService|DbPoolExhaustedException
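The speaker notes explain that event hierarchies are assembled by keeping a stack of monitors per thread, with each new monitor becoming a child of the one on top of the stack. A minimal single-process sketch of that idea, plus a walk that finds the lowest-level failure, is shown below. Node, begin, end, and lowestFailure are hypothetical names; this does not reproduce ERMA's MonitoringEngine and does not model the cross-application propagation the notes describe.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class EventPatternSketch {
    // Hypothetical tree node: one monitored event and its nested events.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        boolean failed;

        Node(String name) {
            this.name = name;
        }
    }

    // One stack of open monitors per application thread.
    private static final ThreadLocal<Deque<Node>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    // Opens a monitor; it becomes a child of whatever is on top of the stack.
    static Node begin(String name) {
        Node node = new Node(name);
        Node parent = STACK.get().peek();
        if (parent != null) {
            parent.children.add(node);
        }
        STACK.get().push(node);
        return node;
    }

    // Closes the most recently opened monitor and records its outcome.
    static void end(Node node, boolean failed) {
        node.failed = failed;
        STACK.get().pop();
    }

    // Follows failed nodes downward; returns the deepest one, or null if root succeeded.
    static Node lowestFailure(Node root) {
        if (!root.failed) {
            return null;
        }
        for (Node child : root.children) {
            Node deeper = lowestFailure(child);
            if (deeper != null) {
                return deeper;
            }
        }
        return root;
    }

    public static void main(String[] args) {
        Node web = begin("wl|AirSearchExecuteAction.search");
        Node search = begin("air-search|jiniIn_LowFareSearchService_execute");
        Node market = begin("market|jiniIn_LocationService");
        end(market, true);
        end(search, true);
        end(web, true);
        System.out.println(web.name + " is failing due to " + lowestFailure(web).name);
    }
}
```

Reporting the top-level event together with the deepest failed node is what lets one alarm state both the customer impact and the likely root cause.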
Myth: Instrumentation Causes
Bugs
Final Thought
Performance monitoring is easy when the
objects practically measure themselves.
Thank You
• Special thanks to:
– Fellow Co-Authors – Matthew O’Keefe and
Stephen Mullins
– Neil Gunther – Mentoring and Candid Editorial
Review
– Lead Graphite Developer – Chris Davis
Websites
• ERMA Project:
http://launchpad.net/erma
• Graphite Project:
http://launchpad.net/graphite
?
michael@ducy.org
gopaczewski@orbitz.com

Editor's Notes

  • #2: MD/GAO Hello, I’m Mike Ducy… Hello, I’m Greg Opaczewski, a Tech Lead at Orbitz Worldwide. I’m part of a development team named Operations Architecture. We develop site health and performance monitoring tools, primarily for operations teams, but also for development teams that need to know how their applications are performing in production.
  • #3: GO I’m excited to say that two of the major technologies in the Orbitz monitoring platform are now open source software. I encourage you to check out the project sites on Launchpad at the URLs listed on the screen. We welcome any feedback you might have for the projects. We will display these again at the end of the presentation as well.
  • #4: MD SOA/distributed architectures create problems for administration, support and development teams. Instrumentation of the various applications can provide valuable insights into how they interact and can ease the administration headaches. Orbitz Worldwide (OWW) operates dozens of applications running on hundreds of servers connected in a multi-layered Jini network. In this kind of environment it can be difficult to obtain consistent, uniform instrumentation at all application process boundaries. It is also not ideal to require each and every application development team to become experts at leveraging an instrumentation API and monitoring tools.
  • #5: GO The Orbitz technology platform is very large, and the business has grown as well. Orbitz Worldwide operates websites around the world in over a dozen locales. These point-of-sale and internationalization variables can make service operations even more challenging due to the number of key metrics that must be monitored. In 2007, sales of over $10 billion in travel products depended on the health of our technology platform. Therefore, OWW has made substantial investments in technology to detect problems early and minimize mean time to repair.
  • #6: MD It can be difficult for a technology organization to commit to providing the level of application instrumentation required to effectively monitor availability, reliability and performance. In this presentation we will examine several myths and explain how our technology overcame them. We’re huge fans of the Mythbusters show on the Discovery Channel; hopefully we have some fans in the audience as well.
  • #7: GO Some believe that applying monitoring code has to be a time-consuming process. In many approaches, every method call has to be wrapped with instrumentation code. Additionally, standards need to be defined for how the instrumentation will be applied consistently across a system. This obviously requires additional effort on the part of the development teams as well as the technical leaders responsible for ensuring the standards are being followed. At Orbitz we’ve observed that instrumentation (and monitoring concerns in general) is often the last concern of developers. Naturally, most or all of the development cycle is spent producing code for new features.
  • #8: We addressed this need to make the process of applying instrumentation simple by creating the Extremely Reusable Monitoring API (ERMA). ERMA consists of an API used for instrumenting Java applications and a library used to process the data produced by the instrumentation. This separation of concerns makes it easy for developers to apply the instrumentation without needing to be concerned with the details of how the data will be consumed.
  • #9: GO Monitor objects in ERMA are Plain Old Java Objects (POJOs). To instrument a transaction, you construct a TransactionMonitor. Upon construction a stopwatch is started, which is used to measure latency. The code to be monitored is surrounded with try/catch/finally blocks. If the business code executes without exception, succeeded is invoked on the TransactionMonitor. However, if an exception is caught, it is recorded via the failedDueTo method. In the finally block, done is invoked in order to stop the stopwatch and pass the Monitor to the MonitoringEngine for processing. This is the handoff point to the processors implemented in the ERMA library.
  • #10: GO So what I’ve shown you on the previous slide is ERMA applied explicitly, wrapped around the business logic by a developer. The API is simple enough to use on its own. But we wanted to make the application of monitoring even easier. So we have implemented several techniques of self-instrumentation in order to achieve monitoring of the business objects with a minimal amount of effort.
  • #11: GO We use the Spring Framework throughout our system. Spring is a popular open-source framework in the Java development community. Spring MVC and Spring Web Flow (SWF) are used in the web application architecture. Both of these frameworks provide hooks that can be used for monitoring. For example, there is a HandlerInterceptor interface in Spring MVC that we’ve implemented and configured such that each and every web request is intercepted. ERMA is applied to these requests in a consistent and reusable manner. Spring Web Flow acts as the controller: it allows you to define flows between components such as actions and views in a webapp. Web Flow provides the FlowExecutionListener. By implementing this listener interface, we provide detailed metrics on how users are interacting with these flows in production.
  • #12: GO Abstraction is another important technique we have used. Orbitz applications are networked together using Jini technology. Jini provides for dynamic service discovery and remote invocation. In a service-oriented architecture, applications need a way to find out where the services they depend on are running. Jini provides this, as well as the ability to add and remove services from the network seamlessly. We created the Orbitz Jini Framework (OJF) in order to abstract away the details of our Jini service network from end developers. The abstraction layer contains a FilterChain facility that we have leveraged for monitoring. ERMA filters are executed both on the client and server side for each and every request. Because OJF is a shared library used consistently across our system, all developers get monitoring of remote method calls for free.
  • #13: GO In the absence of hooks in the form of APIs that can be leveraged for monitoring, Aspect Oriented Programming (AOP) is another good option for providing reusable monitoring code. Spring provides integration with the popular AspectJ AOP framework. We have implemented an ERMA aspect that applies monitoring to all Action component invocations with just a few lines of reusable XML configuration. Spring creates a dynamic proxy for each Action object once at startup, and overhead at runtime is minimal as just one extra method invocation through the proxy is involved.
  • #14: GO As a result of these techniques for applying reusable instrumentation, a developer at Orbitz needs to spend almost no time at all to get basic monitoring coverage. The frameworks that we use were instrumented by a small group of platform developers, and many other development teams benefit without the need to spend any additional development time. So we have BUSTED this myth of no time for instrumentation.
  • #15: MD From a standard ROI perspective, instrumentation does not provide real dollars back for the money invested in its development. The value provided is often in reduced downtime, better understanding of code performance, better understanding of code dependencies and interactions of systems, and opportunities to increase application performance and enhance the customer experience. While from a long-term perspective these enhancements can provide increased revenue, the return is not as immediate as that of implementing something like a new feature.
  • #16: MD
  • #17: MD ERMA-instrumented applications send monitoring data back to the Event Processor engine. The data is sent by a background thread in the instrumented application, which prevents latency from being introduced for the other incoming remote service calls. Because event processing is done outside of the instrumented application, it adds little latency there. The event processor aggregates and summarizes the various metrics, computing summary statistics (average, standard deviation, % fail, % success, etc.), and sends these metrics over to Graphite for storage and visualization. The event processor is also capable of sending SNMP alarms when aggregated data points exceed certain thresholds (e.g. latency is high, or rate of failures is high).
  • #18: MD Graphite consists of several components. The 2 primary components are Carbon and the Web Application. The Carbon component is responsible for reading data into the system and storing it in fixed size database files (similar to RRD files). The web application then reads these files to graphically represent the data for the end user.
  • #19: MD The Graphite composer interface allows you to browse various metrics available for reporting in a hierarchical tree. When a metric is selected, a graph of that metric’s data is drawn in the composer interface. The user can manipulate the graph by selecting size, duration of the data to be graphed, as well as other elements.
  • #20: MD The Graphite Command Line Interface allows a user to draw graphs in individual windows. These windows can be arranged and sized within the browser window. The window layout can also be saved, which allows a user to create a “dashboard” of commonly used graphs.
  • #21: MD
  • #22: MD Value is not in the instrumentation itself, but in the data that the instrumentation provides. Gartner estimates that an hour of downtime costs an organization $42,000 on average. Instrumentation data can help reduce the length of outages by making it easier for operators to locate the problem (via SNMP alarms), and through the tools used to visualize the data.
  • #23: MD Instrumentation provides a Return On Investment by maximizing the ROI of the applications that are monitored.
  • #24: GO Another myth that we’d like to address is the belief that instrumentation only causes bugs. Boilerplate code often used to apply instrumentation makes code harder to read and maintain. This has a direct effect on developer productivity. It also gives developers an argument to not add the instrumentation at all. We use several techniques that allow our developers to avoid the need to write boilerplate code. We provide reusable, well-tested instrumentation packaged in libraries and applied via hooks, abstraction and AOP as described previously.
  • #25: GO Another good option to avoid boilerplate code with ERMA is Annotations. This feature, supported with Java 5 and above, applies instrumentation at build time and requires no ERMA code to be mixed with business code. The example shown here will apply an ERMA TransactionMonitor to each method in this service.
  • #26: GO This example will wrap a TransactionMonitor around only the purchase method. Setting includeArguments to true will include method parameters in the monitor object as an attribute. The nice thing about this approach is how cleanly separated the business code is from the monitoring: it is simply declarative monitoring versus intrusive instrumentation.
  • #27: MD Our use of ERMA has introduced very few bugs. In fact, far more bugs have been uncovered using the ERMA data.
  • #28: MD Instrumentation data allows you to baseline your current application performance against historical data. You can also use instrumentation data to build theoretical models to help verify that testing tools are correctly measuring application performance.
  • #29: MD Historical instrumentation data can be used to build models based on Multivariate Adaptive Statistical Filtering and Statistical Process Control. This allows you to determine if your current application is performing within historical bounds and if something has changed.
  • #30: GO For a large system there is a need to provide an abstraction for monitoring so that developers, operators and business analysts can all share the same language for describing system functionality. Example abstractions from our domain are “hotel search”, “air purchase”, “package selection”, etc. ERMA has some unique design features that enable detailed monitoring put in the context of these abstractions. ERMA assembles hierarchies of events transparently within its MonitoringEngine component. It does this by maintaining a stack of Monitors for each application thread. Whenever a new monitor is created during request processing, a parent-child relationship is introduced with the Monitor previously on top of the stack. At the completion of request processing, the result is a tree data structure that can be analyzed to find event patterns. Our Jini framework passes monitoring data back and forth, allowing these event patterns to span the boundaries of all applications involved in servicing a user request. As a result, we can accelerate root cause analysis by delivering alarms to our operations teams that contain both the low-level root cause of a problem and the impact to our customer.
  • #31: GO What you are looking at here is an example of an ERMA event pattern captured from an air search request in our system. We present these patterns to the operator in such a way that it is obvious where an exception originated and how it bubbled up through the stack. Yellow represents any monitor that has recorded a failure and red represents the lowest-level failure. We use this information to zero in on the application and component that is contributing most significantly to a site issue. An example alarm that may be sent to our operations center based on this data would read “Air search is failing at 80% due to a market application DbPoolExhaustedException.” So it is very clear as to the top-level impact (air search is failing) and it points to the underlying issue (likely that there are no available database connections). Before we implemented this approach, an alarm would be generated for every failure in this pattern and our operators would be left trying to figure out the bigger picture. The improved alarms help ensure proper development resources are engaged quickly when support teams are troubleshooting production issues. They also help to prioritize action on alarm conditions by making clear the impact to our customers. ERMA patterns also enable you to drill down into latency metrics in order to see which components are contributing the most to latency. We are working on a user interface that will make it easier to visualize this kind of data, for example by generating dynamic UML sequence diagrams based on the runtime behavior of the system.
  • #32: GO So in our experience many more bugs are uncovered using the data produced by instrumentation than are caused by it. Bugs that would otherwise be difficult or impossible to diagnose without the instrumentation. So the myth that instrumentation only causes bugs is busted.
  • #33: MD ??? GO We have been pragmatic in the design of our monitoring platform and tools. We have acknowledged that developers are focused on implementing new features and site improvements, so monitoring of all core metrics is already in place. These include JVM and machine-level statistics such as CPU, memory and threads, as well as resource pools such as database connections and network connections to external suppliers. Frameworks contain monitoring of our business services and detailed monitoring of every request into the web application. Lastly, we have invested in tools that allow us to get tremendous value out of the instrumentation. These tools translate detailed metric data into an improved customer experience.
  • #34: GAO I want to thank CMG for allowing us to share our story with everyone here. I too want to thank Neil Gunther again for all of his help in putting the paper and this presentation together.
  • #35: MD Both ERMA and Graphite have been open sourced by Orbitz Worldwide and the teams welcome your feedback and contributions.