Learning to Cook:
Network Management Recipes
https://cbsstlouis.files.wordpress.com/2013/01/kidscooking.jpg
(C) SystemsManagementZen.com 2007-2015. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Mr. White has over fifteen years of experience designing and managing
the deployment of Systems Monitoring and Event Management software.
Currently, he is serving as the Operational Readiness Leader for a Fortune
50 Enterprise. Mr. White has also held positions including Executive
Architect at IBM, leader of the Monitoring and Event Management
organization at Nationwide Insurance and owner of a Service
Management Consultancy developing solutions for a wide variety of
organizations, including the Mexican Secretaría de Hacienda
y Crédito Público, Telmex, Wal-Mart of Mexico, JP Morgan Chase,
Nationwide Insurance and the US Navy Facilities and Engineering
Command.
Andrew White
Long Time System Management Expert
UX Evangelist
For those of you who are
sleeping right now…
This topic isn’t going to help much.
SORRY :(
http://weheartit.com/entry/12433848
Ground rules for this
session…
•  If you can’t tell if I am trying to be funny…
–  GO AHEAD AND LAUGH!
•  Feel free to text, tweet, yammer, or whatever.
Use 
•  If you have a question, no need to wait until
the end. Just interrupt me. Seriously… I
don’t mind.
I have a lot of experience leading
Systems and Event Management teams.
I am here today to share some of what I have learned about
Latency.
And more importantly, I am here today to talk about
User Experience.
What do I mean by latency and user experience?
Definitions:
La·ten·cy – [LEYT-n-see]
-noun, plural -cies
1.  The state of being latent
2.  The time that elapses between a stimulus and the
response to it
3.  The state of being not yet evident or active
http://www.flickr.com/photos/25822731@N02/4644128723/
Ex·pe·ri·ence – [ik-SPEER-ee-uh’ns]
-noun
1.  The apprehension of an object, thought, or emotion through
the senses or mind
2.  Direct personal participation or observation; actual knowledge
or contact
3.  A particular incident, feeling, etc., that a person has
undergone
-verb
4.  To be emotionally or aesthetically moved by; to feel
5.  To learn by perceiving, understanding, or remembering
http://www.flickr.com/photos/51035626620@N01/170061976/sizes/l/in/photostream/
When you put them together we get:

The ultimate measure of success for any system is
the perception of its performance. The less
interactive a system becomes, the more likely its
performance will be perceived to be poor.
Latency is the mother of inactivity!
The Two Dimensions of
Latency…
Internal Latency vs. External Latency
Actual Latency vs. Perceived Latency
This is what user experience is all about
In other words: Perceived Latency = Fn( Internal + External, Variation )
We need to recognize when we
have problems to solve
Maybe.
Let me show you why this is important…
Is 5 seconds really bad?
Start…
90th Percentile: 5.44 seconds… DONE!
Start…
Observed Maximum: 15.4 seconds… DONE!
If you were the one on the phone
with one of those customers…
how would you fill that silence?
Why does any of this matter?
No complaint… is more common
than that of a scarcity of money
-Adam Smith, Wealth of Nations
*Among adults who accessed the internet with a mobile phone in the past 12 months (n=1,001) – Gomez Mobile Web Experience Survey conducted by Equation Research
58% of mobile phone users expect websites
to load as quickly, almost as quickly or faster
on their mobile phone, compared to the
computer they use at home*
http://www.flickr.com/photos/lucianbickerton/3858380291/sizes/l/
*Among adults who accessed the internet with a mobile phone in the past 12 months (n=1,001) – Gomez Mobile Web Experience Survey conducted by Equation Research
60% of mobile web users have had a problem in the
past year when accessing a website on their phone*
http://www.flickr.com/photos/rickyromero/1357938629/sizes/l/
*Among adults who accessed the internet with a mobile phone in the past 12 months (n=602) – Gomez Mobile Web Experience Survey conducted by Equation Research
Slow load time was the number one issue,
experienced by almost 75% of them*
http://bighugelabs.com/onblack.php?id=2497744197&size=large
Our Problem Statement:
The business needs to reliably reach its customers and
users regardless of where they may be located. Latency
forces close geographic proximity of the components
and limits the quality of service provided to
geographically distributed customers.
If the users can’t use it, it
doesn’t work.
Our Constraints
At the same time, there are a few inescapable facts we face:
1.  Today’s users demand reliable systems to do their work
2.  IT systems will mirror the complexity of the businesses
they support
3.  Our environments must be massive to handle the workload
4.  Business continuity requires geographic diversity in our
deployment locations
5.  The speed of light isn’t changing any time soon
When all of these happen at the same time…
Ug…
Question
Is there a better way to figure out what
monitoring would help?
Picking Better Monitors
Current Approach:
Itemize the existing monitors → Brainstorm potential gaps to fill → Deploy new monitors
Proposed Approach:
Identify the potential risks → Itemize the existing monitors → Determine which gaps exist → Fill the monitoring gaps
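The proposed approach starts from risks rather than from the monitors already in place. A minimal Python sketch of the "determine which gaps exist" step, assuming each existing monitor can be mapped to the set of risks it would detect; the function name and every risk and monitor name below are hypothetical:

```python
def find_gaps(risks, coverage):
    """Compare identified risks against what existing monitors detect.

    risks:    set of risk names from the "Identify the potential risks" step
    coverage: dict mapping each existing monitor to the set of risks it detects
    Returns (risks no monitor covers, monitors that map to no identified risk).
    """
    covered = set().union(*coverage.values())
    gaps = sorted(risks - covered)
    orphans = sorted(m for m, detects in coverage.items() if not detects & risks)
    return gaps, orphans


# Hypothetical inventory, for illustration only.
risks = {"web tier down", "database slow", "certificate expired", "dns failure"}
coverage = {
    "homepage ping": {"web tier down"},
    "db query latency": {"database slow"},
    "legacy disk check": {"retired batch host full"},
}

gaps, orphans = find_gaps(risks, coverage)
print("gaps to fill:", gaps)
print("monitors with no mapped risk:", orphans)
```

The second list is a side benefit of working from risks: monitors that protect against nothing on the risk list are candidates for review rather than for more alerts.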
What Do You Want To Accomplish?
Your monitoring should help you answer:
•  How will we know if the users are getting the experience
they are expecting?
•  How much capacity do we need during normal and peak
times to ensure user expectations are met?
•  How quickly can the provider we select ramp up to meet
our needs if we find that the service is underperforming?
•  How fast do we need to be able to access additional
capacity once it is ready for us?
Composite Applications
•  Site Content
•  Search
•  Session Information
•  User Login & Identity Mgmt
•  Content Mgmt System
•  Social Network Widgets
•  Site Tracking & Analytics
•  Banner Ads & Revenue Generators
•  Multimedia & CDN Content
Composite Applications Are Everywhere
•  ATG (Oracle) – Shopping Cart
•  Estara – Click to Chat
•  Twitter Widget – Social Networking
•  Gigya – Social Networking
•  Google Maps API – GeoLocation
•  Facebook Widget – Social Networking
•  Google Analytics – User Tracking
Seeing Is Believing
Real User Monitoring would report a 94 ms response time.
The page seemed “done” to me 1.2 seconds later.
The time spent rendering represented 93% of the user-experienced latency.
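The 93% figure follows from the two measurements on the slide, assuming the 1.2 seconds is counted after the 94 ms response has completed. A small Python check of that arithmetic:

```python
# Values from the slide; treating them as sequential is our assumption.
server_s = 0.094   # response time Real User Monitoring would report
render_s = 1.2     # additional time until the page looked "done"

total_s = server_s + render_s          # what the user actually waited
render_share = render_s / total_s      # fraction of the wait spent rendering

print(f"user-perceived: {total_s:.3f}s, rendering share: {render_share:.0%}")
```

It prints a user-perceived time of 1.294 s and a rendering share of 93%: the number a server-side view reports covers well under a tenth of what the user experienced.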
The Same Old Problem
[Diagram: Corporate LANs & VPNs, ISP Connection, DNS & Internet Services, Content Mgmt System, Social Network Widgets, Site Tracking & Analytics, Banner Ads & Revenue Generators, Multimedia & CDN Content, Home Wireless & Broadband, Mobile Broadband, The Cloud, Distributed, Database, Mainframe, Network, Middleware, Storage]
Is It My Data Center?
•  Configuration errors
•  Application design issues
•  Code defects
•  Insufficient infrastructure
•  Oversubscription issues
•  Poor routing optimization
•  Low cache hit rate
Is It a Service Provider Problem?
•  Non-optimized mobile content
•  Bad performance under load
•  Blocking content delivery
•  Incorrect geo-targeted content
Is It an ISP Problem?
•  Peering problems
•  ISP outages
Is It My Code or a Browser Problem?
•  Missing content
•  Poorly performing JavaScript
•  Inconsistent CSS rendering
•  Browser/device incompatibility
•  Page size too big
•  Conflicting HTML tag support
•  Too many objects
•  Content not optimized for device
Cognitive Dissonance
[Diagram: Corporate LANs & VPNs, Distributed, Database, Mainframe, Network, Middleware, Storage, ISP Connection, DNS & Internet Services, Content Mgmt System, Social Network Widgets, Site Tracking & Analytics, Banner Ads & Revenue Generators, Multimedia & CDN Content, Home Wireless & Broadband, Mobile Broadband; labeled “The Part You Control” and “The Part They Experience”]
All our systems look great,
SLAs are being met…
…meanwhile the user is NOT happy.
You Have More Control Here Than You Think
Gaining Perspective Requires Balance
[Diagram: Client Monitoring, Synthetic Transactions, Packet Capture, Server Probe]
Four Perspectives of User Experience
1.  Client to the Server
2.  Server to the Client
3.  “3rd Party” Vantage Point
4.  Synthetic Transactions
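The synthetic transaction perspective can be sketched as a probe that replays a multi-step user task and times each step, rather than only checking that a homepage answers. A minimal Python sketch using only the standard library; the local demo server, its page paths, and the function names are hypothetical stand-ins for a real site, and a production probe would run from the places your customers actually are:

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Bypass any proxy settings so this sketch measures the direct path.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def run_synthetic_transaction(base_url, steps, timeout=5.0):
    """Replay a multi-step user task, timing each step.

    The task succeeds only if every step returns HTTP 200; the probe stops at
    the first failed step, the way a real user would be stuck there."""
    results = []
    start = time.perf_counter()
    for path in steps:
        t0 = time.perf_counter()
        try:
            with _OPENER.open(base_url + path, timeout=timeout) as resp:
                resp.read()
                ok = resp.status == 200
        except OSError:  # URLError and HTTPError are both OSError subclasses
            ok = False
        results.append((path, ok, time.perf_counter() - t0))
        if not ok:
            break
    success = len(results) == len(steps) and all(ok for _, ok, _ in results)
    return {"success": success, "total_s": time.perf_counter() - start,
            "steps": results}


class _DemoSite(BaseHTTPRequestHandler):
    """Hypothetical three-page flow standing in for a real application."""
    PAGES = {"/login", "/search", "/checkout"}

    def do_GET(self):
        found = self.path in self.PAGES
        body = b"ok" if found else b"not found"
        self.send_response(200 if found else 404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass


def demo():
    server = HTTPServer(("127.0.0.1", 0), _DemoSite)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    try:
        good = run_synthetic_transaction(base, ["/login", "/search", "/checkout"])
        broken = run_synthetic_transaction(base, ["/login", "/cart", "/checkout"])
    finally:
        server.shutdown()
        server.server_close()
    return good, broken


if __name__ == "__main__":
    good, broken = demo()
    print(good["success"], broken["success"], [p for p, _, _ in broken["steps"]])
```

The homepage is up in both runs, yet the second task fails at its second step, which is exactly the failure a homepage availability check would miss.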
Why Multiple Perspectives?
Know Your Customer:
•  What do they do?
–  Customers care about completing tasks, NOT whether the homepage is available
•  Where do they do it from?
–  Your customers don’t live in the cloud; test from their perspective
•  When do they do it?
–  Test at peak and normal traffic levels to find all the problems
•  What expectations do customers have?
–  Is 5 seconds fast enough or does it have to be quicker?
What Does Good Monitoring Look Like?
[Diagram: Corporate LANs & VPNs, Load Balancer, Firewall, Switch, Web Server Farm, Data Power, Middleware, Database, Mainframe, with markers 1 to 10 showing where each element of monitoring applies]
Elements of Good Monitoring
1.  System Availability
2.  Operating System Performance
3.  Hardware Monitoring
4.  Service/Daemon and Process Availability
5.  Error Logs
6.  Application Resource KPIs
7.  End-to-End Transactions
8.  Point of Failure Transactions
9.  Fail-Over Success
10. “Activity Monitors” and “Reverse Hockey Stick”
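Element 10 is the least familiar. We read an "activity monitor" as a check that alerts when expected activity stops or falls off, since silence is often the first sign that something upstream broke; that reading is our assumption, not a definition from the talk. A minimal Python sketch, where the function name, window, and order feed are all hypothetical:

```python
from datetime import datetime, timedelta


def activity_alert(event_times, now, window, minimum):
    """Return (alert, count): alert is True when fewer than `minimum` events
    arrived in the `window` ending at `now`. The symptom here is silence,
    so the check fires on missing activity rather than on an error."""
    start = now - window
    count = sum(1 for t in event_times if start < t <= now)
    return count < minimum, count


# Hypothetical feed: one order per minute from 09:00 to 09:59, then nothing.
base = datetime(2015, 1, 1, 9, 0)
orders = [base + timedelta(minutes=i) for i in range(60)]

busy = activity_alert(orders, datetime(2015, 1, 1, 10, 0), timedelta(minutes=15), 5)
quiet = activity_alert(orders, datetime(2015, 1, 1, 10, 20), timedelta(minutes=15), 5)
print(busy, quiet)
```

At 10:00 the last 15 minutes hold 14 orders and no alert fires; at 10:20 the window is empty and the check alerts, even though no component has reported an error.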
When decisions are not made based
on information, it’s called gambling.
Finding Metrics That Matter
Evaluating the Effectiveness of a Metric
•  Will the metric be used in a report? If so, which one? How is it used in the report?
•  Will the metric be used in a dashboard? If so, which one? How will it be used?
•  What action(s) will be taken if an alert is generated? Who are the actors? Will a ticket
be generated? If so, what severity?
•  How often is this event likely to occur? What is the impact if the event occurs? What
is the likelihood it can be detected by monitoring?
•  Will the metric help identify the source of a problem? Is it a coincident / symptomatic
indicator?
•  Is the metric always associated with a single problem? Could this metric become a
false indicator?
•  What is the impact if this goes undetected?
•  What is the lifespan for this metric? What is the potential for changes that may
reduce the efficacy of the metric?
Watch your words
                                  737-900ER   747-400ER
Maximum Number of Passengers            215         524
Maximum Cruising Speed (mph)            511         570
A 737 and a 747 both travel at roughly 500 mph, but the 747
carries more than twice as many people. Would you say it is twice as fast?
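The same distinction applies to networks: throughput (how much moves per hour) is not latency (how long one trip takes). A small Python calculation using the table's figures and a hypothetical 2,500-mile trip:

```python
# (max passengers, max cruising speed in mph), from the table above
planes = {"737-900ER": (215, 511), "747-400ER": (524, 570)}
TRIP_MILES = 2500  # hypothetical coast-to-coast trip

for name, (pax, mph) in planes.items():
    hours = TRIP_MILES / mph   # latency: how long each passenger waits
    throughput = pax * mph     # capacity: passenger-miles moved per hour
    print(f"{name}: {hours:.2f} h per trip, {throughput:,} passenger-miles/h")

speed_ratio = planes["747-400ER"][1] / planes["737-900ER"][1]
throughput_ratio = (planes["747-400ER"][0] * planes["747-400ER"][1]) / (
    planes["737-900ER"][0] * planes["737-900ER"][1])
print(f"747 vs 737: {speed_ratio:.2f}x the speed, {throughput_ratio:.2f}x the throughput")
```

The 747 moves about 2.7 times as many passenger-miles per hour but is only about 12% faster, so each passenger still spends roughly 4.4 hours in the air instead of 4.9. Adding bandwidth behaves the same way: more capacity, nearly the same wait.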
What Matters Most?
Dr. Lee
Goldman
Cook County Hospital,
Chicago, IL
•  Is the patient experiencing unstable angina?
•  Is there fluid in the patient’s lungs?
•  Is the patient’s systolic blood pressure below 100?
The Goldman Algorithm
Prediction of Patients Expected to Have a Heart Attack Within 72 Hours
[Chart: Traditional Techniques vs. Goldman Algorithm, on a 0 to 100 scale]
By paying attention to what really matters, Dr.
Goldman improved the “false negative” rate by 20
percentage points and eliminated “false
positives” altogether.
Really Helpful KPIs
•  Server Metrics
–  Server Response Time
–  Server Connection Time
–  Refused Session Percentage
–  Unresponsive Session Percentage
•  Network Metrics
–  Network Round Trip Time
–  Retransmission Delay
–  Effective Network Round Trip Time
–  Network Connection Time
•  Application Metrics
–  Total Transaction Time
–  Data Transfer Time
Beware of Averages
Sample response times (seconds): 0.5 0.7 0.9 1.8 2.5 2.5 2.6 2.9 3.3 3.5
[Figure: the samples marked with the 25th, 50th, and 75th percentiles and the Average]
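The slide's point can be checked directly on its ten sample values. A minimal Python sketch using the standard `statistics` module (its default "exclusive" quantile method) plus a nearest-rank 90th percentile; the choice of percentile methods is ours, not the author's:

```python
import math
import statistics

# Sample response times in seconds, from the "Beware of Averages" slide.
samples = [0.5, 0.7, 0.9, 1.8, 2.5, 2.5, 2.6, 2.9, 3.3, 3.5]


def nearest_rank_percentile(values, pct):
    """Smallest value with at least pct% of the samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


mean = statistics.mean(samples)
q1, median, q3 = statistics.quantiles(samples, n=4)  # 25th, 50th, 75th
p90 = nearest_rank_percentile(samples, 90)
slower_than_mean = sum(1 for s in samples if s > mean) / len(samples)

print(f"mean={mean:.2f}s median={median:.2f}s q1={q1:.2f}s q3={q3:.2f}s")
print(f"p90={p90}s max={max(samples)}s")
print(f"{slower_than_mean:.0%} of samples are slower than the mean")
```

On these samples the mean is 2.12 s while the median is 2.50 s, so 60% of requests are slower than the "average". On a sample this small, other quantile methods give slightly different quartiles, which is another reason to say which percentile you report.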
Your Mission…
In addition to monitoring for system availability, we are
here to help manage latency.
The Recipe:
1.  Continually map, monitor, and categorize all
sources of latency
2.  Help identify and remove all sources that are found
The Critical Path of Performance
Client Node: Browser → Workstation OS → Workstation Hardware → Client LAN → Corporate WAN → Datacenter LAN → Etc.
Web Server: Web Server → Web Server OS → Web Server Hardware → Datacenter LAN
Middleware: Middleware Server Hardware → Middleware Server OS → Middleware Application → Etc.
Database: Database Server → Database Server OS → Database Server HBA → SAN Fabric Switch → Array Hardware → Array Controller → Hardware Cache → Disk Drives → Etc.
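Step 1 of the recipe, categorizing the sources of latency along this path, can start as a simple latency budget. A minimal Python sketch, assuming per-hop latencies simply add up (real requests overlap, retry, and fan out); the tier names follow the diagram, and every millisecond value is hypothetical:

```python
from collections import defaultdict

# Hypothetical per-hop latencies (ms) for one request: (tier, component, ms).
hops = [
    ("Client", "Browser", 120),
    ("Client", "Workstation OS", 5),
    ("Network", "Client LAN", 2),
    ("Network", "Corporate WAN", 45),
    ("Network", "Datacenter LAN", 1),
    ("Web", "Web Server", 30),
    ("Middleware", "Middleware Application", 80),
    ("Database", "Database Server", 60),
    ("Database", "SAN Fabric Switch", 1),
    ("Database", "Disk Drives", 8),
]


def latency_budget(hops):
    """Sum latency per tier and rank tiers by their share of the total."""
    by_tier = defaultdict(float)
    for tier, _component, ms in hops:
        by_tier[tier] += ms
    total = sum(by_tier.values())
    ranking = sorted(((t, ms, ms / total) for t, ms in by_tier.items()),
                     key=lambda row: row[1], reverse=True)
    return total, ranking


total, ranking = latency_budget(hops)
print(f"end-to-end: {total:.0f} ms")
for tier, ms, share in ranking:
    print(f"{tier:<10} {ms:6.0f} ms  {share:5.1%}")
```

Ranking by share points step 2 (removing sources of latency) at the tier that matters most; with these made-up numbers that is the client, which no data center dashboard would show.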
Starting the journey…
SNMP
MIBs and OIDs
root
  iso (1)
    org (3)
      dod (6)
        internet (1)
          directory (1)
          mgmt (2)
            mib-2 (1)
              system (1)
              interfaces (2)
              ip (4)
          experimental (3)
          private (4)
            enterprises (1)
              cisco (9)
              apple (63)
              microsoft (311)
              juniper (2636)
ifOperStatus = .1.3.6.1.2.1.2.2.1.8.0
Port OperStatus = .1.3.6.1.4.1.9.5.1.4.1.1.6.0
Functionally the same
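Every numeric OID is just a path down this tree. A minimal Python sketch that translates an OID into names using a small, hand-written table of the arcs shown on the slide; the table is illustrative, not a MIB compiler. Two additions are ours: the ifTable (2) / ifEntry (1) / ifOperStatus (8) arcs under interfaces, and the enterprises (1) arc that sits between private (4) and the vendor numbers. Note that in standard SNMP usage the arc after a table column such as ifOperStatus is the row index (the interface's ifIndex), while `.0` marks the instance of a scalar object.

```python
# Partial OID-to-name table for the branches on the slide (illustrative only).
OID_NAMES = {
    (1,): "iso", (1, 3): "org", (1, 3, 6): "dod", (1, 3, 6, 1): "internet",
    (1, 3, 6, 1, 1): "directory", (1, 3, 6, 1, 2): "mgmt",
    (1, 3, 6, 1, 3): "experimental", (1, 3, 6, 1, 4): "private",
    (1, 3, 6, 1, 2, 1): "mib-2", (1, 3, 6, 1, 2, 1, 1): "system",
    (1, 3, 6, 1, 2, 1, 2): "interfaces", (1, 3, 6, 1, 2, 1, 4): "ip",
    (1, 3, 6, 1, 2, 1, 2, 2): "ifTable", (1, 3, 6, 1, 2, 1, 2, 2, 1): "ifEntry",
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 8): "ifOperStatus",
    (1, 3, 6, 1, 4, 1): "enterprises", (1, 3, 6, 1, 4, 1, 9): "cisco",
    (1, 3, 6, 1, 4, 1, 63): "apple", (1, 3, 6, 1, 4, 1, 311): "microsoft",
    (1, 3, 6, 1, 4, 1, 2636): "juniper",
}


def parse_oid(text):
    """'.1.3.6.1' -> (1, 3, 6, 1)"""
    return tuple(int(arc) for arc in text.strip().strip(".").split("."))


def symbolic(text):
    """Name each arc while the prefix is known; keep the rest numeric."""
    arcs = parse_oid(text)
    names = []
    for depth in range(1, len(arcs) + 1):
        name = OID_NAMES.get(arcs[:depth])
        if name is None:
            names.extend(str(arc) for arc in arcs[depth - 1:])
            break
        names.append(name)
    return ".".join(names)


print(symbolic(".1.3.6.1.2.1.2.2.1.8.0"))
print(symbolic(".1.3.6.1.4.1.9.5.1.4.1.1.6.0"))
```

The first OID resolves entirely through the standard mib-2 branch, while the Cisco one leaves the public names at cisco (9); everything after that is defined by the vendor's own MIB, which is why the two "functionally the same" values live in such different places.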
MIBs and OIDs
root > iso (1) > org (3) > dod (6) > internet (1) > private (4) > enterprises (1) > cisco (9)
Port Index = .1.3.6.1.4.1.9.5.1.4.1.1.4.0
Port Type = .1.3.6.1.4.1.9.5.1.4.1.1.5.0
Port OperStatus = .1.3.6.1.4.1.9.5.1.4.1.1.6.0
Port IfIndex = .1.3.6.1.4.1.9.5.1.4.1.1.11.0
portMacControlUnknownProtocolFrames = .1.3.6.1.4.1.9.5.1.4.1.1.21.0
A MIB is the set of OIDs defining a set of information in the management database.
RMON
RMON is “Flow-Based”
Monitoring
RMON v1 (RFC 2819)
•  Statistics: real-time LAN statistics e.g. utilization,
collisions, CRC errors
•  History: history of selected statistics
•  Alarm: definitions for RMON SNMP traps to be
sent when statistics exceed defined thresholds
•  Hosts: host specific LAN statistics e.g. bytes
sent/received, frames sent/received
•  Hosts top N: record of N most active
connections over a given time period
•  Matrix: the sent-received traffic matrix between
systems
•  Filter: defines packet data patterns of interest e.g.
MAC address or TCP port
•  Capture: collect and forward packets matching
the Filter
•  Event: send alerts (SNMP traps) for the Alarm
group
•  Token Ring: extensions specific to Token Ring
RMON v2 (RFC 4502)
•  Protocol Directory: list of protocols the probe can
monitor
•  Protocol Distribution: traffic statistics for each
protocol
•  Address Map: maps network-layer (IP) to MAC-
layer addresses
•  Network-Layer Host: layer 3 traffic statistics for
each host
•  Network-Layer Matrix: layer 3 traffic statistics, per
source/destination pairs of hosts
•  Application-Layer Host: traffic statistics by
application protocol, per host
•  Application-Layer Matrix: traffic statistics by
application protocol, per source/destination pairs
of hosts
•  User History: periodic samples of user-specified
variables
•  Probe Configuration: remote configuration of probes
•  RMON Conformance: requirements for RMON2
MIB conformance
The RMON MIBs
root > iso (1) > org (3) > dod (6) > internet (1) > mgmt (2) > mib-2 (1) > rmon (16)
RMON data is stored in a MIB and can be collected using SNMP.
MIBs and OIDs
root > iso (1) > org (3) > dod (6) > internet (1) > mgmt (2) > mib-2 (1) > rmon (16)
rmonEventsV2           =  .1.3.6.1.2.1.16.0
statistics             =  .1.3.6.1.2.1.16.1.0
history                =  .1.3.6.1.2.1.16.2.0
alarm                  =  .1.3.6.1.2.1.16.3.0
hosts                  =  .1.3.6.1.2.1.16.4.0
hostTopN               =  .1.3.6.1.2.1.16.5.0
matrix                 =  .1.3.6.1.2.1.16.6.0
filter                 =  .1.3.6.1.2.1.16.7.0
capture                =  .1.3.6.1.2.1.16.8.0
event                  =  .1.3.6.1.2.1.16.9.0
tokenRing              =  .1.3.6.1.2.1.16.10.0
protocolDir            =  .1.3.6.1.2.1.16.11.0
protocolDist           =  .1.3.6.1.2.1.16.12.0
addressMap             =  .1.3.6.1.2.1.16.13.0
nlHost                 =  .1.3.6.1.2.1.16.14.0
nlMatrix               =  .1.3.6.1.2.1.16.15.0
alHost                 =  .1.3.6.1.2.1.16.16.0
alMatrix               =  .1.3.6.1.2.1.16.17.0
usrHistory             =  .1.3.6.1.2.1.16.18.0
probeConfig            =  .1.3.6.1.2.1.16.19.0
rmonConformance        =  .1.3.6.1.2.1.16.20.0
mediaIndependentStats  =  .1.3.6.1.2.1.16.21.0
switchRMON             =  .1.3.6.1.2.1.16.22.0
interfaceTopNMIB       =  .1.3.6.1.2.1.16.23.0
hcAlarmMIB             =  .1.3.6.1.2.1.16.24.0
All this information lives under just one branch of the MIB tree,
and most people don’t know about it!
Setting Thresholds
[Figure: a sampled value plotted over time against a Rising Threshold and a Falling Threshold, showing the Sample Interval and the resulting Policy Activations]
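The RMON alarm group (RFC 2819) pairs a rising and a falling threshold so that, after a rising event, another rising event is not generated until the sampled value has fallen back to the falling threshold. A simplified Python sketch of that hysteresis; startup handling is reduced to three string options loosely modeled on the MIB's startup-alarm setting, and the function name, thresholds, and samples are hypothetical, not a probe's actual implementation:

```python
def rmon_style_alarm(samples, rising, falling, startup="rising"):
    """Simplified rising/falling threshold alarm with hysteresis.

    After a rising event, no new rising event fires until the value has
    dropped to the falling threshold (and vice versa), so a metric that
    hovers around one threshold does not flood the console with events.
    Returns a list of (sample_index, event) pairs.
    """
    if falling >= rising:
        raise ValueError("falling threshold must be below the rising threshold")
    armed_rising = startup in ("rising", "risingOrFalling")
    armed_falling = startup in ("falling", "risingOrFalling")
    events = []
    for i, value in enumerate(samples):
        if armed_rising and value >= rising:
            events.append((i, "rising"))
            armed_rising, armed_falling = False, True
        elif armed_falling and value <= falling:
            events.append((i, "falling"))
            armed_rising, armed_falling = True, False
    return events


# Hypothetical link utilization (%), one value per sample interval.
utilization = [10, 50, 85, 90, 70, 85, 40, 20, 95]
print(rmon_style_alarm(utilization, rising=80, falling=30))
```

The second excursion above 80% (sample 5) produces no event because the alarm has not re-armed yet; only after utilization drops to 20% does the next spike fire again. Widening the gap between the two thresholds trades sensitivity for quieter consoles.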
NetFlow
How we view the network
How our applications view it
What a Flow Record Looks Like
http://www.cisco.com/c/en/us/td/docs/ios/fnetflow/configuration/guide/12_2sr/fnf_12_2_sr_book/fnetflow_overview.html
One record, multiple uses
http://www.cisco.com/c/en/us/td/docs/ios/fnetflow/configuration/guide/12_2sr/fnf_12_2_sr_book/fnetflow_overview.html
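"One record, multiple uses" can be illustrated by aggregating the same flow records along different key fields. A minimal Python sketch, assuming the records have already been exported and decoded; the seven fields follow the traditional NetFlow flow key (source/destination address, source/destination port, protocol, ToS, input interface), and all addresses and counts are hypothetical:

```python
from collections import Counter, namedtuple

# The seven key fields of a traditional NetFlow flow.
FlowKey = namedtuple(
    "FlowKey", "src_ip dst_ip src_port dst_port protocol tos input_if")

# Hypothetical decoded records: (key, packets, bytes).
flows = [
    (FlowKey("10.0.0.5", "10.1.1.20", 51000, 443, 6, 0, 1), 120, 150_000),
    (FlowKey("10.0.0.5", "10.1.1.21", 51001, 1433, 6, 0, 1), 40, 12_000),
    (FlowKey("10.0.0.7", "10.1.1.20", 52000, 443, 6, 0, 2), 300, 420_000),
    # ToS byte 184 corresponds to DSCP EF, commonly used for voice.
    (FlowKey("10.0.0.9", "10.1.1.30", 16384, 16384, 17, 184, 2), 5_000, 800_000),
]


def top_by(flows, field):
    """Total bytes per value of one key field, largest first."""
    totals = Counter()
    for key, _packets, octets in flows:
        totals[getattr(key, field)] += octets
    return totals.most_common()


print("top talkers:", top_by(flows, "src_ip"))            # capacity planning
print("by application port:", top_by(flows, "dst_port"))  # application mix
print("by ToS byte:", top_by(flows, "tos"))                # QoS validation
```

The records are collected once; each question (who is talking, which applications, is QoS marking applied) is just a different grouping of the same data.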
Packet Inspection
The Progression
SNMP → RMON → NetFlow → Packet Inspection
(increasing Granularity and Accuracy)
That is great but we need more…
Shallow vs Deep Packet Inspection
SPI is very focused on header information from OSI Layers 3 & 4 (IP, TCP, UDP, etc.)
DPI processes header and payload information (HTTP, SQL, SIP, etc.)
Shallow Packet Inspection (SPI): IP Header, TCP Header
Deep Packet Inspection (DPI): IP Header, TCP Header, and the payload, for example:
GET /userLogin.jsp HTTP/1.1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A
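To make the SPI/DPI distinction concrete, here is a minimal Python sketch that builds a hypothetical IPv4/TCP packet carrying the request shown above and parses it two ways: shallow inspection reads only layer 3/4 header fields, while deep inspection also parses the HTTP payload. It assumes option-free 20-byte IP and TCP headers, leaves checksums at zero, and uses documentation addresses; real DPI engines also reassemble TCP streams, which this does not attempt:

```python
import socket
import struct

IP_FMT = "!BBHHHBBH4s4s"   # 20-byte IPv4 header without options
TCP_FMT = "!HHIIBBHHH"     # 20-byte TCP header without options


def build_sample_packet():
    """Hypothetical IPv4/TCP packet carrying an HTTP request.
    Checksums are left at zero; this is for parsing practice only."""
    payload = (b"GET /userLogin.jsp HTTP/1.1\r\n"
               b"User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3)\r\n\r\n")
    tcp = struct.pack(TCP_FMT, 51515, 80, 1, 0, 5 << 4, 0x18, 65535, 0, 0)
    total_len = 20 + len(tcp) + len(payload)
    ip = struct.pack(IP_FMT, (4 << 4) | 5, 0, total_len, 0, 0, 64, 6, 0,
                     socket.inet_aton("192.0.2.10"),
                     socket.inet_aton("198.51.100.20"))
    return ip + tcp + payload


def shallow_inspect(packet):
    """SPI: read only layer 3/4 header fields."""
    fields = struct.unpack(IP_FMT, packet[:20])
    ihl = (fields[0] & 0x0F) * 4
    src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    return {"src": socket.inet_ntoa(fields[8]), "dst": socket.inet_ntoa(fields[9]),
            "protocol": fields[6], "src_port": src_port, "dst_port": dst_port}


def deep_inspect(packet):
    """DPI: the header fields plus the application-layer (HTTP) payload."""
    info = shallow_inspect(packet)
    ihl = (packet[0] & 0x0F) * 4
    tcp_len = (packet[ihl + 12] >> 4) * 4  # TCP data offset, in bytes
    lines = packet[ihl + tcp_len:].decode("ascii", errors="replace").split("\r\n")
    info["method"], info["path"], info["version"] = lines[0].split(" ", 2)
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "user-agent":
            info["user_agent"] = value.strip()
    return info


pkt = build_sample_packet()
print(shallow_inspect(pkt))
print(deep_inspect(pkt))
```

SPI can tell you that 192.0.2.10 talked to port 80; only DPI can tell you the user was hitting the login page, and with which browser.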
Shallow Packet Inspection
Two Different Thresholds
Degraded Threshold – The point at which users will
complain about poor performance
Excessive Threshold – The point at which users will
stop using the application due to poor
performance
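The two thresholds turn a raw response time into a user-facing state. A minimal Python sketch; the 4-second and 10-second thresholds are hypothetical placeholders to be replaced with values learned from your own users, and the samples mix the 5.44 s and 15.4 s timings from earlier in this talk with made-up values:

```python
from collections import Counter

DEGRADED_S = 4.0    # hypothetical: users start to complain
EXCESSIVE_S = 10.0  # hypothetical: users give up


def classify(response_s, degraded=DEGRADED_S, excessive=EXCESSIVE_S):
    """Map one response time to ok / degraded / excessive."""
    if response_s >= excessive:
        return "excessive"
    if response_s >= degraded:
        return "degraded"
    return "ok"


samples = [0.8, 2.5, 5.44, 3.1, 15.4, 1.2]
summary = Counter(classify(s) for s in samples)
print(summary)
```

Counting by state rather than averaging keeps the one user who gave up visible: here four samples are ok, one is degraded, and one is excessive.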
Validating Changes
1.  Document the results of QoS changes
2.  Prove the value of a server upgrade
3.  Compare network latency across sites
Solving Problems
Pervasiveness:
The problem is
affecting users
across your
network
Troubleshooting VoIP
Don’t Commit a Felony
Putting it all together
Using Indices
•  Network Congestion Index
•  Packet Loss SLAs
NCI = (Packets/sec + Avg Payload) * (Avg Latency + Avg Bandwidth)
App Owner Controlled Network Controlled
bps < min(rwin/rtt, MSS/(rtt*sqrt(loss)))
For example, to achieve a gigabit per second with TCP on a coast-to-coast
path (rtt = 40 msec), with 1500 byte packets, the loss rate cannot exceed
8.5x10^-8! If the loss rate were even 0.1% (far better than most SLAs), TCP
would be limited to just over 9 Mbps. [Note that large packet sizes help. If
packets were n times larger, the same throughput could be achieved with n^2
times as much packet loss.]
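The example numbers can be reproduced from the bound bps < min(rwin/rtt, MSS/(rtt*sqrt(loss))). A minimal Python sketch, assuming an MSS of 1460 bytes (a 1500-byte packet minus 40 bytes of IPv4 and TCP headers) and, like the slide, omitting the constant factor (about 1.22) of the Mathis et al. model; the function names are ours:

```python
import math


def tcp_throughput_bound_bps(mss_bytes, rtt_s, loss, rwin_bytes=None):
    """bps < min(rwin/rtt, MSS/(rtt*sqrt(loss))), with the model's constant
    (about 1.22) dropped, matching the simplified form above."""
    loss_bound = mss_bytes * 8 / (rtt_s * math.sqrt(loss))
    if rwin_bytes is None:
        return loss_bound
    return min(rwin_bytes * 8 / rtt_s, loss_bound)


def max_loss_for(target_bps, mss_bytes, rtt_s):
    """Invert the loss term: the highest loss rate that still allows target_bps."""
    return (mss_bytes * 8 / (rtt_s * target_bps)) ** 2


MSS = 1460   # assumption: 1500-byte packets minus 40 bytes of IPv4 + TCP headers
RTT = 0.040  # coast-to-coast round trip from the example

print(f"max loss for 1 Gbps: {max_loss_for(1e9, MSS, RTT):.2e}")
print(f"limit at 0.1% loss: {tcp_throughput_bound_bps(MSS, RTT, 0.001) / 1e6:.2f} Mbps")
print(f"loss tolerated with 4x larger packets: "
      f"{max_loss_for(1e9, 4 * MSS, RTT) / max_loss_for(1e9, MSS, RTT):.0f}x")
```

Under these assumptions it prints a loss ceiling of about 8.53e-08 for 1 Gbps, a limit of about 9.23 Mbps at 0.1% loss, and a 16x loss tolerance for packets 4 times larger, matching the n^2 rule. Using the full 1500 bytes as the MSS would give a ceiling of 9.0e-08 instead.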
Let’s keep the
conversation going…
Andrew.P.White@Gmail.com
ReverendDrew
SystemsManagementZen.Wordpress.com
systemsmanagementzen.wordpress.com/feed/
@SystemsMgmtZen
ReverendDrew
APWhite@us.ibm.com
614-306-3434
Brighttalk   learning to cook- network management recipes - final

More Related Content

PDF
Brighttalk outage insurance- what you need to know - final
PDF
Brighttalk reason 114 for learning math - final
PDF
Bright talk running a cloud - final
PDF
Brighttalk getting back on track - final
PDF
Brighttalk brining it all together - final
PDF
Brighttalk converged infrastructure and it operations management - final
PDF
Brighttalk high scale low touch and other bedtime stories - final
PDF
How to improve your system monitoring
Brighttalk outage insurance- what you need to know - final
Brighttalk reason 114 for learning math - final
Bright talk running a cloud - final
Brighttalk getting back on track - final
Brighttalk brining it all together - final
Brighttalk converged infrastructure and it operations management - final
Brighttalk high scale low touch and other bedtime stories - final
How to improve your system monitoring

What's hot (19)

PDF
State of on call report 2014
PDF
Marcus Ranum on Bad Idea Zombies
PDF
Dit yvol3iss41
PPTX
Bad Advice, Unintended Consequences, and Broken Paradigms: Think & Act Di...
PDF
Automated decision making with predictive applications – Big Data Amsterdam
PPTX
Intro to a Data-Driven Computer Security Defense
PDF
Building a Successful Organization By Mastering Failure
PDF
Covid 19: Understanding the context using systems thinking techniques webinar...
PDF
"Security on the Brain" Security & Risk Psychology Workshop Nov 2013
PDF
Dr Steve Goldman's Top Ten Business Continuity Predictions / Trends for 2014
PPTX
The foundations of agile
PDF
Architecting a Post Mortem - Velocity 2018 San Jose Tutorial
PPT
Do end-users fit the informatics requirements?
PDF
Ebusiness Auditing
PPTX
When every shot counts - can you make a difference or will you fail
PDF
Beyond the Knowledge Base: Turning Data into Wisdom - an ITSM Academy Webinar
PDF
CoITus {TASK.to September 2012}
PPTX
IT Survey: UK and Germany SMEs
PPT
Help! I'm an accidental techie
State of on call report 2014
Marcus Ranum on Bad Idea Zombies
Dit yvol3iss41
Bad Advice, Unintended Consequences, and Broken Paradigms: Think & Act Di...
Automated decision making with predictive applications – Big Data Amsterdam
Intro to a Data-Driven Computer Security Defense
Building a Successful Organization By Mastering Failure
Covid 19: Understanding the context using systems thinking techniques webinar...
"Security on the Brain" Security & Risk Psychology Workshop Nov 2013
Dr Steve Goldman's Top Ten Business Continuity Predictions / Trends for 2014
The foundations of agile
Architecting a Post Mortem - Velocity 2018 San Jose Tutorial
Do end-users fit the informatics requirements?
Ebusiness Auditing
When every shot counts - can you make a difference or will you fail
Beyond the Knowledge Base: Turning Data into Wisdom - an ITSM Academy Webinar
CoITus {TASK.to September 2012}
IT Survey: UK and Germany SMEs
Help! I'm an accidental techie
Ad

Similar to Brighttalk learning to cook- network management recipes - final (20)

PPT
Social Enterprise: Trust; Vision; Revolution
PDF
Business Intelligence and Analytics Systems for Decision Support 10th Edition...
PDF
Business Intelligence and Analytics Systems for Decision Support 10th Edition...
PPTX
Making Sense of Threat Reports
PPT
Sharing Securely SIMposium 2010
DOCX
Tech Time - Benefits of Being a Beta Site for CUES Magazine
PDF
Business Intelligence and Analytics Systems for Decision Support 10th Edition...
PDF
Quick Response Fraud Detection
PPT
Faster In The Cloud
PDF
IRJET- Social Network Message Credibility: An Agent-based Approach
PDF
IRJET - Social Network Message Credibility: An Agent-based Approach
DOCX
ISE 510 Final Project Scenario Background Limetree Inc. is a resea.docx
PDF
Luncheon 2015-11-19 - Lessons Learned from Avid Life Media by Rob Davis
PDF
Focused agile audit planning using analytics
PDF
iCTRE: The Informal community Transformer into Recommendation Engine
PDF
3 Steps to Better Web Governance
PDF
online news portal system
PDF
Download Study Resources for Business Intelligence and Analytics Systems for ...
PDF
Retrospective data analytics slides
PDF
Business Intelligence and Analytics Systems for Decision Support 10th Edition...
Social Enterprise: Trust; Vision; Revolution
Business Intelligence and Analytics Systems for Decision Support 10th Edition...
Business Intelligence and Analytics Systems for Decision Support 10th Edition...
Making Sense of Threat Reports
Sharing Securely SIMposium 2010
Tech Time - Benefits of Being a Beta Site for CUES Magazine
Brighttalk learning to cook- network management recipes - final

  • 1. Learning to Cook: Network Management Recipes https://cbsstlouis.files.wordpress.com/2013/01/kidscooking.jpg
  • 2. (C) SystemsManagementZen.com 2007-2015. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Mr. White has over fifteen years of experience designing and managing the deployment of Systems Monitoring and Event Management software. Currently, he is serving as the Operational Readiness Leader for a Fortune 50 Enterprise. Mr. White has also held positions including Executive Architect at IBM, leader of the Monitoring and Event Management organization at Nationwide Insurance and owner of a Service Management Consultancy developing solutions for a wide variety of organizations, including the Mexican Secretaría de Hacienda y Crédito Público, Telmex, Wal-Mart of Mexico, JP Morgan Chase, Nationwide Insurance and the US Navy Facilities and Engineering Command. Andrew White: Long-Time Systems Management Expert, UX Evangelist
  • 4. For those of you who are sleeping right now…
  • 6. This topic isn’t going to help much. SORRY :(
  • 8. Ground rules for this session… • If you can't tell whether I am trying to be funny… – GO AHEAD AND LAUGH! • Feel free to text, tweet, yammer, or whatever. Use • If you have a question, no need to wait until the end. Just interrupt me. Seriously… I don't mind.
  • 9. I have a lot of experience leading Systems and Event Management teams
  • 10. I am here today to share some of what I have learned about latency.
  • 11. And more importantly, I am here today to talk about user experience.
  • 12. What do I mean by latency and user experience?
  • 14. La·ten·cy [LEYT-n-see] -noun, plural -cies: 1. The state of being latent 2. The time that elapses between a stimulus and the response to it 3. The state of being not yet evident or active
  • 16. Ex·pe·ri·ence [ik-SPEER-ee-uh'ns] -noun: 1. The apprehension of an object, thought, or emotion through the senses or mind 2. Direct personal participation or observation; actual knowledge or contact 3. A particular incident, feeling, etc., that a person has undergone -verb: 4. To be emotionally or aesthetically moved by; to feel 5. To learn by perceiving, understanding, or remembering
  • 18. When you put them together, we get: The ultimate measure of success for any system is the perception of its performance. The less interactive a system becomes, the more likely its performance will be perceived to be poor. Latency is the mother of inactivity!
  • 19. The Two Dimensions of Latency… • Internal Latency vs. External Latency • Actual Latency vs. Perceived Latency (this is what user experience is all about). In other words: Perceived = Fn((Internal + External)^Variation)
  • 20. We need to recognize when we have problems to solve
  • 22. Maybe. Let me show you why this is important…
  • 23. Is 5 seconds really bad?
  • 27. If you were the one on the phone with one of those customers… how would you fill that silence?
  • 28. Why does any of this matter?
  • 29. "No complaint… is more common than that of a scarcity of money." (Adam Smith, The Wealth of Nations)
  • 30. 58% of mobile phone users expect websites to load as quickly, almost as quickly, or faster on their mobile phone compared to the computer they use at home.* (*Among adults who accessed the internet with a mobile phone in the past 12 months, n=1,001; Gomez Mobile Web Experience Survey conducted by Equation Research.) Image: http://www.flickr.com/photos/lucianbickerton/3858380291/sizes/l/
  • 31. 60% of mobile web users have had a problem in the past year when accessing a website on their phone.* (*Among adults who accessed the internet with a mobile phone in the past 12 months, n=1,001; Gomez Mobile Web Experience Survey conducted by Equation Research.) Image: http://www.flickr.com/photos/rickyromero/1357938629/sizes/l/
  • 32. Slow load time was the number one issue, experienced by almost 75% of them.* (*Among adults who accessed the internet with a mobile phone in the past 12 months, n=602; Gomez Mobile Web Experience Survey conducted by Equation Research.) Image: http://bighugelabs.com/onblack.php?id=2497744197&size=large
  • 33. Our Problem Statement: The business needs to reliably reach its customers and users regardless of where they may be located. Latency forces close geographic proximity of the components and limits the quality of service provided to geographically distributed customers.
  • 34. If the users can’t use it, it doesn’t work.
  • 35. Our Constraints: At the same time, there are a few inescapable facts we face: 1. Today's users demand reliable systems to do their work 2. IT systems will mirror the complexity of the businesses they support 3. Our environments must be massive to handle the workload 4. Business continuity requires geographic diversity in our deployment locations 5. The speed of light isn't changing any time soon
  • 36. When all of these happen at the same time… Ugh…
  • 37. Question: Is there a better way to figure out what monitoring would help?
  • 38. Picking Better Monitors. Current Approach: 1. Itemize the existing monitors 2. Brainstorm potential gaps to fill 3. Deploy new monitors. Proposed Approach: 1. Identify the potential risks 2. Itemize the existing monitors 3. Determine which gaps exist 4. Fill the monitoring gaps
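The proposed approach starts from risks rather than from the monitors already in place. Below is a minimal Python sketch of the "determine which gaps exist" step, assuming each monitor can be mapped to the set of risks it detects. The `Risk` class, the 1 to 5 impact and likelihood scales, and all risk and monitor names are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Risk:
    name: str
    impact: int      # hypothetical 1 to 5 scale
    likelihood: int  # hypothetical 1 to 5 scale


def find_monitoring_gaps(risks: List[Risk], monitors: Dict[str, Set[str]]) -> List[Risk]:
    """Return risks that no monitor detects, highest impact x likelihood first.

    `monitors` maps a monitor name to the set of risk names it can detect.
    """
    covered: Set[str] = set().union(*monitors.values())
    gaps = [risk for risk in risks if risk.name not in covered]
    return sorted(gaps, key=lambda r: r.impact * r.likelihood, reverse=True)


if __name__ == "__main__":
    risks = [
        Risk("disk_full", impact=4, likelihood=3),
        Risk("db_failover_fails", impact=5, likelihood=2),
        Risk("cert_expiry", impact=4, likelihood=4),
        Risk("slow_login", impact=3, likelihood=5),
    ]
    monitors = {
        "disk_space_check": {"disk_full"},
        "login_synthetic": {"slow_login"},
    }
    for gap in find_monitoring_gaps(risks, monitors):
        print(gap.name, gap.impact * gap.likelihood)
```

Ranking the uncovered risks gives the "fill the monitoring gaps" step an order of work instead of a brainstormed wish list.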
  • 39. What Do You Want To Accomplish? Your monitoring should help you answer: • How will we know if the users are getting the experience they are expecting? • How much capacity do we need during normal and peak times to ensure user expectations are met? • How quickly can the provider we select ramp up to meet our needs if we find that the service is underperforming? • How fast do we need to be able to access additional capacity once it is ready for us?
  • 40. Composite Applications: Site Content, Search, Session Information, User Login & Identity Mgmt, Content Mgmt System, Social Network Widgets, Site Tracking & Analytics, Banner Ads & Revenue Generators, Multimedia & CDN Content
  • 41. Composite Applications Are Everywhere • ATG (Oracle) – Shopping Cart • Estara – Click to Chat • Twitter Widget – Social Networking • Gigya – Social Networking • Google Maps API – GeoLocation • Facebook Widget – Social Networking • Google Analytics – User Tracking
  • 42. Seeing Is Believing. Real User Monitoring would report a 94 ms response time. The page seemed "done" to me 1.2 seconds later. The time spent rendering represented 93% of the user-experienced latency.
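The 93% figure follows from simple arithmetic, assuming the 1.2 seconds of rendering came after the 94 ms server response, so the user waited about 1.29 seconds in total:

```python
def rendering_share(server_ms: float, render_ms: float) -> float:
    """Fraction of the user-experienced latency spent after the server responded."""
    total_ms = server_ms + render_ms
    return render_ms / total_ms


share = rendering_share(server_ms=94, render_ms=1200)
print(f"Rendering share: {share:.1%}")  # about 92.7%, which the slide rounds to 93%
```

A monitor that stops its clock at the server response would report the 94 ms and miss more than nine tenths of what the user sat through.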
  • 43. The Same Old Problem. Diagram components: Corporate LANs & VPNs, ISP Connection, DNS & Internet Services, Content Mgmt System, Social Network Widgets, Site Tracking & Analytics, Banner Ads & Revenue Generators, Multimedia & CDN Content, Home Wireless & Broadband, Mobile Broadband, The Cloud, Distributed, Database, Mainframe, Network, Middleware, Storage. Is It My Data Center? • Configuration errors • Application design issues • Code defects • Insufficient infrastructure • Oversubscription issues • Poor routing optimization • Low cache hit rate. Is It a Service Provider Problem? • Non-optimized mobile content • Bad performance under load • Blocking content delivery • Incorrect geo-targeted content. Is It an ISP Problem? • Peering problems • ISP outages. Is It My Code or a Browser Problem? • Missing content • Poorly performing JavaScript • Inconsistent CSS rendering • Browser/device incompatibility • Page size too big • Conflicting HTML tag support • Too many objects • Content not optimized for device
  • 44. Cognitive Dissonance. The Part You Control: Corporate LANs & VPNs, Distributed, Database, Mainframe, Network, Middleware, Storage. The Part They Experience: ISP Connection, DNS & Internet Services, Content Mgmt System, Social Network Widgets, Site Tracking & Analytics, Banner Ads & Revenue Generators, Multimedia & CDN Content, Home Wireless & Broadband, Mobile Broadband. All our systems look great, SLAs are being met… meanwhile the user is NOT happy. You Have More Control Here Than You Think
  • 45. Gaining Perspective Requires Balance. Tools: Packet Capture, Synthetic Transactions, Client Monitoring, Server Probe. Four Perspectives of User Experience: 1. Client to the Server 2. Server to the Client 3. "3rd Party" Vantage Point 4. Synthetic Transactions
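A synthetic transaction is a scripted request, run on a schedule from a chosen vantage point and timed end to end. Below is a minimal Python sketch using only the standard library. The local demo server, the `/userLogin.jsp` path, and the 2-second pass threshold are illustrative assumptions; a real probe would run from the locations where customers actually are:

```python
import http.server
import threading
import time
import urllib.request


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for the application under test."""

    def do_GET(self):
        body = b"<html><body>login page</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def start_demo_server():
    """Start the demo server on a free local port; return (server, base_url)."""
    server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    return server, f"http://{host}:{port}"


def synthetic_check(url: str, timeout: float = 5.0, max_seconds: float = 2.0) -> dict:
    """Fetch `url` once and report status, size, and elapsed wall-clock time.

    The check passes only if it returns HTTP 200 within `max_seconds`.
    """
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            size = len(resp.read())
    except OSError as exc:  # URLError, HTTPError, and timeouts are all OSError subclasses
        return {"ok": False, "error": repr(exc), "elapsed": time.perf_counter() - start}
    elapsed = time.perf_counter() - start
    return {"ok": status == 200 and elapsed <= max_seconds,
            "status": status, "bytes": size, "elapsed": elapsed}


if __name__ == "__main__":
    server, base = start_demo_server()
    print(synthetic_check(base + "/userLogin.jsp"))
    server.shutdown()
```

Because it measures from the client side, the same check can be scheduled from several vantage points and at peak and normal hours.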
  • 46. Why Multiple Perspectives? Know Your Customer: • What do they do? – Customers care about completing tasks, NOT whether the homepage is available • Where do they do it from? – Your customers don't live in the cloud; test from their perspective • When do they do it? – Test at peak and normal traffic levels to find all the problems • What expectations do customers have? – Is 5 seconds fast enough, or does it have to be quicker?
  • 47. What Does Good Monitoring Look Like? Diagram components: Corporate LANs & VPNs, Load Balancers, Firewall, Switch, Web Server Farm, Database, DataPower, Mainframe, Middleware. Elements of Good Monitoring: 1. System Availability 2. Operating System Performance 3. Hardware Monitoring 4. Service/Daemon and Process Availability 5. Error Logs 6. Application Resource KPIs 7. End-to-End Transactions 8. Point of Failure Transactions 9. Fail-Over Success 10. "Activity Monitors" and "Reverse Hockey Stick"
  • 48. When decisions are not made based on information, it's called gambling.
  • 49. Finding Metrics That Matter: Evaluating the Effectiveness of a Metric • Will the metric be used in a report? If so, which one? How is it used in the report? • Will the metric be used in a dashboard? If so, which one? How will it be used? • What action(s) will be taken if an alert is generated? Who are the actors? Will a ticket be generated? If so, what severity? • How often is this event likely to occur? What is the impact if the event occurs? What is the likelihood it can be detected by monitoring? • Will the metric help identify the source of a problem? Is it a coincident (symptomatic) indicator? • Is the metric always associated with a single problem? Could this metric become a false indicator? • What is the impact if this goes undetected? • What is the lifespan for this metric? What is the potential for changes that may reduce the efficacy of the metric?
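Several of these questions can be recorded per metric and checked mechanically before a metric goes into production. Below is a minimal sketch. The `MetricReview` fields and the three concerns it flags are one hypothetical encoding of a subset of the questions, not a standard schema:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MetricReview:
    """Answers to a few of the 'Finding Metrics That Matter' questions (hypothetical schema)."""
    name: str
    report: Optional[str] = None        # which report uses the metric, if any
    dashboard: Optional[str] = None     # which dashboard uses it, if any
    alert_action: Optional[str] = None  # what someone does when it alerts, if anything
    single_problem: bool = True         # is it always associated with a single problem?
    helps_isolate_source: bool = False  # does it point at the source, not just a symptom?


def review(metric: MetricReview) -> List[str]:
    """Return a list of concerns; an empty list means no red flags were found."""
    concerns = []
    if not (metric.report or metric.dashboard or metric.alert_action):
        concerns.append("no consumer: not used in a report, dashboard, or alert action")
    if not metric.single_problem:
        concerns.append("possible false indicator: associated with more than one problem")
    if not metric.helps_isolate_source:
        concerns.append("symptomatic only: will not help identify the source of a problem")
    return concerns


for m in (
    MetricReview("cpu_util", dashboard="Ops overview", single_problem=False),
    MetricReview("login_txn_time", report="Weekly SLA", alert_action="page on-call",
                 helps_isolate_source=True),
):
    print(m.name, review(m) or "OK")
```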
  • 50. Watch your words. 737-900ER: maximum 215 passengers, maximum cruising speed 511 mph. 747-400ER: maximum 524 passengers, maximum cruising speed 570 mph. A 737 and a 747 both travel at around 500 mph, but the 747 carries more than twice as many people. Would you say it is twice as fast?
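The same distinction applies to networks: cruising speed behaves like latency (how fast one trip completes), while seats behave like bandwidth (how much moves per trip). Multiplying the two gives throughput, here in passenger-miles per hour, using the figures from the slide:

```python
def passenger_miles_per_hour(passengers: int, cruise_mph: int) -> int:
    """Throughput: how many passenger-miles one aircraft delivers per hour."""
    return passengers * cruise_mph


b737 = passenger_miles_per_hour(215, 511)  # 737-900ER
b747 = passenger_miles_per_hour(524, 570)  # 747-400ER

speed_ratio = 570 / 511         # how much faster the 747 actually flies
throughput_ratio = b747 / b737  # how much more it delivers per hour

print(f"737: {b737:,} passenger-miles/hour, 747: {b747:,} passenger-miles/hour")
print(f"speed ratio {speed_ratio:.2f}x, throughput ratio {throughput_ratio:.2f}x")
```

The 747 delivers about 2.7 times the throughput while flying only about 12% faster, which is exactly the gap between "more capacity" and "faster".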
  • 51. What Matters Most? Dr. Lee Goldman, Cook County Hospital, Chicago, IL. The Goldman Algorithm: • Is the patient experiencing unstable angina? • Is there fluid in the patient's lungs? • Is the patient's systolic blood pressure below 100? [Chart: Prediction of Patients Expected to Have a Heart Attack Within 72 Hours, Traditional Techniques vs. Goldman Algorithm] By paying attention to what really matters, Dr. Goldman improved the "false negatives" by 20 percentage points and eliminated the "false positives" altogether.
  • 52. Really Helpful KPIs • Server Metrics: Server Response Time, Server Connection Time, Refused Session Percentage, Unresponsive Session Percentage • Network Metrics: Network Round Trip Time, Retransmission Delay, Effective Network Round Trip Time, Network Connection Time • Application Metrics: Total Transaction Time, Data Transfer Time
  • 53. Beware of Averages. Chart of ten sample values (0.5, 0.7, 0.9, 1.8, 2.5, 2.5, 2.6, 2.9, 3.3, 3.5) with the Average and the 25th, 50th, and 75th Percentiles marked
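The average can describe a user who does not exist. Here is a quick check of the slide's ten values with Python's `statistics` module. Percentile definitions vary; `statistics.quantiles` uses its "exclusive" method by default, so other tools may report slightly different cut points:

```python
import statistics

# The ten sample values from the slide (the slide does not state units).
samples = [0.5, 0.7, 0.9, 1.8, 2.5, 2.5, 2.6, 2.9, 3.3, 3.5]

mean = statistics.mean(samples)
p25, p50, p75 = statistics.quantiles(samples, n=4)  # default "exclusive" method

print(f"mean={mean:.2f}  p25={p25:.2f}  median={p50:.2f}  p75={p75:.2f}")
```

The mean (2.12) sits below the median (2.5) because the three fast values pull it down, and neither number shows that the slowest samples take 3.3 to 3.5. Reporting percentiles alongside the average exposes that spread.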
  • 54. Your Mission… In addition to monitoring for system availability, we are here to help manage latency. The Recipe: 1. Continually map, monitor, and categorize all sources of latency 2. Help identify and remove all sources that are found
  • 55. The Critical Path of Performance • Client: Browser, Workstation OS, Workstation Hardware, Client LAN, Corporate WAN, Datacenter LAN, etc. • Node: Web Server, Web Server OS, Web Server Hardware, Datacenter LAN • Middleware: Middleware Server Hardware, Middleware Server OS, Middleware Application, etc. • Database: Database Server, Database Server OS, Database Server HBA, SAN Fabric Switch, Array Hardware, Array Controller Hardware, Cache, Disk Drives, etc.
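One way to "map, monitor, and categorize all sources of latency" is to attribute a measured time to each hop on the critical path and roll those times up by tier. Below is a minimal sketch. The per-component millisecond values are invented for illustration; in practice each would come from a different monitor:

```python
from typing import Dict, List, Tuple

Hop = Tuple[str, str, int]  # (tier, component, milliseconds)

# Hypothetical measurements for a single request along the critical path.
CRITICAL_PATH: List[Hop] = [
    ("Client", "Browser", 120),
    ("Client", "Workstation OS", 5),
    ("Client", "Client LAN", 2),
    ("Client", "Corporate WAN", 40),
    ("Node", "Web Server", 30),
    ("Middleware", "Middleware Application", 80),
    ("Database", "Database Server", 25),
    ("Database", "SAN Fabric Switch", 3),
    ("Database", "Disk Drives", 12),
]


def latency_budget(path: List[Hop]) -> Tuple[int, Dict[str, int], Hop]:
    """Return total latency, latency per tier, and the single worst hop."""
    total = sum(ms for _, _, ms in path)
    by_tier: Dict[str, int] = {}
    for tier, _, ms in path:
        by_tier[tier] = by_tier.get(tier, 0) + ms
    worst = max(path, key=lambda hop: hop[2])
    return total, by_tier, worst


total, by_tier, worst = latency_budget(CRITICAL_PATH)
print(f"total {total} ms")
for tier, ms in sorted(by_tier.items(), key=lambda kv: kv[1], reverse=True):
    print(f"  {tier:<10} {ms:>4} ms  ({ms / total:.0%})")
print(f"largest single contributor: {worst[1]} ({worst[2]} ms)")
```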
  • 57. SNMP
  • 58. MIBs and OIDs: root → iso (1) → org (3) → dod (6) → Internet (1), whose children are Directory (1), Mgmt (2), Experimental (3), and Private (4). Under Mgmt (2) → MIB-2 (1) sit System (1), Interfaces (2), and IP (4); under Private (4) sit vendor branches such as Cisco (9), Apple (63), Microsoft (311), and Juniper (2636). ifOperStatus = .1.3.6.1.2.1.2.2.1.8.0 and Cisco Port OperStatus = .1.3.6.1.4.1.9.5.1.4.1.1.6.0 are functionally the same.
  • 59. MIBs and OIDs (same tree). A MIB is the set of OIDs that defines a set of information in the management database. For example, Cisco port entries: Port Index = .1.3.6.1.4.1.9.5.1.4.1.1.4.0, Port Type = .1.3.6.1.4.1.9.5.1.4.1.1.5.0, Port OperStatus = .1.3.6.1.4.1.9.5.1.4.1.1.6.0, Port IfIndex = .1.3.6.1.4.1.9.5.1.4.1.1.11.0, portMacControlUnknownProtocolFrames = .1.3.6.1.4.1.9.5.1.4.1.1.21.0
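Because OIDs are just paths through this tree, a collector can tell a standard MIB-2 object from a vendor-private one by its prefix. Below is a minimal sketch. Vendor numbers hang off private.enterprises (.1.3.6.1.4.1), a level the MIB tree diagram does not draw. The small name table covers only the vendors shown on the slide:

```python
from typing import Tuple

MIB2 = (1, 3, 6, 1, 2, 1)         # iso.org.dod.internet.mgmt.mib-2
ENTERPRISES = (1, 3, 6, 1, 4, 1)  # iso.org.dod.internet.private.enterprises
VENDORS = {9: "Cisco", 63: "Apple", 311: "Microsoft", 2636: "Juniper"}


def parse_oid(oid: str) -> Tuple[int, ...]:
    """Turn '.1.3.6.1...' or '1.3.6.1...' into a tuple of integer arcs."""
    return tuple(int(arc) for arc in oid.strip(".").split("."))


def classify_oid(oid: str) -> str:
    """Label an OID as standard MIB-2, vendor-private, or other."""
    arcs = parse_oid(oid)
    if arcs[: len(MIB2)] == MIB2:
        return "standard MIB-2"
    if arcs[: len(ENTERPRISES)] == ENTERPRISES and len(arcs) > len(ENTERPRISES):
        number = arcs[len(ENTERPRISES)]
        return "vendor-private: " + VENDORS.get(number, f"enterprise {number}")
    return "other"


for oid in (".1.3.6.1.2.1.2.2.1.8.0", ".1.3.6.1.4.1.9.5.1.4.1.1.6.0"):
    print(oid, "->", classify_oid(oid))
```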
  • 62. RMON
  • 63. RMON is "Flow-Based" Monitoring. RMON v1 (RFC 2819): • Statistics: real-time LAN statistics, e.g. utilization, collisions, CRC errors • History: history of selected statistics • Alarm: definitions for RMON SNMP traps to be sent when statistics exceed defined thresholds • Hosts: host-specific LAN statistics, e.g. bytes sent/received, frames sent/received • Hosts Top N: record of the N most active hosts over a given time period • Matrix: the sent/received traffic matrix between systems • Filter: defines packet data patterns of interest, e.g. MAC address or TCP port • Capture: collect and forward packets matching the Filter • Event: send alerts (SNMP traps) for the Alarm group • Token Ring: extensions specific to Token Ring. RMON v2 (RFC 4502): • Protocol Directory: list of protocols the probe can monitor • Protocol Distribution: traffic statistics for each protocol • Address Map: maps network-layer (IP) addresses to MAC-layer addresses • Network-Layer Host: layer 3 traffic statistics, per host • Network-Layer Matrix: layer 3 traffic statistics, per source/destination pair of hosts • Application-Layer Host: traffic statistics by application protocol, per host • Application-Layer Matrix: traffic statistics by application protocol, per source/destination pair of hosts • User History: periodic samples of user-specified variables • Probe Configuration: remote configuration of probes • RMON Conformance: requirements for RMON2 MIB conformance
  • 64. The RMON MIBs: RMON (16) sits under MIB-2 in the same tree (root → iso (1) → org (3) → dod (6) → Internet (1) → Mgmt (2) → MIB-2 (1) → RMON (16)). RMON data is stored in a MIB and can be collected using SNMP.
  • 65. MIBs and OIDs: the groups under RMON (16): rmonEventsV2 = .1.3.6.1.2.1.16.0, statistics = .1.3.6.1.2.1.16.1.0, history = .1.3.6.1.2.1.16.2.0, alarm = .1.3.6.1.2.1.16.3.0, hosts = .1.3.6.1.2.1.16.4.0, hostTopN = .1.3.6.1.2.1.16.5.0, matrix = .1.3.6.1.2.1.16.6.0, filter = .1.3.6.1.2.1.16.7.0, capture = .1.3.6.1.2.1.16.8.0, event = .1.3.6.1.2.1.16.9.0, tokenRing = .1.3.6.1.2.1.16.10.0, protocolDir = .1.3.6.1.2.1.16.11.0, protocolDist = .1.3.6.1.2.1.16.12.0, addressMap = .1.3.6.1.2.1.16.13.0, nlHost = .1.3.6.1.2.1.16.14.0, nlMatrix = .1.3.6.1.2.1.16.15.0, alHost = .1.3.6.1.2.1.16.16.0, alMatrix = .1.3.6.1.2.1.16.17.0, usrHistory = .1.3.6.1.2.1.16.18.0, probeConfig = .1.3.6.1.2.1.16.19.0, rmonConformance = .1.3.6.1.2.1.16.20.0, mediaIndependentStats = .1.3.6.1.2.1.16.21.0, switchRMON = .1.3.6.1.2.1.16.22.0, interfaceTopNMIB = .1.3.6.1.2.1.16.23.0, hcAlarmMIB = .1.3.6.1.2.1.16.24.0. All this information lives in just one branch of the MIB tree, and most people don't know about it!
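Every RMON group sits one arc below .1.3.6.1.2.1.16, so the group for any OID returned by a walk can be read from its eighth arc. Below is a minimal sketch whose name-to-number table simply follows the slide's list:

```python
from typing import Optional

RMON_ROOT = (1, 3, 6, 1, 2, 1, 16)

# Index i corresponds to .1.3.6.1.2.1.16.i, in the order listed on the slide.
RMON_GROUPS = [
    "rmonEventsV2", "statistics", "history", "alarm", "hosts", "hostTopN",
    "matrix", "filter", "capture", "event", "tokenRing", "protocolDir",
    "protocolDist", "addressMap", "nlHost", "nlMatrix", "alHost", "alMatrix",
    "usrHistory", "probeConfig", "rmonConformance", "mediaIndependentStats",
    "switchRMON", "interfaceTopNMIB", "hcAlarmMIB",
]


def rmon_group_for(oid: str) -> Optional[str]:
    """Return the RMON group name for an OID, or None if it is not under RMON."""
    arcs = tuple(int(arc) for arc in oid.strip(".").split("."))
    if len(arcs) <= len(RMON_ROOT) or arcs[: len(RMON_ROOT)] != RMON_ROOT:
        return None
    index = arcs[len(RMON_ROOT)]
    return RMON_GROUPS[index] if index < len(RMON_GROUPS) else None


print(rmon_group_for(".1.3.6.1.2.1.16.13.0"))
```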
  • 66. Setting Thresholds: Rising Threshold, Falling Threshold, Sample Interval, Policy Activations
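The RMON alarm group pairs a rising and a falling threshold so that a value hovering near one threshold does not generate a storm of events. After a rising event, another rising event is not sent until the sampled value has come back down to the falling threshold, and vice versa. Below is a minimal sketch of that hysteresis. `rmon_alarm_events` is a hypothetical helper, not part of any SNMP library. It models absolute-value sampling only (RMON also supports delta sampling) and takes samples already collected at the sample interval:

```python
from typing import List, Sequence, Tuple

STARTUP_CHOICES = ("risingAlarm", "fallingAlarm", "risingOrFallingAlarm")


def rmon_alarm_events(samples: Sequence[float], rising: float, falling: float,
                      startup: str = "risingAlarm") -> List[Tuple[int, str]]:
    """Return (sample index, 'rising' | 'falling') events with RMON-style hysteresis."""
    assert startup in STARTUP_CHOICES and falling < rising
    events: List[Tuple[int, str]] = []
    rising_armed = falling_armed = True
    prev = None
    for i, value in enumerate(samples):
        if prev is None:
            # First sample: the startup setting decides which direction may fire.
            crossed_up = value >= rising and startup != "fallingAlarm"
            crossed_down = value <= falling and startup != "risingAlarm"
        else:
            crossed_up = value >= rising and prev < rising
            crossed_down = value <= falling and prev > falling
        if crossed_up and rising_armed:
            events.append((i, "rising"))
            rising_armed = False
        if crossed_down and falling_armed:
            events.append((i, "falling"))
            falling_armed = False
        # Hysteresis: re-arm each direction only once the opposite threshold is reached.
        if value <= falling:
            rising_armed = True
        if value >= rising:
            falling_armed = True
        prev = value
    return events


print(rmon_alarm_events([10, 50, 85, 90, 70, 88, 30, 20, 95], rising=80, falling=40))
```

The second crossing above 80 (the 88) produces no event, because the value never dropped to the falling threshold in between.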
  • 68. How we view the network
  • 69. How our applications view it
  • 70. What a Flow Record Looks Like http://www.cisco.com/c/en/us/td/docs/ios/fnetflow/configuration/guide/12_2sr/fnf_12_2_sr_book/fnetflow_overview.html
  • 71. One record, multiple uses http://www.cisco.com/c/en/us/td/docs/ios/fnetflow/configuration/guide/12_2sr/fnf_12_2_sr_book/fnetflow_overview.html
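A flow record summarizes many packets that share the same key fields. Below is a minimal sketch that folds packets into flows keyed on the classic 5-tuple. Traditional NetFlow also keys on fields such as ToS and input interface. It then reuses the same records two ways, echoing "one record, multiple uses". The packet dictionaries and addresses are made up for illustration:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

FlowKey = Tuple[str, str, int, int, int]  # src IP, dst IP, src port, dst port, IP protocol


def aggregate_flows(packets: List[dict]) -> Dict[FlowKey, dict]:
    """Fold individual packets into per-flow counters and first/last timestamps."""
    flows: Dict[FlowKey, dict] = {}
    for p in packets:
        key = (p["src"], p["dst"], p["sport"], p["dport"], p["proto"])
        rec = flows.setdefault(key, {"packets": 0, "bytes": 0, "first": p["ts"], "last": p["ts"]})
        rec["packets"] += 1
        rec["bytes"] += p["length"]
        rec["first"] = min(rec["first"], p["ts"])
        rec["last"] = max(rec["last"], p["ts"])
    return flows


def top_talkers(flows: Dict[FlowKey, dict], n: int = 3) -> List[Tuple[str, int]]:
    """Use 1: bytes sent per source address, largest first."""
    totals: Dict[str, int] = defaultdict(int)
    for (src, _, _, _, _), rec in flows.items():
        totals[src] += rec["bytes"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def bytes_by_dst_port(flows: Dict[FlowKey, dict]) -> Dict[int, int]:
    """Use 2: traffic mix by destination port (a rough application view)."""
    totals: Dict[int, int] = defaultdict(int)
    for (_, _, _, dport, _), rec in flows.items():
        totals[dport] += rec["bytes"]
    return dict(totals)


packets = [
    {"src": "10.0.0.5", "dst": "192.0.2.10", "sport": 51000, "dport": 443, "proto": 6, "length": 1500, "ts": 0.00},
    {"src": "10.0.0.5", "dst": "192.0.2.10", "sport": 51000, "dport": 443, "proto": 6, "length": 1500, "ts": 0.02},
    {"src": "10.0.0.7", "dst": "192.0.2.53", "sport": 53000, "dport": 53, "proto": 17, "length": 80, "ts": 0.01},
]
flows = aggregate_flows(packets)
print(len(flows), top_talkers(flows), bytes_by_dst_port(flows))
```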
  • 73. The Progression: SNMP → RMON → NetFlow → Packet Inspection, increasing in granularity and accuracy
  • 74. That is great, but we need more…
  • 75. Shallow vs. Deep Packet Inspection. Shallow Packet Inspection (SPI) focuses on header information from OSI Layers 3 & 4 (IP, TCP, UDP, etc.). Deep Packet Inspection (DPI) processes both header and payload information (HTTP, SQL, SIP, etc.). Example packet: IP Header | TCP Header | GET /userLogin.jsp HTTP/1.1 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A. SPI reads only the IP and TCP headers; DPI also reads the HTTP request.
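The difference is easiest to see on a single packet. Below is a minimal Python sketch that builds one hand-made IPv4/TCP packet carrying the slide's HTTP request (with a shortened User-Agent) and inspects it twice. SPI stops at the IP and TCP headers, while DPI goes on to parse the HTTP payload. The sketch assumes an unfragmented IPv4 packet with the whole HTTP header in one segment; real DPI engines must also reassemble TCP streams:

```python
import socket
import struct


def build_packet(src_ip: str, dst_ip: str, sport: int, dport: int, payload: bytes) -> bytes:
    """Build a minimal IPv4 + TCP packet for the demo (checksums left as zero)."""
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,  # version 4, header length 5 x 32-bit words
        0, 20 + 20 + len(payload), 0, 0, 64,
        6,             # protocol 6 = TCP
        0, socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
    )
    tcp_header = struct.pack("!HHIIBBHHH", sport, dport, 0, 0, 5 << 4, 0x18, 65535, 0, 0)
    return ip_header + tcp_header + payload


def shallow_inspect(packet: bytes) -> dict:
    """SPI: read only layer 3 and layer 4 header fields."""
    ihl = (packet[0] & 0x0F) * 4
    sport, dport = struct.unpack("!HH", packet[ihl:ihl + 4])
    return {
        "src": socket.inet_ntoa(packet[12:16]),
        "dst": socket.inet_ntoa(packet[16:20]),
        "proto": packet[9],
        "sport": sport,
        "dport": dport,
    }


def deep_inspect(packet: bytes) -> dict:
    """DPI: the SPI fields plus the HTTP request line and User-Agent, if present."""
    info = shallow_inspect(packet)
    ihl = (packet[0] & 0x0F) * 4
    tcp_len = (packet[ihl + 12] >> 4) * 4
    payload = packet[ihl + tcp_len:]
    head = payload.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ", 2)
    if len(parts) == 3 and parts[2].startswith("HTTP/"):
        headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        info.update(method=parts[0], path=parts[1], user_agent=headers.get("user-agent"))
    return info


request = b"GET /userLogin.jsp HTTP/1.1\r\nUser-Agent: Mozilla/5.0\r\n\r\n"
pkt = build_packet("10.0.0.5", "192.0.2.10", 51000, 80, request)
print(shallow_inspect(pkt))
print(deep_inspect(pkt))
```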
  • 76. Shallow Packet Inspection
  • 77. Two Different Thresholds • Degraded Threshold: the point at which users will complain about poor performance • Excessive Threshold: the point at which users will stop using the application due to poor performance
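With both thresholds defined, each measured response time falls into one of three bands. Below is a minimal sketch. The 3-second and 8-second defaults are placeholders; the right values come from what your own users tolerate:

```python
from collections import Counter
from typing import Iterable


def classify_response(seconds: float, degraded: float = 3.0, excessive: float = 8.0) -> str:
    """Place one response time relative to the degraded and excessive thresholds."""
    if seconds >= excessive:
        return "excessive"  # users are likely to stop using the application
    if seconds >= degraded:
        return "degraded"   # users are likely to complain
    return "ok"


def summarize(samples: Iterable[float], degraded: float = 3.0, excessive: float = 8.0) -> Counter:
    """Count how many samples fall into each band."""
    return Counter(classify_response(s, degraded, excessive) for s in samples)


print(summarize([0.8, 2.9, 3.0, 5.5, 9.1]))
```

Tracking the share of samples in each band over time tells you whether users are drifting toward complaint or toward abandonment, which a single average cannot.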
  • 78. Validating Changes: 1. Document the results of QoS changes 2. Prove the value of a server upgrade 3. Compare network latency across sites
  • 79. Solving Problems. Pervasiveness: the problem is affecting users across your network
  • 82. Troubleshooting VoIP
  • 83. Don't Commit a Felony
  • 84. Putting it all together
  • 85. Using Indices • Network Congestion Index: NCI = (Packets/sec + Avg Payload) * (Avg Latency + Avg Bandwidth), where the first factor is controlled by the application owner and the second by the network • Packet Loss SLAs: bps < min(rwin/rtt, MSS/(rtt*sqrt(loss))). For example, to achieve a gigabit per second with TCP on a coast-to-coast path (rtt = 40 msec) with 1500-byte packets, the loss rate cannot exceed 8.5x10^-8! If the loss rate were even 0.1% (far better than most SLAs), TCP would be limited to just over 9 Mbps. [Note that large packet sizes help: if packets were n times larger, the same throughput could be achieved with n^2 times as much packet loss.]
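The numbers in the packet-loss example can be checked directly from the bound above. Below is a minimal sketch. It assumes an MSS of 1460 bytes (a 1500-byte packet minus 40 bytes of IP and TCP headers), which reproduces the 8.5x10^-8 figure. Like the slide, it drops the order-one constant (about 1.22) that appears in the published Mathis et al. model:

```python
import math


def tcp_throughput_bound(rwin_bytes: float, rtt_s: float, mss_bytes: float, loss: float) -> float:
    """Upper bound on TCP throughput in bits/s: min(rwin/rtt, MSS/(rtt*sqrt(loss)))."""
    window_limit = rwin_bytes * 8 / rtt_s
    if loss <= 0:
        return window_limit  # with no loss, only the receive window limits throughput
    loss_limit = mss_bytes * 8 / (rtt_s * math.sqrt(loss))
    return min(window_limit, loss_limit)


def max_loss_for(target_bps: float, rtt_s: float, mss_bytes: float) -> float:
    """Largest loss rate that still allows target_bps (the loss term solved for loss)."""
    return (mss_bytes * 8 / (rtt_s * target_bps)) ** 2


RTT = 0.040                    # 40 ms coast-to-coast round trip, from the slide
MSS = 1460                     # assumed: 1500-byte packets minus IP and TCP headers
BIG_WINDOW = 16 * 1024 * 1024  # assumed large enough that the window is not the limit

print(f"max loss for 1 Gbps: {max_loss_for(1e9, RTT, MSS):.2e}")
print(f"bound at 0.1% loss: {tcp_throughput_bound(BIG_WINDOW, RTT, MSS, 0.001) / 1e6:.1f} Mbps")
```

This prints a maximum loss of about 8.5e-08 and a bound of about 9.2 Mbps, matching the slide. Doubling the MSS in `max_loss_for` quadruples the tolerable loss, which is the n^2 effect in the bracketed note.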
  • 86. Let's keep the conversation going… Andrew.P.White@Gmail.com | ReverendDrew | SystemsManagementZen.Wordpress.com | systemsmanagementzen.wordpress.com/feed/ | @SystemsMgmtZen | ReverendDrew | APWhite@us.ibm.com | 614-306-3434