Machine Learning in Performance Management Irina Rish IBM T.J. Watson Research Center January 24, 2001
Outline
- Introduction
- Machine learning applications in Performance Management
- Bayesian learning tools: extending ABLE
- Advancing theory
- Summary and future directions
Learning problems: examples of pattern discovery, classification, diagnosis and prediction
- System event mining: events from hosts over time
- End-user transaction recognition from Remote Procedure Calls (RPCs): BUY? SELL? OPEN_DB? SEARCH? (Transaction1, Transaction2)
Approach: Bayesian learning
Learn (probabilistic) dependency models, i.e. Bayesian networks, e.g. P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B) over nodes S, C, B, X, D, and use them for:
- Diagnosis: P(cause | symptom) = ?
- Prediction: P(symptom | cause) = ?
- Pattern classification: P(class | data) = ?
Numerous important applications: medicine, stock market, bio-informatics, eCommerce, military, ...
Outline
- Introduction
- Machine-learning applications in Performance Management
  - Transaction Recognition
  - In progress: Event Mining; Probe Placement; etc.
- Bayesian learning tools: extending ABLE
- Advancing theory
- Summary and future directions
End-User Transaction Recognition: why is it important?
Setting: a client workstation issues End-User Transactions (EUTs), which the server (Web, DB, Lotus Notes) sees only as Remote Procedure Calls (RPCs) within a session (connection), e.g. OpenDB, Search, SendMail.
Why it matters:
- Realistic workload models (for testing performance)
- Resource management (anticipating requests)
- Quantifying end-user perception of performance (response times)
Examples: Lotus Notes, Web/eBusiness (on-line stores, travel agencies, trading): database transactions, buy/sell, search, email, etc.
Why is it hard? Why learn from data?
Example: EUTs and RPCs in Lotus Notes. The EUTs MoveMsgToFolder and FindMailByKey map to RPC sequences such as:
1. OPEN_COLLECTION 2. UPDATE_COLLECTION 3. DB_REPLINFO_GET 4. GET_MOD_NOTES 5. READ_ENTRIES 6. OPEN_COLLECTION 7. FIND_BY_KEY 8. READ_ENTRIES
- Many RPC and EUT types (92 RPCs and 37 EUTs)
- Large (unlimited) data sets (10,000+ transaction instances); manual classification of a data subset took about a month
- Non-deterministic and unknown EUT → RPC mapping: "noise" sources are client/server states
- No client-side instrumentation, hence unknown EUT boundaries
Our approach: Classification + Segmentation
- Problem 1: label segmented data (classification; similar to text classification)
- Problem 2: both segment and label the RPC stream (EUT recognition; similar to speech understanding and image segmentation)
[Figure: an unsegmented RPC stream vs. the same stream segmented into RPC subsequences labeled with transactions Tx1-Tx4]
How to represent transactions? As "feature vectors": RPC counts, or binary RPC occurrences.
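As a concrete sketch of the two encodings (the RPC names and the small vocabulary below are illustrative, not the deck's full 92-RPC set):

```python
from collections import Counter

# Illustrative vocabulary of RPC types (a tiny subset, for demonstration).
VOCAB = ["OPEN_COLLECTION", "UPDATE_COLLECTION", "READ_ENTRIES", "FIND_BY_KEY"]

def rpc_counts(seq):
    """Count vector: how many times each RPC type occurs in the sequence."""
    c = Counter(seq)
    return [c[r] for r in VOCAB]

def rpc_occurrences(seq):
    """Occurrence (binary) vector: whether each RPC type appears at all."""
    s = set(seq)
    return [int(r in s) for r in VOCAB]

seq = ["OPEN_COLLECTION", "READ_ENTRIES", "READ_ENTRIES"]
print(rpc_counts(seq))       # [1, 0, 2, 0]
print(rpc_occurrences(seq))  # [1, 0, 1, 0]
```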
Classification scheme
- Training phase: training data (RPCs labeled with EUTs) → feature extraction → classifier learning
- Operation phase: "test" data (unlabeled RPCs) → feature extraction → classification into EUTs
Our classifier: naïve Bayes (NB)
Simplifying ("naïve") assumption: feature independence given the class, P(f1, ..., fn | c) = Π_i P(f_i | c).
1. Training: estimate the parameters P(c) and P(f_i | c) (e.g., ML estimates).
2. Classification: given an (unlabeled) instance f = (f1, ..., fn), choose the most likely class (Bayesian decision rule): c* = argmax_c P(c) Π_i P(f_i | c).
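A minimal sketch of such a classifier, assuming binary occurrence features and Laplace-smoothed estimates (a common choice; the slide does not specify the smoothing):

```python
import math
from collections import defaultdict

def train_nb(data):
    """data: list of (binary feature vector, class label) pairs.
    Returns Laplace-smoothed estimates of P(c) and P(f_i = 1 | c)."""
    class_counts = defaultdict(int)
    feat_counts = defaultdict(lambda: defaultdict(int))
    for f, c in data:
        class_counts[c] += 1
        for i, v in enumerate(f):
            feat_counts[c][i] += v
    n_feats = len(data[0][0])
    prior = {c: k / len(data) for c, k in class_counts.items()}
    cond = {c: {i: (feat_counts[c][i] + 1) / (class_counts[c] + 2)
                for i in range(n_feats)}
            for c in class_counts}
    return prior, cond

def classify_nb(f, prior, cond):
    """Bayesian decision rule: argmax_c P(c) * prod_i P(f_i | c)."""
    def score(c):
        s = math.log(prior[c])
        for i, v in enumerate(f):
            p = cond[c][i]
            s += math.log(p if v else 1 - p)
        return s
    return max(prior, key=score)

# Toy usage with hypothetical EUT labels:
data = [([1, 0], "OpenDB"), ([1, 0], "OpenDB"), ([0, 1], "Search")]
prior, cond = train_nb(data)
print(classify_nb([1, 0], prior, cond))  # OpenDB
```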
Classification results on Lotus CoC data
- Significant improvement over the baseline classifier (75%), which always selects the most frequent transaction
- NB is simple, efficient, and comparable to state-of-the-art classifiers: SVM: 85-87%, decision tree: 90-92%
- The best-fit distribution (shifted geometric) is not necessarily the best classifier! (?)
[Plot: accuracy vs. training set size for NB with Bernoulli, multinomial, or geometric features, and for NB with shifted geometric]
Transaction recognition: segmentation + classification. Combine the naive Bayes classifier with dynamic programming (Viterbi search) driven by a recursive DP equation.
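The slide's DP equation itself is not reproduced here, so the following is an assumed reading of "Viterbi+Bayes": a dynamic program over segment boundaries, V(j) = max over i < j and class c of V(i) + log P(rpcs[i:j], c), where `seg_logprob` stands in for a segment-and-class score such as a naive Bayes log-likelihood:

```python
import math

def segment(rpcs, classes, seg_logprob, max_len=8):
    """Best joint segmentation and labelling by dynamic programming.
    seg_logprob(segment, c) -> log score of labelling `segment` as class c
    (e.g., a naive Bayes log-likelihood plus a class prior)."""
    n = len(rpcs)
    V = [0.0] + [-math.inf] * n      # V[j]: best score of rpcs[:j]
    back = [None] * (n + 1)          # backpointer: (segment start, class)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            for c in classes:
                s = V[i] + seg_logprob(rpcs[i:j], c)
                if s > V[j]:
                    V[j], back[j] = s, (i, c)
    # Recover the segmentation by following backpointers.
    out, j = [], n
    while j > 0:
        i, c = back[j]
        out.append((rpcs[i:j], c))
        j = i
    return list(reversed(out))
```

With a toy score in which class "A" favors symbol 'a' and "B" favors 'b' (plus a per-segment prior), `segment("aabb", ["A", "B"], lp)` recovers the two-segment solution [("aa", "A"), ("bb", "B")].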
Transaction recognition results
- Good EUT recognition accuracy: 64% (a harder problem than classification!)
- Reversed order of results: the best classifier is not necessarily the best recognizer! (?) Further research needed.

Model         | Classification | Segmentation
Bernoulli     | best           | second best
Shift. Geom.  | worst          | best
Geometric     | best           | fourth best
Multinomial   | best           | third best

[Plot: accuracy vs. training set size]
EUT recognition: summary
- A novel approach: learning EUTs from RPCs. Patent, conference paper (AAAI-2000), prototype system
- Successful results on Lotus Notes data (Lotus CoC): classification with naive Bayes (up to 87% accuracy); EUT recognition with Viterbi+Bayes (up to 64% accuracy)
- Work in progress: better feature selection (RPC subsequences?); selecting the "best classifier" for the segmentation task; learning more sophisticated classifiers (Bayesian networks); an information-theoretic approach to segmentation (MDL)
Outline
- Introduction
- Machine-learning applications in Performance Management
  - Transaction Recognition
  - In progress: Event Mining; Probing Strategy; etc.
- Bayesian learning tools: extending ABLE
- Advancing theory
- Summary and future directions
Event Mining: analyzing system event sequences
What is it? Why is it important? Learning system behavior patterns for better performance management.
Why is it hard? Large, complex systems (networks) with many dependencies; prior models are not always available; many events and hosts; data sets are huge and constantly growing.
Example: USAA data: 858 hosts, 136 event types, 67,184 data points (13 days, by second), events from hosts over time. High-severity events: 'Cisco_Link_Down', 'chassisMinorAlarm_On', etc.; low-severity events: 'tcpConnectClose', 'duplicate_ip', etc.
1. Learning event dependency models: which events (Event1, ..., EventN) predict which others (Event2, ..., EventM)?
Current approach: learn dynamic probabilistic graphical models (temporal, or dynamic, Bayes nets) to predict time to failure, event co-occurrence, and the existence of hidden nodes ("root causes"), and to recognize sequences of high-level system states: an unsupervised version of the EUT recognition problem.
Important issue: incremental learning from data streams.
2. Clustering hosts by their history
Group hosts with similar event sequences (e.g., "problematic" hosts vs. "silent" hosts). What is an appropriate similarity ("distance") metric? One example: the distance between "compressed" sequences, i.e., between their event distribution models.
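One concrete instance of such a metric (an illustration, not necessarily the metric the deck has in mind) is the Jensen-Shannon distance between per-host event-type distributions, the "compressed" form of each host's history:

```python
import math
from collections import Counter

def event_dist(events, vocab):
    """Compress a host's event history into an event-type distribution."""
    c = Counter(events)
    n = max(len(events), 1)
    return [c[e] / n for e in vocab]

def js_distance(p, q):
    """Jensen-Shannon distance (base-2) between two event distributions:
    sqrt of the average KL divergence of p and q to their midpoint."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return math.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))
```

It is a true metric bounded in [0, 1], so it can feed directly into standard clustering algorithms: identical histories give 0, disjoint event repertoires give 1.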
Probing strategy (EPP)
Objectives: find the probe frequency F that minimizes E(Tprobe - Tstart) (failure detection delay), or E(total "failure" time - total "estimated" failure time) (which gives an accurate performance estimate), subject to a constraint on the additional load induced by probes: L(F) < MaxLoad.
[Figure: response time over time, with probes and availability violations marked]
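Under a monotone load model, higher F shrinks the detection delay, so the constrained optimum is simply the largest feasible frequency. A minimal sketch with a hypothetical linear load function (the candidate frequencies and the 2%-per-probe load model are made-up numbers):

```python
def best_probe_frequency(candidates, load, max_load):
    """Choose the highest probe frequency F with L(F) < MaxLoad.
    Higher F shrinks the expected gap E[Tprobe - Tstart] between a failure
    and its detection, so under a monotone load model the constrained
    optimum is the largest feasible F.  `load` is the assumed L(F)."""
    feasible = [f for f in candidates if load(f) < max_load]
    return max(feasible) if feasible else None

# Hypothetical linear load model: each probe per second adds 2% load.
print(best_probe_frequency([0.1, 0.5, 1.0, 2.0, 5.0],
                           lambda f: 0.02 * f, 0.05))  # 2.0
```

For periodic probing with period 1/F and a uniformly random failure start, the expected detection delay is 1/(2F), which makes the "largest feasible F" choice explicit.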
Outline
- Introduction
- Machine-learning applications in Performance Management
- Bayesian learning tools: extending ABLE
- Advancing theory
- Summary and future directions
ABLE: Agent Building and Learning Environment
What is ABLE? What is my contribution?
A Java toolbox for building reasoning and learning agents. Provides a visual environment, boolean and fuzzy rules, neural networks, and genetic search.
My contributions: a naïve Bayes classifier (batch and incremental) and discretization. Future releases: general Bayesian learning and inference tools.
Available at alphaWorks: www.alphaWorks.ibm.com/tech. Project page: w3.rchland.ibm.com/projects/ABLE
How does it work?
Who is using the Naïve Bayes tools? Impact on other IBM projects
- Video character recognition (w/ C. Dorai): Naïve Bayes: 84% accuracy; better than SVM on some pairs of characters (average SVM = 87%). Current work: combining Naïve Bayes with SVMs
- Environmental data analysis (w/ Yuan-Chi Chang): learning mortality rates using data on air pollutants; Naïve Bayes is currently being evaluated
- Performance management: event mining in progress; EUT recognition with successful results
Outline
- Introduction
- Machine-learning in Performance Management
- Bayesian learning tools: extending ABLE
- Advancing theory
  - analysis of naïve Bayes classifier
  - inference in Bayesian Networks
- Summary and future directions
Why does Naïve Bayes do well? And when? When do the independence assumptions not hurt classification?
Class-conditional feature independence is an unrealistic assumption! But why and when does it work?
Intuition: wrong probability estimates do not necessarily mean wrong classification. NB's estimate of P(class | f) can differ from the true posterior yet still agree with the Bayes-optimal classifier on the most likely class.
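A tiny numeric illustration of this intuition (all numbers made up): duplicate a feature, so the independence assumption is badly violated. NB double-counts the evidence and its posterior estimate is wrong, but the argmax, i.e. the classification decision, is unchanged (cf. Lemma 1 on functional dependence):

```python
# Two classes, one informative binary feature f1, and f2 == f1 (a
# functionally dependent duplicate).  Illustrative parameters:
p_f_given = {"A": 0.8, "B": 0.3}   # P(f1 = 1 | class)
prior = {"A": 0.5, "B": 0.5}

# True posterior given f1 = f2 = 1: the duplicate adds no new evidence.
true_joint = {c: prior[c] * p_f_given[c] for c in prior}
# NB posterior: independence assumption counts f1 and f2 separately.
nb_joint = {c: prior[c] * p_f_given[c] ** 2 for c in prior}

z_t, z_nb = sum(true_joint.values()), sum(nb_joint.values())
true_post = {c: v / z_t for c, v in true_joint.items()}
nb_post = {c: v / z_nb for c, v in nb_joint.items()}

print(round(true_post["A"], 3), round(nb_post["A"], 3))  # 0.727 0.877
# Wrong estimate, same decision:
print(max(true_post, key=true_post.get) == max(nb_post, key=nb_post.get))  # True
```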
Case 1: functional dependencies
Lemma 1: Naïve Bayes is optimal when the features are functionally dependent given the class.
Case 2: "almost-functional" (low-entropy) distributions
Lemma 2: Naïve Bayes is a "good approximation" for "almost-functional" dependencies.
Formally: if P(f_i = a_i) ≥ 1 - δ for i = 1, ..., n (each feature is almost-deterministic given the class), then the naïve Bayes approximation error vanishes as δ → 0.
Related practical examples:
- RPC occurrences in EUTs: often almost-deterministic (and NB does well)
- Successful "local inference" in almost-deterministic Bayesian networks (turbo coding, "mini-buckets"; see Dechter & Rish 2000)
Experimental results support the theory
Random problem generator: uniform P(class); random P(f|class):
1. A randomly selected entry in P(f|class) is assigned a fixed value
2. The rest of the entries are filled by uniform random sampling + normalization
Findings:
1. Less "noise" (smaller δ) means NB is closer to optimal
2. Feature dependence does NOT correlate with NB error
Outline
- Introduction
- Machine-learning in Performance Management
  - Transaction Recognition
  - Event Mining
- Bayesian learning tools: extending ABLE
- Advancing theory
  - analysis of naïve Bayes classifier
  - inference in Bayesian Networks
- Summary and future directions
From Naïve Bayes to Bayesian Networks
Naïve Bayes model: independent features given the class. Bayesian network (BN) model: any joint probability distribution, e.g. over Smoking (S), lung Cancer (C), Bronchitis (B), X-ray (X), and Dyspnoea (D):
P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
Query: P(lung cancer = yes | smoking = no, dyspnoea = yes) = ?
Conditional probability distribution (CPD) for P(D | C, B):

C B | D=0 D=1
0 0 | 0.1  0.9
0 1 | 0.7  0.3
1 0 | 0.8  0.2
1 1 | 0.9  0.1
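On a network this small, the factorization above can be queried by brute-force enumeration. Only the P(D|C,B) table appears on the slide; every other CPD value below is a made-up illustrative number:

```python
from itertools import product

# Factorization from the slide:
#   P(S,C,B,X,D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
# Variables are binary (1 = yes).
P_S1 = 0.3                                   # P(S=1), illustrative
P_C1_S = {0: 0.01, 1: 0.10}                  # P(C=1 | S), illustrative
P_B1_S = {0: 0.10, 1: 0.30}                  # P(B=1 | S), illustrative
P_X1_CS = {(0, 0): 0.05, (0, 1): 0.20,
           (1, 0): 0.90, (1, 1): 0.95}       # P(X=1 | C,S), illustrative
P_D1_CB = {(0, 0): 0.9, (0, 1): 0.3,
           (1, 0): 0.2, (1, 1): 0.1}         # P(D=1 | C,B), from the slide

def bern(p1, v):
    """P(V = v) for a binary variable with P(V=1) = p1."""
    return p1 if v else 1.0 - p1

def joint(s, c, b, x, d):
    return (bern(P_S1, s) * bern(P_C1_S[s], c) * bern(P_B1_S[s], b)
            * bern(P_X1_CS[(c, s)], x) * bern(P_D1_CB[(c, b)], d))

def posterior_c(s, d):
    """P(C = 1 | S = s, D = d) by summing out B and X (brute force)."""
    num = sum(joint(s, 1, b, x, d) for b, x in product((0, 1), repeat=2))
    den = sum(joint(s, c, b, x, d) for c, b, x in product((0, 1), repeat=3))
    return num / den

print(posterior_c(0, 1))  # the slide's query: P(cancer | no smoking, dyspnoea)
```

Enumeration is exponential in the number of variables, which is exactly why the NP-completeness of general BN inference (next slide) motivates approximate algorithms.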
Example: Printer Troubleshooting   (Microsoft Windows 95) [Heckerman, 95] Print Output OK Correct Driver Uncorrupted Driver Correct Printer Path Net Cable Connected Net/Local Printing Printer On  and Online Correct Local Port Correct  Printer Selected Local Cable Connected Application Output OK Print Spooling On Correct  Driver Settings Printer Memory Adequate Network Up Spooled Data OK GDI Data Input OK GDI Data  Output OK Print Data OK PC to Printer Transport OK Printer Data OK Spool Process OK Net Path OK Local Path OK Paper Loaded Local Disk Space Adequate
How to use Bayesian networks?
- Diagnosis: P(cause | symptom) = ?
- Prediction: P(symptom | cause) = ?
- Classification: P(class | data) = ?
- MEU decision-making (given a utility function)
The inference problems are NP-complete, hence approximate algorithms.
Applications: medicine, stock market, bio-informatics, eCommerce, performance management, etc.
Local approximation scheme: "mini-buckets" (paper submitted to JACM)
Idea: reduce the complexity of inference by ignoring some dependencies.
- Successfully used for approximating the Most Probable Explanation
- Very efficient on real-life (medical, decoding) and synthetic problems
- Less "noise" means higher approximation accuracy, similarly to naïve Bayes!
General theory needed: independence assumptions and "almost-deterministic" distributions.
Potential impact: efficient inference in complex performance management models (e.g., event mining, system dependence models).
Summary
- Theory and algorithms: analysis of Naïve Bayes accuracy (Research Report); approximate Bayesian inference (submitted paper); patent on meta-learning
- Machine-learning tools (alphaWorks): extending ABLE with a Bayesian classifier; applying the classifier to other IBM projects: video character recognition, environmental data analysis, performance management
- Performance management: end-user transaction recognition (Lotus CoC): novel method, patent, paper; applied to Lotus Notes. In progress: event mining (USAA), probing strategies (EPP)
Future directions: automated learning and inference
Practical problems (performance management):
- Transaction recognition: better feature selection, segmentation
- Event mining: Bayes net models, clustering
- Web log analysis: segmentation / classification / clustering
- Modeling system dependencies: Bayes nets
- "Technology transfer": a generic approach to "event streams" (EUTs, system events, web page accesses)
Generic tools (ML library / ABLE):
- Bayesian learning: general Bayes nets, temporal BNs, incremental learning
- Bayesian inference: exact inference, approximations
- Other tools: SVMs, decision trees; combined tools, meta-learning tools
Theory (analysis of algorithms):
- Naïve Bayes accuracy: other distribution types
- Accuracy of local inference approximations
- Comparing model selection criteria (e.g., Bayes net learning)
- Relative analysis and combination of classifiers (Bayes / max. margin / DT)
- Incremental learning
Collaborations
- Transaction recognition: J. Hellerstein, T. Jayram (Watson)
- Event mining: J. Hellerstein, R. Vilalta, S. Ma, C. Perng (Watson)
- ABLE: J. Bigus, R. Vilalta (Watson)
- Video character recognition: C. Dorai (Watson)
- MDL approach to segmentation: B. Dom (Almaden)
- Approximate inference in Bayes nets: R. Dechter (UCI)
- Meta-learning: R. Vilalta (Watson)
- Environmental data analysis: Y. Chang (Watson)
Machine learning discussion group
Weekly seminars: 11:30-2:30 (w/ lunch) in 1S-F40.
Active group members: Mark Brodie, Vittorio Castelli, Joe Hellerstein, Daniel Oblinger, Jayram Thathachar, Irina Rish (more people joined recently).
Agenda: discussions of recent ML papers and book chapters ("Pattern Classification" by Duda, Hart, and Stork, 2000); brain-storming sessions on particular ML topics.
Recent discussions: accuracy of Bayesian classifiers (naïve Bayes).
Web site: http://reswat4.research.ibm.com/projects/mlreadinggroup/mlreadinggroup.nsf/main/toppage

Advances in Bayesian Learning
