Machine Learning
made easy with
Watson Studio
Hosted by
IBM Chicago
Nov 1st, 2018
Presenters
2018 IBM Corporation
Ahsan Rehman
Data Scientist & Developer
Advocate
IBM Analytics
Masters of Science in Analytics
rehmana@us.ibm.com
IBM Chicago
Svetlana Levitan
Developer Advocate and PMML
Release Manager
Center for Open Data and AI
Technologies (CODAIT)
IBM Digital Business Group
slevitan@us.ibm.com
Adam Massachi
Offering Manager for Watson
Studio
IBM Analytics
Degrees in Math and Philosophy
adam.massachi@ibm.com
Special Speaker
IBM Chicago
Jing Shyr
Chief Statistician, IBM Fellow - Analytics
Subject matter expert for statistical analysis and data mining,
working knowledge of applying predictive analytics to business
problems
Machine Learning
Deep Learning
and AI
are everywhere
facial recognition
unlocks your phone
fraud detection
protects your credit
recommendations
help you shop faster
what other users like you buy
what gets bought together
speech recognition
lets you go hands-free
business processes
automate human tasks
e.g. train based on human decisions
autonomous vehicles
detect pedestrians
machine vision
detects cancer early
spam detection
unclogs your Inbox
The future is now
How does
machine learning work?
1. A machine learning model is trained to recognize patterns in historical data.
2. The model is then used with new data and asked to predict or classify it.
3. If patterns in the new data match the training data, then the model makes accurate predictions.
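The three steps above can be sketched in a few lines of plain Python. This is only an illustration, not what Watson Studio runs: the nearest-centroid method, the `train` and `predict` helpers, and the four sample rows are all assumptions made up for this example.

```python
from collections import defaultdict
from math import dist


def train(rows, labels):
    """Step 1: learn one centroid (mean point) per label from historical data."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*members))
        for label, members in groups.items()
    }


def predict(model, row):
    """Step 2: classify a new row by the closest learned centroid."""
    return min(model, key=lambda label: dist(model[label], row))


# Hypothetical historical data: (petal length, petal width) in cm.
history = [(1.4, 0.2), (1.5, 0.3), (4.7, 1.4), (4.5, 1.5)]
labels = ["short", "short", "long", "long"]

model = train(history, labels)

# Step 3: new rows that resemble the training data get sensible labels.
print(predict(model, (1.6, 0.25)))
print(predict(model, (4.4, 1.3)))
```

A new row that looks nothing like the historical data would still get one of the two labels, which is why the deck stresses that the model is only as good as the data it was trained on.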
Machine learning requires
TONS OF DATA
IBM Cloud / Watson and Cloud Platform / © 2018 IBM Corporation
need good data
Enterprises generate TONS OF DATA
Data
Data that requires
governance
Which must be cleaned
and shaped for training
…then models must be designed
…that must be hosted and
monitored
…and trained on
high performance compute
SPSS
To select an
optimal model...
need good data
Tools & Infrastructure
• Need an environment
that enables quick
experiments and real
outcomes
• Discrete tools present
barriers to productivity
Data Governance
• If the data isn’t
secure, self-service
isn’t a reality
• Understanding data
lineage and getting to
a system of truth is a challenge
Skills
• Data Science skills
are in low supply and
high demand
• Nurturing new data
professionals is
challenging
Data Access
• Data resides in silos
and is difficult to access
• Unstructured and
external data wasn’t
considered
Why are enterprises struggling to
capture the value of AI?
Watson Studio: Accelerating value from AI for Enterprises
Watson Studio with Knowledge Catalog
accelerates the machine and deep learning workflows
required to infuse AI into your business to drive innovation.
It provides a suite of tools for data scientists/engineers, app developers, domain experts
to collaboratively connect, work with and analyze data,
and build, train, and deploy models.
AI Requires Teamwork
• AI is not magic
• AI is algorithms + data + team
Data Science and AI Teams
Need to be enabled for Productivity and Collaboration

Deb
The Developer
Her Job:
Builds AI applications that meet the requirements of the business.
What she does:
• Starts PoCs, which include gathering content, dialog building and model training
• Focuses on app building for the team or company to use. Will handle ML Ops as needed
Sometimes known as:
Front-end, back-end, full stack, mobile or low-code developer

Tanya
Domain Expert
Her Job:
To transfer knowledge to Watson for a successful user experience.
What she does:
• Has a range of domain knowledge and uses it to teach Watson and develop custom models
• As Tanya gains more experience she optimizes her knowledge to teach Watson to design better end-user experiences.
Sometimes known as:
Subject matter expert, content strategist.

Mike
Data Scientist
His Job:
Transform data into knowledge for solving business problems.
What he does:
• Runs experiments to build custom models that solve business problems.
• Uses techniques such as Machine Learning or Deep Learning and works with Tanya to validate success of trained models.
Sometimes known as:
ML/DL engineer, Modeler, Data Miner

Ed
Data Engineer
His Job:
Architects how data is organized and ensures operability
What he does:
• Builds data infrastructure and ETL pipelines. Works with Spark, Hadoop, and HDFS.
• Works with the data scientist to transform research models into production quality systems.
Sometimes known as:
Data infrastructure engineer
Watson Studio
Supporting the end-to-end ML workflow
Prepare Data
for Analysis
Analyze Data
Build and Train
ML/DL Models
Deploy Models
on Public or
Private Cloud
Monitor, Analyze
and Manage Model
Deployments
Search and Find
Relevant Data
Connect &
Access Data
on Cloud(s) or
On-Premise
Catalog Data
Watson Studio
Tools for supporting the end-to-end AI workflow
Model Lifecycle Management
Machine Learning Runtimes Deep Learning Runtimes
Authoring Tools
Cloud Infrastructure as a Service
• Most popular open source frameworks
• IBM best-in-class frameworks
• Create, collaborate, deploy, and monitor
• Best of breed open source & IBM tools
• Code (R, Python or Scala) and no-code/visual
modeling tools
• Fully managed service
• Container-based resource management
• Elastic pay as you go cpu/gpu power
Demo
Try it at https://www.ibm.com/cloud/watson-studio
Get Data
Prepare
Data
Build
Models
Deploy
Models
Monitor
and Update
End To End Machine Learning Workflow
Model Deployment Challenges
DMG to the rescue!
Data Mining Group
dmg.org
Founded in the late 1990s by Professor Robert Grossman
What is PMML?
Predictive Model Markup Language
• An open standard for the XML representation of predictive models
• Developed by DMG
• Over 30 vendors and organizations
• PMML 4.3, 4.4 release manager: Svetlana Levitan
dmg.org/pmml
Main Components of PMML
Header
Data Dictionary
Transformation Dictionary
Model(s)
PMML under the hood
Header
• Application name and version
• Timestamp and copyright
Data Dictionary
• Field names and labels
• Data type and measurement level
• Valid and missing values
Transformation Dictionary
• Define Function
• Derived Fields
Model(s)
• Mining Schema
• Specific model contents
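To make the four parts concrete, here is a minimal sketch that assembles a PMML skeleton with Python's standard `xml.etree.ElementTree`. The element and attribute names follow the PMML 4.3 vocabulary, but the application name, copyright holder, field names and the derived field `petal_length_mm` are hypothetical, and the tree body is left out, so the result is an outline rather than a schema-valid model.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_3"


def build_skeleton():
    """Return a PMML root element with the four main parts, in order."""
    pmml = ET.Element("PMML", {"version": "4.3", "xmlns": PMML_NS})

    # Header: application name and version, timestamp, copyright.
    header = ET.SubElement(pmml, "Header", {"copyright": "Example Corp"})
    ET.SubElement(header, "Application", {"name": "demo-app", "version": "0.1"})
    ET.SubElement(header, "Timestamp").text = "2018-11-01T00:00:00"

    # Data Dictionary: field names, data types, measurement levels, valid values.
    dd = ET.SubElement(pmml, "DataDictionary", {"numberOfFields": "2"})
    ET.SubElement(dd, "DataField", {
        "name": "petal_length", "optype": "continuous", "dataType": "double"})
    species = ET.SubElement(dd, "DataField", {
        "name": "species", "optype": "categorical", "dataType": "string"})
    for value in ("Iris-setosa", "Iris-versicolor", "Iris-virginica"):
        ET.SubElement(species, "Value", {"value": value})

    # Transformation Dictionary: one derived field (centimetres to millimetres).
    td = ET.SubElement(pmml, "TransformationDictionary")
    derived = ET.SubElement(td, "DerivedField", {
        "name": "petal_length_mm", "optype": "continuous", "dataType": "double"})
    multiply = ET.SubElement(derived, "Apply", {"function": "*"})
    ET.SubElement(multiply, "FieldRef", {"field": "petal_length"})
    ET.SubElement(multiply, "Constant", {"dataType": "double"}).text = "10"

    # Model: a mining schema, then the model-specific contents (omitted here).
    model = ET.SubElement(pmml, "TreeModel", {"functionName": "classification"})
    schema = ET.SubElement(model, "MiningSchema")
    ET.SubElement(schema, "MiningField", {"name": "petal_length"})
    ET.SubElement(schema, "MiningField", {"name": "species", "usageType": "target"})
    return pmml


root = build_skeleton()
print([child.tag for child in root])
print(ET.tostring(root, encoding="unicode")[:80])
```

The first print lists the top-level sections in document order, which mirrors the Header, Data Dictionary, Transformation Dictionary, Model(s) layout described on the slide.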
Transformations in PMML
• NormContinuous: piece-wise linear transform
• NormDiscrete: map a categorical
field to a set of dummy fields
• Discretize: binning
• MapValues: map one or more categorical fields into another categorical one
• Functions: built-in and user-defined
• Aggregation
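As a rough illustration of what the first four transformations compute, the sketch below re-implements their semantics in plain Python. The function names, argument shapes and sample values are assumptions for this example only; a real PMML scoring engine reads the same information from XML elements and also handles missing values and outlier treatment, which are ignored here.

```python
from bisect import bisect_right


def norm_continuous(x, knots):
    """Piece-wise linear map through sorted (orig, norm) knots (at least two).

    Values outside the knot range are extrapolated along the first or last
    segment (an assumption; PMML lets the model choose other treatments).
    """
    origs = [orig for orig, _ in knots]
    i = min(max(bisect_right(origs, x) - 1, 0), len(knots) - 2)
    (o0, n0), (o1, n1) = knots[i], knots[i + 1]
    return n0 + (x - o0) * (n1 - n0) / (o1 - o0)


def norm_discrete(value, categories):
    """Map one categorical value to a set of 0/1 dummy fields."""
    return {f"is_{c}": 1.0 if value == c else 0.0 for c in categories}


def discretize(x, bins):
    """Binning: bins is a list of (low, high, label) with low <= x < high."""
    for low, high, label in bins:
        if low <= x < high:
            return label
    return None  # no bin matched


def map_values(value, table, default=None):
    """Map a categorical value to another categorical value via a lookup table."""
    return table.get(value, default)


knots = [(0.0, 0.0), (10.0, 1.0), (20.0, 1.5)]
print(norm_continuous(15.0, knots))
print(norm_discrete("b", ["a", "b"]))
print(discretize(2.6, [(0.0, 2.5, "low"), (2.5, 5.0, "high")]))
print(map_values("Iris-setosa", {"Iris-setosa": "setosa"}, default="other"))
```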
PMML Models
o Association Rules Model
o Clustering Model
o General Regression
o Naïve Bayes
o Nearest Neighbor Model
o Neural Network
o Regression
o Tree Model
o Mining Model: composition or
ensemble (or both) of models
o Baseline Model
o Bayesian Network
o Gaussian Process
o Ruleset
o Scorecard
o Sequence Model
o Support Vector Machine
o Time Series
An example PMML – Data Dictionary, Transformations
Example PMML – Neural Network MiningSchema and inputs (callout: Predictors)
Example PMML – Neural Network hidden layer and outputs (callouts: Hidden layer neuron; Output Layer Neurons; Connecting target to the neurons)
<Node id="0"> <True/>
  <Node id="1" score="Iris-setosa" recordCount="50.0">
    <SimplePredicate field="petal_length"
                     operator="lessOrEqual"
                     value="2.6"/>
    <ScoreDistribution value="Iris-setosa"
                       recordCount="50.0"/>
    <ScoreDistribution value="Iris-versicolor"
                       recordCount="0.0"/>
    <ScoreDistribution value="Iris-virginica"
                       recordCount="0.0"/>
  </Node>
  <Node id="2">
    <SimplePredicate field="petal_length"
                     operator="greaterThan"
                     value="2.6"/>
Example PMML for a Tree Model
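Because the tree is plain XML, a few lines of code can walk it. The sketch below is an illustration under assumptions, not a PMML engine: it uses a simplified, closed-off version of the excerpt above, the `not-setosa` score on node 2 is a made-up placeholder (the slide's excerpt is cut off before that node's contents), and only numeric `SimplePredicate` comparisons and `True` are handled.

```python
import operator
import xml.etree.ElementTree as ET

# Simplified, hypothetical completion of the slide's excerpt.
TREE_XML = """
<Node id="0"> <True/>
  <Node id="1" score="Iris-setosa" recordCount="50.0">
    <SimplePredicate field="petal_length" operator="lessOrEqual" value="2.6"/>
  </Node>
  <Node id="2" score="not-setosa">
    <SimplePredicate field="petal_length" operator="greaterThan" value="2.6"/>
  </Node>
</Node>
"""

# PMML comparison operators mapped to Python functions (numeric fields only).
OPS = {
    "lessThan": operator.lt,
    "lessOrEqual": operator.le,
    "greaterThan": operator.gt,
    "greaterOrEqual": operator.ge,
    "equal": operator.eq,
    "notEqual": operator.ne,
}


def matches(node, record):
    """Evaluate the node's predicate against one input record."""
    if node.find("True") is not None:
        return True
    predicate = node.find("SimplePredicate")
    if predicate is None:
        return False
    compare = OPS[predicate.get("operator")]
    return compare(record[predicate.get("field")], float(predicate.get("value")))


def score(node, record):
    """Descend into the first child whose predicate holds; stop when none does."""
    for child in node.findall("Node"):
        if matches(child, record):
            return score(child, record)
    return node.get("score")


root = ET.fromstring(TREE_XML)
print(score(root, {"petal_length": 1.4}))
print(score(root, {"petal_length": 5.1}))
```

The point of the exercise is the one the deck makes about transparency: the same file a scoring engine consumes can be read and checked by a person.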
PMML Powered
From http://dmg.org/pmml/products.html:
Alpine Data
Angoss
BigML
Equifax
Experian
FICO
Fiserv
Frontline Solvers
GDS Link
IBM (Includes SPSS)
JPMML
KNIME
KXEN
Liga Data
Microsoft
MicroStrategy
NG Data
Open Data
Opera
Pega
Pervasive Data Rush
Predixion Software
Rapid I
R
Salford Systems (Minitab)
SAND
SAS
Software AG (incl. Zementis)
Spark
Sparkling Logic
Teradata
TIBCO
WEKA
Benefits of PMML
• Allows seamless deployment and model exchange
• Transparency: human and machine-readable
• Fosters best practices in model building and deployment
PMML in Python
The JPMML package is created and maintained by Villu Ruusmann.
From https://stackoverflow.com/questions/33221331/export-python-scikit-learn-models-into-pmml
pip install git+https://github.com/jpmml/sklearn2pmml.git
Example of how to export a classification tree to PMML. First grow the tree:
# example tree & viz from http://scikit-learn.org/stable/modules/tree.html
from sklearn import datasets, tree
iris = datasets.load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)
SkLearn2PMML conversion takes 2 arguments: an estimator (our clf) and a mapper for preprocessing.
Our mapper is pretty basic, since there are no transformations.
from sklearn_pandas import DataFrameMapper
default_mapper = DataFrameMapper([(i, None) for i in iris.feature_names + ['Species']])
from sklearn2pmml import sklearn2pmml
# Call style as shown on the slide (2018); later sklearn2pmml releases expect a PMMLPipeline instead.
sklearn2pmml(estimator=clf, mapper=default_mapper, pmml="C:/workspace/IrisClassificationTree.pmml")
PMML in R
R is a programming language and software environment for statistical
computing and graphics supported by the R Foundation for Statistical Computing.
R package “pmml”
https://cran.r-project.org/package=pmml
Depends on XML package
Supports a number of R models
Maintained by Dmitriy Bolotov and Tridivesh Jena from Software AG
Create PMML in R (using RStudio)
> library(XML)
> library(pmml)
> library(rpart)  # provides rpart(), used for the decision tree below
> data(iris)
Build and save a linear regression model predicting sepal length:
> irisLR <- lm(Sepal.Length ~ ., iris)
> saveXML(pmml(irisLR), "IrisLR.xml")
Build and save a decision tree (C&RT) model predicting class:
> irisTree <- rpart(Species ~ ., iris)
> saveXML(pmml(irisTree), "IrisTree.xml")
Deploy a model or a pipeline into Watson Machine Learning
In the Project view, click “+ New Watson Machine Learning model”,
give it a name, and select “From File”.
Scoring PMML in Watson Machine Learning
Other model deployment formats
PFA from DMG – an emerging format, JSON-based, big potential
Pickle in Python – binary serialization of Scikit Learn models
MLeap - open format, not a standard, uses protobuf. Works for Spark ML models
Open Neural Network Exchange (ONNX) from Microsoft and Facebook
• binary (protobuf) format for deep learning models
• describes computation graph (including operators)
• supported by most deep learning frameworks (TF still in progress)
• now adding support for traditional ML
Neural Network Exchange Format (NNEF) by Khronos Group
• Dual format (weights in binary file)
• Allows user-defined functions
• Only neural network models
Docker containers
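To contrast pickle with a text format such as PMML, here is a minimal round trip using only the standard library. The "model" is a hypothetical dictionary of parameters standing in for a fitted estimator; scikit-learn estimators are pickled with the same `pickle.dumps` and `pickle.loads` calls.

```python
import pickle

# Hypothetical stand-in for a trained model: a single split on one field.
model = {"field": "petal_length", "threshold": 2.6,
         "below": "Iris-setosa", "above": "other"}


def predict(params, record):
    """Apply the one-split rule stored in params to one record."""
    if record[params["field"]] <= params["threshold"]:
        return params["below"]
    return params["above"]


blob = pickle.dumps(model)      # binary serialization: bytes, not readable text
restored = pickle.loads(blob)   # only unpickle data from a trusted source

print(type(blob).__name__, len(blob) > 0)
print(predict(restored, {"petal_length": 1.4}))
```

Unlike PMML, the pickled bytes are neither human-readable nor portable beyond Python, and a pickled estimator generally needs a compatible library version to load, which is part of the motivation for the exchange formats listed above.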
ONNX use pattern
ONNX IR Spec (.onnx)
Frontend (Training): models in different frameworks, Export
Tools
• Netron visualizer
• Net Drawer visualizer
• Checker
• Shape Inferencer
• Graph Optimizer
• Opset Version Converter
Backend (Inference): Import, models in different frameworks, Run
IBM Meetup on November 1, 2018: Machine Learning made easy with Watson Studio

Editor's Notes

  • #5: AI is typically defined as the ability of a machine to perform cognitive functions we associate with human minds, such as perceiving, reasoning, learning, interacting with the environment, problem solving, and even exercising creativity. Examples of technologies that enable AI to solve business problems are robotics and autonomous vehicles, computer vision, language, virtual agents, and machine learning. A convergence of algorithmic advances, data proliferation, and tremendous increases in computing power and storage has propelled AI from hype to reality. Most recent advances in AI have been achieved by applying machine learning to very large data sets. Machine learning algorithms detect patterns and learn how to make predictions and recommendations by processing data and experiences, rather than by receiving explicit programming instruction.  The algorithms also adapt in response to new data and experiences to improve efficacy over time. 
  • #17: dmg.org: PMML is the leading standard for statistical and data mining models and supported by over 20 vendors and organizations. With PMML, it is easy to develop a model on one system using one application and deploy the model on another system using another application, simply by transmitting an XML configuration file.
  • #18: PMML can describe the entire data processing pipeline, including data preparation, one or several models, post-processing of model predictions.
  • #28: Villu creates many Mantis issues for PMML and helps to improve it. Here we build a tree model for classifying iris flowers based on their dimensions, then export it to PMML.