© 2018 KNIME AG. All Rights Reserved.
From Raw Data to Deployment
Scott.Fincher@knime.com
Jeanette.Prinz@knime.com
Kathrin.Melcher@knime.com
@KNIME #KNIMERoadshow
Do you recognize this?
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
Let’s unroll it!
It always starts with some data …
• Data Preparation
– Data Manipulation
– Data Blending
– Missing Values Handling
– Feature Generation
– Dimensionality Reduction
– Feature Selection
– Outlier Removal
– Normalization
– Partitioning
– …
• Model Training
– Model Training
– Bag of Models
– Model Selection
– Ensemble Models
– Own Ensemble Model
– External Models
– Import Existing Models
– Model Factory
– …
• Model Optimization
– Parameter Tuning
– Parameter Optimization
– Regularization
– Model Size
– Number of Iterations
– …
• Model Evaluation
– Performance Measures
– Accuracy
– ROC Curve
– Cross-Validation
– …
• Deployment
– Files & DBs
– Dashboards
– REST API
– SQL Code Export
– Reporting
– …
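As a small illustration of the Model Evaluation step, the sketch below computes accuracy and confusion counts for a two-class problem in plain Python. It is not how KNIME implements its scoring; the function names and the sample labels are made up for the example.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of rows where the predicted class equals the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot score an empty test set")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def confusion_counts(actual, predicted):
    """Count each (actual, predicted) pair, i.e. the cells of a confusion matrix."""
    return Counter(zip(actual, predicted))


# Hypothetical labels, for illustration only.
actual = ["delay", "no delay", "no delay", "delay", "no delay"]
predicted = ["delay", "no delay", "delay", "no delay", "no delay"]

score = accuracy(actual, predicted)
cells = confusion_counts(actual, predicted)
print(f"accuracy = {score:.2f}")
for (a, p), n in sorted(cells.items()):
    print(f"actual={a!r:10} predicted={p!r:10} count={n}")
```

Accuracy alone can look good when one class dominates, which is one reason the list above also names the ROC curve and cross-validation.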
The many Lives of a Dataset
• Data Preparation: Original Data Set with Past Observations
– Partitioning: Training Set, Validation Set, Test Set
• Model Training: Training Set
• Model Optimization: Validation Set
• Model Evaluation: Test Set
• Deployment: New Data from Real World Applications
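The partitioning step can be sketched in a few lines of Python. This is only an illustration of the idea, not KNIME's Partitioning node; the 70/15/15 split, the fixed seed, and the function name are assumptions chosen for the example.

```python
import random


def partition(rows, train_frac=0.70, valid_frac=0.15, seed=42):
    """Shuffle rows and split them into training, validation, and test sets.

    The test set receives whatever remains after the first two splits.
    """
    if train_frac <= 0 or valid_frac < 0 or train_frac + valid_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(rows)                      # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)      # seeded for a reproducible split
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


# Stand-in for the original data set with past observations.
rows = list(range(100))
train, valid, test = partition(rows)
print(len(train), len(valid), len(test))
```

The model is fitted on the training set, tuned on the validation set, and scored once on the test set, so the reported performance comes from rows the model has never seen.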
Data Exploration
• Sometimes in between Data Access and Data Preparation there is a Data Exploration phase
• The Data Exploration phase is useful to get to know the data
• KNIME offers a few visualization nodes to build dashboards to explore the data
What about Big Data?
• Big Data serves Scalability
• The whole Analytics Process is no different on Big Data
• You need:
– a Big Data Platform
– The KNIME Big Data (Spark & Hive) Extension
One Example for Every Need
The KNIME EXAMPLES Server
50_Applications/27_FromRawDataToDeployment
Classification Problem & Data Set
• Airline Dataset: http://stat-computing.org/dataexpo/2009/the-data.html
• Smaller dataset (Jan 2007) (AirlineDataset.table)
• Challenge: Predict Departure Delays
– If working on the original airline dataset, use only flights from airport ORD
– Output class = “delay” if depdelay > 15 min, otherwise “no delay”
– Input features: everything that is available, and more if you can find it!
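The class definition above can be sketched as follows. This is an illustration in plain Python, not the workflow from the Learnathon; the dictionary keys `Origin` and `DepDelay`, the `DelayClass` column, and the choice to skip rows without a departure delay are all assumptions made for the example.

```python
DELAY_THRESHOLD_MIN = 15  # threshold from the challenge: "delay" if depdelay > 15 min


def label_flight(dep_delay_min):
    """Map a departure delay in minutes to the output class."""
    return "delay" if dep_delay_min > DELAY_THRESHOLD_MIN else "no delay"


def prepare(flights, origin="ORD"):
    """Keep flights from one origin airport and attach the output class.

    Pass origin=None to keep all airports. Rows with no departure delay
    are skipped here; that handling is an assumption of this sketch.
    """
    labeled = []
    for flight in flights:
        if origin is not None and flight["Origin"] != origin:
            continue
        if flight["DepDelay"] is None:
            continue
        labeled.append({**flight, "DelayClass": label_flight(flight["DepDelay"])})
    return labeled


# Hypothetical rows, for illustration only.
flights = [
    {"Origin": "ORD", "DepDelay": 42},
    {"Origin": "ORD", "DepDelay": 15},    # exactly 15 minutes is not a delay
    {"Origin": "JFK", "DepDelay": 90},    # dropped: not from ORD
    {"Origin": "ORD", "DepDelay": None},  # dropped: no departure delay recorded
    {"Origin": "ORD", "DepDelay": -3},    # early departure
]

for row in prepare(flights):
    print(row)
```

Note that the comparison is strict, so a flight that leaves exactly 15 minutes late is labeled “no delay”.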
Challenges
• Group 1. Data Access and Data Preparation
• Group 2. ML Model Training
• Group 3. Model Deployment
• Import file Learnathon_2018.knar into your workspace
Group 1. Data Access and Data Preparation
Group 2. Model Training & Optimization
Group 3. Deployment
One Week of KNIME Courses in Austin
• Course for KNIME Analytics Platform, April 23-24, 2018
• Course for KNIME Server, April 25, 2018
• Text Mining Course for KNIME Analytics Platform, April 26, 2018
• Big Data Course for KNIME Analytics Platform, April 27, 2018
KNIME Fall Summit 2018
November 6 – 9 at AT&T Executive Education and Conference Center, Austin, Texas
• Tuesday & Wednesday: One-day courses
• Thursday & Friday: Summit sessions
Use the code US-ROADSHOW for 10% off tickets!
Register at www.KNIME.com
KNIME Beginner’s Luck Book
Free copy of the KNIME Beginner’s Luck book at KNIME Press: https://www.knime.org/knimepress
Promotion Code: KNIME_Learnathon_2018
You can find KNIMers here!
• KNIME (www.knime.com)
• BLOG for news, tips and tricks (www.knime.com/blog)
• FORUM for questions and answers (tech.knime.com/forum)
• EXAMPLE SERVER for example workflows
• LEARNING HUB (www.knime.com/learning-hub)
• KNIME TV channel on YouTube
• KNIME on Twitter: @KNIME
• KNIME on Facebook: https://www.facebook.com/KNIMEanalytics
• On
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME.com AG under license from KNIME GmbH,
and are registered in the United States. KNIME® is also registered in Germany.
Thank You!
