© 2018 KNIME AG. All Rights Reserved.
From Raw Data to Deployment
Kilian.Thiel@knime.com
Marten.Pfannenschmidt@knime.com
Kathrin.Melcher@knime.com
KNIME

Do you recognize this?
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining

Let’s unroll it!
It always starts with some data …

Data Preparation
• Data Manipulation
• Data Blending
• Missing Values Handling
• Feature Generation
• Dimensionality Reduction
• Feature Selection
• Outlier Removal
• Normalization
• Partitioning
• …

Model Training
• Model Training
• Bag of Models
• Model Selection
• Ensemble Models
• Own Ensemble Model
• External Models
• Import Existing Models
• Model Factory
• …

Model Optimization
• Parameter Tuning
• Parameter Optimization
• Regularization
• Model Size
• Number of Iterations
• …

Model Evaluation
• Performance Measures
• Accuracy
• ROC Curve
• Cross-Validation
• …

Deployment
• Files & DBs
• Dashboards
• REST API
• SQL Code Export
• Reporting
• …
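
Two of the evaluation items listed above, accuracy and cross-validation, can be sketched in a few lines of plain Python. This is only an illustration and not the KNIME workflow from the slides: the function names `accuracy`, `k_fold_indices`, and `cross_validate_majority` are hypothetical, and a majority-class baseline stands in for a real model.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds (no shuffling)."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    # The first (n_rows % k) folds receive one extra row.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, test_idx
        start += size


def cross_validate_majority(labels, k):
    """Mean accuracy over k folds of a baseline that always predicts the
    most frequent training label (a stand-in for a real model)."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
        y_true = [labels[i] for i in test_idx]
        scores.append(accuracy(y_true, [majority] * len(y_true)))
    return sum(scores) / len(scores)
```

A baseline like this is a useful reference point: a trained model is only interesting if its cross-validated accuracy clearly exceeds what always predicting the majority class achieves.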

The many Lives of a Dataset
• Data Preparation: Original Data Set with Past Observations; Partitioning into Training Set, Validation Set, and Test Set
• Model Training: Training Set
• Model Optimization: Validation Set
• Model Evaluation: Test Set
• Deployment: New Data from Real World Applications
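
The partitioning into training, validation, and test sets can be sketched as follows. This is a minimal illustration in plain Python, not the KNIME Partitioning node: the function name `partition`, the 60/20/20 default split, and the fixed seed are assumptions chosen for the example.

```python
import random


def partition(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle rows and split them into (train, validation, test) lists.

    The fractions and the seed are illustrative defaults (assumptions),
    not values taken from the slides.
    """
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")
    shuffled = list(rows)
    # A dedicated Random instance keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, validation, test
```

The training set fits the model, the validation set guides optimization, and the test set is used only for the final evaluation, so each row lands in exactly one of the three sets.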

Data Exploration
• Sometimes in between Data Access and Data Preparation there is a Data Exploration phase
• The Data Exploration phase is useful to get to know the data
• KNIME offers a few visualization nodes to build dashboards to explore the data

One Example for Every Need
The KNIME EXAMPLES Server
50_Applications/27_FromRawDataToDeployment

Classification Problem & Data Set
• Airline Dataset: http://stat-computing.org/dataexpo/2009/the-data.html
• Smaller dataset (Jan 2007) (AirlineDataset.table)
• Challenge: Predict Departure Delays
  If working on the original airline dataset, use only flights from airport ORD
  Output class = “delay” if depdelay > 15 min, otherwise “no delay”
  Input features: everything that is available, and more if you can find it!
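
The labeling rule above can be sketched in Python. This is a hedged illustration rather than the workflow used in the Learnathon: the dictionary keys `Origin` and `DepDelay`, the function names, and the decision to skip rows without a recorded departure delay are all assumptions made for the example.

```python
DELAY_THRESHOLD_MIN = 15  # threshold from the slide: depdelay > 15 min


def label_departure(dep_delay_min):
    """Return the output class for one flight's departure delay in minutes."""
    return "delay" if dep_delay_min > DELAY_THRESHOLD_MIN else "no delay"


def prepare_flights(flights, origin="ORD"):
    """Keep flights from one origin airport and attach the output class.

    `flights` is a list of dicts; the keys "Origin" and "DepDelay" are
    assumed column names. Rows with no recorded delay are skipped, which
    is an assumption and not something the slides specify.
    """
    labeled = []
    for flight in flights:
        if flight.get("Origin") != origin or flight.get("DepDelay") is None:
            continue
        labeled.append({**flight, "class": label_departure(flight["DepDelay"])})
    return labeled
```

Note that the comparison is strict: a flight departing exactly 15 minutes late is still labeled “no delay”.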

Challenges
• Group 1. Data Access and Data Preparation
• Group 2. ML Model Training
• Group 3. Model Deployment
• Import file Learnathon_2018.knar into your workspace

Group 1. Data Access and Data Preparation

Group 2. Model Training & Optimization

Group 3. Deployment

KNIME Spring Summit 2018
March 5 – 9 at Hotel Berlin, Berlin, Germany
• Monday & Tuesday: One-day courses
• Wednesday & Thursday: Summit sessions
• Friday: Workshops
Use the code LEARNATHON for 10% off tickets!
Register at www.KNIME.com

KNIME Beginner’s Luck Book
Free copy of the KNIME Beginner’s Luck book at KNIME Press: https://www.knime.org/knimepress
Promotion Code: KNIME_Learnathon_2018

You can find KNIMers here!
• KNIME (www.knime.org)
• BLOG for news, tips and tricks (www.knime.org/blog)
• FORUM for questions and answers (tech.knime.org/forum)
• EXAMPLE SERVER for example workflows
• LEARNING HUB (www.knime.org/learning-hub)
• KNIME TV channel on YouTube
• KNIME on Twitter: @KNIME
• KNIME on Facebook: https://www.facebook.com/KNIMEanalytics
• KNIME User Group UK on Meetup: https://www.meetup.com/KNIME-User-Group-UK/

© 2017 KNIME AG. All Rights Reserved.
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME.com AG under license from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany.
Thank You!

