Copyright © 2015 KNIME.com AG
Big Data Science is just a Click Away!
Rosaria Silipo
KNIME.com
Variety, Volume, Velocity
Variety:
• integrating heterogeneous data (and tools)
Volume:
• from small files...
• ...to distributed data repositories (Hadoop)
• bring the tools to the data
Velocity:
• from distributing computationally heavy computations...
• ...to real-time scoring of millions of records/sec.
Every Minute…
IoT
The Challenge
Energy Usage Prediction from Smart Meters Data
• Read Smart Meter Energy Data (176 million rows)
• Clean Up and Aggregate total Energy Usage by hour, day, week, month, year
• Calculate Behavioral Measures for each Smart Meter
• Cluster Smart Meters with Similar Behavior (k-Means)
• Predict Energy Usage in Clustered Smart Meters (Auto-Regressive Time Series Prediction)
Implemented as three workflows: Workflow 1, Workflow 2, Workflow 3
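The auto-regressive prediction step can be illustrated with a minimal sketch. This is not the KNIME workflow from the slides: it assumes the simplest case, an AR(1) model x[t] = c + phi * x[t-1] fitted by ordinary least squares, and the usage values and function names are hypothetical.

```python
def fit_ar1(series):
    """Fit x[t] = c + phi * x[t-1] by ordinary least squares."""
    prev = series[:-1]   # regressors: x[t-1]
    curr = series[1:]    # targets:    x[t]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    c = mean_curr - phi * mean_prev
    return c, phi


def predict_next(series, c, phi):
    """One-step-ahead forecast from the last observed value."""
    return c + phi * series[-1]


# Hypothetical hourly usage values (kWh) for one cluster of smart meters.
usage = [0.0, 2.0, 3.0, 3.5, 3.75, 3.875]
c, phi = fit_ar1(usage)
forecast = predict_next(usage, c, phi)
```

A real model would typically use more lags (for example the same hour on previous days) and would be fitted per cluster on the aggregated series.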
Workflow 1: PrepareData
Execution time: ~ 2 days
Big Data
Big Data Support
• KNIME Big Data Access Nodes
– preconfigured connectors
– in-database processing
• Big Data Platforms
– HDFS, Hive, Impala, HP Vertica, Hortonworks, ParStream, Actian, any big data platform really!
• Spark MLlib integration (coming soon)
• Streaming Executor (coming soon)
Hadoop Sandboxes
• Hortonworks:
http://hortonworks.com/products/hortonworks-sandbox/
• Cloudera:
http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms.html
• Virtual Box
https://www.virtualbox.org/
• VMWare Player
http://www.vmware.com/
… as easy as 1, 2, 3, … 4
1. Access Big Data
2. Select Table
3. In-DB Processing
4. Into KNIME
1. Database Connector
Generic Database Connector
– Can connect to any JDBC source
– Register new JDBC driver via preferences page
1. Register JDBC Driver
• Open KNIME and go to File -> Preferences
• Increase connection timeout for long-running retrieval operations
1. Dedicated Connectors
Dedicated pre-configured connectors
– Bundling necessary JDBC drivers
– Easy to use
– DB specific behavior/capability
Some dedicated connectors are part of the open source KNIME Analytics Platform, some belong to the commercial KNIME Big Data Extension
Slide callouts: works for most Hadoop Hive installations, including Hortonworks; free
2. Data Table Selection
3. In-Database Processing
• Filter rows and columns
• Join tables/queries (similar settings as the Joiner node)
• Sort your data
• Write your own query
• Aggregate* your data (similar settings as the GroupBy node)
* The Database GroupBy node exposes DB-specific aggregation methods
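The idea behind in-database processing is that filtering, joining, aggregating, and sorting all run inside the database, and only the small result is fetched. The sketch below illustrates this with Python's built-in sqlite3 module as a stand-in for a big data platform; the table names, columns, and values are hypothetical and are not taken from the slides.

```python
import sqlite3

# In-memory SQLite database standing in for the remote platform.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meters (meter_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE readings (meter_id INTEGER, ts TEXT, kwh REAL);
INSERT INTO meters VALUES (1, 'north'), (2, 'south');
INSERT INTO readings VALUES
    (1, '2010-01-01 00:00', 0.5),
    (1, '2010-01-01 01:00', 1.5),
    (2, '2010-01-01 00:00', 2.0),
    (2, '2010-01-01 01:00', 4.0);
""")

# Filter (WHERE), join (JOIN), aggregate (GROUP BY) and sort (ORDER BY)
# are all evaluated by the database engine.
query = """
SELECT m.region, SUM(r.kwh) AS total_kwh
FROM readings AS r
JOIN meters AS m ON m.meter_id = r.meter_id
WHERE r.kwh > 1.0
GROUP BY m.region
ORDER BY total_kwh DESC
"""
rows = conn.execute(query).fetchall()  # only the aggregated rows leave the DB
conn.close()
```

With 176 million input rows, the point of this pattern is that the client only ever receives one row per group.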
3. Queries for average Measures
3. Average Monthly Values
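The slide itself shows only a workflow screenshot. As a rough illustration of what an average-by-month query might look like, here is a sketch using sqlite3 as a stand-in database; the `readings` table, its columns, and the sample values are assumptions, and the date function (`strftime`) is SQLite-specific, so Hive or Impala would need their own equivalents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (meter_id INTEGER, ts TEXT, kwh REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (1, "2010-01-05 10:00:00", 1.0),
        (1, "2010-01-20 10:00:00", 3.0),
        (1, "2010-02-03 10:00:00", 4.0),
        (2, "2010-01-07 10:00:00", 10.0),
    ],
)

# Bucket each reading by calendar month and average per meter and month.
monthly = conn.execute(
    """
    SELECT meter_id, strftime('%m', ts) AS month, AVG(kwh) AS avg_kwh
    FROM readings
    GROUP BY meter_id, month
    ORDER BY meter_id, month
    """
).fetchall()
conn.close()
```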
4. Import Data from Database
Execution time: < 30 min
New Big Data Platform?
No problem!
Just change the connector node!
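The same principle can be sketched in code: if the processing logic only depends on a generic connection interface, switching platforms means swapping the function that creates the connection. The example assumes Python's DB-API 2.0 interface with sqlite3 as a stand-in connector; `total_usage`, `connect_sqlite`, and the `readings` table are hypothetical names.

```python
import sqlite3


def total_usage(conn, table="readings"):
    """Run the same aggregation on any DB-API 2.0 connection."""
    cur = conn.cursor()
    cur.execute(f"SELECT SUM(kwh) FROM {table}")
    (total,) = cur.fetchone()
    cur.close()
    return total


def connect_sqlite():
    """Stand-in 'connector'; another driver's connect() could replace it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (meter_id INTEGER, kwh REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", [(1, 1.5), (2, 2.5)])
    return conn


conn = connect_sqlite()      # the only platform-specific line
total = total_usage(conn)    # downstream logic stays unchanged
conn.close()
```

In practice SQL dialects differ between platforms, so fully portable queries are only realistic for simple statements like this one.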
Other Useful Database Nodes
• Drop table
– missing table handling
– cascade option
• Execute any SQL statement
– executes several queries separated by ; and new line
• Manipulate existing queries
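Splitting a script into statements at a semicolon followed by a new line can be sketched as follows. This is an illustration of the convention only, not the KNIME node's implementation; `run_statements` is a hypothetical helper, sqlite3 is a stand-in database, and the naive split would break on string literals that contain a semicolon followed by a line break.

```python
import sqlite3


def run_statements(conn, script):
    """Execute statements separated by a semicolon followed by a newline."""
    executed = 0
    for statement in script.split(";\n"):
        statement = statement.rstrip(";").strip()
        if statement:                 # skip the empty tail after the last ';'
            conn.execute(statement)
            executed += 1
    return executed


script = (
    "DROP TABLE IF EXISTS daily_usage;\n"
    "CREATE TABLE daily_usage (meter_id INTEGER, kwh REAL);\n"
    "INSERT INTO daily_usage VALUES (1, 12.5);\n"
)
conn = sqlite3.connect(":memory:")
count = run_statements(conn, script)
rows = conn.execute("SELECT * FROM daily_usage").fetchall()
conn.close()
```

The `DROP TABLE IF EXISTS` line also shows one way a missing table can be tolerated when dropping.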
KNIME Big Data Extension
• KNIME Big Data Access Nodes
– preconfigured connectors
– HDFS File Handling
– Hive/Impala Loader
• Big Data Platforms
– HDFS, Hive, Impala, HP Vertica, Hortonworks, ParStream, Actian, SAP HANA (planned), …
• Spark MLlib integration (coming soon)
• Streaming Executor (coming soon)
HDFS File Handling
• KNIME & Extensions -> KNIME File Handling Nodes
• HDFS Connection and HDFS File Permission nodes
Hive/Impala Loader
• Upload a KNIME data table to Hive/Impala
KNIME Big Data Extension: Download and Install
KNIME.com Extension Store
License Required!
Installation Instructions
http://tech.knime.org/installation-instructions
Product Description
http://www.knime.org/knime-big-data-extension
License on KNIME Store
http://tech.knime.org/knime-store
30-day trial license available with special Promotion Code
education@knime.com
References
• Whitepaper “KNIME opens the Doors to Big Data”
http://www.knime.org/files/big_data_in_knime_1.pdf
• Blog Post “Integrating Big data is as Easy as 1,2,3, … 4”
http://www.knime.org/blog/integrating-big-data-is-as-easy-as-1-2-3-4
• The Big Data Extension Product Description
http://www.knime.org/knime-big-data-extension
Thank You!
• education@knime.com
• Twitter: @KNIME
• LinkedIn Group: KNIME
• KNIME Blog: http://www.knime.org/blog
Big Data as easy as 1, 2, 3, ... 4 ... with KNIME
