© Hitachi America, Ltd. 2017. All rights reserved.
Hands-on demo of PDI using webSpoon
Hiromu Hota, PhD
Researcher at Hitachi America, Ltd.
@HiromuHota, hiromu.hota@hal.hitachi.com
4/27/2017
Get started with webSpoon
How to get started with webSpoon
1. Visit https://HighlyAvailable-env.i8gkiqhycy.us-west-2.elasticbeanstalk.com (this environment will be deleted after the meetup).
2. Log in with
User: user
Password: password
3. From the top menu, click File > New > Transformation
Basic Concepts of PDI
• Transformations
– are data flows that typically start from data sources, go through some processing, and end at a target database table.
– consist of steps and hops.
– are saved as .ktr (Kettle) files or to a repository.
• Steps and Hops
– Each step performs a specific task, such as input, output, or scripting.
– Hops are directed data pathways that connect steps.
(Diagram: steps connected by a hop form a transformation, which is saved as Trans.ktr or to a repository.)
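A .ktr file is XML, so the structure described above can be inspected with an ordinary XML parser. That structure is a transformation made of steps connected by directed hops.

Below is a minimal Python sketch. It assumes a simplified .ktr layout:
- a `<transformation>` root;
- `<step>` elements, each carrying `<name>` and `<type>`;
- `<hop>` elements under `<order>`, each carrying `<from>`, `<to>` and `<enabled>`.

The sample content, the step type ids and the helper `read_transformation` are hypothetical. Real files saved by Spoon or webSpoon contain far more detail.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified .ktr content. Files saved by Spoon/webSpoon
# contain many more elements (step settings, GUI coordinates, connections).
SAMPLE_KTR = """
<transformation>
  <info><name>Trans</name></info>
  <order>
    <hop>
      <from>Generate random credit card numbers</from>
      <to>Dummy (do nothing)</to>
      <enabled>Y</enabled>
    </hop>
  </order>
  <step>
    <name>Generate random credit card numbers</name>
    <type>RandomCCNumberGenerator</type>
  </step>
  <step>
    <name>Dummy (do nothing)</name>
    <type>Dummy</type>
  </step>
</transformation>
"""


def read_transformation(ktr_xml: str):
    """Return the transformation name, its steps, and its hops."""
    root = ET.fromstring(ktr_xml.strip())
    name = root.findtext("info/name")
    # Map each step's display name to its (assumed) step type id.
    steps = {s.findtext("name"): s.findtext("type") for s in root.findall("step")}
    # Hops are listed under <order>; <enabled> holds "Y" or "N".
    hops = [
        (h.findtext("from"), h.findtext("to"), h.findtext("enabled") == "Y")
        for h in root.findall("order/hop")
    ]
    return name, steps, hops


if __name__ == "__main__":
    name, steps, hops = read_transformation(SAMPLE_KTR)
    print(f"Transformation: {name}")
    for step_name, step_type in steps.items():
        print(f"  step {step_name!r} ({step_type})")
    for src, dst, enabled in hops:
        print(f"  hop {src!r} -> {dst!r} ({'enabled' if enabled else 'disabled'})")
```

Each hop is kept as a (from, to, enabled) triple. This preserves its direction and whether it is disabled, which is all a reader needs to reconstruct the data flow.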
How to operate webSpoon
• Drawing Steps
1. Under the Design tab, expand the Input node, then click and drag a
Generate random credit card numbers step onto the canvas.
2. Expand the Flow node, then click and drag a Dummy (do nothing) step onto the
canvas.
• Drawing Hops (the same as in Spoon)
1. Press and hold the <SHIFT> key.
2. Click and hold the Generate random credit card numbers step.
3. Drag the mouse cursor to the Dummy (do nothing) step.
4. Release the mouse button and the key.
Example demo
Demo story
• Background
– Ichiro Hitachi works for a travel agency based in San Francisco.
– He wants to offer additional benefits to his tourist customers.
– He personally enjoys visiting filming locations when traveling to a new place, so he strongly believes such information would be useful to his customers too.
• Movie location notifier
– When his customers come close to a filming location, they receive a notification showing the movie's title, year, short plot, and actor, plus the location's address.
(Figure: cropped map of San Francisco by Ryan Holliday / CC-BY-SA 4.0, annotated with example notifications)
• Godzilla (2014)
– He attacked the Golden Gate Bridge (GGB).
– Golden Gate Bridge
• Forrest Gump (1994)
– He has accidentally been present at many historic moments.
– 3301 Lyon Street
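The notifier hinges on one computation: the distance between a customer and each filming location. Below is a minimal Python sketch under these assumptions:
- coordinates are in decimal degrees;
- distance is computed with the haversine great-circle formula;
- the notification radius is a hypothetical 500 m.

The coordinates are approximate and for illustration only. The class and function names are hypothetical, and the actor field is omitted to keep the sample short.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, in meters


@dataclass
class FilmLocation:
    title: str
    year: int
    plot: str
    address: str
    lat: float  # decimal degrees
    lng: float  # decimal degrees


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lng2 - lng1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def notifications_near(user_lat, user_lng, locations, radius_m=500.0):
    """Build a notification for every filming location within radius_m of the user."""
    messages = []
    for loc in locations:
        distance = haversine_m(user_lat, user_lng, loc.lat, loc.lng)
        if distance <= radius_m:
            messages.append(f"{loc.title} ({loc.year}): {loc.plot} @ {loc.address}, {distance:.0f} m away")
    return messages


# Approximate coordinates, for illustration only.
LOCATIONS = [
    FilmLocation("Godzilla", 2014, "He attacked the Golden Gate Bridge.",
                 "Golden Gate Bridge", 37.8199, -122.4783),
    FilmLocation("Forrest Gump", 1994, "He has accidentally been present at many historic moments.",
                 "3301 Lyon Street", 37.8029, -122.4484),
]

if __name__ == "__main__":
    # A customer standing at the Palace of Fine Arts
    for message in notifications_near(37.8029, -122.4484, LOCATIONS):
        print(message)
```

With a 500 m radius, a customer at the Palace of Fine Arts is notified only about Forrest Gump. The Golden Gate Bridge is roughly 3 km away.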
Source data: “Film Locations in San Francisco”
• Source data
– Available on SF OpenData (https://data.sfgov.org/).
– A list of filming locations of movies shot in San Francisco.
• Web APIs to retrieve missing information
– OMDb (Open Movie Database) API
• Short plot of the movie
– Google Maps API
• Formatted (normalized) address (e.g., Palace of Fine Arts -> 3301 Lyon Street)
• Latitude and longitude of the location, used to calculate the distance from each user
Title | Year | Locations | Actor1 | ...
Godzilla | 2014 | Kearney & Pine St. | ...
Forrest Gump | 1994 | Palace of Fine Arts | ...
...
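Outside of PDI, the two lookups listed above boil down to HTTP GET requests. The Python sketch below only builds the request URLs, so it runs without network access or keys.

It assumes these endpoints and query parameters (check each service's documentation):
- OMDb: `t`, `y`, `plot`, `apikey`;
- Google Geocoding API: `address`, `key`.

The helper names, the placeholder keys, and appending ", San Francisco, CA" to the location are assumptions.

```python
from urllib.parse import urlencode

# Assumed public endpoints; check each service's documentation for current terms.
OMDB_ENDPOINT = "https://www.omdbapi.com/"
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def omdb_url(title: str, year: int, api_key: str) -> str:
    """URL asking OMDb for one movie by title and year, with the short plot."""
    params = {"t": title, "y": year, "plot": "short", "apikey": api_key}
    return f"{OMDB_ENDPOINT}?{urlencode(params)}"


def geocode_url(location: str, api_key: str, city: str = "San Francisco, CA") -> str:
    """URL asking the Geocoding API to normalize a free-text location."""
    # Appending the city narrows down ambiguous names such as "Palace of Fine Arts".
    params = {"address": f"{location}, {city}", "key": api_key}
    return f"{GEOCODE_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(omdb_url("Forrest Gump", 1994, "YOUR_OMDB_KEY"))
    print(geocode_url("Palace of Fine Arts", "YOUR_GOOGLE_KEY"))
```

`urlencode` takes care of escaping, so titles and locations containing spaces, commas or "&" (e.g. "Kearney & Pine St.") are passed safely.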
High-level demo system architecture
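(Diagram: the Organizer and Participants perform operations in webSpoon. Raw data comes from SF OpenData, geo data from the Google Maps API, and movie data from the OMDb API; the enriched data is stored in a Database. The specific location data flow is not covered today.)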
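The enrichment stage in the architecture combines raw rows from SF OpenData with geo data and movie data before they land in the database. Per row, it can be sketched as below.

This is a minimal Python illustration, not how the demo transformation implements it. It assumes these JSON response shapes:
- OMDb: `Plot` and `Response`;
- Google Geocoding API: `status`, `results[0].formatted_address` and `results[0].geometry.location`.

The sample responses are trimmed, made-up examples, and the `enrich` helper is hypothetical.

```python
import json


def enrich(raw_row: dict, omdb_json: str, geocode_json: str) -> dict:
    """Combine one raw SF OpenData row with OMDb and Geocoding API responses."""
    movie = json.loads(omdb_json)
    geo = json.loads(geocode_json)
    enriched = dict(raw_row)  # copy, keeping the original columns
    # OMDb signals a miss with "Response": "False".
    enriched["plot"] = movie.get("Plot") if movie.get("Response") == "True" else None
    if geo.get("status") == "OK" and geo.get("results"):
        best = geo["results"][0]  # take the top-ranked match
        enriched["address"] = best["formatted_address"]
        enriched["lat"] = best["geometry"]["location"]["lat"]
        enriched["lng"] = best["geometry"]["location"]["lng"]
    else:
        enriched.update(address=None, lat=None, lng=None)
    return enriched


# Trimmed, made-up sample responses (real ones carry many more fields).
SAMPLE_OMDB = '{"Title": "Forrest Gump", "Year": "1994", "Plot": "A short plot.", "Response": "True"}'
SAMPLE_GEOCODE = json.dumps({
    "status": "OK",
    "results": [{
        "formatted_address": "3301 Lyon St, San Francisco, CA 94123, USA",
        "geometry": {"location": {"lat": 37.8029, "lng": -122.4484}},
    }],
})

if __name__ == "__main__":
    row = {"Title": "Forrest Gump", "Year": 1994, "Locations": "Palace of Fine Arts"}
    print(enrich(row, SAMPLE_OMDB, SAMPLE_GEOCODE))
```

Lookups that fail produce `None` fields instead of raising. The row can then be stored anyway and filtered out later.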
Exercise (step 1)
1. Open an example file and save it under a different name
1. Click File > Open, select example2, then click OK.
2. Click File > Save as, give the transformation a unique name (so that others do not overwrite it), then click OK.
Exercise (step 2)
2. Run
1. Click the Run button, or select Action > Run from the menu.
2. Click the Run button at the bottom right of the dialog that opens.
(Screenshot annotated with callouts “Step 3.1” and “Step 3.2”)
Exercise (step 3)
3. Preview the result
1. Click the “Dummy (do nothing)” step.
2. Click the “Preview data” tab in the “Execution Results” pane at the bottom.
3. Click other steps to preview their data as well.
Exercise (step 4)
4. Complete the data flow by enabling the disabled hop
1. Click the hop between “Dummy (do nothing)” and “Filter out rows...” to enable it.
2. Save, run, and preview the result.
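A toy model of a transformation shows why enabling the hop completes the flow. Rows only travel along enabled hops, so every step downstream of a disabled hop receives nothing.

Below is a minimal Python sketch, assuming a simplified engine that processes rows one at a time. Real PDI runs each step in its own thread and passes rows between steps through buffers. The step names, the `run` helper and the filter condition are hypothetical.

```python
from collections import defaultdict, deque


def run(source, rows, step_funcs, hops):
    """Push rows from `source` along enabled hops; return each step's output rows.

    step_funcs maps a step name to a function(row) -> row or None (None drops it);
    steps without an entry pass rows through unchanged, like Dummy (do nothing).
    hops is a list of (from_step, to_step, enabled) tuples.
    """
    downstream = defaultdict(list)
    for src, dst, enabled in hops:
        if enabled:  # a disabled hop carries no rows at all
            downstream[src].append(dst)

    output = defaultdict(list)
    queue = deque((source, row) for row in rows)
    while queue:
        step, row = queue.popleft()
        result = step_funcs.get(step, lambda r: r)(row)
        if result is None:
            continue  # the step filtered this row out
        output[step].append(result)
        for nxt in downstream[step]:
            queue.append((nxt, result))
    return dict(output)


if __name__ == "__main__":
    rows = [{"Title": "Godzilla", "Locations": "Kearney & Pine St."},
            {"Title": "Untitled", "Locations": ""}]
    # Hypothetical filter: drop rows without a location.
    funcs = {"filter": lambda r: r if r["Locations"] else None}
    for enabled in (False, True):
        hops = [("source", "dummy", True), ("dummy", "filter", enabled), ("filter", "output", True)]
        counts = {step: len(out) for step, out in run("source", rows, funcs, hops).items()}
        print(f"hop enabled={enabled}: {counts}")
```

When a step has several enabled outgoing hops, this sketch copies each row to all of them. PDI instead lets you choose whether rows are copied or distributed.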
Exercise (step 5)
5. Explore the rest on your own; for example:
– Click each step to see how it is configured.
– Explore what kinds of steps are available.
– Build the same flow yourself from scratch.
– Download and deploy webSpoon
• Docker image: https://hub.docker.com/r/hiromuhota/webspoon/
• WAR file: https://github.com/HiromuHota/pentaho-kettle/releases
– Download and install Pentaho Data Integration (including Spoon)
• http://www.pentaho.com/download (Enterprise Edition)
• http://community.pentaho.com/ (Community Edition)
Trademarks and copyrights
• Pentaho is a registered trademark of Pentaho, Inc.
• AWS, Amazon Elastic Beanstalk, and any other AWS Marks and
Services are trademarks of Amazon Web Services, Inc.
• The use of AWS Simple Icons is permitted by Amazon Web Services,
Inc.
• Godzilla is a registered trademark of Toho Co., Ltd.
• Google Maps is a trademark of Google Inc.
• All content via OMDb API is licensed by Brian Fritz under CC BY-NC 4.0.
Demo system architecture
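(Diagram: in the AWS cloud, Elastic Beanstalk runs multiple webSpoon instances in an Auto Scaling group behind a Classic Load Balancer. The Organizer and Participants access webSpoon, which reads SF OpenData and stores geo data and movie data in a Database.)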
