Spark working with a Cloud IDE: Notebook/Shiny Apps
© 2015 IBM Corporation
IBM Data Science Experience
Overview
Contents
§ About me…
§ Data Science Experience
§ Community
- Fork and Share
§ Open Source
- RStudio
- Shiny Web App Framework
§ IBM Value-Add
- IBM Analytics using Apache Spark
- Object Storage
- Data-Platform Connectors
About me…
§ My name is Thomas… Hi :)
§ I have been an Open Source Systems Engineer for 11 years and have 8 years of experience in cloud and hybrid environments.
§ Before joining IBM, I worked as a Systems Architect and as a DevOps/Systems Engineer in Cloud Operations.
§ I like… Spark, Python, Linux, configuration management tools, Hadoop, and wrangling cats…
Introducing the Data Science Experience
§ Learn: built-in learning to get started or go the distance with advanced tutorials
§ Create: the best of open source and IBM value-add to create state-of-the-art data products
§ Collaborate: community and social features that provide meaningful collaboration
Visit: http://datascience.ibm.com
Core Attributes of the Data Science Experience
IBM Data Science Experience, powered by the IBM DataWorks Platform in the Cloud
§ Community
- Find tutorials and datasets
- Connect with Data Scientists
- Ask questions
- Read articles and papers
- Fork and share projects
§ Open Source
- Code in Scala/Python/R/SQL
- Jupyter and Zeppelin* Notebooks
- RStudio IDE and Shiny apps
- Apache Spark
- Your favorite libraries
§ IBM Added Value
- Data Shaping/Pipeline UI*
- Auto-data preparation and modeling*
- Advanced Visualizations*
- Model management and deployment*
- Documented Model APIs*
- Spark as a Service
* DSX product roadmap items
Tailored Experiences For Users Collaborating Together
§ Data Engineer (DataPlatform Forge): architects how data is organized and ensures operability
§ Data Scientist (Data Science Experience): gets deep into the data to draw hidden insights for the business
§ Business Analyst (Watson Analytics): works with data to apply insights to the business strategy
§ App Developer (Bluemix): plugs into data and models and writes code to build apps
[Diagram: shared workflow from INPUT through ANALYSIS to OUTPUT: understand problem and domain; ingest data; explore and understand data; transform: clean; transform: shape; create and build model; evaluate; deliver and deploy model; communicate results]
DSX has RStudio built into the experience…
Modelling Energy Usage in NYC – BlocPower
Blog Link: http://ibm.co/29KLbvu
"BlocPower operation is diverse from outreach and
targeting, origination of investment-grade clean
energy projects to financing projects through our
crowdfunding marketplace. Data is the underlying tool
of our operation and IBM's Data Science Experience
will facilitate a closer integration across it and help our
business scale up faster."
— Tooraji Arvajeh,
Chief Engineering Officer,
BlocPower
Use Shiny apps to share your analysis with business users
Interactively explore the analysis of your data science team
Adjust parameters on-the-fly and visualize model predictions
From a Notebook you can use IBM Analytics for Apache Spark to blend multiple data types, sources, and workloads
§ Data Sources: IBM Cloud, Public Cloud, Cloud Apps, On-Premises
- IBM Cloud: BigInsights (HDFS), Cloudant (DBaaS), dashDB (Analytics), Swift (Object Storage), SQLDB (Managed DB2)
§ Spark SQL: execute SQL statements
§ Spark Streaming: streaming analytics via micro-batch
§ MLlib Machine Learning: M.L. and statistical algorithms
§ Graph: distributed graph processing framework
§ Spark Core: general compute engine, basic I/O functions, task dispatching, scheduling
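Spark Streaming's micro-batch model treats a live stream as a sequence of small batch jobs, one per fixed time interval. The following is a minimal plain-Python sketch of that idea, not the Spark Streaming API: the event shape `(timestamp, value)` and the functions `micro_batches` and `process_stream` are hypothetical names chosen for illustration, and timestamps are assumed to be non-negative seconds.

```python
from typing import Iterable, List, Tuple

# Hypothetical event shape for this sketch: (timestamp in seconds, numeric value).
Event = Tuple[float, float]


def micro_batches(events: Iterable[Event], interval: float) -> List[List[Event]]:
    """Group events into consecutive fixed-length time intervals.

    Batch k holds the events with k * interval <= timestamp < (k + 1) * interval.
    Intervals with no events produce empty batches, so list positions stay
    aligned with time.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    batches: List[List[Event]] = []
    for ts, value in events:
        if ts < 0:
            raise ValueError("timestamps must be non-negative")
        index = int(ts // interval)
        # Open empty batches up to and including the event's interval.
        while len(batches) <= index:
            batches.append([])
        batches[index].append((ts, value))
    return batches


def process_stream(events: Iterable[Event], interval: float) -> List[float]:
    """Run the same small batch job (here, a sum) on each micro-batch in order."""
    return [sum((value for _, value in batch), 0.0)
            for batch in micro_batches(events, interval)]


if __name__ == "__main__":
    stream = [(0.1, 1.0), (0.4, 2.0), (1.2, 5.0), (2.7, 3.0), (2.9, 1.0)]
    print(process_stream(stream, interval=1.0))  # [3.0, 5.0, 4.0]
```

In Spark Streaming itself, the corresponding knob is the batch interval passed when the StreamingContext is created; each batch is then processed by the same engine as ordinary batch jobs.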
The Spark Service uses Bluemix Object Storage as its preferred
data store for building performant applications
§ Object storage provides inexpensive, scalable and self-healing
retention of massive amounts of unstructured data
§ Every object exists at the same level in a flat address space
§ Bluemix Object Storage offers drag-and-drop upload and a Swift API for programmatic access
§ DataPlatform Connectors enable users to easily move data in
and out of Bluemix Object Storage
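A flat address space means an object name such as `energy/2016/jan.csv` is a single key; "folders" exist only as shared name prefixes. A minimal sketch, assuming the OpenStack Swift URL layout `<storage_url>/<container>/<object>` and its `prefix`/`delimiter` container-listing parameters: it only builds URLs and emulates listing locally, it does not contact a service or handle authentication. The storage URL, container, object names, and helper functions are hypothetical.

```python
from typing import Iterable, List, Optional
from urllib.parse import quote, urlencode


def object_url(storage_url: str, container: str, name: str) -> str:
    """Build a Swift-style object URL: <storage_url>/<container>/<object>.

    Slashes in the object name are kept: they are ordinary characters in a
    flat namespace, not real directories.
    """
    base = storage_url.rstrip("/")
    return f"{base}/{quote(container, safe='')}/{quote(name, safe='/')}"


def listing_url(storage_url: str, container: str,
                prefix: str = "", delimiter: str = "") -> str:
    """Build a container-listing URL with optional prefix/delimiter parameters."""
    base = f"{storage_url.rstrip('/')}/{quote(container, safe='')}"
    params = {key: value
              for key, value in (("prefix", prefix), ("delimiter", delimiter))
              if value}
    return f"{base}?{urlencode(params)}" if params else base


def list_names(names: Iterable[str], prefix: str = "",
               delimiter: Optional[str] = None) -> List[str]:
    """Emulate prefix/delimiter listing locally over a flat set of object names.

    Names not starting with `prefix` are skipped. With a delimiter, any name that
    contains it after the prefix is rolled up into one pseudo-folder entry.
    """
    results = set()
    for name in names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            # e.g. prefix "energy/" + "2016/jan.csv" -> pseudo-folder "energy/2016/"
            results.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            results.add(name)
    return sorted(results)


if __name__ == "__main__":
    names = ["energy/2016/jan.csv", "energy/2016/feb.csv", "energy/readme.txt", "weather.csv"]
    print(list_names(names, delimiter="/"))                    # ['energy/', 'weather.csv']
    print(list_names(names, prefix="energy/", delimiter="/"))  # ['energy/2016/', 'energy/readme.txt']
    print(object_url("https://swift.example.com/v1/AUTH_demo", "notebooks", names[0]))
    # https://swift.example.com/v1/AUTH_demo/notebooks/energy/2016/jan.csv
```

Because every object sits at the same level, "listing a folder" is just a prefix filter, which is what lets the store scale without maintaining a directory tree.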
Supported Data Sources for DSX via On-Premises and Cloud Connectors
All of the supported targets are compatible with each source.
§ Cloud Sources: Amazon Redshift, Amazon S3, Apache Hive, Bluemix Object Storage, IBM BigInsights™ on Cloud*, IBM Cloudant™, IBM dashDB, IBM DB2® on Cloud, IBM SQL Database, Microsoft Azure, PostgreSQL on Compose, Salesforce, SoftLayer Object Storage
§ On-Premises Sources: Apache Hive, Cloudera Impala, IBM DB2® LUW, IBM Informix®, IBM Pure Data for Analytics®, Microsoft SQL Server, MySQL Enterprise Edition, Oracle, Pivotal Greenplum, PostgreSQL, Sybase, Sybase IQ, Teradata
§ Cloud Targets: Amazon S3, Bluemix Object Storage, IBM Cloudant™, IBM dashDB, IBM BigInsights™ on Cloud*, IBM DB2® on Cloud, IBM SQL Database, IBM Watson™ Analytics, PostgreSQL on Compose, SoftLayer Object Storage
§ On-Premises Targets: IBM DB2® LUW, IBM Pure Data for Analytics®, Teradata
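Since every supported target accepts every supported source, the connector lists define a full source-by-target matrix, and validating a copy route reduces to two membership checks. A minimal sketch, assuming connector names are plain strings transcribed from the lists above with trademark symbols dropped; `is_supported` and `all_pairs` are hypothetical helpers, not part of any DSX or DataPlatform API.

```python
from itertools import product
from typing import List, Tuple

# Connector lists transcribed from the slide (trademark symbols dropped).
CLOUD_SOURCES = {
    "Amazon Redshift", "Amazon S3", "Apache Hive", "Bluemix Object Storage",
    "IBM BigInsights on Cloud", "IBM Cloudant", "IBM dashDB", "IBM DB2 on Cloud",
    "IBM SQL Database", "Microsoft Azure", "PostgreSQL on Compose", "Salesforce",
    "SoftLayer Object Storage",
}
ON_PREM_SOURCES = {
    "Apache Hive", "Cloudera Impala", "IBM DB2 LUW", "IBM Informix",
    "IBM Pure Data for Analytics", "Microsoft SQL Server", "MySQL Enterprise Edition",
    "Oracle", "Pivotal Greenplum", "PostgreSQL", "Sybase", "Sybase IQ", "Teradata",
}
CLOUD_TARGETS = {
    "Amazon S3", "Bluemix Object Storage", "IBM Cloudant", "IBM dashDB",
    "IBM BigInsights on Cloud", "IBM DB2 on Cloud", "IBM SQL Database",
    "IBM Watson Analytics", "PostgreSQL on Compose", "SoftLayer Object Storage",
}
ON_PREM_TARGETS = {"IBM DB2 LUW", "IBM Pure Data for Analytics", "Teradata"}

# Apache Hive appears in both source lists; the set union keeps one entry.
SOURCES = CLOUD_SOURCES | ON_PREM_SOURCES
TARGETS = CLOUD_TARGETS | ON_PREM_TARGETS


def is_supported(source: str, target: str) -> bool:
    """Every target accepts every source, so validity is just membership."""
    return source in SOURCES and target in TARGETS


def all_pairs() -> List[Tuple[str, str]]:
    """Enumerate every valid (source, target) copy route."""
    return sorted(product(SOURCES, TARGETS))


if __name__ == "__main__":
    print(len(SOURCES), len(TARGETS), len(all_pairs()))  # 25 13 325
    print(is_supported("Oracle", "Amazon S3"))            # True
    print(is_supported("Amazon S3", "Salesforce"))        # False: Salesforce is source-only
```

The design choice here is to store lists rather than an explicit matrix: under the slide's "all targets accept all sources" rule, the matrix carries no extra information.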
IBM DSX Key Features
§ Self Service Data Science platform
§ Sparkling Data, Prescriptive Analytics, Shiny, Data Connections, Schedule Jobs
§ DSX: Notebooks, Data, Community, Data Shaping, RStudio, Projects, Scheduling
Legal Disclaimer
• © IBM Corporation 2014. All Rights Reserved.
• The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained
in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are
subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing
contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and
conditions of the applicable license agreement governing the use of IBM software.
• References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or
capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to
future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by
you will result in any specific sales, revenue growth or other results.
• If the text contains performance statistics or references to benchmarks, insert the following language; otherwise delete:
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will
experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage
configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
• If the text includes any customer examples, please confirm we have prior written approval from such customer and insert the following language; otherwise delete:
All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs
and performance characteristics may vary by customer.
• Please review text for proper trademark attribution of IBM products. At first use, each product name must be the full name and include appropriate trademark symbols (e.g., IBM
Lotus® Sametime® Unyte™). Subsequent references can drop “IBM” but should include the proper branding (e.g., Lotus Sametime Gateway, or WebSphere Application Server).
Please refer to http://www.ibm.com/legal/copytrade.shtml for guidance on which trademarks require the ® or ™ symbol. Do not use abbreviations for IBM product names in your
presentation. All product names must be used as adjectives rather than nouns. Please list all of the trademarks that you use in your presentation as follows; delete any not included in
your presentation. IBM, the IBM logo, Lotus, Lotus Notes, Notes, Domino, Quickr, Sametime, WebSphere, UC2, PartnerWorld and Lotusphere are trademarks of International
Business Machines Corporation in the United States, other countries, or both. Unyte is a trademark of WebDialogs, Inc., in the United States, other countries, or both.
• If you reference Adobe® in the text, please mark the first use and include the following; otherwise delete:
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other
countries.
• If you reference Java™ in the text, please mark the first use and include the following; otherwise delete:
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
• If you reference Microsoft® and/or Windows® in the text, please mark the first use and include the following, as applicable; otherwise delete:
Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both.
• If you reference Intel® and/or any of the following Intel products in the text, please mark the first use and include those that you use as follows; otherwise delete:
Intel, Intel Centrino, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States
and other countries.
• If you reference UNIX® in the text, please mark the first use and include the following; otherwise delete:
UNIX is a registered trademark of The Open Group in the United States and other countries.
• If you reference Linux® in your presentation, please mark the first use and include the following; otherwise delete:
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of
others.
• If the text/graphics include screenshots, no actual IBM employee names may be used (even your own), if your screenshots include fictitious company names (e.g., Renovations, Zeta
Bank, Acme) please update and insert the following; otherwise delete: All references to [insert fictitious company name] refer to a fictitious company and are used for illustration
purposes only.