Delivering Data Science
to the Business
Madhu Kochar
Vice President, Analytics Product
Development and Client Services
IBM
Operationalizing Machine Learning and getting actionable insights has been
a huge challenge
Organizations need to act fast
ACT NOW!
Operationalize Machine Learning
Data still lives in Silos
IBM Db2
Business Objective:
Drive top line growth and market share
Optimize Real-Time Marketing (RTM) and improve Return On
Investment (ROI)
Outdoor Equipment
Let’s meet Amy who works for
Outdoor Equipment Inc.
Amy
Marketing Director
Company:
Outdoor Equipment Inc. is a full-line sporting goods retailer
Amy wants to run a sales campaign
aimed at targeted customers to increase
the organization’s revenue
Sleeping Bags
Camping Chairs and Bedding
Ryan
Data Scientist
Amy needs to work with different teams who perform specific tasks
to execute the campaign
Amy
Marketing Director
Nick
Application Developer
Chris
Data Engineer
Product
details
Customer
details
Sales
campaign
Chris
Data Engineer
Operationalize
Machine
Learning
Ryan
Data Scientist
Nick
Application Developer
Federation
Application Integration
Spark Integration
With Big SQL, Amy’s team can self-serve their requirements, save
time on execution, and enhance productivity
Self Service
IBM Big SQL
Chris
Data Engineer
Big SQL Key Capabilities
SQL Compatibility
Federation and Spark: Relational Databases, NoSQL, Object Stores
Performance: Leads performance metrics on high volumes of data and concurrent streams
Enterprise and Security: Role and Column level Security, Ranger Integration
PROCESSING
DATA
STORAGE
ACCESS
Hortonworks
Power Systems
Elastic Storage Server
IBM Big SQL
3x Price-Performance Guaranteed
Get more performance with Power Systems
Find New Business Opportunities or Solve Business Problems using Big SQL
How do I get started?
Big SQL sandbox
Big SQL v5.0.1 NOW available on HDP v2.6.2
Try NOW!
1. Scaling Data Science on Big Data
Date: Wed, 9/20 @ 11:00 AM
Room: C2.3
2. Ingesting Data at Blazing Speed using Apache ORC
Date: Wed, 9/20 @ 4:20 PM
Room: C4.7
3. Open metadata and governance with Apache Atlas
Date: Wed, 9/20 @ 5:10 PM
Room: C4.6
4. Empowering YOU with Democratized Data Access, Data Science and Machine Learning
Date: Wed, 9/20 @ 6:00 PM
Room: C4.5
5. Breaching the 100TB mark with SQL over Hadoop
Date: Thurs, 9/21 @ 2:20 PM
Room: C2.3
6. Birds of a Feather: Apache Spark, Apache Zeppelin and Data Science
Date: Thurs, 9/21 @ 6:00 PM
Room: C4.5
Thank you!
Check out the Breakout sessions
Visit IBM Booth for More Information!
Find more #DWS17 sessions and
slides at:
www.DataWorksSummit.com
THANK YOU


Editor's Notes

  • #3: Organizations understand the importance of machine learning and are exploring ways to use it to improve their business. However, every line of business faces the challenge of finding the best way to operationalize machine learning. Data gravity creates silos in the organization, and bringing all this data together for analysis is a challenge. Even if the data can be brought together, using an ML model with that data requires a special set of skills and development effort. After operationalizing the machine learning model, businesses want to act on the discovered insights. These actions vary and demand integration and development effort. Businesses can’t be agile and act swiftly on data unless these problems are tied together and addressed with a self-service tool.
  • #4: Let’s meet Amy, who works for Outdoor Equipment Inc. Outdoor Equipment Inc. is a sporting goods retailer. Amy is the Marketing Director for this organization. As an executive, her business objectives are to grow the business and her organization’s market share. She plans to achieve these goals through real-time marketing and by improving ROI.
  • #5: Based on competitive analysis, market trends, and customer behavior, Amy’s team has concluded that a prospective customer may convert into a paying customer if given the proper incentive to shop. This key finding motivated Amy to come up with a sales campaign that sends product promotions to targeted users based on their interest in products. Amy is a well-informed executive and understands the power of data science. She has decided to leverage it to get maximum ROI. She wants to put the right incentives in the hands of the right customers to convert them. She has put together a plan to run a sales campaign for 3 months with a variety of products that are available in the store.
  • #6: Amy has to work with different teams that each perform a specific set of tasks in order to execute the marketing campaign. Chris is the data engineer who unifies the data that exists in different data platforms such as Hadoop, Db2, and other RDBMSs. Chris pulls all the data together into a single platform so that it can be used to operationalize the machine learning model and get predictions. Ryan is the data scientist who creates the ML model based on Amy’s requirements, so as to recommend the product category that a customer is likely to be interested in. Once Chris has applied the ML model created by Ryan, they have a result set of customers and their interests. Finally, Nick integrates the result with the mail gun app to send emails with product promotions to targeted customers. This repeats every day because the product promotions are refreshed daily and are extensive during seasonal sales.
  • #7: With Big SQL, Amy’s team can become self-sufficient in operationalizing the assets on a regular basis once Ryan and Nick have created them. Amy’s team can leverage Big SQL’s federation capabilities to connect to and query data stored in separate data sources in a secure way, as set up by Chris. Chris therefore no longer needs to ingest the data and bring it into a single location. With federation and predicate pushdown, only the data that matters travels over the wire. With Big SQL and Spark integration, Amy’s team can operationalize Spark ML models without knowing the details of how Spark works or which Spark APIs to call. Finally, Amy’s team can push the discounted sales promotions, which are refreshed every day, out to customers by leveraging Big SQL’s ability to call user-developed applications. In technical terms, the application is wrapped as a UDF that Big SQL can invoke. Let me show you in a demo how Eric, a marketing analyst who works for Amy, is able to operationalize this whole effort in just a couple of SQL statements. By using Big SQL, Amy’s team is more self-reliant in executing the marketing campaign because Big SQL ties all these separate tasks together in a single tool. Amy still works with Ryan and Nick, but only when she needs changes to the assets.
  • #8: After that exciting demo, I would like to summarize how Big SQL can help make your teams more productive and improve your business. Big SQL understands different SQL dialects, so you can leverage your existing Oracle and Netezza skills to build applications on Big SQL, or bring enterprise workloads to the Hadoop platform and run them as-is without any changes. Big SQL can access remote databases and perform query pushdown to these federated data sources. Big SQL integrates with Spark bi-directionally, in memory, to exchange data between Big SQL data sets and Spark DataFrames. This lets Big SQL call any Spark application and operationalize Spark ML models with enterprise data. Big SQL exhibits high performance even when data scales up to 100TB with complex SQL queries. It comes with a workload manager that lets the enterprise tune resource allocation and workloads. Big SQL also has a proven track record of supporting many concurrent users without degrading performance. Big SQL comes enterprise-ready with built-in security features and also integrates with Apache Ranger for centralized management of your Hadoop environment.
    Details:
    SQL COMPATIBILITY: SQL compatible with Netezza, Oracle, Db2, etc. Applications work as-is without any changes.
    FEDERATION AND SPARK: Federates to more than 10 data sources: RDBMS, NoSQL and/or Object Stores. Integrates bi-directionally with Spark, like no other. Operationalizes ML models.
    PERFORMANCE: Exhibits high performance even when data scales up to 100TB with complex SQL. Handles many concurrent users without sacrificing performance.
    ENTERPRISE & SECURITY: Secures data using SQL with roles. Integrates with Ranger for centralized management.
  • #11: We have some very exciting sessions lined up for you at this conference. Please attend these sessions to learn more. If you have questions about the demo or need more information, please visit us at the IBM booth in the expo hall.
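
Note #7 describes wrapping an application as a UDF so that scoring and sending promotions can be driven from a couple of SQL statements. The deck does not show the actual Big SQL statements, so the following is only a minimal sketch of the general pattern, using Python's built-in `sqlite3` module as a stand-in engine. The table layout, the `recommend_category` and `send_promotion` function names, and the toy scoring rule are all hypothetical; Big SQL's own UDF registration and Spark integration syntax are not represented here.

```python
import sqlite3

# Hypothetical stand-in for Nick's mail application: it only records
# what would have been sent instead of calling a real email service.
outbox = []


def recommend_category(last_purchase):
    """Toy stand-in for Ryan's ML model: map a purchase to a product category."""
    if last_purchase == "tent":
        return "Sleeping Bags"
    return "Camping Chairs and Bedding"


def send_promotion(email, category):
    """Pretend to send a promotion; returns 1 so SQL can count the sends."""
    outbox.append((email, category))
    return 1


def build_demo_db():
    """Create an in-memory table of made-up customers (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (email TEXT, last_purchase TEXT, opted_in INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [
            ("a@example.com", "tent", 1),
            ("b@example.com", "stove", 1),
            ("c@example.com", "tent", 0),  # not opted in, must be skipped
        ],
    )
    return conn


def run_campaign(conn):
    """Register both functions as SQL UDFs and drive the campaign from one query."""
    conn.create_function("recommend_category", 1, recommend_category)
    conn.create_function("send_promotion", 2, send_promotion)
    row = conn.execute(
        """
        SELECT COALESCE(SUM(send_promotion(email, recommend_category(last_purchase))), 0)
        FROM customers
        WHERE opted_in = 1
        """
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    sent = run_campaign(build_demo_db())
    print(f"promotions sent: {sent}")
    for email, category in outbox:
        print(email, "->", category)
```

The point of the sketch is the division of labor the notes describe: the model and the mail application are written once by specialists and exposed as functions, after which an analyst only needs SQL to filter customers, score them, and trigger the send.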