Future of
Data Science
Rethinking business, technology and data.
Carlo Appugliese
Data Science Evangelist
The digital age has changed the way we
Live, Play, Learn and Work…
Companies must shift to a Data-Driven Business
72% are vulnerable to disruption within three years
Transformation is Critical…
274,000 estimated worldwide startups each day
Why we’re all vulnerable
to seismic shifts
External Threats
Born-on-digital companies that steal market
share or rewrite customer expectations
New business models that reinvent our industry
and change the game altogether
Internal Threats
Siloed data and systems
Gaps in expertise and skills
Inability to react quickly
Value
Uses of Data
Efficiency Modernization Data Decision Monetization
Unleashing your data and making the shift to a
Data-Driven Organization
Operations Reporting &
Data
Warehousing
Self-Service
Analytics
New
Business
Models
Data Science
Data science is a "concept to unify statistics, data analysis
and their related methods" in order to "understand and
analyze actual phenomena" with data.
What is Data Science?
Math &
Stats
Computer
Science
Domain
Expertise
Scripting, SQL
Python, R, Scala
Data Pipelines
Big Data/
Apache Spark
Mathematics
Computational
Domain Knowledge
Supply Chain
CRM
Financials
Networking
What makes a Data Scientist?
Unicorn
Data Science Projects Require multiple Skills
Math &
Stats
Computer
Science
Domain
Expertise
Data Science Projects Require multiple Skills
What makes a Data Scientist?
Unicorn
Machine
Learning
Research
Engineering
Scripting, SQL
Python, R, Scala
Data Pipelines
Big Data/
Apache Spark
Mathematics
Computational
Domain Knowledge
Supply Chain
CRM
Financials
Networking
Data Science is a Team Sport.
Business
Analysts
Data
Scientists
Application
Developers
Data
Engineers
Clearly Articulate
Use Case
Gather all the Data
Apply
Machine
Learning
Prepare Data
Digital
Application
Evaluate
Steps to put Data Science to work…
Data Predictions
& Insight
“Computers that learn without being explicitly programmed”
“Using algorithms to understand patterns in data”
Algorithms
Machine Learning… What is it?
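To make the "learn without being explicitly programmed" idea concrete, here is a minimal sketch in which the decision rule (a numeric cut-off) is derived from labeled examples instead of being typed in by a developer. The function names and the toy data are illustrative assumptions, not part of the original deck.

```python
# Toy illustration: the rule (a threshold) is learned from labeled data,
# not written by hand. All data values are made up for illustration.

def learn_threshold(samples):
    """Pick the cut-off that misclassifies the fewest training samples.

    samples: list of (value, label) pairs, where label is True or False.
    The learned rule predicts True when value >= threshold.
    """
    candidates = sorted(value for value, _ in samples)
    best_threshold, best_errors = None, len(samples) + 1
    for threshold in candidates:
        errors = sum((value >= threshold) != label for value, label in samples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


# Hypothetical training data: small values are labeled False, large ones True.
training = [(1.0, False), (2.0, False), (3.0, False),
            (6.0, True), (7.0, True), (9.0, True)]
threshold = learn_threshold(training)


def predict(value):
    """Apply the learned rule to a new, unseen value."""
    return value >= threshold
```

Changing the training data changes the rule without touching the code, which is the pattern the two quotes describe.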
Machine Learning - Process
Data
Ingestion
Data Cleaning
and
Transformation
Model
Training
Testing and
Validation
Deployment
Model Selection
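The process stages named on this slide can be sketched end to end in a few lines of plain Python. This is only an illustration under stated assumptions: the data is synthetic (floor area versus energy use), the two candidate models (a mean baseline and a one-feature least-squares line) are stand-ins, and every function name is hypothetical.

```python
import random


def ingest(n=200, seed=0):
    """Data ingestion: synthesize (floor_area, energy_use) rows, some incomplete."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        area = rng.uniform(50.0, 500.0)
        energy = 3.0 * area + 40.0 + rng.gauss(0.0, 10.0)
        # Simulate dirty data: roughly 5% of rows lose their target value.
        rows.append((area, None if rng.random() < 0.05 else energy))
    return rows


def clean(rows):
    """Data cleaning and transformation: drop rows with a missing target."""
    return [(x, y) for x, y in rows if y is not None]


def train_linear(rows):
    """Model training: ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_mean(rows):
    """Baseline candidate: always predict the mean target (slope of zero)."""
    return 0.0, sum(y for _, y in rows) / len(rows)


def mean_abs_error(model, rows):
    """Testing and validation: mean absolute error of a (slope, intercept) model."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in rows) / len(rows)


def deploy(model):
    """Deployment: wrap the fitted parameters in a callable."""
    slope, intercept = model
    return lambda area: slope * area + intercept


rows = clean(ingest())
n_train = int(0.6 * len(rows))
n_valid = int(0.2 * len(rows))
train_rows = rows[:n_train]
valid_rows = rows[n_train:n_train + n_valid]
test_rows = rows[n_train + n_valid:]

# Model selection: keep the candidate with the lowest validation error.
candidates = {"mean": train_mean(train_rows), "linear": train_linear(train_rows)}
best_name = min(candidates, key=lambda name: mean_abs_error(candidates[name], valid_rows))
best_model = candidates[best_name]

# Final check on data that neither training nor selection has seen.
test_error = mean_abs_error(best_model, test_rows)
predict_energy = deploy(best_model)
```

Keeping a separate test split means the reported error is not flattered by the selection step, which only looks at the validation rows.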
History of Democratizing Data Science
Innovation timeline (2017 marked):
Calculator: 1700s Mechanical, 1960s Digital
Spreadsheet: 1960s IBM, 1980s Desktop
SQL: 1970s IBM, 1990s OO
Machine Learning: 1980s IBM, 2010s Open Source, 2020s AI
Math &
Stats
Computer
Science
Domain
Expertise
Scripting, SQL
Python, R, Scala
Data Pipelines
Big Data/
Apache Spark
Mathematics
Computational
Domain Knowledge
Supply Chain
CRM
Financials
Networking
Future of Data Science is in Democratizing Machine Learning and AI in the Cloud
Future - Democratizing Machine Learning & AI…
Unicorn
Machine
Learning
Research
Engineering
Example of Machine Learning
Building Model to Predict Energy Consumption of Buildings.
Example of Machine Learning in Action
Chat Bot to estimate energy cost from an image of building.
Great, thanks for that
picture! Looks like
your building is
made of stone and
has large windows
I estimate your
building has a high
energy usage
intensity (EUI), with a
97.01% probability
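One way a bot could turn image-derived observations into a reply like the one above is a logistic score over detected features. The sketch below assumes the image has already been reduced to feature tags; the tag names, weights, and bias are hypothetical placeholders, not values from the model described in the deck.

```python
import math

# Hypothetical weights; a real model would learn these from labeled buildings.
WEIGHTS = {"stone_facade": 1.2, "large_windows": 1.5}
BIAS = -1.0


def high_eui_probability(features):
    """Logistic model: probability that the building has a high EUI."""
    score = BIAS + sum(WEIGHTS[name] for name in features if name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))


def reply(features):
    """Format the estimate in the style of the chat bot message."""
    p = high_eui_probability(features)
    level = "high" if p >= 0.5 else "low"
    confidence = p if p >= 0.5 else 1.0 - p
    return (f"I estimate your building has a {level} energy usage "
            f"intensity (EUI), with a {confidence:.2%} probability")


message = reply({"stone_facade", "large_windows"})
```

The percentage printed depends entirely on the placeholder weights, so it will not match the figure shown on the slide.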
Data Science technology trends..
SPSS SAS
Python R Scala
Trends in Google Searches (September 2nd 2016)
Data Science is Driving the Database to Big Data Evolution..
Databases
Big Data
Source: Google Trends
Hadoop
Spark
Open R ->
Big Data ->
Python ->
The Convergence of Big Data & Data Science
Launch Spark Technology
Cluster
www.Spark.tc
Contribute to
Community
Infuse
Portfolio
Integrate Apache Spark
throughout IBM’s
portfolio
Used by Watson
Foster
Community
Educate and grow data
scientist community
www.BigDataUniversity.com
"It's like Spark just got blessed
by the enterprise rabbi."
Ben Horowitz
IBM is all-in on Spark
IBM Contributions – Driving Data Science at Scale…
38,500 Spark LOC
863 Spark JIRAs
253 SystemML JIRAs
422 Commits in Spark 2.0
[Chart: Contribution Progress, 2015 to 2016]
Top 3
Driving Data Science
• ML
• PySpark
• SQL
§ Spark Machine Learning (ML) provides a toolset for building pipelines
of ML-related transformations on your data
§ IBM is the #1 contributor to Spark ML
IBM impact on SparkML / MLlib 2.0
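The pipeline idea mentioned in the bullets above (a chain of stages, each fitted on and then applied to the output of the previous one) can be sketched in plain Python. This is a conceptual illustration only: the classes below are hypothetical and are not Spark's API, although Spark ML pipelines follow a similar fit-then-transform pattern.

```python
class Lowercase:
    """Stateless stage: normalizes case."""

    def fit(self, rows):
        return self

    def transform(self, rows):
        return [text.lower() for text in rows]


class Tokenize:
    """Stateless stage: splits each string on whitespace."""

    def fit(self, rows):
        return self

    def transform(self, rows):
        return [text.split() for text in rows]


class VocabularyCount:
    """Stateful stage: learns a vocabulary, then emits word-count vectors."""

    def fit(self, rows):
        self.vocabulary = sorted({token for tokens in rows for token in tokens})
        return self

    def transform(self, rows):
        return [[tokens.count(word) for word in self.vocabulary] for tokens in rows]


class Pipeline:
    """Runs stages in order; each stage sees the previous stage's output."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        for stage in self.stages:
            rows = stage.fit(rows).transform(rows)
        return self

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


pipeline = Pipeline([Lowercase(), Tokenize(), VocabularyCount()])
pipeline.fit(["Spark ML", "spark sql"])
vectors = pipeline.transform(["Spark Spark SQL"])
```

Bundling the stages means the same sequence of transformations is replayed on new data exactly as it was applied during fitting.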
[Chart: Top 10 Contributing Companies to Spark ML/MLlib 2.0.0]
[Chart: Contributions to Spark ML 2.0.0; labels shown: 34%, Hortonworks, 16%, Databricks, 13%, Intel, 9%]
IBM Data Science
Experience is an
environment that brings
together everything that a
Data Scientist needs to be
more productive, including
tools, data and content
Be a better data scientist
Introducing IBM Data Science Experience
Built-in learning to
get started or go
the distance with
advanced
tutorials
Learn
The best of open source
and IBM value-add to
create state-of-the-art
data products
Create
Community and
social features that
provide meaningful
collaboration
Collaborate
http://datascience.ibm.com
IBM Data Science Experience
• Find tutorials and datasets
• Connect with Data Scientists
• Ask questions
• Read articles and papers
• Fork and share projects
• Watson Machine Learning
• SPSS Modeler Canvas
• Advanced Visualizations
• Projects and Version Control
• Managed Spark Service
• Code in Scala/Python/R
• Jupyter Notebooks
• RStudio IDE and Shiny
• Apache Spark
• Your favorite libraries
Open source is a powerful engine, but as with any engine, it needs
the full system to accomplish any work
We provide:
• Hosting – Tools are ready to go, no install necessary
• Security – SSO and code hardening to reduce security gaps
• Version Currency – We keep up to date as open source quickly iterates
• Data Connectivity – Connect to data sources
• Scalability – Makes tools designed for desktops scalable to enterprise workloads
Notebooks are browser-based interactive and collaborative development
environments for data science
Notebooks are
interactive
computational
environments, in
which you can
combine code
execution, rich text,
mathematics, plots
and rich media.
Projects are shared, collaborative workspaces that gather all assets &
content in a single area
Internal and external
collaborators can be added,
with relevant roles /
permissions set by project
owner
Any type of analytical asset can
be part of a project, clicking on
asset opens it in the right tool
and in project context
Each project provides its own
separate storage space,
available to collaborators only
People
Artifacts
Data
Project
Divide by function: Similar to a surgical team, notebooks enable
work to be partitioned functionally, by skill level
Surgeon:
Executes all other
pre and post work
Attending
Surgeon:
Executes most
delicate procedures
requiring greatest
skill
Resident:
Preps the patient
and assists
Data Scientist:
Exploratory
analysis, feature
selection,
deployment
Sr. Data Scientist:
Builds advanced
models, reviews
earlier work
Business Analyst:
Articulates
problem, finds and
prepares data
© 2016 IBM Corporation
Watson Machine Learning capabilities overview
Predictive
Power
100%
Capacity
Model Builder
(CADS)
1. Build model
2. Deploy model
3. Refresh model
Import Sources:
§ DSx Notebooks
§ DSx Flow UI
§ External tools
Auto-generate model
from input data,
testing various
algorithms for best
fit (e.g. CADS)
Detect loss of
predictive power and
refresh model,
subject to
preferences
Deploy model
into production -
scale, manage
and monitor
Model Automation Model Deployment
Model
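The "detect loss of predictive power and refresh model" step can be sketched as a rolling accuracy check against a baseline. This is a minimal illustration under assumptions: the class name, the window size, and the tolerance parameter (standing in for the "preferences" mentioned above) are hypothetical and do not describe how Watson Machine Learning implements it.

```python
from collections import deque


class ModelMonitor:
    """Tracks recent prediction outcomes and flags when accuracy degrades."""

    def __init__(self, baseline_accuracy, tolerance=0.05, window=50):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        # Only the most recent `window` outcomes are kept.
        self.outcomes = deque(maxlen=window)

    def record(self, prediction, actual):
        """Store whether a deployed prediction turned out to be correct."""
        self.outcomes.append(prediction == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_refresh(self):
        """True once a full window falls below baseline minus tolerance."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.baseline - self.tolerance


monitor = ModelMonitor(baseline_accuracy=0.9, tolerance=0.05, window=10)
for _ in range(10):
    monitor.record("high", "high")   # ten correct predictions
healthy = monitor.needs_refresh()
for _ in range(5):
    monitor.record("high", "low")    # five recent misses push out old hits
degraded = monitor.needs_refresh()
```

Waiting for a full window before deciding avoids triggering a retrain on the first few unlucky predictions after deployment.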
The full range of Watson Cognitive services will be accessible
within DSx
Alchemy Language
Conversation
Dialog
Document Conversion
Language Translator
Natural Language Classifier
Natural Language Understanding
Personality Insights
Retrieve and Rank
Tone Analyzer
Speech to Text
Visual Recognition
Text to Speech
AlchemyData News
Discovery
Discovery News
Tradeoff Analytics
Speech
Vision
Data Insights
Language
We’ve been recognized for our vision
Source: https://www.gartner.com/doc/reprints?id=1-3TKD8OH&ct=170215&st=sb
http://www.developerweek.com/awards/2017-devies-award-winners/
Gartner Magic Quadrant 2017
Data Science Platforms
DeveloperWeek 2017
Devie
Forrester Wave 2017
Predictive Analytics & Machine Learning
IBM Data Science Experience
https://www.youtube.com/watch?v=HPzXlFp4rKE
Demo
Get Started with Data Science Experience Today!
§ DSx is available for personal use for free, with enough
power to learn data science and try most examples
§ Follow the example outlined in a blog post, with link to the
full GitHub repo and step by step instructions (see
README in directory)
§ http://datascience.ibm.com/blog/modeling-energy-usage-in-new-york-city/
§ Additional tutorials and reference materials within the
community section of DSx
§ Find your own use case and try it, or find other relevant
examples within DSx
Sign up
Learn
Try it!
Cloud: datascience.ibm.com
Desktop: datascience.ibm.com/desktop
Local: datascience.ibm.com/local
© IBM Corporation 2017
IBM, the IBM logo, ibm.com, and Watson are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on
their first occurrence in this information with the appropriate symbol (® or ™), these symbols indicate U.S. registered or
common law trademarks owned by IBM at the time this information was published. Such trademarks may also be
registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at
“Copyright and trademark information.”
• Other company, product, and service names may be trademarks or service marks of others.
• References in this publication to IBM products or services do not imply that IBM intends to make them available in all
countries in which IBM operates.
Trademarks and notes
The Future of Data Science
