MACHINE LEARNING IN THE ENTERPRISE
Timothy Spann | Senior Solutions Engineer
@PaasDev
© Cloudera, Inc. All rights reserved.
DISCLAIMER
Introduction
Tim Spann has been running meetups in Princeton on Big Data technologies since 2015.
Tim has spoken at several international conferences on Apache NiFi.
https://community.hortonworks.com/users/9304/tspann.html
https://dzone.com/users/297029/bunkertor.html
https://www.meetup.com/futureofdata-princeton/
https://dzone.com/articles/integrating-keras-tensorflow-yolov3-into-apache-ni
Machine Learning in the Enterprise 2019
Hadoop {Submarine} Project: Running deep learning workloads on YARN,
Tim Spann (Cloudera)
IOT EDGE PROCESSING WITH MINIFI AND MULTIPLE DEEP LEARNING LIBRARIES
The Industry’s First Enterprise Data Cloud
From the Edge to AI
WHY CLOUDERA?
One stop shop for analytics
Unified open architecture
Hybrid and multi-cloud
INGEST & STREAMING
DATA SCIENCE
DATA WAREHOUSE
OPERATIONAL DATABASE
DATA ENGINEERING
CLOUDERA DATA FLOW (CDF)
MACHINE LEARNING PHASES
Where to Connect to Apache NiFi
HANDS ON
CDSW + NiFi
https://community.hortonworks.com/articles/239961/using-cloudera-data-science-workbench-with-apache.html
CLOUDERA DATA SCIENCE
MACHINE LEARNING IS A GROWTH ENGINE
PROTECT
business
CONNECT
products & services (IoT)
DRIVE
customer insights
It’s enabling entirely new businesses, not just modernizing existing systems.
Machine learning refers to algorithms and methods to extract useful patterns from data.
When we say machine learning, we mean broad, transformational data capabilities.
MOVING FROM EXPLORATION TO PRODUCTION OF ML & AI
WE’RE WITNESSING THE INDUSTRIALIZATION OF AI
FROM THE LAB… TO THE FACTORY
ENTERPRISE-GRADE AI OPERATIONS
WHETHER YOU ARE A FORTUNE 100 OR A STARTUP
SECURITY, GOVERNANCE, COMPLIANCE
STRATEGY
PEOPLE & ORGANIZATION
TECHNOLOGY
AI
MACHINE
LEARNING
DATA SCIENCE
ANALYTICS
"BIG DATA"
CLOUD
MACHINE LEARNING AT CLOUDERA
Our philosophy
OUR APPROACH
Modern enterprise platform, tools and expert guidance to help you unlock business value with ML/AI
• Agile platform to build, train, and deploy many scalable ML applications
• Enterprise data science tools to accelerate team productivity
• Expert guidance, services & training to fast track value & scale
PLATFORM
AND ONE MORE THING….
MACHINE LEARNING IS BUILT ON DATA MANAGEMENT
Integrated data, workflows, metadata, security, governance, ...
ANALYTIC DATABASE | DATA SCIENCE | EXTENSIBLE SERVICES | OPERATIONAL DATABASE | DATA ENGINEERING
Core Services: SECURITY | GOVERNANCE | WORKLOAD MANAGEMENT | INGEST & REPLICATION | DATA CATALOG
Storage Services: Amazon S3 | Microsoft ADLS | HDFS | KUDU
CLOUDERA ENTERPRISE DATA PLATFORM
The modern platform for machine learning & analytics optimized for the cloud
WORKLOADS: DATA ENGINEERING | DATA SCIENCE | DATA WAREHOUSE | OPERATIONAL DATABASE | 3RD PARTY SERVICES
DATA CATALOG | GOVERNANCE | SECURITY | LIFECYCLE MANAGEMENT
STORAGE: Microsoft ADLS | HDFS | Amazon S3 | KUDU
COMMON SERVICES
CONTROL PLANE
CLOUDERA DATA SCIENCE WORKBENCH
ACCELERATING THREE STAGES OF MACHINE LEARNING
Enterprise AI platform supporting model development, training, and deployment
DEVELOP
• Explore data
• Develop models
• Share results
TRAIN
• Optimize parameters
• Track experiments
• Compare performance
DEPLOY
• Manage models
• Deploy models
• Monitor performance
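As a rough illustration of the TRAIN stage (optimize parameters, track experiments, compare performance), the sketch below sweeps one hyperparameter and ranks the runs by validation error. It is not Cloudera code: the toy data, the `fit_slope`, `mse`, and `sweep` helpers, and the `l2` penalty values are all assumptions made up for this example, and only the Python standard library is used.

```python
from statistics import mean

# Toy data, invented for illustration: y is roughly 2 * x.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
valid = [(5.0, 10.1), (6.0, 11.8)]


def fit_slope(points, l2):
    """Closed-form ridge estimate of w in y = w * x (no intercept)."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / (sxx + l2)


def mse(points, w):
    """Mean squared error of the line y = w * x on the given points."""
    return mean((y - w * x) ** 2 for x, y in points)


def sweep(l2_values):
    """Train once per hyperparameter value and record the validation error."""
    runs = []
    for l2 in l2_values:
        w = fit_slope(train, l2)
        runs.append({"l2": l2, "slope": w, "valid_mse": mse(valid, w)})
    # Rank the runs so the lowest validation error comes first.
    return sorted(runs, key=lambda r: r["valid_mse"])


runs = sweep([0.0, 0.1, 1.0, 10.0])
best = runs[0]
```

Each entry in `runs` plays the role of one tracked experiment, and `best` is the run a team would carry forward to deployment.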
A PLATFORM FOR
MACHINE LEARNING
• Open platform 
• Complete lifecycle 
• Team collaboration
• Enterprise ready 
• Runs anywhere
DEPLOYMENT: RESEARCH | PRODUCTION
COMPUTE: LOCAL | SPARK | IMPALA/HIVE
ALGORITHMS: OPEN SOURCE ECOSYSTEM
TOOLS: SELF-SERVICE
APPS: SOLUTIONS | USE CASES
CLOUD | ON-PREMISES
ADLS | S3 | HDFS | KUDU
SHARED CONTEXT: CATALOG | SECURITY | GOVERNANCE
THE CHALLENGE
Balance these needs
DATA SCIENCE
• Access to granular data
• Flexibility
  • Preferred open source tools
• Elastic provisioning
  • Compute
  • Storage
• Reproducible research
• Path to production
DATA MANAGEMENT
• Security
• Governance
• Standards
• Low maintenance
• Low cost
• Self-service access
THE TYPICAL SOLUTION
“If I can’t use my favorite tools, I’ll…”
• Copy data to my laptop
• Copy data to a data science appliance
• Copy data to a cloud service
Why this is a problem:
• Complicates security
• Breaks data governance
• Adds latency to process
• Makes collaboration more difficult
• Complicates model management and deployment
• Creates infrastructure silos
CLOUDERA DATA SCIENCE WORKBENCH
Accelerate Machine Learning from Research to Production
CDSW ARCHITECTURE
Extends traditional clusters with new ML capabilities
• Built with Docker and Kubernetes
• Isolated, reproducible user environments
• Supports both big and small data
• Local Python, R, Scala runtimes
• Schedule & share GPU resources
• Scale to CDH/HDP with Spark, Impala, Hive
• Secure and governed by default
• Easy, audited access to Kerberized clusters
• Leverages shared platform services
• Deployed with Cloudera Manager or package install (Ambari)
Diagram labels: Cloudera Manager/Ambari; gateway node(s) running CDSW (Master, Engines); CDH/HDP nodes running Hive/Impala, HDFS, ...
Tristan
ACCELERATED DEEP LEARNING WITH GPUS
Multi-tenant GPU support on-premises or cloud
• Extend CDSW to deep learning
• Schedule & share GPU resources
• Train on GPUs, deploy on CPUs
• Works on-premises or cloud
Diagram labels: CDSW (GPU | CPU): single-node training; CDH/HDP (CPU): distributed training, scoring
“Our data scientists want GPUs, but we need multi-tenancy. If they go to the cloud on their own, it’s expensive and we lose governance.”
A MODERN DATA SCIENCE ARCHITECTURE
Containerized environments with scalable, on-demand compute
• Built with Docker and Kubernetes
• Isolated, reproducible user environments
• Supports both big and small data
• Local Python, R, Scala runtimes
• Schedule & share GPU resources
• Run Spark, Impala, and other CDH services
• Secure and governed by default
• Easy, audited access to Kerberized clusters
• Leverages SDX platform services
• Deployed with Cloudera Manager
Diagram labels: Cloudera Manager; gateway node(s) running CDSW (Master, Engines); CDH nodes running Hive, HDFS, ...
ACCELERATED DEEP LEARNING WITH GPUS
Multi-tenant GPU support on-premises or cloud
• Extend CDSW to deep learning
• Schedule & share GPU resources
• Train on GPUs, deploy on CPUs
• Works on-premises or cloud
Diagram labels: CDSW (GPU | CPU): single-node training; CDH (CPU): distributed training, scoring
“Our data scientists want GPUs, but we need multi-tenancy. If they go to the cloud on their own, it’s expensive and we lose governance.”
GPU On CDH coming in C6
Confidential-Restricted – For Discussion Purposes Only
CDSW on HDP Architecture
Diagram labels: Browser; Ambari; HDP Nodes (HDFS, Hive, HBase, Spark, Phoenix…)
Cloudera Data Science Workbench Nodes: one HDP Edge Node as CDSW Master Node; two HDP Edge Nodes as CDSW Worker Nodes
CDSW 1.5.0 Support Matrix
● CDH 5
● CDH 6
● HDP 2.6.5
● HDP 3.1.0
THREE THINGS TO REMEMBER
• Any tool or library
• Built for teams
• End-to-end self-service
INTRODUCING CLOUDERA MACHINE LEARNING
Cloud-native enterprise machine learning platform
DATA SCIENCE | DATA ENGINEERING | MODEL OPERATIONS
Interactive Development | Batch Pipelines | Predictive APIs
CLOUDERA ML RUNTIME: Python/R, Spark, TensorFlow, CPU/GPU-Optimized
KUBERNETES: EKS, AKS, GKE, OpenShift
DATA CATALOG | GOVERNANCE | SECURITY | LIFECYCLE | WORKLOAD XM
STORAGE: Amazon S3 | Microsoft ADLS | HDFS | KUDU
• Full capability of CDSW
• Rapid cloud provisioning and elastic autoscaling
• Unified data engineering and ML with seamless dependency management
• Multi-cloud portability powered by Kubernetes
• Connects to HDFS or cloud object storage and shared metadata
• Accelerated deep learning with distributed GPU training
* Initially targeted for cloud managed K8s services, then OpenShift
WHAT DATA SCIENCE TEAMS DO
PREPARE DATA
• Ingest data at scale.
• Store and secure data.
• Clean and transform data for analysis.
BUILD MODELS
• Explore data and build predictive models, offline.
• Evaluate and tune models.
• Develop and deliver a modeling pipeline.
DEPLOY MODELS
• Test, verify, and approve model for deployment.
• Create and maintain batch/stream pipelines, embedded models, APIs.
• Update models in production.
NEW: CLOUDERA DATA SCIENCE WORKBENCH 1.5
Accelerate and simplify machine learning from research to production
ANALYZE DATA | TRAIN MODELS | DEPLOY APIs | MANAGE SHARED RESOURCES
NEW! NEW!
INTRODUCING EXPERIMENTS
Versioned model training runs for evaluation and reproducibility
Data scientists can now...
• Create a snapshot of model code, dependencies, and configuration necessary to train the model
• Build and execute the training run in an isolated container
• Track specified model metrics, performance, and model artifacts
• Inspect, compare, or deploy prior models
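The idea behind a versioned training run can be sketched in a few lines of plain Python. This is a generic stand-in, not the CDSW Experiments API: the `Run` class, its `snapshot_id` and `track_metric` members, and the `best_run` helper are hypothetical names invented here. Dependencies and container isolation are left out to keep the sketch short.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Run:
    """One training run: what was trained (code, config) and how it scored."""
    code: str
    config: dict
    metrics: dict = field(default_factory=dict)

    @property
    def snapshot_id(self) -> str:
        # Hash code and config together so identical inputs share one ID.
        payload = json.dumps({"code": self.code, "config": self.config}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]

    def track_metric(self, name: str, value: float) -> None:
        # Record a named metric for later comparison across runs.
        self.metrics[name] = value


def best_run(runs, metric):
    """Return the run with the highest value for the given metric."""
    return max(runs, key=lambda r: r.metrics[metric])


a = Run(code="train.py v1", config={"depth": 3})
a.track_metric("accuracy", 0.81)
b = Run(code="train.py v1", config={"depth": 5})
b.track_metric("accuracy", 0.86)
winner = best_run([a, b], "accuracy")
```

Because the ID is derived from the inputs alone, rerunning the same code and configuration yields the same snapshot ID, which is the property that makes prior runs comparable and reproducible.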
INTRODUCING MODELS
Machine learning models as one-click microservices (REST APIs)
score.py
forecast
import pickle

# Load the pickled model once, when the service starts
with open('model.pk', 'rb') as f:
    model = pickle.load(f)

def forecast(data):
    return model.predict(data)
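To show where a file like model.pk might come from, here is a minimal end-to-end sketch of the train side and the serve side. Several things are assumptions for illustration only: the model is reduced to a plain dictionary of made-up coefficients rather than an object with a `predict` method, the file lives in a temporary directory, and the `{"x": [...]}` request shape is invented rather than taken from the product.

```python
import os
import pickle
import tempfile

# Training side: persist the fitted parameters (made-up values).
params = {"slope": 2.0, "intercept": 1.0}
model_path = os.path.join(tempfile.mkdtemp(), "model.pk")
with open(model_path, "wb") as f:
    pickle.dump(params, f)

# Serving side: load once at start-up, then score each request.
with open(model_path, "rb") as f:
    model = pickle.load(f)


def forecast(data):
    """Score a JSON-style request such as {"x": [1, 2, 3]}."""
    return {"prediction": [model["slope"] * x + model["intercept"] for x in data["x"]]}


result = forecast({"x": [0, 1, 2.5]})
```

Loading the file at module level, outside `forecast`, means the deserialization cost is paid once rather than on every request. Note that `pickle.load` should only be used on files from a trusted source.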
MODEL MANAGEMENT
View, test, monitor, and update models by team or project
CLOUDERA FAST FORWARD LABS
Expert guidance to accelerate value and scale
ADVISING & RESEARCH | ML APPLICATION DEVELOPMENT | ML STRATEGY ENGAGEMENT
ML application strategy prescription | ML expert advising | research reports and prototypes
AS NEW TECH CAPABILITIES EMERGE, BE READY
THANK YOU

