© Cloudera, Inc. All rights reserved.
Cloudera Altus: Big Data in the Cloud Made Easy
Michael Kohs | Sales Engineer | mkohs@cloudera.com
Shift to cloud: an analyst view
Source: 451 Research, Voice of the Enterprise: Workloads and Key Projects, Cloud Transformation, 2017.
Key Points
▪ Cloud deployments will be the dominant environment in every category
▪ Every cloud deployment environment will see increases in every workload category
▪ Analytics and App Development areas are expected to see strong gains
My organization is moving to the cloud. Why should we consider Cloudera?
Traditional on-premises deployments perform reasonably well
Strong multi-function support
Strong shared data experience
Strong operational model
Moderate cost management
Moderate tenant isolation
Moderate workload elasticity
Weak on self service
Weak on speed of deployment
Shared Data Experience (Metadata, Security, Governance)
One physical cluster provides a shared data experience to multiple workloads and tenants
But not good enough for tomorrow
Traditional cloud deployments are strong where on-premises is weak, but at the expense of creating workload silos
Moderate multi-function support
Weak on shared data experience
Weak operational model
Moderate cost management
Strong on tenant isolation
Strong on workload elasticity
Strong on self service
Strong on speed of deployment
Shared Data (S3, ADLS)
Cloud
This is the experience of cloud house offerings
Only Cloud deployments with SDX optimize for all design goals
Shared Data Experience (Metadata, Security, Governance)
One logical cluster provides a shared data experience to multiple workloads and tenants
SDX makes it possible to transfer on-premises design wins to cloud
Shared storage (S3, ADLS)
Cloud
Strong multi-function support
Strong shared data experience
Strong operational model
Strong on cost management
Strong on tenant isolation
Strong on workload elasticity
Strong on self service
Strong on speed of deployment
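The three scorecards on the preceding slides can be compared side by side. The following is a minimal Python sketch that only transcribes the ratings from the slides; the names `GOALS`, `SCORECARDS`, and `not_strong` are illustrative assumptions and not part of any Cloudera product.

```python
# Design goals, in the order the slides list them.
GOALS = [
    "multi-function support",
    "shared data experience",
    "operational model",
    "cost management",
    "tenant isolation",
    "workload elasticity",
    "self service",
    "speed of deployment",
]

# Ratings transcribed from the three scorecard slides.
SCORECARDS = {
    "on-premises": ["Strong", "Strong", "Strong", "Moderate",
                    "Moderate", "Moderate", "Weak", "Weak"],
    "traditional cloud": ["Moderate", "Weak", "Weak", "Moderate",
                          "Strong", "Strong", "Strong", "Strong"],
    "cloud with SDX": ["Strong"] * 8,
}


def not_strong(deployment):
    """Return the design goals a deployment model does not rate 'Strong' on."""
    ratings = SCORECARDS[deployment]
    return [goal for goal, rating in zip(GOALS, ratings) if rating != "Strong"]


if __name__ == "__main__":
    for name in SCORECARDS:
        gaps = not_strong(name)
        print(f"{name}: {', '.join(gaps) if gaps else 'no gaps'}")
```

Listing the gaps this way makes the slides' argument explicit: the on-premises and traditional cloud models fall short on complementary sets of goals, with cost management rated Moderate in both.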
Cloudera’s Public Cloud Reference Architecture
Compute: Spark, Hive, Hive on Spark, Impala, Solr, …
Each workload runs in an isolated Workload Cluster
SDX: Navigator Metastore, Hive Metastore, Sentry Metastore
SDX services run on shared RDS | MySQL
Management: Cloudera Manager, Cloudera Navigator, Cloudera Director
Each management service supports all Workload and SDX services
Storage: S3, ADLS
Altus
The modern platform for machine learning and analytics optimized for the cloud
Cloudera Enterprise
DATA CATALOG, SECURITY, GOVERNANCE, WORKLOAD MANAGEMENT, INGEST & REPLICATION
EXTENSIBLE SERVICES
CORE SERVICES: DATA ENGINEERING, OPERATIONAL DATABASE, ANALYTIC DATABASE, DATA SCIENCE
STORAGE SERVICES: S3, ADLS, HDFS, KUDU
DEPLOYMENT OPTIONS: PRIVATE CLOUD, BARE METAL, INFRASTRUCTURE SERVICES
Cloudera Director for cluster lifecycle management
Easy
• Single pane of glass for all cloud infrastructure
• Create templates to run applications in a pre-optimized manner
Flexible
• Multi-cloud: AWS, Azure, GCP
• Hourly pricing with auto billing & metering
• Spot instance/block support
Enterprise-grade
• Integration across Cloudera Enterprise
• Management of CDH deployments at scale
• Deeply integrated with Cloudera Manager
Cloudera Altus
Multi-cloud foundation for building new cloud services
Altus PaaS Foundation
DATA ENGINEERING
ANALYTIC DATABASE (beta)
Altus Platform Services
Director vs. Altus: when to use which?
Feature                                             | Altus            | Director
Automated log saving                                | x                |
Automated cluster spin up / down (no extra coding)  | x                |
Data Engineering: Hive, Spark, HoS, MR              | x                | x
Production Job Driven                               | x                |
Workload Analytics                                  | x                |
Cluster Duration                                    | Purely Transient | Transient OR Persistent
Job Development / Exploration                       |                  | x
3rd Party Installations                             |                  | x
Full Control of CM                                  |                  | x
Analytical / Operational: Impala, HBase, Search     |                  | x
Persistent (or Transient)                           |                  | x
Grow / Shrink Cluster                               |                  | x
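The comparison above can be encoded as a small lookup that answers "which tool covers all of my requirements?". This is a sketch only. It assumes that the single-"x" rows listed before "Cluster Duration" belong to Altus and those after it to Director, which the slide text suggests but does not state outright. The names `CAPABILITIES` and `candidates` are hypothetical.

```python
# Capability -> set of tools offering it, per the comparison slide.
# Assumption: single-"x" rows before "Cluster Duration" are Altus features,
# single-"x" rows after it are Director features.
CAPABILITIES = {
    "automated log saving": {"Altus"},
    "automated cluster spin up/down": {"Altus"},
    "data engineering": {"Altus", "Director"},
    "production job driven": {"Altus"},
    "workload analytics": {"Altus"},
    "job development/exploration": {"Director"},
    "3rd party installations": {"Director"},
    "full control of CM": {"Director"},
    "analytical/operational": {"Director"},
    "persistent clusters": {"Director"},
    "grow/shrink cluster": {"Director"},
}


def candidates(requirements):
    """Return the tools (sorted by name) that cover every listed requirement."""
    tools = {"Altus", "Director"}
    for requirement in requirements:
        tools &= CAPABILITIES[requirement]
    return sorted(tools)


if __name__ == "__main__":
    print(candidates(["data engineering", "workload analytics"]))
    print(candidates(["data engineering", "full control of CM"]))
```

An empty result means no single tool covers the whole list, in which case the requirements would have to be split between Altus-managed and Director-managed clusters.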
Altus Service Architecture (AWS)
● Runs in Cloudera’s secured and monitored environment
● Manages CDH clusters in customer cloud account
● Customer data does not pass to Cloudera (Workload Analytics
requires opt-in log data transfer to Cloudera)
Altus Service Architecture (Azure)
● Runs in Cloudera’s secured and monitored environment
● Manages CDH clusters in customer cloud account
● Customer data does not pass to Cloudera (Workload Analytics
requires opt-in log data transfer to Cloudera)
Altus Data Engineering
for ETL, machine learning, and data processing
• Fast, easy job submission without cluster management
• Built-in Workload Analytics for troubleshooting and optimization
• Lower costs with transient resources and pay-per-use pricing
• Full benefits of isolation + Shared Data Experience
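The transient-resource model in the bullets above, where a cluster exists only for the duration of its jobs, can be illustrated with a context manager. This is a self-contained simulation and not the Altus API: `TransientCluster`, `transient_cluster`, `submit`, and the job names are all hypothetical, and no cloud resources are touched.

```python
from contextlib import contextmanager


class TransientCluster:
    """In-memory stand-in for a cluster that lives only as long as its jobs."""

    def __init__(self, name):
        self.name = name
        self.running = False
        self.job_log = []  # survives termination, for later troubleshooting

    def submit(self, job_name):
        """Record a job; refuse if the cluster has already been terminated."""
        if not self.running:
            raise RuntimeError(f"cluster {self.name} is not running")
        self.job_log.append(job_name)


@contextmanager
def transient_cluster(name):
    """Start a simulated cluster, yield it, and always terminate it afterwards."""
    cluster = TransientCluster(name)
    cluster.running = True
    try:
        yield cluster
    finally:
        cluster.running = False  # terminate even if a job raised


if __name__ == "__main__":
    with transient_cluster("nightly-etl") as cluster:
        cluster.submit("clean-weblogs")
        cluster.submit("load-orders")
    print(cluster.running, cluster.job_log)
```

The job log outliving the cluster mirrors the idea that jobs can still be inspected once the compute they ran on is gone.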
End-user focused with jobs as first-class objects
Workload troubleshooting and analytics
● Troubleshoot jobs after cluster termination through job log and configuration browsing
● Insight into causes of job failure
● Identification and root cause analysis of slow jobs
Capture metadata spanning multiple clusters
Persist metadata with Cloudera Navigator
● Export metadata and lineage information from Altus clusters
● Insight into full data management pipeline including transient clusters
Three immediate use cases for Altus Data Engineering
ETL FOR ANALYTIC DB: Cloud-native batch preparation for Impala on IaaS or, soon, Altus Analytic DB.
BATCH MACHINE LEARNING: Scalable compute for massively parallel batch machine learning training, scoring, or simulation.
ETL OFFLOAD: Offload batch processing jobs from overburdened on-premises clusters.
[Diagram labels: ETL, Analytic DB, Data Science, ML, On-Prem]
Altus Analytic Database
The first data warehouse cloud service to bring the warehouse to the data, delivering instant analytics to anyone
For business analysts:
• Query with predictable performance, at any time, without risking SLAs
• Bring limitless new users and use cases with instant self-service analytic access
• Data available for broad access (SQL, BI tools, Python, R, etc.)
For IT:
• Easily and elastically provision isolated resources as and when they’re needed
• Simple multi-tenant management including federated identity and consistent governance
• Eliminate data movement and copies across workloads
ETL to Cloud-Native Analytic Database
Cluster types: Long Running Cluster (Director Managed); Elastic Cluster(s) (Director Managed / Altus Managed); Transient Cluster(s) (Altus Managed)
Data Engineering Workflow (ETL): Source Data; 1+ Jobs (e.g. Hive); 1+ Jobs (e.g. Spark)
Analytic Database (Impala): Source Data; BI Tools (e.g. Tableau); SQL Editor (e.g. Hue)
SDX: Schema (HMS), Security (Sentry), Lineage/Audit (Navigator)
Workflow, Monitoring, and Scheduling (e.g. AutoSys, ControlM, Airflow, Talend, Informatica)
Object Store (ADLS, S3)
The Scenario
My Role: Data Analyst at DataCo, a sports retailer
Business Issue: Website sales are lower than expected. Why?
Technical Issues:
• My on-premises data warehouse holds my sales order data, but it is very old and slow, and ad hoc queries on it are difficult.
• My clickstream data is too big to ingest into my data warehouse.
Requirements:
• An Analytic Database for ad hoc queries on order data
• A temporary platform to process weblogs once a day
• The ability to join processed weblogs to order data
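The last requirement, joining processed weblogs to order data, amounts to an ordinary SQL join once both data sets are queryable in one place. Below is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for the analytic database. The table names, columns, and sample rows are invented for illustration and do not come from the demo.

```python
import sqlite3


def demo_connection():
    """Build an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weblogs (ip TEXT, product TEXT)")
    conn.execute("CREATE TABLE orders (order_id INTEGER, product TEXT)")
    conn.executemany(
        "INSERT INTO weblogs VALUES (?, ?)",
        [("10.0.0.1", "tent"), ("10.0.0.2", "tent"),
         ("10.0.0.3", "tent"), ("10.0.0.1", "kayak")],
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "kayak")])
    return conn


def views_vs_orders(conn):
    """Join page views per product to order counts per product."""
    query = """
        SELECT v.product, v.n_views, COALESCE(o.n_orders, 0) AS n_orders
        FROM (SELECT product, COUNT(*) AS n_views
              FROM weblogs GROUP BY product) AS v
        LEFT JOIN (SELECT product, COUNT(*) AS n_orders
                   FROM orders GROUP BY product) AS o
          ON o.product = v.product
        ORDER BY v.n_views DESC, v.product
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in views_vs_orders(demo_connection()):
        print(row)
```

A product with many views and few orders is the kind of row that would help explain lower than expected website sales.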
Demo - Retail clickstream analysis
Data engineering: semi-structured weblogs [IP, date, method, url, http_version, code1, code2, dash, user_agent]; cleaning, formatting, filtering; well-formatted data
Analytic database: SQL reporting, self-service BI
Insights
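The cleaning, formatting, and filtering step can be sketched as a parser that turns one raw weblog line into the nine fields named on the slide. The log layout assumed here (an Apache-style access log line with two ignored fields after the IP) and the sample line are assumptions. The slide gives only the field names, so `code1`, `code2`, and `dash` are kept as opaque labels.

```python
import re

# Field names as listed on the slide.
FIELDS = ["ip", "date", "method", "url", "http_version",
          "code1", "code2", "dash", "user_agent"]

# Assumed layout: IP, two ignored fields, [date], "METHOD url version",
# two numeric codes, then two quoted strings.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<date>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) (?P<http_version>[^"]+)" '
    r'(?P<code1>\d+) (?P<code2>\d+|-) '
    r'"(?P<dash>[^"]*)" '
    r'"(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of the nine fields, or None for a malformed line."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


def clean(lines):
    """Parse all lines and filter out the ones that do not match the layout."""
    parsed = (parse_line(line) for line in lines)
    return [record for record in parsed if record is not None]


if __name__ == "__main__":
    sample = ('10.0.0.1 - - [14/Jun/2014:10:30:13 -0400] '
              '"GET /department/fitness HTTP/1.1" 200 1120 "-" "Mozilla/5.0"')
    print(clean([sample, "not a log line"]))
```

In the demo this work would run as a job on a transient cluster; the plain-Python version only shows the shape of the transformation.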
Click stream data
Long-running Kafka cluster
Long-running stream processing cluster
Long-running Analytic DB cluster (Impala)
Transient Data Engineering clusters (Altus)
Self-Service Analytic DB clusters
HDFS on premises
Object store
Management: Cloudera Director, Cloudera Altus, Cloudera Manager
Shared Data Experience (Metadata, Security, Governance)
Q&A
Resources
Cloudera Altus
Cloudera Altus documentation
Cloudera Director
Cloudera Director documentation
Cloudera SDX
Try Cloudera Director on Microsoft Azure
Try Cloudera Director on AWS with AWS Quickstart
Cloudera Reference Architectures for public and private cloud deployments
