1 © Hortonworks Inc. 2011–2018. All rights reserved
Running Enterprise Workloads with an Open
Source Hybrid Cloud Data Architecture
Alan Gates, Co-founder Hortonworks
Next Generation Data Problems
● My data is spread across multiple clusters and data sources.
● I store and analyze data from ERP/CRM systems, IoT and mobile devices, social media, geolocation, etc.
● Some of my data is on-premises, some is in the cloud. I move my data from cloud to on-premises and vice versa, and between different clouds.
Data Is Your Business
Focus on Your Data Strategy
● Consider how you store, manage and protect your data
● Data must be made known, discoverable, available, trusted and compliant
● Security and governance of all data are paramount
● Stewardship, discovery, delivery and use of data are key concerns
Treat Your Data as a Strategic Asset
● Turn data into predictive and prescriptive analytics
● Enable self-service analytics to accelerate delivery of new business insights
● Build a solid foundation for higher-value data science, ML and AI
● Data explosion is uncovering new possibilities, if you can seize them
The Next Generation of Data Problems Requires a Data Strategy
Balancing Enterprise Requirements for Hybrid Cloud Data Strategy
Line of Business Practitioners vs. Enterprise IT Stakeholders
Time to Insight (Data Analysts, Data Engineers and Data Scientists)
● Access a Broad Set of Analytics Tools
● On-demand, Self-service Access
● Data Discovery, Provisioning and Deployment
● Global Data Access Transparent of Location
● Single Pane of Glass
Reduce Risk (Big Data Platform Owners)
● Consistent Security and Governance
● Manage Cloud and Shadow Spend
● Retain Data Context, Lineage and Visibility
● Operational Reliability, Portability
● Remain Cloud Agnostic
You Have Data Everywhere
[Figure: structured and unstructured clusters spread across multiple environments, including data centers in Dublin, Las Vegas and Melbourne, tied together by shared services, connectivity and application portability.]
DataPlane Service Enables a Hybrid Architecture for Global Data Management
From the edge, through movement, to rest
DataPlane Service is a foundational platform for the delivery of data solutions that will:
• Support an enterprise hybrid deployment strategy and adoption of cloud
• Provide common metadata, security and governance across all deployments
• Simplify enterprise data asset management
• Support a variety of workloads
• Extend to new services through a services enablement layer for rapidly bringing new solutions to market
[Figure: DataPlane Service manages, secures and governs multiple clusters and sources across multi-cloud and hybrid deployments, covering data at rest (Cloudera Data Platform) and data in motion (Cloudera Data Flow).]
DataPlane Service (DPS)
The DPS Ecosystem
DPS Platform applications:
• Data Lifecycle Manager
• Data Steward Studio*
• Data Analytics Studio*
• Streams Messaging Manager
DataPlane services: authentication, role-based access, service lifecycle management, cluster registration, cluster service discovery and access
Components on each Platform or Flow cluster:
• DLM Engine
• Profiler Service
• DAS Agent
• SMM Agent
Data Lifecycle Manager (DLM)
⬢ Manage the data lifecycle:
– Replication and failback to another cloud or on-prem site for disaster recovery
– Auto-tiering of hot/warm/cold data to cloud object storage or on-prem storage for TCO reduction
– Backup and recovery of critical business data
⬢ Maintain common security and governance policies across multiple data sources and environments
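The hot/warm/cold tiering idea can be illustrated with a short sketch. This is not how DLM implements tiering. The age thresholds, the `Dataset` type and the `choose_tier` function are all assumptions made for the example; the slide only states that data is tiered by how likely it is to be used.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical thresholds: days since last access that separate the tiers.
HOT_MAX_AGE_DAYS = 30
WARM_MAX_AGE_DAYS = 180


@dataclass(frozen=True)
class Dataset:
    name: str
    last_accessed: date


def choose_tier(dataset: Dataset, today: date) -> str:
    """Return 'hot', 'warm' or 'cold' based on days since last access."""
    age_days = (today - dataset.last_accessed).days
    if age_days <= HOT_MAX_AGE_DAYS:
        return "hot"
    if age_days <= WARM_MAX_AGE_DAYS:
        return "warm"
    return "cold"


# Illustrative data only; the names and dates are invented.
today = date(2018, 6, 1)
datasets = [
    Dataset("clickstream_current", date(2018, 5, 28)),
    Dataset("orders_2017", date(2018, 1, 15)),
    Dataset("logs_2015", date(2016, 3, 1)),
]
tiers = {d.name: choose_tier(d, today) for d in datasets}
```

Recency of access stands in here for probability of use; a real policy could just as well weigh access frequency or business rules.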
[Figure: Data Lifecycle Manager capabilities, with availability labels as shown in the diagram]
• Replication & Disaster Recovery (Generally Available): replication from a production cluster to a backup cluster, with failback
• Auto Tiering (Coming Soon): data moves between clusters as its probability of use falls, from high (cost $$$) through medium (cost $$) to low (cost $)
• Backup & Restore (Coming Soon): a full backup on day 1, cumulative incremental backups on days 2 and 3, and a restore after an accidental delete
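The backup scheme in the diagram (one full backup followed by cumulative incremental backups) can be sketched as follows. This is a minimal illustration under stated assumptions, not DLM's backup format: the `restore` function and the dictionary-based snapshots are hypothetical. It relies only on the standard meaning of a cumulative incremental, which holds every change since the last full backup, so a restore needs the full backup plus a single incremental.

```python
from typing import Dict, List, Optional, Tuple

# A change set maps a file path to its new content; None marks a deletion.
Changes = Dict[str, Optional[str]]


def restore(
    full: Dict[str, str],
    cumulative: List[Tuple[int, Changes]],
    day: int,
) -> Dict[str, str]:
    """Rebuild the state as of `day` from a full backup plus one incremental."""
    state = dict(full)
    # Each cumulative incremental holds every change since the full backup,
    # so only the latest one taken on or before `day` is applied.
    eligible = [(d, changes) for d, changes in cumulative if d <= day]
    if eligible:
        _, changes = max(eligible, key=lambda item: item[0])
        for path, content in changes.items():
            if content is None:
                state.pop(path, None)
            else:
                state[path] = content
    return state


# Illustrative data: full backup on day 1, incrementals on days 2 and 3.
full_day1 = {"/data/a": "v1", "/data/b": "v1"}
incrementals = [
    (2, {"/data/a": "v2"}),
    (3, {"/data/a": "v2", "/data/c": "v1"}),
]
# After an accidental delete on the live cluster, recover the day 3 state.
recovered = restore(full_day1, incrementals, day=3)
```

The trade-off is that each cumulative incremental grows until the next full backup, in exchange for a restore that never has to replay a chain of daily deltas.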
DLM 1.0 (GA product)
• Pair clusters and manage data replication flows
• Replicate between on-prem and cloud
• Replication policies and instances
Data Analytics Studio (DAS)
• Enhance productivity with full-featured autocomplete, direct download of results, and quick data preview
• Self-optimize queries and storage based on a heuristic recommendation engine
• Built-in batch operations: no more scripting needed for day-to-day operations
Streams Messaging Manager (SMM)
What is SMM?
• Kafka management and monitoring tool
• Single monitoring dashboard for all your Kafka clusters across four entities:
– Broker
– Producer
– Topic
– Consumer
• Supports multiple HDP and/or HDF Kafka clusters
• REST as a first-class citizen
• Delivered as a DataPlane service
SMM: Full visibility into all details of Kafka clusters
SMM: Detailed views of specific topics
SMM: All producers and consumers associated with a topic
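The "all producers and consumers associated with a topic" view amounts to grouping client activity by topic. The sketch below shows that grouping in plain Python. It does not use the SMM REST API or any Kafka client; the `TopicView` type, the `build_topic_views` function and the event tuples are hypothetical and exist only to illustrate the data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass
class TopicView:
    producers: Set[str] = field(default_factory=set)
    consumers: Set[str] = field(default_factory=set)


def build_topic_views(
    events: Iterable[Tuple[str, str, str]],
) -> Dict[str, TopicView]:
    """Group client activity by topic. Each event is (client_id, role, topic)."""
    views: Dict[str, TopicView] = defaultdict(TopicView)
    for client_id, role, topic in events:
        if role == "producer":
            views[topic].producers.add(client_id)
        elif role == "consumer":
            views[topic].consumers.add(client_id)
        else:
            raise ValueError(f"unknown role: {role}")
    return dict(views)


# Illustrative events; the client and topic names are invented.
events = [
    ("app-1", "producer", "orders"),
    ("app-2", "consumer", "orders"),
    ("app-2", "consumer", "payments"),
    ("app-3", "producer", "payments"),
]
views = build_topic_views(events)
```

Keying the view on the topic makes it cheap to answer "who writes to and reads from this topic", which is the question the screen addresses.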
Goals
Know your Sensitive Data
• Automatically detect and profile sensitive and personal data
• Attach classification annotations for sensitivity
• Manual approval and curation of sensitive data classifications
• Leverage classification-based data protection
• Sensitive data dashboard on Asset 360
Sensitive Data Profiling
Track your Sensitive Data
• IBAN (27 EU Countries)
• Credit Card Numbers
• Email
• Telephone (AMER, EU)
• IP Address
• URL
• Passport (12 EU Countries)
• National ID (19 EU Countries)
• Australian Drivers License
• Australian Passport
• Australian National ID
Sensitive Data Types
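Several of the types listed above can be recognized with a pattern plus a checksum. The sketch below shows one way to do that using only the Python standard library. It is an illustration, not the profiler's actual detection logic: the `classify` function and its labels are assumptions, it covers only four of the listed types, and it does not enforce country-specific IBAN lengths.

```python
import ipaddress
import re
from typing import Optional

EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
CARD_RE = re.compile(r"^\d(?:[ -]?\d){12,18}$")  # 13 to 19 digits
IBAN_RE = re.compile(r"^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$")


def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def iban_valid(value: str) -> bool:
    """IBAN mod-97 check: move the first four characters to the end."""
    compact = value.replace(" ", "").upper()
    if not IBAN_RE.match(compact):
        return False
    rearranged = compact[4:] + compact[:4]
    # int(ch, 36) maps 0-9 to 0-9 and A-Z to 10-35, as the check requires.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


def classify(value: str) -> Optional[str]:
    """Return a hypothetical sensitivity label for one value, or None."""
    value = value.strip()
    if EMAIL_RE.match(value):
        return "email"
    if CARD_RE.match(value) and luhn_valid(value):
        return "credit_card"
    if iban_valid(value):
        return "iban"
    try:
        ipaddress.ip_address(value)
        return "ip_address"
    except ValueError:
        return None
```

Combining a checksum with the pattern reduces false positives: a random 16-digit number passes the card regex but fails the Luhn check nine times out of ten.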
Track Your Data Asset – Lineage and Impact
• Consolidated upstream lineage and downstream impact
• Detailed click-through to asset properties
Data Lineage and Impact
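Upstream lineage and downstream impact are two traversals of the same dependency graph, one against the edges and one along them. The sketch below assumes a hypothetical edge map from each source asset to the assets derived from it; the asset names and the `reachable` and `invert` helpers are invented for illustration and do not reflect any product API.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def reachable(graph: Graph, start: str) -> Set[str]:
    """Breadth-first search: every node reachable from `start`, excluding it."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen


def invert(graph: Graph) -> Graph:
    """Reverse every edge so that derived assets point back to their sources."""
    inverted: Graph = {}
    for source, targets in graph.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted


# Illustrative edges: source asset -> assets derived from it.
derived_from = {
    "raw_orders": ["clean_orders"],
    "raw_customers": ["clean_customers"],
    "clean_orders": ["sales_report"],
    "clean_customers": ["sales_report", "churn_model"],
}

downstream_impact = reachable(derived_from, "raw_customers")
upstream_lineage = reachable(invert(derived_from), "sales_report")
```

Downstream impact answers "what breaks if this asset changes", while upstream lineage answers "where did this asset's data come from".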
View Security Policies for your Data Assets
• View security policies on data assets
• View classification-based policies on assets
Security Policies
Thank you!
