Apache Ozone
Evolution of HDFS Scalability & built-in GDPR compliance
Hadoop, Ozone & Apache are trademarks of the Apache Software Foundation.
Dinesh Chitlangia, Cloudera
Ajay Kumar, Google
Agenda
Ozone
• Why, When, What
• Notions, Architecture, Deployment
• Ozone for Enterprise
GDPR
• Ozone – Delete Path
• Ozone & GDPR
Q & A
Why
HDFS scalability limits
400M+
Future
Make your HDFS healthy day
Ozone
Object Store for Big Data
• Scale both Objects & IOPS
Set of Micro-services - Divide, Conquer, Scale
Seamless transition for Yarn, MapReduce, Hive, Spark apps.
Supports K8s, CSI and ability to run on K8s natively.
When
Scale beyond HDFS
Large Data Store / Dedicated Storage Clusters
Cloud-like presence on-prem
First-class citizen on K8s
Notions
Volumes ~ user accounts
Buckets ~ directories (no sub-buckets)
Keys ~ files
HDDS Notions
Containers [Collection of Blocks]
Pipeline
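The three namespace notions above can be sketched as a small data model. This is only an illustration: the `KeyAddress` class, the `parse_key_address` helper and the `/volume/bucket/key` path layout are assumptions made for this example, not Ozone's client API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyAddress:
    """Hypothetical holder for the three namespace levels named on the slide."""
    volume: str
    bucket: str
    key: str


def parse_key_address(path: str) -> KeyAddress:
    """Split '/volume/bucket/key' into its three parts.

    Everything after the bucket is treated as the key name, so in this sketch
    a key may contain '/' even though buckets cannot be nested.
    """
    parts = path.lstrip("/").split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected /volume/bucket/key, got {path!r}")
    return KeyAddress(*parts)


addr = parse_key_address("/vol1/bucket1/logs/2019/app.log")
print(addr)
```

Because there are no sub-buckets, any directory-like structure below the bucket lives in the key name itself.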
Architecture
Ozone’s Microservices - Divide, Conquer, Scale
• Ozone Manager - namespace [~Namenodes]
• Storage Container Managers - blockspace [~BlockServer]
• Recon Server - Control Plane
• S3 Gateway
• Datanodes
Deployment
Variants
Ozone - Write Path
Similar to DFS Write, Blocks are written directly to Datanodes
Ozone - Read Path
Similar to DFS Read, Blocks are read directly from Datanodes
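A toy, in-memory sketch of the write and read paths may help here. It assumes the metadata services only hand out and look up block locations while the bytes move directly between client and datanode. All class and function names are hypothetical, and the step where the Ozone Manager asks the Storage Container Manager for a block is an assumption about the flow, since the slides only show it as a diagram.

```python
from dataclasses import dataclass, field


@dataclass
class Datanode:
    """Stores block bytes; stands in for a real datanode."""
    name: str
    blocks: dict = field(default_factory=dict)


@dataclass
class StorageContainerManager:
    """Hands out block locations (the 'blockspace')."""
    datanodes: list
    next_block_id: int = 0

    def allocate_block(self):
        block_id = self.next_block_id
        self.next_block_id += 1
        # Toy placement: round-robin over datanodes, no replication.
        node = self.datanodes[block_id % len(self.datanodes)]
        return block_id, node


@dataclass
class OzoneManager:
    """Maps key names to block locations (the 'namespace')."""
    scm: StorageContainerManager
    keys: dict = field(default_factory=dict)

    def open_key(self, name):
        block_id, node = self.scm.allocate_block()
        self.keys[name] = (block_id, node)
        return block_id, node

    def lookup_key(self, name):
        return self.keys[name]


def write_key(om, name, data):
    block_id, node = om.open_key(name)    # metadata call only
    node.blocks[block_id] = data          # bytes go straight to the datanode


def read_key(om, name):
    block_id, node = om.lookup_key(name)  # metadata call only
    return node.blocks[block_id]          # bytes come straight from the datanode


nodes = [Datanode("dn1"), Datanode("dn2"), Datanode("dn3")]
om = OzoneManager(StorageContainerManager(nodes))
write_key(om, "vol1/bucket1/key1", b"hello ozone")
print(read_key(om, "vol1/bucket1/key1"))
```

The point of the split is that the metadata services never sit on the data path, so they can scale independently of I/O throughput.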
Using Ozone: Is it as painful as HDFS?
We hear you, and we have to set up Ozone every time we test.
Setup
• Docker
  • docker-compose up -d
  • runs it on local machine
• K8s
  • helm install ozone
• Traditional tarball
  • Untar
  • Run genconfig
  • Update the configurations
Usage
• If you are familiar with HDFS commands
  • dfs -ls hdfs://user
• with ozone, it will become
  • dfs -ls o3fs://user
• If you are familiar with S3 commands like
  • aws s3 ls -endpoint=us-west1. /bucketName
• with Ozone s3 it becomes
  • aws s3 ls -endpoint=s3g.local. /bucketName
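The S3 commands on the slide are abbreviated; the AWS CLI spells the option `--endpoint-url`. A minimal sketch of building the full command follows. The helper name `ozone_s3_ls_command`, the `http://s3g.local:9878` address and the bucket name are placeholders assumed for this example, so substitute your own S3 Gateway host and port.

```python
import shlex


def ozone_s3_ls_command(endpoint_url: str, bucket: str) -> list:
    """Build an `aws s3 ls` argv that targets a custom S3 endpoint."""
    return ["aws", "s3", "ls", "--endpoint-url", endpoint_url, f"s3://{bucket}/"]


# Hypothetical gateway address; replace with your own S3 Gateway host and port.
cmd = ozone_s3_ls_command("http://s3g.local:9878", "bucketName")
print(shlex.join(cmd))
```

Only the endpoint changes between a public S3 region and an Ozone S3 Gateway; the rest of the command stays the same, which is the point the slide makes.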
Ozone for Enterprise
Scale
Consistency
Security
Ozone for Enterprise
• 10 Billion Keys will be supported in first official release
• Scale OM/SCM independently, without any disruption
• Evenly distribute metadata across the cluster including Datanodes
• RAFT Consensus Protocol via Apache RATIS
• Tested with industry recognized off-the-shelf components
• Blockade Tests - Tests to inject errors/failures in the clusters
• Tested Apache Spark, YARN, Hive workloads
• K8s based clusters, long running clusters, ephemeral clusters
• Freon - custom load generator
Ozone for Enterprise
Simplified Security
• Similar to HDFS, relies on Kerberos / Delegation Token / Block Token
• SCM comes with its own Certificate Authority and users DO NOT need to know
about it.
• Kerberos is only needed for OM/SCM, not for datanodes
• Security is on by default, not an afterthought
• Transparent Data Encryption
• Selectively audit READ or WRITE events, switch configs without the need to
restart.
Ozone for Enterprise
High Availability
• Built-in HA
• Single HA Configuration mode
• Regular HA Configuration mode [3 instances of OM/SCM]
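Why three instances? Raft, which the deck says Ozone uses via Apache Ratis, commits an entry once a majority of the group has accepted it. The arithmetic below is a sketch of that general rule only; the function names are made up for this example and nothing here is Ratis API.

```python
def majority(n: int) -> int:
    """Smallest number of members that forms a majority of an n-member group."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Members that can be down while a majority is still reachable."""
    return n - majority(n)


for n in (1, 3, 5):
    print(f"members={n} majority={majority(n)} tolerated_failures={tolerated_failures(n)}")
```

With three members the group survives the loss of one; a single instance tolerates none.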
ENFORCEMENTTRACKER.COM
British Airways £183.39M
Marriott International £100M
Swedish School for facial tracking
Dutch Hospital for unsecured patient
data
GENERAL DATA PROTECTION REGULATION (GDPR)
• Law for handling personal data
• Imposes responsibility on Data Controllers
• Enforces Accountability for Compliance
• Grants rights to Data Subjects
• European Law: Spills outside of EU in Digital Era
STORAGE SYSTEMS & GDPR
Territorial Scope
Personal Data
Right to Erasure
(Right to be Forgotten)
Notification Obligation of the Controller
Delete Path - Overview
Delete Path – Under the hood
OZONE & GDPR
• GDPR Enabled Bucket
• During Ozone Key creation, generate a Simple Encryption Key (SEK)
• Client writes data to blocks, encoded with the SEK under the hood
• During read, the data is decoded using the same SEK.
• During delete, OM moves the KeyInfo to the Deleted Keys section.
• The SEK is irrevocably lost; data cannot be decoded even if the actual blocks are deleted much later
• Notification of Obligation is achieved
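The steps above amount to what is often called crypto-shredding: dropping the per-key secret makes the stored bytes unreadable before the blocks themselves are purged. The sketch below is a toy illustration under loud assumptions. `GdprBucket` and `keystream_xor` are hypothetical names, and the SHA-256 keystream is NOT real cryptography; it only stands in for whatever cipher a real system uses.

```python
import hashlib
import secrets


def keystream_xor(secret: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(secret || offset) blocks.

    NOT real cryptography; a stand-in so the example needs only the stdlib.
    Applying it twice with the same secret returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(secret + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class GdprBucket:
    """Hypothetical in-memory bucket illustrating the create/read/delete steps."""

    def __init__(self):
        self.key_table = {}      # key name -> per-key secret (metadata)
        self.deleted_keys = []   # names awaiting block cleanup; secret not kept
        self.blocks = {}         # key name -> encoded bytes (on "datanodes")

    def put(self, name: str, data: bytes) -> None:
        secret = secrets.token_bytes(32)      # generated at key creation
        self.key_table[name] = secret
        self.blocks[name] = keystream_xor(secret, data)

    def get(self, name: str) -> bytes:
        return keystream_xor(self.key_table[name], self.blocks[name])

    def delete(self, name: str) -> None:
        del self.key_table[name]              # the secret is dropped here
        self.deleted_keys.append(name)        # blocks are purged later


bucket = GdprBucket()
bucket.put("user-42/profile", b"name=Alice")
print(bucket.get("user-42/profile"))
bucket.delete("user-42/profile")
print("user-42/profile" in bucket.blocks)     # encoded bytes may still exist
print("user-42/profile" in bucket.key_table)  # but the secret is gone
```

After `delete`, the encoded bytes can linger until block cleanup runs, yet nothing in the system can decode them any more.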
OZONE & GDPR - Limitations
• Backups & Restore
• Rapid Key Create/Delete cycles – false positives
• Existing Buckets need manual copy
Road ahead
• Network Topology
• HA Support
• Disk Scanner
• In-place upgrades for HDFS Clusters
• Erasure Coding
• Consistent Reads from Standby OM/SCM
• Stability & Scale testing
• TPC-DS, Chaos Monkey, Scale testing with Partners
Interested in Ozone?
https://hadoop.apache.org/ozone/
https://cwiki.apache.org/confluence/display/HADOOP/Ozone+Road+Map
Q & A
THANK YOU
