Confidential and Proprietary
Debunking Common Myths About Cassandra Backup and Test Data Management
Hari Mankude, CTO
December 2016
My Background
Why Bother With Backup and Test Data Management?
• The average cost of a data loss incident is $900,000
• 90% of enterprises delay applications because of a lack of test data
• Source: EMC, Talena
Myth #1: Data Replicas Prevent Data Loss
[Diagram: four replica nodes, N1 to N4]
• Human errors: dropping a column of a table
• Application corruption: incorrect updates to a column
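Why replicas do not help with these two failure modes can be shown with a toy model. This is only a sketch, not Cassandra code: the `replicas` dictionary and `replicated_write` function are hypothetical stand-ins for replication, which copies every write, good or bad, to all replicas.

```python
# Toy model (not Cassandra code): every write is copied to all replicas,
# so a bad write is copied too.
replicas = {name: {} for name in ("N1", "N2", "N3", "N4")}

def replicated_write(key, value):
    """Apply the same write to every replica, as replication is meant to do."""
    for store in replicas.values():
        store[key] = value

replicated_write("user:42:email", "ana@example.com")  # correct value
replicated_write("user:42:email", "")                 # buggy application update

# All four replicas now agree on the corrupted value; none holds the old one.
surviving_copies = [store["user:42:email"] for store in replicas.values()]
print(surviving_copies)
```

Recovering the old value needs a copy taken at an earlier point in time, which is what a backup provides and a replica does not.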
Myth #2: Cassandra Replication Prevents Data Loss
[Diagram: nodes N1 to N8 across Data Center #1 and Data Center #2]
Myth #3: Cassandra snapshots are an effective backup strategy
• PROBLEM: Snapshots result in storage amplification due to compaction
• PROBLEM: Need a scheduler to take timely snapshots and delete older restore points
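The scheduling problem can be sketched as follows. This is a minimal illustration, not a recommended tool: the `snapshot_tags` bookkeeping, the `backup-YYYYMMDD` tag format, the 7-day retention, and the `sales` keyspace are all assumptions. The script only builds `nodetool snapshot` and `nodetool clearsnapshot` command lines; something would still have to run them on every node.

```python
from datetime import datetime, timedelta

def plan_snapshot_cleanup(snapshot_tags, now, keep_days=7):
    """Return nodetool commands that clear snapshots older than keep_days.

    snapshot_tags maps a tag name to the time the snapshot was taken
    (a hypothetical bookkeeping structure kept by the scheduler).
    """
    cutoff = now - timedelta(days=keep_days)
    expired = sorted(tag for tag, taken in snapshot_tags.items() if taken < cutoff)
    return [["nodetool", "clearsnapshot", "-t", tag] for tag in expired]

def new_snapshot_command(now, keyspace):
    """Build the command for a fresh, date-tagged snapshot of one keyspace."""
    tag = now.strftime("backup-%Y%m%d")
    return ["nodetool", "snapshot", "-t", tag, keyspace]

now = datetime(2016, 12, 15)
tags = {
    "backup-20161201": datetime(2016, 12, 1),
    "backup-20161210": datetime(2016, 12, 10),
}
commands = plan_snapshot_cleanup(tags, now) + [new_snapshot_command(now, "sales")]
for cmd in commands:
    print(" ".join(cmd))
```

Until an old snapshot is cleared, it keeps holding SSTable files that compaction has already replaced in the live data, which is where the storage amplification comes from.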
Myth #4: Restoring from snapshots is trivial
• PROBLEM: When your cluster size changes due to addition or deletion of nodes
• PROBLEM: If you have config (e.g., compaction policy) or name changes
• PROBLEM: Scaling your restore to hundreds of nodes
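The cluster-size problem can be illustrated with a toy token ring. This is a sketch under simplifying assumptions that are not Cassandra's real behavior: a 120-token ring, one evenly spaced token per node, no vnodes, and a rebalance to even spacing after the fifth node joins. Snapshots are taken per node, so any token whose owner changes is data that a per-node restore would put on the wrong node.

```python
from bisect import bisect_left

RING_SIZE = 120  # toy token space; real Cassandra token ranges are far larger

def evenly_spaced_tokens(node_count):
    """One token per node, evenly spaced (a simplification: no vnodes)."""
    return [i * RING_SIZE // node_count for i in range(node_count)]

def owner(tokens, key_token):
    """Index of the owning node: first node token >= key_token, wrapping around."""
    idx = bisect_left(tokens, key_token)
    return idx % len(tokens)

old_ring = evenly_spaced_tokens(4)  # cluster when the snapshot was taken
new_ring = evenly_spaced_tokens(5)  # cluster at restore time, one node added
moved = sum(1 for k in range(RING_SIZE) if owner(old_ring, k) != owner(new_ring, k))
print(f"{moved} of {RING_SIZE} tokens change owner")
```

In this toy model about a third of the token space changes owner after a single node is added.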
Myth #5: The traditional backup/restore process works
Myth #6: Test Data Management Is a Simple Process
• Change Request: 1 week
• Provision Production Data: 1 week
• Create Test DB and Mask Data: 1 week
• Create Samples of Production Data: 2 days
• Push Production Data to Test: hours
• Repeat Process: 3-4 weeks
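As a quick check of the arithmetic, the listed steps can be summed. The step names come from the slide; the push step is left out because the slide only says "hours".

```python
from datetime import timedelta

# Durations as listed on the slide; the "hours" push step is omitted.
steps = {
    "Change request": timedelta(weeks=1),
    "Provision production data": timedelta(weeks=1),
    "Create test DB and mask data": timedelta(weeks=1),
    "Create samples of production data": timedelta(days=2),
}
total = sum(steps.values(), timedelta())
print(f"{total.days} days, about {total.days / 7:.1f} weeks")
```

The total of 23 days falls inside the 3-4 week figure given for each repeat of the process.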
The Evolution of Data Management
[Diagram: Data Platforms and Data Management, comparing THE TRADITIONAL WORLD with THE NEXT 25 YEARS]
Talena in Production
[Diagram components: Hadoop/Spark Cluster, Cassandra Cluster, Vertica Cluster, Couchbase Cluster, Talena Smart Storage Cluster, Talena GUI, Test Cluster, Research Cluster]
The Talena Architecture
• Deep de-duplication and compression with app-aware architecture
• Incremental-forever backup architecture
• High availability via erasure coding in distributed cluster architecture
Smart Storage Optimizer
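How de-duplication and incremental-forever backup fit together can be sketched with a toy content-addressed store. This is a generic illustration, not Talena's implementation: the `DedupStore` class, the fixed 4-byte block size, and the SHA-256 block digests are all assumptions made for the example.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: each unique block is kept once."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes

    def backup(self, data, block_size=4):
        """Store data in fixed-size blocks; return (recipe, new_block_count)."""
        recipe, new_blocks = [], 0
        for start in range(0, len(data), block_size):
            block = data[start:start + block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block
                new_blocks += 1
            recipe.append(digest)
        return recipe, new_blocks

    def restore(self, recipe):
        """Rebuild the original bytes from a recipe of block digests."""
        return b"".join(self.blocks[d] for d in recipe)

store = DedupStore()
full, first_new = store.backup(b"AAAABBBBCCCC")
incr, second_new = store.backup(b"AAAABBBBDDDD")  # only the last block changed
print(first_new, second_new)
```

The second backup stores only the one changed block, yet either recipe restores a complete copy, which is the idea behind an incremental-forever scheme.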
The Talena Architecture
• Native querying and analytics via active compute layer
• Unbounded scale with a Hadoop-native architecture
[Diagram layers: Smart Storage Optimizer; Active Compute Services, Distributed File System]
The Talena Architecture
• Google-like catalog shortens data recovery time
• Automatic schema generation for mirroring and backups
• Granular recovery at an object level
• Recovery to multiple topologies
• Native integration with LDAP and Kerberos for authentication
• Role-based access control defines specific privileges
• Transparent data encryption
• Masking for PII data
[Diagram layers: Smart Storage Optimizer; Active Compute Services, Distributed File System; Metadata Catalog, Data Orchestration Services, Security Services]
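Masking for PII data can be sketched as a keyed, deterministic substitution applied before data reaches a test cluster. This is one common approach, not a description of how Talena masks data: the `mask_value` and `mask_row` helpers, the `masked_` prefix, and the secret key are hypothetical.

```python
import hashlib
import hmac

def mask_value(value, secret):
    """Replace a PII value with a keyed, deterministic pseudonym."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "masked_" + digest[:12]

def mask_row(row, pii_columns, secret):
    """Return a copy of row with the named PII columns masked."""
    return {col: mask_value(val, secret) if col in pii_columns else val
            for col, val in row.items()}

secret = b"test-environment-key"  # hypothetical key; keep real keys out of source code
row = {"id": "42", "email": "ana@example.com", "plan": "gold"}
masked = mask_row(row, {"email"}, secret)
print(masked)
```

Because the same input always maps to the same pseudonym, joins across masked tables still line up in the test environment.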
The Talena Architecture
• ‘Single pane of glass’ for multiple use cases and data platforms
• Agentless architecture minimizes management overhead
• GUI, CLI, REST-based Talena API options
[Diagram layers: Smart Storage Optimizer; Active Compute Services, Distributed File System; Metadata Catalog, Data Orchestration Services, Security Services; GUI, CLI, API]
Q&A
• We’ll send you a link to our eBook “The Cassandra Backup Guide”
• Additional resources: talena-inc.com/resources and talena-inc.com/blog
• Ping us with any additional questions: info@talena-inc.com
Q and A



Editor's Notes

  • #10: Meeting Notes (9/1/16 13:59): Change the slide. Meeting Notes (9/1/16 16:03): Add sampling bullet point, then push sampled data to test. Add repeat bucket.
  • #11: Starting over 20 years ago, the traditional database market became the foundation of enterprise applications. A whole ecosystem of data management products emerged to provide capabilities like backup/recovery (Veritas), storage pooling (Data Domain), test/dev management (Delphix), and archiving (Iron Mountain). But companies had to purchase separate products to provide a full data management solution for their enterprise. Over the past few years and into the foreseeable future, modern data platforms will become the new hubs of enterprise applications. These modern data platforms also need data management capabilities, just as traditional databases did. (Click for build) Our vision is to help companies with their critical data management needs in a single software product, one that is optimized specifically for these modern Big Data environments.
  • #13: The next few slides will introduce the unique Talena architecture and highlight how this architecture delivers on these core business benefits. One of the most significant components of our architecture is our Smart Storage Optimizer. By integrating compute and storage management into our storage optimizer, we’re able to deliver significant cost savings. Our application-aware architecture enables us to do deep de-duplication and compression. Our backup process is incremental-forever, saving on storage costs, and by incorporating erasure coding we also ensure high availability no matter how large a Talena cluster you choose to deploy.
  • #15: Supports transparent data encryption in the security services section
  • #16: Our agentless architecture makes Talena an ideal solution for big data architectures and minimizes your operational overhead. Furthermore, Talena can support multiple data platforms, versions, and use cases in a single deployment of Talena, thereby providing a “single pane of glass” for all your big data management needs. While most of our clients work within our user interface, we also provide a REST-based API to accomplish the same tasks.