Version 1.0
Getting Started with DataStax
Enterprise (DSE) on Docker
In Cassandra Lunch #75, we are going to look at getting
started with DataStax Enterprise on Docker.
Isaac Omolayo
Jr. Software Engineer
@Anant
Getting Started with DataStax Enterprise
● What is DataStax Enterprise?
● Packages and capabilities of DataStax Enterprise
● Using DSE Search to solve data problems
● Using DSE Analytics (Spark) to handle data workloads
● Using DataStax Enterprise Graph
● Running DataStax Enterprise packages on Docker
● Working with DSE Studio, DSE Search, DSE Analytics, DSE Graph
What is DataStax Enterprise?
● DataStax Enterprise helps enterprises build transformational data architectures for applications, microservices, and
other use cases. The goals are data sovereignty, availability, scalability, agility, and accessibility for any user
● DataStax Enterprise (DSE) is built on Apache Cassandra
● DSE is billed as the world’s most scalable database, known for 100% uptime and low latency
● DSE can handle and manage massive data at planetary scale
● DataStax Enterprise is a cohesive data management platform
● You can handle different workloads for different use cases using the integrated DSE Graph, DSE Analytics, and
DSE Search
Packages and capabilities of DataStax Enterprise
● There are different packages that come together to form the DataStax Enterprise ecosystem
○ DataStax OpsCenter
○ DataStax Studio
○ DataStax Enterprise
○ DataStax Enterprise Search
○ DataStax Enterprise Analytics with Spark
○ DataStax Enterprise Graph, etc.
DataStax Enterprise with Search
● DSE Search lets you find data quickly and provides a more flexible search experience for your users
● With DSE Search you can easily build features such as product catalogs, document repositories, and ad-hoc reporting engines
● Data is written to the database first, and the indexes are updated afterward. You must create a search index on your data
to enable search capabilities
● The benefits of running enterprise search functions through DataStax Enterprise and DSE Search include:
○ DSE Search is backed by a scalable database, and the connections and packages are fully integrated
○ A persistent store for search indexes
○ You can easily examine and aggregate data in real time using CQL
○ Supports indexing and querying of advanced data types, including tuples and user-defined types (UDT)
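
As a rough illustration of the "create an index, then search through CQL" flow described above, the sketch below builds the two CQL statements as plain strings. The keyspace `demo`, the table `products`, and the column `name` are hypothetical names chosen for the example. The statements would still need to be run through cqlsh or a driver against a node that has DSE Search enabled.

```python
def create_search_index_cql(keyspace: str, table: str) -> str:
    """Build the CQL statement that creates a DSE Search index on a table."""
    return f"CREATE SEARCH INDEX IF NOT EXISTS ON {keyspace}.{table};"


def solr_query_cql(keyspace: str, table: str, field: str, term: str) -> str:
    """Build a CQL SELECT that filters through the search index via solr_query."""
    # Double any single quotes so the term stays inside the CQL string literal.
    safe_term = term.replace("'", "''")
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE solr_query = '{field}:{safe_term}';"
    )


if __name__ == "__main__":
    # "demo", "products", and "name" are hypothetical example names.
    print(create_search_index_cql("demo", "products"))
    print(solr_query_cql("demo", "products", "name", "laptop"))
```

Keeping the statements as strings makes the order of operations explicit: the index must exist before a `solr_query` filter can be used.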
DataStax Enterprise with Analytics (Spark)
● DSE integrates real-time and batch operational analytics capabilities through Apache Spark
● With DSE Analytics you can easily generate reports, target customers, and process real-time streams of data
● Take care when enabling both Search and Analytics on the same DSE node
● Provision sufficient memory and compute resources to accommodate the indexing, query, and processing load
of the use case
● Spark is the default mode when you start an analytics node in a packaged installation. Spark runs locally on each node
DataStax Enterprise with Analytics (Spark)
● DSE Analytics includes integration with Apache Spark, the framework that supports the analytics
applications. Use DSE Analytics to analyze huge databases
● Spark is a distributed computation engine designed for big data and in-memory processing
● Features of DSE Analytics
○ Spark Master management
○ Analytics without ETL
○ DataStax Enterprise file system (DSEFS)
○ DSE Analytics Solo
○ Integrated security
○ AlwaysOn SQL
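
One way to picture "Analytics without ETL" is Spark reading a Cassandra table directly as a DataFrame. The sketch below is written in Python rather than the Scala used in the demo. It assumes an existing Spark session with the Spark Cassandra Connector data source available, as is typically the case inside a DSE Analytics shell. The function name, keyspace, and table are hypothetical.

```python
def load_cassandra_table(spark, keyspace: str, table: str):
    """Return a DataFrame backed by a Cassandra table, with no export step.

    `spark` is assumed to be an already-created SparkSession that has the
    Spark Cassandra Connector on its classpath.
    """
    return (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(keyspace=keyspace, table=table)
        .load()
    )
```

With a live session, a call such as `load_cassandra_table(spark, "demo", "products")` would return a DataFrame that can be filtered or aggregated with the usual DataFrame API.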
DataStax Enterprise with Graph
● DSE Graph is built on top of Apache TinkerPop, Apache Cassandra, Apache Solr, and Apache Spark
● DSE Graph uses Apache TinkerPop standards for data and traversal while also using Apache Cassandra for scalable storage
and retrieval
● DSE Graph supports both transactional and analytic workloads, using two different engines
○ OLAP: Online analytical processing (OLAP) is typically used to perform multidimensional analysis of data
■ Complex calculations on aggregated historical data
○ OLTP: Online transactional processing (OLTP) is characterized by a large number of short, online transactions for
very fast query processing
■ OLTP is typically used for data entry and retrieval with transaction-oriented applications
■ OLTP queries are best for questions that require access to a limited subset of the entire graph
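
The difference between the two query styles can be sketched in plain Python on a toy adjacency list. This is not DSE Graph or Gremlin code, and the vertices and edges are made up. It only shows that an OLTP-style question touches a small neighborhood, while an OLAP-style question scans the whole graph.

```python
from collections import Counter

# Toy adjacency list standing in for a graph; all names are invented.
GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": ["dave"],
    "dave": [],
}


def oltp_neighbors(graph, start):
    """OLTP-style: start at one known vertex and read only its edges."""
    return sorted(graph[start])


def olap_out_degree_histogram(graph):
    """OLAP-style: visit every vertex to aggregate over the entire graph."""
    return Counter(len(edges) for edges in graph.values())


if __name__ == "__main__":
    print(oltp_neighbors(GRAPH, "alice"))
    print(olap_out_degree_histogram(GRAPH))
```

The first function's cost depends on the size of one vertex's neighborhood. The second grows with the size of the graph, which is why that kind of workload is handed to an analytics engine.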
DataStax Enterprise with Graph
● All the DataStax Enterprise components are integrated with DSE Graph to form a real-time graph database management
system
● It incorporates built-in DSE Analytics and DSE Search functionality, visual management and monitoring, and
development tools including DataStax Studio
Running DataStax Enterprise packages on Docker
● Install Docker on your machine
● Pull the required DataStax Enterprise images
● Set up DSE Search, DSE Analytics, and DSE Graph in a Docker container
● Remote into the Docker containers
● Create a table in Cassandra using CQL
● Create and access a search index on the table
● Transform the Cassandra table with Spark (Scala) using DSE Analytics
● Access the table in DataStax Studio
● Use DSE Graph to query the data
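
The container setup step above can be sketched as a small helper that assembles a `docker run` command for the `datastax/dse-server` image. It assumes the image's documented conventions: the `DS_LICENSE=accept` environment variable, and the `-s`, `-k`, and `-g` start-up flags for Search, Analytics, and Graph. The container name and the version tag are placeholders, not values taken from the demo repository.

```python
import shlex


def dse_server_run_command(name, tag, search=False, analytics=False, graph=False):
    """Build a `docker run` argument list for the datastax/dse-server image."""
    cmd = [
        "docker", "run",
        "-e", "DS_LICENSE=accept",  # the image expects the license to be accepted
        "--name", name,
        "-d", f"datastax/dse-server:{tag}",
    ]
    # Arguments after the image name are passed to DSE to enable workloads.
    if search:
        cmd.append("-s")
    if analytics:
        cmd.append("-k")
    if graph:
        cmd.append("-g")
    return cmd


if __name__ == "__main__":
    # "my-dse" and "<version-tag>" are placeholders; use a tag that exists on Docker Hub.
    cmd = dse_server_run_command(
        "my-dse", "<version-tag>", search=True, analytics=True, graph=True
    )
    print(shlex.join(cmd))
```

Returning a list instead of a single string means the command can be passed straight to `subprocess.run(cmd, check=True)` without shell quoting issues, once Docker is installed and a real tag is supplied.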
Demo
● https://github.com/yTek01/Getting-Started-with-DSE-on-Docker
Resources
● https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/newFeatures.html
● https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/dseGettingStarted.html
● https://docs.datastax.com/en/dse/6.0/dse-arch/datastax_enterprise/dbArch/archGraphSimilarDiff.html
● https://github.com/datastax/docker-images
● https://github.com/roberd13/Getting-Started-With-DSE-and-Docker
● https://docs.docker.com/engine/install/
Strategy: Scalable Fast Data
Architecture: Cassandra, Spark, Kafka
Engineering: Node, Python, JVM, CLR
Operations: Cloud, Container
Rescue: Downtime!! I need help.
www.anant.us | solutions@anant.us | (855) 262-6826
3 Washington Circle, NW | Suite 301 | Washington, DC 20037
