Version 1.0
StreamSets and Cassandra
In Cassandra Lunch #94, Arpan Patel will discuss how to connect
StreamSets and Cassandra.
Arpan Patel
Engineer @ Anant
StreamSets
● Data Integration Platform Built for
DataOps
● Build streaming, batch, CDC, ETL, and
ML pipelines from a single UI and
deploy data and workloads to any cloud
● DataOps Platform
○ Free tier (no credit card required) with
Data Collector Engine, Transformer
Engine, Control Hub
○ Self Managed Deployments via
Docker
○ 2 active jobs, 2 active users, 10
published pipelines
StreamSets
● Control Hub
● Data Collector Engine
○ Open-source
● Transformer Engine
○ Can natively execute on Apache Spark,
Snowflake, AWS EMR, Google Cloud
Dataproc, and Databricks platforms
● Pre-built connectors and native integrations
○ Applications, Big Data, SQL/NoSQL DBs,
Storage/Warehouses, Streaming
○ Tons of Sources + Destinations
● StreamSets Academy + Tutorials
Cassandra Connector
● https://docs.streamsets.com/platform-datacollector/latest/datacollector/UserGuide/Destinations/Cassandra.html
● Supported Apache Cassandra Versions
○ 1.2, 2.x, 3.x
● Additional Auth
○ DSE
○ Kerberos
○ SSL
○ TLS
● Batch Writes
○ Logged -> uses the distributed batch log; the batch is atomic
○ Unlogged -> can write partial batches of records to
Cassandra
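To make the logged/unlogged distinction concrete, here is a minimal sketch of how the two batch types differ at the CQL level. It only builds statement text and does not connect to Cassandra or StreamSets; the helper name `build_batch` and the table `demo.readings` are hypothetical and not taken from the slides.

```python
from typing import Iterable


def build_batch(statements: Iterable[str], logged: bool = True) -> str:
    """Wrap CQL statements in a logged or unlogged BATCH (hypothetical helper)."""
    # In CQL, a plain BEGIN BATCH is logged; UNLOGGED must be requested explicitly.
    header = "BEGIN BATCH" if logged else "BEGIN UNLOGGED BATCH"
    # Normalize each statement so it ends with exactly one semicolon.
    body = ["  " + s.strip().rstrip(";") + ";" for s in statements]
    return "\n".join([header, *body, "APPLY BATCH;"])


# Hypothetical table and values, for illustration only.
inserts = [
    "INSERT INTO demo.readings (id, value) VALUES (1, 10.5)",
    "INSERT INTO demo.readings (id, value) VALUES (2, 11.0)",
]

logged_cql = build_batch(inserts)
unlogged_cql = build_batch(inserts, logged=False)
print(logged_cql)
print(unlogged_cql)
```

In the Cassandra destination, the batch type is a configuration choice rather than hand-written CQL; the sketch only shows what that choice corresponds to in the query language.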
Demo
● Spin up Data Collector Deployment from Control Hub + Docker
● Get CSV data from GitHub
● Make data transformations + arithmetic
● Spin Up Cassandra on Docker + Connect in Control Hub
● Write to Cassandra
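The demo steps above (read CSV, transform with some arithmetic, write to Cassandra) can be sketched outside StreamSets in plain Python. This is an illustration of the data flow, not the pipeline built in the talk: the sample CSV, the column names, the `demo.orders` table, and the helpers `transform` and `to_insert` are all assumptions, and nothing is actually sent to a database.

```python
import csv
import io

# Hypothetical sample data standing in for the CSV fetched from GitHub.
SAMPLE_CSV = """id,name,price,quantity
1,widget,2.50,4
2,gadget,10.00,3
"""


def transform(row: dict) -> dict:
    """Convert field types and derive a total (the 'arithmetic' step)."""
    price = float(row["price"])
    quantity = int(row["quantity"])
    return {
        "id": int(row["id"]),
        "name": row["name"].strip().upper(),
        "total": round(price * quantity, 2),
    }


def to_insert(record: dict, table: str = "demo.orders") -> tuple:
    """Return a parameterized CQL string and its bound values (not executed)."""
    cql = f"INSERT INTO {table} (id, name, total) VALUES (%s, %s, %s)"
    return cql, (record["id"], record["name"], record["total"])


records = [transform(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
statements = [to_insert(r) for r in records]

for cql, values in statements:
    print(cql, values)
```

In the StreamSets pipeline these roles are filled by an origin stage, processor stages, and the Cassandra destination, each configured from the Control Hub UI rather than written as code.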
Strategy: Scalable Fast Data
Architecture: Cassandra, Spark, Kafka
Engineering: Node, Python, JVM, CLR
Operations: Cloud, Container
Rescue: Downtime!! I need help.
www.anant.us | solutions@anant.us | (855) 262-6826
3 Washington Circle, NW | Suite 301 | Washington, DC 20037

Apache Cassandra Lunch #94: StreamSets and Cassandra
