Big Data Logging Pipeline with Apache Spark and Kafka
Shipping YaaS logs with Apache Spark and Kafka
Dogukan Sonmez
Senior Software Engineer @hybris Software
@dogukansonmez
Agenda
• Introduction to YaaS
• Architecture of logging pipeline
• Technology behind logging pipeline
• Challenges
• Recap
• Q&A
What is YaaS
SAP hybris as a Service (YaaS)
A microservice-based business PaaS
Integrated with hybris and SAP Solutions
Build
Publish
Fast
yaas.io
Architecture of Logging pipeline
Technology behind logging pipeline
High Throughput messaging
Broker
Distributed
Scalable
Fault Tolerant
Topic
Partition
Replicated
Offset
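The terms on this slide (topic, partition, offset) can be illustrated with a toy in-memory model. This is only a sketch for intuition, not Kafka's implementation: the class `ToyTopic` and its methods are hypothetical names, and the CRC32 hash is a stand-in for whatever partitioner a real client uses.

```python
import zlib


class ToyTopic:
    """A topic modeled as a fixed number of append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.logs = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stand-in hash (assumption): real Kafka clients use their own partitioner.
        return zlib.crc32(key.encode("utf-8")) % len(self.logs)

    def append(self, key, value):
        """Append a record; return the (partition, offset) it was written to."""
        partition = self.partition_for(key)
        log = self.logs[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition, offset):
        """Return every record in the partition from the given offset onward."""
        return self.logs[partition][offset:]


topic = ToyTopic("logs", num_partitions=3)
first = topic.append("service-a", "started")
second = topic.append("service-a", "ready")
```

Records with the same key land in the same partition, and the offset is simply the record's position in that partition's log, which is why a consumer only needs to remember one number per partition to resume reading.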
Technology behind logging pipeline
Micro-batching
RDD
Streaming
DAG
Reliable
ML
Scalable
Graph
Fast
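Micro-batching, the first keyword on this slide, means cutting a continuous stream into small batches by time interval and processing each batch as a unit. The sketch below shows the idea in plain Python; `micro_batches` is a hypothetical helper, the sample events are invented, and unlike Spark Streaming it skips intervals in which nothing arrived.

```python
from collections import defaultdict


def micro_batches(events, batch_interval_s):
    """Group (timestamp_s, payload) events into fixed-width time batches.

    Returns the non-empty batches in time order.
    """
    batches = defaultdict(list)
    for timestamp_s, payload in events:
        batch_index = int(timestamp_s // batch_interval_s)
        batches[batch_index].append(payload)
    return [batches[index] for index in sorted(batches)]


# Invented sample data: four log lines arriving over roughly five seconds.
events = [(0.2, "a"), (1.7, "b"), (2.1, "c"), (5.0, "d")]
batches = micro_batches(events, batch_interval_s=2)
```

With a two-second interval, the first two events share a batch and the remaining two fall into separate batches.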
Big Data pipeline challenges
Reliability of Kafka
BEFORE
• 3 brokers
• 3 ZooKeeper instances
• default.replication.factor=2
• Mainly default configuration
AFTER
• 5 brokers
• 5 ZooKeeper instances
• unclean.leader.election.enable=false
• min.insync.replicas=2
• default.replication.factor=3
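The effect of the BEFORE and AFTER settings can be worked out with a little arithmetic. The sketch below assumes producers write with acks=all (the slide does not say) and assumes the BEFORE cluster used Kafka's default min.insync.replicas of 1, since it ran "mainly" on defaults. The function names are hypothetical.

```python
def zookeeper_failures_tolerated(ensemble_size):
    """A ZooKeeper ensemble stays available while a strict majority is up."""
    return (ensemble_size - 1) // 2


def replica_failures_tolerated_for_writes(replication_factor, min_insync_replicas):
    """Replicas of a partition that can be lost while acks=all writes still succeed."""
    return replication_factor - min_insync_replicas


def min_copies_of_acked_write(min_insync_replicas):
    """Fewest replicas guaranteed to hold a record once an acks=all write is acknowledged."""
    return min_insync_replicas


# min.insync.replicas=1 for BEFORE is an assumption (the Kafka default).
BEFORE = {"zookeepers": 3, "default.replication.factor": 2, "min.insync.replicas": 1}
AFTER = {"zookeepers": 5, "default.replication.factor": 3, "min.insync.replicas": 2}

for label, cfg in (("BEFORE", BEFORE), ("AFTER", AFTER)):
    print(
        label,
        "zk failures tolerated:", zookeeper_failures_tolerated(cfg["zookeepers"]),
        "| replica failures tolerated for writes:",
        replica_failures_tolerated_for_writes(
            cfg["default.replication.factor"], cfg["min.insync.replicas"]
        ),
        "| min copies of an acked write:",
        min_copies_of_acked_write(cfg["min.insync.replicas"]),
    )
```

Under these assumptions both setups keep accepting writes with one replica down, but only the AFTER setup guarantees that an acknowledged record exists on at least two replicas. Disabling unclean leader election additionally prevents an out-of-sync replica from becoming leader and discarding acknowledged records.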
Big Data pipeline challenges
Spark Streaming Checkpointing
BEFORE
• Spark checkpointing
• All RDDs serialized and stored in HDFS
AFTER
• Custom Kafka checkpointing (only the latest offset stored in Kafka)
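The AFTER approach, persisting only the latest processed offset instead of serialized RDDs, can be sketched without any Spark or Kafka dependency. This is an in-memory simulation under stated assumptions, not the speaker's implementation: `OffsetStore` stands in for wherever the offset is actually kept (the slide says Kafka), and `run_batch` is a hypothetical driver loop.

```python
class OffsetStore:
    """Keeps only the next offset to read for each (topic, partition)."""

    def __init__(self):
        self._next_offset = {}

    def load(self, topic, partition):
        # Start from offset 0 when nothing has been checkpointed yet.
        return self._next_offset.get((topic, partition), 0)

    def save(self, topic, partition, next_offset):
        self._next_offset[(topic, partition)] = next_offset


def run_batch(log, store, topic, partition, max_records, sink):
    """Process one batch, then checkpoint. Returns the number of records handled."""
    start = store.load(topic, partition)
    batch = log[start:start + max_records]
    for record in batch:
        sink.append(record)  # side effect first (e.g. indexing the log line)
    # Checkpoint only after the whole batch succeeded. A crash before this
    # line replays the batch on restart, i.e. at-least-once delivery.
    store.save(topic, partition, start + len(batch))
    return len(batch)


partition_log = ["l0", "l1", "l2", "l3", "l4"]  # invented sample records
store = OffsetStore()
indexed = []
run_batch(partition_log, store, "logs", 0, max_records=2, sink=indexed)
# A "restarted" job needs only the store to continue where the last one stopped.
run_batch(partition_log, store, "logs", 0, max_records=2, sink=indexed)
```

Because the checkpoint is a single number per partition, it stays small and does not depend on the serialized form of the job's classes, at the cost of having to tolerate replayed records downstream.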
Big Data pipeline challenges
Elasticsearch indexing big data
BEFORE
• Default mapping
• index.refresh_interval = 1s
• indices.memory.index_buffer_size = 10%
AFTER
• Custom mapping with disabled norms
• Mapping using simple analyzer
• index.refresh_interval = 30s
• indices.memory.index_buffer_size = 30%
• spark.streaming.kafka.maxRatePerPartition=10000
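As a rough illustration of the AFTER settings, the sketch below builds an index-creation body and computes the batch-size ceiling implied by the Spark rate limit. Several things here are assumptions: the field name `message` is hypothetical, the mapping uses the typeless syntax of recent Elasticsearch releases (the version used in the talk is not stated, and older releases express disabled norms differently), and the partition count and batch interval in the example are invented.

```python
import json


def log_index_body(refresh_interval="30s"):
    """Index settings and mapping with a slower refresh, simple analyzer, no norms."""
    return {
        "settings": {"index": {"refresh_interval": refresh_interval}},
        "mappings": {
            "properties": {
                # "message" is a hypothetical field name for the log line.
                "message": {"type": "text", "analyzer": "simple", "norms": False},
            }
        },
    }


def max_records_per_batch(max_rate_per_partition, partitions, batch_interval_s):
    """Upper bound on records per batch.

    spark.streaming.kafka.maxRatePerPartition is a per-second, per-partition
    limit, so the ceiling scales with partition count and batch length.
    """
    return max_rate_per_partition * partitions * batch_interval_s


body_json = json.dumps(log_index_body(), indent=2)
# Invented example: 8 partitions and a 5-second batch interval.
ceiling = max_records_per_batch(10000, partitions=8, batch_interval_s=5)
```

Note that indices.memory.index_buffer_size is a node-level setting configured in elasticsearch.yml, so it does not appear in the index body. Capping the read rate keeps a backlog in Kafka from turning into an oversized bulk load on Elasticsearch after downtime.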
Recap
Q&A
hackingat.hybris.com

