Data Engineering with Protobuf
Protocol buffers
As a Data Engineer
Who is this guy talking?
Football
4+ years as a Data Engineer
2 years at SafetyCulture
Protocol Buffers
According to Google:
- Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data.
Simple Example
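The slide's own example did not survive as text. As a stand-in, here is a minimal Python sketch of what "serializing structured data" means on the wire. It hand-rolls the Protobuf varint and tag encoding for a hypothetical message `Device { int32 id = 1; string name = 2; }`. The message, its field names, and the helper functions are assumptions made for illustration, not the speaker's code; in practice the generated classes do this for you.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Tag is (field_number << 3) | wire_type; wire type 0 means varint."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Wire type 2 means length-delimited: tag, byte length, then the bytes."""
    payload = text.encode("utf-8")
    tag = encode_varint((field_number << 3) | 2)
    return tag + encode_varint(len(payload)) + payload


# Hypothetical Device message: id = 150, name = "tablet"
encoded = encode_int_field(1, 150) + encode_string_field(2, "tablet")
print(encoded.hex())
```

Field names never appear in the payload, only field numbers, which is a large part of why the encoding is compact.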
Why Protocol Buffers?
Company Strategy
[Diagram labels: Service A, DB, Service B, Data Lake, Connector, Redshift, Consumer, Consumer]
Company Strategy
[Diagram labels: Service A, DB, Service B, Data Lake, Redshift, DB, Consumer, Consumer, PROTOBUF, PROTOBUF]
How does it work?
Code Sample
[Diagram labels: Device.proto, Code Generation, Legacy BE, New BE, Mobile, Data Eng. Stack]
Code Sample
How do we manage it?
[Diagram labels: Protobuf Schema Repo, Java Code Repo, Data Engineer, Message Crew, Build, Commit Code]
Data Processing
[Diagram labels: Data Staging, Consumer, Data Lake, Java Protobuf / ScalaPB, Parquet, Redshift Spectrum, Kafka Streams, Athena]
Performance
- Performance: http://bit.ly/2mlQbkN
Performance
- Comparing different JVM libs: http://bit.ly/2moTnw9
The Good Things!
- Performance!
- Most problems are found at compile time
- Fewer test cases
- Schema changes are smooth
- Add-ons and validations go directly into the schema!
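One reason schema changes tend to be smooth is that a reader built against an older schema can skip field numbers it does not know. The Python sketch below illustrates that idea with a hand-written decoder; the message layout (fields 1 and 2, plus a newer field 3) is hypothetical, and only wire types 0 and 2 are handled. Real Protobuf runtimes do this inside the generated code, and typically retain unknown fields rather than dropping them as this sketch does.

```python
def read_varint(buf: bytes, pos: int):
    """Read a base-128 varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:          # high bit clear: this was the last byte
            return result, pos
        shift += 7


def decode_known_fields(buf: bytes, known: set) -> dict:
    """Return {field_number: value} for known fields, skipping all others."""
    fields = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:           # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:         # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number in known:    # unknown field numbers are silently skipped
            fields[field_number] = value
    return fields


# Bytes from a newer producer: field 1 = 150, field 2 = "tablet", new field 3 = 7
new_message = bytes.fromhex("08960112067461626c65741807")

old_reader = decode_known_fields(new_message, known={1, 2})
new_reader = decode_known_fields(new_message, known={1, 2, 3})
print(old_reader)
print(new_reader)
```

The old reader still gets fields 1 and 2 intact, so producers can add fields without breaking existing consumers, provided field numbers are never reused.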
The Not-So-Good Things
- Lack of tooling for Kafka; you will need to write your own SerDe
- No schema registry support for Protobuf in Confluent Kafka
- KSQL: nah!
- Lack of tooling for Spark, but that is not too bad with ScalaPB
- AWS tools (Athena, Glue) don't support Protobuf natively
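To give a rough idea of what "write your own SerDe" involves, here is a small Python sketch of a serializer/deserializer pair that wraps a Protobuf message class. It relies on the `SerializeToString()` and `FromString()` methods that generated Python message classes provide. `ProtobufSerde` and `FakeDevice` are hypothetical names, and `FakeDevice` is only a stand-in so the example runs without generated code. With the Kafka Java client, the equivalent would implement the `Serializer` and `Deserializer` interfaces instead.

```python
from typing import Generic, Optional, Type, TypeVar

M = TypeVar("M")


class ProtobufSerde(Generic[M]):
    """Hypothetical SerDe: converts between message objects and raw bytes."""

    def __init__(self, message_class: Type[M]):
        self._message_class = message_class

    def serialize(self, message: Optional[M]) -> Optional[bytes]:
        # Kafka uses null values as tombstones, so pass None through untouched.
        if message is None:
            return None
        return message.SerializeToString()

    def deserialize(self, data: Optional[bytes]) -> Optional[M]:
        if data is None:
            return None
        return self._message_class.FromString(data)


class FakeDevice:
    """Stand-in for a generated message class (assumption, not real Protobuf)."""

    def __init__(self, name: str = ""):
        self.name = name

    def SerializeToString(self) -> bytes:
        # A real generated class would emit Protobuf wire format here.
        return self.name.encode("utf-8")

    @classmethod
    def FromString(cls, data: bytes) -> "FakeDevice":
        return cls(data.decode("utf-8"))


serde = ProtobufSerde(FakeDevice)
payload = serde.serialize(FakeDevice("tablet"))
restored = serde.deserialize(payload)
print(payload, restored.name)
```

Keeping the message class as a constructor argument means one SerDe implementation can serve every topic, with only the class changing per topic.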
Conclusion
Q & A