Kafka and Kafka Streams in the Global Schibsted Data Platform
Fredrik Vraalsen, 16.10.2018
WHO AM I?
Fredrik Vraalsen
• Data Platform Architect, City of Oslo
• Former Data Engineer, Data Platform
SCHIBSTED
22 countries
200 million users/month
20 billion pageviews/month
SCHIBSTED PRODUCTS & TECHNOLOGY
A BIT OF HISTORY…
DATA PIPELINE
[Diagram: Collector, Kinesis, Batch, Storage (S3), Piper]
STREAM PROCESSING
[Diagram: Collector, Kinesis, Batch, Storage (S3), Piper]
INCOMING EVENTS
THE ISSUES
• “High” latency (~30 seconds)
• Delivery guarantees
• On-boarding
• Manual configuration
• Homegrown solution
VISION
• Data-driven applications
• Self-serve
• GDPR
• Performance
• State of the art
WHY?
STREAM PROCESSING
• Lightweight library
• Streams and tables
• High-level DSL
• Low-level API
KAFKA STREAMS
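The "Streams and tables" bullet refers to the stream-table duality: a table can be read as the result of folding a stream of updates, keeping only the latest value per key. The plain-Python sketch below illustrates that idea only; it is not the Kafka Streams API (which models this with `KStream` and `KTable`), and the keys and values are made up. It follows Kafka's convention that a record with a null value (here `None`) is a tombstone that deletes the key.

```python
from typing import Dict, Iterable, Optional, Tuple

# A changelog record is a (key, value) pair. Following Kafka's convention
# for tables and compacted topics, a value of None is a tombstone.
Record = Tuple[str, Optional[str]]


def materialize(changelog: Iterable[Record]) -> Dict[str, str]:
    """Fold a stream of updates into a table holding the latest value per key."""
    table: Dict[str, str] = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)  # tombstone: delete the key if present
        else:
            table[key] = value    # a newer value replaces the older one
    return table


# Hypothetical stream of "user -> city" updates.
events = [
    ("user-1", "Oslo"),
    ("user-2", "Stockholm"),
    ("user-1", "Bergen"),
    ("user-2", None),
]
print(materialize(events))  # {'user-1': 'Bergen'}
```

Going the other way, emitting every change made to the table as a (key, value) record yields a stream again, which is why the two views are interchangeable.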
YGGDRASIL WAS BORN
OLD STREAMING PIPELINE
[Diagram: Storage, Piper]
NEW STREAMING PIPELINE
[Diagram: Storage, Piper, Yggdrasil, Storage]
GETTING DATA IN & OUT
https://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines/
GETTING DATA IN & OUT
http://kafka.apache.org/documentation.html
GETTING DATA IN & OUT
[Diagram: Duratro, Yggdrasil, Storage; event firehose, sink topics]
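In this diagram, events flow from an event firehose into sink topics. The routing step can be sketched as a fan-out by predicate: each sink topic declares which events it wants, and every event is copied to each matching sink. This is a minimal plain-Python sketch under assumptions, not Schibsted's implementation; the topic names, the `type` field and the predicates are hypothetical (the `spt:userId` field is borrowed from the transformation example later in the deck).

```python
from typing import Any, Callable, Dict, Iterable, List

Event = Dict[str, Any]
Predicate = Callable[[Event], bool]

# Hypothetical routing table: sink topic name -> predicate selecting the
# firehose events that belong in that topic.
ROUTES: Dict[str, Predicate] = {
    "sink.pageviews": lambda e: e.get("type") == "View",
    "sink.logged-in": lambda e: bool((e.get("actor") or {}).get("spt:userId")),
}


def route(firehose: Iterable[Event], routes: Dict[str, Predicate]) -> Dict[str, List[Event]]:
    """Copy each firehose event to every sink topic whose predicate matches it."""
    sinks: Dict[str, List[Event]] = {name: [] for name in routes}
    for event in firehose:
        for name, matches in routes.items():
            if matches(event):
                sinks[name].append(event)
    return sinks


# Made-up sample events.
firehose = [
    {"type": "View", "actor": {"spt:userId": "user-42"}},
    {"type": "Click"},
    {"type": "View", "actor": {}},
]
for topic, routed in route(firehose, ROUTES).items():
    print(topic, len(routed))
# sink.pageviews 2
# sink.logged-in 1
```

In a Kafka Streams application the same shape would typically be one `filter` on the firehose `KStream` per sink, each followed by `to(topic)`.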
3RD PARTY ANALYTICS
DATA QUALITY
DATA-DRIVEN APPLICATIONS
https://www.slideshare.net/DataStax/c-for-deep-learning-andrew-jefferson-tracktable-cassandra-summit-2016
https://www.flickr.com/photos/rahulrodriguez/14683524180
https://pixabay.com/en/map-photoshop-geolocation-journey-947471/
GROWING PAINS
SCALING UP
BUMPY RIDE
https://pixabay.com/no/veien-pukler-fremover-veiskilt-246/
CHALLENGES & EXPERIENCES
http://www.publicdomainpictures.net/view-image.php?image=6884
https://pixabay.com/en/software-testing-service-762486/
https://pixabay.com/no/brett-l%C3%A6r-note-ferdigheter-597190/
99.5%
99.99976%
CHALLENGES & EXPERIENCES
100%
99.99992%
SELF-SERVE
• Challenge: Transformations and routing
• Multiple configurations
• Who maintains?
• Required Scala knowledge
{
"time": round(parse-time(.published, "yyyy-MM-dd'T'HH:mm:ssX") * 1000),
"device_manufacturer": .device.manufacturer,
"device_model": .device.model,
"language": .device.acceptLanguage,
"os_name": .device.osType,
"os_version": .device.osVersion,
"platform": .device.platformType,
"user_properties": {
"is_logged_in" : boolean(.actor."spt:userId")
}
}
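To make the semantics of the JSLT mapping above concrete, here is a rough plain-Python equivalent. It is an illustrative approximation under stated assumptions, not how JSLT evaluates expressions: it assumes `published` is always present and carries a `Z` or numeric offset such as `+01:00` (what Python's `%z` accepts), it mimics JSLT's default of leaving out object keys whose value is null, and the example event is made up.

```python
from datetime import datetime
from typing import Any, Dict


def _drop_nulls(obj: Dict[str, Any]) -> Dict[str, Any]:
    # Mimics JSLT's default of omitting object keys whose value is null.
    return {k: v for k, v in obj.items() if v is not None}


def transform(event: Dict[str, Any]) -> Dict[str, Any]:
    """Plain-Python approximation of the JSLT mapping (illustrative only)."""
    device = event.get("device") or {}
    actor = event.get("actor") or {}

    # parse-time(.published, "yyyy-MM-dd'T'HH:mm:ssX") yields epoch seconds;
    # the JSLT multiplies by 1000 and rounds, giving epoch milliseconds.
    published = datetime.strptime(event["published"], "%Y-%m-%dT%H:%M:%S%z")

    return _drop_nulls({
        "time": round(published.timestamp() * 1000),
        "device_manufacturer": device.get("manufacturer"),
        "device_model": device.get("model"),
        "language": device.get("acceptLanguage"),
        "os_name": device.get("osType"),
        "os_version": device.get("osVersion"),
        "platform": device.get("platformType"),
        "user_properties": {
            # JSLT boolean(): null, "", 0, [], {} and false become false;
            # Python's bool() treats these values the same way.
            "is_logged_in": bool(actor.get("spt:userId")),
        },
    })


# Made-up input event.
example = {
    "published": "2018-10-16T12:00:00Z",
    "device": {"manufacturer": "Apple", "osType": "iOS"},
    "actor": {"spt:userId": "user-42"},
}
print(transform(example))
```

Fields missing from the input (here `model`, `acceptLanguage`, `osVersion`, `platformType`) simply do not appear in the output, while `is_logged_in` is always present because `boolean()` never returns null.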
SELF-SERVE
• JSLT – DSL for JSON transformation & queries
https://github.com/schibsted/jslt
SELF-SERVE
Data Platform – Oslo Origo
Oslo Origo
• Digital transformation
• Smarter services
• Data driven
City of Oslo
673,469
53,000
50+
200,000
Data Platform – Oslo Origo
https://www.flickr.com/photos/boaski/8079390195
Data Platform – Oslo Origo
• Creating value from our data
• Awareness of opportunities
• Simple and safe data access
• Insights and decision making
• Collaboration and sharing
THANK YOU!
Fredrik Vraalsen
@fredriv
fredrik@vraalsen.no
