Kafka Summit SF 2017 - Accelerating Particles to Explore the Mysteries of the Universe and How Kafka Can Help on That
ACCELERATING PARTICLES TO
EXPLORE THE MYSTERIES OF THE UNIVERSE
AND HOW KAFKA CAN HELP ON THAT
Manuel Martín Márquez
CERN – Scalable Analytics Services
Kafka Summit 2017 – San Francisco
CERN
EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH
A WORLDWIDE COLLABORATION
Manuel Martin Marquez
Intel IoT Ignition Lab – Cloud and Big Data
Munich, September 17th
CERN - European Laboratory for Particle Physics
A World-Wide Collaboration
WHAT IS THE UNIVERSE MADE OF?
HOW DID IT START?
FUNDAMENTAL RESEARCH
8/9/17
WORLD’S MOST COMPLEX SCIENTIFIC EXPERIMENT
THE MOST POWERFUL PARTICLE ACCELERATOR
CERN Aerial View
World’s largest scientific instrument
27km, 6000+ superconducting magnets
Emptiest place in the solar system
High vacuum inside the magnets
Hottest spot in the galaxy
Lead-ion collisions create temperatures 100 000x hotter than the heart of the Sun
Fastest racetrack on Earth
Protons circulate 11245 times/s (99.9999991% of the speed of light)
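The revolution rate quoted above can be cross-checked from the beam speed, since turns per second = (fraction of light speed × c) / ring circumference. The following is a minimal sketch, assuming a ring circumference of 26 659 m (a value not given on the slide, which rounds it to 27 km); the constant and function names are illustrative only.

```python
SPEED_OF_LIGHT_M_S = 299_792_458   # exact, by definition of the metre
LHC_CIRCUMFERENCE_M = 26_659       # assumption: not on the slide, which says "27km"
BETA = 0.999999991                 # fraction of light speed quoted on the slide


def revolutions_per_second(beta: float, circumference_m: float) -> float:
    """Number of laps per second for a particle moving at beta * c around the ring."""
    return beta * SPEED_OF_LIGHT_M_S / circumference_m


turns = revolutions_per_second(BETA, LHC_CIRCUMFERENCE_M)
print(f"about {int(turns)} turns per second")
```

With the assumed circumference the result agrees with the 11245 times/s on the slide; using the rounded 27 km instead would give a noticeably lower figure.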
LHC Installation
The CERN Accelerator Complex is a unique installation
Therefore, we have to face unique challenges
Control and Operations
Millions of sensors, a large number of control devices, front-end equipment, etc.
Many critical systems: Cryogenics, Vacuum, Machine Protection, etc.
CMS Detector
150 million sensors
Control and detection sensors
Massive 3D camera
Capturing 40+ million collisions per second
ON THE LIMIT OF SCIENCE
TOWARDS HIGHER LUMINOSITY
FUTURE CIRCULAR COLLIDER
Post-LHC accelerator projects (80-100 km)
HOW KAFKA CAN SUPPORT US?
CERN ACCELERATOR LOGGING
MONITORING
COMPUTING SECURITY
PHYSICS JOBS MONITORING AND ANALYSIS
PHYSICS DATA?
[Chart: Data ingestion per day (GB); y-axis 0 to 800; x-axis tick 2014]
... requests per day. Direct SQL access is not permitted.
• A generic Java GUI called TIMBER is also provided as a means to visualize and extract logged data. The tool is heavily used, with more than 800 active users.
Figure 2: Logging service architecture overview.
The Java APIs for both logging and extracting data are ...
Filters for data reduction
~ 250’000 signals
~ 50 data loading processes
~ 5.5 billion records per day
~ 275 GB / day
→ 100 TB / year throughput
~ 1 million signals
~ 300 data loading processes
~ 4 billion records per day
~ 160 GB / day
→ 52 TB / year stored
• +800 extraction clients
• +5 million extraction requests per day
• 130 custom applications
Credit: BE-CO-DS
CERN ACCELERATOR LOGGING
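The daily figures on this slide can be turned into an average record size and a naive annual volume. The sketch below is only a back-of-the-envelope check, assuming decimal units (1 GB = 10^9 bytes, 1 TB = 1000 GB) and a rate sustained for 365 days; the dictionary keys and function names are illustrative and do not come from the slide.

```python
# Approximate figures as quoted on the slide (marked "~" there).
PIPELINES = {
    "throughput": {"records_per_day": 5.5e9, "gb_per_day": 275},
    "stored": {"records_per_day": 4.0e9, "gb_per_day": 160},
}


def bytes_per_record(records_per_day: float, gb_per_day: float) -> float:
    """Average record size, assuming 1 GB = 1e9 bytes."""
    return gb_per_day * 1e9 / records_per_day


def tb_per_year(gb_per_day: float, days: int = 365) -> float:
    """Naive annual volume: the daily rate sustained for `days` days."""
    return gb_per_day * days / 1000


for name, figures in PIPELINES.items():
    size = bytes_per_record(figures["records_per_day"], figures["gb_per_day"])
    yearly = tb_per_year(figures["gb_per_day"])
    print(f"{name}: {size:.0f} bytes/record, {yearly:.1f} TB/year if sustained")
```

Under these assumptions the first group lands close to the 100 TB / year on the slide. The same naive estimate for the second group comes out higher than the 52 TB / year stored; the slide does not say how that figure was derived.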
[Architecture diagram. Components: Log. Proc. (x2), CCDB, Schema Partition Provider, Kafka, Gobblin, HDFS Storage, Compactor, HBase. Layers: Speed, Batch. Interval labels: 100 ms, 1 min, 7 min]
Credit: BE-CO-DS
KAFKA AND ACCELERATOR LOGGING
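The labels in this diagram (Kafka, Speed, Batch, Compactor, HBase, HDFS) suggest a speed/batch split in the style of a lambda architecture. The slide gives no implementation details, so what follows is only a toy sketch of that general pattern in plain Python: new records land in a speed view, a compaction step moves closed time windows into a batch view, and queries merge both. `LambdaStore`, its methods, and the 60-second window are hypothetical, and in-memory containers stand in for whichever stores back each layer.

```python
from collections import defaultdict


class LambdaStore:
    """Toy speed/batch split; not the CERN implementation."""

    def __init__(self, batch_window_s: int = 60):
        self.batch_window_s = batch_window_s
        self.speed = []                 # stand-in for the low-latency layer
        self.batch = defaultdict(list)  # stand-in for files keyed by time window

    def ingest(self, timestamp_s: int, signal: str, value: float) -> None:
        """New records are visible immediately through the speed layer."""
        self.speed.append((timestamp_s, signal, value))

    def compact(self, now_s: int) -> int:
        """Move records from closed windows into the batch layer; return the count moved."""
        current_window = now_s // self.batch_window_s
        keep, moved = [], 0
        for record in self.speed:
            window = record[0] // self.batch_window_s
            if window < current_window:
                self.batch[window].append(record)
                moved += 1
            else:
                keep.append(record)
        self.speed = keep
        return moved

    def query(self, signal: str, start_s: int, end_s: int) -> list:
        """Merge both layers so callers see one continuous, time-ordered series."""
        batch_records = [r for records in self.batch.values() for r in records]
        rows = [r for r in batch_records + self.speed
                if r[1] == signal and start_s <= r[0] < end_s]
        return sorted(rows)


store = LambdaStore(batch_window_s=60)
for t, v in [(10, 1.0), (50, 1.5), (70, 2.0)]:
    store.ingest(t, "magnet.temperature", v)
store.compact(now_s=65)  # the window covering 0-59 s is closed and moves to batch
print(store.query("magnet.temperature", 0, 120))
```

The point of the split is that readers never need to know which layer holds a record: the query path hides the hand-over between recent and compacted data.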
Database Futures Workshop (May 2017) it-db-nile-admins@cern.ch
Credit: IT-CM-MM
KAFKA BASED MONITORING
System architecture
KAFKA BASED SECURITY OPERATION CENTRE
HETEROGENEOUS ENVIRONMENTS
KAFKA ON DEMAND SERVICE
IaaS CLOUD
On-demand service approach
• Self-Service Cluster Management
  • Creation, Deletion, Extension
  • Parameters configuration
• Self-Service Topics Management
  • Configuration, ACLs
• Dedicated and shared clusters depending on the use case
• Cluster and Topic Owners have administrative rights
  • But no access to the underlying hardware
• Best effort support
KAFKA AS A SERVICE
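The self-service model on this slide (owners create, extend, and delete their own clusters and manage topic configuration) can be illustrated with a small in-memory model. This is a hypothetical sketch, not the CERN service or its API: `Cluster`, `ClusterRegistry`, and every method name are invented for illustration, and the model simplifies the slide by giving rights to a single cluster owner. The only real Kafka name used is the topic-level setting `retention.ms`.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    owner: str
    brokers: int
    shared: bool = False
    topics: dict = field(default_factory=dict)  # topic name -> config dict


class ClusterRegistry:
    """Hypothetical self-service registry: only the owner may change a cluster."""

    def __init__(self):
        self._clusters = {}

    def create(self, name: str, owner: str, brokers: int = 3, shared: bool = False) -> Cluster:
        if name in self._clusters:
            raise ValueError(f"cluster {name!r} already exists")
        self._clusters[name] = Cluster(name, owner, brokers, shared)
        return self._clusters[name]

    def _owned(self, name: str, user: str) -> Cluster:
        cluster = self._clusters[name]
        if cluster.owner != user:
            raise PermissionError(f"{user!r} does not own {name!r}")
        return cluster

    def extend(self, name: str, user: str, extra_brokers: int) -> int:
        cluster = self._owned(name, user)
        cluster.brokers += extra_brokers
        return cluster.brokers

    def set_topic(self, name: str, user: str, topic: str, config: dict) -> None:
        self._owned(name, user).topics[topic] = dict(config)

    def delete(self, name: str, user: str) -> None:
        self._owned(name, user)
        del self._clusters[name]


registry = ClusterRegistry()
registry.create("logging-dev", owner="alice")
registry.extend("logging-dev", "alice", extra_brokers=2)
registry.set_topic("logging-dev", "alice", "signals", {"retention.ms": "86400000"})
```

Keeping the ownership check in one helper means every administrative operation enforces the same rule, which mirrors the slide's separation between administrative rights and access to the underlying hardware.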
KAFKA AS A SERVICE
Development and Supported Functionalities
• Transparent integration with the rest of CERN services
• Storage: different storage systems depending on the requirements
• Cloud based (OpenStack) but also bare metal as an option
• Monitoring capabilities
• Cluster management RestAPI and Web interface
• Kafka Mirroring for high availability
• Security:
  • Kerberos + ACLs
  • SSL
• Support for service continuity in case of hardware failure
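For the security items above, a Kafka client typically needs a handful of settings to authenticate with Kerberos over an encrypted connection. The sketch below builds such a configuration using standard Kafka client property names (`security.protocol`, `sasl.mechanism`, `sasl.kerberos.service.name`, `ssl.truststore.location`). Combining Kerberos and SSL into `SASL_SSL` is an assumption, as are the broker address, truststore path, service name `kafka`, and both helper functions; the slide does not describe the actual client setup.

```python
def kerberos_ssl_client_config(bootstrap_servers: str, truststore_path: str) -> dict:
    """Client properties for Kerberos (GSSAPI) authentication over TLS."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SASL_SSL",        # SASL authentication on an encrypted channel
        "sasl.mechanism": "GSSAPI",             # GSSAPI is the SASL mechanism for Kerberos
        "sasl.kerberos.service.name": "kafka",  # assumption: brokers run under the "kafka" principal
        "ssl.truststore.location": truststore_path,
    }


def to_properties(config: dict) -> str:
    """Render the settings in Java .properties form, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))


# Hypothetical broker address and truststore path, for illustration only.
config = kerberos_ssl_client_config("broker1.example.org:9093", "/etc/kafka/truststore.jks")
print(to_properties(config))
```

ACLs are enforced on the broker side against the authenticated principal, so they do not appear in the client configuration.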
KAFKA AS A SERVICE: KAFKA AND ZOOKEEPER
KAFKA AS A SERVICE: CLOUD BASED
Cloud Layer
KAFKA AS A SERVICE: STORAGE
Cloud Layer
Storage
KAFKA AS A SERVICE: CONFIGURATION MANAGEMENT
Cloud Layer
Storage
Configuration Management
KAFKA AS A SERVICE: CONFIGURATION DB AND APIs
RestAPI
Cloud Layer
Storage
Configuration Management
Configuration Management Meta-Data
KAFKA AS A SERVICE: USER WEB INTERFACE
RestAPI
Cloud Layer
Storage
Configuration Management
Configuration Management Meta-Data
Management Web Interface
KAFKA AS A SERVICE: MONITORING
RestAPI
Cloud Layer
Storage
Configuration Management
Configuration Management Meta-Data
Management Web Interface
Service Monitoring