How we have grown 10x within 2 years

Real-Time Data Processing at RTB House
Bartosz Łoś, 2019

AGENDA
● our RTB platform
● the previous iterations: three different architectures
● the fourth iteration: multi-dc architecture
● our use cases: requirements and processing patterns
● kafka workers

THE 1ST ITERATION: MUTABLE IMPRESSIONS

THE 2ND ITERATION: LAMBDA ARCHITECTURE

THE 3RD ITERATION: IMMUTABLE STREAMS OF EVENTS

THE FOURTH ITERATION: MULTI-DC

THE 4TH ITERATION: MAIN CHANGES
● 10x larger scale:
  ● from 350K to 3.5M bid requests/s within 2 years
● full multi-dc architecture:
  ● synchronization of user profiles
  ● merging streams of events
● fixed partitioning in all DCs:
  ● parallelism, merging, end-to-end lag
● end-to-end exactly-once processing:
  ● at-least-once output semantics & deduplication
● a few better components:
  ● new stats-counter, new data-flow
  ● logstash
  ● merger, dispatcher & loader
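
The pairing of at-least-once output with deduplication can be sketched as follows. This is a minimal illustration, not the RTB House implementation: it assumes every event carries a unique ID, and the class name DeduplicationSketch, the helper RecentIds and the in-memory window size are all hypothetical. A production system would most likely keep this state somewhere durable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DeduplicationSketch {

    /** Bounded memory of recently seen event IDs (oldest entries are evicted first). */
    static final class RecentIds {
        private final Map<String, Boolean> seen;

        RecentIds(final int maxSize) {
            // LinkedHashMap keeps insertion order, so the eldest entry is the oldest ID.
            this.seen = new LinkedHashMap<String, Boolean>() {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                    return size() > maxSize;
                }
            };
        }

        /** Returns true the first time an ID is seen, false for a redelivered duplicate. */
        boolean firstTime(String eventId) {
            if (seen.containsKey(eventId)) {
                return false;
            }
            seen.put(eventId, Boolean.TRUE);
            return true;
        }
    }

    /** Keeps only the first occurrence of each event ID, preserving arrival order. */
    static List<String> deduplicate(List<String> eventIds, RecentIds recent) {
        List<String> unique = new ArrayList<>();
        for (String id : eventIds) {
            if (recent.firstTime(id)) {
                unique.add(id);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        RecentIds recent = new RecentIds(1000);
        // "e2" arrives twice, as an at-least-once producer may do after a retry.
        System.out.println(deduplicate(Arrays.asList("e1", "e2", "e2", "e3"), recent));
    }
}
```

The bounded window is the main design trade-off in this sketch: a duplicate that arrives after its ID has been evicted is no longer detected, so the window has to cover the longest expected redelivery delay.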

THE 4TH ITERATION: MULTI-DC ARCHITECTURE

STATS-COUNTER: STORM TOPOLOGY (THE 2ND ITERATION)

APACHE STORM: TRIDENT + EXACTLY-ONCE STATE

APACHE STORM: PARALLELISM MODEL

DATA-FLOW: KAFKA STREAMS (THE 4TH ITERATION)

KAFKA STREAMS: PARALLELISM MODEL

KAFKA STREAMS: EXACTLY-ONCE DELIVERY
Kafka Streams:
● processing.guarantee = exactly_once
Producer:
● transactions
● enable.idempotence = true
Consumer:
● isolation.level = read_committed
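
The settings above can be written down as plain client properties. The sketch below uses only java.util.Properties and string keys so it runs without the Kafka libraries. The class name ExactlyOnceConfigSketch and the application.id, bootstrap.servers and transactional.id values are placeholders, not values from the talk. According to the Kafka documentation, Kafka Streams applies the producer and consumer settings itself once processing.guarantee is set, so the last two methods matter mainly for plain clients.

```java
import java.util.Properties;

public class ExactlyOnceConfigSketch {

    /** Kafka Streams application settings. */
    static Properties streamsConfig() {
        Properties props = new Properties();
        props.put("application.id", "data-flow-sketch");    // placeholder name
        props.put("bootstrap.servers", "localhost:9092");   // placeholder address
        props.put("processing.guarantee", "exactly_once");
        return props;
    }

    /** Settings for a plain transactional producer. */
    static Properties producerConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder address
        props.put("enable.idempotence", "true");
        // A transactional.id is needed before a producer can use transactions.
        props.put("transactional.id", "sketch-producer-1"); // placeholder id
        return props;
    }

    /** Settings for a consumer that must only see committed transactional writes. */
    static Properties consumerConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder address
        props.put("isolation.level", "read_committed");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsConfig());
        System.out.println(producerConfig());
        System.out.println(consumerConfig());
    }
}
```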

KAFKA WORKERS: MAIN FEATURES
● higher level of distribution

public interface WorkerPartitioner<K, V> {
    int subpartition(ConsumerRecord<K, V> consumerRecord);
}

● possibility to pause and resume processing for a given partition

public interface WorkerTask<K, V> {
    boolean accept(WorkerRecord<K, V> record);
    void process(WorkerRecord<K, V> record, RecordStatusObserver observer);
}

● asynchronous processing
● tighter control of offset commits
● backpressure
● processing timeouts

public interface RecordStatusObserver {
    void onSuccess();
    void onFailure(Exception exception);
}

● at-least-once semantics
● handling failures
● kafka-to-kafka, hdfs, bigquery, elasticsearch connectors
● github.com/RTBHOUSE/kafka-workers
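
How the three interfaces might fit together can be sketched without the library on the classpath. The nested interfaces below only mirror the shapes shown on the slides. SimpleRecord is a simplified stand-in for the library's record types, and KeyHashPartitioner and BufferingTask are hypothetical examples, not part of kafka-workers. The sketch assumes that returning false from accept means "hold this record back for now", which is one way to express backpressure.

```java
import java.util.ArrayList;
import java.util.List;

public class KafkaWorkersSketch {

    /** Simplified stand-in for a consumed record (NOT the library's WorkerRecord). */
    static final class SimpleRecord<K, V> {
        final K key;
        final V value;

        SimpleRecord(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    // Local copies of the interface shapes from the slides, adapted to SimpleRecord.
    interface WorkerPartitioner<K, V> {
        int subpartition(SimpleRecord<K, V> record);
    }

    interface RecordStatusObserver {
        void onSuccess();
        void onFailure(Exception exception);
    }

    interface WorkerTask<K, V> {
        boolean accept(SimpleRecord<K, V> record);
        void process(SimpleRecord<K, V> record, RecordStatusObserver observer);
    }

    /** Hypothetical partitioner: splits one partition into subpartitions by key hash. */
    static final class KeyHashPartitioner implements WorkerPartitioner<String, String> {
        private final int subpartitions;

        KeyHashPartitioner(int subpartitions) {
            this.subpartitions = subpartitions;
        }

        @Override
        public int subpartition(SimpleRecord<String, String> record) {
            // floorMod keeps the result non-negative even for negative hash codes.
            return Math.floorMod(record.key.hashCode(), subpartitions);
        }
    }

    /** Hypothetical task: refuses records while its buffer is full, reports every outcome. */
    static final class BufferingTask implements WorkerTask<String, String> {
        private final int capacity;
        final List<String> buffer = new ArrayList<>();

        BufferingTask(int capacity) {
            this.capacity = capacity;
        }

        @Override
        public boolean accept(SimpleRecord<String, String> record) {
            return buffer.size() < capacity; // false = hold the record back (assumed meaning)
        }

        @Override
        public void process(SimpleRecord<String, String> record, RecordStatusObserver observer) {
            try {
                if (record.value == null) {
                    throw new IllegalArgumentException("record value is null");
                }
                buffer.add(record.value);
                observer.onSuccess(); // success is reported explicitly, never implied
            } catch (Exception e) {
                observer.onFailure(e);
            }
        }
    }
}
```

Reporting the outcome through an observer, rather than through the return of process, is what lets a task finish a record asynchronously and lets offsets be committed only for records that were explicitly acknowledged.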

KAFKA WORKERS: PARALLELISM MODEL

THE 5TH ITERATION: KAFKA WORKERS
