@nicolas_frankel
A gentle introduction to Stream Processing
Nicolas Fränkel
Me, myself and I
- Former developer, team lead, architect, blah-blah
- Developer Advocate
- Ramping up on distributed systems
Hazelcast
Hazelcast IMDG is an operational, in-memory, distributed computing platform that manages data using in-memory storage and performs execution for breakthrough application speed and scale.
Hazelcast Jet is the ultra-fast, application-embeddable, 3rd generation stream processing engine for low-latency batch and stream processing.
Schedule
- Why streaming?
- Streaming approaches
- Hazelcast Jet
- Open Data
- General Transit Feed Specification
- The demo!
- Q&A
In a time before our time…
Data was neatly stored in SQL databases
The need for Extract, Transform, Load (ETL)
- Analytics
  • Supermarket sales in the last hour?
- Reporting
  • Bank account annual closing
What SQL really means
- Constraints
- Joins
- Normal forms
Writes vs. reads
- Normalized vs. denormalized
- Correct vs. fast
The need for ETL
- Different actors
- With different needs
- Using the same database?
The batch model
1. Extract
2. Transform
3. Load
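The three batch steps can be sketched in plain Java. This is a minimal illustration under stated assumptions, not code from the talk: the `Sale` rows, the in-memory "table" and the report map are hypothetical stand-ins for a source database and a reporting store, and the example computes the "supermarket sales in the last hour" figure mentioned earlier.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class BatchEtl {

    // Hypothetical source row: one sale in one store at a given instant
    static final class Sale {
        final String store;
        final long amountCents;
        final Instant at;

        Sale(String store, long amountCents, Instant at) {
            this.store = store;
            this.amountCents = amountCents;
            this.at = at;
        }
    }

    // Extract: read the rows of the time range (here, filter an in-memory list)
    static List<Sale> extract(List<Sale> table, Instant from, Instant to) {
        return table.stream()
                .filter(s -> !s.at.isBefore(from) && s.at.isBefore(to))
                .collect(Collectors.toList());
    }

    // Transform: aggregate the sales amount per store
    static Map<String, Long> transform(List<Sale> sales) {
        return sales.stream()
                .collect(Collectors.groupingBy(s -> s.store, TreeMap::new,
                        Collectors.summingLong(s -> s.amountCents)));
    }

    // Load: replace the content of the reporting store with the new aggregate
    static void load(Map<String, Long> report, Map<String, Long> target) {
        target.clear();
        target.putAll(report);
    }

    // One scheduled run covering the last hour
    static Map<String, Long> runBatch(List<Sale> table, Instant now, Map<String, Long> reportStore) {
        Instant oneHourAgo = now.minus(1, ChronoUnit.HOURS);
        load(transform(extract(table, oneHourAgo, now)), reportStore);
        return reportStore;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2020-01-01T12:00:00Z");
        List<Sale> table = List.of(
                new Sale("Bern", 1_000, now.minus(10, ChronoUnit.MINUTES)),
                new Sale("Bern", 500, now.minus(50, ChronoUnit.MINUTES)),
                new Sale("Zurich", 700, now.minus(2, ChronoUnit.HOURS))); // outside the window
        System.out.println(runBatch(table, now, new TreeMap<>())); // {Bern=1500}
    }
}
```

Each run recomputes the report from scratch; if a run takes longer than the scheduling interval or fails halfway, the report is stale or partial.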
Batches are everywhere!
Properties of batches
- Scheduled at regular intervals
  • Daily
  • Weekly
  • Monthly
  • Yearly
- Run in a specific amount of time
Oops
- When the execution time overlaps the next scheduled execution
- When the space taken by the data exceeds the storage capacity
- When the batch fails mid-execution
- etc.
Big data!
- Parallelize everything
  • MapReduce
  • Hadoop
- NoSQL
  • Schema on read vs. schema on write
Or chunking?
- Keep a cursor
  • And only manage “chunks” of data
- What about new data coming in?
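Cursor-based chunking can be sketched as below, assuming the data is an append-only list addressed by position. `ChunkedReader` and the chunk size are hypothetical; the second call in `main` illustrates the open question: rows appended after a run are only picked up when the loop runs again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ChunkedReader<T> {

    private final List<T> source;   // hypothetical append-only data set
    private final int chunkSize;
    private int cursor = 0;         // position of the next unread row

    ChunkedReader(List<T> source, int chunkSize) {
        this.source = source;
        this.chunkSize = chunkSize;
    }

    // Processes every row currently available, one chunk at a time,
    // and returns the number of chunks handled during this call.
    int drain(Consumer<List<T>> processor) {
        int chunks = 0;
        while (cursor < source.size()) {
            int end = Math.min(cursor + chunkSize, source.size());
            // Copy the chunk so the processor never sees later modifications
            processor.accept(new ArrayList<>(source.subList(cursor, end)));
            cursor = end;           // advance only once the chunk is processed
            chunks++;
        }
        return chunks;
    }

    int cursor() {
        return cursor;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>(List.of(1, 2, 3, 4, 5));
        ChunkedReader<Integer> reader = new ChunkedReader<>(data, 2);
        System.out.println(reader.drain(chunk -> System.out.println("chunk " + chunk))); // chunks, then 3
        data.add(6); // new data arrives after the run
        System.out.println(reader.drain(chunk -> System.out.println("chunk " + chunk))); // chunk [6], then 1
    }
}
```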
Event-Driven Programming
“Programming paradigm in which the flow of
the program is determined by events such
as user actions (mouse clicks, key presses),
sensor outputs, or messages from other
programs or threads”
-- Wikipedia
Event Sourcing
“Event sourcing persists the state of a business entity
such as an Order or a Customer as a sequence of state-changing
events. Whenever the state of a business
entity changes, a new event is appended to the list of
events. Since saving an event is a single operation, it is
inherently atomic. The application reconstructs an entity’s
current state by replaying the events.”
-- https://microservices.io/patterns/data/event-sourcing.html
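Reconstructing state by replaying events amounts to folding the event list into an entity. A minimal sketch, assuming hypothetical `ItemAdded`, `ItemRemoved` and `OrderShipped` events for an Order; the events live in an in-memory list, whereas a real event store persists them durably.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class OrderEventSourcing {

    // Hypothetical state-changing events for an Order
    interface OrderEvent {}

    static final class ItemAdded implements OrderEvent {
        final String sku;
        final int quantity;

        ItemAdded(String sku, int quantity) {
            this.sku = sku;
            this.quantity = quantity;
        }
    }

    static final class ItemRemoved implements OrderEvent {
        final String sku;

        ItemRemoved(String sku) {
            this.sku = sku;
        }
    }

    static final class OrderShipped implements OrderEvent {}

    // Current state, derived only from the events
    static final class Order {
        final Map<String, Integer> items = new TreeMap<>();
        boolean shipped = false;

        void apply(OrderEvent event) {
            if (event instanceof ItemAdded) {
                ItemAdded e = (ItemAdded) event;
                items.merge(e.sku, e.quantity, Integer::sum);
            } else if (event instanceof ItemRemoved) {
                items.remove(((ItemRemoved) event).sku);
            } else if (event instanceof OrderShipped) {
                shipped = true;
            }
        }
    }

    // Append-only list of events: saving an event is a single append
    private final List<OrderEvent> log = new ArrayList<>();

    void append(OrderEvent event) {
        log.add(event);
    }

    // Reconstructs the current state by replaying every event in order
    Order replay() {
        Order order = new Order();
        for (OrderEvent event : log) {
            order.apply(event);
        }
        return order;
    }

    public static void main(String[] args) {
        OrderEventSourcing store = new OrderEventSourcing();
        store.append(new ItemAdded("apple", 2));
        store.append(new ItemAdded("pear", 1));
        store.append(new ItemAdded("apple", 3));
        store.append(new ItemRemoved("pear"));
        store.append(new OrderShipped());
        Order order = store.replay();
        System.out.println(order.items + " shipped=" + order.shipped); // {apple=5} shipped=true
    }
}
```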
Database internals
- Ordered append-only log
  • e.g. MySQL binlog
Make everything event-based!
Benefits
- Memory-friendly
- Easily processed
- Pull vs. push
  • Very close to real-time
  • Keeps derived data in sync
From finite datasets to infinite
Streaming is smart ETL
- Ingest: in-memory operational storage
- Combine: join, enrich, group, aggregate
- Stream: windowing, event-time processing
- Compute: distributed and parallel computation
- Transform: filter, clean, convert
- Publish: in-memory, subscriber notifications
- Example: notify if response time is 10% over the 24-hour average, second by second
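The notification example above can be sketched as a sliding 24-hour window over response-time samples. This is a single-threaded illustration, not Jet code: the `ResponseTimeMonitor` name, the per-sample evaluation and the choice to compare each new sample with the average of the samples before it are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ResponseTimeMonitor {

    private static final long WINDOW_SECONDS = 24L * 60 * 60;
    private static final double THRESHOLD = 1.10; // 10% over the average

    private static final class Sample {
        final long epochSecond;
        final double millis;

        Sample(long epochSecond, double millis) {
            this.epochSecond = epochSecond;
            this.millis = millis;
        }
    }

    private final Deque<Sample> window = new ArrayDeque<>();
    private double sum = 0;

    // Records one sample (timestamps must not decrease) and returns true when it is
    // more than 10% above the average of the samples seen in the previous 24 hours.
    boolean record(long epochSecond, double millis) {
        // Evict samples that are 24 hours old or older
        while (!window.isEmpty() && window.peekFirst().epochSecond <= epochSecond - WINDOW_SECONDS) {
            sum -= window.pollFirst().millis;
        }
        boolean alert = !window.isEmpty() && millis > THRESHOLD * (sum / window.size());
        window.addLast(new Sample(epochSecond, millis));
        sum += millis;
        return alert;
    }

    public static void main(String[] args) {
        ResponseTimeMonitor monitor = new ResponseTimeMonitor();
        long[] seconds = {0, 1, 2, 3};
        double[] millis = {100, 100, 111, 105};
        for (int i = 0; i < seconds.length; i++) {
            boolean alert = monitor.record(seconds[i], millis[i]);
            System.out.println(seconds[i] + "s: " + millis[i] + " ms, alert=" + alert);
        }
    }
}
```

A stream processing engine would keep such window state per key and distribute it across the cluster; the sketch only shows the computation itself.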
Use Case: Analytics and Decision Making
- Real-time dashboards
  • Decision making
  • Recommendations
- Stats (gaming, infrastructure monitoring)
- Prediction, often based on algorithmic prediction
  • Push the stream through an ML model
- Complex Event Processing
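Complex Event Processing derives higher-level events from patterns in the stream; the speaker notes summarize it as "if A and B -> C". A minimal single-node sketch, assuming each event carries a key, a type and a timestamp in seconds; the event types `A` and `B`, the `C:` output format and the time bound are placeholders.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class PatternDetector {

    private final long withinSeconds;
    // Timestamp of the last unmatched "A" event, per key
    private final Map<String, Long> pendingA = new HashMap<>();

    PatternDetector(long withinSeconds) {
        this.withinSeconds = withinSeconds;
    }

    // Feeds one event; returns a derived "C" event when a "B" follows an "A"
    // for the same key within the time bound, otherwise an empty Optional.
    Optional<String> onEvent(String key, String type, long epochSecond) {
        if ("A".equals(type)) {
            pendingA.put(key, epochSecond);
        } else if ("B".equals(type)) {
            Long seenA = pendingA.get(key);
            if (seenA != null && epochSecond - seenA <= withinSeconds) {
                pendingA.remove(key); // consume the "A" so the pattern fires once
                return Optional.of("C:" + key);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        PatternDetector detector = new PatternDetector(10);
        detector.onEvent("card-1", "A", 0);
        System.out.println(detector.onEvent("card-1", "B", 5)); // Optional[C:card-1]
        System.out.println(detector.onEvent("card-2", "B", 5)); // Optional.empty
    }
}
```

The sketch never expires unmatched `A` entries; an engine running this at scale would also need to evict old state.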
Persistent event-storage systems
- Kafka
- Pulsar
Kafka
- Distributed
- On-disk storage
- Messages sent to and read from a topic
  • Publish-subscribe
  • Queue
- Consumers keep track of their offset
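The offset mechanism can be imitated with an in-memory log to show why it supports both messaging models. This is not the Kafka client API: `InMemoryTopic` is hypothetical, has a single partition and tracks one offset per consumer group. Consumers sharing a group share one offset, so each message is handled once (queue); distinct groups each read every message (publish-subscribe). Real Kafka additionally splits a topic into partitions and assigns them to the consumers of a group.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InMemoryTopic {

    // Append-only log: the offset of a message is its index
    private final List<String> log = new ArrayList<>();
    // Next offset to read, one per consumer group
    private final Map<String, Integer> offsets = new HashMap<>();

    // Appends a message and returns its offset
    int append(String message) {
        log.add(message);
        return log.size() - 1;
    }

    // Returns up to max messages for the group, starting at its offset, then advances it
    List<String> poll(String group, int max) {
        int from = Math.min(offsets.getOrDefault(group, 0), log.size());
        int to = Math.min(from + max, log.size());
        List<String> batch = new ArrayList<>(log.subList(from, to));
        offsets.put(group, to);
        return batch;
    }

    // Moves a group's offset, e.g. back to 0 to reprocess everything still stored
    void seek(String group, int offset) {
        offsets.put(group, offset);
    }

    public static void main(String[] args) {
        InMemoryTopic topic = new InMemoryTopic();
        topic.append("m0");
        topic.append("m1");
        topic.append("m2");
        System.out.println(topic.poll("billing", 2)); // [m0, m1]
        System.out.println(topic.poll("audit", 10));  // [m0, m1, m2]
        System.out.println(topic.poll("billing", 2)); // [m2]
    }
}
```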
In-memory stream processing engines
- Apache Flink
- Amazon Kinesis
- IBM Streams
- Hazelcast Jet
- Apache Beam
  • Abstraction over some of the above
- …
Hazelcast Jet
- Open source (Apache 2 license)
- Single JAR
- Leverages Hazelcast IMDG
- Unified batch/streaming API
- (Hazelcast Jet Enterprise)
Concept: Pipeline
- Declaration (code) that defines and links sources, transforms, and sinks
- Platform-specific SDK (Pipeline API in Jet)
- Client submits the pipeline to the Stream Processing Engine (SPE)
Concept: Job
- Running instance of a pipeline in the SPE
- The SPE executes the pipeline
  • Code execution
  • Data routing
  • Flow control
  • Parallel and distributed execution
Imperative model
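import java.util.HashMap;
import java.util.Map;

final String text = "...";
final Map<String, Long> counts = new HashMap<>();
for (String word : text.split("\\W+")) {   // split on runs of non-word characters
    Long count = counts.get(word);
    // First occurrence: 1, otherwise increment the previous count
    counts.put(word, count == null ? 1L : count + 1);
}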
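Declarative model
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

List<String> lines = List.of("...");
Map<String, Long> counts = lines.stream()
    .map(String::toLowerCase)
    // Split each line on runs of non-word characters
    .flatMap(line -> Arrays.stream(line.split("\\W+")))
    .filter(word -> !word.isEmpty())
    .collect(Collectors.groupingBy(
        word -> word, Collectors.counting())
    );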
What Distributed Means to Hazelcast
- Multiple nodes
- Scalable storage and performance
- Elasticity
- Data stored, partitioned and replicated
- No single point of failure
Distributed Parallel Processing
Pipeline p = Pipeline.create();
p.drawFrom(Sources.<Long, String>map(BOOK_LINES))
.flatMap(line -> traverseArray(line.getValue().split("\\W+")))
.filter(word -> !word.isEmpty())
.groupingKey(wholeItem())
.aggregate(counting())
.drainTo(Sinks.map(COUNTS));
Translate declarative code to a Directed Acyclic Graph (DAG):
Data Source → from → map → filter → aggr → to → Data Sink
Distributed Parallel Processing
[Figure: Node 1 and Node 2 each run two parallel lanes of read → map + filter → acc → cmb → sink, between the Data Source and the Data Sink]
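The two-stage aggregation in the figure, local accumulation (acc) followed by a combine step (cmb), can be imitated on a single JVM. A sketch under assumptions: each "node" is a list of lines processed by its own asynchronous task, and `TwoStageWordCount` is hypothetical, not Jet's API. Only the small per-node partial maps are merged, which illustrates why keeping data local reduces shuffling.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

class TwoStageWordCount {

    // Stage 1 (read, map + filter, acc): runs per node, over local data only
    static Map<String, Long> accumulate(List<String> localLines) {
        return localLines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(word -> word, Collectors.counting()));
    }

    // Stage 2 (cmb): merges the partial counts; only these small maps move between nodes
    static Map<String, Long> combine(List<Map<String, Long>> partials) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((word, count) -> total.merge(word, count, Long::sum));
        }
        return total;
    }

    static Map<String, Long> run(List<List<String>> partitions) {
        // One asynchronous task per "node" partition, running in parallel
        List<CompletableFuture<Map<String, Long>>> futures = partitions.stream()
                .map(p -> CompletableFuture.supplyAsync(() -> accumulate(p)))
                .collect(Collectors.toList());
        List<Map<String, Long>> partials = futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
        return combine(partials);
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
                List.of("To be, or not to be"),   // data held by node 1
                List.of("that is the question")); // data held by node 2
        System.out.println(run(partitions).get("to")); // 2
    }
}
```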
Open Data
« Open data is the idea that some data
should be freely available to everyone to
use and republish as they wish, without
restrictions from copyright, patents or
other mechanisms of control. »
-- https://en.wikipedia.org/wiki/Open_data
Some Open Data initiatives
- France:
  • https://www.data.gouv.fr/fr/
- Switzerland:
  • https://opendata.swiss/en/
- European Union:
  • https://data.europa.eu/euodp/en/data/
Challenges
1. Access
2. Format
3. Standard
4. Data correctness
Access
- Download a file
- Access it interactively through a web service
Format
In general, Open Data means Open Format
- PDF
- CSV
- XML
- JSON
- etc.
Standard
- Let’s pretend the format is XML
  • Which grammar is used?
- A shared standard is required
  • Congruent to a domain
Data correctness
"32.TA.66-43","16:20:00","16:20:00","8504304"
"32.TA.66-44","24:53:00","24:53:00","8500100"
"32.TA.66-44","25:00:00","25:00:00","8500162"
"32.TA.66-44","25:02:00","25:02:00","8500170"
"32.TA.66-45","23:32:00","23:32:00","8500170"
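Whatever their origin, values such as 24:53:00 or 25:00:00 have to be handled: GTFS expresses times after midnight of the service day as values greater than 24:00:00, and `java.time.LocalTime.parse` rejects them. A sketch of a tolerant parser, assuming the rows above are stop_times.txt rows (trip ID, arrival time, departure time, stop ID) with no commas inside values; `GtfsTime`, the service date and the Europe/Zurich time zone are assumptions. GTFS measures times from "noon minus 12h" of the service day, and the conversion follows that rule so that days with a daylight saving change come out right.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;

class GtfsTime {

    // Parses "HH:MM:SS" (or "H:MM:SS") into seconds, allowing hours of 24 or more
    static int toSeconds(String hhmmss) {
        String[] parts = hhmmss.trim().split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Not a GTFS time: " + hhmmss);
        }
        return Integer.parseInt(parts[0]) * 3600
                + Integer.parseInt(parts[1]) * 60
                + Integer.parseInt(parts[2]);
    }

    // GTFS times are measured from "noon minus 12h" of the service day
    static Instant toInstant(LocalDate serviceDate, String hhmmss, ZoneId zone) {
        return serviceDate.atTime(LocalTime.NOON).atZone(zone)
                .minusHours(12)
                .plusSeconds(toSeconds(hhmmss))
                .toInstant();
    }

    // Splits one quoted row such as "trip","arrival","departure","stop"
    static String[] parseRow(String row) {
        String[] fields = row.split(","); // assumes no commas inside the values
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].replace("\"", "");
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] row = parseRow("\"32.TA.66-44\",\"25:00:00\",\"25:00:00\",\"8500162\"");
        Instant arrival = toInstant(LocalDate.of(2020, 1, 15), row[1], ZoneId.of("Europe/Zurich"));
        System.out.println(row[0] + " arrives at " + arrival); // 32.TA.66-44 arrives at 2020-01-16T00:00:00Z
    }
}
```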
General Transit Feed Specification
“The General Transit Feed Specification (GTFS) […] defines a
common format for public transportation schedules and
associated geographic information. GTFS feeds let public
transit agencies publish their transit data and developers write
applications that consume that data in an interoperable way.”
GTFS static model
- agency.txt (required): Transit agencies with service represented in this dataset.
- stops.txt (required): Stops where vehicles pick up or drop off riders. Also defines stations and station entrances.
- routes.txt (required): Transit routes. A route is a group of trips that are displayed to riders as a single service.
- trips.txt (required): Trips for each route. A trip is a sequence of two or more stops that occur during a specific time period.
- stop_times.txt (required): Times that a vehicle arrives at and departs from stops for each trip.
- calendar.txt (conditionally required): Service dates specified using a weekly schedule with start and end dates. This file is required unless all dates of service are defined in calendar_dates.txt.
- calendar_dates.txt (conditionally required): Exceptions for the services defined in calendar.txt. If calendar.txt is omitted, then calendar_dates.txt is required and must contain all dates of service.
- fare_attributes.txt (optional): Fare information for a transit agency's routes.
- fare_rules.txt (optional): Rules to apply fares for itineraries.
- shapes.txt (optional): Rules for mapping vehicle travel paths, sometimes referred to as route alignments.
- frequencies.txt (optional): Headway (time between trips) for headway-based service or a compressed representation of fixed-schedule service.
- transfers.txt (optional): Rules for making connections at transfer points between routes.
- pathways.txt (optional): Pathways linking together locations within stations.
- levels.txt (optional): Levels within stations.
- feed_info.txt (optional): Dataset metadata, including publisher, version, and expiration information.
- translations.txt (optional): Translated information of a transit agency.
- attributions.txt (optional): Specifies the attributions that are applied to the dataset.
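A feed can be checked against the "required" column of the GTFS static model with a few lines. A sketch, assuming only file presence is checked; the `GtfsFeedCheck` name is hypothetical, and it does not verify that calendar_dates.txt really contains all service dates when calendar.txt is absent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class GtfsFeedCheck {

    // Files marked as required in the GTFS static model
    static final List<String> REQUIRED =
            List.of("agency.txt", "stops.txt", "routes.txt", "trips.txt", "stop_times.txt");

    // Returns the problems found in a feed, given the names of the files it contains
    static List<String> check(Set<String> files) {
        List<String> problems = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!files.contains(name)) {
                problems.add("missing " + name);
            }
        }
        // calendar.txt is required unless calendar_dates.txt defines all service dates,
        // so at least one of the two must be present
        if (!files.contains("calendar.txt") && !files.contains("calendar_dates.txt")) {
            problems.add("missing calendar.txt or calendar_dates.txt");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check(Set.of("agency.txt", "stops.txt", "routes.txt", "trips.txt")));
        // [missing stop_times.txt, missing calendar.txt or calendar_dates.txt]
    }
}
```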
GTFS dynamic model
Use-case: Swiss Public Transport
- Open Data
- GTFS static available as downloadable .txt files
- GTFS dynamic available as a REST endpoint
The available data model
Where’s the position?!
The dynamic data pipeline
1. Source: web service
2. Split into trip updates
3. Enrich with trip data
4. Enrich with stop times data
5. Transform hours into timestamp
6. Enrich with location data
7. Sink: Hazelcast IMDG
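The steps above can be sketched with plain Java streams. This is not the talk's Jet pipeline: the `TripUpdateEnrichment` class and its simplified shapes are hypothetical, the web service responses are replaced by in-memory lists, the Hazelcast IMDG lookups by `Map`s, and the sink by a returned list. For illustration only, it also assumes that a vehicle's position is approximated by the location of the trip's next stop, and it counts scheduled times from a single service-day start epoch, ignoring the GTFS noon-minus-12h rule.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

class TripUpdateEnrichment {

    // Hypothetical, simplified realtime update: which trip, and its current delay
    static final class TripUpdate {
        final String tripId;
        final int delaySeconds;

        TripUpdate(String tripId, int delaySeconds) {
            this.tripId = tripId;
            this.delaySeconds = delaySeconds;
        }
    }

    // Hypothetical stop_times.txt entry: stop ID and scheduled "HH:MM:SS" arrival
    static final class StopTime {
        final String stopId;
        final String arrival;

        StopTime(String stopId, String arrival) {
            this.stopId = stopId;
            this.arrival = arrival;
        }
    }

    // Enriched result written to the sink
    static final class Position {
        final String tripId;
        final String route;
        final long epochSecond;
        final double latitude;
        final double longitude;

        Position(String tripId, String route, long epochSecond, double latitude, double longitude) {
            this.tripId = tripId;
            this.route = route;
            this.epochSecond = epochSecond;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    // "HH:MM:SS" to seconds; hours may be 24 or more
    static long toSeconds(String hhmmss) {
        String[] p = hhmmss.split(":");
        return Long.parseLong(p[0]) * 3600 + Long.parseLong(p[1]) * 60 + Long.parseLong(p[2]);
    }

    // Steps 3 to 6 for one trip update; returns null when reference data is missing
    static Position enrich(TripUpdate update, Map<String, String> routeByTrip,
                           Map<String, StopTime> nextStopByTrip,
                           Map<String, double[]> locationByStop, long serviceDayStart) {
        String route = routeByTrip.get(update.tripId);              // 3. trip data
        StopTime stopTime = nextStopByTrip.get(update.tripId);      // 4. stop times data
        if (route == null || stopTime == null) {
            return null;
        }
        long expected = serviceDayStart + toSeconds(stopTime.arrival) + update.delaySeconds; // 5. timestamp
        double[] location = locationByStop.get(stopTime.stopId);    // 6. location data
        if (location == null) {
            return null;
        }
        return new Position(update.tripId, route, expected, location[0], location[1]);
    }

    static List<Position> process(List<List<TripUpdate>> responses, // 1. one list per web service call
                                  Map<String, String> routeByTrip,
                                  Map<String, StopTime> nextStopByTrip,
                                  Map<String, double[]> locationByStop,
                                  long serviceDayStart) {
        return responses.stream()
                .flatMap(List::stream)                              // 2. split into trip updates
                .map(u -> enrich(u, routeByTrip, nextStopByTrip, locationByStop, serviceDayStart))
                .filter(Objects::nonNull)                           // drop updates that cannot be enriched
                .collect(Collectors.toList());                      // 7. sink: a list instead of IMDG
    }
}
```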
Architecture overview
Recap
- Streaming has a lot of benefits
- Leverage Open Data
- It’s the Wild West out there
  • No standards
  • Real-world data sucks!
- But you can get cool stuff done
Thanks a lot!
- https://blog.frankel.ch/
- @nicolas_frankel
- https://jet.hazelcast.org/
- https://bit.ly/opendataswiss
- https://bit.ly/gtransportfs
- https://bit.ly/jet-train

vJUG - Introduction to data streaming

Editor's Notes

  • #24: Real-time (latency-sensitive) operations combined with analytics, e.g. count usages per credit card in the last 10 seconds and flag fraud if > 10. Real-time querying. Prediction based on analytics: fraud detection run overnight has low value. Complex event processing: pattern detection (if A and B -> C); the SPE runs this at scale. Valuable: IoT support. Machine analytics/predictions fit into AI. Without streaming?
  • #35: SP: make use of a multi-processor, multi-node runtime; minimize costs (data shuffling, context switching). Jet does distributed parallel processing: 1/ build an execution plan, 2/ execute it in parallel. How can computation be parallelized? Task parallelism: make use of multiprocessor machines. Continuous: tasks run in parallel and exchange data. MapReduce: just two steps.
  • #36: SP: make use of a multi-processor, multi-node runtime; minimize costs (data shuffling, context switching). 1/ Data parallelism: distribute data partitions among available resources; the DAG is deployed to all cluster members, so more cores are used. What can be parallelized? The source, if partitioned, can be read in parallel; map/filter can run in parallel. Extend the edges. Shuffling (moving data) is expensive: Jet keeps data local, and even reads it locally when co-located with the data source.
  • #47:
    @startuml
    class FeedMessage
    class FeedHeader {
      gtfs_realtime_version: string
      timestamp: uint64
    }
    enum Incrementality {
      FULL_DATASET
      DIFFERENTIAL
    }
    class FeedEntity {
      id: String
      is_deleted: boolean
    }
    class TripUpdate {
      timestamp: uint64
      delay: int32
    }
    class VehiclePosition {
      current_stop_sequence: uint32
      stop_id: string
      timestamp: uint64
    }
    enum VehicleStopStatus {
      INCOMING_AT
      STOPPED_AT
      IN_TRANSIT_TO
    }
    enum CongestionLevel {
      UNKNOWN_CONGESTION_LEVEL
      RUNNING_SMOOTHLY
      STOP_AND_GO
      CONGESTION
      SEVERE_CONGESTION
    }
    class Alert
    enum Cause {
      UNKNOWN_CAUSE
      OTHER_CAUSE
      TECHNICAL_PROBLEM
      STRIKE
      DEMONSTRATION
      ACCIDENT
      HOLIDAY
      WEATHER
      MAINTENANCE
      CONSTRUCTION
      POLICE_ACTIVITY
      MEDICAL_EMERGENCY
    }
    enum Effect {
      NO_SERVICE
      REDUCED_SERVICE
      SIGNIFICANT_DELAYS
      DETOUR
      ADDITIONAL_SERVICE
      MODIFIED_SERVICE
      OTHER_EFFECT
      UNKNOWN_EFFECT
      STOP_MOVED
    }
    class TimeRange {
      start: uint64
      end: uint64
    }
    class Position {
      latitude: float
      longitude: float
      bearing: float
      odometer: double
      speed: float
    }
    class TripDescriptor {
      trip_id: string
      route_id: string
      direction_id: uint32
      start_time: string
      start_date: string
    }
    class VehicleDescriptor {
      id: string
      label: string
      license_plate: string
    }
    class StopTimeUpdate {
      stop_sequence: uint32
      stop_id: string
    }
    class StopTimeEvent {
      delay: int32
      time: int64
      uncertainty: int32
    }
    enum ScheduleRelationship {
      SCHEDULED
      SKIPPED
      NO_DATA
    }
    enum ScheduleRelationship2 as "ScheduleRelationship" {
      SCHEDULED
      ADDED
      UNSCHEDULED
      CANCELED
    }
    class EntitySelector {
      agency_id: string
      route_id: string
      route_type: int32
      stop_id: string
    }
    class Translation {
      text: string
      language: string
    }
    FeedMessage -up-> "1" FeedHeader: header
    FeedMessage -down-> "*" FeedEntity: entity
    FeedHeader -right-> "1" Incrementality
    FeedEntity --> "0..1" TripUpdate
    FeedEntity -left-> "0..1" VehiclePosition
    FeedEntity -right-> "0..1" Alert
    TripUpdate --> "1" TripDescriptor: trip
    TripUpdate -left-> "0..1" VehicleDescriptor: vehicle
    TripUpdate --> "*" StopTimeUpdate
    StopTimeUpdate -left-> "0..1" StopTimeEvent: arrival
    StopTimeUpdate -left-> "0..1" StopTimeEvent: departure
    StopTimeUpdate --> "0..1" ScheduleRelationship
    TripDescriptor -right-> "0..1" ScheduleRelationship2
    VehiclePosition --> "0..1" TripDescriptor: trip
    VehiclePosition --> "0..1" VehicleDescriptor: vehicle
    VehiclePosition -left-> "0..1" Position: position
    VehiclePosition -up-> "0..1" VehicleStopStatus: current_status
    VehiclePosition -up-> "0..1" CongestionLevel
    Alert --> "*" TimeRange: active_period
    Alert --> "1..*" EntitySelector: informed_entity
    Alert -up-> "0..1" Cause
    Alert -up-> "0..1" Effect
    Alert -right-> "0..1" TranslatedString: url
    Alert -right-> "1" TranslatedString: header_text
    Alert -right-> "1" TranslatedString: description_text
    EntitySelector --> "0..1" TripDescriptor: trip
    TranslatedString --> "1..*" Translation
    note left of FeedMessage: Root message
    hide empty members
    @enduml
  • #51:
    node "Hazelcast Jet" as jet {
      database "Hazelcast IMDG" as imdg
      artifact "Load reference data Job" as staticjob
      artifact "Load dynamic data Job" as dynamicjob
      folder "Reference data files" as refdata {
        file trips.txt
        file routes.txt
      }
    }
    component "Reference data loader" <<Loader>> as staticloader
    component "Dynamic data loader" <<Loader>> as dynamicloader
    component "Web application" <<Spring Boot>> as webapp
    cloud {
      interface "Open Data endpoint" as ws
    }
    staticloader --> staticjob: Send job
    staticjob --> refdata: Read files
    staticjob --> imdg: Store JSON
    dynamicloader --> dynamicjob: Send job
    dynamicjob -right-> ws: Call REST endpoint
    dynamicjob --> imdg: Store JSON
    webapp -left-> imdg: Register to changes