Big Data
BluePrint
Architect for change
@daangerits
#bdbp
Who am I?
@daangerits
daan@bigboards.io
Agenda
Concepts
Architecture
Examples
Concepts
TransCo
Meet TransCo - Parcel delivery service
Common interactions
A customer requesting a quote
A website visitor clicking on a link
Booking a financial transaction
A delivery truck pinging its GPS
coördinates
TransCo
All these have one thing in common:
Events
IT
Finance
Legal
Logistics
Sales
Communications
...
Events
Events used to be a way to manipulate our
master data
Events
Today, events ARE our master data
Anatomy of an event
Timestamp
When did it
happen?
Origin
Where did it
come from?
Actor
Who did it?
Subject
Who was
affected?
Facts
What
changed?
Event
Anatomy of an event - example
2014-05-03
13:40:51
timestamp
CRM
Application
origin
Daan
Gerits
actor
Alfred
Hitchcock
subject
street=“...”
vat=“...”
facts
Event
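The five parts of an event can be sketched as a small record type. This is only an illustration: the class name `Event` and the field types are assumptions, and the values come from the example slide, where the fact values are elided.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Mapping


@dataclass(frozen=True)
class Event:
    timestamp: datetime        # When did it happen?
    origin: str                # Where did it come from?
    actor: str                 # Who did it?
    subject: str               # Who was affected?
    facts: Mapping[str, str]   # What changed?


# The example from the slide; the fact values are elided there as well.
example = Event(
    timestamp=datetime(2014, 5, 3, 13, 40, 51),
    origin="CRM Application",
    actor="Daan Gerits",
    subject="Alfred Hitchcock",
    facts={"street": "...", "vat": "..."},
)
```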
Architecture
Overview
- Ingest
- Translator: Translate entities into events and facts.
- Linker: Resolve values to ids, especially subject, actor and origin.
- Detonator: Explode a single fact to multiple rollup levels. Only explode if applicable.
- Store: Store the raw events so we can replay whenever we want.
- View Generator: View generators can perform analytical tasks on the incoming events. The generated view can be stored in a storage system of choice.
Ingest
Get records in from other systems
- Event Bus/Broker
- Ingestion System like Flume / Sqoop / …
- ETL processes (not recommended)
- Backups
- Nagios / Statsd / Ganglia / ...
Translator
Convert records into events
- 1 record field = 1 fact
- record timestamp vs generated timestamp
Only store changed facts
- What changed?
- Compare with existing views
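A minimal sketch of the translator step, assuming records arrive as dictionaries and that an existing view is available as a nested dictionary keyed by subject. The function name `translate`, the `id_field` parameter and the row layout are hypothetical.

```python
def translate(record, origin, actor, timestamp, current_view, id_field="id"):
    """Turn one record into fact rows, keeping only the facts that changed."""
    subject = record[id_field]
    known = current_view.get(subject, {})  # what the existing view already holds
    rows = []
    for name, value in record.items():
        if name == id_field:
            continue  # the id identifies the subject, it is not a fact
        if known.get(name) == value:
            continue  # unchanged fact: do not store it again
        rows.append({
            "origin": origin,
            "actor": actor,
            "subject": subject,
            "timestamp": timestamp,
            "fact": name,    # 1 record field = 1 fact
            "value": value,
        })
    return rows


view = {"9332-DG": {"name": "Daan Gerits", "address": "container 9"}}
record = {"id": "9332-DG", "name": "Daan Gerits", "address": "container 24"}
changed = translate(record, "erp", "wvl", "20141109", view)
```

Here only the address differs from the view, so `changed` holds a single fact row.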
Store
Persist the events as they are
Raw Data
- Source of truth
- Recovery
Optimize Storage
- Parquet, Avro, Thrift, ...
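The idea of persisting events as they are and replaying them later can be sketched with an append-only log. Note that this sketch uses JSON Lines from the standard library as a stand-in, not Parquet, Avro or Thrift as named on the slide; the function names are hypothetical.

```python
import io
import json


def append_events(log, events):
    """Append events to the end of the log, one JSON document per line."""
    log.seek(0, io.SEEK_END)
    for event in events:
        log.write(json.dumps(event, sort_keys=True) + "\n")


def replay(log):
    """Yield every stored event again, in the order it was written."""
    log.seek(0)
    for line in log:
        if line.strip():
            yield json.loads(line)


log = io.StringIO()  # stands in for a file opened in text mode
append_events(log, [{"ts": "2014-05-19", "fact": "address"}])
stored = list(replay(log))
```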
Linker
Resolve event fields
- “Daan Gerits” == id 44543-45436-9928
Optimize for speed
- Use lookup tables
- Group data if needed
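Resolving event fields through a lookup table might look like the sketch below. The function name `link` and the choice to leave unknown values untouched are assumptions; the name-to-id pair is the one from the slide.

```python
def link(event, lookup, fields=("origin", "actor", "subject")):
    """Return a copy of the event with the given fields resolved to ids."""
    linked = dict(event)
    for field in fields:
        value = linked.get(field)
        # Assumption: values missing from the lookup table stay as they are.
        linked[field] = lookup.get(value, value)
    return linked


lookup = {"Daan Gerits": "44543-45436-9928"}
linked = link({"origin": "crm", "actor": "Daan Gerits", "subject": "9332-DG"}, lookup)
```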
Detonator
Explode a fact to multiple rollup levels
Why?
- Real-time rollups
- Running analytics
When?
- if there is a hierarchy in actor or subject (actee)
- if there is a hierarchy in timestamp
IN:
{ts: 2014-05-19, fact: …}
OUT:
{ts: 2014-05-19, fact: …}
{ts: 2014-05, fact: …}
{ts: 2014, fact: …}
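The timestamp rollup shown above can be sketched as follows, assuming the timestamp is a `YYYY-MM-DD` string. The function name `detonate` is hypothetical, and hierarchies in actor or subject are not covered here.

```python
def detonate(event):
    """Explode one event into day, month and year rollup levels."""
    day = event["ts"]                      # e.g. "2014-05-19"
    year, month, _ = day.split("-")
    levels = [day, f"{year}-{month}", year]
    # One copy of the event per rollup level; only the timestamp differs.
    return [{**event, "ts": level} for level in levels]


exploded = detonate({"ts": "2014-05-19", "fact": "address"})
```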
View Generator
Use facts to generate a view
A view is
- != database view
- read-only
- optimised data model for a single purpose
- disposable
- based on all facts (facts depth & width)
A view generator manipulates
- RDBMSs, graphs, search indexes, ...
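As an illustration of a single-purpose, read-only, disposable view, the sketch below counts stored facts per actor. The view's purpose and the name `generate_activity_view` are made up for the example; a real generator would write to the storage system of choice rather than return a mapping.

```python
from collections import Counter
from types import MappingProxyType


def generate_activity_view(rows):
    """Build a read-only view: number of stored facts per actor."""
    counts = Counter(row["actor"] for row in rows)
    # MappingProxyType rejects writes, so consumers cannot modify the view.
    return MappingProxyType(dict(counts))


rows = [
    {"actor": "fbaker", "fact": "name"},
    {"actor": "fbaker", "fact": "address"},
    {"actor": "wvl", "fact": "address"},
]
activity = generate_activity_view(rows)
```

Because the view is derived from the facts alone, it can be thrown away and regenerated at any time.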
Rules of the game
Only add and remove are allowed
Events are re-playable
Removal may only be done by BDAs (Big Data Administrators)
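These rules can be sketched as a store that only supports add, remove and replay, with remove restricted to one role. The class `EventStore` and the `"bda"` role string are hypothetical; how an administrator is actually identified is not stated on the slides.

```python
class EventStore:
    """Event store with no update operation: only add, remove and replay."""

    def __init__(self):
        self._rows = []

    def add(self, row):
        self._rows.append(dict(row))

    def remove(self, event_id, role):
        """Drop all rows of one event; returns how many rows were removed."""
        if role != "bda":
            raise PermissionError("only a BDA may remove events")
        before = len(self._rows)
        self._rows = [r for r in self._rows if r["event_id"] != event_id]
        return before - len(self._rows)

    def replay(self):
        """Return copies of all rows in insertion order."""
        return [dict(r) for r in self._rows]
```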
Example
Add Customer
IN:
processing system: CRM
user: “fbaker”
data: { id: “9332-DG”, name: “Daan Gerits”, address: “container 9” }
DATA:
event ID  origin  actor   subject  timestamp  fact     value
1         crm     fbaker  9332-DG  20140514   name     Daan Gerits
1         crm     fbaker  9332-DG  20140514   address  container 9
Update Customer
IN:
processing system: ERP
user: “wvl”
data: { id: “9332-DG”, address: “container 24” }
DATA:
event ID  origin  actor   subject  timestamp  fact     value
1         crm     fbaker  9332-DG  20140514   name     Daan Gerits
1         crm     fbaker  9332-DG  20140514   address  container 9
39        erp     wvl     9332-DG  20141109   address  container 24
Delete Customer
IN:
processing system: ERP
user: “fbaker”
data: { id: “9332-DG” }
DATA:
event ID  origin  actor   subject  timestamp  fact     value
1         crm     fbaker  9332-DG  20140514   name     Daan Gerits
1         crm     fbaker  9332-DG  20140514   address  container 9
39        erp     wvl     9332-DG  20141109   address  container 24
63        erp     fbaker  9332-DG  20141201   address
63        erp     fbaker  9332-DG  20141201   name
Aaaarrgghhh!!
IN:
processing system: ERP
user: “fbaker”
data: { id: “9332-DG” }
DATA:
event ID  origin  actor   subject  timestamp  fact     value
1         crm     fbaker  9332-DG  20140514   name     Daan Gerits
1         crm     fbaker  9332-DG  20140514   address  container 9
39        erp     wvl     9332-DG  20141109   address  container 24
63        erp     fbaker  9332-DG  20141201   address
63        erp     fbaker  9332-DG  20141201   name
64        erp     wvl     9332-DG  20141109   address  container 24
64        crm     fbaker  9332-DG  20140514   name     Daan Gerits
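The effect of the accidental delete can be sketched by folding the rows of events 1, 39 and 63 into a current state. Two assumptions are made here: an empty value is read as "fact cleared" and represented as `None`, and the mistake is undone by dropping event 63 and replaying, rather than by appending a corrective event 64 as in the table. The function names are hypothetical.

```python
# (event ID, origin, actor, subject, timestamp, fact, value)
ROWS = [
    (1, "crm", "fbaker", "9332-DG", "20140514", "name", "Daan Gerits"),
    (1, "crm", "fbaker", "9332-DG", "20140514", "address", "container 9"),
    (39, "erp", "wvl", "9332-DG", "20141109", "address", "container 24"),
    (63, "erp", "fbaker", "9332-DG", "20141201", "address", None),
    (63, "erp", "fbaker", "9332-DG", "20141201", "name", None),
]


def fold(rows):
    """Replay rows in (timestamp, event ID) order into the current state."""
    state = {}
    for _eid, _origin, _actor, subject, _ts, fact, value in sorted(
        rows, key=lambda r: (r[4], r[0])
    ):
        facts = state.setdefault(subject, {})
        if value is None:
            facts.pop(fact, None)   # assumption: empty value clears the fact
        else:
            facts[fact] = value
    # Subjects with no remaining facts are treated as deleted.
    return {subject: facts for subject, facts in state.items() if facts}


def without_event(rows, event_id):
    """Return the rows with one event filtered out."""
    return [r for r in rows if r[0] != event_id]


after_delete = fold(ROWS)
after_undo = fold(without_event(ROWS, 63))
```

With event 63 in place the customer is gone; without it, the replay yields the name and the latest address again.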
Conclusion
Allows fact trending
- driver statistics for his whole career
Allows state regeneration
- the state of all facts on February 12, 2005
Is human-error-proof
- remove the facts with eventId #
Scales very well
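Fact trending and state regeneration both fall out of keeping every fact row. A minimal sketch, assuming rows are dictionaries with `YYYYMMDD` timestamp strings as in the example tables; `fact_history` and `state_as_of` are hypothetical names.

```python
def fact_history(rows, subject, fact):
    """Fact trending: every value one fact has had, oldest first."""
    ordered = sorted(rows, key=lambda r: r["timestamp"])
    return [
        (r["timestamp"], r["value"])
        for r in ordered
        if r["subject"] == subject and r["fact"] == fact
    ]


def state_as_of(rows, day):
    """State regeneration: replay only the rows up to and including `day`."""
    state = {}
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        if row["timestamp"] > day:
            break  # rows are sorted, so everything after this is later
        state.setdefault(row["subject"], {})[row["fact"]] = row["value"]
    return state


rows = [
    {"subject": "9332-DG", "timestamp": "20140514", "fact": "name", "value": "Daan Gerits"},
    {"subject": "9332-DG", "timestamp": "20140514", "fact": "address", "value": "container 9"},
    {"subject": "9332-DG", "timestamp": "20141109", "fact": "address", "value": "container 24"},
]
trend = fact_history(rows, "9332-DG", "address")
june_2014 = state_as_of(rows, "20140601")
```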
We don’t hire
data scientists, architects,
developers, UX designers
or engineers.
We hire individuals
Shameless plug
Thank you!
