Tech view on Regulatory Compliance
MarkLogic User Group Benelux Meetup December 2016
Speaker: Alexander L. de Goeij
About me
• Architect / Consultant
• Financial Services: Core Trading
• Regulations: EMIR, MiFID II
• Architecture: Enterprise / Solution / Project Architect
• Consulting: IT Strategy, implementations, vendor selection, etc.
• Business degree, Tech addiction.
“Regulations really make my life more fun!”
As said by no-one, ever.
“Regulations really make my life more exciting!”
As said by everyone who gets to use cool databases!
The challenge we think we are facing:
Diagram: Source Data → Extract → Transform → Load → Some Application → Extract → Load → Send → Happy Regulator
The actual challenge we are facing:
Diagram: Source Data is extracted and loaded separately into DB 1, Tool 2 and so on up to Thing N (plus the “database you didn’t know still existed”), then delivered via Email, FTP, REST and SOAP to Happy Regulators.
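The slide does not prescribe how to untangle these N private extract/load chains. One common pattern, assumed here for illustration, is a single shared ingestion step that lands every payload untouched and tags it with its source and transport. All class, field and source names below are hypothetical; the payloads are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple

# Hypothetical extractor: returns raw payloads from one source system.
Extractor = Callable[[], List[bytes]]


@dataclass(frozen=True)
class Envelope:
    source: str           # e.g. "DB 1", "Tool 2", "Thing N" (names from the slide)
    channel: str          # transport label, e.g. "FTP", "REST", "SOAP"
    payload: bytes        # raw content, kept exactly as received
    received_at: datetime


class IngestionHub:
    """One shared landing step instead of one private extract/load chain per source."""

    def __init__(self) -> None:
        self._sources: Dict[str, Tuple[str, Extractor]] = {}
        self.landed: List[Envelope] = []

    def register(self, source: str, channel: str, extractor: Extractor) -> None:
        self._sources[source] = (channel, extractor)

    def run(self) -> int:
        """Pull every registered source once; return the number of payloads landed."""
        count = 0
        for source, (channel, extractor) in self._sources.items():
            for raw in extractor():
                self.landed.append(Envelope(source, channel, raw, datetime.now(timezone.utc)))
                count += 1
        return count


# Usage with made-up payloads.
hub = IngestionHub()
hub.register("DB 1", "SOAP", lambda: [b"<trade id='1'/>"])
hub.register("Tool 2", "FTP", lambda: [b"id,notional\n2,100", b"id,notional\n3,250"])
landed_count = hub.run()
```

The design choice is that nothing is transformed on the way in; transformations happen later against the landed originals, which keeps one place to answer "where did this number come from?".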
Current solution:
Doesn’t work anymore:
• Auditability / process checks are now included in the regulations.
• Obligation to re-report.
• More complex ad-hoc requests from the regulator.
• Not suited for real-time reporting.
• Waste of money…
What do we need?
• Auditability: keep the original data in its original format to prove results, and keep track of ‘who-did-what’ with the data.
• Consistency: the regulator’s real-time requirement demands more than eventual consistency.
• Forward flexibility: we know we don’t know what we will have to report tomorrow.
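The auditability requirement (keep originals, record who-did-what) can be sketched as an append-only, content-addressed store. This is an in-memory illustration under assumed names (`AuditableStore`, the actors `etl-job` and `analyst.jane`); a real system would persist both the originals and the log.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AuditEvent:
    actor: str
    action: str
    doc_hash: str
    at: datetime


class AuditableStore:
    """Append-only sketch: originals are never overwritten and every access is logged."""

    def __init__(self) -> None:
        self._originals: Dict[str, bytes] = {}
        self.log: List[AuditEvent] = []

    def _record(self, actor: str, action: str, doc_hash: str) -> None:
        self.log.append(AuditEvent(actor, action, doc_hash, datetime.now(timezone.utc)))

    def ingest(self, actor: str, raw: bytes) -> str:
        doc_hash = hashlib.sha256(raw).hexdigest()
        # Content-addressed: identical input maps to the same key, nothing is replaced.
        self._originals.setdefault(doc_hash, raw)
        self._record(actor, "ingest", doc_hash)
        return doc_hash

    def read(self, actor: str, doc_hash: str) -> bytes:
        self._record(actor, "read", doc_hash)
        return self._originals[doc_hash]

    def verify(self, doc_hash: str) -> bool:
        """Check that the stored original still matches the hash a report refers to."""
        return hashlib.sha256(self._originals[doc_hash]).hexdigest() == doc_hash

    def history(self, doc_hash: str) -> List[Tuple[str, str]]:
        return [(e.actor, e.action) for e in self.log if e.doc_hash == doc_hash]


store = AuditableStore()
doc = store.ingest("etl-job", b"<trade id='42' notional='1000000'/>")
original = store.read("analyst.jane", doc)
```

A report that cites `doc` can later be re-derived (for example when re-reporting) from exactly the bytes that were received.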
Looking to technology for a better answer!
Your favorite RDBMS
• ACID, consistent, and blazing fast (if you buy Exadata)
• Normalize your way out, and fail.
• Not fit for processing/reporting across different data objects, e.g. trades and mortgages
• Try to do NoSQL with SQL (innovative, but terribly slow and impossible to maintain)
Example of what not to do: (SQL screenshot)
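The slide's own SQL example is not recoverable. As an assumed illustration of "NoSQL with SQL", here is one widespread form of it: an entity-attribute-value (EAV) table holding trades and mortgages side by side, using the standard-library `sqlite3` module. Table and attribute names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Generic "schema-less" table: every field of every object becomes a row.
conn.execute("CREATE TABLE eav (entity_id INTEGER, entity_type TEXT, attr TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?, ?)",
    [
        (1, "trade", "currency", "EUR"), (1, "trade", "notional", "1000000"),
        (2, "trade", "currency", "USD"), (2, "trade", "notional", "500000"),
        (3, "mortgage", "currency", "EUR"), (3, "mortgage", "principal", "250000"),
    ],
)

# "EUR trades with notional above 750,000" needs one self-join per attribute,
# and every value has to be cast back from TEXT.
query = """
SELECT c.entity_id
FROM eav AS c
JOIN eav AS n ON n.entity_id = c.entity_id
WHERE c.entity_type = 'trade'
  AND c.attr = 'currency' AND c.value = 'EUR'
  AND n.attr = 'notional' AND CAST(n.value AS INTEGER) > 750000
"""
result = [row[0] for row in conn.execute(query)]
```

With a conventional `trades` table this would be a single `WHERE currency = 'EUR' AND notional > 750000`; each additional attribute in an EAV query adds another join, which is where the "terribly slow and impossible to maintain" comes from.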
MongoDB
• Free! Open Source! GridFS!
• Have to transform data on ingest (to JSON), as most data is XML
• Eventual consistency (a.k.a. data loss) rules out real-time reporting.
• Good at homogeneous data.
• Still master-slave, with scaling issues
• Brilliant for RAD / prototyping!
Where things go wrong:
Source: http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
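The "transform on ingest" point is easy to underestimate, because there is no single canonical XML-to-JSON mapping. The converter below is a hypothetical, naive convention (attributes become `@name`, text next to attributes becomes `#text`), built on the standard-library `xml.etree.ElementTree`; it shows how the same element can come out as a string in one document and a list in another.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict


def element_to_dict(elem: ET.Element) -> Dict[str, Any]:
    """Naive XML-to-JSON mapping; one of many possible conventions."""
    node: Dict[str, Any] = {f"@{k}": v for k, v in elem.attrib.items()}
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    for child in elem:
        is_leaf = len(child) == 0 and not child.attrib
        value = (child.text or "") if is_leaf else element_to_dict(child)
        if child.tag in node:
            # Repeated tag: promote to a list. A single occurrence stays a scalar.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    return node


a = element_to_dict(ET.fromstring("<trade id='42'><leg>FIX</leg></trade>"))
b = element_to_dict(ET.fromstring("<trade id='43'><leg>FIX</leg><leg>FLOAT</leg></trade>"))
c = element_to_dict(ET.fromstring("<trade><notional ccy='EUR'>1000000</notional></trade>"))
print(json.dumps(a))  # {"@id": "42", "leg": "FIX"}
print(json.dumps(b))  # {"@id": "43", "leg": ["FIX", "FLOAT"]}
```

Every value also stays a string, and comments, namespaces and mixed content are dropped; unless the original XML is stored alongside, it cannot be reproduced to prove a result.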
Cassandra (DataStax)
• Favors data duplication over normalization
• Very fast (if you duplicate well), but does not do JOINs
• Used by ING as the main component of their Risk grid (YouTube)
• Excellent for time series data
Source: https://academy.datastax.com/resources/getting-started-time-series-data-modeling
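The time-series modelling idea behind the cited DataStax material can be sketched without a cluster: a partition key that buckets readings per source per day, with rows kept sorted by event time inside each partition. This is an in-memory model, not Cassandra client code; the station/day bucketing and all names are assumptions for illustration.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime, timedelta
from typing import DefaultDict, List, Tuple

PartitionKey = Tuple[str, str]  # (station_id, "YYYY-MM-DD"), hypothetical bucketing
Row = Tuple[datetime, float]    # (event_time, reading); event_time acts as clustering key


class TimeSeriesTable:
    """In-memory model of a day-bucketed time-series table (not a Cassandra client)."""

    def __init__(self) -> None:
        self.partitions: DefaultDict[PartitionKey, List[Row]] = defaultdict(list)

    @staticmethod
    def partition_key(station_id: str, event_time: datetime) -> PartitionKey:
        # One partition per station per day keeps partitions bounded in size.
        return (station_id, event_time.date().isoformat())

    def insert(self, station_id: str, event_time: datetime, reading: float) -> None:
        # insort keeps rows ordered by event_time inside the partition.
        insort(self.partitions[self.partition_key(station_id, event_time)], (event_time, reading))

    def query_range(self, station_id: str, start: datetime, end: datetime) -> List[Row]:
        """Rows in [start, end); every calendar day touched is a separate partition read."""
        rows: List[Row] = []
        day = start.date()
        while day <= end.date():
            partition = self.partitions.get((station_id, day.isoformat()), [])
            rows.extend(r for r in partition if start <= r[0] < end)
            day += timedelta(days=1)
        return rows


table = TimeSeriesTable()
table.insert("ws-1", datetime(2016, 12, 1, 10, 0), 3.5)
table.insert("ws-1", datetime(2016, 12, 1, 9, 0), 2.0)  # inserted out of order on purpose
table.insert("ws-1", datetime(2016, 12, 2, 9, 0), 4.0)
readings = table.query_range("ws-1", datetime(2016, 12, 1), datetime(2016, 12, 2, 12, 0))
```

To answer a differently shaped question (say, all stations for one day), the usual Cassandra approach is to write the same readings into a second table keyed for that query, which is the "duplication over normalization" trade-off on the slide.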
Hadoop
Source: http://hortonworks.com/products/data-center/hdp/
MarkLogic
• Focused on heterogeneously structured data
• Bitemporal, if you dare
• Semantics / RDF triples
• ACID, consistent, stores the original file
• ABAC & redaction in the enterprise version
• Rules, workflows, alerts, triggers
• Not a COTS product!
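"Bitemporal" means every fact carries two time axes: when it was true in the real world (valid time) and when the system believed it (system time). That is what lets you answer "what did we report in February about February 15th?" after a later correction. The sketch below is a generic, hypothetical illustration of the concept; it is not MarkLogic's temporal API.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

FOREVER = date.max  # open-ended upper bound


@dataclass
class Version:
    value: float
    valid_from: date
    valid_to: date     # exclusive
    system_from: date
    system_to: date    # exclusive; FOREVER means "current belief"


class BitemporalRecord:
    """Generic bitemporal sketch: corrections retire old versions instead of overwriting."""

    def __init__(self) -> None:
        self.versions: List[Version] = []

    def assert_fact(self, value: float, valid_from: date, valid_to: date, recorded_on: date) -> None:
        new_versions: List[Version] = []
        for v in self.versions:
            current = v.system_to == FOREVER
            if not (current and v.valid_from < valid_to and valid_from < v.valid_to):
                continue
            v.system_to = recorded_on  # retire the old belief, keep it for audit
            # Re-assert the parts of the old validity period the new fact does not cover.
            if v.valid_from < valid_from:
                new_versions.append(Version(v.value, v.valid_from, valid_from, recorded_on, FOREVER))
            if valid_to < v.valid_to:
                new_versions.append(Version(v.value, valid_to, v.valid_to, recorded_on, FOREVER))
        new_versions.append(Version(value, valid_from, valid_to, recorded_on, FOREVER))
        self.versions.extend(new_versions)

    def as_of(self, valid_on: date, known_on: date) -> Optional[float]:
        """Value valid on `valid_on`, as the system believed it on `known_on`."""
        for v in self.versions:
            if v.valid_from <= valid_on < v.valid_to and v.system_from <= known_on < v.system_to:
                return v.value
        return None


r = BitemporalRecord()
r.assert_fact(100.0, date(2016, 1, 1), FOREVER, recorded_on=date(2016, 1, 1))
# On 2016-03-01 we learn the value was really 105.0 from 2016-02-01 onward.
r.assert_fact(105.0, date(2016, 2, 1), FOREVER, recorded_on=date(2016, 3, 1))
```

After the correction, `r.as_of(date(2016, 2, 15), known_on=date(2016, 2, 20))` still returns the originally reported 100.0, while a query known on 2016-03-15 returns 105.0.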
Ok, so now what?
Two approaches to a solution
Infra approach:
• Build everything yourself, using open-source components
E.g.:
• Hadoop
• Cassandra + Kafka
Platform approach:
• Focus on application and business logic, not on infra
E.g.:
• MarkLogic
• Spark (without Hadoop)
Infra approach (SMACK example: Spark, Mesos, Akka, Cassandra, Kafka)
• Used (and designed) by Netflix, LinkedIn, Uber, Twitter
• Massive amounts of event processing (IoT)
• HA and geo-distributed
• Scala, Python, R, Java(Script)
• Asynchronous everywhere
• Near impossible to destroy: reactive, self-healing, back-pressure.
Diagram: Play REST APIs, Akka Actors, Kafka, Spark and Cassandra instances running on Mesos OS (with Marathon and ZooKeeper) across several bare-metal servers.
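"Back-pressure" means a slow consumer throttles a fast producer instead of being flooded. Akka Streams and Kafka each implement this in their own way; the sketch below only illustrates the concept with a bounded `asyncio.Queue` from Python's standard library, where `put()` suspends while the queue is full. Event counts and the capacity are arbitrary.

```python
import asyncio
from typing import List, Optional, Tuple


async def producer(queue: asyncio.Queue, events: int, log: List[str]) -> None:
    for i in range(events):
        # put() suspends while the queue is full, so the consumer sets the pace.
        await queue.put(i)
        log.append(f"produced {i}")
    await queue.put(None)  # sentinel: no more events


async def consumer(queue: asyncio.Queue, log: List[str]) -> List[int]:
    seen: List[int] = []
    while True:
        item: Optional[int] = await queue.get()
        if item is None:
            break
        await asyncio.sleep(0.01)  # simulate slow downstream processing
        seen.append(item)
        log.append(f"consumed {item}")
    return seen


async def main(events: int = 6, capacity: int = 2) -> Tuple[List[int], List[str]]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=capacity)
    log: List[str] = []
    _, seen = await asyncio.gather(producer(queue, events, log), consumer(queue, log))
    return seen, log


seen, log = asyncio.run(main())
```

The producer can never run more than `capacity + 1` events ahead (the queue plus the one item in the consumer's hands), no matter how fast it could otherwise generate them.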
Platform approach
Diagram with four stages (Data Flows, Data Stores, Analytics, Feedback Loop): Source Data (qualitative and quantitative) feeds MarkLogic, a time-series database (“insert Time Series Database here”) and Spark, ending at a Happy Regulator.
• Schema transformations
• Business Rules
• Workflow
• Rights management
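The slide does not say how business rules and rights management are implemented on the platform. As a generic, hypothetical sketch: rules as small functions that each return an error message (empty when the record passes), and attribute-based redaction that masks fields a user lacks the required attribute for. All field names, attributes and the `###` mask are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, List

Record = Dict[str, Any]
Rule = Callable[[Record], str]  # returns "" when the record passes


# Hypothetical business rules; field names are illustrative only.
def notional_positive(r: Record) -> str:
    return "" if r.get("notional", 0) > 0 else "notional must be positive"


def has_counterparty_lei(r: Record) -> str:
    return "" if r.get("counterparty_lei") else "missing counterparty LEI"


RULES: List[Rule] = [notional_positive, has_counterparty_lei]


def validate(r: Record) -> List[str]:
    """Run every rule and collect the failures, in rule order."""
    return [msg for rule in RULES if (msg := rule(r))]


@dataclass(frozen=True)
class User:
    name: str
    attributes: FrozenSet[str]


# Field -> attribute a user must hold to see the value (assumed policy).
FIELD_POLICY: Dict[str, str] = {
    "counterparty_lei": "can_see_counterparty",
    "client_name": "can_see_client",
}


def redact(r: Record, user: User) -> Record:
    """Return a copy with fields masked unless the user holds the required attribute."""
    out: Record = {}
    for field, value in r.items():
        required = FIELD_POLICY.get(field)
        out[field] = value if required is None or required in user.attributes else "###"
    return out


trade = {"trade_id": "T1", "notional": 1_000_000, "counterparty_lei": "LEI-PLACEHOLDER", "client_name": "ACME"}
bad = {"trade_id": "T2", "notional": 0}
analyst = User("analyst", frozenset({"can_see_counterparty"}))
view = redact(trade, analyst)
```

Keeping rules and policies as data next to the stored originals, rather than inside each report, is one way to make the "we don't know what we will have to report tomorrow" requirement cheaper to meet.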
Main take-aways
• There are no one-stop solutions.
• Don’t pick bleeding-edge stuff if you need it to work.
• Focus on the business benefit of investing in regulatory compliance.
• Separate the platform from the project!
• Start small, think big.
Thank you for listening!
Alexander L. de Goeij
alexander@aldg.nl
References
• https://academy.datastax.com/resources/getting-started-time-series-data-modeling
• http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
• http://hortonworks.com/products/data-center/hdp/
• https://www.linkedin.com/pulse/data-hubs-marklogic-vs-hadoop-kurt-cagle
• https://engineering.linkedin.com/blog/2016/04/kafka-ecosystem-at-linkedin
• http://www.datanami.com/2015/10/05/how-uber-uses-spark-and-hadoop
• https://blog.twitter.com/2015/handling-five-billion-sessions-a-day-in-real-time
• http://techblog.netflix.com/2013/12/announcing-suro-backbone-of-netflix.html