Andrei Varanovich, InSpark
Lambda Architecture in
the Cloud with Azure
Databricks
#SAISDev6
Selfie
Data & AI Lead
@DrGigabit
andrei.varanovich
andrei.varanovich@inspark.nl
Data • Programmability • Cloud • High-performance teams • Neural Networks
A Big Data problem is many
small data problems
2,500,000 visitors per year
8,000 objects of art and history
1,000,000 objects in storage, dating back to the year 1200
Under the hood
Retail
Sponsorship engagement
Occupancy rate
Service Management
Incident Management
Marketing Performance
Warehouse Management
Capacity Planning
Financial Performance
Ticket Sales
THE DATA JOURNEY
• Consolidation: consolidate data in a centralized store
• Insights: organizational processes and efficiency
• Innovation: new ideas, leveraging machine learning
IN NEED OF A PLATFORM
START
• Begin small and
focused
• Prove value
GROW
• Grow organically
as more use cases
arise
SCALE
• Go to production and
scale to the level
required
We need a truly elastic data platform that avoids upfront planning, deployment, and operations expenses, and puts business value discovery first. The platform should support [big] data
projects at any stage, without re-engineering the whole solution.
Lambda Architecture on Azure
Figure: reference architecture with ingest (batch and stream), store, and analyze stages.
• Data sources: Social, LOB, Graph, IoT, Image, CRM
• Ingest (batch): Azure Data Factory
• Ingest (stream): Event Hubs
• Store: Azure Data Lake Store
• Analyze: Azure Databricks (managed Spark; batch & streaming)
• AI models/APIs: Cognitive Services, Azure Container Service & registry
• Insight sharing: Power BI/other tools
• Security & management: Azure Log Analytics, Azure Graph API, cost monitoring, Azure Active Directory
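The architecture routes data through a batch path and a stream path into one analytics engine. As a reminder of the general Lambda pattern, here is a minimal in-memory sketch: a batch view recomputed over the full master dataset, a speed view over events that arrived since the last batch run, and a serving function that merges both at query time. It does not use any Azure service; the event shape, dates, and ticket counts are hypothetical. Using one aggregation function for both layers mirrors the idea of sharing logic between batch and streaming, which Spark makes possible.

```python
from collections import Counter
from typing import Iterable

# Hypothetical event shape: (day, tickets sold). Not taken from the talk.
Event = tuple[str, int]

def totals_per_day(events: Iterable[Event]) -> Counter:
    """One aggregation shared by both layers: ticket totals per day."""
    totals: Counter = Counter()
    for day, tickets in events:
        totals[day] += tickets
    return totals

def query(day: str, batch: Counter, speed: Counter) -> int:
    """Serving layer: merge the precomputed batch view with the real-time view."""
    # Counter returns 0 for missing keys, so days seen by only one layer still work.
    return batch[day] + speed[day]

# Batch layer: recomputed periodically over the full, immutable master dataset.
master_dataset = [("2018-10-01", 120), ("2018-10-01", 30), ("2018-10-02", 75)]
batch_view = totals_per_day(master_dataset)

# Speed layer: covers only events that arrived after the last batch run.
recent_events = [("2018-10-02", 5), ("2018-10-03", 12)]
speed_view = totals_per_day(recent_events)

print(query("2018-10-02", batch_view, speed_view))  # 80 (75 from batch + 5 from speed)
```

When the next batch run absorbs the recent events, the speed view can be reset, which is what keeps the real-time layer small.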
Figure: Azure Databricks platform overview.
• Users: data engineer, data scientist, and business analyst, working in a collaborative workspace
• Production jobs & workflows: multi-stage pipelines, job scheduler, notification & logs
• Engine: Apache Spark on the Optimized Databricks Runtime Engine (Databricks I/O, serverless)
• Inputs: cloud storage, data warehouses, Hadoop storage, IoT/streaming data
• Outputs: REST APIs, machine learning models, BI tools, data exports, data warehouses
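The platform lists multi-stage pipelines driven by a job scheduler. To illustrate the scheduling idea only, the sketch below runs hypothetical stages in dependency order using Python's standard `graphlib.TopologicalSorter` (Python 3.9+). It is a local stand-in and does not use the Databricks Jobs API; stage names and dependencies are assumptions.

```python
from graphlib import TopologicalSorter

# Hypothetical stages: each key maps to the set of stages it depends on.
dependencies: dict[str, set[str]] = {
    "ingest_batch": set(),
    "ingest_stream": set(),
    "clean": {"ingest_batch", "ingest_stream"},
    "aggregate": {"clean"},
    "publish_bi": {"aggregate"},
}

def run_stage(name: str, log: list[str]) -> None:
    # Placeholder for real work (for example a notebook or Spark job); here we only log the call.
    log.append(name)

def run_job(deps: dict[str, set[str]]) -> list[str]:
    """Run every stage after all of its dependencies, returning the execution order."""
    log: list[str] = []
    for stage in TopologicalSorter(deps).static_order():
        run_stage(stage, log)
    return log

order = run_job(dependencies)
print(order)  # both ingest stages first, then clean, aggregate, publish_bi
```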
Simplicity is the ultimate sophistication
Leonardo da Vinci
LAMBDA TO THE RESCUE
Composition of
functions is applying
one function to the
result of another
$f(x) = x + 1$, $g(x) = x^2$
$(g \circ f)(x) = g(f(x)) = (x + 1)^2$
Diagram: f maps input to input + 1; g maps input to input²; composed, input + 1 becomes (input + 1)².
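The composition $(g \circ f)(x) = (x + 1)^2$ can be written directly in Python. This is a minimal sketch; `compose` is a hypothetical helper name, not a standard-library function.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")

def compose(g: Callable[[B], C], f: Callable[[A], B]) -> Callable[[A], C]:
    """Return g ∘ f, i.e. the function x -> g(f(x))."""
    return lambda x: g(f(x))

def f(x: int) -> int:
    return x + 1

def g(x: int) -> int:
    return x ** 2

h = compose(g, f)
print(h(3))              # 16, since (3 + 1)^2 = 16
print(compose(f, g)(3))  # 10, since 3^2 + 1 = 10: order matters
```

The second call shows that composition is not commutative, which is why the order of pipeline steps has to be explicit.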
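Transformation pipeline as a series of transitions
Figure: the input state s1 passes through the transformations f1, f2, and f3 (the composition f3 ∘ f2 ∘ f1) and ends in a sum.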
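A pipeline of transitions s1 → f1 → f2 → f3 → sum can be sketched as a list of functions applied in order, keeping every intermediate state so each step's output can be inspected. The records, field names, and the three transformations below are hypothetical examples; the talk does not specify them. The final `reduce` shows the same result as the plain composition f3(f2(f1(s1))).

```python
from functools import reduce
from typing import Callable

# A state is a list of records; each transition maps one state to the next.
State = list[dict]
Transition = Callable[[State], State]

def f1(s: State) -> State:
    """Drop records with a missing amount."""
    return [r for r in s if r.get("amount") is not None]

def f2(s: State) -> State:
    """Convert amounts from cents to euros (hypothetical unit)."""
    return [{**r, "amount": r["amount"] / 100} for r in s]

def f3(s: State) -> State:
    """Keep only ticket sales."""
    return [r for r in s if r["category"] == "ticket"]

def run_pipeline(s1: State, steps: list[Transition]) -> list[State]:
    """Apply the transitions in order (f1 first) and return every state, s1 included."""
    states = [s1]
    for step in steps:
        states.append(step(states[-1]))
    return states

s1 = [
    {"category": "ticket", "amount": 1500},
    {"category": "retail", "amount": 800},
    {"category": "ticket", "amount": None},
    {"category": "ticket", "amount": 2250},
]
states = run_pipeline(s1, [f1, f2, f3])
total = sum(r["amount"] for r in states[-1])  # the final "sum" step: 15.0 + 22.5

# Same result as f3(f2(f1(s1))), but without keeping the intermediate states.
composed = reduce(lambda state, step: step(state), [f1, f2, f3], s1)
print(total)  # 37.5
```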
Conclusions
… with proper design, the features come cheaply. This
approach is arduous, but continues to succeed.
—Dennis Ritchie
• Standardizing on Apache Spark allows us to move forward without
introducing extra complexity.
• A 100% PaaS offering is important: there is no infrastructure to maintain.
All the components we use are offered as PaaS on Azure.
• Treating data pipelines as function composition allows us to ensure end-to-end
consistency and spot errors quickly.
• Saving intermediate states allows us to quickly inspect the data sets.
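The last two conclusions can be combined in one small sketch: each named step's output is written to disk, and a failure is reported with the name of the step that raised, while the states written so far remain available for inspection. The talk does not say how or where states are saved; local JSON files in a temporary directory stand in here, and the function name, step names, and file layout are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

State = list[dict]
NamedStep = tuple[str, Callable[[State], State]]

def run_and_persist(s1: State, steps: list[NamedStep], out_dir: Path) -> State:
    """Apply named steps in order, writing every state to out_dir as JSON."""
    out_dir.mkdir(parents=True, exist_ok=True)
    state = s1
    (out_dir / "00_input.json").write_text(json.dumps(state))
    for i, (name, step) in enumerate(steps, start=1):
        try:
            state = step(state)
        except Exception as exc:
            # Name the failing transition; earlier states are already on disk.
            raise RuntimeError(f"step {i} ({name}) failed") from exc
        (out_dir / f"{i:02d}_{name}.json").write_text(json.dumps(state))
    return state

raw = [{"amount": 1500}, {"amount": None}, {"amount": 250}]
steps: list[NamedStep] = [
    ("drop_missing", lambda s: [r for r in s if r.get("amount") is not None]),
    ("to_euros", lambda s: [{**r, "amount": r["amount"] / 100} for r in s]),
]

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "run"
    result = run_and_persist(raw, steps, out)
    written = sorted(p.name for p in out.iterdir())
    first_state = json.loads((out / "01_drop_missing.json").read_text())

# A deliberately broken step: the records have no "price" field.
failure = None
with tempfile.TemporaryDirectory() as tmp:
    try:
        run_and_persist(raw, [("bad_lookup", lambda s: [r["price"] for r in s])], Path(tmp))
    except RuntimeError as err:
        failure = str(err)

print(written)  # ['00_input.json', '01_drop_missing.json', '02_to_euros.json']
print(failure)  # step 1 (bad_lookup) failed
```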
Thank you!
Questions?