Data exposure in Azure:
Production use-case
April 2018
2
A FEW WORDS ABOUT MYSELF
I’m Alexander Laysha
• Solution Architect at EPAM Systems
• Co-Head of TR Cloud Center of Excellence at EPAM Systems
• Microsoft Azure MVP
• Focused on backend cloud solutions
• Leader of Belarus Azure Community
My contacts
• Email: layshaalex@gmail.com
• Twitter: @layshaalexander
• Facebook: alexander.laysha
3
AGENDA
• Overview & Requirements
• OData API Solution
• Data Abstraction Solution
• Tabular Model Solution
• Summary
4
OVERVIEW & REQUIREMENTS
5
CONTEXT
• The customer has 1000+ staging data warehouses for its clients, with 6 TB of data in total
• The customer’s clients access data in the staging data warehouses through OLAP cubes that are consumed by web applications
[Diagram: current on-premise architecture. Components: on-premise OLTP sources on 7 DB servers; SSIS servers (orchestrator, packages, jobs); Master DB; tenant data warehouses; company databases; benchmark database; SSAS servers (dashboard cubes, benchmark cubes); web app (dashboards, reporting).]
6
NEW USE-CASES
• Integration Client - would like to perform full and incremental extracts of its own raw data from the staging data warehouses
• BI Client - would like to connect to its data in the staging data warehouses using its own BI tools
7
ARCHITECTURE REQUIREMENTS
• Data warehouses are exposed as read-only sources
• Clients can’t connect to a data warehouse directly because of the customer’s security policies
• Propagation of source database schema changes
• Multi-tenant support and strong tenant data isolation
• Azure AD authentication
• Role-based data access
• Row-level security (future requirement)
• Support for major BI tools: Power BI, Tableau, Qlik
• Clients should not have to wait hours to load their data
• Data warehouses should not be affected by load spikes
8
PROPOSED ARCHITECTURE APPROACHES
• OData API Solution – abstracts clients from the staging data warehouses and provides integration points for them
• Data Abstraction Solution – an ETL process extracts, transforms and loads data into separate storage that acts as the integration point for clients
• Tabular Model Solution – a memory-optimized database for analytical workloads, hosted in Analysis Services. It extracts data from the staging data warehouses and provides the integration point for clients
[Diagram: Storage Area (Data Warehouse Tenant1, Tenant2, Tenant3 ... TenantN); Exposure Area (OData API Solution, Data Abstraction Solution, Tabular Model Solution, behind Authentication/Authorization); Consumers Area (Power BI, Tableau, Excel, any compatible client).]
9
ODATA API SOLUTION
10
SOLUTION OVERVIEW
[Diagram: Storage Area (Master DB; Data Warehouse Tenant1, Tenant2 ... TenantN); Exposure Area with solution components: Authentication (Basic, OpenID/OAuth 2.0), Authorization (table-level, tenant-level), API (OData engine, Maskx.Odata), Data Access, Configuration, Logging & Monitoring; Consumers Area (Power BI, Tableau, Excel, any compatible client).]
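To make the two authorization levels named on this slide concrete, here is a minimal Python sketch of how an API might combine a tenant-level check (callers only reach their own tenant’s data warehouse) with a table-level check (one of the caller’s roles must grant the requested table). The `Caller` type, its claims, the role names and the `TABLE_PERMISSIONS` mapping are hypothetical illustrations, not part of Maskx.Odata or of the production solution.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set

# Hypothetical role-to-table mapping; a real API would load this from configuration.
TABLE_PERMISSIONS: Dict[str, Set[str]] = {
    "integration": {"Orders", "OrderLines", "Customers"},
    "bi_reader": {"Orders"},
}


@dataclass(frozen=True)
class Caller:
    """Identity taken from an already-validated token (claim names are hypothetical)."""
    tenant_id: str
    roles: FrozenSet[str] = field(default_factory=frozenset)


def authorize(caller: Caller, requested_tenant: str, table: str) -> bool:
    """Allow the request only if both the tenant-level and table-level checks pass."""
    # Tenant-level: a caller may only read its own tenant's data.
    if caller.tenant_id != requested_tenant:
        return False
    # Table-level: at least one of the caller's roles must grant the requested table.
    return any(table in TABLE_PERMISSIONS.get(role, set()) for role in caller.roles)
```

Checking the tenant first means a misconfigured role can never leak another tenant’s tables, which matches the requirement for strong tenant data isolation.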
11
AZURE ARCHITECTURE
[Diagram: clients authenticate against their own Azure AD or Azure AD (B2B or B2C) and call the Custom OData API over HTTPS across the internet. The API runs as autoscaled API App instances (#1 to #N), uses Azure Key Vault and the Master DB for configuration and Application Insights for logging & monitoring, and reads over TCP from the Storage Area (Data Warehouse Tenant1, Tenant2 ... TenantN, read-only replica).]
12
SOLUTION LIMITATIONS
• Most BI tools use an “extract” integration model when consuming an OData API as a data source.
• Incremental extraction is not possible; the entire dataset has to be reloaded on schedule. An “incremental load” feature is currently in development.
• Power BI has a strict 2-hour timeout that cannot be exceeded during data import, which raises additional concerns about exposing large datasets through the API.
• Power BI requires the API endpoint and the Azure AD tenant to be hosted under a custom domain name to work with Azure AD authentication.
• For Azure AD B2C prototyping we had to develop a custom policy using the Identity Experience Framework, which may introduce additional risk since the feature is still in public preview.
• Developer documentation for the Identity Experience Framework is lacking: we had to submit an issue to the Microsoft team for assistance.
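The 2-hour import limit turns into a simple capacity question: how many rows can a full reload move before the timeout? A rough Python sketch, assuming a constant end-to-end throughput; `rows_per_second` is a hypothetical input you would have to measure, and real refreshes are rarely this linear.

```python
import math

POWER_BI_IMPORT_TIMEOUT_S = 2 * 60 * 60  # 2-hour import limit cited on the slide


def estimated_import_seconds(row_count: int, rows_per_second: float) -> float:
    """Rough full-reload duration, assuming constant throughput."""
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    return row_count / rows_per_second


def fits_in_timeout(row_count: int, rows_per_second: float,
                    timeout_s: int = POWER_BI_IMPORT_TIMEOUT_S) -> bool:
    """True if a full reload is expected to finish before the timeout."""
    return estimated_import_seconds(row_count, rows_per_second) <= timeout_s


def max_rows_within_timeout(rows_per_second: float,
                            timeout_s: int = POWER_BI_IMPORT_TIMEOUT_S) -> int:
    """Largest row count a full reload can move at the given throughput."""
    return math.floor(rows_per_second * timeout_s)
```

For example, at an assumed 5,000 rows/s, `max_rows_within_timeout(5_000)` gives 36,000,000 rows, so a 50-million-row table (about 10,000 s) would not fit. Because every refresh is a full reload, this ceiling applies to the whole dataset, not just to new rows.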
13
SUMMARY
Solution Pain Points
• BI tools use the “extract” integration model when working with the API.
• Most BI tools cannot extract data from the API incrementally. This can potentially be mitigated by pre-aggregation on the database side.
Solution Benefits
• Simple onboarding for new clients, since HTTP-based integration introduces no dependency on the implementation stack.
• A standardized mechanism for exposing datasets together with their metadata, with client libraries available for a wide range of programming languages.
• Clients can select only the subset of data they need.
• The solution can adopt the authentication and authorization logic already used in the customer’s company, which gives more flexibility than using Azure services such as Azure Storage directly.
Optimal strategy: expose the custom API alongside an alternative integration mechanism that provides a “direct query” integration model.
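Selecting a subset of data relies on the standard OData system query options such as `$select`, `$filter` and `$top`. A small Python sketch that composes such a request URL; the base URL, entity set and column names are hypothetical.

```python
from typing import Dict, Optional, Sequence
from urllib.parse import quote, urlencode


def build_odata_url(base_url: str, entity_set: str,
                    select: Optional[Sequence[str]] = None,
                    filter_expr: Optional[str] = None,
                    top: Optional[int] = None) -> str:
    """Compose an OData query URL using the $select, $filter and $top options."""
    params: Dict[str, str] = {}
    if select:
        params["$select"] = ",".join(select)
    if filter_expr:
        params["$filter"] = filter_expr
    if top is not None:
        params["$top"] = str(top)
    url = f"{base_url.rstrip('/')}/{entity_set}"
    if not params:
        return url
    # Keep '$', ',' and quotes readable; spaces and other characters are percent-encoded.
    return url + "?" + urlencode(params, safe="$,'", quote_via=quote)
```

For example, `build_odata_url("https://api.example.com/odata", "Orders", select=["OrderId", "Amount"], filter_expr="Year eq 2018", top=100)` returns `https://api.example.com/odata/Orders?$select=OrderId,Amount&$filter=Year%20eq%202018&$top=100`. Pushing the filter to the server keeps both the transfer size and the load on the data warehouse down.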
14
DATA ABSTRACTION SOLUTION
15
SOLUTION OVERVIEW
[Diagram: Storage Area (Data Warehouse Tenant1, Tenant2, Tenant3 ... TenantN); ETL Area with a multi-tenant ETL engine running one ETL process per tenant; Exposure Area with one exposed DB per tenant (Tenant1 DB ... TenantN DB); Consumers Area (Power BI, Tableau, Excel, any compatible client); Cross-cutting Area (monitoring, logging, security).]
16
AZURE ARCHITECTURE
[Diagram: Storage Area: tenant data warehouses in a SQL Azure cluster, read over TCP/IP by the ETL Area, a multi-tenant Data Factory with one pipeline per tenant. The version of successfully extracted data from every SQL table is stored in the Version Table of the tenant’s storage account. Exposure Area: one storage account per tenant holding that tenant’s storage tables, accessed over HTTPS. Consumers Area: each tenant’s tools (Power BI, any client), with access control through the tenant’s own AAD (Tenant1 AAD, Tenant2 AAD). Cross-cutting Area: Log Analytics, monitoring.]
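Storing the version of successfully extracted data per SQL table is a watermark pattern. A minimal Python sketch of the idea, using an in-memory SQLite table as a stand-in for a tenant data warehouse and a plain dict as a stand-in for the per-tenant Version Table. It assumes each source table has a monotonically increasing `version` column (for example, something like a SQL Server `rowversion`); the table and column names are hypothetical. The version is committed in a separate step so that it only advances after the load succeeds.

```python
import sqlite3
from typing import Dict, List, Tuple

Row = Tuple[int, float, int]  # (id, amount, version) in the hypothetical source table


def extract_changes(source: sqlite3.Connection, table: str,
                    version_table: Dict[str, int]) -> Tuple[List[Row], int]:
    """Return rows newer than the stored version, plus the version to commit after loading."""
    last_version = version_table.get(table, 0)
    # Identifiers cannot be bound as parameters; only pass trusted, known table names.
    rows = source.execute(
        f"SELECT id, amount, version FROM {table} WHERE version > ? ORDER BY version",
        (last_version,),
    ).fetchall()
    new_version = rows[-1][2] if rows else last_version
    return rows, new_version


def commit_version(version_table: Dict[str, int], table: str, new_version: int) -> None:
    """Record the version only after the load into the exposure store has succeeded."""
    version_table[table] = new_version


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE Orders (id INTEGER PRIMARY KEY, amount REAL, version INTEGER)")
    src.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [(1, 10.0, 1), (2, 20.0, 2)])
    versions: Dict[str, int] = {}

    rows, v = extract_changes(src, "Orders", versions)  # first run: everything is new
    commit_version(versions, "Orders", v)               # pretend the load succeeded

    src.execute("INSERT INTO Orders VALUES (3, 30.0, 3)")
    rows, v = extract_changes(src, "Orders", versions)  # second run: only the new row
    print(rows)  # [(3, 30.0, 3)]
```

If a load fails, the stored version is simply not committed, and the next run re-extracts the same rows.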
17
SOLUTION LIMITATIONS
• Table Storage is supported as a data source only by Power BI.
• Power BI supports only the “extract” integration model with Table Storage as a data source.
• Incremental data load into Power BI is not supported, so Power BI has to reload the whole dataset on every refresh.
• Power BI has a strict 2-hour timeout that cannot be exceeded during data import.
• The number of storage accounts per Azure subscription is limited to a maximum of 250.
• Storage accounts do not support AAD integration for authenticating users and authorizing access to stored data.
• Row-level security is not supported by Table Storage.
• Power BI can authenticate to a storage account only with the account key.
• Table Storage (and the storage account) might be throttled at high scale (max 20,000 requests/sec per storage account for 1 KB entities, 2,000 requests/sec per table partition).
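Two of these limits can be checked with simple arithmetic. A Python sketch using the figures quoted on this slide (limits change over time, so treat the constants as assumptions to verify): with one storage account per tenant, 1000+ tenants need several subscriptions, and a workload concentrated on a few partitions hits the per-partition limit long before the per-account one.

```python
import math

# Limits as quoted on the slide (April 2018); current Azure limits may differ.
MAX_STORAGE_ACCOUNTS_PER_SUBSCRIPTION = 250
MAX_REQUESTS_PER_ACCOUNT = 20_000    # per second, 1 KB entities
MAX_REQUESTS_PER_PARTITION = 2_000   # per second, per table partition


def subscriptions_needed(tenant_count: int) -> int:
    """Subscriptions required if every tenant gets its own storage account."""
    return math.ceil(tenant_count / MAX_STORAGE_ACCOUNTS_PER_SUBSCRIPTION)


def request_ceiling(partitions_hit: int) -> int:
    """Upper bound on requests/sec for one storage account when load is spread
    evenly over the given number of partitions."""
    return min(MAX_REQUESTS_PER_ACCOUNT, partitions_hit * MAX_REQUESTS_PER_PARTITION)
```

For example, `subscriptions_needed(1000)` returns 4, and `request_ceiling(5)` returns 10,000 requests/sec, half of the per-account limit.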
18
SUMMARY
Solution Pain Points (caused by the storage account)
• Filtering by columns not included in the PartitionKey and RowKey might lead to poor performance, depending on data volume
• No integration with AAD for authentication and authorization
• Supported as a data source only by Power BI at the moment
• Power BI uses the “extract” integration model when working with Table Storage
Solution Benefits (when SQL Azure is used as the Exposure Area)
• Integration with AAD for authentication and data access authorization
• Table-level and row-level security are supported
• Supported by popular BI tools in “direct query” mode: the BI tool translates chart queries into SQL queries and sends them to SQL Azure for execution
• Any client can connect to SQL Azure
• The set of exposed tables can be limited by modifying the ETL process, and data can be transformed during ETL into materialized views with pre-aggregated data for better performance and lower DTU usage in SQL Azure
Optimal strategy: implement the Data Abstraction solution using SQL Azure or Azure Data Lake Store as the Exposure Area
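The first pain point follows from how Table Storage queries work: only `PartitionKey` and `RowKey` are indexed, and filters on any other property are evaluated by scanning. A Python sketch that classifies an AND-only filter into the usual categories (point query, range query, partition scan, table scan); the `(property, operator)` input format is a hypothetical simplification of an OData `$filter` string.

```python
from typing import Iterable, Tuple

Condition = Tuple[str, str]  # (property name, comparison operator); conditions are ANDed

RANGE_OPERATORS = {"gt", "ge", "lt", "le"}


def classify_table_query(conditions: Iterable[Condition]) -> str:
    """Classify an AND-only Table Storage filter by how much of the table it touches."""
    conds = list(conditions)
    pk_eq = any(prop == "PartitionKey" and op == "eq" for prop, op in conds)
    rk_eq = any(prop == "RowKey" and op == "eq" for prop, op in conds)
    rk_range = any(prop == "RowKey" and op in RANGE_OPERATORS for prop, op in conds)
    if not pk_eq:
        return "table scan"       # without PartitionKey equality, every partition may be read
    if rk_eq:
        return "point query"      # PartitionKey + RowKey identify a single entity
    if rk_range:
        return "range query"      # contiguous RowKey range inside one partition
    return "partition scan"       # every entity in the partition is read and filtered
```

For example, `classify_table_query([("PartitionKey", "eq"), ("Amount", "gt")])` returns `"partition scan"`, while a filter on `Amount` alone is a table scan; this is why BI-style filtering on arbitrary columns degrades as data volume grows.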
19
TABULAR MODEL SOLUTION
20
SOLUTION OVERVIEW
[Diagram: Storage Area (Data Warehouse Tenant1, Tenant2, Tenant3 ... TenantN); Exposure Area with a multi-tenant analytical data engine hosting one tabular model per tenant; Consumer Area (Power BI, Tableau, Excel, any compatible client); Cross-cutting Area (monitoring, logging, security).]
21
AZURE ARCHITECTURE
[Diagram: Storage Area: DW Tenant 1, Tenant 2, Tenant 3 in a SQL Azure cluster, read over TCP/IP by the Exposure Area, which is either a multi-tenant Analysis Services instance with one in-memory model per tenant or a single-tenant Analysis Services instance with one in-memory model. Consumers Area: each tenant’s tools (Power BI, any client) connect via asazure:// over HTTPS with AAD authentication. Cross-cutting Area: access control, monitoring, Log Analytics.]
22
SOLUTION LIMITATIONS
• Expensive for huge data volumes ($5,920 for 100 GB).
• Supports authentication only for organizational accounts that are members of the default AAD tenant of the subscription where the Analysis Services instance resides.
• Not supported by AWS QuickSight.
• Power BI doesn’t allow creating or modifying relationships for a model imported from Analysis Services. All relationships, measures, calculations and other entities must be defined in the tabular model.
23
SUMMARY
Solution Pain Points
• Expensive: ~$3,000/month for 50 GB and 200 QPUs
Solution Benefits
• Can be queried directly from a BI tool or a custom application using DAX queries
• Supported by popular BI tools such as Power BI, Tableau, Qlik
• Supports integration with AAD
• Pure PaaS offering that can scale out if needed
• Can ingest data from multiple sources
• Powerful role-based security supported at every level: model, table, row
• Even large source databases (>100 GB) can fit well into Analysis Services for analytics scenarios in a cost-effective way by extracting only the needed information and aggregated data (materialized views)
Optimal strategy: use AAS for analytical purposes alongside an integration mechanism that provides a more cost-effective approach to raw data extraction
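Row-level security in a tabular model is expressed as per-role row filters, written in DAX in Analysis Services. Purely as a conceptual illustration, this Python sketch applies hypothetical role filters to in-memory rows and shows a row to a user if any of the user’s roles allows it. It is not the Analysis Services API, and the role names and the `Region` column are invented.

```python
from typing import Callable, Dict, Iterable, List, Mapping

Row = Mapping[str, object]
RowFilter = Callable[[Row], bool]

# Hypothetical roles; in a tabular model each would carry a DAX row filter instead.
ROLE_FILTERS: Dict[str, RowFilter] = {
    "region_eu": lambda row: row["Region"] == "EU",
    "region_us": lambda row: row["Region"] == "US",
    "admin": lambda row: True,
}


def visible_rows(rows: Iterable[Row], roles: Iterable[str]) -> List[Row]:
    """Return the rows a user may see: a row is visible if any of the user's roles allows it.

    Unknown roles grant nothing, so a user without a known role sees no rows.
    """
    filters = [ROLE_FILTERS[role] for role in roles if role in ROLE_FILTERS]
    return [row for row in rows if any(f(row) for f in filters)]
```

Denying by default when no role matches keeps a missing role assignment from exposing data, which suits the strong tenant isolation requirement.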
24
SUMMARY
25
SOLUTIONS COMPARISON
                                    OData API Solution   Data Abstraction Solution   Tabular Model Solution
Scenarios
  Integration scenario suitability  High                 Medium                      Medium
  Analytics scenario suitability    Low                  Low                         High
Quality attributes
  Authentication                    High                 Low                         High
  Authorization                     High                 Low                         High
  Sync with source                  High                 Medium                      Medium
  Maintainability                   Medium               Medium                      Medium
  Schema                            Medium               High                        High
  Infrastructure cost               Low                  Low                         High
  Scalability                       High                 High                        High
26
HYBRID SOLUTION
[Diagram: Storage Area: SQL Azure cluster with DW Tenant1 ... TenantN and the Master DB. Exposure Area: a multi-tenant OData API (autoscaled API App instances #1 to #N, Key Vault, Application Insights) serving OData clients; a multi-tenant Analysis Services instance with an in-memory model per tenant plus a dedicated Tenant N Analysis Services instance, reached via asazure:// from each tenant’s Power BI; and a pre-aggregated model loaded through Data Factory (ETL Area) for QuickSight over TCP/IP, HTTPS. Cross-cutting Area: Azure AD, access control, monitoring, Log Analytics.]
27
THANK YOU!
