premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 1
Azure Data
Who are we?
I'm a data and cloud architect and a Spark lover.
I worked for many years as an Oracle consultant and
expert, and I now work with cloud solutions dedicated to
solving complex problems with high volumes of data.
I am an independent Data Analyst & Solution Architect,
☁️ MCSE, and a Cosmos DB & Delta lover.
I developed my skills through various client projects,
teaching at university, and personal proofs of
concept.
I'm also the co-founder of DataRedkite, a product that
quickly gives its users good management of their data
in Microsoft Azure Data Lake.
Laurent Leturgez Alexandre Bergere
Meetup Azure Lille
Summary
Storage: Relational Databases, NoSQL Databases, Big Data Storage
Compute: Data, Big Data, Streaming
Storage
Relational Databases
• Azure SQL Database: managed relational SQL database as a service
• Azure Database for MariaDB: managed MariaDB database service for app developers
• Azure Database for MySQL: managed MySQL database service for app developers
• Azure Database for PostgreSQL: managed PostgreSQL database service for app developers
Azure SQL Database
27/04/2021 Meetup Azure Lille 8
• Azure SQL
• SQL Server PaaS service
• Managed upgrades, patches, backups and monitoring
• Latest stable version of SQL Server
• 99.99% availability SLA
• Deployment models
• Single database: the database runs on dedicated (non-shared) resources
• Elastic pool: the database runs alongside a collection of databases that share a set of resources at a predictable price
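To make the 99.99% figure concrete, the sketch below converts an availability percentage into the maximum downtime it allows over a period. It is plain arithmetic under two stated assumptions (a 30-day month and a 365-day year); how downtime is actually measured and compensated is defined by the Azure SLA documents, not by this snippet.

```python
from datetime import timedelta

# Assumed reference periods: a 30-day month and a 365-day year.
MONTH = timedelta(days=30)
YEAR = timedelta(days=365)

def allowed_downtime(availability_pct: float, period: timedelta) -> timedelta:
    """Maximum downtime compatible with an availability target over a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period * (1 - availability_pct / 100)

if __name__ == "__main__":
    for pct in (99.9, 99.95, 99.99):
        per_month = allowed_downtime(pct, MONTH).total_seconds() / 60
        per_year = allowed_downtime(pct, YEAR).total_seconds() / 60
        print(f"{pct}%: {per_month:.2f} min per month, {per_year:.2f} min per year")
```

At 99.99%, the downtime budget is about 4.32 minutes per 30-day month, or about 52.56 minutes per year.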
Azure SQL Database
• Azure SQL
• Purchasing models
• DTU (Database Transaction Unit): https://docs.microsoft.com/en-us/azure/azure-sql/database/service-tiers-dtu
• Basic tier
• Standard tier
• Premium tier
• vCore model
• Serverless
• Service tiers
• General Purpose (vCore) / Standard (DTU): common workloads
• Business Critical (vCore) / Premium (DTU): high transaction rates and availability, low-latency I/O
• Hyperscale (vCore):
• Databases up to 100 TB
• Rapid scale-up (compute resources)
• Rapid scale-out (read-only nodes for read workloads / hot standby)
Azure SQL Database
• Azure SQL Managed Instance
• Features
• PaaS platform for lift and shift at scale
• Broadest SQL Server engine compatibility (network integration, features, etc.)
• Preserves all PaaS capabilities (patching, updates, backups, HA, etc.)
• vCore purchasing model only
• BYOL available
• SQL Virtual Machine
• SQL Server deployed on a VM (Linux and Windows)
• Choice of SQL Server version
• From 2008 R2
• Up to 2019
Azure SQL Database
• Managed Instance: instance-scoped model with high compatibility with SQL Server; best for modernisation at scale with low cost and effort (lift & shift)
• Single database: standalone managed database for predictable and stable workloads
• Elastic pool: shared-resources model (multitenant)
Azure Database for PostgreSQL
• PaaS service for PostgreSQL
• Runs on Windows
• Single Server
• v9.5 to 11
• Up to 64 vCores depending on SKU (https://docs.microsoft.com/en-us/azure/postgresql/concepts-pricing-tiers)
• Up to 2 for Basic SKU
• Up to 64 for General Purpose SKU
• Up to 32 for Memory Optimized SKU
• Wide range of PostgreSQL extensions available
• Automated backups (retention up to 35 days)
• Backup frequency and backup types depend on database size
• Geo-redundant backup option (General Purpose & Memory Optimized)
Azure Database for PostgreSQL
• PaaS service for PostgreSQL
• Hyperscale (Citus)
• High-performance and analytical workloads beyond 100 GB
• Hyperscale delivers
• Horizontal scaling across multiple machines (with sharding)
• Query parallelization across these servers
• High performance for analytics
• Based on server groups
• Design approach required for table distribution and performance
• Distributed tables (sharded on a distribution column)
• Reference tables (content concentrated into a single shard, replicated on every worker node)
• Local tables (ordinary unsharded tables, well suited to small tables not involved in joins)
• Automated backup through storage snapshots
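The difference between distributed and reference tables can be illustrated with a small in-memory simulation. In Citus itself these tables are declared with SQL functions such as create_distributed_table('events', 'device_id') and create_reference_table('devices'); the Python below is only a model of the placement logic, not Citus. It assumes 32 shards, four hypothetical workers and hypothetical table contents, and uses CRC32 as a stand-in for the hash function Citus really applies to the distribution column.

```python
import zlib
from collections import defaultdict

SHARD_COUNT = 32                                            # assumed shard count
WORKERS = ["worker-1", "worker-2", "worker-3", "worker-4"]  # hypothetical worker nodes

def shard_for(value) -> int:
    """Map a distribution-column value to a shard (CRC32 stands in for Citus' hash)."""
    return zlib.crc32(str(value).encode("utf-8")) % SHARD_COUNT

def worker_for(shard: int) -> str:
    """Spread shards over the workers round-robin."""
    return WORKERS[shard % len(WORKERS)]

def distribute(rows, column):
    """Distributed table: each row is stored on exactly one worker, chosen from its key."""
    placement = defaultdict(list)
    for row in rows:
        placement[worker_for(shard_for(row[column]))].append(row)
    return placement

def replicate(rows):
    """Reference table: one shard whose full content is copied to every worker."""
    return {worker: list(rows) for worker in WORKERS}

if __name__ == "__main__":
    events = [{"device_id": 1, "reading": 20.5}, {"device_id": 2, "reading": 21.0},
              {"device_id": 1, "reading": 19.8}, {"device_id": 3, "reading": 22.1}]
    devices = [{"device_id": 1, "site": "Lille"}, {"device_id": 2, "site": "Paris"},
               {"device_id": 3, "site": "Lyon"}]
    sites_on = replicate(devices)
    # Every worker holds the whole reference table, so it can join its own event
    # rows to their device without fetching data from another node.
    for worker, rows in sorted(distribute(events, "device_id").items()):
        sites = {d["device_id"]: d["site"] for d in sites_on[worker]}
        print(worker, [(r["device_id"], sites[r["device_id"]], r["reading"]) for r in rows])
```

Because rows with the same key always map to the same shard, two tables distributed on the same column are co-located and can also be joined locally; a local table, by contrast, would simply stay on the coordinator.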
Azure Database for PostgreSQL
• PaaS service for PostgreSQL
• Flexible Server (Preview)
• Automated patching
• Automatic backups
• Performance adjustment across three switchable compute tiers: Burstable, General Purpose, Memory Optimized
• Optional zone-redundant high availability
Azure Database for MariaDB
• PaaS service for MariaDB
• Runs on Windows
• Single Server
• v10.2 and 10.3
• Up to 64 vCores depending on SKU (https://docs.microsoft.com/en-us/azure/mariadb/concepts-pricing-tiers)
• Up to 2 for Basic SKU
• Up to 64 for General Purpose SKU
• Up to 32 for Memory Optimized SKU
• Automated backups (retention up to 35 days)
• Backup frequency and backup types depend on database size
• Geo-redundant backup option (General Purpose & Memory Optimized)
Azure Database for MySQL
• PaaS service for MySQL
• Runs on Windows
• Single Server
• v5.6, 5.7, and 8.0
• Up to 64 vCores depending on SKU (https://docs.microsoft.com/en-us/azure/mysql/concepts-pricing-tiers)
• Up to 2 for Basic SKU
• Up to 64 for General Purpose SKU
• Up to 32 for Memory Optimized SKU
• Automated backups (retention up to 35 days)
• Backup frequency and backup types depend on database size
• Geo-redundant backup option (General Purpose & Memory Optimized)
Azure Database for MySQL
• PaaS service for MySQL
• Flexible Server (Preview)
• v5.7
• Automated patching
• Automatic backups
• Performance adjustment across three switchable compute tiers: Burstable, General Purpose, Memory Optimized
• Network isolation
• Private access through VNet integration
• Public access
Azure Database for MySQL
• PaaS service for MySQL
• Flexible Server (Preview)
• Optional zone-redundant high availability
NoSQL Databases
• Azure Cosmos DB: globally distributed, multi-model database for any scale
Azure Cosmos DB
A globally distributed, massively scalable, multi-model database service
Azure Cosmos DB
o SQL API
o MongoDB API
o Cassandra API
o Gremlin API
o Table API
Azure Cosmos DB
Throughput (illustration from Cosmic Notes)
Big Data Storage
• Storage Account: REST-based object storage for unstructured data
• Azure Data Lake Storage: massively scalable, secure data lake functionality built on Azure Blob Storage
Storage Account
o Azure Blobs : A scalable object store for text and binary data
o Azure Files : Managed file shares for cloud or on-premises deployments
o Azure Queues : A messaging store for reliable messaging between application components
o Azure Tables : A NoSQL store for no-schema storage of structured data
Azure Storage accounts are the base storage type within Azure. Azure Storage offers a highly scalable object store for data
objects and file system services in the cloud. It can also provide a messaging store for reliable messaging, or act as a
NoSQL store.
Azure groups four of these data services together under the name Azure Storage: Azure Blobs, Azure Files, Azure Queues,
and Azure Tables, as listed above.
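Each of the four services is reached through its own endpoint under the account name. The sketch below derives those endpoints and checks the account name first. It assumes the public Azure cloud suffix core.windows.net and the usual naming rule for storage accounts (3 to 24 characters, lowercase letters and digits only); the account name in the example is hypothetical.

```python
import re

# Naming rule for storage account names: 3-24 characters, lowercase letters and digits.
ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")

# Service -> endpoint prefix, assuming the public Azure cloud suffix "core.windows.net".
SERVICES = {"blobs": "blob", "files": "file", "queues": "queue", "tables": "table"}

def service_endpoints(account: str, suffix: str = "core.windows.net") -> dict:
    """Return the default HTTPS endpoint of each Azure Storage service for an account."""
    if not ACCOUNT_NAME.fullmatch(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    return {service: f"https://{account}.{prefix}.{suffix}"
            for service, prefix in SERVICES.items()}

if __name__ == "__main__":
    for service, url in service_endpoints("lillemeetupdata").items():  # hypothetical account
        print(f"{service:7} {url}")
```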
Storage Account
Types of storage account
Storage account type | Services | Redundancy options
General-purpose V2 | Basic storage account type for blobs, files, queues, and tables. Recommended for most scenarios using Azure Storage. | LRS, GRS, RA-GRS, ZRS, GZRS, RA-GZRS
General-purpose V1 | Legacy account type for blobs, files, queues, and tables. Use general-purpose v2 accounts instead when possible. | LRS, GRS, RA-GRS
BlockBlobStorage | Storage accounts with premium performance characteristics for block blobs and append blobs. Recommended for scenarios with high transaction rates, scenarios that use smaller objects, or scenarios that require consistently low storage latency. | LRS, ZRS
FileStorage | Files-only storage accounts with premium performance characteristics. Recommended for enterprise or high-performance scale applications. | LRS, ZRS
BlobStorage | Legacy Blob-only storage accounts. Use general-purpose v2 accounts instead when possible. | LRS, GRS, RA-GRS
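The table can be turned into a lookup that rejects unsupported combinations before anything is deployed. The sketch encodes only the account types and redundancy options listed above, plus the number of copies each option keeps (three in the primary region for LRS and ZRS; three more in a secondary region for the geo-redundant options). It does not call any Azure API.

```python
# Redundancy options per account type, transcribed from the table above.
SUPPORTED_REDUNDANCY = {
    "General-purpose V2": {"LRS", "GRS", "RA-GRS", "ZRS", "GZRS", "RA-GZRS"},
    "General-purpose V1": {"LRS", "GRS", "RA-GRS"},
    "BlockBlobStorage": {"LRS", "ZRS"},
    "FileStorage": {"LRS", "ZRS"},
    "BlobStorage": {"LRS", "GRS", "RA-GRS"},
}

# Copies of the data kept by each option: 3 in the primary region, plus 3 in the
# secondary region for the geo-redundant (G*) options.
COPIES = {"LRS": 3, "ZRS": 3, "GRS": 6, "RA-GRS": 6, "GZRS": 6, "RA-GZRS": 6}

def check_redundancy(account_type: str, redundancy: str) -> int:
    """Validate an (account type, redundancy) pair and return the number of copies."""
    options = SUPPORTED_REDUNDANCY.get(account_type)
    if options is None:
        raise ValueError(f"unknown account type: {account_type}")
    if redundancy not in options:
        raise ValueError(f"{account_type} does not support {redundancy}; "
                         f"choose one of {sorted(options)}")
    return COPIES[redundancy]

def secondary_is_readable(redundancy: str) -> bool:
    """RA-* options add read access to the copy in the secondary region."""
    return redundancy.startswith("RA-")
```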
Replication Options
Replication Strategy
Azure Data Lake Storage
Azure Data Lake Storage is a Hadoop-compatible data repository that can store any size or type of data. This storage
service is available as Generation 1 (Gen1) or Generation 2 (Gen2).
Key features of Data Lake Storage:
o Unlimited scalability
o Hadoop compatibility
o Security support for both access control lists (ACLs) & RBAC (for Gen 2 only)
o POSIX compliance
o An optimized Azure Blob File System (ABFS) driver that's designed for big-data analytics
o Zone-redundant storage
o Geo-redundant storage
(Diagrams: Azure Data Lake Storage Gen1 and Azure Data Lake Storage Gen2)
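With Gen2, the ABFS driver addresses data through URIs of the form abfss://<file_system>@<account>.dfs.core.windows.net/<path> (abfs:// without TLS). The sketch below builds and parses such URIs with the standard library only. The file system, account and path names are hypothetical, and the dfs.core.windows.net suffix assumes the public Azure cloud.

```python
from typing import NamedTuple
from urllib.parse import urlsplit

class AbfsPath(NamedTuple):
    file_system: str   # container name
    account: str       # storage account name
    path: str          # path inside the file system (hierarchical namespace)

def to_abfss(p: AbfsPath, suffix: str = "dfs.core.windows.net") -> str:
    """Build an abfss:// URI (TLS-secured ABFS) for a Data Lake Storage Gen2 path."""
    return f"abfss://{p.file_system}@{p.account}.{suffix}/{p.path.lstrip('/')}"

def parse_abfs(uri: str) -> AbfsPath:
    """Split an abfs:// or abfss:// URI into file system, account and path."""
    parts = urlsplit(uri)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri}")
    file_system, _, host = parts.netloc.partition("@")
    if not file_system or not host:
        raise ValueError(f"missing file system or host in: {uri}")
    account = host.split(".", 1)[0]
    return AbfsPath(file_system, account, parts.path.lstrip("/"))
```

A URI built this way is what Spark-based services (Databricks, Synapse, HDInsight) accept as a path, for example in spark.read.parquet(...).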
Choose a storage solution on Azure
Product catalog data
• Classification: semi-structured, because the schema must be extended or modified for new products
• Operations: customers require a high number of read operations, with the ability to query on many fields within the database; the business requires a high number of write operations to track the constantly changing inventory
• Latency & throughput: high throughput and low latency
• Transactional support: required
• Recommended service: Azure Cosmos DB
Photos and videos
• Classification: unstructured
• Operations: only need to be retrieved by ID; customers require a high number of read operations with low latency; creates and updates are somewhat infrequent and can have higher latency than read operations
• Latency & throughput: retrievals by ID need to support low latency and high throughput; creates and updates can have higher latency than read operations
• Transactional support: not required
• Recommended service: Azure Blob Storage
Business data
• Classification: structured
• Operations: read-only, complex analytical queries across multiple databases
• Latency & throughput: some latency in the results is expected, given the complex nature of the queries
• Transactional support: required
• Recommended service: Azure SQL Database (alternatives: Azure Database for MariaDB, Azure Database for PostgreSQL, Azure Database for MySQL)
Compute
Data Compute
• Data Factory: managed data-integration solution
• Azure Functions: process events with serverless code
Azure Functions
Azure Functions is the serverless compute service from Microsoft. Functions are event-driven: each function defines a
trigger, the exact definition of the event source (for instance, the name of a storage queue).
Use cases:
• Build a web API: implement an endpoint for your web applications using the HTTP trigger
• Process file uploads: run code when a file is uploaded or changed in Blob Storage
• Build a serverless workflow: chain a series of functions together using Durable Functions
• Respond to database changes: run custom logic when a document is created or updated in Cosmos DB
• Run scheduled tasks: execute code at set times
• Create reliable message queue systems: process message queues using Queue Storage, Service Bus, or Event Hubs
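For scheduled tasks, the timer trigger takes an NCRONTAB expression with six fields: {second} {minute} {hour} {day} {month} {day-of-week}. For example, "0 */5 * * * *" fires every five minutes. The sketch below is not the Functions runtime. It is a simplified matcher, written to show how the six fields are read, and it assumes only *, */n, single numbers and comma lists (no ranges or day/month names).

```python
from datetime import datetime

# Lowest legal value of each field: second, minute, hour, day, month, day-of-week.
FIELD_MINIMUMS = (0, 0, 0, 1, 1, 0)

def _field_matches(field: str, value: int, minimum: int) -> bool:
    """Match one field; supports '*', '*/n', 'n' and comma lists (simplified subset)."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if (value - minimum) % int(part[2:]) == 0:   # every n-th value from the minimum
                return True
        elif int(part) == value:
            return True
    return False

def ncrontab_matches(expression: str, moment: datetime) -> bool:
    """True if `moment` satisfies '{second} {minute} {hour} {day} {month} {day-of-week}'."""
    fields = expression.split()
    if len(fields) != 6:
        raise ValueError("expected six NCRONTAB fields")
    day_of_week = (moment.weekday() + 1) % 7   # 0 = Sunday, as in cron
    values = (moment.second, moment.minute, moment.hour,
              moment.day, moment.month, day_of_week)
    return all(_field_matches(f, v, m) for f, v, m in zip(fields, values, FIELD_MINIMUMS))
```

In a Python function using the function.json model, the same expression goes into the schedule property of the timerTrigger binding; the runtime, not code like this, decides when the function fires.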
Azure Functions
Azure Functions hosting options
• Consumption plan: scale automatically and pay only for compute resources while your functions are running. On the Consumption plan, instances of the Functions host are dynamically added and removed based on the number of incoming events.
• Premium plan (EP1, EP2, EP3): while scaling automatically based on demand, use prewarmed workers to run applications with no delay after being idle, run on more powerful instances, and connect to VNets.
• App Service plan (e.g. B1 to B3, S1 to S3, P1v2 to P3v2): run Functions within an App Service plan at regular App Service plan rates. A good fit for long-running operations, and when more predictable scaling and costs are required.
Azure Durable Functions
Durable Functions is a library that brings workflow orchestration abstractions to Azure Functions. It introduces a number of idioms and tools
to define stateful, potentially long-running operations, and manages much of the mechanics of reliable communication and state management
behind the scenes.
(Figures: log of events in the course of orchestrator progression; three steps of a workflow executed in sequence.
Source: https://medium.com/hackernoon/making-sense-of-azure-durable-functions-645ecb3c1d58)
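The key idea behind the event log is replay: each time the orchestrator resumes, it is re-run from the start, and results of activities that already completed are read back from the history instead of being executed again. The sketch below imitates that mechanism with a Python generator and an in-memory history for three activities executed in sequence. It is a teaching model with a hypothetical SayHello activity, not the azure-functions-durable library, where a Python orchestrator similarly yields context.call_activity(...).

```python
from typing import Callable, Dict, Generator, List, Optional, Tuple

# Hypothetical activity: an ordinary function invoked by the orchestrator.
ACTIVITIES: Dict[str, Callable[[str], str]] = {
    "SayHello": lambda name: f"Hello {name}!",
}

def orchestrator() -> Generator[Tuple[str, str], str, List[str]]:
    """Three activity calls executed in sequence (function chaining)."""
    results = []
    for city in ("Tokyo", "Seattle", "London"):
        result = yield ("SayHello", city)   # request an activity, receive its result
        results.append(result)
    return results

def run_once(history: List[str], executed: List[str]) -> Optional[List[str]]:
    """Replay the orchestrator against the history and run at most one new activity.

    Returns the orchestration output once it completes, otherwise None.
    """
    gen = orchestrator()
    try:
        request = next(gen)
        for recorded in history:          # replay: feed back results already logged
            request = gen.send(recorded)
        name, arg = request               # first call without a result in the history
        result = ACTIVITIES[name](arg)    # execute the activity for real
        executed.append(f"{name}({arg})")
        history.append(result)            # checkpoint the result in the event log
        return None
    except StopIteration as done:         # the orchestrator returned: workflow complete
        return done.value

if __name__ == "__main__":
    history: List[str] = []
    executed: List[str] = []
    output = None
    while output is None:                 # each loop is one "wake-up" of the orchestrator
        output = run_once(history, executed)
    print(executed)                       # each activity ran exactly once despite replays
    print(output)
```

The orchestrator is replayed four times here, but each activity runs only once, which is why real orchestrator code must be deterministic.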
Azure Data Factory
• Serverless Data Integration service
• Data Pipeline : logical group of activities
• Data Flow : Data Transformation activity
• Data Copy : Data Transfer activity
• SSIS Integration
• Git integration
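In a pipeline definition, each activity lists the activities it depends on (dependsOn) and the outcome it waits for (dependencyConditions, e.g. Succeeded). The sketch below writes a small pipeline as a Python dict in that general shape and derives an execution order that respects every dependency, using graphlib from the standard library (Python 3.9+). Activity and pipeline names are hypothetical, typeProperties, datasets and linked services are omitted, so it illustrates the dependency model and is not a deployable definition.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical pipeline in the general shape of an ADF pipeline JSON definition
# (typeProperties, datasets and linked services omitted for brevity).
pipeline = {
    "name": "IngestSales",
    "properties": {
        "activities": [
            {"name": "CopyRawSales", "type": "Copy", "dependsOn": []},
            {"name": "CopyRawStores", "type": "Copy", "dependsOn": []},
            {
                "name": "TransformSales",
                "type": "ExecuteDataFlow",
                "dependsOn": [
                    {"activity": "CopyRawSales", "dependencyConditions": ["Succeeded"]},
                    {"activity": "CopyRawStores", "dependencyConditions": ["Succeeded"]},
                ],
            },
        ]
    },
}

def execution_order(definition: dict) -> list:
    """Return activity names in an order that respects every dependsOn edge."""
    graph = {
        activity["name"]: {dep["activity"] for dep in activity.get("dependsOn", [])}
        for activity in definition["properties"]["activities"]
    }
    return list(TopologicalSorter(graph).static_order())  # raises CycleError on cycles

if __name__ == "__main__":
    print(execution_order(pipeline))
```

A topological order is only one valid sequence: activities with no dependency between them, like the two copies here, can run in parallel.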
Azure Data Factory
• Serverless Data Integration service
• Job scheduling
• Automatically through internal Scheduler
• Manually
• SDK : .NET, Python
• REST API
• PowerShell
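Through the REST API, a manual run is a POST to the pipeline's createRun operation on the Azure Resource Manager endpoint, with pipeline parameters in the JSON body. The sketch only assembles the request with the standard library and sends nothing. The subscription, resource group, factory and pipeline names are hypothetical, and a real call also needs an Azure AD bearer token in the Authorization header.

```python
import json
from urllib.parse import quote, urlencode

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2018-06-01"   # Data Factory REST API version

def create_run_request(subscription_id, resource_group, factory, pipeline, parameters=None):
    """Build (method, url, body) for the Data Factory 'Pipelines - Create Run' operation."""
    path = (
        f"/subscriptions/{quote(subscription_id)}"
        f"/resourceGroups/{quote(resource_group)}"
        f"/providers/Microsoft.DataFactory/factories/{quote(factory)}"
        f"/pipelines/{quote(pipeline)}/createRun"
    )
    url = f"{ARM_ENDPOINT}{path}?{urlencode({'api-version': API_VERSION})}"
    body = json.dumps(parameters or {})   # pipeline parameters go in the JSON body
    return "POST", url, body
```

With the Python SDK (azure-mgmt-datafactory), the equivalent is the pipelines.create_run(...) method of a DataFactoryManagementClient, which returns the run ID.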
Azure Data Factory
• Serverless Data Integration service
• Integration runtime
• Compute infrastructure used by ADF to provide data integration
• Azure: serverless
• Self-hosted: on-premises or Azure virtual machine (Windows)
• SSIS
Integration runtime | Activities | Features
Azure | Data flow, data copy, activity dispatch (HDI, Databricks, SQL …) | Cloud-to-cloud data transfer/flows
Self-hosted | Data copy, activity dispatch (HDI, Databricks, SQL …) | On-premises or virtual machine deployment (Windows); on-premises <-> cloud data transfer/flows; when connectors are not available
SSIS | SSIS package execution | Private or public network
Big Data Compute
• Azure Databricks: fast, easy, and collaborative Apache Spark-based analytics platform
• Azure HDInsight: supports the latest open source projects from the Apache Hadoop and Spark ecosystems
• Azure Synapse Analytics: managed enterprise data warehouse and big data analytics service
Azure Databricks
Azure Databricks is a data analytics platform optimized for the Microsoft Azure cloud services platform. Azure Databricks
offers two environments for developing data-intensive applications:
o Azure Databricks Workspace: provides an interactive workspace that enables collaboration between data engineers,
data scientists, and machine learning engineers.
o Azure Databricks SQL Analytics: provides an easy-to-use platform for analysts who want to run SQL queries on their
data lake, create multiple visualization types to explore query results from different perspectives, and build and share
dashboards.
Azure HDInsight
• Managed Hadoop distribution for Azure
• Based on the Hortonworks Data Platform (HDP) Hadoop distribution, now part of Cloudera
• Comes in various cluster types and shapes (VM sizes and number of nodes)
• Hadoop: general purpose (HDFS, YARN, MapReduce, Hive, Pig, Sqoop, Oozie)
• Spark
• Kafka
• HBase
• Hive / LLAP (Interactive Query)
• Storm (stream processing)
• ML Services with R
Azure HDInsight
• At least one storage account is mandatory (for libraries and binaries)
• External metastores available for Ambari, Hive and Oozie
• HDInsight architecture (diagram)
Azure Synapse Analytics
(Diagram components: Stream Analytics, Data Factory, Data Lake, Modern Analytics, MPP data warehouse)
Azure Synapse Analytics
• MPP data warehouse
• Choice of language (T-SQL, Spark SQL, Python, Scala, .NET)
• Analytics ready (Analysis Services, Power BI)
• Data science and AI ready (Azure Machine Learning integration)
Azure Synapse Analytics
• Sample use case: pure business intelligence!
Azure Synapse Analytics
• Not intended for small databases (usually > 1 TB)
• Cost model
• Synapse provisioned
• T-SQL pool with DWUs (Data Warehouse Units)
• Storage (geo-redundant option)
• Synapse serverless
• Spark pools
• Synapse pipelines
Azure Synapse Analytics
• Architecture
• DMS (Data Movement Service)
• Used for Data Colocation
• Key point: Data Partitioning
and Data Distribution
Azure Synapse Analytics
• Hash-distributed table
• Replicated table
• Round-robin distributed table
• Example
• Dimension-to-fact table join
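A dedicated SQL pool spreads every table over 60 distributions, and a join is cheap only when matching rows are already local. The sketch simulates the three strategies for a dimension-to-fact join and counts the fact rows whose dimension row would have to be moved (the work DMS does). It is a model, not Synapse: CRC32 stands in for the internal hash function, and the table contents are hypothetical.

```python
import zlib
from collections import defaultdict

DISTRIBUTIONS = 60   # a dedicated SQL pool always spreads data over 60 distributions

def hash_distribute(rows, column):
    """Hash-distributed table: rows with the same key value share a distribution."""
    placement = defaultdict(list)
    for row in rows:
        placement[zlib.crc32(str(row[column]).encode("utf-8")) % DISTRIBUTIONS].append(row)
    return placement

def round_robin_distribute(rows):
    """Round-robin table: rows are dealt out evenly, whatever their values."""
    placement = defaultdict(list)
    for i, row in enumerate(rows):
        placement[i % DISTRIBUTIONS].append(row)
    return placement

def rows_needing_movement(fact, dim, key, replicated=False):
    """Count fact rows whose matching dimension row is not local to their distribution."""
    if replicated:
        return 0   # a replicated table has a full copy on every compute node
    local_keys = {d: {row[key] for row in rows} for d, rows in dim.items()}
    return sum(1 for d, rows in fact.items() for row in rows
               if row[key] not in local_keys.get(d, set()))

if __name__ == "__main__":
    products = [{"product_id": p, "name": f"product {p}"} for p in range(200)]
    sales = [{"product_id": i % 200, "amount": i} for i in range(5000)]
    fact = hash_distribute(sales, "product_id")
    print("dimension hash-distributed on the join key:",
          rows_needing_movement(fact, hash_distribute(products, "product_id"), "product_id"))
    print("dimension round-robin:",
          rows_needing_movement(fact, round_robin_distribute(products), "product_id"))
    print("dimension replicated:",
          rows_needing_movement(fact, {}, "product_id", replicated=True))
```

Hashing both tables on the join key, or replicating a small dimension, makes the join local; a round-robin dimension forces data movement.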
Big Data Streaming
• Azure Stream Analytics: real-time data stream processing from millions of IoT devices
• Azure IoT Hub: connect, monitor and manage billions of IoT assets
• Azure HDInsight & Kafka: real-time data streams with Kafka
• Spark Streaming with Databricks: use Spark Streaming with Databricks
Azure Stream Analytics
UDFs, UDAs, and custom deserializers:
o Azure Stream Analytics supports user-defined functions (UDFs) and user-defined aggregates (UDAs) in JavaScript for cloud jobs and C# for IoT
Edge jobs.
Example scenarios:
o Analyze real-time telemetry streams from IoT devices
o Web logs/clickstream analytics
o Geospatial analytics for fleet management and driverless vehicles
o Remote monitoring and predictive maintenance of high-value assets
o Real-time analytics on point-of-sale data for inventory control and anomaly detection
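Most of these scenarios start with aggregates over time windows. In the Stream Analytics query language this is written with a windowing function such as GROUP BY TumblingWindow(second, 10). The sketch below reproduces the semantics of a tumbling window (fixed-size, non-overlapping, each event in exactly one window) in plain Python over a hypothetical list of telemetry events; it says nothing about how the service executes queries.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

def tumbling_window_avg(events, size: timedelta):
    """Average `value` per device over fixed, non-overlapping windows.

    `events` are (timestamp, device_id, value) tuples with timezone-aware timestamps.
    Returns a sorted list of (window_start, window_end, device_id, count, average).
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    buckets = defaultdict(list)
    for ts, device, value in events:
        index = (ts - epoch) // size          # which window this event falls into
        start = epoch + index * size
        buckets[(start, device)].append(value)
    return sorted(
        (start, start + size, device, len(values), sum(values) / len(values))
        for (start, device), values in buckets.items()
    )
```

Stream Analytics timestamps each window result with the window end (System.Timestamp()), which corresponds to the second element of each tuple here.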
Azure IoT Hub
Azure IoT Hub:
o The cloud gateway that connects IoT devices to gather data and drive business insights and automation.
o Exposes a built-in endpoint compatible with Event Hubs, the big data streaming service of Azure, which is designed for
high-throughput data streaming scenarios where customers may send billions of requests per day.
o Bi-directional communication capabilities
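Per-device identity means each device authenticates with its own credentials. One option is a shared access signature (SAS) token signed with the device's symmetric key: an HMAC-SHA256 over the URL-encoded resource URI and an expiry timestamp. The sketch follows the token format described in the IoT Hub access-control documentation; the hub name, device ID and key are hypothetical, and in practice the Azure IoT device SDKs generate these tokens for you.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import quote_plus, urlencode

def expiry_in(seconds: int = 3600) -> int:
    """Unix timestamp `seconds` from now, used as the token expiry (se)."""
    return int(time.time()) + seconds

def generate_sas_token(resource_uri: str, key_b64: str, expiry: int,
                       policy_name: Optional[str] = None) -> str:
    """Build an IoT Hub SAS token: HMAC-SHA256 over '<url-encoded uri>\\n<expiry>'."""
    string_to_sign = f"{quote_plus(resource_uri)}\n{expiry}"
    signature = base64.b64encode(
        hmac.new(base64.b64decode(key_b64), string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    )
    token = {"sr": resource_uri, "sig": signature, "se": str(expiry)}
    if policy_name is not None:
        token["skn"] = policy_name   # only for hub-level shared access policies
    return "SharedAccessSignature " + urlencode(token)

if __name__ == "__main__":
    uri = "myhub.azure-devices.net/devices/device-01"          # hypothetical hub and device
    key = base64.b64encode(b"not-a-real-device-key").decode()   # hypothetical device key
    print(generate_sas_token(uri, key, expiry_in(3600)))
```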
IoT Hub or Event Hubs
IoT Hub was developed to address the unique requirements of connecting IoT devices to the Azure cloud, while Event Hubs
was designed for big data streaming. Microsoft recommends using Azure IoT Hub to connect IoT devices to Azure.
IoT capability | IoT Hub standard tier | IoT Hub basic tier | Event Hubs
Device-to-cloud messaging | Yes | Yes | Yes
Protocols: HTTPS, AMQP, AMQP over WebSockets | Yes | Yes | Yes
Protocols: MQTT, MQTT over WebSockets | Yes | Yes | No
Per-device identity | Yes | Yes | No
File upload from devices | Yes | Yes | No
Device Provisioning Service | Yes | Yes | No
Cloud-to-device messaging | Yes | No | No
Device twin and device management | Yes | No | No
Device streams (preview) | Yes | No | No
IoT Edge | Yes | No | No
Apache Kafka on HDInsight architecture
Azure Databricks
o Apache Spark Streaming is a scalable, fault-tolerant stream processing system that natively supports both batch and
streaming workloads.
o Spark Streaming is an extension of the core Spark API.
Data Tools
Azure Data Studio
Azure Data Studio is a cross-platform database tool that you can run on Windows, macOS, and Linux. You can use it to
connect to SQL Data Warehouse and Azure SQL Database.
Previously released under the preview name SQL Operations Studio, Azure Data Studio offers a modern editor experience
with IntelliSense, code snippets, source control integration, and an integrated terminal. It is engineered with the data
platform user in mind, with built-in charting of query result sets and customizable dashboards.
Storage Explorer
Begin by downloading and installing Storage Explorer. You can use Storage Explorer to do several operations against data
in your Azure Storage account and data lake:
o Upload files or folders from your local computer into Azure Storage.
o Download cloud-based data to your local computer.
o Copy or move files and folders around in the storage account.
o Delete data from the storage account.
Visual Studio Code
Visual Studio Code is a lightweight source code editor which runs on your desktop and is available for Windows, macOS
and Linux. It comes with built-in support for JavaScript, TypeScript and Node.js and has a rich ecosystem of extensions for
other languages (such as C++, C#, Java, Python, PHP, Go) and runtimes (such as .NET and Unity).
Data Migration Tools
Summary
Scenario | Some recommended solutions
Disaster recovery | Azure geo-redundant backups
Read scale | Use read-only replicas to load-balance read-only query workloads (preview)
ETL (OLTP to OLAP) | Azure Data Factory, SQL Server Integration Services, or Databricks
Migration from on-premises SQL Server to Azure SQL Database | Azure Database Migration Service
Keeping data up to date across several Azure SQL databases or SQL Server databases | Azure SQL Data Sync
Detecting compatibility issues that can impact database functionality in your new version of SQL Server or Azure SQL Database | Data Migration Assistant (DMA)
Resources
Azure charts
https://azurecharts.com/
Sources
A few sources on Microsoft Learn:
o Azure for the Data Engineer
o Store data in Azure
o Work with relational data in Azure
o Large Scale Data Processing with Azure Data Lake Storage Gen2
o Implement a Data Streaming Solution with Azure Streaming Analytics
o Implement a Data Warehouse with Azure SQL Data Warehouse
Fill in the form
https://forms.office.com/Pages/ResponsePage.aspx?id=M3s0akU8nUyLePs4Zpn6Tp_2uFsS8cJJsHCSwweCY5JUNVlMMllQNU4yRUVVWjFEOU5GVVc2SVU3Si4u
Your turn!
Next session: Azure Databricks (fast, easy, and collaborative Apache Spark-based analytics platform)
Thank you
Meetup Azure Lille
dataredkite.com
https://premiseo.com/

Empathic Computing: Creating Shared Understanding
Digital-Transformation-Roadmap-for-Companies.pptx
Modernizing your data center with Dell and AMD
Approach and Philosophy of On baking technology
Cloud computing and distributed systems.
Reach Out and Touch Someone: Haptics and Empathic Computing
Advanced methodologies resolving dimensionality complications for autism neur...
NewMind AI Weekly Chronicles - August'25 Week I
Machine learning based COVID-19 study performance prediction
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
20250228 LYD VKU AI Blended-Learning.pptx
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
CIFDAQ's Market Insight: SEC Turns Pro Crypto
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
A Presentation on Artificial Intelligence
Chapter 3 Spatial Domain Image Processing.pdf
cuic standard and advanced reporting.pdf
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Diabetes mellitus diagnosis method based random forest with bat algorithm
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...

20210427 azure lille_meetup_azure_data_stack

  • 2. dataredkite.com premiseo.com Who are we? 26/02/2021 Meetup Azure Lille
    Laurent Leturgez: I'm a data and cloud architect and a Spark lover. I worked for many years as an Oracle consultant and expert, and I now work with cloud solutions devoted to solving complex problems with high volumes of data.
    Alexandre Bergere: I am an independent data analyst and solution architect, ☁️ MCSE, and a Cosmos DB & Delta lover. I developed my skills through various client projects, teaching at university and personal proofs of concept. I'm also the co-founder of DataRedkite, a product that quickly gives its users good management of their data in Microsoft Azure Data Lake.
  • 3. Summary
    Storage: Relational Databases · NoSQL Databases · Big Data Storage
    Compute: Data · Big Data · Streaming
  • 5. Relational Databases
    • Azure SQL Database: managed relational SQL database as a service
    • Azure Database for MariaDB: managed MariaDB database service for app developers
    • Azure Database for MySQL: managed MySQL database service for app developers
    • Azure Database for PostgreSQL: managed PostgreSQL database service for app developers
  • 6. Relational Databases
    • Azure SQL Database: managed relational SQL database as a service
    • Azure Database for MySQL: managed MySQL database service for app developers
    • Azure Database for MariaDB: managed MariaDB database service for app developers
    • Azure Database for PostgreSQL: managed PostgreSQL database service for app developers
  • 7. Relational Databases
    • Azure SQL Database: managed relational SQL database as a service
  • 8. Azure SQL Database (27/04/2021, Meetup Azure Lille)
    • Azure SQL
      • SQL Server PaaS service
      • Managed upgrades, patches, backups and monitoring
      • Latest stable version of SQL Server
      • 99.99% availability
    • Deployment models
      • Single database: the database runs on non-shared resources
      • Elastic pool: the database runs within a collection of databases that share a set of resources at a predictable price
  • 9. Azure SQL Database
    • Azure SQL
    • Purchasing models
      • DTU (Database Transaction Unit): https://docs.microsoft.com/en-us/azure/azure-sql/database/service-tiers-dtu
        • Basic tier
        • Standard tier
        • Premium tier
      • vCore model
        • Serverless
    • Service tiers
      • General Purpose (vCore) / Standard (DTU): common workloads
      • Business Critical (vCore) / Premium (DTU): high transaction rates and availability, low-latency IO
      • Hyperscale (vCore):
        • Databases up to 100 TB
        • Rapid scale-up (compute resources)
        • Rapid scale-out (read-only nodes for read workloads / hot standby)
  • 10. Azure SQL Database
    • Azure SQL Managed Instance
      • Features
        • PaaS platform for lift-and-shift at scale
        • Broadest SQL Server engine compatibility (network integration, features, etc.)
        • Preserves all PaaS capabilities (patching, updates, backups, HA, etc.)
      • vCore purchasing model only
      • BYOL (bring your own license) available
    • SQL Server on a virtual machine
      • SQL Server deployed on a VM (Linux or Windows)
      • Choice of SQL Server version, from 2008 R2 up to 2019
  • 11. Azure SQL Database
    • Managed Instance: instance-scoped model with high compatibility with SQL Server; best for modernisation at scale with low cost and effort (lift & shift)
    • Single: standalone managed database for predictable and stable workloads
    • Elastic Pool: shared-resources model, suited to multitenant workloads
  • 12. Relational Databases
    • Azure SQL Database: managed relational SQL database as a service
    • Azure Database for MySQL: managed MySQL database service for app developers
    • Azure Database for MariaDB: managed MariaDB database service for app developers
    • Azure Database for PostgreSQL: managed PostgreSQL database service for app developers
  • 13. Azure Database for PostgreSQL
    • PaaS service for PostgreSQL
    • Runs on Windows
    • Single Server
      • PostgreSQL versions 9.5 to 11
      • Up to 64 vCores depending on SKU (https://docs.microsoft.com/en-us/azure/postgresql/concepts-pricing-tiers)
        • Up to 2 for the Basic SKU
        • Up to 64 for the General Purpose SKU
        • Up to 32 for the Memory Optimized SKU
      • Many PostgreSQL extensions available
      • Automated backups (retention up to 35 days)
        • Backup frequency and backup types depend on database size
        • Geo-redundant backup option (General Purpose and Memory Optimized)
  • 14. Azure Database for PostgreSQL
    • PaaS service for PostgreSQL
    • Hyperscale (Citus)
      • High-performance and analytical workloads beyond 100 GB
      • Hyperscale delivers:
        • Horizontal scaling across multiple machines (with sharding)
        • Query parallelization across these servers
        • High performance for analytics
      • Based on server groups
      • Requires a design approach for table distribution and performance:
        • Distributed tables (sharded on a distribution column)
        • Reference tables (content concentrated into a single shard, replicated on every worker node)
        • Local tables (ordinary unsharded tables, ideal for small tables not involved in joins)
      • Automated backups through storage snapshots
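Distributed, reference and local tables behave differently because of where their rows end up. The sketch below is a simplified model, not Citus code: it assumes the signed 32-bit hash space is cut into equal ranges (one per shard) and uses md5 as a stand-in for PostgreSQL's own hash function. The table and column names (`orders`, `customers`, `customer_id`) are hypothetical.

```python
import hashlib

# Simplified model of Citus-style hash sharding (illustration only).
# Assumptions: the signed 32-bit hash space is split into equal ranges, one per
# shard, and md5 stands in for PostgreSQL's internal hash function.
HASH_MIN = -(2**31)
HASH_SPACE = 2**32


def hash32(value: str) -> int:
    """Map a distribution-column value to a signed 32-bit integer."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big", signed=True)


def shard_for(value: str, shard_count: int) -> int:
    """Return the index of the shard whose hash range contains the value."""
    width = HASH_SPACE // shard_count
    index = (hash32(value) - HASH_MIN) // width
    return min(index, shard_count - 1)  # last shard absorbs any remainder


def distribute(rows, column, shard_count):
    """Distributed table: each row goes to exactly one shard."""
    shards = {i: [] for i in range(shard_count)}
    for row in rows:
        shards[shard_for(str(row[column]), shard_count)].append(row)
    return shards


def replicate(rows, workers):
    """Reference table: one full copy of the rows on every worker node."""
    return {worker: list(rows) for worker in workers}
```

Because `orders` and `customers` would both be distributed on `customer_id` with the same shard count, matching rows land in the same shard, so a join on that column can run locally on each worker (colocation). A small lookup table joined from everywhere is better declared as a reference table and copied to every worker, as `replicate` does.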
  • 15. Azure Database for PostgreSQL
    • PaaS service for PostgreSQL
    • Flexible Server (preview)
      • Automated patching
      • Automatic backups
      • Performance adjustment across three switchable compute tiers: Burstable, General Purpose, Memory Optimized
      • High availability: optional zone-redundant HA
  • 16. Relational Databases
    • Azure SQL Database: managed relational SQL database as a service
    • Azure Database for MariaDB: managed MariaDB database service for app developers
    • Azure Database for PostgreSQL: managed PostgreSQL database service for app developers
    • Azure Database for MySQL: managed MySQL database service for app developers
  • 17. Azure Database for MariaDB
    • PaaS service for MariaDB
    • Runs on Windows
    • Single Server
      • MariaDB versions 10.2 and 10.3
      • Up to 64 vCores depending on SKU (https://docs.microsoft.com/en-us/azure/mariadb/concepts-pricing-tiers)
        • Up to 2 for the Basic SKU
        • Up to 64 for the General Purpose SKU
        • Up to 32 for the Memory Optimized SKU
      • Automated backups (retention up to 35 days)
        • Backup frequency and backup types depend on database size
        • Geo-redundant backup option (General Purpose and Memory Optimized)
  • 18. Relational Databases
    • Azure SQL Database: managed relational SQL database as a service
    • Azure Database for MariaDB: managed MariaDB database service for app developers
    • Azure Database for PostgreSQL: managed PostgreSQL database service for app developers
    • Azure Database for MySQL: managed MySQL database service for app developers
  • 19. Azure Database for MySQL
    • PaaS service for MySQL
    • Runs on Windows
    • Single Server
      • MySQL versions 5.6, 5.7 and 8.0
      • Up to 64 vCores depending on SKU (https://docs.microsoft.com/en-us/azure/mysql/concepts-pricing-tiers)
        • Up to 2 for the Basic SKU
        • Up to 64 for the General Purpose SKU
        • Up to 32 for the Memory Optimized SKU
      • Automated backups (retention up to 35 days)
        • Backup frequency and backup types depend on database size
        • Geo-redundant backup option (General Purpose and Memory Optimized)
  • 20. Azure Database for MySQL
    • PaaS service for MySQL
    • Flexible Server (preview)
      • MySQL version 5.7
      • Automated patching
      • Automatic backups
      • Performance adjustment across three switchable compute tiers: Burstable, General Purpose, Memory Optimized
      • Network isolation
        • Private access through VNet integration
        • Public access
  • 21. Azure Database for MySQL
    • PaaS service for MySQL
    • Flexible Server (preview): high availability with optional zone-redundant HA
  • 22. NoSQL Databases
    • Azure Cosmos DB: globally distributed, multi-model database for any scale
  • 23. Azure Cosmos DB
    A globally distributed, massively scalable, multi-model database service, available through several APIs:
    o SQL API
    o MongoDB API
    o Cassandra API
    o Gremlin API
    o Table API
  • 25. Big Data Storage
    • Storage Account: REST-based object storage for unstructured data
    • Azure Data Lake Storage: massively scalable, secure data lake functionality built on Azure Blob Storage
  • 27. Storage Account
    Azure Storage accounts are the base storage type within Azure. Azure Storage offers a highly scalable object store for data objects and file system services in the cloud. It can also provide a messaging store for reliable messaging, or act as a NoSQL store. Azure groups four of these data services under the name Azure Storage:
    o Azure Blobs: a scalable object store for text and binary data
    o Azure Files: managed file shares for cloud or on-premises deployments
    o Azure Queues: a messaging store for reliable messaging between application components
    o Azure Tables: a NoSQL store for schemaless storage of structured data
    [Illustration: the elements of Azure Storage]
  • 28. Storage Account: types of storage account
    o General-purpose V2: basic storage account type for blobs, files, queues and tables; recommended for most scenarios using Azure Storage. Redundancy options: LRS, GRS, RA-GRS, ZRS, GZRS, RA-GZRS
    o General-purpose V1: legacy account type for blobs, files, queues and tables; use general-purpose v2 accounts instead when possible. Redundancy options: LRS, GRS, RA-GRS
    o BlockBlobStorage: storage accounts with premium performance characteristics for block blobs and append blobs; recommended for scenarios with high transaction rates, scenarios that use smaller objects, or scenarios that require consistently low storage latency. Redundancy options: LRS, ZRS
    o FileStorage: files-only storage accounts with premium performance characteristics; recommended for enterprise or high-performance scale applications. Redundancy options: LRS, ZRS
    o BlobStorage: legacy blob-only storage accounts; use general-purpose v2 accounts instead when possible. Redundancy options: LRS, GRS, RA-GRS
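To make the redundancy options per account type concrete, here is a small lookup that flags a redundancy option an account type does not offer, as well as legacy account types. It only encodes the slide's list as of early 2021 (offerings change over time), and it uses the slide's labels rather than the `kind` values the Azure APIs expect.

```python
# Account types and redundancy options as listed on the slide (early 2021).
# The names are the slide's labels, not necessarily the API "kind" values.
REDUNDANCY = {
    "General-purpose V2": {"LRS", "GRS", "RA-GRS", "ZRS", "GZRS", "RA-GZRS"},
    "General-purpose V1": {"LRS", "GRS", "RA-GRS"},
    "BlockBlobStorage": {"LRS", "ZRS"},
    "FileStorage": {"LRS", "ZRS"},
    "BlobStorage": {"LRS", "GRS", "RA-GRS"},
}
LEGACY = {"General-purpose V1", "BlobStorage"}


def review_account(kind: str, redundancy: str) -> list:
    """Return a list of problems with a proposed account type and redundancy option."""
    if kind not in REDUNDANCY:
        raise ValueError(f"unknown storage account type: {kind}")
    problems = []
    if redundancy not in REDUNDANCY[kind]:
        problems.append(f"{redundancy} is not available for {kind}")
    if kind in LEGACY:
        problems.append(f"{kind} is legacy; prefer General-purpose V2")
    return problems
```

An empty list means the combination is allowed and not legacy; for example, asking for GRS on a BlockBlobStorage account is reported as unavailable.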
  • 31. Big Data Storage
    • Storage Account: REST-based object storage for unstructured data
    • Azure Data Lake Storage: massively scalable, secure data lake functionality built on Azure Blob Storage
  • 32. Azure Data Lake Storage
    Azure Data Lake Storage is a Hadoop-compatible data repository that can store data of any size or type. It is available as Generation 1 (Gen1) or Generation 2 (Gen2). Key features of Data Lake Storage:
    o Unlimited scalability
    o Hadoop compatibility
    o Security support for both access control lists (ACLs) and RBAC (Gen2 only)
    o POSIX compliance
    o An optimized Azure Blob File System (ABFS) driver designed for big-data analytics
    o Zone-redundant storage
    o Geo-redundant storage
    [Diagram: Azure Data Lake Gen1 and Azure Data Lake Gen2]
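Gen2 paths are addressed by Hadoop and Spark through the ABFS driver with an `abfss://` URI, while the same account also exposes a Blob REST endpoint. The helpers below sketch both forms; the account, container and path names are hypothetical, and the name check (3 to 24 lowercase letters and digits) follows the storage account naming rule.

```python
import re

# Storage account names: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def _check_account(account: str) -> None:
    if not _ACCOUNT_NAME.fullmatch(account):
        raise ValueError(f"invalid storage account name: {account!r}")


def abfss_uri(account: str, filesystem: str, path: str = "") -> str:
    """URI used through the ABFS driver (Data Lake Storage Gen2, over TLS)."""
    _check_account(account)
    return f"abfss://{filesystem}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def blob_url(account: str, container: str, blob: str) -> str:
    """REST endpoint of the same object through the Blob service."""
    _check_account(account)
    return f"https://{account}.blob.core.windows.net/{container}/{blob.lstrip('/')}"
```

In Spark, a URI such as `abfss://raw@mylake.dfs.core.windows.net/sales/` is what you would typically pass to a reader; authentication to the account is configured separately.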
  • 33. Choose a storage solution on Azure
    o Product catalog data
      - Data classification: semi-structured, because of the need to extend or modify the schema for new products
      - Operations: customers require a high number of read operations, with the ability to query on many fields within the database; the business requires a high number of write operations to track the constantly changing inventory
      - Latency & throughput: high throughput and low latency
      - Transactional support: required
      - Recommended service: Azure Cosmos DB
    o Photos and videos
      - Data classification: unstructured
      - Operations: only need to be retrieved by ID; customers require a high number of read operations with low latency; creates and updates are somewhat infrequent and can have higher latency than read operations
      - Latency & throughput: retrievals by ID need to support low latency and high throughput; creates and updates can have higher latency than read operations
      - Transactional support: not required
      - Recommended service: Azure Blob storage
    o Business data
      - Data classification: structured
      - Operations: read-only, complex analytical queries across multiple databases
      - Latency & throughput: some latency in the results is expected, given the complex nature of the queries
      - Transactional support: required
      - Recommended services: Azure SQL Database, Azure Database for MariaDB, Azure Database for PostgreSQL, Azure Database for MySQL
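The three rows above can be read as a rough decision function. The sketch below is a hypothetical simplification of those rows only (real choices weigh many more factors, such as cost, consistency level and query language); the `Workload` fields are illustrative names, not an Azure API.

```python
from dataclasses import dataclass
from typing import List

RELATIONAL = [
    "Azure SQL Database",
    "Azure Database for MariaDB",
    "Azure Database for PostgreSQL",
    "Azure Database for MySQL",
]


@dataclass
class Workload:
    classification: str       # "structured", "semi-structured" or "unstructured"
    needs_transactions: bool
    lookup_by_id_only: bool    # True if items are only ever fetched by ID


def recommend(w: Workload) -> List[str]:
    """Very rough encoding of the three example rows (hypothetical helper)."""
    if w.classification == "unstructured" and w.lookup_by_id_only and not w.needs_transactions:
        return ["Azure Blob storage"]
    if w.classification == "semi-structured":
        return ["Azure Cosmos DB"]
    if w.classification == "structured" and w.needs_transactions:
        return list(RELATIONAL)
    return []  # not covered by the slide's examples: needs a closer look
```

Returning an empty list for anything outside the three examples is deliberate: the table illustrates typical cases rather than a complete decision tree.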
  • 35. Data Compute
    • Data Factory: managed data-integration solution
    • Azure Functions: process events with serverless code
  • 37. Azure Functions
    Azure Functions is Microsoft's serverless compute service. Functions are event-driven: each function defines a trigger, the exact definition of the event source (for instance, the name of a storage queue). Use cases:
    o Build a web API: implement an endpoint for your web applications using the HTTP trigger
    o Process file uploads: run code when a file is uploaded or changed in Blob storage
    o Build a serverless workflow: chain a series of functions together using Durable Functions
    o Respond to database changes: run custom logic when a document is created or updated in Cosmos DB
    o Run scheduled tasks: execute code at set times
    o Create reliable message queue systems: process message queues using Queue Storage, Service Bus or Event Hubs
  • 38. Azure Functions: hosting options
    o Consumption plan: scale automatically and pay for compute resources only while your functions are running. Instances of the Functions host are dynamically added and removed based on the number of incoming events.
    o Premium plan (EP1, EP2, EP3): scales automatically based on demand, uses prewarmed workers to run applications with no delay after being idle, runs on more powerful instances and can connect to VNets.
    o App Service plan (dedicated SKUs such as B1, B2, B3, S1, S2, S3, P1v2, P2v2, P3v2): run Functions within an App Service plan at regular App Service plan rates. A good fit for long-running operations, and when more predictable scaling and costs are required.
  • 39. Azure Functions: Azure Durable Functions
    Durable Functions is a library that brings workflow orchestration abstractions to Azure Functions. It introduces a number of idioms and tools to define stateful, potentially long-running operations, and handles much of the mechanics of reliable communication and state management behind the scenes.
    [Diagram: three steps of a workflow executed in sequence, with the log of events recorded as the orchestrator progresses]
    Source: https://medium.com/hackernoon/making-sense-of-azure-durable-functions-645ecb3c1d58
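The log of events is the key idea here: an orchestrator is replayed from its history, and completed steps return their recorded results instead of running again. The toy model below illustrates that replay with a Python generator chaining three hypothetical activities. It is not the Durable Functions SDK (whose Python orchestrators express the same chaining as `yield context.call_activity(...)`), and `chain`, `run_episode` and `run` are made-up names.

```python
from typing import Any, Dict, List

# Toy model of the replay idea behind Durable Functions orchestrators.
# Not the Durable Functions SDK: all names and structures here are hypothetical.


def chain(x):
    """Function chaining: three activities executed in sequence."""
    a = yield ("step1", x)
    b = yield ("step2", a)
    c = yield ("step3", b)
    return c


def run_episode(orchestrator, input_, history, activities):
    """Replay the orchestrator against the event log, then run at most one new activity.

    Returns (done, output). Completed activities are never re-executed: their
    results are fed back from `history` during replay.
    """
    gen = orchestrator(input_)
    try:
        call = next(gen)
        for event in history:
            # Orchestrators must be deterministic so that replay yields the same calls.
            if call != (event["name"], event["input"]):
                raise RuntimeError("non-deterministic orchestrator")
            call = gen.send(event["result"])
    except StopIteration as stop:
        return True, stop.value
    name, arg = call
    history.append({"name": name, "input": arg, "result": activities[name](arg)})
    return False, None


def run(orchestrator, input_, activities):
    """Keep waking the orchestrator until it completes; return (output, history)."""
    history: List[Dict[str, Any]] = []
    while True:
        done, output = run_episode(orchestrator, input_, history, activities)
        if done:
            return output, history
```

With activities `x + 1`, `x * 2` and `x - 3`, an input of 5 produces 9 after four episodes: each activity runs exactly once, even though the orchestrator code itself is executed four times.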
  • 40. Data Compute
    • Data Factory: managed data-integration solution
    • Azure Functions: process events with serverless code
  • 41. Azure Data Factory
    • Serverless data integration service
      • Data Pipeline: logical group of activities
      • Data Flow: data transformation activity
      • Data Copy: data transfer activity
      • SSIS integration
      • Git integration
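A pipeline is stored as a JSON document that lists its activities. The sketch below builds a one-activity pipeline (a Copy activity from a delimited-text dataset to an Azure SQL table) as a Python dict. The pipeline, activity and dataset names are hypothetical, the source and sink types depend on the datasets you actually use, and the datasets themselves would have to be defined separately.

```python
import json


def copy_pipeline(name: str, source_dataset: str, sink_dataset: str) -> dict:
    """Sketch of a pipeline definition with a single Copy activity (names are hypothetical)."""
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "CopyCsvToSql",
                    "type": "Copy",
                    "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
                    "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
                    "typeProperties": {
                        "source": {"type": "DelimitedTextSource"},
                        "sink": {"type": "AzureSqlSink"},
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    # Print the JSON definition, e.g. to commit it to the Git-integrated repository.
    print(json.dumps(copy_pipeline("IngestSales", "SalesCsv", "SalesTable"), indent=2))
```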
  • 42. Azure Data Factory
    • Serverless data integration service
    • Job scheduling
      • Automatically, through the internal scheduler
      • Manually, through:
        • SDKs: .NET, Python
        • REST API
        • PowerShell
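For the REST API option, a pipeline run is started with a POST on the pipeline's `createRun` action of the Azure Resource Manager endpoint. The sketch below only builds that request with the standard library; the subscription, resource group, factory and pipeline names are placeholders, and obtaining the Azure AD bearer token is assumed to happen elsewhere.

```python
import json
import urllib.request
from urllib.parse import quote

API_VERSION = "2018-06-01"


def create_run_request(subscription_id, resource_group, factory, pipeline, token, parameters=None):
    """Build (but do not send) the 'Pipelines - Create Run' REST call."""
    url = (
        "https://management.azure.com"
        f"/subscriptions/{quote(subscription_id)}"
        f"/resourceGroups/{quote(resource_group)}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{quote(factory)}"
        f"/pipelines/{quote(pipeline)}/createRun"
        f"?api-version={API_VERSION}"
    )
    # The request body carries the pipeline parameters (an empty object if none).
    body = json.dumps(parameters or {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
```

Sending it with `urllib.request.urlopen(req)` should return a JSON body containing the `runId` of the new pipeline run.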
  • 43. Azure Data Factory
    • Serverless data integration service
    • Integration runtime: the compute infrastructure used by ADF to provide data integration
      • Azure: serverless
      • Self-hosted: on-premises or on an Azure virtual machine (Windows)
      • SSIS
    Integration runtime features:
    o Azure
      - Data Flow, Data Copy, dispatch activities (HDInsight, Databricks, SQL…)
      - Cloud-to-cloud data transfers and flows
    o Self-hosted
      - Data Copy, dispatch activities (HDInsight, Databricks, SQL…)
      - Deployment on-premises or on a virtual machine (Windows)
      - On-premises <-> cloud data transfers and flows
      - When connectors are not available
    o SSIS
      - SSIS package execution
      - Private or public network
  • 44. Big Data Compute
    • Azure Databricks: fast, easy and collaborative Apache Spark-based analytics platform
    • Azure HDInsight: supports the latest open-source projects from the Apache Hadoop and Spark ecosystems
    • Azure Synapse Analytics: managed enterprise data warehouse and big data analytics service
  • 47. Azure Databricks
    Azure Databricks is a data analytics platform optimized for the Microsoft Azure cloud services platform. It offers two environments for developing data-intensive applications:
    o Azure Databricks Workspace: an interactive workspace that enables collaboration between data engineers, data scientists and machine learning engineers.
    o Azure Databricks SQL Analytics: an easy-to-use platform for analysts who want to run SQL queries on their data lake, create multiple visualization types to explore query results from different perspectives, and build and share dashboards.
  • 48. Big Data Compute
    • Azure Databricks: fast, easy and collaborative Apache Spark-based analytics platform
    • Azure HDInsight: supports the latest open-source projects from the Apache Hadoop and Spark ecosystems
    • Azure Synapse Analytics: managed enterprise data warehouse and big data analytics service
  • 49. Azure HDInsight
    • Managed Hadoop distribution for Azure
    • Based on the Hortonworks (now Cloudera) Hadoop distribution
    • Comes in various flavours and shapes (VM sizes and number of nodes):
      • Hadoop: general purpose (HDFS, YARN, MapReduce, Hive, Pig, Sqoop, Oozie)
      • Spark
      • Kafka
      • HBase
      • Hive / LLAP (Interactive Query)
      • Storm (stream processing)
      • ML Services with R
  • 50. Azure HDInsight
    • At least one storage account is mandatory (for libraries and binaries)
    • External metastores available for Ambari, Hive and Oozie
    • [Diagram: HDInsight architecture]
  • 51. Big Data Compute
    • Azure Databricks: fast, easy and collaborative Apache Spark-based analytics platform
    • Azure HDInsight: supports the latest open-source projects from the Apache Hadoop and Spark ecosystems
    • Azure Synapse Analytics: managed enterprise data warehouse and big data analytics service
  • 52. Azure Synapse Analytics
    [Diagram: Stream Analytics · Data Factory · Data Lake · Modern Analytics · MPP Data Warehouse]
  • 53. Azure Synapse Analytics
    • MPP data warehouse
    • Choice of language (T-SQL, Spark SQL, Python, Scala, .NET)
    • Analytics ready (Analysis Services, Power BI)
    • Data science and AI ready (Azure Machine Learning integration)
  • 54. Azure Synapse Analytics
    • Sample use case: pure business intelligence!
  • 55. Azure Synapse Analytics
    • Not suited to small databases (usually > 1 TB)
    • Cost model
      • Synapse provisioned
        • T-SQL pool billed in DWUs (Data Warehouse Units)
        • Storage (geo-redundant option)
      • Synapse serverless
      • Spark pools
      • Synapse pipelines
  • 56. Azure Synapse Analytics
    • Architecture
      • DMS (Data Movement Service): used for data colocation
      • Key point: data partitioning and data distribution
  • 57. Azure Synapse Analytics
    • Table distribution types
      • Hash-distributed table
      • Replicated table
      • Round-robin distributed table
    • Example: dimension-to-fact table join
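The dimension-to-fact example comes down to colocation: a join runs locally only when matching rows sit in the same distribution. The sketch below models a dedicated SQL pool's 60 distributions, with crc32 standing in for Synapse's internal hash function (an assumption), and counts the fact rows that would have to move before a join. Table and column names (`sales`, `products`, `product_id`) are hypothetical.

```python
import zlib
from collections import defaultdict

# Dedicated SQL pools spread every table over 60 distributions.
# crc32 is a stand-in for Synapse's internal hash function (assumption).
DISTRIBUTIONS = 60


def hash_distribute(rows, column):
    """Hash-distributed table: the distribution depends only on the column value."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[zlib.crc32(str(row[column]).encode("utf-8")) % DISTRIBUTIONS].append(row)
    return buckets


def round_robin_distribute(rows):
    """Round-robin table: rows are dealt out evenly, regardless of their content."""
    buckets = defaultdict(list)
    for i, row in enumerate(rows):
        buckets[i % DISTRIBUTIONS].append(row)
    return buckets


def rows_to_move(fact, dimension, key):
    """Count fact rows whose matching dimension row sits in another distribution.

    Each such row would need a data movement (DMS) step before a local join.
    A replicated dimension has a full copy everywhere, so it would need none.
    """
    home = {row[key]: d for d, rows in dimension.items() for row in rows}
    return sum(1 for d, rows in fact.items() for row in rows if home.get(row[key]) != d)
```

Hash-distributing both tables on `product_id` gives zero rows to move, while a round-robin fact table scatters rows independently of the key, so the engine must shuffle data (or broadcast the dimension) before joining. Replicating a small dimension table avoids that movement altogether.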
  • 58. Big Data Streaming
    • Azure Stream Analytics: real-time data stream processing from millions of IoT devices
    • Azure IoT Hub: connect, monitor and manage billions of IoT assets
    • Azure HDInsight & Kafka: real-time data streams with Kafka
    • Spark Streaming with Databricks: use Spark Streaming with Databricks
  • 61. Azure Stream Analytics
    UDFs, UDAs and custom deserializers:
    o Azure Stream Analytics supports user-defined functions (UDFs) and user-defined aggregates (UDAs) in JavaScript for cloud jobs, and in C# for IoT Edge jobs
    Example scenarios:
    o Analyze real-time telemetry streams from IoT devices
    o Web log / clickstream analytics
    o Geospatial analytics for fleet management and driverless vehicles
    o Remote monitoring and predictive maintenance of high-value assets
    o Real-time analytics on point-of-sale data for inventory control and anomaly detection
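A typical telemetry scenario aggregates readings over fixed, non-overlapping time windows; in the Stream Analytics query language this is written with `GROUP BY deviceId, TumblingWindow(second, 10)`. The Python sketch below reproduces the idea over an in-memory list to show how events are bucketed. It is not how Stream Analytics executes queries, and the event layout `(timestamp, device_id, value)` is an assumption.

```python
from collections import defaultdict


def tumbling_average(events, window_seconds):
    """Average value per device over fixed, non-overlapping (tumbling) windows.

    events: iterable of (timestamp_in_seconds, device_id, value).
    Returns {(window_end, device_id): average}. Each event belongs to exactly
    one window; here windows are half-open [start, end), which is an assumption:
    check Stream Analytics' own boundary semantics before relying on it.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (window_end, device) -> [sum, count]
    for ts, device, value in events:
        window_end = (int(ts // window_seconds) + 1) * window_seconds
        acc = totals[(window_end, device)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

Keying each result by the window end mirrors how windowed aggregates are usually stamped with the time the window closes.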
  • 62. Big Data Streaming
    • Azure Stream Analytics: real-time data stream processing from millions of IoT devices
    • Azure IoT Hub: connect, monitor and manage billions of IoT assets
    • Azure HDInsight & Kafka: real-time data streams with Kafka
    • Spark Streaming with Databricks: use Spark Streaming with Databricks
  • 63. Azure IoT Hub
    o The cloud gateway that connects IoT devices to gather data and drive business insights and automation.
    o Built for high-throughput device telemetry; Azure's general-purpose big data streaming service, designed for scenarios where customers may send billions of requests per day, is Event Hubs (compared on the next slide).
    o Bi-directional communication capabilities
  • 64. IoT Hub or Event Hubs
    IoT Hub was developed to address the unique requirements of connecting IoT devices to the Azure cloud, while Event Hubs was designed for big data streaming. Microsoft recommends using Azure IoT Hub to connect IoT devices to Azure.
    IoT capability: IoT Hub standard tier / IoT Hub basic tier / Event Hubs
    o Device-to-cloud messaging: ✓ / ✓ / ✓
    o Protocols HTTPS, AMQP, AMQP over WebSockets: ✓ / ✓ / ✓
    o Protocols MQTT, MQTT over WebSockets: ✓ / ✓ / ✗
    o Per-device identity: ✓ / ✓ / ✗
    o File upload from devices: ✓ / ✓ / ✗
    o Device Provisioning Service: ✓ / ✓ / ✗
    o Cloud-to-device messaging: ✓ / ✗ / ✗
    o Device twin and device management: ✓ / ✗ / ✗
    o Device streams (preview): ✓ / ✗ / ✗
    o IoT Edge: ✓ / ✗ / ✗
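The comparison can be turned into a quick coverage check. The sketch below encodes Microsoft's published feature comparison (as of early 2021) for the capabilities listed on this slide, and returns the first option, in a hypothetical "fewest IoT features first" order, that covers a set of requirements. The capability keys are illustrative names. It checks coverage only: for connecting devices, the slide's own recommendation (IoT Hub) still applies.

```python
# Feature support per option (IoT Hub standard / basic tier, Event Hubs), early 2021.
SUPPORT = {
    "device-to-cloud messaging": {"IoT Hub standard", "IoT Hub basic", "Event Hubs"},
    "HTTPS/AMQP": {"IoT Hub standard", "IoT Hub basic", "Event Hubs"},
    "MQTT": {"IoT Hub standard", "IoT Hub basic"},
    "per-device identity": {"IoT Hub standard", "IoT Hub basic"},
    "file upload": {"IoT Hub standard", "IoT Hub basic"},
    "device provisioning": {"IoT Hub standard", "IoT Hub basic"},
    "cloud-to-device messaging": {"IoT Hub standard"},
    "device twin": {"IoT Hub standard"},
    "device streams": {"IoT Hub standard"},
    "IoT Edge": {"IoT Hub standard"},
}
# Hypothetical ordering: try the option with the fewest IoT features first.
CANDIDATES = ["Event Hubs", "IoT Hub basic", "IoT Hub standard"]


def smallest_option(required):
    """Return the first candidate that supports every required capability."""
    unknown = set(required) - set(SUPPORT)
    if unknown:
        raise ValueError(f"unknown capabilities: {sorted(unknown)}")
    for option in CANDIDATES:
        if all(option in SUPPORT[cap] for cap in required):
            return option
    raise ValueError("no single option covers these requirements")
```

For example, MQTT plus per-device identity already rules out Event Hubs, and any cloud-to-device requirement points to the IoT Hub standard tier.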
  • 65. Big Data Streaming
    • Azure Stream Analytics: real-time data stream processing from millions of IoT devices
    • Azure IoT Hub: connect, monitor and manage billions of IoT assets
    • Azure HDInsight & Kafka: real-time data streams with Kafka
    • Spark Streaming with Databricks: use Spark Streaming with Databricks
  • 66. Apache Kafka on HDInsight architecture (diagram)
  • 67. Big Data Streaming
    • Azure Stream Analytics: real-time data stream processing from millions of IoT devices
    • Azure IoT Hub: connect, monitor and manage billions of IoT assets
    • Azure HDInsight & Kafka: real-time data streams with Kafka
    • Spark Streaming with Databricks: use Spark Streaming with Databricks
  • 68. Azure Databricks
    o Apache Spark Streaming is a scalable, fault-tolerant stream processing system that natively supports both batch and streaming workloads.
    o Spark Streaming is an extension of the core Spark API.
  • 70. Azure Data Studio
    Azure Data Studio is a cross-platform database tool that runs on Windows, macOS and Linux. You can use it to connect to SQL Data Warehouse and Azure SQL Database. Previously released under the preview name SQL Operations Studio, Azure Data Studio offers a modern editor experience with IntelliSense, code snippets, source control integration and an integrated terminal. It is designed with data platform users in mind, with built-in charting of query result sets and customizable dashboards.
  • 71. Storage Explorer
    Begin by downloading and installing Storage Explorer. You can use it to perform several operations against data in your Azure Storage account and data lake:
    o Upload files or folders from your local computer into Azure Storage.
    o Download cloud-based data to your local computer.
    o Copy or move files and folders within the storage account.
    o Delete data from the storage account.
  • 72. Visual Studio Code
    Visual Studio Code is a lightweight source code editor that runs on your desktop and is available for Windows, macOS and Linux. It comes with built-in support for JavaScript, TypeScript and Node.js, and has a rich ecosystem of extensions for other languages (such as C++, C#, Java, Python, PHP and Go) and runtimes (such as .NET and Unity).
  • 73. Data Migration Tools
  • 74. Summary: scenarios and some recommended solutions
    o Disaster recovery: Azure geo-redundant backups
    o Read scale: use read-only replicas to load-balance read-only query workloads (preview)
    o ETL (OLTP to OLAP): Azure Data Factory, SQL Server Integration Services or Databricks
    o Migration from on-premises SQL Server to Azure SQL Database: Azure Database Migration Service
    o Keeping several Azure SQL databases or SQL Server databases up to date: Azure SQL Data Sync
    o Detecting compatibility issues that can impact database functionality in your new version of SQL Server or Azure SQL Database: Data Migration Assistant (DMA)
  • 77. Sources
    Just a few sources from Microsoft Learn:
    o Azure for the Data Engineer
    o Store data in Azure
    o Work with relational data in Azure
    o Large Scale Data Processing with Azure Data Lake Storage Gen2
    o Implement a Data Streaming Solution with Azure Streaming Analytics
    o Implement a Data Warehouse with Azure SQL Data Warehouse
  • 79. Next session: Azure Databricks (fast, easy and collaborative Apache Spark-based analytics platform)
  • 80. Thank you! Meetup Azure Lille · dataredkite.com · https://premiseo.com/