Discovering Insights - Azure Data Explorer Unleashed

Discovering Insights: Azure Data Explorer Unleashed
Callon Campbell
Microsoft MVP | Azure
@flying_maverick

About me
Callon Campbell
Azure Architect | Developer
Adastra
Microsoft MVP | Azure (2018-2024)
• 25 years of enterprise development with Microsoft technologies: .NET (C#), Azure, ASP.NET, Desktop, SQL, and Mobile
• Passionate about serverless and cloud-native application development, with a focus on app migration and modernization, app integration, and data analytics
• Blogging at https://TheFlyingMaverick.com, and on @flying_maverick
• Speaker at community events and meetups
• Organizer of “Canada’s Technology Triangle .NET User Group” in Kitchener, Ontario

Agenda
• What is Azure Data Explorer
• Infrastructure
• Kusto Query Language
• Demos
• Q&A

Azure Data Explorer (ADX)
A big data analytics cloud platform optimized for interactive, ad-hoc queries
• Any append-only stream of records
• Relational query model: filter, aggregate, join, calculated columns, …
• Fully managed
• Rapid iterations to explore the data
• High volume, high velocity, high variance (structured, semi-structured, free-text)
• PaaS, Vanilla, Database
• Purpose-built

Azure Data Explorer Architecture
[Diagram labels]
• Data ingestion ecosystem: managed pipelines, SDKs, tools (One Click Ingestion, LightIngest), connection & plugins, data formats, batching, streaming
• Data consumption ecosystem: query tools, orchestration, notebook connectivity, connectors, TDS/JDBC/ODBC, API and client libraries, REST API, export, visualization, ADX Dashboard

Batching vs streaming ingestion
Batching
• Optimized for high ingestion throughput
• Preferred method and most performant
• Data is batched according to ingestion properties
• Set the ingestion batching policy on databases or tables
• Default maximum batching values are 5 minutes, 1000 items, or a total size of 1 GB
• 4 GB data size limit for a batch ingestion command
Streaming
• Ongoing data ingestion from a streaming source
• Near real-time latency for small sets of data per table
• Data is initially ingested to the row store, then moved to column store extents
• Streaming can be done using an ADX client library or one of the supported pipelines
NOTE: The recommendation is to ingest files between 100 MB and 1 GB.
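
The batching limits above can be pictured with a small simulation: a batch is sealed as soon as any one of the three limits (time, item count, total size) is reached. This is only an illustrative model, not how ADX implements batching. The `Batcher` class and its method and field names are hypothetical, the default limits are the ones quoted on this slide, and the time limit is checked only when an item arrives, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Batcher:
    """Toy model of an ingestion batching policy (hypothetical names)."""
    max_seconds: float = 5 * 60      # 5 minutes, as quoted on the slide
    max_items: int = 1000            # 1000 items, as quoted on the slide
    max_bytes: int = 1024 ** 3       # 1 GB total, as quoted on the slide
    _items: List[int] = field(default_factory=list, init=False)
    _bytes: int = field(default=0, init=False)
    _opened_at: Optional[float] = field(default=None, init=False)

    def add(self, size_bytes: int, now: float) -> Optional[List[int]]:
        """Add one item; return the sealed batch (item sizes) or None."""
        if self._opened_at is None:
            self._opened_at = now    # the first item opens a new batch
        self._items.append(size_bytes)
        self._bytes += size_bytes
        sealed = (
            len(self._items) >= self.max_items
            or self._bytes >= self.max_bytes
            or now - self._opened_at >= self.max_seconds
        )
        if not sealed:
            return None
        # Hand back the finished batch and reset the state for the next one.
        batch, self._items, self._bytes, self._opened_at = self._items, [], 0, None
        return batch


# Usage: with a limit of 3 items, the third add() seals the batch.
small = Batcher(max_items=3)
results = [small.add(100, now=t) for t in (0.0, 1.0, 2.0)]
print(results)
```

Lowering the limits reduces latency at the cost of more, smaller batches, which is the trade-off the batching policy lets you tune per database or table.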

Data Management and Ingestion
[Diagram labels]
• Ingestion sources: tools (One Click Ingestion, LightIngest), connection & plugins, managed pipelines, SDKs
• Data Explorer Data Management: batching, ingestion management, data connections
• Data Explorer Engine

Cross Queries
Between Data Explorer databases and clusters
Query SQL pool from Data Explorer
Query Azure Monitor from Data Explorer
Query Cosmos DB from Data Explorer
Query Data Lake from Data Explorer

Data Sharing
Share data within the company
• Load balancing
• Data-as-a-service
• Hub and Spoke model
• Chargeback
No Maintenance
• In-place data sharing
• No data pipeline to maintain
• Near real-time updates
[Diagram: a leader cluster (compute over Azure Blob Storage) ingests into and queries DB1, DB2, and DB3 with read/write access. A follower cluster (its own compute over Azure Blob Storage) has DB4 with read/write access and attaches DB3 through a symbolic link for read-only queries.]

Infrastructure
Cluster
  Database
    Table
    Table
    Table
  Database
    Table
    Table
    Function
    View

Using ADX Commands to Manage Tables
• .create table
• .create-merge table
• .drop table
• .alter column
• .rename column
What is a Kusto query?
• A Kusto query is a read-only request to process data and return results.
• It has one or more query statements and returns data in a tabular or graph format.
• Operators within a statement are sequenced by a pipe (|). Data flows, or is piped, from one operator to the next.
• The data is filtered or manipulated at each step and then fed into the following step. It's like a funnel, where you start out with an entire data table.
• Each time the data passes through
another operator, it's filtered,

KQL Concepts
• Relational operators (filters, union, joins, aggregations, …)
• Each operator consumes tabular input and produces tabular output
• Operators can be combined with '|' (pipe)
• Easy to write, read, and change
KQL: Basic operators for data exploration
… | count
• Counts the records in the input table (e.g. T)
… | take 10
• Gets a few records, a convenient way to get familiar with the data
• No particular order is guaranteed
… | where Timestamp > ago(1d) and UserId == 'abdcdef'
• Filters on specific fields
… | project Col1, Col2, …
• Chooses some columns (great if the input table has dozens of columns)
… | extend NewCol1=Col1+Col2
• Introduces new calculated columns
… | render timechart
• Plots the data while exploring (in Kusto.Explorer, KE, and Kusto Web Explorer, KWE)
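
As a sketch of how these operators chain into one query, the snippet below assembles the KQL text from a source table and a list of operator clauses. The `build_query` helper, the table name `Events`, the column names, and the one-day time window are assumptions for illustration. The snippet only builds the query string and does not run it.

```python
def build_query(table, *operators):
    """Join a source table and operator clauses into one piped KQL query string."""
    return "\n| ".join([table, *operators])


query = build_query(
    "Events",  # hypothetical table name
    "where Timestamp > ago(1d) and UserId == 'abdcdef'",  # filter early
    "extend NewCol1 = Col1 + Col2",                       # add a calculated column
    "project Timestamp, UserId, NewCol1",                 # keep only needed columns
    "take 10",                                            # sample a few rows
)
print(query)
```

Swapping the final clause for `count` or `render timechart` gives the other exploration patterns listed above.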

SQL to KQL
• Prefix a SQL query with the EXPLAIN keyword to see its KQL equivalent
• Use the SQL to KQL cheat sheet

Query Optimization Tips
• Use materialized views
• Use time filters first
• Avoid filtering on calculated columns
• Use case-sensitive operators when possible
tinyurl.com/ADXQueryBestPractices

Schema
Schema is relational, lightweight, dynamic
• Databases
  • Authorization boundary
  • Transaction boundary
  • But not a query boundary: cross-database and cross-cluster queries are supported
• Tables
  • Rectangular
• Columns
  • Supported types: boolean, integer, real, decimal, dates, timespan, string, dynamic (JSON)
• Stored functions (views)
• Materialized views

Stored Functions
• Essence: Reusable function, defined and used in a database scope
• A view is a parameter-less function
• Schema: scalar or tabular
• Special powers: Can override a table with the same name.
• Safe and secure: Control commands are forbidden.
• Applications:
  • Sharing queries between users/applications
  • Abstracting complex logic from other applications
  • SQL-compatible tools connecting to Kusto via views to run high-performance queries
Update Policy (inner ETL/Trigger)
• Essence: Triggered ingestion into another table.
• Semantics: Attached to the ‘target’ table and points to the ‘source’ table. The transformation is an arbitrary Kusto query.
• Special powers: The source table is scoped to the newly ingested data only.
• Applications:
  • Transform data schema (lightweight ETL)
  • De-multiplex a data stream into several tables
  • Use target tables with longer retention
  • Reduce duplication
Did you know
• Azure Data Explorer (ADX), formerly known by its internal code name “Kusto”, has an interesting origin story.
• Back in 2014, the team needed a name that captured the essence of their mission: exploring vast oceans of data. Inspired by the legendary oceanographer Jacques Cousteau, they chose the internal code name "Kusto". Just like Cousteau explored the depths of the ocean, the Kusto project aimed to tackle the challenges of fast and scalable log and telemetry analytics.

Follow up
https://aka.ms/adx.docs
https://aka.ms/kustofree
https://aka.ms/adx.try
https://aka.ms/adx.lab
https://aka.ms/adx.cost
https://aka.ms/adx.architectures

Let’s connect
https://LinkedIn.com/in/CallonCampbell
@Flying_Maverick
Callon@CloudMavericks.ca
https://GitHub.com/CallonCampbell
