#azuresatpn
Azure Saturday 2019
Time Series Analytics with Azure ADX (Azure Data Explorer)
Questions
What about TIME SERIES DATABASES?
When do I have to use one?
What are the possible choices on the market?
OpenTSDB? Kairos over Scylla/Cassandra? Influx?
Why do I have to learn yet another DB?!
Why not SQL? Why not Cosmos DB?
Agenda
1. Intro
2. Service
3. Trust
4. Basics
5. Techniques
6. Dive into Scalar Functions
7. Real Use Cases (in IIoT)
<INTRO/>
Multi-temperature data processing paths
Hot
• seconds freshness, days retention
• in-mem aggregated data
• pre-defined standing queries
• split-second query performance
• data viewing
• typical tech: in-mem cube, stream analytics, …
Warm
• minutes freshness, months retention
• raw data
• ad-hoc queries
• seconds-to-minutes query performance
• data exploration
• typical tech: column store, indexing, …
Cold
• hours freshness, years retention
• raw data
• programmatic batch processing
• minutes-to-hours query performance
• data manipulation
• typical tech: distributed file system, map-reduce, …
What is Azure Data Explorer
• Any append-only stream of records
• High volume, high velocity, high variance (structured, semi-structured, free-text)
• Relational query model: filter, aggregate, join, calculated columns, …
• Rapid iterations to explore the data
• Fully managed: PaaS, vanilla, database
• Purposely built
Fully managed big data analytics service
• Fully managed for efficiency: focus on insights, not the infrastructure, for fast time to value. No infrastructure to manage; provision the service, choose the SKU for your workload, and create a database.
• Optimized for streaming data: get near-instant insights from fast-flowing data. Scales linearly up to 200 MB per second per node, with highly performant, low-latency ingestion.
• Designed for data exploration: run ad-hoc queries using the intuitive query language; returns results over 1 billion records in under a second, without modifying the data or metadata.
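To make the "intuitive query language" claim concrete, here is a minimal ad-hoc KQL sketch against the public StormEvents sample table (the same table used in the demos later in this deck); the column names are those of the Samples database:
StormEvents
| where StartTime between (datetime(2007-06-01) .. datetime(2007-07-01))
| summarize EventCount = count() by State
| top 5 by EventCount
| render barchart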
<SERVICE/>
When is it useful?
1. Analyze telemetry data
2. Retrieve trends/series from clustered data
3. Run regressions over Big Data
4. Summarize and export ordered streams
LAB 01 – From IoT Hub to ADX
https://docs.microsoft.com/it-it/azure/data-explorer/ingest-data-iot-hub
Azure Data Explorer Architecture
Ingestion (STREAM and BATCH): Spark, ADF, apps via API, Logstash plugin, Kafka sink, IoT Hub, Event Hub, Event Grid
Core: Data Management service + Engine; hot data cached on SSD, ingested data persisted on Blob / ADLS
Consumption: ODBC, Power BI, ADX UI, MS Flow, Logic Apps, Notebooks, Grafana, Spark
How about the ADX Story?
From telemetry analytics for internal teams, to the analytics data platform for products (AI, OMS, ASC, Defender, IoT), to an interactive analytics Big Data platform.
• 2015-2016: starting with 1st-party validation; building modern analytics; vision of an analytics platform for MSFT
• 2017: unified platform across OMS/AI; bridged across client/server security
• 2019: analytics engine for 3rd-party offers; expanded scenarios for IoT time series; GA in February 2019
Available SKUs
Attribute | D SKU | L SKU
Small SKUs | Minimal size is D11, with two cores | Minimal size is L4, with four cores
Availability | Available in all regions (the DS+PS version has more limited availability) | Available in a few regions
Cost per GB cache per core | High with the D SKU, low with the DS+PS version | Lowest with the Pay-As-You-Go option
Reserved Instances (RI) pricing | High discount (over 55 percent for a three-year commitment) | Lower discount (20 percent for a three-year commitment)
• D v2: the D SKU is compute-optimized (optional Premium Storage disk)
• Ls: the L SKU is storage-optimized (greater SSD size than the equivalent D SKU)
D1-5 v2 instances are based on either the 2.4 GHz Intel Xeon E5-2673 v3 (Haswell) processor or the 2.3 GHz Intel Xeon E5-2673 v4 (Broadwell) processor, and can reach 3.1 GHz with Intel Turbo Boost Technology 2.0.
Ls-series instances are storage-optimised virtual machines for low-latency workloads such as NoSQL databases (e.g. Cassandra, MongoDB and Redis).
Azure Data Explorer SLA
• SLA: at least 99.9% availability (last updated: Feb 2019)
Maximum Available Minutes: the total number of minutes for a given Cluster deployed by Customer in a Microsoft Azure subscription during a billing month.
Downtime: the total number of minutes within Maximum Available Minutes during which a Cluster is unavailable.
Monthly Uptime Percentage: for Azure Data Explorer, calculated as Maximum Available Minutes less Downtime, divided by Maximum Available Minutes:
Monthly Uptime % = (Maximum Available Minutes - Downtime) / Maximum Available Minutes × 100
Maximum allowed downtime at 99.9%:
Daily: 01m 26.4s
Weekly: 10m 04.8s
Monthly: 43m 49.7s
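Where those figures come from: 0.1% of a day is 0.001 × 24 × 60 = 1.44 minutes ≈ 1m 26.4s; 0.1% of a week is 0.001 × 7 × 24 × 60 = 10.08 minutes ≈ 10m 04.8s; the monthly figure uses the average month length (365.25 / 12 ≈ 30.44 days), giving 0.001 × 30.44 × 24 × 60 ≈ 43.8 minutes ≈ 43m 49.7s.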
Pricing
• Based on VM size, storage and network
• Not based on the number of DATABASES
https://dataexplorer.azure.com/AzureDataExplorerCostEstimator.html
If you want to pay little money: think of ADX as an ANALYSIS TOOL in a multi-tenant environment.
If you can afford a space shuttle: think of ADX as an INGESTION+RESILIENCY TOOL that breaks through your traditional «Live DWH».
<TRUST/>
First questions about ADX
• Are we sure it is a mature service?
• What are the correct use cases where it is really useful?
• Which OSS alternatives should I compare it with?
Typical use cases
1. You need a Telemetry Analytics Platform, to retrieve aggregations or statistical calculations on historical series («As an IT Manager, I want a platform to load logs from various file types, so I can analyze them and graphically pinpoint a problem over time»)
2. You want to offer multi-tenant SaaS solutions («As a Product Lead Engineer, I want to manage the backend of my multi-tenant SaaS solution using a unique, fat, huge backend service»)
3. You need, within an Industrial IoT solution development, a common backend to handle process variables and run correlation analysis using continuous stream queries («As a Quality Manager, I need a prebuilt backend solution to dynamically configure time-based queries on data, to find correlations between process variables»)
Why ADX is Unique
Simplified costs
• VM costs
• ADX service add-on cost
Many prebuilt inputs
• ADF
• Spark
• Logstash
• Kafka
• IoT Hub
• Event Hub
Many prebuilt outputs
• TDS/SQL
• Power BI
• ODBC connector
• Spark
• Jupyter
• Grafana
Azure services with ADX usage
Azure Monitor
• Log Analytics
• Application Insights
Security Products
• Windows Defender
• Azure Security Center
• Azure Sentinel
IoT
• Time Series Insights
• Azure IoT Central
ADX vs Elasticsearch (from db-engines.com)
Azure Data Explorer: fully managed big data interactive analytics platform
Elasticsearch: a distributed, RESTful modern search and analytics engine
<BASICS/>
Create a Cluster
ADX follows the standard resource-creation process:
• Azure CLI
• PowerShell
• C#
• Python
• ARM
Login:
az login
Select subscription:
az account set --subscription MyAzureSub
Cluster creation:
az kusto cluster create --name azureclitest --sku D11_v2 --resource-group testrg
Database creation:
az kusto database create --cluster-name azureclitest --name clidatabase --resource-group testrg --soft-delete-period P365D --hot-cache-period P31D
HOT-CACHE-PERIOD: amount of time that data should be kept in cache. Duration in ISO 8601 format (for example, 100 days would be P100D).
SOFT-DELETE-PERIOD: amount of time that data should be kept so it is available to query. Duration in ISO 8601 format (for example, 100 days would be P100D).
How to set up and use ADX?
Create a database
Use the database to link ingestion sources
[Optional] Choose a DataConnection: EventHub | Blob Storage | IotHub
A sketch of the table and mapping that a data connection delivers into follows below.
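Before a data connection can deliver events, the target table and its ingestion mapping must exist. A minimal sketch in the spirit of the IoT Hub quickstart linked in LAB 01; the table matches the demo later in this deck, while the mapping name and JSON paths are illustrative assumptions:
.create table TestTable (TimeStamp: datetime, Name: string, Metric: int, Source: string)
.create table TestTable ingestion json mapping 'TestMapping'
'[{"column":"TimeStamp","path":"$.timestamp"},{"column":"Name","path":"$.deviceId"},{"column":"Metric","path":"$.metric"},{"column":"Source","path":"$.source"}]'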
How to script with Visual Studio Code
• Use the Log Analytics or Kusto/KQL extensions (.csl | .kusto | .kql)
• Open VS Code, create a file, save it, and then edit
• [Optional] Build a web application using the MONACO IDE, then share Kusto code with friends
https://microsoft.github.io/monaco-editor/index.html
How about the Tools?
1. LOAD
• LightIngest
• Azure Data Factory
2. QUERY
• Kusto.Explorer
• Web UI
3. VISUALIZE
• Azure Notebooks (preview)
• Power BI
• Grafana
4. ORCHESTRATE
• Microsoft Flow
• Microsoft Logic Apps
Load → Query → Visualize → Orchestrate, serving IT people, BI people and ML people.
What is LightIngest
• Command-line utility for ad-hoc data ingestion into Kusto
• Pulls source data from a local folder
• Pulls source data from an Azure Blob Storage container
• Useful to ingest quickly and play with ADX
[Ingest JSON data from blobs]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true"
-database:db001
-table:LAB
-sourcePath:"https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME?SAS_TOKEN"
-prefix:MyDir1/MySubDir2
-format:json
-mappingRef:DefaultJsonMapping
-pattern:*.json
-limit:100
[Ingest CSV data with headers from local files]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true"
-database:MyDb
-table:MyTable
-sourcePath:"D:\MyFolder\Data"
-format:csv
-ignoreFirstRecord:true
-mappingPath:"D:\MyFolder\CsvMapping.txt"
-pattern:*.csv.gz
-limit:100
LAB 0X – LightIngest
https://docs.microsoft.com/en-us/azure/kusto/tools/lightingest
<TECHNIQUES/>
Ingestion capabilities
Event Grid (using Blob as trigger): ingest Azure Blobs into Azure Data Explorer
Event Hub pipeline: ingest data from Event Hub into Azure Data Explorer
Logstash plugin: ingest data from Logstash to Azure Data Explorer
Kafka connector: ingest data from Kafka into Azure Data Explorer
Azure Data Factory (ADF): copy data from Azure Data Factory to Azure Data Explorer
Kusto offers client SDKs that can be used to ingest and query data with:
• Python SDK
• .NET SDK
• Java SDK
• Node SDK
• REST API
Not only Azure endpoints:
• As an ELK replacement, it offers a Logstash plugin
• As an OSS lambda-architecture replacement, it offers a Kafka connector
Ingestion Techniques
Batch ingestion (provided by SDKs), for high-volume, reliable, and cheap data ingestion: the client uploads the data to Azure Blob storage (designated by the Azure Data Explorer data management service) and posts a notification to an Azure Queue. Batch ingestion is the recommended technique.
LAB 02 – Queued Ingestion
https://docs.microsoft.com/en-us/azure/kusto/api/netfx/kusto-ingest-queued-ingest-sample
Inline ingestion (provided by query tools), most appropriate for exploration and prototyping:
• Inline ingestion: a control command (.ingest inline) containing in-band data, intended for ad-hoc testing purposes.
• Ingest from query: control commands (.set, .set-or-append, .set-or-replace) that point to query results, used for generating reports or small temporary tables.
• Ingest from storage: a control command (.ingest into) with data stored externally (for example, in Azure Blob Storage), which allows efficient bulk ingestion of data. Short sketches of the three forms follow below.
LAB 03 – Inline Ingestion
https://docs.microsoft.com/it-it/azure/kusto/management/data-ingestion/ingest-inline
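A minimal sketch of the three control-command forms; the table names and blob URI are illustrative assumptions:
.ingest inline into table MyTable <|
2019-06-15T10:00:00Z,Sensor1,42
.set-or-append MySummary <| MyTable | summarize Rows = count()
.ingest into table MyTable (h'https://ACCOUNT.blob.core.windows.net/container/data.csv?SAS_TOKEN') with (format='csv')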
Supported data formats
For all ingestion methods other than ingest-from-query, format the data so that Azure Data Explorer can parse it. The supported data formats are:
• CSV, TSV, TSVE, PSV, SCSV, SOH
• JSON (line-separated, multi-line), Avro
• ZIP and GZIP
Schema mapping helps bind source data fields to destination table columns.
• CSV mapping (optional) works with all ordinal-based formats. It can be passed as an ingest command parameter or pre-created on the table and referenced from the ingest command parameter (see the sketch below).
• JSON mapping (mandatory) and Avro mapping (mandatory) can be passed as ingest command parameters. They can also be pre-created on the table and referenced from the ingest command parameter.
LAB 04 – Mapping example
https://docs.microsoft.com/it-it/azure/kusto/management/data-ingestion/ingest-inline
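A hedged sketch of pre-creating a CSV mapping and referencing it at ingest time; the table, mapping name and blob URI are illustrative, and the property names follow the docs of this period:
.create table MyTable ingestion csv mapping 'CsvMapping1'
'[{"Name":"TimeStamp","Ordinal":0},{"Name":"Name","Ordinal":1},{"Name":"Metric","Ordinal":2}]'
.ingest into table MyTable (h'https://ACCOUNT.blob.core.windows.net/container/data.csv?SAS_TOKEN') with (format='csv', csvMappingReference='CsvMapping1')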
Use ADX as an ODBC data source
1. Download the SQL Server ODBC Driver 17: https://www.microsoft.com/en-us/download/details.aspx?id=56567
2. Configure the ODBC source (as a normal SQL SERVER ODBC DSN)
Then you can use your preferred tool: POWER BI DESKTOP, QLIK SENSE DESKTOP, SISENSE, etc.
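Because ADX exposes an MS-TDS endpoint, the DSN simply points at the cluster as if it were SQL Server. A hedged sketch of an equivalent DSN-less connection string; the cluster and database names are illustrative assumptions:
Driver={ODBC Driver 17 for SQL Server};Server=mycluster.westeurope.kusto.windows.net;Database=mydb;Authentication=ActiveDirectoryIntegrated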
Notebooks + ADX = KQL Magic
KQL magic: https://github.com/microsoft/jupyter-Kqlmagic
• extends the capabilities of the Python kernel in Jupyter
• can run Kusto-language queries natively
• combines Python and the Kusto query language
LAB 05 – Notebook example
https://notebooks.azure.com/riccardo-zamana/projects/azuresaturday2019
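A minimal usage sketch inside a notebook, assuming the public 'help' cluster and Samples database used in the Kqlmagic tutorial:
%reload_ext Kqlmagic
%kql AzureDataExplorer://code;cluster='help';database='Samples'
%kql StormEvents | summarize count() by State | sort by count_ | take 5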
<DIVE-INTO-ADX/>
Kusto for SQL Users
• Perform SQL SELECT (no DDL, only SELECT)
• Use KQL (Kusto Query Language)
• Supports translating T-SQL queries to Kusto query language with the explain keyword:
explain
select top(10) * from StormEvents
order by DamageProperty desc
becomes:
StormEvents
| sort by DamageProperty desc nulls first
| take 10
LAB 05 – SQL to KQL example
https://docs.microsoft.com/en-us/azure/kusto/query/sqlcheatsheet
ADX Functions
Functions are reusable queries or query parts. Kusto supports several kinds of functions:
• Stored functions: user-defined functions that are stored and managed as one kind of the database's schema entities. See Stored functions.
• Query-defined functions: user-defined functions that are defined and used within the scope of a single query. Such functions are defined through a let statement. See User-defined functions.
• Built-in functions: hard-coded functions, defined by Kusto and not modifiable by users.
LAB 06 – Function example
https://docs.microsoft.com/en-us/azure/kusto/query/functions/user-defined-functions
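A stored-function sketch, taken from the live-demo notes of this deck (the table TBL_LAB0X and its minimum_nights column come from the lab dataset):
.create-or-alter function with (folder = "AzureSaturday2019", docstring = "Func1", skipvalidation = "true")
MyFunction1(i:long) { TBL_LAB0X | limit 100 | where minimum_nights > i }
MyFunction1(80)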
Language examples
Alias:
database["wiki"] = cluster("https://somecluster.kusto.windows.net:443").database("somedatabase");
database("wiki").PageViews | count
Let:
let start = ago(5h);
let period = 2h;
T | where Time > start and Time < start + period | ...
Bin:
T | summarize Hits=count() by bin(Duration, 1s)
Batch:
let m = materialize(StormEvents | summarize n=count() by State);
m | where n > 2000; m | where n < 10
Tabular expression:
Logs
| where Timestamp > ago(1d)
| join ( Events | where continent == 'Europe' ) on RequestId
Time Series Analysis – bin operator
[Rule]
bin(value, roundTo)
Rounds values down to an integer multiple of a given bin size. If you have a scattered set of values, they will be grouped into a smaller set of specific values.
[Example]
T | summarize Hits=count() by bin(Duration, 1s)
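As a concrete use case on the sample data, bin is what turns raw timestamps into a time series; a sketch:
StormEvents
| summarize Events = count() by bin(StartTime, 7d)
| render timechart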
Time Series Analysis – make-series operator
[Rule]
T | make-series [MakeSeriesParameters] [Column =] Aggregation [default = DefaultValue] [, ...] on AxisColumn from start to end step step [by [Column =] GroupExpression [, ...]]
[Example]
T | make-series sum(amount) default=0, avg(price) default=0 on timestamp from datetime(2016-01-01) to datetime(2016-01-10) step 1d by supplier
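A runnable sketch on the sample table, producing one weekly series per state and charting them (the StormEvents sample covers 2007):
StormEvents
| make-series Events = count() default=0 on StartTime from datetime(2007-01-01) to datetime(2008-01-01) step 7d by State
| render timechart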
Time Series Analysis – basket operator
[Rule]
T | evaluate basket([Threshold, WeightColumn, MaxDimensions, CustomWildcard, CustomWildcard, ...])
Basket finds all frequent patterns of discrete attributes (dimensions) in the data, and returns all frequent patterns that passed the frequency threshold in the original query.
[Example]
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State, EventType, Damage, DamageCrops
| evaluate basket(0.2)
Time Series Analysis – autocluster operator
[Rule]
T | evaluate autocluster([SizeWeight, WeightColumn, NumSeeds, CustomWildcard, CustomWildcard, ...])
AutoCluster finds common patterns of discrete attributes (dimensions) in the data, and reduces the results of the original query (whether it's 100 or 100k rows) to a small number of patterns.
[Example]
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.6)
[Example with custom wildcards]
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.2, '~', '~', '*')
Export
To Storage:
.export async compressed to csv (
h@"https://storage1.blob.core.windows.net/containerName;secretKey",
h@"https://storage1.blob.core.windows.net/containerName2;secretKey"
) with ( sizeLimit=100000, namePrefix=export, includeHeaders=all, encoding=UTF8NoBOM )
<| myLogs | where id == "moshe" | limit 10000
To SQL:
.export async to sql ['dbo.MySqlTable']
h@"Server=tcp:myserver.database.windows.net,1433;Database=MyDatabase;Authentication=Active Directory Integrated;Connection Timeout=30;"
with (createifnotexists="true", primarykey="Id")
<| print Message = "Hello World!", Timestamp = now(), Id=12345678
How to proceed:
1. DEFINE COMMAND: define the ADX command and try out your recurrent export strategy
2. TRY IN EDITOR: use an editor to try the command, verifying connection strings and parametrizing them
3. BUILD A JOB: build a Notebook or a C# job that runs the command like a SQL query from your code
LAB 08 – Export example
https://docs.microsoft.com/en-us/azure/kusto/query/functions/user-defined-functions
External tables & Continuous Export
An external table is an external endpoint:
• Azure Storage
• Azure Data Lake Store
• SQL Server
You need to define:
• Destination
• Continuous-export strategy
EXT TABLE CREATION
.create external table ExternalAdlsGen2 (Timestamp:datetime, x:long, s:string)
kind=adl
partition by bin(Timestamp, 1d)
dataformat=csv
( h@'abfss://filesystem@storageaccount.dfs.core.windows.net/path;secretKey' )
with ( docstring = "Docs", folder = "ExternalTables", namePrefix="Prefix" )
EXPORT to EXT TABLE
.create-or-alter continuous-export MyExport over (T) to table ExternalAdlsGen2
with (intervalBetweenRuns=1h, forcedLatency=10m, sizeLimit=104857600)
<| T
Policy
• Cache policy
• Ingestion Batching policy
• IngestionTime policy
• Merge policy
• Retention policy
• Restricted view access policy
• Row order policy
• Streaming ingestion policy
• Sharding policy
• Update policy
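Policies are themselves managed with control commands; a minimal sketch for the cache policy, assuming a table named MyTable:
.show table MyTable policy caching
.alter table MyTable policy caching hot = 7d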
FACTS:
A) Kusto stores its ingested data in reliable storage (most commonly Azure Blob Storage).
B) To speed up queries on that data, Kusto caches this data (or parts of it) on its processing nodes.
Cache policy
The Kusto cache provides a granular cache policy that customers can use to differentiate between two data cache policies: hot data cache and cold data cache.
You can specify which location must be used:
set query_datascope="hotcache";
T | union U | join (T datascope=all | where Timestamp < ago(365d)) on X
Cache policy is independent of retention policy!
Retention policy
Two parameters, applicable to a database or a table:
• Soft Delete Period (number)
• Data is available for query (ts is the ADX IngestionDate)
• Default is set to 100 YEARS
• Recoverability (enabled/disabled)
• Default is set to ENABLED
• Recoverable for 14 days after deletion
Use KUSTO to set KUSTO:
.alter database DatabaseName policy retention "{}"
.alter table TableName policy retention "{}"
EXAMPLE:
{ "SoftDeletePeriod": "36500.00:00:00", "Recoverability": "Enabled" }
.delete database DatabaseName policy retention
.delete table TableName policy retention
.alter-merge table MyTable1 policy retention softdelete = 7d
Data Purge
The purge process is final and irreversible.
PURGE PROCESS:
1. It requires database admin permissions
2. Prior to purging, you have to be ENABLED by opening a SUPPORT TICKET
3. Run the purge QUERY to identify SIZE and EXECUTION TIME; it returns a VerificationToken
4. Run the purge QUERY for real, passing the VerificationToken
2-STEP PROCESS:
.purge table MyTable records in database MyDatabase <| where CustomerId in ('X', 'Y')
NumRecordsToPurge | EstimatedPurgeExecutionTime | VerificationToken
1,596 | 00:00:02 | e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b
.purge table MyTable records in database MyDatabase with
(verificationtoken='e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b') <| where CustomerId in ('X', 'Y')
1-STEP PROCESS (with no regrets!!!!):
.purge table MyTable records
in database MyDatabase
with (noregrets='true')
KUSTO: Do and Don't
• DO analytics over Big Data.
• DO support entities such as databases, tables, and columns.
• DO support complex analytics query operators (calculated columns, filtering, group by, joins).
• DO NOT perform in-place updates.
Virtual Network (preview)
BENEFITS
• Use NSG rules to limit traffic.
• Connect your on-premises network to the Azure Data Explorer cluster's subnet.
• Secure your data connection sources (Event Hub and Event Grid) with service endpoints.
VNET gives you TWO independent IPs:
• Private IP: access the cluster inside the VNet.
• Public IP: access the cluster from outside the VNet (management and monitoring) and as a source address for outbound connections initiated from the cluster.
ADLS & AzureDataExplorer
<REAL-USE-CASES/>
Event Correlation
Get sessions from start and stop events
Let's suppose we have a log of events, in which some events mark the start or end of an extended activity or session. Every event has a SessionId, so the problem is to match up the start and stop events with the same id.
Input:
Name | City | SessionId | Timestamp
Start | London | 2817330 | 2015-12-09T10:12:02.32
Game | London | 2817330 | 2015-12-09T10:12:52.45
Start | Manchester | 4267667 | 2015-12-09T10:14:02.23
Stop | London | 2817330 | 2015-12-09T10:23:43.18
Cancel | Manchester | 4267667 | 2015-12-09T10:27:26.29
Stop | Manchester | 4267667 | 2015-12-09T10:28:31.72
Output:
City | SessionId | StartTime | StopTime | Duration
London | 2817330 | 2015-12-09T10:12:02.32 | 2015-12-09T10:23:43.18 | 00:11:40.46
Manchester | 4267667 | 2015-12-09T10:14:02.23 | 2015-12-09T10:28:31.72 | 00:14:29.49
Kusto
let Events = MyLogTable
| where ... ;
Events
| where Name == "Start"
| project Name, City, SessionId, StartTime=timestamp
| join (Events | where Name == "Stop" | project StopTime=timestamp, SessionId)
on SessionId
| project City, SessionId, StartTime, StopTime, Duration = StopTime - StartTime
Use let to name a projection of the table that is pared down as far as possible before going into the join.
project is used to change the names of the timestamps so that both the start and stop times can appear in the result. It also selects the other columns we want to see in the result.
join matches up the start and stop entries for the same activity, creating a row for each activity. Finally, project again adds a column to show the duration of the activity.
In Place Enrichment
Creating and using query-time dimension tables
In many cases one wants to join the results of a query with some ad-hoc dimension table that is not stored in the
database. It is possible to define an expression whose result is a table scoped to a single query by doing something
like this:
Kusto
// Create a query-time dimension table using datatable
let DimTable = datatable(EventType:string, Code:string) [ "Heavy Rain", "HR", "Tornado", "T" ] ;
DimTable
| join StormEvents on EventType
| summarize count() by Code
<TOOLS/>
Azure Data Explorer
Easy to ingest the data and easy to query the data.
Ingestion APIs and protocols (streaming and bulk): .NET SDK, Python SDK, Java SDK, REST API; Event Hub, IoT Hub, Blob & Azure Queue (queued ingestion), Blob & Event Grid, direct ingestion
UX and integrations: Web UI, desktop app, Jupyter magic, Azure Notebooks, Monaco IDE (JavaScript), Power BI Direct Query, Microsoft Flow, Azure Logic App connectors, Grafana, ADF, MS-TDS
Tools
• Web GUI: https://dataexplorer.azure.com
• KUSTO Explorer: https://docs.microsoft.com/it-it/azure/kusto/tools/kusto-explorer
• Visual Studio Code KQL plugin: KusKus
Brief Summary
THANK YOU
<OFF.LINE.DEMO/>
KQL MAGIC
Let’s do an Example
Create table, load data … and play!
Create Table
.create table TestTable (TimeStamp: datetime, Name: string, Metric: int, Source:string)
Ingest sample data
.ingest into table StormEvents
h'https://kustosamplefiles.blob.core.windows.net/samplefiles/StormEvents.csv?st=2018-08-31T22%3A02%3A25Z&se=2020-09-01T22%3A02%3A00Z&sp=r&sv=2018-03-28&sr=b&sig=LQIbomcKI8Ooz425hWtjeq6d61uEaq21UVX7YrM61N4%3D'
with (ignoreFirstRecord=true)
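Once the sample is loaded, a couple of starter queries to play with (StormEvents schema as in the quickstart):
StormEvents | count
StormEvents | summarize EventCount = count() by State | top 5 by EventCount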
Example
• .create table tbl001_AABA (Date: datetime, Open: int, High: int, Low: int, Close: int, Volume: int)
• .drop tables (tbl001_AABA) ifexists
Editor's Notes
• #2: 1. Today's goal. 2. The slides are deliberately self-explanatory!
• #3: Today's intent; what we will NOT cover; why the examples are Microsoft's.
• #24: IF YOU ARE AHEAD OF SCHEDULE, TRY IT LIVE!!!!
• #25: Try it in VS Code: Ctrl+P => kuskus. Then: cluster(adxclu001).database('db001').table('TBL_LAB01') | count
• #26: Show the NOTEBOOKS
• #35: Do a few trials; run a distinct to introduce SUMMARIZE
• #36: .create-or-alter function with (folder = "AzureSaturday2019", docstring = "Func1", skipvalidation = "true") MyFunction1(i:long) {TBL_LAB0X | limit 100 | where minimum_nights > i} MyFunction1(80); explain SELECT name, minimum_nights from TBL_LAB0X .create-or-alter function with (folder = "AzureSaturday2019", docstring = "Func1", skipvalidation = "true") MyFunction1(i:long) {TBL_LAB0X | project name, minimum_nights | limit 100 | where minimum_nights > i | render columnchart} MyFunction1(80);
• #37: T | summarize Hits=count() by bin(Duration, 1s)