Just-in-Time Analytics and the Need for Autonomous Database Administration with Wes Holler

Wes Holler, Chief Architect
Algebraix Data

Data Algebra
• A novel application of Set Theory to Data Processing
• Applicable to many data models including SQL

Just-in-Time Analytics needs… Autonomous Data Management

JIT Analytics and the Life of a Modern Analyst
• Statistics & ML
• Short RTT
• Big Data systems
• Model business questions
This shouldn’t require intimate knowledge of how underlying systems work.

Spark for JIT Analytics: The Good
• Unified API
• Schema-on-read and Heterogeneous Data Sources
• Declarative Languages/APIs and Catalyst
• Elastic Compute

Spark for JIT Analytics: The Bad
• Challenges for interactivity, efficiency, and scalability
• Cost of creating and maintaining “glue code”
• Data scientists and engineers are doing DBA work

Database Management Responsibilities
• Capacity planning
• Configuration
• Performance tuning
• A billion other things
#manual
We will focus on the performance and tuning aspects.

Improving and Maintaining Performance
• Indexes
• Materialize views
• Pre-aggregate data
• Lots of configuration

Performance Tuning Strategies in Spark
• Segment, cache, and checkpoint
• Configure cluster parameters
• spark.sql.shuffle.partitions
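
The setting spark.sql.shuffle.partitions controls how many partitions Spark SQL uses when it shuffles data for joins and aggregations (the default is 200). A minimal sketch of one rule of thumb for choosing it is shown here: size each shuffle partition near a fixed target, and never use fewer partitions than cores. The 128 MiB target, the helper name suggest_shuffle_partitions, and the example sizes are assumptions for illustration and do not come from the talk.

```python
import math

# Assumed target size per shuffle partition; a rule of thumb, not a Spark default.
TARGET_PARTITION_BYTES = 128 * 1024 * 1024


def suggest_shuffle_partitions(shuffle_bytes, total_cores,
                               target_bytes=TARGET_PARTITION_BYTES):
    """Suggest a value for spark.sql.shuffle.partitions (hypothetical helper).

    Picks enough partitions to keep each one near target_bytes, but never
    fewer than the number of cores, so no core sits idle during the shuffle.
    """
    if shuffle_bytes < 0 or total_cores < 1:
        raise ValueError("shuffle_bytes must be >= 0 and total_cores >= 1")
    by_size = math.ceil(shuffle_bytes / target_bytes)
    return max(by_size, total_cores)


# Example: 50 GiB of shuffle data on a 64-core cluster (illustrative numbers).
partitions = suggest_shuffle_partitions(50 * 1024**3, total_cores=64)
print(partitions)
# With a live SQLContext, the value could then be applied with:
#   sqlContext.setConf("spark.sql.shuffle.partitions", str(partitions))
```

A fixed heuristic like this still has to be revisited whenever the data volume or cluster size changes, which is the kind of manual upkeep the following slides argue against.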

What is the Problem with Manual Tuning?
• Varies with the data (skew and scale), queries, and hardware
• Often done through trial and error
• Problems are exacerbated in the JIT analytics case
• Shared resources
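
To make the point about skew concrete, here is a small sketch that hash-partitions two key sets of equal size and compares how evenly the rows spread. It runs without Spark: zlib.crc32 stands in for Spark's partitioning hash, and the key names, row counts, and partition count are all made up for illustration.

```python
import zlib


def partition_sizes(keys, num_partitions):
    """Count rows per partition under hash partitioning (crc32 as a stand-in hash)."""
    sizes = [0] * num_partitions
    for key in keys:
        sizes[zlib.crc32(key.encode("utf-8")) % num_partitions] += 1
    return sizes


def skew_ratio(sizes):
    """Largest partition divided by the mean partition size; 1.0 is perfectly even."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


# Uniform keys: 10,000 distinct customers, one row each.
uniform = [f"customer-{i}" for i in range(10_000)]
# Skewed keys: the same row count, but 90% of the rows share one key.
skewed = ["customer-0"] * 9_000 + [f"customer-{i}" for i in range(1, 1_001)]

uniform_ratio = skew_ratio(partition_sizes(uniform, 8))
skewed_ratio = skew_ratio(partition_sizes(skewed, 8))
print(f"uniform: {uniform_ratio:.2f}, skewed: {skewed_ratio:.2f}")
```

The same partition count that balances the uniform data leaves one partition holding most of the skewed data, so a setting tuned by trial and error on one dataset does not carry over to the next.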

What is the Problem with Manual Tuning?
It is hard and time-consuming.

A Motivating Example for Autonomous Data Management
• Data Algebra
• SQL-DA
• Entity Store
• Optimizer

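Q := filter(filter(A ∇ B))   [filter subscripts not recoverable from the slide]
A ∇ B = { {0 ↦ α, …, 3 ↦ 42.0} : 3, {0 ↦ β} : 1, … }

Columns of A ∇ B:
index    0     …    3      4
name     a     …    b      b2
type     int   …    float  int
source   A     A    B      B

[Diagram labels: bar, baz]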
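
One way to read the notation above is as a multiset of records: each record maps column indexes to values, and the number after the colon is how many times that record occurs. The sketch below models that reading with collections.Counter. It is an interpretation for illustration, not the Algebraix implementation; the helper names and the predicate are hypothetical, and only the symbols α, β, 42.0 and the multiplicities 3 and 1 come from the slide.

```python
from collections import Counter


def record(mapping):
    """Freeze a {column: value} mapping so it can be used as a Counter key."""
    return frozenset(mapping.items())


def filter_records(relation, predicate):
    """Keep the records that satisfy predicate, preserving their multiplicities."""
    return Counter({rec: n for rec, n in relation.items() if predicate(dict(rec))})


# Two records echoing the slide: one with multiplicity 3, one with multiplicity 1.
# The elided columns between 0 and 3 are simply left out here.
a_join_b = Counter({
    record({0: "α", 3: 42.0}): 3,
    record({0: "β"}): 1,
})

# A hypothetical predicate: column 3 is present and greater than 40.
result = filter_records(a_join_b, lambda r: r.get(3, 0.0) > 40.0)
print(sum(result.values()))  # total rows kept, counting duplicates
```

Because records are plain values, two query expressions that produce the same multiset can be compared for equality, which is what makes reuse of earlier results possible.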

[Figures: a sequence of diagrams showing a query over the entities bar and baz moving through the pipeline. Stages shown: To SQL-DA → Analyze (against the Entity Store) → Optimize → Materialize View → To SQL.]

Complex Query Expressions are Turned Into Look-ups
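
A toy sketch of the idea, assuming query expressions are represented as nested tuples: once a sub-expression has been materialized, any later query containing it is answered by a dictionary look-up instead of recomputation. The ViewStore class, the tuple encoding, and the sample rows are hypothetical; only the table name bar and the column names a and b echo the slides.

```python
class ViewStore:
    """Toy store of materialized results keyed by query expression (hypothetical)."""

    def __init__(self, tables):
        self.tables = tables   # base data: table name -> list of row dicts
        self.views = {}        # expression -> materialized rows
        self.computed = 0      # number of expressions actually evaluated

    def evaluate(self, expr):
        """Evaluate an expression tree, answering from a view when one exists."""
        if expr in self.views:
            return self.views[expr]            # a look-up, no recomputation
        self.computed += 1
        op = expr[0]
        if op == "table":                      # ("table", name)
            return self.tables[expr[1]]
        if op == "filter":                     # ("filter", column, value, child)
            _, column, value, child = expr
            return [r for r in self.evaluate(child) if r[column] == value]
        raise ValueError(f"unknown operator: {op}")

    def materialize(self, expr):
        """Compute an expression once and keep the result as a view."""
        self.views[expr] = self.evaluate(expr)


store = ViewStore({"bar": [{"a": 1, "b": 2.0}, {"a": 1, "b": 3.0}, {"a": 2, "b": 2.0}]})
inner = ("filter", "a", 1, ("table", "bar"))
query = ("filter", "b", 2.0, inner)

first = store.evaluate(query)     # evaluates the query, inner, and the table scan
store.materialize(inner)          # keep the shared sub-expression as a view
before = store.computed
second = store.evaluate(query)    # only the outer filter is computed now
print(first == second, store.computed - before)
```

The sketch ignores view invalidation when the base data changes and does not decide which expressions are worth materializing; those decisions are the part an autonomous system would have to make.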

Benefits of Autonomous Data Management
• Reduce query time
• Reduce computation resources required
• Allow the analyst to focus on problem solving, not data management

Algebraix Inside:
An Implementation of ADM
The PySpark API (DataFrames and SQL) is shimmed.

Before:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
conf = SparkConf()
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
# Read one name per line; wrap each line in a tuple so it forms a one-column row
names = sc.textFile("people.txt").map(lambda line: (line,))
namesDF = sqlContext.createDataFrame(names, ["name"])
namesDF.registerTempTable("names")
sqlContext.sql("""
SELECT * FROM names
""").show()

After (only the import changes):
from aqaspark import *
conf = SparkConf()
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
# Read one name per line; wrap each line in a tuple so it forms a one-column row
names = sc.textFile("people.txt").map(lambda line: (line,))
namesDF = sqlContext.createDataFrame(names, ["name"])
namesDF.registerTempTable("names")
sqlContext.sql("""
SELECT * FROM names
""").show()
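
How a shim can sit behind an unchanged API is sketched below. This is not how aqaspark is implemented (the slides do not say); it is a generic wrapper pattern under stated assumptions. FakeSQLContext is a stand-in so the example runs without Spark, and ShimmedSQLContext is a hypothetical class that intercepts sql() while passing every other call through untouched.

```python
class FakeSQLContext:
    """Stand-in for pyspark's SQLContext so the sketch runs without Spark."""

    def __init__(self):
        self.executed = []   # every query that reached the underlying engine

    def sql(self, query):
        self.executed.append(query)
        return f"result of: {query}"

    def tableNames(self):
        return ["names"]


class ShimmedSQLContext:
    """Hypothetical shim: same interface, with a hook around sql()."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self._results = {}   # normalized query text -> earlier result

    def sql(self, query):
        # Crude normalization: collapse whitespace and ignore case.
        key = " ".join(query.split()).lower()
        if key not in self._results:
            self._results[key] = self._wrapped.sql(query)
        return self._results[key]

    def __getattr__(self, name):
        # Anything the shim does not override is passed straight through.
        return getattr(self._wrapped, name)


inner = FakeSQLContext()
sqlContext = ShimmedSQLContext(inner)
sqlContext.sql("SELECT * FROM names")
sqlContext.sql("select *   from names")   # same query after normalization
print(len(inner.executed), sqlContext.tableNames())
```

Because the wrapper keeps the method names of the object it wraps, user code only needs a different import, which matches the one-line change in the Before and After listings. Matching on query text is far weaker than matching on algebraic equivalence, and the cached results here are never invalidated.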

Wrap Up
Autonomous Data Management makes Spark great for SQL analytics.

Thank You.
@wes_holler
wholler@algebraixdata.com
www.algebraixdata.com
tstraub@algebraixdata.com
