SlideShare a Scribd company logo
9
Most read
15
Most read
17
Most read
Measures in SQL
Julian Hyde (Google)
John Fremlin (Google)
2024-06-11 17:30 Europa
Measures in SQL
ABSTRACT
SQL has attained widespread adoption, but Business Intelligence tools still use their
own higher level languages based upon a multidimensional paradigm. Composable
calculations are what is missing from SQL, and we propose a new kind of column,
called a measure, that attaches a calculation to a table. Like regular tables, tables
with measures are composable and closed when used in queries.
SQL-with-measures has the power, conciseness and reusability of multidimensional
languages but retains SQL semantics. Measure invocations can be expanded in place
to simple, clear SQL.
To define the evaluation semantics for measures, we introduce context-sensitive
expressions (a way to evaluate multidimensional expressions that is consistent with
existing SQL semantics), a concept called evaluation context, and several operations
for setting and modifying the evaluation context.
SIGMOD, June 9–15, 2024, Santiago, Chile
Julian Hyde
Google Inc.
San Francisco, CA, USA
julianhyde@google.com
John Fremlin
Google Inc.
New York, NY, USA
fremlin@google.com
1. Problem
Tables are broken!
Tables are unable to provide reusable calculations.
Problem: Calculate profit margin of orders
SELECT prodName,
(SUM(revenue) - SUM(cost))
/ SUM(revenue) AS profitMargin
FROM Orders
WHERE prodName = ‘Happy’;
profitMargin
============
0.47
prodName custName orderDate revenue cost
Happy Alice 2023/11/28 6 4
Acme Bob 2023/11/27 5 2
Happy Alice 2024/11/28 7 4
Whizz Celia 2023/11/25 3 1
Happy Bob 2022/11/27 4 1
SELECT prodName,
(SUM(revenue) - SUM(cost))
/ SUM(revenue) AS profitMargin
FROM Orders
WHERE prodName = ‘Happy’;
profitMargin
============
0.47
Attempted solution: Create a view
SELECT AVG(profitMargin) AS profitMargin
FROM SummarizedOrders
WHERE prodName = ‘Happy’;
profitMargin
============
0.50
CREATE VIEW SummarizedOrders AS
SELECT prodName, orderDate,
(SUM(revenue) - SUM(cost))
/ SUM(revenue) AS profitMargin
FROM Orders
GROUP BY prodName, orderDate;
prodName custName orderDate revenue cost
Happy Alice 2023/11/28 6 4
Acme Bob 2023/11/27 5 2
Happy Alice 2024/11/28 7 4
Whizz Celia 2023/11/25 3 1
Happy Bob 2022/11/27 4 1
SELECT prodName,
(SUM(revenue) - SUM(cost))
/ SUM(revenue) AS profitMargin
FROM Orders
WHERE prodName = ‘Happy’;
profitMargin
============
0.47
2. Theory
1. Allow tables to have measures DESCRIBE EnhancedOrders;
column type
============ ==============
prodName STRING
custName STRING
orderDate DATE
revenue INTEGER
cost INTEGER
profitMargin DOUBLE MEASURE
2. Operators for evaluating measures
SELECT prodName, profitMargin
FROM EnhancedOrders
GROUP BY prodName;
prodName profitMargin
======== ============
Acme 0.60
Happy 0.47
Whizz 0.67
3. Syntax to define measures in a query
SELECT *,
(SUM(revenue) - SUM(cost)) / SUM(revenue)
AS MEASURE profitMargin
FROM Orders
GROUP BY prodName;
Extend the relational model with measures
SELECT prodName,
profitMargin
FROM EnhancedOrders
GROUP BY prodName;
Definitions
A context-sensitive expression (CSE) is an expression
whose value is determined by an evaluation context.
An evaluation context is a predicate whose terms are one
or more columns from the same table.
● This set of columns is the dimensionality of the
CSE.
A measure is a special kind of column that becomes a
CSE when used in a query.
● A measure’s dimensionality is the set of non-
measure columns in its table.
● The data type of a measure that returns a value of
type t is t MEASURE, e.g. INTEGER MEASURE.
prodName profitMargin
======== ============
Acme 0.60
Happy 0.50
Whizz 0.67
SELECT (SUM(revenue) - SUM(cost))
/ SUM(revenue) AS profitMargin
FROM Orders
WHERE prodName = ‘Acme’;
profitMargin
============
0.60
profitMargin is a
measure (and a CSE)
Dimensionality is
{prodName, custName,
orderDate, revenue, cost}
Evaluation context for
this cell is
prodName = ‘Acme’
SELECT (SUM(revenue) - SUM(cost))
/ SUM(revenue) AS m
FROM Orders
WHERE prodName = ‘Whizz’
AND custName = ‘Bob’;
m
====
NULL
SELECT (SUM(revenue) - SUM(cost))
/ SUM(revenue) AS m
FROM Orders
WHERE prodName = ‘Acme’;
m
====
0.60
SELECT (SUM(revenue) - SUM(cost))
/ SUM(revenue) AS m
FROM Orders
WHERE prodName = ‘Happy’;
m
====
0.50
SELECT prodName,
profitMargin,
profitMargin
AT (SET prodName = ‘Happy’)
AS happyMargin,
profitMargin
AT (SET custName = ‘Bob’)
AS bobMargin
FROM EnhancedOrders
GROUP BY prodName;
AT operator
The context transformation operator AT modifies the
evaluation context.
Syntax:
expression AT (contextModifier…)
contextModifier ::=
WHERE predicate
| ALL
| ALL dimension
| SET dimension = [CURRENT] expression
| VISIBLE
prodName profitMargin happyMargin bobMargin
======== ============ =========== =========
Acme 0.60 0.50 0.60
Happy 0.50 0.50 0.75
Whizz 0.67 0.50 NULL
Evaluation context
for this cell is
prodName = ‘Acme’
Evaluation context
for these cells is
prodName = ‘Happy’
Evaluation context for
this cell is
prodName = ‘Whizz’
AND custName = ‘Bob’
3. Consequences
Grain-locking
What is the average age of the customer
who would ordered each product?
When we use an aggregate function in
a join query, it will ‘double count’ if the
join duplicates rows.
This is generally not we want for
measures – except if we want a
weighted average – but is difficult to
avoid in SQL.
Measures are locked to the grain of the
table that defined them.
WITH EnhancedCustomers AS (
SELECT *,
AVG(custAge) AS MEASURE avgAge
FROM Customers)
SELECT o.prodName,
AVG(c.custAge) AS weightedAvgAge,
c.avgAge AS avgAge
FROM Orders AS o
JOIN EnhancedCustomers AS c USING (custName)
GROUP BY o.prodName;
prodName weightedAvgAge avgAge
======== ============== ======
Acme 41 41
Happy 29 32
Whizz 17 17
prodName custName orderDate revenue cost
Happy Alice 2023/11/28 6 4
Acme Bob 2023/11/27 5 2
Happy Alice 2024/11/28 7 4
Whizz Celia 2023/11/25 3 1
Happy Bob 2022/11/27 4 1
custName custAge
Alice 23
Bob 41
Celia 17
Alice (age 23)
has two orders;
Bob (age 41) has
one order.
Measures prevent self-joins
In 2020, what was the revenue and year-
on-year revenue growth of each product?
SELECT o20.prodName
o20.sumRevenue,
o20.sumRevenue - o19.sumRevenue
AS revenueGrowth
FROM (
SELECT prodName,
SUM(revenue) AS sumRevenue
FROM Orders
JOIN Products USING (prodName)
WHERE YEAR(orderDate) = 2020
GROUP BY prodName) AS o20
JOIN (
SELECT prodName,
SUM(revenue) AS sumRevenue
FROM Orders
JOIN Products USING (prodName)
WHERE YEAR(orderDate) = 2019
GROUP BY prodName) AS o19
ON o20.prodName = 019.prodName;
SELECT prodName,
sumRevenue,
sumRevenue
- sumRevenue AT (SET YEAR(orderDate)
= CURRENT YEAR(orderDate) - 1)
FROM (
SELECT *,
SUM(revenue) AS MEASURE sumRevenue
FROM Orders
JOIN Products USING (prodName))
WHERE YEAR(orderDate) = 2020
GROUP BY prodName;
Relational algebra (bottom-up) Multidimensional (top-down)
Products
Customers
⨝
⨝
Σ
⨝
σ
Orders
Products
Customers
⨝
⨝
Σ
σ
Orders
π
(customer: all,
orderYear: 2019,
prodName: all)
(customer: all,
orderYear: 2020,
prodName: all)
custName
prodName
orderDate
Bottom-up vs Top-down query
Represent a Business Intelligence model as a SQL view
Orders Products
Customers
CREATE VIEW OrdersCube AS
SELECT *
FROM (
SELECT o.orderDate AS `order.date`,
o.revenue AS `order.revenue`,
SUM(o.revenue) AS MEASURE `order.sum_revenue`
FROM Orders) AS o
LEFT JOIN (
SELECT c.custName AS `customer.name`,
c.state AS `customer.state`,
c.custAge AS `customer.age`,
AVG(c.custAge) AS MEASURE `customer.avg_age`
FROM Customers) AS c
ON o.custName = c.custName
LEFT JOIN (
SELECT p.prodName AS `product.name`,
p.color AS `product.color`,
AVG(p.weight) AS MEASURE `product.avg_weight`
FROM Products) AS p
ON o.prodName = p.prodName;
SELECT `customer.state`, `product.avg_weight`
FROM OrdersCube
GROUP BY `customer.state`;
● SQL planner handles view expansion
● Grain locking makes it safe to use a
star schema
● Users can define new models simply
by writing queries
SELECT *,
SUM(cost) AS MEASURE sumCost,
SUM(revenue) AS MEASURE sumRevenue
FROM Orders
Composition & closure
Just as tables are closed under queries, so
tables-with-measures are closed under
queries-with-measures
Measures can reference measures
Complex analytical calculations without
touching the FROM clause
Evaluation contexts can be nested
SELECT *,
SUM(cost) AS MEASURE sumCost,
SUM(revenue) AS MEASURE sumRevenue,
(sumRevenue - sumCost) / sumRevenue
AS MEASURE profitMargin
FROM Orders
SELECT *,
SUM(cost) AS MEASURE sumCost,
SUM(revenue) AS MEASURE sumRevenue,
(sumRevenue - sumCost) / sumRevenue
AS MEASURE profitMargin,
sumRevenue
- sumRevenue AT (SET YEAR(orderDate)
= CURRENT YEAR(orderDate) - 1)
AS MEASURE revenueGrowthYoY
FROM Orders
SELECT *,
SUM(cost) AS MEASURE sumCost,
SUM(revenue) AS MEASURE sumRevenue,
(sumRevenue - sumCost) / sumRevenue
AS MEASURE profitMargin,
sumRevenue
- sumRevenue AT (SET YEAR(orderDate)
= CURRENT YEAR(orderDate) - 1)
AS MEASURE revenueGrowthYoY,
ARRAY_AGG(productId
ORDER BY sumRevenue DESC LIMIT 5)
AT (ALL productId)
AS MEASURE top5Products
FROM Orders;
SELECT *,
SUM(cost) AS MEASURE sumCost,
SUM(revenue) AS MEASURE sumRevenue,
(sumRevenue - sumCost) / sumRevenue
AS MEASURE profitMargin,
sumRevenue
- sumRevenue AT (SET YEAR(orderDate)
= CURRENT YEAR(orderDate) - 1)
AS MEASURE revenueGrowthYoY,
ARRAY_AGG(productId
ORDER BY sumRevenue DESC LIMIT 5)
AT (ALL productId)
AS MEASURE top5Products,
ARRAY_AGG(customerId
ORDER BY sumRevenue DESC LIMIT 3)
AT (ALL customerId
SET productId MEMBER OF top5Products
AT (SET YEAR(orderDate)
= CURRENT YEAR(orderDate) - 1))
AS MEASURE top3CustomersOfTop5Products
FROM Orders;
Implementing measures & CSEs as SQL rewrites
simple
complex
Complexity Query Expanded query
Simple measure
can be inlined
SELECT prodName, avgRevenue
FROM OrdersCube
GROUP BY prodName
SELECT prodName, AVG(revenue)
FROM orders
GROUP BY prodName
Join requires grain-
locking
SELECT prodName, avgAge
FROM OrdersCube
GROUP BY prodName
SELECT o.prodName, AVG(c.custAge PER
c.custName) FROM orders JOIN customers
GROUP BY prodName
→ (something with GROUPING SETS)
Period-over- period SELECT prodName, avgAge -
avgAge AT (SET year =
CURRENT year - 1)
FROM OrdersCube
GROUP BY prodName
(something with window aggregates)
Scalar subquery
can accomplish
anything
SELECT prodName, prodColor
avgAge AT (ALL custState
SET year = CURRENT year - 1)
FROM OrdersCube
GROUP BY prodName, prodColor
SELECT prodName, prodColor,
(SELECT … FROM orders
WHERE <evaluation context>)
FROM orders
GROUP BY prodName, prodColor
Summary
Measures provide reusable calculations
● Can represent BI models (aka ‘cubes’, ‘semantic layer’) as SQL views
Top-down evaluation makes queries concise
● Fewer self joins → fewer user errors, less planner effort, more efficient execution
Measures don’t break SQL
● Queries without measures give same results to regular SQL
● Queries with measures give same row count as regular SQL
● Measures can be implemented by expanding to SQL
Measures provide
reusable
calculations in
SQL
https://doi.org/10.1145/3626246.3653374
@julianhyde
@JohnFremlin
@ApacheCalcite
https://calcite.apache.org

More Related Content

PPTX
Four Essential Ways Control Charts Guide Healthcare Improvement
PDF
CS3352-Foundations of Data Science Notes.pdf
PDF
OOMD UNIT2.pdf
PDF
Artificial intelligence cs607 handouts lecture 11 - 45
PPT
Planning & orag.imaging services
PDF
10 Steps To Decommission Your SAP BusinessObjects Successfully
PDF
The Databases applications in government sections
PPTX
Artifacts, Data Dictionary, Data Modeling, Data Wrangling
Four Essential Ways Control Charts Guide Healthcare Improvement
CS3352-Foundations of Data Science Notes.pdf
OOMD UNIT2.pdf
Artificial intelligence cs607 handouts lecture 11 - 45
Planning & orag.imaging services
10 Steps To Decommission Your SAP BusinessObjects Successfully
The Databases applications in government sections
Artifacts, Data Dictionary, Data Modeling, Data Wrangling

Similar to Measures in SQL (SIGMOD 2024, Santiago, Chile) (20)

PDF
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
PDF
Cubing and Metrics in SQL, oh my!
PDF
Adding measures to Calcite SQL
PDF
Building a semantic/metrics layer using Calcite
PDF
Question 2B
PPTX
Love Your Database Railsconf 2017
PDF
advance-sqaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaal.pdf
PDF
Aggregate functions in sql
PPTX
Database Management System - SQL Advanced Training
PPTX
Subqueries, Backups, Users and Privileges
PDF
CS121Lec05.pdf
PDF
SQL Queries .pdf
PDF
Practical Sql A Beginners Guide To Storytelling With Data 2nd Edition 2 Conve...
PDF
SQL Performance Solutions: Refactor Mercilessly, Index Wisely
PDF
The ultimate-guide-to-sql
PDF
ILOUG 2019 - SQL features for Developers
PPTX
Server Query Language – Getting Started.pptx
PDF
Why you care about
 relational algebra (even though you didn’t know it)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Cubing and Metrics in SQL, oh my!
Adding measures to Calcite SQL
Building a semantic/metrics layer using Calcite
Question 2B
Love Your Database Railsconf 2017
advance-sqaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaal.pdf
Aggregate functions in sql
Database Management System - SQL Advanced Training
Subqueries, Backups, Users and Privileges
CS121Lec05.pdf
SQL Queries .pdf
Practical Sql A Beginners Guide To Storytelling With Data 2nd Edition 2 Conve...
SQL Performance Solutions: Refactor Mercilessly, Index Wisely
The ultimate-guide-to-sql
ILOUG 2019 - SQL features for Developers
Server Query Language – Getting Started.pptx
Why you care about
 relational algebra (even though you didn’t know it)
Ad

More from Julian Hyde (20)

PDF
Morel, a data-parallel programming language
PDF
Is there a perfect data-parallel programming language? (Experiments with More...
PDF
Morel, a Functional Query Language
PDF
Apache Calcite (a tutorial given at BOSS '21)
PDF
The evolution of Apache Calcite and its Community
PDF
What to expect when you're Incubating
PDF
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
PDF
Efficient spatial queries on vanilla databases
PDF
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
PDF
Tactical data engineering
PDF
Don't optimize my queries, organize my data!
PDF
Spatial query on vanilla databases
PDF
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
PDF
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
PPTX
Lazy beats Smart and Fast
PDF
Don’t optimize my queries, optimize my data!
PDF
Data profiling with Apache Calcite
PDF
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
PDF
Data Profiling in Apache Calcite
PDF
Streaming SQL
Morel, a data-parallel programming language
Is there a perfect data-parallel programming language? (Experiments with More...
Morel, a Functional Query Language
Apache Calcite (a tutorial given at BOSS '21)
The evolution of Apache Calcite and its Community
What to expect when you're Incubating
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Efficient spatial queries on vanilla databases
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Tactical data engineering
Don't optimize my queries, organize my data!
Spatial query on vanilla databases
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Lazy beats Smart and Fast
Don’t optimize my queries, optimize my data!
Data profiling with Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
Data Profiling in Apache Calcite
Streaming SQL
Ad

Recently uploaded (20)

PDF
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
PPTX
ISO 45001 Occupational Health and Safety Management System
PDF
AI in Product Development-omnex systems
PDF
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
PDF
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
PPTX
Lecture 3: Operating Systems Introduction to Computer Hardware Systems
PPTX
Agentic AI : A Practical Guide. Undersating, Implementing and Scaling Autono...
PPTX
Introduction to Artificial Intelligence
PDF
PTS Company Brochure 2025 (1).pdf.......
PDF
Understanding Forklifts - TECH EHS Solution
PDF
medical staffing services at VALiNTRY
PDF
Navsoft: AI-Powered Business Solutions & Custom Software Development
PPTX
CHAPTER 12 - CYBER SECURITY AND FUTURE SKILLS (1) (1).pptx
PPTX
Oracle E-Business Suite: A Comprehensive Guide for Modern Enterprises
PDF
How Creative Agencies Leverage Project Management Software.pdf
PPT
Introduction Database Management System for Course Database
PPTX
CHAPTER 2 - PM Management and IT Context
PDF
System and Network Administraation Chapter 3
PDF
top salesforce developer skills in 2025.pdf
PDF
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
ISO 45001 Occupational Health and Safety Management System
AI in Product Development-omnex systems
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
Lecture 3: Operating Systems Introduction to Computer Hardware Systems
Agentic AI : A Practical Guide. Undersating, Implementing and Scaling Autono...
Introduction to Artificial Intelligence
PTS Company Brochure 2025 (1).pdf.......
Understanding Forklifts - TECH EHS Solution
medical staffing services at VALiNTRY
Navsoft: AI-Powered Business Solutions & Custom Software Development
CHAPTER 12 - CYBER SECURITY AND FUTURE SKILLS (1) (1).pptx
Oracle E-Business Suite: A Comprehensive Guide for Modern Enterprises
How Creative Agencies Leverage Project Management Software.pdf
Introduction Database Management System for Course Database
CHAPTER 2 - PM Management and IT Context
System and Network Administraation Chapter 3
top salesforce developer skills in 2025.pdf
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus

Measures in SQL (SIGMOD 2024, Santiago, Chile)

  • 1. Measures in SQL Julian Hyde (Google) John Fremlin (Google) 2024-06-11 17:30 Europa
  • 2. Measures in SQL ABSTRACT SQL has attained widespread adoption, but Business Intelligence tools still use their own higher level languages based upon a multidimensional paradigm. Composable calculations are what is missing from SQL, and we propose a new kind of column, called a measure, that attaches a calculation to a table. Like regular tables, tables with measures are composable and closed when used in queries. SQL-with-measures has the power, conciseness and reusability of multidimensional languages but retains SQL semantics. Measure invocations can be expanded in place to simple, clear SQL. To define the evaluation semantics for measures, we introduce context-sensitive expressions (a way to evaluate multidimensional expressions that is consistent with existing SQL semantics), a concept called evaluation context, and several operations for setting and modifying the evaluation context. SIGMOD, June 9–15, 2024, Santiago, Chile Julian Hyde Google Inc. San Francisco, CA, USA julianhyde@google.com John Fremlin Google Inc. New York, NY, USA fremlin@google.com
  • 4. Tables are broken! Tables are unable to provide reusable calculations.
  • 5. Problem: Calculate profit margin of orders SELECT prodName, (SUM(revenue) - SUM(cost)) / SUM(revenue) AS profitMargin FROM Orders WHERE prodName = ‘Happy’; profitMargin ============ 0.47 prodName custName orderDate revenue cost Happy Alice 2023/11/28 6 4 Acme Bob 2023/11/27 5 2 Happy Alice 2024/11/28 7 4 Whizz Celia 2023/11/25 3 1 Happy Bob 2022/11/27 4 1 SELECT prodName, (SUM(revenue) - SUM(cost)) / SUM(revenue) AS profitMargin FROM Orders WHERE prodName = ‘Happy’; profitMargin ============ 0.47
  • 6. Attempted solution: Create a view SELECT AVG(profitMargin) AS profitMargin FROM SummarizedOrders WHERE prodName = ‘Happy’; profitMargin ============ 0.50 CREATE VIEW SummarizedOrders AS SELECT prodName, orderDate, (SUM(revenue) - SUM(cost)) / SUM(revenue) AS profitMargin FROM Orders GROUP BY prodName, orderDate; prodName custName orderDate revenue cost Happy Alice 2023/11/28 6 4 Acme Bob 2023/11/27 5 2 Happy Alice 2024/11/28 7 4 Whizz Celia 2023/11/25 3 1 Happy Bob 2022/11/27 4 1 SELECT prodName, (SUM(revenue) - SUM(cost)) / SUM(revenue) AS profitMargin FROM Orders WHERE prodName = ‘Happy’; profitMargin ============ 0.47
  • 8. 1. Allow tables to have measures DESCRIBE EnhancedOrders; column type ============ ============== prodName STRING custName STRING orderDate DATE revenue INTEGER cost INTEGER profitMargin DOUBLE MEASURE 2. Operators for evaluating measures SELECT prodName, profitMargin FROM EnhancedOrders GROUP BY prodName; prodName profitMargin ======== ============ Acme 0.60 Happy 0.47 Whizz 0.67 3. Syntax to define measures in a query SELECT *, (SUM(revenue) - SUM(cost)) / SUM(revenue) AS MEASURE profitMargin FROM Orders GROUP BY prodName; Extend the relational model with measures
  • 9. SELECT prodName, profitMargin FROM EnhancedOrders GROUP BY prodName; Definitions A context-sensitive expression (CSE) is an expression whose value is determined by an evaluation context. An evaluation context is a predicate whose terms are one or more columns from the same table. ● This set of columns is the dimensionality of the CSE. A measure is a special kind of column that becomes a CSE when used in a query. ● A measure’s dimensionality is the set of non- measure columns in its table. ● The data type of a measure that returns a value of type t is t MEASURE, e.g. INTEGER MEASURE. prodName profitMargin ======== ============ Acme 0.60 Happy 0.50 Whizz 0.67 SELECT (SUM(revenue) - SUM(cost)) / SUM(revenue) AS profitMargin FROM Orders WHERE prodName = ‘Acme’; profitMargin ============ 0.60 profitMargin is a measure (and a CSE) Dimensionality is {prodName, custName, orderDate, revenue, cost} Evaluation context for this cell is prodName = ‘Acme’
  • 10. SELECT (SUM(revenue) - SUM(cost)) / SUM(revenue) AS m FROM Orders WHERE prodName = ‘Whizz’ AND custName = ‘Bob’; m ==== NULL SELECT (SUM(revenue) - SUM(cost)) / SUM(revenue) AS m FROM Orders WHERE prodName = ‘Acme’; m ==== 0.60 SELECT (SUM(revenue) - SUM(cost)) / SUM(revenue) AS m FROM Orders WHERE prodName = ‘Happy’; m ==== 0.50 SELECT prodName, profitMargin, profitMargin AT (SET prodName = ‘Happy’) AS happyMargin, profitMargin AT (SET custName = ‘Bob’) AS bobMargin FROM EnhancedOrders GROUP BY prodName; AT operator The context transformation operator AT modifies the evaluation context. Syntax: expression AT (contextModifier…) contextModifier ::= WHERE predicate | ALL | ALL dimension | SET dimension = [CURRENT] expression | VISIBLE prodName profitMargin happyMargin bobMargin ======== ============ =========== ========= Acme 0.60 0.50 0.60 Happy 0.50 0.50 0.75 Whizz 0.67 0.50 NULL Evaluation context for this cell is prodName = ‘Acme’ Evaluation context for these cells is prodName = ‘Happy’ Evaluation context for this cell is prodName = ‘Whizz’ AND custName = ‘Bob’
  • 12. Grain-locking What is the average age of the customer who would ordered each product? When we use an aggregate function in a join query, it will ‘double count’ if the join duplicates rows. This is generally not we want for measures – except if we want a weighted average – but is difficult to avoid in SQL. Measures are locked to the grain of the table that defined them. WITH EnhancedCustomers AS ( SELECT *, AVG(custAge) AS MEASURE avgAge FROM Customers) SELECT o.prodName, AVG(c.custAge) AS weightedAvgAge, c.avgAge AS avgAge FROM Orders AS o JOIN EnhancedCustomers AS c USING (custName) GROUP BY o.prodName; prodName weightedAvgAge avgAge ======== ============== ====== Acme 41 41 Happy 29 32 Whizz 17 17 prodName custName orderDate revenue cost Happy Alice 2023/11/28 6 4 Acme Bob 2023/11/27 5 2 Happy Alice 2024/11/28 7 4 Whizz Celia 2023/11/25 3 1 Happy Bob 2022/11/27 4 1 custName custAge Alice 23 Bob 41 Celia 17 Alice (age 23) has two orders; Bob (age 41) has one order.
  • 13. Measures prevent self-joins In 2020, what was the revenue and year- on-year revenue growth of each product? SELECT o20.prodName o20.sumRevenue, o20.sumRevenue - o19.sumRevenue AS revenueGrowth FROM ( SELECT prodName, SUM(revenue) AS sumRevenue FROM Orders JOIN Products USING (prodName) WHERE YEAR(orderDate) = 2020 GROUP BY prodName) AS o20 JOIN ( SELECT prodName, SUM(revenue) AS sumRevenue FROM Orders JOIN Products USING (prodName) WHERE YEAR(orderDate) = 2019 GROUP BY prodName) AS o19 ON o20.prodName = 019.prodName; SELECT prodName, sumRevenue, sumRevenue - sumRevenue AT (SET YEAR(orderDate) = CURRENT YEAR(orderDate) - 1) FROM ( SELECT *, SUM(revenue) AS MEASURE sumRevenue FROM Orders JOIN Products USING (prodName)) WHERE YEAR(orderDate) = 2020 GROUP BY prodName;
  • 14. Relational algebra (bottom-up) Multidimensional (top-down) Products Customers ⨝ ⨝ Σ ⨝ σ Orders Products Customers ⨝ ⨝ Σ σ Orders π (customer: all, orderYear: 2019, prodName: all) (customer: all, orderYear: 2020, prodName: all) custName prodName orderDate Bottom-up vs Top-down query
  • 15. Represent a Business Intelligence model as a SQL view Orders Products Customers CREATE VIEW OrdersCube AS SELECT * FROM ( SELECT o.orderDate AS `order.date`, o.revenue AS `order.revenue`, SUM(o.revenue) AS MEASURE `order.sum_revenue` FROM Orders) AS o LEFT JOIN ( SELECT c.custName AS `customer.name`, c.state AS `customer.state`, c.custAge AS `customer.age`, AVG(c.custAge) AS MEASURE `customer.avg_age` FROM Customers) AS c ON o.custName = c.custName LEFT JOIN ( SELECT p.prodName AS `product.name`, p.color AS `product.color`, AVG(p.weight) AS MEASURE `product.avg_weight` FROM Products) AS p ON o.prodName = p.prodName; SELECT `customer.state`, `product.avg_weight` FROM OrdersCube GROUP BY `customer.state`; ● SQL planner handles view expansion ● Grain locking makes it safe to use a star schema ● Users can define new models simply by writing queries
  • 16. SELECT *, SUM(cost) AS MEASURE sumCost, SUM(revenue) AS MEASURE sumRevenue FROM Orders Composition & closure Just as tables are closed under queries, so tables-with-measures are closed under queries-with-measures Measures can reference measures Complex analytical calculations without touching the FROM clause Evaluation contexts can be nested SELECT *, SUM(cost) AS MEASURE sumCost, SUM(revenue) AS MEASURE sumRevenue, (sumRevenue - sumCost) / sumRevenue AS MEASURE profitMargin FROM Orders SELECT *, SUM(cost) AS MEASURE sumCost, SUM(revenue) AS MEASURE sumRevenue, (sumRevenue - sumCost) / sumRevenue AS MEASURE profitMargin, sumRevenue - sumRevenue AT (SET YEAR(orderDate) = CURRENT YEAR(orderDate) - 1) AS MEASURE revenueGrowthYoY FROM Orders SELECT *, SUM(cost) AS MEASURE sumCost, SUM(revenue) AS MEASURE sumRevenue, (sumRevenue - sumCost) / sumRevenue AS MEASURE profitMargin, sumRevenue - sumRevenue AT (SET YEAR(orderDate) = CURRENT YEAR(orderDate) - 1) AS MEASURE revenueGrowthYoY, ARRAY_AGG(productId ORDER BY sumRevenue DESC LIMIT 5) AT (ALL productId) AS MEASURE top5Products FROM Orders; SELECT *, SUM(cost) AS MEASURE sumCost, SUM(revenue) AS MEASURE sumRevenue, (sumRevenue - sumCost) / sumRevenue AS MEASURE profitMargin, sumRevenue - sumRevenue AT (SET YEAR(orderDate) = CURRENT YEAR(orderDate) - 1) AS MEASURE revenueGrowthYoY, ARRAY_AGG(productId ORDER BY sumRevenue DESC LIMIT 5) AT (ALL productId) AS MEASURE top5Products, ARRAY_AGG(customerId ORDER BY sumRevenue DESC LIMIT 3) AT (ALL customerId SET productId MEMBER OF top5Products AT (SET YEAR(orderDate) = CURRENT YEAR(orderDate) - 1)) AS MEASURE top3CustomersOfTop5Products FROM Orders;
  • 17. Implementing measures & CSEs as SQL rewrites simple complex Complexity Query Expanded query Simple measure can be inlined SELECT prodName, avgRevenue FROM OrdersCube GROUP BY prodName SELECT prodName, AVG(revenue) FROM orders GROUP BY prodName Join requires grain- locking SELECT prodName, avgAge FROM OrdersCube GROUP BY prodName SELECT o.prodName, AVG(c.custAge PER c.custName) FROM orders JOIN customers GROUP BY prodName → (something with GROUPING SETS) Period-over- period SELECT prodName, avgAge - avgAge AT (SET year = CURRENT year - 1) FROM OrdersCube GROUP BY prodName (something with window aggregates) Scalar subquery can accomplish anything SELECT prodName, prodColor avgAge AT (ALL custState SET year = CURRENT year - 1) FROM OrdersCube GROUP BY prodName, prodColor SELECT prodName, prodColor, (SELECT … FROM orders WHERE <evaluation context>) FROM orders GROUP BY prodName, prodColor
  • 18. Summary Measures provide reusable calculations ● Can represent BI models (aka ‘cubes’, ‘semantic layer’) as SQL views Top-down evaluation makes queries concise ● Fewer self joins → fewer user errors, less planner effort, more efficient execution Measures don’t break SQL ● Queries without measures give same results to regular SQL ● Queries with measures give same row count as regular SQL ● Measures can be implemented by expanding to SQL

Editor's Notes

  • #2: Image source: http://www.hydromatic.net/pix/pix2021/raw/P2021719_j6d_IMG_2063.JPG
  • #15: https://tngchristians.ca/images/articles/gridcube.jpg
  • #17: Don’t touch the FROM clause – to form analytic calculations, it is sufficient to write complex expressions in the SELECT clause
  • #19: Top-down evaluation makes queries concise. No self-joins are necessary (so you don’t have to repeat yourself, self-join, and deal with duplicate data) Calculations are reusable, which means that we can define the calculations in sub-queries, store the sub-queries as views, and share the views as the ‘model’.
  • #20: Image source: http://www.hydromatic.net/pix/pix2021/raw/P2021719_j6d_IMG_2063.JPG