Join 2017 Deep Dive
SAN FRANCISCO PALACE OF FINE ARTS
SPEAKER
JONATHON MILLER-GIRVETZ
Data Analyst, Customer Support
To Use or Not to Use PDTs
What we will cover
•  Why we derive and persist
•  Types of derived tables
•  When to use them
•  What to LOOK out for
•  When to move to ETL
•  Balance
•  Best practices
Why do we derive and persist?
A derived table is a SQL query that defines a set of business logic, typically returns a reduced amount of data, and can include complex calculations and data transformations.
Persistence is when data survives after the process that created it has terminated.
Some examples:
•  Persisting form data in a web app for a better UX
•  Persisting a data aggregation behind an embedded visualization, so complex analysis can be accessed easily and quickly
•  Persisting in Looker, ensuring data is ready for analysis
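A minimal sketch of that definition, using Python's standard-library sqlite3 module with SQLite standing in for the warehouse (the file, table, and column names are hypothetical): an aggregation is written to a file-backed database, the connection that created it is closed, and a brand-new connection can still read the result.

```python
import os
import sqlite3
import tempfile


def persist_and_reload():
    """Persist an aggregate in one connection, then read it from another."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "warehouse.db")  # hypothetical database file

        # The "creation process": build the aggregate, commit, and close.
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE usage (user_id INTEGER, active_usage_min INTEGER)")
        conn.executemany("INSERT INTO usage VALUES (?, ?)", [(1, 30), (1, 15), (2, 5)])
        conn.execute(
            "CREATE TABLE usage_facts AS "
            "SELECT user_id, SUM(active_usage_min) AS total_active_usage_min "
            "FROM usage GROUP BY user_id"
        )
        conn.commit()
        conn.close()  # the connection that created the data is gone

        # A fresh connection still sees the persisted table.
        reader = sqlite3.connect(path)
        rows = reader.execute(
            "SELECT user_id, total_active_usage_min FROM usage_facts ORDER BY user_id"
        ).fetchall()
        reader.close()
        return rows


if __name__ == "__main__":
    print(persist_and_reload())  # [(1, 45), (2, 5)]
```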
Types of derived tables
Ephemeral derived tables (EDTs)
WITH tmp AS (
  SELECT user_id,
         SUM(active_usage_min) AS total_active_usage_min
  FROM usage
  GROUP BY 1
)
SELECT * FROM tmp ORDER BY 2 DESC;
Persistent derived tables (PDTs)
CREATE TABLE usage_facts AS
SELECT user_id,
       SUM(active_usage_min) AS total_active_usage_min
FROM usage
GROUP BY 1;
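Both snippets can be run side by side to confirm that they describe the same result set. This sketch assumes an in-memory SQLite database and a hypothetical usage table filled with made-up rows. The EDT is evaluated inside the query that uses it, while the PDT is materialized once with CREATE TABLE ... AS and then read like any other table.

```python
import sqlite3

# The EDT: a CTE that exists only for the duration of this query.
EDT_SQL = """
WITH tmp AS (
  SELECT user_id, SUM(active_usage_min) AS total_active_usage_min
  FROM usage
  GROUP BY 1
)
SELECT * FROM tmp ORDER BY 2 DESC
"""

# The PDT: the same logic, stored as a real table.
PDT_BUILD_SQL = """
CREATE TABLE usage_facts AS
SELECT user_id, SUM(active_usage_min) AS total_active_usage_min
FROM usage
GROUP BY 1
"""


def compare_edt_and_pdt():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE usage (user_id INTEGER, active_usage_min INTEGER)")
    conn.executemany(
        "INSERT INTO usage VALUES (?, ?)",
        [(1, 10), (2, 40), (1, 25), (3, 5)],  # hypothetical sample rows
    )
    edt_rows = conn.execute(EDT_SQL).fetchall()  # computed inside this query
    conn.execute(PDT_BUILD_SQL)                  # materialized once
    pdt_rows = conn.execute(
        "SELECT * FROM usage_facts ORDER BY 2 DESC"
    ).fetchall()                                 # read back from the stored table
    conn.close()
    return edt_rows, pdt_rows


if __name__ == "__main__":
    print(compare_edt_and_pdt())
    # ([(2, 40), (1, 35), (3, 5)], [(2, 40), (1, 35), (3, 5)])
```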
PDTs are built on a persistence and/or trigger policy, and the resulting table is cached in the database.
EDTs are built every time, at query runtime.
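The practical difference shows up when new rows arrive. In this hypothetical SQLite sketch, the PDT is built once and keeps serving the totals from its last build until something rebuilds it, while the EDT recomputes on every query and picks up the new row immediately. The explicit build_pdt call is only a stand-in for whatever persistence or trigger policy would fire in Looker.

```python
import sqlite3

# Shared aggregation logic (a trusted constant, not user input).
AGG = "SELECT user_id, SUM(active_usage_min) AS total FROM usage GROUP BY user_id"


def build_pdt(conn):
    """Rebuild the cached table from scratch: drop it and re-create it."""
    conn.execute("DROP TABLE IF EXISTS usage_facts")
    conn.execute(f"CREATE TABLE usage_facts AS {AGG}")


def query_edt(conn, user_id):
    # The derived table is recomputed inside every query.
    sql = f"WITH tmp AS ({AGG}) SELECT total FROM tmp WHERE user_id = ?"
    return conn.execute(sql, (user_id,)).fetchone()[0]


def query_pdt(conn, user_id):
    # Reads whatever was stored at the last build.
    sql = "SELECT total FROM usage_facts WHERE user_id = ?"
    return conn.execute(sql, (user_id,)).fetchone()[0]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE usage (user_id INTEGER, active_usage_min INTEGER)")
    conn.execute("INSERT INTO usage VALUES (1, 30)")
    build_pdt(conn)
    conn.execute("INSERT INTO usage VALUES (1, 20)")   # new data arrives
    before = (query_edt(conn, 1), query_pdt(conn, 1))  # PDT is stale here
    build_pdt(conn)                                    # e.g. a trigger fires
    after = (query_edt(conn, 1), query_pdt(conn, 1))
    conn.close()
    return before, after


if __name__ == "__main__":
    print(demo())  # ((50, 30), (50, 50))
```

This is the trade-off the slides describe: the EDT pays the aggregation cost on every query, while the PDT pays it once per build and accepts some staleness in between.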
When to build a derived table?
To name a few, from the top:
•  Historical summaries
•  Entity and transaction tables
•  Roll-ups/aggregations
•  Overcoming SQL structural limitations:
    •  Window functions
    •  Required subqueries
    •  Nested aggregates
    •  Correlated subqueries
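As one example of a structural limitation: a nested aggregate such as AVG(SUM(...)) is rejected in a single SELECT by SQLite and many other dialects, so the per-user totals go into a derived table first and the outer query averages them. This is a sketch with SQLite and hypothetical data; it also checks that the direct, nested form raises an error.

```python
import sqlite3


def avg_usage_per_user(conn):
    # Inner derived table: one row per user with that user's total.
    # Outer query: aggregate the aggregates.
    sql = """
    SELECT AVG(total_usage)
    FROM (
      SELECT user_id, SUM(usage_minutes) AS total_usage
      FROM usage_fact
      GROUP BY user_id
    ) AS per_user
    """
    return conn.execute(sql).fetchone()[0]


def nested_aggregate_is_rejected(conn):
    try:
        conn.execute(
            "SELECT AVG(SUM(usage_minutes)) FROM usage_fact GROUP BY user_id"
        ).fetchall()
    except sqlite3.OperationalError:
        return True  # SQLite reports a misuse-of-aggregate error
    return False


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE usage_fact (user_id INTEGER, usage_minutes INTEGER)")
    conn.executemany(
        "INSERT INTO usage_fact VALUES (?, ?)", [(1, 10), (1, 20), (2, 60)]
    )
    result = (avg_usage_per_user(conn), nested_aggregate_is_rejected(conn))
    conn.close()
    return result


if __name__ == "__main__":
    print(demo())  # (45.0, True)
```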
When to build an EDT instead of a PDT?
•  When the view is quick to run
•  When the view should include real-time data
    •  e.g., a UNION ALL between a historical PDT and a current slice that is filtered on the sort key, index, and/or partition (on multi-node databases)
•  When the view should be built dynamically from user filter inputs
    •  Templated filters
•  When a view needs to be dynamic, but the number of permutations is manageable and likely to be reused
    •  User selections
    •  Filter values
    •  User attributes
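The UNION ALL pattern can be sketched as follows, assuming a hypothetical events table with an ISO-formatted event_date column and a PDT (events_history) that was last built through a cutoff date. The EDT unions the persisted history with only the rows after the cutoff, so the expensive part stays cached while the recent slice stays real-time. On a multi-node warehouse, the filter on event_date is what lets the sort key, index, or partition prune the scan; SQLite here only shows the shape of the query.

```python
import sqlite3


def realtime_event_counts(conn, cutoff):
    """EDT: persisted history UNION ALL the live slice after the cutoff."""
    sql = """
    SELECT event_date, SUM(n) AS event_count
    FROM (
      SELECT event_date, event_count AS n
      FROM events_history          -- persisted PDT, built through the cutoff
      UNION ALL
      SELECT event_date, 1 AS n
      FROM events
      WHERE event_date > ?         -- current slice only
    ) AS combined
    GROUP BY event_date
    ORDER BY event_date
    """
    return conn.execute(sql, (cutoff,)).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event_date TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?)",
        [("2024-01-01",), ("2024-01-01",), ("2024-01-02",)],
    )
    # PDT: daily counts through the cutoff date, built once and reused.
    conn.execute(
        "CREATE TABLE events_history AS "
        "SELECT event_date, COUNT(*) AS event_count FROM events "
        "WHERE event_date <= '2024-01-02' GROUP BY event_date"
    )
    # New events land after the PDT was built.
    conn.executemany(
        "INSERT INTO events VALUES (?)", [("2024-01-03",), ("2024-01-03",)]
    )
    rows = realtime_event_counts(conn, cutoff="2024-01-02")
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())  # [('2024-01-01', 2), ('2024-01-02', 1), ('2024-01-03', 2)]
```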
“I love ephemeral derived tables because they
feel light-weight and focused, but they make
the most sense when you're doing something
small and quick and/or if what you're doing is
sensitive to frequent ETL. If you don't mind the
[computation] cost and redoing the
computation each time, then I'd say don't
persist.”
Maxie Corbin
Looker Data Analyst, Customer Support
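When should we build a PDT?
•  Data freshness requirements (how stale the data is allowed to be)
•  The ratio of available database resources to the resources consumed by the build
•  Prototyping: laying the groundwork for views, business logic, and future ETL processes
How to?
•  datagroup: sets a caching and PDT rebuild policy that explores and PDTs can share (release 4.16+)
•  persist_for: keeps the PDT for a set length of time; once that expires, the PDT is rebuilt the next time it is queried
•  sql_trigger_value: rebuilds the PDT when the value returned by a trigger query changes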
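The two persistence strategies differ in when a rebuild happens. The sketch below is a simplified, hypothetical model of that decision, not Looker's implementation: PersistFor treats the cached table as valid for a fixed time-to-live, and TriggerValue asks for a rebuild whenever a trigger check (standing in for the trigger SQL, for example one returning the latest load date) returns a different value than it did at the last build. Class, method, and variable names are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PersistFor:
    """Hypothetical model of a time-to-live policy, in the spirit of persist_for."""
    ttl_seconds: float
    built_at: Optional[float] = None

    def needs_rebuild(self, now: float) -> bool:
        # Rebuild if never built, or if the cached copy has expired.
        return self.built_at is None or now - self.built_at >= self.ttl_seconds

    def mark_built(self, now: float) -> None:
        self.built_at = now


@dataclass
class TriggerValue:
    """Hypothetical model of a trigger policy, in the spirit of sql_trigger_value."""
    trigger: Callable[[], object]  # stands in for the trigger SQL query
    last_value: object = None
    built: bool = False

    def needs_rebuild(self) -> bool:
        # Rebuild if never built, or if the trigger value has changed.
        return not self.built or self.trigger() != self.last_value

    def mark_built(self) -> None:
        self.last_value = self.trigger()
        self.built = True


def demo():
    ttl = PersistFor(ttl_seconds=3600)
    checks = [ttl.needs_rebuild(now=0)]
    ttl.mark_built(now=0)
    checks += [ttl.needs_rebuild(now=1800), ttl.needs_rebuild(now=3600)]

    latest_load = {"date": "2024-01-01"}  # hypothetical ETL watermark
    trig = TriggerValue(trigger=lambda: latest_load["date"])
    checks.append(trig.needs_rebuild())
    trig.mark_built()
    checks.append(trig.needs_rebuild())
    latest_load["date"] = "2024-01-02"    # new data landed
    checks.append(trig.needs_rebuild())
    return checks


if __name__ == "__main__":
    print(demo())  # [True, False, True, True, False, True]
```

The TTL policy rebuilds on a clock regardless of whether the data changed; the trigger policy rebuilds only when the source actually moves, at the cost of running the trigger check.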
What to LOOK out for?
PDTs are very powerful, but they are not perfect:
•  Be aware of the front-end UX and the derived table aggregations that affect it
•  Computational resources
•  Available database resources
•  Time, query queue, and (potentially) money
How much usage per customer?
How has our retention rate changed over the past 6 years?
None of the queries appear to be working?
Select margin of error?
[SQL ERROR]: Table lock?
Table lock.
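The table-lock punchline can be reproduced in miniature. In this sketch, SQLite's database-level lock stands in for a warehouse table lock (real warehouses lock differently): one connection holds an exclusive write transaction, as a long PDT rebuild might, and a second connection that tries to read with no busy timeout fails immediately instead of waiting. The file and table names are hypothetical.

```python
import os
import sqlite3
import tempfile


def reader_blocked_by_rebuild():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "warehouse.db")  # hypothetical database file
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE usage_facts (user_id INTEGER, total INTEGER)")
        setup.commit()
        setup.close()

        # The "rebuild" takes an exclusive lock and holds it. Autocommit mode
        # (isolation_level=None) lets the explicit BEGIN control the transaction.
        builder = sqlite3.connect(path, isolation_level=None)
        builder.execute("BEGIN EXCLUSIVE")
        builder.execute("DELETE FROM usage_facts")

        # A dashboard query arrives while the lock is held; timeout=0 makes it
        # fail at once instead of waiting in the queue.
        reader = sqlite3.connect(path, timeout=0)
        try:
            reader.execute("SELECT COUNT(*) FROM usage_facts").fetchone()
            blocked = False
        except sqlite3.OperationalError:  # "database is locked"
            blocked = True
        finally:
            reader.close()
            builder.execute("ROLLBACK")
            builder.close()
        return blocked


if __name__ == "__main__":
    print(reader_blocked_by_rebuild())
```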
When should a PDT become part of the ETL?
•  When a powerful ETL/transformation tool can be leveraged
•  When a PDT is used consistently
•  When a PDT’s logic is well understood, stable, and rarely changing
•  When raw data only needs to be processed “once” or incrementally
•  When a PDT is being used outside of Looker
•  To AVOID table locks, which halt the query queue and back up your query breadline
•  When the name of the ETL’d table clearly communicates its contents and/or a data dictionary exists that documents the ETL’d table’s definition
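For the "process once or incrementally" case, an ETL job commonly keeps a high-water mark and appends only rows it has not seen yet, instead of rebuilding the whole table the way a PDT rebuild does. A minimal sketch, assuming a hypothetical raw_usage table with a monotonically increasing id and a usage_fact target that records the raw id it came from:

```python
import sqlite3


def incremental_load(conn):
    """Append only raw rows newer than the target's high-water mark."""
    (high_water,) = conn.execute(
        "SELECT COALESCE(MAX(raw_id), 0) FROM usage_fact"
    ).fetchone()
    cur = conn.execute(
        """
        INSERT INTO usage_fact (raw_id, user_id, usage_minutes)
        SELECT id, user_id, usage_seconds / 60.0   -- the transform step
        FROM raw_usage
        WHERE id > ?
        """,
        (high_water,),
    )
    conn.commit()
    return cur.rowcount  # rows processed in this run


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_usage (id INTEGER PRIMARY KEY, user_id INTEGER, usage_seconds INTEGER)"
    )
    conn.execute(
        "CREATE TABLE usage_fact (raw_id INTEGER PRIMARY KEY, user_id INTEGER, usage_minutes REAL)"
    )
    conn.executemany("INSERT INTO raw_usage VALUES (?, ?, ?)", [(1, 1, 600), (2, 2, 120)])
    first = incremental_load(conn)   # initial load
    second = incremental_load(conn)  # nothing new
    conn.execute("INSERT INTO raw_usage VALUES (3, 1, 300)")
    third = incremental_load(conn)   # only the new row
    total = conn.execute("SELECT SUM(usage_minutes) FROM usage_fact").fetchone()[0]
    conn.close()
    return first, second, third, total


if __name__ == "__main__":
    print(demo())  # (2, 0, 1, 17.0)
```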
1. Extract data from sources
2. Transform data with a PDT in Looker
3. Excellent user experience
Prototype the PDT, load it in Looker, and move it to ETL if merited.
Collect more data and iterate.
Move it!
From PDTs to ETL
You already have the SQL.
LookML provides models across dialects.
The high-wire balance
It’s a pragmatic balance between flexibility and reliability: a few PDTs are flexible, but many PDTs can be unreliable.
Flexibility (PDTs) <-> Reliability (ETL)
PDTs: Too many to keep track of? Not feeling reliable or manageable? ETL!
ETL: Feeling stiff and rigid? Need to stretch out those analytical thoughts? PDTs!
When to use: the take-away
EDT
•  Real-time data
•  Quick query
•  Dynamically built
PDT
•  Data freshness
•  Available database resources
•  Prototyping
ETL
•  Powerful ETL tool
•  Well-understood PDT
•  Consistently used PDT
•  Used outside Looker
Development best practices
•  Use consistent naming conventions
    •  Make primary keys easy to locate and identify without reading through the entire PDT definition
•  Development guidelines
    •  Iterative development
    •  Test the SQL as you develop
    •  Validate the LookML often
•  File and code structure
    •  Horizontal vs. vertical rules
•  Changing and pushing to production
    •  Updating a PDT's SQL definition or datagroup in development and pushing it to production leaves the PDT nonexistent in production, which forces a build on production. So, BUILD IN DEV! and then push :)
Horizontal Development
connection: "myconnection"
label: "My Marketing Team"
# includes marketing views
include: "marketing.*.view"
# includes marketing dashboards
include: "marketing.*.dashboard"
Vertical Development
Single view file: pct_usage_per_user.view.lkml
# This one file holds the hidden explore plus the three views it depends on:
# usage_per_user, total_usage, and pct_usage_per_user
explore: pct_usage_per_user {hidden: yes}

view: pct_usage_per_user {
  derived_table: {
    sql: SELECT upu.email,
                upu.usage_per_user,
                SUM(1.0 * upu.usage_per_user / NULLIF(tu.total_usage, 0))
                  OVER (ORDER BY upu.usage_per_user DESC) AS running_share_of_total_usage
         FROM ${usage_per_user.SQL_TABLE_NAME} AS upu
         CROSS JOIN ${total_usage.SQL_TABLE_NAME} AS tu ;;
  }
}

view: usage_per_user {
  derived_table: {
    sql: SELECT users.id AS user_id,
                users.email,
                SUM(usage_fact.usage_minutes) AS usage_per_user
         FROM users
         INNER JOIN ${usage_fact.SQL_TABLE_NAME} AS usage_fact
           ON users.id = usage_fact.user_id
         GROUP BY 1, 2 ;;
  }
}

view: total_usage {
  derived_table: {
    sql: SELECT SUM(usage_minutes) AS total_usage
         FROM ${usage_fact.SQL_TABLE_NAME} ;;
  }
}
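To sanity-check the SQL in this file outside Looker, the sketch below runs the same three steps in SQLite, with plain subqueries standing in for the ${...SQL_TABLE_NAME} references and hypothetical users and usage_fact data. It assumes an SQLite build with window-function support (3.25 or later), which current Python releases normally bundle.

```python
import sqlite3

# Stand-in for the usage_per_user derived table.
USAGE_PER_USER = """
SELECT users.id AS user_id, users.email,
       SUM(usage_fact.usage_minutes) AS usage_per_user
FROM users
INNER JOIN usage_fact ON users.id = usage_fact.user_id
GROUP BY 1, 2
"""

# Stand-in for the total_usage derived table (a single row).
TOTAL_USAGE = "SELECT SUM(usage_minutes) AS total_usage FROM usage_fact"

# Stand-in for pct_usage_per_user: running share, heaviest users first.
PCT_USAGE_PER_USER = f"""
SELECT upu.email,
       upu.usage_per_user,
       SUM(1.0 * upu.usage_per_user / NULLIF(tu.total_usage, 0))
         OVER (ORDER BY upu.usage_per_user DESC) AS running_share_of_total_usage
FROM ({USAGE_PER_USER}) AS upu
CROSS JOIN ({TOTAL_USAGE}) AS tu
ORDER BY upu.usage_per_user DESC
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("CREATE TABLE usage_fact (user_id INTEGER, usage_minutes INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")],
    )
    conn.executemany(
        "INSERT INTO usage_fact VALUES (?, ?)", [(1, 30), (1, 20), (2, 30), (3, 20)]
    )
    rows = conn.execute(PCT_USAGE_PER_USER).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

With 50, 30, and 20 minutes per user out of 100 in total, the running share climbs to roughly 0.5, 0.8, and 1.0. Ordering the window by the constant total_usage instead would put every row in one peer group and give each row the full 1.0.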
Questions?
https://discourse.looker.com/t/join-2017-deep-dive-to-use-or-not-use-pdts/5846
Jonathon M-G
Data Analyst, Customer Support