DISTRIBUTED QUERY
Conor Cunningham
Principal Architect
SQL Server Engineering Team
WHO AM I?
• I work on the SQL Server Core Engine
• Specialize in Query Processing/Optimization
• 14+ years at Microsoft
• 3rd year speaking at SQLBits – 3 talks this year
• I love hearing about how you use the product
• I take that back to the Engineering team so we can work on
the next versions of SQL Server/Azure
TALK AGENDA
• Problem Statement
• (Quick) Summary of SQL Server’s Optimizer
• DQ Optimization Approach
• Under-the-hood Examples
• Distributed Partitioned Views
• Common Troubleshooting Techniques
PROBLEM STATEMENT
• Data Living on Different Servers?
• Data Living on non-SQL Server?
• Need to Manage Many Servers?
• Want to move data from one server to another
without dealing with SSIS?
• …
• There are many reasons to use Distributed Query
– it fills many holes
OPTIMIZER OVERVIEW
• I gave a SQLBits talk on this 2 years ago
• You can watch that talk on sqlbits.com
• Key Concepts in the Optimizer:
• Operators shaped into trees
• Trees and Sub-Trees have Properties
• Rules transform sub-trees into new sub-trees
• Equivalent sub-trees get stored in a management structure called
the “Memo”
• The sequence of rules and heuristics is applied to try to generate
good query plans efficiently
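The idea of rules producing equivalent trees can be sketched in a few lines of Python. This is a toy illustration, not SQL Server's implementation: the tuple encoding, the two rules, and the `explore` helper are assumptions made for the example, and rules are applied only at the root. A real Memo also groups equivalent sub-trees so they can be shared between alternatives.

```python
# Toy sketch: expressions are nested tuples ("Join", left, right) or table names.
# The rule set and the explore() helper are hypothetical, for illustration only.

def commute(expr):
    """Rule: Join(A, B) -> Join(B, A)."""
    if isinstance(expr, tuple) and expr[0] == "Join":
        return ("Join", expr[2], expr[1])
    return None

def associate(expr):
    """Rule: Join(Join(A, B), C) -> Join(A, Join(B, C))."""
    if isinstance(expr, tuple) and expr[0] == "Join":
        left = expr[1]
        if isinstance(left, tuple) and left[0] == "Join":
            return ("Join", left[1], ("Join", left[2], expr[2]))
    return None

def explore(root, rules):
    """Apply rules at the root until no new equivalent tree appears."""
    group = {root}          # one set of logically equivalent trees
    frontier = [root]
    while frontier:
        expr = frontier.pop()
        for rule in rules:
            new = rule(expr)
            if new is not None and new not in group:
                group.add(new)
                frontier.append(new)
    return group

start = ("Join", ("Join", "A", "B"), "C")
group = explore(start, [commute, associate])
```

Starting from one join order, the two rules reach six equivalent trees; an optimizer would then cost them rather than enumerate blindly.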
DQ OPTIMIZATION GOAL
• DQ tries to make remote tables appear to be local
(so you don’t care that they are remote)
Server 1 (Local):
SELECT SUM(col1), col2 FROM
<remotetbl> WHERE col3 > 10000
GROUP BY col2
Server 2 (Remote):
SELECT SUM(col1), col2 FROM
localtbl WHERE col3 > 10000
GROUP BY col2
Expectation: Push operations to remote server
• That works for the basic cases
• What about more complex cases?
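Why pushing the operations to the remote server matters can be shown with a small simulation. This is a sketch under stated assumptions: an in-memory SQLite database stands in for "Server 2 (Remote)", the table contents are made up, and the row counts stand in for network traffic.

```python
import sqlite3

# Hypothetical stand-in for the remote server: an in-memory SQLite database.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE remotetbl (col1 INTEGER, col2 TEXT, col3 INTEGER)")
remote.executemany(
    "INSERT INTO remotetbl VALUES (?, ?, ?)",
    [(i, "even" if i % 2 == 0 else "odd", i * 100) for i in range(1, 1001)],
)

def pull_everything():
    """Fetch every row, then filter and aggregate on the local side."""
    rows = remote.execute("SELECT col1, col2, col3 FROM remotetbl").fetchall()
    totals = {}
    for col1, col2, col3 in rows:
        if col3 > 10000:
            totals[col2] = totals.get(col2, 0) + col1
    return len(rows), totals

def push_down():
    """Send the whole filter + GROUP BY to the remote side."""
    rows = remote.execute(
        "SELECT SUM(col1), col2 FROM remotetbl WHERE col3 > 10000 GROUP BY col2"
    ).fetchall()
    return len(rows), {col2: total for total, col2 in rows}

pulled_rows, pulled_totals = pull_everything()
pushed_rows, pushed_totals = push_down()
```

Both strategies return the same totals, but the first moves 1000 rows across the simulated link and the second moves 2.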
NEXT EXAMPLE – SHOULD IT REMOTE?
• Let’s try a cross product:
SELECT * FROM
<remotetbl> as t1,
<remotetbl> as t2
Server 1 (Local) Server 2 (Remote)
Should it remote?
HOW ABOUT THIS ONE?
• Join Small Local Table to Large Remote Table
SELECT * FROM smalllocal AS L
JOIN <bigremote> AS R
ON L.col1 = R.col1
Server 1 (Local)
• Pulling a big table over the
network is expensive
• It would be great if we could get
that join condition to remote…
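One way to get the join condition to the remote side is to drive the join from the small local table and send one parameterized lookup per local row. The sketch below assumes an in-memory SQLite database as the remote server and made-up data; the table names follow the slide, everything else is hypothetical.

```python
import sqlite3

# Hypothetical stand-in for the remote server holding the big table.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE bigremote (col1 INTEGER, payload TEXT)")
remote.executemany(
    "INSERT INTO bigremote VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(10000)],
)
remote.execute("CREATE INDEX ix_bigremote_col1 ON bigremote (col1)")

smalllocal = [(3,), (42,), (9999,)]  # the small local table, one column col1

def join_by_remote_lookup():
    """Nested-loops join: one parameterized remote query per local row."""
    result, rows_moved = [], 0
    for (key,) in smalllocal:
        matches = remote.execute(
            "SELECT col1, payload FROM bigremote WHERE col1 = ?", (key,)
        ).fetchall()
        rows_moved += len(matches)
        result.extend((key, payload) for _, payload in matches)
    return result, rows_moved

joined, rows_moved = join_by_remote_lookup()
```

Only the three matching rows cross the simulated network instead of all 10000, at the price of three remote calls. Whether that trade is worthwhile depends on the startup cost of each call.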
DQ OPTIMIZATION DIFFERENCES
• Data is remote, expensive to move (network)
• Often the desired behavior is pretty basic –
remote if you can do so
• The sweet spots for several optimizations change
• We force several optimizations that we otherwise use only for “expensive” local
queries (example: pushing local group by to the remote source)
ONE LAYER DEEPER…
• SQL Server’s QP acts like a SQL Server client
• Based on OLEDB
• It can talk to most OLEDB providers, not just SQL
Server
• So you can pull data from Oracle or DB2 or Excel or Text Files or
even write your own provider
• Each phase of query compilation and execution
is overridden to use remote data instead
QUERY BINDING
• We load metadata from OLEDB schema rowsets instead of
our own system tables
• DBSCHEMA_TABLES, _COLUMNS, _INDEXES, …
• Metadata is cached locally to avoid round trips
• OLEDB Types converted to closest SQL type
• Lossy conversions possible for non-SQL Server
• We ask for the output schema for views and sprocs by
compiling them on the remote side
• If we do, we try to cache this connection for execution
OPTIMIZATION
• General Goal: Remote Large Subtrees
• We do use statistics, indexes, and some constraint information from
remote sources
• We can work against SQL providers, Index providers, or simple table
providers
• We start with a “get all data from remote source” plan and try to find
better plans
• Startup and per-row costs for remote sources are high
• We also tweak lots and lots of rules to run differently for DQ (no trivial
plan, different join reordering, aggressive local-global agg pushdown)
• Finally, we generate lots of subtrees that remote and pick the “cheapest”
one per our cost model
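The final step, picking the cheapest remoting alternative, can be sketched with a two-term cost function built from the startup and per-row costs mentioned above. The constants, the plan descriptions, and the `RemotePlan` type are made up for illustration and do not reflect SQL Server's actual cost model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RemotePlan:
    name: str
    remote_calls: int      # number of remote commands issued
    rows_transferred: int  # rows pulled over the network

# Made-up constants: each remote call is expensive to start, each row is cheap.
STARTUP_COST = 50.0
PER_ROW_COST = 0.01

def cost(plan):
    """Total cost = startup cost per remote call + transfer cost per row."""
    return plan.remote_calls * STARTUP_COST + plan.rows_transferred * PER_ROW_COST

candidates = [
    RemotePlan("pull whole table, join locally", 1, 1_000_000),
    RemotePlan("parameterized lookup per local row", 100, 100),
    RemotePlan("remote the filter and aggregate", 1, 20),
]
best = min(candidates, key=cost)
```

With these numbers the fully remoted plan wins, and the per-row lookup beats pulling the whole table even though it pays the startup cost 100 times.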
OPTIMIZATION SEARCH
[Figure: candidate plan trees considered during the search. Each tree combines Join operators over remote tables RmtA and RmtC and table B with group-by/aggregate operators such as GB(b,c) SUM(C.d), GB(a,c) SUM(C.d), and GB(c) SUM(C.d). The tree layout is not recoverable from the extracted text.]
EXECUTION
• Mostly similar to regular OLEDB clients
• Open DB, SetCommandText, Execute, Read Rows
• Some parts are unusual
• Compile and Execute are 2 separate steps
• We have to validate the plan is still valid
• So we compare the schema at compile time vs. execute time (and recompile if needed)
• We find many provider bugs here that nobody else does :)
• Note: we can remote lock hints in remote queries
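The compile-versus-execute schema check can be sketched as follows. This assumes an in-memory SQLite database as the remote source and a hypothetical `CachedPlan` class; it only illustrates the idea of remembering the schema seen at compile time and recompiling when execution finds a different one.

```python
import sqlite3

# Hypothetical stand-in for the remote source.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE t (a INTEGER, b TEXT)")

def remote_schema(table):
    """Return ((column name, declared type), ...) for the remote table."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return tuple((row[1], row[2]) for row in remote.execute(f"PRAGMA table_info({table})"))

class CachedPlan:
    def __init__(self, table):
        self.table = table
        self.compile_count = 0
        self.compile()

    def compile(self):
        """Remember the schema the plan was built against."""
        self.schema = remote_schema(self.table)
        self.compile_count += 1

    def execute(self):
        """Re-check the schema first; recompile if it changed since compile."""
        if remote_schema(self.table) != self.schema:
            self.compile()
        columns = ", ".join(name for name, _ in self.schema)
        return remote.execute(f"SELECT {columns} FROM {self.table}").fetchall()

plan = CachedPlan("t")
plan.execute()                                      # schema unchanged, no recompile
remote.execute("ALTER TABLE t ADD COLUMN c REAL")   # remote schema changes
plan.execute()                                      # mismatch detected, recompiles
```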
READING DQ EXECUTION PLANS
1. ICommand::Execute/IOpenRowset opens each scan initially
2. We retrieve rows in batches (50-100) when possible
3. Each new NLJ scan of inner side calls IRowset::RestartPosition
4. We stop reading when we have satisfied the query requirements
(only do complete scans when necessary)
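Steps 2 and 4 (batched reads and stopping early) can be sketched with a cursor that is read in batches and closed as soon as enough rows have arrived. The in-memory SQLite database, the table, and the batch size of 100 are assumptions for the example.

```python
import sqlite3

# Hypothetical stand-in for the remote source.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE t (id INTEGER)")
remote.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

BATCH_SIZE = 100  # assumed batch size for the sketch

def first_n(n):
    """Return the first n rows, fetching in batches and stopping early."""
    cursor = remote.execute("SELECT id FROM t ORDER BY id")
    out, batches = [], 0
    while len(out) < n:
        batch = cursor.fetchmany(BATCH_SIZE)
        if not batch:          # remote scan exhausted
            break
        batches += 1
        out.extend(row[0] for row in batch)
    cursor.close()             # stop reading; the rest of the scan is never fetched
    return out[:n], batches

rows, batches = first_n(150)
```

Asking for 150 rows costs two batch fetches rather than a complete scan of the 1000-row table.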
(DISTRIBUTED) TRANSACTIONS
• Transactions ensure correctness
• Distributed Transactions require multiple databases to either commit
or abort together
• Microsoft ships a component called MSDTC that:
• Provides a common service for distributed transactions
• Works for non-database things (queues, etc.)
• Brokers between transaction protocols of different vendors
• DQ uses this component
• Not all queries require transactions, and DQ optimizes performance
by only starting a DTC transaction when necessary
• Configuring MSDTC is done on the Domain Controller by the Domain
Administrator…
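The "commit or abort together" requirement is typically met with a two-phase commit, which is the protocol MSDTC coordinates. The sketch below is a generic illustration of that protocol, not the MSDTC API; the `Participant` class and `two_phase_commit` function are hypothetical.

```python
class Participant:
    """A hypothetical resource manager (for example, one database)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        """Phase 1: vote yes (and promise to commit) or vote no."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Commit everywhere only if every participant votes yes in phase 1."""
    votes = [p.prepare() for p in participants]   # phase 1: collect all votes
    if all(votes):
        for p in participants:                    # phase 2: commit everywhere
            p.commit()
        return "committed"
    for p in participants:                        # phase 2: abort everywhere
        p.abort()
    return "aborted"

ok = [Participant("local"), Participant("remote")]
bad = [Participant("local"), Participant("remote", can_commit=False)]
ok_result = two_phase_commit(ok)
bad_result = two_phase_commit(bad)
```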
DOUBLE-HOP AUTHENTICATION
• Use Integrated Auth? Get Errors through DQ?
• This scenario happens in different places
• User->IIS->SQL Server
• User->SQL-(DQ)->SQL
• This is known as the “double hop problem”
• Don’t be afraid! It is possible to flow credentials and use your domain
identities – talk to your domain administrator to define your SPN and
permissions!
Links to read:
http://msdn.microsoft.com/en-us/library/ms189580.aspx
http://support.microsoft.com/kb/238477
DISTRIBUTED PARTITIONED VIEWS
• DPVs were an early scale-out model in DQ
• You split a table by ranges and put each on its own server (check
constraints for the ranges)
• A UNION ALL view tied them together
• DQ then did various optimizations including:
• Pruning of unneeded partitions
• Startup predicates to do dynamic pruning
• Downsides:
• Compilation time was high
• Commands not done in parallel to each server
• This feature influenced our partitioned tables design
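The two pruning styles can be sketched with plain range checks. The member servers, their ranges, and the helper names are hypothetical. Static pruning compares a literal predicate against each member's check-constraint range at compile time; a startup predicate performs the same test at execution time, once the parameter value is known.

```python
# Hypothetical members of a partitioned view: (server, low, high), inclusive ranges
# that mirror the CHECK constraints on each member table.
members = [
    ("Server1", 1, 1000),
    ("Server2", 1001, 2000),
    ("Server3", 2001, 3000),
]

def prune(lo, hi):
    """Static pruning: keep members whose range overlaps the predicate [lo, hi]."""
    return [srv for srv, low, high in members if low <= hi and lo <= high]

def startup_filters(parts):
    """Dynamic pruning: one runtime check per member, run before touching it."""
    return {srv: (lambda v, low=low, high=high: low <= v <= high)
            for srv, low, high in parts}

static = prune(1500, 1500)      # WHERE key = 1500
ranged = prune(900, 1100)       # WHERE key BETWEEN 900 AND 1100
filters = startup_filters(members)
dynamic = [srv for srv, passes in filters.items() if passes(2500)]  # WHERE key = @p
```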
TROUBLESHOOTING
• Biggest problem in DQ is “it didn’t remote”
• Various reasons:
• Some function isn’t supported by DQ
• Exotic data types (XML, CLR types)
• Correctness issues – most date functions trust only the local clock (otherwise results can
differ when you remote)
• Sometimes the costing model will be close on 2+ plan choices and a plan will “stop
remoting” (switch plans) to one that we think is similar in cost but is not
• Workarounds: In most cases, OPENQUERY() can be used to specify the
exact text you wish to remote. Think of this as plan forcing for
Distributed Query
• Also note:
• SQL Server – SQL Server remoting is much better than SQL-Other DBMS vendors
(our algebra and theirs do not always align)
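Because OPENQUERY() takes the remote text as a string literal, client code that builds such statements has to double any embedded single quotes. The helper below is a minimal sketch of that string handling; the linked server name `RemoteSrv`, the table `dbo.remotetbl`, and the helper itself are assumptions for the example.

```python
def openquery(linked_server, remote_sql):
    """Wrap remote_sql in a T-SQL OPENQUERY call, doubling embedded single quotes."""
    escaped = remote_sql.replace("'", "''")
    name = linked_server.replace("]", "]]")   # escape for a bracketed identifier
    return f"SELECT * FROM OPENQUERY([{name}], '{escaped}')"

# RemoteSrv and dbo.remotetbl are hypothetical names.
stmt = openquery(
    "RemoteSrv",
    "SELECT SUM(col1) AS total, col2 FROM dbo.remotetbl WHERE col3 > 10000 GROUP BY col2",
)
quoted = openquery("RemoteSrv", "SELECT col1 FROM dbo.remotetbl WHERE col2 = 'abc'")
```

The text inside the literal is sent to the remote server as written, so the filter and aggregate in `stmt` run remotely regardless of what the local optimizer would have chosen.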
CONCLUSION
• Thank you for your attention
• Questions?
© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
