CS 542 Database Management Systems
Query Execution
J Singh, March 21, 2011
This meeting
- Data Models for NoSQL Databases
- Preliminaries
  - What are we shooting for?
  - Reference material for benchmarks posted in blog
  - Some slides from TPC-C SIGMOD '97 presentation
- Query Execution
  - Sort: Chapter 15
  - Join: Sections 16.1 – 16.4
Data Models for NoSQL Databases
Class Discussion at Next Meeting. How would you represent many-to-many relationships? Also many-to-one and one-to-one.
- Cassandra: Brian Card
- MongoDB: Annies Ductan
- Redis: Jonathan Glumac
- Google App Engine: Sahel Mastoureshgh
- Amazon SimpleDB: Zahid Mian
- CouchDB: Robert Van Reenen
3-minute presentation (on 3/21) for 20 bonus points.
What are we shooting for?
Good benchmarks:
- Define the playing field
- Set the performance agenda
  - Measure release-to-release progress
  - Set goals (e.g., 10,000 tpmC, < 50 $/tpmC)
- Something managers can understand (!)
Benchmark abuse:
- Benchmarketing
- Benchmark wars: more $ on ads than development
To keep abuses to a minimum, benchmarks are defined with precision and read like legal documents (example). Some companies include specific prohibitions against publishing benchmark results in their license agreements.
Benchmarks have a Lifetime
- Good benchmarks drive industry and technology forward.
- At some point, all reasonable advances have been made.
- Benchmarks can become counterproductive by encouraging artificial optimizations.
- So, even good benchmarks become obsolete over time.
Database Benchmarks
Relational Database (OLTP) Benchmarks:
- TPC = Transaction Processing Performance Council
  - De facto industry standards body for OLTP performance
  - Most TPC specs, info, and results are on the web page: http://www.tpc.org
- TPC-C has been the workhorse of the industry; more in a minute
- TPC-E is more comprehensive
Different problem spaces require different benchmarks:
- Other benchmarks for analytics / decision support systems
- Two papers referenced on the course website on NoSQL / MapReduce
Benchmarks define the problem set, not the technology. E.g., if managing documents, create and use a document management benchmark, not one that was created to show off the capabilities of your DB.
TPC-C's Five Transactions
Workload definition: transactions operate against a database of nine tables.
Transactions:
- New-order: enter a new order from a customer
- Payment: update customer balance to reflect a payment
- Delivery: deliver orders (done as a batch transaction)
- Order-status: retrieve status of customer's most recent order
- Stock-level: monitor warehouse inventory
The spec also:
- Specifies the size of each table
- Specifies # of users and workflow (next slide)
- Specifies configuration requirements: must be ACID, failure tolerant, distributed, …
- Response time requirement: 90% of each type of transaction must have a response time <= 5 seconds, except stock-level, which is <= 20 seconds.
Result:
- How many TPC-C transactions can be supported?
- What is the $/tpm cost?
TPC-C Workflow
1. Select a txn from the menu:
   New-Order 45%, Payment 43%, Order-Status 4%, Delivery 4%, Stock-Level 4%
2. Measure menu Response Time. Input screen; keying time.
3. Measure txn Response Time. Output screen; think time. Go back to 1.
Cycle Time Decomposition (typical values, in seconds, for the weighted-average txn):
Menu = 0.3, Keying = 9.6, Txn RT = 2.1, Think = 11.4
Average cycle time = 23.4
TPC-C Results (by DBMS, as of 5/9/97)
Stating the obvious…
- These results are not a comparison of databases.
- They are a comparison of databases for the specific problem specified by the TPC-C benchmark.
- Ensuring a level playing field is essential when defining a benchmark and conducting measurements. Witness the Pavlo/Dean debate.
Benchmarks for Other Databases
Class Discussion at Next Meeting. What benchmarks are appropriate for:
- Key-value stores?
- Document databases?
- Network databases?
- Geospatial databases?
- Genomic databases?
- Time series databases?
- Other?
General discussion, no bonus points. Please let me know if I may call on you, and for which.
Overview of Query Execution
An example to work with
But first we must revisit Relational Algebra…
Database: City, Country, CountryLanguage.
Example query: all cities in Finland with a population at least double that of Aruba.
SELECT [xyz]
FROM City, Country
WHERE City.CountryCode = 'fin' AND
      Country.Code = 'abw' AND
      City.population > 2*Country.population;
Relational Operators
- Selection basics: idempotent, commutative
- Selection conjunctions: useful when pruning
- Selection disjunctions: equivalent to UNIONs
Selection and Cross Product
When a selection is followed by a cross product, σ_A(R × S), break A into three conditions such that A = a_r ∧ a_s ∧ a_rs, where:
- a_r has only attributes of R
- a_s has only attributes of S
- a_rs has attributes of both R and S
Then the following holds:
σ_A(R × S) = σ_{a_r ∧ a_s ∧ a_rs}(R × S) = σ_{a_rs}(σ_{a_r}(R) × σ_{a_s}(S))
In case you forgot: R ⋈_A S = σ_A(R × S). This result helps us compute theta-joins!
Review Chapter 2 of the textbook for more; back to the example…
An example to work with
Database: City, Country, CountryLanguage.
Example query: all cities in Finland with a population at least double that of Aruba.
SELECT [xyz]
FROM City, Country
WHERE City.CountryCode = 'fin' AND
      Country.Code = 'abw' AND
      City.population > 2*Country.population;
Algebra representation (T = City, Y = Country):
π_{xyz}(σ_{T.cc = 'fin' ∧ Y.cc = 'abw' ∧ T.pop > 2*Y.pop}(T × Y)), or
continued…
Example: Algebra Manipulation
Algebra representation:
π_{xyz}(σ_{T.cc = 'fin' ∧ Y.cc = 'abw' ∧ T.pop > 2*Y.pop}(T × Y)), or
π_{xyz}(σ_{T.pop > 2*Y.pop}(σ_{T.cc = 'fin'}(T) × σ_{Y.cc = 'abw'}(Y)))
(Graphical representation of the plan: figure omitted.)
Visualizing Plan Execution
- The plan is a set of 'operators'.
- The operators operate in parallel. On different machines? On different processors? In different processes? In different threads? Yes, depending on the architecture.
- Each operator feeds its output to the next operator.
- The "parallel operators" visualization allows for pipelining: the output of one operator is the input to the next. An operator can block if its inputs are not ready.
- The design goal is for the operators to pipeline (if possible): we would like to start operating with partial data, taking advantage of as much parallelism as the problem allows.
Common Elements
Key metrics of each component:
- How much RAM does it consume?
- How much disk I/O does it require?
Each component is implemented as an Iterator. The base class for each operator has three methods (see the sketch below):
- Open(). May block if input is not ready, or if it cannot proceed until all input has been received.
- GetNext(). Returns the next tuple. May block if the next tuple is not ready; returns NotFound when exhausted.
- Close(). Performs any cleanup and terminates.
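To make the interface concrete, here is a minimal Python sketch of the iterator protocol the slide describes. The method names follow the slide (Open/GetNext/Close); the NotFound sentinel object and the base-class shape are illustrative assumptions, not part of the original deck.

  # Sentinel returned by get_next() when the operator is exhausted.
  NotFound = object()

  class Operator:
      """Base class for a plan operator (the slide's Open/GetNext/Close)."""
      def open(self):
          """Prepare inputs; may block until input is ready."""
          raise NotImplementedError

      def get_next(self):
          """Return the next tuple, or NotFound when exhausted."""
          raise NotImplementedError

      def close(self):
          """Release any resources held by the operator."""
          raise NotImplementedError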
Example: Table-scan operator
Open():
    pass
GetNext():
    for b in blocks:
        for t in tuples of b:
            if t is valid: return t
    return NotFound
Close():
    pass
Key metrics:
- RAM: 1 block
- Disk I/O: number of blocks
Notes:
- Represents the operations T (= City) and Y (= Country)
- Used only if appropriate indexes don't exist
- Can use prefetching (not shown here)
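A runnable version of the same operator, building on the Operator base class sketched above. Treating None entries as invalid tuples and using a generator to hold the scan position are my own assumptions for illustration.

  class TableScan(Operator):
      """Table-scan: streams valid tuples block by block (RAM: 1 block)."""
      def __init__(self, blocks):
          self.blocks = blocks  # list of blocks; each block is a list of tuples

      def open(self):
          # The generator frame stands in for the one-block buffer.
          self._stream = (t for b in self.blocks for t in b if t is not None)

      def get_next(self):
          return next(self._stream, NotFound)

      def close(self):
          self._stream = None

  # Usage: pull tuples until NotFound, the iterator convention above.
  scan = TableScan([[("Helsinki",), None], [("Espoo",)]])
  scan.open()
  t = scan.get_next()
  while t is not NotFound:
      print(t)
      t = scan.get_next()
  scan.close()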
Summary so far
- Benchmarks are critical for defining the performance goals of the database.
  - TPC-C is a widely used benchmark; TPC-E is broader in scope but less widespread.
  - Need to choose benchmarks to fit the problem at hand.
- A query can be parsed into primitives for execution.
- Parallelism & pipelining are essential for performance.
CS-542 Database Management Systems
Query Execution Algorithms
One-pass Algorithms
- Lend themselves nicely to pipelining (with minimum blocking)
- Good for:
  - Table-scans (as seen)
  - Tuple-at-a-time operations (selection and projection)
  - Full-relation binary operations (∪, ∩, −, ⋈, ×) as long as one of the operands can fit in memory
- Considering JOIN next; read the others from the book
Example: JOIN(R,S)
Open():
    read S into memory
GetNext():
    for b in blocks of R:
        for t in tuples of b:
            for s in tuples of S:
                if t matches s: return join(t,s)
    return NotFound
Close():
    pass
Key metrics (see the sketch below):
- RAM: Blocks(S) + 1 block
- Disk I/O: Blocks(R) + Blocks(S)
Notes:
- Can use prefetching for R (not shown here)
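A runnable sketch of this one-pass join in Python, under the assumption that tuples are dicts and the join is a natural join on a single attribute. The hash index built over the in-memory copy of S is an implementation choice for the "t matches s" test, not something the slide prescribes.

  from collections import defaultdict

  def one_pass_join(R, S, attr):
      """One-pass join: S fits in memory; R is streamed past it.
      R and S are iterables of dicts; attr is the join attribute."""
      # Open(): read S into memory, indexed by the join attribute.
      index = defaultdict(list)
      for s in S:
          index[s[attr]].append(s)
      # GetNext() loop: stream R, probing the in-memory index.
      for t in R:
          for s in index.get(t[attr], []):
              yield {**t, **s}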
Nested-Loop Joins
What if all of S won't fit into memory? We can do it chunk by chunk; a 'chunk' is as many blocks of S as will fit.
Algorithm sketch (the chunk and block reads are the I/O operations):
GetNext():
    for c in chunks of S:
        for b in blocks of R:
            for t in tuples of b:
                for s in tuples of c:
                    if t matches s: return join(t,s)
    return NotFound
Key metrics:
- RAM: M
- Disk I/O: Blocks(S) + k * Blocks(R), where k = the number of chunks ≈ Blocks(S)/M
Note how quickly performance deteriorates! We can do better.
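A Python sketch of the chunked loop, assuming relations arrive as lists of blocks (lists of dict tuples), M >= 2 buffers are available, and M-1 of them hold the current chunk of S while one holds the current block of R. The block/chunk representation is illustrative only.

  def nested_loop_join(R_blocks, S_blocks, attr, M):
      """Chunked nested-loop join: for each (M-1)-block chunk of S,
      rescan all of R. Cost: Blocks(S) + (#chunks) * Blocks(R) reads."""
      chunk_size = M - 1  # one buffer is reserved for the current R block
      for i in range(0, len(S_blocks), chunk_size):
          # Load the next chunk of S into memory.
          chunk = [s for blk in S_blocks[i:i + chunk_size] for s in blk]
          for b in R_blocks:          # one block read per R block, per chunk
              for t in b:
                  for s in chunk:
                      if t[attr] == s[attr]:
                          yield {**t, **s}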
Two-pass algorithms
- Sort-based two-pass algorithms: the first pass does a sort on some parameter(s) of each operand; the second pass relies on the sort results and can be pipelined.
- Hash-based two-pass algorithms: do a prep pass and write the result back to disk; compute the result in the second pass.
Two-pass idea: sort example
- For each of C chunks of M blocks, sort each chunk and write it back. In the example, we have 4 chunks, each 6 blocks.
- Merge the result.
Key metrics:
- For the first pass: RAM: M; Disk I/O: 2 * Blocks(R)
- For the 2nd pass: RAM: C; Disk I/O: Blocks(R)
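A compact Python sketch of the two-pass idea, assuming the relation fits in C*M blocks so a single merge pass suffices. Blocks are simulated as lists, and heapq.merge plays the role of the C-way merge that needs one in-memory block per run.

  import heapq

  def two_pass_sort(blocks, M, key=None):
      """External sort sketch: sort M-block chunks (pass 1),
      then C-way merge the sorted runs (pass 2)."""
      # Pass 1: sort each chunk of M blocks and "write it back".
      runs = []
      for i in range(0, len(blocks), M):
          chunk = [t for blk in blocks[i:i + M] for t in blk]
          runs.append(sorted(chunk, key=key))  # one sorted run per chunk
      # Pass 2: merge the C runs; RAM needed is one block per run.
      return heapq.merge(*runs, key=key)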
Naïve two-pass JOIN
- Sort R and S on the common attributes of the JOIN.
- Merge the sorted R and S on the common attributes. See section 15.4.9 of the book for more details.
- Also known as Sort-Join.
Key metrics:
- Sort: RAM: M; Disk I/O: 4 * (Blocks(R) + Blocks(S)) (4, not 3, because we wrote the sort results back)
- Join: RAM: 2; Disk I/O: Blocks(R) + Blocks(S)
- Total operation: RAM: M; Disk I/O: 5 * (Blocks(R) + Blocks(S))
Efficient two-pass JOIN
Main idea: combine pass 2 of the sort with the join.
Key metrics:
- Sort (only pass 1): RAM: M; Disk I/O: 2 * (Blocks(R) + Blocks(S))
- Join: RAM: 2; Disk I/O: Blocks(R) + Blocks(S) (none additional)
- Total operation: RAM: M; Disk I/O: 3 * (Blocks(R) + Blocks(S))
Hash Join
Main idea (see the sketch below):
Pass 1: divide the tuples in R and S into m hash buckets.
- Read a block of R (or S).
- For each tuple in that block, find its hash i and move it to hash bucket i.
- Keep one block for each hash bucket in memory; write it out to disk when full.
Pass 2: for each i, read buckets Ri and Si and do their join.
Key metrics:
- RAM: M
- Disk I/O: 3 * (Blocks(R) + Blocks(S))
Disk I/O can be less if:
- Hash the bigger relation first
- Expect that many of the buckets will still be in memory
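A minimal Python sketch of the two-pass hash join, with partitions kept as in-memory lists standing in for on-disk bucket files. The bucket count m and the dict-tuple representation are assumptions for illustration.

  def hash_join(R, S, attr, m):
      """Two-pass (partition) hash join sketch.
      Pass 1: partition R and S into m buckets by hash of the join key.
      Pass 2: join each matching bucket pair with a one-pass join."""
      def partition(rel):
          buckets = [[] for _ in range(m)]
          for t in rel:                      # in a real system: spill to disk
              buckets[hash(t[attr]) % m].append(t)
          return buckets

      R_buckets, S_buckets = partition(R), partition(S)
      for Ri, Si in zip(R_buckets, S_buckets):
          # Build an in-memory index on one side of the bucket pair...
          index = {}
          for s in Si:
              index.setdefault(s[attr], []).append(s)
          # ...and probe it with the other side.
          for t in Ri:
              for s in index.get(t[attr], []):
                  yield {**t, **s}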
Index-based Algorithms
Refresher course on indexes and clustering.
The basic idea: use the index to locate records and thus cut down on I/O.
Index-based Selection
Consider the selection σ_{T.cc = 'fin'}(T).
If the relation T has a clustering index on cc:
- All matching tuples will be contiguous.
- Disk I/O: Blocks(T)/V(T,cc), where V(T,cc) is the number of distinct values of cc, so this estimates the blocks holding cc = 'fin'. Sort of…
If the relation T does not have a clustering index on cc:
- Matching tuples could be scattered.
- Disk I/O: Tuples(T)/V(T,cc)
Big difference!
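To make the "big difference" concrete, a back-of-the-envelope sketch; all the numbers here are invented for illustration only.

  # Hypothetical statistics for T, chosen only to show the gap.
  tuples = 1_000_000                    # Tuples(T)
  tuples_per_block = 100
  blocks = tuples // tuples_per_block   # Blocks(T) = 10,000
  distinct_cc = 200                     # V(T, cc)

  clustered_io = blocks / distinct_cc   # 50 block reads (contiguous)
  unclustered_io = tuples / distinct_cc # 5,000 reads, one per matching tuple
  print(clustered_io, unclustered_io)   # 50.0 vs 5000.0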
Index-based JOIN
Consider the JOIN R(X,Y) ⋈ S(Y,Z), where Y is the common set of attributes of R and S.
If, say, R has an index on Y:
- Same as a two-pass JOIN, except that we don't have to first sort/hash R.
- If a clustering index, Disk I/O: Blocks(R)/V(R,Y) + 3 * Blocks(S)
- Otherwise, Disk I/O: Tuples(R)/V(R,Y) + 3 * Blocks(S)
If both R and S are indexed, Disk I/O is reduced even further.
Summary
- Execution primitives are designed for pipelining.
- One-pass algorithms should be used wherever possible.
- Two-pass algorithms can usually be used no matter how big the problem.
- Indexes help and should be taken advantage of where possible.
Query Optimization
Based on slides from Prof. Garcia-Molina
Desired Endpoint
Example physical query plans for σ_{x=1 AND y=2 AND z<5}(R) and R ⋈ S ⋈ U.
(Figure omitted: plan trees built from IndexScan(R, y=2), Filter(x=1 AND z<5), TableScan(R), TableScan(S), TableScan(U), a materialize step, and two-pass hash-joins using 101 buffers.)
Outline
- Convert SQL query to a parse tree
  - Semantic checking: attributes, relation names, types
- Convert to a logical query plan (relational algebra expression)
  - Deal with subqueries
- Improve the logical query plan
  - Use algebraic transformations
  - Group together certain operators
  - Evaluate the logical plan based on estimated size of relations
- Convert to a physical query plan
  - Search the space of physical plans
  - Choose the order of operations
  - Complete the physical query plan
Improving the Logical Query Plan
- There are numerous algebraic laws concerning relational algebra operations.
- By applying them to a logical query plan judiciously, we can get an equivalent query plan that can be executed more efficiently.
- Next we'll survey some of these laws.
Relational Operators (revisited)
- Selection basics: idempotent, commutative
- Selection conjunctions: useful when pruning
- Selection disjunctions: equivalent to UNIONs
Laws Involving Selection
- Selections usually reduce the size of the relation.
- Usually good to do selections early, i.e., "push them down the tree".
- It can also be helpful to break up a complex selection into parts.
Selection and Binary Operators
- Must push selection to both arguments:
  σ_C(R ∪ S) = σ_C(R) ∪ σ_C(S)
- Must push to the first argument, optional for the second:
  σ_C(R − S) = σ_C(R) − S
  σ_C(R − S) = σ_C(R) − σ_C(S)
- Push to at least one argument having all attributes mentioned in C: product, natural join, theta join, intersection.
  E.g., σ_C(R × S) = σ_C(R) × S, if R has all the attributes in C.
Pushing Selection Up the Tree
Suppose we have relations
StarsIn(title, year, starName)
Movie(title, year, len, inColor, studioName)
and a view
CREATE VIEW MoviesOf1996 AS
    SELECT *
    FROM Movie
    WHERE year = 1996;
and the query
SELECT starName, studioName
FROM MoviesOf1996 NATURAL JOIN StarsIn;
The Straightforward Tree
Remember the rule σ_C(R ⋈ S) = σ_C(R) ⋈ S?
Tree: π_{starName,studioName} over (σ_{year=1996}(Movie) ⋈ StarsIn).
The Improved Logical Query Plan
- Start: π_{starName,studioName} over (σ_{year=1996}(Movie) ⋈ StarsIn)
- Push the selection up the tree: π_{starName,studioName} over σ_{year=1996}(Movie ⋈ StarsIn)
- Then push the selection down the tree to both arguments: π_{starName,studioName} over (σ_{year=1996}(Movie) ⋈ σ_{year=1996}(StarsIn))
Laws Involving Projections
- Adding a projection lower in the tree can improve performance, since tuple size is often reduced.
- Usually not as helpful as pushing selections down.
- Consult the textbook for details; will not be on the exam.
Joins and Products
Recall from the definitions of relational algebra:
R ⋈_C S = σ_C(R × S) (theta join), where C equates same-name attributes in R and S.
To improve a logical query plan, replace a product followed by a selection with a join. Join algorithms are usually faster than doing a product followed by a selection.
Summary of LQP Improvements
- Selections:
  - Push down the tree as far as possible.
  - If the condition is an AND, split and push separately.
  - Sometimes need to push up before pushing down.
- Projections: can be pushed down (sometimes; read the book).
- Selection/product combinations: can sometimes be replaced with a join.
Outline
- Convert SQL query to a parse tree
  - Semantic checking: attributes, relation names, types
- Convert to a logical query plan (relational algebra expression)
  - Deal with subqueries
- Improve the logical query plan
  - Use algebraic transformations
  - Group together certain operators
  - Evaluate the logical plan based on estimated size of relations
- Convert to a physical query plan
  - Search the space of physical plans
  - Choose the order of operations
  - Complete the physical query plan
Grouping Assoc/Comm Operators
- Group together adjacent joins, adjacent unions, and adjacent intersections as siblings in the tree.
- This sets up the logical QP for future optimization when the physical QP is constructed: determine the best order for doing a sequence of joins (or unions or intersections).
(Figure omitted: a cascade of binary unions over A, B, C, D, E, F regrouped as a single union node with all six children.)
Evaluating Logical Query Plans
- The transformations discussed so far intuitively seem like good ideas.
- But how can we evaluate them more scientifically?
- Estimate the size of relations; also helpful in evaluating physical query plans.
- Coming up next…
CS-542 Database Management Systems
Plan Estimation, based on slides from Prof. Garcia-Molina
Estimating Sizes of Relations
Used in two places:
- To help decide between competing logical query plans
- To help decide between competing physical query plans
Notation review:
- T(R): number of tuples in relation R
- B(R): minimum number of blocks needed to store R (so far, we've spelled it out as Blocks(R))
- V(R,a): number of distinct values in R of attribute a
Requirements for Estimation Rules
- Give accurate estimates
- Are easy (fast) to compute
- Are logically consistent: the estimated size should not depend on how the relation is computed
Here we describe some simple heuristics. All we really need is a scheme that properly ranks competing plans.
Estimating Size of Selection (p1)
Suppose the selection condition is A = c, where A is an attribute and c is a constant.
A reasonable estimate of the number of tuples in the result is T(R)/V(R,A), i.e., the original number of tuples divided by the number of different values of A.
- A good approximation if values of A are evenly distributed
- Also a good approximation in some other, common, situations (see textbook)
Estimating Size of Selection (p2)
- If the condition is A < c: a good estimate is T(R)/3; the intuition is that usually you ask about something that is true of less than half the tuples.
- If the condition is A ≠ c: a good estimate is T(R).
- If the condition is the AND of several equalities and inequalities, estimate in series.
Example
Consider relation R(a,b,c) with 10,000 tuples and 50 different values for attribute a.
Consider selecting all tuples from R with a = 10 and b < 20.
The estimate of the number of resulting tuples is 10,000 * (1/50) * (1/3) ≈ 67.
Estimating Size of Selection (p3)
If the condition has the form C1 OR C2, use:
- the sum of the estimates for C1 and C2, unless that sum exceeds T(R), in which case cap it at T(R); or
- assuming C1 and C2 are independent, T(R) * (1 - (1 - f1) * (1 - f2)), where f1 is the fraction of R satisfying C1 and f2 is the fraction of R satisfying C2.
Example
Consider relation R(a,b) with 10,000 tuples and 50 different values for a.
Consider selecting all tuples from R with a = 10 or b < 20.
Estimates:
- Estimate for a = 10 is 10,000/50 = 200
- Estimate for b < 20 is 10,000/3 = 3333
- Estimate for the combined condition is 200 + 3333 = 3533, or 10,000 * (1 - (1 - 1/50) * (1 - 1/3)) = 3466
Different, but not really.
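These rules are easy to put into code. The following Python sketch reproduces the slide's numbers; the rule constants (1/3 for a range predicate) come straight from the slides, while the function names are my own.

  def est_eq(T, V):        # A = c  ->  T(R)/V(R,A)
      return T / V

  def est_lt(T):           # A < c  ->  T(R)/3
      return T / 3

  def est_or(T, e1, e2):   # C1 OR C2, independence formula
      f1, f2 = e1 / T, e2 / T
      return T * (1 - (1 - f1) * (1 - f2))

  T = 10_000
  print(est_eq(T, 50))                        # 200.0
  print(est_lt(T))                            # 3333.3...
  print(min(est_eq(T, 50) + est_lt(T), T))    # capped sum: 3533.3...
  print(est_or(T, est_eq(T, 50), est_lt(T)))  # 3466.6...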
Estimating Size of Natural Join
Assume the join is on a single attribute Y. Some possibilities:
- R and S have disjoint sets of Y values, so the size of the join is 0.
- Y is the key of S and a foreign key of R, so the size of the join is T(R).
- All the tuples of R and S have the same Y value, so the size of the join is T(R) * T(S).
We need some assumptions…
Join Estimation Rule
Expected number of tuples in the result: T(R) * T(S) / max(V(R,Y), V(S,Y))
Why? Suppose V(R,Y) ≤ V(S,Y). There are T(R) tuples in R. Each of them has a 1/V(S,Y) chance of joining with a given tuple of S, creating T(S)/V(S,Y) new tuples.
Example
Suppose we have:
- R(a,b) with T(R) = 1000 and V(R,b) = 20
- S(b,c) with T(S) = 2000, V(S,b) = 50, and V(S,c) = 100
- U(c,d) with T(U) = 5000 and V(U,c) = 500
What is the estimated size of R ⋈ S ⋈ U?
- First join R and S (on attribute b): the estimated size of the result, X, is T(R)*T(S)/max(V(R,b),V(S,b)) = 40,000. The number of values of c in X is the same as in S, namely 100.
- Then join X with U (on attribute c): the estimated size of the result is T(X)*T(U)/max(V(X,c),V(U,c)) = 400,000.
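A few lines of Python reproduce the chain of estimates. Passing the statistics as bare numbers is my own bookkeeping, not the slides'.

  def est_join(Tr, Vr, Ts, Vs):
      """Size estimate for a join on one attribute Y:
      T(R)*T(S) / max(V(R,Y), V(S,Y))."""
      return Tr * Ts / max(Vr, Vs)

  # R(a,b), S(b,c), U(c,d) with the slide's statistics
  Tx = est_join(1000, 20, 2000, 50)      # R join S on b -> 40,000
  Vx_c = 100                             # X inherits V(S,c) = 100
  print(est_join(Tx, Vx_c, 5000, 500))   # X join U on c -> 400,000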
Summary of Estimation Rules
- Projection: exactly computable
- Product: exactly computable
- Selection: reasonable heuristics
- Join: reasonable heuristics
The other operators are harder to estimate…
Estimating Size Parameters
- Estimating the size of a relation depended on knowing T(R) and the V(R,a)'s.
- Estimating the cost of a physical algorithm also depends on knowing B(R).
How can the query compiler learn them?
- Scan the relation to learn T and the V's, then calculate B.
- Can also keep a histogram of the values of attributes; this makes estimating join results more accurate.
- Recompute periodically: after some time or some number of updates, or if the DB administrator thinks the optimizer isn't choosing good plans.
Heuristics to Reduce Cost of LQP
- For each transformation of the tree being considered, estimate the "cost" before and after doing the transformation.
- At this point, "cost" only refers to the sizes of intermediate relations (we don't yet know the number of disk I/Os).
- The sum of the sizes of all intermediate relations is the heuristic: if this sum is smaller after the transformation, then incorporate it.
Why couldn't we… A few questions to explore
- NoSQL has also been described as NoJOIN. Could we use the techniques discussed here to implement JOINs on a NoSQL database?
- Could we implement the parallel operators as MapReduce jobs?
Suitable topics in case you have not yet chosen a project.
Update on Projects
- Consider including benchmark results in your presentation.
- There is no need to submit your code.
  - Key fragments can be included in your report, as seen in numerous papers.
  - Do include the design of the code in your report.
  - Do not submit code; it will not be evaluated.
- Pace yourself:
  - Plan to finish your project coding in 2 weeks (by 4/4).
  - Plan to write and perfect your report and PPT after that.
  - Budget your presentation time carefully.
How is it going?
Next week
- Query Optimization
- Suggested topic? We have half a lecture open to cover any topics of interest to everyone.
