Blazing fast queries: when indexes are not enough
Davide Mauri | dm@sensoriainc.com | @mauridb
Director SW Engineering & Lead data architect
Sensoria Inc.
About Me
Microsoft Data Platform MVP since 2006
Working with the Microsoft Data Platform since 1997
• Specialized in Data Solution Architecture, Database Design, Performance Tuning,
High-Performance Data Warehousing, BI, Big Data
• Very strong developer background
Loves this community!
• President of UGISS (Italian SQL Server UG) for 11 Years
• AppDev VG Leader since 2017
Regular Speaker @ Microsoft Data Platform events
Director SW & Cloud @ Sensoria: http://www.sensoriafitness.com/
E-mail: dm@sensoriainc.com | Twitter: @mauridb
Blog: http://sqlblog.com/blogs/davide_mauri/default.aspx
About this session
Discuss and demonstrate alternative ways of solving everyday problems, so that you
will have:
An easier, more maintainable, more understandable code base
Better performance
Better scalability
All without using indexes… because maybe you have already used them and performance
is still not good!
It will “just” require a mind shift.
Free Space Problem
Given a group of elements of defined size,
find where there is enough free space to
make sure that all elements will be stored
near to each other
This sample can be applied to
warehouses, cinemas, stadiums, and so
on…
Free Spaces – Most common solution
Free seats in the example: 3 4 5 6 7 9
A. Set the “Free Space Counter” (FSC) to 1
B. Start from the first free seat (n) and store this number as the
First Free Space (FFS)
C. If the next free space is n+1, increase the FSC
D. If the next free space is not n+1, store the FSC somewhere along with the FFS
E. Restart from point A
1 2 5 1
Free Spaces
Common solution demo
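The row-by-row steps above can be sketched as follows. This is a minimal illustration in Python, not the code shown in the demo; the function name `find_free_ranges` and the plain list of free seat numbers are assumptions made for the example.

```python
def find_free_ranges(free_seats):
    """Walk the free seats one at a time and return (FFS, FSC) pairs."""
    ranges = []
    ffs = None       # First Free Space of the run being scanned
    fsc = 0          # Free Space Counter of the run being scanned
    previous = None  # last free seat visited
    for seat in sorted(free_seats):
        if previous is not None and seat == previous + 1:
            # Step C: the seat is adjacent to the previous one, so the run grows
            fsc += 1
        else:
            # Step D: the run is broken, so store the finished run (if any)
            if ffs is not None:
                ranges.append((ffs, fsc))
            # Steps A and B: start a new run from this seat
            ffs, fsc = seat, 1
        previous = seat
    if ffs is not None:
        ranges.append((ffs, fsc))  # do not forget the last run
    return ranges


print(find_free_ranges([3, 4, 5, 6, 7, 9]))  # [(3, 5), (9, 1)]
```

Even in this small form the loop needs three pieces of state and a special case for the last run, which hints at why the procedural version tends to grow more complex than expected.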
Considerations
Is this solution the best?
…or just the first one that came to mind?
Is it really *that* easy?
Not really: the code is much more complex than
one would first think
Is it scalable?
Let’s say we have a warehouse with a million small
spaces. Will it perform well?
Lateral Thinking
Let’s try to solve the problem with some creativity, using the so-called “Lateral
Thinking”:
Lateral thinking is solving problems through an indirect and creative approach, using
reasoning that is not immediately obvious and involving ideas that may not be
obtainable by using only traditional step-by-step logic
https://en.wikipedia.org/wiki/Lateral_thinking
First steps in lateral thinking
A. If SQL Server could see the picture above, it would be able to see that
there are groups of free spaces
Then we could simply use “GROUP BY”
B. This means that there is some hidden information in the picture
and we have to figure out how to turn it into something an
RDBMS can handle
First steps in lateral thinking
A. Let’s enumerate all the seats (free and occupied)
B. And then enumerate all the free seats
C. Do a simple difference….
D. And now we can use a GROUP BY
All seats:            1 2 3 4 5 6 7 8 9 10…
Free seats, numbered: 1 2 3 4 5 6 7 …
Difference:           2 2 2 2 2 3 3
Free Spaces
Lateral Thinking Solution Demo
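The enumerate, subtract, and group idea can be tried with a short sketch. The talk targets SQL Server; here the query runs on SQLite through Python's `sqlite3` module purely so the example is self-contained (window functions need SQLite 3.25 or later). The `seats` table, its columns, and the sample data are assumptions for illustration, not the demo's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: seat_id already enumerates all seats (step A)
conn.execute(
    "CREATE TABLE seats (seat_id INTEGER PRIMARY KEY, is_free INTEGER NOT NULL)"
)
free = {3, 4, 5, 6, 7, 9, 10}
conn.executemany(
    "INSERT INTO seats VALUES (?, ?)",
    [(n, int(n in free)) for n in range(1, 11)],
)

QUERY = """
WITH numbered AS (
    SELECT seat_id,
           -- step B numbers the free seats, step C takes the difference
           seat_id - ROW_NUMBER() OVER (ORDER BY seat_id) AS grp
    FROM seats
    WHERE is_free = 1
)
-- step D: adjacent free seats share the same difference, so GROUP BY works
SELECT MIN(seat_id) AS first_free_seat,
       COUNT(*)     AS free_seats
FROM numbered
GROUP BY grp
ORDER BY first_free_seat
"""

free_ranges = conn.execute(QUERY).fetchall()
print(free_ranges)  # [(3, 5), (9, 2)]
```

The difference stays constant while free seats are adjacent and jumps whenever occupied seats are skipped, which is the hidden information the picture carried.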
Lateral thinking results
Code is so much simpler!
Performance and scalability are vastly improved
Lateral thinking
Seems hard?
Well, at the beginning everything is hard, right? Remember when you were a kid?
Easy:
A little less easy:
But it is quite clear who’s going to win in a competition, right?
The only way to improve is to exercise!
Let’s start! 1st problem: Loans
Knowing the value of the requested loan, the interest rate, and the number of rates
(installments), generate a row for each rate, from the beginning until the end of the loan
Loan Value    # Rates  Payment Frequency  % Interest
€ 10.000,00   12       1                  5,00%
€ 20.000,00   12       6                  6,00%
€ 30.000,00   12       6                  5,50%
Rate Date
€ 875,00 January-20
€ 875,00 February-20
€ 875,00 March-20
€ 875,00 April-20
€ 875,00 May-20
€ 875,00 June-20
€ 875,00 July-20
€ 875,00 August-20
€ 875,00 September-20
€ 875,00 October-20
€ 875,00 November-20
€ 875,00 December-20
Loans
Demo
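One set-based way to produce a row per rate is to join each loan to a table of numbers with a non-equality condition. The sketch below runs on SQLite through Python's `sqlite3` module rather than SQL Server, and it is not the demo code. The table and column names are hypothetical, the formula `loan_value * (1 + interest) / rates` is an assumption inferred from the € 875,00 sample rows, and reading "Payment Frequency" as months between payments is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE loans (
    loan_id          INTEGER PRIMARY KEY,
    loan_value       REAL,
    rates            INTEGER,  -- number of installments
    frequency_months INTEGER,  -- assumed meaning of "Payment Frequency"
    interest         REAL
);
INSERT INTO loans VALUES (1, 10000, 12, 1, 0.05),
                         (2, 20000, 12, 6, 0.06),
                         (3, 30000, 12, 6, 0.055);
""")

QUERY = """
WITH RECURSIVE numbers(n) AS (
    -- small numbers table: 1..100
    SELECT 1 UNION ALL SELECT n + 1 FROM numbers WHERE n < 100
)
SELECT l.loan_id,
       nb.n AS rate_number,
       ROUND(l.loan_value * (1 + l.interest) / l.rates, 2) AS rate_amount,
       (nb.n - 1) * l.frequency_months AS months_from_start
FROM loans AS l
JOIN numbers AS nb
  ON nb.n <= l.rates        -- the join uses "<=", not "="
ORDER BY l.loan_id, nb.n
"""

rows = conn.execute(QUERY).fetchall()
for row in rows[:3]:
    print(row)
```

The `<=` join condition expands each loan into as many rows as it has rates, with no loop or cursor involved.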
Groups and Correlated Values
Given a list of machine statuses, tell the latest reported status of each machine
Controller
A
B
C
Controller Status TimeStamp
A StandBy 1001
B StandBy 1001
C StandBy 1003
D Working 1004
A Working 1004
B StandBy 1006
D StandBy 1009
D Working 1010
D Working 1011
Groups and Correlated Values
Demo
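A ranking function is one way to pick the latest row of each group. The following is a sketch under stated assumptions: it runs on SQLite via Python's `sqlite3` module instead of SQL Server, and the `machine_status` table and its column names are hypothetical. The sample rows are the ones listed on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE machine_status (controller TEXT, status TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO machine_status VALUES (?, ?, ?)",
    [
        ("A", "StandBy", 1001), ("B", "StandBy", 1001), ("C", "StandBy", 1003),
        ("D", "Working", 1004), ("A", "Working", 1004), ("B", "StandBy", 1006),
        ("D", "StandBy", 1009), ("D", "Working", 1010), ("D", "Working", 1011),
    ],
)

QUERY = """
WITH ranked AS (
    SELECT controller, status, ts,
           -- rank rows inside each controller, newest first
           ROW_NUMBER() OVER (PARTITION BY controller ORDER BY ts DESC) AS rn
    FROM machine_status
)
SELECT controller, status, ts
FROM ranked
WHERE rn = 1
ORDER BY controller
"""

latest = conn.execute(QUERY).fetchall()
print(latest)
# [('A', 'Working', 1004), ('B', 'StandBy', 1006),
#  ('C', 'StandBy', 1003), ('D', 'Working', 1011)]
```

The status travels with the row that holds the maximum timestamp, so no self-join back to the table is needed to fetch the correlated value.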
Stocks
Given a table with stock transactions get the High, Low, Open, Close & Volume values
for each stock, each hour, each day
SymbolID TransactionDateTime Price Volume
1 2008-01-01 09:15:21.000 75.800 2589
1 2008-01-01 09:25:44.000 68.200 4386
1 2008-01-01 09:29:31.000 74.300 2837
1 2008-01-01 09:34:42.000 68.900 2937
1 2008-01-01 09:39:13.000 72.300 4513
1 2008-01-01 09:43:35.000 67.300 838
1 2008-01-01 09:51:57.000 73.800 1380
1 2008-01-01 09:56:42.000 68.700 4190
tran_hour  from_datetime            to_datetime              symbol_id  high    low     volume  open    close
9          2008-01-01 09:15:21.000  2008-01-01 09:56:42.000  1          75.800  67.300  23670   75.800  68.700
Stocks
Demo
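The hourly summary can be sketched with window functions: aggregates give high, low, and volume, while `FIRST_VALUE` and `LAST_VALUE` give open and close. This is an illustration on SQLite through Python's `sqlite3` module, not the SQL Server demo; the `stock_tx` table, its column names, and the `hour_bucket` helper column are assumptions. The rows are the sample data from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_tx (symbol_id INTEGER, tx_time TEXT, price REAL, volume INTEGER)"
)
conn.executemany(
    "INSERT INTO stock_tx VALUES (?, ?, ?, ?)",
    [
        (1, "2008-01-01 09:15:21.000", 75.8, 2589),
        (1, "2008-01-01 09:25:44.000", 68.2, 4386),
        (1, "2008-01-01 09:29:31.000", 74.3, 2837),
        (1, "2008-01-01 09:34:42.000", 68.9, 2937),
        (1, "2008-01-01 09:39:13.000", 72.3, 4513),
        (1, "2008-01-01 09:43:35.000", 67.3, 838),
        (1, "2008-01-01 09:51:57.000", 73.8, 1380),
        (1, "2008-01-01 09:56:42.000", 68.7, 4190),
    ],
)

QUERY = """
WITH bucketed AS (
    SELECT symbol_id, tx_time, price, volume,
           strftime('%Y-%m-%d %H', tx_time) AS hour_bucket  -- day plus hour
    FROM stock_tx
)
SELECT DISTINCT
       symbol_id,
       hour_bucket,
       MIN(tx_time) OVER w        AS from_datetime,
       MAX(tx_time) OVER w        AS to_datetime,
       MAX(price)   OVER w        AS high,
       MIN(price)   OVER w        AS low,
       SUM(volume)  OVER w        AS volume,
       FIRST_VALUE(price) OVER wo AS open,
       LAST_VALUE(price)  OVER wo AS close
FROM bucketed
WINDOW w  AS (PARTITION BY symbol_id, hour_bucket),
       wo AS (PARTITION BY symbol_id, hour_bucket
              ORDER BY tx_time
              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY symbol_id, hour_bucket
"""

ohlc = conn.execute(QUERY).fetchall()
print(ohlc)
```

The explicit `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING` frame matters for `LAST_VALUE`: with the default frame it would return the current row's price instead of the last price of the hour.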
Value Packing
Now, from the same machine status table, we need to “pack” consecutive rows with the same controller and status into ranges
Controller Status TimeStamp
A StandBy 1001
B StandBy 1002
B StandBy 1003
B StandBy 1004
A Working 1005
A Working 1006
A Working 1007
B Working 1008
B Working 1009
A StandBy 1010
A StandBy 1011
B StandBy 1012
Controller Status From To
A StandBy 1001 1001
B StandBy 1002 1004
A Working 1005 1007
B Working 1008 1009
A StandBy 1010 1011
B StandBy 1012 1012
Value Packing
Demo
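Packing follows the same difference trick used for the free spaces: number the rows per controller, number them again per controller and status, and group by the difference. This is a sketch on SQLite via Python's `sqlite3` module, not the demo code; the `machine_status` table and the column names `ts_from` and `ts_to` are assumptions. The rows are the sample data from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE machine_status (controller TEXT, status TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO machine_status VALUES (?, ?, ?)",
    [
        ("A", "StandBy", 1001), ("B", "StandBy", 1002), ("B", "StandBy", 1003),
        ("B", "StandBy", 1004), ("A", "Working", 1005), ("A", "Working", 1006),
        ("A", "Working", 1007), ("B", "Working", 1008), ("B", "Working", 1009),
        ("A", "StandBy", 1010), ("A", "StandBy", 1011), ("B", "StandBy", 1012),
    ],
)

QUERY = """
WITH marked AS (
    SELECT controller, status, ts,
           -- the difference is constant while the status does not change
           ROW_NUMBER() OVER (PARTITION BY controller ORDER BY ts)
         - ROW_NUMBER() OVER (PARTITION BY controller, status ORDER BY ts) AS grp
    FROM machine_status
)
SELECT controller, status, MIN(ts) AS ts_from, MAX(ts) AS ts_to
FROM marked
GROUP BY controller, status, grp
ORDER BY ts_from
"""

packed = conn.execute(QUERY).fetchall()
for row in packed:
    print(row)
```

For controller A, for instance, the three "Working" rows get the difference 1 while the later "StandBy" rows get 3, so the two StandBy periods are kept apart even though they share the same status.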
Conclusions
1. Don’t stop at the first solution that comes to your mind. Try to figure
out if there is a different one
2. Try to use a declarative, set-based approach: in an RDBMS it is usually
the best shot
• Declarative: you tell the system which result you want, not how to obtain it
• Leverage window functions, which are a really powerful feature
3. Avoid Row-By-Agonizing-Row (RBAR) processing whenever possible! It is
complex, it doesn’t scale, and it doesn’t perform well
Thanks!
Demos available here:
https://github.com/yorek/appdev2017

Editor's Notes

  • #16 (Loans demo): Key Message: Joins also support operators other than “=”
  • #18 (Groups and Correlated Values demo): Key Message: Introduce ranking functions and their usage. Show that the execution plan can tell you “lies”