New Performance Features in SQL Server 2012

ColumnStore Indexes

Improved Data Warehouse Query Performance


•   Columnstore indexes provide an easy way
    to significantly improve data warehouse
    and decision support query performance
    against very large data sets
•   Performance improvements for “typical”
    data warehouse queries range from 10x to 100x
•   Ideal candidates include queries against
    star schemas that use filtering,
    aggregations and grouping against very
    large fact tables

What Happens When…


• You need to execute high performance DW
  queries against very large data sets?
  • In SQL Server 2008 and SQL Server 2008 R2
     • OLAP (SSAS) MDX solution
     • ROLAP and T-SQL + intermediate summary tables,
       indexed views and aggregate tables
        • Inherently inflexible




What Happens When…

• You need to execute high performance DW
  queries against very large data sets?
  • In SQL Server 2012
     • You can create a columnstore index on a very large fact table
       referencing all columns with supporting data types
         • Utilizing T-SQL and core Database Engine functionality
         • Minimal query refactoring or intervention
      • Upon creating the columnstore index, your table becomes
        “read only” – but you can still use partitioning to switch data
        in and out, or drop and rebuild the index periodically

How Are These Performance Gains Achieved?
• Two complementary technologies:
  • Storage
     • Data is stored in a compressed columnar data format
       (stored by column) instead of row store format (stored
       by row).
        • Columnar storage allows for less data to be accessed when
          only a sub-set of columns are referenced
         • Data density/selectivity determines how compression-friendly a
           column is – for example, low-cardinality columns such as
           “State” / “City” / “Gender” compress well
        • Translates to improved buffer pool memory usage



How Are These Performance Gains Achieved?
• Two complementary technologies:
  • New “batch mode” execution
     • Data can then be processed in batches (1,000 row
       blocks) versus row-by-row
     • Depending on filtering and other factors, a query may
       also benefit by “segment elimination” - bypassing
       million row chunks (segments) of data, further reducing
       I/O



Column vs. Row Store (diagram)
Batch Mode

• Allows processing of 1,000 row blocks as an
  alternative to single row-by-row operations
  • Enables additional algorithms that can reduce CPU
    overhead significantly
  • A batch mode “segment” is one of the million-row chunks a
    partition is broken into, with associated statistics used
    for Storage Engine filtering


Batch Mode

• Batch mode can work to further improve query
  performance of a columnstore index, but this
  mode isn’t always chosen:
  • Some operations aren’t enabled for batch mode:
     • E.g. outer joins to columnstore index table / joining strings /
       NOT IN / IN / EXISTS / scalar aggregates
  • Row mode might be used if there is SQL Server
    memory pressure or parallelism is unavailable
  • Confirm batch vs. row mode by looking at the
    graphical execution plan
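
One way to confirm the mode, sketched below against a hypothetical fact table, is to request the actual plan and inspect each operator's Actual Execution Mode property:

-- Returns the actual execution plan; in SSMS, each plan operator's
-- Actual Execution Mode property shows Batch or Row
SET STATISTICS XML ON;

SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY ProductKey;

SET STATISTICS XML OFF;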


Columnstore Format + Batch Mode Variations
• Performance gains can come from a
  combination of:
  • Columnstore indexing alone + traditional row
    mode in QP
  • Columnstore indexing + batch mode in QP
  • Columnstore indexing + hybrid of batch and
    traditional row mode in QP


Creating a Columnstore Index

• T-SQL (see the sketch below)
• SSMS (screenshot omitted)



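A minimal T-SQL sketch, assuming a hypothetical dbo.FactSales fact table:

CREATE NONCLUSTERED COLUMNSTORE INDEX IX_FactSales_CS
ON dbo.FactSales (ProductKey, OrderDateKey, SalesAmount, Quantity);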
Good Candidates for Columnstore Indexing
• Table candidates:
  • Very large fact tables (for example – billions of
    rows)
  • Larger dimension tables (millions of rows) with
    compression friendly column data
  • If unsure, it is easy to create a columnstore index
    and test the impact on your query workload


Good Candidates for Columnstore Indexing
•   Query candidates (against table with a columnstore index):
    • Scan versus seek (columnstore indexes don’t support seek
      operations)
    • Aggregated results far smaller than table size
    • Joins to smaller dimension tables
    • Filtering on fact / dimension tables – star schema pattern
    • Sub-set of columns (being selective in columns versus returning
      ALL columns)
    • Single-column joins between columnstore indexed table and
      other tables




Defining the Columnstore Index
•   Index type
    • Columnstore indexes are always non-clustered and non-unique
    • They cannot be created on views, indexed views, sparse columns
    • They cannot act as primary or foreign key constraints
•   Column selection
    • Unlike other index types, there are no “key columns”
        •   Instead you choose the columns that you anticipate will be used in your queries
        •   Up to 1,024 columns – and the ordering in your CREATE INDEX doesn’t matter
        •   No concept of “INCLUDE”
        •   No 900 byte index key size limit
•   Column ordering
    • Use of ASC or DESC sorting not allowed – as ordering is defined via columnstore
      compression algorithms




Supported Data Types
•   Supported data types
    • Char / nchar / varchar / nvarchar
         • (max) types, legacy LOB types and FILESTREAM are not supported
    • Decimal/numeric
         • Precision greater than 18 digits NOT supported
    •   Tinyint, smallint, int, bigint
    •   Float/real
    •   Bit
    •   Money, smallmoney
    •   Date and time data types
         • Datetimeoffset with scale > 2 NOT supported



Demo: Simple ColumnStore Index
Maintaining Data in a Columnstore Index
• Once built, the table becomes “read-only” and
  INSERT/UPDATE/DELETE/MERGE is no longer allowed
• ALTER INDEX REBUILD / REORGANIZE not allowed
• Other options are still supported:
  • Partition switches (IN and OUT)
  • Drop columnstore index / make modifications / add
    columnstore index
  • UNION ALL (but be sure to validate performance)
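
A hedged sketch of the drop / modify / recreate pattern (object names hypothetical):

-- 1. Drop the columnstore index to make the table writable again
DROP INDEX IX_FactSales_CS ON dbo.FactSales;

-- 2. Load or modify data
INSERT INTO dbo.FactSales (ProductKey, OrderDateKey, SalesAmount, Quantity)
SELECT ProductKey, OrderDateKey, SalesAmount, Quantity
FROM Staging.FactSalesLoad;

-- 3. Recreate the columnstore index; the table becomes read-only again
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_FactSales_CS
ON dbo.FactSales (ProductKey, OrderDateKey, SalesAmount, Quantity);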



Limitations
•   Columnstore indexes cannot be used in conjunction with
    •   Change Data Capture and Change Tracking
    •   FILESTREAM columns (though the remaining, supported columns of the same table can still be indexed)
    •   Page, row and vardecimal storage compression
    •   Replication
    •   Sparse columns
•   Data type limitations
    •   Binary / varbinary / ntext / text / image / varchar(max) / nvarchar(max) /
        uniqueidentifier / rowversion / sql_variant / decimal or numeric with precision > 18
        digits / CLR types / hierarchyid / xml / datetimeoffset with scale > 2
•   You can prevent a query from using the columnstore index using the
    IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX query hint
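
For example, a hedged sketch of the hint in use (query and table are hypothetical):

SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY ProductKey
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);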



Demo: ColumnStore and Partition Switch
Summary


•   SQL Server 2012 offers significantly faster query performance for
    data warehouse and decision support scenarios
    • 10x to 100x performance improvement depending on the schema and
      query
        • I/O reduction and memory savings through columnstore compressed storage
        • CPU reduction with batch versus row processing, further I/O reduction if
          segment elimination occurs
    • Easy to deploy and requires less management than some legacy
      ROLAP or OLAP methods
        • No need for intermediate tables, aggregates, pre-processing or cubes
    • Interoperability with partitioning
    • For the best interactive end-user BI experience, consider Analysis
      Services, PowerPivot and Crescent



Partitioning Enhancements
Improvements for Table Partitioning


•   SQL Server 2012 RTM supports up to 15,000 partitions
    • No need for a service pack to gain functionality
•   Partition statistics are created using a row sub-set sampling
    when an index is rebuilt or created - versus scanning all rows to
    create the statistics
•   Additional partition management wizard options can assist
    with executing or scripting out common partition operations
•   Partitioning can be used in conjunction with tables that have a
    columnstore index in order to switch in and out data



What Happens When…

• You need to partition data by day for 3 years of
  data or more? Or you need to partition data by
  hour for a year’s worth of data?
  • In SQL Server 2008 & SQL Server 2008 R2
      • Limited to 1,000 partitions
        • Unless you installed 2008 SP2 or 2008 R2 SP1 – which allowed for
          15,000 partitions when enabled via sp_db_increased_partitions
            • This prevented moving from 2008 SP2 to 2008 R2 RTM
            • Also prevented moving SQL Server 2008 SP2 database with
              15,000 partitions enabled to SQL Server 2008 or 2008 SP1
            • Created other restrictions for Log Shipping, Database
              Mirroring, Replication, SSMS manageability
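
On those service packs the higher limit had to be enabled per database; a hedged sketch of that legacy switch (database name hypothetical):

-- SQL Server 2008 SP2 / 2008 R2 SP1 only; unnecessary in SQL Server 2012
EXEC sp_db_increased_partitions @dbname = N'SalesDW', @increased_partitions = 'ON';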



What Happens When…


• You need to partition data by day for 3 years
  of data or more? Or you need to partition
  data by hour for a year’s worth of data?
  • In SQL Server 2012
     • 15,000 partitions are supported in RTM (no SP
       required)




15,000 Partitions


• You now have the option – where appropriate:
  • Flexibility to partition based on common data
    warehousing increments (hours / days / months)
    without hitting the limit
     • This doesn’t remove the need for an archiving strategy or
       mindful planning
  • You have native support for log shipping, availability
    groups, database mirroring, replication and SSMS
    management

15,000 Partitions



• Exceptions:
  • > 1000 partitions for x86 is permitted but not
    supported
  • > 1000 partitions for non-aligned indexes is
    permitted but not supported
  • For both exceptions – the risk is in degraded
    performance and insufficient memory


What Happens When…



• Your partitioned index is rebuilt or created:
  • In SQL Server 2008 and SQL Server 2008 R2
     • All table rows are scanned in order to create the
       statistics histogram
  • In SQL Server 2012
     • A default sampling algorithm is used instead
        • May or may not have an impact on performance
        • You can still choose to scan all rows by using CREATE STATISTICS
          or UPDATE STATISTICS with FULLSCAN
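
A hedged sketch of forcing the full scan (object name hypothetical):

UPDATE STATISTICS dbo.FactSales WITH FULLSCAN;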


What Happens When…

(Screenshots: statistics created in SQL Server 2008 via full scan vs. SQL Server 2012 via default sampling.)
Enhanced Manage Partition Wizard

(Screenshots: the wizard in SQL Server 2008 R2 vs. SQL Server 2012.)
Manage Partition Wizard

(Screenshot: the wizard in SQL Server 2008 R2.)
Demo: Partitioning Enhancements
Summary


• SQL Server 2012 offers
  • An increased number of partitions, helping address
    common data warehouse requirements
  • Prevention of lock starvation during SWITCH
    operations
  • Reduced statistics generation footprint (not scanning
    ALL rows by default)
  • An enhanced manageability experience, enabling
    wizard-based SWITCH IN and SWITCH OUT assistance


T-SQL Performance Enhancements
OVER Clause Windowing

• Some existing queries do not optimize well
• Example: Details of orders and days since
  previous order of each product
OVER Clause Windowing

-- Traditional approach

SELECT rs.ProductKey, rs.OrderDateKey, rs.SalesOrderNumber,
    rs.OrderDateKey - (
        SELECT TOP(1) prev.OrderDateKey
        FROM dbo.FactResellerSales AS prev
        WHERE rs.ProductKey = prev.ProductKey
          AND prev.OrderDateKey <= rs.OrderDateKey
          AND prev.SalesOrderNumber < rs.SalesOrderNumber
        ORDER BY prev.OrderDateKey DESC, prev.SalesOrderNumber DESC
    ) AS DaysSincePrevOrder
FROM dbo.FactResellerSales AS rs
ORDER BY rs.ProductKey, rs.OrderDateKey, rs.SalesOrderNumber;
OVER Clause Windowing

-- Windowed approach

SELECT ProductKey, OrderDateKey, SalesOrderNumber,
    OrderDateKey - LAG(OrderDateKey)
        OVER (PARTITION BY ProductKey
              ORDER BY OrderDateKey, SalesOrderNumber)
    AS DaysSincePrevOrder
FROM dbo.FactResellerSales AS rs
ORDER BY ProductKey, OrderDateKey, SalesOrderNumber;
Demo: New Window Functions Performance
Sequences

• User-defined object
• Not tied to any particular table
• Not restricted to being used in a single table
• Eases migration from other database engines

CREATE SEQUENCE Booking.BookingID AS INT
    START WITH 20001
    INCREMENT BY 10;

CREATE TABLE Booking.FlightBooking
(FlightBookingID INT PRIMARY KEY CLUSTERED
    DEFAULT (NEXT VALUE FOR Booking.BookingID),
...
Sequence: Cache or No Cache

• Caching can increase performance for
  applications that use sequence objects by
  minimizing the number of disk I/Os
  required to generate sequence numbers
Cache Example
•   If the Database Engine is stopped after you use 22 numbers, the next
    intended sequence number in memory (23) is written to the system
    tables, replacing the previously stored number.
•   If the Database Engine stops abnormally for an event such as a power
    failure, the sequence restarts with the number read from system tables
    (39). Any sequence numbers allocated to memory (but never requested
    by a user or application) are lost. This functionality may leave gaps, but
    guarantees that the same value will never be issued two times for a
    single sequence object unless it is defined as CYCLE or is manually
    restarted.
Cache default?

• If the cache option is enabled without
  specifying a cache size, the Database Engine
  will select a size
• “Don’t count on it being consistent. Microsoft
  might change the method of calculating the
  cache size without notice.”
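
If predictable behavior matters, the cache size can be set explicitly instead; a minimal sketch (sequence name hypothetical):

CREATE SEQUENCE dbo.OrderNumber AS BIGINT
    START WITH 1
    INCREMENT BY 1
    CACHE 50;   -- persists a new range to the system tables every 50 values

SELECT NEXT VALUE FOR dbo.OrderNumber;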
Demo: Sequences and Caching
Distributed Replay
Improved Benchmarking and Testing

• Benchmarking and testing are improved through
  implementation of:
  • Distributed Replay Controller
  • Support for multiple Distributed Replay Clients
What happens when…
• The business needs to perform application
  compatibility testing prior to an upgrade,
  performance debugging of a highly concurrent
  workload, system capacity planning, or
  benchmark analysis of a database workload?
  • In SQL Server 2008
     • SQL Server Profiler may be used to replay a captured trace
       against an upgraded test environment from a single computer
     • Event replay does not follow original query rates
What happens when…
• In SQL Server 2012
   • Distributed Replay can be used to replay a workload
     from multiple computers and better simulate a mission-
     critical workload
   • Replay can be configured to reproduce original query
     rates, or to run in stress test mode where the rate of
     replay occurs faster than the original query rate
Distributed Replay Components
•   Administration Tool
    • Command line application that talks to the Replay Controller
•   Replay Controller
    • Computer running the “SQL Server Distributed Replay Controller” service which is used to control
        the Replay Clients
•   Replay Clients
    • One or more computers which run the “SQL Server Distributed Replay Client” service
•   Target Server
    • SQL Server instance that the replay is directed towards




(Diagram: DBA Workstation → Replay Controller → Replay Clients → Target SQL Server Instance)
Distributed Replay Process
•   Event Capture
    • Events are captured using SQL Server Profiler or a server side trace
      based on the Replay Trace template
•   Preprocessing
    • Trace data is parsed into an intermediate file for replay
    • This step specifies whether system session activities are included
      in the replay, and the max idle time setting
•   Event Replay
    • Intermediate file divided among the replay clients
    • After clients receive dispatch data, controller launches and
      synchronizes the replay operation
    • Each client can record the replay activity to a local result trace file
Distributed Replay Sequencing Modes
• Synchronization mode
  (Application compatibility and performance
  testing)
  • Events are replayed in the order in which they were
    submitted during the capture, within and across
    connections, based on the event timestamps
  • The replay engine will try to emulate the original query
    rate observed during the capture
Distributed Replay Sequencing Modes
•   Stress mode
    (Stress testing and capacity planning or forecasting)
    • No order or time synchronization across clients
    • Submit order is only maintained within each connection allowing the
      replay engine to drive more load against SQL Server than in
      synchronization mode
    • ConnectTimeScale and ThinkTimeScale parameters adjust the degree of
      stress during replay
        • Actual connect time is multiplied by ConnectTimeScale/100 to determine replay
          connect time
        • Actual think time is multiplied by ThinkTimeScale/100 to determine replay think
          time
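
For example, with ThinkTimeScale = 50, a 10-second think time captured in the trace is replayed as a 5-second pause (10 × 50/100); lowering the value drives more load.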
Summary

• SQL Server 2012 offers better benchmarking and
  testing
  • Distributed Replay supports multiple replay clients
    allowing for higher scalability during the replay process
  • Replay operations can match original query rates for
    more accurate analysis of changes to the environment
Scalability using AlwaysOn
What happens when…

• The business wants to:
  • Make use of the mostly-unused
    failover server(s) for reporting
  • Against real-time business data
SQL Server 2008 R2 or prior

•   Database mirroring required snapshot management of
    the mirrored databases for reporting purposes
    • Snapshot data does not change, so keeping reports current
      requires creating a new snapshot and migrating connections
      to it
    • Snapshots exist until cleaned up, even after failover occurs
    • Reporting workload can block database mirroring process
•   Log shipping using RESTORE … WITH STANDBY provides
    near real-time access to business data
    • Log restore operations require exclusive access to the database
In SQL Server 2012

  • AlwaysOn Readable Secondaries enable read-only
    access for offloading reporting workloads
  • Read workload on Readable Secondaries does not
    interfere with data transfer from primary replica
  • Readable Secondaries can be used for offloading
    backup operations
Topology Example (diagram)
Readable Secondary Client Connectivity


•   Client connection behavior determined by the Availability Group Replica
    option
    • Replica option determines whether a replica is enabled for read access when in a
      secondary role and which clients can connect to it
    • Choices are:
         • No connections
          • Only connections specifying the ApplicationIntent=ReadOnly connection property
         • All connections
•   Read-only Routing enables redirection of client connections to a new
    readable secondary after a failover
    • Connection specifies the Availability Group Listener virtual name plus
      ApplicationIntent=ReadOnly in the connection string
    • Possible for connections to go to different readable secondaries if available to
      balance read-only access
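
A hedged sketch of a read-intent connection (listener and database names hypothetical):

-- Connection string form:
Server=tcp:AGListener01,1433; Database=SalesDB; ApplicationIntent=ReadOnly

-- sqlcmd form (-K sets the application intent):
sqlcmd -S AGListener01 -d SalesDB -K ReadOnly -E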
Readable Secondary: Read-Only Routing

(Diagram: clients connect through the Availability Group Listener.)
• Client connects to the
  Availability Group Listener
  virtual name
  • Standard connections are
    routed to the Primary server
    for read/write operations
  • ReadOnly connections are
    routed to a readable
    secondary based on ReadOnly
    routing configuration
Query Performance on the Secondary



•   Challenges:
    •   Query workloads typically require index/column statistics so the query optimizer can formulate an
        efficient query plan
    •   Read-only workloads on a secondary replica may require different statistics than the workload on the
        primary replica
    •   Users cannot create different statistics themselves (secondaries can’t be modified)
•   Solution:
    •   SQL Server will automatically create required statistics, but store them as temporary statistics in
        tempdb on the secondary node
•   If different indexes are required by the secondary workload, these must be
    created on the primary replica so they will be present on the secondaries
    •   Care should be taken when creating additional indexes that maintenance overhead does not affect
        the workload performance on the primary replica
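
A hedged sketch for spotting those temporary statistics on a secondary (the is_temporary column of sys.stats was added in SQL Server 2012):

SELECT OBJECT_NAME(object_id) AS table_name, name AS stats_name, is_temporary
FROM sys.stats
WHERE is_temporary = 1;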
Offloading Backups To a Secondary


•   Backups can be done on any replica of a database to offload I/O
    from primary replica
    • Transaction log backups, plus COPY_ONLY full backups
•   Backup jobs can be configured on all replicas and preferences set
    so that a job only runs on the preferred replica at that time
    • This means no script/job changes are required after a failover
•   Transaction log backups done on all replicas form a single log chain
•   Database Recovery Advisor tool helps with restoring backups from
    multiple Secondaries
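
A hedged sketch of a backup job step that honors the preference (database name and path hypothetical):

-- Run the backup only if this replica is currently the preferred backup replica
IF sys.fn_hadr_backup_is_preferred_replica(N'SalesDB') = 1
BEGIN
    BACKUP LOG SalesDB TO DISK = N'\\backupshare\SalesDB.trn';
END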
Workload Impact on the Secondary


•   Read-only workloads on a mirror database using traditional database mirroring can
    block replay of transactions from the principal
•   Using Readable Secondaries, the reporting workload uses snapshot isolation to
    avoid blocking the replay of transactions
    • Snapshot isolation avoids read locks which could block the REDO background
       thread
    • If a deadlock occurs, the REDO thread will never be chosen as
       the deadlock victim
•   Replaying DDL operations on the secondary may be blocked by schema locks held by
    long running or complex queries
    • An XEvent fires that allows programmatic termination/resumption of reporting
        • sqlserver.lock_redo_blocked event
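
A hedged sketch of an Extended Events session that captures the event:

CREATE EVENT SESSION redo_blocked ON SERVER
ADD EVENT sqlserver.lock_redo_blocked
ADD TARGET package0.event_file (SET filename = N'redo_blocked');

ALTER EVENT SESSION redo_blocked ON SERVER STATE = START;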
Summary
•   SQL Server 2012 allows more efficient use of IT infrastructure
    • Failover servers are available for read-only workloads
    • Read-only secondaries are updated continuously from the primary
      without having to disconnect the reporting workload
•   SQL Server 2012 can improve performance of workloads
    • Reporting workloads can be offloaded to failover servers, improving
      performance of the reporting workload and the main workload
    • Backups can be offloaded to failover servers, improving performance of
      the main workload
