Effective design patterns with NewSQL
Jags Ramnarayan, Chief Architect, GemFire/SQLFire, vFabric
Guillermo Tantachuco, Regional Sr. Systems Engineer, vFabric

© 2012 SpringOne 2GX. All rights reserved. Do not distribute without permission.
We challenge the traditional RDBMS design, NOT SQL

(Diagram: buffers primarily tuned for IO; first write goes to the log, second write to the data files)

        • Too much I/O
        • Design roots don't necessarily apply today
        • Too much focus on ACID
        • Disk synchronization bottlenecks
Achieving consistent response times is challenging

    – Resources (memory, IO) consumed can vary a lot
    – A highly selective query using an index can be very fast one moment
        • a high cache hit rate most of the time
    – But complex concurrent queries may wipe out the buffers, causing a huge spike in IO the next moment

                   http://blog.tonybain.com/tony_bain/2009/05/the-problem-with-the-relational-database-part-2-predictability.html
Common themes in next-gen DB architectures

    • "Shared nothing" commodity clusters – the focus shifts to memory, distributing data and clustering

    • Scale by partitioning the data and moving behavior to the data nodes

    • HA within the cluster and across data centers

    • Add capacity to scale dynamically

NoSQL, Data Grids, Data Fabrics, NewSQL
But, what about sharding?

•   Sharding works but can be a huge burden over time
•   Querying across partitions
     – A simple nested loop join can be very expensive
     – Aggregations, ordering and grouping have to be hand coded
     – Managing large intermediate data sets becomes an app problem
•   Transactions
     – Cross-partition transactions are not possible
     – Loss of atomicity/isolation means compensatory code needs to be built
•   Management, elasticity
     – Cannot expand cluster size on demand
     – Management in general is difficult
NewSQL Concepts with VMware SQLFire

• Main-memory oriented, clustered SQL DB

• NoSQL characteristics of scalability, performance and availability, but retains support for distributed transactions and SQL querying

• It is also designed so you can use it as an operational layer in front of your legacy databases through a caching framework
SQLFire at a glance

(Diagram highlights:)
• Tables can be replicated or partitioned; replication within the cluster is synchronous
• Expand the cluster on demand
• Caching framework – write-through, write-behind to an RDBMS
• Shared-nothing 'append only' disk persistence
Partitioning & Replication
Explore features using a simple STAR schema

FLIGHTS
---------------------------------------------
FLIGHT_ID CHAR(6) NOT NULL ,
SEGMENT_NUMBER INTEGER NOT NULL ,
ORIG_AIRPORT CHAR(3),
DEPART_TIME TIME,
…..
PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER)

FLIGHTAVAILABILITY  (1–M with FLIGHTS)
---------------------------------------------
FLIGHT_ID CHAR(6) NOT NULL ,
SEGMENT_NUMBER INTEGER NOT NULL ,
FLIGHT_DATE DATE NOT NULL ,
ECONOMY_SEATS_TAKEN INTEGER ,
…..
PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER, FLIGHT_DATE)
FOREIGN KEY (FLIGHT_ID, SEGMENT_NUMBER) REFERENCES FLIGHTS (FLIGHT_ID, SEGMENT_NUMBER)

FLIGHTHISTORY  (1–1 with FLIGHTS)
---------------------------------------------
FLIGHT_ID CHAR(6),
SEGMENT_NUMBER INTEGER,
ORIG_AIRPORT CHAR(3),
DEPART_TIME TIME,
DEST_AIRPORT CHAR(3),
…..

SEVERAL CODE/DIMENSION TABLES
---------------------------------------------
AIRLINES: airline information (very static)
COUNTRIES: list of countries served by flights
CITIES:
MAPS: photos of regions served

Assume thousands of FLIGHTS rows and millions of FLIGHTAVAILABILITY records
Creating tables
     CREATE TABLE AIRLINES (
       AIRLINE CHAR(2) NOT NULL PRIMARY KEY,
       AIRLINE_FULL VARCHAR(24),
       BASIC_RATE DOUBLE PRECISION,
       DISTANCE_DISCOUNT DOUBLE PRECISION,…. );




(Diagram: the table hosted across three SQLF cluster members)
Replicated tables

       CREATE TABLE AIRLINES (
         AIRLINE CHAR(2) NOT NULL PRIMARY KEY,
         AIRLINE_FULL VARCHAR(24),
         BASIC_RATE DOUBLE PRECISION,
         DISTANCE_DISCOUNT DOUBLE PRECISION,…. )
         REPLICATE;

Design Pattern: Replicate reference tables in STAR schemas (they seldom change, and are often referenced in queries)

(Diagram: the replicated table is copied in full to every SQLF member)
Partitioned tables

         CREATE TABLE FLIGHTS (
           FLIGHT_ID CHAR(6) NOT NULL ,
           SEGMENT_NUMBER INTEGER NOT NULL ,
           ORIG_AIRPORT CHAR(3),
           DEST_AIRPORT CHAR(3),
           DEPART_TIME TIME,
           FLIGHT_MILES INTEGER NOT NULL)
           PARTITION BY COLUMN (FLIGHT_ID);

Design Pattern: Partition fact tables in STAR schemas for load balancing (they are large and write heavy)

(Diagram: each SQLF member hosts the replicated tables plus its share of the partitioned table)
Partitioned but highly available

        CREATE TABLE FLIGHTS (
          FLIGHT_ID CHAR(6) NOT NULL ,
          SEGMENT_NUMBER INTEGER NOT NULL ,
          ORIG_AIRPORT CHAR(3),
          DEST_AIRPORT CHAR(3),
          DEPART_TIME TIME,
          FLIGHT_MILES INTEGER NOT NULL)
          PARTITION BY COLUMN (FLIGHT_ID) REDUNDANCY 1;

Design Pattern: Increase redundant copies for HA and for load balancing queries across replicas

(Diagram: each member now holds its primary partitions plus a redundant copy of another member's partitions)
Disk resident tables

       CREATE TABLE FLIGHTS (
         FLIGHT_ID CHAR(6) NOT NULL ,
         SEGMENT_NUMBER INTEGER NOT NULL ,
         …..
         PARTITION BY COLUMN (FLIGHT_ID)
         PERSISTENT;

The data dictionary is always persisted in each server.

       sqlf backup /export/fileServerDirectory/sqlfireBackupLocation

(Diagram: each SQLF member persists its replicated, partitioned, colocated and redundant data to its local disk)
Partition by Primary Key

To partition using the primary key, use:

        PARTITION BY PRIMARY KEY

- A consistent hash on the key resolves to a logical bucket
- Buckets map to physical processes (nodes)

 CREATE TABLE FLIGHTS (
   FLIGHT_ID CHAR(6) NOT NULL ,
   SEGMENT_NUMBER INTEGER NOT NULL ,
   ORIG_AIRPORT CHAR(3),
   DEST_AIRPORT CHAR(3),
   DEPART_TIME TIME,
   FLIGHT_MILES INTEGER NOT NULL,
   PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER) )
   PARTITION BY PRIMARY KEY;
Partition by Column(s)

To partition using a column or columns, use:

  PARTITION BY COLUMN (column-name [ , column-name ]*)

- The hash key uses all partition columns

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL ,
  SEGMENT_NUMBER INTEGER NOT NULL ,
  ORIG_AIRPORT CHAR(3),
  DEST_AIRPORT CHAR(3),
  DEPART_TIME TIME,
  FLIGHT_MILES INTEGER NOT NULL,
  PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER) )
  PARTITION BY COLUMN (FLIGHT_ID);
Partition by List

To partition based on specific column values:

      PARTITION BY LIST (column-name)
        VALUES ( value [ , value ]* )
          [ , VALUES ( value [ , value ]* ) ]*

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL ,
  SEGMENT_NUMBER INTEGER NOT NULL ,
  …..
  PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER) )
  PARTITION BY LIST (ORIG_AIRPORT)
   (VALUES ('PDX', 'LAX'),    -- partitioned table on Node 1
    VALUES ('AMS', 'DUB'));   -- partitioned table on Node 2
Partition by Range

To partition based on a range of values of a specific column:

  PARTITION BY RANGE (column-name)
    ( VALUES BETWEEN value AND value
      [ , VALUES BETWEEN value AND value ]* )

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL ,
  SEGMENT_NUMBER INTEGER NOT NULL ,
  …..
  PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER) )
  PARTITION BY RANGE (FLIGHT_MILES)
   (VALUES BETWEEN 0 AND 100,      -- partitioned table on Node 1
    VALUES BETWEEN 100 AND 500,    -- partitioned table on Node 2
    VALUES BETWEEN 500 AND 1000 ); -- partitioned table on Node 3
Partition by Expression

To partition on a derived value:

      PARTITION BY (expression)

      CREATE TABLE FLIGHTS (
        FLIGHT_ID CHAR(6) NOT NULL ,
        SEGMENT_NUMBER INTEGER NOT NULL ,
        ….
        PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER) )
        PARTITION BY (HOUR(DEPART_TIME));
Demo environment

SQLFire Locator:          sqlf locator start -client-bind-address=loc1 -client-port=1527

SQLFire server 1:         sqlf server start -locators=loc1[10101]
                            -client-bind-address=server1 -client-port=1528
SQLFire servers 2 and 3:  started the same way, each with -locators=loc1[10101]

JMX agent:                sqlf agent start -locators=loc1[10101]

(A SQL client connects to the cluster through the locator.)
Scaling with partitioned tables
Hash partitioning for linear scalability

  Key hashing provides single-hop access to the key's partition.
  But what if the access is not based on the key … say, joins are involved?
Hash partitioning only goes so far

•   Consider this query:
                  Select * from flights, flightAvailability
            where <equijoin flights with flightAvailability>
                          and flightId = 'AA1116';

•   If both tables are hash partitioned, the join logic will need to execute on all nodes where flightAvailability data is stored

•   Distributed joins are expensive and inhibit scaling
     • joins across distributed nodes could involve distributed locks and potentially a lot of intermediate data transfer across nodes

    Equi-join is supported only for colocated data in SQLFire 1.0
Partition aware DB design

       The designer thinks about how data access maps to logical partitions

       For scaling, try to:
  1)     minimize excessive data distribution by keeping the most frequently accessed and joined data collocated on partitions

  2)     collocate the transaction working set on partitions so complex 2-phase commit / Paxos commit is eliminated or minimized


            Read Pat Helland's "Life beyond Distributed Transactions" and the Google MegaStore paper
Partition aware DB design

   – Identify partition key for “Entity Group”
       • "entity groups": set of entities across several related tables that
         can all share a single identifier
           – flightID is shared between the parent and child tables
           – CustomerID shared between customer, order and
              shipment tables

    CREATE TABLE FLIGHTAVAILABILITY (
      FLIGHT_ID CHAR(6) NOT NULL ,
      SEGMENT_NUMBER INTEGER NOT NULL ,
     …)

       PARTITION BY COLUMN (FLIGHT_ID)
       COLOCATE WITH (FLIGHTS);
Partition aware DB design


                            Select * from Flights where flight_id = 'UA326'


                                Select * from Flights f, flightAvailability fa
                                         where <JOIN clause> and
                                             flight_id = 'UA326'


                                   Select * from Flights f, flightAvailability fa
                                             where <JOIN clause> and
                                          flight_id IN ('UA326', 'AA400')


                            Select * from Flights f where orig_airport = 'SFO'
Partition Aware DB design

•   STAR schema design is the norm in OLTP design
•   Fact tables (fast changing) are natural partitioning candidates
     – Partition by: FlightID … Availability, history rows colocated with Flights
•   Dimension tables are natural replicated table candidates
     – Replicate Airlines, Countries, Cities on all nodes



•   Dealing with Joins involving M-M relationships
     – Can the one side of the M-M become a replicated table?
     – If not, run the Join logic in a parallel stored procedure to minimize distribution
     – Else, split the query into multiple queries in application
APPLICATION DESIGN PATTERNS
1. "Write thru" Distributed caching

                             "Write thru" – participate in the container transaction

                           Lazily load using a "RowLoader" for PK queries

                           Trade-off: throttled by the legacy database
2. Distributed caching with Async writes to DB
                               Queues reside in memory
                               redundantly & persistent
                                  on multiple nodes

                             Primary / Secondary listeners

                                  Store-and-forward
Demo

     Write-behind to MySQL using the DBSynchronizer (AsyncEventListener)
3. As a scalable OLTP data store




        High throughput, response time, linear scale

Redundant copies, shared-nothing persistence, online backups

    Reduce maintenance cost and operational overhead
4. As an embedded, clustered Java database

                       Just deploy a JAR or WAR into clustered app nodes

                   Just like H2 or Derby, except the data can be sync'd with a DB and is partitioned or replicated across the cluster

                        Simply switch the URL from
                        jdbc:sqlfire://myHostName:1527/
                                        to
                  jdbc:sqlfire:;mcast-port=33666;host-data=true
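A minimal sketch of what the URL switch looks like from plain JDBC, using the two URLs from the slide. The host name, port and query are placeholders, and it assumes the appropriate SQLFire JDBC driver JAR is on the classpath so the driver auto-registers.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SqlFireConnectExample {
    public static void main(String[] args) throws Exception {
        // Thin-client connection through a SQLFire server (URL from the slide;
        // "myHostName" and the query below are illustrative placeholders).
        Connection conn = DriverManager.getConnection("jdbc:sqlfire://myHostName:1527/");

        // Embedded peer alternative: this JVM itself joins the cluster and hosts data.
        // Connection conn = DriverManager.getConnection(
        //         "jdbc:sqlfire:;mcast-port=33666;host-data=true");

        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT FLIGHT_ID, ORIG_AIRPORT FROM FLIGHTS")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " departs from " + rs.getString(2));
            }
        }
        conn.close();
    }
}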
5. To process app behavior in parallel




     Map-reduce but based on simpler RPC
Scaling application logic with parallel "Data Aware procedures"
Procedures

Java stored procedures may be created according to the SQL standard:

CREATE PROCEDURE getOverBookedFlights ()
LANGUAGE JAVA PARAMETER STYLE JAVA
READS SQL DATA DYNAMIC RESULT SETS 1
EXTERNAL NAME
 'examples.OverBookedStatus.getOverBookedStatus';

      SQLFire also supports the JDBC type Types.JAVA_OBJECT. A parameter of type JAVA_OBJECT supports an arbitrary Serializable Java object.

               In this case, the procedure will be executed on the server to which the client is connected (or locally for peer clients).
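A rough sketch of what the Java body behind that CREATE PROCEDURE could look like. It follows the usual Derby-style convention (SQLFire is Derby-derived): dynamic result sets are returned through a ResultSet[] parameter, and "jdbc:default:connection" gives the nested connection. The query and the over-booking threshold are invented for illustration; the real examples.OverBookedStatus class ships with the product samples.

package examples;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class OverBookedStatus {
    // Matches the declaration above: no IN parameters, one dynamic result set
    // handed back through the ResultSet[] argument.
    public static void getOverBookedStatus(ResultSet[] outResults) throws Exception {
        // Nested connection to the database the procedure is running in.
        Connection conn = DriverManager.getConnection("jdbc:default:connection");
        PreparedStatement ps = conn.prepareStatement(
            "SELECT FLIGHT_ID, SEGMENT_NUMBER, ECONOMY_SEATS_TAKEN " +
            "FROM FLIGHTAVAILABILITY WHERE ECONOMY_SEATS_TAKEN > ?");
        ps.setInt(1, 100);                   // illustrative over-booking threshold
        outResults[0] = ps.executeQuery();   // left open; the engine streams it to the caller
    }
}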
Data Aware Procedures

                 Parallelize the procedure and prune execution to the nodes with the required data

Extend the procedure call with the following syntax:

 CALL [PROCEDURE] procedure_name
 ( [ expression [, expression ]* ] )
 [ WITH RESULT PROCESSOR processor_name ]
 [ { ON TABLE table_name [ WHERE whereClause ] }        |
     { ON {ALL | SERVER GROUPS
       (server_group_name [, server_group_name ]*) }}
 ]

 CALL getOverBookedFlights( )
 ON TABLE FLIGHTAVAILABILITY
 WHERE FLIGHT_ID = 'AA1116';

   The ON TABLE … WHERE clause hints at the data the procedure depends on.
   If the table is partitioned by the columns in the WHERE clause, the procedure execution is pruned to the nodes with the data (the node holding 'AA1116' in this case).

(Diagram: the client's call is routed from Fabric Server 1 / Fabric Server 2 to the member(s) owning that data.)
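A minimal client-side sketch of invoking the data-aware call above, assuming the extended ON TABLE … WHERE clause can be sent through a standard JDBC CallableStatement and that the procedure's result set exposes a FLIGHT_ID column (both assumptions here, not confirmed by the slide).

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class DataAwareCallExample {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:sqlfire://myHostName:1527/");

        // The ON TABLE ... WHERE clause is a routing hint only: it prunes execution
        // to the member(s) holding FLIGHT_ID 'AA1116'; it does not filter rows itself.
        CallableStatement cs = conn.prepareCall(
            "CALL getOverBookedFlights() " +
            "ON TABLE FLIGHTAVAILABILITY WHERE FLIGHT_ID = 'AA1116'");

        boolean hasResults = cs.execute();
        while (hasResults) {
            try (ResultSet rs = cs.getResultSet()) {
                while (rs.next()) {
                    System.out.println("over-booked: " + rs.getString("FLIGHT_ID"));
                }
            }
            hasResults = cs.getMoreResults();
        }
        conn.close();
    }
}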
Parallelize procedure, then aggregate (reduce)

Register a Java result processor (optional in some cases):

  CALL [PROCEDURE] procedure_name
  ( [ expression [, expression ]* ] )
  [ WITH RESULT PROCESSOR processor_name ]
  [ { ON TABLE table_name [ WHERE whereClause ] }  |
      { ON {ALL | SERVER GROUPS
        (server_group_name [, server_group_name ]*) }}
  ]

(Diagram: the client's call fans out to Fabric Server 1, 2 and 3; the result processor aggregates the partial result sets.)
Demo


     Data Aware procedure demo




6. To make data visible across sites in real time
Consistency model
Consistency Model without Transactions
   – Replication within cluster is always eager and synchronous
   – Row updates are always atomic; No need to use transactions
   – FIFO consistency: writes performed by a single thread are seen by all
     other processes in the order in which they were issued
Consistency Model without Transactions
   – Consistency in partitioned tables
       • a partitioned table row is owned by one member at a point in time
       • all updates are serialized to replicas through the owner
       • "total ordering" at a row level: atomic and isolated


   – Membership changes and consistency – need another hour 

   – Pessimistic concurrency support using 'SELECT … FOR UPDATE' (see the sketch below)
   – Support for referential integrity
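A small sketch of the pessimistic 'SELECT … FOR UPDATE' pattern from plain JDBC. The flight, segment and seat-count update are illustrative values; it only assumes the FLIGHTAVAILABILITY columns shown in the schema earlier.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SelectForUpdateExample {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:sqlfire://myHostName:1527/");
        conn.setAutoCommit(false);   // FOR UPDATE only makes sense inside a transaction

        // Lock the availability row(s) for this flight segment before adjusting
        // seat counts, so concurrent writers block instead of clobbering each other.
        PreparedStatement lock = conn.prepareStatement(
            "SELECT ECONOMY_SEATS_TAKEN FROM FLIGHTAVAILABILITY " +
            "WHERE FLIGHT_ID = ? AND SEGMENT_NUMBER = ? FOR UPDATE");
        lock.setString(1, "AA1116");
        lock.setInt(2, 1);
        try (ResultSet rs = lock.executeQuery()) {
            if (rs.next()) {
                int taken = rs.getInt(1);
                PreparedStatement upd = conn.prepareStatement(
                    "UPDATE FLIGHTAVAILABILITY SET ECONOMY_SEATS_TAKEN = ? " +
                    "WHERE FLIGHT_ID = ? AND SEGMENT_NUMBER = ?");
                upd.setInt(1, taken + 1);
                upd.setString(2, "AA1116");
                upd.setInt(3, 1);
                upd.executeUpdate();
            }
        }
        conn.commit();
        conn.close();
    }
}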
Distributed Transactions
   •   Full support for distributed transactions
        • Supports READ_COMMITTED and REPEATABLE_READ
   •   Highly scalable without any centralized coordinator or lock manager
   •   We make some important assumptions
         • Most OLTP transactions are small in duration and size
         • W-W conflicts are very rare in practice
Distributed Transactions
   • How does it work? (sketched below)
       • Each data node has a sub-coordinator to track TX state
       • Eagerly acquire local "write" locks on each replica
           • An object is owned by a single primary at a point in time
       • Fail fast if the lock cannot be obtained
   • Atomic, and works with the cluster failure-detection system
   • Isolated until commit for READ_COMMITTED
       • Only local isolation is supported during commit
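From the application side this is ordinary JDBC: pick the isolation level, turn off auto-commit, and handle a failed-fast conflict by rolling back and retrying. A minimal sketch, with the flight values and the retry policy as assumptions; the colocation comment refers to the entity-group pattern discussed earlier.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class DistributedTxExample {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:sqlfire://myHostName:1527/");
        conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
        conn.setAutoCommit(false);

        try {
            // The write touches rows partitioned/colocated on FLIGHT_ID, so the
            // transaction's working set stays on one partition where possible.
            PreparedStatement ps = conn.prepareStatement(
                "UPDATE FLIGHTAVAILABILITY " +
                "SET ECONOMY_SEATS_TAKEN = ECONOMY_SEATS_TAKEN + 1 " +
                "WHERE FLIGHT_ID = ? AND SEGMENT_NUMBER = ?");
            ps.setString(1, "AA1116");
            ps.setInt(2, 1);
            ps.executeUpdate();
            conn.commit();
        } catch (SQLException conflict) {
            // Write-write conflicts fail fast; a typical response is rollback and retry.
            conn.rollback();
        } finally {
            conn.close();
        }
    }
}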
Parallel disk persistence
Why is disk latency so high?
•   Challenges
     – Disk seek times are still > 2 ms
     – OLTP transactions are small writes
         • Flushing to disk will result in a seek
         • Best rates are in the 100s per second
•   RDBs and NoSQL try to avoid the problem
     – Append to transaction logs; out-of-band writes to data files
     – But reads can cause seeks to disk
Disk persistence in SQLF

(Diagram: on each member, in-memory tables feed a LOG compressor; records pass through the OS buffers into append-only operation logs.)

•    Parallel log-structured storage
•    Each partition writes in parallel
•    Backups write to disk also
      –   Increase reliability against h/w loss
•    Don't seek to disk
•    Don't flush all the way to disk
      –   Use the OS scheduler to time the write
•    Do this on primary + secondary
•    Realize very high throughput
Performance benchmark
How does it perform? Scale?

•   Scale from 2 to 10 servers (one per host)
•   Scale from 200 to 1200 simulated clients (10 hosts)
•   Single partitioned table: int PK, 40 fields (20 ints, 20 strings)
How does it perform? Scale?

•   CPU% remained low per server – about 30%, indicating many more clients could be handled
Is latency low with scale?
•   Latency decreases with server capacity
•   50-70% take < 1 millisecond
•   About 90% take less than 2 milliseconds
Thank you:

  You can reach us at …

  Jags Ramnarayan: jramnara@vmware.com

  Guillermo Tantachuco: gtantachuco@vmware.com

http://communities.vmware.com/community/vmtn/appplatform/vfabric_sqlfire




                         Q&A

Editor's Notes

  • #4: Relational databases are not predictable or reliable in terms of consistent performance for a number of reasons. Firstly, every query uses a different amount of resources. A query could consume 1 or 2 I/Os or 1 or 2 million I/Os depending on how the query is written, what data is selected, and factors such as how the database is indexed. Performance is further varied by how the database is maintained (fragmentation). What makes matters more complex is that different predicate values for a query can hit vastly different data distributions: the same query executed with different constants can have vastly different resource requirements. Because every query has a different "footprint", running a query in isolation does not provide indicative statistics on how that query will perform under concurrent load. In fact it becomes impossible to predict the exact execution duration of a relational database query, as its performance depends on what else is being executed at that exact moment. Cost-based optimization can also change plans dynamically, resulting in, again, a variance in execution times. Essentially, a lot gets done to reduce disk IO, which becomes a serious bottleneck.
  • #23, #32, #33, #35, #36, #37, #43: There are a lot of different ways to partition data in SQLFire. By default SQLFire will try to evenly distribute data at random across all servers. If that's not good enough, you can exert a lot of control over how data is divided and distributed using list, range or expression based partitioning.