Multiple ways of storing 
-> Data <- 
SQL -> NOSQL -> NEWSQL 
Tony Rogerson 
@tonyrogerson 
tonyrogerson@torver.net 
dataidol.com/tonyrogerson
Agenda 
Data structures 
◦ Relational, Key/Value pair, Document, Graph, Column/Column Family Store 
Key Concepts 
◦ Hashing, Partitioning, Sharding, ACID, BASE 
Technology Areas 
◦ SQL, NoSQL, NewSQL
Who-am-I 
Freelance SQL Server professional and Data Specialist 
Fellow BCS, MSc in BI, PGCert in Data Science 
Started out in 1986: VSAM, System W, Application System, DB2, Oracle; SQL Server since 4.21a 
Awarded SQL Server MVP every year since 1997 
Founded the UK SQL Server User Group in 1999; founder member of DDD, SQL Bits, SQL Relay and SQL Santa 
Interested in commodity-based distributed processing of data.
Data Structures 
WAYS OF STRUCTURING DATA
What is data? 
Tony Rogerson 
tonyrogerson@torver.net 
Harpenden 
36 on 2014-01-01, 36 on 2014-05-01, 38 on 2014-10-15 
46 
44
Data needs context and structure 
Tony Rogerson FullName 
tonyrogerson@torver.net Email 
Harpenden PostalTown 
36 on 2014-01-01, 36 on 2014-05-01, 38 on 2014-10-15 {WaistInches, RecordedDate} 
46 ChestInches 
44 Age 
Schema gives context
Relational [Tables] 
People 
FullName (PK) | Email | PostalTown | ChestInches | AgeYears 
Tony Rogerson | tonyrogerson@torver.net | Harpenden | 46 | 44 
WaistInches 
FullName (FK) | WaistInches | RecordedDate 
Tony Rogerson | 36 | 2014-01-01 
Tony Rogerson | 36 | 2014-05-01 
Tony Rogerson | 38 | 2014-10-15 
Key/Value pair (EAV) 
Entity Attribute Value 
Person FullName Tony Rogerson 
Person Email tonyrogerson@torver.net 
Person PostalTown Harpenden 
Person ChestInches 46 
Person Age 44 
WaistInches FullName Tony Rogerson 
WaistInches WaistInches 36 
WaistInches RecordedDate 2014-01-01 
WaistInches FullName Tony Rogerson 
WaistInches WaistInches 36 
WaistInches RecordedDate 2014-05-01 
WaistInches FullName Tony Rogerson 
WaistInches WaistInches 38 
WaistInches RecordedDate 2014-10-15 
Examples: Riak, Dynamo, Redis, FoundationDB etc. 
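Reading a record back out of an EAV layout means pivoting the triples. A minimal Python sketch, assuming each entity carries its own identifier (the ids `person:1` and `waist:1` to `waist:3` are hypothetical); the slide's simplified Entity column holds only `Person` or `WaistInches`, which on its own would not let you tell the three waist measurements apart.

```python
from collections import defaultdict

# (entity id, attribute, value) triples; the entity ids are hypothetical
TRIPLES = [
    ("person:1", "FullName", "Tony Rogerson"),
    ("person:1", "Email", "tonyrogerson@torver.net"),
    ("person:1", "PostalTown", "Harpenden"),
    ("person:1", "ChestInches", 46),
    ("person:1", "Age", 44),
    ("waist:1", "FullName", "Tony Rogerson"),
    ("waist:1", "WaistInches", 36),
    ("waist:1", "RecordedDate", "2014-01-01"),
    ("waist:2", "FullName", "Tony Rogerson"),
    ("waist:2", "WaistInches", 36),
    ("waist:2", "RecordedDate", "2014-05-01"),
    ("waist:3", "FullName", "Tony Rogerson"),
    ("waist:3", "WaistInches", 38),
    ("waist:3", "RecordedDate", "2014-10-15"),
]

def pivot(triples):
    """Rebuild one attribute -> value dict per entity id."""
    records = defaultdict(dict)
    for entity, attribute, value in triples:
        records[entity][attribute] = value
    return dict(records)

records = pivot(TRIPLES)
print(records["person:1"]["PostalTown"])  # Harpenden
```

Every read pays for this reassembly, which is one reason key/value stores usually keep the whole value for a key together rather than one attribute per row.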
Document 
JSON schema (field types) 
{ 
"FullName" : "string", 
"Email" : "string", 
"PostalTown" : "string", 
"WaistInches" : [ { 
"WaistInches" : "number", 
"RecordedDate" : "string" } ], 
"ChestInches" : "number", 
"Age" : "number" 
} 
JSON document 
{ 
"FullName" : "Tony Rogerson", 
"Email" : "tonyrogerson@torver.net", 
"PostalTown" : "Harpenden", 
"WaistInches" : [ 
{ "WaistInches" : 36, "RecordedDate" : "2014-01-01" }, 
{ "WaistInches" : 36, "RecordedDate" : "2014-05-01" }, 
{ "WaistInches" : 38, "RecordedDate" : "2014-10-15" } ], 
"ChestInches" : 46, 
"Age" : 44 
} 
Examples: MongoDB, Couchbase, CouchDB etc. 
JSON vs XML discussion: http://stackoverflow.com/questions/4862310/json-and-xml-comparison 
Schema Design 
E.g. a 100-machine cluster 
Document Database vs Normal Form (Relational) 
{ 
"firstName": "John", 
"lastName": "Smith", 
"isAlive": true, 
"age": 25, 
"height_cm": 167.6, 
"address": { 
"streetAddress": "21 2nd Street", 
"city": "New York", 
"state": "NY", 
"postalCode": "10021-3100" 
}, 
"phoneNumbers": [ 
{ "type": "home", "number": "212 555-1234" }, 
{ "type": "office", "number": "646 555-4567" } 
] 
} 
Document database: object data stored together (collection) 
Normal form (relational): object data stored separately (tables: person, address, phoneNumbers)
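The two sides of the schema-design slide can be compared directly: the document keeps person, address and phone numbers together, while normal form splits them into separate tables. A minimal Python sketch that performs that split on the John Smith document; the `person_id` surrogate key used to link the rows is a hypothetical addition, not part of the original document.

```python
import json

DOC = json.loads("""
{ "firstName": "John", "lastName": "Smith", "isAlive": true, "age": 25,
  "height_cm": 167.6,
  "address": { "streetAddress": "21 2nd Street", "city": "New York",
               "state": "NY", "postalCode": "10021-3100" },
  "phoneNumbers": [ { "type": "home", "number": "212 555-1234" },
                    { "type": "office", "number": "646 555-4567" } ] }
""")

def normalise(doc, person_id):
    """Split one nested document into person, address and phoneNumbers rows.

    person_id is a hypothetical surrogate key that links the rows together.
    """
    person = {k: v for k, v in doc.items() if k not in ("address", "phoneNumbers")}
    person["person_id"] = person_id
    address = dict(doc["address"], person_id=person_id)
    phones = [dict(p, person_id=person_id) for p in doc["phoneNumbers"]]
    return person, [address], phones

person, addresses, phones = normalise(DOC, person_id=1)
print(person["lastName"], addresses[0]["city"], len(phones))  # Smith New York 2
```

On a cluster the document form keeps one object on one node, whereas the normalised rows may be spread across nodes and need a join to reassemble.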
MongoDB Example 
Use ESTER for MongoVUE 
What do documents look like?
Graph 
Graphs in SQL (performance is typically poor, especially for deep traversals): 
◦ Nested Sets 
◦ Recursive CTE 
Represents “connected” data 
All about understanding and exploring relationships 
Examples: 
Neo4j, Virtuoso, AllegroGraph. 
[Diagram: nodes Tony, Dave, Fred and Sid, connected by relationships]
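The recursive-CTE approach listed above can be tried with SQLite from Python. A minimal sketch, assuming a hypothetical edges table (Tony knows Dave and Fred, Dave knows Sid, Sid knows Tony); the slide does not say how its four nodes are connected. The depth cap is needed because the cycle back to Tony would otherwise recurse forever, and each extra hop is another self-join, which is where SQL tends to slow down compared with a graph database.

```python
import sqlite3

# Hypothetical relationships between the nodes on the slide
EDGES = [("Tony", "Dave"), ("Tony", "Fred"), ("Dave", "Sid"), ("Sid", "Tony")]

def reachable(edges, start, max_depth=5):
    """Return {node: shortest depth} for every node reachable from start."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE reach(name, depth) AS (
            SELECT ?, 0
            UNION
            SELECT e.dst, r.depth + 1
            FROM reach AS r JOIN edges AS e ON e.src = r.name
            WHERE r.depth < ?   -- depth cap stops endless recursion round cycles
        )
        SELECT name, MIN(depth) FROM reach GROUP BY name ORDER BY name
        """,
        (start, max_depth),
    ).fetchall()
    conn.close()
    return dict(rows)

print(reachable(EDGES, "Tony"))
```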
Examples: 
Cassandra, Druid, HBase 
Column 
◦ A column is stored as a key-value pair: column name (unique), value and timestamp 
◦ Important bit: a column may not appear in every row! 
Column family: a container for columns and rows (like, but not the same as, a relational table) 
◦ Relational table: fixed columns 
◦ Column family: columns determined by the application, so flexible
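A toy model makes the "may not appear in each row" point concrete. A minimal Python sketch, assuming a column family is a map of row key to {column name: (value, timestamp)} and that conflicting writes are resolved by keeping the newest timestamp (last-write-wins, as Cassandra does). The class, row keys and values here are hypothetical.

```python
import time

class ColumnFamily:
    """Toy column family: row key -> {column name: (value, timestamp)}.

    Each row stores only the columns written to it, so rows can differ.
    """

    def __init__(self):
        self.rows = {}

    def put(self, row_key, column, value, ts=None):
        ts = time.time() if ts is None else ts
        row = self.rows.setdefault(row_key, {})
        current = row.get(column)
        # Last-write-wins: keep the cell with the newest timestamp (ties go to the later call)
        if current is None or ts >= current[1]:
            row[column] = (value, ts)

    def get(self, row_key, column):
        cell = self.rows.get(row_key, {}).get(column)
        return None if cell is None else cell[0]

cf = ColumnFamily()
cf.put("tony", "Email", "tonyrogerson@torver.net", ts=1)
cf.put("tony", "PostalTown", "Harpenden", ts=2)
cf.put("tony", "PostalTown", "Luton", ts=1)   # older write, ignored
cf.put("dave", "Nickname", "Dave", ts=1)      # a different set of columns
print(cf.get("tony", "PostalTown"))           # Harpenden
```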
Column storage 
Examples: 
Cassandra, Druid, HBase 
http://www.datastax.com/docs/1.1/ddl/column_family 
Stored as…
SQL Server Columnstore 
Table sliced into rowgroups (a rowgroup is a group of rows, i.e. a batch) 
Each rowgroup is compressed column-wise 
A column segment is the data for one column within a rowgroup 
Each column in the table gets one column segment per rowgroup, which is compressed and written to storage 
SO: a table has rows (sliced into rowgroups); rowgroups have columns (each column stored as a column segment)
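The rowgroup and column-segment layout described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SQL Server's implementation: the rowgroup size of 3 is tiny for readability (SQL Server rowgroups hold up to 1,048,576 rows), run-length encoding stands in for SQL Server's own encodings, and the rows are hypothetical.

```python
from itertools import groupby

def run_length_encode(values):
    """Compress one column segment as [(value, run length), ...]."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def build_columnstore(rows, columns, rowgroup_size):
    """Slice rows into rowgroups, then store each column of a rowgroup as a compressed segment."""
    rowgroups = []
    for start in range(0, len(rows), rowgroup_size):
        batch = rows[start:start + rowgroup_size]
        segments = {col: run_length_encode([row[i] for row in batch])
                    for i, col in enumerate(columns)}
        rowgroups.append(segments)
    return rowgroups

COLUMNS = ("PostalTown", "WaistInches")
ROWS = [("Harpenden", 36), ("Harpenden", 36), ("Harpenden", 38),
        ("Luton", 38), ("Luton", 38)]   # hypothetical rows

for segments in build_columnstore(ROWS, COLUMNS, rowgroup_size=3):
    print(segments)
```

Because each segment holds values from a single column, repeated values sit next to each other and compress well, and a query touching one column only reads that column's segments.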
Demo: SQL Sparse columns
Key Concepts 
SHARDING, PARTITIONING, HASHING
Hashing 
A distributed database cluster has a fixed number of data nodes 
Your data is spread across the database cluster 
◦ 10-node cluster; each data item may reside on 3 nodes 
◦ Which 3 nodes? 
The data key is hashed to a number; the hashing algorithm is deterministic 
data-node = f( data-key ) 
◦ PRINT ABS( CHECKSUM( 'All hale to the ale' ) % 10 ); 
◦ PRINT ABS( CHECKSUM( 'And a glass of wine for the ladies' ) % 10 ); 
(CHECKSUM can return a negative int, so ABS keeps the node number in the range 0 to 9.)
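The same idea in Python, extended to answer "which 3 nodes?". A minimal sketch: MD5 from `hashlib` is used because it gives the same answer on every machine (Python's built-in `hash()` on strings is randomised per process, so it is not deterministic across nodes). Placing the two extra copies on the next nodes round the ring is an assumption, Dynamo-style placement, not something the slide specifies.

```python
import hashlib

def nodes_for_key(key, node_count=10, replicas=3):
    """Map a data key to `replicas` distinct node numbers, deterministically."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    first = int.from_bytes(digest[:8], "big") % node_count
    # Assumption: further copies go on the following nodes round the ring
    return [(first + i) % node_count for i in range(replicas)]

print(nodes_for_key("All hale to the ale"))
print(nodes_for_key("And a glass of wine for the ladies"))
```

Note that changing `node_count` remaps almost every key; schemes such as consistent hashing exist to limit that movement when nodes are added.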
Partitioning 
Chop a big table into "horizontal partitions" 
A partition key is required 
Each partition is self-contained, holding the rows assigned to it by the partitioning key 
All data is accessed through a logical view over all partitions 
Defined on a table-by-table basis
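Range partitioning boils down to a sorted list of boundary values and a binary search. A minimal Python sketch with hypothetical date boundaries: `bisect_right` sends a value equal to a boundary into the partition on its right, similar to a SQL Server RANGE RIGHT partition function (SQL Server numbers partitions from 1; this sketch uses 0-based indexes). The `logical_view()` function stands in for the single view over all partitions.

```python
from bisect import bisect_right

# Hypothetical boundaries for a date-partitioned table: 3 boundaries -> 4 partitions
BOUNDARIES = ["2014-01-01", "2014-05-01", "2014-10-01"]
PARTITIONS = [[] for _ in range(len(BOUNDARIES) + 1)]

def partition_for(key, boundaries=BOUNDARIES):
    """0-based partition number for a partitioning-key value (RANGE RIGHT style)."""
    return bisect_right(boundaries, key)

def insert_row(row):
    PARTITIONS[partition_for(row["RecordedDate"])].append(row)

def logical_view():
    """Every row from every partition, as if it were one table."""
    return [row for part in PARTITIONS for row in part]

insert_row({"FullName": "Tony Rogerson", "WaistInches": 36, "RecordedDate": "2014-01-01"})
insert_row({"FullName": "Tony Rogerson", "WaistInches": 38, "RecordedDate": "2014-10-15"})
print([len(p) for p in PARTITIONS])  # [0, 1, 0, 1]
```

ISO-format date strings sort in date order, which is why plain string comparison works for the boundaries here.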
Shared Nothing 
Partitioning+ 
Each shard is self-contained and has all the procs, metadata and, of course, your partition of data 
The shard key is common to multiple tables, for example CustomerID or Email Address 
Greater autonomy across the distributed database 
Seeing the entire database as a logical unit is more difficult; joining across shards is a nightmare 
Node 1 
Node 2 
Node 3
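Using one shard key for several tables is what keeps joins inside a single node. A minimal Python sketch, assuming three shards (matching Node 1 to Node 3, numbered 0 to 2 here) and hypothetical `customers` and `orders` tables both sharded on CustomerID; `zlib.crc32` is used as a deterministic hash.

```python
import zlib

SHARD_COUNT = 3  # hypothetical: Node 1..3 from the slide, numbered 0..2 here
shards = [{"customers": [], "orders": []} for _ in range(SHARD_COUNT)]

def shard_for(customer_id):
    """Deterministic shard number from the shard key (CustomerID)."""
    return zlib.crc32(str(customer_id).encode("utf-8")) % SHARD_COUNT

def insert(table, row):
    shards[shard_for(row["CustomerID"])][table].append(row)

def customer_orders_local(customer_id):
    """Join customers to orders inside one shard: no other node is contacted."""
    shard = shards[shard_for(customer_id)]
    return [(c["Name"], o["OrderID"])
            for c in shard["customers"] if c["CustomerID"] == customer_id
            for o in shard["orders"] if o["CustomerID"] == customer_id]

insert("customers", {"CustomerID": 42, "Name": "Tony"})
insert("orders", {"OrderID": 1, "CustomerID": 42})
insert("orders", {"OrderID": 2, "CustomerID": 42})
print(customer_orders_local(42))  # [('Tony', 1), ('Tony', 2)]
```

A join on anything other than the shard key (say, orders by product) would have to visit every shard, which is the "nightmare" the slide refers to.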
Sharding Sync 
[Diagram: Node 1, Node 2 and Node 3 kept in sync by replication; labels: full copy of data, subset of data]
ACID (Atomicity, Consistency, Isolation, Durability) 
BASE (Basically Available, Soft-state, Eventually Consistent) 
ACID is a transactional model 
Not specific to relational databases 
◦ e.g. Hive (SQL interface to Hadoop) provides ACID facilities 
Durability: write-ahead logging is expensive (latency from serialising writes) 
Distributed transactions: Two-Phase Commit (2PC) 
◦ Poor scalability because of latency 
◦ ACID across distributed nodes is a bad design choice 
◦ Partition/shard the database and keep ACID in-node only 
[Diagram: 2PC transaction; a coordinator sends an INSERT to two subordinates; all or nothing]
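The coordinator/subordinate exchange in the 2PC diagram can be simulated in-process. A minimal Python sketch with a hypothetical `Participant` class: phase 1 asks every subordinate to prepare and vote, phase 2 commits everywhere only if every vote was yes, otherwise rolls back everywhere. A real node would force its prepared state to its log before voting; that, plus the two network round trips, is the latency the slide warns about.

```python
class Participant:
    """Hypothetical subordinate node holding at most one pending write."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.pending = None
        self.committed = []

    def prepare(self, row):
        # Phase 1: hold the write as pending and vote yes/no
        # (a real node would force this to its log before voting)
        if not self.can_commit:
            return False
        self.pending = row
        return True

    def commit(self):
        self.committed.append(self.pending)
        self.pending = None

    def rollback(self):
        self.pending = None

def two_phase_commit(participants, row):
    """Coordinator: all or nothing across every participant."""
    votes = [p.prepare(row) for p in participants]   # phase 1: prepare / vote
    if all(votes):
        for p in participants:                      # phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:                          # phase 2: roll back everywhere
        p.rollback()
    return False

nodes = [Participant("Subordinate 1"), Participant("Subordinate 2", can_commit=False)]
print(two_phase_commit(nodes, "INSERT 'A'"))  # False: one 'no' vote aborts the lot
```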
ACID (Atomicity, Consistency, Isolation, Durability) 
BASE (Basically Available, Soft-state, Eventually Consistent) 
BASE is a transactional model, loosely speaking 
Specific to the distributed database model 
Basically Available: all or some of the system is available 
Node 1 Node 2 Node 3
ACID (Atomicity, Consistency, Isolation, Durability) 
BASE (Basically Available, Soft-state, Eventually Consistent) 
Soft-state 
Eventually Consistent 
The system may change over time [as replicas become up to date (consistent)] 
Node 1 Node 2 Node 3 
Insert value ‘A’
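The "insert value 'A'" picture can be played out in code. A minimal Python sketch, assuming a write lands on one replica and is copied to the others through an asynchronous queue (the class, key name and single-message delivery step are hypothetical simplifications). Until the queue drains, a read from another node returns stale data; once it drains, all replicas agree.

```python
from collections import deque

class EventuallyConsistentStore:
    """Writes land on one replica and are copied to the others asynchronously."""

    def __init__(self, names):
        self.data = {name: {} for name in names}   # one key/value map per replica
        self.outbox = deque()                      # pending replication messages

    def write(self, node, key, value):
        self.data[node][key] = value
        for other in self.data:
            if other != node:
                self.outbox.append((other, key, value))

    def read(self, node, key):
        return self.data[node].get(key)

    def replicate_one(self):
        """Deliver one pending message; return False when nothing is left."""
        if not self.outbox:
            return False
        node, key, value = self.outbox.popleft()
        self.data[node][key] = value
        return True

store = EventuallyConsistentStore(["Node 1", "Node 2", "Node 3"])
store.write("Node 1", "x", "A")
print(store.read("Node 3", "x"))   # None: Node 3 has not caught up yet
while store.replicate_one():
    pass
print(store.read("Node 3", "x"))   # A: the replicas have converged
```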
SQL 
AH – THE COMMON DENOMINATOR OF AN ACCESS LAYER
What is SQL? 
SQL is NOT a method of storing data! 
SQL is a language; it's just syntax 
Relational theory = thinking in sets 
SQL is a language that follows (but does not strictly obey) relational theory 
With SQL we associate ACID (but durability is now optional in SQL Server 2014, via delayed durability)
NoSQL 
NOT ONLY SQL 
NO SQL
Origins NoSQL? 
The first database called NoSQL (Carlo Strozzi's, 1998) was an open source relational database 
NoSQL (really NoREL) started in the mid-2000s 
Realisation that ACID doesn't scale easily 
Should really be NoACID (the two are mutually exclusive for some 1970s developers) 
Hadoop came out of Yahoo 
Cassandra, Riak and others are derivatives of Amazon's Dynamo 
NoSQL basically means: ACID doesn't scale, SQL is too restrictive, and I'm a developer and I like complexity.
But why the need for “NoSQL”? 
Feb 2001: the "3 Vs" of Big Data (Volume, Velocity, Variety) 
◦ http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf 
Scale-up (SAN) costs too much and doesn't scale well 
Sick of vendor lock-in and its associated costs: open source software running across cheap commodity machines (Redundant Array of Inexpensive Servers) 
Availability and resilience by design, provided by software rather than expensive hardware 
Existing relational databases (with SQL as their only language) are expensive and too slow (ACID) 
BASE v ACID 
SQL implements a rigid and inflexible framework (or does it?)
Eventual Consistency in SQL Server 
Asynchronous Availability Groups/Database Mirroring 
Replication 
Eventual / Causal Consistency 
◦ Eventual: no good for order-specific [and important] transactions 
◦ Like merge replication 
◦ Causal: messages are delivered in the correct order [e.g. Service Broker] 
◦ Like transactional replication
MongoDB – Replica Set 
[Diagram: replica set; ESTER 10.0.0.1 is the primary, ROSIE 10.0.0.2 and HAZEL 10.0.0.3 are secondaries; replication and heart-beat between members] 
$ mongo --host 10.0.0.1 --port 27017 
• 1 primary, multiple secondaries 
• 1 read/write member, multiple readers 
• Setup: 
• Set replication.replSetName in the mongod config file 
• On the primary: 
• rs.initiate() 
• rs.add( "---secondary address" ) 
• rs.add( "---secondary address" ) 
• rs.status()
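As an alternative to calling rs.add() once per secondary, rs.initiate() also accepts a configuration document that lists every member. A minimal Python sketch that builds that document for the three hosts on the slide, assuming the set is named `rsDemo` (the name used on the sharding slide) and every mongod listens on port 27017; the helper name is hypothetical. The printed line could be pasted into the mongo shell on the first member.

```python
import json

def replica_set_config(name, hosts, port=27017):
    """Build the document accepted by rs.initiate(): {_id: <set name>, members: [...]}."""
    return {
        "_id": name,
        "members": [{"_id": i, "host": f"{host}:{port}"} for i, host in enumerate(hosts)],
    }

config = replica_set_config("rsDemo", ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print("rs.initiate(" + json.dumps(config) + ")")
```

Which member ends up primary is decided by election; a member's `priority` setting can be raised if you want to favour one host such as ESTER.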
MongoDB - Sharding 
Shards of data (data chopped up into multiple ranges; which shard holds a document depends on the range it falls in) 
Shards: standalone or replica-set MongoDB instances (data storage) 
Config servers: store configuration information about the shards.
MongoDB – Sharding (with Replica-Set) 
[Diagram: sharded cluster] 
◦ Shard replica set rsDemo (mongod, port 27017): ESTER 10.0.0.1 (primary), ROSIE 10.0.0.2 and HAZEL 10.0.0.3 (secondaries), linked by replication and heart-beat 
◦ Shard replica set rsDemoRS2 (mongod, port 27017): THIRLMERE 10.0.0.13 (primary), ULLSWATER 10.0.0.12 and CONISTON 10.0.0.11 (secondaries), linked by replication and heart-beat 
◦ Config servers (port 27019; hold the shard information and point to the replica sets): DAISY 10.0.0.4, POPPY 10.0.0.5, KARLI 10.0.0.6 
◦ mongos query routers (port 27020) on ESTER, HAZEL and ROSIE; queries go through mongos, and the balancer moves data between shards
NewSQL 
SCALABLE ACID!
Relational Databases catch up 
Maintain ACID 
Aim for the same scalability and performance as NoSQL systems 
Some vendors: Clustrix, MemSQL, NuoDB, VoltDB, Postgres-XL 
Auto-sharding, auto-partitioning 
Queries need to run on the same box as their data to avoid network latency 
http://www.postgres-xl.org/overview/
Summary / Q & A / Discuss


NoSQL, SQL, NewSQL - methods of structuring data.


Editor's Notes

#2: Today's environment is polyglot: it's made up of a number of different database sources and possibly types. In this session we'll look at some of the options for storing data (relational, key/value, document etc.), and I'll give an overview of SQL, NoSQL and NewSQL to give you some context for today's world of data storage.