SlideShare a Scribd company logo
 The name “NoSQL” was in fact first used by Carlo Strozzi in 1998 as the name of file-
based database he was developing. It was a relational database.
 Neal Ford coined the term Polyglot Programming in 2006
 SABRE, a database still used in the airline industry, predates relational databases
(however, they began using a relational database is 2001)
 Ingres and System R were two early relational database prototypes in the 70s
 The term "relational database" was invented by E. F. Codd at IBM in 1970
 Some claim that “NoSQL” now means “Not Only SQL”, and that it isn’t anti-relational
 Rackspace used the term “NoSQL” for a conference in 2009.
 Terms: “Relational Winter” gives way to “Database Thaw” (ok, not a fact, but still fun)
Which Azure Database
Option Should I Choose? Anne Bougie
Senior Software Developer
Concurrency, Inc.
Twitter: @bougiefever
Understanding Database Options
Azure SQL
DocumentDb (Cosmos)
Azure Table Service (Cosmos)
Azure Redis Cache
Hbase
Gremlin (Cosmos)
SQL
NoSQL
Azure Database Options - NoSql vs Sql
Azure Database Options - NoSql vs Sql
SQL NoSQLRelational
Persistent
Transactional
Schema
Data Integrity
Schema-less
Clusters
Scalability
Fast Data Access
Partitions
Concurrency
Sharding
Azure Database Options - NoSql vs Sql
Querying Data
Transactions
Scalability
Configuration/Management
Schema
Speed
SQL NoSQL
SQL NoSQL
SQL NoSQL
Order
Player: Mojo Jojo
In-App Purchase
Order Id: 1234
Financial
Credit Card: 4012888888881881
Expiration: 5/2020
1 Gems 250 $5
2 Potions 10 $10
Order
Player Assets
Line Items
Credit Card
The object-relational impedance
mismatch is a set of conceptual and
technical difficulties that are often
encountered when a relational database
management system (RDBMS) is being
served by an application program (or
multiple application programs) written in
an object-oriented programming
language or style, particularly because
objects or class definitions must
be mapped to database tables defined by
relational schema.
Order
Player: Mojo Jojo
In-App Purchase
Order Id: 1234
Financial
Credit Card: 4012888888881881
Expiration: 5/2020
1 Gems 250 $5
2 Potions 10 $10
Order
Player Assets
Line Items
Credit Card
Order
Player: Mojo Jojo
In-App Purchase
Order Id: 1234
Financial
Credit Card: 4012888888881881
Expiration: 5/2020
1 Gems 250 $5
2 Potions 10 $10
{
orderId: 1234,
player: ‘Mojo Jojo’,
inAppPurchase {
date: 1/1/2017,
items: [
{
objectType: ‘Gem’,
quantity: 250,
priceEach: 0.02
},
{
objectType: ‘Potion’,
quantity: 10,
priceEach: 1.00
}
]
}
…
Document
Column Family
Graph
Key-Value
DocumentDb Azure Table Service
Azure Redis Cache
Azure Hbase
NoSQL
Gremlin
Azure Database Options - NoSql vs Sql
 Fast reads & writes
 Flat data structure
 Schema-less
 Partition key
 Row key
 Time stamp
 Service will scale out using partition
key
 In memory key-value store
 Very fast reads (faster than table
storage),
 Used as a database, cache and
message broker
 Transactions
 Expiration of items
Azure Database Options - NoSql vs Sql
 Stores data in json
 Schema-less
 Stores complex, hierarchical data
 Highly scalable
Azure Database Options - NoSql vs Sql
 Part of the Hadoop eco system
 Commonly augmented with Hive
 Can handle very large amounts of writes in a short period of time
Azure Database Options - NoSql vs Sql
 Nodes and relationships
 Data with many complex relationships
 Typically used to augment the system of record
Document
Column
Family
Key Value
Graph
Azure Database Options - NoSql vs Sql
Consistency
Availability
Partition
Tolerance
SQL
CosmosDB
CosmosDB
Hbase
CosmosDB
Azure Redis Cache
Never Gonna Get It
 What are the relationships
 How much data, and how fast is it coming in
 How the data will be accessed
 Data access performance requirements
 Consistency/Transactional requirements
 Entity complexity
 Programmer skill level
 Lots of data
 Simple data
structure
 Quickly perform
small read and
write operations
 Inexpensive,
fairly simple
 Need to add data
items willy nilly
 Lots of data
 Need high
performance
 More complex
data structure
 Need to add data
items willy nilly
 Lots of data
 Insanely huge
amounts of data
 Need high
performance
 No joins
 Lots of data
 Lots of
connections
between entities
 Quickly changing
relationships
between entities
Key Value Column Family Document Graph
 Azure Redis Cache
 Leaderboards, Shopping Carts
 Latest x items of anything
 Deletes and filters
 Cache
 Azure Table Storage
 Large amounts of data with a flat structure
 Fast querying using the partition and row keys
 DocumentDb
 Product catalogs, gaming, social networking
 Hbase
 Voting, Race, anything with huge amounts of data being generated in huge bursts, telemetry
 Gremlin
 Social networking relationships, anything with complex and changing relationships between
entities
Problem
 Lots of data
 Fairly simple data structure
 Lots of small reads and writes
 Need high performance
 Need high availability
 Need fast searching on columns other
than the key
Solution
 Use Azure Storage
 Augment with Redis Cache for
searching
Problem
 Really need high consistency
 Highly structured data
 Current data is not really large
 Historical data is huge
 And we need to report on historical
data
Solution
 Use Azure Sql
 Archive to DocumentDb
Just cause it’s old, doesn’t mean it’s
not cool anymore
 Slides on GitHub https://github.com/Bougiefever/AzureNoSqlDataPrimer
 A Newbie Guide to Databases https://blog.appdynamics.com/engineering/a-newbie-
guide-to-databases/
 That NoSQL Thing: Column (Family) Databases https://ayende.com/blog/4500/that-
no-sql-thing-column-family-databases
 NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence
http://a.co/1YwtJ47
 On Sharding Graph Databases http://jimwebber.org/2011/02/on-sharding-graph-
databases/
 Azure Cosmos DB Documentation https://docs.microsoft.com/en-us/azure/cosmos-db/
Anne Bougie
anne.bougie@gmail.com
@bougiefever
http://www.bougiefever.com
https://github.com/Bougiefever

More Related Content

PPTX
Webinar: Live Data Visualisation with Tableau and MongoDB
PDF
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
PPTX
Beyond the Basics 3: Introduction to the MongoDB BI Connector
PDF
MongoDB .local Munich 2019: A Complete Methodology to Data Modeling for MongoDB
PDF
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
PPTX
Jumpstart: Introduction to MongoDB
PDF
MongoDB .local Munich 2019: Managing a Heterogeneous Stack with MongoDB & SQL
PDF
Blazing Fast Analytics with MongoDB & Spark
Webinar: Live Data Visualisation with Tableau and MongoDB
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Beyond the Basics 3: Introduction to the MongoDB BI Connector
MongoDB .local Munich 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
Jumpstart: Introduction to MongoDB
MongoDB .local Munich 2019: Managing a Heterogeneous Stack with MongoDB & SQL
Blazing Fast Analytics with MongoDB & Spark

What's hot (20)

PPTX
SlamData - How MongoDB Is Powering a Revolution in Visual Analytics
PDF
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
PDF
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
PDF
Spark and MongoDB
PPTX
CData Data Today: A Developer's Dilemma
PDF
MongoDB Evenings Dallas: What's the Scoop on MongoDB & Hadoop
PDF
MongoDB Launchpad 2016: What’s New in the 3.4 Server
PPTX
Tutorial: Building Your First App with MongoDB Stitch
DOCX
Mongo db report
PPTX
PPTX
Document Model for High Speed Spark Processing
PDF
Mongo db a deep dive of mongodb indexes
PPTX
Jumpstart: Introduction to Schema Design
PPTX
MongoDB and Spark
PDF
Analyze and visualize non-relational data with DocumentDB + Power BI
PDF
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
PPTX
MongoDB and Hadoop: Driving Business Insights
PPTX
Addressing Your Backup Needs Using Ops Manager and Atlas
PPTX
Data Analytics: Understanding Your MongoDB Data
PDF
Give Me My Damn Report: Making NoSQL Data Accessible to the Business
SlamData - How MongoDB Is Powering a Revolution in Visual Analytics
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
Spark and MongoDB
CData Data Today: A Developer's Dilemma
MongoDB Evenings Dallas: What's the Scoop on MongoDB & Hadoop
MongoDB Launchpad 2016: What’s New in the 3.4 Server
Tutorial: Building Your First App with MongoDB Stitch
Mongo db report
Document Model for High Speed Spark Processing
Mongo db a deep dive of mongodb indexes
Jumpstart: Introduction to Schema Design
MongoDB and Spark
Analyze and visualize non-relational data with DocumentDB + Power BI
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB and Hadoop: Driving Business Insights
Addressing Your Backup Needs Using Ops Manager and Atlas
Data Analytics: Understanding Your MongoDB Data
Give Me My Damn Report: Making NoSQL Data Accessible to the Business
Ad

Similar to Azure Database Options - NoSql vs Sql (20)

PDF
[WITH THE VISION 2017] IoT/AI時代を生き抜くためのデータ プラットフォーム (Leveraging Azure Data Se...
PPTX
7 Databases in 70 minutes
PPTX
The Internet as a Single Database
PDF
Big Data Analytics from Azure Cloud to Power BI Mobile
PPTX
Designing big data analytics solutions on azure
PPTX
Modern data warehouse
PDF
Red hatpartner2013edb futureofdatabase
 
PDF
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
PPTX
Evolution of the DBA to Data Platform Administrator/Specialist
PDF
Myth Busters II: BI Tools and Data Virtualization are Interchangeable
PPTX
Data infrastructure architecture for medium size organization: tips for colle...
PDF
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
PPT
Kellogg XML Holland Speech
PDF
Events and metrics the Lifeblood of Webops
PPTX
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
PPTX
Relational databases vs Non-relational databases
PDF
Data Integration through Data Virtualization (SQL Server Konferenz 2019)
PDF
"Building Data Warehouse with Google Cloud Platform", Artem Nikulchenko
PDF
Making the leap to BI on Hadoop by Mariani, dave @ atscale
PDF
Adf walkthrough
[WITH THE VISION 2017] IoT/AI時代を生き抜くためのデータ プラットフォーム (Leveraging Azure Data Se...
7 Databases in 70 minutes
The Internet as a Single Database
Big Data Analytics from Azure Cloud to Power BI Mobile
Designing big data analytics solutions on azure
Modern data warehouse
Red hatpartner2013edb futureofdatabase
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Evolution of the DBA to Data Platform Administrator/Specialist
Myth Busters II: BI Tools and Data Virtualization are Interchangeable
Data infrastructure architecture for medium size organization: tips for colle...
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Kellogg XML Holland Speech
Events and metrics the Lifeblood of Webops
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
Relational databases vs Non-relational databases
Data Integration through Data Virtualization (SQL Server Konferenz 2019)
"Building Data Warehouse with Google Cloud Platform", Artem Nikulchenko
Making the leap to BI on Hadoop by Mariani, dave @ atscale
Adf walkthrough
Ad

Recently uploaded (20)

PPT
Miokarditis (Inflamasi pada Otot Jantung)
PPTX
climate analysis of Dhaka ,Banglades.pptx
PDF
.pdf is not working space design for the following data for the following dat...
PPTX
SAP 2 completion done . PRESENTATION.pptx
PPTX
MODULE 8 - DISASTER risk PREPAREDNESS.pptx
PPT
Reliability_Chapter_ presentation 1221.5784
PDF
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
PPTX
1_Introduction to advance data techniques.pptx
PPTX
Introduction to Knowledge Engineering Part 1
PDF
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
PDF
Clinical guidelines as a resource for EBP(1).pdf
PPTX
Business Ppt On Nestle.pptx huunnnhhgfvu
PPTX
Introduction to Firewall Analytics - Interfirewall and Transfirewall.pptx
PPTX
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
PPTX
Supervised vs unsupervised machine learning algorithms
PDF
Mega Projects Data Mega Projects Data
PPTX
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
PPTX
IB Computer Science - Internal Assessment.pptx
PPTX
Computer network topology notes for revision
Miokarditis (Inflamasi pada Otot Jantung)
climate analysis of Dhaka ,Banglades.pptx
.pdf is not working space design for the following data for the following dat...
SAP 2 completion done . PRESENTATION.pptx
MODULE 8 - DISASTER risk PREPAREDNESS.pptx
Reliability_Chapter_ presentation 1221.5784
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
1_Introduction to advance data techniques.pptx
Introduction to Knowledge Engineering Part 1
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
Clinical guidelines as a resource for EBP(1).pdf
Business Ppt On Nestle.pptx huunnnhhgfvu
Introduction to Firewall Analytics - Interfirewall and Transfirewall.pptx
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
Supervised vs unsupervised machine learning algorithms
Mega Projects Data Mega Projects Data
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
IB Computer Science - Internal Assessment.pptx
Computer network topology notes for revision

Azure Database Options - NoSql vs Sql

  • 1.  The name “NoSQL” was in fact first used by Carlo Strozzi in 1998 as the name of file- based database he was developing. It was a relational database.  Neal Ford coined the term Polyglot Programming in 2006  SABRE, a database still used in the airline industry, predates relational databases (however, they began using a relational database is 2001)  Ingres and System R were two early relational database prototypes in the 70s  The term "relational database" was invented by E. F. Codd at IBM in 1970  Some claim that “NoSQL” now means “Not Only SQL”, and that it isn’t anti-relational  Rackspace used the term “NoSQL” for a conference in 2009.  Terms: “Relational Winter” gives way to “Database Thaw” (ok, not a fact, but still fun)
  • 2. Which Azure Database Option Should I Choose? Anne Bougie Senior Software Developer Concurrency, Inc. Twitter: @bougiefever Understanding Database Options
  • 3. Azure SQL DocumentDb (Cosmos) Azure Table Service (Cosmos) Azure Redis Cache Hbase Gremlin (Cosmos) SQL NoSQL
  • 11. Order Player: Mojo Jojo In-App Purchase Order Id: 1234 Financial Credit Card: 4012888888881881 Expiration: 5/2020 1 Gems 250 $5 2 Potions 10 $10 Order Player Assets Line Items Credit Card The object-relational impedance mismatch is a set of conceptual and technical difficulties that are often encountered when a relational database management system (RDBMS) is being served by an application program (or multiple application programs) written in an object-oriented programming language or style, particularly because objects or class definitions must be mapped to database tables defined by relational schema.
  • 12. Order Player: Mojo Jojo In-App Purchase Order Id: 1234 Financial Credit Card: 4012888888881881 Expiration: 5/2020 1 Gems 250 $5 2 Potions 10 $10 Order Player Assets Line Items Credit Card
  • 13. Order Player: Mojo Jojo In-App Purchase Order Id: 1234 Financial Credit Card: 4012888888881881 Expiration: 5/2020 1 Gems 250 $5 2 Potions 10 $10 { orderId: 1234, player: ‘Mojo Jojo’, inAppPurchase { date: 1/1/2017, items: [ { objectType: ‘Gem’, quantity: 250, priceEach: 0.02 }, { objectType: ‘Potion’, quantity: 10, priceEach: 1.00 } ] } …
  • 14. Document Column Family Graph Key-Value DocumentDb Azure Table Service Azure Redis Cache Azure Hbase NoSQL Gremlin
  • 16.  Fast reads & writes  Flat data structure  Schema-less  Partition key  Row key  Time stamp  Service will scale out using partition key
  • 17.  In memory key-value store  Very fast reads (faster than table storage),  Used as a database, cache and message broker  Transactions  Expiration of items
  • 19.  Stores data in json  Schema-less  Stores complex, hierarchical data  Highly scalable
  • 21.  Part of the Hadoop eco system  Commonly augmented with Hive  Can handle very large amounts of writes in a short period of time
  • 23.  Nodes and relationships  Data with many complex relationships  Typically used to augment the system of record
  • 27.  What are the relationships  How much data, and how fast is it coming in  How the data will be accessed  Data access performance requirements  Consistency/Transactional requirements  Entity complexity  Programmer skill level
  • 28.  Lots of data  Simple data structure  Quickly perform small read and write operations  Inexpensive, fairly simple  Need to add data items willy nilly  Lots of data  Need high performance  More complex data structure  Need to add data items willy nilly  Lots of data  Insanely huge amounts of data  Need high performance  No joins  Lots of data  Lots of connections between entities  Quickly changing relationships between entities Key Value Column Family Document Graph
  • 29.  Azure Redis Cache  Leaderboards, Shopping Carts  Latest x items of anything  Deletes and filters  Cache  Azure Table Storage  Large amounts of data with a flat structure  Fast querying using the partition and row keys  DocumentDb  Product catalogs, gaming, social networking  Hbase  Voting, Race, anything with huge amounts of data being generated in huge bursts, telemetry  Gremlin  Social networking relationships, anything with complex and changing relationships between entities
  • 30. Problem  Lots of data  Fairly simple data structure  Lots of small reads and writes  Need high performance  Need high availability  Need fast searching on columns other than the key Solution  Use Azure Storage  Augment with Redis Cache for searching
  • 31. Problem  Really need high consistency  Highly structured data  Current data is not really large  Historical data is huge  And we need to report on historical data Solution  Use Azure Sql  Archive to DocumentDb
  • 32. Just cause it’s old, doesn’t mean it’s not cool anymore
  • 33.  Slides on GitHub https://github.com/Bougiefever/AzureNoSqlDataPrimer  A Newbie Guide to Databases https://blog.appdynamics.com/engineering/a-newbie- guide-to-databases/  That NoSQL Thing: Column (Family) Databases https://ayende.com/blog/4500/that- no-sql-thing-column-family-databases  NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence http://a.co/1YwtJ47  On Sharding Graph Databases http://jimwebber.org/2011/02/on-sharding-graph- databases/  Azure Cosmos DB Documentation https://docs.microsoft.com/en-us/azure/cosmos-db/ Anne Bougie anne.bougie@gmail.com @bougiefever http://www.bougiefever.com https://github.com/Bougiefever

Editor's Notes

  • #5: What I.m going to cover today is an overview of several of the types of storage options in Azure, how they compare to one another.
  • #6: In the past, before my time even, there were many ways to store data. Punch cards, paper reels, dewey decimal system.
  • #7: In the 80s, Relation Databases became popular, and have remained so, and will continue to be. In fact, RDBMSs have been so popular, that the choice hasn’t been which data storage option to choose, but how are we going to set up our tables, where are the foreign keys, indexes and so on. Talk about paper machine telemetry.
  • #8: Since NoSQL covers such a broad range of data storage types, it really doesn't have a clear-cut definition. Set's start out by listing some of the words we use to describe them. On the SQL side, we talk about schema, table relationships, transactions, enforcing data and referential integrity On the NoSQL side, you hear a lot of chatter about no schemas - this is considered one of the many benefits of NoSQL - Scalability, partitioning - and you may have heard about the fast data access Talk about Schema-less vs schema here!
  • #9: Today, however, with NoSQL offerings, that choice is not as clear cut. Designing the data storage has never been a clear cut, easy task, and it’s even more complex now. So, hopefully I can shed some light on what options there are, and what the trade-offs are between choosing one over another.   Since Relational database have been so popular, let's try to define what NoSQL is. if you starting a new strategic enterprise application you should no longer be assuming that your persistence should be relational. The relational option might be the right one - but you should seriously look at other alternatives.
  • #10: Since NoSQL is not that easy to define, I thought it made sense to compare some of the features against one another. SQL - Bubbles/NoSQL Buttercup Querying - (SQL) very strong query support/joining - you can even join tables in different database - RDBMS is an integration tool NoSQL side - There are tools to help with indexing and querying, but the tools themselves fall short in this area. This may or may not be a problem, depending on your application Transactions - Relational databases are the golden standard. That's not to say that NoSQL doesn't have transactional support, but Bubbles is the clear winner. Redid Cache has no concept of a transaction. Scalability - NoSQL it's one of the biggest reasons No SQL was invented. Sharding is something that is planned for when you first set up your database - you must provide a partition key. Configuration/Management - (NoSQL) Had to go with NoSQL. Adding or dropping fields is as simple as either saving or not saving them in your code. Both Azure SQL and DocumentDb offer geo-redundancy and failover, but DocDb does it more gracefully and without any changes to your apps to reflect the change Schema - If you want strong control over your data at the database level, clearly Bubbles is the winner. However, if you want the flexibility that schema-less has, then Buttercup wins. Talk about schematic vs schema-less Speed - Again, one of the reasons it was invented was to deliver fast speeds when you have lots and lots of data. We all know SQL queries against many millions of rows are slow, even with indexing.
  • #11: Scalability - So - we have our data on our servers in the cloud and we need to scale.   social networks, activity in logs, mapping data.   In SQL Server, our only option, really, is to scale up to a larger, faster server. There are some horizontal scaling options, but it is tricky to set up and maintain. (show) (NoSql) Partitioning and scaling horizontally is very easy to do, and after setting up the partition key, you really don't have to do much at all - That being said, you need to think a lot about what you choose for your partition key, because a wrong choice will mean poor performance. Entities that are retrieved together need to be in the same partition so you don't end up hitting
  • #12: Scalability - So - we have our data on our servers in the cloud and we need to scale.   social networks, activity in logs, mapping data.   In SQL Server, our only option, really, is to scale up to a larger, faster server. There are some horizontal scaling options, but it is tricky to set up and maintain. (show) (NoSql) Partitioning and scaling horizontally is very easy to do, and after setting up the partition key, you really don't have to do much at all - That being said, you need to think a lot about what you choose for your partition key, because a wrong choice will mean poor performance. Entities that are retrieved together need to be in the same partition so you don't end up hitting
  • #13: When object-oriented program is persisted to a RDBMS (relational database management system) A cohesive set of data in your program gets persisted to several different tables ORMs created to handle this problem Order – we think of an order as one thing, but Order/LineItem – when we save it, we need to split it out
  • #14: When object-oriented program is persisted to a RDBMS (relational database management system) A cohesive set of data in your program gets persisted to several different tables ORMs created to handle this problem Order – we think of an order as one thing, but Order/LineItem – when we save it, we need to split it out
  • #15: Talk about schema/not lack of stucture
  • #16: Document: DocumentDb/MongoDb/CouchDb Json – Complex hierarchical structures – can pull out parts of document ------ KeyValue: Azure Table Storage/SimpleDb/DynamoDb/Riak Simpleist – Glorified Dictionary/Hashmap Redis Cache/Redis/Memcached ColumnFamily: Azure Hbase/Apache Hbase/Cassandra (Apache)/BigTable Graph: Neo4J/OrientDb
  • #17: Azure Table Storage/Azure Redis Cache/DynamoDb/Riak KeyValue: Azure Table Storage/SimpleDb/DynamoDb/Riak Simpleist – Glorified Dictionary/Hashmap No set schema – need structure – Implicit Schema to get info out and to deserialize Partition along partition key Azure Storage - Retrieve data with partition key and row key Redis Cache – Retrieve data with key
  • #18: 500 TB 1MB each row 252 properties each table Edm.Binary. Edm.Boolean. Edm.DateTime. Edm.Double. Edm.Guid. Edm.Int32. Edm.Int64. Edm.String. PartitionKeys and RowKeys Drive Performance and Scalability
  • #20: Memcached
  • #21: CouchDb,MongoDb Similar to key/value, but may store complex objects Documents in a document store are roughly equivalent to the programming concept of an object.  No set schema – need structure – Implicit Schema to get info out and to deserialize
  • #22: Document size 2MB Types of data: Json
  • #23: ColumnFamily: Azure Hbase/Apache Hbase/Cassandra (Apache)/BigTable In column-oriented NoSQL database, data is stored in cells grouped in columns of data rather than as rows of data. Columns are logically grouped into column families. Column families can contain a virtually unlimited number of columns that can be created at runtime or the definition of the schema. Read and write is done using columns rather than rows Instead of calling them tables, they are called columns.  Unlike a table, however, the only thing that you define in a column family is the name and the key sort options (there is no schema). A value is a tuple consisting of a name, a value, and a timestamp. More complicated Row key – can store multiple column families – columns that fit together Retrieve by row key and column family name
  • #24: Open source Built on Hadoop – modeled after Google’s BigTable framework that supports the processing and storage of extremely large data sets in a distributed computing environment Hive provides a database query interface to Apache Hadoop. Imagine you had a file that was larger than your PC's capacity. You could not store that file, right? Hadoop lets you store files bigger than what can be stored on one particular node or server. So you can store very, very large files. It also lets you store many, many files.“ Move data to server
  • #25: Node and relationships – relationships are first-class citizens of db Inbound/outbound - Movie Club
  • #26: Customer with orders – products Find all customer’s who’ve ordered a particular product – lots of work in traditional db In graph – you have an order node, look at inbound relationship, connect to customer nodes Typically used
  • #29: Strong consistency is only guaranteed within a geo region, across geo regions the consistency guarantees are weaker. So the CAP statement theorem is only overcome within a certain geographic scope. Consistent: Every read receives the most recent write Available: Every request receives a response, but it may not be the most recent Partition Tolerance: The system continues to operate despite dropped requests between nodes
  • #30: Relationships: Data that has a lot of relationships don’t do as well in relational – Graph can get data with lots of connections faster than lots of joining Scaling needs: FB – Column family database Complexity – DocumentDb with json especially if you are going to serialize to json anyway ORM has helped a lot – performance is often a problem
  • #31: CFDB is what happens when you take a database, strip everything away that make it hard to run in on a cluster and see what happens. (Ayende)
  • #33: It always sounds easy - "use the best tool for the job". With very isolated systems, it's easy to decide RDBMS for one application, Redis for another and Cassandra for somethings else. When it comes time to building systems with multiple persistent stores, we're met with challenges in integration, existing applications, and push back from IT administrators.
  • #36: https://vimeo.com/68320412 https://martinfowler.com/bliki/PolyglotPersistence.html