Database Systems
A Historical Perspective
Károly Kálmán
June 22, 2023
Topics Covered
Historical databases
Relational databases
Non-relational databases
Future directions
Historical Databases
Historical Databases (No Database)
All data is stored in memory
It's a start
✔ Fast
✔ Store anything in any format
✖ No persistent and durable storage
Historical Databases (Flat File)
Ted Scott ▫ $100 ▫ Apple ☷ Ai Joe ▫ $900 ▫ Peach ☷
◺       ◿ ↑              ↑
  field   │              │
  value   │              └─ record separator
          └─ field separator
✔ Persistent
✔ Store anything (records can be different)
✖ Low-level access, programmer needed
✖ Complex queries are hard and slow
➤ Today: for small data sets in some domains
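To illustrate the "low-level access, programmer needed" point, here is a minimal Python sketch that parses the example records above. The separator characters are the ones drawn on the slide; the function name `parse_flat_file` and the "spent more than $500" query are hypothetical, chosen only for illustration.

```python
# Separators as drawn on the slide; any reserved characters would do.
FIELD_SEP = "▫"
RECORD_SEP = "☷"


def parse_flat_file(raw: str) -> list[list[str]]:
    """Split raw flat-file text into records, each a list of field values."""
    records = []
    for chunk in raw.split(RECORD_SEP):
        if not chunk.strip():
            continue  # skip the empty tail after the last record separator
        records.append([field.strip() for field in chunk.split(FIELD_SEP)])
    return records


raw = "Ted Scott ▫ $100 ▫ Apple ☷ Ai Joe ▫ $900 ▫ Peach ☷"
records = parse_flat_file(raw)

# Even a simple query is a hand-written scan over every record.
big_spenders = [r[0] for r in records if int(r[1].lstrip("$")) > 500]
```

Every new question needs another hand-written loop like the last line, which is why complex queries over flat files are hard and slow.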
Historical Databases (Hierarchical)
CTO
╱ ╲
Head1 Head2
╱ ╲
Mngr1 Mngr2
✔ Defined structure
✔ Faster than flat file
✖ Navigation through the hierarchy only (up-down)
✖ "Programmer perspective" needed
➤ Today: LDAP, Active Directory
Historical Databases (Navigational)
John ── Alice ── Maggie Rob
│ │ │ │
Richard ── Scott Susie ── Nancy
✔ Relaxed navigation
✔ Very fast
✖ Still pre-determined navigation (no ad hoc queries)
✖ "Programmer perspective" needed
➤ Today: IBM Information Management System v15
Relational Databases
Relational Database Management System (RDBMS)
E. F. Codd in 1970 (IBM)
Relational model of data
Based on formal (math) rules
Optimal database design (normal forms, NF)
Data access optimization
User friendly
Very popular
MySQL, Oracle, MS SQL, Sybase, MS Access, etc.
RDBMS Concepts (Table)
Database = tables + table cross-reference + keys
RDBMS Concepts (Keys)
Primary Key (PK) is a unique identifier for an entity
Keys are needed to make relations
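A small sketch of how keys link tables, using Python's built-in sqlite3 module and an in-memory database. The `children` table, its columns and the sample rows are assumptions made for illustration; they are not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE families (id INTEGER PRIMARY KEY, s_name TEXT);
    CREATE TABLE children (
        id INTEGER PRIMARY KEY,
        name TEXT,
        family_id INTEGER REFERENCES families(id)  -- foreign key to families
    );
    INSERT INTO families VALUES (1, 'Zimmer');
    INSERT INTO children VALUES (10, 'Anna', 1);
""")

# The relation is followed by matching the foreign key to the primary key.
rows = conn.execute(
    "SELECT c.name, f.s_name FROM children c JOIN families f ON f.id = c.family_id"
).fetchall()

# A child pointing at a non-existent family is rejected by the database.
try:
    conn.execute("INSERT INTO children VALUES (11, 'Ben', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```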
Structured Query Language (SQL)
SQL = Structured Query Language
ANSI Standard
Declarative language
Focus on what to do, not how to do it
User friendly
Abstractions for non-programmers
English-like language
Pure SQL applications (MS Access)
Not fancy, but no programming needed
Structured Query Language (Table Operations)
Create table
CREATE TABLE families (f_name char, s_name char, child_count int);
Modify table (add column)
ALTER TABLE families ADD child_name char;
Delete table
DROP TABLE families;
DROP TABLE = NoSQL :)
Structured Query Language (Data Operations)
Insert new data
INSERT INTO families VALUES ('Philip', 'Zimmer', 3);
Query for data
SELECT f_name, s_name FROM families WHERE child_count > 2;
Modify existing data
UPDATE families SET f_name = 'Jonas' WHERE f_name = 'Jhn';
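The data operations above can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: it uses a throwaway in-memory SQLite database, it assumes the table has a numeric `child_count` column (the one the SELECT filters on), and the second sample row is made up so that the UPDATE has something to change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Assumed schema: the third column is the number of children.
cur.execute("CREATE TABLE families (f_name TEXT, s_name TEXT, child_count INTEGER)")

# Insert: placeholders (?) keep the values separate from the SQL text.
cur.executemany(
    "INSERT INTO families VALUES (?, ?, ?)",
    [("Philip", "Zimmer", 3), ("Jhn", "Vogler", 1)],  # second row is made up
)

# Modify: correct the misspelled first name.
cur.execute("UPDATE families SET f_name = ? WHERE f_name = ?", ("Jonas", "Jhn"))

# Query: families with more than two children.
cur.execute("SELECT f_name, s_name FROM families WHERE child_count > 2")
large_families = cur.fetchall()

cur.execute("SELECT f_name FROM families ORDER BY f_name")
first_names = [row[0] for row in cur.fetchall()]
conn.close()
```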
Structured Query Language (Transaction)
Transaction
Multiple operations treated as a single unit of work
Either all operations succeed or all fail
Example
BEGIN TRANSACTION;
INSERT INTO families VALUES ('Philip', 'Zimmer', 3);
INSERT INTO families VALUES ('Hans', 'Vogler', 347);
COMMIT;
ACID Model
ACID defines who sees what changes and when
ACID transaction control properties
Atomic: all operations succeed, or all are rolled back to the state before
Consistent: database is in a correct state when the transaction finishes
Isolated: transactions do not disturb or affect one another
4 isolation levels (speed vs consistency)
Durable: committed results are permanent, even after a failure
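The "Atomic" property can be observed with a small sqlite3 sketch. It assumes, for illustration only, that `id` is declared as a primary key so that the second insert fails and takes the first one down with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption for this sketch: id is a primary key, so duplicates are rejected.
conn.execute("CREATE TABLE families (f_name TEXT, s_name TEXT, id INTEGER PRIMARY KEY)")

try:
    # `with conn` commits if the block succeeds and rolls back on an exception.
    with conn:
        conn.execute("INSERT INTO families VALUES ('Philip', 'Zimmer', 3)")
        # Duplicate primary key: this statement fails...
        conn.execute("INSERT INTO families VALUES ('Hans', 'Vogler', 3)")
except sqlite3.IntegrityError:
    pass  # ...and the first insert is rolled back together with it

row_count = conn.execute("SELECT COUNT(*) FROM families").fetchone()[0]
```

After the failed transaction the table is empty again: either both rows are stored or neither is.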
Typical 3-tier System Architecture
RDBMS Drawbacks
Scaling is hard (ACID)
Expensive
'Free' solutions are not mature for 9...9%
Non-structured data is hard to store
NoSQL to the rescue
For the majority of uses, an RDBMS is enough
Distributed Databases
Multiple database servers
Data duplicated
Performance/availability increases
But complexity increases too
Distributed Databases (Replicas)
Master-Slave
Master serves r/w and replicates data to slaves
Slaves serve reads only
Master-Master
Multiple masters that serve r/w
Replication between masters
Distributed Databases (Sharding and Federation)
Sharding
Break data into smaller chunks by key
Store chunks on different servers
Federation
Databases by domain functions
No single monolith database
Query impact (linking tables)
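A minimal sketch of key-based sharding in Python. The server names and the hash-modulo routing rule are assumptions for illustration; real systems often use range-based or consistent-hash routing instead.

```python
import hashlib

SHARDS = ["db-0", "db-1", "db-2"]  # hypothetical server names


def shard_for(key: str) -> str:
    """Map a key to a shard using a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# Each shard holds only its own chunk of the data.
storage = {name: {} for name in SHARDS}
for s_name in ["Zimmer", "Vogler", "Scott", "Legend"]:
    storage[shard_for(s_name)][s_name] = {"s_name": s_name}
```

A lookup by key touches exactly one shard, but a query that links rows on different shards has to visit several servers, which is the query impact noted above.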
Data Warehouse
Current and historical data
Store structured data (schema)
Query-focused (business analytics)
Large and central data store
Data Mart
Specific views by business departments
Based on data warehouse
Multiple data marts, not a single monolith
More summarized than data warehouse
Data Lake
Central location for all data
Store raw data (no schema)
Purpose of data is not defined
Data science
Data Pipeline
Process to move or transform data between systems
...
Data Mesh
Architectural pattern
Data ownership and distribution
Analytical data (optimizing the business)
Historical and aggregated view
Operational data (running the business)
Current and transactional state
Non-Relational Databases
(NoSQL)
NoSQL Databases
Schema/structure definition is optional
Store anything (mix data in collections)
Need to know major use cases before design
Performance
Very good for expected use cases
Bad for unexpected use cases
Varied transaction support (eventual consistency, quorum)
Query language complexities
Scalable distributed systems
Consistency Models
When a reader sees a change made to the system
Weak
Reader may or may not see the change (possibly never)
Eventual
Reader will see the change eventually
Strong
Reader sees the change immediately
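The difference between strong and eventual reads can be sketched with a toy primary that replicates asynchronously. The class and method names are hypothetical, and replication is triggered by hand here rather than by a background process.

```python
class Replica:
    def __init__(self):
        self.data = {}


class Primary:
    """Accepts writes and ships them to replicas later (asynchronously)."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = []  # writes not yet applied on the replicas

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Apply all pending writes to every replica, in order."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


replica = Replica()
primary = Primary([replica])
primary.write("color", "Black")

strong_read = primary.data.get("color")    # reading the primary: change is visible
stale_read = replica.data.get("color")     # replica has not caught up yet
primary.replicate()
eventual_read = replica.data.get("color")  # visible once replication has run
```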
CAP Theorem
Eric Brewer (~1998; presented at PODC 2000)
CAP theorem (Reliability)
Consistency: a read receives the most recent data or an error
Availability: a request receives a (non-error) response, possibly with old data
Partition tolerance: system keeps operating when the network is unreliable
Choose two (but P is a must in practice)
Some systems support configurable CAP modes
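A toy sketch of the trade-off during a network partition, assuming a single node with a configurable mode. The `Node` class and the "CP"/"AP" setting are hypothetical and only illustrate the two choices; they do not model any particular product.

```python
class Node:
    """A single node whose behaviour under partition depends on its mode."""

    def __init__(self, mode: str):
        self.mode = mode          # "CP" or "AP"; a hypothetical setting
        self.value = "old"        # last value this node managed to store
        self.partitioned = False  # True while cut off from the other nodes

    def read(self) -> str:
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than risk returning stale data.
            raise RuntimeError("unavailable during partition")
        # Availability first: answer with whatever is stored locally.
        return self.value


cp_node, ap_node = Node("CP"), Node("AP")
cp_node.partitioned = ap_node.partitioned = True

ap_answer = ap_node.read()  # responds, possibly with old data
try:
    cp_node.read()
    cp_answer = "ok"
except RuntimeError:
    cp_answer = "error"     # an error instead of possibly stale data
```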
BASE Model
Similar to ACID, but for NoSQL
BASE model properties
Basically Available: system guarantees availability
Soft state: system state may change over time, even with no input
Eventual consistency: system becomes consistent over time if no new input is received
NoSQL Databases (Historical)
XML Database
Wasteland
Object Store
Programmer's database
NoSQL Databases (Key-Value)
123 ↠ firstName = "Arthur" ⌁ surName = "Legend"
8874 ↠ color = "Black" ⌁ make = "Ford"
Very Fast
Simple to use
Access by keys only
Caches (Infinispan, Redis, Memcached, Ignite, etc.)
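The access pattern can be sketched with a plain Python dictionary standing in for the store. This is an illustration only and does not use the API of any of the products listed.

```python
# A dict stands in for the key-value store: opaque values looked up by key.
store: dict[int, dict[str, str]] = {}

store[123] = {"firstName": "Arthur", "surName": "Legend"}
store[8874] = {"color": "Black", "make": "Ford"}

arthur = store.get(123)  # fast: direct lookup by key

# No secondary index: finding "everything made by Ford" scans every value.
ford_keys = [key for key, value in store.items() if value.get("make") == "Ford"]
```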
NoSQL Databases (Document I.)
Store JSON structured data
Documents can have different fields
{ ⌲ Document 1 Start
name: { ⌲ Complex field
first: "John" ⌲ Simple field
last: "Dee"
}
birth: "2/2/1982" ⌲ Document 1 field (only)
} ⌲ Document 1 End
{ ⌲ Document 2 Start
fullName : ⌲ Simple field
"James Doe"
} ⌲ Document 2 End
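Because documents in one collection can have different fields, reading code has to cope with every shape. A minimal Python sketch, with the two documents above held as dictionaries; the helper `display_name` is hypothetical.

```python
import json

documents = [
    {"name": {"first": "John", "last": "Dee"}, "birth": "2/2/1982"},
    {"fullName": "James Doe"},
]


def display_name(doc: dict) -> str:
    """Return a printable name whichever shape the document uses."""
    if "fullName" in doc:
        return doc["fullName"]
    name = doc.get("name", {})
    return f"{name.get('first', '')} {name.get('last', '')}".strip()


names = [display_name(doc) for doc in documents]

# Filtering on a field that only some documents have.
with_birth = [doc for doc in documents if "birth" in doc]

# Documents are stored and exchanged as JSON text.
serialized = json.dumps(documents[0])
```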
NoSQL Databases (Document II.)
Effective document (text) store
Free-text search engine
Documents are JSON based
Various query formats
Varied transaction support (single doc.)
Couchbase, Elasticsearch, MongoDB, etc.
NoSQL Databases (Wide Column I.)
Rows (keys) with many (~1000) columns
Write optimized (call logs, bank transactions, etc.)
SQL-like query language
Limited ACID support
Heavyweight systems
HBase, Cassandra, etc.
NoSQL Databases (Wide Column II.)
NoSQL Databases (Graph I.)
Based on directed graph
Nodes, properties and relations
Replacement for complex relational models
High-level query language
ACID transactions
Neo4j (Cypher), GraphDB (SPARQL), etc.
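A rough sketch of what a graph query does underneath: follow relations of one type from a start node. The edges, relation names and the `reachable` helper are made up for illustration; a graph database would express this in a query language such as Cypher rather than in application code.

```python
from collections import deque

# Directed edges: (source, relation, target). Names are made up.
edges = [
    ("John", "KNOWS", "Alice"),
    ("Alice", "KNOWS", "Maggie"),
    ("John", "MANAGES", "Richard"),
]


def reachable(start: str, relation: str) -> list[str]:
    """All nodes reachable from start by following edges of one relation type."""
    adjacency: dict[str, list[str]] = {}
    for src, rel, dst in edges:
        if rel == relation:
            adjacency.setdefault(src, []).append(dst)
    seen: list[str] = []
    queue = deque([start])
    while queue:  # breadth-first traversal
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen and nxt != start:
                seen.append(nxt)
                queue.append(nxt)
    return seen


knows = reachable("John", "KNOWS")
manages = reachable("John", "MANAGES")
```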
NoSQL Databases (Graph II.)
NoSQL Databases (Time I.)
Data points (measurements) over a time interval
Regular intervals (metrics)
Irregular intervals (events)
Data is more useful as aggregate (continuous queries)
SQL-like query language with time-related additions
No transaction concept
PK is a high-precision timestamp
Data modification is rare (append only)
InfluxDB, Kdb+, Prometheus, etc.
NoSQL Databases (Time II.)
Example measurement:
weather,location=us-midwest temperature=82 144488740
   │    ─────────┬───────── ──────┬───────     │
measurement     tag             field      timestamp
measurement ≈ table
tag ≈ indexed field
field ≈ not indexed field
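The four parts of the example measurement can be pulled apart with a few lines of Python. This is a simplified sketch: the function `parse_point` is hypothetical, it assumes numeric field values, and it ignores escaping and quoted strings, so it is not a full parser for any real line format.

```python
def parse_point(line: str) -> dict:
    """Split 'measurement,tag=v field=v timestamp' into its four parts."""
    head, fields_part, timestamp = line.split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in fields_part.split(","):
        key, value = pair.split("=", 1)
        fields[key] = float(value)  # assumption: field values are numeric
    return {
        "measurement": measurement,
        "tags": tags,
        "fields": fields,
        "timestamp": int(timestamp),
    }


point = parse_point("weather,location=us-midwest temperature=82 144488740")
```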
NoSQL Databases (Computing Grid)
Calculations performed in a computing grid
Move program logic to data, not the other way around
Ignite, Infinispan, etc.
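"Move program logic to data" can be sketched as follows: each node summarizes its own partition, and only the small summaries travel. The node names and values are made up, and the nodes are simulated in one process; this is not the API of Ignite or Infinispan.

```python
# Data partitions as they would sit on two grid nodes (made-up values).
partitions = {
    "node-a": [82, 79, 91],
    "node-b": [68, 75],
}


def local_summary(values: list[int]) -> tuple[int, int]:
    """Runs where the data lives; returns a small (sum, count) pair."""
    return sum(values), len(values)


# Only the per-node summaries are sent back, never the raw values.
partials = [local_summary(values) for values in partitions.values()]
total, count = map(sum, zip(*partials))
average = total / count
```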
NoSQL Drawbacks
Operational/developer experience needed
Complex Infrastructure
Planned usage drives database design
Data de-normalization might be needed (!)
ACID/BASE compliance varies
Complex queries can be hard
Large distributed systems are always in a state of partial failure
>> Distributed systems are hard <<
NoSQL and RDBMS
Term          NoSQL     RDBMS
Consistency   Weak      Strong
Performance   Varies    Good
Language      Custom    SQL
DevOps        Complex   Simpler
Node Count    >3        1
Scalability   Good      Poor
>> Use whatever is best for the problem <<
Popular Database Combinations (2019)
A Distributed System Example
Future Directions
Cloud (hosted database)
Hybrid (multiple NoSQL modes)
NewSQL (SQL ↔ NoSQL convergence)
NIC/RDMA (cross memory access)
RAM Store (very fast)
FPGA (custom hardware)
Thank You!
Questions?
github.com/kk-sw
slideshare.net/kksw1/presentations
linkedin.com/in/károly-k-java/
kksw.nhely.hu
