Understanding
Data
Shahd Salama
Software Engineer
Outline (Part 1)
• What are structured, unstructured, and semi-structured data?
• DBMS (Database Management System)
• ACID properties
• Distributed DB
• Replicas and sharding
• What is NoSQL?
• When to use NoSQL and when not to use it?
Outline (Part 2)
• GCP options for database
• What are managed cloud services?
• Cloud Storage
• Cloud SQL
• Cloud Datastore (NoSQL)
• Cloud Bigtable (NoSQL)
Structured data
• Can be stored in DB in tables (rows and columns)
• Tables have relational keys
• High degree of organization
• Database contains a schema
• The schema defines the tables, their fields, and the relationships between them
This ERD represents a one-to-many relation
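A minimal sketch of a schema with a one-to-many relation, using Python's built-in sqlite3 module. The `customer` and `customer_order` tables and their columns are hypothetical names chosen for illustration; they are not taken from the slide's diagram.

```python
import sqlite3

# Hypothetical schema: one customer has many orders (one-to-many).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        item        TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO customer_order (customer_id, item) VALUES (?, ?)",
    [(1, "book"), (1, "pen")],
)

# The relational key lets the database join the two tables.
rows = conn.execute("""
    SELECT c.name, o.item
    FROM customer AS c JOIN customer_order AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()
print(rows)
```

The foreign key on `customer_order.customer_id` is what makes the relation one-to-many: each order points at exactly one customer, while a customer can appear in many orders.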
Unstructured data
• Data has no structure
• Data has no model
• Does not fit naturally into relational tables
• Can be represented as a sequence of bytes
• Examples: images, videos, files, and emails
NLP, text analysis and data mining provide
methods to find patterns of info in unstructured
data
Unstructured data
Semi-structured data
• It is a form of structured data that does not follow a fixed data model
• Tags are used to identify certain elements
• Data does not have a rigid structure
• Items that belong to the same class can
have different attributes
• Data does not fit neatly into tables
• Examples: JSON, XML
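A small sketch of what "same class, different attributes" can look like in JSON, parsed with Python's standard json module. The two user records and their field names are made up for illustration.

```python
import json

# Two hypothetical "user" records of the same class with different attributes.
raw = """
[
  {"name": "Sara", "email": "sara@example.com"},
  {"name": "Omar", "phones": ["555-0100", "555-0101"], "address": {"city": "Cairo"}}
]
"""
users = json.loads(raw)

# Keys act as the tags that identify each element; there is no fixed column list.
for user in users:
    print(user["name"], sorted(user.keys()))

# The union of attributes across records is larger than any single record.
all_keys = sorted({key for user in users for key in user})
print(all_keys)
```

Forcing these records into one table would require nullable columns for every optional attribute, plus extra tables for the nested list and object.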
Semi-structured data
DBMS
• A DBMS makes it possible for end users to create,
read, update and delete data in a database
• The DBMS essentially serves as an interface between
the database and end users or application programs
DBMS
Transactions in DBMS
A transaction is a group of tasks
Example:
A bank employee wants to transfer $500 from account
A to account B
Transactions in DBMS
open_account(A)
old_balance = A.balance
new_balance = old_balance - 500
A.balance = new_balance
close_account(A)
open_account(B)
old_balance = B.balance
new_balance = old_balance + 500
B.balance = new_balance
close_account(B)
Example
A transaction in a DB must maintain
the ACID properties
ACID
• Atomicity
• Consistency
• Isolation
• Durability
Atomicity
• A transaction must be treated as an atomic unit
• Either all of its operations are executed or none are
• In the bank transfer example, the transaction cannot be
partly executed: for example, it cannot subtract the
transferred money from the first account and
then fail to add it to the other account
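The bank transfer example can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not a production design: the `account` table, the starting balances, and the `fail_midway` flag that simulates a crash are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 200)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            if fail_midway:
                raise RuntimeError("simulated crash between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
    except RuntimeError:
        pass  # the partial debit has already been rolled back

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

transfer(conn, "A", "B", 500, fail_midway=True)
after_failure = balances(conn)   # unchanged: neither update took effect
transfer(conn, "A", "B", 500)
after_success = balances(conn)   # both updates took effect together
print(after_failure, after_success)
```

The failed attempt leaves both balances untouched, which is the "all or none" behaviour described above.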
Consistency
• A transaction either creates a new and valid state of
data, or, if any failure occurs, returns all data to its
state before the transaction was started
• If the database was consistent before the transaction, it
must be consistent after the transaction
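One way a database keeps data in a valid state is by rejecting changes that break a declared rule. The sketch below uses sqlite3 with a hypothetical "no overdrafts" rule expressed as a CHECK constraint; the table and the rule are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # hypothetical rule: no overdrafts
)
conn.execute("INSERT INTO account VALUES ('A', 300)")
conn.commit()

try:
    with conn:
        # Withdrawing 500 would leave -200, which violates the CHECK constraint.
        conn.execute("UPDATE account SET balance = balance - 500 WHERE name = 'A'")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

balance = conn.execute("SELECT balance FROM account WHERE name = 'A'").fetchone()[0]
print(rejected, balance)
```

The invalid update is refused and the stored balance stays at its previous valid value.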
Isolation
• If more than one transaction is executed at the same
time, each should run in total isolation from the others
Isolation
• Example:
One day you go to a restaurant that has
no other customers. You place an order and
you get what you ordered.
The next day you go to the same restaurant and it is
full. You place the same order, and you should get the
same result as the day before.
Isolation
In other words, you should get the
same result whether or not other
orders are being placed in
the restaurant
Isolation
So if you execute T1 alone, you get
the same result as when you execute T1 while T2 is
executing
Durability
• Guarantees that transactions that have committed will
survive permanently
• Committed changes to the DB must persist
regardless of hardware or software failure
• Durability can be achieved by flushing the
transaction's log records to non-volatile
storage before acknowledging commitment.
Distributed database
• It is a database in which storage devices are not all attached to a
common processor.
• It may be stored on multiple computers located in the same
physical location, or dispersed over a network of
interconnected computers
Distributed Database Management System (DDBMS)
• A DDBMS synchronizes all data periodically and ensures that any
changes (updates, deletes, and additions)
performed on data in one place are automatically
reflected in the data stored elsewhere
• The goal is that every user sees data consistent with the data seen by
other users
Replicas
• Replication is the frequent copying of data in a DB from one computer or
server to another
• The aim is that all users share the same level of information
• The result is a distributed DB in which portions of the DB are
stored in multiple physical locations and processing is
distributed among different nodes
Sharding
• Database sharding is the partitioning of data in a DB
• Each individual portion is referred to as a shard
• It breaks a DB into much smaller DBs
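A minimal sketch of one common way to decide which shard holds a record: hash the record's key and take the result modulo the number of shards. The shard count, the `user:N` keys, and the use of plain dicts as stand-ins for the smaller databases are all assumptions for illustration.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a record key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Each shard is modeled as a plain dict standing in for a smaller database.
shards = [dict() for _ in range(NUM_SHARDS)]

def put(key: str, value: dict) -> None:
    shards[shard_for(key)][key] = value

def get(key: str):
    return shards[shard_for(key)].get(key)

for user_id in ("user:1", "user:2", "user:3", "user:4", "user:5"):
    put(user_id, {"id": user_id})

print([len(s) for s in shards])  # how many records landed on each shard
print(get("user:3"))
```

Modulo hashing is simple, but changing the number of shards moves most keys to a different shard; schemes such as consistent hashing are often used to reduce that movement.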
Eventually consistent data
• Eventual consistency is a consistency model used in
distributed computing to achieve high availability that
informally guarantees that, if no new updates are made to a
given data item, eventually all accesses to that item will
return the last updated value (consistency is reached
later)
• Eventually-consistent services are often classified as
providing BASE (Basically Available, Soft state, Eventual
consistency) semantics, in contrast to traditional ACID
(Atomicity, Consistency, Isolation, Durability) guarantees.
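A toy model of eventual consistency, written for illustration only. The class and method names are hypothetical, and real systems propagate updates over a network rather than through an explicit `sync()` call.

```python
from collections import deque

class Replica:
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """Toy model: writes land on a primary and reach replicas only when sync() runs."""

    def __init__(self, num_replicas: int = 2):
        self.primary = Replica()
        self.replicas = [Replica() for _ in range(num_replicas)]
        self.pending = deque()  # updates not yet propagated

    def write(self, key, value):
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read(self, replica_index: int, key):
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # Apply queued updates, in order, to every replica.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value

store = EventuallyConsistentStore()
store.write("x", 1)
stale = store.read(0, "x")   # the replica has not seen the write yet
store.sync()
fresh = store.read(0, "x")   # after propagation, every replica returns the last value
print(stale, fresh)
```

Between the write and the sync, a reader can see old data; once no new updates arrive and propagation finishes, all replicas agree.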
CAP for distributed DB
• A system made up of multiple nodes (scaled out)
communicating with each other over a network
CAP Theorem
• Consistency: if you write to one node and read from
another node, you get what you wrote
  ▪ Data is consistent across all nodes
  ▪ All nodes see the same most recent writes
• Availability: when you talk to a node it will respond,
without a guarantee that the response reflects the most recent write
• Partition tolerance: when the network is partitioned or
a node fails, the system continues to work
The CAP theorem says that
a distributed system can guarantee at most two
of these three properties at the same time
No-SQL
A NoSQL (originally referring to "non SQL" or "non-
relational") database provides a mechanism for storage
and retrieval of data that is modeled in means other
than the tabular relations used in relational databases.
No-SQL
A better way to describe NoSQL is "Not Only SQL",
because some NoSQL databases also support SQL-like queries
Types and examples of NoSQL databases
NoSQL databases typically do not support
• Joins
• Constraints (example: NOT NULL constraints)
• Complex transactions
• Data integrity enforcement
Example of a multi-step transaction that is hard to express:
▪ Insert 3 records
▪ Update 2 records
▪ Check a condition; if it is true, roll back
NoSQL databases typically do not enforce data integrity
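For contrast, the three-step example above is straightforward in a relational database. The sketch below uses sqlite3; the `item` table, the quantities, and the "total must stay at or above 30" check are hypothetical stand-ins for "check something".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, qty INTEGER NOT NULL)")
conn.execute("INSERT INTO item VALUES (1, 10), (2, 10)")
conn.commit()

# Step 1: insert 3 records (this opens a transaction implicitly).
conn.executemany("INSERT INTO item VALUES (?, ?)", [(3, 5), (4, 5), (5, 5)])
# Step 2: update 2 records.
conn.execute("UPDATE item SET qty = qty - 8 WHERE id IN (1, 2)")
# Step 3: check a condition; if it is violated, roll everything back.
# Hypothetical rule: the total quantity must stay at or above 30.
total = conn.execute("SELECT SUM(qty) FROM item").fetchone()[0]
if total < 30:
    conn.rollback()
else:
    conn.commit()

final_rows = conn.execute("SELECT id, qty FROM item ORDER BY id").fetchall()
print(total, final_rows)
```

Because the check fails here, the rollback undoes both the inserts and the updates, and the table returns to its state before step 1.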
SQL vs NoSQL
SQL                          NoSQL (non-relational)
Structured data only         Unstructured, semi-structured, and structured data
Fixed schema                 Flexible schema
Harder to scale out          Designed to scale out
Usually not distributed      Distributed
Transactional                Often non-transactional
When to use NoSQL
1. Storing and retrieving large quantities of data (Big Data)
2. Relationships between elements are not important
3. Dealing with growing lists of elements (social media
posts)
4. The data is unstructured, or its structure changes
rapidly
5. Constraints and validations can be performed
in the application layer, so there is no need to implement
them in the DB
When not to use NoSQL
1. Complex transactions (for example, a bank transfer from one
account to another)
2. Joins must be handled by the DB
3. Validations and constraints must be handled by the DB
GCP options for database
We need persistent, durable
storage
Persistent storage means
data does not go away after
the device is turned off, unlike
cache and RAM
Cloud Storage for unstructured data
Buckets: where you store your data
Objects: the things you are storing
Creating buckets in cloud storage
You specify
• Name
• Class
• Location
Bucket class in Cloud Storage
1. Standard:
Best latency, highest availability (99.9%)
2. Reduced availability:
99% availability, less expensive
3. Nearline:
Higher latency (a few more seconds for the first byte)
Much less expensive
Suited to objects accessed less than once a month
Best for archival scenarios
99% availability
Bucket location in Cloud Storage
1. Multi-regional (broader geographic area)
2. Regional: corresponds to a region supported by
other Cloud Platform services, such as Compute Engine
Uploading objects to Cloud Storage
1. From the developer console on the GCP website
2. From the command line using gsutil in the Google Cloud SDK:
gsutil can access files in the local file system and objects stored in Amazon S3
Cloud SQL for structured data
Cloud SQL is a hosted SQL database service
Anything you can do with SQL can be done
with Cloud SQL
Google takes care of keeping the OS up to date,
performing backups...
Cloud SQL for structured data
Cloud SQL is a fully-managed
database service that makes it easy to set up,
maintain, manage, and administer your
relational PostgreSQL and MySQL databases
in the cloud.
Cloud SQL Supports
• Rich query language
• Primary and secondary indexes
• ACID transactions
• Relational integrity
• Stored procedures
Cloud SQL Supports
Cloud Datastore for semi-structured
data
• Cloud Datastore belongs to the NoSQL family
• Scales gracefully from very small to very large
Cloud datastore for semi-structured
data
Cloud Bigtable for semi-structured
and structured data
• Cloud Bigtable belongs to the NoSQL family
• Suited to storing more than a terabyte of structured data
• Low-level DB
• High scalability
• Low latency (uses a single zone)
• Does not scale down well to small sizes
Thanks

