(R)Evolution in Database Systems
 RDBMS – The origins

 Concepts, Architecture and Principles

 Golden Age – Way of life.

 Changing Times– New Problems, New Needs

 Attack on the citadel - Revisiting the norms

 Ignited Minds – Working towards NoSQL Solutions

 Way Ahead – It is Cloudy out there
 Girish Narasimha Raghavan


 Over 15 years of experience building distributed, large-scale,
 and highly available enterprise systems.

 Current interests include building SAC (Social, Big Data
 Analytics, and Cloud) solutions.

 Likes to write about and discuss technologies and their
 applications to real-world problems.
   http://randomtechthought.blogspot.com
RDBMS to NoSQL. An overview.
 Data abounds in the world. It always has and always will.
    Record keeping is as old as the human race.
    There is a constant quest to improve how records are stored,
     accessed, and analyzed.

 The early machines had serious shortcomings.
    Only a very limited amount of program code and data could be stored
     in memory.
    Electromagnetic data storage was feasible only at an extremely high
     cost.

 Storing Data was an issue
    Organizations had to store data – related to Administration,
     Research, Operations.
    Data stored in proprietary formats – Database Systems did not exist
    Plagued by data integrity issues
    Non standard application logic for accessing stored data
 First attempt: File based systems
    Data sets were growing and accumulating.
   Data had to be managed at a detailed transaction level.
   Computing systems started to be used for critical business
     needs.
   Data inconsistency and redundancy.


 Enter Database Systems
   Attempts to standardize the processes and rules to store and
    access data.
   Intention to reuse, resell and redeploy solutions across
     organizations (with significant customizations).
   Attempt to proactively manage Data Integrity and Quality.
 Database Systems and concepts Evolve
   Hierarchical DBMS
      Information represented using parent/child relationships
      Tree structure is primary data structure.

   Network DBMS
      The relationships are represented in the form of a network.
      Graph is the primary data structure.

 Challenges Galore
   Hardware Dependency – Software strongly dependent on the
    underlying hardware.
   Modeling challenges – Representing data under a common
    structure.
   Integration issues - Integrating across dependent packages was a
    nightmare.
   Introducing new functionality and updates - Solution providers
    struggled with it across customized software deployment.
Father of the Relational Database Model

Edgar F. Codd

A British computer scientist who made significant contributions to the
theory of relational databases while working for IBM.
 Landmark paper by Codd - “A Relational Model of Data for
 Large Shared Data Banks”.
    Independence of data from the hardware and storage
     implementation.
    Automatic navigation to the data set through a high-level
     nonprocedural language for data access.
    Concept of keys (primary, secondary).
    A theoretical proposal, with no practical design or implementation.


 Codd’s 12 rules for a Relational Database Management System
    http://cims.clayton.edu/booth/ITDB%204201/Codd%20PDF.pdf
[Diagram: Application 1, Application 2, Application 3, Reporting Solutions, and Future Applications all reach the Databases (Data Storage) through the Database Management System (DBMS).]
 Data Definition
   For describing data and data structures for handling the data


 Data Manipulation
   For describing the operations associated with the data like storage, query, change,
     etc.

 Data Security and Integrity
   For ensuring secure and controlled access to storage and manipulation of data.
   For ensuring correctness, consistency and reliability of the data stored .

 Data Recovery and Concurrency
   For providing and enforcing recovery and concurrency controls.

 Data Dictionary
   For providing information about the data stored.
   For liaising between the conceptual and physical storage.

 Performance
    For ensuring all the above mentioned operations are performed efficiently and
     effectively
External/User - How the user accesses and sees the data [Tables, Views]

Conceptual/Logical - How data is organized logically [Table Spaces]

Physical/Internal - How data is stored internally [Data Files]
 Relation (Table) – A set of tuples that have the same attributes.
 Tuple (Row) – A tuple usually represents an object and information about
  that object.
 Attribute (Column) – Represents a particular characteristic of that object.
 Domain - A domain describes the set of permitted values for a given attribute.
  It is the set from which the values of an attribute can be defined.
 Constraints - Constraints make it possible to further restrict the domain of an
  attribute. Constraints help in binding the attribute to a set of rules.
 Primary Key - A primary key is an attribute (or set of attributes) that
  uniquely identifies each tuple (row) within a relation.
 Foreign Key - The foreign key can be used to cross-reference tables.

 Cardinality - Expresses the number of instances of the entity to which another
  entity can be associated via a relation
 Index - An index is a mechanism for providing quicker access to data. Indices
  can be created on any combination of attributes on a relation.
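
The concepts above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the original deck: the department and employee tables, their columns, and the try_insert helper are all hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,            -- primary key
    name    TEXT NOT NULL UNIQUE            -- domain (TEXT) plus constraints
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  INTEGER NOT NULL CHECK (salary >= 0),               -- constraint
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)     -- foreign key
);
CREATE INDEX idx_employee_dept ON employee(dept_id);            -- index
""")

conn.execute("INSERT INTO department VALUES (1, 'Research')")
conn.execute("INSERT INTO employee VALUES (10, 'Asha', 5000, 1)")

def try_insert(sql):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Each of these breaks one rule: primary key, CHECK constraint, foreign key.
duplicate_key_ok = try_insert("INSERT INTO employee VALUES (10, 'Ravi', 4000, 1)")
negative_salary_ok = try_insert("INSERT INTO employee VALUES (11, 'Ravi', -1, 1)")
dangling_fk_ok = try_insert("INSERT INTO employee VALUES (12, 'Ravi', 4000, 99)")
```

All three rejected inserts leave the employee table with its single valid row, which is the database enforcing integrity rather than the application.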
 Based on the perception that real world can be modeled around
  base objects (entities) and relationship among them.

 Modeling of data in a top down fashion
   Conceptual Model – The model is the highest and least granular
     model that defines master reference data entities that are
     commonly used in the problem space.

   Logical Model – The model generally builds over the conceptual
     model by adding additional granular details like operational and
     transactional data entities.

   Physical Model - Specifies relational database objects such
     as database tables, database indexes such as unique key indexes,
     and database constraints.

 The models can be visualized through what is commonly known
  as ER-Diagrams.
 Process for organizing the attributes and tables of a relational
  database to minimize redundancy and dependency.
 Objectives (as specified by Codd)
    To free the collection of relations from undesirable insertion, update
     and deletion dependencies.
    To reduce the need for restructuring the collection of relations, as new
     types of data are introduced, and thus increase the life span of
     application programs.
    To make the relational model more informative to users.
    To make the collection of relations neutral to the query statistics, where
     these statistics are liable to change as time goes by.
 Normal Forms (NF)
    1NF - Every attribute contains atomic values only
    2NF - 1NF + every non-key attribute is fully dependent on the whole
     primary key (no partial dependencies)
    3NF - 2NF + every non-key attribute is non-transitively dependent on
     the primary key
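
As a rough sketch of what the normal forms mean in practice, the plain-Python example below decomposes a small order table. The data, the column names, and the rebuild helper are assumptions made up for illustration. The original rows are keyed by (order_id, product_id); product_name depends on only part of that key, and customer_city depends on customer_id rather than directly on the key.

```python
# Rows keyed by (order_id, product_id); all values are illustrative.
rows = [
    # order_id, product_id, product_name, quantity, customer_id, customer_city
    (1, "P1", "Keyboard", 2, "C1", "Chennai"),
    (1, "P2", "Mouse", 1, "C1", "Chennai"),
    (2, "P1", "Keyboard", 5, "C2", "Pune"),
]

# Towards 2NF: product_name depends only on product_id, so it gets its own relation.
products = {pid: pname for _, pid, pname, _, _, _ in rows}

# Towards 3NF: customer_city depends on customer_id (a transitive dependency),
# so customers are split out and an order only records which customer placed it.
customers = {cid: city for _, _, _, _, cid, city in rows}
orders = {oid: cid for oid, _, _, _, cid, _ in rows}

# What remains depends on the whole key (order_id, product_id).
order_lines = {(oid, pid): qty for oid, pid, _, qty, _, _ in rows}

def rebuild():
    """Join the decomposed relations back into the original wide rows."""
    return sorted(
        (oid, pid, products[pid], qty, orders[oid], customers[orders[oid]])
        for (oid, pid), qty in order_lines.items()
    )
```

The product name "Keyboard" is now stored once instead of twice, and rebuild() recovers the original rows, so the decomposition loses no information.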
 Properties that guarantee that database transactions are processed
  reliably.
    Single logical operation (involving multiple steps) is called transaction.
 Properties
    Atomicity – “All or Nothing” – If one part of the transaction fails, entire
     transaction fails.
    Consistency – Any data written to the database must be valid according
     to all defined rules, and constraints.
    Isolation – Even during concurrent execution, the system ends up in a
     state that is the same as the state that would be obtained if the
     transactions were executed serially.
    Durability - Once a transaction has been committed, the results will
     be stored permanently irrespective of errors and crashes that can occur
     post commit.
 In an RDBMS, ACID properties are implemented using techniques
  such as locking and multiversion concurrency control (MVCC)
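
Atomicity and consistency can be seen in a small sketch using Python's built-in sqlite3 module. The account table, the names in it, and the transfer and balances helpers are hypothetical; the example only assumes that a CHECK constraint forbids negative balances.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(src, dst, amount):
    """Move money in one transaction; True on commit, False on rollback."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first on purpose, so a failed debit leaves partial work behind.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))

first = transfer("alice", "bob", 30)    # fits within alice's balance
second = transfer("alice", "bob", 500)  # would make alice's balance negative
```

The second transfer credits bob before the debit fails, yet the rollback undoes that credit too: all or nothing.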
 RDBMS-based solutions are generally the first choice for
  database storage/access needs

 RDBMS solutions are now mature and predictable.


 An army of skilled specialists exists for using,
  managing and maintaining RDBMS-based systems

 RDBMS has spawned an ecosystem of products that
  makes choosing an RDBMS a no-brainer
 Ensures Consistent behavior
    With the table structure as the base, RDBMS provides a consistent mechanism for
     storing and accessing different data sets.
 Removes Redundancies
    Through normal forms, redundancies in the data are removed, thereby addressing
     the errors that can arise from inconsistency in the data stored
 Avoid errors
    Ensures Data integrity and quality by ensuring consistent storage, enforcing
     constraints and relationships and with ability to check data as they are entered
 Facilitates Easy analysis
    With the SQL based query as the foundation, analyzing different data set is seamless.
     Also given the history of RDBMS, users are provided with a vast repository of tools to
     perform analysis.
 Ensures Robust Maintenance and Management
    Database administrators are provided with tools that enable them to easily
     maintain, test, repair and back up the databases housed in the system.
 Is Secure
    Offers good level of security and access control. Whole or part of the data can be
      securely shared across multiple users(applications) based on the privileges granted
      to them(it).
 Rise of Social Networks during the early 2000s
    World Wide Web acts as the foundation

 Shift in communication patterns
    Sharing of personal information and usage of the same
    Everyone turned into a publisher

 Increased focus around personalization
    Recommendations, Ratings, Preferences and providing
     Personalized interfaces
 Big Data Flood
    More data is being generated today than was generated in all of
     human history before it
    Need to store and process unstructured or semi-structured data at
     volumes and frequencies not previously anticipated
Ref: http://www.go-gulf.com/blog/60-seconds
 Accessible by users across the globe
    Geography is irrelevant
    Facebook, Google, Yahoo, Twitter, etc. have users across the world

 Highly networked and distributed systems
      Systems are accessed and connected over the Internet

 Need to be highly scalable
    Should be able to handle additional load without redesign
    Amazon sees a manifold increase in traffic to the site during the holiday seasons

 Expected to be highly available
    Systems will be available for access and operations always
    Google will incur a huge revenue and credibility loss if the site goes down

 Handle large data sets hitting the systems with high frequency
    The data need to be stored and processed very quickly
    Number of likes and comments on Facebook has exceeded 2.7 billion per day
 Brewer’s CAP Theorem
   A distributed system can guarantee only two of the following three
      Consistency – Every read sees the most recent write, so all nodes
       return the same data. (This is not the “C” in ACID, and it is not
       Atomicity.)
      Availability - Every request to a working node gets a response
      Partition Tolerance – The system keeps working when some nodes
       cannot reach each other.


 RDBMS were essentially designed for CA
   Latency (response times) is an unfortunate tradeoff for
    consistency
   Partition tolerance becomes essential in distributed
    systems
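
The trade-off can be made concrete with a toy model. The sketch below is an illustration under simplifying assumptions, not a real replication protocol: the Replica and TwoNodeStore classes are hypothetical, all writes arrive at node A, and "consistency" simply means both replicas hold the same data.

```python
class Replica:
    """One node holding its own copy of the data."""
    def __init__(self):
        self.data = {}

class TwoNodeStore:
    """Two replicas that must pick a side when the link between them breaks."""
    def __init__(self, mode):
        self.mode = mode            # "CP" or "AP"
        self.a = Replica()
        self.b = Replica()
        self.partitioned = False    # True when A cannot reach B

    def write(self, key, value):
        """Client writes through node A; returns True if the write is accepted."""
        if not self.partitioned:
            self.a.data[key] = value
            self.b.data[key] = value
            return True
        if self.mode == "CP":
            return False            # stay consistent, give up availability
        self.a.data[key] = value    # stay available, let the replicas diverge
        return True

    def consistent(self):
        return self.a.data == self.b.data

cp = TwoNodeStore("CP")
cp.write("x", 1)
cp.partitioned = True
cp_accepted = cp.write("x", 2)      # rejected during the partition

ap = TwoNodeStore("AP")
ap.write("x", 1)
ap.partitioned = True
ap_accepted = ap.write("x", 2)      # accepted, but node B still holds x = 1
```

During the partition the CP store stays consistent by refusing the write, while the AP store keeps answering and ends up with two different values for the same key.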
 Beyond a point you cannot afford to Scale up storage
    It becomes very expensive to keep scaling up.

 Is strict consistency really so important?
    Ensuring consistency slows the system
    Google found that moving from a 10-result page loading in 0.4 seconds to
      a 30-result page loading in 0.9 seconds decreased traffic and ad revenues
      by 20% (Linden 2006)
 Redundancy can be managed
    Joins across normalized database tables are less efficient than reading
      pre-joined (denormalized) data from a data store
 Not All data is relational
   Fitting every kind of data under the Rigid Schema structure of RDBMS is
     a challenge
    Rebuilding data read from an RDBMS into its original shape (say tree,
     graph, or key-value) puts significant stress on computing resources.
    Attributes (columns) are restricted by domain to store similar data.
    Managing semi structured, unstructured data like documents becomes a
     challenge.
 CRUD (Create, Read, Update, Delete) is crude
    Updates and deletes should never be allowed as they destroy
     information.

 Logical and physical separation of concerns ignored
    Relational model is a logical model
    Database products implemented the relational model at the physical
     level as a set of B-tree files with multiple indexes.
    Induces artificial overhead onto managing the database.


 It is over spinning disks
    All RDBMS implementations assume that the data is coming from the
     disks
    Legacy of an era when memory was expensive.
    Memory based systems will be faster


 Databases are big and slow
   Fundamentally not designed for big data sets
   Long queries get slower with more data
 Core Tenets of BASE
   Basically Available
       The system seems to work all the time
   Soft State
       It doesn’t have to be consistent all the time
   Eventual Consistency
       Becomes consistent eventually (at some later time)


 Significance
   BASE is diametrically opposed to ACID.
       ACID is pessimistic and forces consistency at the end of every operation
       BASE is optimistic and accepts that the database consistency will be in a
        state of flux.
   The availability is achieved through supporting partial failures
     without total system failure
       It is acceptable for the system to stay available for 80% of users
        and limit a failure to the other 20%.
   Users should understand the implication of Eventual Consistency
       Factors in a probability of data loss. Safety of the data is the tradeoff
       Need to understand how eventual is Eventual
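
A minimal sketch of eventual consistency, assuming a single primary that acknowledges writes immediately and replicas that catch up later. The EventuallyConsistentStore class and its sync method are hypothetical; real systems replicate over a network and must also handle conflicts and failures.

```python
from collections import deque

class EventuallyConsistentStore:
    """A primary that acknowledges writes at once and replicas that lag behind."""
    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()      # writes not yet applied to the replicas

    def write(self, key, value):
        """Acknowledge as soon as the primary has the value."""
        self.primary[key] = value
        self.pending.append((key, value))

    def read(self, key, replica):
        """Read from a replica, which may return stale or missing data."""
        return self.replicas[replica].get(key)

    def sync(self):
        """Apply the queued writes to every replica, in order."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value

store = EventuallyConsistentStore(replica_count=2)
store.write("likes", 1)
stale = store.read("likes", replica=0)   # the replica has not seen the write yet
store.sync()
fresh = store.read("likes", replica=0)   # now it has
```

The gap between the write and sync() is the "how eventual is eventual" window: if the primary were lost before sync() ran, the queued write would be lost with it.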
 NoSQL – Not Only SQL
    It is not SQL and it is not Relational

 Essential Feature set
    Elastic Scaling – Rely on Scale out rather than Scale up
    Big Data – Handle High Volume, High Velocity, High Variability
    Commoditize Manageability – Reduce dependence on highly skilled
     DBA and lower administration costs
    Economics – Build over commodity hardware
    Flexible data model – Remove data model based restrictions.

 Applicability
      Performance and real time nature over consistency
      High scalability
      Store and retrieve large data sets
      Does not require a relational model
 Key Value
    The idea is to use a hash table in which each unique key points to a
     particular item of data. Simplest to implement.
    It is inefficient when you are only interested in querying or updating part
     of a value
 Column Store
    Created to store and process very large amounts of data distributed over
     many machines
    Still keys but they point to multiple columns.
    The columns are arranged by column family.

 Document
   The model is basically versioned documents, each a collection of
     key-value pairs that can themselves contain other collections.
    The semi-structured documents are stored in formats like JSON,
     allowing nested values associated with each key.
    Document databases can query by content more efficiently than
     key-value stores.

 Graph
    A flexible graph model is used which, again, can scale across multiple
      machines
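
The difference between the key-value and document models can be sketched in plain Python. Everything here is a simplified assumption for illustration: the in-memory dictionaries stand in for the stores, and kv_put, kv_update_field, doc_put and doc_find are hypothetical helpers, not the API of any product.

```python
import json

# Key-value store: the value is an opaque blob as far as the store is concerned.
kv_store = {}

def kv_put(key, obj):
    kv_store[key] = json.dumps(obj)

def kv_update_field(key, field, new_value):
    """Changing one field means reading and rewriting the whole value."""
    obj = json.loads(kv_store[key])
    obj[field] = new_value
    kv_store[key] = json.dumps(obj)

# Document store: the store understands the nested structure and can query it.
documents = {}

def doc_put(doc_id, doc):
    documents[doc_id] = doc

def doc_find(path, expected):
    """Return ids of documents whose dotted path (e.g. 'address.city') equals expected."""
    matches = []
    for doc_id, doc in documents.items():
        node = doc
        for part in path.split("."):
            node = node.get(part) if isinstance(node, dict) else None
            if node is None:
                break
        if node == expected:
            matches.append(doc_id)
    return sorted(matches)

kv_put("user:1", {"name": "Asha", "city": "Chennai"})
kv_update_field("user:1", "city", "Pune")

doc_put("u1", {"name": "Asha", "address": {"city": "Pune"}})
doc_put("u2", {"name": "Ravi", "address": {"city": "Delhi"}})
pune_users = doc_find("address.city", "Pune")
```

The key-value store can only fetch by key, so a partial update round-trips the whole value, whereas the document store can answer a question about a nested field.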
 Access Interfaces
    REST/HTTP, Thrift, Map Reduce, Language Specific API
 Logical Data Model
    Key Value, Column Family Store, Document, Graph
 Support and Distribution
    CAP Support, Multi Data Center Support, Dynamic Provisioning,
     Proactive Monitoring
 Data Persistence
    Memory Based, Disk Based, Combination of Memory and Disk
NoSQL

Key Value        Column Store    Document        Graph
-------------    ------------    ------------    -------------
MemCached        SimpleDB        CouchDB         Neo4J
Redis            BigTable        MongoDB         InfoGrid
SimpleDB         HBase           Lotus Domino    FlockDB
Tokyo Cabinet    Cassandra       Riak            InfiniteGraph
Dynamo           HyperTable
Voldemort        Azure TS
 NoSQL is not yet mature
    RDBMS is mature, stable and functionally rich.
    Most NoSQL alternatives are in pre-production versions with many key
      features yet to be implemented.
 Support
    Most NoSQL systems are open source projects.
    Support mostly offered by startup companies, with reach and
      credibility not on par with RDBMS Vendors.
 Analytics
    NoSQL databases offer few facilities for ad-hoc query and analysis.
    Even a simple query requires significant programming expertise.
    At present, commonly used BI tools do not provide credible
      connectivity to NoSQL.
 Administration and Maintenance
    The desired goal of zero maintenance is far away.
    In reality, significant effort is required to maintain the systems.
 Expertise
    Currently very limited awareness and knowledge
 Scalability
   Master Slave - One master many slaves
      Write to master; Read from any of the slaves
   Partitioning – Group and localize related functions across nodes
      Partition Vertically (by functions) or Horizontally ( by keys)
   Caching - Memory based cache in front of the Database
      Address scaling issues due to read and write loads
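
Horizontal partitioning by key can be sketched as follows. This is a simplified illustration, not a production sharding scheme: the four in-memory dictionaries stand in for database nodes, and shard_for, put and get are hypothetical helper names. A cryptographic hash is used only because Python's built-in hash() for strings varies between runs.

```python
import hashlib

def shard_for(key, shard_count):
    """Map a key to a shard number in a way that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

# Each dictionary stands in for one database node.
shards = [dict() for _ in range(4)]

def put(key, value):
    shards[shard_for(key, len(shards))][key] = value

def get(key):
    return shards[shard_for(key, len(shards))].get(key)

for user_id in range(100):
    put(f"user:{user_id}", {"id": user_id})

rows_per_shard = [len(shard) for shard in shards]
```

Each read or write touches exactly one node, which is what lets load spread out. The catch with this simple modulo scheme is that changing the number of shards remaps most keys.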


 High Availability
   Clustering - Group of systems responsible for a service
      Build redundancy into a cluster to eliminate single points of failure
   Mirroring and Replication – Maintain a hot standby
      Handle planned or unplanned downtimes
   Recovery Solutions - dependable data backup, restore, and
     recovery procedures
       Combine process with tools
 Performance
    Be open to Denormalization – And accelerate reads
        Allow redundancy and duplicates to reduce joins
    Optimize your costly queries- Analyze and optimize the expensive
      queries
        Use a mix of design strategy, indices, and analysis from query optimization tools
    Invest in better hardware – storage and memory
       It is not a bad bet - The storage and memory costs have dropped significantly


 Rigid Schemas – Not all data is relational
    Even the most schema-less model has some schema
        The world revolves around structure
    If a key-value kind of store is needed, you can build the same in any
      RDBMS
        An RDBMS provides the added advantage of structured access and queries
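
As a rough sketch of that last point, a two-column table in Python's built-in sqlite3 module behaves like a key-value store. The kv table, its k and v columns, and the put and get helpers are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")

def put(key, value):
    """Insert the pair, or overwrite the value if the key already exists."""
    conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))

def get(key):
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else None

put("user:1", "Asha")
put("user:2", "Ravi")
put("order:9", "pending")
put("user:1", "Asha K")     # overwrites the earlier value

# The added advantage: an ordinary SQL query over the keys.
user_keys = [
    row[0]
    for row in conn.execute("SELECT k FROM kv WHERE k LIKE 'user:%' ORDER BY k")
]
```

The put and get calls look like any key-value API, while the final query shows the structured access that comes for free with the relational engine.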
 Systems eventually will gravitate towards one of these three
   Fast, agile, highly scalable data stores
   Handlers of complex transactional semantics
   Analytical processors and facilitators


 World is never binary
   It is never either this or that.
   Why fight over technicalities


 Drive decisions based on use cases
     Choose a model based on the use cases and scenarios
     Research and understand what your application needs
     Stay away from substituting “Hard work” with “Rhetoric”
     Be open to experimentation
   http://www.guug.de/lokal/muenchen/2007-05-14/rdbmsc.pdf
   http://ansonalex.com/infographics/twitter-usage-statistics-2012-infographic/
   http://www.mountainman.com.au/software/history/it1.html
   http://www.slideshare.net/renguzi/codd
   http://cims.clayton.edu/booth/ITDB%204201/Codd%20PDF.pdf
   http://www.scribd.com/doc/19381895/RDBMS-Concepts
   http://www.gitta.info/DBSysConcept/en/text/DBSysConcept.pdf
   http://en.wikipedia.org/wiki/Relational_database
   http://en.wikipedia.org/wiki/ACID
   http://blogs.hbr.org/now-new-next/2009/05/the-social-data-revolution.html
   http://www.go-gulf.com/blog/60-seconds
   http://en.wikipedia.org/wiki/CAP_theorem
   http://highscalability.com/drop-acid-and-think-about-data
   http://queue.acm.org/detail.cfm?id=1394128
   http://www.bailis.org/blog/safety-and-liveness-eventual-consistency-is-not-safe/
   http://www.techrepublic.com/blog/10things/10-things-you-should-know-about-nosql-databases/1772
   http://rebelic.nl/engineering/the-four-categories-of-nosql-databases/
   http://www.slideshare.net/ksankar/nosql-4559402
   http://www.thevirtualcircle.com/2008/11/10/6-reasons-why-relational-database-will-be-superseded/
   http://www.slideshare.net/sbtourist/scale-your-database-and-be-happy
   Note:
    Many images used in the deck were found through Google image search. Although I have not been able to
    credit the source of each image individually, I extend my sincere thanks to the owners of the images for making
    them available on the net

RDBMS to NoSQL. An overview.

  • 2.  RDBMS – The origins  Concepts, Architecture and Principles  Golden Age – Way of life.  Changing Times– New Problems, New Needs  Attack on the citadel - Revisiting the norms  Ignited Minds – Working towards NoSQL Solutions  Way Ahead– It is a Cloudy out there
  • 3.  Girish Narasimha Raghavan  Over 15 years experience building distributed, large scale and highly available enterprise systems.  Current interest include build SAC (Social, Big Data Analytics, and Cloud) solutions.  Likes to write and discuss technologies and its applications to solve real world problems.  http://randomtechthought.blogspot.com
  • 5.  In the world data abounds. Always has and always will.  Record keeping is as old as Human race.  Consistent quest to improve storing , accessing, and analyzing records  The early machines had serious shortcomings.  only a very limited amount of program code and data could be stored in memory.  Electromagnetic data storage was feasible only at an extremely high cost.  Storing Data was an issue  Organizations had to store data – related to Administration, Research, Operations.  Data stored in proprietary formats – Database Systems did not exist  Plagued by data integrity issues  Non standard application logic for accessing stored data
  • 6.  First attempt: File based systems  Data sets were growing and accumulating.  Data had to be managed at a detailed transaction level.  Computing systems started to be used for critical business needs.  Data inconsistency and redundancy.  Enter Database Systems  Attempts to standardize the processes and rules to store and access data.  Intention to reuse, resell and redeploy solutions across organizations (with significant customizations).  Attempt to proactively manage Data Integrity and Quality.
  • 7.  Database Systems and concepts Evolve  Hierarchical DBMS  Information represented using parent/child relationships  Tree structure is primary data structure.  Network DBMS  The relationships is represented in form of a network.  Graph is the primary data structure.  Challenges Galore  Hardware Dependency – Software strongly dependent on the underlying hardware.  Modeling challenges – Representing data under a common structure.  Integration issues - Integrating across dependent packages was a nightmare.  Introducing new functionality and updates - Solution providers struggled with it across customized software deployment.
  • 8. Father of the relational database model: Edgar F. Codd, a British computer scientist who made significant contributions to the theory of relational databases while working for IBM.
  • 9. Landmark paper by Codd: “A Relational Model of Data for Large Shared Data Banks”. Independence of data from the hardware and storage implementation. Automatic navigation to the data set through a high-level, non-procedural language for data access. Concept of keys (primary, secondary). A theoretical proposal, with no practical design or implementation. Codd’s 12 rules for relational database management systems: http://cims.clayton.edu/booth/ITDB%204201/Codd%20PDF.pdf
  • 11. [Diagram labels: Application 1, Application 2, Application 3, Reporting Solutions, Future Applications; Database Management Systems (DBMS); Databases; Data Storage]
  • 12. Data Definition: for describing data and the data structures that hold it. Data Manipulation: for describing the operations associated with the data, such as storage, query, and change. Data Security and Integrity: for ensuring secure and controlled access to the storage and manipulation of data, and for ensuring the correctness, consistency, and reliability of the data stored. Data Recovery and Concurrency: for providing and enforcing recovery and concurrency controls. Data Dictionary: for providing information about the data stored, and for liaising between the conceptual and physical storage levels. Performance: for ensuring that all of the above operations are performed efficiently and effectively.
  • 13. External/User: how the user accesses and sees the data [tables, views]. Conceptual/Logical: how data is organized logically [table spaces]. Physical/Internal: how data is stored internally [data files].
  • 14. Relation (table): a set of tuples that have the same attributes. Tuple (row): usually represents an object and information about that object. Attribute (column): represents a particular characteristic of that object. Domain: the set of permitted values for a given attribute, that is, the set from which the values of an attribute can be drawn. Constraints: make it possible to further restrict the domain of an attribute by binding the attribute to a set of rules. Primary key: a (set of) attribute(s) that uniquely identifies a tuple within a relation. Foreign key: an attribute that references the key of another table and can be used to cross-reference tables. Cardinality: expresses the number of instances of an entity with which another entity can be associated via a relationship. Index: a mechanism for providing quicker access to data. Indices can be created on any combination of attributes of a relation.
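The concepts on this slide can be tried out with a small sketch. The following is only an illustration, not part of the original deck: it uses Python's standard sqlite3 module, and the department and employee tables, their columns, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Hypothetical schema: two relations linked by a foreign key.
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,                              -- primary key
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,                              -- primary key
    name    TEXT NOT NULL,
    salary  INTEGER NOT NULL CHECK (salary > 0),              -- constraint narrowing the domain
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)   -- foreign key
);
CREATE INDEX idx_employee_dept ON employee(dept_id);          -- index for quicker access
""")

conn.execute("INSERT INTO department VALUES (1, 'Research')")
conn.execute("INSERT INTO employee VALUES (10, 'Asha', 5000, 1)")


def try_insert(row):
    """Return True if the tuple is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "duplicate primary key": try_insert((10, "Ravi", 4000, 1)),
    "salary outside domain": try_insert((11, "Ravi", -1, 1)),
    "unknown department": try_insert((12, "Ravi", 4000, 99)),
    "valid row": try_insert((13, "Ravi", 4000, 1)),
}
```

Only the last tuple is accepted; the other three are rejected by the primary key, the CHECK constraint, and the foreign key respectively.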
  • 15. Based on the perception that the real world can be modeled around base objects (entities) and the relationships among them. Data is modeled in a top-down fashion. Conceptual model: the highest-level and least granular model, which defines the master reference data entities commonly used in the problem space. Logical model: generally builds on the conceptual model by adding more granular details such as operational and transactional data entities. Physical model: specifies relational database objects such as tables, indexes (for example, unique key indexes), and constraints. The models can be visualized through what are commonly known as ER diagrams.
  • 16. Process for organizing the attributes and tables of a relational database to minimize redundancy and dependency. Objectives (as specified by Codd): to free the collection of relations from undesirable insertion, update, and deletion dependencies; to reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life span of application programs; to make the relational model more informative to users; to make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by. Normal Forms (NF). 1NF: every attribute contains atomic values only. 2NF: 1NF, and every non-key attribute is fully dependent on the whole primary key. 3NF: 2NF, and every non-key attribute is non-transitively dependent on the primary key.
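As a rough illustration of the move to 3NF, the sketch below starts from a flat relation in which dept_name depends on dept_id, which in turn depends on the key emp_id (a transitive dependency), and splits it into two relations. The data, field names, and helper functions are hypothetical and use plain Python structures rather than a database.

```python
# Hypothetical flat relation: emp_id -> dept_id -> dept_name (transitive dependency).
flat = [
    {"emp_id": 1, "emp_name": "Asha", "dept_id": 10, "dept_name": "Research"},
    {"emp_id": 2, "emp_name": "Ravi", "dept_id": 10, "dept_name": "Research"},
    {"emp_id": 3, "emp_name": "Mira", "dept_id": 20, "dept_name": "Operations"},
]


def decompose(rows):
    """Split the flat relation into employee and department relations."""
    employees = {
        r["emp_id"]: {"emp_name": r["emp_name"], "dept_id": r["dept_id"]}
        for r in rows
    }
    departments = {r["dept_id"]: r["dept_name"] for r in rows}
    return employees, departments


def join(employees, departments):
    """Rebuild the flat view by joining the two relations on dept_id."""
    return [
        {
            "emp_id": emp_id,
            "emp_name": emp["emp_name"],
            "dept_id": emp["dept_id"],
            "dept_name": departments[emp["dept_id"]],
        }
        for emp_id, emp in sorted(employees.items())
    ]


employees, departments = decompose(flat)
lossless = join(employees, departments) == flat  # nothing is lost by the split

# Renaming a department is now a single change instead of one per employee row.
departments[10] = "R&D"
renamed = join(employees, departments)
```

In the flat relation the department name is stored once per employee, so a rename has to touch every matching row; after decomposition it is stored once.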
  • 17. Properties that guarantee that database transactions are processed reliably. A single logical operation (involving multiple steps) is called a transaction. Properties. Atomicity: “All or Nothing”. If one part of the transaction fails, the entire transaction fails. Consistency: any data written to the database must be valid according to all defined rules and constraints. Isolation: even under concurrent execution, the system ends in the same state it would reach if the transactions were executed serially. Durability: once a transaction has been committed, its results are stored permanently, irrespective of errors and crashes that occur after the commit. In an RDBMS, ACID properties are implemented using techniques such as locking and multi-versioning.
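Atomicity and consistency can be seen in a minimal sketch with Python's standard sqlite3 module. The account table, the balances, and the transfer helper are hypothetical; the CHECK constraint stands in for a "defined rule", and the connection's context manager commits on success and rolls back on an exception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # rule: no overdrafts
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move money in one transaction; return False if it had to be rolled back."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


small_ok = transfer(1, 2, 30)    # both steps succeed and are committed together
large_ok = transfer(1, 2, 500)   # the debit violates the CHECK; the credit is undone too
final = balances()
```

The second transfer credits the destination first, then fails on the debit. Because both updates belong to one transaction, the credit is rolled back as well.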
  • 20. RDBMS-based solutions are generally the first choice for database storage and access needs. RDBMS solutions are now mature and predictable. An army of skilled specialists exists for using, managing, and maintaining RDBMS-based systems. RDBMS has spawned an ecosystem of products that makes choosing an RDBMS a no-brainer.
  • 21. Ensures consistent behavior: with the table structure as the base, an RDBMS provides a consistent mechanism for storing and accessing different data sets. Removes redundancies: through normal forms, redundancies in the data are removed, thereby addressing the errors that can arise from inconsistency in the data stored. Avoids errors: ensures data integrity and quality through consistent storage, enforcement of constraints and relationships, and the ability to check data as it is entered. Facilitates easy analysis: with SQL-based queries as the foundation, analyzing different data sets is seamless. Also, given the history of RDBMS, users have a vast repository of tools for performing analysis. Ensures robust maintenance and management: database administrators are provided with tools that enable them to easily maintain, test, repair, and back up the databases housed in the system. Is secure: offers a good level of security and access control. All or part of the data can be securely shared across multiple users (or applications) based on the privileges granted to them.
  • 23. Rise of social networks during the early 2000s: the World Wide Web acts as the foundation; shift in communication patterns; sharing of personal information and usage of the same; everyone turned into a publisher. Increased focus on personalization: recommendations, ratings, preferences, and personalized interfaces. Big Data flood: more data is being generated now than was generated in all of human history until now; there is a need to store and process unstructured or semi-structured data at volumes and frequencies not previously anticipated or encountered.
  • 25. Accessible by users across the globe: geography is irrelevant; Facebook, Google, Yahoo, Twitter, etc. have users across the world. Highly networked and distributed systems: systems are accessed and connected over the Internet. Need to be highly scalable: should be able to handle additional load without redesign; Amazon sees a manifold increase in traffic to the site during the holiday seasons. Expected to be highly available: systems are expected to be available for access and operations at all times; Google will incur a huge loss of revenue and credibility if the site goes down. Handle large data sets hitting the systems with high frequency: the data needs to be stored and processed very quickly; the number of likes and comments on Facebook has exceeded 2.7 billion per day.
  • 27. Brewer’s CAP Theorem: you can get only two out of the following three. Consistency: every read sees the most recent write, so all nodes appear to hold the same data (this is distinct from the “All or Nothing” atomicity of ACID). Availability: the system needs to be available for operations always. Partition tolerance: the system needs to keep working when some nodes are not accessible. RDBMS were essentially designed for CA. Latency (response time) is an unfortunate tradeoff for consistency. Partition tolerance becomes essential in distributed systems.
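The tradeoff can be made concrete with a toy model. The sketch below is a deliberately simplified, hypothetical two-replica store (the class names and the "CP"/"AP" mode labels are invented for illustration): when the link between replicas is cut, a CP-style store refuses writes to stay consistent, while an AP-style store keeps accepting writes and lets the replicas diverge.

```python
class Replica:
    def __init__(self):
        self.data = {}


class TwoNodeStore:
    """Toy two-replica store. `mode` is 'CP' or 'AP' (hypothetical labels)."""

    def __init__(self, mode):
        self.mode = mode
        self.nodes = [Replica(), Replica()]
        self.partitioned = False  # True when the replicas cannot reach each other

    def write(self, node, key, value):
        if self.partitioned:
            if self.mode == "CP":
                return False  # give up availability to stay consistent
            self.nodes[node].data[key] = value  # stay available, replicas diverge
            return True
        for replica in self.nodes:  # healthy network: replicate synchronously
            replica.data[key] = value
        return True

    def read(self, node, key):
        return self.nodes[node].data.get(key)


cp, ap = TwoNodeStore("CP"), TwoNodeStore("AP")
for store in (cp, ap):
    store.write(0, "x", 1)
    store.partitioned = True  # simulate a network partition

cp_accepted = cp.write(0, "x", 2)
ap_accepted = ap.write(0, "x", 2)
cp_view = (cp.read(0, "x"), cp.read(1, "x"))
ap_view = (ap.read(0, "x"), ap.read(1, "x"))
```

During the partition the CP store rejects the write and both replicas still agree, whereas the AP store accepts it and the two replicas return different values for the same key.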
  • 28. Beyond a point you cannot afford to scale up storage: it becomes very expensive to keep scaling up. Is strict consistency really so important? Ensuring consistency slows the system. Google found that moving from a 10-result page loading in 0.4 seconds to a 30-result page loading in 0.9 seconds decreased traffic and ad revenues by 20% (Linden 2006). Redundancy can be managed: joins across normalized database tables are less efficient than reading from a denormalized data store. Not all data is relational: fitting every kind of data into the rigid schema structure of an RDBMS is a challenge; converting data read from an RDBMS back into its original model (say tree, graph, or key-value) puts significant stress on computing resources; attributes (columns) are restricted by domain to store similar data; managing semi-structured and unstructured data such as documents becomes a challenge.
  • 29. CRUD (Create, Read, Update, Delete) is crude: updates and deletes should never be allowed, as they destroy information. Logical and physical separation of concerns is ignored: the relational model is a logical model, but database products implemented it at the physical level as a set of B-tree files with multiple indexes, which induces artificial overhead in managing the database. It is built around spinning disks: all RDBMS implementations assume that the data is coming from disk, a legacy of an era when memory was expensive; memory-based systems will be faster. Databases are big and slow: they were fundamentally not designed for big data sets, and long queries get slower with more data.
  • 31. Core tenets. Basically Available: the system appears to work all the time. Soft State: it does not have to be consistent all the time. Eventual Consistency: it becomes consistent eventually (at some later time). Significance: BASE is diametrically opposed to ACID. ACID is pessimistic and forces consistency at the end of every operation. BASE is optimistic and accepts that database consistency will be in a state of flux. Availability is achieved by supporting partial failures without total system failure: it is acceptable for the system to be available for 80% of users and to limit failure to 20% of users. Users should understand the implications of eventual consistency: it factors in a probability of data loss, so safety of the data is the tradeoff, and one needs to understand how eventual “eventual” is.
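One common way to reach eventual consistency is last-write-wins reconciliation, sketched below as an illustration only; the deck does not prescribe a mechanism. The Replica class, the integer timestamps, and the anti_entropy function are hypothetical, and real systems have to deal with clock skew and concurrent writes, which this sketch ignores.

```python
class Replica:
    """Toy replica that keeps the newest (timestamp, value) pair per key."""

    def __init__(self):
        self.store = {}

    def write(self, key, value, timestamp):
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None


def anti_entropy(a, b):
    """Exchange state so both replicas end up with the newest version of each key."""
    for key, (ts, value) in list(a.store.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.store.items()):
        a.write(key, value, ts)


a, b = Replica(), Replica()
a.write("likes", 10, timestamp=1)  # accepted by replica a
b.write("likes", 12, timestamp=2)  # a later write accepted by replica b

before = (a.read("likes"), b.read("likes"))  # soft state: the replicas disagree
anti_entropy(a, b)
after = (a.read("likes"), b.read("likes"))   # converged on the newest write
```

Until the replicas exchange state, readers can see different answers. After reconciliation the older write is silently discarded, which is one concrete form of the data-safety tradeoff noted above.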
  • 32. NoSQL: Not Only SQL. It is not SQL and it is not relational. Essential feature set. Elastic scaling: rely on scale-out rather than scale-up. Big Data: handle high volume, high velocity, and high variability. Commoditized manageability: reduce dependence on highly skilled DBAs and lower administration costs. Economics: build on commodity hardware. Flexible data model: remove data-model-based restrictions. Applicability: performance and real-time behavior matter more than consistency; high scalability is needed; large data sets must be stored and retrieved; a relational model is not required.
  • 33. Key Value: the idea is to use a hash table in which a unique key points to a particular item of data. It is the simplest to implement, but it is inefficient when you are only interested in querying or updating part of a value. Column Store: created to store and process very large amounts of data distributed over many machines. There are still keys, but they point to multiple columns, and the columns are arranged by column family. Document: the model is basically versioned documents that are collections of other key-value collections. The semi-structured documents are stored in formats like JSON, allowing nested values to be associated with each key. Document databases support querying more efficiently than key-value stores. Graph: a flexible graph model is used which, again, can scale across multiple machines.
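The difference between the key-value and document models can be sketched with plain Python dictionaries standing in for the stores. Everything here is hypothetical (the keys, the user records, and the helper functions); the point is only that a key-value store treats the value as an opaque blob, while a document store can address and query nested fields.

```python
import json

# Key-value store: each value is an opaque serialized blob.
kv_store = {"user:1": json.dumps({"name": "Asha", "address": {"city": "Pune"}})}


def kv_update_city(key, city):
    doc = json.loads(kv_store[key])  # read and parse the whole value
    doc["address"]["city"] = city    # change one field
    kv_store[key] = json.dumps(doc)  # write the whole value back


# Document store: the store understands the nested structure of each value.
doc_store = {
    "user:1": {"name": "Asha", "address": {"city": "Pune"}},
    "user:2": {"name": "Ravi", "address": {"city": "Chennai"}},
}


def doc_get(doc, path):
    """Follow a dotted path such as 'address.city' into a nested document."""
    for part in path.split("."):
        doc = doc[part]
    return doc


def doc_update(key, path, value):
    """Update a single nested field in place."""
    *parents, leaf = path.split(".")
    node = doc_store[key]
    for part in parents:
        node = node[part]
    node[leaf] = value


def doc_find(path, value):
    """Return the keys of all documents whose field at `path` equals `value`."""
    return sorted(k for k, d in doc_store.items() if doc_get(d, path) == value)


kv_update_city("user:1", "Mumbai")
doc_update("user:1", "address.city", "Mumbai")
in_mumbai = doc_find("address.city", "Mumbai")
```

With the key-value store, changing one field means rewriting the whole value, and a query by city would require application code to fetch and parse every value.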
  • 34. Access Interfaces: Language-Specific API, REST/HTTP, Thrift, Map Reduce. Logical Data Model: Key Value, Column Family Store, Document, Graph. Support and Distribution: Multi Data Center Support, Dynamic Provisioning, CAP Support, Proactive Monitoring. Data Persistence: Memory Based, Disk Based, Combination of Memory and Disk.
  • 35. NoSQL. Key Value: MemCached, Redis, SimpleDB, Tokyo Cabinet, Dynamo, Voldemort. Column Store: SimpleDB, BigTable, HBase, Cassandra, HyperTable, Azure TS. Document: CouchDB, MongoDB, Lotus Domino, Riak. Graph: Neo4J, InfoGrid, FlockDB, InfiniteGraph.
  • 37. It is not mature: RDBMS is mature, stable, and functionally rich, while most NoSQL alternatives are in pre-production versions with many key features yet to be implemented. Support: most NoSQL systems are open source projects, and support is mostly offered by startup companies whose reach and credibility are not on par with RDBMS vendors. Analytics: NoSQL databases offer few facilities for ad hoc query and analysis; even a simple query requires significant programming expertise; at present, commonly used BI tools do not provide credible connectivity to NoSQL. Administration and maintenance: the desired goal of zero maintenance is far away; in reality, significant effort is required to maintain the systems. Expertise: currently there is very limited awareness and knowledge.
  • 38. Scalability. Master-slave: one master, many slaves; write to the master and read from any of the slaves. Partitioning: group and localize related functions across nodes; partition vertically (by function) or horizontally (by key). Caching: place a memory-based cache in front of the database. Together these address scaling issues due to read and write loads. High availability. Clustering: a group of systems is responsible for a service; build redundancy into the cluster to eliminate single points of failure. Mirroring and replication: maintain a hot standby to handle planned or unplanned downtime. Recovery solutions: dependable data backup, restore, and recovery procedures; combine process with tools.
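Horizontal partitioning by key can be sketched in a few lines. The example below is a hypothetical illustration in which in-memory dictionaries stand in for separate database nodes; shard_for and ShardedStore are invented names. It uses a stable hash (SHA-256) so that a key always maps to the same shard, unlike Python's built-in hash() for strings, which varies between processes.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Toy horizontally partitioned store; each dict stands in for one node."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(shard_count=4)
for i in range(100):
    store.put(f"user:{i}", {"id": i})

rows_per_shard = [len(shard) for shard in store.shards]
```

Simple modulo placement like this forces most keys to move when the number of shards changes; schemes such as consistent hashing are often used to limit that, but they are outside this sketch.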
  • 39. Performance. Be open to denormalization to accelerate reads: allow redundancy and duplicates to reduce joins. Optimize your costly queries: analyze and optimize the expensive queries using a mix of design strategy, indices, and analysis from query optimization tools. Invest in better hardware (storage and memory): it is not a bad bet, since storage and memory costs have dropped significantly. Rigid schemas (not all data is relational): even the most schema-less model has some schema, and the world revolves around structures. If a key-value kind of store is needed, you can build the same in any RDBMS, and the RDBMS provides the added advantage of structured access and queries.
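The last point, a key-value store inside an RDBMS, can be sketched with a two-column table. This is a minimal illustration using Python's standard sqlite3 module; the kv table and the put, get, and keys_with_prefix helpers are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)")


def put(key, value):
    # INSERT OR REPLACE overwrites the existing row when the key is already present.
    conn.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value))


def get(key):
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


def keys_with_prefix(prefix):
    # Structured access on top of the key-value data.
    # The sketch assumes the prefix contains no LIKE wildcards (% or _).
    rows = conn.execute(
        "SELECT key FROM kv WHERE key LIKE ? ORDER BY key", (prefix + "%",)
    )
    return [r[0] for r in rows]


put("user:1", "Asha")
put("user:2", "Ravi")
put("order:9", "book")
put("user:1", "Asha R")  # overwrite an existing key

user_keys = keys_with_prefix("user:")
```

The same table supports plain get and put by key, and the SQL layer adds queries such as the prefix scan without extra application code.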
  • 40. Systems will eventually gravitate towards one of these three: fast, agile, highly scalable data stores; handlers of complex transactional semantics; analytical processors and facilitators. The world is never binary: it is never either this or that, so why fight over technicalities? Drive decisions based on use cases: choose a model based on the use cases and scenarios; research and understand what your application needs; stay away from substituting “rhetoric” for “hard work”; be open to experimentation.
  • 42. http://www.guug.de/lokal/muenchen/2007-05-14/rdbmsc.pdf  http://ansonalex.com/infographics/twitter-usage-statistics-2012-infographic/  http://www.mountainman.com.au/software/history/it1.html  http://www.slideshare.net/renguzi/codd  http://cims.clayton.edu/booth/ITDB%204201/Codd%20PDF.pdf  http://www.scribd.com/doc/19381895/RDBMS-Concepts  http://www.gitta.info/DBSysConcept/en/text/DBSysConcept.pdf  http://en.wikipedia.org/wiki/Relational_database  http://en.wikipedia.org/wiki/ACID  http://blogs.hbr.org/now-new-next/2009/05/the-social-data-revolution.html  http://www.go-gulf.com/blog/60-seconds  http://en.wikipedia.org/wiki/CAP_theorem  http://highscalability.com/drop-acid-and-think-about-data  http://queue.acm.org/detail.cfm?id=1394128  http://www.bailis.org/blog/safety-and-liveness-eventual-consistency-is-not-safe/  http://www.techrepublic.com/blog/10things/10-things-you-should-know-about-nosql-databases/1772  http://rebelic.nl/engineering/the-four-categories-of-nosql-databases/  http://www.slideshare.net/ksankar/nosql-4559402  http://www.thevirtualcircle.com/2008/11/10/6-reasons-why-relational-database-will-be-superseded/  http://www.slideshare.net/sbtourist/scale-your-database-and-be-happy  Note: Many images used in the deck have been a result of using google image search. Even though, I have not been able to mention the sources of all the images individually, I extend my sincere thanks for the owners of the images for making the same available on the net