RDBMS Vs Hadoop Vs Spark
Relational Database Management System
• An RDBMS, or relational database management system, is
software that allows users to update, query, and manage
relational databases. Structured Query Language (SQL) is the
most common language used to access a relational database.
The SQL standard has been extended to support the storage,
retrieval, and publication of JSON data within a relational
database, providing greater flexibility.
• The most fundamental RDBMS functions are related to create,
read, update, and delete operations, which are referred to
collectively as CRUD. They serve as the foundation for a well-
organized system that promotes consistent data treatment.
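The four CRUD operations can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The `employee` table and its rows are hypothetical examples and do not come from the slides; any RDBMS with a SQL interface would accept broadly similar statements.

```python
import sqlite3


def crud_demo():
    """Run create, read, update, and delete against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical example schema, used only for illustration.
    cur.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT)"
    )

    # Create: insert two rows.
    cur.execute("INSERT INTO employee (id, name, dept) VALUES (?, ?, ?)",
                (1, "Ada", "Engineering"))
    cur.execute("INSERT INTO employee (id, name, dept) VALUES (?, ?, ?)",
                (2, "Grace", "Research"))

    # Read: fetch all rows in a stable order.
    after_insert = cur.execute(
        "SELECT id, name, dept FROM employee ORDER BY id").fetchall()

    # Update: change one column of one row.
    cur.execute("UPDATE employee SET dept = ? WHERE id = ?", ("Platform", 1))
    after_update = cur.execute(
        "SELECT dept FROM employee WHERE id = ?", (1,)).fetchone()[0]

    # Delete: remove one row and count what is left.
    cur.execute("DELETE FROM employee WHERE id = ?", (2,))
    remaining = cur.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

    conn.close()
    return after_insert, after_update, remaining
```

The parameter placeholders (`?`) keep values separate from the SQL text, which is the usual way to pass data to a statement in `sqlite3`.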
Hadoop
• Apache Hadoop is a set of open-source software utilities that
allows you to solve problems involving massive amounts of data
and computation by utilizing a network of many computers. It
provides a software framework for distributed big data storage
and processing based on the MapReduce programming model.
• The core of Apache Hadoop is made up of a storage component
known as Hadoop Distributed File System (HDFS) and a
processing component that uses the MapReduce programming
model. Hadoop divides files into large blocks and distributes
them across cluster nodes. It then distributes packaged code to
nodes in order for the data to be processed in parallel. This
method makes use of data locality.
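The MapReduce model described above can be sketched in plain Python as a word count. This is a single-process illustration, not Hadoop's API: the function names are hypothetical, and on a real cluster each block would be mapped on a node that already holds it.

```python
from collections import defaultdict


def map_phase(block):
    """Emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(blocks):
    """Run map over every block, then shuffle and reduce the results."""
    mapped = []
    for block in blocks:  # each block stands in for one HDFS block
        mapped.extend(map_phase(block))
    return reduce_phase(shuffle(mapped))
```

Because `map_phase` looks at one block at a time and shares no state, the map calls could run in parallel on different nodes, which is the property the model relies on.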
Spark
• Apache Spark is a free and open-source unified analytics
engine for processing large amounts of data. Spark provides a
programming interface for entire clusters with implicit data
parallelism and fault tolerance.
• Apache Spark requires a cluster manager and a distributed
storage system. For cluster management, Spark supports
standalone mode (a native Spark cluster, which you can launch
either manually or with the launch scripts provided by the
install package; these daemons can also be run on a single
machine for testing), Hadoop YARN, Apache Mesos, or
Kubernetes.
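The choice of cluster manager is normally expressed as a master URL, for example the value given to `spark-submit --master`. The helper below is a hypothetical convenience function, not part of Spark; it only assembles the URL forms that Spark accepts for each manager named above, and the host names in any call are placeholders.

```python
def master_url(manager, host=None, port=None):
    """Build a Spark master URL string for the given cluster manager."""
    if manager == "local":
        return "local[*]"  # single machine, one worker thread per core
    if manager == "yarn":
        return "yarn"  # cluster location is read from the Hadoop configuration
    templates = {
        "standalone": "spark://{host}:{port}",
        "mesos": "mesos://{host}:{port}",
        "kubernetes": "k8s://https://{host}:{port}",
    }
    if manager not in templates:
        raise ValueError(f"unknown cluster manager: {manager!r}")
    if host is None or port is None:
        raise ValueError(f"{manager} needs both host and port")
    return templates[manager].format(host=host, port=port)
```

For example, `master_url("standalone", "master.example.com", 7077)` yields the URL for a standalone master listening on port 7077, while `"local"` covers the single-machine testing case.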
| | RDBMS | Hadoop | Spark |
| --- | --- | --- | --- |
| Data storage | Used for average data sets (GBs) | Used for large data sets (TBs and PBs) | Used for large data sets (TBs and PBs) |
| Querying | SQL | HQL (Hive Query Language) | Spark SQL |
| Data variety | Structured data only | Semi-structured, unstructured, and structured data | Semi-structured, unstructured, and structured data |
| Schema | Required on write (static schema) | Required on read (dynamic schema) | Required on read (dynamic schema) |
| Cost | Licensed | Free | Free |
| Speed | Reads are fast | Both reads and writes are fast | More than 100 times faster than Hadoop in some cases |
| Data objects | Works on relational tables | Works on key-value pairs | Resilient Distributed Datasets (RDDs) |
| Hardware profile | High-end profiles | Commodity/utility hardware | High-end profiles |
| Use cases | OLTP (online transaction processing) | Analytics (audio, video, logs, etc.), data discovery | Streaming data, machine learning, fog computing, interactive analyses |
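The difference between a schema required on write and a schema required on read can be sketched as two toy stores. Everything here is hypothetical and for illustration only: the class names, the `SCHEMA` mapping, and the choice of JSON lines as the raw format are assumptions, not how any particular RDBMS, Hadoop, or Spark component is implemented.

```python
import json

SCHEMA = {"id": int, "name": str}  # hypothetical schema: field name -> type


def validate(record, schema):
    """Return the record projected onto the schema, or raise ValueError."""
    if not isinstance(record, dict):
        raise ValueError(f"not a record: {record!r}")
    for field, ftype in schema.items():
        if field not in record or not isinstance(record[field], ftype):
            raise ValueError(f"record does not match schema: {record!r}")
    return {field: record[field] for field in schema}


class SchemaOnWriteStore:
    """Checks every record against a fixed schema before storing it."""

    def __init__(self, schema):
        self.schema = schema
        self.rows = []

    def write(self, record):
        self.rows.append(validate(record, self.schema))  # bad data fails here

    def read(self):
        return list(self.rows)


class SchemaOnReadStore:
    """Stores raw lines as-is and applies a schema only when reading."""

    def __init__(self):
        self.raw = []

    def write(self, line):
        self.raw.append(line)  # anything is accepted at write time

    def read(self, schema):
        rows = []
        for line in self.raw:
            try:
                rows.append(validate(json.loads(line), schema))
            except ValueError:  # malformed JSON or schema mismatch
                continue
        return rows
```

The trade-off is where bad data surfaces: the first store rejects it at write time, while the second accepts everything and lets each reader decide which fields and types it needs.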
Benefits
RDBMS
• Maintainability: lets database administrators maintain, control, and update data in the database easily.
• Flexibility: saves time, because updating data in one place is enough.
• Data structure: stores data in a tabular format that is organized and easily understood by users.
• Privileges: lets database administrators control activities on the database.
• Data safety: authorization codes and other security layers keep data safe, including when a program crashes.
Hadoop
• Scalable: can store and distribute very large data sets.
• Cost-effective: raw data that would otherwise be deleted because it was too cost-prohibitive to keep can be retained affordably.
• Flexible: gives easy access to new data sources and can tap into different types of data.
• Fast: its storage method is based on a distributed file system that 'maps' data to its location on the cluster, so processing can run close to the data.
• Resilient to failure: data is replicated, so in the event of a failure another copy is available for use.
Spark
• Speed: up to 100 times faster than Hadoop for large-scale data processing in some cases.
• Ease of use: easy-to-use APIs for operating on large datasets.
• Advanced analytics: supports machine learning (ML), graph algorithms, streaming data, SQL queries, etc.
• Dynamic: makes it easy to develop parallel applications.
• Multilingual: supports many programming languages, such as Python, Java, and Scala.
• Powerful: can handle many analytics challenges.
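The "resilient to failure" point for Hadoop rests on storing more than one copy of each block. The sketch below is a toy model under stated assumptions: it uses simple round-robin placement, whereas real HDFS placement is rack-aware, and the function names and node names are hypothetical. The replication factor of 3 matches the usual HDFS default.

```python
def place_blocks(file_size, block_size, nodes, replication=3):
    """Return {block_index: [node, ...]} using round-robin placement."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for b in range(num_blocks):
        # Put each replica of block b on a different, consecutive node.
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )
```

With three replicas, losing any single node leaves every block readable; with a replication factor of 1, the same failure loses whichever blocks lived on that node.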
Limitations
RDBMS
• The software is expensive.
• Complex software requires expensive hardware, which increases the overall cost of running an RDBMS.
• It requires skilled staff to implement.
• Certain applications are slow in processing.
• It is difficult to recover lost data.
Hadoop
• It performs poorly when it has to access a large number of small files.
• It is written in Java, a widely used language that attackers frequently target, which raises security concerns.
• Its efficiency decreases when working with small data sets.
• It uses Kerberos for security, which is not easy to manage. Kerberos does not itself provide storage and network encryption, which is a further concern.
Spark
• Apache Spark has no file management system of its own, so it needs to be integrated with other platforms.
• It does not fully support real-time data stream processing.
• Keeping data in memory is expensive, which works against cost-efficient processing of big data.
• There is a problem with small files when Spark is used with Hadoop.
• The latency of Apache Spark is comparatively high, which results in lower throughput.
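The small-files limitation can be made concrete with some arithmetic. HDFS tracks every file and every block as a separate entry, so the same volume of data costs far more bookkeeping when it is split into many small files. The sketch assumes a 128 MiB block size (the usual HDFS default); the function name is hypothetical.

```python
def namespace_objects(file_sizes, block_size=128 * 1024 * 1024):
    """Count the file and block entries needed for files of the given sizes."""
    # A file occupies ceil(size / block_size) blocks; blocks never span files.
    blocks = sum(-(-size // block_size) for size in file_sizes)
    return {"files": len(file_sizes), "blocks": blocks}


GIB = 1024 ** 3
MIB = 1024 ** 2

one_large = namespace_objects([GIB])           # a single 1 GiB file
many_small = namespace_objects([MIB] * 1024)   # 1024 files of 1 MiB each
```

Both inputs hold 1 GiB of data, but the single file needs 1 file entry and 8 block entries, while the small files need 1024 of each. A job that processes one block per task would likewise go from 8 tasks to 1024.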

Comparison - RDBMS vs Hadoop vs Apache
