BIG DATA ANALYTICS
(21CS71)
STORAGE MODELS
Submitted by: Harsha A (1RR21CS057)
Storage Models


A storage model is the core of any big-data system. It affects the scalability,
data structures, programming models and computational models of the systems that are
built on top of it. Understanding the underlying storage model is therefore key to
understanding the entire spectrum of big-data frameworks.
To address different considerations and priorities, three main storage models have
been developed over the past few decades: block-based storage, file-based storage
and object-based storage.
1.1 Block-Based Storage




Block-level storage is one of the most classical storage models in computer science. A
traditional block-based storage system presents itself to servers using industry-standard
Fibre Channel and iSCSI connectivity mechanisms. Block-level storage can be thought of
as a hard drive in a server, except that the drive might be installed in a remote
chassis and accessed over Fibre Channel or iSCSI.
In block-based storage, data is stored as blocks, which normally have a fixed size and
carry no additional information (metadata). A unique identifier is used to access each
block.
Block-based storage focuses on performance and scalability for storing and accessing
very large-scale data.
As a result, block-based storage is usually used as a low-level storage paradigm that
underlies higher-level storage systems such as file-based systems, object-based
systems and transactional databases.
Architecture
Data is stored as blocks, which normally have a fixed size and carry no additional
information (metadata). A unique identifier is used to access each block. The
identifier is mapped to the exact location of the actual data block through access
interfaces. Traditionally, block-based storage is bound to physical storage
protocols such as SCSI [4], iSCSI, ATA [5] and SATA [6].
With the development of distributed computing and big data, block-based storage models
have also been extended to support distributed and cloud-based environments. As shown in
Fig. 3, a distributed block-storage system is composed of a block server and a group of
block nodes. The block server maintains the mapping (index) from block IDs to the actual
data blocks on the block nodes. The block nodes store the actual data in fixed-size
partitions, each of which is considered a block.
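The division of labour between the block server and the block nodes can be illustrated with a minimal in-memory sketch. It is not taken from any real product: the class names, the 4-byte block size and the round-robin placement are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # bytes per block; deliberately tiny (assumption for illustration)


@dataclass
class BlockNode:
    """Stores raw fixed-size blocks with no metadata attached."""
    slots: dict[int, bytes] = field(default_factory=dict)

    def put(self, slot: int, payload: bytes) -> None:
        if len(payload) != BLOCK_SIZE:
            raise ValueError("block nodes only accept full fixed-size blocks")
        self.slots[slot] = payload

    def get(self, slot: int) -> bytes:
        return self.slots[slot]


class BlockServer:
    """Keeps the index from block IDs to (node number, slot) locations."""

    def __init__(self, nodes: list[BlockNode]) -> None:
        self.nodes = nodes
        self.index: dict[int, tuple[int, int]] = {}
        self.next_id = 0

    def write(self, data: bytes) -> list[int]:
        """Split data into fixed-size blocks and return their block IDs."""
        ids = []
        for offset in range(0, len(data), BLOCK_SIZE):
            # Pad the last chunk so that every stored block has the same size.
            chunk = data[offset:offset + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
            block_id = self.next_id
            self.next_id += 1
            node_no = block_id % len(self.nodes)  # naive round-robin placement
            self.nodes[node_no].put(block_id, chunk)
            self.index[block_id] = (node_no, block_id)
            ids.append(block_id)
        return ids

    def read(self, block_id: int) -> bytes:
        node_no, slot = self.index[block_id]
        return self.nodes[node_no].get(slot)


server = BlockServer([BlockNode(), BlockNode()])
block_ids = server.write(b"big data!")
restored = b"".join(server.read(i) for i in block_ids)
```

Note that the caller has to remember the list of block IDs and strip the padding itself. This is exactly the bookkeeping that higher-level file-based and object-based systems take over.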
1.2 File-Based Storage
File-based storage inherits from the traditional file system architecture and treats data as
files that are maintained in a hierarchical structure. It is the most common storage model and
is relatively easy to implement and use. In a big data scenario, a file-based storage system
can be built on top of another low-level abstraction (such as the block-based or object-based
model) to improve its performance and scalability.
Architecture
The file-based storage paradigm is shown in Fig. 4. File paths are organized in a hierarchy and
are used as the entry points for accessing data in the physical storage. In a big data scenario,
distributed file systems (DFS) are commonly used as the basic storage systems. Figure 5 shows a
typical architecture of a distributed file system, which normally contains one or several name
nodes and a number of data nodes. The name node maintains the hierarchy of file entries for the
entire system, while the data nodes are responsible for persisting the file data.
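A minimal sketch of this split, assuming whole files are placed on a single data node (real distributed file systems typically split files into chunks and replicate them). The class names and the placement rule are hypothetical and serve only to illustrate the roles.

```python
class DataNode:
    """Persists file contents; knows nothing about the directory hierarchy."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}


class NameNode:
    """Maintains the hierarchy of file entries and where each file is stored."""

    def __init__(self, data_nodes: list[DataNode]) -> None:
        self.data_nodes = data_nodes
        self.tree: dict = {}  # nested dicts model directories

    def create(self, path: str, content: bytes) -> None:
        *dirs, filename = path.strip("/").split("/")
        directory = self.tree
        for name in dirs:
            directory = directory.setdefault(name, {})
        # Toy placement rule (assumption): derive the node from the path bytes.
        node_no = sum(path.encode()) % len(self.data_nodes)
        self.data_nodes[node_no].store[path] = content
        directory[filename] = node_no  # a file entry records its data node

    def read(self, path: str) -> bytes:
        *dirs, filename = path.strip("/").split("/")
        directory = self.tree
        for name in dirs:
            directory = directory[name]
        return self.data_nodes[directory[filename]].store[path]

    def listdir(self, path: str) -> list[str]:
        directory = self.tree
        for name in filter(None, path.strip("/").split("/")):
            directory = directory[name]
        return sorted(directory)


name_node = NameNode([DataNode(), DataNode(), DataNode()])
name_node.create("/logs/2024/app.log", b"started")
name_node.create("/logs/2024/db.log", b"ready")
listing = name_node.listdir("/logs/2024")
```

Every lookup walks the directory tree held by the name node, which makes the hierarchy convenient for users but also a potential bottleneck as the number of entries grows.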
In a file-based system, a user needs to know the namespaces and paths in order to access the
stored files. For sharing files across systems, the path or namespace of a file includes three
main parts: the protocol, the domain name and the path of the file. For example, an
HDFS [15] file can be written as "[hdfs://][ServerAddress:ServerPort]/[FilePath]" (Fig. 6).
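The three parts of such a path can be pulled apart with Python's standard `urllib.parse` module. The host name, port and file path below are made-up placeholders, not values from the slides.

```python
from urllib.parse import urlsplit

# Hypothetical example URI following the protocol / server address / file path pattern.
uri = "hdfs://namenode.example.com:8020/user/data/part-0000.csv"

parts = urlsplit(uri)
protocol = parts.scheme      # the protocol
host = parts.hostname        # the server address (domain name)
port = parts.port            # the server port
file_path = parts.path       # the path of the file inside the namespace
```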
Network File System (NFS) is a distributed file system protocol originally developed by Sun
Microsystems. NFS allows remote hosts to mount file systems over a network and interact
with those file systems as though they were mounted locally.
This enables system administrators to consolidate resources onto centralized servers on the
network. NFS is built on the Open Network Computing Remote Procedure Call (ONC
RPC) system. NFS has been widely used in Unix and Linux-based operating systems and
has also inspired the development of modern distributed file systems. There have been three
main generations of the NFS protocol (NFSv2, NFSv3 and NFSv4), driven by the continuous
development of storage technology and the growth of user requirements. An NFS deployment
consists of a few servers and many clients. The clients remotely access the data that is
stored on the server machines. For this to function properly, a few processes have to be
configured and running. NFS is well suited for sharing entire file systems with a large
number of known hosts in a transparent manner. However, with ease of use comes a variety
of potential security problems.
Therefore, NFS also provides two basic options for access control of shared files:
→ First, the server restricts which hosts are allowed to mount which file systems, either by
IP address or by host name.
→ Second, the server enforces file system permissions for users on NFS clients in the same
way it does for local users.
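The two checks can be sketched as two independent predicates. This is an illustration of the idea only, not how an NFS server is implemented: the `Export` and `FileEntry` types are hypothetical, only IP-based host matching is shown, and group permissions are ignored for brevity.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Export:
    """A shared file system plus the client networks allowed to mount it."""
    path: str
    allowed_networks: list[str]


def may_mount(export: Export, client_ip: str) -> bool:
    """First check: is this host allowed to mount the exported file system?"""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net) for net in export.allowed_networks)


@dataclass
class FileEntry:
    owner_uid: int
    mode: int  # Unix permission bits, for example 0o640


def may_read(entry: FileEntry, uid: int) -> bool:
    """Second check: the usual owner/other read bits, as for a local user."""
    if uid == entry.owner_uid:
        return bool(entry.mode & 0o400)
    return bool(entry.mode & 0o004)


export = Export("/srv/share", ["192.168.1.0/24"])
report = FileEntry(owner_uid=1000, mode=0o640)
```

With these definitions, `may_mount(export, "192.168.1.17")` passes the host check while `may_mount(export, "10.0.0.5")` does not, and `may_read(report, 1000)` succeeds for the owner but `may_read(report, 2000)` fails for another user.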
Different Storage Models in Big Data Analytics
1.3 Object-Based Storage

The object-based storage model was first introduced on Network Attached Secure
devices [17] to provide more flexible data containers, called objects. Over the past decade,
object-based storage has been developed further, with investments made by both system
vendors such as EMC, HP, IBM and Red Hat, and cloud providers such as Amazon,
Microsoft and Google. In the object-based storage model, data is managed as objects.
As shown in Fig. 7, every object includes the data itself, some metadata, attributes and a
globally unique object identifier (OID). The object-based storage model abstracts the lower
layers of storage away from administrators and applications. Object storage systems
can be implemented at different levels, including the device level, the system level and
the interface level.
Data is exposed and managed as objects that include additional descriptive metadata,
which can be used for better indexing or management. Metadata can be anything from
security, privacy and authentication properties to any application-specific information.
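The anatomy of an object can be sketched as a small data class. The field names, the use of a random UUID as the OID, and the split between user-supplied metadata and system-derived attributes are assumptions made for this illustration.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StorageObject:
    """One object: the data, descriptive metadata, attributes and an OID."""
    data: bytes
    metadata: dict[str, str] = field(default_factory=dict)    # user-supplied
    attributes: dict[str, int] = field(default_factory=dict)  # system-derived
    oid: str = field(default_factory=lambda: uuid.uuid4().hex)

    def __post_init__(self) -> None:
        # Record a system attribute alongside the data (illustrative choice).
        self.attributes.setdefault("size", len(self.data))


photo = StorageObject(
    data=b"\x89PNG...",
    metadata={"owner": "alice", "content-type": "image/png"},
)
other = StorageObject(data=b"hello")
```

The object is addressed only by `photo.oid`. Nothing in the object says where it is physically stored, which is what lets the storage layer move it freely.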


Architecture
The typical architecture of an object-based storage system is shown in Fig. 8. As the figure
shows, an object-based storage system normally uses a flat namespace, in which the identifiers
of the data and their locations are maintained as key-value pairs in the object server. In
principle, the object server provides location-independent addressing and constant lookup
latency for reading every object. In addition, the metadata is separated from the data and is
also maintained as objects in a metadata server (which might be co-located with the object
server).
As a result, the model provides a standard and easier way of processing, analyzing and
manipulating the metadata without affecting the data itself. Because of the flat architecture,
it is very easy to scale out an object-based storage system by adding storage nodes.
Moreover, the added storage automatically becomes capacity that is available to all
users. Drawing on the object container and the metadata it maintains, the model can also
provide much more flexible and fine-grained data policies at different levels. For example,
Amazon S3 [18] provides bucket-level policy, Azure [19] provides storage-account-level
policy, and Atmos [20] provides per-object policy.
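The flat namespace, the separate metadata map and scale-out by adding nodes can be sketched in a few lines. The class below is a toy model under stated assumptions: the object server and metadata server are plain dictionaries in one process, and new objects go to the node that currently holds the fewest objects.

```python
import uuid


class ObjectStore:
    """Toy object store with a flat OID namespace and separate metadata."""

    def __init__(self, node_count: int) -> None:
        self.nodes: list[dict[str, bytes]] = [{} for _ in range(node_count)]
        self.locations: dict[str, int] = {}            # object server: OID -> node
        self.metadata: dict[str, dict[str, str]] = {}  # metadata server: OID -> metadata

    def add_node(self) -> None:
        """Scale out: the new capacity is usable by every later write."""
        self.nodes.append({})

    def put(self, data: bytes, **metadata: str) -> str:
        oid = uuid.uuid4().hex
        # Placement rule (assumption): pick the node holding the fewest objects.
        node_no = min(range(len(self.nodes)), key=lambda i: len(self.nodes[i]))
        self.nodes[node_no][oid] = data
        self.locations[oid] = node_no
        self.metadata[oid] = dict(metadata)
        return oid

    def get(self, oid: str) -> bytes:
        # One key-value lookup, independent of where the object lives.
        return self.nodes[self.locations[oid]][oid]

    def tag(self, oid: str, key: str, value: str) -> None:
        """Change metadata without touching the stored data."""
        self.metadata[oid][key] = value


store = ObjectStore(node_count=2)
first = store.put(b"a", owner="alice")
second = store.put(b"b")
store.add_node()
third = store.put(b"c")
store.tag(first, "retention", "30d")
```

After `add_node()`, the third object lands on the new node without any change for callers, who continue to use OIDs only.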
1.4 Comparison of Storage Models



In practice, there is no perfect model that suits all possible scenarios. Therefore,
developers and users should choose a storage model according to their application
requirements and context. Each of the storage models discussed in this section has its
own pros and cons.
Block-based storage is known for its flexibility, versatility and simplicity. In a block-level
storage system, raw storage volumes (each composed of a set of blocks) are created, and the
server-based system then connects to these volumes and uses them as individual storage drives.
This makes block-based storage usable for almost any kind of application, including file
storage, database storage, virtual machine file system (VMFS) volumes, and more.
Block-based storage can also be used for data-sharing scenarios. Once block-based
volumes have been created, they can be logically connected or migrated between different
user spaces. Users can therefore share data with each other through these overlapping
block volumes.
Summary of Data Storage Models
The main features of each storage model are summarized in Table 1. Generally,
block-based storage has a fixed size for each storage unit, while file-based and
object-based models can have storage units of various sizes, depending on application
requirements. In addition, file-based models use the file directory to locate the
data, whilst block-based and object-based models both rely on a global identifier
for locating data. Furthermore, both block-based and object-based models have flat
scalability, while file-based storage may be limited by its hierarchical indexing
structure. Lastly, block-based storage can normally guarantee strong consistency,
while for file-based and object-based models the consistency model is configurable
for different scenarios.
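The four distinctions made in this paragraph can be captured as a small lookup structure. The sketch restates only what the paragraph says and is not a reproduction of Table 1; the field names and the helper function are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTraits:
    unit_size: str
    locate_by: str
    scalability: str
    consistency: str


TRAITS = {
    "block": ModelTraits("fixed", "global identifier", "flat", "strong"),
    "file": ModelTraits("variable", "directory path", "may be limited by hierarchy", "configurable"),
    "object": ModelTraits("variable", "global identifier", "flat", "configurable"),
}


def models_where(**wanted: str) -> list[str]:
    """Return the storage models whose traits match every given value."""
    return sorted(
        name
        for name, traits in TRAITS.items()
        if all(getattr(traits, key) == value for key, value in wanted.items())
    )


id_addressed = models_where(locate_by="global identifier")
strongly_consistent = models_where(consistency="strong")
```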
