Data Structures, File Organization, Physical Data Design
Data structures and file organisation refer to the methods of organising the data in a database.
They primarily deal with the physical storage of data, which is significant for retrieving,
storing and re-organising data in a database.
Examples include linked lists, inverted lists, B-trees and hash tables, among others.
Data structures can be used to build data files (a data file, or simply a file, is a collection of many similar
records), and the file organisation determines the access methods for the file.
File organisation (or file structure) is a combination of representations for data in files and of
operations for accessing the data. A file structure allows applications to read, write and modify
data.
Why Data Structures
The key factor in designing data structures and file organisation is the relatively slow speed of
hard disks and the large amount of time required to get information from a disk.
All data structure and file organisation designs aim to minimise disk accesses and to
maximise the likelihood that the information the user wants is already in memory.
This constraint on disk access is generally referred to as the I/O bottleneck.
Retrieving information through multiple trips to the disk greatly increases access time.
Ideally, we should get the information we need with one disk access, or with as few
accesses as possible.
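Keeping wanted data in memory to avoid disk trips can be sketched as a tiny least-recently-used (LRU) buffer cache. This is a minimal illustration under stated assumptions: the "disk" is simulated by a Python dict of block number to bytes, and the `BufferCache` class is hypothetical. It counts how many requests actually had to go to the disk.

```python
from collections import OrderedDict


class BufferCache:
    """Hypothetical LRU buffer cache in front of a simulated disk."""

    def __init__(self, disk, capacity):
        self.disk = disk              # simulated disk: block number -> bytes
        self.capacity = capacity      # number of blocks that fit in memory
        self.frames = OrderedDict()   # cached blocks, least recently used first
        self.disk_reads = 0           # requests that had to go to the disk

    def read(self, block_no):
        if block_no in self.frames:
            # Hit: no disk access; mark the block as most recently used.
            self.frames.move_to_end(block_no)
            return self.frames[block_no]
        # Miss: one disk access, then keep the block in memory.
        self.disk_reads += 1
        data = self.disk[block_no]
        self.frames[block_no] = data
        if len(self.frames) > self.capacity:
            self.frames.popitem(last=False)  # evict the least recently used block
        return data


disk = {n: f"block {n}".encode() for n in range(100)}
cache = BufferCache(disk, capacity=3)
for b in [1, 2, 1, 1, 3, 2, 4, 1]:
    cache.read(b)
print(cache.disk_reads)  # 5 disk reads for 8 requests
```

Repeated requests for blocks 1 and 2 are served from memory, which is exactly the effect the designs above try to maximise.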
FILES
When files grew intolerably large for unaided sequential access, indexes were added to them.
Indexes made it possible to keep a list of keys and pointers in a small file that could be
searched more quickly. Using the keys and pointers, the user had direct access to the large
primary file.
However, as the indexes grew, they too became difficult to manage, especially for dynamic files
in which the set of keys changes.
Then, in the early 1960s, the idea of data structures emerged.
Memory
Primary storage: storage media operated on directly by the Central Processing Unit (CPU), i.e., the main
memory and the cache memory.
Primary storage memory, also called RAM (Random Access Memory), provides fast access to
data and is volatile, i.e., it loses its contents in the event of a power outage.
Secondary storage: includes magnetic disks, optical disks and tapes. Secondary storage
provides slower access to data than RAM.
Static RAM (SRAM), used as cache memory, speeds up the execution of
programs by the CPU, while Dynamic RAM (DRAM) provides the main work area for the CPU.
Flash memory, a non-volatile form of EEPROM (Electrically Erasable
Programmable Read-Only Memory), has access speed and performance between those of
DRAM and magnetic disks.
CD-ROM (Compact Disk Read-Only Memory) disks store data optically and are read
by a laser.
WORM (Write-Once-Read-Many) disks are used for archiving data; they allow data
to be written once and read any number of times.
DVD (Digital Video Disk), a type of optical disk, allows storage of four to fifteen
gigabytes of data per disk.
Magnetic tapes are used for archiving and back-up storage and are becoming
popular as tertiary storage to hold terabytes of data.
Jukeboxes (optical and tape) are used to hold and manage arrays of CD-ROMs and tapes.
RAID Technology (Redundant Array of Inexpensive/Independent Disks)
The main goal of RAID is to even out the widely different rates of performance
improvement of disks compared with those of memory and microprocessors.
In addition to improving performance, RAID is also used to improve reliability by
storing redundant information on disks. One technique for introducing
redundancy is called mirroring: data is written redundantly to two identical
physical disks that are treated as one logical disk. If one disk fails, the other is used
until the first is repaired.
The problem of speed and access time is addressed by using a large array of
small independent disks acting as a single high-performance logical disk. A
technique called data striping distributes data across the disks so that they can be
accessed in parallel, improving disk performance.
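Striping and mirroring can both be pictured as simple address mappings. The sketch below is a minimal illustration, not a model of any particular RAID level: it assumes round-robin block-level striping (logical block i goes to disk i mod N at position i div N), simulates each disk as a Python dict, and uses hypothetical function names.

```python
def stripe_location(logical_block, num_disks):
    """Round-robin striping: logical block -> (disk number, block on that disk)."""
    return logical_block % num_disks, logical_block // num_disks


def mirrored_write(disks, block_no, data):
    """Mirroring: write the same block to every disk in the mirror set."""
    for disk in disks:
        disk[block_no] = data


def mirrored_read(disks, block_no, failed=()):
    """Read the block from the first disk that has not failed."""
    for i, disk in enumerate(disks):
        if i not in failed:
            return disk[block_no]
    raise OSError("all mirrored disks have failed")


# Logical blocks 0..7 spread over 4 disks: consecutive blocks sit on
# different disks, so a multi-block read can proceed on all disks in parallel.
layout = [stripe_location(b, 4) for b in range(8)]
print(layout)  # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]

# Two simulated disks treated as one logical disk.
mirror = [{}, {}]
mirrored_write(mirror, 7, b"payroll")
print(mirrored_read(mirror, 7, failed={0}))  # b'payroll' (served by disk 1)
```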
DATA STRUCTURES
The term data structure refers to the manner in which relationships between data elements are
represented in the computer system. The organisation of indexes, the representation of stored fields,
the physical sequence of stored records, etc., all fall within the purview of data structures. Thus,
an understanding of data structures is important for understanding database
management systems.
There are three major types of data structures used for this purpose: linked lists and inverted lists
(both used as indexes), and B-trees.
Indexes
An index is a file in which each entry (record) consists of a data value together with one or more
pointers (physical storage addresses). The data value is a value of some field of the indexed file
(the indexed field), and the pointers identify the records in the indexed file that have that value in
that field.
An index can be used in two ways.
First, it can be used for sequential access to the indexed file, i.e., access in order of the values of
the indexed field, by imposing an ordering on the indexed file.
Second, it can be used for direct access to individual records of the indexed file.
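Both uses of an index can be sketched with a sorted list of (value, pointer) entries. Assumptions: the records live in a Python list, a "pointer" is simply the record's position in that list, and the `Index` class and its method names are hypothetical.

```python
import bisect


class Index:
    """Hypothetical index: sorted (value, pointer) entries over one field."""

    def __init__(self, records, field):
        self.entries = sorted((rec[field], pos) for pos, rec in enumerate(records))
        self.values = [v for v, _ in self.entries]

    def sequential(self):
        """Sequential access: pointers in order of the indexed field."""
        return [pos for _, pos in self.entries]

    def lookup(self, value):
        """Direct access: pointers of all records having this value."""
        lo = bisect.bisect_left(self.values, value)
        hi = bisect.bisect_right(self.values, value)
        return [pos for _, pos in self.entries[lo:hi]]


records = [
    {"emp": "E3", "city": "Pune"},
    {"emp": "E1", "city": "Delhi"},
    {"emp": "E2", "city": "Pune"},
]
city_index = Index(records, "city")
print(city_index.sequential())    # [1, 0, 2]
print(city_index.lookup("Pune"))  # [0, 2]
```

The records themselves are never reordered; the index alone imposes the ordering used for sequential access.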
Primary Indexes, Clustering Indexes, Secondary Indexes
A primary index is a file that contains a sorted sequence of records with
two fields:
the ordering key field, and a block address.
An entry in the primary index file contains the key value of the first record of a data block and a
pointer to that data block.
The ordering key field for this index can be
the primary key of the data file.
Example of Primary Index file
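A lookup through a primary index can be sketched as a binary search over the block anchors (the first key of each block), followed by a search inside a single data block. The data file below is hypothetical and held in memory as a list of sorted blocks; list positions stand in for block addresses.

```python
import bisect

# Hypothetical data file ordered on the key; each inner list is one block.
data_blocks = [
    [101, 104, 108],
    [112, 115, 119],
    [123, 130, 131],
]

# Primary index: one entry per block = (key of first record, block address).
primary_index = [(block[0], addr) for addr, block in enumerate(data_blocks)]
anchor_keys = [key for key, _ in primary_index]


def find(key):
    """Return the address of the block holding `key`, or None if absent."""
    # The last anchor <= key identifies the only block that can hold the key.
    i = bisect.bisect_right(anchor_keys, key) - 1
    if i < 0:
        return None
    addr = primary_index[i][1]
    return addr if key in data_blocks[addr] else None


print(find(115))  # 1
print(find(116))  # None
```

Only one index entry per block is needed because the data file is physically ordered on the key, so the index is much smaller than the file.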
Clustering Indexes
An index created on an ordered file whose records are physically ordered on a
non-key field (that is, a field that does not have a distinct value for each record) is called a
clustering index.
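A clustering index can be sketched as one entry per distinct value of the clustering field, pointing to the first block that contains that value. The file, its `dept` field and the helper names are hypothetical, and blocks are held in memory as lists.

```python
# Hypothetical file physically ordered on the non-key field "dept";
# each inner list is one disk block of (emp_id, dept) records.
blocks = [
    [("E7", 10), ("E2", 10), ("E9", 20)],
    [("E4", 20), ("E1", 20), ("E5", 30)],
    [("E8", 30), ("E3", 40)],
]

# Clustering index: distinct dept value -> first block containing it.
clustering_index = {}
for addr, block in enumerate(blocks):
    for _, dept in block:
        clustering_index.setdefault(dept, addr)


def records_for(dept):
    """Start at the indexed block and read forward while dept still matches."""
    result = []
    addr = clustering_index.get(dept)
    if addr is None:
        return result
    for block in blocks[addr:]:
        for emp, d in block:
            if d == dept:
                result.append(emp)
            elif d > dept:
                return result  # file is ordered on dept, so no later matches
    return result


print(clustering_index)  # {10: 0, 20: 0, 30: 1, 40: 2}
print(records_for(20))   # ['E9', 'E4', 'E1']
```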
Secondary Indexes
A secondary index is a file whose records contain a value of the secondary index field (a field
that is not the ordering field of the data file) and a pointer to the block that
contains the data record.
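Because the secondary field is not the ordering field, matching records can be scattered across blocks. A minimal sketch, assuming a hypothetical in-memory file and storing, for each field value, a block pointer for every matching record:

```python
from collections import defaultdict

# Hypothetical data file ordered on emp_id; "city" is NOT the ordering field.
blocks = [
    [("E1", "Pune"), ("E2", "Delhi")],
    [("E3", "Pune"), ("E4", "Chennai")],
    [("E5", "Delhi"), ("E6", "Pune")],
]

# Secondary index: field value -> block pointer for each matching record.
secondary_index = defaultdict(list)
for addr, block in enumerate(blocks):
    for _, city in block:
        secondary_index[city].append(addr)


def lookup(city):
    """Follow each pointer to its block and pick out the matching records."""
    result = []
    for addr in sorted(set(secondary_index.get(city, []))):
        result.extend(emp for emp, c in blocks[addr] if c == city)
    return result


print(lookup("Pune"))  # ['E1', 'E3', 'E6']
```

Since physical order gives no help, every record is represented in the index, unlike the one-entry-per-block primary index.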
Linked Lists
A simple linked list is a chain of pointers embedded in records. It indicates either a record sequence for
an attribute other than the primary key, or all the records that share a common property. With a linked list,
each data element can be stored separately, and a pointer links it to the next data item.
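A linked list embedded in records can be sketched by giving each stored record a "next" pointer field. Here the file is a Python list, pointers are record positions, `None` marks the end of the chain, and the field names are hypothetical. The chain links all records sharing a common property (department "Sales") in an order unrelated to their physical position.

```python
# Each stored record carries an embedded pointer field "next_sales"
# linking it to the next record in the same department.
records = [
    {"emp": "E1", "dept": "Sales", "next_sales": 3},
    {"emp": "E2", "dept": "HR", "next_sales": None},
    {"emp": "E3", "dept": "Sales", "next_sales": None},
    {"emp": "E4", "dept": "Sales", "next_sales": 2},
]
sales_head = 0  # position of the first record in the chain


def follow_chain(records, head, link_field):
    """Walk the embedded pointers from `head` until the end of the chain."""
    result = []
    pos = head
    while pos is not None:
        result.append(records[pos]["emp"])
        pos = records[pos][link_field]
    return result


print(follow_chain(records, sales_head, "next_sales"))  # ['E1', 'E4', 'E3']
```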
Inverted Lists
Inverted lists may be viewed simply as index tables of pointers stored separately from the data
records, rather than embedded in pointer fields in the stored records themselves.
Inverted lists can be dense (one-to-one, 1:1) or non-dense (one-to-many, 1:m).
A dense list is one with a pointer for most or all of the records in the file (1:1).
In a non-dense list, only a few of the records in the file are part of the list (1:m).
The example lists (by company name and by company symbol) are dense, since there is a one-to-one
relationship between company name and primary key, and between company symbol and primary key.
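An inverted list keeps its pointers in separate tables instead of inside the records. A minimal sketch, assuming hypothetical company records addressed by primary key: the name and symbol lists are dense (one entry per record), while a list of dividend-paying companies is non-dense because only a few records appear in it.

```python
# Hypothetical stored records, addressed by primary key; no pointer fields inside.
companies = {
    1: {"name": "Acme Corp", "symbol": "ACM", "dividend": False},
    2: {"name": "Birch Ltd", "symbol": "BRC", "dividend": True},
    3: {"name": "Cobalt Inc", "symbol": "CBT", "dividend": False},
}

# Dense inverted lists: one entry for every record (1:1 with the primary key).
by_name = {rec["name"]: pk for pk, rec in companies.items()}
by_symbol = {rec["symbol"]: pk for pk, rec in companies.items()}

# Non-dense inverted list: only the few records with the property appear.
pays_dividend = [pk for pk, rec in companies.items() if rec["dividend"]]

print(companies[by_symbol["CBT"]]["name"])  # Cobalt Inc
print(pays_dividend)                        # [2]
```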
Binary tree
•A binary tree is either empty or consists of a root node and two disjoint binary trees called the left
subtree and the right subtree.
•Every node in a binary tree has 0, 1 or 2 children.
Binary Search Tree (BST)
•To search for a key value, start from the root and move left or right depending on whether the
search key is smaller or larger than the key in the current node.
•An index entry is a pair: when a BST is used as an index, the indexed value serves as the key, and an
address field must also be stored in order to locate the records in the file on secondary storage devices.
A BST is well suited to an index if the whole index can be held in primary memory.
•However, indexes are usually large and require a combination of primary and secondary storage.
If a BST is stored level by level on secondary storage, finding the correct subtree becomes an
additional problem, and a search may require many block transfers: in the worst case, one block
transfer for each level of the tree searched. This situation can be remedied considerably by using a
B-tree as the data structure.
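The search described above can be sketched as a small in-memory BST index whose nodes store a key and a record address. The addresses ("blk7" and so on) and the function names are hypothetical; the `levels` counter stands in for the block transfers a disk-resident BST might need, up to one per level.

```python
class Node:
    """One index entry: the key plus the address of the record on disk."""

    def __init__(self, key, address):
        self.key = key
        self.address = address
        self.left = None   # subtree with smaller keys
        self.right = None  # subtree with larger keys


def insert(root, key, address):
    """Insert (key, address) and return the root of the updated subtree."""
    if root is None:
        return Node(key, address)
    if key < root.key:
        root.left = insert(root.left, key, address)
    elif key > root.key:
        root.right = insert(root.right, key, address)
    else:
        root.address = address  # key already present: update its address
    return root


def search(root, key):
    """Return (address, levels visited); address is None if key is absent."""
    levels = 0
    node = root
    while node is not None:
        levels += 1
        if key == node.key:
            return node.address, levels
        node = node.left if key < node.key else node.right
    return None, levels


root = None
for key, addr in [(50, "blk7"), (30, "blk2"), (70, "blk9"), (60, "blk4")]:
    root = insert(root, key, addr)
print(search(root, 60))  # ('blk4', 3)
```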
B-Trees
•B-trees are a form of data structure based on hierarchies.
•The “B” has been interpreted as standing for Bayer, the originator, or for “balanced”.
•The B-tree structure was devised by R. Bayer and E. McCreight (1970) at Boeing Scientific Research Laboratories.
•B-trees are balanced in the sense that all the terminal (bottom) nodes have the same path length to
the root (top).
Algorithms have been developed for efficiently searching and maintaining B-tree indexes.
(An algorithm is a finite sequence of well-defined, computer-implementable instructions, typically
used to solve a class of problems or to perform a computation.)
B-trees provide both sequential and indexed access and are quite flexible.
The height of a B-tree is the number of levels in the hierarchy.
In the simplified view used here, each node on the tree contains an index element, which has a key
value, a pointer to the rest of the data, and two link pointers:
one link (to the left) points to the elements (nodes) that have lower values,
while the other link (to the right) points to elements that have a value greater than or equal to
the value in the node. (In a general B-tree of order m, a node holds up to m − 1 keys and up to
m child pointers, and these left and right links apply to each key within the node.)
The root is the highest node on the tree.
The bottom nodes are called leaves because they are at the end of the tree branches.
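A B-tree search can be sketched as follows. This is a minimal, search-only illustration: it assumes a hand-built tree of hypothetical `BTreeNode` objects, each holding several sorted keys and one more child pointer than keys, and it omits insertion and node splitting. In a disk-based B-tree each node would occupy one block, so the number of nodes visited approximates the number of block reads.

```python
import bisect


class BTreeNode:
    def __init__(self, keys, children=None, addresses=None):
        self.keys = keys                                   # sorted keys in this node
        self.children = children or []                     # len(keys) + 1 children; [] for a leaf
        self.addresses = addresses or [None] * len(keys)   # record pointer per key


def btree_search(node, key):
    """Return (record address, nodes visited); address is None if absent."""
    visited = 0
    while node is not None:
        visited += 1
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.addresses[i], visited
        if not node.children:  # reached a leaf without finding the key
            return None, visited
        node = node.children[i]  # keys in children[i] lie between keys[i-1] and keys[i]
    return None, visited


# A tree of height 2: every leaf is at the same depth (balanced).
leaves = [
    BTreeNode([5, 8], addresses=["b1", "b2"]),
    BTreeNode([15, 18, 19], addresses=["b4", "b5", "b6"]),
    BTreeNode([30, 41], addresses=["b8", "b9"]),
]
root = BTreeNode([10, 25], children=leaves, addresses=["b3", "b7"])

print(btree_search(root, 18))  # ('b5', 2)
print(btree_search(root, 20))  # (None, 2)
```

Because each node holds several keys, the tree stays shallow, which is why B-trees need far fewer block transfers than a BST stored level by level.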
FILES AND THEIR ORGANIZATIONS
A file is a sequence of records. File organization refers to the physical layout or structure of record
occurrences in a file; it determines the way records are stored and accessed.
Fixed-length records and variable-length records
If every record in the file has exactly the same size (in bytes), the file is said to be made up of
fixed-length records.
If different records in the file have different sizes, the file is said to be made up of variable-length
records. A file may have variable-length records for several reasons.
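The difference between the two record formats can be seen by packing the same hypothetical employee record both ways with Python's `struct` module. The field layout (4-byte id, 10-byte name, 8-byte salary) and the 2-byte length prefix are assumptions chosen for illustration.

```python
import struct

# Fixed-length layout (hypothetical): 4-byte int id, 10-byte name, 8-byte float salary.
FIXED = struct.Struct("<i10sd")


def pack_fixed(emp_id, name, salary):
    # struct pads short byte strings with NUL bytes and truncates long ones.
    return FIXED.pack(emp_id, name.encode(), salary)


def pack_variable(emp_id, name, salary):
    # Variable-length: a 2-byte length prefix precedes the name bytes.
    data = name.encode()
    return struct.pack("<iH", emp_id, len(data)) + data + struct.pack("<d", salary)


fixed_sizes = [len(pack_fixed(1, n, 50.0)) for n in ("Ann", "Bartholomew")]
variable_sizes = [len(pack_variable(1, n, 50.0)) for n in ("Ann", "Bartholomew")]
print(fixed_sizes)     # [22, 22]
print(variable_sizes)  # [17, 25]
```

With fixed-length records, record i starts at byte offset i × 22, at the cost of padding (and here, truncating names longer than 10 bytes). Variable-length records save space, but record boundaries must be found by reading the length prefixes or through an index.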
Disk Blocks
Databases are stored persistently on magnetic disks for the following reasons:
•Databases are usually too large to fit completely in main memory.
•Data must be stored permanently on non-volatile storage, with access provided to users
through front-end applications.
•Primary storage is expensive; disk storage substantially reduces the cost of storage
per unit of data.
Each hard drive is usually composed of a set of disk platters.
Each disk platter has a layer of magnetic material deposited on its surface.
The entire disk can contain a large amount of data, which is organised into smaller units
called BLOCKS (or pages). A typical block size is 1 KB (1024 bytes), although many systems
use larger blocks such as 4 KB.
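How records map onto blocks is commonly quantified by the blocking factor, bfr = floor(B / R), for block size B and fixed record size R, assuming records do not span block boundaries. The file size and record size below are hypothetical; the 1 KB block follows the text.

```python
import math


def blocks_needed(num_records, record_size, block_size=1024):
    """Blocking factor, block count and waste for unspanned fixed-length records."""
    bfr = block_size // record_size           # whole records per block
    blocks = math.ceil(num_records / bfr)     # blocks needed for the file
    wasted = block_size - bfr * record_size   # unused bytes in each block
    return bfr, blocks, wasted


def block_of(record_no, bfr):
    """Block number and slot of the i-th record (0-based) in a sequential file."""
    return record_no // bfr, record_no % bfr


print(blocks_needed(30000, 100))  # (10, 3000, 24)
print(block_of(1234, 10))         # (123, 4)
```

Reading the whole file therefore costs about 3000 block transfers, which is the figure the designs in this section try to reduce.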
A block is the smallest unit of data transfer between the hard disk and the processor of the
computer.
Each block therefore has a fixed, assigned address.
To access data, the processor submits a read/write request that includes the address of the
block and the address of an area of RAM, called a buffer (or cache), where the data is to be
stored or taken from.
The processor then reads and modifies the buffer data as required and, if necessary, writes the
block back to the disk.
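The read-modify-write cycle can be sketched with ordinary file I/O, treating a regular file as the "disk" and a `bytearray` as the buffer. The 1 KB block size follows the text; the file name `disk.img` and the helper functions are hypothetical.

```python
import os
import tempfile

BLOCK_SIZE = 1024  # block size assumed in the text


def read_block(f, block_no):
    """Copy one whole block from the 'disk' into a new buffer."""
    f.seek(block_no * BLOCK_SIZE)
    return bytearray(f.read(BLOCK_SIZE))


def write_block(f, block_no, buffer):
    """Copy the buffer contents back to the same disk block."""
    if len(buffer) != BLOCK_SIZE:
        raise ValueError("buffer must hold exactly one block")
    f.seek(block_no * BLOCK_SIZE)
    f.write(buffer)


path = os.path.join(tempfile.mkdtemp(), "disk.img")
with open(path, "wb") as f:
    f.write(bytes(BLOCK_SIZE * 4))  # a 'disk' of four zeroed blocks

with open(path, "r+b") as f:
    buf = read_block(f, 2)   # read: disk block 2 -> buffer
    buf[0:5] = b"hello"      # modify the data in memory
    write_block(f, 2, buf)   # write: buffer -> disk block 2

with open(path, "rb") as f:
    print(bytes(read_block(f, 2)[:5]))  # b'hello'
```

Even though only five bytes change, a whole block is read and written, because the block is the unit of transfer.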
•The division of a track (on the storage medium) into equal-sized disk blocks is set by the operating
system during disk formatting.
•The records of a file must be allocated to disk blocks, because a block is the unit of data transfer
between disk and memory.
•The hardware address of a block comprises a surface number, a track number and a block number.
•Buffer
•A contiguous reserved area in main storage that holds one block; it also has an address.
•For a read command, the block is copied from disk into the buffer, whereas for a write command
the contents of the buffer are copied into the disk block.
File Organisation and Access Methods
A file organization refers to the organization of the data of a file into records, blocks and access
structures; this includes the way the records and blocks are placed on the storage medium and
interlinked.
An access method, on the other hand, provides a group of operations, such as find, read,
modify and delete, that can be applied to a file.
The main access methods are:
Sequential Access Method (SAM)
Indexed Sequential Access Method (ISAM)
Direct Access Method (DAM)
Sequential Access Method (SAM)
Records of the file are stored in sequence by primary key value.
They are accessible only in the order stored, i.e., in primary key order.
This kind of file organisation works well for tasks that need to access nearly every record in a
file, e.g., payroll.
In a sequentially organised file, records are written consecutively when the file is created and
must be accessed consecutively when the file is later used for input.
If only sequential access is required, sequential media (magnetic tapes) are suitable and
probably the most cost-effective way of processing such files.
Sequential access is fast and efficient when dealing with large volumes of data that need to be
processed periodically.
However, all new transactions must be sorted into the proper sequence for sequential
access processing.
Also, most of the database or file may have to be searched to locate, store or modify even a
small number of data records. Thus, this method is too slow for applications requiring
immediate updating or responses.
Sequential files are generally used for backup or for transporting data to a different system. A
sequential ASCII file is a popular export/import format that most database systems support.
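The sorting requirement comes from the classic batch update, in which a sorted master file and a sorted transaction file are read once, in step, to produce a new master file. A minimal sketch, assuming in-memory lists of (key, value) pairs, at most one transaction per key, and hypothetical transaction codes "A" (add), "U" (update) and "D" (delete):

```python
def batch_update(master, transactions):
    """Merge a sorted master file with sorted transactions in one pass.

    master:       list of (key, value), sorted by key, unique keys
    transactions: list of (key, code, value), sorted by key
    """
    new_master, errors = [], []
    i = 0
    for key, code, value in transactions:
        # Copy unchanged master records that come before this transaction.
        while i < len(master) and master[i][0] < key:
            new_master.append(master[i])
            i += 1
        exists = i < len(master) and master[i][0] == key
        if code == "A" and not exists:
            new_master.append((key, value))
        elif code == "U" and exists:
            new_master.append((key, value))
            i += 1
        elif code == "D" and exists:
            i += 1  # skip the record, i.e. do not copy it
        else:
            errors.append((key, code))  # e.g. update of a missing key
    new_master.extend(master[i:])  # copy the rest of the master file
    return new_master, errors


master = [(1, "Ann"), (3, "Raj"), (5, "Lee")]
transactions = [(2, "A", "Bo"), (3, "U", "Raj K"), (5, "D", None), (7, "U", "Zed")]
new_master, errors = batch_update(master, transactions)
print(new_master)  # [(1, 'Ann'), (2, 'Bo'), (3, 'Raj K')]
print(errors)      # [(7, 'U')]
```

If the transactions were not sorted, the single forward pass would no longer line them up with the matching master records.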
Advantages of Sequential File Organisation
•It is fast and efficient when dealing with large volumes of data that need to be
processed periodically (batch systems).
Disadvantages of Sequential File Organisation
•It requires all new transactions to be sorted into the proper sequence for
sequential access processing.
•Storing, modifying, deleting or adding records generally requires rearranging the file,
and locating a record may require reading most of it.
•It is too slow to handle applications requiring immediate updating or
responses.
Indexed Sequential Access Method (ISAM)
It organises the file like a large dictionary:
records are stored in key order, but an index is kept that also permits a type of
direct access.
The records are stored sequentially by primary key value, and an index is built over
the primary key field.
This approach gives (almost) direct access to record occurrences via the index table, and
sequential access via the way in which the records are laid out on the storage medium.
The physical address of a record given by the index file is also called a pointer.
To improve the query response time of a sequential file, an indexing technique can be added.
Indexing associates a set of objects (records) with a set of orderable quantities, such as key
values, that are usually fewer in number or smaller than the objects themselves.
A sequential file (i.e., one sorted on its primary key) that is indexed on its primary key is called
an index sequential file.
The index allows random access to records, while the sequential storage of the records
provides easy access to records in sequence.
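An index sequential file combines both access paths. A minimal sketch, assuming a hypothetical in-memory file of key-ordered blocks and a sparse index that stores the highest key in each block (one common convention); the helper names are also hypothetical.

```python
import bisect

# Hypothetical index sequential file: blocks are sorted and stored in key order.
blocks = [
    [(2, "Ann"), (5, "Bo"), (9, "Cy")],
    [(12, "Di"), (15, "Ed")],
    [(21, "Flo"), (27, "Gus"), (30, "Hal")],
]
# Sparse index: highest key stored in each block, plus the block address.
index = [(block[-1][0], addr) for addr, block in enumerate(blocks)]
high_keys = [k for k, _ in index]


def direct_lookup(key):
    """Use the index to pick one block, then search only inside that block."""
    i = bisect.bisect_left(high_keys, key)  # first block whose high key >= key
    if i == len(high_keys):
        return None
    for k, value in blocks[index[i][1]]:
        if k == key:
            return value
    return None


def sequential_scan(start_key):
    """Sequential access: yield records in key order from start_key onward."""
    i = bisect.bisect_left(high_keys, start_key)
    for block in blocks[i:]:
        for k, value in block:
            if k >= start_key:
                yield k, value


print(direct_lookup(15))              # Ed
print(list(sequential_scan(13))[:3])  # [(15, 'Ed'), (21, 'Flo'), (27, 'Gus')]
```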
Direct Access Method (DAM)
In the direct access method, the record occurrences in a file do not have to be arranged in any
particular sequence on the storage media.
However, the computer must keep track of the storage location of each record, using one of a
variety of direct organization methods, so that data can be retrieved when needed.
New transaction data do not have to be sorted, and processing that requires immediate
responses or updating is easily handled.
An algorithm is used to compute the address of a record: the primary key value is the input to
the algorithm, and the block address of the record is the output.
Hashing algorithm
To implement this approach, a portion of the storage space is reserved for the file.
This space should be large enough to hold the file plus some allowance for growth. Then an
algorithm that generates the appropriate address for a given primary key is devised. This
algorithm is commonly called a hashing algorithm, and the process of converting primary key
values into addresses is called key-to-address transformation.
•Reserved storage space
•Overflow area
•Relative pointers or relative addresses
•Hashing algorithm
•Hashed key
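Key-to-address transformation can be sketched with the division-remainder method, one common hashing algorithm (the text does not prescribe a particular one). Assumptions: numeric primary keys, a reserved primary area of a fixed number of buckets that each hold a few records, relative addresses that are simply bucket numbers, and a separate overflow area used when the home bucket is full. The `HashedFile` class is hypothetical.

```python
class HashedFile:
    """Hypothetical direct file: hashed primary area plus an overflow area."""

    def __init__(self, num_buckets=7, bucket_capacity=2):
        self.num_buckets = num_buckets   # size of the reserved storage space
        self.capacity = bucket_capacity  # records per bucket (block)
        self.buckets = [[] for _ in range(num_buckets)]
        self.overflow = []               # overflow area for full buckets

    def address(self, key):
        """Key-to-address transformation: division-remainder hashing."""
        return key % self.num_buckets    # relative address (bucket number)

    def insert(self, key, record):
        bucket = self.buckets[self.address(key)]
        if len(bucket) < self.capacity:
            bucket.append((key, record))
        else:
            self.overflow.append((key, record))  # home bucket is full

    def find(self, key):
        for k, rec in self.buckets[self.address(key)]:
            if k == key:
                return rec
        # Not in its home bucket: search the overflow area sequentially.
        for k, rec in self.overflow:
            if k == key:
                return rec
        return None


hfile = HashedFile()
for key in (1003, 1010, 1017, 1005):  # 1003, 1010 and 1017 all hash to bucket 2
    hfile.insert(key, f"record {key}")

print(hfile.address(1003))  # 2
print(len(hfile.overflow))  # 1
print(hfile.find(1017))     # record 1017
```

A lookup normally costs one bucket (block) read; only keys pushed into the overflow area need extra reads, which is why the reserved space should leave room for growth.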

More Related Content

PPTX
DBMS-Unit5-PPT.pptx important for revision
PPTX
Data storage and indexing
PPT
Ardbms
PPTX
storage techniques_overview-1.pptx
PPT
Unit 4 data storage and querying
PPT
9910559 jjjgjgjfs lke lwmerfml lew we.ppt
PPT
Data Indexing Presentation-My.pptppt.ppt
PPTX
normalization process in relational data base management
DBMS-Unit5-PPT.pptx important for revision
Data storage and indexing
Ardbms
storage techniques_overview-1.pptx
Unit 4 data storage and querying
9910559 jjjgjgjfs lke lwmerfml lew we.ppt
Data Indexing Presentation-My.pptppt.ppt
normalization process in relational data base management

Similar to 3620121datastructures.ppt (20)

PPTX
DBMS_UNIT 5 Notes.pptx
PPT
File organization 1
PPTX
files,indexing,hashing,linear and non linear hashing
PPTX
DBMS (UNIT 5)
PDF
Datastructures Notes
PPT
Main MeMory Data Base
PDF
Introduction of Data Structures and Algorithms by GOWRU BHARATH KUMAR
PDF
Introduction of data structures and algorithms
PPT
Storage struct
PPTX
File Structure.pptx
PPT
Database Management Systems full lecture
PPT
StorageIndexing_CS541.ppt indexes for dtata bae
PPT
StorageIndexing_Main memory (RAM) for currently used data. Disk for the main ...
PPT
INDEXING METHODS USED IN DATABASE STORAGE
PPT
File Management
PPTX
Index Structures.pptx
PPTX
Data structure chapter 1.pptx
PDF
Computer Science 12th Topic- introduction to syllabus.pdf
PDF
DBMS 8 | Memory Hierarchy and Indexing
PDF
File Organization
DBMS_UNIT 5 Notes.pptx
File organization 1
files,indexing,hashing,linear and non linear hashing
DBMS (UNIT 5)
Datastructures Notes
Main MeMory Data Base
Introduction of Data Structures and Algorithms by GOWRU BHARATH KUMAR
Introduction of data structures and algorithms
Storage struct
File Structure.pptx
Database Management Systems full lecture
StorageIndexing_CS541.ppt indexes for dtata bae
StorageIndexing_Main memory (RAM) for currently used data. Disk for the main ...
INDEXING METHODS USED IN DATABASE STORAGE
File Management
Index Structures.pptx
Data structure chapter 1.pptx
Computer Science 12th Topic- introduction to syllabus.pdf
DBMS 8 | Memory Hierarchy and Indexing
File Organization
Ad

Recently uploaded (20)

PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PDF
Complications of Minimal Access Surgery at WLH
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PDF
Sports Quiz easy sports quiz sports quiz
PPTX
Cell Types and Its function , kingdom of life
PPTX
master seminar digital applications in india
PPTX
PPH.pptx obstetrics and gynecology in nursing
PDF
Pre independence Education in Inndia.pdf
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PPTX
Lesson notes of climatology university.
PDF
TR - Agricultural Crops Production NC III.pdf
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PDF
Basic Mud Logging Guide for educational purpose
PPTX
Cell Structure & Organelles in detailed.
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PDF
Anesthesia in Laparoscopic Surgery in India
PDF
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
Module 4: Burden of Disease Tutorial Slides S2 2025
Complications of Minimal Access Surgery at WLH
Microbial diseases, their pathogenesis and prophylaxis
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
Sports Quiz easy sports quiz sports quiz
Cell Types and Its function , kingdom of life
master seminar digital applications in india
PPH.pptx obstetrics and gynecology in nursing
Pre independence Education in Inndia.pdf
2.FourierTransform-ShortQuestionswithAnswers.pdf
Lesson notes of climatology university.
TR - Agricultural Crops Production NC III.pdf
Final Presentation General Medicine 03-08-2024.pptx
Basic Mud Logging Guide for educational purpose
Cell Structure & Organelles in detailed.
O5-L3 Freight Transport Ops (International) V1.pdf
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
Pharmacology of Heart Failure /Pharmacotherapy of CHF
Anesthesia in Laparoscopic Surgery in India
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
Ad

3620121datastructures.ppt

  • 2. Data structures and file organisation refer to the methods of organising the data in a database They primarily deal with physical storage of data, which assumes significance in retrieving, storing and re-organising data in a database Linked lists, inverted lists, B-trees and hash tables, among others Data structures can be used to build data files (a data file or a file is a collection of many similar records) and file organisation determines access methods for the file File organisation (or file structure) is a combination of representations for data in files and of operations for accessing the data. A file structure allows applications to read, write, and modify data
  • 3. Why Data Structure The key factor in designing data structures and file organisation is the relatively slow speed of hard disks and large amount of time that is required to get information from a disk. All the data structures and file organisation designs focus on minimising disk accesses and maximising the likelihood that the information the user will want is already in the memory. The constraint related to disk access is generally referred to as I/O bottleneck. Accessing information using multiple trips to the disk greatly slows down the access time. Ideally, we should get the information we need with one access to the disk or with as few accesses as possible.
  • 4. FILES files grew intolerably large for unaided sequential access, indexes were added to the files. The INDEXES made it possible to keep a list of keys and pointers in a small file that could be searched more quickly. With the keys and pointers the user had direct access to the large, primary file. However, as the indexes grew, they too became difficult to manage, especially for dynamic files in which the set of keys changes. Then, in the early 1960s the idea Data Structures Emerged
  • 5. Memory Primary storage: Pertains to storage media used by Central Processing Unit (CPU) i.e., the main memory and also the cache memory. The primary storage memory also called RAM (Random Access Memory) provides fast access to data and is volatile i.e., loses its content in case of a power outage. Secondary storage: Includes magnetic disks, optical disks and tapes. Secondary storage memory provides slower access to data than RAM.
  • 6. Static RAM (SRAM) which is cache memory is used by CPU to speed up execution of programmes while Dynamic RAM (DRAM) provides the main work area for CPU. Flash memory which is non-volatile and called EEPROM (Electrically Erasable Programmable Read-Only-Memory) has access speed and performance between DRAM and magnetic disks. CD-ROM (Compact Disk Read-Only-Memory) disks store data optically and are read by a laser. WORM (Write-Once-Read-Memory) disks are used for archiving data and allow data to be written once and read any number of times. DVD (Digital Video Disk) – a type of optical disk allows storage of four to fifteen gigabytes of data per disk. Magnetic tapes are used for archiving and back-up storage and are becoming popular as tertiary storage to hold terabytes of data. Juke boxes (optical and tape) are employed to use arrays of CD-ROMs and tapes.
  • 7. RAID Technology(RedundantArrayofInexpensive/IndependentDisks) The main goal of RAID is to even out the widely different rates of performance improvement of disks against those in memory and microprocessors In addition to improving performance, RAID is also used to improve reliability by storing redundant information on disks. One technique for introducing redundancy is called mirroring. Data is written redundantly to two identical physical disks that are treated as one logical disk. If a disk fails, the other is used until the first is repaired The problem of speed and access time is overcome by using a large array of small independent disks acting as a single high-performance logical disk. A concept called data striping is used, which utilises parallelism to improve disk performance
  • 8. DATA STRUCTURES The term Data Structure refers to the manner in which relationships between data elements are represented in the computer system. Organisation of indexes, representation of stored fields, physical sequence of stored records, etc., are included in the purview of data structures. Thus, an understanding of data structures is important in gaining an understanding of database management systems There are three major types of data structures : linked lists (indexes), inverted lists (indexes) and B-trees
  • 9. Indexes An index is a file in which each entry (record) consists of a data value together with one or more pointers (physical storage addresses). The data value is a value for some field of the indexed file (the indexed field) and pointers identify records in the indexed file having that value for that field. An index can be used in two ways. First, it can be used for sequential access to the indexed file i.e. access according to the values of the indexed field by imposing an ordering of the indexed file. Second, it can also be used for direct access to individual
  • 11. Primary index , Clustering Indexes, Secondary Indexes A primary index is a file that contains a sorted sequence of records having two columns: the ordering key field; and a block address An entry in primary index file contains the index value of the first record of the data block and a pointer to that data block. . ordering key field for this index can be the primary key of the data file
  • 12. Example of Primary Index file
  • 13. Clustering Indexes An index that is created on an ordered file whose records of a file are physically ordered on a non-key field (that is the field does not have a distinct value for each record) is called a clustering index.
  • 14. secondary index A secondary index is a file that contains records containing a secondary index field value which is not the ordering field of the data file, and a pointer to the block that contains the data record.
  • 16. A simple linked list is a chain of pointers embedded in records. It indicates either a record sequence for an attribute other than the primary key or all the records with a common property. With a linked list, any data element can be stored separately. A pointer is then used to link to the next data item.
  • 17. Inverted Lists Inverted lists may be viewed simply as index tables of pointers stored separately from the data records rather than embedded in pointer fields in the stored records themselves DENSE ONE- TO –ONE (1:1) NON-DENSE 1:m
  • 18. A NONDENSE LIST ONLY A FEW OF THE RECORDS IN THE FILE ARE PART OF THE LIST{1:M} A DENSE LIST IS ONE WITH A POINTER FOR MOST OR ALL OF THE RECORDS IN THE FILE.{1:1} The above lists are dense since there is one-to-one relationship between both company name and primary key and company symbol and primary key
  • 19. Binary tree •Binary tree is a tree which consists of a root node and two disjoint binary trees called the left subtree and right subtree. •every node in a binary tree has 0, 1 or no children
  • 21.  To search a typical key value,  start from the root and move towards left or right depending on the value of key that is being searched.  An index is a pair, thus while using BST, use the value as the key and address field must also be specified in order to locate the records in the file that is stored on the secondary storage devices A BST as a data structure is very much suitable for an index, if an index is to be contained completely in the primary memory.  indexes are quite large in nature and require a combination of primary and secondary storage. BST stored level by level on a secondary storage which would require the additional problem of finding the correct sub-tree and also it may require a number of transfers, with the worst condition as one block transfer for each level of a tree being searched. This situation can be drastically remedied if we use B -Tree as data structure.
  • 23. B-Trees •B-trees are a form of data structure based on hierarchies •“B” stands for Bayer, the originator /“balanced”. •B-tree structure was discovered by R.Bayer and E.McCreight (1970) of Bell Scientific Research Labs •B-Trees are balanced in the sense that all the terminal (bottom) nodes have the same path length to the root (top). Algorithms have been developed for efficiently searching and maintaining B-Tree indexes Algorithms is a finite sequence of well-defined, computer-implementable instructions, typically to solve a class of problems or to perform a computation.
  • 24. BTrees provide both sequential and indexed access and are quite flexible.. The height of a B-Tree is the number of levels in the hierarchy The height of a B-Tree is the number of levels in the hierarchy. Each node on the tree contains an index element which has a key value, a pointer to the rest of the data and  two link pointers; One link (to the left) points to the elements (nodes) that have lower values while the other link (to the right) points to elements that have a value greater than or equal to the value in the node. The root is the highest node on the tree. The bottom nodes are called leaves because they are at the end of the tree branches.
  • 25. FILES AND THEIR ORGANIZATIONS A file is a sequence of records. File organization refers to physical layout or a structure of record occurrences in a file. File organization determines the way records are stored and accessed. fixed-length records & variable-length records If every record in the file has exactly the same size (in bytes), the file is said to be made of fixed-length records. If different records in the file have different sizes, the file is said to be made up of variable-length records. A file may have variable-length records for several reasons :
  • 26. Disk Blocks the databases are stored persistently on magnetic disks for the reasons given below:  The databases being very large may not fit completely in the main memory.  Storing the data permanently using the non-volatile storage and provide access to the users with the help of front end applications. Primary storage is considered to be very expensive and in order to cut short the cost of the storage per unit of data to substantially less. Each hard drive is usually composed of a set of disk platters. Each disk platter has a layer of magnetic material deposited on its surface. The entire disk can contain a large amount of data, which is organised into smaller packages called BLOCKS (or pages). On most computers, one block is equivalent to 1 KB of data (= 1024 Bytes).
  • 27. Disk Blocks A block is the smallest unit of data transfer between the hard disk and the processor of the computer. Each block therefore has a fixed, assigned, address. the computer processor will submit a read/write request, which includes the address of the block, and the address of RAM in the computer memory area called a buffer (or cache) where the data must be stored / taken from. T he processor then reads and modifies the buffer data as required, and, if required, writes the block back to the disk.
  • 28. •Disk Blocks •The division of a track (on storage medium) into equal sized disk blocks is set by the operating system during disk formatting •The records of a file must be allocated to disk blocks because a block is a unit of data transfer between disk and memory •The hardware address of a block comprises a surface number, track number and block number •Buffer •a contiguous reserved area in main storage that holds one block-has also an address •For a read command, the block from disk is copied into the buffer, whereas for a write command the contents of the buffer are copied into the disk block.
  • 29. File Organisation and access method A file organization refers to the organization of the data of a file into records, blocks and access structures; this includes the way the records and blocks are placed on the storage medium and interlinked.  An access method on the other hand, provides a group of operations – such as find, read, modify, delete etc., — that can be applied to a file.
  • 30. File Organisation and access method • Sequential Access Method (SAM) • Indexed Sequential Access Method (ISAM) • Direct Access Method (DAM)
  • 31. Sequential Access Method (SAM) Records of the file are stored in sequence by the primary key field values. They are accessible only in the order stored, i.e., in the primary key order This kind of file Organisation works well for tasks which need to access nearly every record in a file, e.g., payroll. a sequentially organised file records are written consecutively when the file is created and must be accessed consecutively when the file is later used for input
  • 33. Sequential Access Method (SAM) If only sequential access is required sequential media (magnetic tapes) are suitable and probably the most cost-effective way of processing such files Sequential access is fast and efficient while dealing with large volumes of data that need to be processed periodically. However, it is require that all new transactions be sorted into a proper sequence for sequential access processing.  Also, most of the database or file may have to be searched to locate, store, or modify even a small number of data records. Thus, this method is too slow to handle applications requiring immediate updating or responses Sequential files are generally used for backup or transporting data to a different system. A sequential ASCII file is a popular export/import format that most database systems support.
  • 34. Advantages of Sequential File Organisation • It is fast and efficient when dealing with large volumes of data that need to be processed periodically (batch systems). Disadvantages of Sequential File Organisation • All new transactions must be sorted into the proper sequence for sequential access processing. • Locating, storing, modifying, deleting, or adding records requires rearranging the file. • The method is too slow for applications requiring immediate updates or responses.
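The batch-update pattern behind these trade-offs can be sketched in Python: a sorted batch of transactions is merged with the old master file to write a complete new master file, even when only a few records change. The transaction codes ("add", "change", "delete") and the assumption of at most one transaction per key are simplifications for this sketch.

```python
def batch_update(master, transactions):
    """Merge a sorted old master file with a batch of transactions into a new master file.

    master:       list of (key, value) sorted by key.
    transactions: list of (key, op, value), op in {"add", "change", "delete"}
                  (hypothetical codes; at most one transaction per key is assumed).
    """
    txns = sorted(transactions, key=lambda t: t[0])  # new transactions must be sorted first
    new_master = []
    i = j = 0
    while i < len(master) or j < len(txns):
        if j == len(txns) or (i < len(master) and master[i][0] < txns[j][0]):
            new_master.append(master[i])  # unaffected record is copied across
            i += 1
            continue
        key, op, value = txns[j]
        j += 1
        if i < len(master) and master[i][0] == key:
            old = master[i]
            i += 1
            if op == "change":
                new_master.append((key, value))
            elif op != "delete":
                new_master.append(old)  # e.g. "add" of an existing key: keep the old record
        elif op == "add":
            new_master.append((key, value))
        # "change"/"delete" for a key not in the master are ignored in this sketch
    return new_master

master = [(101, 52000), (102, 48000), (105, 61000)]
txns = [(105, "delete", None), (103, "add", 45000), (101, "change", 53000)]
print(batch_update(master, txns))  # [(101, 53000), (102, 48000), (103, 45000)]
```

Every record is read and rewritten, which is why the method suits periodic batches but not immediate updates.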
  • 35. Indexed Sequential Access Method (ISAM) ISAM organises the file like a large dictionary: records are stored in order of the key, but an index is also kept that permits a form of direct access. The records are stored sequentially by primary key value, and an index is built over the primary key field. This approach gives (almost) direct access to records via the index table and sequential access via the way the records are laid out on the storage medium. The physical address of a record given by the index file is also called a pointer.
  • 36. To improve the query response time of a sequential file, an index can be added. Indexing associates a set of objects (here, records) with a set of orderable quantities (here, key values) that are usually fewer in number than the objects or derived from their properties. A sequential file (sorted on its primary key) that is indexed on its primary key is called an indexed sequential file. The index allows random access to records, while the sequential storage of the records provides easy access to them in key order.
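A minimal ISAM-style lookup in Python, assuming the data file is already divided into sorted blocks and the index holds the lowest key of each block (one entry per block rather than per record, a choice made here for brevity). bisect_right from the standard library locates the right index entry, so a lookup reads only one data block, while scanning the blocks in order still gives sequential access. The data and function names are hypothetical.

```python
from bisect import bisect_right

# Sorted data file split into blocks of up to 3 records (the block size is arbitrary here).
blocks = [
    [(101, "Asha"), (104, "Ben"), (107, "Chen")],
    [(110, "Dara"), (115, "Eli"), (118, "Fay")],
    [(121, "Gus"), (130, "Hana")],
]

# Index: lowest key of each block; position i in the index points to blocks[i].
index_keys = [block[0][0] for block in blocks]  # [101, 110, 121]

def isam_lookup(key):
    """Direct access: search the small index, then read only one data block."""
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every key in the file
    for k, name in blocks[pos]:  # a single block read
        if k == key:
            return name
    return None

def scan_in_key_order():
    """Sequential access: read the blocks in their physical (key) order."""
    for block in blocks:
        yield from block

print(isam_lookup(115))  # Eli
print(isam_lookup(111))  # None
```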
  • 37. Direct Access Method (DAM) With direct access, the record occurrences in a file do not have to be arranged in any particular sequence on the storage media. However, the computer must keep track of the storage location of each record, using one of a variety of direct organisation methods, so that data can be retrieved when needed. New transaction data do not have to be sorted, and processing that requires immediate responses or updating is easily handled. In the direct access method, an algorithm is used to compute the address of a record: the primary key value is the input to the algorithm, and the block address of the record is the output.
  • 38. Hashing algorithm To implement this approach, a portion of the storage space is reserved for the file. This space should be large enough to hold the file plus some allowance for growth. Then an algorithm that generates the appropriate address for a given primary key is devised; it is commonly called a hashing algorithm. The process of converting primary key values into addresses is called key-to-address transformation.
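Many hashing algorithms exist; this Python sketch assumes the common division-remainder method, where the address is the remainder of the primary key divided by the number of buckets in the reserved space. NUM_BUCKETS, key_to_address, store and fetch are hypothetical names. Records are stored in arrival order with no sorting, and two different keys can map to the same address (a collision).

```python
NUM_BUCKETS = 11  # size of the reserved space, in buckets (an illustrative prime)

def key_to_address(primary_key: int) -> int:
    """Key-to-address transformation by division-remainder hashing."""
    return primary_key % NUM_BUCKETS

# The reserved space: one list per bucket (a bucket could correspond to one disk block).
buckets = [[] for _ in range(NUM_BUCKETS)]

def store(primary_key: int, record) -> None:
    """Place the record in the bucket its key hashes to."""
    buckets[key_to_address(primary_key)].append((primary_key, record))

def fetch(primary_key: int):
    """Compute the address, then search only that bucket: no index and no sorting needed."""
    for k, record in buckets[key_to_address(primary_key)]:
        if k == primary_key:
            return record
    return None

for key, name in [(2050, "Asha"), (1001, "Ben"), (1012, "Chen")]:  # unsorted arrivals
    store(key, name)

print(key_to_address(1001), key_to_address(1012))  # 0 0  (a collision)
print(fetch(1012))                                 # Chen
```

In this sketch a bucket can grow without limit; a bucket held in a single disk block has a fixed capacity.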
  • 39. • Reserved storage space • Overflow area • Relative pointers or relative addresses • Hashing algorithm • Hashed key
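The terms on this slide can be tied together in a Python sketch, assuming each home slot in the reserved (prime) area holds one record and colliding records go to a separate overflow area, linked by relative pointers (positions within the overflow area rather than absolute disk addresses). The Slot layout, PRIME_SIZE and the division-remainder hash are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Slot:
    key: int
    record: str
    next: Optional[int] = None  # relative pointer: index into the overflow area, or None

PRIME_SIZE = 7  # reserved storage space, in slots
prime_area: list[Optional[Slot]] = [None] * PRIME_SIZE
overflow_area: list[Slot] = []  # grows only when collisions occur

def hashed_key(key: int) -> int:
    """Hashing algorithm (assumed division-remainder) giving the home slot."""
    return key % PRIME_SIZE

def insert(key: int, record: str) -> None:
    home = hashed_key(key)
    if prime_area[home] is None:
        prime_area[home] = Slot(key, record)  # home slot is free
        return
    # Collision: append to the overflow area and link it at the end of the chain.
    overflow_area.append(Slot(key, record))
    new_rel = len(overflow_area) - 1
    slot = prime_area[home]
    while slot.next is not None:
        slot = overflow_area[slot.next]
    slot.next = new_rel

def lookup(key: int) -> Optional[str]:
    """Start at the home slot, then follow relative pointers through the overflow area."""
    slot = prime_area[hashed_key(key)]
    while slot is not None:
        if slot.key == key:
            return slot.record
        slot = overflow_area[slot.next] if slot.next is not None else None
    return None

for k, r in [(14, "A"), (21, "B"), (9, "C"), (28, "D")]:
    insert(k, r)
print(lookup(28))  # D
```

Keys 14, 21 and 28 all hash to slot 0, so two of them end up in the overflow area and a lookup for 28 follows two relative pointers.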