1
Introduction to the Design and
Specification of File Structures
2
Outline
• What are File Structures?
• Why Study File Structure Design?
• Overview of File Structure Design
3
Definition
• A File Structure is a combination of
representations for data in files and of
operations for accessing the data.
• A File Structure allows applications to read,
write and modify data. It might also support
finding the data that matches some search
criteria or reading through the data in some
particular order.
4
Why Study File Structure Design?
I. Data Storage
• Computer Data can be stored in three kinds
of locations:
– Primary Storage ==> Memory
[Computer Memory]
– Secondary Storage ==> Our Focus
[Online Disk/Tape/CD-ROM that can be
accessed by the computer]
– Tertiary Storage ==> Archival Data
[Offline Disk/Tape/CD-ROM not directly
available to the computer]
5
Why Study File Structure Design?
II. Memory versus Secondary Storage
• Secondary storage such as disks can pack thousands
of megabytes in a small physical location.
• Computer Memory (RAM) is limited.
• However, relative to Memory, access to secondary
storage is extremely slow [e.g., getting information
from slow RAM takes 120 × 10^-9 seconds (= 120
nanoseconds), while getting information from Disk takes
30 × 10^-3 seconds (= 30 milliseconds)].
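To put the two figures on the same scale, the following minimal sketch converts both to nanoseconds and computes their ratio. The constant and function names are illustrative and do not come from the slides.

```python
# Access times quoted on the slide, expressed in nanoseconds.
RAM_ACCESS_NS = 120            # 120 x 10^-9 s
DISK_ACCESS_NS = 30 * 10**6    # 30 x 10^-3 s = 30,000,000 ns


def slowdown_factor(disk_ns: int, ram_ns: int) -> int:
    """Return how many RAM accesses fit in the time of one disk access."""
    return disk_ns // ram_ns


if __name__ == "__main__":
    print(slowdown_factor(DISK_ACCESS_NS, RAM_ACCESS_NS))  # 250000
```

With these figures, one disk access costs as much as 250,000 memory accesses.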
6
Why Study File Structure Design?
III. How Can Secondary Storage Access Time be
Improved?
By improving the File Structure.
Since the details of the representation of the
data and the implementation of the
operations determine the efficiency of the
file structure for particular applications,
improving these details can help improve
secondary storage access time.
7
Overview of File Structure Design
I. General Goals
• Get the information we need with one
access to the disk.
• If that’s not possible, then get the
information with as few accesses as
possible.
• Group information so that we are likely to
get everything we need with only one trip to
the disk.
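The effect of grouping can be sketched by counting how many blocks must be read to fetch a set of related records. This is a toy model: the block size of 4 records and the slot numbers are assumptions chosen for illustration, and one block read stands in for one trip to the disk.

```python
BLOCK_SIZE = 4  # hypothetical number of records per disk block


def blocks_touched(record_positions, block_size=BLOCK_SIZE):
    """Count the distinct blocks that must be read to fetch the given records.

    Each position is a record's slot number in the file; the record in
    slot p lives in block p // block_size.
    """
    return len({pos // block_size for pos in record_positions})


# Four related records stored next to each other share a single block.
clustered = [8, 9, 10, 11]
# The same number of records scattered through the file, one per block.
scattered = [1, 6, 11, 12]

if __name__ == "__main__":
    print(blocks_touched(clustered))  # 1
    print(blocks_touched(scattered))  # 4
```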
8
Overview of File Structure Design
II. Fixed versus Dynamic Files
• It is relatively easy to come up with file
structure designs that meet the general goals
when the files never change.
• When files grow or shrink as
information is added and deleted, it is much
more difficult.
9
History of File Structures
I. Early Work
• Early Work assumed that files were on tape.
• Access was sequential and the cost of access
grew in direct proportion to the size of the
file.
10
History of File Structures
II. The emergence of Disks and Indexes
• As files grew very large, unaided sequential
access was not a good solution.
• Disks allowed for direct access.
• Indexes made it possible to keep a list of
keys and pointers in a small file that could
be searched very quickly.
• With the key and pointer, the user had
direct access to the large, primary file.
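The idea of keys and pointers can be sketched as follows. This is a simplified illustration, not the slides' own design: the data file is simulated in memory with `io.BytesIO`, the index is a Python dict rather than a small file, each "pointer" is a byte offset, and the `key|value` record layout and all function names are assumptions.

```python
import io


def build_file_and_index(records):
    """Write (key, value) records to an in-memory file, one per line.

    Returns the file object and an index mapping each key to the byte
    offset (the "pointer") at which its record starts.
    """
    data_file = io.BytesIO()
    index = {}
    for key, value in records:
        index[key] = data_file.tell()
        data_file.write(f"{key}|{value}\n".encode("utf-8"))
    return data_file, index


def sequential_lookup(data_file, key):
    """Scan from the start of the file; return (value, records_read)."""
    data_file.seek(0)
    records_read = 0
    for line in data_file:
        records_read += 1
        k, v = line.decode("utf-8").rstrip("\n").split("|", 1)
        if k == key:
            return v, records_read
    return None, records_read


def indexed_lookup(data_file, index, key):
    """Use the index to seek straight to the record and read only it."""
    offset = index.get(key)
    if offset is None:
        return None
    data_file.seek(offset)
    line = data_file.readline().decode("utf-8").rstrip("\n")
    return line.split("|", 1)[1]
```

In this sketch, finding the last of 100 records sequentially reads all 100 records, while the indexed lookup performs one seek and one read.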
11
History of File Structures
III. The emergence of Tree Structures
• Because indexes also have a sequential flavour,
they too became difficult to manage once
they grew too large.
• The idea of using tree structures to manage
the index emerged in the early 1960s.
• However, trees can grow very unevenly as
records are added and deleted, resulting in
long searches requiring many disk accesses
to find a record.
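Uneven growth can be illustrated with a plain binary search tree that does no rebalancing. The sketch below is an assumption-laden toy: keys are integers, the tree lives in memory, and each node on a search path stands in for one disk access.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into an unbalanced binary search tree; return the root."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def height(root):
    """Number of nodes on the longest path from the root to a leaf."""
    best = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return best


def build(keys):
    """Build a tree by inserting the keys in the given order."""
    root = None
    for key in keys:
        root = insert(root, key)
    return root


# Keys arriving in sorted order degenerate into a chain of 15 nodes.
sorted_order = list(range(1, 16))
# The same keys in a favourable order give a tree only 4 levels deep.
favourable_order = [8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]
```

Here the same 15 keys give a worst-case search path of 15 nodes in one insertion order and 4 nodes in the other.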
12
History of File Structures
IV. Balanced Trees
• In 1963, researchers came up with the idea
of AVL trees for data in memory.
• AVL trees, however, did not apply to files because they
work well when tree nodes are composed of single records,
whereas file structures need nodes that hold dozens or hundreds of them.
• In the 1970s came the idea of B-Trees, which require
O(log_k N) disk accesses, where N is the number of entries in
the file and k is the number of entries indexed in a single block
of the B-Tree structure. As a result, B-Trees can guarantee that one
can find one file entry among millions of others with only 3
or 4 trips to the disk.
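The "3 or 4 trips" figure follows from the logarithm. The sketch below counts the levels needed under a simplified model in which every block indexes exactly k entries; the value k = 100 and the function name are assumptions, and a real B-Tree's height also depends on how full its nodes are.

```python
def levels_needed(num_entries: int, entries_per_block: int) -> int:
    """Smallest number of levels L with entries_per_block ** L >= num_entries.

    Integer arithmetic is used instead of math.log to avoid rounding errors.
    """
    if num_entries < 1 or entries_per_block < 2:
        raise ValueError("need num_entries >= 1 and entries_per_block >= 2")
    levels = 0
    reachable = 1
    while reachable < num_entries:
        reachable *= entries_per_block
        levels += 1
    return levels


if __name__ == "__main__":
    print(levels_needed(1_000_000, 100))     # 3
    print(levels_needed(100_000_000, 100))   # 4
    print(levels_needed(1_000_000, 2))       # 20, a binary tree for comparison
```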
13
History of File Structures
V. Hash Tables
• Retrieving entries in 3 or 4 accesses is good,
but it does not reach the goal of accessing
data with a single request.
• From early on, Hashing was a good way to
reach this goal with files that do not change
size greatly over time.
• More recently, Extendible Dynamic Hashing
has made it possible to guarantee one or at most
two disk accesses no matter how big a file becomes.
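Single-access retrieval by hashing can be sketched as follows. This is a toy static hash file, not extendible hashing: the number of buckets is fixed, each bucket stands in for one disk block of unlimited capacity (so overflow is ignored), CRC-32 is an arbitrary choice of deterministic hash function, and all names are hypothetical.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical fixed number of buckets (disk blocks)


def bucket_address(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a key to a bucket number using a deterministic hash (CRC-32)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


class HashedFile:
    """Toy static hash file: a list of dicts stands in for disk blocks."""

    def __init__(self, num_buckets: int = NUM_BUCKETS):
        self.buckets = [dict() for _ in range(num_buckets)]
        self.accesses = 0  # counts simulated bucket reads and writes

    def _bucket(self, key: str) -> dict:
        # The address is computed from the key alone, so exactly one
        # bucket is touched per operation.
        self.accesses += 1
        return self.buckets[bucket_address(key, len(self.buckets))]

    def put(self, key: str, value) -> None:
        self._bucket(key)[key] = value

    def get(self, key: str):
        return self._bucket(key).get(key)
```

Because real buckets have a fixed capacity, a file that grows far beyond the planned size overflows its buckets and needs extra accesses, which is the limitation that extendible hashing addresses.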
