Unit 6 OSY.pptx aaaaaaaaaaaaaaaaaaaaaaaa

UNIT 6 - FILE MANAGEMENT
DR USHA RAGHAVAN

INTRODUCTION
 A file is a named collection of related information that is recorded on secondary storage such as
magnetic disks, magnetic tapes and optical disks.
 A file is a sequence of bits, bytes, lines or records whose meaning is defined by the file’s creator and
user.
 Data files may be numeric, alphabetic, alphanumeric or binary.

FILE STRUCTURE
 A file must be laid out in a format that the operating system can understand.
• A file has a certain defined structure according to its type.
• A text file is a sequence of characters organized into lines.
• A source file is a sequence of procedures and functions.
• An object file is a sequence of bytes organized into blocks that are understandable by the machine.
• When an operating system defines different file structures, it must also contain the code to support those
structures. UNIX and MS-DOS support only a minimal number of file structures.

FILE TYPE
 File type refers to the ability of the operating system to distinguish between different types of files,
such as text files, source files and binary files. Many operating systems support many
types of files. Operating systems such as MS-DOS and UNIX have the following types of files:
 Ordinary files
• These are the files that contain user information.
• They may hold text, databases or executable programs.
• The user can apply various operations to such files, such as adding, modifying or deleting content, or even
removing the entire file.
 Directory files
• These files contain lists of file names and other information related to those files.

FILE TYPE
 Special files
• These files are also known as device files.
• These files represent physical devices such as disks, terminals, printers,
networks, tape drives etc.
These files are of two types:
• Character special files: data is handled character by character, as in
the case of terminals or printers.
• Block special files: data is handled in blocks, as in the case of disks
and tapes.
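The distinction between ordinary, directory and special files can be observed on a UNIX-like system through the mode bits returned by `os.stat`. The following is a minimal sketch; the helper name `classify` and the temporary demo file are assumptions made for illustration only.

```python
import os
import stat
import tempfile


def classify(path):
    """Name the kind of file at `path`, based on its mode bits (hypothetical helper)."""
    mode = os.stat(path).st_mode
    if stat.S_ISDIR(mode):
        return "directory file"
    if stat.S_ISCHR(mode):
        return "character special file"
    if stat.S_ISBLK(mode):
        return "block special file"
    if stat.S_ISREG(mode):
        return "ordinary file"
    return "other"


# Demo on a throwaway directory and file so the sketch is self-contained.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "data.txt")
with open(demo_file, "w") as f:
    f.write("x")
kinds = (classify(demo_dir), classify(demo_file))
os.remove(demo_file)
os.rmdir(demo_dir)
```

On a typical Linux system, calling `classify` on a terminal device or a disk device node would be expected to report a character or block special file respectively, although the device paths differ between systems.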

FILE ATTRIBUTES
 A file has a name and data. It also carries metadata such as its
creation date and time, current size and last-modified date. All this information is
called the attributes of the file.
 File attributes used in an OS are:
• Name: It is the only information stored in human-readable form. It is usually
followed by an extension that indicates the type of file. E.g. in OS.doc, OS is the file
name, .doc is the extension, and ‘.’ is the separator.
• Identifier: Every file is identified within a file system by a unique tag, usually a number,
known as its identifier. It is not human-readable.

FILE ATTRIBUTES
• Location: A pointer to the device and to the location of the file on that
device.
• Type: This attribute is required for systems that support various types of files.
The type is often indicated by the file extension.
• Size: The current size of the file, i.e. the number of bytes
occupied by the contents of the file on the storage device, e.g. 10 MB.
• Protection: This attribute assigns and controls the access rights for reading,
writing and executing the file.
• Time, date and security: Records the date and time of creation of the file, of its last
modification and of its last use. This information is useful for protection, security and
usage monitoring.
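Most of the attributes listed above can be inspected from a program. The sketch below uses Python's standard `os.stat` call; the helper name `describe` and the temporary demo file are assumptions for illustration, and the identifier shown is the inode number as reported on UNIX-like systems.

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Return a dict of common file attributes for `path` (hypothetical helper)."""
    info = os.stat(path)
    name, extension = os.path.splitext(os.path.basename(path))
    return {
        "name": name,
        "extension": extension,                     # type hint carried in the name
        "identifier": info.st_ino,                  # inode number on UNIX-like systems
        "size_bytes": info.st_size,
        "protection": stat.filemode(info.st_mode),  # e.g. '-rw-r--r--'
        "modified": time.ctime(info.st_mtime),
        "accessed": time.ctime(info.st_atime),
    }


# Demo on a throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as f:
    f.write(b"hello")
    demo_path = f.name

attributes = describe(demo_path)
os.remove(demo_path)
```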

OPERATIONS ON FILE
• Create: find space on disk and make an entry in the directory.
• Write: requires positioning within the file.
• Read: requires positioning within the file.
• Delete: remove the directory entry and reclaim the disk space.
• Reposition: move the read/write position.

CREATE A FILE
 The create operation creates a file by reserving space on the
storage device. It involves 2 steps:
1. Find free space in the file system.
2. Make an entry for the file in its directory.
 Creating a file requires giving it a name that is unique within its directory.

WRITE INTO A FILE
 A system call with 2 parameters is required to write into a file. The first parameter
specifies the name of the file and the second specifies the information
or data to be written into the file.
 Using the name of the file, the system searches the directory to find the file’s
location. A write pointer marks where in the file the next write takes place. After
every write operation, the pointer must be updated for the next write operation.

READING A FILE
 To read a file, a system call is required with 2 parameters: the name of
the file, and where in memory the data read from the file should be
placed.
 Using the file name, the system searches the directory for the file, and a read pointer
marks where the next read takes place. After every read operation, the read pointer is
updated for the next read operation.

REPOSITIONING WITHIN THE FILE
 The directory is searched for the appropriate entry of the file and the current
position pointer is set to a given value.
 Repositioning need not involve any actual I/O.
 This file operation is also called a file seek.

DELETING A FILE
 To delete a file, the OS needs the location of the file. After searching the directory for the file, the system
releases the disk space allocated to it so that the space can be reused.
 It also deletes the file’s entry from the directory table.

Other common operations include appending new information to the end of a file
and renaming an existing file. The primitive operations can be combined to perform
other operations, such as creating a copy of a file, moving a file from one location
to another, or copying a file to an I/O device such as a printer or display.
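The primitive operations described in the preceding slides can be exercised through Python's thin wrappers over the operating system calls. This is a minimal sketch: the file name `notes.txt` and the temporary directory are assumptions. Note that on POSIX-style systems the name is given once to `open`, and later calls use the returned file descriptor rather than the name.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "notes.txt")   # hypothetical file name

# Create: the OS finds space and adds a directory entry.
fd = os.open(path, os.O_RDWR | os.O_CREAT)

# Write: the current-position pointer advances by the number of bytes written.
os.write(fd, b"hello world")

# Reposition (seek): move the current-position pointer; no data is transferred.
os.lseek(fd, 6, os.SEEK_SET)

# Read: starts at the current position and advances the pointer.
tail = os.read(fd, 5)

os.close(fd)

# Delete: remove the directory entry so the space can be reclaimed.
os.remove(path)
deleted = not os.path.exists(path)
os.rmdir(workdir)
```

After the seek to offset 6, the read returns the five bytes `world`, which shows that reads and writes share one current-position pointer per open file in this interface.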

FILE TYPES
 The operating system recognises and supports various file types.
 Once it has recognised the type of a file, the OS can perform appropriate operations on it.
 The file type can be given as part of the file name. The name then consists of 2 parts: the name
of the file and the file extension, separated by a ‘.’ character.
 From the file extension, the OS recognises the type of file, e.g. .doc for a document file or .exe for an
executable file.
 In MS-DOS, a name consists of up to 8 characters followed by a ‘.’ character and terminated
by an extension of up to 3 characters.
 A UNIX system uses a magic number stored at the beginning of some files to indicate the type of
the file, such as an executable program.
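Magic-number detection amounts to comparing the first few bytes of a file against known signatures. The sketch below uses a handful of well-known signatures; the table `MAGIC_NUMBERS` and the helper `guess_type` are illustrative names, not part of any operating system interface.

```python
# Leading byte signatures ("magic numbers") for a few well-known formats.
MAGIC_NUMBERS = {
    b"\x7fELF": "ELF executable or object file",
    b"#!": "script with an interpreter line",
    b"%PDF": "PDF document",
    b"PK\x03\x04": "ZIP archive",
}


def guess_type(header):
    """Guess a file type from its first bytes (hypothetical helper)."""
    for magic, description in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return description
    return "unknown"
```

In practice `header` would be obtained by opening the file in binary mode and reading its first few bytes.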

Common file types
File type | Extension | Function
Executable | exe, com, bin or none | Ready-to-run machine-language program
Object | obj, o | Compiled, machine language, not linked
Source code | c, p, pas, f77, asm, a | Source code in various languages
Batch | bat, sh | Series of commands to the command interpreter
Text | txt, doc | Textual data, documents
Word processor | doc, docx, tex, rtf, etc. | Various word-processor formats
Library | lib, h | Libraries of routines
Archive | arc, zip, tar | Related files grouped into one file, sometimes compressed
Multimedia | mpeg, mp3 | Binary files containing audio/video information

FILE ACCESS METHODS
 File access mechanism refers to the manner in which the records of a file may be accessed. There are
several ways to access files −
• Sequential access
• Direct/Random access
• Indexed sequential access

SEQUENTIAL ACCESS
 In sequential access, the records are accessed in sequence, i.e. the information in
the file is processed in order, one record after the other.
 This is the simplest access method. Example: compilers usually access files in this fashion.

DIRECT/RANDOM ACCESS
• Random access file organization allows records to be accessed directly.
• Each record has its own address in the file, by means of which it can be directly accessed for
reading or writing.
• The records need not be in any sequence within the file and they need not be in adjacent locations on
the storage medium.
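The difference between sequential and direct access can be sketched with fixed-size records, where the address of record n is simply n times the record size. The record layout (a 4-byte id followed by a 16-byte name) and the function names are assumptions chosen for this example, and an in-memory buffer stands in for a disk file.

```python
import io
import struct

RECORD = struct.Struct("<i16s")   # assumed layout: 4-byte id + 16-byte name


def write_records(f, records):
    """Append (id, name) pairs as fixed-size records."""
    for rec_id, name in records:
        f.write(RECORD.pack(rec_id, name.encode()))


def read_sequential(f):
    """Process every record in order, one after the other."""
    f.seek(0)
    out = []
    while True:
        chunk = f.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        rec_id, raw = RECORD.unpack(chunk)
        out.append((rec_id, raw.rstrip(b"\x00").decode()))
    return out


def read_direct(f, n):
    """Jump straight to record n (0-based) using its computed address."""
    f.seek(n * RECORD.size)
    rec_id, raw = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw.rstrip(b"\x00").decode()


f = io.BytesIO()   # in-memory stand-in for a disk file
write_records(f, [(1, "alpha"), (2, "beta"), (3, "gamma")])
```

Here `read_direct(f, 2)` reaches the third record with a single seek, whereas `read_sequential` has to pass over the first two records to get there.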

INDEXED SEQUENTIAL ACCESS
• This mechanism is built on top of sequential access.
• An index is created for each file, containing pointers to its various blocks.
• The index is searched sequentially and the pointer found is used to access the file directly.
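One way to picture indexed sequential access is a small index holding the highest key of each block: the index is scanned in order, and only the block it points to is then searched. The block size of three records, the keys and the names `blocks`, `index` and `lookup` are all made up for this sketch.

```python
# Records sorted by key, grouped into blocks of (assumed) 3 records each.
blocks = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Index: (highest key in block, block number), one entry per block.
index = [(block[-1][0], number) for number, block in enumerate(blocks)]


def lookup(key):
    """Scan the index sequentially, then search only the block it points to."""
    for highest_key, number in index:
        if key <= highest_key:
            for record_key, value in blocks[number]:
                if record_key == key:
                    return value
            return None   # key would belong in this block but is absent
    return None           # key is larger than every indexed key
```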

FILE ALLOCATION METHODS
 The operating system allocates disk space to files. Operating systems use the following three main
methods to allocate disk space to files:
• Contiguous Allocation
• Linked Allocation
• Indexed Allocation

CONTIGUOUS ALLOCATION
• Each file occupies a contiguous address space on disk.
• Disk addresses are assigned in linear order.
• Easy to implement.
• External fragmentation is a major issue with this type of allocation technique.

A single continuous set of blocks is allocated to a file at
the time of file creation. Thus, this is a pre-allocation
strategy, using variable size portions. The file allocation
table needs just a single entry for each file, showing the
starting block and the length of the file. This method is
best from the point of view of the individual sequential
file. Multiple blocks can be read in at a time to improve
I/O performance for sequential processing. It is also
easy to retrieve a single block. For example, if a file
starts at block b, and the ith block of the file is wanted,
its location on secondary storage is simply b+i-1.
Disadvantages
•External fragmentation will occur, making it difficult to
find contiguous blocks of space of sufficient length. A
compaction algorithm will be necessary to free up
additional space on disk.
•Also, with pre-allocation, it is necessary to declare the
size of the file at the time of creation.
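The address calculation b+i-1 and the difficulty of finding a long enough run of free blocks can both be sketched briefly. The free-space map as a list of booleans and the first-fit search are assumptions of this example; the slides do not prescribe a particular search strategy.

```python
def block_address(start, i):
    """Disk address of the i-th block (1-based) of a file starting at block `start`."""
    return start + i - 1


def first_fit(free_map, length):
    """Return the start of the first run of `length` free blocks, or None.

    `free_map` is a list of booleans (True = free); this representation
    is an assumption of the sketch.
    """
    run_start, run_length = None, 0
    for block, is_free in enumerate(free_map):
        if is_free:
            if run_length == 0:
                run_start = block
            run_length += 1
            if run_length == length:
                return run_start
        else:
            run_length = 0
    return None


# Six blocks are free in total, but the longest contiguous run is three.
free_map = [True, False, True, True, False, True, True, True]
```

With this map, a request for three blocks succeeds at block 5, while a request for four blocks fails even though six blocks are free in total. That is external fragmentation.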

LINKED ALLOCATION
• Each file carries a list of links to disk blocks.
• Directory contains link / pointer to first block of a file.
• No external fragmentation
• Works well for sequential-access files.
• Inefficient for direct-access files.

Allocation is on an individual block basis. Each block contains a
pointer to the next block in the chain. Again the file table needs
just a single entry for each file, showing the starting block and the
length of the file. Although pre-allocation is possible, it is more
common simply to allocate blocks as needed. Any free block can
be added to the chain. The blocks need not be contiguous.
A file can always grow as long as a free disk block is
available. There is no external fragmentation because only one
block at a time is needed. There can be internal fragmentation,
but only in the last disk block of the file.
Disadvantages:
•Internal fragmentation exists in the last disk block of the file.
•There is an overhead of maintaining the pointer in every disk
block.
•If the pointer of any disk block is lost, the file will be truncated.
•It efficiently supports only sequential access to files.
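The cost of direct access under linked allocation can be seen by following the chain: reaching the n-th block means visiting every block before it. In this sketch the dictionary `disk`, the block numbers and the end marker -1 are all made up for illustration.

```python
# Each disk block holds (data, pointer to next block); -1 marks end of file.
# Block numbers and contents are made up for illustration.
disk = {
    9: ("A", 16),
    16: ("B", 1),
    1: ("C", 10),
    10: ("D", -1),
}


def read_block(start, n):
    """Return (data, blocks_visited) for the n-th block (1-based) of a file."""
    current, visited = start, 1
    while visited < n:
        current = disk[current][1]       # follow the pointer to the next block
        if current == -1:
            raise IndexError("file has fewer blocks than requested")
        visited += 1
    return disk[current][0], visited
```

Reading the fourth block of the file that starts at block 9 visits four blocks, scattered across the disk, before the data is reached.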

INDEXED ALLOCATION
• Provides solutions to problems of contiguous and linked allocation.
• An index block holds all the pointers to a file’s blocks.
• Each file has its own index block, which stores the addresses of the disk blocks occupied by the file.
• The directory contains the addresses of the index blocks of files.

It addresses many of the problems of contiguous and
chained (linked) allocation. In this case, the file allocation table
contains a separate one-level index for each file. The
index has one entry for each block allocated to the file.
Allocation may be on the basis of fixed-size blocks or
variable-size portions. Allocation by fixed-size blocks eliminates
external fragmentation, whereas allocation by variable-size
portions improves locality. This allocation technique
supports both sequential and direct access to the file
and thus is the most popular form of file allocation.
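With an index block, locating the i-th block of a file is a single table lookup rather than a walk along a chain. The file name, block numbers and the names `directory`, `index_blocks` and `block_of` below are hypothetical.

```python
# Directory entry -> index block number; index block -> list of data blocks.
# All names and block numbers are hypothetical.
directory = {"report.txt": 19}
index_blocks = {19: [9, 16, 1, 10, 25]}


def block_of(filename, i):
    """Disk block holding the i-th block (1-based) of `filename`."""
    index_block = index_blocks[directory[filename]]
    return index_block[i - 1]      # one lookup, no chain to follow
```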

DIRECTORY STRUCTURE
A file directory is a collection of files. The directory contains information about the
files, including attributes, location and ownership. Much of this information,
especially that concerned with storage, is managed by the operating system. The
directory is itself a file, accessible by various file management routines.
The information contained in a device directory is:
•Name
•Type
•Address
•Current length
•Maximum length
•Date last accessed
•Date last updated
•Owner id

Operations performed on a directory are:
•Search for a file
•Create a file
•Delete a file
•List a directory
•Rename a file
•Traverse the file system
Advantages of maintaining directories are:
•Efficiency: A file can be located more quickly.
•Naming: It is convenient for users, as two users can use the same name
for different files, or different names for the same file.
•Grouping: Logical grouping of files can be done by properties e.g. all java
programs, all games etc.

SINGLE-LEVEL DIRECTORY
 In this scheme, a single directory is maintained for all users.
• Naming problem: Two users cannot use the same name for their files.
• Grouping problem: Users cannot group files according to their needs.

Single level Directory system
The single directory is also called root directory
The single level directory has 5 files owned by 3 different
users P,Q,R
User P has 2 files, User Q has 2 Files and user R has 1 File
in the directory
Advantages
Simple to implement
Locating files is very fast
Limitations
If a single user has a large number of files, it becomes
difficult to remember the name of each file
If more than one user keeps files in the same directory,
then different users may give the same names to their
files, violating the rule that names must be unique
[Figure: root directory holding five files owned by users P, P, Q, Q and R]

TWO-LEVEL DIRECTORY
 In this scheme, a separate directory is maintained for each user.
• Path name: Because there are two levels, every file has a path name that locates it.
• Different users can now have files with the same name.
• Searching is efficient in this method.

Two Level Directory Systems
A private directory is given to each user. Files with the same name in different users’ directories do not interfere with one another
When a user attempts to open a file, the system knows which user it is and therefore which directory to
search for the file
Advantages
Solves the name collision problem
Independent users are isolated from each other
Limitations
Users who want to cooperate are hindered, because some systems do not allow access to other users’ files
It is not convenient for users with a large number of files

TREE-STRUCTURED DIRECTORY
 The directory is maintained in the form of a tree. Searching is efficient and
files can be grouped. A file can be named by an absolute or a relative path name.
The tree is the most common directory
structure
Each user can have as many directories as
needed, so that files can be grouped
together in whatever way is needed
Every file has a unique absolute path name
Almost all modern file systems use this
mechanism
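The relationship between absolute and relative path names can be sketched with Python's `posixpath` module, which follows UNIX path conventions. The helper name `resolve` and the example paths are assumptions for illustration.

```python
import posixpath


def resolve(path, current_dir):
    """Turn a relative or absolute path into a normalized absolute path."""
    if not posixpath.isabs(path):
        # A relative path is interpreted starting from the current directory.
        path = posixpath.join(current_dir, path)
    return posixpath.normpath(path)
```

For example, with `/home/p` as the current directory, the relative name `../q/b.txt` and the absolute name `/home/q/b.txt` refer to the same file.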

DISK ORGANIZATION AND DISK STRUCTURE
 The magnetic disk is widely used as the main secondary storage device.
 It is a magnetic type of storage device.
 A hard disk drive contains several physical disks.
 Each disk is called a platter. The platters
are coated with a special magnetic material.

Platter
•One or more round, flat disks used to actually hold the data in the drive. Each platter
has two surfaces (top & bottom) that are capable of holding data;
•Each surface has one read/write head (each platter has two heads, one on the top of
the platter and one on the bottom).
•A hard disk with three platters has six surfaces and six heads in total. Normally both
surfaces of each platter are used.
•In some older designs, the outer surfaces of the top and bottom platters are not used.
•The platter size determines the form factor of the drive.
• Disks are sometimes referred to by a size specification for example "3.5-inch hard
disk".
• The first PCs used hard disks that had a nominal size of 5.25".
• Today, by far the most common hard disk platter size is 3.5".
•Laptop drives are usually smaller. The platters on these drives are usually 2.5" in
diameter or less; 2.5" is the standard form factor, but drives with 1.8" and even 1.0"
platters are becoming more common.
• PCs usually have 1 to 5 platters

TRACKS AND SECTORS
 Each platter has its information recorded
in concentric circles called tracks.
 Each track is further broken down into
smaller pieces called sectors, each of
which typically holds 512 bytes of information.

STORAGE OF DATA IN PLATTERS
 A sector contains a fixed number of bytes -- for example, 256 or 512. Each track
typically holds between 100 and 300 sectors.
 Larger outer tracks hold more sectors than the smaller inner ones.
 All information stored on a hard disk is recorded in tracks.
 The tracks are numbered from zero, starting at the outside of the platter.
 A hard disk has several thousand tracks on each platter.
 Either at the drive or the operating system level, sectors are often grouped
together into clusters.

The same track on each of the platters together forms an imaginary cylinder-like structure.
Data is stored cylinder by cylinder.
All tracks of a cylinder are written before the R/W heads move to the next cylinder. This reduces movement
of the R/W heads and increases the speed of read and write operations.
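The cylinder-by-cylinder ordering corresponds to the classic mapping from a (cylinder, head, sector) triple to a linear block number. The sketch below assumes a fixed number of sectors on every track, which is a simplification: as noted earlier, real drives put more sectors on outer tracks. The function name `chs_to_lba` is illustrative.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Map (cylinder, head, sector) to a linear block number.

    Sectors are numbered from 1 within a track; cylinders and heads from 0.
    Assumes every track holds the same number of sectors.
    """
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)
```

With 4 heads and 10 sectors per track, linear blocks 0 to 39 all lie on cylinder 0, so the heads do not need to move until block 40.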

CONSTRUCTION OF HDD
The components of the Hard Disk
 Disk Platter
 Read/Write head
 Head Arm/ Head Slider
 Head Actuator mechanisms
 Spindle motor
 Bezel
 Cable & connectors
 Logic board
 Air filter

The read-write (R-W) head moves over the rotating hard disk. It is this read-write head that performs all the
read and write operations on the disk, so the position of the R-W head is a major concern.
To perform a read or write operation on a disk location, the R-W head must be placed over that
position. Some important terms must be noted here:
1.Seek time – The time taken by the R-W head to reach the desired track from its current position.
2.Rotational latency – The time taken by the desired sector to come under the R-W head.
3.Data transfer time – The time taken to transfer the required amount of data. It depends upon the rotational
speed.
4.Controller time – The processing time taken by the controller.
5.Average access time = average seek time + average rotational latency + data transfer time + controller time.
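The access-time formula can be worked through numerically. The sketch below takes the average rotational latency to be half a revolution, which is the usual approximation; the function names and the sample figures in the usage note are assumptions, not measurements of any particular drive.

```python
def average_rotational_latency_ms(rpm):
    """Half a revolution on average, in milliseconds."""
    return 0.5 * 60_000 / rpm


def transfer_time_ms(num_bytes, transfer_rate_mb_per_s):
    """Time to move `num_bytes` at a sustained rate given in MB/s (10**6 bytes)."""
    return num_bytes / (transfer_rate_mb_per_s * 1_000_000) * 1000


def average_access_time_ms(seek_ms, rpm, num_bytes, rate_mb_per_s, controller_ms):
    """Sum of the four components listed above, in milliseconds."""
    return (seek_ms
            + average_rotational_latency_ms(rpm)
            + transfer_time_ms(num_bytes, rate_mb_per_s)
            + controller_ms)
```

For a hypothetical drive with a 5 ms average seek, 15000 RPM, a 100 MB/s transfer rate and 0.2 ms of controller time, reading 1 MB takes about 5 + 2 + 10 + 0.2 = 17.2 ms. For small transfers the seek time and rotational latency dominate.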

LOGICAL STRUCTURE
File Systems are stored on disks. The above figure
depicts a possible File-System Layout.
•MBR: Master Boot Record is used to boot the
computer
•Partition Table: Partition table is present at the end of
MBR. This table gives the starting and ending addresses
of each partition.
•Boot Block: When the computer is booted, the BIOS
reads in and executes the MBR. The first thing the MBR
program does is locate the active partition, read in its
first block, which is called the boot block, and execute
it. The program in the boot block loads the operating
system contained in that partition. Every partition
starts with a boot block, even if it does not
contain a bootable operating system.
•Super Block: It contains all the key parameters about
the file system and is read into memory when the
computer is booted or the file system is first touched.

Free space management: To keep track of free disk space, the system maintains a free-space list that records
all free blocks.
I-node: The information regarding each file in the file system is kept in a data structure called an i-node. For each file
there is one i-node.
Root directory: It is the top of the file system tree.
Files and directories: These are all the other files and directories on the disk.

RAID(REDUNDANT ARRAY OF INDEPENDENT DISKS) STRUCTURE OF DISK
 RAID, or “Redundant Array of Independent Disks”, is a technique that uses a combination of
multiple disks instead of a single disk for increased performance, data redundancy or both.
 Data redundancy, although taking up extra space, adds to disk reliability. In case of disk
failure, if the same data is also stored on another disk, we can retrieve the data and go on with the
operation. On the other hand, if the data is simply spread across multiple disks without the RAID technique,
the loss of a single disk can affect the entire data.

Key evaluation points for a RAID System
•Reliability: How many disk faults can the system tolerate?
•Availability: What fraction of the total session time is a system in uptime mode, i.e. how available is the
system for actual use?
•Performance: How good is the response time? How high is the throughput (rate of processing work)?
•Capacity: Given a set of N disks each with B blocks, how much useful capacity is available to the user?
• RAID is transparent to the host system: to the host, it appears as a
single big disk presenting itself as a linear array of blocks. This allows older technologies to be replaced
by RAID without making too many changes to the existing code.

RAID-0 (Striping)
•Blocks are “striped” across disks.
•In the figure, blocks “0,1,2,3” form a stripe.
•Instead of placing just one block on a disk at a
time, we can place two (or more) blocks
on a disk before moving on to the next
one.
Evaluation:
•Reliability: 0
There is no duplication of data. Hence, a block
once lost cannot be recovered.
•Capacity: N*B
The entire space is being used to store data. Since
there is no duplication, N disks each having B
blocks are fully utilized.
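The striping layout can be expressed as a mapping from a logical block number to a disk and a position on that disk. This is a minimal sketch: the function name `locate` and the parameter `chunk_size` (the number of consecutive blocks placed on one disk before moving to the next) are names chosen for illustration.

```python
def locate(block, num_disks, chunk_size=1):
    """Map a logical block to (disk, block offset on that disk) under striping.

    `chunk_size` is the number of consecutive blocks placed on one disk
    before moving to the next (1 in the simplest layout).
    """
    chunk = block // chunk_size
    disk = chunk % num_disks
    offset = (chunk // num_disks) * chunk_size + block % chunk_size
    return disk, offset
```

With 4 disks and a chunk size of 1, blocks 0, 1, 2, 3 land on disks 0 to 3 and form one stripe, and block 4 wraps around to disk 0. With a chunk size of 2, blocks 0 and 1 both sit on disk 0 and block 8 is the next block to return there.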

RAID-1 (Mirroring)
More than one copy of each block is stored, each on a separate disk. Thus, every block has
two (or more) copies, lying on different disks.
•RAID 0 cannot tolerate any disk failure, but RAID 1 can.
Evaluation:
Assume a RAID system with mirroring level 2.
•Reliability: 1 to N/2
1 disk failure can be handled for certain, because blocks of that disk would have
duplicates on some other disk. If we are lucky enough and disks 0 and 2 fail, then again
this can be handled as the blocks of these disks have duplicates on disks 1 and 3. So,
in the best case, N/2 disk failures can be handled.

Raid 2
•This uses bit-level striping, i.e. instead of
striping the blocks across the disks, it stripes
the bits across the disks.
•In the above diagram b1, b2, b3 are bits. E1,
E2, E3 are error correction codes.
•We need two groups of disks. One group of
disks is used to write the data; the other
group is used to write the error correction
codes.
•When data is read from the disks, the
corresponding ECC code is also read from the
redundancy disks and used to check whether the
data is consistent. If required, appropriate
corrections are made.
•This is not used anymore. It is expensive,
and implementing it in a RAID controller is
complex.

RAID 3
•This uses byte-level striping, i.e.
instead of striping the blocks across
the disks, it stripes the bytes across
the disks.
•In the above diagram B1, B2, B3 are
bytes. p1, p2, p3 are parities.
•Uses multiple data disks, and a
dedicated disk to store parity.
•Sequential reads and writes have
good performance.
•Random reads and writes have
poor performance.

RAID 4
•This uses block level striping.
•In the above diagram A,B,C are blocks. p1,
p2, p3 are parities.
•Uses multiple data disks, and a dedicated
disk to store parity.
•Minimum of 3 disks (2 disks for data and 1
for parity)
•Good random reads, as the data blocks are
striped.
•Bad random writes, as every write also has
to update the single parity disk.
•It is similar to RAID 3 in having a
dedicated parity disk, but it stripes blocks rather than bytes.
•It is similar to RAID 5 in striping blocks
across the data disks, but its parity is kept on one
dedicated disk rather than rotated.

RAID 5
This is a slight modification of the RAID-4
system where the only difference is that the
parity rotates among the drives.
•Reliability: 1
RAID-5 can recover from at most 1 disk
failure (because of the way parity works). If
more than one disk fails, there is no way to
recover the data. This is identical to RAID-4.
•Capacity: (N-1)*B
Overall, space equivalent to one disk is
utilized in storing the parity. Hence, (N-1)
disks are made available for data storage,
each disk having B blocks.
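The reason parity allows recovery from exactly one failure can be shown with XOR: the parity block is the XOR of the data blocks in a stripe, so XOR-ing the surviving blocks with the parity reproduces the missing one. The block contents and the function names `parity` and `rebuild` are made up for this sketch.

```python
from functools import reduce


def parity(blocks):
    """XOR equally sized blocks byte by byte to form the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity_block):
    """Recover the single missing data block of a stripe."""
    return parity(surviving_blocks + [parity_block])


stripe = [b"\x0f\x10", b"\xf0\x01", b"\x33\x44"]   # made-up data blocks
p = parity(stripe)
recovered = rebuild([stripe[0], stripe[2]], p)      # pretend disk 1 failed
```

If two blocks of the same stripe are lost, one XOR equation is not enough to determine both, which is why a single parity block tolerates only one disk failure.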

RAID 6
•Just like RAID 5, this does block
level striping. However, it uses
dual parity.
•In the above diagram A, B, C are
blocks. p1, p2, p3 are parities.
•This creates two parity blocks for
each stripe of data blocks.
•Can handle two disk failures.
•This RAID configuration is
complex to implement in a RAID
controller, as it has to calculate
two parity blocks for each
stripe.
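The capacity figures for the different levels can be collected in one small function, using N disks of B blocks each as in the evaluation points above. This sketch assumes two-way mirroring for RAID 1; the RAID 1 and RAID 6 figures follow from one mirror copy and two parity blocks per stripe respectively, and the function name is illustrative.

```python
def usable_capacity(level, n, b):
    """Usable blocks for n disks of b blocks each (RAID 1 assumes 2-way mirroring)."""
    if level == 0:
        return n * b              # striping only, no redundancy
    if level == 1:
        return n * b // 2         # every block is stored twice
    if level in (4, 5):
        return (n - 1) * b        # one disk's worth of parity
    if level == 6:
        return (n - 2) * b        # two disks' worth of parity
    raise ValueError(f"level {level} not covered by this sketch")
```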
