
Time | Aug. 2 | Aug. 3 | Aug. 4 | Aug. 5 | Aug. 6
9:00 | Intro & terminology | TP mons & ORBs | Logging & res. Mgr. | Files & Buffer Mgr. | Structured files
11:00 | Reliability | Locking theory | Res. Mgr. & Trans. Mgr. | COM+ | Access paths
13:30 | Fault tolerance | Locking techniques | CICS & TP & Internet | CORBA/EJB + TP | Groupware
15:30 | Transaction models | Queueing | Advanced Trans. Mgr. | Replication | Performance & TPC
18:00 | Reception | Workflow | Cyberbricks | Party | FREE
Files and Buffer Manager
Chapter 15

Abstractions Provided by the File
Manager
Device independence: The file manager turns the large
variety of external storage devices, such as disks (with
their different numbers of cylinders, tracks, arms, and
read/write heads), ram-disks, tapes, and so on, into
simple abstract data types.
Allocation independence: The file manager does its
own space management for storing the data objects
presented by the client. It may store the same objects in
more than one place (replication).

Abstractions Provided by the File
Manager
Address independence: Whereas objects in main
memory are always accessed through their addresses,
the file manager provides mechanisms for associative
access. Thus, for example, the client can request
access to all records with a specified value in some
field of the record. Support for associative access
comes in many flavors, from simple mechanisms
yielding fast retrieval via the primary key up to the
expressive power of the SQL select statement.

External Storage vs. Main Memory
Capacity: Main memory is usually limited to a size that
is some orders of magnitude smaller than what large
databases need.
Economics: External storage holds large volumes of
data at reasonable cost.
Durability: Main memory is volatile. External storage
devices such as magnetic or optical disks are
inherently durable and therefore are appropriate for
storing persistent objects. After a crash, recovery
starts with what is found in durable storage.

External Storage vs. Main Memory
Speed: External storage devices are some orders of
magnitude slower than main memory. As a result, it is
more costly, both in terms of latency and in terms of
pathlength, to get data from external storage to the CPU
than to load data from main memory.
Functionality: Data cannot be processed directly on
external storage: they can neither be compared nor
modified “out there.”

The Storage Pyramid
[Figure: pyramid of storage levels. Typical capacity grows toward the base, and the data held range from current to stale. Levels, top to bottom: main memory (electronic RAM and bulk storage); online external storage (magnetic/optical disks); nearline (archive) storage (automated archives, e.g. optical disk jukeboxes, tape robots, etc.).]

Interfacing to External Memory:
Read-Write Mapping
[Figure: external storage holds files A to D. Individual objects are moved between the files and main memory by explicit operations: read object w, read object y, read/write object x.]

Interfacing to External Memory:
File Mapping
[Figure: external storage holds files A to D. File C is made available in main memory as a whole by "map file C" and released again by "unmap file C".]

Interfacing to External Memory:
Single-Level Storage
[Figure: external storage holds files A to D, and all four files also appear in main memory. Labels: virtual memory, explicit mapping.]

Locality and Caching
The movement of data through the pyramid is guided by
the principle of locality:
Locality of active data: Data that have recently been
referenced will very likely be referenced again.
Locality of passive data: Data that have not been
referenced recently will most likely not be referenced in
the future.

Levels of Abstraction in a File and
Database Manager
[Figure: layered architecture of a file and database manager.
Components, top to bottom: application (transaction programs); DBMS database access modules (tuple management, associative access); database buffer manager, together with logging and recovery (buffer management; manages main memory); media and file manager (file management; manages online external memory); archive manager (archive management; manages nearline external memory).
Interfaces between the layers, top to bottom: set-oriented access, tuple-oriented access, block-oriented access, device-oriented access.
Operation labels: sort, join, ...; read, write.]

Operations of the Basic File System
STATUS create(filename, allocparmp);
STATUS delete(filename);
STATUS open(filename, ACCESSMODE, FILEID);
STATUS close(FILEID);
STATUS extend(FILEID, allocparmp);
STATUS read(FILEID, BLOCKID, BLOCKP);
STATUS readc(FILEID, BLOCKID, blockcount, BLOCKP);
STATUS write(FILEID, BLOCKID, BLOCKP);
STATUS writec(FILEID, BLOCKID, blockcount, BLOCKP);

Mapping Files To Disk
[Figure: files A to E mapped onto the slots of a disk.]
The arrows denote the address mapping between disk and files.

Issues in Managing Disk Space
Initial allocation: When a file is created, how
many contiguous slots should be allocated to
it?
Incremental expansion: If an existing file grows
beyond the number of slots currently allocated,
how many additional contiguous blocks should
be assigned to that file?
Reorganization: When and how should the
free space on the disk be reorganized?

Extent-Based Allocation
[Figure: the file directory has entries for files A and B that point into the extent directory. Each extent directory entry holds a disk-id, an extent index, and an accum-length. The four extents shown are:
primary: disk-id Disk-A, accum-length 100
1. secondary: disk-id Disk-B, accum-length 350
2. secondary: disk-id Disk-A, accum-length 600
3. secondary: disk-id Disk-A, accum-length 850]
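As a minimal sketch of how such an extent directory can be used, the following C fragment translates a file-relative block number into a disk address by walking the accumulated lengths (100, 350, 600, 850 in the figure). The type names, the `start_slot` field (standing in for the figure's extent index), and the function `map_block` are assumptions made for illustration; they are not part of the interface shown on the slides.

```c
#include <stddef.h>

/* One entry of the extent directory. start_slot is a hypothetical field
   standing in for the figure's "extent index". */
typedef struct {
    char disk_id;       /* e.g. 'A' for Disk-A */
    long start_slot;    /* first slot of the extent on that disk (assumed) */
    long accum_length;  /* blocks in this extent plus all earlier extents */
} EXTENT;

typedef struct {
    char disk_id;
    long slot;
} SLOT_ADDR;

/* Translate a file-relative block number into a disk address.
   Returns 0 on success, -1 if the block lies outside the file. */
int map_block(const EXTENT *ext, size_t n_ext, long block, SLOT_ADDR *out)
{
    long before = 0;    /* blocks covered by the extents already skipped */

    if (block < 0)
        return -1;
    for (size_t i = 0; i < n_ext; i++) {
        if (block < ext[i].accum_length) {
            out->disk_id = ext[i].disk_id;
            out->slot = ext[i].start_slot + (block - before);
            return 0;
        }
        before = ext[i].accum_length;
    }
    return -1;
}
```

Storing accumulated rather than individual lengths means the lookup needs only one comparison per extent; with many extents the same array could be searched by bisection.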

Buddy Systems
[Figure: 16 slots with binary addresses 0000 to 1111; shaded extents are free. A free-extent table is organized by buddy type 0, 1, 2, 3; the entries shown are 0110, 0010, 1100, and 1011.]
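The address arithmetic behind a buddy system can be sketched in a few lines of C. The sketch assumes the usual binary buddy convention, which the slide does not spell out: an extent of type k spans 2^k slots and starts at a multiple of 2^k. The function names are hypothetical.

```c
/* Nonzero if addr can be the start of a type-k extent,
   i.e. if it is a multiple of 2^k. */
int is_valid_extent(unsigned addr, unsigned k)
{
    return (addr & ((1u << k) - 1u)) == 0;
}

/* Start address of the buddy of the type-k extent at addr:
   the two buddies differ only in bit k of their addresses. */
unsigned buddy_of(unsigned addr, unsigned k)
{
    return addr ^ (1u << k);
}

/* Start address of the type-(k+1) extent obtained by merging
   the type-k extent at addr with its buddy. */
unsigned merged_start(unsigned addr, unsigned k)
{
    return addr & ~(1u << k);
}
```

When an extent is freed, the free list for its type is checked for the address returned by `buddy_of`; if the buddy is free as well, both are replaced by one extent of the next type.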

Simple Mapping of Relations To Disks
[Figure: mapping chain. File system and operating system side: real disks, file system. Database system side: segments, relations. Further label: extents.]

A Usual Way of Mapping of Relations
To Disks
[Figure: mapping chain. Operating system side: real disks, logical disks, OS files. Database system side: tablespaces, segments, relations. Further label: extents.]

Principles of the Database Buffer
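[Figure: the process of the access module calls the buffer manager interface with bufferfix(P, ...), meaning "give me page P". The buffer manager finds a frame in the buffer, determines the FILEID and block number, and returns the frame address F. The buffer storage area is accessible from the caller's process (shared memory). Further labels: read direct, directory. The steps are numbered 1 to 5.]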

Design Options for the Buffer
Manager
Buffer per file: Each file has its own private buffer pool.
Buffer per page size: In systems with different page
(and block) sizes, there is usually at least one buffer for
each page size.
Buffer per file type: There are files like indices, which
are accessed in a significantly different way from other
files. Therefore, some systems dedicate buffers to files
depending on the access pattern and try to manage
each of them in a way that is optimal for the respective
file organization.

Logic of the Buffer Manager
Search in buffer: Check if the requested page is
in the buffer. If found, return the address F of
this frame to the caller.
Find free frame: If the page is not in the buffer,
find a frame that holds no valid page.
Determine replacement victim: If no such frame
exists, determine a page that can be removed
from the buffer (in order to reuse its frame).

Logic of the Buffer Manager
Write modified page: If the page to be replaced has
been modified, write it to disk.
Establish frame address: Denote the start
address of the frame as F.
Determine block address: Translate the
requested PAGEID P into a FILEID and a block
number. Read the block into the frame
selected.
Return: Return the frame address F to the
caller.
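The steps listed above can be sketched as a small, single-threaded C simulation. It is only an illustration under stated assumptions: the "disk" is an in-memory array, a PAGEID is used directly as the block number, the victim is chosen round-robin, and fixing and semaphores are ignored. All names (`bufferfix_sketch`, `mark_modified`, `buffer_init`) are hypothetical.

```c
#include <string.h>

#define PAGE_SIZE 64
#define N_FRAMES  2
#define N_PAGES   8     /* size of the simulated disk, in pages */

typedef struct {
    int  pageid;        /* page held by this frame, -1 if none */
    int  modified;      /* nonzero if the page was changed in the buffer */
    char data[PAGE_SIZE];
} FRAME;

static char  disk[N_PAGES][PAGE_SIZE];  /* stand-in for external storage */
static FRAME pool[N_FRAMES];
static int   next_victim;               /* trivial round-robin replacement */

void buffer_init(void)
{
    for (int i = 0; i < N_FRAMES; i++) {
        pool[i].pageid = -1;
        pool[i].modified = 0;
    }
    next_victim = 0;
}

/* Caller reports that it has updated page p in the buffer. */
void mark_modified(int p)
{
    for (int i = 0; i < N_FRAMES; i++)
        if (pool[i].pageid == p)
            pool[i].modified = 1;
}

/* Returns the frame address F for page p, or NULL if p is out of range. */
char *bufferfix_sketch(int p)
{
    int f = -1;

    if (p < 0 || p >= N_PAGES)
        return NULL;

    /* Search in buffer */
    for (int i = 0; i < N_FRAMES; i++)
        if (pool[i].pageid == p)
            return pool[i].data;

    /* Find free frame */
    for (int i = 0; i < N_FRAMES; i++)
        if (pool[i].pageid == -1) { f = i; break; }

    if (f < 0) {
        /* Determine replacement victim */
        f = next_victim;
        next_victim = (next_victim + 1) % N_FRAMES;
        /* Write modified page */
        if (pool[f].modified)
            memcpy(disk[pool[f].pageid], pool[f].data, PAGE_SIZE);
    }

    /* Determine block address (here: block number = p) and read the block */
    memcpy(pool[f].data, disk[p], PAGE_SIZE);
    pool[f].pageid = p;
    pool[f].modified = 0;

    return pool[f].data;    /* Return: frame address F */
}
```

A real buffer manager would never pick a frame that a client is still using as the victim; that is what the fix and unfix calls are for.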

Synchronization in the Buffer
[Figure: processes A and B each call bufferfix(P, ...) and are each given a copy of page P. Process A changes P to P', process B changes P to P". Both then rewrite the page.]
a) The access module in process A requests access to page P and gets a private copy.
b) The access module in process B requests access to page P and gets a private copy.
c) Both processes try to rewrite an updated version of the page, but these versions are different. Only the version written last will be on disk; this is the "lost update" anomaly.
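The anomaly in steps a) to c) can be replayed deterministically. In this sketch an integer stands in for the page contents, and the function name `lost_update_demo` is made up for illustration.

```c
/* Replays steps a) to c) with an integer standing in for the page.
   Returns the value that ends up "on disk". */
int lost_update_demo(int page_on_disk, int delta_a, int delta_b)
{
    int copy_a = page_on_disk;  /* a) process A gets a private copy */
    int copy_b = page_on_disk;  /* b) process B gets a private copy */

    copy_a += delta_a;          /* A changes P to P'  */
    copy_b += delta_b;          /* B changes P to P'' */

    page_on_disk = copy_a;      /* c) A rewrites the page ...            */
    page_on_disk = copy_b;      /* ... and B overwrites A's version      */
    return page_on_disk;
}
```

Starting from 100 with updates of +10 and +5, the result is 105 rather than 115: the update made by process A is lost because B's copy never saw it.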

What the Buffer Manager Does for
Synchronization
Sharing: Pages are made addressable to all
processes that run the database code.
Semaphore protection: Each requestor gets
the address of a semaphore protecting the page.
Durable storage: The access modules inform
the buffer manager if their page access has
resulted in an update of the page; the actual
write operation, however, is issued by the buffer
manager, probably at a time when the update
transaction is long gone.

The Interface to the Buffer Manager
typedef struct
{
    PAGEID      pageid;     /* id of page in file      */
    PAGEPTR     pageaddr;   /* base addr. in buffer    */
    int         index;      /* record within page      */
    semaphore * pagesem;    /* pointer to the sem.     */
    Boolean     modified;   /* caller modif. page      */
    Boolean     invalid;    /* destroyed page          */
} BUFFER_ACC_CB, *BUFFER_ACC_CBP;
/* control block for buffer access */

The Need for Fix and Unfix
[Figure: three panels showing transactions X and Y calling bufferfix.]
a) Transaction X requests access to page P with bufferfix(P, ...). Page P is not in the buffer, so the block containing page P is read, and X gets the page's base address in the buffer.
b) Transaction Y requests access to page Q with bufferfix(Q, ...). Page Q is not in the buffer; the buffer manager decides to replace page P and reads the block containing page Q into its frame.
c) Transaction Y gets the base address of Q in the buffer, which is the same as P's. The address that transaction X still holds now points to page Q.

The Fix-Use-Unfix Protocol I
FIX: The client requests access to a page
using the bufferfix interface.
USE: The client uses the page; the pointer
to the frame containing the page remains
valid for as long as the page is fixed.
UNFIX: The client explicitly waives further
usage of the frame pointer; that is, it tells the
buffer manager that it no longer wants to use
that page.
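One common way to enforce this protocol is a fix counter per frame: a frame may be chosen as a replacement victim only while its counter is zero. The sketch below assumes such a counter; the struct and function names are hypothetical and do not appear in the slides.

```c
typedef struct {
    int pageid;
    int fix_count;  /* number of fixes not yet matched by an unfix (assumed field) */
} FRAME_CB;

/* FIX: one more client holds a pointer into this frame. */
void frame_fix(FRAME_CB *f)
{
    f->fix_count++;
}

/* UNFIX: the client gives up its pointer.
   Returns 0 on success, -1 if the frame was not fixed. */
int frame_unfix(FRAME_CB *f)
{
    if (f->fix_count <= 0)
        return -1;
    f->fix_count--;
    return 0;
}

/* The buffer manager may reuse the frame only if nobody has it fixed. */
int frame_replaceable(const FRAME_CB *f)
{
    return f->fix_count == 0;
}
```

A counter rather than a flag is needed because several clients can have the same page fixed at the same time.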

The Fix-Use-Unfix Protocol II
[Figure: timelines for pages P, Q, and R. Each page is fixed, used several times, and then unfixed: fix page P, use, use, unfix page P; fix page R, use, use, use, unfix page R; fix page Q, use, use, use, unfix page Q.]

Structure of the Buffer Manager
[Figure: the bufferpool consists of frames holding pages. A hash table points, via first_bcb, to chains of buffer control blocks. All buffer control blocks are also linked into an LRU chain running between mru_page and lru_page. A buffer access control block is passed to and from the client.]
frame_index: index of the frame holding the page (an address pointer in the case of the buffer access control block)
next_in_hclass: chain of buffer control blocks in the same hash class
mru_page, lru_page: LRU chain
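The hash-table part of this structure can be sketched as follows. The field names `frame_index`, `first_bcb`, and `next_in_hclass` are taken from the figure; the hash function, the table size, and the function names are assumptions, and the LRU chain is left out to keep the example short.

```c
#include <stddef.h>

#define N_HCLASS 4      /* number of hash classes (assumed) */

typedef struct bcb {
    int         pageid;
    int         frame_index;     /* index of the frame holding the page */
    struct bcb *next_in_hclass;  /* next control block in the same hash class */
} BCB;

static BCB *first_bcb[N_HCLASS]; /* hash table: head of each hash class chain */

static unsigned hclass(int pageid)
{
    return (unsigned)pageid % N_HCLASS;
}

/* Link a control block at the head of its hash class chain. */
void bcb_insert(BCB *b)
{
    unsigned h = hclass(b->pageid);
    b->next_in_hclass = first_bcb[h];
    first_bcb[h] = b;
}

/* Return the control block for pageid, or NULL if the page is not buffered. */
BCB *bcb_lookup(int pageid)
{
    for (BCB *b = first_bcb[hclass(pageid)]; b != NULL; b = b->next_in_hclass)
        if (b->pageid == pageid)
            return b;
    return NULL;
}
```

The lookup touches only the control blocks of one hash class, so the "search in buffer" step does not have to scan every frame.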

Logging and Recovery from the
Buffer Manager's Perspective I
[Table: columns Transaction, Buffer, Database, Remark. For each case the Buffer and Database columns show graphically which versions of pages A and B they contain. The remarks per case are:]
Transaction running:
  case 1: OK; old state in DB
  case 2: OK; old state in DB
  case 3: database corrupted
  case 4: conflicting view on TA
Transaction committed:
  case 1: OK; read-only TA
  case 2: DB not in new state
  case 3: database corrupted
  case 4: OK; new state in DB

Logging and Recovery from the
Buffer Manager's Perspective II
state of transaction TA | state of page A in database | result of recovery using operation log
aborted | old | wrong tuple might be deleted
aborted | new | inverse operation succeeds
committed | old | operation succeeds
committed | new | duplicate of tuple is inserted

The Log and Page LSNs
[Figure: timeline with two actors. The log manager writes four log records for page A, with LSN 1, LSN 2, LSN 3, and LSN 4. The buffer manager writes page A to disk twice: first with page LSN 1, later with page LSN 3.]
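A common use of the page LSN, sketched here as an assumption rather than quoted from the slides, is to make recovery with an operation log safe to repeat: a log record is applied to a page only if its LSN is higher than the LSN stored in the page. The types, the `delta` payload, and the function `redo` are hypothetical.

```c
typedef unsigned long LSN;

typedef struct {
    LSN page_lsn;   /* LSN of the last log record reflected in the page */
    int value;      /* stand-in for the page contents */
} PAGE;

typedef struct {
    LSN lsn;
    int delta;      /* operation log record: "add delta to the page" */
} LOGREC;

/* Apply rec to pg unless the page already contains its effect.
   Returns 1 if the operation was applied, 0 if it was skipped. */
int redo(PAGE *pg, const LOGREC *rec)
{
    if (pg->page_lsn >= rec->lsn)
        return 0;               /* already in the page: do not repeat it */
    pg->value += rec->delta;
    pg->page_lsn = rec->lsn;
    return 1;
}
```

In the situation of the figure, the version of page A on disk carries LSN 3, so replaying the log after a crash applies only the record with LSN 4 and avoids repeating operations such as the duplicate insert from the previous table.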

Different Buffer Management Policies
Steal policy: When the buffer manager needs
space, it can decide to replace dirty pages.
No-Steal policy: Pages can be replaced only if
they are clean.
Force policy: At end of transaction, all
modified pages are forced to disk in a series of
synchronous write operations.
No-Force policy: No modified page is forced
during commit. REDO log records are written to
the log.

The Problem of Hotspot Pages
[Figure: page A in the bufferpool receives update operations from transactions TA1 to TA8; below the bufferpool are durable storage and the log.]
The dotted arrows indicate an update of the page by the respective transaction. The arrows at 45 degrees indicate the forced writing of the page during commit processing. The downward arrows indicate the writing of log records for the respective transaction.

The Basic Checkpoint Algorithm
Quiesce: Delay all incoming update DML calls
until all fixes with exclusive semaphores have
been released.
Flush the buffer: Write all modified pages.
Log the checkpoint: Write a record to the log,
saying that a checkpoint has been generated.
Resume normal operation: The bufferfix requests
for updates that have been delayed in order to
take the checkpoint can now be processed again.
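The four steps can be laid out as a single routine. This is a deliberately simplified, single-threaded sketch: real page writes and log writes are replaced by counters, and waiting for exclusive semaphores to be released is reduced to setting a flag. All type and function names are hypothetical.

```c
typedef struct {
    int pageid;
    int modified;   /* page changed in the buffer but not yet written */
} CKPT_FRAME_STATE;

typedef struct {
    int updates_delayed;     /* nonzero while update requests are held back */
    int pages_written;       /* stand-in for real page writes */
    int checkpoint_records;  /* stand-in for records appended to the log */
} CKPT_CONTEXT;

void basic_checkpoint(CKPT_FRAME_STATE *pool, int n_frames, CKPT_CONTEXT *ctx)
{
    /* Quiesce: hold back incoming updates (waiting for fixes is omitted) */
    ctx->updates_delayed = 1;

    /* Flush the buffer: write all modified pages */
    for (int i = 0; i < n_frames; i++) {
        if (pool[i].modified) {
            pool[i].modified = 0;
            ctx->pages_written++;
        }
    }

    /* Log the checkpoint */
    ctx->checkpoint_records++;

    /* Resume normal operation */
    ctx->updates_delayed = 0;
}
```

The sketch makes the cost visible: every modified page is written while updates are held back, which is the motivation for the indirect variant.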

The Case for Indirect Checkpointing
[Figure: the log with checkpoints c1, c2, and c3, plotted against the buffer frame numbers 1 to 10.]
When taking a checkpoint, the PAGEIDs of the pages
currently in the buffer are written to the log.

The Indirect Checkpointing Algorithm
Record TOC: Log the list of PAGEIDs.
Compare with previous checkpoint: See if any modified
pages have not been replaced since the last checkpoint.
Force lazy pages: Schedule the writing of those
pages during the next checkpoint interval.
Low-water mark: Find the LSN of the oldest
still-volatile update; write it to the log.
Write “Checkpoint done” record
Resume normal operation
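Two of these steps lend themselves to a short sketch: picking the "lazy" pages and computing the low-water mark. The bookkeeping fields `oldest_update` and `in_prev_ckpt` are assumptions about what the buffer manager records per frame, and the function names are hypothetical.

```c
typedef unsigned long LSN;

typedef struct {
    int pageid;
    int modified;       /* page has updates that are not yet on disk */
    LSN oldest_update;  /* LSN of the oldest such update (assumed field) */
    int in_prev_ckpt;   /* page was already modified at the previous checkpoint */
} CKPT_FRAME;

/* Low-water mark: LSN of the oldest still-volatile update.
   If no page is modified, the current end of the log is returned. */
LSN low_water_mark(const CKPT_FRAME *pool, int n, LSN current_lsn)
{
    LSN lwm = current_lsn;
    for (int i = 0; i < n; i++)
        if (pool[i].modified && pool[i].oldest_update < lwm)
            lwm = pool[i].oldest_update;
    return lwm;
}

/* Lazy pages: modified now and already modified at the previous checkpoint.
   Their PAGEIDs are stored in out (room for n entries); returns the count. */
int collect_lazy_pages(const CKPT_FRAME *pool, int n, int *out)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (pool[i].modified && pool[i].in_prev_ckpt)
            out[count++] = pool[i].pageid;
    return count;
}
```

Restart then only has to read the log from the low-water mark onward, and writing the lazy pages during the next interval keeps that mark from falling far behind.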

Further Possibilities for Optimization
Pre-flushing can be performed by an
asynchronous process that scans the buffer for
"old" modified pages. Writing is done under
semaphore protection.
Pre-fetching can, among other things, be used
to make restart more efficient. If page reads are
logged, one can use the most recent checkpoint plus
the log to prime the bufferpool, so that after restart
it looks almost exactly as it did at the moment of the crash.

Further Possibilities for Optimization
Transaction scheduling and buffer management
can take hints from the query optimizer:
  This relation will be scanned sequentially.
  This is a sequential scan of the leaves of a B-tree.
  This is the traversal of a B-tree, starting at the root.
  This is a nested-loop join, where the inner relation is scanned in physically sequential order.

15 bufferand records
