CS215 - Lec 9  indexing and reclaiming space in files
Maintain Indexes.
Adding a data record with Indexing.
Deleting a data record with Indexing.
Reclaiming space.
Multilevel Index.
Dr. Hussien M. Sharaf
Structure of Indexes
Indexes must be sorted in ascending or descending
order with respect to one or more fields.
CompanyName   Offset
Google        211
IBM           0
ITE           643
Microsoft     462
Apple Mac     985   (new record)
[Figure: the index entries shown beside the data file records Record1 to Record4 and the new record]
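As a rough illustration of why the sort order matters, the sketch below keeps the index as a Python list of (key, offset) pairs and uses binary search from the standard `bisect` module. The list layout and the helper name `find_offset` are assumptions made for this example, not something the slides prescribe.

```python
import bisect

# Index entries as (key, offset) pairs, kept sorted by key (values from the slide).
index = [("Google", 211), ("IBM", 0), ("ITE", 643), ("Microsoft", 462)]

def find_offset(index, key):
    """Binary-search the sorted index; return the record offset or None."""
    # (key,) sorts just before any (key, offset) pair with the same key.
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None

# The new entry goes to the position that keeps the list sorted,
# not to the end of the list.
bisect.insort(index, ("Apple Mac", 985))
```

Because the list stays sorted, a lookup needs only a logarithmic number of comparisons instead of a scan over every entry.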
Operations needed for an Index:
1. Create an index in memory by
looping over all records in the
original data file.
2. If there is an index file, load it
into memory before using it.
3. Write the index to the file when
the program closes.
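The three operations above could be sketched as follows. The record layout is an assumption for this example only: one record per line, fields separated by `|`, with the key as the first field. The function names (`build_index`, `save_index`, `load_index`, `open_index`) are likewise hypothetical.

```python
import os

def build_index(data_path):
    """Operation 1: scan the whole data file, return sorted (key, offset) pairs."""
    index = []
    with open(data_path, "rb") as f:
        while True:
            offset = f.tell()          # byte offset where this record starts
            line = f.readline()
            if not line:
                break
            key = line.rstrip(b"\n").split(b"|")[0].decode("utf-8")
            index.append((key, offset))
    index.sort()
    return index

def load_index(index_path):
    """Operation 2: read a previously saved index file into memory."""
    index = []
    with open(index_path, encoding="utf-8") as f:
        for line in f:
            key, offset = line.rstrip("\n").rsplit("|", 1)
            index.append((key, int(offset)))
    return index

def save_index(index, index_path):
    """Operation 3: write the in-memory index to its file at program close."""
    with open(index_path, "w", encoding="utf-8") as f:
        for key, offset in index:
            f.write(f"{key}|{offset}\n")

def open_index(data_path, index_path):
    """Prefer the saved index file; otherwise rebuild from the data file."""
    if os.path.exists(index_path):
        return load_index(index_path)
    return build_index(data_path)
```

Opening the data file in binary mode keeps `tell()` values as plain byte offsets, which is what a later `seek()` needs.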
Once the index is loaded into memory, the following
operations are needed:
1. Add: append the data record to the data file and
insert an index record at the correct position.
2. Delete: mark the record in the data file as
deleted and delete the related record from
the index.
3. Deleting and updating data records requires
updating the offsets of all index records. Is the
same true for adding a data record?
Data records: R1, R2, R3, R4, R5
Index on Name: R4, R3, R2, R5, R1
Index on Phone: R2, R3, R1, R4, R5
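Data records on disk: R1, R2, R3, R4, R5
Name Index in RAM: R4, R3, R2, R5, R1
Phone Index in RAM: R2, R3, R1, R4, R5
[Figure: a new record R6 is shown beside the data records and beside each of the two indexes]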
Adding a data record with Indexing
1. Go to the end of the data file and get the current offset.
2. Append the data record to the end of the data
file.
3. Build an index entry from the offset and the key
of the new data record: (offset, key).
4. Insert the new index entry at its
correct position in the sorted index list.
5. At the end of the program, save the index list
to disk.
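A minimal sketch of steps 1 to 4, assuming the same hypothetical record layout as before (one `|`-separated record per line, key first) and an in-memory index held as a sorted list of (key, offset) pairs. The function name `add_record` is made up for this example; step 5 (saving the index) is left to whatever routine writes the index at program close.

```python
import bisect
import os

def add_record(data_path, index, key, fields):
    """Append one record to the data file and index it; return its offset."""
    record = "|".join([key, *fields]).encode("utf-8") + b"\n"
    with open(data_path, "ab") as f:
        f.seek(0, os.SEEK_END)       # step 1: go to the end of the data file
        offset = f.tell()            #         and remember the current offset
        f.write(record)              # step 2: append the data record
    entry = (key, offset)            # step 3: build the index entry
    bisect.insort(index, entry)      # step 4: insert at the sorted position
    return offset
```

Since the record is appended, no existing record moves, so the offsets already stored in the index stay valid.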
Deleting a data record with Indexing
1. Search for the index entry by comparing the target
value with the key field value.
2. Mark the index entry as deleted.
3. Get the offset of the target data record from the index entry.
4. Seek to the target offset and mark the data
record as deleted.
NOTE: The data record is not physically removed
immediately. A space-reclaiming function must be
run later to recover the space.
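The deletion steps might look like the sketch below. Two assumptions are made here that the slides do not fix: the deletion mark is a `*` byte written over the first byte of the record, and the in-memory index entry is simply removed from the list rather than flagged. The name `delete_record` is hypothetical.

```python
import bisect

DELETED_MARK = b"*"   # assumed tombstone byte; the slides do not specify one

def delete_record(data_path, index, key):
    """Mark the record for `key` as deleted; return True if it was found."""
    # Step 1: binary-search the sorted (key, offset) list for the key.
    i = bisect.bisect_left(index, (key,))
    if i == len(index) or index[i][0] != key:
        return False
    # Steps 2-3: drop the index entry and keep its offset.
    _, offset = index.pop(i)
    # Step 4: seek to the record and overwrite its first byte with the mark.
    with open(data_path, "r+b") as f:
        f.seek(offset)
        f.write(DELETED_MARK)
    return True
```

The file size does not change: the record's bytes remain on disk until a separate space-reclaiming pass removes them.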
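Data records on disk: R1, R2, R3, R4, R5
Name Index in RAM: R4, R6, R2, R5, R1
Phone Index in RAM: R2, R6, R1, R4, R5
[Figure: R6 is shown beside the data records, and R3 is shown beside each of the two indexes]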
Reclaiming space
A. Create a new file stream.
B. While not at the end of the records:
1. Read a collection of records into a buffer.
2. For each record in the buffer:
If the record is marked deleted, skip to the next record.
Else copy the record to the new file stream.
C. End While
D. Rebuild all indexes based on the new data
file.
NOTE: Buffering is used while copying data to the
new stream.
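One way to sketch steps A to D, under the same assumptions as the earlier examples (newline-terminated records, `|`-separated fields with the key first, and a `*` byte marking a deleted record). The function name `reclaim_space` and the temporary-file suffix are hypothetical. Python file objects are buffered by default, which stands in for the explicit buffer in step B.1.

```python
import os

DELETED_MARK = b"*"   # assumed tombstone byte; the slides do not specify one

def reclaim_space(data_path):
    """Copy live records to a new file and return the rebuilt sorted index."""
    tmp_path = data_path + ".tmp"
    index = []
    # Step A: create the new file stream.
    with open(data_path, "rb") as src, open(tmp_path, "wb") as dst:
        # Steps B-C: iterate over the records (reads are buffered).
        for line in src:
            if line.startswith(DELETED_MARK):
                continue                      # skip records marked deleted
            offset = dst.tell()               # new offset in the compacted file
            dst.write(line)
            key = line.rstrip(b"\n").split(b"|")[0].decode("utf-8")
            index.append((key, offset))
    os.replace(tmp_path, data_path)           # swap the compacted file in
    # Step D: the index is rebuilt from the new offsets.
    index.sort()
    return index
```

The example rebuilds a single index on the first field; an index on another field would be rebuilt in the same pass by collecting that field instead.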
Multilevel Index
When an index gets very big, it cannot
be stored in RAM.
It must be stored in a file, so another,
smaller level of index that can be loaded into
memory is required.
Hence we need multilevel indexing.
Level #4 Index can be loaded into memory
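The idea can be illustrated with just two levels. In this sketch, which is an assumption-laden simplification rather than the scheme drawn on the slide, the lower-level index is split into fixed-size blocks that would live on disk, and the top level, small enough for memory, holds only the first key of each block. The names `split_into_blocks`, `build_top_level`, `lookup`, and the `read_block` callable (standing in for a disk read) are all hypothetical.

```python
import bisect

def split_into_blocks(entries, block_size):
    """Split sorted (key, offset) entries into blocks of at most block_size."""
    return [entries[i:i + block_size] for i in range(0, len(entries), block_size)]

def build_top_level(blocks):
    """Top-level index: the first key of each lower-level block."""
    return [block[0][0] for block in blocks]

def lookup(top_level, read_block, key):
    """Find the data offset for key, reading only one lower-level block."""
    # Pick the last block whose first key is <= key.
    b = bisect.bisect_right(top_level, key) - 1
    if b < 0:
        return None
    block = read_block(b)                      # one simulated disk read
    i = bisect.bisect_left(block, (key,))
    if i < len(block) and block[i][0] == key:
        return block[i][1]
    return None

entries = [("Apple Mac", 985), ("Google", 211), ("IBM", 0),
           ("ITE", 643), ("Microsoft", 462)]
blocks = split_into_blocks(entries, 2)
top_level = build_top_level(blocks)
offset = lookup(top_level, lambda b: blocks[b], "ITE")
```

If the top level itself grows too large for memory, the same construction can be repeated on it, which gives further levels.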