Database Sizing
Idea used from: Lake (2014)
Why and When Size?
• Initially to establish the scale of the
required database to help select OS
environment and DBMS
• Establish HDD requirements
• To get a “feel” for the data:
• which tables need special treatment: separate
tablespaces? Partitioning?….
• Generate statistics which help in physical
design
• Continually monitor
Sizing Basics
Bit
“One of the two digits 0 and 1 used in binary
notation. The word comes from Binary digit”
Byte
“A set of binary digits usually representing
one character, which is treated by the
computer as one unit”
Common Data Types
Type                Bytes    Range
bit                 (1 bit)  0 or 1
tinyint             1        0 to 255
smallint            2        -2^15 to 2^15 - 1
integer             4        -2^31 to 2^31 - 1
decimal(m,n)        8        -10^38 to 10^38 - 1
datetime            8        depends on DBMS
char(n), string(n)  n        Maximum 255
varchar2(n)         n        Maximum 4000
Oracle Date Data Type
Dates are renowned for causing problems
when transferring data between DBMSs
because the method used to store the data
internally differs. For example:
In Oracle the DATE datatype stores the
century, year, month, day, hours, minutes, and
seconds.
Paradox date fields can contain any valid date
from January 1, 9999 BC to December 31, 9999
AD.
Oracle Date Data Type
Example with Oracle:
CREATE TABLE Birthdays_tab (Bname VARCHAR2(20), Bday DATE);
INSERT INTO Birthdays_tab (bname, bday)
VALUES ('ANNIE', TO_DATE('13-NOV-92 10:56 A.M.', 'DD-MON-YY HH:MI A.M.'));
Oracle uses its own internal format to
store dates. Date data is stored in
fixed-length fields of seven bytes each,
corresponding to century, year, month,
day, hour, minute, and second.
Oracle Varchar2 Data Type
The VARCHAR2 datatype stores variable-length character
strings.
You specify a maximum string length (in bytes or characters)
between 1 and 4000 bytes for the VARCHAR2 column.
For each row, Oracle stores each value in the column as a
variable-length field unless a value exceeds the column's
maximum length, in which case Oracle returns an error.
Using VARCHAR2 saves on space used by the table.
For example, storing “PETER” in a column defined as
VARCHAR2(50) will cost only 5 bytes of storage, not 50.
More efficient, but more difficult for sizing!
Oracle LOB Data Types
The LOB datatypes BLOB, CLOB, and BFILE enable you to
store large blocks of unstructured data (such as text,
graphic images, video clips, and sound waveforms) up to 4
gigabytes in size. They provide efficient, random access to
the data.
CLOB is roughly equivalent to a MEMO in Paradox.
You can manipulate and search CLOB fields using special
tools.
Again, sizing is difficult because only the space needed is taken.
There are lots of other data types, but
these will do for the time being!
Row Sizing
Maximum row size can be determined by
ascertaining the data-types of different
columns of the table and adding together
the respective number of bytes.
Create Table SizeDemo (id Integer, Name
Varchar2(20), Dayte DATE) ;
Max Row Length = 4 + 20 +7 = 31 bytes
Max row size is a safe estimate, but it can
considerably overestimate the space actually used.
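The addition above can be sketched in a few lines of Python. This is only an illustration: the helper name max_row_bytes and the TYPE_BYTES lookup are hypothetical, and the byte widths are the ones quoted on these slides rather than values read from a real database.

```python
# Byte widths as quoted on the slides (assumed, not queried from a DBMS).
TYPE_BYTES = {"integer": 4, "date": 7}


def max_row_bytes(columns):
    """Sum the maximum byte width of each (type, declared_length) column."""
    total = 0
    for col_type, length in columns:
        if col_type in ("char", "varchar2"):
            total += length  # declared maximum length n
        else:
            total += TYPE_BYTES[col_type]
    return total


# Columns of the SizeDemo table: id Integer, Name Varchar2(20), Dayte DATE
size_demo = [("integer", None), ("varchar2", 20), ("date", None)]
print(max_row_bytes(size_demo))  # 4 + 20 + 7
```

Because VARCHAR2 columns are counted at their declared maximum, the result is an upper bound rather than a typical row size.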
Oracle Row Sizing (8i onwards)
As an alternative to manual calculation,
the average row size can be discovered
using the ANALYZE statement:
ANALYZE TABLE Member ESTIMATE STATISTICS;
Then query the average stored size of a column
(repeat for each column and add the results):
SELECT AVG(NVL(VSIZE(SURNAME),1)) FROM member;
Tablespace building blocks
• Data blocks are the finest level of granularity
• A data block is the smallest unit of Input/Output
(I/O) used by the database.
• The block size itself will depend upon several things,
including the OS block size, and is set when the
database is created and is not altered thereafter.
• The data block size should be a multiple of the
operating system's block size.
• For a decision support system (DSS) application, it is
suggested that you choose a large value for the
DB_BLOCK_SIZE. For an OLTP type application, a
lower value (e.g., 2k or 4k) is suggested.
• There is no point in bringing back 32K of data from a
disk if the user only needs 2K!
Data blocks
• Regardless of what the block is being used
to store (it could be part of an Extent in a
table segment, or an index segment, or any
other segment) the data block will be of a
set format.
• The overhead: information about the block
(type, count of entries, timestamp, pointers
to items in the block, etc.). This is often no
more than 100 bytes in size.
• The data section (or Row Data): contains
the rows from the table, or branches of an
index.
• Free Space: the area in a block not yet
taken by row data.
Data blocks
• The PCTFREE parameter tells Oracle to stop inserting
rows into a block once its free space falls to that percentage
of the fillable space (20% in the CREATE TABLE example below)
• Free space = Block size-overhead-row data
• Fillable space = Block size-overhead
• The block is now unusable for insertions, and will
remain so until enough rows are deleted to bring
the percentage of the block that is filled with rows
below the PCTUSED parameter setting.
• You do not have to set either parameter: Oracle
defaults to 10% for PCTFREE and 40% for
PCTUSED
• PCTFREE + PCTUSED must not exceed 100
create table grades(g_id integer, Grade varchar2(12)) PCTFREE 20 PCTUSED 60;
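The interplay of the two parameters can be sketched as a small Python function. This is a simplified model of the rule described above, not Oracle's actual free-list logic; the function name accepts_inserts and its arguments are hypothetical.

```python
def accepts_inserts(used_bytes, fillable_bytes, pctfree, pctused, currently_open):
    """Return True if the block may take new rows under the simplified rule.

    currently_open says whether the block was accepting inserts before
    this check; the threshold that applies depends on that state.
    """
    used_pct = 100.0 * used_bytes / fillable_bytes
    free_pct = 100.0 - used_pct
    if currently_open:
        # An open block stays open until free space shrinks to PCTFREE.
        return free_pct > pctfree
    # A closed block reopens only once row data falls below PCTUSED.
    return used_pct < pctused


# PCTFREE 20 and PCTUSED 60, as in the grades table; 1964 fillable bytes assumed.
print(accepts_inserts(1500, 1964, 20, 60, currently_open=True))
print(accepts_inserts(1300, 1964, 20, 60, currently_open=False))
```

The gap between the two thresholds keeps a block from flipping between open and closed on every small insert or delete.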
Block Headers
Block headers vary in size according to the information
in the block (i.e., index data or row data)
block header = fixed header + variable
transaction header + table directory + row
directory
Where:
Fixed Header = 57 bytes
Variable transaction = 23 * initrans
Initrans is the number of transaction slots per block. By
default it is 1 for data and 2 for indexes.
Table Directory = 4 bytes
Row directory = 2 bytes for every row in the block
Ref: Oracle Metalink support
Block Space – Worked example
block header = fixed header + variable transaction header +
table directory + row directory
block header = 57 + (23*1) + 4 + 2x = (84 + 2x) bytes, where x =
number of rows in the block (assumes initrans=1)
available data space = (block size - total block header) -
((block size - total block header) * (PCTFREE/100))
For example, with PCTFREE = 10 and a block size of 2048, the
total space for new data in a block is:
available data space = (2048 - (84 + 2x)) - ((2048 - (84 + 2x)) *
(10/100))
= (1964 - 2x) - ((2048 - 84 - 2x) * (10/100))
= (1964 - 2x) - (1964 - 2x) * 0.1
= (1964 - 2x - 196.4 + 0.2x) bytes
= (1767.6 - 1.8x) bytes, or roughly (1768 - 1.8x) bytes
Ref: Oracle Metalink support
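The header and available-space formulas can be written as two short Python functions, which makes it easy to try other block sizes or PCTFREE values. The function names are hypothetical; the constants (57, 23, 4, 2) are the ones quoted on the slides.

```python
def block_header_bytes(rows, initrans=1, fixed_header=57, table_directory=4):
    """Fixed header + transaction slots + table directory + 2 bytes per row."""
    return fixed_header + 23 * initrans + table_directory + 2 * rows


def available_data_space(block_size, rows, pctfree, initrans=1):
    """Bytes left for row data after the header and the PCTFREE reserve."""
    usable = block_size - block_header_bytes(rows, initrans)
    return usable * (1 - pctfree / 100)


print(block_header_bytes(rows=0))              # header with an empty row directory
print(available_data_space(2048, 0, 10))       # the constant term, about 1767.6
print(available_data_space(2048, 59, 10))      # space left when 59 rows are present
```

Each extra row costs 2 bytes of row directory, of which 1.8 bytes come out of the available space at PCTFREE = 10, which is where the 1.8x term comes from.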
Sizing: Rows per Block
The next Step is to take your average Row Size
and calculate the average number of rows that
can fit into a database block
average number of rows per block =
floor(available data space / average row size)
E.g., for an average row size of 28 bytes:
average number of rows per block = x = (1768 - 1.8x)/28
28x = 1768 - 1.8x
29.8x = 1768
x ≈ 59.3, so about 59 rows per block
Always round the average number of rows per
block DOWN.
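Solving for x by hand each time is error-prone, so the same algebra can be captured in a Python sketch. The function name rows_per_block is hypothetical, and the header constants are again those from the slides.

```python
import math


def rows_per_block(block_size, avg_row_bytes, pctfree, initrans=1):
    """Largest whole number of average-sized rows that fit in one block."""
    fixed = 57 + 23 * initrans + 4   # header bytes that do not depend on row count
    keep = 1 - pctfree / 100         # fraction of the block open to inserts
    # Solve x * avg_row_bytes = (block_size - fixed - 2x) * keep for x,
    # then round down because a partial row cannot be stored.
    return math.floor((block_size - fixed) * keep / (avg_row_bytes + 2 * keep))


print(rows_per_block(2048, 28, 10))
```

With a 2048-byte block, 28-byte rows and PCTFREE = 10 this reproduces the figure of 59 rows per block.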
Table Sizing
Once you know the number of rows that can fit
inside the available space of a database block,
you can calculate the number of blocks required
to hold the proposed table:
number of blocks for the table = number of
rows / average number of rows per block (rounded up)
Using 10,000 rows for table test:
number of blocks for table test = 10000 rows / 59
rows per block
≈ 169.5, so 170 blocks (rounded up)
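A minimal Python sketch of this step follows. It rounds up so that the partially filled last block is counted; the function name blocks_for_table is hypothetical.

```python
import math


def blocks_for_table(row_count, rows_per_block):
    """Blocks needed to hold row_count rows, counting a partial last block."""
    return math.ceil(row_count / rows_per_block)


blocks = blocks_for_table(10_000, 59)
print(blocks)           # number of blocks for table test
print(blocks * 2048)    # the same estimate in bytes for a 2048-byte block
```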
Index SizingThe method is the same, but there are some
differences in the numbers for Index Blocks:
INITRANS is usually = 2
Fixed Header = 113
So block header size = 113 + (23 * 2) bytes = 159
available data space is still= (block size - block
header size) - ((block size - block header size) *
(PCTFREE/100))
Assuming a block size of 2048 bytes and PCTFREE of 10:
available data space = (2048 bytes - 159 bytes) -
((2048 bytes - 159 bytes) * (10/100)) = 1889 bytes -
188.9 bytes = 1700.1 bytes
Index Sizing cont...
Now find the total average column widths of the
columns used in the index.
E.g.: Put an index on the NAME column of SizeDemo,
assuming an average width of 22 bytes.
Take that into our calculation of bytes per index
entry:
bytes per entry = entry header + ROWID length + F + V + D
entry header = 1 byte
ROWID length = 6 bytes
F = total length bytes of all columns with 1 byte column
lengths (CHAR, NUMBER, DATE, and ROWID types)
V = total length bytes of all columns with 3 byte column
lengths (VARCHAR2 and RAW datatypes)
D = combined average data width of the indexed columns = 22 (from above)
Index Sizing cont...
bytes per entry = 1 + 6 + (0 * 1) + (1 * 3) + 22 bytes
= 32 bytes
To calculate the number of blocks and bytes
required for the index, use:
number of blocks for index = 1.1 * ((number of
not null rows * avg. entry size) / avail. data space)
The additional 10% added to this result accounts
for branch blocks of the index.
number of blocks for index = 1.1 * ((10000 * 32
bytes) / 1700)
= 208 blocks (rounded up)
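The index calculation can be sketched end to end in Python. The function names and keyword arguments are hypothetical; the constants (113-byte fixed header, 23 bytes per transaction slot, 1-byte entry header, 6-byte ROWID, 10% branch-block allowance) are the ones given on the slides.

```python
import math


def index_entry_bytes(avg_data_bytes, one_byte_len_cols=0, three_byte_len_cols=0):
    """entry header (1) + ROWID (6) + F + V + D from the slides."""
    return 1 + 6 + one_byte_len_cols * 1 + three_byte_len_cols * 3 + avg_data_bytes


def index_available_space(block_size, pctfree, initrans=2, fixed_header=113):
    """Bytes per index block left after the header and the PCTFREE reserve."""
    usable = block_size - (fixed_header + 23 * initrans)
    return usable * (1 - pctfree / 100)


def blocks_for_index(not_null_rows, entry_bytes, available_bytes):
    """Leaf blocks plus 10% for branch blocks, rounded up."""
    return math.ceil(1.1 * not_null_rows * entry_bytes / available_bytes)


entry = index_entry_bytes(22, three_byte_len_cols=1)   # one VARCHAR2 column
space = index_available_space(2048, 10)
print(entry, space, blocks_for_index(10_000, entry, space))
```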
Database Sizing
Repeat this exercise for all your major tables and
indexes.
The 80/20 rule applies: don’t waste time on lookup
tables, for example; just make an appropriate, but safe,
guess.
Add all the table and index sizes (in blocks) together and
you have the disk space required, in blocks.
To get this value in bytes, multiply by the
database block size.
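The final roll-up is a simple sum, sketched here in Python. The function name database_bytes is hypothetical, and the block counts in the example are illustrative placeholders in the spirit of the worked examples, not measurements.

```python
def database_bytes(block_counts, block_size):
    """Total size in bytes for a mapping of object name to block count."""
    return sum(block_counts.values()) * block_size


# Illustrative block counts for one table and one index (assumed values).
estimate = database_bytes({"test table": 170, "test name index": 208}, 2048)
print(estimate)
```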
