Dbms4

CS 222
Database Management System
Spring 2010-11

Lecture 5
Database Design (Decomposition)

Korra Sathya Babu
Department of Computer Science
NIT Rourkela

Recap
• Design of DB is needed to reduce redundancy and
anomalies
• The theory of Functional Dependency is completely
studied
• Better Design requires schema refinement
• A solution for schema refinement is Synthesis of
relations

2/11/2013 Database Design 2

Relation Decomposition

R1

R-X + X X +- X

R2
R


Relation Decomposition
• Reason for Decomposition
• A solution for reducing redundancy and Anomalies

• Rules for synthesis
• Lossless Join (Information Preservation)

• Dependency Preservation (a special case of information
preservation)

• Decomposition (synthesis) types
• By functional dependency
• By multi-valued dependency
• By Join dependency


Lossless Join
• Definition
A decomposition D = {R1, R2,..., Rm} of R has the lossless
join property with respect to the set of dependencies F on
R if, for every relation r of R that satisfies F, the following
holds, ( R1(r), ..., Rm(r)) = r

where is the natural join of all the relations in D

• The word loss in lossless refers to loss of
information, not to loss of tuples.


Test for Lossless Join
Input: A relation R, a decomposition D = {R1, R2,..., Rm} of R, and a set F of
Functional Dependencies

Lossless Join Test Algorithm:
Step 1: Create an initial matrix S with one row i for each relation Ri in D, and one
column j for each attribute Aj in R.
Step 2: Set S(i, j) := bij for all matrix entries
Step 3: For each row i representing relation schema Ri Do
{for each column j representing Aj do
{if relation Ri includes attribute Aj then
set S(i, j) := aj;}
Step 4: Repeat the following loop until a complete loop execution results in no
changes to S.


Test for Lossless Join
Lossless Join Test Algorithm: continues…
Step 4: Repeat the following loop until a complete loop execution results in no
changes to S.
If {for each function dependency X Y in F do
for all rows in S which have the same symbols in the columns
corresponding to attributes in X do
{make the symbols in each column that correspond to
an attribute in Y be the same in all these rows as follows:
if any of the rows has an “a” symbol for the column,
set the other rows to the same “a” symbol in the column.
If no “a” symbol exists for the attribute in any of the
rows, choose one of the “b” symbols that appear in one
of the rows for the attribute and set the other rows to
that same “b” symbol in the column;}}

Step 5: If a row is made up entirely of “a” symbols, then the
decomposition has the lossless join property;
otherwise it does not.


Example 1

Emp_PROJ

SSN PNUM hours ENAME PNAME PLOCATION

F = {SSN ENAME, PNUM {PNAME, PLOCATION}, {SSN, PNUM} hours}

R1 R2
SSN ENAME PNUM PNAME PLOCATION

R3
SSN PNUM hours


Example 1

A1 A2 A3 A4 A5 A6
SSN ENAME PNUM PNAME PLOCATION hours

R1 b11 b12 b13 b14 b15 b16
R2 b21 b22 b23 b24 b25 b26

R3 b31 b32 b33 b34 b35 b36

R1 a1 a2 b13 b14 b15 b16

R2 b21 b22 a3 a4 a5 b26

R3 a1 b32 a3 b34 b35 a6


Example 1
SSN ENAME
SSN ENAME
R1 a1 a2 b13 b14 b15 b16

R2 b21 b22 a3 a4 a5 b26

R3 a1 a2 a3 b34 b35 a6

PNUM {PNAME, PLOCATION}

PNUM PNAME PLOCATION
R1 a1 a2 b13 b14 b15 b16

R2 b21 b22 a3 a4 a5 b26

R3 a1 a2 a3 a4 a5 a6


Example 2

Emp_PROJ

SSN PNUM hours ENAME PNAME PLOCATION
F = {SSN ENAME, PNUM {PNAME, PLOCATION}, {SSN, PNUM} hours}

R1 R2

ENAME PLOCATION SSN PNUM hours PNAME PLOCATION


Example 2

A1 A2 A3 A4 A5 A6
SSN ENAME PNUM PNAME PLOCATION hours

R1 b11 b12 b13 b14 b15 b16
R2 b21 b22 b23 b24 b25 b26

R1 b11 a2 b13 b14 a5 b16
R2 a1 b22 a3 a4 a5 a6

SSN ENAME
PNUM {PNAME, PLOCATION}
{SSN, PNUM} hours


Problems
• Check whether the following decompositions are
lossy or lossless
• Let R=ABCDE, R1=AD, R2=AB, R3=BE, R4=CDE, R5=AE.
Let F={AC, BC, CD, DEC, CEA}
• R(XYZWQ), FD={XZ, YZ, ZW, WQZ, ZQX}.
R1(XW), R2(XY), R3(YQ), R4(ZWQ), R5(XQ)
• R(XYZ), F={XY, ZY}. R1(XY), R2(YZ)
• R(XYWZPQ), D={R1(ZPQ), R2(XYZPQ)}
F={XYW, XWP, PQZ, XYQ}


Dependency Preservation
R was decomposed (normalisation) into R1, …, Rn
S - the set of FDs for R
S1, …, Sn - the set of FDs for R1, …, Rn (each Si refers
to only the attributes of Ri)
S’ = S1 … Sn (usually, S’ S)

the decomposition is dependency preserving if
S’+ = S+


Test for Dependency Preservation
Input: decomposition D={D1,…,Dk} and a set of FDs F
Dependency Preservation Test:
Step 1: For each XY Є F initialize a set T of attributes with the attributes of X (the
determinant of the FD under consideration). ie set T=X and continue with step 2
Step 2: Repeat step 3 until the set T no longer changes. When T no longer
changes continue with step 4
Step 3: For each relation Ri (1≤ i ≤ k) of the input decomposition apply
the corresponding Ri operation (on a set of attributes T with respect to set
of dependencies F). i.e T=T ∩ ((T ∪ Ri)+ ∩ Ri) and repeat step 3
Step 4: Test to see if Y(the right hand side of the FD under consideration)
is such that Y ⊂ T. There are two outcomes to this test. If the answer is
negative. i.e. if Y not a subset of T then stop the execution of the
algorithm and report that the decomposition does not preserve the FD. If
the answer is affirmative, i.e. if Y ⊂ T then XY Є G+. If there are other
FDs in F that need to be considered repeat step 1 with a FD that has not
been considered before. If no more FDs in F then continue with step 4


Problems

1. Given R(XYZ) and the set F = {ZX , XYZ}. Check if the
decomposition R1(XY) and R2(XZ) preserve the set F.
2. Given R(ABCD) and the set F = {AB , CD}. Check if the
decomposition R1(AB) and R2(CD) preserve the set F.
3. Determine if the decomposition D={R1(XY), R2(YZ), R3(ZW)} of the
relation R(WXYZ) preserves the dependencies of the set F={XY,
YZ, ZW, WX}.
4. Given R(ABCDEF) and the set F = {AB , CDF, ACE, DF}. Check
if the decomposition R1(ACE), R2(CD), R3(DF) and R4(AB) preserve
the set F.


Normalization
• Normalization is the process of successive reduction
of a given set of relations to a better form (reduced
redundancy and anomalies)
• The normalization that one needs to sustain
depends on the work flow (tradeoff between fast
access, maintenance of integrity)
• Assumes that all possible functional dependencies
are known
• First construct a minimal set of FDs
• Then apply algorithms that construct a required Normal
Form
• Additional criteria may be needed to ensure that the
set of relations in a relational database are
atisfactory

1 NF
• A relation is in first normal form (1NF) if it does not
contain any repeating columns or repeating groups
of columns
• It is the process of converting complex data
structures into more simple, stable data structures
• A relvar is in 1NF if and only if in every legal value
of that relvar, every tuple contains exactly one
value for each attribute
• First Normal From (1NF)
• Unique rows
• All attributes are atomic


2 NF
• A table is in the second normal form (2NF) if it is in
the first normal form and if all non-key columns in
the table depend on the entire primary key
• The following relation is in 1NF but not 2NF

EMPLOYEE2(Emp_ID, Name, Dept, Salary, Course, Date_Completed)

Functional dependencies:
1. Emp_ID  Name, Dept, Salary partial key dependency
2. Emp_ID, Course  Date_Completed

Decompose into 2NF
EMPLOYEE1(Emp_ID, Name, Dept, Salary)
Functional dependencies: Emp_ID Name, Dept, Salary

EMPCOURSE(Emp_ID, Course,Date_Completed)
Functional dependency: Emp_ID, Course  Date_Completed


3 NF
• A table is in the third normal form (3NF) if it is in
the second normal form and if all non-key columns
in the table depend non-transitively on the entire
primary key

SALES(Customer_ID, Customer_Name, SalesPerson, Region)
1. Customer_ID  Customer_Name, SalesPerson, Region
2. SalesPerson  Region Transitive Dependency

Decompose into 3NF
SALES1(Customer_ID, Customer_Name, SalesPerson)
Functional dependencies: Customer_ID Customer_Name, SalesPerson

SPERSON(SalesPerson, Region)
Functional dependency: SalesPerson  Region


BCNF
• A table is in Boyce-Codd normal form (BCNF) if
every column, on which some other column is fully
functionally dependent, is also a candidate for the
primary key of the table
• A table is in BCNF if the only determinants in the
table are the candidate keys
SCHOOL(Student, Subject, Teacher)
1. Student, Subject  Teacher
2. Student, Teacher  Subject
3. Teacher  Subject

Decompose into BCNF
SCHOOL1(Student, Subject)
SCHOOL2(Subject, Teacher)

All Functional Dependencies vanished except TeacherSubject


Comparison between 3NF and BCNF

• It is always possible to decompose a relation into
relations in 3NF such that:
 the decomposition is lossless
 the dependencies are preserved

• It is always possible to decompose a relation into
relations in BCNF such that:
 the decomposition is lossless
 but it may not be possible to preserve dependencies
 But may eliminate more redundancy


Multivalued Dependency
Let R be a relation schema and let R and R. The
multivalued dependency

holds on R if in any legal relation r(R), for all pairs for
tuples t1 and t2 in r such that t1[ ] = t2 [ ], there exist tuples t3 and t4
in r such that:
t1[ ] = t2 [ ] = t3 [ ] = t4 [ ]
t3[ ] = t1 [ ]
t3[R – ] = t2[R – ]
t4 ] = t2[ ]
t4[R – ] = t1[R – ]

• MVD is a tuple generating Dependency


4 NF
• A table is in the fourth normal form (4 NF) if it is in
BCNF and does not have any independent multi-
valued parts of the primary key
• If there are two attributes A and B and for a given
value of A if there exists multiple values of B, then
we say that an MVD exists between A and B
• The normal forms after BCNF are theoretical
interests


4 NF
Student Table
Student Subject Language
Geeta Mythology English
Geeta Psychology English
Geeta Mythology Hindi
Geeta Psychology Hindi
Shekher Gardening English

Student Subject
Student Language


4 NF

Split the independent multi-valued components of the
primary key into two tables
The primary key is (student subject language)

Student_Subject Table Student_Language Table
Student Subject Student Language
Geeta Mythology Geeta English
Geeta Psychology Geeta Hindi
Shekher Gardening Shekher English

Here we take care of the update anomaly


Surprise: Loss less Decomposition
• There exists relations that cannot be nonloss-
decomposed into two projects, but can be
decomposed into three or more


Join Dependency
• Definition: A relation R satisfies the join
Dependency (JD) *(X,Y,…,Z)
iff R is equal to the join of its projects on
X,Y,..,Z, where X,Y,..,Z are subsets of the set of
attributes of R.
• Consider the following Suppliers(S), Parts(P) and Location they
Supply (L) table
SPL Table
S P L
S P P L
S1 P1 L2
ACTUAL S1 P1 P1 L2
DECOMPOSTION
S1 P2 L1
S1 P2 P2 L1
S2 P1 L1
S2 P1 P1 L1
S1 P1 L1


Join Dependency

S P L
S P P L
S1 P1 L2
ACTUAL S1 P1 P1 L2
DECOMPOSTION
S1 P2 L1
S1 P2 P2 L1
S2 P1 L1
S2 P1 P1 L1
S1 P1 L1
Join

S P L
S1 P1 L2
S1 P2 L1
S2 P1 L1
S1 P1 L1
Spurious S2 P1 L2
Tuple

Join Dependency

S P L
S P P L L S
S1 P1 L2
DECOMPOSTION S1 P1 P1 L2 L2 S1
S1 P2 L1
S1 P2 P2 L1 L1 S1
S2 P1 L1
S2 P1 P1 L1 L2 S2
S1 P1 L1
Join

S P L
S1 P1 L2
S1 P2 L1
S2 P1 L1
S1 P1 L1


5 NF
• A table is in fifth normal form (5NF) if it is in the
fourth normal form and every join dependency in
the table is implied by the candidate key
• Its also called as the Project Join Normal Form
(PJNF)


Normalization
Un-normalized Relation
Arrange every atomic value in the cell
(intersection of row and column) of a table

First Normal Form (1NF)
Eliminate Partial Dependencies

Second Normal Form (2NF)
Eliminate Transitive Dependencies

Third Normal Form (3NF)
Make every determinant as a key

Boyce-Codd Normal Form
Eliminate Multi-valued Dependencies
that are not Functional Dependencies
Fourth Normal Form (4NF)
Eliminate Join Dependencies that are not
implied by Candidate keys
Fifth Normal Form (5NF)

Denormalization
• Denormalization if a process in which we retain or
introduce some amount of redundancy for faster
data access
• Where there arise tradeoffs


Summary
• Normalization helps to reduce redundancy and few
anomalies
• The first 3 (1, 2 and 3) normal forms are practical
but BCNF, 4NF and 5 NF are more of theoretical
interests
• Denormalization is done for fast access


Dbms4

More Related Content

What's hot (20)

Viewers also liked (8)

Similar to Dbms4 (20)

Dbms4