Prof P Sreenivasa Kumar
Department of CS&E, IITM
1
Database Design and Normal Forms
Database Design
coming up with a ‘good’ schema is very important
How do we characterize the “goodness” of a schema ?
If two or more alternative schemas are available
how do we compare them ?
What are the problems with “bad” schema designs ?
Normal Forms:
Each normal form specifies certain conditions
If the conditions are satisfied by the schema
certain kinds of problems are avoided
Details follow….
An Example
student relation with attributes: studName, rollNo, sex, studDept
department relation with attributes: deptName, officePhone, hod
Several students belong to a department.
studDept gives the name of the student’s department.
Correct schema:
Student (studName, rollNo, sex, studDept)
Department (deptName, officePhone, HOD)
Incorrect schema:
Student Dept (studName, rollNo, sex, deptName, officePhone, HOD)
What are the problems that arise ?
Problems with bad schema
Redundant storage of data:
Office Phone & HOD info - stored redundantly
once with each student that belongs to the department
wastage of disk space
A program that updates Office Phone of a department
must change it at several places
• more running time
• error-prone
Transactions running on a database
must take as little time as possible to increase transaction
throughput
Update Anomalies
Another kind of problem with bad schema
Insertion anomaly:
No way of inserting info about a new department unless
we also enter details of a (dummy) student in the department
Deletion anomaly:
If all students of a certain department leave
and we delete their tuples,
information about the department itself is lost
Update Anomaly:
Updating officePhone of a department
• value in several tuples needs to be changed
• if a tuple is missed - inconsistency in data
Normal Forms
First Normal Form (1NF) - included in the definition of a relation
Second Normal Form (2NF), Third Normal Form (3NF) and
Boyce-Codd Normal Form (BCNF) - defined in terms of functional dependencies
Fourth Normal Form (4NF) - defined using multivalued dependencies
Fifth Normal Form (5NF) or Project Join Normal Form (PJNF) - defined using join dependencies
Functional Dependencies
A functional dependency (FD) X → Y
(read as X determines Y) (X ⊆ R, Y ⊆ R)
is said to hold on a schema R if
in any instance r on R,
if two tuples t1, t2 (t1 ≠ t2, t1 ∈ r, t2 ∈ r)
agree on X i.e. t1 [X] = t2 [X]
then they also agree on Y i.e. t1 [Y] = t2 [Y]
Note: If K ⊆ R is a key for R then for any A ∈ R,
K → A
holds: no two distinct tuples of an instance agree on K,
so the above if … then condition is vacuously true
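The definition above can be tried out on a concrete instance. The following is a minimal sketch in Python; the function name `fd_holds` and the sample tuples are made up for illustration and are not part of the slides. Note that a single instance can only refute an FD: an FD holds on a schema only if it holds on every legal instance.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """Check whether the FD lhs -> rhs holds on one instance.

    rows: list of dicts (attribute name -> value), one dict per tuple.
    lhs, rhs: lists of attribute names.
    """
    for t1, t2 in combinations(rows, 2):
        agree_on_lhs = all(t1[a] == t2[a] for a in lhs)
        differ_on_rhs = any(t1[a] != t2[a] for a in rhs)
        if agree_on_lhs and differ_on_rhs:
            return False  # two tuples agree on X but not on Y
    return True

# Made-up sample instance (assumption, for illustration only).
students = [
    {"rollNo": "CS01", "studName": "Kiran", "sex": "F", "studDept": "CSE"},
    {"rollNo": "CS02", "studName": "Ravi", "sex": "M", "studDept": "CSE"},
    {"rollNo": "EE01", "studName": "Kiran", "sex": "M", "studDept": "EE"},
]

print(fd_holds(students, ["rollNo"], ["studName", "sex", "studDept"]))  # True
print(fd_holds(students, ["studName"], ["rollNo"]))  # False
```

Since rollNo is a key, no two tuples agree on it and the check passes vacuously, as the note says.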
Functional Dependencies – Examples
Consider the schema:
Student ( studName, rollNo, sex, dept, hostelName, roomNo)
Since rollNo is a key, rollNo → {studName, sex, dept,
hostelName, roomNo}
Suppose that each student is given a hostel room exclusively, then
hostelName, roomNo → rollNo
Suppose boys and girls are accommodated in separate hostels, then
hostelName → sex
FDs are additional constraints that can be specified by designers
Trivial / Non-Trivial FDs
An FD X → Y where Y ⊆ X
- called a trivial FD, it always holds good
An FD X → Y where Y ⊈ X
- non-trivial FD
An FD X → Y where X ∩ Y = ∅
- completely non-trivial FD
Deriving new FDs
Given that a set of FDs F holds on R
we can infer that a certain new FD must also hold on R
For instance,
given that X → Y, Y → Z hold on R
we can infer that X → Z must also hold
How to systematically obtain all such new FDs ?
Unless all FDs are known, a relation schema is not fully specified
Entailment relation
We say that a set of FDs F ⊨ {X → Y}
(read as F entails X → Y or
F logically implies X → Y)
if in every instance r of R on which FDs F hold,
FD X → Y also holds.
Armstrong came up with several inference rules
for deriving new FDs from a given set of FDs
We define F+ = {X → Y | F ⊨ X → Y}
F+ : Closure of F
Armstrong’s Inference Rules (1/2)
1. Reflexive rule
F ⊨ {X → Y | Y ⊆ X} for any X. Trivial FDs
2. Augmentation rule
{X → Y} ⊨ {XZ → YZ}, Z ⊆ R. Here XZ denotes X ⋃ Z
3. Transitive rule
{X → Y, Y → Z} ⊨ {X → Z}
4. Decomposition or Projective rule
{X → YZ} ⊨ {X → Y}
5. Union or Additive rule
{X → Y, X → Z} ⊨ {X → YZ}
6. Pseudo transitive rule
{X → Y, WY → Z} ⊨ {WX → Z}
Armstrong's Inference Rules (2/2)
Rules 4, 5, 6 are not really necessary.
For instance, Rule 5: {X → Y, X → Z} ⊨ {X → YZ} can be
proved using 1, 2, 3 alone
1) X → Y (given)
2) X → Z (given)
3) X → XY Augmentation rule on 1 (augmenting with X)
4) XY → ZY Augmentation rule on 2 (augmenting with Y)
5) X → ZY Transitive rule on 3, 4.
Similarly, 4, 6 can be shown to be unnecessary.
But it is useful to have 4, 5, 6 as short-cut rules
Sound and Complete Inference Rules
Armstrong showed that
Rules (1), (2) and (3) are sound and complete.
These are called Armstrong’s Axioms (AA)
Soundness:
Every new FD X → Y derived from a given set of FDs F
using Armstrong's Axioms is such that F ⊨{X → Y}
Completeness:
Any FD X → Y logically implied by F (i.e. F ⊨ {X → Y})
can be derived from F using Armstrong’s Axioms
Proving Soundness
Suppose X → Y is derived from F using AA in some n steps.
If each step is correct then overall deduction would be correct.
Single step: Apply Rule (1) or (2) or (3)
Rule (1) – obviously results in correct FDs
Rule (2) – {X → Y}⊨ {XZ → YZ}, Z ⊆ R
Suppose t1, t2 ∈ r agree on XZ
⇒ t1, t2 agree on X
⇒ t1, t2 agree on Y (since X → Y holds on r)
⇒ t1, t2 agree on YZ (they already agree on Z, as they agree on XZ)
Hence Rule (2) gives rise to correct FDs
Rule (3) – {X → Y, Y → Z} ⊨ X → Z
Suppose t1, t2 ∈ r agree on X
⇒ t1, t2 agree on Y (since X → Y holds)
⇒ t1, t2 agree on Z (since Y → Z holds)
Proving Completeness of Armstrong’s Axioms (1/4)
Define X+_F (closure of X wrt F)
= {A | X → A can be derived from F using AA}, A ∈ R
Claim1:
X → Y can be derived from F using AA iff Y ⊆ X+
(If) Let Y = {A1, A2,…, An}. Y ⊆ X+
⇒ X → Ai can be derived from F using AA (1 ≤ i ≤ n)
By union rule, it follows that X → Y can be derived from F.
(Only If) X → Y can be derived from F using AA
By projective rule X → Ai (1 ≤ i ≤ n)
Thus by definition of X+, Ai ∈ X+
⇒ Y ⊆ X+
Completeness of Armstrong’s Axioms (2/4)
Completeness:
(F ⊨ {X → Y}) ⇒ X → Y follows from F using AA
We will prove the contrapositive:
X →Y can’t be derived from F using AA
⇒ F ⊭ {X → Y}
⇒ ∃ a relation instance r on R st all the FDs of
F hold on r but X → Y doesn’t hold.
Consider the relation instance r with just two tuples:
        X+ attributes    Other attributes
r:      1 1 1 … 1        1 1 1 … 1
        1 1 1 … 1        0 0 0 … 0
Completeness Proof (3/4)
Claim 2: All FDs of F are satisfied by r
Suppose not. Let W → Z in F be an FD not satisfied by r
Then W ⊆ X+ and Z ⊈ X+ (the two tuples of r agree exactly on the X+ attributes)
Let A ∈ Z – X+
Now, X → W follows from F using AA as W ⊆ X+ (claim 1)
X → Z follows from F using AA by transitive rule (from X → W and W → Z)
Z → A follows from F using AA by reflexive rule as A ∈ Z
X → A follows from F using AA by transitive rule
By definition of closures, A must belong to X+
- a contradiction.
Hence the claim.
        X+           R – X+
r:      1 1 1 … 1    1 1 1 … 1
        1 1 1 … 1    0 0 0 … 0
Completeness Proof (4/4)
Claim 3: X → Y is not satisfied by r
Suppose not
The two tuples agree on X (as X ⊆ X+), so they must agree on Y
Because of the structure of r, Y ⊆ X+
⇒ X → Y can be derived from F using AA
contradicting the assumption about X → Y
Hence the claim
Thus, whenever X → Y doesn’t follow from F using AA,
F doesn’t logically imply X → Y
Armstrong’s Axioms are complete.
Consequence of Completeness of AA
X+ = {A | X → A follows from F using AA}
= {A | F ⊨ X → A}
Similarly
F+ = {X → Y | F ⊨ X → Y}
= {X → Y | X → Y follows from F using AA}
Computing closures
The size of F+ can sometimes be exponential in the size of F.
For instance, F = {A → B1, A → B2,….., A → Bn}
F+ ⊇ {A → X | X ⊆ {B1, B2,…,Bn}}
Thus |F+| ≥ 2^n
Computing F+ : computationally expensive
Fortunately, checking if X → Y ∈ F+
can be done by checking if Y ⊆ X+_F
Computing attribute closure (X+_F) is easier
Computing X+_F
We compute a sequence of sets X0, X1,… as follows:
X0 := X; // X is the given set of attributes
Xi+1 := Xi ∪ {A | there is a FD Y → Z in F
and A ∈ Z and Y ⊆ Xi}
Since X0 ⊆ X1 ⊆ X2 ⊆ ... ⊆ Xi ⊆ Xi+1 ⊆ ... ⊆ R
and R is finite,
there is an integer i st Xi = Xi+1 = Xi+2 = …
and X+_F is equal to Xi.
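The iteration above can be sketched in Python as follows. This is an illustrative sketch, not code from the lecture: the name `closure` and the representation of FDs as (lhs, rhs) pairs of sets are assumptions. It adds attributes as soon as an FD becomes applicable rather than building X0, X1, … one set per pass, but it stops at the same fixed point.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs with respect to fds.

    fds: list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # An FD Y -> Z fires once Y is inside the current set.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# FDs of the Student schema from the examples slide earlier in the deck.
student_fds = [
    ({"rollNo"}, {"studName", "sex", "dept", "hostelName", "roomNo"}),
    ({"hostelName", "roomNo"}, {"rollNo"}),
    ({"hostelName"}, {"sex"}),
]

print(sorted(closure({"hostelName"}, student_fds)))
# ['hostelName', 'sex']
print(len(closure({"hostelName", "roomNo"}, student_fds)))
# 6
```

Checking whether X → Y is in F+ then reduces to the test `set(Y) <= closure(X, F)`.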
Normal Forms – 2NF
Full functional dependency:
An FD X → A for which there is no proper subset Y of X
such that Y → A
(A is said to be fully functionally dependent on X)
2NF: A relation schema R is in 2NF if
every non-prime attribute is fully functionally dependent on every
key of R
prime attribute: An attribute that is part of some key
non-prime attribute: An attribute that is not part of any key
Example
1) Book (authorName, title, authorAffiliation, ISBN, publisher,
pubYear )
Keys: (authorName, title), ISBN
Not in 2NF as authorName → authorAffiliation
(authorAffiliation is not fully functionally dependent on the
first key)
2) Student (rollNo, name, dept, sex, hostelName, roomNo,
admitYear)
Keys: rollNo, (hostelName, roomNo)
Not in 2NF as hostelName → sex
student (rollNo, name, dept, hostelName, roomNo, admitYear)
hostelDetail (hostelName, sex)
- These are both in 2NF
Transitive Dependencies
Transitive dependency:
An FD X → Y in a relation schema R for which there is a set of
attributes Z ⊆ R such that
X → Z and Z → Y and Z is not a subset of any key of R
Ex: student (rollNo, name, dept, hostelName, roomNo, headDept)
Keys: rollNo, (hostelName, roomNo)
rollNo → dept; dept → headDept hold
So, rollNo → headDept is a transitive dependency
The head of department D is stored redundantly in every tuple
where D appears.
Relation is in 2NF but redundancy still exists.
Normal Forms – 3NF
Relation schema R is in 3NF if it is in 2NF and no non-prime
attribute of R is transitively dependent on any key of R
student (rollNo, name, dept, hostelName, roomNo, headDept)
is not in 3NF
Decompose: student (rollNo, name, dept, hostelName, roomNo)
deptInfo (dept, headDept)
both in 3NF
Redundancy in data storage - removed
Another definition of 3NF
Relation schema R is in 3NF if for any nontrivial FD X → A
either (i) X is a superkey or (ii) A is prime.
Suppose some R violates the above definition
⇒ There is an FD X → A for which both (i) and (ii) are false
⇒ X is not a superkey and A is non-prime attribute
Two cases arise:
1) X is contained in a key – A is not fully functionally dependent
on this key
- violation of 2NF condition
2) X is not contained in a key
K → X, X → A is a case of transitive dependency
(K – any key of R)
Motivating example for BCNF
gradeInfo (rollNo, studName, course, grade)
Suppose the following FDs hold:
1) rollNo, course → grade
2) studName, course → grade
3) rollNo → studName
4) studName → rollNo
Keys: (rollNo, course), (studName, course)
For 1,2 lhs is a key. For 3,4 rhs is prime
So gradeInfo is in 3NF
But studName is stored redundantly along with every course
being done by the student
Boyce - Codd Normal Form (BCNF)
Relation schema R is in BCNF if for every nontrivial
FD X → A, X is a superkey of R.
In gradeInfo, FDs 3, 4 are nontrivial but lhs is not a superkey
So, gradeInfo is not in BCNF
Decompose:
gradeInfo (rollNo, course, grade)
studInfo (rollNo, studName)
Redundancy allowed by 3NF is disallowed by BCNF
BCNF is stricter than 3NF
3NF is stricter than 2NF
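The 3NF and BCNF conditions can be checked mechanically on gradeInfo. The sketch below is illustrative only: the helper names (`closure`, `candidate_keys`, `violations`) are assumptions, candidate keys are found by brute force over attribute subsets (exponential, fine for small schemas), and only the listed FDs are examined, which is the usual textbook shortcut when testing the original schema.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so not minimal
            if closure(cand, fds) == schema:
                keys.append(cand)
    return keys

def violations(schema, fds, form):
    """List (X, A) pairs from the given FDs that violate form ('3NF' or 'BCNF')."""
    schema = set(schema)
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        for a in sorted(set(rhs) - set(lhs)):      # nontrivial part only
            if closure(lhs, fds) >= schema:        # X is a superkey
                continue
            if form == "3NF" and a in prime:       # A is prime
                continue
            bad.append((set(lhs), a))
    return bad

grade_info = {"rollNo", "studName", "course", "grade"}
grade_fds = [
    ({"rollNo", "course"}, {"grade"}),
    ({"studName", "course"}, {"grade"}),
    ({"rollNo"}, {"studName"}),
    ({"studName"}, {"rollNo"}),
]

print(violations(grade_info, grade_fds, "3NF"))   # []
print(violations(grade_info, grade_fds, "BCNF"))  # FDs 3 and 4
```

The empty 3NF list and the two BCNF violations match the discussion of FDs 3 and 4 above.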
Decomposition of a relation schema
If R doesn’t satisfy a particular normal form,
we decompose R into smaller schemas
What’s a decomposition?
R = (A1, A2,…, An)
D = (R1, R2,…, Rk) st Ri ⊆ R and R = R1 ∪ R2 ∪ … ∪ Rk
(Ri’s need not be disjoint)
Replacing R by R1, R2,…, Rk – process of decomposing R
Ex: gradeInfo (rollNo, studName, course, grade)
R1: gradeInfo (rollNo, course, grade)
R2: studInfo (rollNo, studName)
Desirable Properties of Decompositions
Not all decompositions of a schema are useful
We require two properties to be satisfied
(i) Lossless join property
- the information in an instance r of R must be preserved in the
instances r1, r2,…,rk where ri = π_Ri(r)
(ii) Dependency preserving property
- if a set F of dependencies holds on R it should be possible to
enforce F by enforcing appropriate dependencies on each ri
Lossless join property
F – set of FDs that hold on R
R – decomposed into R1, R2,…,Rk
Decomposition is lossless wrt F if
for every relation instance r on R satisfying F,
r = π_R1(r) * π_R2(r) * … * π_Rk(r)
Lossless joins are also called non-additive joins
Example of a lossy join:
R = (A, B, C); R1 = (A, B); R2 = (B, C)
r:   A  B  C      r1:  A  B      r2:  B  C
     a1 b1 c1          a1 b1          b1 c1
     a2 b2 c2          a2 b2          b2 c2
     a3 b1 c3          a3 b1          b1 c3
r1 * r2:  A  B  C
          a1 b1 c1
          a1 b1 c3   (spurious tuple)
          a2 b2 c2
          a3 b1 c1   (spurious tuple)
          a3 b1 c3
Original info is distorted
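The lossy join in the example above can be reproduced with a few lines of Python. This is a sketch under assumed representations: a relation is a set of frozensets of (attribute, value) pairs, and the helper names `rel`, `project` and `natural_join` are made up for illustration.

```python
def rel(attrs, rows):
    """Build a relation: a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}

def project(relation, attrs):
    """Projection of a relation onto the attribute set attrs."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}

def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all common attributes."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

r = rel("ABC", [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b1", "c3")])
r1 = project(r, {"A", "B"})
r2 = project(r, {"B", "C"})

joined = natural_join(r1, r2)
spurious = joined - r

print(len(joined))  # 5
for t in sorted(sorted(t) for t in spurious):
    print(t)
```

The join always contains the original tuples; the decomposition is lossy exactly when it contains more, as with the two spurious tuples here.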
Dependency Preserving Decompositions
Decomposition D = (R1, R2,…,Rk) of schema R preserves a set
of dependencies F if
(π_R1(F) ∪ π_R2(F) ∪ … ∪ π_Rk(F))+ = F+
Here, π_Ri(F) = { (X → Y) ∈ F+ | X ⊆ Ri, Y ⊆ Ri}
(called projection of F onto Ri)
Informally, any FD that logically follows from F must also
logically follow from the union of projections of F onto Ri’s
Then, D is called dependency preserving.
An example
Schema R = (A, B, C)
FDs F = {A → B, B → C, C → A}
Decomposition D = (R1 = {A, B}, R2 = {B, C})
π_R1(F) = {A → B, B → A}
π_R2(F) = {B → C, C → B}
(π_R1(F) ∪ π_R2(F))+ = {A → B, B → A,
B → C, C → B,
A → C, C → A} = F+
Hence Dependency preserving
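Computing the projections explicitly can be expensive, since they are defined through F+. A common textbook alternative, sketched here in Python, tests each FD X → Y of F by growing X using only what each Ri can "see". This is not the method shown on the slide; the function names are assumptions. The second check uses the schema R(A, B, C) with AB → C and C → B, which this deck also discusses as a case where dependencies are lost.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves(fd, decomposition, fds):
    """True if fd follows from the union of the projections of fds."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # Attributes of Ri determined by the part of result inside Ri.
            gained = (closure(result & ri, fds) & ri) - result
            if gained:
                result |= gained
                changed = True
    return set(rhs) <= result

def dependency_preserving(decomposition, fds):
    return all(preserves(fd, decomposition, fds) for fd in fds)

f1 = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"A"})]
d1 = [{"A", "B"}, {"B", "C"}]
print(dependency_preserving(d1, f1))  # True

f2 = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
d2 = [{"A", "C"}, {"C", "B"}]
print(dependency_preserving(d2, f2))  # False
```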
Testing for lossless decomposition property(1/6)
R – given schema with attributes A1,A2, …, An
F – given set of FDs
D – {R1,R2, …, Rm} given decomposition of R
Is D a lossless decomposition?
Create an m × n matrix S with columns labeled as A1,A2, …, An
and rows labeled as R1,R2, …, Rm
Initialize the matrix as follows:
set S(i,j) as symbol bij for all i,j.
if Aj is in the scheme Ri, then set S(i,j) as symbol aj , for all i,j
Testing for lossless decomposition property(2/6)
After S is initialized, we carry out the following process on it:
repeat
for each functional dependency U → V in F do
for all rows in S which agree on U-attributes do
make the symbols in each V- attribute column
the same in all the rows as follows:
if any of the rows has an “a” symbol for the column
set the other rows to the same “a” symbol in the column
else // if no “a” symbol exists in any of the rows
choose one of the “b” symbols that appears
in one of the rows for the V-attribute and
set the other rows to that “b” symbol in the column
until no changes to S
At the end, if there exists a row with all “a” symbols then D is
lossless otherwise D is a lossy decomposition
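The matrix procedure above can be sketched in Python as follows. The function name `is_lossless` and the symbol encoding are assumptions: the distinguished symbol aj is stored as ("a", Aj) and bij as ("b", i, Aj). One deliberate difference from the wording above: when symbols are equated, the sketch renames them in every row of that column (the standard chase step), not only in the rows that triggered the change. The first check uses the advisor schema that this deck works through as an example.

```python
def is_lossless(schema, fds, decomposition):
    """Matrix (chase) test for the lossless join property."""
    # Initialise S: "a" symbol where Aj is in Ri, otherwise a distinct "b" symbol.
    S = [{A: ("a", A) if A in ri else ("b", i, A) for A in schema}
         for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            lhs = sorted(lhs)
            # Group the rows that agree on all U-attributes.
            groups = {}
            for row in S:
                groups.setdefault(tuple(row[A] for A in lhs), []).append(row)
            for rows in groups.values():
                for A in rhs:
                    symbols = {row[A] for row in rows}
                    if len(symbols) > 1:
                        # ("a", A) sorts before any ("b", i, A), so min() prefers
                        # the "a" symbol and otherwise picks one "b" symbol.
                        target = min(symbols)
                        for row in S:
                            if row[A] in symbols:
                                row[A] = target
                        changed = True
    return any(all(row[A] == ("a", A) for A in schema) for row in S)

schema = ["rollNo", "name", "advisor", "advisorDept", "course", "grade"]
fds = [
    ({"rollNo"}, {"name"}),
    ({"rollNo"}, {"advisor"}),
    ({"advisor"}, {"advisorDept"}),
    ({"rollNo", "course"}, {"grade"}),
]
D = [{"rollNo", "name", "advisor"}, {"advisor", "advisorDept"},
     {"rollNo", "course", "grade"}]

print(is_lossless(schema, fds, D))  # True
print(is_lossless(["A", "B", "C"], [], [{"A", "B"}, {"B", "C"}]))  # False
```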
Testing for lossless decomposition property(3/6)
R = (rollNo, name, advisor, advisorDept, course, grade)
FD’s = { rollNo → name; rollNo → advisor; advisor → advisorDept
rollNo, course → grade}
D : { R1 = (rollNo, name, advisor), R2 = (advisor, advisorDept),
R3 = (rollNo, course, grade) }
Matrix S : (Initial values)
     rollNo  name  advisor  advisorDept  course  grade
R1   a1      a2    a3       b14          b15     b16
R2   b21     b22   a3       a4           b25     b26
R3   a1      b32   b33      b34          a5      a6
Testing for lossless decomposition property(4/6)
R = (rollNo, name, advisor, advisorDept, course, grade)
FD’s = { rollNo → name; rollNo → advisor; advisor → advisorDept
rollNo, course → grade}
D : { R1 = (rollNo, name, advisor), R2 = (advisor, advisorDept),
R3 = (rollNo, course, grade) }
Matrix S : (After enforcing rollNo → name & rollNo → advisor)
(R1 and R3 agree on rollNo, so in R3 b32 becomes a2 and b33 becomes a3)
     rollNo  name  advisor  advisorDept  course  grade
R1   a1      a2    a3       b14          b15     b16
R2   b21     b22   a3       a4           b25     b26
R3   a1      a2    a3       b34          a5      a6
Testing for lossless decomposition property(5/6)
R = (rollNo, name, advisor, advisorDept, course, grade)
FD’s = { rollNo → name; rollNo → advisor; advisor → advisorDept
rollNo, course → grade}
D : { R1 = (rollNo, name, advisor), R2 = (advisor, advisorDept),
R3 = (rollNo, course, grade) }
Matrix S : (After enforcing advisor → advisorDept )
(all three rows now agree on advisor, so b14 and b34 become a4)
     rollNo  name  advisor  advisorDept  course  grade
R1   a1      a2    a3       a4           b15     b16
R2   b21     b22   a3       a4           b25     b26
R3   a1      a2    a3       a4           a5      a6
No more changes. Third row with all a symbols. So a lossless join.
Testing for lossless decomposition property(6/6)
R – given schema. F – given set of FDs
The decomposition of R into R1, R2 is lossless wrt F if and only if
either R1 ∩ R2 → (R1 – R2) belongs to F+
or R1 ∩ R2 → (R2 – R1) belongs to F+
Eg. gradeInfo (rollNo, studName, course, grade)
with FDs = {rollNo, course → grade; studName, course → grade;
rollNo → studName; studName → rollNo}
decomposed into
grades (rollNo, course, grade) and studInfo (rollNo, studName)
is lossless because the only common attribute is rollNo and
rollNo → studName
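For two schemas the test reduces to one attribute closure, because X → Y is in F+ exactly when Y ⊆ X+. A minimal Python sketch, with assumed function names, applied to the gradeInfo example and to the lossy (A, B), (B, C) split shown earlier in this deck:

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def binary_lossless(r1, r2, fds):
    """Lossless test for a decomposition into exactly two schemas."""
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure

grade_fds = [
    ({"rollNo", "course"}, {"grade"}),
    ({"studName", "course"}, {"grade"}),
    ({"rollNo"}, {"studName"}),
    ({"studName"}, {"rollNo"}),
]
grades = {"rollNo", "course", "grade"}
stud_info = {"rollNo", "studName"}

print(binary_lossless(grades, stud_info, grade_fds))  # True
print(binary_lossless({"A", "B"}, {"B", "C"}, []))    # False
```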
A property of lossless joins
D1: (R1, R2,…, Rk) lossless decomposition of R wrt F
D2: (Ri1, Ri2,…, Rip) lossless decomposition of Ri wrt Fi = π_Ri(F)
Then
D = (R1, R2, … , Ri-1, Ri1, Ri2, …, Rip, Ri+1,…, Rk) is a
lossless decomposition of R wrt F
This property is useful in the algorithm for BCNF decomposition
Algorithm for BCNF decomposition
R – given schema. F – given set of FDs
D = {R} // initial decomposition
while there is a relation schema Ri in D that is not in BCNF do
{ let X → A be the FD in Ri violating BCNF;
Replace Ri by Ri1 = Ri – {A} and Ri2 = X ∪ {A} in D;
}
Decomposition of Ri is lossless as
Ri1 ∩ Ri2 = X, Ri2 – Ri1 = A and X → A
Result: a lossless decomposition of R into BCNF relations
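The loop above can be sketched in Python. The helper names are assumptions, and so is the way a violating FD is found: the sketch searches all subsets X of Ri for an attribute A of Ri with A ∈ X+ while X is not a superkey of Ri. That search is exponential in the size of Ri and is meant only for small classroom examples. Which violating FD is picked first is arbitrary, so other valid BCNF decompositions are possible.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(ri, fds):
    """Return (X, A) violating BCNF inside ri, or None if ri is in BCNF."""
    for size in range(1, len(ri)):
        for combo in combinations(sorted(ri), size):
            x = set(combo)
            determined = closure(x, fds) & ri   # what X determines within ri
            extra = determined - x
            if extra and determined != ri:      # nontrivial, X not a superkey
                return x, min(extra)
    return None

def bcnf_decompose(schema, fds):
    d = [set(schema)]                           # initial decomposition D = {R}
    while True:
        for i, ri in enumerate(d):
            violation = find_violation(ri, fds)
            if violation:
                x, a = violation
                d[i:i + 1] = [ri - {a}, x | {a}]  # Ri1 = Ri - {A}, Ri2 = X ∪ {A}
                break
        else:
            return d                            # no schema violates BCNF

grade_fds = [
    ({"rollNo", "course"}, {"grade"}),
    ({"studName", "course"}, {"grade"}),
    ({"rollNo"}, {"studName"}),
    ({"studName"}, {"rollNo"}),
]
result = bcnf_decompose({"rollNo", "studName", "course", "grade"}, grade_fds)
print(sorted(sorted(ri) for ri in result))
# [['course', 'grade', 'rollNo'], ['rollNo', 'studName']]
```

On gradeInfo this reproduces the decomposition given on the BCNF slide.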
Dependencies may not be preserved (1/2)
Consider the schema: townInfo (stateName, townName, distName)
with the FDs F: ST → D (town names are unique within a state)
D → S
Keys: ST, DT. – all attributes are prime
– relation in 3NF
Relation is not in BCNF as D → S and D is not a key
Decomposition given by algorithm: R1: TD R2: DS
Not dependency preserving as π_R1(F) = trivial dependencies
π_R2(F) = {D → S}
Union of these doesn’t imply ST → D
ST → D can’t be enforced unless we perform a join.
Dependencies may not be preserved (2/2)
Consider the schema: R (A, B, C)
with the FDs F: AB → C and C → B
Keys: AB, AC – relation in 3NF (all attributes are prime)
– Relation is not in BCNF as C → B and C is not a key
Decomposition given by algorithm: R1: AC R2: CB
Not dependency preserving as π_R1(F) = trivial dependencies
π_R2(F) = {C → B}
Union of these doesn’t imply AB → C
All possible decompositions: {AB, BC}, {BA, AC}, {AC, CB}
Only the last one is lossless!
Lossless and dependency-preserving decomposition doesn't exist.
Equivalent Dependency Sets
F, G – two sets of FDs on schema R
F is said to cover G if G ⊆ F+ (equivalently G+ ⊆ F+)
F is equivalent to G if F+ = G+
(or, F covers G and G covers F)
Note: To check if F covers G,
it’s enough to show that for each FD X → Y in G, Y ⊆ X+_F
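The note above translates directly into code. A small Python sketch with assumed function names; the first pair of FD sets is the cyclic example F = {A → B, B → C, C → A} and the union of its projections from the dependency-preservation example in this deck, while the set `h` is made up for contrast.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(f, g):
    """F covers G if every FD X -> Y of G satisfies Y ⊆ X+ computed under F."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)

def equivalent(f, g):
    return covers(f, g) and covers(g, f)

f = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"A"})]
g = [({"A"}, {"B"}), ({"B"}, {"A"}), ({"B"}, {"C"}), ({"C"}, {"B"})]
h = [({"A"}, {"B"})]  # hypothetical smaller set

print(equivalent(f, g))           # True
print(covers(f, h), covers(h, f)) # True False
```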
Canonical covers or Minimal covers
It is of interest to reduce a set of FDs F into a “standard” form
F′ such that F′ is equivalent to F.
We define that a set of FDs F is in ‘minimal form’ if
(i) the rhs of any FD of F is a single attribute
(ii) there are no redundant FDs in F
that is, there is no FD X → A in F
s.t (F – {X → A}) is equivalent to F
(iii) there are no redundant attributes on the lhs of any FD in F
that is, there is no FD X → A in F s.t there is Z ⊂ X for which
(F – {X → A}) ∪ {Z → A} is equivalent to F
Minimal Covers
useful in obtaining a lossless, dependency-preserving
decomposition of a scheme R into 3NF relation schemas
Algorithm for computing a minimal cover
R – given Schema or set of attributes; F – given set of fd’s on R
Step 1: G := F
Step 2: Replace every fd of the form X → A1A2A3…Ak in G
by X → A1; X → A2; X → A3; … ; X → Ak
Step 3: For each fd X → A in G do
for each B in X do
if A ∈ (X – B)+ wrt G then
replace X → A by (X – B) → A
Step 4: For each fd X → A in G do
if (G – { X → A})+ = G+ then
replace G by G – { X → A}
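Steps 1 to 4 can be sketched in Python as below. The representation is an assumption: after Step 2 each FD is stored as a pair (frozenset lhs, single rhs attribute). In Step 4 the sketch tests A ∈ X+ under G – {X → A}, which is equivalent to comparing the two closures of FD sets. The input FD set is a made-up example, not one from the slides. The result can depend on the order in which FDs and attributes are examined, so a minimal cover is not unique in general.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def as_pairs(g):
    """Convert (lhs, attribute) items to (lhs, {attribute}) for closure()."""
    return [(lhs, {a}) for lhs, a in g]

def minimal_cover(fds):
    # Steps 1 and 2: copy F and split every right-hand side into single attributes.
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)]
    g = list(dict.fromkeys(g))                 # drop duplicates, keep order
    # Step 3: remove redundant attributes from left-hand sides.
    for i in range(len(g)):
        original_lhs, a = g[i]
        for b in sorted(original_lhs):
            reduced = g[i][0] - {b}
            if a in closure(reduced, as_pairs(g)):
                g[i] = (reduced, a)
    g = list(dict.fromkeys(g))
    # Step 4: remove redundant FDs.
    for fd in list(g):
        rest = [x for x in g if x != fd]
        if fd[1] in closure(fd[0], as_pairs(rest)):
            g = rest
    return g

# Hypothetical input: F = {A -> BC, B -> C, AB -> C}
f = [({"A"}, {"B", "C"}), ({"B"}, {"C"}), ({"A", "B"}, {"C"})]
for lhs, a in minimal_cover(f):
    print(sorted(lhs), "->", a)
# ['A'] -> B
# ['B'] -> C
```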
3NF decomposition algorithm
R – given Schema; F – given set of fd’s on R in minimal form
Use BCNF algorithm to get a lossless decomposition D = (R1, R2,…,Rk)
Note: each Ri is already in 3NF (it is in BCNF in fact!)
Algorithm: Let G be the set of fd’s not preserved in D
For each fd Z → A that is in G
Add relation scheme S = (B1,B2, …, Bs,A) to D. // Z = {B1,B2, …, Bs}
As Z → A is in F which is a minimal cover,
there is no proper subset X of Z s.t X → A. So Z is a key for S!
Any other fd X → C on S is such that C is in {B1,B2, …, Bs}.
Such fd’s do not violate 3NF because each Bj is a prime attribute!
Thus any scheme S added to D as above is in 3NF.
D continues to be lossless even when we add new schemas to it!
Multi-valued Dependencies (MVDs)
studCourseEmail(rollNo,courseNo,emailAddr)
a student enrolls for several courses and has several email addresses
rollNo →→ courseNo ( read as rollNo multi-determines courseNo )
If (CS05B007, CS370, shyam@gmail.com)
(CS05B007, CS376, shyam@yahoo.com) appear in the data then
(CS05B007, CS376, shyam@gmail.com)
(CS05B007, CS370, shyam@yahoo.com)
should also appear; otherwise, it would imply that having a gmail
address has something to do with doing course CS370 !!
By symmetry, rollNo →→ emailAddr
More about MVDs
Consider studCourseGrade(rollNo,courseNo,grade)
Note that rollNo →→ courseNo does not hold here even though
courseNo is a multi-valued attribute of student
If (CS05B007, CS370, A)
(CS05B007, CS376, B) appear in the data then
(CS05B007, CS376, A)
(CS05B007, CS370, B) will not appear !!
Attribute ‘grade’ depends on (rollNo,courseNo)
MVD’s arise when two unrelated multi-valued attributes of an
entity are sought to be represented together.
More about MVDs
Consider
studCourseAdvisor(rollNo,courseNo,advisor)
Note that rollNo →→ courseNo holds here
If (CS05B007, CS370, Dr Ravi)
(CS05B007, CS376, Dr Ravi)
appear in the data then swapping courseNo values
gives rise to existing tuples only.
But, since rollNo → advisor and (rollNo, courseNo) is the key,
this gets caught in checking for 2NF itself.
Alternative definition of MVDs
Consider R(X,Y,Z)
Suppose that X →→ Y and by symmetry X →→ Z
Then, decomposition D = (XY, XZ) should be lossless
That is, for any instance r on R, r = π XY(r) * π XZ(r)
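This characterization gives a direct way to test an MVD on one instance: project onto XY and XZ, join, and compare with the original. A Python sketch follows; the function name `mvd_holds` and the tuple layout are assumptions, and X and Y are assumed to be disjoint. The data are the studCourseEmail and studCourseGrade tuples used earlier in this deck.

```python
def mvd_holds(relation, schema, x, y):
    """Check X ->> Y on one instance via r = π_XY(r) * π_XZ(r).

    relation: set of tuples laid out in the order given by schema.
    x, y: lists of attribute names; Z is every remaining attribute.
    """
    index = {a: i for i, a in enumerate(schema)}
    z = [a for a in schema if a not in x and a not in y]

    def pick(t, attrs):
        return tuple(t[index[a]] for a in attrs)

    xy = {(pick(t, x), pick(t, y)) for t in relation}
    xz = {(pick(t, x), pick(t, z)) for t in relation}
    joined = {(xv, yv, zv) for xv, yv in xy for xv2, zv in xz if xv == xv2}
    original = {(pick(t, x), pick(t, y), pick(t, z)) for t in relation}
    return joined == original

email_schema = ["rollNo", "courseNo", "emailAddr"]
two_tuples = {
    ("CS05B007", "CS370", "shyam@gmail.com"),
    ("CS05B007", "CS376", "shyam@yahoo.com"),
}
four_tuples = two_tuples | {
    ("CS05B007", "CS376", "shyam@gmail.com"),
    ("CS05B007", "CS370", "shyam@yahoo.com"),
}
grades = {("CS05B007", "CS370", "A"), ("CS05B007", "CS376", "B")}

print(mvd_holds(four_tuples, email_schema, ["rollNo"], ["courseNo"]))  # True
print(mvd_holds(two_tuples, email_schema, ["rollNo"], ["courseNo"]))   # False
print(mvd_holds(grades, ["rollNo", "courseNo", "grade"], ["rollNo"], ["courseNo"]))  # False
```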
Prof P Sreenivasa Kumar
Department of CS&E, IITM
52
MVDs and 4NF
An MVD X →→ Y on scheme R is called trivial if either
Y ⊆ X or R = X ∪ Y. Otherwise, it is called nontrivial.
4NF: A relation R is in 4NF if it is in BCNF and for every
nontrivial MVD X →→ A, X must be a superkey of R.
studCourseEmail(rollNo,courseNo,emailAddr)
is not in 4NF as
rollNo →→ courseNo and
rollNo →→ emailAddr
are both nontrivial and rollNo is not a superkey for the
relation

More Related Content

PDF
project
PDF
Note on closed sets in topological spaces
PPTX
Unit 1: Topological spaces (its definition and definition of open sets)
PDF
On Review of the Cluster Point of a Set in a Topological Space
PDF
Computational logic First Order Logic
PDF
Computational logic First Order Logic_part2
PPTX
Computational logic Propositional Calculus proof system
PDF
Ac2640014009
project
Note on closed sets in topological spaces
Unit 1: Topological spaces (its definition and definition of open sets)
On Review of the Cluster Point of a Set in a Topological Space
Computational logic First Order Logic
Computational logic First Order Logic_part2
Computational logic Propositional Calculus proof system
Ac2640014009

What's hot (20)

PPTX
topology definitions
PDF
Variations on the Higman's Lemma
PPT
9 normalization
PPTX
TOPOLOGY and TYPES OF TOPOLOGY PowerPoint
PDF
The Chase in Database Theory
PDF
Hecke Curves and Moduli spcaes of Vector Bundles
PDF
4.4 Set operations on relations
PDF
Probability theory
PDF
PDF
Bq32857863
PDF
Lesson 20: Derivatives and the Shapes of Curves (slides)
PDF
Number theory
PDF
differentiate free
PDF
1.3.2 Inductive and Deductive Reasoning
PDF
Totally R*-Continuous and Totally R*-Irresolute Functions
PPTX
Topology M.Sc. 2 semester Mathematics compactness, unit - 4
PDF
Lesson 5: Continuity
PDF
Lesson 5: Continuity (slides)
PPTX
Infinite sequence & series 1st lecture
topology definitions
Variations on the Higman's Lemma
9 normalization
TOPOLOGY and TYPES OF TOPOLOGY PowerPoint
The Chase in Database Theory
Hecke Curves and Moduli spcaes of Vector Bundles
4.4 Set operations on relations
Probability theory
Bq32857863
Lesson 20: Derivatives and the Shapes of Curves (slides)
Number theory
differentiate free
1.3.2 Inductive and Deductive Reasoning
Totally R*-Continuous and Totally R*-Irresolute Functions
Topology M.Sc. 2 semester Mathematics compactness, unit - 4
Lesson 5: Continuity
Lesson 5: Continuity (slides)
Infinite sequence & series 1st lecture
Ad

Viewers also liked (20)

PDF
1 introduction
PDF
5 data storage_and_indexing
PPT
Best Practices for Database Schema Design
PDF
4 the sql_standard
PPTX
Managing your tech career
PPTX
Webinar: Build an Application Series - Session 2 - Getting Started
PDF
3 relational model
PDF
MySQL Replication: Pros and Cons
PDF
Distributed Postgres
ZIP
Week3 Lecture Database Design
PPTX
Database Design
PDF
2 entity relationship_model
PPTX
English gcse final tips
PDF
Postgres-XC Write Scalable PostgreSQL Cluster
PDF
Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!
PPTX
Database design concept
PDF
Database Schema
PPT
Best Practices for Database Schema Design
PPT
Database design
1 introduction
5 data storage_and_indexing
Best Practices for Database Schema Design
4 the sql_standard
Managing your tech career
Webinar: Build an Application Series - Session 2 - Getting Started
3 relational model
MySQL Replication: Pros and Cons
Distributed Postgres
Week3 Lecture Database Design
Database Design
2 entity relationship_model
English gcse final tips
Postgres-XC Write Scalable PostgreSQL Cluster
Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!
Database design concept
Database Schema
Best Practices for Database Schema Design
Database design
Ad

Similar to 6 relational schema_design (20)

PPT
lec06lec06lec06lec06lec06lec06lec06lec06
PPT
Functional Dependencies in rdbms with examples
PPT
Functional dependency in relational database
PPT
Unit-2 relational algebra ikgtu DBMS.ppt
PPT
Unit05 dbms
PPT
6 normalization
PDF
Normalization.pdf
PPT
test
PPT
Normalization
PDF
Cs501 fd nf
PPT
DBMS.ppt
PDF
Functional Dependencies 2.pdf
PPTX
DBMS Unit 3.pptx
PPT
ch7-clean.ppt
PPT
MODULE 4 -Normalization_1.ppt
PPT
Cross-reference or relationship relation optionAdditional01.ppt
PPT
DBMS MODULE-5 normalisation in database management
PDF
Introduction to database-Normalisation
lec06lec06lec06lec06lec06lec06lec06lec06
Functional Dependencies in rdbms with examples
Functional dependency in relational database
Unit-2 relational algebra ikgtu DBMS.ppt
Unit05 dbms
6 normalization
Normalization.pdf
test
Normalization
Cs501 fd nf
DBMS.ppt
Functional Dependencies 2.pdf
DBMS Unit 3.pptx
ch7-clean.ppt
MODULE 4 -Normalization_1.ppt
Cross-reference or relationship relation optionAdditional01.ppt
DBMS MODULE-5 normalisation in database management
Introduction to database-Normalisation

Recently uploaded (20)

PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PPTX
CH1 Production IntroductoryConcepts.pptx
PPTX
Internet of Things (IOT) - A guide to understanding
PDF
Digital Logic Computer Design lecture notes
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PPT
Mechanical Engineering MATERIALS Selection
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PPTX
Construction Project Organization Group 2.pptx
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PDF
Well-logging-methods_new................
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PDF
Arduino robotics embedded978-1-4302-3184-4.pdf
PPTX
Sustainable Sites - Green Building Construction
PPTX
web development for engineering and engineering
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
CH1 Production IntroductoryConcepts.pptx
Internet of Things (IOT) - A guide to understanding
Digital Logic Computer Design lecture notes
Model Code of Practice - Construction Work - 21102022 .pdf
Mechanical Engineering MATERIALS Selection
Embodied AI: Ushering in the Next Era of Intelligent Systems
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
Construction Project Organization Group 2.pptx
Operating System & Kernel Study Guide-1 - converted.pdf
Well-logging-methods_new................
Foundation to blockchain - A guide to Blockchain Tech
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
Arduino robotics embedded978-1-4302-3184-4.pdf
Sustainable Sites - Green Building Construction
web development for engineering and engineering
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026

6 relational schema_design

  • 1. Prof P Sreenivasa Kumar Department of CS&E, IITM 1 Database Design and Normal Forms Database Design coming up with a ‘good’ schema is very important How do we characterize the “goodness” of a schema ? If two or more alternative schemas are available how do we compare them ? What are the problems with “bad” schema designs ? Normal Forms: Each normal form specifies certain conditions If the conditions are satisfied by the schema certain kind of problems are avoided Details follow….
  • 2. Prof P Sreenivasa Kumar Department of CS&E, IITM 2 An Example student relation with attributes: studName, rollNo, sex, studDept department relation with attributes: deptName, officePhone, hod Several students belong to a department. studDept gives the name of the student’s department. Correct schema: What are the problems that arise ? studName studName rollNo rollNo sex sex studDept deptName deptName officePhone officePhone HOD HOD Incorrect schema: Student Department Student Dept
  • 3. Problems with bad schema. Redundant storage of data: officePhone and HOD info are stored redundantly, once with each student that belongs to the department, which wastes disk space. A program that updates the officePhone of a department must change it in several places, which takes more running time and is error-prone. Transactions running on a database must take as little time as possible in order to increase transaction throughput.
  • 4. Update Anomalies. Another kind of problem with a bad schema. Insertion anomaly: there is no way of inserting info about a new department unless we also enter details of a (dummy) student in the department. Deletion anomaly: if all students of a certain department leave and we delete their tuples, information about the department itself is lost. Update anomaly: when updating the officePhone of a department, the value in several tuples needs to be changed; if a tuple is missed, the data becomes inconsistent.
  • 5. Normal Forms. First Normal Form (1NF): included in the definition of a relation. Second Normal Form (2NF), Third Normal Form (3NF) and Boyce-Codd Normal Form (BCNF): defined in terms of functional dependencies. Fourth Normal Form (4NF): defined using multivalued dependencies. Fifth Normal Form (5NF) or Project Join Normal Form (PJNF): defined using join dependencies.
  • 6. Functional Dependencies. A functional dependency (FD) X → Y (read as X determines Y), with X ⊆ R and Y ⊆ R, is said to hold on a schema R if, in any instance r on R, whenever two tuples t1, t2 ∈ r (t1 ≠ t2) agree on X, i.e. t1[X] = t2[X], they also agree on Y, i.e. t1[Y] = t2[Y]. Note: if K ⊂ R is a key for R, then K → A holds for any A ∈ R, because no two distinct tuples agree on K and so the above if-then condition is vacuously true.
  • 7. Functional Dependencies: Examples. Consider the schema Student (studName, rollNo, sex, dept, hostelName, roomNo). Since rollNo is a key, rollNo → {studName, sex, dept, hostelName, roomNo}. Suppose that each student is given a hostel room exclusively; then hostelName, roomNo → rollNo. Suppose boys and girls are accommodated in separate hostels; then hostelName → sex. FDs are additional constraints that can be specified by designers.
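The definition above can be checked mechanically on a given instance. The following is a minimal sketch, assuming tuples are stored as Python dicts; the function name `fd_holds` and the hostel names are hypothetical and not part of the slides. Note that an instance can only refute an FD: an FD is a constraint on the schema, so passing this check on one instance does not prove that the FD holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is satisfied by this instance.

    rows: list of dicts mapping attribute name -> value (one dict per tuple).
    lhs, rhs: sequences of attribute names.
    """
    lhs, rhs = list(lhs), list(rhs)
    seen = {}  # maps each X-value to the Y-value first seen with it
    for t in rows:
        x_val = tuple(t[a] for a in lhs)
        y_val = tuple(t[a] for a in rhs)
        if seen.setdefault(x_val, y_val) != y_val:
            return False  # two tuples agree on X but differ on Y
    return True


# Hypothetical sample data using attribute names from the Student schema.
students = [
    {"rollNo": 1, "hostelName": "Ganga", "sex": "M"},
    {"rollNo": 2, "hostelName": "Ganga", "sex": "M"},
    {"rollNo": 3, "hostelName": "Sarayu", "sex": "F"},
]

hostel_determines_sex = fd_holds(students, ["hostelName"], ["sex"])
sex_determines_roll = fd_holds(students, ["sex"], ["rollNo"])
```

Here `hostel_determines_sex` is satisfied by the sample data, while `sex_determines_roll` is refuted by the two male students with different roll numbers.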
  • 8. Trivial / Non-Trivial FDs. An FD X → Y where Y ⊆ X is called a trivial FD; it always holds. An FD X → Y where Y ⊈ X is a non-trivial FD. An FD X → Y where X ∩ Y = ∅ is a completely non-trivial FD.
  • 9. Deriving new FDs. Given that a set of FDs F holds on R, we can infer that certain new FDs must also hold on R. For instance, given that X → Y and Y → Z hold on R, we can infer that X → Z must also hold. How do we systematically obtain all such new FDs? Unless all FDs are known, a relation schema is not fully specified.
  • 10. Entailment relation. We say that a set of FDs F ⊨ {X → Y} (read as F entails X → Y, or F logically implies X → Y) if, in every instance r of R on which the FDs of F hold, the FD X → Y also holds. Armstrong came up with several inference rules for deriving new FDs from a given set of FDs. We define F⁺ = {X → Y | F ⊨ X → Y}; F⁺ is called the closure of F.
  • 11. Armstrong’s Inference Rules (1/2)
      1. Reflexive rule: F ⊨ {X → Y | Y ⊆ X} for any X (trivial FDs).
      2. Augmentation rule: {X → Y} ⊨ {XZ → YZ}, Z ⊆ R. Here XZ denotes X ∪ Z.
      3. Transitive rule: {X → Y, Y → Z} ⊨ {X → Z}.
      4. Decomposition or projective rule: {X → YZ} ⊨ {X → Y}.
      5. Union or additive rule: {X → Y, X → Z} ⊨ {X → YZ}.
      6. Pseudo-transitive rule: {X → Y, WY → Z} ⊨ {WX → Z}.
  • 12. Armstrong's Inference Rules (2/2). Rules 4, 5, 6 are not really necessary. For instance, Rule 5, {X → Y, X → Z} ⊨ {X → YZ}, can be proved using Rules 1, 2, 3 alone: 1) X → Y (given); 2) X → Z (given); 3) X → XY (augmentation rule on 1, adding X to both sides); 4) XY → ZY (augmentation rule on 2, adding Y to both sides); 5) X → ZY (transitive rule on 3, 4). Similarly, Rules 4 and 6 can be shown to be unnecessary. But it is useful to have Rules 4, 5, 6 as short-cut rules.
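For completeness, one possible way to derive Rules 4 and 6 from Rules 1, 2 and 3 is sketched below, in the same step-by-step style as the derivation of Rule 5. This derivation is not on the slide; it is one of several equivalent ways to write it.

```latex
\begin{align*}
\textbf{Rule 4:}\quad & X \to YZ && \text{(given)} \\
& YZ \to Y && \text{(reflexive rule, since } Y \subseteq YZ\text{)} \\
& X \to Y && \text{(transitive rule)} \\[4pt]
\textbf{Rule 6:}\quad & X \to Y && \text{(given)} \\
& WX \to WY && \text{(augmentation rule, adding } W\text{)} \\
& WY \to Z && \text{(given)} \\
& WX \to Z && \text{(transitive rule)}
\end{align*}
```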
  • 13. Sound and Complete Inference Rules. Armstrong showed that Rules (1), (2) and (3) are sound and complete. These are called Armstrong’s Axioms (AA). Soundness: every new FD X → Y derived from a given set of FDs F using Armstrong's Axioms is such that F ⊨ {X → Y}. Completeness: any FD X → Y logically implied by F (i.e. F ⊨ {X → Y}) can be derived from F using Armstrong’s Axioms.
  • 14. Proving Soundness. Suppose X → Y is derived from F using AA in some n steps. If each step is correct, then the overall deduction is correct. A single step applies Rule (1), (2) or (3). Rule (1): obviously results in correct FDs. Rule (2): {X → Y} ⊨ {XZ → YZ}, Z ⊆ R. Suppose t1, t2 ∈ r agree on XZ ⇒ t1, t2 agree on X and on Z ⇒ t1, t2 agree on Y (since X → Y holds on r) ⇒ t1, t2 agree on YZ. Hence Rule (2) gives rise to correct FDs. Rule (3): {X → Y, Y → Z} ⊨ {X → Z}. Suppose t1, t2 ∈ r agree on X ⇒ t1, t2 agree on Y (since X → Y holds) ⇒ t1, t2 agree on Z (since Y → Z holds). Hence Rule (3) also gives rise to correct FDs.
  • 15. Proving Completeness of Armstrong’s Axioms (1/4). Define X⁺ (the closure of X with respect to F) = {A ∈ R | X → A can be derived from F using AA}. Claim 1: X → Y can be derived from F using AA iff Y ⊆ X⁺. (If) Let Y = {A1, A2, …, An}. Y ⊆ X⁺ ⇒ X → Ai can be derived from F using AA (1 ≤ i ≤ n). By the union rule, it follows that X → Y can be derived from F. (Only if) Suppose X → Y can be derived from F using AA. By the projective rule, X → Ai can be derived (1 ≤ i ≤ n). Thus, by the definition of X⁺, Ai ∈ X⁺ for each i ⇒ Y ⊆ X⁺.
  • 16. Completeness of Armstrong’s Axioms (2/4). Completeness: (F ⊨ {X → Y}) ⇒ X → Y follows from F using AA. We will prove the contrapositive: X → Y can’t be derived from F using AA ⇒ F ⊭ {X → Y} ⇒ ∃ a relation instance r on R such that all the FDs of F hold on r but X → Y doesn’t hold. Consider the relation instance r with just two tuples:
            X⁺ attributes | other attributes (R − X⁺)
      t1:   1 1 1 … 1     | 1 1 1 … 1
      t2:   1 1 1 … 1     | 0 0 0 … 0
  • 17. Completeness Proof (3/4). Claim 2: All FDs of F are satisfied by r. Suppose not. Let W → Z in F be an FD not satisfied by r. Since the two tuples of r agree exactly on the X⁺ attributes (both are 1 there, and they are 1 and 0 on R − X⁺), they must agree on W and differ on Z, so W ⊆ X⁺ and Z ⊈ X⁺. Let A ∈ Z − X⁺. Now, X → W follows from F using AA as W ⊆ X⁺ (Claim 1). X → Z follows from F using AA by the transitive rule (on X → W and W → Z). Z → A follows from F using AA by the reflexive rule as A ∈ Z. X → A follows from F using AA by the transitive rule. By the definition of closure, A must belong to X⁺: a contradiction. Hence the claim.
  • 18. Completeness Proof (4/4). Claim 3: X → Y is not satisfied by r. Suppose not, i.e. X → Y is satisfied by r. The two tuples of r agree on X (as X ⊆ X⁺), so they must agree on Y; because of the structure of r, this means Y ⊆ X⁺ ⇒ X → Y can be derived from F using AA (Claim 1), contradicting the assumption about X → Y. Hence the claim. Thus, whenever X → Y doesn’t follow from F using AA, F doesn’t logically imply X → Y. Armstrong’s Axioms are complete.
  • 19. Consequence of Completeness of AA. X⁺ = {A | X → A follows from F using AA} = {A | F ⊨ X → A}. Similarly, F⁺ = {X → Y | F ⊨ X → Y} = {X → Y | X → Y follows from F using AA}.
  • 20. Computing closures. The size of F⁺ can sometimes be exponential in the size of F. For instance, if F = {A → B1, A → B2, …, A → Bn}, then F⁺ contains A → X for every X ⊆ {B1, B2, …, Bn}, so |F⁺| ≥ 2^n. Computing F⁺ is therefore computationally expensive. Fortunately, checking whether X → Y ∈ F⁺ can be done by checking whether Y ⊆ X⁺ (the closure of X with respect to F). Computing the attribute closure X⁺ is easier.
  • 21. Computing X⁺ (with respect to F). We compute a sequence of sets X_0, X_1, … as follows:
      X_0 := X;  // X is the given set of attributes
      X_{i+1} := X_i ∪ {A | there is an FD Y → Z in F with A ∈ Z and Y ⊆ X_i}
    Since X_0 ⊆ X_1 ⊆ X_2 ⊆ … ⊆ X_i ⊆ X_{i+1} ⊆ … ⊆ R and R is finite, there is an integer i such that X_i = X_{i+1} = X_{i+2} = …, and X⁺ is equal to X_i.
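The fixpoint computation above translates almost directly into code. The sketch below is one possible rendering in Python, assuming FDs are given as (lhs, rhs) pairs of attribute-name tuples; the function name `closure` and this representation are choices made for the illustration. It updates a single set in place rather than building X_0, X_1, … explicitly, which reaches the same fixpoint.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds.

    attrs: iterable of attribute names (the set X).
    fds: list of (lhs, rhs) pairs, each side a tuple of attribute names.
    """
    result = set(attrs)          # X_0 := X
    changed = True
    while changed:               # stop when X_i = X_{i+1}
        changed = False
        for lhs, rhs in fds:
            # An FD Y -> Z fires once Y is contained in the current set.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs of the Student schema from the examples slide.
F = [
    (("rollNo",), ("studName", "sex", "dept", "hostelName", "roomNo")),
    (("hostelName", "roomNo"), ("rollNo",)),
    (("hostelName",), ("sex",)),
]

hostel_closure = closure(["hostelName"], F)
room_closure = closure(["hostelName", "roomNo"], F)
```

With these FDs, `hostel_closure` contains only hostelName and sex, while `room_closure` contains all six attributes, which is consistent with (hostelName, roomNo) being a key.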
  • 22. Normal Forms: 2NF. Full functional dependency: an FD X → A for which there is no proper subset Y of X such that Y → A (A is said to be fully functionally dependent on X). 2NF: a relation schema R is in 2NF if every non-prime attribute is fully functionally dependent on every key of R. Prime attribute: an attribute that is part of some key. Non-prime attribute: an attribute that is not part of any key.
  • 23. Example. 1) Book (authorName, title, authorAffiliation, ISBN, publisher, pubYear). Keys: (authorName, title), ISBN. Not in 2NF as authorName → authorAffiliation (authorAffiliation is not fully functionally dependent on the first key). 2) Student (rollNo, name, dept, sex, hostelName, roomNo, admitYear). Keys: rollNo, (hostelName, roomNo). Not in 2NF as hostelName → sex. Decompose into student (rollNo, name, dept, hostelName, roomNo, admitYear) and hostelDetail (hostelName, sex). These are both in 2NF.
  • 24. Transitive Dependencies. Transitive dependency: an FD X → Y in a relation schema R for which there is a set of attributes Z ⊆ R such that X → Z and Z → Y hold and Z is not a subset of any key of R. Ex: student (rollNo, name, dept, hostelName, roomNo, headDept). Keys: rollNo, (hostelName, roomNo). rollNo → dept and dept → headDept hold, so rollNo → headDept is a transitive dependency. The head of a department D is stored redundantly in every tuple where D appears. The relation is in 2NF but redundancy still exists.
  • 25. Normal Forms: 3NF. A relation schema R is in 3NF if it is in 2NF and no non-prime attribute of R is transitively dependent on any key of R. student (rollNo, name, dept, hostelName, roomNo, headDept) is not in 3NF. Decompose into student (rollNo, name, dept, hostelName, roomNo) and deptInfo (dept, headDept); both are in 3NF. The redundancy in data storage is removed.
  • 26. Another definition of 3NF. A relation schema R is in 3NF if for any nontrivial FD X → A either (i) X is a superkey or (ii) A is prime. Suppose some R violates the above definition ⇒ there is an FD X → A for which both (i) and (ii) are false ⇒ X is not a superkey and A is a non-prime attribute. Two cases arise: 1) X is contained in a key: A is not fully functionally dependent on this key, a violation of the 2NF condition. 2) X is not contained in a key: K → X, X → A is a case of transitive dependency (K is any key of R).
  • 27. Motivating example for BCNF. gradeInfo (rollNo, studName, course, grade). Suppose the following FDs hold: 1) rollNo, course → grade; 2) studName, course → grade; 3) rollNo → studName; 4) studName → rollNo. Keys: (rollNo, course), (studName, course). For FDs 1 and 2 the lhs is a key; for FDs 3 and 4 the rhs is prime. So gradeInfo is in 3NF. But studName is stored redundantly along with every course being done by the student.
  • 28. Boyce-Codd Normal Form (BCNF). A relation schema R is in BCNF if for every nontrivial FD X → A, X is a superkey of R. In gradeInfo, FDs 3 and 4 are nontrivial but their lhs is not a superkey, so gradeInfo is not in BCNF. Decompose into gradeInfo (rollNo, course, grade) and studInfo (rollNo, studName). Redundancy allowed by 3NF is disallowed by BCNF. BCNF is stricter than 3NF; 3NF is stricter than 2NF.
  • 29. Decomposition of a relation schema. If R doesn’t satisfy a particular normal form, we decompose R into smaller schemas. What is a decomposition? For R = (A1, A2, …, An), a decomposition is D = (R1, R2, …, Rk) such that each Ri ⊆ R and R = R1 ∪ R2 ∪ … ∪ Rk (the Ri’s need not be disjoint). Replacing R by R1, R2, …, Rk is the process of decomposing R. Ex: gradeInfo (rollNo, studName, course, grade) is decomposed into R1: gradeInfo (rollNo, course, grade) and R2: studInfo (rollNo, studName).
  • 30. Desirable Properties of Decompositions. Not all decompositions of a schema are useful. We require two properties to be satisfied. (i) Lossless join property: the information in an instance r of R must be preserved in the instances r1, r2, …, rk, where ri = π_Ri(r). (ii) Dependency preserving property: if a set F of dependencies holds on R, it should be possible to enforce F by enforcing appropriate dependencies on each ri.
  • 31. Lossless join property. F: set of FDs that hold on R. R is decomposed into R1, R2, …, Rk. The decomposition is lossless wrt F if for every relation instance r on R satisfying F, r = π_R1(r) * π_R2(r) * … * π_Rk(r). Example of a lossy join: R = (A, B, C); R1 = (A, B); R2 = (B, C).
      r:  A  B  C      r1: A  B      r2: B  C      r1 * r2: A  B  C
          a1 b1 c1         a1 b1         b1 c1               a1 b1 c1
          a2 b2 c2         a2 b2         b2 c2               a1 b1 c3  (spurious)
          a3 b1 c3         a3 b1         b1 c3               a2 b2 c2
                                                             a3 b1 c1  (spurious)
                                                             a3 b1 c3
    The join contains spurious tuples, so the original info is distorted. Lossless joins are also called non-additive joins.
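The lossy-join example above can be reproduced in a few lines. This is a minimal sketch, assuming a relation is represented as a schema tuple plus a set of value tuples; the helper names `project` and `natural_join` are introduced only for this illustration.

```python
def project(schema, rows, attrs):
    """Projection of rows onto attrs; the set removes duplicates."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in rows}


def natural_join(schema1, rows1, schema2, rows2):
    """Natural join on the attributes common to both schemas."""
    common = [a for a in schema1 if a in schema2]
    extra = [a for a in schema2 if a not in schema1]
    out = set()
    for t1 in rows1:
        for t2 in rows2:
            if all(t1[schema1.index(a)] == t2[schema2.index(a)] for a in common):
                out.add(t1 + tuple(t2[schema2.index(a)] for a in extra))
    return tuple(schema1) + tuple(extra), out


R = ("A", "B", "C")
r = {("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b1", "c3")}

r1 = project(R, r, ("A", "B"))
r2 = project(R, r, ("B", "C"))
joined_schema, joined = natural_join(("A", "B"), r1, ("B", "C"), r2)

# Tuples in the join that were not in the original instance.
spurious = joined - r
```

The join has five tuples, and `spurious` holds the two extra ones, (a1, b1, c3) and (a3, b1, c1), matching the table.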
  • 32. Dependency Preserving Decompositions. A decomposition D = (R1, R2, …, Rk) of schema R preserves a set of dependencies F if (π_R1(F) ∪ π_R2(F) ∪ … ∪ π_Rk(F))⁺ = F⁺. Here, π_Ri(F) = {(X → Y) ∈ F⁺ | X ⊆ Ri, Y ⊆ Ri} (called the projection of F onto Ri). Informally, any FD that logically follows from F must also logically follow from the union of the projections of F onto the Ri’s. Then D is called dependency preserving.
  • 33. An example. Schema R = (A, B, C). FDs F = {A → B, B → C, C → A}. Decomposition D = (R1 = {A, B}, R2 = {B, C}). Listing only the nontrivial FDs: π_R1(F) = {A → B, B → A} (B → A is in F⁺ via B → C and C → A) and π_R2(F) = {B → C, C → B} (C → B is in F⁺ via C → A and A → B). The closure (π_R1(F) ∪ π_R2(F))⁺ contains A → B, B → A, B → C, C → B and hence also A → C and C → A, so it equals F⁺. Hence D is dependency preserving.
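Computing the projections π_Ri(F) explicitly requires F⁺, which can be large. A commonly used alternative, sketched below, tests each FD X → Y of F directly: grow a set Z from X by repeatedly adding (Z ∩ Ri)⁺ ∩ Ri for each Ri, and declare the FD preserved if Y ends up inside Z. This is a different procedure from the definition on the slide (it is the standard polynomial-time test), and the names `closure` and `preserves` are hypothetical. Attributes are single letters here so that a string such as "AB" can stand for the set {A, B}.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(fds, decomposition):
    """True if every FD in fds follows from the projections onto the Ri."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                ri = set(ri)
                # Attributes of Ri derivable using only what is visible in Ri.
                gained = closure(z & ri, fds) & ri
                if not gained <= z:
                    z |= gained
                    changed = True
        if not set(rhs) <= z:
            return False
    return True


# Example from the slide: R = (A, B, C), D = ({A, B}, {B, C}).
cyclic = [("A", "B"), ("B", "C"), ("C", "A")]
cyclic_preserved = preserves(cyclic, ["AB", "BC"])

# A case that fails: F = {AB -> C, C -> B} with D = ({C, B}, {A, C}).
not_preserved = preserves([("AB", "C"), ("C", "B")], ["CB", "AC"])
```

For the slide's example the test succeeds; for the second input, AB → C cannot be recovered from the projections, so the test fails.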
  • 34. Testing for lossless decomposition property (1/6). R: given schema with attributes A1, A2, …, An. F: given set of FDs. D = {R1, R2, …, Rm}: given decomposition of R. Is D a lossless decomposition? Create an m × n matrix S with columns labeled A1, A2, …, An and rows labeled R1, R2, …, Rm. Initialize the matrix as follows: first set S(i,j) to the symbol b_ij for all i, j; then, for all i, j such that Aj is in the scheme Ri, set S(i,j) to the symbol a_j.
  • 35. Testing for lossless decomposition property (2/6). After S is initialized, we carry out the following process on it:
      repeat
        for each functional dependency U → V in F do
          for all rows in S which agree on the U-attributes do
            make the symbols in each V-attribute column the same in all these rows as follows:
              if any of the rows has an “a” symbol for the column
                set the other rows to the same “a” symbol in the column
              else  // no “a” symbol exists in any of the rows
                choose one of the “b” symbols that appears in one of the rows for the V-attribute
                and set the other rows to that “b” symbol in the column
      until no changes to S
    At the end, if there exists a row with all “a” symbols then D is lossless; otherwise D is a lossy decomposition.
  • 36. Testing for lossless decomposition property (3/6). R = (rollNo, name, advisor, advisorDept, course, grade). FDs = {rollNo → name; rollNo → advisor; advisor → advisorDept; rollNo, course → grade}. D: {R1 = (rollNo, name, advisor), R2 = (advisor, advisorDept), R3 = (rollNo, course, grade)}. Matrix S (initial values):
           rollNo  name  advisor  advisorDept  course  grade
      R1   a1      a2    a3       b14          b15     b16
      R2   b21     b22   a3       a4           b25     b26
      R3   a1      b32   b33      b34          a5      a6
  • 37. Testing for lossless decomposition property (4/6). R, the FDs and D are as on slide 36. Matrix S after enforcing rollNo → name and rollNo → advisor (in row R3, b32 becomes a2 and b33 becomes a3):
           rollNo  name  advisor  advisorDept  course  grade
      R1   a1      a2    a3       b14          b15     b16
      R2   b21     b22   a3       a4           b25     b26
      R3   a1      a2    a3       b34          a5      a6
  • 38. Testing for lossless decomposition property (5/6). R, the FDs and D are as on slide 36. Matrix S after enforcing advisor → advisorDept (b14 in row R1 and b34 in row R3 become a4):
           rollNo  name  advisor  advisorDept  course  grade
      R1   a1      a2    a3       a4           b15     b16
      R2   b21     b22   a3       a4           b25     b26
      R3   a1      a2    a3       a4           a5      a6
    No more changes. The third row has all “a” symbols, so the join is lossless.
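The matrix procedure worked through above can be sketched in Python as follows. The representation is an assumption made for the illustration: each row is a dict, the symbol a_j is stored as the string "a" in column j, and b_ij is stored as the pair ("b", i). One deliberate difference from the pseudocode: when two symbols in a column are equated, this sketch renames every occurrence of them in that column, not only those in the agreeing rows, which keeps rows that were equated earlier consistent. The function name `is_lossless` is hypothetical.

```python
def is_lossless(attrs, fds, decomposition):
    """Matrix (chase) test for the lossless join property.

    attrs: tuple of attribute names of R.
    fds: list of (lhs, rhs) pairs of attribute-name tuples.
    decomposition: list of attribute-name tuples R1, ..., Rm.
    """
    # Row i holds "a" where Ri contains the attribute, else ("b", i).
    rows = [{a: ("a" if a in ri else ("b", i)) for a in attrs}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:                       # repeat ... until no changes to S
        changed = False
        for lhs, rhs in fds:
            groups = {}                  # rows grouped by their U-values
            for row in rows:
                groups.setdefault(tuple(row[a] for a in lhs), []).append(row)
            for group in groups.values():
                for a in rhs:
                    symbols = {row[a] for row in group}
                    if len(symbols) > 1:
                        # Prefer the "a" symbol; otherwise pick one "b" symbol.
                        target = "a" if "a" in symbols else min(symbols)
                        for row in rows:
                            if row[a] in symbols:
                                row[a] = target
                        changed = True
    return any(all(row[a] == "a" for a in attrs) for row in rows)


R = ("rollNo", "name", "advisor", "advisorDept", "course", "grade")
F = [
    (("rollNo",), ("name",)),
    (("rollNo",), ("advisor",)),
    (("advisor",), ("advisorDept",)),
    (("rollNo", "course"), ("grade",)),
]
D = [("rollNo", "name", "advisor"),
     ("advisor", "advisorDept"),
     ("rollNo", "course", "grade")]

advisor_example_lossless = is_lossless(R, F, D)
```

On the advisor example the row for R3 ends up with all "a" symbols, so the result agrees with the slides. With no FDs, the decomposition of (A, B, C) into (A, B) and (B, C) is reported as lossy.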
  • 39. Testing for lossless decomposition property (6/6). R: given schema. F: given set of FDs. The decomposition of R into R1, R2 is lossless wrt F if and only if either R1 ∩ R2 → (R1 − R2) belongs to F⁺ or R1 ∩ R2 → (R2 − R1) belongs to F⁺. E.g. gradeInfo (rollNo, studName, course, grade) with FDs = {rollNo, course → grade; studName, course → grade; rollNo → studName; studName → rollNo}, decomposed into grades (rollNo, course, grade) and studInfo (rollNo, studName), is lossless because the common attribute is rollNo and rollNo → studName.
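For a two-way decomposition, the condition above reduces to one attribute-closure computation on R1 ∩ R2. A minimal sketch, with the hypothetical names `closure` and `lossless_binary`:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """True if decomposing into r1, r2 is lossless with respect to fds."""
    r1, r2 = set(r1), set(r2)
    common_closure = closure(r1 & r2, fds)
    # R1 ∩ R2 -> (R1 - R2) or R1 ∩ R2 -> (R2 - R1) must be in F+.
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure


grade_fds = [
    (("rollNo", "course"), ("grade",)),
    (("studName", "course"), ("grade",)),
    (("rollNo",), ("studName",)),
    (("studName",), ("rollNo",)),
]
grade_split_ok = lossless_binary(
    ("rollNo", "course", "grade"), ("rollNo", "studName"), grade_fds
)
```

`grade_split_ok` is true because the closure of {rollNo} contains studName, which is all of R2 − R1.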
  • 40. A property of lossless joins. Let D1 = (R1, R2, …, Rk) be a lossless decomposition of R wrt F, and let D2 = (Ri1, Ri2, …, Rip) be a lossless decomposition of Ri wrt Fi = π_Ri(F). Then D = (R1, R2, …, Ri-1, Ri1, Ri2, …, Rip, Ri+1, …, Rk) is a lossless decomposition of R wrt F. This property is useful in the algorithm for BCNF decomposition.
  • 41. Algorithm for BCNF decomposition. R: given schema. F: given set of FDs.
      D = {R}  // initial decomposition
      while there is a relation schema Ri in D that is not in BCNF do
      {
        let X → A be an FD in Ri violating BCNF;
        replace Ri by Ri1 = Ri − {A} and Ri2 = X ∪ {A} in D;
      }
    The decomposition of Ri is lossless as Ri1 ∩ Ri2 = X, Ri2 − Ri1 = {A} and X → A. Result: a lossless decomposition of R into BCNF relations.
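A runnable sketch of the loop above follows. Finding a violating FD inside Ri is done here by brute force: for each subset X of Ri, the closure of X under F is restricted to Ri, and X violates BCNF if it determines something new in Ri without determining all of Ri. That search is exponential in the size of Ri and is an assumption of this sketch rather than something stated on the slide; the names `find_violation` and `bcnf_decompose` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(ri, fds):
    """Return (X, A) with X -> A violating BCNF in ri, or None."""
    ri = frozenset(ri)
    for size in range(1, len(ri)):
        for x in combinations(sorted(ri), size):
            x_plus = closure(x, fds) & ri
            extra = x_plus - set(x)
            if extra and x_plus != ri:       # nontrivial, X not a superkey
                return frozenset(x), min(extra)
    return None


def bcnf_decompose(attrs, fds):
    """Split schemas on violating FDs until every schema is in BCNF."""
    result = [frozenset(attrs)]              # D = {R}
    while True:
        for ri in result:
            violation = find_violation(ri, fds)
            if violation:
                x, a = violation
                result.remove(ri)
                result += [ri - {a}, x | {a}]  # Ri1 = Ri - {A}, Ri2 = X ∪ {A}
                break
        else:
            return result


grade_fds = [
    (("rollNo", "course"), ("grade",)),
    (("studName", "course"), ("grade",)),
    (("rollNo",), ("studName",)),
    (("studName",), ("rollNo",)),
]
grade_bcnf = bcnf_decompose(("rollNo", "studName", "course", "grade"), grade_fds)
```

On gradeInfo, the first violation found is rollNo → studName, and the result is the pair of schemas (rollNo, course, grade) and (rollNo, studName), as in the BCNF slide. A different search order could pick studName → rollNo instead and give a different, equally valid, decomposition.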
  • 42. Dependencies may not be preserved (1/2). Consider the schema townInfo (stateName, townName, distName), abbreviated S, T, D, with the FDs F: ST → D (town names are unique within a state) and D → S. Keys: ST, DT. All attributes are prime, so the relation is in 3NF. The relation is not in BCNF as D → S holds and D is not a key. Decomposition given by the algorithm: R1: TD, R2: DS. It is not dependency preserving, as π_R1(F) = trivial dependencies and π_R2(F) = {D → S}; the union of these doesn’t imply ST → D. ST → D can’t be enforced unless we perform a join.
  • 43. Dependencies may not be preserved (2/2). Consider the schema R (A, B, C) with the FDs F: AB → C and C → B. Keys: AB, AC. All attributes are prime, so the relation is in 3NF. The relation is not in BCNF as C → B holds and C is not a key. Decomposition given by the algorithm: R1: CB, R2: AC. It is not dependency preserving, as π_R1(F) = {C → B} and π_R2(F) = trivial dependencies; the union of these doesn’t imply AB → C. All possible decompositions into two schemas: {AB, BC}, {BA, AC}, {AC, CB}. Only the last one is lossless! A lossless and dependency-preserving BCNF decomposition doesn't exist.
  • 44. Equivalent Dependency Sets. F, G: two sets of FDs on schema R. F is said to cover G if G ⊆ F⁺ (equivalently, G⁺ ⊆ F⁺). F is equivalent to G if F⁺ = G⁺ (that is, F covers G and G covers F). Note: to check if F covers G, it is enough to show that for each FD X → Y in G, Y ⊆ X⁺, where the closure X⁺ is computed with respect to F.
  • 45. Canonical covers or Minimal covers. It is of interest to reduce a set of FDs F into a “standard” form F′ such that F′ is equivalent to F. We say that a set of FDs F is in ‘minimal form’ if (i) the rhs of any FD of F is a single attribute; (ii) there are no redundant FDs in F, that is, there is no FD X → A in F such that F − {X → A} is equivalent to F; (iii) there are no redundant attributes on the lhs of any FD in F, that is, there is no FD X → A in F and Z ⊂ X for which (F − {X → A}) ∪ {Z → A} is equivalent to F. Minimal covers are useful in obtaining a lossless, dependency-preserving decomposition of a scheme R into 3NF relation schemas.
  • 46. Algorithm for computing a minimal cover. R: given schema or set of attributes. F: given set of FDs on R.
      Step 1: G := F
      Step 2: Replace every FD of the form X → A1A2A3…Ak in G by X → A1; X → A2; X → A3; … ; X → Ak
      Step 3: For each FD X → A in G do
                for each B in X do
                  if A ∈ (X − B)⁺ wrt G then replace X → A by (X − B) → A
      Step 4: For each FD X → A in G do
                if (G − {X → A})⁺ = G⁺ then replace G by G − {X → A}
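The four steps can be sketched in Python as below. The names `closure` and `minimal_cover` are hypothetical, and in the example attributes are single letters so that a string such as "AB" stands for the set {A, B}. In Step 4 the sketch checks whether A is in the closure of X under the remaining FDs, which is equivalent to testing (G − {X → A})⁺ = G⁺. The result can depend on the order in which FDs and attributes are examined; a set of FDs may have more than one minimal cover.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    """Return a minimal cover as a list of (frozenset lhs, attribute) pairs."""
    # Steps 1 and 2: copy F and split every right-hand side.
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]

    def as_pairs(items):
        return [(lhs, {a}) for lhs, a in items]

    # Step 3: remove redundant attributes from each left-hand side.
    for i in range(len(g)):
        lhs, a = g[i]
        for b in sorted(lhs):
            reduced = lhs - {b}
            if reduced and a in closure(reduced, as_pairs(g)):
                lhs = reduced
                g[i] = (lhs, a)

    # Step 4: remove FDs that follow from the remaining ones.
    i = 0
    while i < len(g):
        lhs, a = g[i]
        rest = g[:i] + g[i + 1:]
        if a in closure(lhs, as_pairs(rest)):
            g = rest
        else:
            i += 1
    return g


# Illustrative input (not from the slides): F = {A -> BC, B -> C, AB -> C}.
cover = minimal_cover([("A", "BC"), ("B", "C"), ("AB", "C")])
```

For this input, Step 3 shortens AB → C to B → C, and Step 4 drops A → C and the duplicate B → C, leaving A → B and B → C.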
  • 47. 3NF decomposition algorithm. R: given schema. F: given set of FDs on R, in minimal form. Use the BCNF algorithm to get a lossless decomposition D = (R1, R2, …, Rk). Note: each Ri is already in 3NF (it is in BCNF in fact!). Algorithm: let G be the set of FDs not preserved in D. For each FD Z → A in G, where Z = {B1, B2, …, Bs}, add the relation scheme S = (B1, B2, …, Bs, A) to D. As Z → A is in F, which is a minimal cover, there is no proper subset X of Z such that X → A. So Z is a key for S! Any other FD X → C on S is such that C is in {B1, B2, …, Bs}. Such FDs do not violate 3NF because each Bj is a prime attribute! Thus any scheme S added to D as above is in 3NF. D continues to be lossless even when we add new schemas to it!
  • 48. Multi-valued Dependencies (MVDs). studCourseEmail (rollNo, courseNo, emailAddr): a student enrolls for several courses and has several email addresses. rollNo →→ courseNo (read as rollNo multi-determines courseNo). If (CS05B007, CS370, shyam@gmail.com) and (CS05B007, CS376, shyam@yahoo.com) appear in the data, then (CS05B007, CS376, shyam@gmail.com) and (CS05B007, CS370, shyam@yahoo.com) should also appear; otherwise, it would imply that having a gmail address has something to do with doing course CS370!! By symmetry, rollNo →→ emailAddr.
  • 49. More about MVDs. Consider studCourseGrade (rollNo, courseNo, grade). Note that rollNo →→ courseNo does not hold here even though courseNo is a multi-valued attribute of student. If (CS05B007, CS370, A) and (CS05B007, CS376, B) appear in the data, then (CS05B007, CS376, A) and (CS05B007, CS370, B) will not appear!! The attribute ‘grade’ depends on (rollNo, courseNo). MVDs arise when two unrelated multi-valued attributes of an entity are sought to be represented together.
  • 50. More about MVDs. Consider studCourseAdvisor (rollNo, courseNo, advisor). Note that rollNo →→ courseNo holds here. If (CS05B007, CS370, Dr Ravi) and (CS05B007, CS376, Dr Ravi) appear in the data, then swapping the courseNo values gives rise to existing tuples only. But, since rollNo → advisor holds and (rollNo, courseNo) is the key, this gets caught in checking for 2NF itself.
  • 51. Alternative definition of MVDs. Consider R (X, Y, Z). Suppose that X →→ Y and, by symmetry, X →→ Z. Then the decomposition D = (XY, XZ) should be lossless. That is, for any instance r on R, r = π_XY(r) * π_XZ(r).
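This join-based characterization gives a direct way to test an MVD on a concrete instance. The sketch below assumes X and Y are disjoint lists of attribute names and that Z is everything else in the schema; the function name `mvd_holds` is hypothetical. As with FDs, a check on one instance can refute an MVD but cannot establish it for the schema.

```python
def mvd_holds(schema, rows, x, y):
    """Check X ->> Y on an instance by testing r == pi_XY(r) * pi_XZ(r).

    schema: tuple of attribute names; rows: set of value tuples.
    x, y: disjoint lists of attribute names; Z is the remaining attributes.
    """
    z = [a for a in schema if a not in x and a not in y]
    idx = {a: i for i, a in enumerate(schema)}

    def pick(t, attrs):
        return tuple(t[idx[a]] for a in attrs)

    xy = {(pick(t, x), pick(t, y)) for t in rows}   # projection onto XY
    xz = {(pick(t, x), pick(t, z)) for t in rows}   # projection onto XZ
    # Natural join of the two projections on X.
    joined = {(xv, yv, zv) for xv, yv in xy for xv2, zv in xz if xv == xv2}
    original = {(pick(t, x), pick(t, y), pick(t, z)) for t in rows}
    return joined == original


schema = ("rollNo", "courseNo", "emailAddr")
two_tuples = {
    ("CS05B007", "CS370", "shyam@gmail.com"),
    ("CS05B007", "CS376", "shyam@yahoo.com"),
}
four_tuples = two_tuples | {
    ("CS05B007", "CS376", "shyam@gmail.com"),
    ("CS05B007", "CS370", "shyam@yahoo.com"),
}

mvd_with_two = mvd_holds(schema, two_tuples, ["rollNo"], ["courseNo"])
mvd_with_four = mvd_holds(schema, four_tuples, ["rollNo"], ["courseNo"])
```

With only the first two tuples of the studCourseEmail example the check fails, because the join produces the two missing combinations; once all four tuples are present it succeeds.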
  • 52. MVDs and 4NF. An MVD X →→ Y on scheme R is called trivial if either Y ⊆ X or R = X ∪ Y. Otherwise, it is called nontrivial. 4NF: a relation R is in 4NF if it is in BCNF and, for every nontrivial MVD X →→ Y, X is a superkey of R. studCourseEmail (rollNo, courseNo, emailAddr) is not in 4NF, as rollNo →→ courseNo and rollNo →→ emailAddr are both nontrivial and rollNo is not a superkey for the relation.