Relational Database Design

Dept of CSE | III YEAR | V SEMESTER CS T53 | DATABASE MANAGEMENT SYSTEMS | UNIT 3
1 |Prepared By : Mr. PRABU.U/AP |Dept. of Computer Science and Engineering | SKCET
UNIT III
Relational Database Design: Features of Good Relational Designs – Atomic Domains
and First Normal Form – Second Normal Form – Decomposition Using Functional
Dependencies – Functional Dependency Theory – Algorithms for decomposition –
Decomposition Using Multi-valued Dependencies – More Normal Forms – Database
Design Process – Modeling Temporal Data
3.1 FEATURES OF GOOD RELATIONAL DESIGNS
3.1.1 Design Alternative: Larger Schemas
 Combined Schemas
 Combined Schema without repetition
3.1.2 Design Alternative: Smaller Schemas
3.1.1 Design Alternative: Larger Schemas
It is possible to generate a set of relation schemas directly from the E-R design.
The goodness (or badness) of the resulting set of schemas depends on how good the E-R
design was in the first place.
 Combined Schemas
 Suppose we combine borrower and loan to get
bor_loan = (customer_id, loan_type, amount)
 Result is possible repetition of information (L-1000 and its amount appear twice in the example below)

loan
loan_type    amount
.........    .........
L-1000       1000
.........    .........

borrower
customer_id  loan_type
.........    .........
C0001        L-1000
C0002        L-1000
.........    .........

bor_loan
customer_id  loan_type  amount
C0001        L-1000     1000
C0002        L-1000     1000

Figure 3.1: cust_loan table

 Combined Schemas without repetition
 Consider combining loan_branch and loan
loan_amt_br = (loan_number, amount, branch_name)
 No repetition (as suggested by example below)
loan
loan_number  amount
.........    .........
25235        1000
.........    .........

loan_branch
loan_number  branch_name
.........    .........
25235        Anna Nagar
.........    .........

loan_amt_br
loan_number  amount  branch_name
25235        1000    Anna Nagar

Figure 3.2: loan_branch table
3.1.2 Design Alternative: Smaller Schemas
We need to write a rule that says “if there were a schema (dept_name, budget),
then dept_name is able to serve as the primary key.” This rule is specified as a
functional dependency.
dept_name → budget
Not all decompositions are good. Consider the schema
employee (ID, name, street, city, salary)
Suppose we decompose employee into
employee1 (ID, name)
employee2 (name, street, city, salary)
If two employees have the same name, joining employee1 and employee2 on name
produces tuples that were not in employee, so the original relation cannot be
reconstructed. Such a decomposition is lossy.
Figure 3.3: Loss of information via a bad decomposition.
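The loss of information caused by the employee1/employee2 decomposition can be reproduced in a few lines. The following is a minimal sketch; the two employee rows are hypothetical and were chosen only so that two employees share a name.

```python
# Hypothetical rows (ID, name, street, city, salary); both employees are named "Kim".
employee = {
    (57766, "Kim", "Main", "Perryridge", 75000),
    (98776, "Kim", "North", "Hampton", 67000),
}

# Project onto employee1(ID, name) and employee2(name, street, city, salary).
employee1 = {(eid, name) for (eid, name, _, _, _) in employee}
employee2 = {(name, street, city, salary) for (_, name, street, city, salary) in employee}

# Natural join of the two projections on the shared attribute "name".
joined = {
    (eid, name, street, city, salary)
    for (eid, name) in employee1
    for (name2, street, city, salary) in employee2
    if name == name2
}

# Tuples that appear in the join but were never in the original relation.
spurious = joined - employee
print(len(employee), len(joined), len(spurious))
```

The join contains every original tuple plus spurious ones, so we can no longer tell which address belongs to which ID.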

3.2 ATOMIC DOMAINS AND FIRST NORMAL FORM
3.2.1 Atomic Domains
3.2.2 First Normal Form
 Employee (unnormalized)
 Employee (normalized – 1 NF)
 Alterations
3.2.1 Atomic Domains
A domain is atomic if elements of the domain are considered to be indivisible
units. We say that a relation schema R is in first normal form (1NF) if the domains of
all attributes of R are atomic.
A set of names is an example of a non-atomic value. Non-atomic values
complicate storage and encourage redundant (repeated) storage of data.
3.2.2 First Normal Form
 A relation is said to be in first normal form if all of its attributes have domains
that are indivisible, or atomic. Such a relation is also called a flat file.
 Each attribute must be atomic: no repeating columns within a row and no
multi-valued columns.
 Each row of data must have a unique identifier (or primary key).
Employee (unnormalized)
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C, Perl, Java
2       Barbara Jones  224      IT         Linux, Mac
3       Jake Rivera    201      R&D        DB2, Oracle, Java
Employee (normalized – 1 NF)
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java
Alterations
Update Anomaly
 If a student appears in two or more rows of a table, a change of address must
be applied to every one of those rows.
Insertion Anomaly
 On admission, a student's sid, sname, and address are known but the course is
not, so a NULL value has to be inserted.
Deletion Anomaly
 If student 101 discontinues a course, deleting that row also deletes the
student's other details.
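The step from the unnormalized Employee table to its 1NF form can be sketched as follows. This is only an illustration: it assumes the skills column is stored as a comma-separated string, and the function name is made up for the example.

```python
# Unnormalized rows: the last column holds several skills in one value.
unnormalized = [
    (1, "Kevin Jacobs", 201, "R&D", "C, Perl, Java"),
    (2, "Barbara Jones", 224, "IT", "Linux, Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2, Oracle, Java"),
]

def to_first_normal_form(rows):
    """Emit one row per (employee, skill) so that every value is atomic."""
    flat = []
    for emp_no, name, dept_no, dept_name, skills in rows:
        for skill in skills.split(","):
            flat.append((emp_no, name, dept_no, dept_name, skill.strip()))
    return flat

first_nf = to_first_normal_form(unnormalized)
for row in first_nf:
    print(row)
```

Every value is now atomic, but name, dept_no, and dept_name are repeated once per skill, which is the redundancy the later normal forms remove.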

3.3 SECOND NORMAL FORM
 A relation is said to be in second normal form if it meets both of the following
conditions:
 The relation is in first normal form.
 All non-key attributes are functionally dependent on the entire primary
key.
 Each non-key attribute must be functionally dependent on the whole primary key,
not on just a part of it.
 2NF improves data integrity.
 Helps prevent update, insert, and delete anomalies.
Employee (normalized – 1 NF)
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java
 Name, dept_no, and dept_name are functionally dependent on emp_no.
(emp_no → name, dept_no, dept_name)
 Skills is not functionally dependent on emp_no, since one emp_no can have
several skills. The key of the 1NF relation is therefore (emp_no, skills), and
name, dept_no, and dept_name depend on only part of it.
Employee (2NF)
emp_no  name           dept_no  dept_name
1       Kevin Jacobs   201      R&D
2       Barbara Jones  224      IT
3       Jake Rivera    201      R&D
Skills (2NF)
emp_no  skills
1       C
1       Perl
1       Java
2       Linux
2       Mac
3       DB2
3       Oracle
3       Java
3.4 DECOMPOSITION USING FUNCTIONAL DEPENDENCIES
3.4.1 Keys and Functional Dependencies
3.4.2 Boyce–Codd Normal Form
3.4.3 BCNF and Dependency Preservation
3.4.4 Third Normal Form
3.4.5 Higher Normal Forms
3.4.1 Keys and Functional Dependencies
Keys
 A subset K of R is a superkey of r(R) if, in any legal instance of r(R), for all pairs
t1 and t2 of tuples in the instance of r, if t1 ≠ t2, then t1[K] ≠ t2[K].
 That is, no two tuples in any legal instance of relation r(R) may have the same
value on attribute set K.

Functional Dependencies
 Y is functionally dependent on X if the value of Y is determined by the value
of X.
 For example, if Y = X + 1, the value of X determines the resulting value of Y.
 Y is then dependent on X as a function of the value of X; this is written X → Y.
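Whether a functional dependency is violated by a given instance can be checked mechanically. The sketch below assumes rows are Python dicts and uses a hypothetical helper name, fd_holds. Note that an instance can only refute a dependency; a dependency is a statement about all legal instances, so a single instance cannot prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

# A few rows of the Employee (1NF) table used in these notes.
rows = [
    {"emp_no": 1, "name": "Kevin Jacobs", "dept_no": 201, "skills": "C"},
    {"emp_no": 1, "name": "Kevin Jacobs", "dept_no": 201, "skills": "Perl"},
    {"emp_no": 2, "name": "Barbara Jones", "dept_no": 224, "skills": "Linux"},
]

holds_name = fd_holds(rows, ["emp_no"], ["name", "dept_no"])
holds_skills = fd_holds(rows, ["emp_no"], ["skills"])
print(holds_name, holds_skills)
```

Here emp_no → name, dept_no is consistent with the rows, while emp_no → skills is violated because employee 1 has two different skills.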
3.4.2 Boyce–Codd Normal Form
A relation schema R is in BCNF with respect to a set F of functional dependencies
if, for all functional dependencies in F+ of the form
α → β
where α ⊆ R and β ⊆ R, at least one of the following holds:
 α → β is trivial (i.e., β ⊆ α)
 α is a superkey for R
Example schema not in BCNF:
bor_loan = (customer_id, loan_number, amount)
because loan_number → amount holds on bor_loan but loan_number is not a superkey.
Decomposing a Schema into BCNF
Suppose we have a schema R and a nontrivial dependency α → β causes a
violation of BCNF.
We decompose R into:
 (α ∪ β)
 (R – (β – α))
In our example,
α = loan_number
β = amount
and bor_loan is replaced by
(α ∪ β) = (loan_number, amount)
(R – (β – α)) = (customer_id, loan_number)
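The single decomposition step above is just set arithmetic. A minimal sketch, with a hypothetical helper name bcnf_split and schemas represented as Python sets of attribute names:

```python
def bcnf_split(schema, alpha, beta):
    """Split schema on a violating dependency alpha -> beta.

    Returns the pair (alpha ∪ beta, R - (beta - alpha)).
    """
    schema, alpha, beta = set(schema), set(alpha), set(beta)
    return alpha | beta, schema - (beta - alpha)

bor_loan = {"customer_id", "loan_number", "amount"}
r1, r2 = bcnf_split(bor_loan, {"loan_number"}, {"amount"})
print(sorted(r1), sorted(r2))
```

The two results share loan_number, which is what allows the original bor_loan tuples to be recovered by a natural join.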
3.4.3 BCNF and Dependency Preservation
 Constraints, including functional dependencies, are costly to check in practice
unless they pertain to only one relation
 If it is sufficient to test only those dependencies on each individual relation of a
decomposition in order to ensure that all functional dependencies hold, then that
decomposition is dependency preserving.
 Because it is not always possible to achieve both BCNF and dependency
preservation, we consider a weaker normal form, known as third normal form.
3.4.4 Third Normal Form
A relation is said to be in third normal form if it meets both of the following
conditions:
 The relation is in second normal form.
 There is no transitive dependency; that is, all the non-key attributes
depend only on the primary key.
Remove transitive dependencies.
 Any transitive dependencies are moved into a smaller (subset) table.
3NF further improves data integrity.
 Helps prevent update, insert, and delete anomalies.
Employee (2NF)
emp_no  name           dept_no  dept_name
1       Kevin Jacobs   201      R&D
2       Barbara Jones  224      IT
3       Jake Rivera    201      R&D
Skills (2NF)
emp_no  skills
1       C
1       Perl
1       Java
2       Linux
2       Mac
3       DB2
3       Oracle
3       Java
In Employee (2NF), emp_no → dept_no and dept_no → dept_name, so dept_name
depends on emp_no only transitively. It is moved into a separate Department table.
Employee (3NF)
emp_no  name           dept_no
1       Kevin Jacobs   201
2       Barbara Jones  224
3       Jake Rivera    201
Department (3NF)
dept_no  dept_name
201      R&D
224      IT
Skills (3NF)
emp_no  skills
1       C
1       Perl
1       Java
2       Linux
2       Mac
3       DB2
3       Oracle
3       Java
3.4.5 Higher Normal Forms
Refer to Sections 3.7 and 3.8.
3.5 FUNCTIONAL DEPENDENCY THEORY
3.5.1 Closure of a Set of Functional Dependencies
3.5.2 Closure of Attribute Sets
3.5.3 Canonical Cover
3.5.4 Lossless-join Decomposition
3.5.5 Dependency Preservation
3.5.1 Closure of a Set of Functional Dependencies
Given a set F of functional dependencies, there are certain other functional
dependencies that are logically implied by F.
For example: if A → B and B → C, then we can infer that A → C.
The set of all functional dependencies logically implied by F is the closure of F.
We denote the closure of F by F+.
We can find all of F+ by applying Armstrong’s Axioms:
 if β ⊆ α, then α → β (reflexivity)
 if α → β, then γα → γβ (augmentation)
 if α → β and β → γ, then α → γ (transitivity)
These rules are
 sound (generate only functional dependencies that actually hold) and
 complete (generate all functional dependencies that hold).

Procedure for Computing F+
F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
        then add the resulting functional dependency to F+
until F+ does not change any further
We can further simplify manual computation of F+ by using the following additional
rules.
 If α → β holds and α → γ holds, then α → βγ holds (union)
 If α → βγ holds, then α → β holds and α → γ holds (decomposition)
 If α → β holds and γβ → δ holds, then αγ → δ holds (pseudotransitivity)
The above rules can be inferred from Armstrong’s axioms.
3.5.2 Closure of Attribute Sets
Given a set of attributes α, define the closure of α under F (denoted by α+) as the
set of attributes that are functionally determined by α under F.
Algorithm to compute α+, the closure of α under F
result := α;
while (changes to result) do
    for each β → γ in F do
    begin
        if β ⊆ result then result := result ∪ γ
    end
Uses of attribute closure
There are several uses of the attribute closure algorithm:
 Testing for superkey
 To test if α is a superkey, we compute α+, and check if α+ contains all
attributes of R.
 Testing functional dependencies
 To check if a functional dependency α → β holds (or, in other words, is in
F+), just check if β ⊆ α+.
 That is, we compute α+ by using attribute closure, and then check if it
contains β.
 This is a simple and cheap test, and very useful.
 Computing closure of F
 For each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+, we output a
functional dependency γ → S.
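The attribute closure algorithm and its first two uses translate almost line for line into code. The sketch below is an illustration only: the function names, the representation of F as a list of (left side, right side) set pairs, and the example schema R = (A, B, C, G, H, I) are all assumptions made for the example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    """attrs is a superkey if its closure contains every attribute of the schema."""
    return closure(attrs, fds) >= set(schema)

def implies(fds, lhs, rhs):
    """lhs -> rhs is in F+ exactly when rhs is contained in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)

# Hypothetical schema and dependencies.
R = set("ABCGHI")
F = [
    (set("A"), set("B")),
    (set("A"), set("C")),
    (set("CG"), set("H")),
    (set("CG"), set("I")),
    (set("B"), set("H")),
]

print(sorted(closure("AG", F)))
print(is_superkey("AG", R, F), is_superkey("A", R, F))
print(implies(F, "A", "H"))
```

With these dependencies, AG determines every attribute of R, while A alone determines only A, B, C, and H.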

3.5.3 Canonical Cover
A canonical cover for F is a set of dependencies Fc such that
 F logically implies all dependencies in Fc, and
 Fc logically implies all dependencies in F, and
 No functional dependency in Fc contains an extraneous attribute, and
 Each left side of functional dependency in Fc is unique.
To compute a canonical cover for F
repeat
    Use the union rule to replace any dependencies in F of the form
        α1 → β1 and α1 → β2 with α1 → β1 β2
    Find a functional dependency α → β with an extraneous attribute either in α or
        in β
    If an extraneous attribute is found, delete it from α → β
until F does not change
Computing a Canonical Cover
R = (A, B, C)
F = {A → BC, B → C, A → B, AB → C}
 Combine A → BC and A → B into A → BC
 Set is now {A → BC, B → C, AB → C}
 A is extraneous in AB → C
 Check if the result of deleting A from AB → C is implied by the other
dependencies
 Yes: in fact, B → C is already present!
 Set is now {A → BC, B → C}
 C is extraneous in A → BC
 Check if A → C is logically implied by A → B and the other dependencies
 Yes: using transitivity on A → B and B → C.
 The canonical cover is:
A → B
B → C
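The worked example can be replayed in code. The sketch below is one possible implementation under stated assumptions: dependencies are kept in a dict from left side (a frozenset) to right side (a set), and the function names are made up. It may remove extraneous attributes in a different order than the hand computation, but on this input it reaches the same cover.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, an iterable of (lhs, rhs) set pairs."""
    fds = list(fds)
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def remove_one_extraneous(fc):
    """Remove one extraneous attribute from fc in place; return True if one was found."""
    for lhs in list(fc):
        rhs = fc[lhs]
        # b is extraneous on the right if lhs still determines b without it.
        for b in sorted(rhs):
            reduced = [(l, r - {b} if l == lhs else r) for l, r in fc.items()]
            if b in closure(lhs, reduced):
                fc[lhs] = rhs - {b}
                return True
        # a is extraneous on the left if (lhs - a) already determines rhs under F.
        for a in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, fc.items()):
                del fc[lhs]
                fc[lhs - {a}] = fc.get(lhs - {a}, set()) | rhs
                return True
    return False

def canonical_cover(fds):
    fc = {}
    for lhs, rhs in fds:  # union rule: merge dependencies with the same left side
        fc[frozenset(lhs)] = fc.get(frozenset(lhs), set()) | set(rhs)
    while remove_one_extraneous(fc):
        # Drop dependencies whose right side has become empty.
        fc = {l: r for l, r in fc.items() if r}
    return fc

F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
cover = canonical_cover(F)
for lhs, rhs in cover.items():
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```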
3.5.4 Lossless-join Decomposition
A decomposition of R into R1 and R2 is lossless if, for all legal database
instances, relation r contains the same set of tuples as the result of the
following SQL query:
select * from (select R1 from r) natural join (select R2 from r)
This is stated more concisely in the relational algebra as:
ΠR1(r) ⋈ ΠR2(r) = r
R1 and R2 form a lossless decomposition of R if at least one of the following
functional dependencies is in F+:
R1 ∩ R2 → R1
R1 ∩ R2 → R2

Example
R = (A, B, C)
F = {A → B, B → C}
Can be decomposed in two different ways
R1 = (A, B), R2 = (B, C)
 Lossless-join decomposition:
R1 ∩ R2 = {B} and B → BC
 Dependency preserving
R1 = (A, B), R2 = (A, C)
 Lossless-join decomposition:
R1 ∩ R2 = {A} and A → AB
 Not dependency preserving (B → C cannot be checked without computing R1 ⋈ R2)
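The lossless-join condition can be tested with attribute closure: compute the closure of R1 ∩ R2 and check whether it covers R1 or R2. A minimal sketch, with hypothetical function names and schemas given as strings of single-letter attributes:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure

F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(is_lossless("AB", "BC", F))  # common attribute B
print(is_lossless("AB", "AC", F))  # common attribute A
print(is_lossless("AC", "BC", F))  # common attribute C
```

The first two calls correspond to the two decompositions in the example. The third, (A, C) and (B, C), is an extra case added here to show a decomposition that fails the test, since C determines nothing else.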
3.5.5 Dependency Preservation
Let Fi be the set of dependencies in F+ that include only attributes in Ri.
 A decomposition is dependency preserving if (F1 ∪ F2 ∪ … ∪ Fn)+ = F+
 If it is not, then checking updates for violation of functional dependencies may
require computing joins, which is expensive.
Testing for Dependency Preservation
 To check if a dependency α → β is preserved in a decomposition of R into R1, R2,
…, Rn, we apply the following test (with attribute closure done with respect to F)
result = α
while (changes to result) do
    for each Ri in the decomposition
        t = (result ∩ Ri)+ ∩ Ri
        result = result ∪ t
 If result contains all attributes in β, then the functional dependency α → β is
preserved.
 We apply the test on all dependencies in F to check if a decomposition is
dependency preserving.
 This procedure takes polynomial time, instead of the exponential time required
to compute F+ and (F1 ∪ F2 ∪ … ∪ Fn)+
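The test above can be sketched directly in code. The function names and the representation (schemas as sets, F as a list of set pairs) are assumptions for illustration; the example reuses R = (A, B, C) with F = {A → B, B → C} from Section 3.5.4.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_preserved(alpha, beta, decomposition, fds):
    """Check whether alpha -> beta can be enforced using only the schemas Ri."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            t = closure(result & ri, fds) & ri   # t = (result ∩ Ri)+ ∩ Ri
            if not t <= result:
                result |= t
                changed = True
    return set(beta) <= result

def dependency_preserving(decomposition, fds):
    return all(is_preserved(lhs, rhs, decomposition, fds) for lhs, rhs in fds)

F = [({"A"}, {"B"}), ({"B"}, {"C"})]
first = [set("AB"), set("BC")]
second = [set("AB"), set("AC")]
print(dependency_preserving(first, F), dependency_preserving(second, F))
```

In the second decomposition B → C is lost: B appears only in (A, B), and the closure of B restricted to that schema never reaches C.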
3.6 ALGORITHMS FOR DECOMPOSITION
3.6.1 BCNF Decomposition
3.6.2 3NF Decomposition
3.6.3 Correctness of the 3NF Algorithm
3.6.4 Comparison of BCNF and 3NF
3.6.1 BCNF Decomposition
The definition of BCNF can be used directly to test if a relation is in BCNF.
However, computation of F+ can be a tedious task.

Testing for BCNF
To check if a nontrivial dependency α → β causes a violation of BCNF
1. compute α+ (the attribute closure of α), and
2. verify that it includes all attributes of R, that is, it is a superkey of R.
 Simplified test: To check if a relation schema R is in BCNF, it suffices to check
only the dependencies in the given set F for violation of BCNF, rather than
checking all dependencies in F+.
 If none of the dependencies in F causes a violation of BCNF, then none of the
dependencies in F+ will cause a violation of BCNF either.
However, using only F is incorrect when testing a relation in a decomposition of R.
Consider R = (A, B, C, D, E), with F = {A → B, BC → D}
 Decompose R into R1 = (A, B) and R2 = (A, C, D, E)
 Neither of the dependencies in F contains only attributes from (A, C, D, E), so we
might be misled into thinking R2 satisfies BCNF.
 In fact, dependency AC → D in F+ shows R2 is not in BCNF.
BCNF Decomposition Algorithm
If R is not in BCNF, we can decompose R into a collection of BCNF schemas R1,
R2, . . . , Rn by the algorithm. The algorithm uses dependencies that demonstrate
violation of BCNF to perform the decomposition.
result := {R};
done := false;
compute F+;
while (not done) do
    if (there is a schema Ri in result that is not in BCNF)
    then begin
        let α → β be a nontrivial functional dependency that holds on Ri
            such that α → Ri is not in F+,
            and α ∩ β = ∅;
        result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
    end
    else done := true;
Note: each Ri is in BCNF, and decomposition is lossless-join.
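A small sketch of the decomposition loop follows. Instead of materializing F+, it looks for a violation on each Ri by computing the closure of every proper subset α of Ri, which is exponential in the size of Ri and is meant only for small classroom examples. The function names are assumptions for illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(ri, fds):
    """Return (alpha, beta) with alpha -> beta violating BCNF on ri, or None."""
    attrs = sorted(ri)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            alpha = set(combo)
            alpha_plus = closure(alpha, fds)
            beta = (alpha_plus & ri) - alpha      # nontrivial part that holds on ri
            if beta and not ri <= alpha_plus:     # alpha is not a superkey of ri
                return alpha, beta
    return None

def bcnf_decompose(schema, fds):
    result = [set(schema)]
    done = False
    while not done:
        done = True
        for ri in result:
            violation = find_violation(ri, fds)
            if violation:
                alpha, beta = violation
                result.remove(ri)
                result += [ri - beta, alpha | beta]
                done = False
                break
    return result

# R = (A, B, C, D, E) with F = {A -> B, BC -> D}, as in the testing example.
F = [(set("A"), set("B")), (set("BC"), set("D"))]
schemas = bcnf_decompose("ABCDE", F)
print(sorted("".join(sorted(s)) for s in schemas))
```

On this input the first split uses A → B, and the second uses AC → D, a dependency that is in F+ but not in F.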
3.6.2 3NF Decomposition
There are some situations where a BCNF decomposition is not dependency
preserving, and efficient checking for FD violation on updates is important.
Solution: Define a weaker normal form, called Third Normal Form (3NF)
 Allows some redundancy
 But functional dependencies can be checked on individual relations without
computing a join.
 There is always a lossless-join, dependency-preserving decomposition into 3NF.

Let Fc be a canonical cover for F;
i := 0;
for each functional dependency α → β in Fc do
    if none of the schemas Rj, 1 ≤ j ≤ i contains αβ
    then begin
        i := i + 1;
        Ri := αβ
    end
if none of the schemas Rj, 1 ≤ j ≤ i contains a candidate key for R
then begin
    i := i + 1;
    Ri := any candidate key for R;
end
return (R1, R2, ..., Ri)
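The 3NF algorithm can be sketched as follows. The sketch assumes the canonical cover Fc is already given, and it tests "contains a candidate key" by checking whether the closure of a schema is all of R, which is equivalent. Function names and the two small examples are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_key(schema, fds):
    """Shrink R one attribute at a time while the remainder still determines R."""
    key = set(schema)
    for a in sorted(schema):
        if closure(key - {a}, fds) >= set(schema):
            key -= {a}
    return key

def synthesize_3nf(schema, fc):
    result = []
    for lhs, rhs in fc:
        ri = set(lhs) | set(rhs)
        # Add a schema for alpha beta unless an earlier schema already contains it.
        if not any(ri <= rj for rj in result):
            result.append(ri)
    # A schema contains a candidate key exactly when its closure is all of R.
    if not any(closure(rj, fc) >= set(schema) for rj in result):
        result.append(candidate_key(schema, fc))
    return result

fc1 = [(set("A"), set("B")), (set("B"), set("C"))]
fc2 = [(set("A"), set("B"))]
print(synthesize_3nf("ABC", fc1))   # (A, B) already contains the key A
print(synthesize_3nf("ABCD", fc2))  # a key schema has to be added
```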
3.6.3 Correctness of the 3NF Algorithm
If a relation Ri is in the decomposition generated by the algorithm, then Ri
satisfies 3NF.
 Let Ri be generated from the dependency α → β
 Let γ → B be any nontrivial functional dependency on Ri.
 Now, B can be in either β or α but not in both. Consider each case separately.
Case 1: If B is in β:
 If γ is a superkey, the 2nd condition of 3NF is satisfied.
 Otherwise α must contain some attribute not in γ
 Since γ → B is in F+ it must be derivable from Fc, by using attribute closure on γ.
 Attribute closure cannot have used α → β. If it had been used, α must be contained
in the attribute closure of γ, which is not possible, since we assumed γ is not a
superkey.
 Now, using α → (β – {B}) and γ → B, we can derive α → B (since γ ⊆ αβ, and B ∉ γ
since γ → B is nontrivial)
 Then, B is extraneous in the right-hand side of α → β, which is not possible since
α → β is in Fc.
 Thus, if B is in β then γ must be a superkey, and the second condition of 3NF
must be satisfied.
Case 2: B is in α.
 Since α is a candidate key, the third alternative in the definition of 3NF is trivially
satisfied.
 In fact, we cannot show that γ is a superkey.
 This shows exactly why the third alternative is present in the definition of 3NF.
3.6.4 Comparison of BCNF and 3NF
1. We have seen BCNF and 3NF.
 It is always possible to obtain a 3NF design without sacrificing lossless-join
or dependency-preservation.
 If we do not eliminate all transitive dependencies, we may need to use
null values to represent some of the meaningful relationships.
 Repetition of information occurs.

2. These problems can be illustrated with Banker-schema.
 As banker-name → bname, we may want to express relationships between
a banker and his or her branch.
ENAME  BANKER-NAME  BNAME
Bill   Jhon         SFU
Tom    Jhon         SFU
Mary   Jhon         SFU
Null   Tim          Austin
 This table shows how we must either have a corresponding value for
customer name, or include a null.
 Repetition of information also occurs.
 Every occurrence of the banker's name must be accompanied by the
branch name.
3. If we must choose between BCNF and dependency preservation, it is generally
better to opt for 3NF.
 If we cannot check for dependency preservation efficiently, we either pay
a high price in system performance or risk the integrity of the data.
 The limited amount of redundancy in 3NF is then a lesser evil.
4. To summarize, our goal for a relational database design is
 BCNF.
 Lossless-join.
 Dependency-preservation.
5. If we cannot achieve this, we accept
 3NF
 Lossless-join.
 Dependency-preservation.
6. A final point: there is a price to pay for decomposition. When we decompose a
relation, we have to use natural joins or Cartesian products to put the pieces
back together. This takes computational time.
3.7 DECOMPOSITION USING MULTI-VALUED DEPENDENCIES
3.7.1 Multi-valued Dependencies (MVDs)
3.7.2 Fourth Normal Form
3.7.1 Multi-valued Dependencies (MVDs)
Let R be a relation schema and let α ⊆ R and β ⊆ R. The multi-valued dependency
α →→ β
holds on R if in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that
t1[α] = t2[α], there exist tuples t3 and t4 in r such that:
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R – β] = t2[R – β]
t4[β] = t2[β]
t4[R – β] = t1[R – β]
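The definition of a multi-valued dependency α →→ β can be checked against a concrete instance by constructing the required tuple t3 for every ordered pair (t1, t2); t4 is covered when the pair is visited in the opposite order. The sketch below uses a hypothetical relation (ID, child, phone) and a made-up function name. As with functional dependencies, an instance can refute a dependency but cannot prove it for all legal relations.

```python
def mvd_holds(rows, schema, alpha, beta):
    """Check alpha ->> beta on rows (a list of dicts over the attributes in schema)."""
    rest = [a for a in schema if a not in beta]          # R - beta
    present = {tuple(r[a] for a in schema) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in alpha):
                # t3 takes its beta values from t1 and everything else from t2.
                t3 = {a: t2[a] for a in rest}
                t3.update({a: t1[a] for a in beta})
                if tuple(t3[a] for a in schema) not in present:
                    return False
    return True

schema = ["ID", "child", "phone"]
# Hypothetical data: person 1 has two children and two phone numbers.
full = [
    {"ID": 1, "child": c, "phone": p}
    for c in ("David", "William")
    for p in ("512-555-1234", "512-555-4321")
]
partial = full[:-1]  # drop one combination

print(mvd_holds(full, schema, ["ID"], ["child"]))
print(mvd_holds(partial, schema, ["ID"], ["child"]))
```

When every child is paired with every phone number the dependency ID →→ child is satisfied; removing a single combination violates it.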

Theory of MVDs
From the definition of multivalued dependency, we can derive the following rule:
If α → β, then α →→ β
That is, every functional dependency is also a multivalued dependency.
The closure D+ of D is the set of all functional and multivalued dependencies logically
implied by D.
 We can compute D+ from D, using the formal definitions of functional
dependencies and multivalued dependencies.
 We can manage with such reasoning for very simple multivalued dependencies,
which seem to be most common in practice.
 For complex dependencies, it is better to reason about sets of dependencies
using a system of inference rules.
3.7.2 Fourth Normal Form
A relation schema R is in 4NF with respect to a set D of functional and multivalued
dependencies if for all multivalued dependencies in D+ of the form α →→ β, where
α ⊆ R and β ⊆ R, at least one of the following holds:
 α →→ β is trivial (i.e., β ⊆ α or α ∪ β = R)
 α is a superkey for schema R
If a relation is in 4NF it is in BCNF.
Restriction of Multivalued Dependencies
The restriction of D to Ri is the set Di consisting of
 All functional dependencies in D+ that include only attributes of Ri
 All multivalued dependencies of the form
α →→ (β ∩ Ri)
where α ⊆ Ri and α →→ β is in D+
4NF Decomposition Algorithm
result := {R};
done := false;
compute D+;
Let Di denote the restriction of D+ to Ri
while (not done)
    if (there is a schema Ri in result that is not in 4NF) then
    begin
        let α →→ β be a nontrivial multivalued dependency that holds
            on Ri such that α → Ri is not in Di, and α ∩ β = ∅;
        result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
    end
    else done := true;
Note: each Ri is in 4NF, and decomposition is lossless-join
3.8 MORE NORMAL FORMS
 Join dependencies generalize multivalued dependencies
 and lead to project-join normal form (PJNF) (also called fifth normal form)
 A class of even more general constraints leads to a normal form called
domain-key normal form.
 Problem with these generalized constraints: they are hard to reason with, and no
sound and complete set of inference rules exists. Hence they are rarely used.
3.9 DATABASE DESIGN PROCESS
3.9.1 E-R Model and Normalization
3.9.2 Naming of Attributes and Relationships
3.9.3 Denormalization for Performance
3.9.4 Other Design Issues
We have assumed schema R is given
 R could have been generated when converting ER diagram to a set of tables.
 R could have been a single relation containing all attributes that are of interest
(called universal relation).
 Normalization breaks R into smaller relations.
 R could have been the result of some ad hoc design of relations, which we then
test/convert to normal form.
3.9.1 E-R Model and Normalization
 When an ER diagram is carefully designed, identifying all entities correctly, the
tables generated from the ER diagram should not need further normalization.
 However, in a real (imperfect) design, there can be functional dependencies from
nonkey attributes of an entity to other attributes of the entity.
 Example: an employee entity with attributes department_number and
department_address, and a functional dependency
department_number → department_address
 Good design would have made department an entity.
 Functional dependencies from nonkey attributes of a relationship set are possible,
but rare, since most relationships are binary.
3.9.2 Naming of Attributes and Relationships
A desirable feature of a database design is the unique-role assumption, which
means that each attribute name has a unique meaning in the database.
In large database schemas, relationship sets are often named via a concatenation
of the names of related entity sets, perhaps with an intervening hyphen or underscore.
We have used a few such names, for example inst_sec and student_sec.
3.9.3 Denormalization for Performance
 May want to use a non-normalized schema for performance.
 For example, displaying customer_name along with account_number and balance
requires a join of account with depositor.
Alternative 1: Use a denormalized relation containing the attributes of account as
well as depositor, with all the above attributes
 faster lookup
 extra space and extra execution time for updates
 extra coding work for programmer and possibility of error in extra code
Alternative 2: Use a materialized view defined as
account ⋈ depositor
 Benefits and drawbacks same as above, except no extra coding work for
programmer and avoids possible errors.
3.9.4 Other Design Issues
 Some aspects of database design are not caught by normalization
 Examples of bad database design, to be avoided:
Instead of earnings (company_id, year, amount ), use
 earnings_2004, earnings_2005, earnings_2006, etc., all on the schema
(company_id, earnings).
 Above are in BCNF, but make querying across years difficult
and needs new table each year
 company_year(company_id, earnings_2004, earnings_2005, earnings_2006)
 Also in BCNF, but also makes querying across years difficult and
requires new attribute each year.
 Is an example of a crosstab, where values for one attribute become
column names.
 Used in spreadsheets, and in data analysis tools.
3.10 MODELING TEMPORAL DATA
 Temporal data have an associated time interval during which the data are valid.
 A snapshot is the value of the data at a particular point in time.
 Several proposals to extend the ER model by adding valid time to
 attributes, e.g. address of a customer at different points in time
 entities, e.g. time duration when an account exists
 relationships, e.g. time during which a customer owned an account
 But no accepted standard
 Adding a temporal component results in functional dependencies like
customer_id → customer_street, customer_city
not holding, because the address varies over time
 A temporal functional dependency X → Y holds on schema R if the functional
dependency X → Y holds on all snapshots for all legal instances r(R)
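The difference between an ordinary and a temporal functional dependency can be seen on a small instance. The sketch below assumes each row carries a half-open validity interval [start, end) in hypothetical start and end fields, and all function names are made up. It checks a single instance only, whereas the definition quantifies over all legal instances.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def snapshot(rows, t):
    """Rows that are valid at time t, using half-open intervals [start, end)."""
    return [r for r in rows if r["start"] <= t < r["end"]]

def temporal_fd_holds(rows, lhs, rhs):
    # Two rows valid at the same time are both valid at the later of their
    # start points, so checking the snapshots at start points is sufficient.
    times = {r["start"] for r in rows}
    return all(fd_holds(snapshot(rows, t), lhs, rhs) for t in times)

# Hypothetical address history: the customer moved in 2005.
history = [
    {"customer_id": "C0001", "customer_street": "Main", "start": 2000, "end": 2005},
    {"customer_id": "C0001", "customer_street": "North", "start": 2005, "end": 2010},
]
overlapping = history + [
    {"customer_id": "C0001", "customer_street": "Park", "start": 2004, "end": 2006},
]

print(fd_holds(history, ["customer_id"], ["customer_street"]))
print(temporal_fd_holds(history, ["customer_id"], ["customer_street"]))
print(temporal_fd_holds(overlapping, ["customer_id"], ["customer_street"]))
```

The plain dependency fails on the full history because the address changes, while the temporal version holds as long as no two addresses are valid at the same time.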
