SlideShare a Scribd company logo
CS 222
Database Management System
             Spring 2010-11


           Lecture 5
Database Design (Decomposition)

            Korra Sathya Babu
         Department of Computer Science
                 NIT Rourkela
Recap
•       Design of DB is needed to reduce redundancy and
        anomalies
•       The theory of Functional Dependency is completely
        studied
•       Better Design requires schema refinement
•       A solution for schema refinement is Synthesis of
        relations




    2/11/2013              Database Design                  2
Relation Decomposition




                                 R1



            R-X +        X            X +- X


                    R2
                             R



2/11/2013                Database Design       3
Relation Decomposition
•       Reason for Decomposition
        •       A solution for reducing redundancy and Anomalies


•       Rules for synthesis
        •       Lossless Join (Information Preservation)

        •       Dependency Preservation (a special case of information
                preservation)


•       Decomposition (synthesis) types
        •       By functional dependency
        •       By multi-valued dependency
        •       By Join dependency


    2/11/2013                        Database Design                     4
Lossless Join
•       Definition
                A decomposition D = {R1, R2,..., Rm} of R has the lossless
                join property with respect to the set of dependencies F on
                R if, for every relation r of R that satisfies F, the following
                holds,           ( R1(r), ..., Rm(r)) = r

                where   is the natural join of all the relations in D



•       The word loss in lossless refers to loss of
        information, not to loss of tuples.




    2/11/2013                              Database Design                        5
Test for Lossless Join
Input: A relation R, a decomposition D = {R1, R2,..., Rm} of R, and a set F of
Functional Dependencies

Lossless Join Test Algorithm:
Step 1: Create an initial matrix S with one row i for each relation Ri in D, and one
         column j for each attribute Aj in R.
Step 2: Set S(i, j) := bij for all matrix entries
Step 3: For each row i representing relation schema Ri Do
                           {for each column j representing Aj do
                                    {if relation Ri includes attribute Aj then
                                               set S(i, j) := aj;}
Step 4: Repeat the following loop until a complete loop execution results in no
                           changes to S.




 2/11/2013                             Database Design                                 6
Test for Lossless Join
Lossless Join Test Algorithm: continues…
Step 4: Repeat the following loop until a complete loop execution results in no
                                     changes to S.
         If {for each function dependency X      Y in F do
                  for all rows in S which have the same symbols in the columns
                   corresponding to attributes in X do
                  {make the symbols in each column that correspond to
                  an attribute in Y be the same in all these rows as follows:
                  if any of the rows has an “a” symbol for the column,
                  set the other rows to the same “a” symbol in the column.
                  If no “a” symbol exists for the attribute in any of the
                  rows, choose one of the “b” symbols that appear in one
                  of the rows for the attribute and set the other rows to
                  that same “b” symbol in the column;}}

Step 5: If a row is made up entirely of “a” symbols, then the
                 decomposition has the lossless join property;
                 otherwise it does not.

2/11/2013                         Database Design                                 7
Example 1

    Emp_PROJ


       SSN        PNUM      hours       ENAME         PNAME PLOCATION

     F = {SSN     ENAME, PNUM    {PNAME, PLOCATION}, {SSN, PNUM}   hours}




      R1                            R2
            SSN    ENAME            PNUM PNAME PLOCATION

                      R3
                           SSN      PNUM           hours

2/11/2013                        Database Design                        8
Example 1

              A1     A2     A3         A4        A5        A6
             SSN   ENAME   PNUM      PNAME    PLOCATION   hours

        R1   b11    b12     b13        b14       b15      b16
        R2   b21    b22     b23        b24       b25      b26

        R3   b31    b32     b33        b34       b35      b36




        R1   a1     a2      b13        b14       b15      b16

        R2   b21    b22     a3         a4        a5       b26

        R3   a1     b32     a3         b34       b35      a6




2/11/2013                   Database Design                       9
Example 1
                                         SSN    ENAME
             SSN   ENAME
        R1   a1     a2      b13         b14        b15      b16

        R2   b21    b22     a3          a4         a5       b26

        R3   a1     a2      a3          b34        b35      a6


                                    PNUM       {PNAME, PLOCATION}

                           PNUM     PNAME       PLOCATION
        R1   a1     a2      b13        b14         b15      b16

        R2   b21    b22     a3         a4          a5       b26

        R3   a1     a2      a3         a4          a5       a6



2/11/2013                   Database Design                         10
Example 2

  Emp_PROJ

       SSN       PNUM    hours         ENAME       PNAME PLOCATION
      F = {SSN   ENAME, PNUM   {PNAME, PLOCATION}, {SSN, PNUM}        hours}




 R1                               R2

 ENAME      PLOCATION            SSN    PNUM     hours   PNAME   PLOCATION




2/11/2013                      Database Design                               11
Example 2

              A1     A2     A3         A4        A5        A6
             SSN   ENAME   PNUM      PNAME    PLOCATION   hours

        R1   b11    b12     b13        b14       b15      b16
        R2   b21    b22     b23        b24       b25      b26




        R1   b11    a2      b13        b14       a5       b16
        R2   a1     b22     a3         a4        a5       a6

                                     SSN    ENAME
                                     PNUM    {PNAME, PLOCATION}
                                     {SSN, PNUM}   hours


2/11/2013                   Database Design                       12
Problems
•       Check whether the following decompositions are
        lossy or lossless
        •       Let R=ABCDE, R1=AD, R2=AB, R3=BE, R4=CDE, R5=AE.
                Let F={AC, BC, CD, DEC, CEA}
        •       R(XYZWQ), FD={XZ, YZ, ZW, WQZ, ZQX}.
                R1(XW), R2(XY), R3(YQ), R4(ZWQ), R5(XQ)
        •       R(XYZ), F={XY, ZY}. R1(XY), R2(YZ)
        •       R(XYWZPQ), D={R1(ZPQ), R2(XYZPQ)}
                F={XYW, XWP, PQZ, XYQ}




    2/11/2013                    Database Design                   13
Dependency Preservation
R was decomposed (normalisation) into R1, …, Rn
S - the set of FDs for R
S1, …, Sn - the set of FDs for R1, …, Rn (each Si refers
   to only the attributes of Ri)
S’ = S1 … Sn (usually, S’ S)

the decomposition is dependency preserving if
                                       S’+ = S+




2/11/2013               Database Design                    14
Test for Dependency Preservation
Input: decomposition D={D1,…,Dk} and a set of FDs F
Dependency Preservation Test:
Step 1: For each XY Є F initialize a set T of attributes with the attributes of X (the
determinant of the FD under consideration). ie set T=X and continue with step 2
Step 2: Repeat step 3 until the set T no longer changes. When T no longer
changes continue with step 4
Step 3: For each relation Ri (1≤ i ≤ k) of the input decomposition apply
the corresponding Ri operation (on a set of attributes T with respect to set
of dependencies F). i.e T=T ∩ ((T ∪ Ri)+ ∩ Ri) and repeat step 3
Step 4: Test to see if Y(the right hand side of the FD under consideration)
is such that Y ⊂ T. There are two outcomes to this test. If the answer is
negative. i.e. if Y not a subset of T then stop the execution of the
algorithm and report that the decomposition does not preserve the FD. If
the answer is affirmative, i.e. if Y ⊂ T then XY Є G+. If there are other
FDs in F that need to be considered repeat step 1 with a FD that has not
been considered before. If no more FDs in F then continue with step 4

2/11/2013                            Database Design                                  15
Problems

1. Given R(XYZ) and the set F = {ZX , XYZ}. Check if the
   decomposition R1(XY) and R2(XZ) preserve the set F.
2. Given R(ABCD) and the set F = {AB , CD}. Check if the
   decomposition R1(AB) and R2(CD) preserve the set F.
3. Determine if the decomposition D={R1(XY), R2(YZ), R3(ZW)} of the
   relation R(WXYZ) preserves the dependencies of the set F={XY,
   YZ, ZW, WX}.
4. Given R(ABCDEF) and the set F = {AB , CDF, ACE, DF}. Check
   if the decomposition R1(ACE), R2(CD), R3(DF) and R4(AB) preserve
   the set F.




2/11/2013                   Database Design                       16
Normalization
•       Normalization is the process of successive reduction
        of a given set of relations to a better form (reduced
        redundancy and anomalies)
•       The normalization that one needs to sustain
        depends on the work flow (tradeoff between fast
        access, maintenance of integrity)
•       Assumes that all possible functional dependencies
        are known
        •       First construct a minimal set of FDs
        •       Then apply algorithms that construct a required Normal
                Form
•       Additional criteria may be needed to ensure that the
        set of relations in a relational database are
        atisfactory
    2/11/2013                       Database Design                      17
1 NF
•       A relation is in first normal form (1NF) if it does not
        contain any repeating columns or repeating groups
        of columns
•       It is the process of converting complex data
        structures into more simple, stable data structures
•       A relvar is in 1NF if and only if in every legal value
        of that relvar, every tuple contains exactly one
        value for each attribute
•       First Normal From (1NF)
        •       Unique rows
        •       All attributes are atomic




    2/11/2013                        Database Design          18
2 NF
•       A table is in the second normal form (2NF) if it is in
        the first normal form and if all non-key columns in
        the table depend on the entire primary key
•       The following relation is in 1NF but not 2NF

      EMPLOYEE2(Emp_ID, Name, Dept, Salary, Course, Date_Completed)

      Functional dependencies:
      1. Emp_ID  Name, Dept, Salary                     partial key dependency
      2. Emp_ID, Course  Date_Completed


       Decompose into 2NF
       EMPLOYEE1(Emp_ID, Name, Dept, Salary)
       Functional dependencies: Emp_ID Name, Dept, Salary

       EMPCOURSE(Emp_ID, Course,Date_Completed)
       Functional dependency: Emp_ID, Course  Date_Completed


    2/11/2013                          Database Design                            19
3 NF
•       A table is in the third normal form (3NF) if it is in
        the second normal form and if all non-key columns
        in the table depend non-transitively on the entire
        primary key

      SALES(Customer_ID, Customer_Name, SalesPerson, Region)
      Functional dependencies:
      1. Customer_ID  Customer_Name, SalesPerson, Region
      2. SalesPerson  Region                                   Transitive Dependency



      Decompose into 3NF
      SALES1(Customer_ID, Customer_Name, SalesPerson)
      Functional dependencies: Customer_ID Customer_Name, SalesPerson

      SPERSON(SalesPerson, Region)
      Functional dependency: SalesPerson  Region



    2/11/2013                          Database Design                                  20
BCNF
•       A table is in Boyce-Codd normal form (BCNF) if
        every column, on which some other column is fully
        functionally dependent, is also a candidate for the
        primary key of the table
•       A table is in BCNF if the only determinants in the
        table are the candidate keys
      SCHOOL(Student, Subject, Teacher)
      Functional dependencies:
      1. Student, Subject  Teacher
      2. Student, Teacher  Subject
      3. Teacher  Subject

      Decompose into BCNF
      SCHOOL1(Student, Subject)
      SCHOOL2(Subject, Teacher)

      All Functional Dependencies vanished except TeacherSubject

    2/11/2013                             Database Design           21
Comparison between 3NF and BCNF

•       It is always possible to decompose a relation into
        relations in 3NF such that:
               the decomposition is lossless
               the dependencies are preserved


•       It is always possible to decompose a relation into
        relations in BCNF such that:
               the decomposition is lossless
               but it may not be possible to preserve dependencies
               But may eliminate more redundancy




    2/11/2013                       Database Design                   22
Multivalued Dependency
        Let R be a relation schema and let         R and   R. The
        multivalued dependency

                 holds on R if in any legal relation r(R), for all pairs for
tuples t1 and t2 in r such that t1[ ] = t2 [ ], there exist tuples t3 and t4
in r such that:
                  t1[ ] = t2 [ ] = t3 [ ] = t4 [ ]
                  t3[ ] = t1 [ ]
                  t3[R – ] = t2[R – ]
                  t4 ] = t2[ ]
                  t4[R – ] = t1[R – ]

•       MVD is a tuple generating Dependency

    2/11/2013                    Database Design                           23
4 NF
•       A table is in the fourth normal form (4 NF) if it is in
        BCNF and does not have any independent multi-
        valued parts of the primary key
•       If there are two attributes A and B and for a given
        value of A if there exists multiple values of B, then
        we say that an MVD exists between A and B
•       The normal forms after BCNF are theoretical
        interests




    2/11/2013                Database Design                  24
4 NF
    Student Table
            Student       Subject               Language
            Geeta        Mythology               English
            Geeta        Psychology              English
            Geeta        Mythology               Hindi
            Geeta        Psychology              Hindi
            Shekher      Gardening               English


      Student         Subject
      Student         Language




2/11/2013                     Database Design              25
4 NF

 Split the independent multi-valued components of the
 primary key into two tables
 The primary key is (student subject language)

  Student_Subject Table                     Student_Language Table
        Student       Subject                     Student      Language
            Geeta    Mythology                    Geeta         English
            Geeta    Psychology                   Geeta         Hindi
        Shekher      Gardening                    Shekher       English




      Here we take care of the update anomaly


2/11/2013                       Database Design                           26
Surprise: Loss less Decomposition
•       There exists relations that cannot be nonloss-
        decomposed into two projects, but can be
        decomposed into three or more




    2/11/2013               Database Design              27
Join Dependency
•       Definition: A relation R satisfies the join
        Dependency (JD) *(X,Y,…,Z)
                    iff R is equal to the join of its projects on
        X,Y,..,Z, where X,Y,..,Z are subsets of the set of
        attributes of R.
•       Consider the following Suppliers(S), Parts(P) and Location they
        Supply (L) table
        SPL Table
            S       P    L
                                              S        P    P    L
            S1      P1   L2
                                 ACTUAL       S1       P1   P1   L2
                              DECOMPOSTION
            S1      P2   L1
                                              S1       P2   P2   L1
            S2      P1   L1
                                              S2       P1   P1   L1
            S1      P1   L1



    2/11/2013                        Database Design                      28
Join Dependency

        S    P    L
                                       S        P                 P         L
        S1   P1   L2
                          ACTUAL       S1       P1                P1        L2
                       DECOMPOSTION
        S1   P2   L1
                                       S1       P2                P2        L1
        S2   P1   L1
                                       S2       P1                P1        L1
        S1   P1   L1
                                                      Join

                                                     S       P         L
                                                     S1      P1        L2
                                                     S1      P2        L1
                                                     S2      P1        L1
                                                     S1      P1        L1
                  Spurious                           S2      P1        L2
                  Tuple
2/11/2013                     Database Design                                    29
Join Dependency

        S    P    L
                                       S        P         P        L        L    S
        S1   P1   L2
                       DECOMPOSTION    S1       P1       P1       L2        L2   S1
        S1   P2   L1
                                       S1       P2       P2       L1        L1   S1
        S2   P1   L1
                                       S2       P1       P1       L1        L2   S2
        S1   P1   L1
                                                          Join

                                                     S        P        L
                                                     S1       P1       L2
                                                     S1       P2       L1
                                                     S2       P1       L1
                                                     S1       P1       L1




2/11/2013                     Database Design                                        30
5 NF
•       A table is in fifth normal form (5NF) if it is in the
        fourth normal form and every join dependency in
        the table is implied by the candidate key
•       Its also called as the Project Join Normal Form
        (PJNF)




    2/11/2013                 Database Design                   31
Normalization
             Un-normalized Relation
                                                         Arrange every atomic value in the cell
                                                         (intersection of row and column) of a table

             First Normal Form (1NF)
                                                         Eliminate Partial Dependencies

            Second Normal Form (2NF)
                                                         Eliminate Transitive Dependencies

             Third Normal Form (3NF)
                                                         Make every determinant as a key

            Boyce-Codd Normal Form
                                                         Eliminate Multi-valued Dependencies
                                                         that are not Functional Dependencies
            Fourth Normal Form (4NF)
                                                         Eliminate Join Dependencies that are not
                                                         implied by Candidate keys
             Fifth Normal Form (5NF)
2/11/2013                              Database Design                                                 32
Denormalization
•       Denormalization if a process in which we retain or
        introduce some amount of redundancy for faster
        data access
•       Where there arise tradeoffs




    2/11/2013               Database Design                  33
Summary
•       Normalization helps to reduce redundancy and few
        anomalies
•       The first 3 (1, 2 and 3) normal forms are practical
        but BCNF, 4NF and 5 NF are more of theoretical
        interests
•       Denormalization is done for fast access




    2/11/2013               Database Design                   34

More Related Content

PPTX
Digital image processing
PPSX
Image Enhancement in Spatial Domain
PDF
Design of animation sequence
PPTX
Image compression
PPTX
Artificial Intelligence
PDF
IMAGE COMPRESSION AND DECOMPRESSION SYSTEM
Digital image processing
Image Enhancement in Spatial Domain
Design of animation sequence
Image compression
Artificial Intelligence
IMAGE COMPRESSION AND DECOMPRESSION SYSTEM

What's hot (20)

PPT
Multimodal Biometric Systems
PPTX
Introduction to biometric systems security
PPTX
Real Time Object Tracking
PPTX
Polygon mesh
PPTX
Region based image segmentation
PPTX
Chapter 8 image compression
PPTX
Biometrics ppt
PPSX
Edge Detection and Segmentation
PDF
EMOTION DETECTION USING AI
PPTX
Histogram Specification or Matching Problem
PPTX
SRAVYA
PPTX
Object tracking
PPTX
Image Acquisition and Representation
PPTX
Finger vein technology
PPTX
Project Face Detection
PDF
Backpropagation in RNN and LSTM
PPTX
Image representation
PPTX
Smoothing in Digital Image Processing
PPTX
Object recognition
PPTX
Object Recognition
Multimodal Biometric Systems
Introduction to biometric systems security
Real Time Object Tracking
Polygon mesh
Region based image segmentation
Chapter 8 image compression
Biometrics ppt
Edge Detection and Segmentation
EMOTION DETECTION USING AI
Histogram Specification or Matching Problem
SRAVYA
Object tracking
Image Acquisition and Representation
Finger vein technology
Project Face Detection
Backpropagation in RNN and LSTM
Image representation
Smoothing in Digital Image Processing
Object recognition
Object Recognition
Ad

Viewers also liked (8)

PDF
7 relational database design algorithms and further dependencies
PPT
PHP mysql Aggregate functions
PPT
Unit05 dbms
PPTX
New Perspectives: Access.03
PPTX
Aggregate Function - Database
PPTX
PPT
Database Management System
PPT
functional dependencies with example
7 relational database design algorithms and further dependencies
PHP mysql Aggregate functions
Unit05 dbms
New Perspectives: Access.03
Aggregate Function - Database
Database Management System
functional dependencies with example
Ad

Similar to Dbms4 (20)

PPT
Normalization
PPTX
PDF
Database Design 2009
PPTX
Chapter-9 Normalization
PPT
PPT
Normalization
PDF
3 DATABASE MANAGEMENT SYSTEMS DESIGN PRINCIPLES.pdf
PDF
Introduction to database
PPT
VNSISPL_DBMS_Concepts_ch7
PPSX
Oracle fundamentals and sql
DOC
Sql All Tuts 2009
PPT
For mapping a category whose defining superclass Additional02.ppt
DOC
rdbms-notes
DOC
B T0066
DOC
Bt0066
PDF
Databases
PPTX
Methods of organizing data
PDF
AMIS - Can collections speed up your PL/SQL?
PPTX
Databases
Normalization
Database Design 2009
Chapter-9 Normalization
Normalization
3 DATABASE MANAGEMENT SYSTEMS DESIGN PRINCIPLES.pdf
Introduction to database
VNSISPL_DBMS_Concepts_ch7
Oracle fundamentals and sql
Sql All Tuts 2009
For mapping a category whose defining superclass Additional02.ppt
rdbms-notes
B T0066
Bt0066
Databases
Methods of organizing data
AMIS - Can collections speed up your PL/SQL?
Databases

Dbms4

  • 1. CS 222 Database Management System Spring 2010-11 Lecture 5 Database Design (Decomposition) Korra Sathya Babu Department of Computer Science NIT Rourkela
  • 2. Recap • Design of DB is needed to reduce redundancy and anomalies • The theory of Functional Dependency is completely studied • Better Design requires schema refinement • A solution for schema refinement is Synthesis of relations 2/11/2013 Database Design 2
  • 3. Relation Decomposition R1 R-X + X X +- X R2 R 2/11/2013 Database Design 3
  • 4. Relation Decomposition • Reason for Decomposition • A solution for reducing redundancy and Anomalies • Rules for synthesis • Lossless Join (Information Preservation) • Dependency Preservation (a special case of information preservation) • Decomposition (synthesis) types • By functional dependency • By multi-valued dependency • By Join dependency 2/11/2013 Database Design 4
  • 5. Lossless Join • Definition A decomposition D = {R1, R2,..., Rm} of R has the lossless join property with respect to the set of dependencies F on R if, for every relation r of R that satisfies F, the following holds, ( R1(r), ..., Rm(r)) = r where is the natural join of all the relations in D • The word loss in lossless refers to loss of information, not to loss of tuples. 2/11/2013 Database Design 5
  • 6. Test for Lossless Join Input: A relation R, a decomposition D = {R1, R2,..., Rm} of R, and a set F of Functional Dependencies Lossless Join Test Algorithm: Step 1: Create an initial matrix S with one row i for each relation Ri in D, and one column j for each attribute Aj in R. Step 2: Set S(i, j) := bij for all matrix entries Step 3: For each row i representing relation schema Ri Do {for each column j representing Aj do {if relation Ri includes attribute Aj then set S(i, j) := aj;} Step 4: Repeat the following loop until a complete loop execution results in no changes to S. 2/11/2013 Database Design 6
  • 7. Test for Lossless Join Lossless Join Test Algorithm: continues… Step 4: Repeat the following loop until a complete loop execution results in no changes to S. If {for each function dependency X Y in F do for all rows in S which have the same symbols in the columns corresponding to attributes in X do {make the symbols in each column that correspond to an attribute in Y be the same in all these rows as follows: if any of the rows has an “a” symbol for the column, set the other rows to the same “a” symbol in the column. If no “a” symbol exists for the attribute in any of the rows, choose one of the “b” symbols that appear in one of the rows for the attribute and set the other rows to that same “b” symbol in the column;}} Step 5: If a row is made up entirely of “a” symbols, then the decomposition has the lossless join property; otherwise it does not. 2/11/2013 Database Design 7
  • 8. Example 1 Emp_PROJ SSN PNUM hours ENAME PNAME PLOCATION F = {SSN ENAME, PNUM {PNAME, PLOCATION}, {SSN, PNUM} hours} R1 R2 SSN ENAME PNUM PNAME PLOCATION R3 SSN PNUM hours 2/11/2013 Database Design 8
  • 9. Example 1 A1 A2 A3 A4 A5 A6 SSN ENAME PNUM PNAME PLOCATION hours R1 b11 b12 b13 b14 b15 b16 R2 b21 b22 b23 b24 b25 b26 R3 b31 b32 b33 b34 b35 b36 R1 a1 a2 b13 b14 b15 b16 R2 b21 b22 a3 a4 a5 b26 R3 a1 b32 a3 b34 b35 a6 2/11/2013 Database Design 9
  • 10. Example 1 SSN ENAME SSN ENAME R1 a1 a2 b13 b14 b15 b16 R2 b21 b22 a3 a4 a5 b26 R3 a1 a2 a3 b34 b35 a6 PNUM {PNAME, PLOCATION} PNUM PNAME PLOCATION R1 a1 a2 b13 b14 b15 b16 R2 b21 b22 a3 a4 a5 b26 R3 a1 a2 a3 a4 a5 a6 2/11/2013 Database Design 10
  • 11. Example 2 Emp_PROJ SSN PNUM hours ENAME PNAME PLOCATION F = {SSN ENAME, PNUM {PNAME, PLOCATION}, {SSN, PNUM} hours} R1 R2 ENAME PLOCATION SSN PNUM hours PNAME PLOCATION 2/11/2013 Database Design 11
  • 12. Example 2 A1 A2 A3 A4 A5 A6 SSN ENAME PNUM PNAME PLOCATION hours R1 b11 b12 b13 b14 b15 b16 R2 b21 b22 b23 b24 b25 b26 R1 b11 a2 b13 b14 a5 b16 R2 a1 b22 a3 a4 a5 a6 SSN ENAME PNUM {PNAME, PLOCATION} {SSN, PNUM} hours 2/11/2013 Database Design 12
  • 13. Problems • Check whether the following decompositions are lossy or lossless • Let R=ABCDE, R1=AD, R2=AB, R3=BE, R4=CDE, R5=AE. Let F={AC, BC, CD, DEC, CEA} • R(XYZWQ), FD={XZ, YZ, ZW, WQZ, ZQX}. R1(XW), R2(XY), R3(YQ), R4(ZWQ), R5(XQ) • R(XYZ), F={XY, ZY}. R1(XY), R2(YZ) • R(XYWZPQ), D={R1(ZPQ), R2(XYZPQ)} F={XYW, XWP, PQZ, XYQ} 2/11/2013 Database Design 13
  • 14. Dependency Preservation R was decomposed (normalisation) into R1, …, Rn S - the set of FDs for R S1, …, Sn - the set of FDs for R1, …, Rn (each Si refers to only the attributes of Ri) S’ = S1 … Sn (usually, S’ S) the decomposition is dependency preserving if S’+ = S+ 2/11/2013 Database Design 14
  • 15. Test for Dependency Preservation Input: decomposition D={D1,…,Dk} and a set of FDs F Dependency Preservation Test: Step 1: For each XY Є F initialize a set T of attributes with the attributes of X (the determinant of the FD under consideration). ie set T=X and continue with step 2 Step 2: Repeat step 3 until the set T no longer changes. When T no longer changes continue with step 4 Step 3: For each relation Ri (1≤ i ≤ k) of the input decomposition apply the corresponding Ri operation (on a set of attributes T with respect to set of dependencies F). i.e T=T ∩ ((T ∪ Ri)+ ∩ Ri) and repeat step 3 Step 4: Test to see if Y(the right hand side of the FD under consideration) is such that Y ⊂ T. There are two outcomes to this test. If the answer is negative. i.e. if Y not a subset of T then stop the execution of the algorithm and report that the decomposition does not preserve the FD. If the answer is affirmative, i.e. if Y ⊂ T then XY Є G+. If there are other FDs in F that need to be considered repeat step 1 with a FD that has not been considered before. If no more FDs in F then continue with step 4 2/11/2013 Database Design 15
  • 16. Problems 1. Given R(XYZ) and the set F = {ZX , XYZ}. Check if the decomposition R1(XY) and R2(XZ) preserve the set F. 2. Given R(ABCD) and the set F = {AB , CD}. Check if the decomposition R1(AB) and R2(CD) preserve the set F. 3. Determine if the decomposition D={R1(XY), R2(YZ), R3(ZW)} of the relation R(WXYZ) preserves the dependencies of the set F={XY, YZ, ZW, WX}. 4. Given R(ABCDEF) and the set F = {AB , CDF, ACE, DF}. Check if the decomposition R1(ACE), R2(CD), R3(DF) and R4(AB) preserve the set F. 2/11/2013 Database Design 16
  • 17. Normalization • Normalization is the process of successive reduction of a given set of relations to a better form (reduced redundancy and anomalies) • The normalization that one needs to sustain depends on the work flow (tradeoff between fast access, maintenance of integrity) • Assumes that all possible functional dependencies are known • First construct a minimal set of FDs • Then apply algorithms that construct a required Normal Form • Additional criteria may be needed to ensure that the set of relations in a relational database are atisfactory 2/11/2013 Database Design 17
  • 18. 1 NF • A relation is in first normal form (1NF) if it does not contain any repeating columns or repeating groups of columns • It is the process of converting complex data structures into more simple, stable data structures • A relvar is in 1NF if and only if in every legal value of that relvar, every tuple contains exactly one value for each attribute • First Normal From (1NF) • Unique rows • All attributes are atomic 2/11/2013 Database Design 18
  • 19. 2 NF • A table is in the second normal form (2NF) if it is in the first normal form and if all non-key columns in the table depend on the entire primary key • The following relation is in 1NF but not 2NF EMPLOYEE2(Emp_ID, Name, Dept, Salary, Course, Date_Completed) Functional dependencies: 1. Emp_ID  Name, Dept, Salary partial key dependency 2. Emp_ID, Course  Date_Completed Decompose into 2NF EMPLOYEE1(Emp_ID, Name, Dept, Salary) Functional dependencies: Emp_ID Name, Dept, Salary EMPCOURSE(Emp_ID, Course,Date_Completed) Functional dependency: Emp_ID, Course  Date_Completed 2/11/2013 Database Design 19
  • 20. 3 NF • A table is in the third normal form (3NF) if it is in the second normal form and if all non-key columns in the table depend non-transitively on the entire primary key SALES(Customer_ID, Customer_Name, SalesPerson, Region) Functional dependencies: 1. Customer_ID  Customer_Name, SalesPerson, Region 2. SalesPerson  Region Transitive Dependency Decompose into 3NF SALES1(Customer_ID, Customer_Name, SalesPerson) Functional dependencies: Customer_ID Customer_Name, SalesPerson SPERSON(SalesPerson, Region) Functional dependency: SalesPerson  Region 2/11/2013 Database Design 20
  • 21. BCNF • A table is in Boyce-Codd normal form (BCNF) if every column, on which some other column is fully functionally dependent, is also a candidate for the primary key of the table • A table is in BCNF if the only determinants in the table are the candidate keys SCHOOL(Student, Subject, Teacher) Functional dependencies: 1. Student, Subject  Teacher 2. Student, Teacher  Subject 3. Teacher  Subject Decompose into BCNF SCHOOL1(Student, Subject) SCHOOL2(Subject, Teacher) All Functional Dependencies vanished except TeacherSubject 2/11/2013 Database Design 21
  • 22. Comparison between 3NF and BCNF • It is always possible to decompose a relation into relations in 3NF such that:  the decomposition is lossless  the dependencies are preserved • It is always possible to decompose a relation into relations in BCNF such that:  the decomposition is lossless  but it may not be possible to preserve dependencies  But may eliminate more redundancy 2/11/2013 Database Design 22
  • 23. Multivalued Dependency Let R be a relation schema and let R and R. The multivalued dependency holds on R if in any legal relation r(R), for all pairs for tuples t1 and t2 in r such that t1[ ] = t2 [ ], there exist tuples t3 and t4 in r such that: t1[ ] = t2 [ ] = t3 [ ] = t4 [ ] t3[ ] = t1 [ ] t3[R – ] = t2[R – ] t4 ] = t2[ ] t4[R – ] = t1[R – ] • MVD is a tuple generating Dependency 2/11/2013 Database Design 23
  • 24. 4 NF • A table is in the fourth normal form (4 NF) if it is in BCNF and does not have any independent multi- valued parts of the primary key • If there are two attributes A and B and for a given value of A if there exists multiple values of B, then we say that an MVD exists between A and B • The normal forms after BCNF are theoretical interests 2/11/2013 Database Design 24
  • 25. 4 NF Student Table Student Subject Language Geeta Mythology English Geeta Psychology English Geeta Mythology Hindi Geeta Psychology Hindi Shekher Gardening English Student Subject Student Language 2/11/2013 Database Design 25
  • 26. 4 NF Split the independent multi-valued components of the primary key into two tables The primary key is (student subject language) Student_Subject Table Student_Language Table Student Subject Student Language Geeta Mythology Geeta English Geeta Psychology Geeta Hindi Shekher Gardening Shekher English Here we take care of the update anomaly 2/11/2013 Database Design 26
  • 27. Surprise: Loss less Decomposition • There exists relations that cannot be nonloss- decomposed into two projects, but can be decomposed into three or more 2/11/2013 Database Design 27
  • 28. Join Dependency • Definition: A relation R satisfies the join Dependency (JD) *(X,Y,…,Z) iff R is equal to the join of its projects on X,Y,..,Z, where X,Y,..,Z are subsets of the set of attributes of R. • Consider the following Suppliers(S), Parts(P) and Location they Supply (L) table SPL Table S P L S P P L S1 P1 L2 ACTUAL S1 P1 P1 L2 DECOMPOSTION S1 P2 L1 S1 P2 P2 L1 S2 P1 L1 S2 P1 P1 L1 S1 P1 L1 2/11/2013 Database Design 28
  • 29. Join Dependency S P L S P P L S1 P1 L2 ACTUAL S1 P1 P1 L2 DECOMPOSTION S1 P2 L1 S1 P2 P2 L1 S2 P1 L1 S2 P1 P1 L1 S1 P1 L1 Join S P L S1 P1 L2 S1 P2 L1 S2 P1 L1 S1 P1 L1 Spurious S2 P1 L2 Tuple 2/11/2013 Database Design 29
  • 30. Join Dependency S P L S P P L L S S1 P1 L2 DECOMPOSTION S1 P1 P1 L2 L2 S1 S1 P2 L1 S1 P2 P2 L1 L1 S1 S2 P1 L1 S2 P1 P1 L1 L2 S2 S1 P1 L1 Join S P L S1 P1 L2 S1 P2 L1 S2 P1 L1 S1 P1 L1 2/11/2013 Database Design 30
  • 31. 5 NF • A table is in fifth normal form (5NF) if it is in the fourth normal form and every join dependency in the table is implied by the candidate key • Its also called as the Project Join Normal Form (PJNF) 2/11/2013 Database Design 31
  • 32. Normalization Un-normalized Relation Arrange every atomic value in the cell (intersection of row and column) of a table First Normal Form (1NF) Eliminate Partial Dependencies Second Normal Form (2NF) Eliminate Transitive Dependencies Third Normal Form (3NF) Make every determinant as a key Boyce-Codd Normal Form Eliminate Multi-valued Dependencies that are not Functional Dependencies Fourth Normal Form (4NF) Eliminate Join Dependencies that are not implied by Candidate keys Fifth Normal Form (5NF) 2/11/2013 Database Design 32
  • 33. Denormalization • Denormalization if a process in which we retain or introduce some amount of redundancy for faster data access • Where there arise tradeoffs 2/11/2013 Database Design 33
  • 34. Summary • Normalization helps to reduce redundancy and few anomalies • The first 3 (1, 2 and 3) normal forms are practical but BCNF, 4NF and 5 NF are more of theoretical interests • Denormalization is done for fast access 2/11/2013 Database Design 34