McGraw-Hill/Irwin Copyright © 2007 by The McGraw-Hill Companies, Inc. All rights reserved.
Chapter 7
Normalization of Relational Tables
7-2
Outline
• Modification anomalies
• Functional dependencies
• Major normal forms
• Relationship independence
• Practical concerns
7-3
Modification Anomalies
• Unexpected side effect
• Insert, modify, and delete more data than desired
• Caused by excessive redundancies
• Strive for one fact in one place
7-4
Big University Database Table
StdSSN  StdClass  OfferNo  OffYear  EnrGrade  CourseNo  CrsDesc
S1      JUN       O1       2006     3.5       C1        DB
S1      JUN       O2       2006     3.3       C2        VB
S2      JUN       O3       2006     3.1       C3        OO
S2      JUN       O2       2006     3.4       C2        VB
7-5
Modification Anomaly Examples
• Insertion
  - Insert more column data than desired
  - Must know student number and offering number to insert a new course
• Update
  - Change multiple rows to change one fact
  - Must change two rows to change the student class of student S1
• Deletion
  - Deleting a row causes other facts to disappear
  - Deleting the enrollment of student S2 in offering O3 causes loss of information about offering O3 and course C3
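To make the anomalies concrete, here is a minimal Python sketch, assuming the table from Slide 7-4 is held as a list of dictionaries. The helper names `rows_to_update` and `delete_enrollment` are hypothetical, chosen only for illustration. The sketch counts the rows that must change to update student S1's class and shows that deleting S2's enrollment in O3 removes the only record of course C3.

```python
# Big University Database Table from Slide 7-4: one row per enrollment.
COLS = ["StdSSN", "StdClass", "OfferNo", "OffYear", "EnrGrade", "CourseNo", "CrsDesc"]
rows = [
    dict(zip(COLS, values))
    for values in [
        ("S1", "JUN", "O1", 2006, 3.5, "C1", "DB"),
        ("S1", "JUN", "O2", 2006, 3.3, "C2", "VB"),
        ("S2", "JUN", "O3", 2006, 3.1, "C3", "OO"),
        ("S2", "JUN", "O2", 2006, 3.4, "C2", "VB"),
    ]
]


def rows_to_update(table, std_ssn):
    """Update anomaly: number of rows that store one student's class."""
    return sum(1 for r in table if r["StdSSN"] == std_ssn)


def delete_enrollment(table, std_ssn, offer_no):
    """Deletion anomaly: remove one enrollment and return the remaining rows."""
    return [r for r in table
            if not (r["StdSSN"] == std_ssn and r["OfferNo"] == offer_no)]


remaining = delete_enrollment(rows, "S2", "O3")
print(rows_to_update(rows, "S1"))                  # 2
print("C3" in {r["CourseNo"] for r in remaining})  # False: course C3 is gone
```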
7-6
Functional Dependencies
• Constraint on the possible rows in a table
• Value neutral, like FKs and PKs: no specific values are mentioned
• Asserted
• Understand business rules
7-7
FD Definition
• X → Y
  - X (functionally) determines Y
  - X: left-hand side (LHS) or determinant
• For each X value, there is at most one Y value
• Similar to candidate keys
7-8
FD Diagrams and Lists
[FD diagram over columns StdSSN, StdCity, StdClass, OfferNo, OffTerm, OffYear, EnrGrade, CourseNo, CrsDesc]
StdSSN → StdCity, StdClass
OfferNo → OffTerm, OffYear, CourseNo, CrsDesc
CourseNo → CrsDesc
StdSSN, OfferNo → EnrGrade
7-9
FDs in Data
• Prove nonexistence (but not existence) of an FD by looking at data
• Two rows that have the same X value but a different Y value disprove X → Y
StdSSN  StdClass  OfferNo  OffYear  EnrGrade  CourseNo  CrsDesc
S1      JUN       O1       2006     3.5       C1        DB
S1      JUN       O2       2006     3.3       C2        VB
S2      JUN       O3       2006     3.1       C3        OO
S2      JUN       O2       2006     3.4       C2        VB
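The data check can be sketched in Python with a hypothetical helper `fd_violations` that lists pairs of rows agreeing on the LHS but differing on the RHS, using the four rows from the slide. Because a sample can only disprove an FD, an empty result means "no contradiction found in this sample", not that the FD holds.

```python
from itertools import combinations


def fd_violations(rows, lhs, rhs):
    """Return 1-based row-number pairs that agree on lhs but differ on rhs.

    A non-empty result proves lhs -> rhs does not hold; an empty result
    only means this sample does not contradict it.
    """
    def project(row, cols):
        return tuple(row[c] for c in cols)

    return [
        (i + 1, j + 1)
        for (i, r1), (j, r2) in combinations(enumerate(rows), 2)
        if project(r1, lhs) == project(r2, lhs)
        and project(r1, rhs) != project(r2, rhs)
    ]


cols = ["StdSSN", "StdClass", "OfferNo", "OffYear", "EnrGrade", "CourseNo", "CrsDesc"]
data = [
    ("S1", "JUN", "O1", 2006, 3.5, "C1", "DB"),
    ("S1", "JUN", "O2", 2006, 3.3, "C2", "VB"),
    ("S2", "JUN", "O3", 2006, 3.1, "C3", "OO"),
    ("S2", "JUN", "O2", 2006, 3.4, "C2", "VB"),
]
rows = [dict(zip(cols, values)) for values in data]

print(fd_violations(rows, ["OfferNo"], ["StdSSN"]))  # [(2, 4)]
print(fd_violations(rows, ["StdSSN"], ["OfferNo"]))  # [(1, 2), (3, 4)]
print(fd_violations(rows, ["StdSSN"], ["OffYear"]))  # [] (no contradiction here)
```

To disprove StdSSN → OffYear, you would have to add a row, for example enrolling S1 in an offering from a different year.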
7-10
Identifying FDs
• Easy identification
  - Statements about uniqueness
  - PKs and CKs resulting from ERD conversion
  - 1-M relationship: FD from child to parent
• Difficult identification
  - LHS is not a PK or CK in a converted table
  - LHS is part of a combined primary or candidate key
• Ensure minimality of LHS
7-11
Normalization
• Process of removing unwanted redundancies
• Apply normal forms
  - Identify FDs
  - Determine whether FDs meet normal form
  - Split the table to meet the normal form if there is a violation
7-12
Relationships of Normal Forms
[Figure: nested normal forms, from least to most restrictive: 1NF, 2NF, 3NF/BCNF, 4NF, 5NF, DKNF]
7-13
1NF
• Starting point for most relational DBMSs
• No repeating groups: flat rows
StdSSN  StdClass  OfferNo  OffYear  EnrGrade  CourseNo  CrsDesc
S1      JUN       O1       2006     3.5       C1        DB
                  O2       2006     3.3       C2        VB
S2      JUN       O3       2006     3.1       C3        OO
                  O2       2006     3.4       C2        VB
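Converting to 1NF means flattening: each element of a repeating group becomes its own row, and the implied outer values (StdSSN, StdClass) are repeated. A minimal Python sketch, assuming the repeating group is stored as a nested list under a hypothetical key `Enrollments`:

```python
# Unnormalized rows: each student row holds a repeating group of enrollments.
nested = [
    {"StdSSN": "S1", "StdClass": "JUN", "Enrollments": [
        {"OfferNo": "O1", "OffYear": 2006, "EnrGrade": 3.5, "CourseNo": "C1", "CrsDesc": "DB"},
        {"OfferNo": "O2", "OffYear": 2006, "EnrGrade": 3.3, "CourseNo": "C2", "CrsDesc": "VB"},
    ]},
    {"StdSSN": "S2", "StdClass": "JUN", "Enrollments": [
        {"OfferNo": "O3", "OffYear": 2006, "EnrGrade": 3.1, "CourseNo": "C3", "CrsDesc": "OO"},
        {"OfferNo": "O2", "OffYear": 2006, "EnrGrade": 3.4, "CourseNo": "C2", "CrsDesc": "VB"},
    ]},
]


def flatten(rows, group_key):
    """Produce 1NF rows: one flat row per repeating-group element,
    repeating the outer (implied) values in every new row."""
    flat = []
    for row in rows:
        outer = {k: v for k, v in row.items() if k != group_key}
        for inner in row[group_key]:
            flat.append({**outer, **inner})
    return flat


flat = flatten(nested, "Enrollments")
for r in flat:
    print(r["StdSSN"], r["StdClass"], r["OfferNo"], r["EnrGrade"])
```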
7-14
Combined Definition of 2NF/3NF
• Key column: candidate key or part of a candidate key
• Analogy to the traditional justice oath
• Every nonkey column depends on all candidate keys, whole candidate keys, and nothing but candidate keys
• Usually taught as separate definitions
7-15
2NF
• Every nonkey column depends on all candidate keys, not a subset of any candidate key
• Violations
  - Part of key → nonkey
  - Violations only for combined keys
7-16
2NF Example
• Many violations for the big university database table
  - StdSSN → StdCity, StdClass
  - OfferNo → OffTerm, OffYear, CourseNo, CrsDesc
• Splitting the table
  - UnivTable1 (StdSSN, StdCity, StdClass)
  - UnivTable2 (OfferNo, OffTerm, OffYear, CourseNo, CrsDesc)
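The 2NF test can be sketched in Python, assuming the FDs from Slide 7-8 are given as (LHS, RHS) pairs of column sets and that (StdSSN, OfferNo) is the table's only candidate key. The hypothetical helper `partial_dependencies` flags FDs whose LHS is a proper subset of a candidate key and whose RHS contains nonkey columns. CourseNo → CrsDesc is not flagged because CourseNo is not part of a key; it violates 3NF instead.

```python
def partial_dependencies(fds, candidate_keys):
    """Return FDs that violate 2NF: the LHS is a proper subset of some
    candidate key and the RHS contains at least one nonkey column."""
    key_cols = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        nonkey_rhs = rhs - key_cols
        if nonkey_rhs and any(lhs < key for key in candidate_keys):
            violations.append((sorted(lhs), sorted(nonkey_rhs)))
    return violations


fds = [
    (frozenset({"StdSSN"}), frozenset({"StdCity", "StdClass"})),
    (frozenset({"OfferNo"}), frozenset({"OffTerm", "OffYear", "CourseNo", "CrsDesc"})),
    (frozenset({"CourseNo"}), frozenset({"CrsDesc"})),
    (frozenset({"StdSSN", "OfferNo"}), frozenset({"EnrGrade"})),
]
keys = [frozenset({"StdSSN", "OfferNo"})]

for lhs, rhs in partial_dependencies(fds, keys):
    print(lhs, "->", rhs)
```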
7-17
3NF
• Every nonkey column depends only on candidate keys, not on nonkey columns
• Violations: Nonkey → Nonkey
• Alternative formulation
  - No transitive FDs
  - A → B, B → C then A → C
  - OfferNo → CourseNo, CourseNo → CrsDesc then OfferNo → CrsDesc
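Transitive FDs can be found mechanically with the attribute closure: start from the LHS and keep adding the RHS of every FD whose LHS is already covered. A minimal Python sketch (the function name `closure` is our own choice):

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs.

    Repeatedly applies every FD whose LHS is already in the result, so chains
    such as A -> B, B -> C are followed automatically (transitivity).
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [({"OfferNo"}, {"CourseNo"}), ({"CourseNo"}, {"CrsDesc"})]
offer_closure = closure({"OfferNo"}, fds)
print(sorted(offer_closure))       # ['CourseNo', 'CrsDesc', 'OfferNo']
print("CrsDesc" in offer_closure)  # True: OfferNo -> CrsDesc is derived
```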
7-18
3NF Example
• One violation in UnivTable2
  - CourseNo → CrsDesc
• Splitting the table
  - UnivTable2-1 (OfferNo, OffTerm, OffYear, CourseNo)
  - UnivTable2-2 (CourseNo, CrsDesc)
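Splitting should not lose information: the natural join of the smaller tables should give back the original rows. A minimal Python sketch with hypothetical helpers `project` and `natural_join`, using the offering data from Slide 7-4 (OffTerm omitted, as on that slide) plus one hypothetical offering, O4 of course C2, so that the redundant CrsDesc value is visible:

```python
def project(rows, cols):
    """Relational projection: keep only cols and drop duplicate rows."""
    out = []
    for r in rows:
        t = {c: r[c] for c in cols}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Combine rows that agree on every column the two tables share."""
    common = set(left[0]) & set(right[0])
    return [{**a, **b} for a in left for b in right
            if all(a[c] == b[c] for c in common)]


def as_set(rows):
    return {frozenset(r.items()) for r in rows}


univ2 = [
    {"OfferNo": "O1", "OffYear": 2006, "CourseNo": "C1", "CrsDesc": "DB"},
    {"OfferNo": "O2", "OffYear": 2006, "CourseNo": "C2", "CrsDesc": "VB"},
    {"OfferNo": "O3", "OffYear": 2006, "CourseNo": "C3", "CrsDesc": "OO"},
    {"OfferNo": "O4", "OffYear": 2007, "CourseNo": "C2", "CrsDesc": "VB"},  # hypothetical
]
t21 = project(univ2, ["OfferNo", "OffYear", "CourseNo"])  # UnivTable2-1
t22 = project(univ2, ["CourseNo", "CrsDesc"])             # UnivTable2-2

print(len(t22))                                          # 3: VB is stored once
print(as_set(natural_join(t21, t22)) == as_set(univ2))   # True: nothing lost or added
```

The join is lossless here because the shared column, CourseNo, is the key of UnivTable2-2.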
7-19
BCNF
• Every determinant must be a candidate key
• Simpler definition
• Apply with simple synthesis procedure
• Special cases not covered by 3NF
  - Part of key → Part of key
  - Nonkey → Part of key
  - Special cases are not common
7-20
BCNF Example
• Primary key: (OfferNo, StdSSN)
• Many violations for the big university database table
  - StdSSN → StdCity, StdClass
  - OfferNo → OffTerm, OffYear, CourseNo
  - CourseNo → CrsDesc
• Split into four tables
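The BCNF test can be automated with the attribute closure. This minimal Python sketch uses the standard equivalent check that the determinant of each nontrivial FD is a superkey, that is, its closure covers every column of the table; `closure` and `bcnf_violations` are hypothetical helper names. On the big university table it flags the three FD groups listed above, leaving only the (StdSSN, OfferNo) group.

```python
def closure(attrs, fds):
    """Columns determined by attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(columns, fds):
    """Nontrivial FDs whose determinant is not a superkey of the table."""
    return [
        (sorted(lhs), sorted(rhs))
        for lhs, rhs in fds
        if not rhs <= lhs and not set(columns) <= closure(lhs, fds)
    ]


columns = ["StdSSN", "StdCity", "StdClass", "OfferNo", "OffTerm",
           "OffYear", "EnrGrade", "CourseNo", "CrsDesc"]
fds = [
    ({"StdSSN"}, {"StdCity", "StdClass"}),
    ({"OfferNo"}, {"OffTerm", "OffYear", "CourseNo"}),
    ({"CourseNo"}, {"CrsDesc"}),
    ({"StdSSN", "OfferNo"}, {"EnrGrade"}),
]
for lhs, rhs in bcnf_violations(columns, fds):
    print(lhs, "->", rhs)
```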
7-21
Simple Synthesis Procedure
1. Eliminate extraneous columns from the LHSs.
2. Remove derived FDs.
3. Arrange the FDs into groups with each group having the same determinant.
4. For each FD group, make a table with the determinant as the primary key.
5. Merge tables in which one table contains all columns of the other table.
7-22
Simple Synthesis Example I
• Begin with FDs shown in Slide 8
• Step 1: no extraneous columns
• Step 2: eliminate OfferNo → CrsDesc
• Step 3: already arranged by LHS
• Step 4: four tables (Student, Enrollment, Course, Offering)
• Step 5: no redundant tables
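The five steps can be sketched in Python as below. This is a simplified illustration, not a CASE-tool algorithm: it splits each FD into one-column right-hand sides so that a single derived FD (such as OfferNo → CrsDesc) can be removed in Step 2, uses the attribute closure for Steps 1 and 2, and in Step 5 simply drops a table whose columns are contained in another table, without emitting the UNIQUE constraint for the absorbed key. All function names are hypothetical.

```python
from collections import defaultdict


def closure(attrs, fds):
    """Columns determined by attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def simple_synthesis(fds):
    """Return (primary_key, columns) pairs, one per resulting table."""
    # One RHS column per FD, so that a single derived FD can be dropped.
    work = [(frozenset(lhs), frozenset({c})) for lhs, rhs in fds for c in rhs]

    # Step 1: eliminate extraneous LHS columns.
    minimized = []
    for lhs, rhs in work:
        lhs = set(lhs)
        for c in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {c}, work):
                lhs.discard(c)
        minimized.append((frozenset(lhs), rhs))
    minimized = list(dict.fromkeys(minimized))  # drop duplicates, keep order

    # Step 2: remove FDs derivable from the remaining FDs.
    cover = list(minimized)
    for fd in minimized:
        rest = [g for g in cover if g != fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest

    # Step 3: group FDs by determinant.
    groups = defaultdict(set)
    for lhs, rhs in cover:
        groups[lhs] |= rhs

    # Step 4: one table per group, with the determinant as primary key.
    tables = [(lhs, frozenset(lhs | rhs)) for lhs, rhs in groups.items()]

    # Step 5: drop a table whose columns all appear in another table.
    kept = []
    for i, (pk, cols) in enumerate(tables):
        absorbed = any(cols < other or (cols == other and j < i)
                       for j, (_, other) in enumerate(tables) if j != i)
        if not absorbed:
            kept.append((pk, cols))
    return kept


university_fds = [
    ({"StdSSN"}, {"StdCity", "StdClass"}),
    ({"OfferNo"}, {"OffTerm", "OffYear", "CourseNo", "CrsDesc"}),
    ({"CourseNo"}, {"CrsDesc"}),
    ({"StdSSN", "OfferNo"}, {"EnrGrade"}),
]
tables = simple_synthesis(university_fds)
for pk, cols in tables:
    print(sorted(pk), sorted(cols - pk))
```

On these FDs the sketch yields the four tables from the slide; CrsDesc ends up only in the CourseNo table because OfferNo → CrsDesc is removed in Step 2.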
7-23
Simple Synthesis Example II
• AuthNo → AuthName, AuthEmail, AuthAddress
• AuthEmail → AuthNo
• PaperNo → Primary-AuthNo, Title, Abstract, Status
• RevNo → RevName, RevEmail, RevAddress
• RevEmail → RevNo
• RevNo, PaperNo → Auth-Comm, Prog-Comm, Date, Rating1, Rating2, Rating3, Rating4, Rating5
7-24
Simple Synthesis Example II Solution
• Author(AuthNo, AuthName, AuthEmail, AuthAddress)
  UNIQUE (AuthEmail)
• Paper(PaperNo, Primary-Auth, Title, Abstract, Status)
  FOREIGN KEY (Primary-Auth) REFERENCES Author
• Reviewer(RevNo, RevName, RevEmail, RevAddress)
  UNIQUE (RevEmail)
• Review(PaperNo, RevNo, Auth-Comm, Prog-Comm, Date, Rating1, Rating2, Rating3, Rating4, Rating5)
  FOREIGN KEY (PaperNo) REFERENCES Paper
  FOREIGN KEY (RevNo) REFERENCES Reviewer
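The solution can be written as SQL DDL. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The hyphenated names are renamed to plain identifiers as an assumption (Primary-Auth → PrimaryAuthNo, Auth-Comm → AuthComm, Prog-Comm → ProgComm, Date → ReviewDate), and the column types, sample rows, and the helper `try_insert` are hypothetical. It checks that the UNIQUE constraint rejects a duplicate author email and that the foreign key rejects a paper whose primary author does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Author (
    AuthNo      INTEGER PRIMARY KEY,
    AuthName    TEXT,
    AuthEmail   TEXT NOT NULL UNIQUE,   -- candidate key merged in Step 5
    AuthAddress TEXT
);
CREATE TABLE Paper (
    PaperNo       INTEGER PRIMARY KEY,
    PrimaryAuthNo INTEGER REFERENCES Author (AuthNo),
    Title TEXT, Abstract TEXT, Status TEXT
);
CREATE TABLE Reviewer (
    RevNo INTEGER PRIMARY KEY, RevName TEXT,
    RevEmail TEXT NOT NULL UNIQUE,      -- candidate key merged in Step 5
    RevAddress TEXT
);
CREATE TABLE Review (
    PaperNo INTEGER REFERENCES Paper (PaperNo),
    RevNo   INTEGER REFERENCES Reviewer (RevNo),
    AuthComm TEXT, ProgComm TEXT, ReviewDate TEXT,
    Rating1 INTEGER, Rating2 INTEGER, Rating3 INTEGER,
    Rating4 INTEGER, Rating5 INTEGER,
    PRIMARY KEY (PaperNo, RevNo)
);
""")


def try_insert(sql):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


accepted_author = try_insert("INSERT INTO Author VALUES (1, 'Ann', 'ann@example.com', 'Denver')")
duplicate_email = try_insert("INSERT INTO Author VALUES (2, 'Bob', 'ann@example.com', 'Boston')")
orphan_paper = try_insert("INSERT INTO Paper VALUES (10, 99, 'T', 'A', 'new')")
valid_paper = try_insert("INSERT INTO Paper VALUES (11, 1, 'T', 'A', 'new')")
print(accepted_author, duplicate_email, orphan_paper, valid_paper)  # True False False True
```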
7-25
Multiple Candidate Keys
• Multiple candidate keys violate neither 3NF nor BCNF
• Step 5 of the Simple Synthesis Procedure creates tables with multiple candidate keys
• You should not split a table just because it contains multiple candidate keys
• Splitting a table unnecessarily can slow query performance because of extra joins
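Candidate keys can be listed by brute force for small tables: a candidate key is a minimal column set whose closure covers every column. The minimal Python sketch below (exponential in the number of columns, so only for small examples; `candidate_keys` is a hypothetical helper) is applied to the Author FDs from Example II. It finds both AuthNo and AuthEmail, which is why the merged Author table keeps AuthNo as the primary key and declares AuthEmail UNIQUE rather than being split.

```python
from itertools import combinations


def closure(attrs, fds):
    """Columns determined by attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(columns, fds):
    """Minimal column sets whose closure is every column (brute force)."""
    keys = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= set(columns):
                keys.append(s)
    return keys


author_cols = ["AuthNo", "AuthName", "AuthEmail", "AuthAddress"]
author_fds = [
    ({"AuthNo"}, {"AuthName", "AuthEmail", "AuthAddress"}),
    ({"AuthEmail"}, {"AuthNo"}),
]
print(candidate_keys(author_cols, author_fds))  # [{'AuthNo'}, {'AuthEmail'}]
```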
7-26
Relationship Independence and 4NF
• M-way relationship that can be derived from binary relationships
• Split into binary relationships
• Specialized problem
• 4NF does not involve FDs
7-27
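Relationship Independence Problem
[ERD: entity types Student (StdSSN, StdName), Offering (OfferNo, OffLocation), and Textbook (TextNo, TextTitle), each connected to the associative entity type Enroll through the relationships Std-Enroll, Offer-Enroll, and Text-Enroll]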
7-28
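Relationship Independence Solution
[ERD: Student (StdSSN, StdName), Offering (OfferNo, OffLocation), and Textbook (TextNo, TextTitle); relationship Enroll connects Student and Offering, and relationship Orders connects Offering and Textbook]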
7-29
Extension to the Relationship Independence Solution
[ERD: the Enroll and Orders relationships of the solution, plus an associative entity type Purchase connected to Student, Offering, and Textbook through the relationships Std-Purch, Offer-Purch, and Text-Purch]
7-30
MVDs and 4NF
• MVD: difficult to identify
  - A →→ B | C (A multi-determines B and C)
  - A is associated with a collection of B and C values
  - B and C are independent
• Nontrivial MVD: not also an FD
• 4NF: no nontrivial MVDs
7-31
MVD Representation
A    B    C
A1   B1   C1
A1   B2   C2
--------------
A1   B2   C1
A1   B1   C2
A →→ B | C

OfferNo  StdSSN  TextNo
O1       S1      T1
O1       S2      T2
-----------------------
O1       S2      T1
O1       S1      T2
OfferNo →→ StdSSN | TextNo

Given the two rows above the line, the two rows below the line are in the table if the MVD is true.
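The rule above translates into a data check: for each A value, the (B, C) pairs present must be exactly every combination of its B values with its C values. A minimal Python sketch with a hypothetical helper `mvd_holds`, using the OfferNo, StdSSN, TextNo rows from this slide. As with FDs, a sample can disprove an MVD but cannot prove it.

```python
from collections import defaultdict
from itertools import product


def mvd_holds(rows, a, b, c):
    """Check A ->> B | C on a sample; a, b, c are column positions in each row.

    For every A value, the (B, C) pairs present must equal the cross product
    of that A value's B values and C values (B and C are independent).
    """
    pairs = defaultdict(set)
    for row in rows:
        pairs[row[a]].add((row[b], row[c]))
    for bc in pairs.values():
        bs = {x for x, _ in bc}
        cs = {y for _, y in bc}
        if bc != set(product(bs, cs)):
            return False
    return True


# Columns: OfferNo, StdSSN, TextNo (rows from the slide).
enroll = [("O1", "S1", "T1"), ("O1", "S2", "T2"),
          ("O1", "S2", "T1"), ("O1", "S1", "T2")]
print(mvd_holds(enroll, 0, 1, 2))      # True
print(mvd_holds(enroll[:3], 0, 1, 2))  # False: (O1, S1, T2) is missing
```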
7-32
Higher Level Normal Forms
• 5NF for M-way relationships
• DKNF: absolute normal form
• DKNF is an ideal, not a practical normal form
7-33
Role of Normalization
• Refinement
  - Use after ERD
  - Apply to table design or ERD
• Initial design
  - Record attributes and FDs
  - No initial ERD
  - May reverse engineer an ERD after normalization
7-34
Advantages of Refinement Approach
• Easier to translate requirements into an ERD than into a list of FDs
• Fewer FDs to specify
• Fewer tables to split
• Easier to identify relationships, especially M-N relationships without attributes
7-35
Normalization Objective
• Update biased
• Not a concern for databases without updates (data warehouses)
• Denormalization
  - Purposeful violation of a normal form
  - Some FDs may not cause anomalies
  - May improve performance
7-36
Summary
• Beware of unwanted redundancies
• FDs are important constraints
• Strive for BCNF
• Use a CASE tool for large problems
• Important tool of database development
• Focus on the normalization objective


Editor's Notes

  • #1: Welcome to Chapter 7 - Logical database design: converting and refining the ERD - Major part of logical database design: normalization - Normalization: refinement; identifying and resolving unwanted redundancy Objectives: - Identify modification anomalies - Define functional dependencies - Apply normalization rules to modest size problems: BCNF and simple synthesis procedure - Understand relationship independence problems - Appreciate the role and objective of normalization in the db development process
  • #2: Modification anomalies: motivation for normalization; unwanted redundancy Functional dependencies: - Assertions or constraints about the data - Most important part of the process: recording FDs Normal forms: - Rules about allowable patterns of FDs - Apply on modest size problems - CASE tool for large problems Relationship independence: - More specialized redundancy problem - Not as common and important as BCNF Practical concerns: - Role of normalization in the development process: when to use; how important - Analyzing the objective
  • #3: Side effect: unintended consequence; sometimes good, sometimes bad Modification anomaly: - Cannot modify just the desired data - Must modify more than the desired data Cause: - Redundancy: facts stored multiple times - Remove unwanted redundancies to eliminate anomalies
  • #4: Big University Database Table: - Table 7-1 except for omission of two columns (StdCity and OffTerm) - Typical beginner's mistake: use one table for the entire database Anomalies: - PK: combination of StdSSN and OfferNo - Insert: cannot insert a new student without enrolling in an offering (OfferNo part of PK) - Update: change a course description; change every enrollment of the course - Delete: remove third row; lose information about course C3 Table has obvious redundancies - Easier to query: no joins - More difficult to change: can work around problems (dummy PK) but tedious to do
  • #5: To deal with these anomalies, users may circumvent them (such as using a default primary key to insert a new course) or database programmers may write code to prevent inadvertent loss of data. A better solution is to modify the table design to remove the redundancies that cause the anomalies.
  • #6: Assert: - Defining a business rule - Normative: should be statement - Look at data to see existing practices - Most important part of normalization: asserting FDs Value neutral: - no specific value mentioned in an FD - PK: can be any value but it must be unique - FK: can be any value that matches a row in the PK table
  • #7: Notation: - X->Y - X determines Y (more properly X functionally determines Y) - Refer to X as the LHS or determinant - Each X has at most one Y - Like a mathematical function: f(X) = Y Example StdSSN -> StdClass - There is at most one class for each student - Place StdSSN and StdClass alone in the same table: StdSSN is a candidate key
  • #8: FD diagram: - 7.2 of textbook chapter 7 - See related FDs (same LHS) by line height - Useful for small sets of FDs - Unwieldy for large sets of FDs FD list: - Group by LHS - Shortcut notation: X -> Y, Z is a shortcut for X -> Y and X -> Z Compound LHS: - Similar to a combined PK - Compound LHS is not a shortcut (as is a compound RHS) - Combination of StdSSN and OfferNo determines EnrGrade (not either column alone) Minimality: - LHS must be minimal - Cannot remove columns from LHS without making the FD invalid - A non-minimal LHS is usually not a problem in practice, but make sure the LHS has no extraneous columns - Properly known as full functional dependence: a minimal LHS makes the FD a full functional dependency
  • #9: Looking at data: - Useful when explaining to a user - Automated tools ask for example rows to eliminate FDs Example: - OfferNo -> StdSSN: contradicting rows (<2, 4>) (same OfferNo but a different StdSSN) - StdSSN -> OfferNo: contradicting rows (<1,2>, <3,4>) - StdSSN -> OffYear: data has no contradictions - Add rows to provide contradiction (enroll S1 in a 2001 offering) Assignment 5: - Questions similar to this line of reasoning - Find contradictory rows or add rows if no contradiction is found
  • #10: Functional dependencies in which the LHS is not a primary or candidate key can also be difficult to identify. These FDs are especially important to identify after converting an ERD to a table design. You should carefully look for FDs in which the LHS is not a candidate key or primary key. You should also consider FDs in tables with a combined primary or candidate key in which the LHS is part of a key, but not the entire key. The presentation of normal forms in Section 7.2 explains that these kinds of FDs can lead to modification anomalies. Minimality: no extra columns in LHS Difficult FDs to identify: most important because most FDs are identified in developing ERD
  • #11: Normal form: - Rule about allowable pattern of FDs (1NF through BCNF) - Higher normal forms are not rules about FDs: more difficult to understand and use - Most important part is to record FDs: CASE tool can perform normalization - Check FDs to see if they violate the pattern permitted in the normal form Split table - Smaller tables do not violate the normal form - Smaller tables should not lose information contained in the larger table Difficulty: - Normalization is easy to apply to small tables with simple dependency structures - Use CASE tool for large databases and tables with complex dependency structures
  • #12: 1NF: least restrictive; every table is in 1NF 2NF: more restrictive than 1NF; every table in 2NF is also in 1NF 3NF/BCNF: BCNF is a revised definition of 3NF; BCNF is more restrictive than 3NF 4NF: Inappropriate usage of an M-way relationship; Relationship independence and MVDs; does not involve FDs 5NF: does not involve FDs; Inappropriate usage of an M-way relationship; more specialized than 4NF DKNF: an ideal rather than a practical normal form
  • #13: Big university database table is not normalized - Not in 1NF - S1 row has repeating values (O1 and O2) - S2 row has repeating values (O3 and O2) Convert to 1NF: - Flatten rows - Split each repeating group into a separate row - Repeat the implied values in the new rows: - S1 JUN for row two with O2 2006 3.3 C2 VB - S2 JUN for row four with O2 2006 3.4 C2 VB Nested tables are permitted in SQL:2003 (Chapter 18) but nested tables are not important in most business databases. Nested tables are not in 1NF.
  • #14: Candidate key: - Unique - Minimal: no extraneous columns without losing uniqueness property - Can have multiple candidate keys per table Key column: - A candidate key by itself - Part of a combined CK - Nonkey: a column that is not a key column Combined definition: - Analogy to traditional justice oath - So help me: Ted Codd (father of relational databases) - Usually taught as separate definitions for simplicity
  • #15: First part of combined 2NF/3NF definition: dependent on the whole key Violation: - Part of a key determines a nonkey - If FDs for a table contain such an FD, split the table
  • #16: Violating FDs: - All FDs violate except StdSSN, OfferNo -> EnrGrade - StdSSN: part of a key (not the entire key) - OfferNo: part of a key (not the entire key) Splitting: - Place each FD group in a separate table - UnivTable1: StdSSN group - UnivTable2: OfferNo group - UnivTable3: StdSSN, OfferNo group - No violations among new tables - CourseNo -> CrsDesc (CourseNo is not part of a key) - This FD violates 3NF, not 2NF Splitting process: - Recover original table with natural join - Not lose any FDs: all FDs are derivable - Books on normalization theory explain criteria: theory not important here
  • #17: Second part of combined 2NF/3NF definition: nothing but the key Violation: - Nonkey determines a nonkey - If FDs for a table contain such an FD, split the table Alternative formulation: - No transitive FDs - Law of transitivity: A -> B, B -> C then A -> C - Transitivity applies to FDs - Not the preferred definition of 3NF: should not write down transitively derived FDs - Simple synthesis procedure for BCNF
  • #18: Violating FDs: - CourseNo is non key - Alternatively, OfferNo -> CrsDesc is a transitively derived FD Splitting UnivTable2: - Arrange by LHS - UnivTable2-2: CourseNo is the PK
  • #19: Boyce-Codd Normal Form Determinant: LHS Candidate key: unique column(s) for the table Important because it is simpler and more direct to apply Revised 3NF definition Special case is not common
  • #20: Violating FDs: - All FDs violate except StdSSN, OfferNo -> EnrGrade - StdSSN: not a candidate key (part of a candidate key) - OfferNo: part of a key (not the entire key) Splitting: - Place each FD group in a separate table - UnivTable1: StdSSN group - UnivTable2: OfferNo group - UnivTable3: CourseNo group - UnivTable4: StdSSN, OfferNo group - No violations among new tables
  • #21: Synthesis: - Combine parts into whole - Musical synthesis: combine individual sounds into larger musical units - Combine FDs into tables Simple synthesis procedure - Procedure to apply BCNF: results are BCNF tables - Simple: second step is much more complex in practice - Many ways to derive FDs - Compute minimal cover: difficult to compute by hand except for small FD lists - Use CASE tool for moderate to large FD lists Step 1: - Make sure LHSs are minimal - Usually not a problem Step 2: - Many ways to derive FDs - For this class, only consider law of transitivity - Reduce amount of work by not recording derived FDs Step 3: - Sort FDs by LHS - Little work because natural to write FDs in this manner Step 4: - One table per FD group - LHS becomes PK Step 5: - Deals with tables that have multiple candidate keys - Merge FD groups when one table contains another table - Choose the primary key of one of the separate tables as the primary key of the new, merged table. - Define unique constraints for the other primary keys that were not designated as the primary key of the new table. Add FKs: - Not formally part of normalization process - Ensure consistency - Add FK: PK or CK used in another table
  • #22: Step 1: - No work - Example of an extraneous column: in StdSSN, StdClass -> StdCity, remove StdClass Step 2: - OfferNo -> CrsDesc is derived by transitivity - Remove from the FD list Step 3: - StdSSN -> StdCity, StdClass - OfferNo -> OffTerm, OffYear, CourseNo - CourseNo -> CrsDesc - StdSSN, OfferNo -> EnrGrade Step 4: - Student table: StdSSN (PK) - Offering table: OfferNo (PK) - Course table: CourseNo (PK) - Enrollment table: StdSSN, OfferNo (combined PK) Step 5: - No redundant tables - Variation: if Email -> StdSSN were added - Merge Email and StdSSN tables - StdSSN the PK: more stable than Email; email addresses may be missing FKs: - Offering: CourseNo - Enrollment: StdSSN and OfferNo
  • #23: A database to track reviews of papers submitted to an academic conference. Prospective authors submit papers for review and possible acceptance in the published conference proceedings.
  • #24: Because the LHS is minimal in each FD, the first step is finished. The second step is not necessary because there are no transitive dependencies. Note that the FDs AuthEmail -> AuthName, AuthAddress, and RevEmail -> RevName, RevAddress can be transitively derived. If any of these FDs were part of the original list, they should be removed. For each of the six FD groups, you should define a table. In the last step, you combine the FD groups with AuthNo and AuthEmail as determinants, and the FD groups with RevNo and RevEmail as determinants. In addition, you should add unique constraints for AuthEmail and RevEmail because these columns were not selected as the primary keys of the new tables.
  • #25: As this additional example demonstrates, multiple candidate keys do not violate BCNF. The fifth step of the Simple Synthesis Procedure creates tables with multiple candidate keys because it merges tables. Multiple candidate keys do not violate 3NF either. There is no reason to split a table just because it has multiple candidate keys. Splitting a table with multiple candidate keys can slow query performance due to extra joins.
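  To see why multiple candidate keys are harmless, a BCNF test can be sketched: a table is in BCNF when the LHS of every nontrivial FD is a superkey, that is, its attribute closure covers every column of the table. This is a minimal sketch with hypothetical names `closure` and `is_bcnf`, assuming the FD list contains exactly the FDs that hold among the table's columns. A Student table with both StdSSN and Email as candidate keys passes; an Offering table that still contains CrsDesc fails because CourseNo is a determinant but not a key.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute that attrs determines under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(columns, fds):
    """True if the LHS of every nontrivial FD determines all of the table's columns."""
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD: never a violation
        if not set(columns) <= closure(lhs, fds):
            return False  # determinant that is not a superkey
    return True


# Student table with two candidate keys (StdSSN and Email).
student_cols = {"StdSSN", "Email", "StdCity", "StdClass"}
student_fds = [({"StdSSN"}, {"Email", "StdCity", "StdClass"}),
               ({"Email"}, {"StdSSN"})]

# Offering table that still contains CrsDesc.
offering_cols = {"OfferNo", "OffTerm", "OffYear", "CourseNo", "CrsDesc"}
offering_fds = [({"OfferNo"}, {"OffTerm", "OffYear", "CourseNo"}),
                ({"CourseNo"}, {"CrsDesc"})]

print(is_bcnf(student_cols, student_fds))    # True
print(is_bcnf(offering_cols, offering_fds))  # False
```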
  • #26: Analogy to statistical independence: - Do not store joint probabilities when variables are independent - Age of a rock and age of the person holding the rock are independent - Joint probability (person age = X and rock age = Y) can be derived from the marginal probabilities (their product) Relationship independence: - Independent relationships: binary relationships that combine to show all possible combinations - Combine using the join operator - No need to store the join (M-way relationship) when it can be derived More specialized problem than BCNF: - M-way relationships are not common but are important when they occur - Analysis of M-way relationships is not a typical situation - Two ways to analyze M-way relationships: - Given binary relationships, should there be an M-way relationship instead? - Given an M-way relationship, should it be split into M-1 binary relationships? - Relationship independence and 4NF only involve the splitting question - Chapter 12 (Section 12.2.2) provides a more general (but less rigorous) way to reason about both questions
  • #27: Enroll: - Associative entity type containing combinations of students, offerings, and textbooks - All key: StdSSN, OfferNo, TextNo Design problem: - Should Enroll be split into two binary relationships - Student-Offering: students register for course offerings - Offering-Textbook: professors choose textbooks - Student-Textbook: can be derived from other two relationships - Student-Offering and Offering-Textbook are independent Solution: next slide
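  Relationship independence can be checked with a join: if the all-key Enroll table equals the join of its Student-Offering and Offering-Textbook projections on OfferNo, the Student-Textbook combinations are derivable and splitting Enroll loses nothing. This is a minimal sketch with hypothetical row values and illustrative names `join_on_offering` and `is_independent`; Python sets of tuples stand in for tables.

```python
def join_on_offering(registers, orders):
    """Join (StdSSN, OfferNo) pairs with (OfferNo, TextNo) pairs on OfferNo."""
    return {(std, off, text)
            for std, off in registers
            for off2, text in orders
            if off == off2}


def is_independent(enroll):
    """True if an all-key (StdSSN, OfferNo, TextNo) table equals the join of its projections."""
    registers = {(std, off) for std, off, _ in enroll}
    orders = {(off, text) for _, off, text in enroll}
    return join_on_offering(registers, orders) == enroll


# Hypothetical data: students register for offerings; professors order textbooks.
registers = {("S1", "O1"), ("S2", "O1"), ("S2", "O2")}
orders = {("O1", "T1"), ("O1", "T2"), ("O2", "T3")}
enroll = join_on_offering(registers, orders)

print(len(enroll))                                    # 5
print(is_independent(enroll))                         # True: safe to split
print(is_independent(enroll - {("S1", "O1", "T2")}))  # False: not derivable
```

The last line shows what happens once a three-way row can be missing for a reason the two binary relationships cannot capture: the table is no longer the join of its projections, so the three-way relationship must be stored.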
  • #28: Solution: - Split the Enroll entity type into two M-N relationships - Enroll relationship: Student-Offering - Orders relationship: Offering-Textbook A 3-way relationship would probably not have been considered because of knowledge of the business process: - Enrollment and textbook ordering are independent events - Occur at different points in time - Separate data entry forms
  • #29: Problem extension: - Record textbook purchases - Add an associative entity type but keep the two binary relationships - Purchasing behavior cannot be derived from enrollment and ordering data If the assumptions change slightly, an argument can be made for an associative entity type representing a three-way relationship. Suppose that the bookstore wants to record textbook purchases by offering and student to estimate textbook demand. Then, the relationship between students and textbooks is no longer independent of the other two relationships. Even though a student is enrolled in an offering and the offering uses a textbook, the student may not purchase the textbook for the offering (perhaps borrowing it instead). In this situation, there is no independence and a three-way relationship is needed. In addition to the M-N relationships in Figure 7.8, there should be a new associative entity type and three 1-M relationships, as shown in Figure 7.9. You need the Enroll relationship to record student selections of offerings and the Orders relationship to record professor selections of textbooks. The Purchase entity type records purchases of textbooks by students in a course offering; a purchase cannot be derived from the other relationships.
  • #30: MVDs: - Difficult to identify - Rigorous definition of relationship independence - Concept is confusing to many practitioners: the independence concept is often omitted - I stress the independence idea (the key idea) rather than the multivalue idea MVD: - Association with a set of values (not just one) - Independence: key idea - An FD is an MVD whose associated collection is a single value - Nontrivial MVD: associated with one or more values
  • #31: Given the two rows above the line, the two rows below the line must also be in the table if A multidetermines B | C (A ->> B | C). Independence means that A is associated with every combination of its B and C values.
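  The row-swap rule for A ->> B | C can be written directly as a check. This is a minimal sketch with a hypothetical function name `mvd_holds`: the table is a set of (A, B, C) tuples, and for every pair of rows that agree on A, both swapped rows must be present. The example rows are hypothetical (OfferNo, StdSSN, TextNo) values, testing OfferNo ->> StdSSN | TextNo.

```python
def mvd_holds(rows):
    """Check A ->> B | C on a set of (a, b, c) tuples using the row-swap rule."""
    for a1, b1, c1 in rows:
        for a2, b2, c2 in rows:
            # Two rows with the same A value: both swapped rows must exist.
            if a1 == a2 and ((a1, b1, c2) not in rows or (a1, b2, c1) not in rows):
                return False
    return True


# Hypothetical (OfferNo, StdSSN, TextNo) rows: every student/textbook combination.
rows = {("O1", "S1", "T1"), ("O1", "S1", "T2"),
        ("O1", "S2", "T1"), ("O1", "S2", "T2")}
print(mvd_holds(rows))                        # True
print(mvd_holds(rows - {("O1", "S1", "T2")}))  # False

# Each offering has a single StdSSN value here, so OfferNo -> StdSSN holds
# as an FD, and the MVD holds as well.
print(mvd_holds({("O1", "S1", "T1"), ("O1", "S1", "T2")}))  # True
```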
  • #32: 5NF: - More specialized than 4NF - More difficult to understand than 4NF - Split a three-way relationship into three (not two) binary relationships DKNF (Domain Key Normal Form): - Domain: sets of values - Key: candidate key (uniqueness property) - All constraints derivable from domains and keys - Not possible to test a table for DKNF compliance - No known procedure to construct a DKNF table
  • #33: Approach in textbook: - Refinement - Apply normalization after conversion - Normalization can be applied directly to an ERD Initial design approach: - Use attributes and FDs - May reverse engineer ERD later - Some strongly advocate this approach
  • #34: This book clearly favors using normalization as a refinement tool, not as an initial design tool. Through development of an ERD, you intuitively group related fields. Much normalization is accomplished in an informal manner without the tedious process of recording functional dependencies. When normalization is used as a refinement tool, there are fewer FDs to specify and less normalization to perform. Applying normalization ensures that candidate keys and redundancies have not been overlooked. Another reason for favoring the refinement approach is that relationships can be overlooked when using normalization as the initial design approach. With FDs, 1-M relationships must be identified in the child-to-parent direction. For novice data modelers, identifying relationships is easier when considering both sides of a relationship. For an M-N relationship without attributes, there will not be any functional dependencies that show the need for a table. For example, in a design about textbooks and course offerings, if the relationship between them has no attributes, there are no functional dependencies that relate textbooks and course offerings[1]. In drawing an ERD, however, the need for an M-N relationship becomes clear. [1] An FD can be written with a null right-hand side to represent M-N relationships. The FD for the offering-textbook relationship can be expressed as TextId, OfferNo -> ∅. However, this kind of FD is awkward to state. It is much easier to define an M-N relationship.
  • #35: Carefully analyze objective: - Ignore normalization for FDs that do not cause anomalies - Be careful: most FDs will lead to anomalies Classic example: - ZipCode -> City - Only holds for the city in which the post office is located - Even when it holds, it may not lead to anomalies - Mail order business: track tax rates by zip code (not really accurate) - Important for mail order databases Performance: - Chapter 8 - Consider performance implications after logical design is complete
  • #36: Cause of difficult modifications: - Unwanted redundancies - Anomalies can occur with unwanted redundancies FD: - Like a candidate key constraint - Must be able to record FDs - Normalization can be performed by a CASE tool: necessary for large databases - BCNF: revised definition of 3NF; most important in practice Role of normalization: - Refinement rather than initial design (my expert opinion) - Can be applied after conversion or directly to an ERD Normalization objective: - Update biased: make a database easier to change - Normalization produces many tables - More difficult and less efficient to query - If an FD does not cause a significant anomaly, perhaps relax from full BCNF - Denormalization can be done to improve performance (Chapter 8)