Data Warehousing
Lecture-6
Normalization
Virtual University of Pakistan
Ahsan Abdullah
Assoc. Prof. & Head
Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: ahsan@cluxing.com
Normalization
What is normalization?
What are the goals of normalization?
• Eliminate redundant data.
• Ensure data dependencies make sense.
What is the result of normalization?
What are the levels of normalization?
Always follow the purist's approach to normalization? NO
Normalization
Consider a student database system to be developed for a multi-campus university in which each campus specializes in one degree program, i.e. BS, MS or PhD.
SID: Student ID
Degree: Registered as a BS or MS student
Campus: City where the campus is located
Course: Course taken
Marks: Score out of a maximum of 50
SID  Degree  Campus     Course  Marks
1    BS      Islamabad  CS-101  30
1    BS      Islamabad  CS-102  20
1    BS      Islamabad  CS-103  40
1    BS      Islamabad  CS-104  20
1    BS      Islamabad  CS-105  10
1    BS      Islamabad  CS-106  10
2    MS      Lahore     CS-101  30
2    MS      Lahore     CS-102  40
3    MS      Lahore     CS-102  20
4    BS      Islamabad  CS-102  20
4    BS      Islamabad  CS-104  30
4    BS      Islamabad  CS-105  40
Normalization: 1NF
Only contains atomic values, BUT also contains redundant data.
FIRST
SID  Degree  Campus     Course  Marks
1    BS      Islamabad  CS-101  30
1    BS      Islamabad  CS-102  20
1    BS      Islamabad  CS-103  40
1    BS      Islamabad  CS-104  20
1    BS      Islamabad  CS-105  10
1    BS      Islamabad  CS-106  10
2    MS      Lahore     CS-101  30
2    MS      Lahore     CS-102  40
3    MS      Lahore     CS-102  20
4    BS      Islamabad  CS-102  20
4    BS      Islamabad  CS-104  30
4    BS      Islamabad  CS-105  40
Normalization: 1NF
Update anomalies
INSERT. A student with SID 5 who is admitted to a different campus, say Karachi, cannot be added until the student registers for a course.
DELETE. If a student graduates and his/her corresponding record is deleted, then all information about that student is lost.
UPDATE. If a student, say SID = 1, migrates from the Islamabad campus to the Lahore campus, then six rows have to be updated with this new information.
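The three anomalies can be reproduced with a small sketch. It assumes Python's standard sqlite3 module and an in-memory database; the lowercase table and column names, the NOT NULL constraints, and the degree given to student 5 are assumptions made for illustration and are not part of the lecture.

```python
import sqlite3

# Rows of table FIRST as listed in the lecture.
FIRST_ROWS = [
    (1, "BS", "Islamabad", "CS-101", 30),
    (1, "BS", "Islamabad", "CS-102", 20),
    (1, "BS", "Islamabad", "CS-103", 40),
    (1, "BS", "Islamabad", "CS-104", 20),
    (1, "BS", "Islamabad", "CS-105", 10),
    (1, "BS", "Islamabad", "CS-106", 10),
    (2, "MS", "Lahore", "CS-101", 30),
    (2, "MS", "Lahore", "CS-102", 40),
    (3, "MS", "Lahore", "CS-102", 20),
    (4, "BS", "Islamabad", "CS-102", 20),
    (4, "BS", "Islamabad", "CS-104", 30),
    (4, "BS", "Islamabad", "CS-105", 40),
]

conn = sqlite3.connect(":memory:")
# Assumed schema: composite key (sid, course); NOT NULL is declared
# explicitly because SQLite does not imply it for this kind of key.
conn.execute(
    """CREATE TABLE first (
           sid    INTEGER NOT NULL,
           degree TEXT    NOT NULL,
           campus TEXT    NOT NULL,
           course TEXT    NOT NULL,
           marks  INTEGER NOT NULL,
           PRIMARY KEY (sid, course)
       )"""
)
conn.executemany("INSERT INTO first VALUES (?, ?, ?, ?, ?)", FIRST_ROWS)

# UPDATE anomaly: moving student 1 to Lahore touches every course row.
updated = conn.execute(
    "UPDATE first SET campus = 'Lahore' WHERE sid = 1"
).rowcount

# INSERT anomaly: student 5 has no course yet, so part of the key is missing.
# The degree 'MS' is a hypothetical value; the lecture does not give one.
try:
    conn.execute(
        "INSERT INTO first (sid, degree, campus) VALUES (5, 'MS', 'Karachi')"
    )
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# DELETE anomaly: removing student 3's only course row removes the student.
conn.execute("DELETE FROM first WHERE sid = 3")
remaining = conn.execute(
    "SELECT COUNT(*) FROM first WHERE sid = 3"
).fetchone()[0]

print(updated, insert_rejected, remaining)
```

Here `updated` counts the rows rewritten for a single change of campus, `insert_rejected` records whether the database refused the student without a course, and `remaining` shows how much is still known about student 3 after the delete.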
Normalization: 2NF
Every non-key column is fully dependent on the PK.
FIRST is in 1NF but not in 2NF because Degree and Campus are functionally dependent on only the column SID of the composite key (SID, Course). This can be illustrated by listing the functional dependencies in the table:
SID -> Campus, Degree
Campus -> Degree
(SID, Course) -> Marks
To transform the table FIRST into 2NF, we move the columns SID, Degree and Campus to a new table called REGISTRATION. The column SID becomes the primary key of this new table.
SID & Campus are NOT unique
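The dependencies listed for FIRST can be checked against the sample rows. The following is a minimal sketch in plain Python; the helper name `holds` and the dictionary representation of rows are assumptions made for illustration. A check like this can only refute a dependency on the given rows; it cannot prove that the dependency is a rule of the business.

```python
COLUMNS = ("SID", "Degree", "Campus", "Course", "Marks")

# Sample rows of table FIRST, one row per line, as in the lecture.
DATA = """\
1 BS Islamabad CS-101 30
1 BS Islamabad CS-102 20
1 BS Islamabad CS-103 40
1 BS Islamabad CS-104 20
1 BS Islamabad CS-105 10
1 BS Islamabad CS-106 10
2 MS Lahore CS-101 30
2 MS Lahore CS-102 40
3 MS Lahore CS-102 20
4 BS Islamabad CS-102 20
4 BS Islamabad CS-104 30
4 BS Islamabad CS-105 40
"""
FIRST = [dict(zip(COLUMNS, line.split())) for line in DATA.splitlines()]


def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


results = {
    "SID -> Campus, Degree": holds(FIRST, ["SID"], ["Campus", "Degree"]),
    "Campus -> Degree": holds(FIRST, ["Campus"], ["Degree"]),
    "(SID, Course) -> Marks": holds(FIRST, ["SID", "Course"], ["Marks"]),
    "SID -> Marks": holds(FIRST, ["SID"], ["Marks"]),
}
for fd, ok in results.items():
    print(f"{fd}: {ok}")
```

The first three dependencies hold on the sample, while SID -> Marks does not, which is why Marks needs the full composite key. Degree -> Campus also happens to hold on these twelve rows although the lecture does not state it, a reminder that sample data alone does not define the dependencies.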
Normalization: 2NF
REGISTRATION (SID is now a PK)
SID  Degree  Campus
1    BS      Islamabad
2    MS      Lahore
3    MS      Lahore
4    BS      Islamabad
5    PhD     Peshawar
PERFORMANCE
SID  Course  Marks
1    CS-101  30
1    CS-102  20
1    CS-103  40
1    CS-104  20
1    CS-105  10
1    CS-106  10
2    CS-101  30
2    CS-102  40
3    CS-102  20
4    CS-102  20
4    CS-104  30
4    CS-105  40
PERFORMANCE is in 2NF because (SID, Course) uniquely identifies Marks.
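The split into REGISTRATION and PERFORMANCE can be sketched as two projections of FIRST, followed by a join on SID to confirm that no information is lost. This is an illustration in plain Python under the assumption that rows are tuples in the column order used in the lecture; the variable names are hypothetical.

```python
# Rows of FIRST as (SID, Degree, Campus, Course, Marks).
FIRST = [
    (1, "BS", "Islamabad", "CS-101", 30),
    (1, "BS", "Islamabad", "CS-102", 20),
    (1, "BS", "Islamabad", "CS-103", 40),
    (1, "BS", "Islamabad", "CS-104", 20),
    (1, "BS", "Islamabad", "CS-105", 10),
    (1, "BS", "Islamabad", "CS-106", 10),
    (2, "MS", "Lahore", "CS-101", 30),
    (2, "MS", "Lahore", "CS-102", 40),
    (3, "MS", "Lahore", "CS-102", 20),
    (4, "BS", "Islamabad", "CS-102", 20),
    (4, "BS", "Islamabad", "CS-104", 30),
    (4, "BS", "Islamabad", "CS-105", 40),
]

# Projection on (SID, Degree, Campus); the set removes the repeated rows.
registration = sorted({(sid, degree, campus) for sid, degree, campus, _, _ in FIRST})

# Projection on (SID, Course, Marks); no duplicates arise here.
performance = [(sid, course, marks) for sid, _, _, course, marks in FIRST]

# Joining the two tables on SID should give back exactly the rows of FIRST.
rejoined = sorted(
    (sid, degree, campus, course, marks)
    for sid, degree, campus in registration
    for perf_sid, course, marks in performance
    if sid == perf_sid
)
lossless = rejoined == sorted(FIRST)

# A student without any course can now be recorded, as in the lecture's
# REGISTRATION table (SID 5, PhD, Peshawar).
registration.append((5, "PhD", "Peshawar"))

print(lossless, len(registration), len(performance))
```

The twelve repeated (Degree, Campus) values of FIRST shrink to one row per student, and the join check indicates that the decomposition keeps every original fact.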
Normalization: 2NF
Modification anomalies are still present for tables in 2NF. For the table REGISTRATION, they are:
• INSERT: Until a student gets registered in a degree program, that program cannot be offered!
• DELETE: Deleting a row from REGISTRATION destroys every other fact stored in that row, e.g. deleting student 5 also loses the fact that the Peshawar campus offers the PhD program.
Why are there anomalies?
The table is in 2NF but NOT in 3NF.
Normalization: 3NF
All non-key columns must be dependent only on the primary key.
Table PERFORMANCE is already in 3NF. The non-key column, Marks, is fully dependent upon the primary key (SID, Course).
REGISTRATION is in 2NF but not in 3NF because it contains a transitive dependency.
A transitive dependency occurs when a non-key column that is determined by the primary key is itself the determinant of other columns.
The concept of a transitive dependency can be illustrated by showing the functional dependencies in REGISTRATION:
REGISTRATION.SID -> REGISTRATION.Degree
REGISTRATION.SID -> REGISTRATION.Campus
REGISTRATION.Campus -> REGISTRATION.Degree
Note that REGISTRATION.Degree is determined both by the primary key SID and by the non-key column Campus.
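One way to see the transitive dependency is to compute attribute closures from the declared dependencies. The sketch below is plain Python; the function name `closure` and the encoding of dependencies as pairs of sets are assumptions for illustration. Only SID -> Campus and Campus -> Degree are declared, to show that SID -> Degree then follows on its own.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a dependency once its whole left side is already known.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Declared dependencies of REGISTRATION, without SID -> Degree.
REGISTRATION_FDS = [
    (frozenset({"SID"}), frozenset({"Campus"})),
    (frozenset({"Campus"}), frozenset({"Degree"})),
]
ALL_COLUMNS = {"SID", "Degree", "Campus"}

sid_closure = closure({"SID"}, REGISTRATION_FDS)
campus_closure = closure({"Campus"}, REGISTRATION_FDS)

# SID reaches every column, so it is a key; Campus does not reach SID.
sid_is_key = sid_closure == ALL_COLUMNS
campus_is_key = campus_closure == ALL_COLUMNS

print(sorted(sid_closure), sorted(campus_closure), sid_is_key, campus_is_key)
```

Degree appears in the closure of SID only by way of Campus, and Campus is not a key of REGISTRATION, which is the situation that 3NF rules out.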
Normalization: 3NF
To transform REGISTRATION into 3NF, we create a
new table called CAMPUS_DEGREE and move the
columns campus and degree into it.
Degree is deleted from the original table, campus is
left behind to serve as a foreign key to
CAMPUS_DEGREE, and the original table is
renamed to STUDENT_CAMPUS to reflect its
semantic meaning.
Normalization: 3NF
REGISTRATION
SID  Degree  Campus
1    BS      Islamabad
2    MS      Lahore
3    MS      Lahore
4    BS      Islamabad
5    PhD     Peshawar
STUDENT_CAMPUS
SID  Campus
1    Islamabad
2    Lahore
3    Lahore
4    Islamabad
5    Peshawar
CAMPUS_DEGREE
Campus     Degree
Islamabad  BS
Lahore     MS
Peshawar   PhD
Normalization: 3NF
Removal of anomalies and improvement in queries as follows:
• INSERT: A degree program can first be offered, and students can then register in it.
• UPDATE: A student is migrated between campuses by changing a single row.
• DELETE: Information about a course can be deleted without deleting the facts in the other columns of the record.
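The 3NF design can be tried out with a short sketch. It assumes Python's standard sqlite3 module with foreign keys switched on; the lowercase table names, the column types, and the Karachi/MS program are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.executescript(
    """
    CREATE TABLE campus_degree (
        campus TEXT PRIMARY KEY,
        degree TEXT NOT NULL
    );
    CREATE TABLE student_campus (
        sid    INTEGER PRIMARY KEY,
        campus TEXT NOT NULL REFERENCES campus_degree (campus)
    );
    """
)
conn.executemany(
    "INSERT INTO campus_degree VALUES (?, ?)",
    [("Islamabad", "BS"), ("Lahore", "MS"), ("Peshawar", "PhD")],
)
conn.executemany(
    "INSERT INTO student_campus VALUES (?, ?)",
    [(1, "Islamabad"), (2, "Lahore"), (3, "Lahore"), (4, "Islamabad"), (5, "Peshawar")],
)

# INSERT: a program can be offered before any student registers in it.
# Karachi offering MS is a hypothetical example.
conn.execute("INSERT INTO campus_degree VALUES ('Karachi', 'MS')")

# UPDATE: migrating student 1 to Lahore changes exactly one row.
moved = conn.execute(
    "UPDATE student_campus SET campus = 'Lahore' WHERE sid = 1"
).rowcount

# The student's degree is no longer stored; it is looked up through the campus.
degree = conn.execute(
    """SELECT d.degree
       FROM student_campus AS s
       JOIN campus_degree AS d ON d.campus = s.campus
       WHERE s.sid = 1"""
).fetchone()[0]

# DELETE: removing student 5 keeps the fact that Peshawar offers a PhD.
conn.execute("DELETE FROM student_campus WHERE sid = 5")
still_offered = conn.execute(
    "SELECT degree FROM campus_degree WHERE campus = 'Peshawar'"
).fetchone()[0]

print(moved, degree, still_offered)
```

The price is visible in the degree lookup, which now needs a join; this is the trade-off between simplicity, performance and normalization that the conclusions point to.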
Normalization
Conclusions:
• Normalization guidelines are cumulative.
• Generally a good idea to only ensure 2NF.
• 3NF is at the cost of simplicity and performance.
• There is a 4NF with no multi-valued dependencies.
• There is also a 5NF.
