Normalization
Normalization is a well-known standard that helps efficient storage and retrieval of data from the relational database systems. There are three important normalizations available. They are:
First Normal Form
Second Normal Form
Third Normal Form
This articles talks about these three normal forms with examples. Let us start.
First Normal Form
The first normal form says, "The information stored in the form of database table should be well split into columns. When you split make sure same information is not split into multiple columns".
Consider the table below
ID NAME
1 Syed Sharique Imam
2 Asaad Khurshid
Let us say there are 4000 such members in the table. If you want to retrieve the data in the descending order based on the last name then you need to perform string manipulation on the stored data. Also it is not always guaranteed that the information you split is fully qualified last name as you may perform string manipulation based on the space. The first statement in the Normalization (First Normal form) specifies this. Data should be well split into column. Why do I put "well split" into underline? Because we all know that data should be split into columns. Definitely no one will design the preceding table as below:
Below is the Well Split table:
Well, but the First normal form says "When splitting do not split same information into multiple column". It may confuse that I split the same information, the Name into multiple columns. OK. Do not look at the initial column before the split that is; the member name. Look at the final result and apply the rule:
"Do not split same information into multiple column"
First Name and Last Name are different, so it is not the same information. Each column is well categorized. So why the First Norm says "So not split same information into multiple columns"?
To explain that look at the below table:
The table stores the Orderid and which customer placed the order in custid. But, if you have look at the columns for products in the table, it is stored as separate column. The same information is repeated in multiple columns like product1, product2 etc., to show the products in each order. Number of product that belongs to the Order is not constant as it varies. So Data Manipulation should accommodate the change in data like inserting a row or Modifying a row. The schema should not change.
The correct design is shown below:
Second Normal Form
The Second Normal form says, "Entire columns in the table should functionally depend on a single key that uniquely identifies the row also should adhere to the first normal form". Note that, here the key may be a combination of more than one column.
In the above table say Orders, Customerid and OrderPrice are depending on the single key column Orderid. The CustomerName field is not related to OrderId and should go on a Customer table. If we remove the Customer Name fields from this field the table is in Second normal form.
Third Normal Form
The third normal form says, "Any columns in the table should not depend on column which is not a key column that uniquely identifies the row. Also, first it should be in second normal form".
It means if a table has some columns that depends on some other column that is not a primary key column then the table is not in third normal form. Let us consider the following example:
In the above example the TotalPrice column is dependent on UnitPrice and NoOfItems. Hence, this column should be removed from the database. The program that processes this table should calculate the TotalPrice or a view could be created to have the calculated column such as TotalPrice. Whatever the solution, for the preceding table to be in the third normal form the TotalPrice column should be removed from the table.
No comments:
Post a Comment