Normalization is the process of efficiently organizing data in a database. It has two goals: eliminating redundant data (for example, storing the same data in more than one table) and ensuring that data dependencies make sense (storing only related data in a table). Both goals are worthwhile: they reduce the amount of space a database consumes and ensure that data is stored logically.
The Normal Forms
The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form is very rarely seen and won't be discussed in this article.
Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines only. Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when variations take place, it's extremely important to evaluate any possible ramifications they could have on your system and account for possible inconsistencies.
First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database:
- Eliminate duplicative columns from the same table.
- Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
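As a minimal sketch of these two rules, the following uses Python's built-in sqlite3 module. The customers_flat table and its phone1/phone2 columns are hypothetical names chosen only for illustration; they stand in for any set of duplicative columns that can be moved into its own keyed table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Violates 1NF: phone1/phone2 are duplicative columns (hypothetical schema).
    CREATE TABLE customers_flat (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        phone1      TEXT,
        phone2      TEXT
    );
    INSERT INTO customers_flat VALUES (1, 'Ada', '555-0100', '555-0101');
    INSERT INTO customers_flat VALUES (2, 'Bob', '555-0200', NULL);

    -- 1NF: one row per phone number, each row identified by a primary key.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE customer_phones (
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        phone       TEXT NOT NULL,
        PRIMARY KEY (customer_id, phone)
    );

    -- Migrate the data, skipping the empty phone slots.
    INSERT INTO customers SELECT customer_id, name FROM customers_flat;
    INSERT INTO customer_phones
        SELECT customer_id, phone1 FROM customers_flat WHERE phone1 IS NOT NULL
        UNION ALL
        SELECT customer_id, phone2 FROM customers_flat WHERE phone2 IS NOT NULL;
""")

phones = conn.execute(
    "SELECT customer_id, phone FROM customer_phones ORDER BY customer_id, phone"
).fetchall()
print(phones)
```

With this layout a customer can have any number of phone numbers without a schema change, and no row carries an empty phone column.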
Second Normal Form (2NF)
Second normal form (2NF) further addresses the concept of removing duplicative data:
- Meet all the requirements of the first normal form.
- Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
- Create relationships between these new tables and their predecessors through the use of foreign keys.
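The sketch below illustrates these steps with Python's sqlite3 module, assuming a hypothetical order_items_flat table in which product_name repeats on every row for the same product. In formal terms, product_name depends on only part of the composite key (order_id, product_id), which is the situation 2NF is usually defined to rule out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    -- Hypothetical starting point: product_name is repeated across rows.
    CREATE TABLE order_items_flat (
        order_id     INTEGER,
        product_id   INTEGER,
        product_name TEXT,
        quantity     INTEGER,
        PRIMARY KEY (order_id, product_id)
    );
    INSERT INTO order_items_flat VALUES
        (1, 10, 'Widget', 2),
        (1, 20, 'Gadget', 1),
        (2, 10, 'Widget', 5);

    -- The repeated subset of data moves to its own table ...
    CREATE TABLE products (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL
    );
    -- ... and a foreign key links the remaining rows back to it.
    CREATE TABLE order_items (
        order_id   INTEGER NOT NULL,
        product_id INTEGER NOT NULL REFERENCES products(product_id),
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
    INSERT INTO products SELECT DISTINCT product_id, product_name FROM order_items_flat;
    INSERT INTO order_items SELECT order_id, product_id, quantity FROM order_items_flat;
""")

product_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]

# The foreign key rejects an order line that points at an unknown product.
try:
    conn.execute("INSERT INTO order_items VALUES (3, 99, 1)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(product_count, fk_enforced)
```

Each product name is now stored once, and the foreign key keeps order lines from referring to products that do not exist.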
Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
- Meet all the requirements of the second normal form.
- Remove columns that are not directly dependent upon the primary key (that is, columns whose values are determined by another non-key column).
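To make the 3NF rule concrete, here is a small sketch using Python's sqlite3 module. The employees_flat table is a hypothetical example in which dept_name is determined by dept_id rather than by the primary key emp_id, so it is moved to its own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: dept_name depends on dept_id, not on emp_id.
    CREATE TABLE employees_flat (
        emp_id    INTEGER PRIMARY KEY,
        name      TEXT,
        dept_id   INTEGER,
        dept_name TEXT
    );
    INSERT INTO employees_flat VALUES
        (1, 'Ada', 10, 'Research'),
        (2, 'Bob', 10, 'Research'),
        (3, 'Cy',  20, 'Sales');

    -- 3NF: the department name lives in exactly one place.
    CREATE TABLE departments (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES departments(dept_id)
    );
    INSERT INTO departments SELECT DISTINCT dept_id, dept_name FROM employees_flat;
    INSERT INTO employees SELECT emp_id, name, dept_id FROM employees_flat;
""")

# Renaming a department is now a single-row update.
conn.execute("UPDATE departments SET dept_name = 'R&D' WHERE dept_id = 10")

rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees e JOIN departments d USING (dept_id)
    ORDER BY e.emp_id
""").fetchall()
print(rows)
```

In the flat table the same rename would have to touch every employee row in the department, and missing one would leave the data inconsistent.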
Boyce-Codd Normal Form (BCNF or 3.5NF)
The Boyce-Codd Normal Form, also referred to as the "third and a half (3.5) normal form", adds one more requirement:
- Meet all the requirements of the third normal form.
- Every determinant must be a candidate key.
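The determinant rule can be checked mechanically once the functional dependencies of a table are written down. The sketch below is one way to do that in plain Python, under stated assumptions: the relation (student, course, instructor) and its two dependencies are hypothetical, and the check tests whether each determinant is a superkey (a column set that contains a candidate key), which is how the rule is usually stated formally.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the left side, we also know the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the non-trivial dependencies whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= relation
    ]


# Hypothetical relation: each instructor teaches exactly one course.
relation = frozenset({"student", "course", "instructor"})
fds = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),
    (frozenset({"instructor"}), frozenset({"course"})),
]

violations = bcnf_violations(relation, fds)
print(violations)
```

Here instructor determines course but does not determine student, so it is a determinant that is not a key and the table is not in BCNF. Splitting it into (instructor, course) and (student, instructor) removes the violation.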
Fourth Normal Form (4NF)
Finally, fourth normal form (4NF) has one additional requirement:
- Meet all the requirements of the Boyce-Codd normal form.
- Remove multivalued dependencies: a table must not store two or more independent multivalued facts about the same entity.
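A short sketch with Python's sqlite3 module shows what a multi-valued dependency looks like in practice. The emp_skill_lang table is hypothetical: an employee's skills and languages are independent of each other, so storing them together forces one row for every combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table mixing two independent multi-valued facts.
    CREATE TABLE emp_skill_lang (
        employee TEXT,
        skill    TEXT,
        language TEXT,
        PRIMARY KEY (employee, skill, language)
    );
    INSERT INTO emp_skill_lang VALUES
        ('Ada', 'SQL',    'English'),
        ('Ada', 'SQL',    'French'),
        ('Ada', 'Python', 'English'),
        ('Ada', 'Python', 'French');

    -- 4NF: one table per independent fact.
    CREATE TABLE emp_skills (
        employee TEXT,
        skill    TEXT,
        PRIMARY KEY (employee, skill)
    );
    CREATE TABLE emp_languages (
        employee TEXT,
        language TEXT,
        PRIMARY KEY (employee, language)
    );
    INSERT INTO emp_skills    SELECT DISTINCT employee, skill    FROM emp_skill_lang;
    INSERT INTO emp_languages SELECT DISTINCT employee, language FROM emp_skill_lang;
""")

flat = set(conn.execute("SELECT employee, skill, language FROM emp_skill_lang"))

# Joining the two smaller tables reproduces the original rows exactly.
rejoined = set(conn.execute("""
    SELECT s.employee, s.skill, l.language
    FROM emp_skills s JOIN emp_languages l ON l.employee = s.employee
"""))

print(len(flat), flat == rejoined)
```

The two smaller tables hold four rows in total for this data and recreate the original by a join. Adding a third language would take one new row instead of one per skill.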
Benefits of Normalization
Normalization produces smaller tables with smaller rows:
- More rows per page (less logical I/O)
- More rows per I/O (more efficient)
- More rows fit in cache (less physical I/O)
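The effect of row width on I/O is simple arithmetic. The sketch below uses made-up numbers: an 8192-byte page, one million rows, and row widths of 400 and 100 bytes are all assumptions for illustration, and real database engines also reserve part of each page for headers and per-row overhead.

```python
PAGE_BYTES = 8192  # assumed page size; actual engines differ


def rows_per_page(row_bytes, page_bytes=PAGE_BYTES):
    """Whole rows that fit on one page, ignoring page and row overhead."""
    return page_bytes // row_bytes


def pages_needed(row_count, row_bytes, page_bytes=PAGE_BYTES):
    """Pages required to store row_count rows of the given width."""
    per_page = rows_per_page(row_bytes, page_bytes)
    return -(-row_count // per_page)  # ceiling division


wide = pages_needed(1_000_000, 400)    # wide, denormalized rows (assumed width)
narrow = pages_needed(1_000_000, 100)  # narrow, normalized rows (assumed width)
print(wide, narrow)  # 50000 12346
```

Under these assumptions a full scan of the narrow table reads roughly a quarter of the pages, and correspondingly more of the table fits in cache.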
The benefits of normalization include:
- Searching, sorting, and creating indexes are faster, since tables are narrower and more rows fit on a data page.
- You usually have more tables.
- You can have more clustered indexes (one per table), so you get more flexibility in tuning queries.
- Index searching is often faster, since indexes tend to be narrower and shorter.
- More tables allow better use of segments to control physical placement of data.
- You usually have fewer indexes per table, so data modification commands are faster.
- Fewer null values and less redundant data, making your database more compact.
- Triggers execute more quickly if you are not maintaining redundant data.
- Data modification anomalies are reduced.
- Normalization is conceptually cleaner and easier to maintain and change as your needs change.
