Anomalies
Consider a silly database where we keep up with students, grades, and friendships in a single table.
Jason | 10 | Mary | 9 |
Jason | 10 | John | 12 |
What can go wrong?
- Modification or update anomaly — Jason moves to grade 11, but only one tuple is updated.
- Insertion anomaly — How to handle a new student who arrives without friends.
- Deletion anomaly — Do you remove Jason when he loses his last friend
Yes, it is silly, but consider the one-person department which temporarily loses its single employee, or the newly-created department, or all the tables where we get up with office locations.
Normalization
Start by consulting a down-to-earth view from Microsoft.
Then there is Wikipedia for a more formal view.
First normal form
“Do not use multiple fields in a single table to store similar data.” says Microsoft.
Or, every attribute must be an atom. Attributes shall not be lists.
Second normal form
Functional dependency
Given sets X and Y, write X → Y when knowing X tells you Y.
Don’t just look at tables to determine this information. You need someone with knowledge of the field. Sometimes functional dependency is just happenstance. In a small crowd, knowing the birthday may determine the name.
Key definitions
A superkey is a collection of attributes that uniquely determine each row. The row is functionally dependent on the superkey.
A candidate key is a a superkey that contains no smaller superkey. The row is fully functionally dependent on the candidate key.
A prime attribute is an attribute that occurs in a candidate key. A non-prime attribute isn’t prime.
Finally, the definition
A relation is in second normal form if it is in first normal form and no non-prime attribute is dependent on any proper subset of the relation.
Third Normal Formal
A relation is in third normal form if the relation is in second normal form and every non-prime attribute is non-transitively dependent on every superkey.
A transitive dependency is a situation where X → Z and Z → A and X and Z are different.
Boyce-Codd normal form
A relation is in Boyce-Codd normal form if for every dependency X → Y either X is a superkey or Y is a subset of X (a trivial functional dependency).
Similar definition for third normal form
A relation is in third normal form if for every dependency X → A, where A is a single attribute, either X is a superkey or or A is an element of X or A is a prime attribute.
Fourth normal form
You need to look at the Wikipeida entry for this one.