A little normalization

Anomalies

Consider a silly database where we keep up with students, grades, and friendships in a single table.

Jason10Mary9
Jason10John12

What can go wrong?

Yes, it is silly, but consider the one-person department which temporarily loses its single employee, or the newly-created department, or all the tables where we get up with office locations.

Normalization

Start by consulting a down-to-earth view from Microsoft.

Then there is Wikipedia for a more formal view.

First normal form

“Do not use multiple fields in a single table to store similar data.” says Microsoft.

Or, every attribute must be an atom. Attributes shall not be lists.

Second normal form

Functional dependency

Given sets X and Y, write X → Y when knowing X tells you Y.

Don’t just look at tables to determine this information. You need someone with knowledge of the field. Sometimes functional dependency is just happenstance. In a small crowd, knowing the birthday may determine the name.

Key definitions

A superkey is a collection of attributes that uniquely determine each row. The row is functionally dependent on the superkey.

A candidate key is a a superkey that contains no smaller superkey. The row is fully functionally dependent on the candidate key.

A prime attribute is an attribute that occurs in a candidate key. A non-prime attribute isn’t prime.

Finally, the definition

A relation is in second normal form if it is in first normal form and no non-prime attribute is dependent on any proper subset of the relation.

Third Normal Formal

A relation is in third normal form if the relation is in second normal form and every non-prime attribute is non-transitively dependent on every superkey.

A transitive dependency is a situation where XZ and ZA and X and Z are different.

Boyce-Codd normal form

A relation is in Boyce-Codd normal form if for every dependency XY either X is a superkey or Y is a subset of X (a trivial functional dependency).

Similar definition for third normal form

A relation is in third normal form if for every dependency XA, where A is a single attribute, either X is a superkey or or A is an element of X or A is a prime attribute.

Fourth normal form

You need to look at the Wikipeida entry for this one.

Normalization