# A little normalization

## Anomalies

Consider a silly database where we keep up with students, grades, and friendships in a single table.

 Jason 10 Mary 9 Jason 10 John 12

What can go wrong?

• Modification or update anomaly — Jason moves to grade 11, but only one tuple is updated.
• Insertion anomaly — How to handle a new student who arrives without friends.
• Deletion anomaly — Do you remove Jason when he loses his last friend

Yes, it is silly, but consider the one-person department which temporarily loses its single employee, or the newly-created department, or all the tables where we get up with office locations.

## Normalization

Start by consulting a down-to-earth view from Microsoft.

Then there is Wikipedia for a more formal view.

## First normal form

“Do not use multiple fields in a single table to store similar data.” says Microsoft.

Or, every attribute must be an atom. Attributes shall not be lists.

## Second normal form

### Functional dependency

Given sets X and Y, write X → Y when knowing X tells you Y.

Don’t just look at tables to determine this information. You need someone with knowledge of the field. Sometimes functional dependency is just happenstance. In a small crowd, knowing the birthday may determine the name.

### Key definitions

A superkey is a collection of attributes that uniquely determine each row. The row is functionally dependent on the superkey.

A candidate key is a a superkey that contains no smaller superkey. The row is fully functionally dependent on the candidate key.

A prime attribute is an attribute that occurs in a candidate key. A non-prime attribute isn’t prime.

### Finally, the definition

A relation is in second normal form if it is in first normal form and no non-prime attribute is dependent on any proper subset of the relation.

## Third Normal Formal

A relation is in third normal form if the relation is in second normal form and every non-prime attribute is non-transitively dependent on every superkey.

A transitive dependency is a situation where XZ and ZA and X and Z are different.

## Boyce-Codd normal form

A relation is in Boyce-Codd normal form if for every dependency XY either X is a superkey or Y is a subset of X (a trivial functional dependency).

### Similar definition for third normal form

A relation is in third normal form if for every dependency XA, where A is a single attribute, either X is a superkey or or A is an element of X or A is a prime attribute.

## Fourth normal form

You need to look at the Wikipeida entry for this one.