## Anomalies

Consider a silly database where we keep up with students, grades, and friendships in a single table.

Jason | 10 | Mary | 9 |

Jason | 10 | John | 12 |

What can go wrong?

- Modification or update anomaly — Jason moves to grade 11, but only one tuple is updated.
- Insertion anomaly — How to handle a new student who arrives without friends.
- Deletion anomaly — Do you remove Jason when he loses his last friend

Yes, it is silly, but consider the one-person department which temporarily loses its single employee, or the newly-created department, or all the tables where we get up with office locations.

## Normalization

Start by consulting a down-to-earth view from Microsoft.

Then there is Wikipedia for a more formal view.

## First normal form

“Do not use multiple fields in a single table to store similar data.” says Microsoft.

Or, every attribute must be an *atom*.
Attributes shall not be lists.

## Second normal form

### Functional dependency

Given *sets* `X` and `Y`,
write `X` → `Y`
when knowing `X` tells you `Y`.

Don’t just look at tables to determine this information. You need someone with knowledge of the field. Sometimes functional dependency is just happenstance. In a small crowd, knowing the birthday may determine the name.

### Key definitions

A superkey is a collection of attributes that
uniquely determine each row.
The row is *functionally dependent* on the *superkey*.

A candidate key is a
a superkey that contains no smaller superkey.
The row is *fully* functionally dependent on the
*candidate key*.

A prime attribute is an attribute that occurs in a candidate key. A non-prime attribute isn’t prime.

### Finally, the definition

A relation is in second normal form if it is in first normal form *and*
no non-prime attribute is dependent on any proper subset
of the relation.

## Third Normal Formal

A relation is in third normal form if the relation is in second normal form and every non-prime attribute is non-transitively dependent on every superkey.

A transitive dependency is a situation where
`X` → `Z`
and `Z` → `A` and
`X` and `Z` are different.

## Boyce-Codd normal form

A relation is in Boyce-Codd normal form if
for every dependency `X` → `Y`
*either* `X` is a superkey *or*
`Y` is a subset of `X` (a trivial functional dependency).

### Similar definition for third normal form

A relation is in third normal form if
for every dependency `X` → `A`, where
`A` is a single attribute,
*either* `X` is a superkey *or*
*or* `A` is an element of `X`
*or* `A` is a prime attribute.

## Fourth normal form

You need to look at the Wikipeida entry for this one.