## Functional dependency

Given *sets* of attributes `X` and `Y`,
write `X` → `Y`
when knowing `X` tells you `Y`.
That is, there is there is a functional relationship between
`X` and `Y`. For example
`Social-Security-Number` → `Name`.

*Functional dependencies should be given by domain experts!*

About once a year, the department secretary will get a call with a question that goes something this: “I was at UNCA for a year about 15 years ago and I took a course on programming. I think we used C or FORTRAN. Could you send me a syllabus for that course.” Think of a useful set of attributes for a database that could be used for this “query.”

By the way, students who are passing through often forget the names and instructors of a course, but they may remember the instructor’s gender or some details about the room in which the course was taught. They also may remember programming languages or software packages used in the course.

Once you have written some attributes, try to write some functional dependencies.

### Closures of functional dependencies

Given a collection of functional dependencies, it is possible to
derive new functional dependencies. Here is one simple example:
If `Social-Security-Number` → `UNCA-ID` and
`UNCA-ID` → `name`,
then `Social-Security-Number` → `name`.

Armstrong’s three axioms are a *complete* and
*sound* set of axioms for generating functional dependencies.

- Axiom of transitivity — If
`X`→`Y`and`Y`→`Z`, then`X`→`Z`. - Axiom of reflexivity — If
`Y`⊆`X`, then`X`→`Y`. - Axiom of augmentation — If
`X`→`Y`,`X Z`→`Y Z`for any set`Z`of attributes.

Complete and sound axiomatizations are cool.

Use Armstrong’s axioms to generate some (most likely useless) additional functional dependencies.

## Relations and tables

Though it may be unnatural, think of a *single* relation
for your enrollment database.
Let it have about a dozen attributes.

Do not worry about normalization at this time.

## Key definitions

A superkey is a collection of attributes that
uniquely determine each row.
The rows are *functionally dependent* on the *superkey*.

A candidate key is a
a superkey that contains no smaller superkey.
The rows are *fully* functionally dependent on the
*candidate key*.

A prime attribute is an attribute that occurs in a candidate key. A non-prime attribute isn’t prime.

Identify some keys for your relation.

## The loss-less join

When a large relation is broken into two smaller relations that, when joined with ⋈, recreate the original larger relation; that decomposition is a loss-less-join decomposition.

What are the possible decomposition s of
the relationship (`Social-Security-Number`,
`UNCA-Id`, `Date-of-Birth )?
`

Which of these decompositions are loss-less?

## Third normal form and Boyce-Codd normal form

A relation is in third normal form if
for every dependency `X` → `A`, where
`A` is a single attribute,
*either* `X` is a superkey *or*
*or* `A` is an element of `X`
*or* `A` is a prime attribute.

A relation is in Boyce-Codd normal form if
for every dependency `X` → `Y`
*either* `X` is a superkey *or*
`Y` is a subset of `X` (a trivial functional dependency).

Is your single big relation for looking up old courses in Boyce-Codd Normal Form?

If a relation is not in BCNF, you can often normalize it.
Take one of the offending functional dependencies, say
`X` → `Y`.
Let `T` be all the attributes of your relation that
and in neither `X` or `Y`.

Find a triple of attributes sets
`X`, `Y` and `T` as described above.
Now break your table into two tables, one with attributes from
`X` and `Y` and the other with attributes from
`X` and `T`.

Are both of these tables in Boyce-Codd form? If not, keep decomposing.

## Fourth normal form

This one has to do with mixing unrelated items in a table. You will need to look at the Wikipeida entry for this one for a more formal definition.

Suppose you have a table to keep up with everyone’s favorite apples and oranges.

person | orange | apple |
---|---|---|

Tom | Albermarle Pippin | California Navel |

Tom | Orange Sweet | Valencia |

Joe | Golden Delicious | Valencia |

Joe | Zabergau Reinette | Cleopatra Orange |

The rows of this relationship make no sense. How are Tom’s
preferences to the Albermarle Pippin and California Navel related?
Should those be two separate rows with `NULL`

in one of the
fruits?

person | orange | apple |
---|---|---|

Tom | Albermarle Pippin | NULL |

Tom | NULL | California Navel |

Tom | Orange Sweet | NULL |

Tom | NULL | Valencia |

Joe | Golden Delicious | NULL |

Joe | NULL | Valencia |

Joe | Zabergau Reinette | NULL |

Joe | NULL | Cleopatra Orange |

Or, if we are pairing unrelated things, should we just add all possible pairs of unrelated items?

person | orange | apple |
---|---|---|

Tom | Albermarle Pippin | California Navel |

Tom | Albermarle Pippin | Valencia |

Tom | Orange Sweet | California Navel |

Tom | Orange Sweet | Valencia |

Joe | Golden Delicious | Cleopatra Orange |

Joe | Golden Delicious | Valencia |

Joe | Zabergau Reinette | Cleopatra Orange |

Joe | Zabergau Reinette | Valencia |

Or maybe there should be two different tables: One for oranges and one for apples? Or maybe the table should have fruit-type and fruit-variety attributes?

Make sure your relations don’t have unrelated information.