# A little more normalization

## Functional dependency

Given sets of attributes X and Y, write X → Y when knowing X tells you Y. That is, there is there is a functional relationship between X and Y. For example Social-Security-Number → Name.

Functional dependencies should be given by domain experts!

About once a year, the department secretary will get a call with a question that goes something this: “I was at UNCA for a year about 15 years ago and I took a course on programming. I think we used C or FORTRAN. Could you send me a syllabus for that course.” Think of a useful set of attributes for a database that could be used for this “query.”

By the way, students who are passing through often forget the names and instructors of a course, but they may remember the instructor’s gender or some details about the room in which the course was taught. They also may remember programming languages or software packages used in the course.

Once you have written some attributes, try to write some functional dependencies.

### Closures of functional dependencies

Given a collection of functional dependencies, it is possible to derive new functional dependencies. Here is one simple example: If Social-Security-Number → UNCA-ID and UNCA-ID → name, then Social-Security-Number → name.

Armstrong’s three axioms are a complete and sound set of axioms for generating functional dependencies.

• Axiom of transitivity — If X → Y and Y → Z, then X → Z.
• Axiom of reflexivity — If Y ⊆ X, then X → Y.
• Axiom of augmentation — If X → Y, X Z → Y Z for any set Z of attributes.

Complete and sound axiomatizations are cool.

Use Armstrong’s axioms to generate some (most likely useless) additional functional dependencies.

## Relations and tables

Though it may be unnatural, think of a single relation for your enrollment database. Let it have about a dozen attributes.

Do not worry about normalization at this time.

## Key definitions

A superkey is a collection of attributes that uniquely determine each row. The rows are functionally dependent on the superkey.

A candidate key is a a superkey that contains no smaller superkey. The rows are fully functionally dependent on the candidate key.

A prime attribute is an attribute that occurs in a candidate key. A non-prime attribute isn’t prime.

Identify some keys for your relation.

## The loss-less join

When a large relation is broken into two smaller relations that, when joined with ⋈, recreate the original larger relation; that decomposition is a loss-less-join decomposition.

What are the possible decomposition s of the relationship (Social-Security-Number, UNCA-Id, Date-of-Birth)?

Which of these decompositions are loss-less?

## Third normal form and Boyce-Codd normal form

A relation is in third normal form if for every dependency XA, where A is a single attribute, either X is a superkey or or A is an element of X or A is a prime attribute.

A relation is in Boyce-Codd normal form if for every dependency X → Y either X is a superkey or Y is a subset of X (a trivial functional dependency).

Is your single big relation for looking up old courses in Boyce-Codd Normal Form?

If a relation is not in BCNF, you can often normalize it. Take one of the offending functional dependencies, say X → Y. Let T be all the attributes of your relation that and in neither X or Y.

Find a triple of attributes sets X, Y and T as described above. Now break your table into two tables, one with attributes from X and Y and the other with attributes from X and T.

Are both of these tables in Boyce-Codd form? If not, keep decomposing.

## Fourth normal form

This one has to do with mixing unrelated items in a table. You will need to look at the Wikipeida entry for this one for a more formal definition.

Suppose you have a table to keep up with everyone’s favorite apples and oranges.

personorangeapple
TomAlbermarle PippinCalifornia Navel
TomOrange SweetValencia
JoeGolden DeliciousValencia
JoeZabergau ReinetteCleopatra Orange

The rows of this relationship make no sense. How are Tom’s preferences to the Albermarle Pippin and California Navel related? Should those be two separate rows with `NULL` in one of the fruits?

personorangeapple
TomAlbermarle PippinNULL
TomNULLCalifornia Navel
TomOrange SweetNULL
TomNULLValencia
JoeGolden DeliciousNULL
JoeNULLValencia
JoeZabergau ReinetteNULL
JoeNULLCleopatra Orange

Or, if we are pairing unrelated things, should we just add all possible pairs of unrelated items?

personorangeapple
TomAlbermarle PippinCalifornia Navel
TomAlbermarle PippinValencia
TomOrange SweetCalifornia Navel
TomOrange SweetValencia
JoeGolden DeliciousCleopatra Orange
JoeGolden DeliciousValencia
JoeZabergau ReinetteCleopatra Orange
JoeZabergau ReinetteValencia

Or maybe there should be two different tables: One for oranges and one for apples? Or maybe the table should have fruit-type and fruit-variety attributes?

Make sure your relations don’t have unrelated information.