A little more normalization

Functional dependency

Given sets of attributes X and Y, write X → Y when knowing X tells you Y. That is, there is a functional relationship between X and Y. For example, Social-Security-Number → Name.
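Here is a minimal sketch of what "knowing X tells you Y" means for actual rows: no two rows may agree on X and differ on Y. The function name fd_holds and the sample data are invented for illustration. Keep in mind that a sample of rows can only refute a dependency, never establish it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y seen for each X; any later mismatch breaks the FD.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample data, made up for this example.
people = [
    {"ssn": "111-11-1111", "name": "Ann", "dept": "CSCI"},
    {"ssn": "222-22-2222", "name": "Bob", "dept": "MATH"},
    {"ssn": "333-33-3333", "name": "Ann", "dept": "MATH"},
]

ssn_to_name = fd_holds(people, ["ssn"], ["name"])    # no counterexample here
name_to_dept = fd_holds(people, ["name"], ["dept"])  # the two Anns disagree
```

In this sample, ssn → name survives and name → dept is refuted by the two rows for Ann.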

Functional dependencies should be given by domain experts!

About once a year, the department secretary will get a call with a question that goes something like this: “I was at UNCA for a year about 15 years ago and I took a course on programming. I think we used C or FORTRAN. Could you send me a syllabus for that course?” Think of a useful set of attributes for a database that could be used for this “query.”

By the way, students who are passing through often forget the name and instructor of a course, but they may remember the instructor’s gender or some details about the room in which the course was taught. They may also remember programming languages or software packages used in the course.

Once you have written some attributes, try to write some functional dependencies.

Closures of functional dependencies

Given a collection of functional dependencies, it is possible to derive new functional dependencies. Here is one simple example: If Social-Security-Number → UNCA-ID and UNCA-ID → name, then Social-Security-Number → name.

Armstrong’s three axioms are a complete and sound set of axioms for generating functional dependencies.

Axiom of transitivity — If X → Y and Y → Z, then X → Z.
Axiom of reflexivity — If Y ⊆ X, then X → Y.
Axiom of augmentation — If X → Y, X Z → Y Z for any set Z of attributes.
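In practice nobody applies the three axioms by hand. The usual shortcut is the attribute closure: start with X and keep adding the right-hand side of any dependency whose left-hand side is already covered. Then X → Y follows from the given dependencies exactly when Y is inside the closure of X. The sketch below assumes dependencies are stored as pairs of frozensets; the attribute names are made up.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know all of lhs, we also know all of rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# The example from above: SSN -> UNCA-ID and UNCA-ID -> name.
fds = [
    (frozenset({"ssn"}), frozenset({"unca_id"})),
    (frozenset({"unca_id"}), frozenset({"name"})),
]

ssn_closure = closure({"ssn"}, fds)
follows = {"name"} <= ssn_closure  # does ssn -> name follow?
```

Here the closure of ssn picks up unca_id and then name, which is the transitivity example in code.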

Complete and sound axiomatizations are cool.

Use Armstrong’s axioms to generate some (most likely useless) additional functional dependencies.

Relations and tables

Though it may be unnatural, think of a single relation for your enrollment database. Let it have about a dozen attributes.

Do not worry about normalization at this time.

Key definitions

A superkey is a collection of attributes that uniquely determine each row. The rows are functionally dependent on the superkey.

A candidate key is a superkey that contains no smaller superkey. The rows are fully functionally dependent on the candidate key.

A prime attribute is an attribute that occurs in a candidate key. A non-prime attribute isn’t prime.
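For a small relation, the candidate keys can be found by brute force: try attribute sets from smallest to largest and keep each one whose closure is the whole relation and that contains no key already found. This is only a sketch. The relation and its dependencies are hypothetical, and the search is exponential in the number of attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return the minimal attribute sets whose closure is the whole relation."""
    attributes = frozenset(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller superkey, so not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# Hypothetical relation: SSN and UNCA-ID determine each other and the rest.
attributes = {"ssn", "unca_id", "name", "dob"}
fds = [
    (frozenset({"ssn"}), frozenset({"unca_id", "name", "dob"})),
    (frozenset({"unca_id"}), frozenset({"ssn"})),
]

keys = candidate_keys(attributes, fds)
prime = set().union(*keys)  # attributes that occur in some candidate key
```

Under these assumed dependencies there are two candidate keys, ssn alone and unca_id alone, so those two are the prime attributes.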

Identify some keys for your relation.

The loss-less join

When a large relation is broken into two smaller relations that, when joined with ⋈, recreate the original larger relation, the decomposition is a loss-less-join decomposition.
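The definition can be tried out on a small instance: project the rows onto the two attribute sets, join the projections, and compare with the original. The sketch below stores each row as a frozenset of (attribute, value) pairs. The enrollment data and all names are invented. Passing this test on one instance does not prove the decomposition is loss-less in general, but failing it does show the decomposition is lossy.

```python
def relation(dicts):
    """Turn a list of dicts into a set of hashable rows."""
    return {frozenset(d.items()) for d in dicts}


def project(rel, attrs):
    """Keep only the named attributes; duplicate rows collapse."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rel}


def natural_join(left, right):
    """Combine rows that agree on every attribute they share."""
    joined = set()
    for l_row in left:
        for r_row in right:
            dl, dr = dict(l_row), dict(r_row)
            if all(dl[a] == dr[a] for a in dl.keys() & dr.keys()):
                joined.add(frozenset({**dl, **dr}.items()))
    return joined


def rejoins_exactly(rel, attrs1, attrs2):
    """True if joining the two projections gives back exactly rel."""
    return natural_join(project(rel, attrs1), project(rel, attrs2)) == rel


# Hypothetical instance in which course -> instructor holds.
enrolled = relation([
    {"student": "Tom", "course": "CSCI 201", "instructor": "Lee"},
    {"student": "Joe", "course": "CSCI 201", "instructor": "Lee"},
    {"student": "Tom", "course": "MATH 191", "instructor": "Kim"},
    {"student": "Joe", "course": "CSCI 202", "instructor": "Lee"},
])

good = rejoins_exactly(enrolled, {"student", "course"}, {"course", "instructor"})
bad = rejoins_exactly(enrolled, {"student", "instructor"}, {"course", "instructor"})
```

The second split joins on instructor alone, so it invents a row that puts Tom in CSCI 202, and the rejoined table no longer matches the original.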

What are the possible decompositions of the relation (Social-Security-Number, UNCA-Id, Date-of-Birth)?

Which of these decompositions are loss-less?

Third normal form and Boyce-Codd normal form

A relation is in third normal form if for every dependency X → A, where A is a single attribute, either X is a superkey or A is an element of X or A is a prime attribute.

A relation is in Boyce-Codd normal form if for every dependency X → Y either X is a superkey or Y is a subset of X (a trivial functional dependency).
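The two definitions differ only in the "A is a prime attribute" escape clause, and a small checker makes that visible. The sketch below tests only the dependencies it is given, one right-hand attribute at a time. The function names and the student/course/instructor relation are assumptions made for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return the minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


def violations(attributes, fds, form):
    """List the given dependencies X -> A that break form ('3NF' or 'BCNF')."""
    attributes = frozenset(attributes)
    prime = set().union(*candidate_keys(attributes, fds))
    bad = []
    for lhs, rhs in fds:
        for a in sorted(rhs - lhs):              # A in X is trivial, skip it
            if closure(lhs, fds) == attributes:
                continue                         # X is a superkey
            if form == "3NF" and a in prime:
                continue                         # allowed by 3NF only
            bad.append((set(lhs), a))
    return bad


# Hypothetical relation: each instructor teaches exactly one course.
attributes = {"student", "course", "instructor"}
fds = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),
    (frozenset({"instructor"}), frozenset({"course"})),
]

bcnf_problems = violations(attributes, fds, "BCNF")
third_nf_problems = violations(attributes, fds, "3NF")
```

With these assumed dependencies every attribute is prime, so the relation passes the third normal form test, while instructor → course still breaks Boyce-Codd normal form because instructor is not a superkey.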

Is your single big relation for looking up old courses in Boyce-Codd Normal Form?

If a relation is not in BCNF, you can often normalize it. Take one of the offending functional dependencies, say X → Y. Let T be all the attributes of your relation that are in neither X nor Y.

Find a triple of attribute sets X, Y and T as described above. Now break your table into two tables, one with attributes from X and Y and the other with attributes from X and T.
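One decomposition step is just set arithmetic. The sketch below uses a made-up course-lookup relation in which instructor → instructor_gender is assumed to be the offending dependency. The function name split_on and all attribute names are hypothetical.

```python
def split_on(attributes, lhs, rhs):
    """Split a relation on the offending dependency lhs -> rhs.

    Returns the attribute sets of the two new tables: (X and Y, X and T).
    """
    x = set(lhs)
    y = set(rhs) - x                  # drop any trivial part of Y
    t = set(attributes) - x - y       # everything in neither X nor Y
    return x | y, x | t


# Hypothetical single big relation for looking up old courses.
attributes = {"course_id", "term", "instructor", "instructor_gender", "room"}

instructor_table, course_table = split_on(
    attributes, lhs={"instructor"}, rhs={"instructor_gender"}
)
```

The two tables share exactly X, and X determines everything in the first table, so joining them on X gives back the original rows.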

Are both of these tables in Boyce-Codd normal form? If not, keep decomposing.

Fourth normal form

This one has to do with mixing unrelated items in a table. You will need to look at the Wikipedia entry for this one for a more formal definition.

Suppose you have a table to keep up with everyone’s favorite apples and oranges.

person	apple	orange
Tom	Albermarle Pippin	California Navel
Tom	Orange Sweet	Valencia
Joe	Golden Delicious	Valencia
Joe	Zabergau Reinette	Cleopatra Orange

The rows of this relation make no sense. How are Tom’s preferences for the Albermarle Pippin and the California Navel related? Should those be two separate rows with NULL in one of the fruits?

person	apple	orange
Tom	Albermarle Pippin	NULL
Tom	NULL	California Navel
Tom	Orange Sweet	NULL
Tom	NULL	Valencia
Joe	Golden Delicious	NULL
Joe	NULL	Valencia
Joe	Zabergau Reinette	NULL
Joe	NULL	Cleopatra Orange

Or, if we are pairing unrelated things, should we just add all possible pairs of unrelated items?

person	apple	orange
Tom	Albermarle Pippin	California Navel
Tom	Albermarle Pippin	Valencia
Tom	Orange Sweet	California Navel
Tom	Orange Sweet	Valencia
Joe	Golden Delicious	Cleopatra Orange
Joe	Golden Delicious	Valencia
Joe	Zabergau Reinette	Cleopatra Orange
Joe	Zabergau Reinette	Valencia

Or maybe there should be two different tables: One for oranges and one for apples? Or maybe the table should have fruit-type and fruit-variety attributes?
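The two-table option can be tried out in a few lines. The sketch below keeps one set of (person, apple) pairs and one set of (person, orange) pairs, using the fruit names from the tables above with the first fruit in each row treated as the apple. It then rebuilds the all-pairs table with a join on person. The variable names are invented.

```python
# One table per kind of fruit.
apples = {
    ("Tom", "Albermarle Pippin"), ("Tom", "Orange Sweet"),
    ("Joe", "Golden Delicious"), ("Joe", "Zabergau Reinette"),
}
oranges = {
    ("Tom", "California Navel"), ("Tom", "Valencia"),
    ("Joe", "Cleopatra Orange"), ("Joe", "Valencia"),
}

# Joining on person produces every (apple, orange) pair for each person.
all_pairs = {(p, a, o) for (p, a) in apples for (q, o) in oranges if p == q}

# The first table above: four arbitrary pairings of the same fruits.
four_rows = {
    ("Tom", "Albermarle Pippin", "California Navel"),
    ("Tom", "Orange Sweet", "Valencia"),
    ("Joe", "Golden Delicious", "Valencia"),
    ("Joe", "Zabergau Reinette", "Cleopatra Orange"),
}

# Both versions carry the same apple list and the same orange list.
same_apples = {(p, a) for p, a, o in four_rows} == apples
same_oranges = {(p, o) for p, a, o in four_rows} == oranges
```

Both three-column tables project onto the same two small tables, which suggests that the pairing of an apple with an orange in a row carries no information, and the all-pairs version just stores each preference several times.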

Make sure your relations don’t have unrelated information.

Normalization references

Normal Forms notes from the database course at Illinois (graduate studies allowed
A tutorial by the guy who gave us that relational algebra interface