CSCI 373: Advanced Databases
   
Description This course will provide practical coverage of essential data mining and data warehousing topics including:
  • Data mining inputs and outputs
  • Data mining algorithms
  • Evaluating data mining results
  • Data warehouse models and OLAP operations
  • Issues in data warehouse design such as monitoring, integration, data cleaning, data loading, and metadata.

Students will work with the Weka data mining tool set version 3.4 (http://www.cs.waikato.ac.nz/ml/weka/). The course will also include work with GIS databases and distributed databases.

   
Text Data Mining: Practical Machine Learning Tools and Techniques (Second Edition) by Ian H. Witten and Eibe Frank. Morgan Kaufmann, 2005.
   
Instructors
Rebecca Bruce
Office: RBH 024
Telephone: 232-2275
e-mail: bruce@cs.unca.edu
Office Hours: TBA
 
Joe Brownsmith
Office: RBH 220
Telephone: 232-5046
e-mail: brownsmith@cs.unca.edu
Office Hours: TBA


Tentative Course Schedule
Week Lectures & Reading Assignments & Exams
Jan11-Jan13 Chapter 1: Introduction KDD and Data Mining Assignment 1: Writing assignment
Jan16-Jan20 Chapter 2: Input: Concepts, Instances, Attributes
Chapter 3: Output: Knowledge Representation
Assignment 2: Attributes
Assignment 3: Knowledge Representation
Jan23-Jan27 Chapters 4.3 & 6.1: Decision Trees
Decision Trees & Information Gain
Assignment 4: Decision Trees
Thursday: Quiz 1
Jan30-Feb3 Chapter 5: Cross-validation and Metrics
Statistics review, & Statistics in performance evaluation
Assignment 5: Evaluation
Assignment 6: Statistical sig.
Feb6-Feb10 Chapters 4.2 & 6.7: Bayesian Models
Naive Bayes
Thursday: Quiz 2
Assignment 7: Naive Bayes
Feb13-Feb17 Chapters 4.1, 4.4: Classification Rules
Chapter 6: Classification Rules
Assignment 8: Classification Rules
Feb20-Feb24 Ch 4.7, An Example, Class Exercise
Ch 6.4: Instance-Based Learning
Thursday: Quiz 3
Assignment 9: Nearest Neighbor
Feb27-March3 Ch 6.6: K-Means & Hierarchical Clustering,
Probabilistic Clustering, & A Paper
Thursday: Quiz 4
Due After Break: Assignment 10: Clustering
March6-March10 Spring Break: No Classes
March13-March17 Tuesday: Lecture by Mike Squires
Thursday: Lecture by Lee Johnson
Data Mining Project
March20-March24 Distributed Databases Writing assignment
March27-March31 Distributed Databases Quiz 5 & Writing assignment
April3-April7 Undergraduate Research: No Thursday Classes
Data Warehousing
Writing assignment
April10-April14 Data Warehousing Quiz 6 & Writing assignment
April17-April21 Tuesday: Intro. to GIS & GIS Lab work in RH141
Thursday: GIS Lecture
April24-April28 Tuesday: GIS Lab work in RH141
Thursday: Lynda Wayne on GIS Metadata
May3-May9 Final Exam: Tues, May 9 from 8-10:30am
Sample Final Exam Questions on GIS


Resources:

Weka: http://www.cs.waikato.ac.nz/~ml/weka/index.html


Grades

Grades will be based on points earned from assignments, frequent quizzes, and a final exam as follows:

Quizzes: 30%
Final Exam: 30%
Assignments
  HW: 30%
  Project: 10%
Total: 100%

The following numerical scale will be used in assigning letter grades based on Score, the weighted score computed using the preceding table. The instructor reserves the option of relaxing the cut-offs for a letter grade in special circumstances.

Score ≥ 92                 A
Score ≥ 90 & Score < 92    A-
Score ≥ 88 & Score < 90    B+
Score ≥ 82 & Score < 88    B
Score ≥ 80 & Score < 82    B-
Score ≥ 78 & Score < 80    C+
Score ≥ 72 & Score < 78    C
Score ≥ 70 & Score < 72    C-
Score ≥ 68 & Score < 70    D+
Score ≥ 60 & Score < 68    D
Score < 60                 F
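
As a worked illustration of the weighting and letter-grade scale above, the following Python sketch computes Score from four component percentages and looks up the corresponding letter. The function names, the sample numbers, and the assumption that each component is expressed on a 0-100 scale are illustrative only and are not part of the course policy. The sketch also ignores the instructor's option to relax cut-offs.

```python
# Weights taken from the Grades table, as fractions of the total.
WEIGHTS = {"quizzes": 0.30, "final_exam": 0.30, "homework": 0.30, "project": 0.10}

# Lower cut-offs from the letter-grade scale, listed from highest to lowest.
CUTOFFS = [
    (92, "A"), (90, "A-"), (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"), (68, "D+"), (60, "D"),
]


def weighted_score(components):
    """Combine component percentages (assumed 0-100) into the weighted Score."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def letter_grade(score):
    """Return the letter for the first cut-off that the Score meets or exceeds."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


# Hypothetical component percentages for one student.
example = {"quizzes": 85, "final_exam": 90, "homework": 95, "project": 80}
score = weighted_score(example)
print(round(score, 1), letter_grade(score))
```

Because the cut-offs are checked from highest to lowest, each letter automatically covers the half-open range given in the scale (for example, B+ applies to scores of at least 88 and below 90).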

Grading Procedures:

Assignments: Assignments will be graded on the +/ok/- scale, where + indicates excellent, ok indicates satisfactory, and - indicates needs improvement. Assignments will be given each class period and will be due at the start of the following class period; late assignments will not be accepted. Your solutions to these assignments are your way of telling the instructor about your mastery of this course. Your solutions must be clearly different from those turned in by others in the class and represent a unique and special effort on your part.

Quizzes: During the semester there will be seven short quizzes. These quizzes will cover recent class material and will resemble homework questions. Quizzes will be given on the dates indicated unless otherwise announced in class. Your lowest quiz grade will be dropped in calculating your course grade.
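
The effect of dropping the lowest quiz grade can be sketched as follows. This is a minimal example that assumes all quizzes are weighted equally and combined by a simple average, which the policy above does not state explicitly; the function name and the sample scores are hypothetical.

```python
def quiz_average(quiz_scores):
    """Average quiz percentages after dropping the single lowest score."""
    if len(quiz_scores) < 2:
        raise ValueError("need at least two quiz scores to drop one")
    kept = sorted(quiz_scores)[1:]  # discard the lowest score
    return sum(kept) / len(kept)


# Hypothetical percentages for seven quizzes; the 60 is dropped.
scores = [70, 85, 90, 60, 100, 95, 80]
print(round(quiz_average(scores), 1))
```

Under these assumptions, a single missed or poor quiz does not lower the quiz component, since only the remaining scores enter the average.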


Attendance Policy:

Lectures: Students are expected to attend all class lectures. Failure to do so will impact your homework grade and will be considered a lack of interest in success on the part of the student.

Quizzes: If you must miss a quiz due to illness, you must email or telephone the instructor before the scheduled time, and perhaps something can be arranged to avoid a zero for this quiz. Failure to notify the instructor prior to the scheduled time will produce an automatic zero for the quiz.