Project 1 for CS 373: Adding Boolean Retrieval

Existing System

As discussed in class, a basic system for vector-space retrieval (VSR) is available as a zip file. The Javadocs for this system are also available. Use the main method of the InvertedIndex class to index a set of documents and then process queries. A zip file containing two document collections is available on this link.

Your Task

As discussed in class and in your text, a VSR system uses a weighted vector representation of queries and documents. In a VSR system, the cosine of the angle between the query vector and the document vector is used as a measure document relevance.

In this project, you will implement a different retrieval model, the Boolean model. In a Boolean retrieval model, queries are specified as Boolean expressions consisting of index terms linked by Boolean operators. Documents are represented as sets of index terms. All documents satisfying the query expression are considered "relevant", there is no notion of weight. For more information on the Boolean retrieval model see the following sections of your text: section 2.5.2, section 4.2.3, and section 8.4.

In this implementation, queries may be formed using the following three logical operators: + representing logical AND, | representing logical OR, and - representing logical NOT. All three operators are binary operators and each has a specific precedence; the order of precedence (from highest to lowest) is: (1) logical NOT, (2) logical AND, and (3) logical OR.

To assist you in this assignment, I am providing a package of code that will parse a String representation of a query and return a parse tree. A zip file containing that code is available here. Javadocs for the package are also available. Before beginning the implementation of your Boolean retrieval system, develop a good understanding of the input and output of the QueryParser class. You can do this by running that class's main() method.

Your Boolean retrieval system should be implemented as part of the ir package of java code. It should be contained in its own package, for example, the package ir.bool. Where possible you should make use of other classes developed as part of the ir package, and the runtime behavior of your package should be as similar as possible to that of the VSR package. You must provide Javadoc style documentation of your code package.

Grading

Your grade will be based on the following factors (in order of importance):
  1. correct compilation, execution and performance of your code
  2. appropriate (i.e., time and space efficient) choices of data structures and algorithms
  3. good documentation of your code
  4. good documentation of the group work effort. As part of each project, each group must submit a brief summary of the contributions of each group member.

What to turn in

The following files must be placed in your course dropoff directory by March 6th: