Project 1 for CS 373: Adding Boolean Retrieval
Existing System
As discussed in class, a basic system for vector-space retrieval (VSR)
is available as a zip file. The Javadocs for this system are also available. Use the
main method of the InvertedIndex class to index a set of documents and
then process queries. A zip file containing two document collections
is available on this link.
Your Task
As discussed in class and in your text, a VSR system uses a weighted
vector representation of queries and documents. In a VSR system,
the cosine of the angle between the query vector and the document
vector is used as a measure document relevance.
In this project, you will implement a different retrieval model, the
Boolean model. In a Boolean retrieval model, queries are
specified as Boolean expressions consisting of index terms linked by
Boolean operators. Documents are represented as sets of index
terms. All documents satisfying the query expression are considered
"relevant", there is no notion of weight. For more information on the
Boolean retrieval model see the following sections of your text:
section 2.5.2, section 4.2.3, and section 8.4.
In this implementation, queries may be formed using the following
three logical operators: +
representing logical AND,
|
representing logical OR, and -
representing logical NOT. All three operators are binary operators
and each has a specific precedence; the order of precedence (from
highest to lowest) is: (1) logical NOT, (2) logical AND, and (3)
logical OR.
To assist you in this assignment, I am providing a package of code
that will parse a String representation of a query and return a parse
tree. A zip file containing that code is available here. Javadocs for the
package are also available. Before beginning the implementation of
your Boolean retrieval system, develop a good understanding of the
input and output of the QueryParser class. You can do this by running
that class's main() method.
Your Boolean retrieval system should be implemented as part of the ir
package of java code. It should be contained in its own package,
for example, the package ir.bool. Where possible you should make use
of other classes developed as part of the ir package, and the runtime
behavior of your package should be as similar as possible to that of
the VSR package. You must provide Javadoc style documentation
of your code package.
Grading
Your grade will be based on the following factors (in order of
importance):
- correct compilation, execution and performance of your code
- appropriate (i.e., time and space efficient) choices of data
structures and algorithms
- good documentation of your code
- good documentation of the group work effort. As part of each
project, each group must submit a brief summary of the contributions
of each group member.
What to turn in
The following files must be placed in your course dropoff directory by
March 6th:
- A zip file containing your Boolean retrieval code package in
addition to any other ir packages that you modified. For example, if
you named your Boolean retrieval package ir.bool and you also modified
some of the code in the ir.queryparser package, your zip file should
contain the following directories: ir/bool and ir/queryparser.
- A text file containing a list of classes that your group either
modified or created as part of this project. The list should also
contain the package name for each class. (This list will mainly be
used to identify classes that you modified or added to packages
outside of your Boolean retrieval package.)
- A text file outlining the contributions made by each group member
to this project.