University of North Carolina at Asheville

CSCI 202: Introduction to Data Structures


Lab 12: Building a Word Index Generator

[Introduction] [Instructions] [What To Submit]


Introduction

In this lab you will actually use some of the basic collection classes you have studied in earlier labs. In particular, you will employ both the BinaryTree and Queue classes to construct a simple application that scans a textfile and builds an alphabetized list of all the distinct words it contains. Moreover, for each word in the list, the application accumulates a list of all the line numbers in the file where that word occurs.

For this project, you will be provided with JAR files containing two separate Java packages, lab07 and lab09. These packages contain essentially the same classes that you worked with in your earlier labs on linked lists and binary trees. In this lab they will provide the basic collection classes for the application, as well as superclasses that will make it easy to build a specialized binary tree for this application.

As usual, the basic GUI framework is already written for you. Your first tasks will be to download a set of source files and the JAR files containing the two packages mentioned above. With these packages you will be able to compile the source files, which include two subclasses of classes contained in the packages. One of these subclasses is fully implemented, but one will require you to complete a stubbed method. After you complete this part, the application should be ready to run.

To give you a feel for the behavior of this application, the image below shows the initial window:

As you can see, the center panel contains two multiline text boxes (of type javax.swing.JTextArea). When you enter the name of a valid textfile (and its directory pathname) in the bottom control panel and press the Start button, the application reads the file and displays each line of text in the upper text area. For convenience, the display adds line numbers to the start of each line.

As it scans the file, the application also quietly inserts each word it encounters (and the line number where it occurs) into a specialized binary tree which will be defined below. After the file has been completely scanned, the application then traverses the binary tree and displays an accumulated word index in the lower text area. For example, scanning the sample file stuff.txt yields the display shown below:


Instructions:

  1. Login and enter the directory csci/202/labs. If this directory does not exist, create it now.
  2. Launch the NetBeans IDE, select the File/New Project... option in the toplevel menu bar, and create a new project called Lab12. Define the project to be a General Java Application and accept all the default settings so that NetBeans will automatically write a class lab12.Main for you. You should find the source file for this class in Lab12/src/lab12.
  3. Download the following files and import them into your Lab12/src/lab12 project folder:

    DisplayFrame.java WordIndex.java IndexItem.java ErrorDialog.java
  4. Now create a new subdirectory in your Lab12 project folder, named Lab12/lib. Then download the following two JAR files into this directory, and make them accessible to your Lab12project:

    Lab07.jar Lab09.jar
  5. Next create another new subdirectory in your Lab12 project folder, named Lab12/samples. Then download the following two sample textfiles into this directory for later testing:

    stuff.txt penrose.txt
  6. Open the source file for lab12.Main and insert the following lines of code into its main() method:
    	DisplayFrame frame = new DisplayFrame();
    	frame.setVisible(true);
    
  7. Once you have assembled all these files, you should be able to generate a complete set of Javadoc pages for the classes they define. But for a quick overview, the only task of the main() method in lab12.Main is to create a DisplayFrame, which is the toplevel application window for this project. Both the DisplayFrame and ErrorDialog classes are completely implemented and ready for use. However, you might want to take the time to look over the code in DisplayFrame.java, since this class contains the event handler buildIndex() that is actually responsible for scanning textfiles.

    WordIndex is a subclass of lab09.BinaryTree that has been customized for this application. As its name suggests, this class accumulates the list of words and their associated line numbers as they are found in the textfile. However, note that the source file WordIndex.java contains only a partial implementation of this class. You will need to complete its definition as specified below to make everyting work properly.

    Similarly, IndexItem is defined as a subclass of the Item class that you have used earlier with both ordered lists and binary search trees, For this lab the Item superclass is provided by the lab09 package. The main new feature in an IndexItem is the addition of a Queue as new member data, which will be used to accumulate the list of line numbers marking the occurrences of each distinct word found in the file.It also overrides the Item version of the toString() method, so that a traversal of the binary tree can display each word and its line-number list as shown in the second figure above. This class has also been completely implemented, and is ready for use.

  8. Now open the incomplete file WordIndex.java, and locate the stubbed method

    void insert (IndexItem item)

    This method is similar in some respects to the BinaryTree insert() method that you implemented in Lab 09. If the IndexItem contains a new word, a new tree node is created to hold that IndexItem. Note that in addition to the word itself, the new IndexItem stores the line number where the word was first found in its line-number queue.

    In this lab it is slightly trickier to handle repeated occurrences of a word. In essence, the method must extract the single line number in the queue of the input IndexItem parameter, and add that line to the existing IndexItem for that word. In other words, it effectively discards the input IndexItem parameter after extracting its line number. Of course the method must also increment the occurrence count for that word, using its inherited incOccurrences() method as in your earlier labs.

    Hint: You might start by simply copying the code that you wrote for the original BinaryTree insert() method in Lab 09, and modifying it as necessary.

  9. Once you have completed your implementation of the WordIndex class, try to compile your Lab12 project. Of course if you encounter any compiler errors, you will have to fix them all before you can proceed...
  10. Run the project. You should now verify that the application behaves as described in the Introduction. If you don't want to bother writing your own sample textfiles, just use the two freebies stuff.txt and penrose.txt that you should have downloaded earlier. The second of these files provides a more substantial test case, since it contains several complete paragraphs copied from the introduction to The Emperor's New Mind by Roger Penrose (Oxford University Press, 1989).


What To Submit

When your project is complete and working correctly, demonstrate it to your lab instructor. Then, before you exit NetBeans, clean your Lab12 project. Finally, before you logout, open a terminal window and use the cd command to enter your csci/202/labs directory. Then create a JAR file of your Lab12 project folder, using the command

jar cf Lab12Project.jar Lab12
Please leave Lab12Project.jar and your Lab12 project folder in your csci/202/labs directory for the remainder of the semester.