Lab 14

Performance

[Introduction] [Instructions] [What To Submit]

Introduction

In this lab you develop an application that can test and compare the time performance of various algorithms used for essential programming tasks, as for example sorting and searching large collections of data. These topics are covered in Sections 4.1-4.2 of your text. The Booksite has corresponding descriptions for both Section 4.1 and Section 4.2 .

Instructions:

To complete this project, follow the steps listed below:

Login to your home directory and launch a terminal window. Use the cd command to change your default directory to csci/273/labs. Keep this window open for future reference.

Launch the NetBeans IDE and raise the New Project dialog. Define the project to be a Java Application named Lab14. Make sure that the project folder will be created within your csci/273/labs folder. As in your previous labs, you should also uncheck the Create Main Class checkbox before you click Finish.

Some of the classes you will be downloading or creating for this lab are clients of textbook library classes that have been collected in the JAR file Booksite2.jar. So your next step is to download this JAR file from the link above, placing it in your csci/273/labs folder. Then be sure to make it part of your Lab14 project classpath, by repeating the steps that you first performed as Step 6 in Lab 08.
As you may recall, Booksite2.jar is organized into packages. So remember that you will need to include import statements, as for example
```
import stdlib.*;
```
as you write the various client applications described below.

The application you will develop in this lab will essentially provide a solution to Exercise 4.1.17 in your text (see page 503, or its equivalent online description in Section 4.1). This is an exercise in comparing the running times of two distinct algorithms for generating random permutations of integers in the range [0,n). One of these was actually covered earlier in Section 1.4 of your text, namely the algorithm used to shuffle a deck of cards. The other is listed in the problem statement.
For your convenience, complete methods that implement both of these algorithms are already available to you in a class named Permute.java, which you should now download into the default package for Lab14. Note that this is not a textbook demo or library class, so you will need to get it from the link above. You will want to scan Permute.java to figure out its API, but your real task here is to develop a client application that makes performance tests of the two methods in Permute and compares their running times as a function of the parameter n.
To get started, create a new main class named Performance in the default packge for Lab14. Then start to fill in its main() method by copying the following code:
```
int n = Integer.parseInt(args[0]);
Stopwatch timer = new Stopwatch();

double tstart = timer.elapsedTime();
int [] p = Permute.permute1(n);
double t = timer.elapsedTime() - tstart;

System.out.printf ("Elapsed time (sec) = %10.6f\n", t);
```
After you add this code, you should expect to see a few red lines right away, but despair not... these will be easy to fix.
Study the code a moment. Note in particular the first red-lined statement, which refers to a class Stopwatch. The problem at the moment is that the compiler does not recognize this class. Stopwatch is a textbook demo from Chapter 3, which has in fact been included in the JAR file Booksite2.jar for your convenience. But the compiler will not know where to find Stopwatch until you add the line
```
import chapter3.Stopwatch;
```
above the header for class Performance. So take care of that now, and hopefully all the red lines will go away. If not, it is time to consult with your lab instructor.
Otherwise, please resume your meditations on this code. It will not surprise you that Stopwatch models a stopwatch. But make sure that you see how the code listed above creates a Stopwatch instance and then uses its elapsedTime() method to find the time required to generate a single permutation of [0,n).

Now Build the complete Lab14 project and try running Performance from the terminal window. Start with, say, a value of 1000 for the command-line argument n, and then experiment with increasing values of n. You will soon discover that n must be rather large before the reported elapsed time becomes anything other than 0.000000. But as soon as you do see a positive value, try running Performance several more times in a row, all with the same value of n. You are likely to discover that the successive running times vary quite a bit.
The problem here is that computers tend to be so fast that time scales of seconds or milliseconds are much too coarse for accurate performance measurements of many common tasks. However, there is a way to improve on these measurements, which is simply to measure the time it takes to perform the same task multiple times in a row, and then compute an average time by dividing the total time by the number of repetitions. The key here is to make the total time interval long enough to measure reliably.
You can easily make this refinement by inserting a repetition loop around the method to be tested, as in the code fragment shown below:
```
double tstart = timer.elapsedTime();     // ms

for (int r = 0; r < reps; r++) {
    int [] p = Permute.permute1(n);
}
double t = (timer.elapsedTime() - tstart) / reps;
```
Note the corresponding change required in the last statement above, where t is now taken to be the average time per repetition.
Of course the variable reps, specifying the number of repetitions, must be defined somewhere. For the purposes of this project, you should provide this value as another command-line argument.
So make these changes, then rebuild the project and repeat your experiments. You should go back to relatively small values of n, for which you were getting 0.000000 times in the first version. See if a large number of repetitions per trial makes a difference. Also perform several trials in a row using the same values of n and reps, then increase reps while keeping n fixed and perform several more trials. Try to determine whether your results for fixed n actually vary less from trial to trial as you increase reps.
After you have drawn some conclusions from your latest results, show your results to your lab instructor.

As matters stand now, your Performance application prints the average running time t for a single value of the parameter n. The obvious next step is to revise Performance so that it can measure the running times for a sequence of values for n, to examine how fast the running time grows as n increases.
In a nutshell, you can do this easily if you surround the code you have written so far in a loop that lets n range over multiple values. A flexible way to set up this loop is to provide three separate parameters, namely
- An initial value of n, say ni.
- A final value of n, say nf, which specifies when to terminate the loop.
- An increment (step size) dn, which is added to n after each loop iteration.
Using these variables, the loop control might be written as follows:
```
for (int n = ni; n <= nf; n += dn)
```
Note that n itself has now become a loop counter in this code, rather than a command-line argument. So in your new version of Performance you should remove the earlier line that accepted n as an argument and replace it with three lines accepting the arguments ni, nf, and dn. But of course you should be sure to keep reps (the repetition count) as an additional argument. (That makes a total of four arguments, in case you have lost track by now.)
Make these changes, rebuild, and run the project. For starters, you might try letting ni = 1000, nf = 10000, and dn = 1000. The repetition count reps should be fairly large, so try a starting value of at least 1000. But you should certainly experiment a bit with these numbers.
Before you proceed, demonstrate your latest version to your lab instructor.

At this stage your project should be giving you useful timing data, but unfortunately only as a sequence of console output lines. Now it is difficult for most people to draw meaningful conclusions directly from a bunch of numbers printed on the screen. In fact, almost all of us can immediately infer much more from a graphical display of the same data (at least if it is presented in a reasonable manner). So your next goal is to beef up Performance to add some simple graphics, using the familiar StdDraw library.
Whenever you start to make a graph, one of your first decisions is about scaling, or choosing suitable ranges for the x and y axes. For the data generated by Performance, we should not really all that interested in the absolute running times t or particular values of n. What should be of concern is the trend shown by this data, namely, how fast does the running time increase with increasing n? So in this application it is all right to scale the axes to any convenient range. Hence you might as well use the default range that StdDraw assumes for both axes, namely [0,1].
Of course you do want the data points to show up in the graph, so they will have to be rescaled to match these defaults. Since the running time t(n) is a monotonically increasing function of n, this is an easy task: simply divide each value of n by its final (maximum) value, and then do likewise for each value of t. But since the maximum values of n and t are the last values generated in the outermost loop, you will have to revise Performance so that it saves all the values of n and t as it computes them, rather than throwing them away as it does now. The best way to accomplish this is to replace the single variables n and t with arrays, so that each iteration of the outermost loop generates matching components n[i] and t[i]. You may also want to rewrite the loop itself, so that it is based on an index variable i that labels each data point, rather than n. Thus the line
```
for (int n = ni; n < nf; n += dn)
```
might be replaced by
```
for (int i = 0; i <= imax; i++)
```
where the variable imax can be derived from ni, nf, and dn:
```
int imax = (nf - ni) / dn;
```
Within the loop itself, each n[i] would then be given by
```
n[i] = ni + i * dn;
```
After all values have been found and stored in the n and t arrays, your new version of Performance can immediately rescale all components so that they range from 0 to 1. After rescaling, plotting the points should be no problem. You should soon be able to produce a plot that looks something like the following:

The argument values used to generate this example were ni=0, nf=10000, dn=500, and reps=1000. Also, the label permute1 was displayed by adding the line
```
StdDraw.text (0.2, 0.8, "permute1");
```

As your final task in this project, revise Performance yet again to display a combined plot comparing the running times of both permutation algorithms provided in the class Permute, not just permute1() as in your present version. Helpful hint: the other method is named permute2().
Your final plot should at least vaguely resemble the image shown below:

When this part is ready, show your results to your lab instructor. You might be asked to draw some conclusions from your latest results, but otherwise you have successfully completed the final lab project for the semester. Congratulations!

What To Submit

When your project is complete and working correctly, demonstrate it to your lab instructor. Then, before you exit NetBeans, clean your Lab14 project. Finally, before you logout, switch back to your terminal and set your default directory back to csci/273/labs. Then create a JAR file of your Lab14 project folder, using the command

jar cf Lab14Project.jar Lab14

Please leave both your Lab14 project folder and Lab14Project.jar in your csci/273/labs directory for the remainder of the semester.