CSCI 202 in-class lab — Finite State Machines

Our problem for the day

Our task is very similar to Homework 4, except that we are looking for CSCI 107.

Breaking up the task

Modularization is always a good idea. Go ahead and create a Java application, but this time write a main routine that uses a Scanner to read one line at a time and passes each line on to processLine, which will do the checking.

For now, have processLine use a Scanner to break the line into tokens which are printed, one per line. This is similar to what we did in Lab 3 to break up the lines of the ZIP table.
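One possible shape for this first step is sketched below. It is only a sketch: the class name TokenEcho and the helper tokenize are made up for illustration, and your own solution can print the tokens directly inside processLine instead of collecting them in a list first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class TokenEcho {
    // Hypothetical helper: collects the whitespace-separated tokens of one line.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Scanner lineStream = new Scanner(line);
        while (lineStream.hasNext()) {
            tokens.add(lineStream.next());
        }
        return tokens;
    }

    // Prints the tokens of the line, one per output line.
    static void processLine(String line) {
        for (String token : tokenize(line)) {
            System.out.println(token);
        }
    }

    // Reads standard input one line at a time and hands each line to processLine.
    public static void main(String[] args) {
        Scanner inStream = new Scanner(System.in);
        while (inStream.hasNextLine()) {
            processLine(inStream.nextLine());
        }
    }
}
```

Returning the tokens from a helper is just a convenience that makes the tokenizing step easy to check on its own.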

Adding state

In order to solve problems like this, you need to write a loop that remembers. Modify your program so that it prints a token only if the preceding token was "CSCI". You can do this by introducing a boolean variable called lastWordCSCI that remembers if the previous token was "CSCI".
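The loop below is a minimal sketch of a "loop that remembers". The class name RememberDemo and the method tokensAfterCSCI are hypothetical, and the method returns the selected tokens in a list (rather than printing inside the loop) only so that the result is easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class RememberDemo {
    // Returns every token whose preceding token was exactly "CSCI".
    static List<String> tokensAfterCSCI(String line) {
        List<String> selected = new ArrayList<>();
        boolean lastWordCSCI = false;          // nothing has been read yet
        Scanner lineStream = new Scanner(line);
        while (lineStream.hasNext()) {
            String word = lineStream.next();
            if (lastWordCSCI) {
                selected.add(word);            // previous token was "CSCI"
            }
            lastWordCSCI = word.equals("CSCI"); // remember for the next pass
        }
        return selected;
    }

    public static void main(String[] args) {
        for (String token : tokensAfterCSCI("take CSCI 107 and CSCI 202")) {
            System.out.println(token);
        }
    }
}
```

Notice that lastWordCSCI is updated at the bottom of the loop, after the current token has been handled, so it always describes the token before the one being examined.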

Using state

Now modify your program so that it prints the entire line only if there are two successive tokens "CSCI" and "107".

In your first attempt, the line may be printed several times if the two tokens occur more than once. Modify processLine so that this does not happen. Add an additional variable called matched that is set to true the first time "CSCI" and "107" occur in order.
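Here is one way the two variables might work together. Treat it as a sketch under assumptions: the class name MatchOnce is invented, and having processLine return the value of matched is an addition made here so the outcome can be checked, not something the lab requires.

```java
import java.util.Scanner;

class MatchOnce {
    // Prints the line at most once; returns true if "CSCI" "107" occurred in order.
    static boolean processLine(String line) {
        boolean lastWordCSCI = false;
        boolean matched = false;
        Scanner lineStream = new Scanner(line);
        while (lineStream.hasNext()) {
            String word = lineStream.next();
            if (!matched && lastWordCSCI && word.equals("107")) {
                matched = true;             // remember the first match
                System.out.println(line);   // later matches will not print again
            }
            lastWordCSCI = word.equals("CSCI");
        }
        return matched;
    }

    public static void main(String[] args) {
        processLine("CSCI 107 is a prerequisite; retake CSCI 107 if needed");
    }
}
```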

Finite state machines and enumerations

At this point you should have two variables, lastWordCSCI and matched, that track the state of the loop. Let’s replace these by a single variable that uses Java enumerations.

To do this, you must first define the enumeration with a line similar to the following, which must appear outside any of your method declarations. By the way, the values of the enumeration are considered constants, so they are written in all capital letters.

    enum ProcessState { INITIAL, LASTWORDCSCI, MATCHED } ;

Now you must declare and initialize a state variable. The syntax for this is odd and wordy.

        ProcessState loopState = ProcessState.INITIAL ;    

Now you have to think a little. Your loop will move through three states: ProcessState.INITIAL, ProcessState.LASTWORDCSCI and ProcessState.MATCHED. You can use the usual == operator to test the value of loopState and the usual = operator to set the value of loopState.

Try it out.
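If you get stuck, the following sketch shows one way the three states could drive the loop. The class name StateDemo and the method scan are hypothetical, and stopping the loop as soon as the state becomes MATCHED is a design choice made here, not a requirement.

```java
import java.util.Scanner;

class StateDemo {
    enum ProcessState { INITIAL, LASTWORDCSCI, MATCHED }

    // Returns the state the loop ends in after scanning the line.
    static ProcessState scan(String line) {
        ProcessState loopState = ProcessState.INITIAL;
        Scanner lineStream = new Scanner(line);
        while (lineStream.hasNext() && loopState != ProcessState.MATCHED) {
            String word = lineStream.next();
            if (loopState == ProcessState.LASTWORDCSCI && word.equals("107")) {
                loopState = ProcessState.MATCHED;       // "CSCI" then "107"
            } else if (word.equals("CSCI")) {
                loopState = ProcessState.LASTWORDCSCI;  // a match may follow
            } else {
                loopState = ProcessState.INITIAL;       // start over
            }
        }
        return loopState;
    }

    public static void main(String[] args) {
        String line = "I am enrolled in CSCI 107 this term";
        if (scan(line) == ProcessState.MATCHED) {
            System.out.println(line);
        }
    }
}
```

Each branch of the if statement corresponds to one arrow in the state diagram, which is why a single state variable can replace the two booleans.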

Enumerations for tokens

What we are doing is called parsing. Usually parsing is accomplished by representing the possible values of the tokens with an enumerated type. Let’s try this out.

First, define an appropriate enumerated type.

    enum TokenType { CSCI, INTROCOURSE, OTHER } ;

Now write a method called word2token that is passed a token and returns the appropriate TokenType. Use the following code to get started.

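    private static TokenType word2token (String token) {
        if (token.equals("CSCI")) {
            return TokenType.CSCI ;
        }
        // TODO: recognize the course number and return TokenType.INTROCOURSE
        return TokenType.OTHER ;   // fallback so that the method compiles
    }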

Now you need to modify processLine to use these tokens. This will involve adding a call to word2token within the loop of processLine. Something like the following:

            TokenType token = word2token(lineStream.next()) ;

Your program must now test against tokens rather than Strings. For example, an expression like word.equals("CSCI") may need to be changed to token==TokenType.CSCI .

Was this worth the trouble?

Modify your program so that "CSCI" can be in lower or upper case and the course number can be 107, 181, or 182. All you have to do is change word2token.
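A possible version of the revised word2token is sketched below. The wrapper class TokenDemo is made up for illustration, and the sketch assumes that "lower or upper case" means exactly "CSCI" or "csci"; if mixed case such as "Csci" should also count, String.equalsIgnoreCase would be the natural substitute.

```java
class TokenDemo {
    enum TokenType { CSCI, INTROCOURSE, OTHER }

    // Classifies one word; only this method knows the actual spellings.
    static TokenType word2token(String token) {
        if (token.equals("CSCI") || token.equals("csci")) {
            return TokenType.CSCI;
        }
        if (token.equals("107") || token.equals("181") || token.equals("182")) {
            return TokenType.INTROCOURSE;
        }
        return TokenType.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(word2token("csci"));
        System.out.println(word2token("181"));
        System.out.println(word2token("hello"));
    }
}
```

Because processLine compares TokenType values only, none of its logic has to change when the set of accepted spellings or course numbers grows.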

How would the pros do this?

They would probably use a regular expression.

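    // These imports go at the top of the file.
    import java.util.Scanner ;
    import java.util.regex.Pattern ;

    public static void main(String[] args) {
        Scanner inStream = new Scanner(System.in) ;
        // "CSCI" or "csci", whitespace, then 107, 181 or 182, as whole words
        Pattern regex = Pattern.compile("(^|\\s)(CSCI|csci)\\s+(107|181|182)(\\s|$)") ;
        while (inStream.hasNextLine()) {
            String line = inStream.nextLine() ;
            if (regex.matcher(line).find()) {
                System.out.println(line) ;
            }
        }
    }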