CSCI 255 — PIC assembly for Collatz

Getting ready

This week we are going to write and optimize an assembly language program related to the Collatz conjecture, also known as the Syracuse problem: assuming integers of unlimited size, is there any positive integer n that sends the following program into an infinite loop?

  x = n ;
  r = 0 ;
  while (x != 1) {
    if (x%2 ==1) {
      x = 3*x + 1 ;
    } else {
      x = x/2 ;
    }
    r = r + 1 ;
  }

In the process of writing this program in PIC assembly with 16-bit integers (admittedly a pretty small sample of integers), you’ll learn something about flow control and optimization in an assembly language and take another look at the sizes of integers that can be stored in fixed fields.

Getting started

We’re going to start this lab with a C++ program. Why C++? Because Java doesn’t allow the goto, because C and Java don’t support operator overloading, and because it never hurts to experience yet another programming language.

Start up NetBeans and make the menu choices File → New Project... . At the New Project window select C/C++ and C/C++ Application, and then press Finish. At the next screen, call your main file collatz and be sure you are creating a C++ program.

In the Projects tab, bring up the file collatz.cpp under the Source Files expander. Delete the starter C++ code and copy the initial C++ code for this lab into your NetBeans program window.

#include <iostream>
using namespace std;

int main(int argc, char** argv) {
  short n, x, r ;

  cout << "Enter your number" << endl ;
  cin >> n ;
    
  x = n ;
  r = 0 ;
  while (x != 1) {
    if (x%2 == 1) {
      x = 3*x + 1 ;
    } else {
      x = x/2 ;
    }
    r = r + 1 ;
  }

  cout << r << " iterations required for " << n << endl ;

  return 0;
}

For some reason, I had to build this program twice before the red exclamation points disappeared.

This looks quite a lot like a C program except for the using namespace and the odd I/O. We’ll leave the using namespace for a junior-level course. In C++, I/O is often done with >> for input and << for output. This is an example of operator overloading, which allows a programmer to associate class methods with program operators. When using a C++ istream object (similar to the InputStream of Java), >> means read, not right shift.

Run your program with the input number 77. You’ll get 22 for the number of iterations: 77 → 232 → 116 → 58 → 29 → 88 → 44 → 22 → 11 → 34 → 17 → 52 → 26 → 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1. As you can see the Collatz sequence can grow and shrink in unexpected ways. Download a spreadsheet that allows you to compute the sequence and try out some values.

A problem of range

Getting negative

Run your program again but give it the input 447. No answer? Your program is in a loop. You’ll have to terminate it by pressing the little X on the right side of the bottom of the NetBeans window.

On the 52nd iteration with input 447, x reached 13121. The 53rd x should be 39364. But it isn’t.

What is the next value of x in your code at this point? (Hint: 2^15 is 32768.) How does declaring your program’s variables as unsigned shorts fix this problem?

You and some person sitting next to you must agree on an answer to this question to pass this checkpoint.

Also, add the unsigned modifier to your declarations so that your program works when you type in 447. It should print 97.

Getting too big

You probably see where this is going. This time type in 703. Your program prints 75, but the real answer is 170. Change your variable declarations from unsigned short to unsigned int to see what we mean.

This time the problem occurs when x is 22841. Again, what is the next value of x in your code at this point? (Hint: 2^16 is 65536 and 3*22841+1 is 68524.) How does declaring x as an unsigned int fix this problem?

You and some person sitting next to you must agree on an answer to pass this checkpoint.

Is there hope?

Since the second week of class you have known that twos-complement numbers have a limited range. There will always be a number that requires more bits than the variable can hold. With x declared as an unsigned int, overflow will occur at 8791350, which is only 0.002 of the way to 4294967296 or 2^32. This is one of the challenges of proving the Collatz conjecture, which has a £1000 reward for a formal proof. Some numbers just grow a lot.

If you want to test the Collatz conjecture with some large numbers, you can use the GNU Multiple Precision Arithmetic Library which implements big numbers limited only by the size of your computer memory. You can have millions of bits!

To do this, you need to do the following:

After you have done this, you can run your program on huge numbers such as one septendecillion. (The number of iterations is 981.) Even for a really large number such as 10^1848, the number of iterations is only 33059.

A note on big numbers

No one found bignum packages particularly interesting until modular exponentiation of big integers became the basis of cryptography. Now every serious programming language has a big integer package. The big integer package of Java is BigInteger. The usefulness of operator overloading can be seen when using big integers. In C++, you say 3*x + 1 in a big number calculation. In Java, you would define a variable like THREE:
    private static final BigInteger THREE = new BigInteger(new byte[]{3}) ;
so that you can say THREE.multiply(x).add(BigInteger.ONE) .

But then in C++ the programmer who wrote the C++ GMP library had to write a function named operator+.

A checkoff?

Not everyone in the lab needs to use big integers, but at least watch someone try.

In any case, go back to unsigned short declarations in your program before continuing with this lab. You can also just reload the original program if you are unsure how to return to the starting point.

Optimizations in C++

Now we are going to make a series of improvements to your C++. Many of these will prepare the program for translation to PIC assembly. After each improvement run your program with 447 as input. Be sure you get 97 back.

Reducing a test and a loop

If x is odd, we know that 3*x + 1 will be even. The most common optimization to the Collatz program is avoiding the unneeded test after the odd case. So go ahead and compute (3*x + 1)/2 and avoid the test. You will need to change your code to increment r by 2 in the odd case and by 1 in the even case.

Make the odd optimization and test your program with 447.

Reduction of strength

One significant way to improve the performance of a program is with strength reduction, the replacement of a complex operation, such as multiplication, with a simpler operation, such as addition. For example, there is a modulus operation, x%2, in our program. Because the second operand is a power of two, this % can be replaced with a simple bit-wise and operation, (x&1), because we are only interested in the last bit of x.

Note the parentheses! Due to C/C++/Java’s rules of operator precedence, x&1 == 1 is x&(1==1) is x&1, which as a test is (x&1) != 0 is x%2 != 0. It doesn’t cause a problem in this case but get in the habit of including extra parentheses when bit-wise operators are mixed with relational operators. Often NetBeans will remind you to do so.

Similarly, division by powers of two can be replaced by much faster right shifts. Thus x/2 can be replaced by (x>>1).

This means that (3*x + 1)/2 can be replaced by (3*x + 1)>>1. However, because x is odd, the expression can be replaced by x + (x>>1) + 1. You may need to think about this one for a while. (3*x + 1)/2 is (2*x + x + 1)/2 is (2*x)/2 + (x + 1)/2 is x + (x + 1)/2. But (x + 1)/2 is x/2 + 1 when x is odd, so (3*x + 1)/2 is x + (x>>1) + 1. Again, C/C++/Java precedence rules make the parentheses required.

One additional advantage of this change is that it reduces the possibility of overflow. For example, in 16-bit unsigned arithmetic, 3*x + 1 wraps around to 2988 when x is 22841, so halving it gives the wrong answer, but x + (x>>1) + 1 is the correct 34262.

Make the strength reduction optimization and test your program using 447.

Take care that you add all the needed parentheses.

Common subexpression elimination

Common subexpression elimination is an optimization technique that reduces the size of a program and prevents the calculation of a recurring expression more than once. It is similar to refactoring. Find a common expression, such as x>>1, assign it to a temporary variable, such as t, and then replace x>>1 with t in subsequent sections of your program. Be sure to declare t as an unsigned int at the beginning of your program.

Apply and test common expression elimination.

You should also show your program to a lab instructor before going on. We all need similar programs at this point.

Return of the goto

Assembly languages don’t support the while or even the block if. We must now return to the pre-FORTRAN 77 era.

Start by commenting out all the statements with an if, else or while. Also comment out the two lines containing matching right curly braces for these constructs. You should also line up all of your C++ statements with the initial assignment statements. Get rid of indentation. Pretend it’s 1973.

Build and run your program with these changes made. It should always print 3 as the number of required iterations.

Replacing the selection block

We are going to replace the if-else with code using the goto, a control construct forbidden in Java and strongly discouraged in C++.

Right after the old if, add the following conditional goto:
    if ((x&1)==0) goto elsepnt ;
Expect NetBeans to complain about an unresolved identifier. Note that you must negate the test from (x&1)==1 to (x&1)==0.

Right after the old else, add the following label:
  elsepnt:
No matter how hard NetBeans objects, align this label at the start of the line.

Now NetBeans should stop complaining about an unresolved identifier. You have added code to skip the if, or odd, case when x is even.

Now we need to skip the else, or even, case when x is odd. Right before the old else statement, add the following unconditional goto:
    goto joinpnt ;
Expect complaints from NetBeans.

After the right brace closing the old else, add the following label:
  joinpnt:

This process of replacing an if-else with a conditional goto, an unconditional goto and a couple of labels is a pattern that can be applied to all selection statements. You just need to start with the innermost if blocks and work your way out. This is what compilers do for a living.

Make sure you can still build your program.

Replacing the iteration block

This time our program needs to leave the while block when x is 1. Right after the while statement, add the following conditional goto:
    if (x==1) goto exitpnt ;
And right after the closing right brace for the while add the following label:
  exitpnt:

Right before the while statement, add the following label:
  looppnt:
And right before the closing right brace for the while add the following unconditional goto:
    goto looppnt ;
This goto will be placed between the joinpnt and exitpnt labels.

Again, this is a general pattern for replacing a while with a conditional goto, unconditional goto and a couple of labels.

Yes, it is ugly and hard to follow, but it is very close to what your compiled code looks like. Test your program with 447 to see that it works.

Optimizing at the assembly control level

You may notice that your
    goto joinpnt ;
transfers control to another goto, this time to looppnt. Your program could go directly to looppnt and avoid executing the second goto.

Try this one out. You can even eliminate the joinpnt label.

There is one other loop optimization that is frequently done. Notice that the end of the loop is a goto to a test at the beginning of the loop. To make the loop one instruction shorter, some compilers will place the test at the end of the loop. This means replacing the unconditional branch at the end of the loop
    goto looppnt ;
  exitpnt:
with code similar to
  looppnt:
    if (x!=1) goto entrypnt ;
Note that the program goes to entrypnt when x is not 1.

The code at the beginning of the loop
  looppnt:
    if (x==1) goto exitpnt ;
is replaced with
    goto looppnt ;
  entrypnt:

This change does not reduce the size of a program, but it does reduce the number of instructions executed within the loop. In a really tight loop, such as one to add up the elements of an array, this can reduce the number of instructions within a loop from three to two. A 33% improvement is pretty good.

You can try this one out if you wish. We did. If you do make this change, be sure to test your changes.

In any case, you might remove all those if, exit, while and right curly brace comments to show the pure goto nature of your program.

There was a time when students in beginning programming courses wrote code like this.

Collatz in PIC

Now we are going to complete the transformation of your C++ program into PIC assembler code.

The basic idea

Given a selection statement looking like the following:

if (expression) {
  statements1
} else {
  statements2
}

Transform it to code using the goto to something like this:

  if (! expression) goto λ1 ;
  statements1
  goto λ2 ;
λ1 :
  statements2
λ2 :

Starting with an iterative statement such as:

while (expression) {
  statements
}

Generate a goto directed code similar to this:

λ1:
  if (! expression) goto λ2;
  statements
  goto λ1 ;
λ2:

However, many compilers prefer the following which makes the loop a tiny bit smaller:

  goto λ2;
λ1:
  statements
λ2:
  if (expression) goto λ1 ;

Starting a new PIC assembly project

Create a PIC assembler language project using MPLAB X following the same procedure used in the Introducing MPLAB X & PIC assembly lab. Add a source program called collatz.s and place the following initial PIC assembler program into your source file.

          .include  "p24Hxxxx.inc"
          .global   __reset
          .equiv    initN,447
          .bss
n:        .space    2
x:        .space    2
r:        .space    2
t:        .space    2
ugh:      .space    2
          .text
__reset:
          mov       #initN,W0
          mov       W0,n           ;; initialize n

;;; Your code starts here


;;; Your code ends here

bigloop:  bra       bigloop
          .end

Now, go back to NetBeans and copy the C++ statements between
    cin >> n ;
and
    cout << r << " iterations required for " << n << endl ;
into your assembler program running in MPLAB X.

Because xc16-gcc is both a PIC assembler and a C compiler you will get far fewer error messages than you expect. However, comment out everything except the four labels in the C++ you just inserted.

Build your program. It must assemble before you can go on.

The simple transfers

Although the PIC has a goto instruction, we want to use the bra instruction for consistency. Replace each C++ unconditional goto with a bra instruction. There should be two of these.

Build your program. Two down and nine to go.

Low hanging arithmetic fruit

As you know, all binary arithmetic operators involve a working register. The simplest of these, such as “add f,WREG”, use the WREG register.

However, adding one or two to a file register r doesn’t require the use of a working register. Replace the two increments of r with single PIC instructions. Also, initialize r with a single instruction.

Build your program. Five down and six to go.

Register allocation

Note that x and t are local variables in your program. They appear in neither input nor output statements.

Rather than storing them in memory, let’s just keep their values in registers W2 and W3. Start by commenting out the .space definitions for x and t. You can still build your program since neither x nor t will be referenced. Just add some comments explaining that W2 is playing the role of x and W3 is playing the role of t.

Now, we’re going to implement the three assignments to x in PIC assembly. Go ahead and replace “x = n ;” with a single PIC instruction. Remember, the value of n is stored in W0 at the beginning of the program.

Another easy one to replace is “x = t ;”.

You can’t do “x = x + t + 1 ;” in one PIC instruction, but two add instructions will do fine.

Build it. Eight down and three to go.

An experienced assembly language programmer would have used a .equiv for x and W2 and for t and W3. That would have allowed an instruction like “mov t,x” to be used in place of “mov W3,W2”. However, that’s a bit too much on your second day of PIC assembly programming.

Being shifty

Because x is an unsigned number we must use the lsr instruction to shift x one place before storing it into t. This is a shift of only one place using two registers. Use the simplest lsr that can be used for this purpose.

Build it. Nine down and two to go.

The conditional transfers

We are left with two conditional goto instructions. In one of these cases we are comparing x with one, but we can only do that by comparing x-1 with zero. At the machine level, comparisons are made by subtracting two numbers and comparing the result with zero.

In each case a simple operation is evaluated. In one instance, it’s x&1 and in the other, x-1. In both cases you should store the result of the operation into W7, a throwaway register for our program. Go ahead and write the PIC code to do these two operations, one per conditional goto, using an immediate operand. The two instructions will look very similar.

The Z bit will be set or cleared by each of these two instructions. Your program should use “bra z” to branch if the Z bit is set and “bra nz” to branch if the Z bit is not set.

Build it, but don’t run it yet.

This is the hardest point in the lab. Look at your four new PIC instructions. Make sure they match what you are trying to do. Look for similarities and differences in the two pairs of instructions.

The test

Put a breakpoint at the instruction following the bigloop label.

Start the debugger on your program. If all works, the value of r should be 0x0061 or 97.

If it doesn’t work, you need to do some real debugging.

Is this the fastest?

After all these changes, my loop has 11 instructions. However, when you right shift x (W2) one place with lsr and store the result in t, the Z bit is set if x is 1 (or 0) and the C bit is set if x is odd.

You can use this hack to eliminate two instructions: the sub used to test if x is 1 and the and used to test if x is odd. This brings the loop down to 9 instructions. You do have to move the lsr to the location where sub is now to make this work.

You can do even better than this. Because the C bit is known to be on in the odd case, you can replace the remaining add with an addc, add with carry, and then remove the inc for register W2. Now it’s down to eight.

Try this one if you wish.

If you are interested in correctness

We really ought to have a test for overflow right after computing x + t + 1.