Getting ready
This week we are going to write and optimize an assembly language
program related to the
Collatz conjecture
or Syracuse problem:
Assuming integers of unlimited size,
is there any positive integer n
that results in the following program getting in an infinite loop?
x = n ;
r = 0 ;
while (x != 1) {
    if (x%2 == 1) {
        x = 3*x + 1 ;
    } else {
        x = x/2 ;
    }
    r = r + 1 ;
}
In the process of writing this program in PIC assembly with 16-bit integers (admittedly a pretty small sample of integers), you’ll learn something about flow control and optimization in an assembly language and take another look at the sizes of integers that can be stored in fixed fields.
Getting started
We’re going to start this lab with a C++ program.
Why C++? Because Java doesn’t allow the goto
and
C and Java don’t support operator overloading
and it never hurts to experience yet another programming language.
Start up NetBeans and make the menu choices File ⇒ New Project... . At the New Project window select C/C++ and C/C++ Application, and then press Finish. At the next screen, call your main file collatz and be sure you are creating a C++ program.
In the Projects tab, bring up the file collatz.cpp under the Source Files expander. Delete the starter C++ code and copy the initial C++ code for this lab into your NetBeans program window.
#include <iostream>
using namespace std;

int main(int argc, char** argv) {
    short n, x, r ;
    cout << "Enter your number" << endl ;
    cin >> n ;
    x = n ;
    r = 0 ;
    while (x != 1) {
        if (x%2 == 1) {
            x = 3*x + 1 ;
        } else {
            x = x/2 ;
        }
        r = r + 1 ;
    }
    cout << r << " iterations required for " << n << endl ;
    return 0;
}
For some reason, I had to build this program twice before the red exclamation points disappeared.
This looks quite a lot like a C program except for the
using namespace
and the odd I/O.
We’ll leave the using namespace
for a junior-level
course. In C++ I/O is often done with
>>
for input and
<<
for output. This is an example
of operator overloading which allows a programmer to associate
class methods with program operators.
When using a C++ istream
object
(similar to the InputStream
of Java),
>>
means read, not right shift.
Run your program with the input number 77. You’ll get 22 for the number of iterations: 77 → 232 → 116 → 58 → 29 → 88 → 44 → 22 → 11 → 34 → 17 → 52 → 26 → 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1. As you can see the Collatz sequence can grow and shrink in unexpected ways. Download a spreadsheet that allows you to compute the sequence and try out some values.
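If you would rather not use the spreadsheet, the sequence can also be generated in C++. The following is a minimal sketch, not part of the lab's starter code: the function name collatz_sequence is our own invention, and it uses a 64-bit integer so that the range problems discussed later in the lab do not get in the way.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the lab code): returns every value
// visited by the Collatz iteration, starting with n and ending with 1.
// Assumes n >= 1 and that no intermediate value needs more than 64 bits.
std::vector<std::uint64_t> collatz_sequence(std::uint64_t n) {
    std::vector<std::uint64_t> seq;
    std::uint64_t x = n;
    seq.push_back(x);
    while (x != 1) {
        if (x % 2 == 1) {
            x = 3 * x + 1;   // odd case
        } else {
            x = x / 2;       // even case
        }
        seq.push_back(x);
    }
    return seq;
}
```

The number of iterations is one less than the length of the returned vector. For 77 the vector holds the 23 values listed above, which gives 22 iterations.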
A problem of range
Getting negative
Run your program again but give it the input 447. No answer? Your program is in a loop. You’ll have to terminate it by pressing the little X on the right side of the bottom of the NetBeans window.
On the 52nd iteration with input 447, x
reached 13121.
The 53rd x
should be 39364. But it isn’t.
What is the next value of x
in your code at this point?
(Hint: 2^15 is 32768.)
How does declaring your program’s variables as unsigned short
s
fix this problem?
You and some person sitting next to you must agree on an answer to this question to pass this checkpoint.
Also, add the unsigned
modifier to your declarations
so that your program works when you type in 447.
It should print 97.
Getting too big
You probably see where this is going.
This time type in 703.
Your program prints 75, but the real answer is 170.
Change your variable declarations from unsigned short
to unsigned int
to see what we mean.
This time the problem occurs when x
is 22841.
Again, what is the next value of x
in your code at this point?
(Hint: 2^16 is 65536 and 3*22841+1 is 68524.)
How does declaring x
as an unsigned int
fix this problem?
You and some person sitting next to you must agree on an answer to pass this checkpoint.
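To find where a given input first runs out of room, you can run the iteration in a wider type and compare each new value against the limit of the narrower one. This is only a sketch under our own assumptions: the name first_overflow and its parameters are hypothetical, and it assumes the 64-bit arithmetic itself never overflows.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the lab code).
// Returns the value of x for which 3*x + 1 would first exceed `limit`,
// or 0 if the whole sequence for n stays within `limit`.
std::uint64_t first_overflow(std::uint64_t n, std::uint64_t limit) {
    std::uint64_t x = n;
    while (x != 1) {
        if (x % 2 == 1) {
            if (3 * x + 1 > limit) {
                return x;        // the next value would not fit
            }
            x = 3 * x + 1;
        } else {
            x = x / 2;           // halving can never overflow
        }
    }
    return 0;
}
```

Calling first_overflow(447, 32767) reports 13121 and first_overflow(703, 65535) reports 22841, the two trouble spots named above. What the narrow variable actually holds after the overflow is still for you and your neighbor to work out.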
Is there hope?
Since the second week of class
you have known that twos-complement numbers have a limited range.
There will always be a number that requires more bits than the variable can hold.
With x
declared as an
unsigned int
, overflow will occur at 8791350 which is only
0.002 of the way to 4294967296 or 2^32.
This is one of the challenges of proving the Collatz conjecture,
which has a £1000 reward for a formal proof.
Some numbers just grow a lot.
If you want to test the Collatz conjecture with some large numbers, you can use the GNU Multiple Precision Arithmetic Library which implements big numbers limited only by the size of your computer memory. You can have millions of bits!
To do this, you need to do the following:
- Make sure the GMP library is installed on your computer. (Linux users can install the libgmp-dev package.)
- Include the GMP C++ .h file at the beginning of your program:
#include <gmpxx.h>
- Change the declaration of your variables to be of type mpz_class.
- Add the library gmpxx to your linker properties. This means adding -lgmpxx to Libraries on the Linker properties of your project. (If the linker then complains about undefined GMP symbols, also add -lgmp; the GMP manual links with both.) You may need a demonstration of this one.
After you have done this, you can run your program on huge numbers such as one septendecillion. (The number of iterations is 981.) Even for a really large number such as 10^1848, the number of iterations is only 33059.
A note on big numbers
No one found bignum packages particularly interesting until modular exponentiation of
big integers became the basis of cryptography. Now every serious programming language
has a big integer package.
The big integer package of Java is
BigInteger
. The usefulness of operator overloading can be seen when using big integers.
In C++, you say 3*x + 1
in a big number calculation.
In Java, you would define a variable like THREE
:
private static final BigInteger THREE = new BigInteger(new byte[]{3}) ;
so that you can say
THREE.multiply(x).add(BigInteger.ONE)
.
But then in C++ the programmer who wrote the C++ GMP library had to
write a function named operator+
.
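To make the mechanism concrete, here is a toy illustration of operator overloading. It is not how GMP is written: the type Num and the function odd_step are hypothetical, and Num wraps a single 64-bit value where a real big-number class would store many words.

```cpp
// Hypothetical toy type (not GMP): wraps one 64-bit value so that the
// overloading mechanism is visible without any real bignum arithmetic.
struct Num {
    unsigned long long v;
};

// Each overloaded operator is an ordinary function with a special name.
Num operator+(Num a, Num b) { return Num{a.v + b.v}; }
Num operator*(Num a, Num b) { return Num{a.v * b.v}; }

// With the operators defined, the odd Collatz step reads like integer code.
Num odd_step(Num x) { return Num{3} * x + Num{1}; }
```

The library author pays the cost once, in the operator functions, and every user of the type gets to write 3*x + 1 style expressions.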
A checkoff?
Not everyone in the lab needs to use big integers, but at least watch someone try.
In any case, go back to unsigned short
declarations in your
program before continuing with this lab.
You can also just reload the original program if you
are unsure how to return to the starting point.
Optimizations in C++
Now we are going to make a series of improvements to your C++. Many of these will prepare the program for translation to PIC assembly. After each improvement run your program with 447 as input. Be sure you get 97 back.
Reducing a test and a loop
If x
is odd,
we know that 3*x + 1
will be even.
The most common optimization to the Collatz program
is avoiding the unneeded test after the odd case.
So go ahead and compute (3*x + 1)/2
and avoid the test.
You will need to change your code to
increment r
by 2 in the odd case
and by 1 in the even case.
Make the odd optimization and test your program with 447.
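One way the loop might look after this change is sketched below. It is wrapped in a function only so that it can be tested on its own; the name collatz_steps is hypothetical, and your version will keep the cin and cout statements in main.

```cpp
// Sketch of the odd-case shortcut (function name is our own).
// Assumes n >= 1 and that every value fits in an unsigned short.
unsigned short collatz_steps(unsigned short n) {
    unsigned short x = n;
    unsigned short r = 0;
    while (x != 1) {
        if (x % 2 == 1) {
            x = (3 * x + 1) / 2;   // odd step plus the guaranteed even step
            r = r + 2;             // two iterations were performed
        } else {
            x = x / 2;
            r = r + 1;
        }
    }
    return r;
}
```

The combined step can never jump over the exit test, because (3*x + 1)/2 is larger than 1 for every positive x.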
Reduction of strength
One significant way to improve the performance of a program is with
strength reduction, the replacement of a complex operation, such as
multiplication, with a simpler operation, such as addition.
For example, there is a modulus operation, x%2
, in our
program. Because the second operand is a power of two,
this %
can be replaced with a simple bit-wise and
operation, (x&1)
, because we are only interested in the
last bit of x
.
Note the parentheses!
Due to C/C++/Java’s rules of operator precedence,
x&1 == 1
is x&(1==1)
is x&1
which, used as a condition, is (x&1) != 0
is x%2 != 0
.
It doesn’t cause a problem in this case but get in the habit of
including extra parentheses when bit-wise operators are mixed with relational
operators. Often NetBeans will remind
you to do so.
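The habit matters as soon as the test is negated. The sketch below spells out, with explicit parentheses, how the compiler groups x&1 == 0; the two function names are hypothetical and exist only for the comparison.

```cpp
// How the compiler reads `x & 1 == 0`: == binds tighter than &,
// so the expression is x & (1 == 0), which is x & 0, which is always 0.
bool is_even_as_parsed(unsigned x) {
    return (x & (1 == 0)) != 0;   // always false
}

// What the programmer meant: test the last bit, then compare.
bool is_even(unsigned x) {
    return (x & 1) == 0;
}
```

For x equal to 4 the first function returns false and the second returns true, so leaving out the parentheses would silently break the even test.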
Similarly, division by powers of two can be replaced by
much faster right shifts.
Thus x/2
can be replaced by
(x>>1)
.
This means that
(3*x + 1)/2
can be replaced by (3*x + 1)>>1
.
However, because x
is odd,
the expression can be replaced
by x + (x>>1) + 1
.
You may need to think about this one for a while.
(3*x + 1)/2
is
(2*x + x + 1)/2
is
(2*x)/2 + (x + 1)/2
is
x + (x + 1)/2
.
But (x + 1)/2
is
x/2 + 1
when x
is odd,
so (3*x + 1)/2
is
x + (x>>1) + 1
.
Again, C/C++/Java precedence rules make the parentheses required.
One additional advantage of this change is that it reduces the possibility of
overflow. For example, in 16-bit unsigned arithmetic,
3*x + 1
wraps around to 2988
when x
is
22841, but x + (x>>1) + 1
is 34262, the correct value of (3*x + 1)/2.
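If the algebra is not convincing, the identity can be checked by brute force over every odd 16-bit value, and the 22841 example can be reproduced by truncating to 16 bits. This is a sketch with hypothetical function names; the 32-bit type in the first function is there only so that the check itself cannot overflow.

```cpp
#include <cstdint>

// Checks (3*x + 1)/2 == x + (x>>1) + 1 for every odd 16-bit x,
// using 32-bit arithmetic so that nothing wraps around.
bool identity_holds() {
    for (std::uint32_t x = 1; x <= 65535; x += 2) {
        if ((3 * x + 1) / 2 != x + (x >> 1) + 1) {
            return false;
        }
    }
    return true;
}

// 3*x + 1 as a 16-bit machine would store it.
std::uint16_t wrapped_3x_plus_1(std::uint16_t x) {
    return static_cast<std::uint16_t>(3u * x + 1u);
}

// x + (x>>1) + 1 as a 16-bit machine would store it.
std::uint16_t reduced_odd_step(std::uint16_t x) {
    return static_cast<std::uint16_t>(x + (x >> 1) + 1);
}
```

For x equal to 22841 the first truncation gives 2988 while the reduced form gives 34262.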
Make the strength reduction optimization and test your program using 447.
Take care that you add all the needed parentheses.
Common subexpression elimination
Common subexpression elimination is an optimization technique
that reduces the size of a program and prevents the calculation of
a recurring expression more than once.
It is similar to refactoring. Find a common expression, such as
x>>1
, assign it to a temporary variable, such as
t
, and then replace x>>1
with
t
in subsequent sections of your program.
Be sure to declare t
as an unsigned int
at the beginning of your program.
Apply and test common subexpression elimination.
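After strength reduction and common subexpression elimination, the loop might look roughly like the sketch below. The function name collatz_cse is hypothetical, and computing t at the top of the loop body is our assumption about where the shared shift ends up; compare it with what you wrote rather than copying it.

```cpp
// Sketch of the loop after strength reduction and common
// subexpression elimination (function name is our own).
unsigned short collatz_cse(unsigned short n) {
    unsigned short x = n;
    unsigned short r = 0;
    unsigned int t;                // holds the shared value x>>1
    while (x != 1) {
        t = x >> 1;                // computed once, used by both cases
        if ((x & 1) == 1) {
            x = x + t + 1;         // same as (3*x + 1)/2 for odd x
            r = r + 2;
        } else {
            x = t;                 // same as x/2
            r = r + 1;
        }
    }
    return r;
}
```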
You should also show your program to a lab instructor before going on. We all need similar programs at this point.
Return of the goto
Assembly languages don’t support the while
or even the
block if
. We must now return to the pre-FORTRAN 77 era.
Start by commenting out all the statements with an if
, else
or while
.
Also comment out the two lines containing matching right curly braces for these constructs.
You should also line up all of your C++ statements with the
initial assignment statements. Get rid of indentation.
Pretend it’s 1973.
Build and run your program with these changes made. It should always print 3 as the number of required iterations.
Replacing the selection block
We are going to replace the if
–else
with
code using
the goto
,
a control construct, forbidden in Java and strongly discouraged in C++.
Right after the old if
, add the following conditional goto:
if ((x&1)==0) goto elsepnt ;
Expect NetBeans to complain about an unresolved
identifier.
Note that you must negate the test from (x&1)==1
to (x&1)==0
.
Right after the old else
, add the following label:
elsepnt:
No matter how hard NetBeans objects, align
this label at the start of the line.
Now NetBeans should stop complaining
about an unresolved identifier. You have added code to skip the if
,
or odd, case when x
is even.
Now we need to skip the else
,
or even, case when x
is odd.
Right before the old else
statement, add the
following unconditional goto:
goto joinpnt ;
Expect complaints from NetBeans.
After the right brace closing
the old else
, add the following label:
joinpnt:
This process of replacing an if
then
with
a conditional goto, unconditional goto and a couple of labels
is a pattern that can be applied to all selection statements.
You just need to start with the innermost if
blocks
and work your way out.
This is what compilers do for a living.
Make sure you can still build your program.
Replacing the iteration block
This time our program needs to leave the while
block when
x
is 1.
Right after the while
statement, add the following conditional
goto:
if (x==1) goto exitpnt ;
And right after the closing right brace for the while
add the following label:
exitpnt:
Right before the while
statement, add the following label:
looppnt:
And right before the closing right brace for the while
add the following
unconditional goto:
goto looppnt ;
This goto
will be placed between the joinpnt
and exitpnt
labels.
Again, this is a general pattern for
replacing a while
with
a conditional goto, unconditional goto and a couple of labels.
Yes, it is ugly and hard to follow, but it is very close to what your compiled code looks like. Test your program with 447 to see that it works.
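For reference, here is a sketch of what the whole computation might look like once both blocks have been replaced. It uses the four labels from this section, but it is wrapped in a hypothetical function named collatz_goto so that it can be tested, and the placement of the t assignment is our assumption. All variables are declared before the first label because C++ does not allow a goto to jump past an initialization.

```cpp
// Sketch of the goto-only version (function name is our own).
unsigned short collatz_goto(unsigned short n) {
    unsigned short x, r, t;
    x = n;
    r = 0;
looppnt:
    if (x == 1) goto exitpnt;          // negated while test
    t = x >> 1;
    if ((x & 1) == 0) goto elsepnt;    // negated if test
    x = x + t + 1;                     // odd case
    r = r + 2;
    goto joinpnt;                      // skip the even case
elsepnt:
    x = t;                             // even case
    r = r + 1;
joinpnt:
    goto looppnt;                      // back to the loop test
exitpnt:
    return r;
}
```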
Optimizing at the assembly control level
You may notice that your
goto joinpnt ;
transfers control to another goto
, this time to looppnt
.
Your program could go directly to looppnt
and avoid executing
the second goto.
Try this one out. You can even eliminate the joinpnt
label.
There is one other loop optimization that is frequently done. Notice that
the end of the loop is a goto
to a test at the beginning of the loop.
To make the loop one instruction shorter, some compilers will place the
test at the end of the loop. This means replacing the unconditional branch
at the end of the loop
goto looppnt ;
exitpnt:
with code similar to
looppnt:
if (x!=1) goto entrypnt ;
Note that the program goes to entrypnt
when x
is not 1.
The code at the beginning of the loop
looppnt:
if (x==1) goto exitpnt ;
is replaced with
goto looppnt ;
entrypnt:
This change does not reduce the size of a program, but it does reduce the number of instructions executed within the loop. In a really tight loop, such as one to add up the elements of an array, this can reduce the number of instructions within a loop from three to two. A 33% improvement is pretty good.
You can try this one out if you wish. We did. If you do make this change, be sure to test your changes.
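A sketch of the program after both control-level optimizations follows. The function name collatz_rotated is hypothetical; the labels are the ones used in this section, with joinpnt and exitpnt no longer needed.

```cpp
// Sketch with the loop test moved to the bottom (function name is our own).
unsigned short collatz_rotated(unsigned short n) {
    unsigned short x, r, t;
    x = n;
    r = 0;
    goto looppnt;                      // executed once, to reach the test
entrypnt:
    t = x >> 1;
    if ((x & 1) == 0) goto elsepnt;
    x = x + t + 1;                     // odd case
    r = r + 2;
    goto looppnt;                      // straight to the test, no joinpnt
elsepnt:
    x = t;                             // even case falls into the test
    r = r + 1;
looppnt:
    if (x != 1) goto entrypnt;         // the only test in the loop
    return r;
}
```

The even case now reaches the test without executing any goto at all, and the odd case executes one instead of two.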
In any case, you might remove all those if
, else
,
while
and right curly brace comments to show the pure
goto
nature of your program.
There was a time when students in beginning programming courses wrote code like this.
Collatz in PIC
Now we are going to complete the transformation of your C++ program into PIC assembler code.
The basic idea
Given a selection statement looking like the following:
if (expression) { statements1 } else { statements2 }
Transform it to code using the goto
to something like this:
    if (! expression) goto λ1 ;
    statements1
    goto λ2 ;
λ1:
    statements2
λ2:
Starting with an iterative statement such as:
while (expression) { statements }
Generate a goto
directed code similar to this:
λ1:
    if (! expression) goto λ2 ;
    statements
    goto λ1 ;
λ2:
However, many compilers prefer the following which makes the loop a tiny bit smaller:
    goto λ2 ;
λ1:
    statements
λ2:
    if (expression) goto λ1 ;
Starting a new PIC assembly project
Create a PIC assembler language project using MPLAB X following the same procedure used in the Introducing MPLAB X & PIC assembly lab. Add a source program called collatz.s and place the following initial PIC assembler program into your source file.
        .include "p24Hxxxx.inc"
        .global __reset
        .equiv initN,447
        .bss
n:      .space 2
x:      .space 2
r:      .space 2
t:      .space 2
ugh:    .space 2
        .text
__reset:
        mov #initN,W0
        mov W0,n        ;; initialize n
;;; Your code starts here
;;; Your code ends here
bigloop:
        bra bigloop
        .end
Now, go back to NetBeans
and copy the C++ statements between
cin >> n ;
and
cout << r << " iterations required for " << n << endl ;
into your assembler program running in
MPLAB X.
Because xc16-gcc is both a PIC assembler and a C compiler you will get far fewer error messages than you expect. However, comment out everything except the four labels in the C++ you just inserted.
Build your program. It must assemble before you can go on.
The simple transfers
Although the PIC has a goto
instruction, we want to use
the bra
instruction for consistency.
Replace each C++ unconditional goto
with a bra
instruction.
There should be two of these.
Build your program. Two down and nine to go.
Low hanging arithmetic fruit
As you know all binary arithmetic operators involve a
working register. The simplest of these,
such as “add f,WREG
”,
use the WREG
register.
However, adding one or two to a file register
r
doesn’t require the use of a working register.
Replace the two increments of r
with single PIC instructions.
Also, initialize r
with a single instruction.
Build your program. Five down and six to go.
Register allocation
Note that x
and t
are local variables in your
program. They appear in neither input nor output statements.
Rather than storing them in memory, let’s just keep their values in
registers W2
and W3
.
Start by commenting out the .space
definitions for
x
and t
.
You can still build your program since neither
x
nor t
is referenced yet.
Just add some comments explaining that
W2
is playing the role of x
and
W3
is playing the role of t
.
Now, we’re going to implement the three assignments
to x
in PIC assembly.
Go ahead and replace “x = n ;
” with
a single PIC instruction. Remember, the value of n
is stored in
W0
at the beginning of the program.
Another easy one to replace is “x = t ;
”.
You can’t do “x = x + t + 1 ;
”
in one PIC instruction, but two add
instructions will do fine.
Build it. Eight down and three to go.
An experienced assembly language programmer would have used a
.equiv
for x
and W2
and for t
and W3
.
That would have allowed an instruction like
“mov t,x
” to be used in place of
“mov W3,W2
”.
However, that’s a bit too much on your second day of PIC assembly programming.
Being shifty
Because x
is an unsigned
number we must use
the lsr
instruction to shift x
one place before storing it into
t
.
This is a shift of only one place using two registers.
Use the simplest lsr
that can be used for this purpose.
Build it. Nine down and two to go.
The conditional transfers
We are left with two conditional goto
instructions.
In one of these cases we are comparing x
with one, but
we can only do that by comparing x-1
with zero.
At the machine level, comparisons are made by subtracting two numbers
and comparing the result with zero.
In each case a simple operation is evaluated.
In one instance, it’s x&1
and in the other,
x-1
.
In both cases you should store the result of the operation into
W7
, a throwaway register for our program.
Go ahead and write the PIC code to do these two operations,
one per conditional goto
,
using an immediate operand.
The two instructions will look very similar.
The Z
bit will be set or cleared by each of these two instructions.
Your program should use
“bra z
” to branch if the Z
is set
and
“bra nz
” to branch if the Z
is not set.
Build it, but don’t run it yet.
This is the hardest point in the lab. Look at your four new PIC instructions. Make sure they match what you are trying to do. Look for similarities and differences in the two pairs of instructions.
The test
Put a breakpoint at the instruction following the bigloop
label.
Start the debugger on your program. If all works, the value of r
should be 0x0061
or 97
.
If it doesn’t work, you need to do some real debugging.
Is this the fastest?
After all these changes, my loop has 11 instructions.
However, when you right shift x
(W2
) one place and store
the result in t
,
the Z
bit is set if x
is 1 (or 0) and the C
bit is set if x
is odd.
You can use this hack
to eliminate two instructions:
the sub
used to
test if x
is 1 and the
and
used to test if x
is odd.
This brings the loop down to 9 instructions.
You do have to move the lsr
to the location
where sub
is now to make this work.
You can do even better than this.
Because the C
bit is known to be
on in the odd case, you can replace the remaining add
with an
addc
, add with carry, and then remove the inc
for
register W2
.
Now it’s down to eight.
Try this one if you wish.
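Before rearranging your assembly code, you can convince yourself that the flag trick is sound with a small C++ model. Everything here is an assumption made for illustration: ShiftResult, lsr1 and collatz_with_flags are hypothetical names, and the model simply takes the behavior described above at face value (after a one-place logical right shift, Z means the result is zero and C holds the bit shifted out). Check the PIC instruction reference before relying on it.

```cpp
#include <cstdint>

// Model of a one-place logical right shift and the two flags it affects.
struct ShiftResult {
    std::uint16_t value;  // x >> 1
    bool z;               // assumed: set when the result is zero
    bool c;               // assumed: the bit shifted out of position 0
};

ShiftResult lsr1(std::uint16_t x) {
    ShiftResult s;
    s.value = static_cast<std::uint16_t>(x >> 1);
    s.z = (s.value == 0);
    s.c = (x & 1) != 0;
    return s;
}

// Collatz loop driven only by the flags of the shift.
std::uint16_t collatz_with_flags(std::uint16_t n) {
    std::uint16_t x = n;
    std::uint16_t r = 0;
    for (;;) {
        ShiftResult s = lsr1(x);               // t = x >> 1
        if (s.z) break;                        // x was 1 (or 0): finished
        if (s.c) {
            // models addc: x + t + carry, where carry is known to be 1
            x = static_cast<std::uint16_t>(x + s.value + (s.c ? 1 : 0));
            r = static_cast<std::uint16_t>(r + 2);
        } else {
            x = s.value;                       // even case: x = t
            r = static_cast<std::uint16_t>(r + 1);
        }
    }
    return r;
}
```

The model needs neither an x-1 computation nor an x&1 computation, which is exactly the saving claimed for the assembly version.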
If you are interested in correctness
We really ought to have a test for overflow right after computing
x + t + 1
.
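In C++ such a test could be sketched as follows, by doing the addition in a wider type and checking whether the result still fits in 16 bits. The name checked_odd_step is hypothetical. On the PIC the natural equivalent would presumably be a test of the carry bit after the addition, but that is our assumption and not something this lab specifies.

```cpp
#include <cstdint>

// Hypothetical helper: performs x = x + (x>>1) + 1 for an odd 16-bit x.
// Returns false, leaving x unchanged, if the true result needs more
// than 16 bits.
bool checked_odd_step(std::uint16_t& x) {
    std::uint32_t wide = static_cast<std::uint32_t>(x) + (x >> 1) + 1u;
    if (wide > 0xFFFFu) {
        return false;                          // would overflow 16 bits
    }
    x = static_cast<std::uint16_t>(wide);
    return true;
}
```

For example, 22841 steps safely to 34262, while 43691 is rejected because 43691 + 21845 + 1 is 65537.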