Expression evaluation is the process of translating expressions, such as the following, into machine instructions.
sum += A[i]++ ;
eolist = ( p == NULL || *p == '\0' ) ;
The classic, though rare, method
In the second programming course, expression evaluation is often
done as an example of programming with stacks.
For example, an expression such as
a*x*x + b*x + c
will be parsed into the equivalent postfix, or Reverse Polish, notation, in this case
a x x * * b x * + c +
In many computer organization textbooks, this is the first step toward generating stack-based assembly code similar to the following:
    PUSH a
    PUSH x
    PUSH x
    MULT
    MULT
    PUSH b
    PUSH x
    MULT
    PLUS
    PUSH c
    PLUS
In a stack-based computer like the Burroughs B5000 from the 1960s, an instruction like MULT would remove the top two elements of the stack and replace them with their product.
Stack-based scientific calculators, such as the early HP-35, operate similarly.
The popular PDF file format is also largely based on the stack-based programming language PostScript.
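The behaviour of PUSH, MULT, and PLUS described above can be sketched as a toy interpreter in C. This is only an illustration under stated assumptions: the struct layout, the fixed stack depth of 16, and the function names run and eval_quadratic are made up here and do not model the B5000 or any real instruction set.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine, named after the listing above. */
enum op { PUSH, MULT, PLUS };

struct instr {
    enum op op;
    int operand;                      /* used only by PUSH */
};

/* Run n instructions and return the value left on top of the stack.
   No bounds checking: the sketch assumes well-formed code and a shallow stack. */
int run(const struct instr *code, size_t n)
{
    int stack[16];
    int sp = 0;                       /* index of the next free slot */

    for (size_t i = 0; i < n; i++) {
        switch (code[i].op) {
        case PUSH:
            stack[sp++] = code[i].operand;
            break;
        case MULT:                    /* pop two values, push their product */
            sp--;
            stack[sp - 1] = stack[sp - 1] * stack[sp];
            break;
        case PLUS:                    /* pop two values, push their sum */
            sp--;
            stack[sp - 1] = stack[sp - 1] + stack[sp];
            break;
        }
    }
    return stack[sp - 1];
}

/* Evaluate a*x*x + b*x + c using the same instruction order as the listing. */
int eval_quadratic(int a, int b, int c, int x)
{
    const struct instr code[] = {
        { PUSH, a }, { PUSH, x }, { PUSH, x }, { MULT, 0 }, { MULT, 0 },
        { PUSH, b }, { PUSH, x }, { MULT, 0 }, { PLUS, 0 },
        { PUSH, c }, { PLUS, 0 },
    };
    return run(code, sizeof code / sizeof code[0]);
}
```

A call such as eval_quadratic(2, 3, 4, 5) walks the eleven instructions in order and never needs more than three stack slots at once.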
Expression evaluation with stacks can be used on the PIC, which has PUSH and POP instructions, but more efficient code can be generated by using registers to store intermediate values.
Also, C and Java have some expressions that are difficult to evaluate on a stack.
For example, in evaluating A[i]++, you can’t just put the value of A[i] on the stack and then perform the ++ because ++ needs the address of A[i], not the value of A[i].
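To make the point about addresses concrete, here is a minimal sketch of what sum += A[i]++ has to do, one step at a time. The helper name add_and_bump and its parameter list are hypothetical, chosen only for this illustration.

```c
/* Hypothetical helper: performs sum += A[i]++ in explicit steps. */
void add_and_bump(int *sum, int A[], int i)
{
    int *addr = &A[i];    /* the location of A[i], not its value */
    int old = *addr;      /* the value of the expression A[i]++ */
    *addr = old + 1;      /* side effect: store the incremented value back */
    *sum = *sum + old;    /* only the old value is added to sum */
}
```

The store in the third step is what a pure value stack cannot express: once only the value of A[i] has been pushed, the location to write back to is lost.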
Similarly, for something like p == NULL || *p == '\0', you can’t put p == NULL and *p == '\0' on the stack and then call the stack operator OR, because you shouldn’t even attempt the evaluation of *p == '\0' when p == NULL is true.
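The required evaluation order can be spelled out with explicit tests. The sketch below is one way to write it; the function name at_end_of_list is an assumption made for this example.

```c
#include <stddef.h>

/* Returns 1 when p is NULL or points at an empty string, otherwise 0.
   The second test runs only when the first one is false,
   so *p is never evaluated for a null pointer. */
int at_end_of_list(const char *p)
{
    int eolist;

    if (p == NULL)
        eolist = 1;
    else if (*p == '\0')
        eolist = 1;
    else
        eolist = 0;
    return eolist;
}
```

Calling at_end_of_list(NULL) is safe precisely because the dereference sits in a branch that is skipped when p is null.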
The C-to-C solution
Instead of searching for an automatic solution to expression evaluation, we’ll try an ad hoc approach where you translate a complex expression into a sequence of simple assignments where only one operator appears on the right-hand side.
We’ll need to use made-up variable names to do this, just like those τ variables used in the discussion of translating C control structures.
For a while, we’re going to ignore most of those complex C expressions that involve lvalues (locations).
This means you are not going to see pointers, the & operator, or structures here. That will come later.
Parsing
You will need to parse your C code. This means you must pay attention to C’s rules of precedence and associativity to know the order in which operators are applied. In a real compiler, this part of the task is usually done with code generated by a compiler-compiler such as yacc or bison.
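As a small illustration of how precedence fixes the order of the simple assignments, consider a + b * c - d / e. The two functions below are a sketch with hypothetical names; t1, t2, and t3 stand in for made-up temporaries of the τ kind.

```c
/* Direct form: the C compiler's own parser applies the precedence rules. */
int direct(int a, int b, int c, int d, int e)
{
    return a + b * c - d / e;
}

/* The same computation with one operator per statement. */
int stepwise(int a, int b, int c, int d, int e)
{
    int t1 = b * c;     /* * binds tighter than + */
    int t2 = a + t1;    /* + and - group left to right, so the + is applied first */
    int t3 = d / e;     /* / binds tighter than - */
    int x  = t2 - t3;
    return x;
}
```

Both functions compute (a + (b * c)) - (d / e); a translation that applied the operators strictly left to right would compute something else.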
The simple operators
The simple operators are the arithmetic, bit-wise logical, and relational operators. We’ll also discuss function calls, although the function stack will be presented a bit later.
For example, a statement such as “x = z*sin(f*d) + k” would be translated to a sequence of C statements similar to the following:
    τ1 = f*d ;
    τ2 = sin(τ1) ;
    τ3 = z*τ2 ;
    x = τ3 + k ;
Notice that there is only one operator on the right-hand side of each statement.
Very simple statements, such as “x = τ3 + k”, can be implemented with a couple of instructions of your target machine instruction set.
Some operators, such as multiplication or division, may need to be implemented with calls to specialized functions written for your machine architecture.
For example, f*d may need to be replaced with something like _MultiplyDouble(f, d) if your computer, like the PIC, does not support a floating-point multiply operation.
Some operators will also need to be translated into short sequences of instructions. For instance, a 32-bit addition may be performed as two 16-bit additions, with the carry from the low half added into the high half.
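The 32-bit addition can be sketched in C using only 16-bit pieces. This is an illustration rather than what any particular compiler emits; the function name add32_by_halves is made up, and a real 16-bit machine would normally use its add-with-carry instruction instead of the explicit comparison shown here.

```c
#include <stdint.h>

/* Add two 32-bit values using only 16-bit additions. */
uint32_t add32_by_halves(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)(a & 0xFFFFu), a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)(b & 0xFFFFu), b_hi = (uint16_t)(b >> 16);

    uint16_t lo    = (uint16_t)(a_lo + b_lo);          /* low add, wraps mod 2^16 */
    uint16_t carry = (uint16_t)(lo < a_lo);            /* 1 if the low add wrapped */
    uint16_t hi    = (uint16_t)(a_hi + b_hi + carry);  /* high add with carry in */

    return ((uint32_t)hi << 16) | lo;
}
```

For example, adding 1 to 0x0001FFFF wraps the low half to 0 and carries 1 into the high half.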
When you implement the relational operators, such as > and ==, you must make sure that these operators return either 0, for false, or 1, for true, as required by the C standard. For example, here is a faithful PIC implementation of the C statement “r = x > y ;”.
        CLR   r          ;; r <- 0
        MOV   x,WREG
        SUBR  y,WREG     ;; WREG <- x - y
        BRA   LE,1f      ;; go to the next 1:
        INC   r          ;; ++r only if x > y
    1:
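For comparison, the same clear, test, and increment pattern can be written in C with a goto. This is a sketch only: the function name greater_than and the label skip are hypothetical, and the single C comparison stands in for the subtract and conditional branch pair.

```c
#include <stdint.h>

/* r = x > y, computed so that the result is exactly 0 or 1. */
int16_t greater_than(int16_t x, int16_t y)
{
    int16_t r = 0;             /* start with "false" */
    if (x <= y) goto skip;     /* skip the increment when x > y fails */
    r = r + 1;                 /* reached only when x > y */
skip:
    return r;
}
```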
The more complex operators
There are three C short-circuit operators, which may not evaluate all of their operands before yielding a result.
These operators are &&, ||, and the ? : ternary operator.
These can be implemented using C’s if construct, which can then be translated using the control structure rules.
The following table shows the translation rules for these operators.
τ = exp1 && exp2 ; | if (!exp1) τ = 0 ; else if (!exp2) τ = 0 ; else τ = 1 ; |
τ = exp1 || exp2 ; | if (exp1) τ = 1 ; else if (exp2) τ = 1 ; else τ = 0 ; |
τ = exp1 ? exp2 : exp3 ; | if (exp1) τ = exp2 ; else τ = exp3 ; |
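The if-based translations can be checked with operands that record when they are evaluated. In the sketch below, or_rule follows the || row of the table, and and_rule and cond_rule apply the same pattern to && and ? :. The counter calls and every function name are assumptions introduced for this illustration.

```c
/* Counts operand evaluations so that short-circuiting can be observed. */
static int calls;

/* Hypothetical operand with a visible side effect. */
static int operand(int value)
{
    calls++;
    return value;
}

/* Return the number of evaluations so far and reset the counter. */
int reset_calls(void)
{
    int n = calls;
    calls = 0;
    return n;
}

/* t = a && b, written with if/else */
int and_rule(int a, int b)
{
    int t;
    if (!operand(a)) t = 0;
    else if (!operand(b)) t = 0;
    else t = 1;
    return t;
}

/* t = a || b, written with if/else */
int or_rule(int a, int b)
{
    int t;
    if (operand(a)) t = 1;
    else if (operand(b)) t = 1;
    else t = 0;
    return t;
}

/* t = a ? b : c, written with if/else */
int cond_rule(int a, int b, int c)
{
    int t;
    if (operand(a)) t = operand(b);
    else t = operand(c);
    return t;
}
```

With this setup, and_rule(0, 1) evaluates one operand and and_rule(2, 3) evaluates two, which is the behaviour the C operators themselves have.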
An example
Keep in mind that the operand expressions in the table above must not be evaluated before their time.
As an example, let’s look at some C code to set a 16-bit integer m to the largest of the 16-bit integers x, y, and z.
    if (x >= y && x >= z) {
        m = x ;
    } else if (y >= z) {
        m = y ;
    } else {
        m = z ;
    }
If you use the rules of the control structures section, which ignores expression evaluation, you’ll get something like the following:
        int τ1 = x >= y && x >= z ;
        if (τ1 == 0) goto λ1 ;
        m = x ;
        goto λ3 ;
    λ1: int τ2 = y >= z ;
        if (τ2 == 0) goto λ2 ;
        m = y ;
        goto λ3 ;
    λ2: m = z ;
    λ3:
Now we have to worry about evaluating the expression with the &&. This is going to get very messy.
First, we get the following:
        int τ1 ;
        if (! (x >= y)) {
            τ1 = 0 ;
        } else if (! (x >= z)) {
            τ1 = 0 ;
        } else {
            τ1 = 1 ;
        }
        if (τ1 == 0) goto λ1 ;
        m = x ;
        goto λ3 ;
    λ1: int τ2 = y >= z ;
        if (τ2 == 0) goto λ2 ;
        m = y ;
        goto λ3 ;
    λ2: m = z ;
    λ3:
Then we go back and apply the rules of the if else, and it gets even worse.
        int τ1 ;
        int τ3 = ! (x >= y) ;
        if (τ3 == 0) goto λ4 ;
        τ1 = 0 ;
        goto λ6 ;
    λ4: int τ4 = ! (x >= z) ;
        if (τ4 == 0) goto λ5 ;
        τ1 = 0 ;
        goto λ6 ;
    λ5: τ1 = 1 ;
    λ6: if (τ1 == 0) goto λ1 ;
        m = x ;
        goto λ3 ;
    λ1: int τ2 = y >= z ;
        if (τ2 == 0) goto λ2 ;
        m = y ;
        goto λ3 ;
    λ2: m = z ;
    λ3:
Huh?
Yep. That’s pretty silly. C compilers really aren’t that simple-minded. Most of them optimize code. First, let’s simplify those comparisons.
        int τ1 ;
        int τ3 = (x < y) ;
        if (τ3 == 0) goto λ4 ;
        τ1 = 0 ;
        goto λ6 ;
    λ4: int τ4 = (x < z) ;
        if (τ4 == 0) goto λ5 ;
        τ1 = 0 ;
        goto λ6 ;
    λ5: τ1 = 1 ;
    λ6: if (τ1 == 0) goto λ1 ;
        m = x ;
        goto λ3 ;
    λ1: int τ2 = y >= z ;
        if (τ2 == 0) goto λ2 ;
        m = y ;
        goto λ3 ;
    λ2: m = z ;
    λ3:
There are some branching inefficiencies here.
For example, in two places τ1 is set to 0 before a transfer to λ6, where τ1 is immediately compared to 0 before transferring to λ1.
Let’s speed that up a little by jumping straight to λ1. We can even eliminate τ1.
        int τ3 = (x < y) ;
        if (τ3 == 0) goto λ4 ;
        goto λ1 ;
    λ4: int τ4 = (x < z) ;
        if (τ4 == 0) goto λ5 ;
        goto λ1 ;
    λ5: m = x ;
        goto λ3 ;
    λ1: int τ2 = y >= z ;
        if (τ2 == 0) goto λ2 ;
        m = y ;
        goto λ3 ;
    λ2: m = z ;
    λ3:
Next we can remove a few of those unconditional gotos by inverting the tests.
        int τ3 = (x < y) ;
        if (τ3 != 0) goto λ1 ;
        int τ4 = (x < z) ;
        if (τ4 != 0) goto λ1 ;
        m = x ;
        goto λ3 ;
    λ1: int τ2 = y >= z ;
        if (τ2 == 0) goto λ2 ;
        m = y ;
        goto λ3 ;
    λ2: m = z ;
    λ3:
You might find this a little easier to read if we just got rid of τ2, τ3, and τ4.
        if (x < y) goto λ1 ;
        if (x < z) goto λ1 ;
        m = x ;
        goto λ3 ;
    λ1: if (y < z) goto λ2 ;
        m = y ;
        goto λ3 ;
    λ2: m = z ;
    λ3:
That’s what any self-respecting C compiler would do.
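The final goto form can be compiled and checked against the original if/else chain. The sketch below wraps it in a function; the name max3 and the ASCII labels L1, L2, and L3 (standing in for λ1, λ2, and λ3) are assumptions made so the code is accepted by any C compiler.

```c
#include <stdint.h>

/* Largest of three 16-bit integers, written in the optimized goto form. */
int16_t max3(int16_t x, int16_t y, int16_t z)
{
    int16_t m;

    if (x < y) goto L1;     /* x cannot be the largest */
    if (x < z) goto L1;
    m = x;
    goto L3;
L1: if (y < z) goto L2;     /* choose between y and z */
    m = y;
    goto L3;
L2: m = z;
L3: return m;
}
```

Every path assigns m exactly once before reaching L3, just as the three branches of the original if/else chain do.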
What’s left?
- Functions
- Parameter passing
- Returned values
- Local variables
- Arrays
- Structures
- Pointers