Expression evaluation is the process of translating expressions, such as the following, into machine instructions.
sum += A[i]++ ;
eolist = ( p == NULL || *p == '\0' ) ;
Good expression evaluation is hard to do. Many textbooks present
stack-based solutions.
For example, in Patt and Patel’s Introduction to Computing Systems, a C statement like “x = y + x*y” would be implemented with something like the following:
LD R0,y
JSR PUSH
LD R0,x
JSR PUSH
LD R0,y
JSR PUSH      ;; stack has [y][x][y]
JSR OpMult    ;; stack has [y][x*y]
JSR OpAdd     ;; stack has [y+x*y]
JSR POP       ;; R0 is [y+x*y]
ST R0,x
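The same push, operate, and pop discipline can be mimicked in C. This is only a sketch: the names `push`, `pop`, `op_mult`, `op_add`, and `eval_y_plus_x_times_y` are made up for illustration, and the fixed-size stack does no overflow checking.

```c
/* Sketch of a stack-based evaluator. Operands are pushed; each operator
   pops two values and pushes one result. All names are hypothetical. */
int stack[16];
int top = 0;

void push(int v) { stack[top++] = v; }
int  pop(void)   { return stack[--top]; }

void op_mult(void) { int b = pop(); int a = pop(); push(a * b); }
void op_add(void)  { int b = pop(); int a = pop(); push(a + b); }

/* Computes y + x*y in the same order as the assembly sequence. */
int eval_y_plus_x_times_y(int x, int y)
{
    push(y);
    push(x);
    push(y);       /* stack has [y][x][y] */
    op_mult();     /* stack has [y][x*y]  */
    op_add();      /* stack has [y+x*y]   */
    return pop();  /* result of y + x*y   */
}
```

Every operator here sees only values, which is exactly the limitation discussed next.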
This works fine for simple statements, but it just won’t do for C.
For example, in evaluating A[i]++, you can’t put the value of A[i] on the stack and then call something like OpPlusPlus, because ++ needs the address of A[i], not the value of A[i].
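A minimal sketch of why the address matters, written in C itself. The function name `sum_post_increment` and the temporaries `addr` and `old` are made up for illustration; the point is only that the translation must first compute the location of A[i] and then read and write through it.

```c
/* Sketch: "sum += A[i]++ ;" with the address of A[i] made explicit.
   addr and old are hypothetical temporaries. */
int sum_post_increment(int sum, int A[], int i)
{
    int *addr = &A[i];   /* the location of A[i], not its value  */
    int old = *addr;     /* value of A[i] before the increment   */
    *addr = old + 1;     /* ++ stores back through the address   */
    sum = sum + old;     /* += uses the value before increment   */
    return sum;
}
```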
Similarly, for something like p == NULL || *p == '\0', you can’t put p == NULL and *p == '\0' on the stack and then call OpLogicalOR, because you shouldn’t even attempt the evaluation of *p == '\0' when p == NULL is true.
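The required behavior can be written out with explicit tests. The sketch below assumes a made-up function name, `at_end_of_list`; it shows that *p is read only after p is known to be non-NULL.

```c
#include <stddef.h>

/* Sketch: "eolist = ( p == NULL || *p == '\0' ) ;" with the short
   circuit written out as an if chain. */
int at_end_of_list(const char *p)
{
    int eolist;
    if (p == NULL)
        eolist = 1;       /* right operand is never evaluated */
    else if (*p == '\0')
        eolist = 1;       /* safe: p is known to be non-NULL  */
    else
        eolist = 0;
    return eolist;
}
```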
Instead of searching for an automatic solution to expression evaluation, we’ll try an ad hoc approach where you translate a complex expression into a sequence of simple assignments where only one operator appears on the right-hand side. You'll need to use made-up variable names to do this, just like those δ variables used in the discussion of translating C control structures.
For a while, we’re going to ignore most of those complex C expressions that involve lvalues (locations). This means you are not going to see pointers, the & operator, or, for that matter, structures here. That will come later.
You will need to parse your C code. This means you must pay attention to C’s rules of precedence to know the order in which operators are applied. In a real compiler, this part of the task is usually done with code generated by a parser generator such as yacc or bison.
The simple operators are the arithmetic operators, the bitwise logical operators, the relational operators, and even function calls.
For example, a statement such as “x = z*sin(f*d) + k” would be translated to a sequence of C statements similar to the following:
δ1 = f*d ;
δ2 = sin(δ1) ;
δ3 = z*δ2 ;
x = δ3 + k ;
Notice that there is only one operator on the right-hand side of each statement.
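The decomposition can be checked in ordinary C. In this sketch the temporaries d1, d2, and d3 play the role of δ1, δ2, and δ3, and a made-up function `wave` stands in for sin so the example does not depend on the math library; both choices are assumptions for illustration only.

```c
/* Stand-in for sin(), used only so this sketch needs no math library. */
double wave(double v) { return v * 0.5; }

/* The original statement, evaluated as a single expression. */
double one_expression(double z, double f, double d, double k)
{
    return z * wave(f * d) + k;
}

/* The same statement as a sequence of one-operator assignments. */
double one_operator_per_step(double z, double f, double d, double k)
{
    double d1, d2, d3, x;
    d1 = f * d;      /* δ1 = f*d      */
    d2 = wave(d1);   /* δ2 = sin(δ1)  */
    d3 = z * d2;     /* δ3 = z*δ2     */
    x = d3 + k;      /* x = δ3 + k    */
    return x;
}
```

Both functions apply the operators in the same order, so they produce the same result for the same inputs.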
Very simple statements, such as “x = δ3 + k”, can be implemented with a couple of instructions of your target machine's instruction set.
Some operators, such as multiplication or division, may need to be translated into calls to specialized functions written for your machine architecture. For example, f*d may need to be replaced with something like _MultiplyDouble(f, d).
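To give a feel for what such a helper might look like, here is a sketch of an unsigned integer multiply built from add, shift, and bit test. The name `multiply_unsigned` is hypothetical, and a floating-point routine like _MultiplyDouble would be considerably more involved.

```c
/* Hypothetical helper: shift-and-add multiplication, the kind of routine
   a machine without a multiply instruction might call. */
unsigned multiply_unsigned(unsigned a, unsigned b)
{
    unsigned product = 0;
    while (b != 0) {
        if (b & 1u)        /* low bit of b set: add the current a */
            product += a;
        a <<= 1;           /* a doubles each round                */
        b >>= 1;           /* move on to the next bit of b        */
    }
    return product;
}
```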
The implementation of function calls depends heavily on the computer’s ABI (Application Binary Interface) and will not be covered here.
When you implement the relational operators, such as > and ==, you must make sure that these operators return either 0, for false, or 1, for true.
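One way to guarantee a 0 or 1 result is to set the answer with explicit assignments around a branch, rather than handing back whatever intermediate value the comparison produced. The sketch below does this for ==; the name `is_equal` and the use of exclusive-or to detect a difference are assumptions made for illustration.

```c
/* Sketch: "t = (a == b) ;" producing exactly 0 or 1.
   diff is nonzero when a and b differ, but it is not itself a valid
   truth value because it can be any nonzero number. */
int is_equal(int a, int b)
{
    int diff = a ^ b;           /* zero exactly when a equals b     */
    int t = 1;
    if (diff == 0) goto done;   /* like a branch-on-zero instruction */
    t = 0;
done:
    return t;
}
```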
There are three C operators that have short circuits, that is, they may not evaluate all of their operands before returning a result. These operators are &&, ||, and the ? : ternary operator.
These can be implemented using C’s if construct, which can then be translated using the control structure rules.
The following table shows the translation rules for these operators.
τ = exp1 && exp2 ; | if (! exp1) τ = 0 ; else if (! exp2) τ = 0 ; else τ = 1 ; |
τ = exp1 || exp2 ; | if (exp1) τ = 1 ; else if (exp2) τ = 1 ; else τ = 0 ; |
τ = exp1 ? exp2 : exp3 ; | if (exp1) τ = exp2 ; else τ = exp3 ; |
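Keep in mind that the operand expressions in the rules above (exp1, exp2, and exp3) must not be evaluated before their time: each one is evaluated only when control reaches the if that tests or assigns it.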
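The rules can be tried out in C with a counter that records how many operands were actually evaluated. This is a sketch under stated assumptions: `operand` and `operand_calls` are hypothetical helpers, the && rule follows the pattern used in the worked example further on, and the ? : rule is the obvious if/else form rather than something taken from the table.

```c
/* Counts operand evaluations so the short circuit is observable. */
int operand_calls = 0;
int operand(int value) { operand_calls++; return value; }

/* t = e1 && e2 */
int rule_and(int e1, int e2)
{
    int t;
    if (!operand(e1)) t = 0;
    else if (!operand(e2)) t = 0;
    else t = 1;
    return t;
}

/* t = e1 || e2 */
int rule_or(int e1, int e2)
{
    int t;
    if (operand(e1)) t = 1;
    else if (operand(e2)) t = 1;
    else t = 0;
    return t;
}

/* t = e1 ? e2 : e3 */
int rule_cond(int e1, int e2, int e3)
{
    int t;
    if (operand(e1)) t = operand(e2);
    else t = operand(e3);
    return t;
}
```

After resetting `operand_calls` to 0, a call such as `rule_and(0, 1)` leaves the counter at 1, because the second operand is skipped.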
As an example, let’s look at an expression that is true if either x equals 5 or both y and z equal 5.
τ = (x==5 || y==5 && z==5) ;
In the first step of translation, the || is “simplified”.
if (x==5) τ = 1 ;
else if (y==5 && z==5) τ = 1 ;
else τ = 0 ;
Now, we must simplify the if’s using the rules of Translating C to C: Control structures. We start with the first one.
    δ1 = (x==5) ;
    if (δ1 == 0) goto λ1 ;
    τ = 1 ;
    goto λ2 ;
λ1:
    if (y==5 && z==5) τ = 1 ;
    else τ = 0 ;
λ2:
Now let’s try the second one.
    δ1 = (x==5) ;
    if (δ1 == 0) goto λ1 ;
    τ = 1 ;
    goto λ2 ;
λ1:
    δ2 = (y==5 && z==5) ;
    if (δ2 == 0) goto λ3 ;
    τ = 1 ;
    goto λ4 ;
λ3:
    τ = 0 ;
λ4:
λ2:
That’s already looking bad and we still have the expression with &&. Let’s try that one now.
    δ1 = (x==5) ;
    if (δ1 == 0) goto λ1 ;
    τ = 1 ;
    goto λ2 ;
λ1:
    if (! (y==5)) δ2 = 0 ;
    else if (! (z==5)) δ2 = 0 ;
    else δ2 = 1 ;
    if (δ2 == 0) goto λ3 ;
    τ = 1 ;
    goto λ4 ;
λ3:
    τ = 0 ;
λ4:
λ2:
Now we translate the if after λ1.
    δ1 = (x==5) ;
    if (δ1 == 0) goto λ1 ;
    τ = 1 ;
    goto λ2 ;
λ1:
    δ3 = (y!=5) ;
    if (δ3 == 0) goto λ5 ;
    δ2 = 0 ;
    goto λ6 ;
λ5:
    if (! (z==5)) δ2 = 0 ;
    else δ2 = 1 ;
λ6:
    if (δ2 == 0) goto λ3 ;
    τ = 1 ;
    goto λ4 ;
λ3:
    τ = 0 ;
λ4:
λ2:
And finally, the if after λ5.
    δ1 = (x==5) ;
    if (δ1 == 0) goto λ1 ;
    τ = 1 ;
    goto λ2 ;
λ1:
    δ3 = (y!=5) ;
    if (δ3 == 0) goto λ5 ;
    δ2 = 0 ;
    goto λ6 ;
λ5:
    δ4 = (z!=5) ;
    if (δ4 == 0) goto λ7 ;
    δ2 = 0 ;
    goto λ8 ;
λ7:
    δ2 = 1 ;
λ8:
λ6:
    if (δ2 == 0) goto λ3 ;
    τ = 1 ;
    goto λ4 ;
λ3:
    τ = 0 ;
λ4:
λ2:
Merging those redundant labels helps a little, but not much. There’s just no way to implement these short-circuits without some serious goto’s.
    δ1 = (x==5) ;
    if (δ1 == 0) goto λ1 ;
    τ = 1 ;
    goto λ2 ;
λ1:
    δ3 = (y!=5) ;
    if (δ3 == 0) goto λ5 ;
    δ2 = 0 ;
    goto λ6 ;
λ5:
    δ4 = (z!=5) ;
    if (δ4 == 0) goto λ7 ;
    δ2 = 0 ;
    goto λ6 ;
λ7:
    δ2 = 1 ;
λ6:
    if (δ2 == 0) goto λ3 ;
    τ = 1 ;
    goto λ2 ;
λ3:
    τ = 0 ;
λ2:
However, with some optimization, you can do quite a bit better.
    δ1 = (x==5) ;
    if (δ1 == 0) goto λ1 ;
    τ = 1 ;
    goto λ2 ;
λ1:
    δ3 = (y==5) ;
    if (δ3 == 0) goto λ3 ;
    δ4 = (z==5) ;
    if (δ4 == 0) goto λ3 ;
    τ = 1 ;
    goto λ2 ;
λ3:
    τ = 0 ;
λ2:
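Since the result of these rewrites is still legal C, the optimized form can be compiled and compared against the original expression. In this sketch d1, d3, and d4 stand for δ1, δ3, and δ4, the labels L1 to L3 stand for λ1 to λ3, the second test is taken to be on z, and the function names are made up.

```c
/* Optimized goto translation of  t = (x==5 || y==5 && z==5) ;  */
int translated(int x, int y, int z)
{
    int t, d1, d3, d4;
    d1 = (x == 5);
    if (d1 == 0) goto L1;
    t = 1;                   /* x == 5: y and z are never tested */
    goto L2;
L1:
    d3 = (y == 5);
    if (d3 == 0) goto L3;    /* y != 5: z is never tested        */
    d4 = (z == 5);
    if (d4 == 0) goto L3;
    t = 1;
    goto L2;
L3:
    t = 0;
L2:
    return t;
}

/* The original expression, for comparison. */
int reference_value(int x, int y, int z)
{
    return (x == 5 || (y == 5 && z == 5));
}
```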
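The & (address-of) operator can often be implemented by computing an address with an instruction such as the LC-3's LEA rather than loading a value with LD or LDR. For example, if the C statement “N = M” uses “LDR R0,R5,#10” to retrieve M, then the C statement “P = &M” retrieves &M by forming the same address without loading from it. Because the standard LC-3 LEA accepts only a PC-relative label, for an R5-relative variable that instruction is “ADD R0,R5,#10”; LEA plays this role for variables that are loaded by label with LD.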
Consider each structure as having its own symbol table. This was mentioned in the Structures in C lecture.