Expression evaluation is the process of translating expressions, such as the following, into machine instructions.
sum += A[i]++ ;
eolist = ( p == NULL || *p == '\0' ) ;
Good expression evaluation is hard to do. Many textbooks present
stack-based solutions.
For example, in Patt and Patel's Introduction to Computing Systems,
a C statement like "x = y + x*y"
would be implemented with something like the following:
LD R0,y
JSR PUSH
LD R0,x
JSR PUSH
LD R0,y
JSR PUSH    ;; stack has [y][x][y]
JSR OpMult  ;; stack has [y][x*y]
JSR OpAdd   ;; stack has [y+x*y]
JSR POP     ;; R0 is [y+x*y]
ST R0,x
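The same stack discipline can be sketched in C. This is only an illustration of the idea, not the textbook's code; the names push, pop, op_mult, op_add and eval_y_plus_x_times_y are made up for this sketch, and the stack has a fixed small size with no overflow checks.

```c
#include <stddef.h>

/* A tiny operand stack. All names here are illustrative assumptions. */
static int stack[16];
static size_t top = 0;

static void push(int v) { stack[top++] = v; }
static int  pop(void)   { return stack[--top]; }

/* Each operator pops two operands and pushes one result. */
static void op_mult(void) { int b = pop(); int a = pop(); push(a * b); }
static void op_add(void)  { int b = pop(); int a = pop(); push(a + b); }

/* Evaluates y + x*y in the same order as the assembly sequence. */
int eval_y_plus_x_times_y(int x, int y)
{
    push(y);
    push(x);
    push(y);       /* stack has [y][x][y] */
    op_mult();     /* stack has [y][x*y]  */
    op_add();      /* stack has [y+x*y]   */
    return pop();  /* the value that would be stored into x */
}
```

With x = 3 and y = 4 the function returns 4 + 3*4.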
This works fine for simple statements, but it just won't do for C.
For example, in evaluating A[i]++, you can't put the value of A[i]
on the stack and then call something like OpPlusPlus, because ++
needs the address of A[i], not the value of A[i].
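A small C sketch may make the difference concrete. Both helper names below are hypothetical; the first receives only a copy of the value, as a value stack would supply, and the second receives the location.

```c
/* Hypothetical helper: sees only a copy of the value, so the
   array element it came from is never changed. */
int op_plus_plus_by_value(int value)
{
    int old = value;
    value = value + 1;   /* updates the local copy only */
    return old;
}

/* Hypothetical helper: receives the address, so it can both
   return the old value and update the stored one. */
int op_plus_plus_by_address(int *location)
{
    int old = *location;
    *location = old + 1;
    return old;
}
```

Calling op_plus_plus_by_value(A[i]) leaves A[i] unchanged, while op_plus_plus_by_address(&A[i]) behaves like A[i]++.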
Similarly, for something like p == NULL || *p == '\0',
you can't put p == NULL and *p == '\0' on the stack and then call
OpLogicalOR, because you shouldn't even attempt the evaluation
of *p == '\0' when p == NULL is true.
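One way to respect that ordering is to test the operands one at a time. The sketch below is an assumption about how this could look in plain C; the function name is_end_of_list is made up for the illustration.

```c
#include <stddef.h>

/* Returns 1 when p is NULL or points at an empty string, else 0.
   The second test runs only after the first one has failed. */
int is_end_of_list(const char *p)
{
    int eolist;
    if (p == NULL)
        eolist = 1;        /* *p is never read on this path */
    else if (*p == '\0')
        eolist = 1;
    else
        eolist = 0;
    return eolist;
}
```

Passing NULL is safe here precisely because the dereference sits behind the first test.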
Instead of searching for an automatic solution to
expression evaluation, try an ad hoc approach where you
translate a complex expression into a sequence of simple assignments
in which only one operator appears on the right-hand side.
You'll need to use made-up variable names to do this,
just like the δ variables used in
the discussion of translating C control structures.
On this page, we're going to ignore most of the
complex C expressions that involve lvalues (locations).
This means you are not going to see pointers, the & operator,
or, for that matter, structures here. That will come later.
You will need to parse your C code. This means you must pay attention to C's rules of precedence to know the order in which operators are applied. In a real compiler, this part of the task is usually done with code generated by a parser generator such as yacc or bison.
The simple operators are the arithmetic operators, the bitwise logical operators, the relational operators, and even function calls.
For example, a statement such as "x = z*sin(f*d) + k"
would be translated to a sequence of C statements similar to
the following:
δ1 = f*d ;
δ2 = sin(δ1) ;
δ3 = z*δ2 ;
x = δ3 + k ;
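The decomposition can be checked in ordinary C. In the sketch below, the temporaries t1, t2 and t3 play the role of the δ variables, and fn is a made-up stand-in for sin, chosen only so the example does not depend on the math library.

```c
/* Stand-in for sin(); an assumption made to keep the sketch
   free of the math library. */
static double fn(double t) { return t * t - 1.0; }

/* The original form: x = z*fn(f*d) + k */
double direct(double z, double f, double d, double k)
{
    return z * fn(f * d) + k;
}

/* The decomposed form: one operator or call per statement. */
double decomposed(double z, double f, double d, double k)
{
    double t1, t2, t3, x;   /* t1..t3 stand for the δ temporaries */
    t1 = f * d;
    t2 = fn(t1);
    t3 = z * t2;
    x  = t3 + k;
    return x;
}
```

Both functions perform the same operations in the same order, so they should agree for any inputs whose intermediate values are exactly representable.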
Notice that there is only one operator on the right-hand side of each statement.
Very simple statements, such as "x = δ3 + k",
can be implemented with a couple of instructions of
your target machine's instruction set.
Some operators, such as multiplication or division, may need to be translated
into calls to specialized functions written for your machine architecture.
For example, f*d may need to be replaced with something like
_MultiplyDouble(f, d).
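As a rough idea of what such a helper might contain, here is a shift-and-add multiply for unsigned integers. It is a sketch for a hypothetical target without a multiply instruction; the name multiply_unsigned is made up, and a floating-point helper such as _MultiplyDouble would be considerably more involved.

```c
/* Hypothetical runtime helper: multiplies using only shifts,
   adds, and tests. Overflow wraps, as unsigned arithmetic does in C. */
unsigned multiply_unsigned(unsigned a, unsigned b)
{
    unsigned product = 0;
    while (b != 0) {
        if (b & 1u)
            product += a;   /* add the current shifted copy of a */
        a <<= 1;            /* next power-of-two multiple of a */
        b >>= 1;            /* move on to the next bit of b */
    }
    return product;
}
```

A translator for such a target would replace an expression like m*n with the call multiply_unsigned(m, n).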
The implementation of function calls is very specific to the computer's
ABI (Application Binary Interface) and will not be covered here.
When you implement the relational operators, such as > and ==,
you must make sure that these operators
return either 0, for false, or 1, for true.
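On a machine that only offers compare-and-branch instructions, one possible way to produce the 0 or 1 is to set the result to 0, branch around an assignment of 1 when the relation fails, and fall through otherwise. The sketch below assumes that approach; the function name greater_than is made up.

```c
/* Computes t = (a > b) so that t is exactly 0 or 1. */
int greater_than(int a, int b)
{
    int t = 0;               /* assume false */
    if (a <= b) goto done;   /* relation fails: keep the 0 */
    t = 1;                   /* relation holds */
done:
    return t;
}
```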
There are three C operators that have short circuits, that is,
they may not evaluate all of their operands before returning a result.
These operators are &&, ||, and the ? : ternary operator.
These can be implemented using C's if construct, which
can then be translated using the control structure rules.
The following table shows the translation rules for these operators.
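C statement | Translation
τ = exp1 && exp2 ; | if (! exp1) τ = 0 ; else if (! exp2) τ = 0 ; else τ = 1 ;
τ = exp1 || exp2 ; | if (exp1) τ = 1 ; else if (exp2) τ = 1 ; else τ = 0 ;
τ = exp1 ? exp2 : exp3 ; | if (exp1) τ = exp2 ; else τ = exp3 ;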
Keep in mind that the expressions exp1, exp2 and exp3 in the table above must not be evaluated before their time.
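The if-based rules can be exercised in C with operands that record when they are evaluated. This is a sketch under assumptions: the || rule is the one given above, while the && and ? : rules are written out here following the same pattern; the helper names second, third and the counters are made up for the illustration.

```c
/* Counters for how often the later operands are evaluated. */
static int evals2 = 0;
static int evals3 = 0;

static int second(int v) { evals2++; return v; }  /* plays exp2 */
static int third(int v)  { evals3++; return v; }  /* plays exp3 */

/* t = e1 && exp2 */
int and_rule(int e1, int v2)
{
    int t;
    if (!e1) t = 0;
    else if (!second(v2)) t = 0;
    else t = 1;
    return t;
}

/* t = e1 || exp2 */
int or_rule(int e1, int v2)
{
    int t;
    if (e1) t = 1;
    else if (second(v2)) t = 1;
    else t = 0;
    return t;
}

/* t = e1 ? exp2 : exp3 */
int cond_rule(int e1, int v2, int v3)
{
    int t;
    if (e1) t = second(v2);
    else t = third(v3);
    return t;
}

int second_operand_evaluations(void) { return evals2; }
int third_operand_evaluations(void)  { return evals3; }
```

The counters only move when the corresponding operand is actually reached, which is the behavior the short-circuit operators require.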
As an example, let's look at an expression that is
true if exactly one of x and y is equal to 5.
τ = (x==5 && y!=5) || (x!=5 && y==5) ;
In the first step of translation, the || is "simplified".
if (x==5 && y!=5) τ = 1 ;
else if (x!=5 && y==5) τ = 1 ;
else τ = 0 ;
Now, we must simplify the ifs using the rules of
Translating C to C: Control structures.
We start with the first one.
δ1 = (x==5 && y!=5) ;
if (δ1 == 0) goto λ1 ;
τ = 1 ;
goto λ2 ;
λ1: if (x!=5 && y==5) τ = 1 ; else τ = 0 ;
λ2:
Now let's try the second one.
δ1 = (x==5 && y!=5) ;
if (δ1 == 0) goto λ1 ;
τ = 1 ;
goto λ2 ;
λ1: δ2 = (x!=5 && y==5) ;
if (δ2 == 0) goto λ3 ;
τ = 1 ;
goto λ4 ;
λ3: τ = 0 ;
λ4:
λ2:
That's already looking bad, and we still have two expressions
with &&. Let's go ahead and do both.
if (! (x==5)) δ1 = 0 ; else if (! (y!=5)) δ1 = 0 ; else δ1 = 1 ;
if (δ1 == 0) goto λ1 ;
τ = 1 ;
goto λ2 ;
λ1: if (! (x!=5)) δ2 = 0 ; else if (! (y==5)) δ2 = 0 ; else δ2 = 1 ;
if (δ2 == 0) goto λ3 ;
τ = 1 ;
goto λ4 ;
λ3: τ = 0 ;
λ4:
λ2:
Now the first if.
δ3 = ! (x==5) ;
if (δ3 == 0) goto λ5 ;
δ1 = 0 ;
goto λ6 ;
λ5: if (! (y!=5)) δ1 = 0 ; else δ1 = 1 ;
λ6: if (δ1 == 0) goto λ1 ;
τ = 1 ;
goto λ2 ;
λ1: if (! (x!=5)) δ2 = 0 ; else if (! (y==5)) δ2 = 0 ; else δ2 = 1 ;
if (δ2 == 0) goto λ3 ;
τ = 1 ;
goto λ4 ;
λ3: τ = 0 ;
λ4:
λ2:
Now the second if, which used to be inside the else.
δ3 = ! (x==5) ;
if (δ3 == 0) goto λ5 ;
δ1 = 0 ;
goto λ6 ;
λ5: δ4 = ! (y!=5) ;
if (δ4 == 0) goto λ7 ;
δ1 = 0 ;
goto λ8 ;
λ7: δ1 = 1 ;
λ8:
λ6: if (δ1 == 0) goto λ1 ;
τ = 1 ;
goto λ2 ;
λ1: if (! (x!=5)) δ2 = 0 ; else if (! (y==5)) δ2 = 0 ; else δ2 = 1 ;
if (δ2 == 0) goto λ3 ;
τ = 1 ;
goto λ4 ;
λ3: τ = 0 ;
λ4:
λ2:
This is getting ridiculous. Go ahead and finish off the
entire remaining if else if else.
δ3 = ! (x==5) ;
if (δ3 == 0) goto λ5 ;
δ1 = 0 ;
goto λ6 ;
λ5: δ4 = ! (y!=5) ;
if (δ4 == 0) goto λ7 ;
δ1 = 0 ;
goto λ8 ;
λ7: δ1 = 1 ;
λ8:
λ6: if (δ1 == 0) goto λ1 ;
τ = 1 ;
goto λ2 ;
λ1: δ5 = ! (x!=5) ;
if (δ5 == 0) goto λ9 ;
δ2 = 0 ;
goto λ10 ;
λ9: δ6 = ! (y==5) ;
if (δ6 == 0) goto λ11 ;
δ2 = 0 ;
goto λ12 ;
λ11: δ2 = 1 ;
λ12:
λ10: if (δ2 == 0) goto λ3 ;
τ = 1 ;
goto λ4 ;
λ3: τ = 0 ;
λ4:
λ2:
Perhaps this would be clearer if we did a little "optimization".
Let's rename all those
variables from using numbers to letters in a sensible order.
Also rename the labels to use letters in a sensible order
and eliminate the double labeling of two locations of code.
And finally, get rid of those !s at the front of all the assigned expressions.
δa = (x!=5) ;
if (δa == 0) goto λa ;
δb = 0 ;
goto λc ;
λa: δc = (y==5) ;
if (δc == 0) goto λb ;
δb = 0 ;
goto λc ;
λb: δb = 1 ;
λc: if (δb == 0) goto λd ;
τ = 1 ;
goto λi ;
λd: δd = (x==5) ;
if (δd == 0) goto λe ;
δe = 0 ;
goto λg ;
λe: δf = (y!=5) ;
if (δf == 0) goto λf ;
δe = 0 ;
goto λg ;
λf: δe = 1 ;
λg: if (δe == 0) goto λh ;
τ = 1 ;
goto λi ;
λh: τ = 0 ;
λi:
Don't do this at home. Or better yet, start with this C statement.
τ = (x==5) != (y==5) ;
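To see that the long goto translation and the one-line statement agree, both can be written as C functions and compared. In this sketch, ASCII names stand in for the Greek ones (da..df for the δ variables, La..Li for the λ labels, t for τ); the function names are made up for the illustration.

```c
/* The fully expanded translation, one simple statement per line. */
int exactly_one_is_five_goto(int x, int y)
{
    int da, db, dc, dd, de, df, t;

    da = (x != 5);
    if (da == 0) goto La;
    db = 0;
    goto Lc;
La: dc = (y == 5);
    if (dc == 0) goto Lb;
    db = 0;
    goto Lc;
Lb: db = 1;
Lc: if (db == 0) goto Ld;
    t = 1;
    goto Li;
Ld: dd = (x == 5);
    if (dd == 0) goto Le;
    de = 0;
    goto Lg;
Le: df = (y != 5);
    if (df == 0) goto Lf;
    de = 0;
    goto Lg;
Lf: de = 1;
Lg: if (de == 0) goto Lh;
    t = 1;
    goto Li;
Lh: t = 0;
Li: return t;
}

/* The short version: true when exactly one comparison holds. */
int exactly_one_is_five_simple(int x, int y)
{
    return (x == 5) != (y == 5);
}
```

The four combinations of "is 5" and "is not 5" for x and y cover every path through the goto version.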