Translating C to C: Expressions

Expression evaluation is the process of translating expressions, such as the following, into machine instructions.

sum += A[i]++ ;
eolist = ( p == NULL || *p == '\0' ) ;

Problem with many textbook methods

Good expression evaluation is hard to do. Many textbooks present stack-based solutions. For example, in Patt and Patel’s Introduction to Computing Systems, a C statement like “x = y + x*y” would be implemented with something like the following:

        LD    R0,y
        JSR   PUSH
        LD    R0,x
        JSR   PUSH
        LD    R0,y
        JSR   PUSH        ;; stack has [y][x][y]
        JSR   OpMult      ;; stack has [y][x*y]
        JSR   OpAdd       ;; stack has [y+x*y]
        JSR   POP         ;; R0 is [y+x*y]
        ST    R0,x
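The stack discipline of the LC-3 code above can be mimicked in C. This is a minimal sketch: push, pop, op_mult, op_add, stack_eval, and the 16-entry integer stack are hypothetical stand-ins for the PUSH, POP, OpMult, and OpAdd subroutines, not the textbook’s actual code, and there is no overflow or underflow checking.

```c
#define STACK_MAX 16              /* hypothetical stack capacity */

static int stack[STACK_MAX];
static int top = 0;               /* number of items currently on the stack */

static void push(int v) { stack[top++] = v; }
static int  pop(void)   { return stack[--top]; }

/* Like OpMult: pop two operands, push their product. */
static void op_mult(void) { int b = pop(); int a = pop(); push(a * b); }

/* Like OpAdd: pop two operands, push their sum. */
static void op_add(void)  { int b = pop(); int a = pop(); push(a + b); }

/* Evaluate x = y + x*y the stack-based way and return the new x. */
int stack_eval(int x, int y)
{
    top = 0;
    push(y);
    push(x);
    push(y);      /* stack has [y][x][y] */
    op_mult();    /* stack has [y][x*y]  */
    op_add();     /* stack has [y+x*y]   */
    x = pop();    /* like JSR POP followed by ST R0,x */
    return x;
}
```

For example, stack_eval(3, 4) returns 16, the value of y + x*y, and leaves the stack empty.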

This works fine for simple statements, but it just won’t do for C. For example, in evaluating A[i]++, you can’t put the value of A[i] on the stack and then call something like OpPlusPlus, because ++ needs the address of A[i], not the value of A[i].
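Here is a hedged C-to-C sketch of sum += A[i]++ that shows why: the translation needs a temporary holding the location of A[i], since the element is read, incremented, and written back. It anticipates the pointer material deferred until later; the function add_and_increment and the names d1, d2, d3 (standing in for δ variables) are illustrative only.

```c
/* Translation of   sum += A[i]++ ;   into one-operator assignments. */
int add_and_increment(int A[], int i, int sum)
{
    int *d1;          /* location of A[i] */
    int  d2, d3;

    d1 = A + i;       /* &A[i]: ++ needs the location, not the value */
    d2 = *d1;         /* old value of A[i], which is the result of A[i]++ */
    d3 = d2 + 1;
    *d1 = d3;         /* store the incremented value back into A[i] */
    sum = sum + d2;   /* += adds the old value */
    return sum;
}
```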

Similarly for something like p == NULL || *p == '\0', you can’t put p == NULL and *p == '\0' on the stack and then call OpLogicalOR, because you shouldn’t even attempt the evaluation of *p == '\0' when p == NULL is true.
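Applying the || translation rule given later in this section to eolist = ( p == NULL || *p == '\0' ) yields code in which *p is read only on the path where p is known to be non-NULL. In this sketch, eolist_of is a hypothetical wrapper, d1, d2, d3 stand in for δ variables, and the labels L1, L2, L3 stand in for λ labels, since plain ASCII identifiers are the portable choice in C.

```c
#include <stddef.h>   /* NULL */

/* eolist = ( p == NULL || *p == '\0' ) ;  translated with the || rule. */
int eolist_of(const char *p)
{
    int  eolist, d1, d2;
    char d3;

    d1 = (p == NULL);
    if (d1 == 0) goto L1;
    eolist = 1;
    goto L2;
L1:
    d3 = *p;                 /* safe: p is known to be non-NULL here */
    d2 = (d3 == '\0');
    if (d2 == 0) goto L3;
    eolist = 1;
    goto L2;
L3:
    eolist = 0;
L2:
    return eolist;
}
```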

The C-to-C solution

Instead of searching for an automatic solution to expression evaluation, we’ll try an ad hoc approach where you translate a complex expression into a sequence of simple assignments where only one operator appears on the right-hand side. You'll need to use made-up variable names to do this, just like those δ variables used in the discussion of translating C control structures.

For a while, we’re going to ignore most of those complex C expressions that involve lvalues (locations). This means you are not going to see pointers, the & operator, or for that matter structures here. That will come later.

Parsing

You will need to parse your C code. This means you must pay attention to C’s rules of precedence to know the order in which operators are applied. In a real compiler, this part of the task is usually done with code generated by a parser generator such as yacc or bison.
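To show how precedence drives the order of the δ assignments, here is a small hand-written recursive-descent translator, a simple alternative to a yacc- or bison-generated parser. It is a sketch under strong assumptions: single-letter variables, only + - * / and parentheses, no spaces, and no error checking. The names translate, expr, term, factor, and emit are hypothetical, and temporaries d1, d2, ... stand in for δ1, δ2, ....

```c
#include <stdio.h>
#include <string.h>

typedef struct { char name[16]; } Operand;   /* a variable or a temporary */

static const char *src;     /* next unread input character */
static char out[1024];      /* generated one-operator assignments */
static int ntemp;           /* number of temporaries used so far */

static Operand expr(void);

/* Emit "dN = a op b ;" and return dN as the new operand. */
static Operand emit(Operand a, char op, Operand b)
{
    Operand r;
    char line[64];
    snprintf(r.name, sizeof r.name, "d%d", ++ntemp);
    snprintf(line, sizeof line, "%s = %s %c %s ;\n", r.name, a.name, op, b.name);
    strcat(out, line);
    return r;
}

/* factor := letter | '(' expr ')' */
static Operand factor(void)
{
    Operand r;
    if (*src == '(') {
        src++;                /* skip '(' */
        r = expr();
        src++;                /* skip ')' */
    } else {
        r.name[0] = *src++;
        r.name[1] = '\0';
    }
    return r;
}

/* term := factor { ('*' | '/') factor }   binds tighter than + and - */
static Operand term(void)
{
    Operand left = factor();
    while (*src == '*' || *src == '/') {
        char op = *src++;
        left = emit(left, op, factor());
    }
    return left;
}

/* expr := term { ('+' | '-') term }   left-associative */
static Operand expr(void)
{
    Operand left = term();
    while (*src == '+' || *src == '-') {
        char op = *src++;
        left = emit(left, op, term());
    }
    return left;
}

/* Translate "target = expression" and return the generated statements. */
const char *translate(char target, const char *expression)
{
    Operand result;
    char line[64];
    src = expression;
    out[0] = '\0';
    ntemp = 0;
    result = expr();
    snprintf(line, sizeof line, "%c = %s ;\n", target, result.name);
    strcat(out, line);
    return out;
}
```

For example, translate('x', "y+x*y") produces “d1 = x * y ;”, “d2 = y + d1 ;”, and “x = d2 ;”: the multiplication comes first because term handles * before expr handles +. The final copy is one a real compiler would fold into the last assignment.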

The simple operators

The simple operators are the arithmetic operators, the bitwise logical operators, the relational operators, and even function calls.

Getting the idea

For example, a statement such as “x = z*sin(f*d) + k” would be translated to a sequence of C statements similar to the following:

δ1 = f*d ;
δ2 = sin(δ1) ;
δ3 = z*δ2 ;
x = δ3 + k;

Notice that there is only one operator on the right-hand side of each statement.

The really simple and not-so-simple operators

Very simple statements, such as “x = δ3 + k”, can be implemented with a couple of instructions from your target machine’s instruction set. Some operators, such as multiplication or division, may need to be translated into calls to specialized functions written for your machine architecture. For example, f*d may need to be replaced with something like _MultiplyDouble(f, d). The implementation of function calls depends heavily on the computer’s ABI (Application Binary Interface) and will not be covered here.
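As an illustration of such a helper, here is a sketch of 16-bit integer multiplication by shift-and-add, for a machine such as the LC-3 that has no multiply instruction. The name mult16 is hypothetical, this is an integer routine rather than the floating-point _MultiplyDouble mentioned above, and the result wraps modulo 2^16 as a 16-bit register would. Doubling with + stands in for a left shift, since the LC-3 has no shift instruction either.

```c
/* Hypothetical runtime helper: 16-bit multiply using only addition and
   bitwise AND.  The product wraps modulo 2^16, like a 16-bit register. */
unsigned short mult16(unsigned short a, unsigned short b)
{
    unsigned short product = 0;
    unsigned short mask = 1;        /* selects one bit of b at a time */
    int i;

    for (i = 0; i < 16; i++) {
        if ((b & mask) != 0)
            product = product + a;  /* add the shifted multiplicand */
        a = a + a;                  /* shift a left by doubling it */
        mask = mask + mask;         /* move on to the next bit of b */
    }
    return product;
}
```

Because two’s complement multiplication modulo 2^16 gives the same bit pattern for signed and unsigned operands, the same routine serves for 16-bit signed int as well.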

When you implement the relational operators, such as > and ==, you must make sure that these operators return either 0, for false, or 1, for true.
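The 0-or-1 rule matters when a cheap machine test leaves some other nonzero value. This sketch assumes equality is tested with exclusive OR (a ^ b is zero exactly when a == b) and normalizes the result with a branch; count_equal shows code that relies on each comparison contributing exactly 1. Both function names are hypothetical, and d1, L1, L2 stand in for δ and λ names.

```c
/* t = (a == b) ;  with the result forced to exactly 0 or 1. */
int equal01(int a, int b)
{
    int t, d1;

    d1 = a ^ b;            /* zero exactly when a == b; otherwise some
                              nonzero value that need not be 1 */
    if (d1 == 0) goto L1;
    t = 0;
    goto L2;
L1:
    t = 1;
L2:
    return t;
}

/* Relies on the 0-or-1 guarantee: how many of x, y, z equal k? */
int count_equal(int x, int y, int z, int k)
{
    return equal01(x, k) + equal01(y, k) + equal01(z, k);
}
```

Had the translation returned d1 itself as “true”, count_equal would add meaningless bit patterns instead of counting.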

The more complex operators

There are three C operators that have short circuits, that is, they may not evaluate all of their operands before returning a result. These operators are &&, ||, and the ? : ternary operator. These can be implemented using C’s if construct, which can then be translated using the control structure rules.

The following rules show how each of these operators is translated.

For &&, the statement

τ = exp1 && exp2 ;

becomes

if (! (exp1))
   τ = 0 ;
else if (! (exp2))
   τ = 0 ;
else
   τ = 1 ;

For ||, the statement

τ = exp1 || exp2 ;

becomes

if (exp1)
   τ = 1 ;
else if (exp2)
   τ = 1 ;
else
   τ = 0 ;

For ? :, the statement

τ = exp1 ? exp2 : exp3 ;

becomes

if (exp1)
   τ = exp2 ;
else
   τ = exp3 ;
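These rules can be checked in ordinary C. Because C evaluates function arguments before the call, this sketch passes each operand as a function pointer (the hypothetical Thunk type), so calling it is the moment of evaluation, and a counter records how many operands were actually evaluated. The names and_rule, or_rule, cond_rule, and the val_* helpers are illustrative.

```c
typedef int (*Thunk)(void);

static int evaluations = 0;          /* how many operands were evaluated */

static int val_zero(void)  { evaluations++; return 0; }
static int val_one(void)   { evaluations++; return 1; }
static int val_seven(void) { evaluations++; return 7; }

/* τ = exp1 && exp2 ; */
int and_rule(Thunk exp1, Thunk exp2)
{
    int t;
    if (! (exp1()))
        t = 0;
    else if (! (exp2()))
        t = 0;
    else
        t = 1;
    return t;
}

/* τ = exp1 || exp2 ; */
int or_rule(Thunk exp1, Thunk exp2)
{
    int t;
    if (exp1())
        t = 1;
    else if (exp2())
        t = 1;
    else
        t = 0;
    return t;
}

/* τ = exp1 ? exp2 : exp3 ; */
int cond_rule(Thunk exp1, Thunk exp2, Thunk exp3)
{
    int t;
    if (exp1())
        t = exp2();
    else
        t = exp3();
    return t;
}
```

With these, and_rule(val_zero, val_seven) returns 0 after evaluating only one operand, and and_rule(val_seven, val_seven) returns 1, not 7, as C requires.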

An example

Keep in mind that the operand expressions in the rules above must not be evaluated before their time: for example, exp2 of && is evaluated only after exp1 has been found to be nonzero.

As an example, let’s look at an expression that is true if either x equals 5 or both y and z equal 5.

τ = (x==5 || y==5 && z==5) ;

In the first step of translation, the || is “simplified”.

if (x==5)
  τ = 1 ;
else if (y==5 && z==5)
  τ = 1 ;
else
  τ = 0 ;

Now, we must simplify the ifs using the rules of Translating C to C: Control structures. We start with the first one.

  δ1 = (x==5) ;
  if (δ1 == 0) goto λ1 ;
  τ = 1 ;
  goto λ2 ;
λ1:
  if (y==5 && z==5)
    τ = 1 ;
  else
    τ = 0 ;
λ2:

Now let’s try the second one.

  δ1 = (x==5) ;
  if (δ1 == 0) goto λ1 ;
  τ = 1 ;
  goto λ2 ;
λ1:
  δ2 = (y==5 && z==5) ;
  if (δ2 == 0) goto λ3 ;
  τ = 1 ;
  goto λ4 ;
λ3:
  τ = 0 ;
λ4:
λ2:

That’s already looking bad and we still have the expression with &&. Let’s try that one now.

  δ1 = (x==5) ;
  if (δ1 == 0) goto λ1 ;
  τ = 1 ;
  goto λ2 ;
λ1:
  if (! (y==5))
    δ2 = 0 ;
  else if (! (z==5))
    δ2 = 0 ;
  else
    δ2 = 1 ;
  if (δ2 == 0) goto λ3 ;
  τ = 1 ;
  goto λ4 ;
λ3:
  τ = 0 ;
λ4:
λ2:

Now the if after λ1:

  δ1 = (x==5) ;
  if (δ1 == 0) goto λ1 ;
  τ = 1 ;
  goto λ2 ;
λ1:
  δ3 = (y!=5) ;
  if (δ3 == 0) goto λ5 ;
  δ2 = 0 ;
  goto λ6 ;
λ5:
  if (! (z==5))
    δ2 = 0 ;
  else
    δ2 = 1 ;
λ6:
  if (δ2 == 0) goto λ3 ;
  τ = 1 ;
  goto λ4 ;
λ3:
  τ = 0 ;
λ4:
λ2:

And finally, the if after λ5.

  δ1 = (x==5) ;
  if (δ1 == 0) goto λ1 ;
  τ = 1 ;
  goto λ2 ;
λ1:
  δ3 = (y!=5) ;
  if (δ3 == 0) goto λ5 ;
  δ2 = 0 ;
  goto λ6 ;
λ5:
  δ4 = (z!=5) ;
  if (δ4 == 0) goto λ7 ;
  δ2 = 0 ;
  goto λ8 ;
λ7:
  δ2 = 1 ;
λ8:
λ6:
  if (δ2 == 0) goto λ3 ;
  τ = 1 ;
  goto λ4 ;
λ3:
  τ = 0 ;
λ4:
λ2:

Merging those redundant labels helps a little, but not much. There’s just no way to implement these short circuits without some serious gotos.

  δ1 = (x==5) ;
  if (δ1 == 0) goto λ1 ;
  τ = 1 ;
  goto λ2 ;
λ1:
  δ3 = (y!=5) ;
  if (δ3 == 0) goto λ5 ;
  δ2 = 0 ;
  goto λ6 ;
λ5:
  δ4 = (z!=5) ;
  if (δ4 == 0) goto λ7 ;
  δ2 = 0 ;
  goto λ6 ;
λ7:
  δ2 = 1 ;
λ6:
  if (δ2 == 0) goto λ3 ;
  τ = 1 ;
  goto λ2 ;
λ3:
  τ = 0 ;
λ2:

However, with some optimization, you can do quite a bit better.

  δ1 = (x==5) ;
  if (δ1 == 0) goto λ1 ;
  τ = 1 ;
  goto λ2 ;
λ1:
  δ3 = (y==5) ;
  if (δ3 == 0) goto λ3 ;
  δ4 = (z==5) ;
  if (δ4 == 0) goto λ3 ;
  τ = 1 ;
  goto λ2 ;
λ3:
  τ = 0 ;
λ2:
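To gain some confidence in the optimized translation, it can be wrapped in a function and compared with C’s own operators on every combination of 4 and 5 for x, y, and z. In this sketch, translated and agrees_everywhere are hypothetical names, d1, d3, d4 and L1, L2, L3 stand in for the δ and λ names, and the third comparison tests z, as the original expression requires.

```c
/* Optimized translation of   τ = (x==5 || y==5 && z==5) ;   */
int translated(int x, int y, int z)
{
    int t, d1, d3, d4;

    d1 = (x == 5);
    if (d1 == 0) goto L1;
    t = 1;
    goto L2;
L1:
    d3 = (y == 5);
    if (d3 == 0) goto L3;
    d4 = (z == 5);
    if (d4 == 0) goto L3;
    t = 1;
    goto L2;
L3:
    t = 0;
L2:
    return t;
}

/* Compare with the original expression; && binds tighter than ||,
   so the explicit parentheses do not change its meaning. */
int agrees_everywhere(void)
{
    int x, y, z;
    for (x = 4; x <= 5; x++)
        for (y = 4; y <= 5; y++)
            for (z = 4; z <= 5; z++)
                if (translated(x, y, z) != (x == 5 || (y == 5 && z == 5)))
                    return 0;
    return 1;
}
```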

Pointers

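The & (address-of) operator can often be implemented by using an LC-3 instruction that computes an address in place of the one that loads from that address. For a global variable, use LEA rather than LD: if the C statement “N = M” uses “LD  R0,M” to retrieve M, then the C statement “P = &M” uses “LEA  R0,M” to compute &M. LEA only forms PC-relative addresses, so for a local variable addressed from the frame pointer R5, use ADD rather than LDR: if “N = M” uses “LDR  R0,R5,#10” to retrieve M, then “P = &M” uses “ADD  R0,R5,#10” to compute &M.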
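A toy C model makes the load versus address distinction concrete. It assumes memory is a small array, R5 holds a frame address, and the local variable M sits at offset 10 from R5; memory, load_M, and address_of_M are hypothetical. Reading M touches memory at R5+10, while &M is just the number R5+10.

```c
/* Toy memory; real LC-3 memory has 2^16 words. */
static unsigned short memory[64];

/* N = M ;   read the word stored at R5+10 */
unsigned short load_M(unsigned short r5)
{
    return memory[r5 + 10];
}

/* P = &M ;  compute R5+10 without touching memory */
unsigned short address_of_M(unsigned short r5)
{
    return r5 + 10;
}
```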

Structures

Consider each structure as having its own symbol table. This was mentioned in the Structures in C lecture.
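A C sketch of the idea: two structure types can each have a field named x, and looking x up in the right structure’s table yields a different offset. The types struct seg and struct point and the function get_point_x are hypothetical; offsetof from <stddef.h> plays the role of the symbol-table lookup, and get_point_x shows s.x rewritten as an address computation followed by a load.

```c
#include <stddef.h>   /* offsetof */

/* Both types have a field named x, at different offsets. */
struct seg   { int x; int len; };
struct point { int y; int x; };

/* v = s->x ;  for a struct point: look x up in struct point's table
   to get its offset, add it to the base address, then load. */
int get_point_x(struct point *s)
{
    char *base  = (char *) s;
    int  *field = (int *) (base + offsetof(struct point, x));
    return *field;
}
```

The first member of a structure is always at offset 0, so x is at offset 0 in struct seg but at a positive offset in struct point.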