Translating C to C: Simple Expressions

Expression evaluation is the process of translating expressions, such as the following, into machine instructions.

sum += A[i]++ ;
eolist = ( p == NULL || *p == '\0' ) ;

The problem with many textbook methods

Good expression evaluation is hard to do. Many textbooks present stack-based solutions. For example, in Patt and Patel's Introduction to Computing Systems, a C statement like "x = y + x*y" would be implemented with something like the following:

        LD    R0,y
        JSR   PUSH
        LD    R0,x
        JSR   PUSH
        LD    R0,y
        JSR   PUSH        ;; stack has [y][x][y]
        JSR   OpMult      ;; stack has [y][x*y]
        JSR   OpAdd       ;; stack has [y+x*y]
        JSR   POP         ;; R0 is [y+x*y]
        ST    R0,x
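
To make the stack discipline concrete, here is a minimal C sketch of the same evaluation. The names push, pop, op_mult, and op_add, as well as the fixed-size stack, are assumptions made for illustration and are not taken from Patt and Patel.

```c
#include <stddef.h>

/* A tiny operand stack; the size is arbitrary for this sketch. */
static int stack[16];
static size_t top = 0;

static void push(int v) { stack[top++] = v; }
static int  pop(void)   { return stack[--top]; }

/* Each operator pops its two operands and pushes the result. */
static void op_mult(void) { int b = pop(); int a = pop(); push(a * b); }
static void op_add(void)  { int b = pop(); int a = pop(); push(a + b); }

/* Evaluates y + x*y in the same order as the assembly above. */
int eval_y_plus_x_times_y(int x, int y)
{
    push(y);
    push(x);
    push(y);      /* stack has [y][x][y] */
    op_mult();    /* stack has [y][x*y]  */
    op_add();     /* stack has [y+x*y]   */
    return pop(); /* caller stores the result */
}
```

Every operator here sees only values that were already computed and pushed, which is exactly the limitation discussed next.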

This works fine for simple statements, but it just won't do for C. For example, in evaluating A[i]++, you can't put the value of A[i] on the stack and then call something like OpPlusPlus, because ++ needs the address of A[i], not the value of A[i].
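
As a rough sketch of why the address matters, the function below carries out sum += A[i]++ in simple steps. The function name and the temporaries addr and old are hypothetical, and the sketch uses a pointer only to make the location explicit.

```c
/* Performs sum += A[i]++ in simple steps and returns the new sum. */
int add_and_increment(int sum, int A[], int i)
{
    int *addr = &A[i];   /* the location of A[i], not its value      */
    int old = *addr;     /* the value of A[i]++ is the old contents  */
    *addr = old + 1;     /* the side effect needs the address        */
    sum = sum + old;
    return sum;
}
```

With A = {10, 20, 30}, sum = 5, and i = 1, the call returns 25 and leaves A[1] equal to 21.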

Similarly, for something like p == NULL || *p == '\0', you can't put p == NULL and *p == '\0' on the stack and then call OpLogicalOR, because you shouldn't even attempt the evaluation of *p == '\0' when p == NULL is true.

The C-to-C solution

Instead of searching for an automatic solution to expression evaluation, try an ad hoc approach where you translate a complex expression into a sequence of simple assignments where only one operator appears on the right-hand side. You'll need to use made-up variable names to do this, just like those δ variables used in the discussion of translating C control structures.

On this page, we're going to ignore most of the complex C expressions that involve lvalues (locations). This means you are not going to see pointers, the & operator, or for that matter structures here. That will come later.

Parsing

You will need to parse your C code. This means you must pay attention to C's rules of precedence and associativity to know the order in which operators are applied. In a real compiler, this part of the task is usually done with code generated by a parser generator such as yacc or bison.
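
As a small illustration of how precedence and associativity fix the order of the simple steps, consider the two sketches below. The function names and the temporary d1 (standing in for a δ variable) are made up for this example.

```c
/* r = a + b * c : * binds tighter than +, so b * c is computed first. */
int precedence_example(int a, int b, int c)
{
    int d1 = b * c;
    int r  = a + d1;
    return r;
}

/* r = a - b - c : - is left-associative, so a - b is computed first. */
int associativity_example(int a, int b, int c)
{
    int d1 = a - b;
    int r  = d1 - c;
    return r;
}
```

Getting either order wrong changes the answer: computing a + b first in the first function, or b - c first in the second, gives a different result for most inputs.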

The simple operators

The simple operators are the arithmetic operators, the bitwise logical operators, the relational operators, and even function calls.

Getting the idea

For example, a statement such as "x = z*sin(f*d) + k" would be translated to a sequence of C statements similar to the following:

δ1 = f*d ;
δ2 = sin(δ1) ;
δ3 = z*δ2 ;
x = δ3 + k;

Notice that there is only one operator (or one function call) on the right-hand side of each statement.
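
Here is a compilable sketch of the same shape of translation. To keep it exact and free of the math library, it assumes int operands and uses the standard abs() in place of sin(); the names d1, d2, and d3 stand in for δ1, δ2, and δ3.

```c
#include <stdlib.h>

/* Direct form: x = z*abs(f*d) + k */
int direct(int z, int f, int d, int k)
{
    return z * abs(f * d) + k;
}

/* Translated form: one operator or one call per statement. */
int translated(int z, int f, int d, int k)
{
    int d1, d2, d3, x;
    d1 = f * d;
    d2 = abs(d1);
    d3 = z * d2;
    x  = d3 + k;
    return x;
}
```

Both functions compute the same value for the same arguments; the second simply names every intermediate result.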

The really simple and not-so-simple operators

Very simple statements, such as "x = δ3 + k", can be implemented with a couple of instructions from your target machine's instruction set. Some operators, such as multiplication or division, may need to be translated into calls to specialized functions written for your machine architecture. For example, f*d may need to be replaced with something like _MultiplyDouble(f, d). The implementation of function calls depends heavily on the computer's ABI (Application Binary Interface) and will not be covered here.
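
For instance, on a machine with no multiply instruction, an unsigned integer multiplication might become a call to a shift-and-add routine. The sketch below is one possible version; the name multiply_u32 is hypothetical, and a real runtime library would choose its own name and calling convention.

```c
#include <stdint.h>

/* Shift-and-add multiplication using only add, shift, and bit test.
   The result is the product modulo 2^32, as for unsigned * in C. */
uint32_t multiply_u32(uint32_t a, uint32_t b)
{
    uint32_t product = 0;
    while (b != 0) {
        if (b & 1u)          /* low bit of b set: add the current a */
            product += a;
        a <<= 1;             /* a doubles for the next bit of b */
        b >>= 1;
    }
    return product;
}
```

A statement such as δ1 = f*d would then be emitted as δ1 = multiply_u32(f, d) when f and d are unsigned 32-bit integers.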

When you implement the relational operators, such as > and ==, you must make sure that these operators return either 0, for false, or 1, for true.
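
Many machines offer only a compare or subtract followed by a conditional branch, so the 0 or 1 has to be produced explicitly. The following sketch shows one way this might look for ==; the function name rel_eq and its labels are made up for illustration.

```c
/* Returns exactly 1 when x == y and exactly 0 otherwise.
   x ^ y is zero precisely when the two operands are equal. */
int rel_eq(int x, int y)
{
    int t;
    if (x ^ y) goto not_equal;   /* branch if the operands differ */
    t = 1;
    goto done;
not_equal:
    t = 0;
done:
    return t;
}
```

Because the result is always 0 or 1, it can safely be used as an ordinary integer, for example in rel_eq(a, 5) + rel_eq(b, 5) to count how many of a and b equal 5.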

The more complex operators

There are three C operators that short-circuit, that is, they may not evaluate all of their operands before returning a result. These operators are &&, ||, and the ? : ternary operator. These can be implemented using C's if construct, which can then be translated using the control structure rules.

The following table shows the translation rules for these operators.

τ = exp1 && exp2 ;

if (! (exp1))
   τ = 0 ;
else if (! (exp2))
   τ = 0 ;
else
   τ = 1 ;

τ = exp1 || exp2 ;

if (exp1)
   τ = 1 ;
else if (exp2)
   τ = 1 ;
else
   τ = 0 ;

τ = exp1 ? exp2 : exp3 ;

if (exp1)
   τ = exp2 ;
else
   τ = exp3 ;
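
As a sketch of the || rule in use, here is the statement eolist = ( p == NULL || *p == '\0' ) from the top of the page, written out by hand. Wrapping it in a function named eolist_of is an assumption made so the example can be compiled on its own.

```c
#include <stddef.h>

/* eolist = ( p == NULL || *p == '\0' ), expanded with the || rule. */
int eolist_of(const char *p)
{
    int eolist;
    if (p == NULL)
        eolist = 1;
    else if (*p == '\0')   /* reached only when p is not NULL */
        eolist = 1;
    else
        eolist = 0;
    return eolist;
}
```

Calling eolist_of(NULL) is safe because the dereference *p sits in a branch that is never entered when p is NULL.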

An example

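Keep in mind that the operand expressions (exp1, exp2, and exp3) in the rules above must not be evaluated before their time. Each one is evaluated only when control reaches the if or the assignment in which it appears.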
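
One way to see the timing is to give the operands a visible side effect. In the sketch below, probe() is a hypothetical operand that counts how often it is evaluated; and_rule and cond_rule apply the && and ? : rules to it.

```c
static int calls = 0;   /* number of operand evaluations so far */

/* Hypothetical operand: returns its argument and counts the call. */
static int probe(int value) { calls++; return value; }

/* t = probe(a) && probe(b), expanded with the && rule. */
int and_rule(int a, int b)
{
    int t;
    if (!probe(a))
        t = 0;
    else if (!probe(b))   /* evaluated only if probe(a) was nonzero */
        t = 0;
    else
        t = 1;
    return t;
}

/* t = probe(a) ? probe(b) : probe(c), expanded with the ? : rule. */
int cond_rule(int a, int b, int c)
{
    int t;
    if (probe(a))
        t = probe(b);     /* only one of probe(b), probe(c) runs */
    else
        t = probe(c);
    return t;
}

int  probe_calls(void)       { return calls; }
void reset_probe_calls(void) { calls = 0; }
```

After and_rule(0, 7) the counter has advanced by one, not two, and after cond_rule(1, 10, 20) it has advanced by two, not three.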

As an example, let's look at an expression that is true if exactly one of x and y is equal to 5.

τ = (x==5 && y!=5) || (x!=5 && y==5) ;

In the first step of translation, the || is "simplified".

if (x==5 && y!=5)
  τ = 1 ;
else if (x!=5 && y==5)
  τ = 1 ;
else
  τ = 0 ;

Now, we must simplify the ifs using the rules of Translating C to C: Control structures. We start with the first one.

  δ1 = (x==5 && y!=5) ;
  if (δ1 == 0) goto λ1 ;
  τ = 1 ;
  goto λ2 ;
λ1:
  if (x!=5 && y==5)
    τ = 1 ;
  else
    τ = 0 ;
λ2:

Now let's try the second one.

  δ1 = (x==5 && y!=5) ;
  if (δ1 == 0) goto λ1 ;
  τ = 1 ;
  goto λ2 ;
λ1:
  δ2 = (x!=5 && y==5) ;
  if (δ2 == 0) goto λ3 ;
  τ = 1 ;
  goto λ4 ;
λ3:
  τ = 0 ;
λ4:
λ2:

That's already looking bad and we still have two expressions with &&. Let's go ahead and do both.

  if (! (x==5))
    δ1 = 0 ;
  else if ( ! (y!=5))
    δ1 = 0 ;
  else
    δ1 = 1 ;
  if (δ1 == 0) goto λ1 ;
  τ = 1 ;
  goto λ2 ;
λ1:
  if (! (x!=5))
    δ2 = 0 ;
  else if (! (y==5))
    δ2 = 0 ;
  else
    δ2 = 1 ;
  if (δ2 == 0) goto λ3 ;
  τ = 1 ;
  goto λ4 ;
λ3:
  τ = 0 ;
λ4:
λ2:

Now the first if.

  δ3 = ! (x==5) ;
  if (δ3 == 0) goto λ5 ;
  δ1 = 0 ;
  goto λ6 ;
λ5:
  if ( ! (y!=5))
    δ1 = 0 ;
  else
    δ1 = 1 ;
λ6:
  if (δ1 == 0) goto λ1 ;
  τ = 1 ;
  goto λ2 ;
λ1:
  if (! (x!=5))
    δ2 = 0 ;
  else if (! (y==5))
    δ2 = 0 ;
  else
    δ2 = 1 ;
  if (δ2 == 0) goto λ3 ;
  τ = 1 ;
  goto λ4 ;
λ3:
  τ = 0 ;
λ4:
λ2:

Now the second if, which used to be inside the else.

  δ3 = ! (x==5) ;
  if (δ3 == 0) goto λ5 ;
  δ1 = 0 ;
  goto λ6 ;
λ5:
  δ4 = ! (y!=5) ;
  if (δ4 == 0) goto λ7 ;
  δ1 = 0 ;
  goto λ8 ;
λ7:
  δ1 = 1 ;
λ8:
λ6:
  if (δ1 == 0) goto λ1 ;
  τ = 1 ;
  goto λ2 ;
λ1:
  if (! (x!=5))
    δ2 = 0 ;
  else if (! (y==5))
    δ2 = 0 ;
  else
    δ2 = 1 ;
  if (δ2 == 0) goto λ3 ;
  τ = 1 ;
  goto λ4 ;
λ3:
  τ = 0 ;
λ4:
λ2:

This is getting ridiculous. Go ahead and finish off the entire remaining if else if else.

  δ3 = ! (x==5) ;
  if (δ3 == 0) goto λ5 ;
  δ1 = 0 ;
  goto λ6 ;
λ5:
  δ4 = ! (y!=5) ;  
  if (δ4 == 0) goto λ7 ;
  δ1 = 0 ;
  goto λ8 ;
λ7:
  δ1 = 1 ;
λ8:
λ6:
  if (δ1 == 0) goto λ1 ;
  τ = 1 ;
  goto λ2 ;
λ1:
  δ5 = ! (x!=5) ;
  if (δ5 == 0) goto λ9 ;
  δ2 = 0 ;
  goto λ10 ;
λ9:
  δ6 = ! (y==5) ;
  if (δ6 == 0) goto λ11 ;
  δ2 = 0 ;
  goto λ12 ;
λ11:
  δ2 = 1 ;
λ12:
λ10:
  if (δ2 == 0) goto λ3 ;
  τ = 1 ;
  goto λ4 ;
λ3:
  τ = 0 ;
λ4:
λ2:

Perhaps this would be clearer if we did a little "optimization". Let's rename all those variables from numbers to letters in a sensible order. Also rename the labels to use letters in a sensible order and eliminate the double labeling of single code locations. And finally, get rid of those !s on the right-hand side of the assignments by negating the comparisons.

  δa = (x!=5) ;
  if (δa == 0) goto λa ;
  δb = 0 ;
  goto λc ;
λa:
  δc = (y==5) ;  
  if (δc == 0) goto λb ;
  δb = 0 ;
  goto λc ;
λb:
  δb = 1 ;
λc:
  if (δb == 0) goto λd ;
  τ = 1 ;
  goto λi ;
λd:
  δd = (x==5) ;
  if (δd == 0) goto λe ;
  δe = 0 ;
  goto λg ;
λe:
  δf = (y!=5) ;
  if (δf == 0) goto λf ;
  δe = 0 ;
  goto λg ;
λf:
  δe = 1 ;
λg:
  if (δe == 0) goto λh ;
  τ = 1 ;
  goto λi ;
λh:
  τ = 0 ;
λi:

Don't do this at home. Or better yet, start with this C statement.

τ = (x==5) != (y==5) ;
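
To check that the long goto version and the one-line version really agree, the sketch below transcribes both into compilable C and compares them over a range of inputs. ASCII names (da, La, and so on) stand in for the δ and λ names, and the function names long_form, short_form, and forms_agree are made up for this check.

```c
/* Hand-translated goto version of
   t = (x==5 && y!=5) || (x!=5 && y==5) */
int long_form(int x, int y)
{
    int t, da, db, dc, dd, de, df;

    da = (x != 5);
    if (da == 0) goto La;
    db = 0;
    goto Lc;
La:
    dc = (y == 5);
    if (dc == 0) goto Lb;
    db = 0;
    goto Lc;
Lb:
    db = 1;
Lc:
    if (db == 0) goto Ld;
    t = 1;
    goto Li;
Ld:
    dd = (x == 5);
    if (dd == 0) goto Le;
    de = 0;
    goto Lg;
Le:
    df = (y != 5);
    if (df == 0) goto Lf;
    de = 0;
    goto Lg;
Lf:
    de = 1;
Lg:
    if (de == 0) goto Lh;
    t = 1;
    goto Li;
Lh:
    t = 0;
Li:
    return t;
}

/* The one-line alternative; relies on == yielding exactly 0 or 1. */
int short_form(int x, int y)
{
    return (x == 5) != (y == 5);
}

/* Returns 1 if both forms agree for every x and y in [lo, hi]. */
int forms_agree(int lo, int hi)
{
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            if (long_form(x, y) != short_form(x, y))
                return 0;
    return 1;
}
```

Calling forms_agree(0, 10) covers all four cases (both equal to 5, only x, only y, neither) many times over.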