Translating C to C: Control Structures

The ultimate goal

The C programming language, like all modern programming languages, has three basic types of control structures: sequence, selection and iteration.

The sequence control structure is the statement block: a collection of statements inside curly braces, { and }. The selection control structures are if, if – else and switch. The iteration control structures are while, do – while and for. All of these control structures are common to C, C++ and Java.

C and C++ also support the goto, which can be used to make a transfer to a labeled statement. Java does not have a goto, but it does support the use of labels with the break and continue, which can result in goto-like code.

Most C/C++/Java programmers realize that the for can easily be rewritten using a while and that a switch can be replaced with a series of if – else statements. We’re going to do something similar, but more drastic, in this handout. We are going to replace all the control structures of C using only two very simple statements, a conditional goto and an unconditional goto:
if (τ) goto λ ;
or
goto λ ;
where τ is an integer variable and λ is a label. The test is sometimes written as τ == 0 so that the branch is taken when τ is false.

Of course, the real goal is to translate C into a machine or assembly language. To do that, we eventually need to learn how to allocate storage, evaluate expressions, and call functions. But let’s forget that for now.

Translating the `if` statement

The if statement has the following form where statement represents the body of the if statement.

if (expression)
  statement

It’s not hard to transform this code into a sequence of C statements similar to the following:

  int τ_x = expression ;
  if (τ_x == 0) goto λ_y ;
  statement
λ_y :

In this code segment, τ_x is a C variable that is uniquely generated for the purpose of this translation. Similarly, λ_y represents a uniquely generated label. The goto is taken, skipping statement, when the expression evaluates to 0 (false).
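
To make the pattern concrete, here is a minimal sketch that can be compiled. The function names are made up for this illustration, and t1 and L1 are ASCII stand-ins for the generated names τ_x and λ_y. The temporary is declared at the top of the function so that no goto jumps over a declaration.

```c
/* Hypothetical example: replace a negative value with zero. */
int clamp_structured(int n)
{
    if (n < 0)
        n = 0;
    return n;
}

/* The same function using only a conditional goto. */
int clamp_goto(int n)
{
    int t1;                 /* generated temporary (stands in for τ_x) */

    t1 = (n < 0);           /* evaluate the condition once */
    if (t1 == 0) goto L1;   /* condition false: skip the body */
    n = 0;                  /* body of the if */
L1:                         /* generated label (stands in for λ_y) */
    return n;
}
```

Both functions should return the same value for every input, for example 0 for -5 and 7 for 7.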

A nested example

These rules must be applied recursively to translate a program. Suppose we have been asked to translate the following more complex C if statement.

if (n % 4 == 0) {
  ++julianLeap ;
  if (n % 400 == 0 || n % 100 != 0) {
    ++gregorianLeap ;
  }
}

In this case, two unique data variables, τ₁ and τ₂, along with two unique labels, λ₁ and λ₂, would be needed.

The translated code would look something like the following.

  τ₁ = ( n % 4 == 0 ) ;
  if (τ₁ == 0) goto λ₁ ;
  ++julianLeap ;
  τ₂ = ( n % 400 == 0 || n % 100 != 0 ) ;
  if (τ₂ == 0) goto λ₂ ;
  ++gregorianLeap ;
λ₂ :
λ₁ :

Notice how the inner if is translated inside the outer if.
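
The nested translation can be checked with a small compilable sketch. Passing the two counters by pointer and the function names are assumptions made for this illustration; t1, t2, L1 and L2 stand in for τ₁, τ₂, λ₁ and λ₂.

```c
/* The original nested if, with the counters passed by pointer. */
void count_leap_structured(int n, int *julianLeap, int *gregorianLeap)
{
    if (n % 4 == 0) {
        ++*julianLeap;
        if (n % 400 == 0 || n % 100 != 0) {
            ++*gregorianLeap;
        }
    }
}

/* The translated version: the inner if sits inside the outer one. */
void count_leap_goto(int n, int *julianLeap, int *gregorianLeap)
{
    int t1, t2;                            /* generated temporaries */

    t1 = (n % 4 == 0);
    if (t1 == 0) goto L1;                  /* skip the outer body */
    ++*julianLeap;
    t2 = (n % 400 == 0 || n % 100 != 0);
    if (t2 == 0) goto L2;                  /* skip the inner body */
    ++*gregorianLeap;
L2:                                        /* end of the inner if */
L1:                                        /* end of the outer if */
    return;
}
```

Calling either function for the years 1900, 2000, 2023 and 2024 should leave the Julian counter at 3 and the Gregorian counter at 2.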

Anything else?

The if – else can be translated by using two labels in the code, one for the else part and one for the end of the statement. For example, consider the following rather abstract C code.

if (expression)
  statement1
else
  statement2

It could be transformed as follows:

  int τ₁ = expression ;
  if (τ₁ == 0) goto λ₁ ;
  statement1
  goto λ₂ ;
λ₁ :
  statement2
λ₂ :

So the following example

if (a > b)
  m = a ;
else
  m = b ;

would be changed to

  int τ₁ = (a > b) ;
  if (τ₁ == 0) goto λ₁ ;
  m = a ;
  goto λ₂ ;
λ₁ :
  m = b ;
λ₂ :
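
The same rule can be applied recursively to an else – if chain. The following sketch is one possible translation, not the only one; the sign function and its names are invented for this illustration, and t1, t2 and L1 through L4 stand in for generated temporaries and labels.

```c
/* Hypothetical example: return 1, -1 or 0 according to the sign of n. */
int sign_structured(int n)
{
    int s;
    if (n > 0)
        s = 1;
    else if (n < 0)
        s = -1;
    else
        s = 0;
    return s;
}

/* The inner if - else is translated inside the else part of the outer one. */
int sign_goto(int n)
{
    int s;
    int t1, t2;             /* generated temporaries */

    t1 = (n > 0);
    if (t1 == 0) goto L1;   /* outer condition false: go to the else part */
    s = 1;
    goto L2;                /* skip the outer else part */
L1:
    t2 = (n < 0);
    if (t2 == 0) goto L3;   /* inner condition false: go to the inner else */
    s = -1;
    goto L4;                /* skip the inner else part */
L3:
    s = 0;
L4:                         /* end of the inner if - else */
L2:                         /* end of the outer if - else */
    return s;
}
```

Each if – else contributes its own pair of labels, so the chain of two needs four.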

The switch

The switch statement isn’t pretty, so you can’t expect its transformation to be easily explained. However, the switch can be viewed as a series of if choices which select the target code for each choice. The break statements are replaced with gotos to the end of the switch.

Let’s do an example using the following silly C code.

switch (sizeNum) {
case '0':
case '1':
  sizeChar = 's' ;
  break ;
case '2':
  sizeChar = 'm' ;
  break ;
default:
  sizeChar = 'l' ;
}

This can be expressed without a switch as:

  if (sizeNum == '0' || sizeNum == '1')
    goto λ₁ ;
  else if (sizeNum == '2')
    goto λ₂ ;
  else
    goto λ₃ ;

λ₁:
  sizeChar = 's' ;
  goto λ₄ ;

λ₂:
  sizeChar = 'm' ;
  goto λ₄ ;

λ₃:
  sizeChar = 'l' ;

λ₄:
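
The selection code above still uses if – else and a compound condition. As a sketch of how it might be reduced further to the two permitted statements, each case constant can be tested with its own conditional goto. Wrapping the code in functions and the names t1, t2, t3 and L1 through L4 are assumptions made for this illustration.

```c
/* The original switch, wrapped in a function. */
char size_char_switch(char sizeNum)
{
    char sizeChar;
    switch (sizeNum) {
    case '0':
    case '1':
        sizeChar = 's';
        break;
    case '2':
        sizeChar = 'm';
        break;
    default:
        sizeChar = 'l';
    }
    return sizeChar;
}

/* The same selection using only conditional and unconditional gotos. */
char size_char_goto(char sizeNum)
{
    char sizeChar;
    int t1, t2, t3;          /* generated temporaries */

    t1 = (sizeNum == '0');
    if (t1) goto L1;         /* case '0' shares its code with case '1' */
    t2 = (sizeNum == '1');
    if (t2) goto L1;
    t3 = (sizeNum == '2');
    if (t3) goto L2;
    goto L3;                 /* no case matched: default */
L1:
    sizeChar = 's';
    goto L4;                 /* break */
L2:
    sizeChar = 'm';
    goto L4;                 /* break */
L3:
    sizeChar = 'l';
L4:                          /* end of the switch */
    return sizeChar;
}
```

Because every test branches when the comparison is true, this version uses the if (τ) goto λ form directly.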

The switch of C mimics the computed goto of FORTRAN, which in turn mimics the branch table. Evidently the designers of C and FORTRAN felt that they couldn’t convince programmers to give up assembler unless there was some form of a branch table.

Translating iterative statements

Translating the while isn’t hard. The code begins with a test that evaluates the continuation condition and exits the loop if it is false. At the end of the loop is a goto back to the beginning.

Consider the following abstract loop:

while (expression)
  statement

It can be translated into if-controlled code as:

λ₁:
  if (! expression) goto λ₂;
  statement
  goto λ₁ ;
λ₂:

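Many compilers generate the following because it makes the loop a tad faster: each iteration executes only the conditional goto at the bottom, instead of a conditional goto at the top plus an unconditional goto at the bottom.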

  goto λ₂ ;
λ₁:
  statement
λ₂:
  if (expression) goto λ₁ ;
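
Both loop layouts can be tried out with a small compilable sketch. The digit-counting loop and the function names are invented for this illustration; t1, L1 and L2 stand in for a generated temporary and the labels λ₁ and λ₂.

```c
/* Hypothetical example: count the decimal digits of a positive n. */
int digits_while(int n)
{
    int count = 0;
    while (n > 0) {
        ++count;
        n /= 10;
    }
    return count;
}

/* First layout: test at the top, unconditional goto at the bottom. */
int digits_test_at_top(int n)
{
    int count = 0;
    int t1;
L1:
    t1 = (n > 0);
    if (t1 == 0) goto L2;   /* condition false: leave the loop */
    ++count;
    n /= 10;
    goto L1;                /* back to the test */
L2:
    return count;
}

/* Second layout: jump to the test once, then test at the bottom. */
int digits_test_at_bottom(int n)
{
    int count = 0;
    int t1;
    goto L2;                /* enter the loop at the test */
L1:
    ++count;
    n /= 10;
L2:
    t1 = (n > 0);
    if (t1) goto L1;        /* condition true: run the body again */
    return count;
}
```

All three functions should agree, for example returning 0 for 0, 1 for 7 and 5 for 12345.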

Fortunately for us, the for is often described using the while. Consider a for statement like the following:

for(init ; condition ; increment)
  statement

It can be translated into the following while statement. Be sure to put the increment after the body of the for statement.

init ;
while(condition) {
  statement ;
  increment ;
}

And thus the following two sections of C code do the same thing.

for(i=0 ; i<10 ; ++i)
  sum = sum + i ;

i=0 ;
while(i<10) {
  sum = sum + i ;
  ++i ;
}
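
Combining this rewriting with the translation of the while gives a goto-only version of the for. The sketch below wraps the summing loop in functions whose names are made up for this illustration; t1, L1 and L2 stand in for a generated temporary and labels.

```c
/* The original for loop, wrapped in a function. */
int sum_for(void)
{
    int i, sum = 0;
    for (i = 0; i < 10; ++i)
        sum = sum + i;
    return sum;
}

/* The same loop after rewriting the for as a while and then as gotos. */
int sum_goto(void)
{
    int i, sum = 0;
    int t1;                 /* generated temporary */

    i = 0;                  /* init */
L1:
    t1 = (i < 10);          /* condition */
    if (t1 == 0) goto L2;   /* condition false: leave the loop */
    sum = sum + i;          /* body */
    ++i;                    /* increment, placed after the body */
    goto L1;
L2:
    return sum;
}
```

Both functions add the integers 0 through 9, so both return 45.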

We’re going to leave the do – while as an exercise for the reader. It’s really not hard. Just make the test at the end of the loop rather than the beginning.

Taking a `break` or `continue`ing

If a break statement is used inside a loop statement, it needs to be replaced by a goto that leaves the loop, without performing any of the tests for continuing the loop.

If a continue statement is used inside a while statement, it needs to be replaced by a goto that branches to the beginning of the loop. In this case, the tests for continuing the loop must be performed. A continue within a for statement should be replaced by a goto to the code where the for loop increment statement is performed.
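
Here is a sketch of both replacements inside a for loop. The example function, its names and the labels L_test, L_incr and L_exit are invented for this illustration. As a shortcut, each if that guards only a break or a continue is collapsed into a single conditional goto.

```c
/* Hypothetical example: sum the odd values of a[0..n-1],
   stopping at the first negative value. */
int sum_odd_structured(const int *a, int n)
{
    int i, sum = 0;
    for (i = 0; i < n; ++i) {
        if (a[i] < 0)
            break;
        if (a[i] % 2 == 0)
            continue;
        sum = sum + a[i];
    }
    return sum;
}

/* The same loop with break and continue replaced by gotos. */
int sum_odd_goto(const int *a, int n)
{
    int i, sum = 0;
    int t1, t2, t3;             /* generated temporaries */

    i = 0;                      /* init */
L_test:
    t1 = (i < n);
    if (t1 == 0) goto L_exit;   /* loop condition false */
    t2 = (a[i] < 0);
    if (t2) goto L_exit;        /* break: leave without retesting */
    t3 = (a[i] % 2 == 0);
    if (t3) goto L_incr;        /* continue: go to the increment */
    sum = sum + a[i];
L_incr:
    ++i;                        /* increment of the for */
    goto L_test;
L_exit:
    return sum;
}
```

For the array {1, 2, 3, 4, 5, -1, 7}, both functions add 1, 3 and 5 and stop at -1, returning 9.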