Subprogram control

The Copy-Rule

Copy-rule: the effect of a subprogram call is the same as if the subprogram's code were copied into the calling routine. But this view is not general. It assumes:

no recursion
no implicit calls
no co-routines
no scheduled calls - at a later point of time
no multi-tasking - single thread of execution

Activation record created anew at each call - and destroyed at return.

Need a (CIP, CEP) pair (current instruction pointer, current environment pointer).

At call: activation record created, CEP points to it, CIP points to first instruction of subprogram.

At return: activation record destroyed, CEP set to previous EP, CIP set to previous IP.

The (IP,EP) pair fixes the environment under which you execute. But how do we resolve non-local references?

Static and Dynamic Scope

Scope: range of statements over which a variable is known

Static scope - Scope is dependent on the syntax of the program.

Dynamic scope - Scope is determined by the execution of the program.

Static nested scope - A variable is accessible in the procedure it is declared in and in all procedures internal to that procedure, except that a new declaration of the same name in an internal procedure hides the outer variable: the scope of the inner declaration is excluded from the scope of the outer one.

A variable declared in a procedure is local in that procedure; otherwise it is global.

A review of static scope rules:

Q and T are declarations of procedures within P, so scope of names Q and T is same as scope of declaration a.
R and S are declarations of procedures in Q.
U is a declaration of a procedure in T.

Problem is: How to manage this execution stack?

Two pointers perform this function:
1. Dynamic link pointer points to activation record that called (invoked) the new activation record. It is used for returning from the procedure to the calling procedure.
2. Static link pointer points to the activation record that is global to the current activation record (i.e., points to the activation record of the procedure containing the declaration of this procedure).

Example

In R: C := B+A; C local, A and B global
For each variable, get pointer to proper activation record.
Assume AR is current activation record pointer (R).
1. B is one level back:
Follow AR.SL to get AR containing B.
Get R-value of B from fixed offset L-value
2. A is two levels back:
Follow (AR.SL).SL to get act. rec. containing A.
Add R-value of A from fixed offset L-value
3. C is local. AR points to correct activation record.
Store sum into L-value of C
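
The three steps above can be sketched in Python. This is only a sketch, not the layout any particular compiler uses: the ActivationRecord class, its field names, and the sample values of A and B are assumptions made for illustration, and dictionaries stand in for fixed offsets.

```python
class ActivationRecord:
    """Hypothetical activation record: locals plus static and dynamic links."""
    def __init__(self, name, local_vars, static_link=None, dynamic_link=None):
        self.name = name
        self.vars = dict(local_vars)   # stands in for fixed-offset storage
        self.SL = static_link          # record of the enclosing procedure
        self.DL = dynamic_link         # record of the caller

def follow_static_chain(ar, hops):
    """Follow `hops` static links starting from activation record `ar`."""
    for _ in range(hops):
        ar = ar.SL
    return ar

# Assumed nesting: P declares A, Q (inside P) declares B, R (inside Q) declares C.
p = ActivationRecord("P", {"A": 10})
q = ActivationRecord("Q", {"B": 5}, static_link=p, dynamic_link=p)
r = ActivationRecord("R", {"C": 0}, static_link=q, dynamic_link=q)

# C := B + A. The hop counts (1, 2, 0) are known at compile time.
b_value = follow_static_chain(r, 1).vars["B"]    # B is one level back
a_value = follow_static_chain(r, 2).vars["A"]    # A is two levels back
follow_static_chain(r, 0).vars["C"] = b_value + a_value   # C is local
print(r.vars["C"])   # prints 15
```

Only the number of links to follow and the offset are needed at run time; the names themselves play no role in the lookup.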

Advantages and disadvantages of Dynamic binding

Provides a great deal of program flexibility
When a name is used, the declaration that applies to that name cannot be determined by merely examining the program.
Type may be different each time the routine is called. Thus, dynamic scoping clashes with static typing - type checking must be done at execution time.
Every variable must have a descriptor.
Descriptors must be of varying size (structures take more room to describe than primitive types).
Often implemented by interpreter (slower anyway, so increased time to check types isn't a factor)

Languages are designed so that a particular binding may be performed at a given time - but it is up to the implementor.

Without static scoping, nothing about nonlocal names can be determined during translation.
Must be done at execution time (and redone each time statement is encountered)

Advantages and Disadvantages of Static Scope

Aids readability
Allows production of considerably more efficient executable code.
Convenient access to globals
Too much data access (no way to avoid knowing about enclosing scopes)

begin
   boolean b := true;
   procedure p;
   begin
      print (b)
   end;
   begin
      boolean b := false;
      p
   end
end

What is printed?

static: true
dynamic: false
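
The two answers can be reproduced with a small Python model of the environments at the moment p prints b. This is a sketch under assumptions: the frame dictionaries and the two lookup functions are hypothetical names, and each frame holds only the variables declared in its block.

```python
def lookup_dynamic(stack, name):
    """Dynamic scope: search the call stack from the most recent frame back."""
    for frame in reversed(stack):
        if name in frame["vars"]:
            return frame["vars"][name]
    raise NameError(name)

def lookup_static(frame, name):
    """Static scope: follow links to the textually enclosing frames."""
    while frame is not None:
        if name in frame["vars"]:
            return frame["vars"][name]
        frame = frame["enclosing"]
    raise NameError(name)

# Frames that exist while p executes print(b).
outer = {"vars": {"b": True}, "enclosing": None}     # outer block
inner = {"vars": {"b": False}, "enclosing": outer}   # inner block that calls p
p_frame = {"vars": {}, "enclosing": outer}           # p is declared in the outer block

call_stack = [outer, inner, p_frame]                 # order of activation

static_result = lookup_static(p_frame, "b")
dynamic_result = lookup_dynamic(call_stack, "b")
print("static:", static_result)     # static: True
print("dynamic:", dynamic_result)   # dynamic: False
```

The static lookup skips the inner block because p is not textually inside it; the dynamic lookup finds the inner block's b first because that block is the most recent activation that declares b.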

Parameters and Parameter Transmission

When subprogram is called with an actual-parameter expression, the expression is evaluated (usually) at the time of call
The data object that results becomes the actual parameter

Establishing the correspondence

positional - good for small lists
by explicit name
Example: (Ada) SUB(Y=>ACT_Y, MAX=>100)
There may be default values (Ada, C++) if no parameter is supplied

Order of evaluation of parameters is often not specified by the language definition. Some languages do fix it (left-to-right, or right-to-left).

Transmission types

IN-OUT type parameters
- transmission by reference
  formal parameter is local object of type pointer
  If expression: a temporary location may be passed
  
  Disadvantages:
  1. access slower as is indirect
  2. may make inadvertent changes (if out only was desired)
  3. aliases are created
- transmission by value-result
  formal parameter is same data type as actual parameter
  value copied at time of call and copied back at time of return
  same as by reference if
  1. the subprogram terminates normally, and
  2. the called subprogram cannot also access the actual parameter through an alias
  Need to know the order in which arguments are copied back.
  Need to know whether the address is computed again before copying back, e.g. XX(i,a[i])
  Faster, as no indirect reference
IN type only parameters
- transmission by value
  If an array is passed by value, the entire array gets copied
  C is not consistent here (an array argument is passed as a pointer to its first element, not copied)
- transmission by constant value - as in C++
  no assignment to the parameter is allowed, OR it is allowed but only changes the local copy
  formal parameter may not be transmitted to another subprogram except as a constant value parameter
  May be implemented as transmission by value or reference
OUT type only parameters
- transmission by result
  formal parameter is a local variable with no initial value
  value copied back to the actual parameter at termination of subprogram
Explicit function values: may be considered an extra OUT parameter
1. return(expr)
2. value to be returned by assignment to function name
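
Condition 2 under transmission by value-result (no access to the actual parameter through an alias) can be illustrated with a short Python sketch. Everything here is hypothetical: the Cell class models a storage location, body is an invented subprogram that also reads a nonlocal g, and the left-to-right copy-back order is an assumption.

```python
class Cell:
    """Hypothetical one-variable storage location."""
    def __init__(self, value):
        self.value = value

def body(x, y, g):
    """Subprogram body: x and y are formals, g is a nonlocal it also reads."""
    x.value = x.value + 1
    y.value = y.value + g.value

def call_by_reference(actual_x, actual_y, g):
    # The formals are the actual parameters' own locations.
    body(actual_x, actual_y, g)

def call_by_value_result(actual_x, actual_y, g):
    # Copy in at the call.
    local_x, local_y = Cell(actual_x.value), Cell(actual_y.value)
    body(local_x, local_y, g)
    # Copy out at return, left to right (an assumed order).
    actual_x.value = local_x.value
    actual_y.value = local_y.value

# The nonlocal g is itself passed as the first actual parameter: an alias.
g1, b1 = Cell(1), Cell(10)
call_by_reference(g1, b1, g1)
g2, b2 = Cell(1), Cell(10)
call_by_value_result(g2, b2, g2)
print(g1.value, b1.value)   # by reference:    2 12
print(g2.value, b2.value)   # by value-result: 2 11
```

By reference, the increment of x is visible through g immediately, so y picks up the new value. By value-result, g keeps its old value until the copy-back, so the two methods give different results for y.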

Unevaluated Parameters: Transmission by Name

Used in Algol: theoretical significance

New interest in functional languages - delayed evaluation

Substitute name (in calling environment) for formal parameter

The name location binding is delayed until (and established fresh each time) the formal parameter is encountered.

Implemented by passing parameterless subprograms (thunks) rather than the variable name. An expression needs to be evaluated IN the proper environment, and there is no mechanism to do that other than through a procedure call.

Whenever formal parameter is referenced, a call is made to thunk, which evaluates the parameter in the proper (caller) environment and returns proper resulting value (or location)

Example:

procedure R(var i,j: integer);
var m: integer;
begin
  m := 5;
  i := i + 1;
  j := j + 1;
  write(i,j);
end;

m := 2;
for i := 0 to 9 do c[i] := 10*i;
R(m,c[m]);

Pass by reference: adds 1 to m and c[2]
Pass by name: adds 1 to m and c[3]
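
The difference between the two results can be checked with a Python sketch in which each parameter is a pair of thunks evaluated in the caller's environment. The NameParam class, the env dictionary, and the helper names are assumptions for illustration; R returns the pair it would write instead of printing it.

```python
class NameParam:
    """Hypothetical thunk pair: read or assign in the caller's environment."""
    def __init__(self, get, set_):
        self.get = get
        self.set = set_

def R(i, j):
    m = 5                       # R's local m, invisible to the thunks
    i.set(i.get() + 1)
    j.set(j.get() + 1)
    return i.get(), j.get()     # stands in for write(i,j)

def make_env():
    return {"m": 2, "c": [10 * k for k in range(10)]}

def run_by_name():
    env = make_env()
    def set_m(v):
        env["m"] = v
    def set_c(v):
        env["c"][env["m"]] = v          # subscript re-evaluated at each use
    written = R(NameParam(lambda: env["m"], set_m),
                NameParam(lambda: env["c"][env["m"]], set_c))
    return written, env

def run_by_reference():
    env = make_env()
    index = env["m"]                    # location of c[m] fixed once, at the call
    def set_m(v):
        env["m"] = v
    def set_c(v):
        env["c"][index] = v
    written = R(NameParam(lambda: env["m"], set_m),
                NameParam(lambda: env["c"][index], set_c))
    return written, env

name_written, name_env = run_by_name()
ref_written, ref_env = run_by_reference()
print(name_written, name_env["c"][2], name_env["c"][3])   # (3, 31) 20 31
print(ref_written, ref_env["c"][2], ref_env["c"][3])      # (3, 21) 21 30
```

The local m := 5 inside R has no effect in either case, because the thunks refer to the caller's m.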

Example for Call by Name

b1: begin real x,y;
      procedure G(t); name t;
      begin integer w;
            w := 10;
            y := 20;
            print t;
            x := 0;
            print t
      end G;
      y := 0.0;
  b2: begin real y;
        y := 0.5;
        x := 1.0;
        call G(y-x)
      end
end


thunk()
  return(y-x)
end;

If the name parameter appears on the left-hand side of an assignment, the thunk has to return the address of the element.
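
A Python closure behaves much like the thunk above, since it evaluates y-x in the environment where it was created. The sketch below mirrors the b1/b2 example under some assumptions: nested functions stand in for blocks, x is given an initial value because Python needs one, and the printed values are collected in a list.

```python
def b1():
    printed = []
    x = 0.0                      # b1's x (initial value is an assumption)
    y = 0.0                      # b1's y

    def G(t):
        nonlocal x, y
        w = 10                   # local to G, unused as in the original
        y = 20.0                 # assigns b1's y, not b2's
        printed.append(t())      # first "print t"
        x = 0.0
        printed.append(t())      # second "print t"

    def b2():
        nonlocal x
        y = 0.5                  # b2's own y, hides b1's y
        x = 1.0
        G(lambda: y - x)         # thunk: b2's y minus b1's x

    b2()
    return printed

result = b1()
print(result)   # [-0.5, 0.5]
```

The assignment y := 20 inside G does not change the thunk's result, because the y in the actual parameter is the one declared in b2. The assignment to x does change it, because both G and the thunk refer to the x of b1.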

Parameter Passing Techniques

Show what is printed assuming (in turn) each of the parameter passing techniques. If the parameter passing method is illegal for the example, indicate it.

main()
{ integer a,c;
   procedure A(int x,int y);
   { integer a=y+7;
    c = 4; y=x-1; print (a,x,y,c)
   };
   a = 1;  c=3;
   call A(a,a); print(a,c);
   c = 5; call A(c,a); print(a,c);
}

call by value:
call by reference:
call by value-result:

main()
{ // no globals
   procedure A(int x,int y);
   { integer a=3, c=0;
     x=y+7;
     c = x+y;  print (a,x,y,c)
   };

   procedure B ();
   { integer a,c;
     a = 1; c=3;
     call A(a,a); print(a,c);
     c = 5; call A(c,a+c); print(a,c);
   }

   call B();
}

call by name:

main()
{ integer i, c, a[0..9] = {1,2,3,4,5,6,7,8,9,10};

   procedure D(int x,int y);
   { x = 3;
     c = i + i *i -5;
     print (x,y,c,i)
   };

   i := 1;
   call D(a[i],i++);
   print( i,c,a);
}

call by value:
call by name:
call by value-result:

Subprograms as parameters:

Corresponding formal parameter is of type subprogram name

Problems

static type checking: cannot determine if number of arguments is correct
Needs not just the name, but the full procedure specification: type returned, and number, order, and type of args.
nonlocal references (free variables)
- variables without bindings assume the same nonlocal environment as if inline expansion were used; this is usually not what was intended
- a nonlocal reference means the same thing during execution of the subprogram passed as a parameter as it would if the subprogram were invoked at the point where it appears as an actual parameter in the parameter list
  
  Need to create the correct nonlocal environment
  fairly straightforward with static chain method
  determine correct static chain pointer for the subprogram parameter and pass that along as part of the information transmitted with a subprogram parameter

If types of parameters are not required and separate compilation is possible, cannot do type checking.

As in passing labels as parameters, need an (ip, ep) (instruction pointer, environment pointer) pair

There are three choices of environment:

the environment of the subprogram to which it is passed (shallow binding)
Not appropriate for block-structured languages because of static binding of variables
the environment of the subprogram which is passed (deep binding)
the environment of the subprogram which passes the subprogram (used)

begin
   procedure P(R,b);
      value b,R; boolean b; procedure R;
   begin
      integer i;
      procedure Q;
      begin
         i := i + 1;
      end Q;

      i := 0;
      if b then P(Q,not b) else R;
      print (i);
   end P;

   P(P,true);
end

We have two Q procedures: one declared in the first call to P and one declared in the second call to P. These two Q's have the same IP, but different EPs.
The value of the formal parameter R for the first call to P is an (ip,ep) pair corresponding to P itself. The value of the formal parameter R for the second call to P is an (ip,ep) pair corresponding to the Q declared in the first call to P.
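
Python functions carry their defining environment with them, so a direct transcription shows the effect of the two different EPs. This is a sketch: the extra out parameter is an assumption added so the values printed by each activation can be collected in order.

```python
def P(R, b, out):
    i = 0

    def Q():
        nonlocal i          # the i of the activation of P that created this Q
        i = i + 1

    if b:
        P(Q, not b, out)    # pass this activation's Q to the second call
    else:
        R()                 # R is the Q created by the first call
    out.append(i)           # stands in for print(i)

out = []
P(P, True, out)
print(out)   # [0, 1]
```

The second activation prints 0 because the Q it calls increments the i of the first activation, which then prints 1.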

Statement Labels as Parameters

Can't simply pass address - need (cip,cep) pair

Need to update the dynamic chain of activation records.

Dynamic Scope

Creation of an implicit nonlocal environment via current dynamic chain

If there is no nesting of subprograms (APL, LISP, and SNOBOL4), there is no static structure on which to base scope rules.

Rule: use most recently created association for X
Trouble: the association for X may change between activations, requiring dynamic type checking
Only used when dynamic type checking must exist for other reasons

Implementation Issues:
Local environment for each subprogram is made part of its activation record

At each reference, the dynamic chain is searched until an association is found, which is costly
Need to store identifiers themselves
No base plus offset calculation is possible

Alternative: Central Referencing Environment table
contains all currently active identifier associations

Figure 7.17

Saves following multiple pointers and searching for matching identifier names.

Contains one value for each identifier, regardless of the number of different occurrences of id
activation flag: indicates whether or not association is active
Can use base plus offset into central table
identifier name need not be present during run time, unless names can be generated during execution
Subprogram entry and exit are more costly:
- table must be updated: new associations made. What about old associations? (pushed onto stack)
- execution time stack still required, but hidden
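
The central table and its hidden stack can be sketched in Python as follows. The CentralTable class and its method names are hypothetical, and a dictionary keyed by identifier stands in for the base plus offset addressing a real implementation would use.

```python
class CentralTable:
    """Hypothetical central referencing environment table with a hidden stack."""
    def __init__(self):
        self.current = {}    # identifier -> [value, activation_flag]
        self.hidden = []     # old associations saved at each subprogram entry

    def enter(self, local_vars):
        """Subprogram entry: save old associations, install the new ones."""
        saved = {}
        for name, value in local_vars.items():
            saved[name] = self.current.get(name)   # None if there was none
            self.current[name] = [value, True]
        self.hidden.append(saved)

    def exit(self):
        """Subprogram exit: restore the associations saved at entry."""
        saved = self.hidden.pop()
        for name, old in saved.items():
            if old is None:
                self.current[name][1] = False      # mark inactive, keep the slot
            else:
                self.current[name] = old

    def lookup(self, name):
        """One direct access, no search along the dynamic chain."""
        entry = self.current.get(name)
        if entry is None or not entry[1]:
            raise NameError(name)
        return entry[0]

table = CentralTable()
table.enter({"x": 1, "y": 2})     # caller declares x and y
table.enter({"x": 10})            # callee redeclares x
inside = (table.lookup("x"), table.lookup("y"))
table.exit()
after = (table.lookup("x"), table.lookup("y"))
print(inside, after)   # (10, 2) (1, 2)
```

The cost moves from each reference to each entry and exit, which matches the trade-off listed above.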

Static Scope and Block Structure

Search for variable is done during compilation - stack of symbol tables.

For each nonlocal reference, store the count of activation records to step through in the static chain (plus the offset within that record).

Alternative: The Display Implementation

Avoids following multiple pointers during execution (even though the number of pointers to follow is known at compile time)

On subprogram entry: pointers to static chain of activations are copied into display

With display (chain #, offset)

consider first entry as subscript into the display of activation record pointers
compute base plus offset

 ______________
0|environment 0|
1|environment 1|
2|environment 2|
3|environment 3|
4|environment 4|
 ______________

Subscript is "how many pointers to follow". Array contains address of the activation record accessed by following the number of pointers indicated by the subscript.
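
A minimal Python sketch of this indexing scheme follows. It is an illustration only: the Frame class, build_display, and load are hypothetical names, the sample values are invented, and the display is indexed by the number of static links to follow, as described above.

```python
class Frame:
    """Hypothetical activation record: slots addressed by offset."""
    def __init__(self, slots, static_link=None):
        self.slots = list(slots)
        self.static_link = static_link

def build_display(current):
    """On subprogram entry: copy the static chain of activations into an array."""
    display = []
    frame = current
    while frame is not None:
        display.append(frame)
        frame = frame.static_link
    return display

def load(display, hops, offset):
    """Resolve a (chain #, offset) pair with one subscript and one offset."""
    return display[hops].slots[offset]

# Assumed nesting P > Q > R, each with one variable at offset 0.
p = Frame([10])            # A
q = Frame([5], p)          # B
r = Frame([0], q)          # C
display = build_display(r)

# C := B + A without following any pointers at the point of reference.
r.slots[0] = load(display, 1, 0) + load(display, 2, 0)
print(r.slots[0])   # prints 15
```

The pointer chasing happens once, when the display is built at entry, instead of at every nonlocal reference.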