CSCI 431 Lecture Notes - Implementation and Binding

Computer Language Levels

Low-level - close to the machine instruction set, (e.g. machine language, assembly language)
High-level - generally English-like in nature, the statements in the language are not tied to the machine instruction set. (e.g. C++, Pascal)
Very high-level - many details of programming are implicit handled in the language (e.g. Lisp, Miranda)

How do we describe a programming language?

Syntax:
The syntax of a language must be structured enough that it can be automatically recognized by an implementation. This requires describing:
- Lexical Level:
  - literals
  - keywords
  - identifiers
  - punctuation
- Grammar Level:
  - combinations of lexical elements
  - This is done with syntax diagrams in the form of trees
Semantics
- Semantic descriptions which you have seen have probably been verbal. The idea of semantics in programming is "what is the effect of 'executing' a particular program statement?"

How do we implement a programming language?

A language implementation provides a couple of very standard capabilities:

syntax checking:
- A program is input (usually from a source file) and its form is checked against the syntax of the language.
code generation:
- If the syntax of the program is in compliance with the syntax of the language, then the text of the program is translated into the target language (usually machine code or assembly language) and the resulting translated form is saved in a file (the object file).

Implementation Methods

In theory it is possible to construct a hardware computer to execute directly programs written in any particular programming language. But practical considerations favor computers with low-level machine languages, on the basis of speed, flexibility and cost.

The solution:

compiled languages
interpreted languages
hybrid systems

Translation (compilation)

Translate from the high-level language to the host computer machine language.

compiler
assembler
linker or loader
preprocessor

Example of translation

C source code:

void initialize(int U[], int size, int init)
{
   int j;

   for (j=0; j < size; ++j)
       U[j]=init;
}

Example of translation (continued)

corresponding assembly language code:

 #      1 void initialize(int U[], int size, int init)
				; $16 holds U
				; $17 holds size
				; $18 holds init
initialize:
	sextl	$17, $17	; sign-extend $17 to 64 bits
	sextl	$18, $18	; sign-extend $18 to 64 bits

 #      3    int j;
 #      4 
 #      5    for (j=0; j < size; ++j)
	ble	$17, L$5       ; if $17 leq 0, go to L$5

	clr	$1		; $1 holds j, $1 =  0

L$6:
	addl	$1, 1, $1	; add one to $1

 #      6        U[j]=init;
	stl	$18, ($16)	; store $18 in the location that $16 holds

	cmplt	$1, $17, $3	; $3 = $1 lt $17

	lda	$16, 4($16)	; increment $16 to point to next element
	bne	$3, L$6		; branch if j lt size

L$5:

	ret	($26)		; $26 holds return address

Interpretation (software simulation)

Simulate a computer whose machine language is the high-level language
Done by constructing a set of programs in the host computer machine language that represent the algorithms necessary for execution of programs in the high-level language

Hybrid Systems

translate from a high-level language to some intermediate language designed to allow easy interpretation
faster than pure interpretation and provides portability

The structure of a translator.

The translation process of a compiler or interpreter can be seen in several intertwined steps.
- Tokenizer:
  - The tokenizer is usually implemented as a function. When called it reads the input stream and stops when it has read the next lexical element - i.e., token. The tokenizer is called as needed by the parser.
- Parser:
  - The parser looks at tokens as needed and compares their order to that required by the language (its actually easier than this sounds). If it runs into an unexpected token, it reports an error and continues on (in some error recovery mode) or quits.
  - The parser has two other required activities which are carried out in anticipation of error free parsing:
    - Symbol Table:
      - Every time the parser sees a token which is an identifier, it makes an entry in the Symbol Table . Depending on the language syntax, making the entry is only done if the identifier is not already there.
    - Code Generation:
      - As executable program structures are encountered, appropriate code is generated, either into a table for further processing, or into an object file.
- This process is a simplistic view of a real translator. Often this process takes several passes through the source file. One pass may be used to check tokens and create a symbol table. A second pass can involve filling in the symbol table with appropriate attributes taken from the source code. A third pass will generate code and the final pass do some optimization on that generated code.

Identifiers and Binding

The most interesting tokens are identifiers.
Identifiers are abstractions representing language entities or constructs
An identifier is the name of a
- variable,
- constant,
- function,
- type,
- parameter,
- or other programmer defined constructs.

Names are for human readers of programs, good names help understanding, bad names hinder it.

  /* Can you guess what this function is doing? */
  int factorial(int x, int sum) {
    return x + sum * previous;
    }

  /* How about with these Names? */
  int next_height(int height, int velocity) {
    return height + velocity * ELAPSED_TIME;
    }

Every identifier must somehow be defined - either explicitly or implicitly. (In Fortran if a variable name begins with an `I' it is an integer variable)

More about Bindings

Every identifier has certain attributes associated with it.
- type (i.e., simple variable, array name, function name, formal parameter, etc.)
- type of value (i.e., integer, float, etc)
- memory location
- value
- component---binding of a data object to other data objects
- referencing environment
- potentially other information as well
Binding is the process of associating attributes with an identifier.
Binding time is the time in the lifetime of an identifier when a particular attribute is bound to it.
Some attributes are bound at translation time (e.g., type in Pascal), others are bound at execution time (e.g., value and address).
Commands and expressions cannot be interpreted in isolation, their interpretation depends on how the identifiers are bound. E.g. expressions like "N+1" and commands like "int N=1" are valid only if N is declared as an integer (or maybe a float).
It is also possible to further complicate matters given that the same identifier can be bound to different things at different points in the program, for example:
```
  int X;

  void foo (int X)
  {
    ...
  }

  int main() {
    int X;

    foo(X);
    ...
  }
```
Here the interpretation of X depends on the current bindings of X. X can refer to either a global variable, a variable local to main or a parameter of the function foo.

Environments

An environment is a set of bindings. Each expression and command is interpreted in a particular environment, and all identifiers occurring in the expression or command must have bindings in that environment.

An environment may be thought of as a mapping from identifiers to the entities that they denote. For example:

  int X;
 
  void foo (int X)
  {
(3)
    ...
  }
 
  int main() {
(1)
    int X;
(2) 
    foo(X);
    ...
  }

The environment at point (1) is:

  X -> a global integer variable
  foo -> a function abstraction

At point (2):

  X -> an integer variable local to main
  foo -> a function abstraction

At point (3):

  X -> a parameter of the function foo
  foo -> a function abstraction