CSCI 431 Lecture Notes - Implementation and Binding
Computer Language Levels
-
Low-level - close to the machine instruction set, (e.g. machine
language, assembly language)
-
High-level - generally English-like in nature, the statements in the
language are not tied to the machine instruction set. (e.g. C++, Pascal)
-
Very high-level - many details of programming are implicit handled in
the language (e.g. Lisp, Miranda)
How do we describe a programming language?
How do we implement a programming language?
A language implementation provides a couple of very standard
capabilities:
- syntax checking:
- A program is input (usually from a source file) and its form is
checked against the syntax of the language.
- code generation:
- If the syntax of the program is in compliance with the syntax of
the language, then the text of the program is translated into the
target language (usually machine code or assembly language) and the
resulting translated form is saved in a file (the object file).
Implementation Methods
In theory it is possible to construct a hardware computer to
execute directly programs written in any particular programming
language. But practical considerations favor computers with low-level
machine languages, on the basis of speed, flexibility and cost.
The solution:
- compiled languages
- interpreted languages
- hybrid systems
Translation (compilation)
Translate from the high-level language to the host computer machine
language.
- compiler
- assembler
- linker or loader
- preprocessor
Example of translation
C source code:
void initialize(int U[], int size, int init)
{
int j;
for (j=0; j < size; ++j)
U[j]=init;
}
Example of translation (continued)
corresponding assembly language code:
# 1 void initialize(int U[], int size, int init)
; $16 holds U
; $17 holds size
; $18 holds init
initialize:
sextl $17, $17 ; sign-extend $17 to 64 bits
sextl $18, $18 ; sign-extend $18 to 64 bits
# 3 int j;
# 4
# 5 for (j=0; j < size; ++j)
ble $17, L$5 ; if $17 leq 0, go to L$5
clr $1 ; $1 holds j, $1 = 0
L$6:
addl $1, 1, $1 ; add one to $1
# 6 U[j]=init;
stl $18, ($16) ; store $18 in the location that $16 holds
cmplt $1, $17, $3 ; $3 = $1 lt $17
lda $16, 4($16) ; increment $16 to point to next element
bne $3, L$6 ; branch if j lt size
L$5:
ret ($26) ; $26 holds return address
Interpretation (software simulation)
- Simulate a computer whose machine language is the high-level
language
- Done by constructing a set of programs in the host computer
machine language that represent the algorithms necessary for execution
of programs in the high-level language
Hybrid Systems
- translate from a high-level language to some intermediate language
designed to allow easy interpretation
- faster than pure interpretation and provides portability
The structure of a translator.
- The translation process of a compiler or interpreter can be seen
in several intertwined steps.
- Tokenizer:
- The tokenizer is usually implemented as a function. When called it
reads the input stream and stops when it has read the next lexical
element - i.e., token. The tokenizer is called as needed by the
parser.
- Parser:
- The parser looks at tokens as needed and compares their order to
that required by the language (its actually easier than this sounds).
If it runs into an unexpected token, it reports an error and continues
on (in some error recovery mode) or quits.
- The parser has two other required activities which are carried
out in anticipation of error free parsing:
- Symbol Table:
- Every time the parser sees a token which is an identifier, it makes
an entry in the Symbol Table . Depending on the language
syntax, making the entry is only done if the identifier is not already
there.
- Code Generation:
- As executable program structures are encountered, appropriate code
is generated, either into a table for further processing, or into an
object file.
- This process is a simplistic view of a real translator. Often
this process takes several passes through the source file.
One pass may be used to check tokens and create a symbol table. A
second pass can involve filling in the symbol table with appropriate
attributes taken from the source code. A third pass will generate
code and the final pass do some optimization on that generated code.
Identifiers and Binding
- The most interesting tokens are identifiers.
- Identifiers are abstractions representing language entities or
constructs
- An identifier is the name of a
- variable,
- constant,
- function,
- type,
- parameter,
- or other programmer defined constructs.
- Names are for human readers of programs, good names help
understanding, bad names hinder it.
/* Can you guess what this function is doing? */
int factorial(int x, int sum) {
return x + sum * previous;
}
/* How about with these Names? */
int next_height(int height, int velocity) {
return height + velocity * ELAPSED_TIME;
}
- Every identifier must somehow be defined - either
explicitly or implicitly. (In Fortran if a variable name begins
with an `I' it is an integer variable)
More about Bindings
- Every identifier has certain attributes associated with
it.
- type (i.e., simple variable, array name, function name, formal
parameter, etc.)
- type of value (i.e., integer, float, etc)
- memory location
- value
- component---binding of a data object to other data objects
- referencing environment
- potentially other information as well
- Binding is the process of associating attributes with an
identifier.
- Binding time is the time in the lifetime of an identifier
when a particular attribute is bound to it.
- Some attributes are bound at translation time (e.g., type in
Pascal), others are bound at execution time (e.g., value and
address).
- Commands and expressions cannot be interpreted in isolation, their
interpretation depends on how the identifiers are bound. E.g.
expressions like "N+1" and commands like "int N=1" are valid only if N is
declared as an integer (or maybe a float).
- It is also possible to further complicate matters given that the same
identifier can be bound to different things at different points in the
program, for example:
int X;
void foo (int X)
{
...
}
int main() {
int X;
foo(X);
...
}
Here the interpretation of X depends on the current bindings of
X. X can refer to either a global variable, a variable
local to main or a parameter of the function foo.
Environments
- An environment is a set of bindings. Each expression and
command is interpreted in a particular environment, and all
identifiers occurring in the expression or command must have bindings
in that environment.
- An environment may be thought of as a mapping from
identifiers to the entities that they denote. For example:
int X;
void foo (int X)
{
(3)
...
}
int main() {
(1)
int X;
(2)
foo(X);
...
}
The environment at point (1) is:
X -> a global integer variable
foo -> a function abstraction
At point (2):
X -> an integer variable local to main
foo -> a function abstraction
At point (3):
X -> a parameter of the function foo
foo -> a function abstraction