CSCI 431 -- Chapter 2

The Structure and Operation of the Computer


By computer we will mean an integrated set of algorithms and data structures capable of storing and executing a program. A program may be executed either on a hardware computer or on a virtual (software-simulated) computer.

Any computer consists of the six components below. Studying how different languages, and thus different virtual computers, implement these components will be a major focus of this course.

The hardware of the computer

Data: typically for a hardware computer this consists of main memory, cache memory and external files. It will also have a number of built-in data types - integers, reals, fixed-length strings. The program itself is also data, held in the machine language representation of the computer.

Operations: typically for a hardware computer, these are very simple operations, such as arithmetic, simple tests (e.g., >0, =0, <0) and jumping to another memory location.

Sequence Control: a physical computer typically steps through operations one after another unless it is instructed to jump to another location

Data Access: typically a hardware computer stores each piece of data in a particular location in main memory. Data is accessed by specifying the address of that location.

Storage Management: a conflict occurs because CPU operations typically take place at the nanosecond level, while external data access takes place at the millisecond level. Simple computers execute only one program at a time, so the CPU must wait while slower data operations occur. Execution may be sped up by multiprogramming (running one program until it initiates a slow data access and then executing a different program until that access is complete) or by moving data that is likely to be used to faster (cache) memory.

Operating environment: computers are usually connected to external resources - networks, I/O devices, storage devices.

Computer Organization

Operations: Sequence control - how do we decide which instruction to execute next?
Note: changes to the control sequence are accomplished by changing the program address register.

Interpreter

Computer states: The dynamic behaviour of an instruction may be examined by studying how the initial state of a computer is transformed by a state transition into a final state. Program execution may be viewed as a sequence of state transitions.
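
The state-transition view can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-instruction machine, the State record and the names step and run are invented here and are not part of the course material.

```python
from typing import NamedTuple, Tuple

class State(NamedTuple):
    pc: int                 # program address register
    acc: int                # accumulator
    mem: Tuple[int, ...]    # main memory (data only, for simplicity)

# Hypothetical instruction set: (opcode, operand)
PROGRAM = (
    ("LOAD", 0),    # acc := mem[0]
    ("ADD", 1),     # acc := acc + mem[1]
    ("STORE", 2),   # mem[2] := acc
    ("HALT", 0),
)

def step(s: State) -> State:
    """One state transition: execute the instruction at s.pc."""
    op, arg = PROGRAM[s.pc]
    if op == "LOAD":
        return s._replace(pc=s.pc + 1, acc=s.mem[arg])
    if op == "ADD":
        return s._replace(pc=s.pc + 1, acc=s.acc + s.mem[arg])
    if op == "STORE":
        mem = s.mem[:arg] + (s.acc,) + s.mem[arg + 1:]
        return s._replace(pc=s.pc + 1, mem=mem)
    raise ValueError(f"cannot step from {op}")

def run(s: State) -> list:
    """Return the whole sequence of states, initial through final."""
    trace = [s]
    while PROGRAM[s.pc][0] != "HALT":
        s = step(s)
        trace.append(s)
    return trace

trace = run(State(pc=0, acc=0, mem=(2, 3, 0)))
final = trace[-1]
```

Each call to step maps one state to the next, so the list returned by run is the sequence of state transitions that makes up the execution.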

Alternative computer architectures:
The major computer architecture is known as the Von Neumann architecture. These systems have a single CPU, a large memory and a process to transfer data between the memory and the CPU. Other architectures are possible, e.g., multiprocessors.

Firmware Computers: Consider that any program may be implemented in hardware. Therefore any high-level language could be implemented as a computer whose low-level machine instructions are the language itself. This is not often done because a computer designed this way would be more complex and thus more costly; it would also be less flexible when implementing other languages.

A lower-level variation of this is microprogramming, where a microprogram is used to specify the instructions that a CPU will execute. This combination of programmable hardware plus its microprogram is yet another virtual computer.

The Computer as a Multi-Level Machine: Abstraction

Programming language creates a virtual machine for programmer

Dijkstra: Originally we were obligated to write programs so that a computer could execute them. Now we write the programs and the computer has the obligation to understand and execute them.

Progress in programming language design marked by increasing support for abstraction.

Computer at lowest level is a set of charged particles racing through wires w/ memory locations set to on and off - very hard to deal with.

In computer organization we look at a higher level of abstraction: interpret sequences of on/off as data (reals, integers, chars, etc.) and as instructions.

Computer looks at current instruction and contents of memory, then does something to another chunk of memory (incl. registers, accumulators, program counter, etc.)---the hardware computer described above

When we write a Pascal (or other language) program, we work with a different virtual machine.

Language creates the illusion of more sophisticated virtual machine.

Pure translators

Assembler:

Compiler:

Preprocessor:

Execution of program w/ compiler:

Interpreter:

We will speak of virtual machine defined by a language implementation.

Machine language of virtual machine is set of instructions supported by translator for language.

Layers of virtual machines on Mac: Bare 680x0 chip, OpSys virtual machine, MacPascal (or Lightspeed Pascal) machine, application program's virtual machine.

We will describe language in terms of virtual machine

Slight problem:

May lead to different implementations of same language - even on same machine.

Problem: How can you ensure different implementations result in the same semantics?

Sometimes virtual machines made explicit:

Compilers and Interpreters

While a few special-purpose chips exist which execute high-level languages directly (e.g., the LISP machine), most languages have to be translated into machine language.

Two extreme solutions:

Pure interpreter: Simulate virtual machine (our approach to run-time semantics)

	REPEAT
		Get next statement
		Determine action(s) to be executed
		Call routine to perform action
	UNTIL done
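
The loop above can be sketched in Python for a toy language. The statement forms ("name = a", "name = a + b", "print a") and the names interpret, do_assign and do_print are assumptions made up for this illustration only.

```python
def eval_operand(token, env):
    """A token is either an integer literal or a variable name."""
    return int(token) if token.lstrip("-").isdigit() else env[token]

def do_assign(words, env, output):
    # Form: name = a   or   name = a + b
    name, rhs = words[0], words[2:]
    value = eval_operand(rhs[0], env)
    if len(rhs) == 3 and rhs[1] == "+":
        value += eval_operand(rhs[2], env)
    env[name] = value

def do_print(words, env, output):
    # Form: print a
    output.append(eval_operand(words[1], env))

def interpret(source_lines):
    env, output = {}, []
    position = 0
    while position < len(source_lines):          # UNTIL done
        words = source_lines[position].split()   # get next statement
        position += 1
        # Determine the action to be executed
        action = do_print if words[0] == "print" else do_assign
        action(words, env, output)               # call routine to perform it
    return env, output

env, output = interpret(["x = 2", "y = x + 40", "print y"])
```

Note that each statement is analysed at the moment it is executed, directly from the source text; nothing is translated ahead of time.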

Pure Compiler:

  1. Translate all units of program into object code (say, in machine language)

  2. Link into single relocatable machine code

  3. Load into memory

Comparison of Compilation vs Interpretation

Compiler                                     Interpreter
Only translate each statement once.          Translate only if executed.
Speed of execution.                          Error messages tied to source.
Only object code in memory when executing.   More supportive environment.
May take more space because of expansion.    Must have interp. in memory when executing
                                             (but source may be more compact).

Rarely have pure compiler or interpreter.

Can go farther and compile into intermediate code (e.g., P-code) and then interpret.
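
A minimal sketch of this hybrid approach, assuming expressions are given as nested tuples (op, left, right) with integer leaves. The PUSH/APPLY stack code is invented for illustration and is not actual P-code.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def compile_expr(tree):
    """Translate an expression tree into postfix stack code."""
    if isinstance(tree, int):
        return [("PUSH", tree)]
    op, left, right = tree
    return compile_expr(left) + compile_expr(right) + [("APPLY", op)]

def interpret(code):
    """Interpret the stack code on a simple stack machine."""
    stack = []
    for instr, arg in code:
        if instr == "PUSH":
            stack.append(arg)
        else:  # APPLY
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[arg](left, right))
    return stack.pop()

code = compile_expr(("+", 1, ("*", 2, 3)))   # 1 + 2 * 3
result = interpret(code)
```

The translation happens once; the intermediate code can then be interpreted as often as needed on any machine that has the (small) interpreter.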

In FORTRAN, Format statements (I/O) are always interpreted.

Binding Time

Program elements have attributes which must be "bound" to them at some point.

Binding = fixing a value or some other property of an object from a set of possibilities

MAKING A DECISION

Example: Bind variable to location and value

Time of making a decision is called binding time

Possibilities: Execution, translation, language implementation, language definition

Dynamic

Execution:

  1. Entry to block or subprogram - bind actual to formal parameter, location of local variable.

  2. Arbitrary points - values to variables via assignment

Static

Translation:

  1. Determined by programmer - declarations bind type to variable name, values to constants

  2. Determined by translator - global variable to location (load time), source program to object program representation

Implementation: Representation of values in computer, semantics of operations, statements - if not uniform may lead to diff. results on diff. machines

Language Def: Structure of language, possible types, rep of values in program text.

Example: When is meaning of "+" bound to its meaning in "x + 10"?

Difference between reserved and key words has to do with binding time. Example: "DO" is a reserved word in Pascal, but not in FORTRAN (can write DO = 10)

"Integer" may be redefined in Pascal, but not FORTRAN or Ada.

Why care about binding time?

Early vs. late binding - many language design decisions relate to binding time

Example: "+" bound at translation vs. execution time
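
As an illustration of execution-time binding, Python (a dynamically typed language, used here only as an example) fixes the meaning of "+" each time the expression is evaluated. The function add_ten and the class Celsius are hypothetical names for this sketch. In a language with declared types, such as Pascal, the translator could instead choose integer or real addition from the declaration of x.

```python
def add_ten(x):
    # Which operation "+" denotes is decided only when this line runs,
    # based on the run-time type of x.
    return x + 10

class Celsius:
    """Hypothetical user-defined type that supplies its own "+"."""
    def __init__(self, degrees):
        self.degrees = degrees

    def __add__(self, other):
        return Celsius(self.degrees + other)

int_result = add_ten(1)              # integer addition
float_result = add_ten(1.5)          # floating-point addition
user_result = add_ten(Celsius(20))   # user-defined addition
```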

Early binding supports compilation, late binding -> interpretation

Small changes may delay binding time -

Ex: recursion forces delay in binding time for local variables to locations

Generally considered useful to bind ASAP

As we work down the layers in examining or translating a language, we may find we are able to make more bindings early, e.g., by constant propagation - this supports optimizers.
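
Constant propagation can be sketched as follows, assuming straight-line code only (no jumps or loops) and statements of the hypothetical form (target, left, op, right) with op either "+" or "*". Values that can be computed at translation time are bound then rather than at execution time.

```python
def propagate_constants(stmts):
    """Operands are ints or variable names; returns the simplified statements."""
    known = {}   # variables whose values are already bound at translation time
    out = []
    for target, left, op, right in stmts:
        left = known.get(left, left)      # substitute known constants
        right = known.get(right, right)
        if isinstance(left, int) and isinstance(right, int):
            value = left + right if op == "+" else left * right
            known[target] = value
            out.append((target, value))   # binding made early
        else:
            known.pop(target, None)       # value only known at run time
            out.append((target, left, op, right))
    return out

optimized = propagate_constants([
    ("n", 4, "*", 2),
    ("m", "n", "+", 1),
    ("k", "m", "+", "x"),
])
```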

Bindings are maintained in structures both at compile and at run-time.

During compilation, declarations stored in Symbol table.

Most of these are then used in the compilation process and need not be saved.

Other attributes are needed at execution-time:

Run-time environment keeps track of meanings of names:

Contents of locations also change during execution. Usually called memory or state:

Memory: Locations -> Values

With an interpreter, just keep all 3 sets of values together in one Environment.
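
A minimal sketch of the three structures, assuming the run-time environment maps names to locations (an assumption, since the notes only give the Memory mapping explicitly). The names assign and lookup are hypothetical.

```python
symbol_table = {"x": "integer", "y": "integer"}   # Names -> attributes (compile time)
environment = {"x": 0, "y": 1}                     # Names -> Locations (assumed)
memory = [0, 0]                                    # Locations -> Values

def assign(name, value):
    """Store a value in the location bound to name."""
    memory[environment[name]] = value

def lookup(name):
    """Fetch the value from the location bound to name."""
    return memory[environment[name]]

assign("x", 5)
assign("y", lookup("x") + 1)
```

The environment stays fixed while these statements run; only the memory changes.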

Overview of structure of a compiler

Two primary phases:
Analysis:
Break into lexical items, build parse tree, generate simple intermediate code (type checking)

Synthesis:
Optimization (look at instructions in context), code generation, linking and loading.

Lexical analysis:
Break source program into lexical items, e.g. identifiers, operation symbols, key words, punctuation, comments, etc. Enter id's into symbol table. Convert lexical items into internal form - often a pair consisting of the kind of item and the actual item (for an id, a symbol table reference).
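
A small lexical analyser along these lines, as a sketch only. The keyword set, the token kinds ("num", "id", "keyword", "op") and the regular expression are assumptions chosen for a Pascal-like fragment; comments are not handled.

```python
import re

KEYWORDS = {"begin", "end", "if", "then", "while", "do"}
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z]\w*)|(:=|[-+*/();<>=]))")

def tokenize(text):
    symbol_table = []   # identifier names; the index serves as the reference
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad character at position {pos}")
        number, word, symbol = m.groups()
        if number is not None:
            tokens.append(("num", int(number)))
        elif word is not None:
            if word.lower() in KEYWORDS:
                tokens.append(("keyword", word.lower()))
            else:
                if word not in symbol_table:
                    symbol_table.append(word)     # enter id into symbol table
                tokens.append(("id", symbol_table.index(word)))
        else:
            tokens.append(("op", symbol))
        pos = m.end()
    return tokens, symbol_table

tokens, table = tokenize("while x1 < 10 do x1 := x1 + y")
```

Each token is a pair (kind of item, actual item); for an identifier the second component is its position in the symbol table.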

Syntactical analysis:
Use formal grammar to parse program and build tree (either explicitly or implicitly through stack)

Semantic analysis:
Update symbol table (e.g., by adding type info). Insert implicit info (e.g., resolve overloaded ops - like "+"), error detection - type-checking, jumps into loops, etc.

Intermediate Code Generator:
Traverse tree generating intermediate code
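
One possible sketch of this traversal, assuming the parse tree is a nested tuple (op, left, right) with identifiers or constants at the leaves, and assuming a three-address intermediate form (target, left, op, right). Both representations are choices made for this illustration.

```python
import itertools

def gen_code(tree):
    """Return (three-address instructions, name holding the result)."""
    code = []
    temps = (f"t{n}" for n in itertools.count(1))   # fresh temporaries t1, t2, ...

    def walk(node):
        if isinstance(node, (int, str)):    # leaf: constant or identifier
            return node
        op, left, right = node
        a = walk(left)                      # generate code for subtrees first
        b = walk(right)
        t = next(temps)
        code.append((t, a, op, b))          # t := a op b
        return t

    result = walk(tree)
    return code, result

# Tree for x + y * 2
code, result = gen_code(("+", "x", ("*", "y", 2)))
```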

Optimization:
Catch adjacent store-reload pairs, eval common sub-expressions, move static code out of loops, allocate registers, optimize array accesses, etc.
Example:
		for i := .. do ...
			for j:= 1 to n do
				A[i,j] := ....
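
The first optimization listed, catching adjacent store-reload pairs, can be sketched as a peephole pass. The sketch assumes a hypothetical one-accumulator machine in which STORE leaves the accumulator unchanged, so a LOAD of the location just stored is redundant.

```python
def remove_store_reload(code):
    """Drop a LOAD that immediately follows a STORE to the same location."""
    out = []
    for instr in code:
        if out and instr[0] == "LOAD" and out[-1] == ("STORE", instr[1]):
            continue   # value is still in the accumulator; skip the reload
        out.append(instr)
    return out

before = [("LOAD", "a"), ("ADD", "b"), ("STORE", "t"),
          ("LOAD", "t"), ("MUL", "c"), ("STORE", "x")]
after = remove_store_reload(before)
```

The STORE itself is kept, since the stored value may be needed later; only the reload is removed.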

Code generation:
Generate real assembly or machine code

Linking & loading:
Get object code for all pieces of program (incl separately compiled modules, libraries, etc.). Resolve all external references - get locations relative to beginning location of program. Load program into memory at some start location - do all addressing relative to base address.
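
A rough sketch of these steps. The module format (a size, a table of defined symbols, a list of external references) and the names link, resolve and load are assumptions invented for illustration and do not correspond to any real object-file format.

```python
def link(modules):
    """Lay modules end to end; return symbol addresses relative to program start."""
    symbols, start = {}, 0
    for mod in modules:
        for name, offset in mod["defs"].items():
            symbols[name] = start + offset
        start += mod["size"]
    return symbols

def resolve(modules, symbols):
    """Return pairs (relative location of a reference, relative address it refers to)."""
    resolved, start = [], 0
    for mod in modules:
        for offset, name in mod["refs"]:
            resolved.append((start + offset, symbols[name]))
        start += mod["size"]
    return resolved

def load(resolved, base):
    """Turn relative addresses into absolute ones for a given base address."""
    return [(base + where, base + target) for where, target in resolved]

modules = [
    {"size": 10, "defs": {"main": 0}, "refs": [(4, "sqrt")]},   # main program
    {"size": 6,  "defs": {"sqrt": 2}, "refs": []},              # library module
]
symbols = link(modules)
resolved = resolve(modules, symbols)
absolute = load(resolved, base=1000)
```

Because all addresses are kept relative until load, the same linked program can be placed at any start location.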

Symbol table: Contains all identifier names (variable, array name, proc name, formal parameter), type of value, where visible, etc. Used to check for errors and generate code. Often thrown away at end of compilation, but may be held for error reporting or if names generated dynamically.

Example:
   Identifier  Type  Scope  ...
1  y           real  1      ...
2  x1          real  2      ...
3  r           real  1      ...
         

Like to have easily portable compilers: front-end vs. back-end

Front-end: generates intermediate code and does some peep-hole optimization.

Back-end: generates real code and does more optimization.

Semantics

Meaning of a program (once we know it is syntactically correct). Work with a virtual (or abstract) machine when discussing semantics of programming language constructs. Run a program by loading it into memory and initializing the instruction pointer (ip) to the beginning of the program.

Official language definitions: Standardize syntax and semantics - promote portability.

Often better to standardize after experience. -- Ada standardized before a real implementation.

Common Lisp, Scheme, ML and Fortran '9x are now standardized.

Good formal description of syntax, semantics still hard.

Backus, in the Algol 60 Report, promised formal semantics.