CSCI 431 Lecture Notes - Introduction to CSCI 431

What is a Programming Language?

Features that a programming language should have:

universal - it can compute anything that can be computed (too strict?)
implementable - it can be implemented (obvious)
efficient - it should be possible to efficiently encode algorithms (a factor of 2-3 times slower matters a great deal!)
syntax (form/structure) and semantics (meaning)
Over the past few decades, thousands of programming languages have been designed, but programming language design is by no means a dead area. In only the past few years, a number of new languages have been developed and have become prominent, including:
- Perl
- HTML
- Java
- AMPL
- Active VRML

Examples

Each of the new languages listed above meets some specialized goal. For instance, AMPL is designed for expressing mathematics; HTML is a mark-up language for hypertext documents; and Active VRML, which was derived from the CAML family of languages, has been designed by Microsoft to enable transmission of active virtual reality scenarios across networks.

a few different forms of the expression x + 1

 x + 1          C, Pascal, Ada, Algol
 $x + 1         Perl
 x 1 add        Forth, Postscript
 (+ x 1)        Lisp, Scheme
 add $1 $2 $3   Assembly language

What is this Subject About?

the computer does exactly what it is told to do
programming is the art of precisely describing or commanding the computer to do something
it is very difficult to ``describe'' or ``command'' something precisely in English
programming languages are languages that the computer knows and are restricted so as to allow concise and exact specifications of tasks
there are currently over 3000 computer languages; we cannot study each one, so we must study what they share in common - their principles
once the principles are understood, it becomes simpler to understand new languages

Why do we need to understand programming languages?

Technical jobs in computer science will inevitably involve programming, which requires us to understand the languages we use.
We might be called upon to choose a programming language for some project. Making such a selection involves several issues, including technological, sociological (such as training programmers) and economic (such as re-use of existing code and programming environments) considerations. Of these, we will focus mostly on technological considerations.
Finally, we might be in a position to build a new language. To be successful, we must understand past efforts, current needs and key technological ideas.

What does it mean to understand a programming language?

Let us illustrate the problem via an example. Consider the following statement:

set x[i] to x[i] + 1

This is clearly intended to denote the increment of an array element. How would we translate this statement to a variety of different languages, and what would it mean?
In C (circa 1972), we would write this as
x[i] = x[i] + 1;
This performs a hardware lookup for the address of x and adds i to it. The addition is a hardware operation, so it is dependent upon the hardware in question. This resulting address is then referenced (if it's legal - which it might not be), 1 is added to the bit-string stored there (again, as a hardware addition, which can overflow), and the result is stored back to that location. However, no attempt has been made to determine that x is even a vector and that x[i] is a number.
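
The unchecked behavior described above can be imitated with a toy model. The sketch below is in Python, not C, and everything in it is an assumption made for illustration: memory is a flat list of word-sized cells, and the names `MEMORY`, `X_BASE`, `Y_ADDR` and `c_style_increment` are hypothetical. In real C an out-of-bounds access has undefined behavior; the sketch shows only one plausible outcome, where a neighboring variable is silently changed.

```python
# Toy model: memory is a flat list of word-sized cells (an assumption of this sketch).
MEMORY = [0] * 8
X_BASE = 0   # hypothetical layout: the array x occupies cells 0..3
Y_ADDR = 4   # an unrelated variable y sits right after x


def c_style_increment(base, i):
    """Mimic x[i] = x[i] + 1 with no type check and no bounds check."""
    address = base + i                      # plain address arithmetic
    MEMORY[address] = MEMORY[address] + 1   # read, add one, store back


c_style_increment(X_BASE, 2)   # in bounds: updates x[2]
c_style_increment(X_BASE, 4)   # out of bounds for x: silently updates y instead
```

After both calls, the cell belonging to y has changed even though the program only ever mentioned x.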

In Scheme (1975), this would be transcribed as
(vector-set! x i (+ (vector-ref x i) 1))
This does all the things the corresponding C operation does, but in addition it also (a) checks that the object named x is indeed an array, (b) makes sure i is within the bounds of the array, (c) ensures the dereferenced location contains a number, and (d) performs abstract arithmetic (so there will be no ``overflow'').
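
The four checks (a) to (d) can be written out explicitly. This is a rough Python sketch of the idea, not how any Scheme implementation actually works; the function name `checked_increment` is hypothetical, and a Python list stands in for a Scheme vector. Python integers, like Scheme's, grow as needed, so step (d) needs no extra code.

```python
from numbers import Number


def checked_increment(x, i):
    """Mimic (vector-set! x i (+ (vector-ref x i) 1)) with explicit checks."""
    # (a) x must really be a vector (a Python list stands in for one here)
    if not isinstance(x, list):
        raise TypeError("x is not a vector")
    # (b) i must be within the bounds of the vector
    if not isinstance(i, int) or not 0 <= i < len(x):
        raise IndexError("index out of bounds")
    # (c) the element must be a number (bool is excluded on purpose)
    if isinstance(x[i], bool) or not isinstance(x[i], Number):
        raise TypeError("element is not a number")
    # (d) abstract arithmetic: Python ints do not overflow
    x[i] = x[i] + 1
```

Each failed check raises an error instead of quietly reading or writing the wrong location.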

Finally, in Java (circa 1991), one might write
x[i] = x[i] + 1;
which looks identical to the C code. However, the actions performed are those performed by the Scheme code, with one major difference: the arithmetic is not as abstract. It is defined to be done as if the machine were a 32-bit machine, which means we can always determine the result of an operation, no matter which machine we execute the program on, but we cannot have our numbers grow arbitrarily large.
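
To see what "as if the machine were a 32-bit machine" means for addition, the sketch below imitates Java's `int` addition in Python. The helper name `java_int_add` is hypothetical. Python's own integers are unbounded, much like the abstract arithmetic in the Scheme version, so the wrap-around has to be simulated by masking to 32 bits.

```python
INT_MAX = 2**31 - 1   # largest value of a 32-bit signed integer


def java_int_add(a, b):
    """Add two values and wrap the result to a signed 32-bit integer."""
    result = (a + b) & 0xFFFFFFFF            # keep only the low 32 bits
    if result >= 0x80000000:                 # reinterpret the top bit as the sign
        result -= 0x100000000
    return result


wrapped = java_int_add(INT_MAX, 1)   # wraps around to the most negative value
unbounded = INT_MAX + 1              # plain Python arithmetic keeps growing
```

The wrapped result is the same on every machine, which is the predictability the notes describe, but it is not the mathematical sum.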

What do we need to know to program in a language?

There are three crucial components to any language. The syntax of the language is a way of specifying what is legal in the phrase structure of the language; knowing the syntax is analogous to knowing how to spell and form sentences in a natural language like English. However, this doesn't tell us anything about what the sentences mean.
The syntax of a language can be expressed as a grammar written in a notation such as BNF (Backus-Naur Form).
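
As a small illustration, here is a hypothetical two-rule toy grammar in BNF (it appears in the comment at the top of the block) together with a Python sketch that checks whether a string is legal under it. The grammar and the names `tokenize`, `is_expr` and `is_legal` are invented for this example. Note that the checker says nothing about what a legal string means.

```python
import re

# Hypothetical toy grammar in BNF:
#   <expr> ::= <term> | <term> "+" <expr>
#   <term> ::= <ident> | <number>
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|\+)")


def tokenize(text):
    """Split text into identifiers, numbers and '+' signs."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_expr(tokens):
    """Return True if the token list can be derived from <expr>."""
    if not tokens or tokens[0] == "+":       # must start with a <term>
        return False
    if len(tokens) == 1:                     # <expr> ::= <term>
        return True
    return tokens[1] == "+" and is_expr(tokens[2:])   # <term> "+" <expr>


def is_legal(text):
    try:
        return is_expr(tokenize(text))
    except SyntaxError:
        return False
```

For example, `is_legal("x + 1")` holds, while `is_legal("x +")` and `is_legal("x 1")` do not.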

The second component is the meaning, or semantics, of a program in that language. Ultimately, without a semantics, a programming language is just a collection of meaningless phrases; hence, the semantics is the crucial part of a language.
There are three ways of expressing the semantics of a programming language mentioned in your book.
- Denotational semantics tells what is computed by giving a mathematical object (typically a function) that is the meaning of the program. Denotational semantics is used in comparative studies of programming languages.
- Axiomatic semantics defines the meaning of the program implicitly. It makes assertions about relationships that hold at each point in the execution of the program. Axioms define the properties of the control structures and state the properties that may be inferred. A property of a program is deduced by using the axioms. Each program has a pre-condition, which describes the initial conditions required by the program prior to execution, and a post-condition, which describes the desired program property upon termination of the program.
- Operational semantics tells how a computation is performed by defining how to simulate the execution of the program. Operational semantics may describe the syntactic transformations which mimic the execution of the program on an abstract machine or define a translation of the program into recursive functions. Operational semantics are used when learning a programming language and by compiler writers.
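
The three styles can be contrasted on a very small example. The sketch below is in Python and covers only a hypothetical toy language of numbers, variables and addition, written as nested tuples; all names (`denote`, `step`, `run`, `increment_with_contract`) are invented for illustration. The axiomatic part only checks a pre-condition and post-condition at run time for one execution, whereas real axiomatic semantics proves them for all executions.

```python
def denote(expr):
    """Denotational view: the meaning of expr is a function from environments to ints."""
    kind = expr[0]
    if kind == "num":
        return lambda env: expr[1]
    if kind == "var":
        return lambda env: env[expr[1]]
    if kind == "add":
        left, right = denote(expr[1]), denote(expr[2])
        return lambda env: left(env) + right(env)
    raise ValueError(kind)


def step(expr, env):
    """Operational view: perform one rewrite step, as an abstract machine would."""
    kind = expr[0]
    if kind == "var":
        return ("num", env[expr[1]])
    if kind == "add":
        left, right = expr[1], expr[2]
        if left[0] != "num":
            return ("add", step(left, env), right)
        if right[0] != "num":
            return ("add", left, step(right, env))
        return ("num", left[1] + right[1])
    raise ValueError(f"cannot step {expr!r}")


def run(expr, env):
    """Repeat step until a number is reached; return every intermediate form."""
    trace = [expr]
    while expr[0] != "num":
        expr = step(expr, env)
        trace.append(expr)
    return trace


def increment_with_contract(state):
    """Axiomatic view: {x == n} x := x + 1 {x == n + 1}, checked for one run."""
    n = state["x"]                           # the pre-condition names the initial value
    new_state = dict(state, x=state["x"] + 1)
    assert new_state["x"] == n + 1           # the post-condition
    return new_state


program = ("add", ("var", "x"), ("num", 1))  # the expression x + 1
env = {"x": 4}
```

Here `denote(program)` yields a function that can be applied to any environment, while `run(program, env)` lists the rewriting steps that lead to the same value.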

Finally, as with natural languages, every programming language has certain idioms that a programmer needs to know to use the language effectively. This is sometimes referred to as the pragmatics of the language. Idioms are usually acquired through practice and experience, though research over the past few decades has led to a better understanding of these issues.

Computer Language Levels

Low-level - close to the machine instruction set (e.g. machine language, assembly language)
High-level - generally English-like in nature, these make it easier for humans to program (e.g. C++, Pascal)
Very high-level - give a general idea of what the computer should do and let it do it (e.g. Lisp, Miranda)

Implementation Methods

In theory it is possible to construct a hardware computer that directly executes programs written in any particular programming language. But practical considerations favor computers with low-level machine languages, on the basis of speed, flexibility and cost.

The solution:

compiled languages
interpreted languages
hybrid systems

Translation (compilation)

Translate from the high-level language to the host computer machine language.

compiler
assembler
linker or loader
preprocessor

Example of translation

C source code:

void initialize(int U[], int size, int init)
{
   int j;

   for (j=0; j < size; ++j)
       U[j]=init;
}

corresponding assembly language code:

 #      1 void initialize(int U[], int size, int init)
				; $16 holds U
				; $17 holds size
				; $18 holds init
initialize:
	sextl	$17, $17	; sign-extend $17 to 64 bits
	sextl	$18, $18	; sign-extend $18 to 64 bits

 #      3    int j;
 #      4 
 #      5    for (j=0; j < size; ++j)
	ble	$17, L$5       ; if $17 leq 0, go to L$5

	clr	$1		; $1 holds j, $1 =  0

L$6:
	addl	$1, 1, $1	; add one to $1

 #      6        U[j]=init;
	stl	$18, ($16)	; store $18 in the location that $16 holds

	cmplt	$1, $17, $3	; $3 = $1 lt $17

	lda	$16, 4($16)	; increment $16 to point to next element
	bne	$3, L$6		; branch if j lt size

L$5:

	ret	($26)		; $26 holds return address
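
It can help to read the listing as a program in its own right. The Python sketch below mirrors it instruction by instruction, using hypothetical local names `r1`, `r3`, `r16`, `r17` and `r18` for the registers and a flat list for memory, where one list cell stands for one 4-byte int (an assumption of the sketch). Notice that the compiled code never computes U[j] from j; it keeps a pointer and advances it.

```python
def initialize_like_asm(memory, u_addr, size, init):
    """Mirror the assembly listing: r16 = address of U, r17 = size, r18 = init."""
    r16, r17, r18 = u_addr, size, init
    if r17 <= 0:             # ble   $17, L$5    skip the loop if size <= 0
        return               # L$5:  ret
    r1 = 0                   # clr   $1          j = 0
    while True:              # L$6:
        r1 += 1              # addl  $1, 1, $1   j is only used as a counter
        memory[r16] = r18    # stl   $18, ($16)  store init where r16 points
        r3 = r1 < r17        # cmplt $1, $17, $3
        r16 += 1             # lda   $16, 4($16) one cell here stands for 4 bytes
        if not r3:           # bne   $3, L$6     loop while j < size
            break
```

Calling it on a list whose middle cells play the role of U fills exactly `size` cells starting at `u_addr` and leaves the rest alone.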

Interpretation (software simulation)

Simulate a computer whose machine language is the high-level language
Done by constructing a set of programs in the host computer machine language that represent the algorithms necessary for execution of programs in the high-level language

Hybrid Systems

translate from a high-level language to some intermediate language designed to allow easy interpretation
faster than pure interpretation and provides portability
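
A hybrid system can be sketched in a few lines. In the Python sketch below, a hypothetical `compile_expr` translates a nested-tuple expression into a flat list of instructions for an invented stack machine (`PUSH`, `LOAD`, `ADD`), and `run_bytecode` interprets that list. The instruction set and all names are assumptions made for illustration; real intermediate languages are far richer.

```python
def compile_expr(expr):
    """Translate a nested-tuple expression into stack-machine instructions."""
    if isinstance(expr, tuple):
        op, left, right = expr
        if op != "+":
            raise ValueError(f"unsupported operator: {op}")
        # operands first, then the operation that consumes them
        return compile_expr(left) + compile_expr(right) + [("ADD",)]
    if isinstance(expr, str):
        return [("LOAD", expr)]      # variable reference
    return [("PUSH", expr)]          # numeric constant


def run_bytecode(code, env):
    """Interpret the intermediate code with a simple operand stack."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])
        elif instr[0] == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown instruction: {instr!r}")
    return stack.pop()


code = compile_expr(("+", "x", 1))   # translate once
```

The translation happens once, and the resulting `code` can then be interpreted many times with different environments, on any machine that has the interpreter.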

Computational Paradigms

The focus of this course is general purpose, high-level languages.

Machine-readable - the language is unambiguous, can be translated by a finite algorithm, is not too complex, and has a context-free grammar

there are different paradigms of programming languages (similar to different phyla in biology)

Imperative Languages
- Procedural languages
- Object-Oriented Languages
- Parallel processing languages
Declarative Languages
- Logic Programming languages
- Functional Programming languages
- Database Languages

Imperative Languages

sequential execution
variables represent memory locations
assignment used to change value of variables

Procedural Languages

Nested blocks
Procedures
Scoping rules for variables
Recursion

Object-Oriented Languages

object is collection of memory locations and operations that can change values at them
in contrast to functional programming, the focus is on data structures rather than on functions
computation is interaction and communication between objects
each object behaves like a computer - own memory and operations
classes defined using structured declarations
objects are created as instances of a class
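
A minimal Python sketch of these points follows; the class name `Counter` and its methods are hypothetical. Each instance carries its own memory (`_count`) and is changed only through its own operations.

```python
class Counter:
    """A class: a declaration of the memory and operations each object will have."""

    def __init__(self, start=0):
        self._count = start      # memory owned by this particular object

    def increment(self):
        self._count += 1         # an operation that changes the object's memory

    def value(self):
        return self._count


a = Counter()        # objects are created as instances of the class
b = Counter(10)
a.increment()
a.increment()
b.increment()
```

The two objects do not interfere with each other: incrementing `a` leaves `b` untouched.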

Parallel Processing Languages

shared memory
message passing
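
The two styles can be compared on the same small task. The Python sketch below uses the standard `threading` and `queue` modules; the function names are hypothetical. In the shared-memory version the workers update one variable under a lock, and in the message-passing version each worker sends its partial result through a queue and shares nothing else.

```python
import queue
import threading


def shared_memory_sum(n_workers, increments_each):
    """Workers communicate by updating one shared variable, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments_each):
            with lock:               # only one worker may update at a time
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(n_workers, increments_each):
    """Workers communicate only by sending messages through a queue."""
    mailbox = queue.Queue()

    def worker():
        local = sum(1 for _ in range(increments_each))   # private work
        mailbox.put(local)                               # send the result

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in range(n_workers))
```

Both functions compute the same total; they differ in how the workers coordinate.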

Functional Languages

function theory
parameter passing
returned values
no notion of mutable state (values are not updated in place)
no looping (uses recursion)
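
The contrast with the imperative style (sequential execution and assignment to variables) shows up even in a tiny task such as summing a sequence. Both functions below are Python sketches with hypothetical names; the second one uses no assignment and no loop, only parameter passing, returned values and recursion.

```python
def total_imperative(values):
    """Imperative style: a variable is repeatedly reassigned inside a loop."""
    total = 0
    for v in values:
        total = total + v
    return total


def total_functional(values):
    """Functional style: no assignment, no loop; the result is built by recursion."""
    if not values:
        return 0
    return values[0] + total_functional(values[1:])
```

The two functions return the same result for any finite sequence of numbers (very long inputs would hit Python's recursion limit in the second version).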

Logic Programming Languages

based on symbolic logic
no loops or other control structures
describe what is true about desired result
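
Python is not a logic programming language, but the flavor of "describe what is true" can be imitated. In the sketch below the facts and the names in them are hypothetical; the rule says that X is a grandparent of Z if X is a parent of some Y and Y is a parent of Z. A real logic language such as Prolog would find the answers by its own search procedure rather than by a set comprehension.

```python
# Facts: parent(alice, bob), parent(bob, carol), parent(bob, dave)
PARENT = {("alice", "bob"), ("bob", "carol"), ("bob", "dave")}


def grandparents():
    """Rule: grandparent(X, Z) holds if parent(X, Y) and parent(Y, Z) for some Y."""
    return {
        (x, z)
        for (x, y1) in PARENT
        for (y2, z) in PARENT
        if y1 == y2
    }
```

The function states the condition that answers must satisfy and leaves the order of the search unspecified.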

Language Criteria

How to evaluate the features of the various languages

readability
orthogonality
well-defined descriptions (syntax considerations)
provability (semantics considerations)
reliability
efficiency (cost)
expressibility (writability)
extensibility