CSCI 431 Lecture Notes - Abstraction

Abstraction

Abstraction is the representation of an entity that includes only the attributes of significance in a particular context.
The purpose of abstraction is to simplify the problem solving and programming process.
There are 2 fundamental kinds of abstraction in contemporary programming: process abstraction and data abstraction.
Historically, data abstraction followed process abstraction; together they let the programmer define new data types and the operations on those types.

Process abstraction

Subprograms are process abstractions
A subprogram is an abstract operation defined by the programmer
Subprograms offer a way to provide a computational process while hiding the details of how it is done
They are the basic building blocks out of which most programs are constructed

Encapsulation

Dividing a program into groups of logically related subprograms and data makes a large program more understandable; such groups are called modules
Each module performs a limited set of operations on a limited amount of data
When modules can be compiled separately, a change to one module requires recompiling only that module, which makes development more efficient
An encapsulation is a grouping of subprograms and the data that they manipulate which is independently compilable
Information hiding is enforced by encapsulation
The idea behind information hiding is:
- As much information as possible is hidden from the user
- The user is not permitted to directly manipulate the hidden information
Encapsulation is important in permitting easy modification of a program
Another important advantage of encapsulation is that it facilitates code reuse
Examples of encapsulation mechanisms (although not always complete encapsulation mechanisms):
- The COMMON block in Fortran
- Nested blocks in Algol-60 like languages
- Ada packages
- C++ classes
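As a minimal sketch of information hiding in C (not one of the mechanisms listed above), a file-scope `static` variable can only be reached through the operations its file exports. The module and the names `counter_next` and `counter_reset` are hypothetical and exist only for illustration.

```c
/* counter.c: a hypothetical module; all names are illustrative only. */

/* Hidden data: 'static' at file scope gives internal linkage,
   so no other compilation unit can name this variable. */
static int count = 0;

/* Visible operation: return the next value in the sequence. */
int counter_next(void)
{
    count = count + 1;
    return count;
}

/* Visible operation: start the sequence over. */
void counter_reset(void)
{
    count = 0;
}
```

Because users of the module never see `count`, its representation could be changed without touching any calling code.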

Data abstraction

In some programming languages subprograms are not compilation units
Often subprograms do not permit complete encapsulation of the data
For these reasons they are not adequate encapsulation structures
Newer languages provide better facilities for specifying and implementing entire Abstract Data Types (ADTs)
An ADT is defined as having
- a set of data objects
- a set of abstract operations on those data objects
- encapsulation so that the implementation process is unavailable to the user
Examples: packages in Ada, classes in C++ and Java

An Example

Consider implementing a stack as an ADT

Formulate the abstraction:
- Properties of the stack data type
  - a container that holds data elements of a particular type
  - it has a top element but no notion of a bottom (the bottom cannot be accessed)
  - it can be empty or full
- Operations
  - create a new stack
  - push an element on the stack
  - pop an element from the stack
  - is empty
  - is full

Implementation in Ada

Encapsulation using packages
- specification package
- body package
Information hiding using private types and limited private types
Are the implementation details hidden?
Can stacks be used without knowledge of the implementation?

The specification package:


package STACKPACK is
-- The visible entities, or the public interface
   type STACKTYPE is limited private;
   MAX_SIZE : constant := 100;
   function EMPTY (STK : in STACKTYPE) return BOOLEAN;
   procedure PUSH (STK : in out STACKTYPE; ELEMENT : in INTEGER);
   procedure POP (STK : in out STACKTYPE);
   function TOP (STK : in STACKTYPE) return INTEGER;
-- The part that is hidden
   private
      type LIST_TYPE is array (1..MAX_SIZE) of INTEGER;
      type STACKTYPE is
         record
            LIST : LIST_TYPE;
            TOPSUB : INTEGER range 0..MAX_SIZE := 0;
         end record;
end STACKPACK;

The body package:


with TEXT_IO; use TEXT_IO;
package body STACKPACK is
   function EMPTY(STK : in STACKTYPE) return BOOLEAN is
      begin
      return STK.TOPSUB = 0;
      end EMPTY;

   procedure PUSH (STK : in out STACKTYPE; ELEMENT : in INTEGER) is
      begin
      if STK.TOPSUB >= MAX_SIZE then
         PUT_LINE("Error - Stack Overflow");
      else
         STK.TOPSUB := STK.TOPSUB + 1;
         STK.LIST(STK.TOPSUB) := ELEMENT;
      end if;
      end PUSH;

   procedure POP(STK : in out STACKTYPE) is
      begin
      if STK.TOPSUB = 0
         then PUT_LINE("Error - Stack Underflow"); 
	 else STK.TOPSUB := STK.TOPSUB - 1;
      end if;
      end POP;
 
   function TOP(STK : in STACKTYPE) return INTEGER is
      begin
      if STK.TOPSUB = 0 then
         PUT_LINE("Error - Stack is Empty");
         return 0;  -- placeholder so that the function always returns a value
      else
         return STK.LIST(STK.TOPSUB);
      end if;
      end TOP;
end STACKPACK;

A driver program


with STACKPACK, TEXT_IO; 
use STACKPACK, TEXT_IO;
procedure USE_STACKS is
   TOPONE : INTEGER;
   STACK : STACKTYPE; -- Creates a new STACKTYPE object
   begin
   PUSH(STACK, 42);
   PUSH(STACK, 17);
   TOPONE := TOP(STACK);
   POP(STACK);
   ...
   end USE_STACKS;
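For comparison, here is a rough C analogue of the package, offered as a sketch rather than as part of the original example. C has no private types, so the hiding relies on an incomplete type: a header file would show only the first part, and the struct definition would live in the implementation file. The `stack_*` names are assumptions, and unlike the Ada version this sketch reports errors through return values instead of printing messages.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_SIZE 100

/* --- what a header file would show: the public interface --- */
typedef struct stack Stack;                   /* incomplete type: layout hidden */
Stack *stack_create(void);
void   stack_destroy(Stack *stk);
bool   stack_empty(const Stack *stk);
bool   stack_push(Stack *stk, int element);   /* false on overflow  */
bool   stack_pop(Stack *stk);                 /* false on underflow */
bool   stack_top(const Stack *stk, int *out); /* false if empty     */

/* --- what only the implementation file would contain --- */
struct stack {
    int list[MAX_SIZE];
    int topsub;                               /* number of elements in use */
};

Stack *stack_create(void)
{
    Stack *stk = malloc(sizeof *stk);
    if (stk != NULL)
        stk->topsub = 0;
    return stk;
}

void stack_destroy(Stack *stk) { free(stk); }

bool stack_empty(const Stack *stk) { return stk->topsub == 0; }

bool stack_push(Stack *stk, int element)
{
    if (stk->topsub >= MAX_SIZE)
        return false;
    stk->list[stk->topsub++] = element;
    return true;
}

bool stack_pop(Stack *stk)
{
    if (stk->topsub == 0)
        return false;
    stk->topsub--;
    return true;
}

bool stack_top(const Stack *stk, int *out)
{
    if (stk->topsub == 0)
        return false;
    *out = stk->list[stk->topsub - 1];
    return true;
}
```

A client that sees only the interface part can create, push, pop and inspect stacks, but cannot touch `list` or `topsub` directly, which answers the two questions posed above for this version as well.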

Procedural abstraction

Two components: specification and implementation
A subprogram represents a mathematical function that maps each particular set of arguments into a particular set of results.
If a subprogram returns a single data object as a result it is typically called a function
A formal specification of a procedure can be formulated in terms of an abstract model. We will discuss the specification of a procedure using axiomatic semantics later.
The specification of a procedure in a programming language includes the following:
- the name of the procedure
- the signature (also called the prototype) giving the number of parameters, their order and data types, and the number of results, their order and data types
- the actions performed by the subprogram
- In addition, some languages include a keyword in the declaration such as procedure or function
- Examples:
  - In Pascal:
    function Fn(X: real; Y:integer): real;
  - In C:
    void Sub(float X, int Y, float *Z, int *W);
  - In Ada:
    procedure Sub(X: in real; Y: in integer; Z: in out real; W: out boolean);

Parameters and parameter transmission

Parameters offer a way of sharing data; they are an alternative to the use of non-local environments
The term actual-parameter refers to the calling argument and formal-parameter refers to the local data object belonging to the subprogram
Correspondence between actual parameters (calling arguments) and formal parameters can be established in either of two ways:
- based on position
- based on name, as in Ada:
  Sub(Y => B, X => 27);
Most languages use positional correspondence exclusively

Methods for transmitting parameters

The most common methods for parameter transmission are:
- call by value
- call by reference
- call by name
- call by value-result
- call by result
- call by constant value

Call by value

The actual parameter is copied into the location identified with the name of the formal parameter
Any changes made to the formal parameter during the execution of the subprogram are lost when the subprogram terminates.
Example in Ada:
procedure p(x:in integer);
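C passes all arguments by value, so the loss of changes described above can be seen directly. This is a small sketch; the function names are invented for illustration.

```c
/* The formal parameter x is a copy of the caller's argument. */
int add_ten(int x)
{
    x = x + 10;      /* changes only the local copy */
    return x;
}

/* Hypothetical caller: returns the value of a after the call. */
int caller_value_after_call(void)
{
    int a = 5;
    (void)add_ten(a);
    return a;        /* still 5: the change to x was lost */
}
```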

Call by reference

Perhaps the most common parameter transmission method
A pointer to the location of the data object is made available to the subprogram. (The data object does not change positions in memory, rather the formal parameter refers to the same memory location as the actual parameter.)
Changes made to the formal parameter affect the actual parameter as well
Example in Pascal:
procedure p(var x: integer);
Example in C, where the caller passes a pointer explicitly:
void p(int *x);
or, in C++, with a reference parameter:
void p(int &x);
Call by reference creates aliases which can lead to problems
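The aliasing problem can be illustrated with a short C sketch in which pointers stand in for reference parameters. The function names are hypothetical. `add_twice` looks as if it adds twice the source value to the destination, which holds only when the two parameters refer to different objects.

```c
/* Call by reference in C: the caller passes an address. */
void increment(int *x)
{
    *x = *x + 1;     /* changes the caller's variable */
}

/* Appears to compute dst := dst + 2 * src. */
void add_twice(int *dst, const int *src)
{
    *dst += *src;
    *dst += *src;    /* if src aliases dst, *src has already changed */
}

/* Distinct actual parameters: 1 + 2 * 1 gives 3. */
int demo_no_alias(void)
{
    int x = 1, y = 1;
    add_twice(&x, &y);
    return x;
}

/* The same variable passed twice: 1 -> 2 -> 4, not 3. */
int demo_alias(void)
{
    int x = 1;
    add_twice(&x, &x);
    return x;
}
```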

Call by name

The actual parameter is substituted everywhere for the formal parameter in the body of the subprogram before execution of the subprogram
The basic technique for implementation is to treat each actual parameter as a simple (parameterless) subprogram.
These parameterless subprograms are traditionally called thunks.
When a formal parameter (specified as call by name) is referenced, the corresponding thunk is executed resulting in the evaluation of the actual parameter in the proper referencing environment
If the actual parameter is a scalar variable, then call by name is equivalent to call by reference
If the actual parameter is a constant, then call by name is equivalent to call by value
If the actual parameter is an array element then call by name may be different from any other mechanism:
What happens when the actual parameters for the subprogram below are:
array[i],i
```
procedure increment(name x: real; name y: integer);
begin
   y := y + 2;
   x := x + 5.5;
end procedure
```
If the actual parameter is an expression that contains a variable, call by name is again different from other methods
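The array-element question above can be explored with a small C sketch. C has no call by name, so the sketch simulates it by hand: each actual parameter becomes a thunk that returns the address of the object the expression currently denotes, and every use of a formal parameter becomes a thunk call. The names `thunk_x`, `thunk_y` and `run_example` are invented for illustration, and the starting value `i = 1` is an assumption.

```c
/* Caller's environment: the thunks evaluate the actual
   parameters here each time a formal parameter is used. */
double array[8];
int i;

/* Thunk for the actual parameter array[i]: re-reads i on every use. */
double *thunk_x(void) { return &array[i]; }

/* Thunk for the actual parameter i. */
int *thunk_y(void) { return &i; }

/* increment(name x, name y) with each use of x and y
   replaced by a call to the corresponding thunk. */
void increment(double *(*x)(void), int *(*y)(void))
{
    *y() = *y() + 2;       /* y := y + 2   */
    *x() = *x() + 5.5;     /* x := x + 5.5 */
}

/* Hypothetical driver: the call increment(array[i], i) with i = 1. */
void run_example(void)
{
    for (int k = 0; k < 8; ++k)
        array[k] = 0.0;
    i = 1;
    increment(thunk_x, thunk_y);
}
```

After `run_example()`, `i` is 3 and the 5.5 has been added to `array[3]`, because `x` is re-evaluated after `i` changed. Under call by reference the address of `array[1]` would have been fixed at the call, so `array[1]` would have been updated instead.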

Call by result

This is used to transmit only the result back from a subprogram
The initial value of the actual parameter makes no difference and can't be used by the subprogram
The formal parameter is a local variable with no initial value
When the subprogram terminates the final value of the formal parameter is assigned as the new value of the actual parameter
Example in Ada:
procedure p(x:out integer);
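C has no direct counterpart to an out parameter, but the copy-out behaviour can be emulated by hand, as in this sketch. The function name is hypothetical, and the pointer is used only to perform the final copy.

```c
/* Call by result, emulated: the formal parameter is an uninitialised
   local, and its final value is copied out when the subprogram ends. */
void get_answer(int *actual)
{
    int x;           /* formal parameter: local, no initial value */
    x = 6 * 7;       /* must be assigned before it is read */
    *actual = x;     /* copy-out on termination */
}
```

Whatever the caller's variable held before the call is never read; it is simply overwritten with 42 on return.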

Call by value-result

Call by value-result is in effect a combination of call-by-value and call-by-result.
The value of the actual parameter is used to initialize the formal parameter which acts like a local variable.
When the subprogram terminates the value of the formal parameter is copied back to the actual parameter.
Sometimes called "pass-by-copy"
Example in Ada:
procedure p(x:in out integer);
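Call by value-result and call by reference usually give the same answer, but they can differ when the same variable is passed for two parameters. The following C sketch emulates both for a subprogram whose body is x := x + 1; y := y + 1. The function names are invented for illustration.

```c
/* Call by reference: x and y are the caller's objects. */
void bump_ref(int *x, int *y)
{
    *x = *x + 1;
    *y = *y + 1;
}

/* Call by value-result, emulated with copy-in and copy-out. */
void bump_value_result(int *actual_x, int *actual_y)
{
    int x = *actual_x;   /* copy-in */
    int y = *actual_y;
    x = x + 1;
    y = y + 1;
    *actual_x = x;       /* copy-out */
    *actual_y = y;
}
```

With `a` equal to 10, `bump_ref(&a, &a)` leaves 12 in `a`, while `bump_value_result(&a, &a)` leaves 11, since each local copy was incremented only once.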

Call by constant value

In this case, no change in the formal parameter is allowed during execution of the subprogram---the formal parameter acts as a local constant
The actual parameter establishes the starting value of the formal parameter
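In C a similar effect can be approximated by declaring the formal parameter `const`, as in this sketch (the function name is hypothetical).

```c
/* The formal parameter is initialised from the actual parameter
   and cannot be assigned to inside the subprogram. */
int square_plus_one(const int x)
{
    /* x = x + 1;   would be rejected by the compiler */
    return x * x + 1;
}
```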

Question:

What happens when you pass an expression by reference, such as in the call sub(&(a+b), &b); ?

Subprograms as parameters

In many languages, a subprogram may be transmitted as an actual parameter
Example in Pascal:
procedure sub(x: integer; function R(y,z: integer): integer);
The procedure sub may be called with a function name as its second argument, e.g.: sub(27, fun1);
Within sub the function fun1 may be invoked using the formal parameter:
R(2,4);
There are two major problems associated with subprogram parameters:
- static type checking
- free variables (variables with no local binding)
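A C counterpart of the Pascal example uses a function pointer for the subprogram parameter. This is a sketch under two assumptions: `sub` returns a value here so that its effect is visible, and the body of `fun1` is made up. Because the parameter's type spells out the argument and result types of R, the compiler can check a call such as sub(27, fun1) statically.

```c
/* Counterpart of:
   procedure sub(x: integer; function R(y,z: integer): integer); */
int sub(int x, int (*R)(int y, int z))
{
    return x + R(2, 4);   /* invoke whatever function was passed */
}

/* Hypothetical function to pass as the actual parameter. */
int fun1(int y, int z)
{
    return y * z;
}
```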

Free variables in subprograms passed as parameters

Suppose that a subprogram fun that contains a non-local reference is passed as a parameter from a calling program P to a called program Q.
What environment should be used to establish the value of the non-local variable when fun is called?
The rule: A non-local reference should mean the same thing during execution of the subprogram passed as a parameter as it would if the subprogram were invoked at the point where it appears as an actual parameter
To implement this rule the static pointer for the subprogram is part of the information transmitted with a subprogram parameter
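Standard C has no nested subprograms, so the rule cannot be shown directly, but the idea of transmitting an environment along with the code can be sketched by hand. In this sketch the `struct closure` type and the variable `offset` are invented for illustration; the `env` field plays the role of the static pointer.

```c
/* A subprogram parameter is passed as a pair: the code to run
   and the environment for its non-local references. */
struct closure {
    int (*code)(void *env, int arg);
    void *env;                 /* plays the role of the static pointer */
};

/* fun reaches the non-local variable 'offset' through its environment. */
static int fun(void *env, int arg)
{
    int *offset = env;
    return arg + *offset;
}

/* Q receives the subprogram parameter. Its own local named
   'offset' must not be the one that fun sees. */
int Q(struct closure f)
{
    int offset = 1000;         /* unrelated local with the same name */
    (void)offset;
    return f.code(f.env, 1);
}

/* P owns the variable that fun's non-local reference denotes. */
int P(void)
{
    int offset = 5;
    struct closure f = { fun, &offset };
    return Q(f);
}
```

`P()` yields 6: the reference to `offset` inside `fun` is resolved in the environment where `fun` was passed as an actual parameter, not in Q.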

Implementation Issues

Remember that with call-by-value, values are passed to the subprogram, and, with call-by-reference, addresses are passed to the subprogram.
In general, parameter transmission is handled using the run-time stack as discussed during the last class.
While most modern architectures use a run-time stack similar to that discussed in class there are variations particularly in RISC architectures.
Below we will look at the details of three current architectures:
- An Intel PC running Windows 95
- DEC alpha workstation running Digital UNIX
- Sun SPARC workstation running Solaris

Example program


#include <stdio.h>

int avg8(int a, int b, int c, int d,
		 int e, int f, int g, int h)
{
	return((a+b+c+d+e+f+g+h+4)/8) ;
}

int avg3(int a, int b, int c)
{
	return((a+b+c+2)/3) ;
}

int main(void)
{
	int i, v[8] ;

	for (i=0; i < 8; ++i)
		scanf("%d", &v[i]) ;

	printf("%d %d\n",
		avg8(v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7]),
		avg3(v[0], v[1], v[2])) ;

}

Calling conventions in Windows 95 on an Intel PC

The Intel architecture has a limited ("sparse") register set, so the calling convention is a classical stack-based approach. Arguments are pushed onto the stack in right-to-left order, so the first argument is the last one pushed.
In the call of avg3, the variables v[0], v[1], v[2] are moved into registers (with the mov instruction) and are then pushed onto the stack by the calling procedure. Notice the order of placement -- v[2], v[1], and finally v[0].
```
	mov	eax, DWORD PTR _v$[ebp+8]	;; load v[2]
	push	eax
	mov	ecx, DWORD PTR _v$[ebp+4]	;; load v[1]
	push	ecx
	mov	edx, DWORD PTR _v$[ebp]		;; load v[0]
	push	edx
	call	_avg3
```
In the called procedure, parameters are read from the stack and loaded into registers as needed. The called procedure returns four-byte results in register EAX.
On return, the calling procedure is responsible for popping the stack.
Microsoft's Visual C++ compiler also supports a "fastcall" convention in which up to two parameters are passed in registers.

Calling conventions in Digital Unix on Alpha

The Alpha chip has a large register set.
Six registers (16 to 21) are used to pass the first six arguments to the routine. In those rare instances in which more than six arguments are passed, the additional arguments are stored on the stack. Values are returned from the call in register 0.

The code for calling avg8 looks something like:


	ldl	$16, 48($sp)		;; load v[0]
	ldl	$17, 52($sp)		;; load v[1]
	ldl	$18, 56($sp)		;; load v[2]
	ldl	$19, 60($sp)		;; load v[3]
	ldl	$20, 64($sp)		;; load v[4]
	ldl	$21, 68($sp)		;; load v[5]
	ldl	$7, 72($sp)		;; load v[6]
	stq	$7, ($sp)		;; store on stack
	ldl	$8, 76($sp)		;; load v[7]
	stq	$8, 8($sp)		;; store on stack

Calling conventions in Solaris on Sun SPARC

Like the Alpha, the RISC-based SPARC passes up to six parameters in registers and places other parameters on the stack. One unusual feature of the SPARC architecture, however, is its register windows.
The SPARC processor has 32 registers -- 8 global registers and 24 registers in register windows. These 24 are organized in three groups of eight.
- Group 1 -- 8 to 15 -- out registers
- Group 2 -- 16 to 23 -- local registers
- Group 3 -- 24 to 31 -- in registers
The calling procedure loads its first six parameters into the "out" register set. [Incidentally, the two other "out" registers are used for the stack pointer and return address.]
During the process of making the call, the register window is moved. The "out" registers of the caller procedure become the "in" registers of the called procedure. The called procedure also gets a new set of "local" registers and "out" registers (which it uses in its own calls).
On procedure return, the window move is reversed and the caller has its old set of registers -- except that the first "out" register will contain the value returned by the called procedure.
Thus a call will be preceded by code storing into registers %o0, %o1, etc., and procedures generally start with code retrieving values from %i0, %i1, etc.


	ld	[%fp-36],%l0		;; load v[0]
	ld	[%fp-32],%l1		;; load v[1]
	ld	[%fp-28],%l2		;; load v[2]
	ld	[%fp-24],%l3		;; load v[3]
	ld	[%fp-20],%l5		;; load v[4]
	ld	[%fp-16],%l6		;; load v[5]
	ld	[%fp-12],%l7		;; load v[6]
	ld	[%fp-8],%l4		;; load v[7]
	mov	%l0,%o0			;; move local register to out register
	mov	%l1,%o1			;; move local register to out register
	mov	%l2,%o2			;; move local register to out register
	mov	%l3,%o3			;; move local register to out register
	mov	%l5,%o4			;; move local register to out register
	mov	%l6,%o5			;; move local register to out register
	st	%l7,[%sp+92]		;; store on stack
	st	%l4,[%sp+96]		;; store on stack
	call	avg8