CSCI 431: Parallel Programming
Motivation
- Although machines have been getting faster every year, the basic
problem impeding progress is that CPUs are roughly an order of
magnitude faster than main memory. As a result, a computer
spends much of its time waiting for data from main memory.
- There are two approaches to solving the problem:
- More efficient hardware
- Software that allows more efficient use of the hardware---software
that allows programs to run concurrently.
Variations on subprogram control associated with concurrency
- Up until now we have discussed subroutines that have the following
four properties
- Explicit call statements are required.
- This is not altogether true---what kinds of subroutines have we
seen that are not invoked by explicit call statements?
- Subprograms must execute completely at each call
- In concurrent programming we have coroutines or
tasks and control is transferred back and forth between
the called and the calling subprograms
- There is an immediate transfer of control at the point of the call
- In concurrent programming tasks can be scheduled
- There is a single execution sequence
- Multiple tasks can run simultaneously or we can have logical
concurrency---the illusion of simultaneous execution.
- Programs executed with physical concurrency can have multiple
threads of control.
Basic Parallel Programming Concepts
- In a sequential computer, there is only one program being
executed at any given time. The computer has only one CPU and it
serves one program at a time. A program under execution is called a
process. A process consists of program, data, stack and
operating system administrative structures. Each process has a unique
process id which is usually a small integer. A sequential
computer can perform multiprogramming by rapid reassignment of
the CPU among many processes, giving the appearance of parallelism;
this is an example of logical concurrency. This is the case on
multi-user UNIX systems. However, true parallelism is provided only
by a computer with multiple CPUs, known simply as processors.
- A task is a unit of a program that can be concurrently executed
with other units of the same program. Each task corresponds to a
separate process.
- A SIMD (Single Instruction Multiple Data) computer is a
parallel computer whose processors can simultaneously execute the same
instruction on different data elements. A MIMD (Multiple Instruction
Multiple Data) computer is one whose processors can execute different
instructions on different data elements. When a program is divided
into parts and each part is performed by a parallel process, the
overall program can be executed that much faster.
- In order to cooperate, parallel processes must exchange
information from time to time. Therefore, the
processors must be connected in some way to allow communication among
them. Some parallel computers use a network-type connection and allow
messages to pass through the network from a sending processor to a
receiving processor. This includes hypercube-type computers.
Another way to provide communication is to have all processors access
a common memory that is shared. This way, a message can be written
into the shared memory and picked up by the receiver.
- There are four important new aspects to parallel programs not
present in sequential programming:
- scheduling
- synchronization
- mutual exclusion
- deadlock
Of these, we will look at synchronization and mutual exclusion.
Synchronization
- Synchronization is a mechanism that controls the order in
which tasks execute.
- Because the processes on a shared memory multiprocessor make progress
through their programs at independent and unpredictable rates, it is
necessary to coordinate the order in which some tasks are performed.
If a subtask must not be started before other tasks are finished, it
is important to make sure that this is the case. For example, imagine
each process is a worker on an assembly line. Then a process must
wait until a previous process has finished its part of the job. Such
time-related coordination of parallel activities is called
synchronization. Process synchronization usually involves delaying
processes until some other processes reach certain stages in their
computation.
- A process is said to be blocked if its continued execution is
delayed until a later time. There are several methods to block a
process. A process can spin in a tight loop checking a
condition repeatedly until it becomes true (or false). Here is the
top-level analogue of a spin block loop
while (flag == 0) { };
where flag is a shared variable. This form of waiting is referred to
as busy wait. Busy waiting is expensive in terms of processor
usage. Therefore, it is advisable to use a spin block only for
extremely short waiting periods (say, less than 200 instructions). For
longer waiting periods it is better to use a process block which
suspends the process. A suspended process relinquishes the
processor it is on so it can be used for other things. A suspended
process will be activated when the condition it is waiting for is met.
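- To make the two forms of waiting concrete, here is a minimal Java
sketch (an addition to these notes; the class names, the shared flag,
and the lock object are all illustrative). The first method spins on a
shared flag; the second suspends the thread with wait until another
thread calls setFlag.

// Busy wait: the thread keeps its processor while it polls the flag.
class SpinWait {
    static volatile boolean flag = false;    // shared variable set by another thread

    static void spinUntilFlag() {
        while (!flag) { }                    // spin block: loop until flag becomes true
    }
}

// Process block: the thread is suspended and gives up its processor until notified.
class BlockWait {
    static final Object lock = new Object();
    static boolean flag = false;             // shared variable, protected by lock

    static void blockUntilFlag() throws InterruptedException {
        synchronized (lock) {
            while (!flag) {
                lock.wait();                 // suspend; reactivated by notifyAll below
            }
        }
    }

    static void setFlag() {
        synchronized (lock) {
            flag = true;
            lock.notifyAll();                // wake any suspended waiters
        }
    }
}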
Mutual Exclusion
- When parallel processes can access a common resource, such as
variables or arrays in shared memory, simultaneous access by more than
one process to a given resource can take place. Such simultaneous
accesses usually result in erroneous/unpredictable results. For
example, if a shared variable flag is read by one process as
it is being assigned a value by a second process, then the value read
can be either the old value or the new value. Worse yet, if two
processes assign values to flag simultaneously, one of the
values is lost. The only solution to such conflicts is to arrange
mutually exclusive access to shared quantities. When programmed
correctly, only one process at a time can access the same quantity
protected by mutual exclusion.
- A critical region is a sequence of program statements within a
task where the task is operating on some data object shared with other
tasks. If a critical region in task A is manipulating data object
X, then mutual exclusion requires that no other task be
simultaneously executing a critical region that also manipulates
X.
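- The lost-update problem can be demonstrated with a small Java sketch
(an addition to the notes; the class name and loop counts are
illustrative). Two threads increment a shared counter with no mutual
exclusion; because count++ is a read-modify-write, the updates
interleave and the printed total is usually less than 200000.

public class LostUpdate {
    static int count = 0;                    // shared data object

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100000; i++) {
                count++;                     // critical region with no protection
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start(); b.start();
        a.join();  b.join();
        System.out.println(count);           // typically less than 200000
    }
}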
Methods for implementing mutual exclusion
Software locks
- Mutual exclusion on a shared quantity can be achieved by making
sure that at any time only one process is executing code in the
critical regions associated with the shared data.
One way to accomplish this is to put a lock
instruction immediately before and an unlock instruction
immediately after each critical section.
- A simple way to understand the lock and unlock operations is to view
them as system-supplied operations on a shared boolean value (1 or 0)
that represents whether the lock is locked or unlocked. A
lock(X) call will block if X is already locked. Otherwise, it
will change the value of X to be locked and return. An
unlock(X) call sets X to unlocked and always returns. The mutual
exclusion of simultaneous lock operations on the shared
boolean value is assured by the system.
Because two processes, running on different CPUs,
may simultaneously execute the lock operation,
the operation must be implemented with special machine
instructions, such as the test-and-set atomic operation
of the IBM System/360 or the load-locked/store-conditional
pair of the Alpha.
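- A sketch of this lock/unlock protocol in Java, using
java.util.concurrent.locks.ReentrantLock as the system-supplied lock
(an addition to the notes; the class and variable names are
illustrative):

import java.util.concurrent.locks.ReentrantLock;

public class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();  // the shared lock X
    private int count = 0;                                    // shared data

    public void increment() {
        lock.lock();            // lock(X): blocks if another thread holds the lock
        try {
            count++;            // critical region: at most one thread executes this
        } finally {
            lock.unlock();      // unlock(X): always released, even on an exception
        }
    }
}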
- Atomicity ensures that each operation completes execution before any
other concurrent operation can access its data. An atomic execution
of a statement means that the execution of that statement is not
interruptible.
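- For a single shared counter, the same effect can be obtained with an
atomic operation instead of an explicit lock. A Java sketch (an
addition to the notes): AtomicInteger.incrementAndGet performs the
read-modify-write as one uninterruptible operation, typically built on
compare-and-swap style hardware instructions.

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounter {
    private final AtomicInteger count = new AtomicInteger(0);

    public int increment() {
        // A single atomic read-modify-write: no other thread can
        // observe or interleave with a partially completed update.
        return count.incrementAndGet();
    }
}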
The Semaphore
Semaphore example
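The original example is not reproduced here; as a substitute, here is a
minimal Java sketch using java.util.concurrent.Semaphore, whose acquire
and release play the roles of the wait and release statements discussed
below (the protected data is illustrative):

import java.util.concurrent.Semaphore;

public class SemaphoreGuard {
    private final Semaphore mutex = new Semaphore(1);  // binary semaphore: one permit
    private int shared = 0;                            // data structure being protected

    public void update() throws InterruptedException {
        mutex.acquire();        // wait: blocks until a permit is available
        try {
            shared++;           // critical region
        } finally {
            mutex.release();    // release: omitting this would eventually cause deadlock
        }
    }
}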
Semaphores have a number of disadvantages:
- If a wait statement is inadvertently omitted in just one
process, access to the data structure by all processes is insecure
- If a release statement is inadvertently omitted in just one
process, deadlock is likely to arise
- If a reference to the protected resource is inadvertently
placed outside a critical region, no protection is provided
- A semaphore is itself a shared data structure whose access requires protection; an infinite regression can only be avoided if the computing system provides uninterruptible wait and release statements
The Monitor
- Another approach to mutual exclusion is through the use of a
monitor. A monitor is a shared data object together with a
set of operations that may manipulate it.
- A monitor is an ADT and could be implemented as a
task
- Another task may manipulate the shared data object only by using
the defined member functions of the monitor ADT.
- To enforce mutual exclusion it is necessary to require that at
most one of the operations defined for the data object may be
executing at any given time.
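- A Java object whose public methods are all synchronized behaves much
like such a monitor: the shared data is private and at most one thread
at a time can be executing any of the defined operations. A minimal
sketch (an addition to the notes; the account data is illustrative):

public class AccountMonitor {
    private int balance = 0;   // shared data object, reachable only through the methods below

    // The object's lock guarantees that at most one thread at a time
    // is executing any synchronized method of this monitor.
    public synchronized void deposit(int amount)  { balance += amount; }
    public synchronized void withdraw(int amount) { balance -= amount; }
    public synchronized int  getBalance()         { return balance; }
}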
Message passing
- A message is a transfer of information from one task to another.
- It provides a way for each task to synchronize its actions with
another task, yet each task remains free to continue executing when
not needing to be synchronized.
- In message passing, the problem of shared data is resolved by
passing data values as messages; there is no shared data object.
- Using message passing as the basis for data sharing ensures mutual
exclusion without any special mechanism since each data object is
owned by exactly one task.
- Message passing can be either synchronous or asynchronous
- In synchronous message passing, the two processes
simultaneously execute the send and receive primitives.
- In asynchronous message passing, the send
primitive may enqueue a message for a later receive.
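- A sketch of asynchronous message passing in Java (an addition to the
notes), using a java.util.concurrent.BlockingQueue as the mailbox: the
sender's put enqueues a message for a later take by the receiver, and
the queue is the only object the two tasks share.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class MessagePassing {
    public static void main(String[] args) {
        BlockingQueue<String> mailbox = new ArrayBlockingQueue<>(10);

        Thread sender = new Thread(() -> {
            try {
                mailbox.put("work item");        // send: enqueue the message
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread receiver = new Thread(() -> {
            try {
                String msg = mailbox.take();     // receive: blocks until a message arrives
                System.out.println("received " + msg);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        sender.start();
        receiver.start();
    }
}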
Storage Management in Tasks
- Tasks are defined as several execution sequences executing
simultaneously within a given program.
- Each task needs its own storage management, usually a stack.
This requires a means of
implementing multiple stacks within a single program.
Languages for Concurrent Programming
- There are several languages that support concurrent programming.
- High Performance Fortran
- Concurrent Pascal
- Occam
- PL/I
- Ada
- VAL
- Java
- Linda
- There are also extensions to languages such as C
- Developed for use on parallel machines
- Developed to support the operating system.
- We will consider only Ada and Java
Ada
- In Ada the definition of a task differs little from the definition
of an ordinary subprogram, except for defining how the task
synchronizes and communicates with other tasks.
- A task definition takes the following form:
task Name is
   --specific declarations allowing synchronization and
   --communication with other tasks
end;

task body Name is
   --the usual local declarations as found in any program
begin
   --sequence of statements
end;
- The interface (or specification) of a task is its entry points or
locations where it can accept messages from other tasks.
- To allow for multiple activations of a task, a task type must be
defined
task type Name is
   --the rest of the definition in the same form as above
end;
- To create several activations of a task and give them the names A,
B and C, the declarations are written as ordinary variable
declarations:
A, B, C : Name;
alternatively, the declarations can be written using pointers (access types):
type taskPtr is access Name;
newName: taskPtr := new Name;
- Note that when a task is declared it is activated, that is, it is
running
- Although Ada tasks can be thought of as managers that reside with
the resource that they manage (like monitors), Ada tasks can be active
in other ways
- They have several mechanisms that allow them to choose among
competing requests for access to their resources.
Ada message passing
- The transmission of a message from one task to another is called a
rendezvous
- Rendezvous can occur only if both the sender and the receiver want
it to happen.
- The information of the message can be transmitted in either or
both directions.
- The Ada design of tasks is partially based on the work of Dijkstra
(guarded commands) and Hoare (CSP), in that nondeterminism is used to
choose among competing message-sending tasks.
- A more complete example of an Ada task specification with one
entry point that has one parameter:
task Name is
   entry Entry_1(Item : in INTEGER);
end;
- The task body must include some syntactic form for entry points
that corresponds to the entry clause in the specification; these are
specified by accept clauses:
task body Name is
   --the usual local declarations as found in any program
begin
   loop
      accept Entry_1(Item : in INTEGER) do
         ...
      end Entry_1;
   end loop;
end Name;
- Whenever a message is sent to an entry point whose task is not yet
ready to accept it, for whatever reason, the sending task
is suspended until the entry point is ready to accept the message
- If the execution of Name begins and reaches the Entry_1 accept
clause before any other task sends a message to Entry_1, Name is
suspended.
- If a message is sent to Entry_1 while Name is suspended at its
accept, a rendezvous occurs and the accept clause body is executed.
Then because of the loop, execution proceeds to the accept clause
again.
- In summary, a rendezvous can occur in two basic ways in this simple
example:
- The receiver task can be waiting for another task to send
a message to Entry_1; when the message is sent, the rendezvous occurs
- The receiver can be busy with one rendezvous, or some other
processing, when another task attempts to send a message to the same
entry. In this case the sender is suspended until the receiver is
free to accept the message.
- If several messages are sent while the receiver is busy, the
senders are queued to wait for a rendezvous.
Multiple Entry points
Guarded Clauses
Task Termination
Ada Examples
A little bit about Java
- Java supports logical concurrency through the Thread
class.
- When a Java application program begins execution, a new thread is
created (in which the main method will run) and main is called.
Therefore all Java programs run in threads.
The Thread class
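- A minimal sketch of the Thread class (an addition to the notes; the
run body is illustrative): a subclass overrides run, and calling start
creates a new thread of control that executes run concurrently with
the caller.

public class HelloThread extends Thread {
    @Override
    public void run() {
        // Runs concurrently with the thread that called start().
        System.out.println("hello from " + getName());
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = new HelloThread();
        t.start();   // creates the new thread; calling run() directly would not
        t.join();    // wait for the new thread to terminate
    }
}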