CSCI 431 Lecture Notes - Elementary Data Types

Data Types

programs: a set of operations working on certain data in a certain sequence.

Differences among languages are in the

types of data allowed,
types of operations available, and
mechanisms for sequence control.

chapter 4: data types that are built into languages
chapter 5: programmer-defined data types (encapsulated data types)
chapter 6: individual operation and sequence control
chapter 7: subprogram control
chapter 8: introduces inheritance (operations on encapsulated data objects can be automatically derived)

Properties of Data Types and Objects

It is important to differentiate between:

a data object, which is a location in a virtual computer. It represents a container for data values, a place where data values may be stored and later retrieved.
a data value, a possible value to be stored in a data object
a bound variable, a data object which contains a particular data value

Data Objects

A data object has various attributes that define it. The values bound to some of these attributes may change at run time. Some of the most important attributes are:

type: usually bound at translation time, through a declaration
location: where in the computer's memory the object is stored
value: usually the result of an assignment
name(s): possibly modified or added to at run time
component(s): links to any subordinate data objects

Examples of data objects are:

programmer-defined data objects, such as variables, constants, arrays, files, etc.
system-generated data objects, such as the run-time storage stack, subprogram activation records, etc.
A data object that is explicitly defined and named by the programmer in a program is called a variable.
A constant is a data object with a name that is bound to a value permanently during its lifetime.
A literal is a constant whose name is just the written representation of its value.

const int MAX = 30;

Data Types

A data type is a class of data objects together with a set of operations for creating and manipulating them. Each language has a set of primitive data types built into the language. Each data type has a specification that is made up of

attributes that distinguish the data type, e.g. for an array, how many dimensions it has
values that the data type may store
operations that may be performed with the data type.

A data type also has an implementation, made up of

the storage representation of the object
and a description of how the operations of the data object are performed.

A program deals with particular data objects, but a programming language deals more commonly with data types and the operations provided for manipulating them.

Every programming language has a set of primitive (or elementary) data types that are built into the language. In addition, a language such as C or Ada may provide facilities that allow the programmer to define new data types. In other words, there are two kinds of data types:

elementary data types, such as characters, integers, real numbers, Booleans, etc.
structured data types, such as arrays, records, character strings, etc.

Specification of Elementary Data Types

An elementary data type is a class of elementary data objects plus the operations on them. An elementary data object may store only one value, e.g. an integer, a Boolean, or a character.

Attributes (e.g., name, type): usually don't change; may be stored in a descriptor as part of the data object.
Values: often defined by the underlying machine.
Operations: made up of primitive operations, defined by the language, and user-defined operations. Each operation that may be used is defined by a signature. To specify the signature of an operation, give the number, order, and data types of the arguments in the domain (i.e., the input) of the operation, as well as the order and data type of the resulting range (i.e., the output).
sin(x) has signature sin: float -> float
x = b has signature =: integer x integer -> boolean
x + b has signature +: float x float -> float
Operations are often implemented directly in hardware, though many are implemented by some sort of algorithm.

Implementation of Elementary Data Types

Storage representation

Storage for elementary data types is strongly influenced by the underlying computer that will execute the program. For example, the storage representation for integer and real values is almost always the integer or floating-point binary representation for numbers used in the underlying hardware. For character values, the hardware or OS character codes are used.

Implementation of operations
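- Each operation defined for data objects of a given type may be implemented in one of three main ways: directly as a hardware operation, as a procedure or function subprogram, or as an in-line code sequence.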

Declarations

A declaration is a program statement commonly used to specify the name and type of data objects

explicit declarations
implicit declarations

Declarations of Operations: to specify the signatures of operations to the language translator

There are also type declarations. e.g.,


    typedef struct{
      int number;
      char name[NAME_LEN+1];
    } Part;

Purposes for Declarations

for the choice of storage representation and storage management
for polymorphic operations
template or generic functions: a function name may take on a variety of implementations depending upon the types of its arguments
allows the programmer to extend the language with new data types and operations
for type checking and error checking

improves safety and correctness
declarations allow for static type checking

Type Checking and Type Conversion

Each memory location contains some binary string. From that string alone, it is impossible to tell the type of the data object stored in that location. A computer could apply integer addition to two locations that actually hold reals, producing a garbage result.

Type checking is the process of ensuring that each operation receives the proper number of arguments of the proper types. This may be done dynamically, but that requires each data object to store its type and each operation to check those types every time it is performed. Most languages minimise dynamic type checking by doing as much checking as possible statically, at compile time.

Dynamic type checking (at run time)

advantage: flexibility in program design
disadvantages:
- programs are difficult to debug and difficult to remove all type errors
- the extra storage requirement can be substantial
- the speed of execution of the program can be greatly reduced

Static type checking (at compile time)

information required

for each operation, the number, order, and data types of its arguments and results
for each variable, the type of data object named
the type of each constant data object

advantage

the result is a substantial gain in efficiency of storage use and execution speed

Type Equivalence

When are two types equivalent?

Name Equivalence
Structural Equivalence

Structural Equivalence

Structural equivalence depends on a simple comparison of type descriptions after substituting out all structured type names and expanding all the way to built-in types. Original types are equivalent if the expanded type descriptions are the same.
Pointers complicate matters. The simple approach (used in Algol) is to treat all pointers as equivalent.
Algol-68 uses structural equivalence, as did many early Pascal implementations (the definition was vague; Standard Pascal uses name equivalence). ML is more-or-less structural and C uses a hybrid (structural except for structs).
Structural equivalence is more flexible than name equivalence but harder to implement.

Name Equivalence

Name equivalence depends on actual occurrences of declarations in the source code. It comes in strict and loose variants.
Loose name equivalence: types are equivalent if they refer to the same outermost constructor. The built-in types are assumed to be names for (implicit) pre-defined constructors.
Strict name equivalence: types are equivalent if they refer to the same declaration.
Name equivalence is more fashionable these days.

Example


        type  alink = pointer to cell;
        type blink = alink;
        p, q : pointer to cell;
        r    : alink;
        s    : blink;
        t    : pointer to cell;
        u    : alink;

Structural equivalence says all six variables have the same type.
Strict name equivalence equates the types of p & q and of r & u, because each pair refers back to the same type declaration.
Loose name equivalence equates the types of p & q and of r, s, & u, because in each case we can trace back to the same constructor.
The intuitive motivation for loose name equivalence is this: if you declare a type by writing out a structure, you probably have a particular use in mind, and you don't want your objects mistaken for something else that happens to have the same structure.
```
    type student = record
        name, address : string;
        SSN : integer;
    end;
    type school = record
        name, address : string;
        enrollment : integer;
    end;
```
On the other hand, if you define one type as an alias for another, you probably want them to be the same.

Strong typing

Strong typing has been defined in different ways. We will take it to mean the following:
A language is strongly typed if that language prevents you from applying an operation to data for which it is not appropriate.

Lisp is strongly typed (although it is not statically typed).

A related definition: A function f, with signature f : S -> R, is type safe if execution of f cannot generate a value outside of R

Type conversion and coercion

A coercion: automatic conversion to a different type
- in C: 1 + 1.5
A conversion: manual conversion to a different type, also called a type cast
- in C: (int)1.5
What to do when a type mismatch occurs
- flag as an error
- apply a coercion (in effect, coercion rules are a relaxation of type checking.)
The basic principle of coercion is not to lose information, so coercions are generally restricted to widenings (promotions).

Some unexpected results can occur.. in C



    6.0 + 15/8 = 7.0

    6 + 15.0/8 = 7.875

Narrowing, or conversion to a smaller primitive type, should require a type cast, as it does in Java.
Recent thought is that coercion is probably a bad idea. Languages such as Modula-2 and Ada do not permit coercions.

Assignment and Initialization

Assignment is the basic operation for changing the binding of a value to a data object
l-value and r-value; an example:
{ int X; X = 14; ...

The location of an object X is its L-value. The contents of that location are its R-value.
Where did the names L-value and R-value come from?
Consider: A = B + C;
Executing that statement requires:
1. Pick up contents of location B
2. Add contents of location C
3. Store result into address A.
For each named object in an assignment statement, its position on the right-hand side of the assignment operator (=) is a "contents-of" access, and its position on the left-hand side of the assignment operator is an "address-of" access.
An address-of access then uses the L-value.
A contents-of access then uses the R-value.
Value, by itself, generally means R-value

Elementary Data Types

Integers: Size of integers is usually correlated to hardware. There may be as many as 8 different integer types in a language.
Floating-Point Real Numbers: some languages prohibit testing for equality because of rounding errors. The representation usually matches the hardware exactly, but not always; some languages allow accuracy specifications in code (e.g., Ada):
```
         type Speed is
             digits 7 range 0.0..1000.0;
         type Voltage is
             delta 0.1 range -12.0..24.0;
```
Fixed-Point Real Numbers: for precision, values are stored as integers together with the position of the decimal point, e.g.
23.45 = 2345 (2)
23.45 + 36.78 = 2345 (2) + 3678 (2) = 6023 (2)
23.456 + 36.78 = 23456 (3) + 36780 (3) = 60236 (3)
The advantage is accuracy. The disadvantages are limited range and wasted memory.
Complex Numbers: sometimes provided, usually stored in two storage locations.
Rational Numbers: sometimes provided (e.g., in LISP). A rational number is represented as a quotient of two integers, which avoids roundoff errors. The numerator and denominator are often stored as unbounded-length integers.
Enumerations: The user enumerates all of the possible values, which are symbolic constants, e.g. type class is red, yellow, blue. Usually stored as integers. The basic operations are the relational operators (greater than, equal, etc.), assignment, successor, and predecessor. The order is usually given by the order of declaration, though C lets the programmer override the underlying values,
e.g. enum class {Fresh = 14, Soph = 36, Junior = 4, Senior = 42};
Design Issue: Should a symbolic constant be allowed to be in more than one type definition?
Examples:
- Pascal--cannot reuse constants; they can be used for array subscripts, for variables, case selectors; no input or output; can be compared
- Ada--constants can be reused (overloaded literals); can be used as in Pascal; input and output supported
- C and C++--like Pascal, except they can be input and output as integers
- Java--originally had no enumeration type; an enum type was added in Java 5
Subrange Types: an ordered, contiguous subsequence of another ordinal type.
Design Issue: How can they be used?
Examples:
- Pascal--subrange types behave as their parent types; can be used as for-loop variables and array indices
```
              type pos = 0 .. MAXINT;
```
- Ada--subtypes are not new types, just constrained existing types (so they are compatible); can be used as in Pascal, plus case constants
```
              subtype Pos_Type is
                  Integer range 0 ..Integer'Last;
```
Booleans: truth is usually tested in one of two ways: (a) a particular bit is 0 or 1, or (b) the value 0 represents false and all other values represent true.
Usually stored as one memory storage unit. However, some languages allow many Booleans to be collected into one unit, e.g. bit string (PL/I), packed array of boolean (Pascal).
Characters: Sometimes available individually, though sometimes only as a string. Some languages do not specify a character set but simply use the basic set defined by the underlying hardware and operating system. Others do define a set (e.g., ASCII), but if the underlying machine does not support it, it must be emulated in software.