CSCI 255 Pointers and Arrays

Areas of memory

Executing C and C++ programs are considered to have several segments or sections, distinct areas of memory.

Some compilers may place const variables in read-only segments. In some implementation each shared or dynamic-link library has its own segments for text and data. This allows several running programs to share common library routines.

Example C program using all areas

uint32_t Aglobal[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
const uint32_t Aconst[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
uint32_t Abss[10] ;

void subr(void) {
  uint32_t ASlocal[10] ;
  return ;
}

int main(int argc, char** argv) {
  uint32_t AMlocal[10] ;
  uint32_t *Amalloc ;
  Amalloc = (uint32_t *)malloc(10*sizeof(uint32_t)) ;
  subr() ;
  return (EXIT_SUCCESS);
}

Some address for those variables

Variable 64-bit linux 16-bit PIC24 32-bit PIC32
Aglobal 0000000000600B00 0850 A0000044
Aconst 00000000004007A0 9072 9D000B04
Abss 0000000000600B60 0878 A000001C
Amalloc 0000000000897010 09AE A00000A8
AMlocal 00007FFF09C89C80 09F2 A0007FAC
ASlocal 00007FFF09C89C30 0A24 A0007F38
main 0000000000400541 1204 9D000888
subr 0000000000400514 11EA 9D000868

Where are the global variables in Java?

The closest thing Java has to global variables are public static variables and references. You can think of system.in and system.out as globals but don’t mention this is front of a Java programmer.

Pointers

Example C functions using pointers

void swap(int *p, int *q) {
  int t = *p ;
  *p = *q ;
  *q = t ;
}
void sort(int *mn, int *mx) {
  if (*mn > *mx) {
    swap(mn, mx) ;
  }
}

Example C++ functions using references

void swap(int &p, int &q) {
  int t = p ;
  p = q ;
  q = t ;
}
void sort(int &mn, int &mx) {
  if (mn > mx) {
    swap(mn, mx) ;
  }
}

Examples of implementing pointers on the PIC

Here is the C program.

    uint32_t a = 202 ;
    uint32_t b = 255 ;
    uint32_t *p, *q ;

int main() {
    b  = a ;
    p  = &a ;
    a  = *p ;
    *p = a ;
    p  = q ;
    *p = *q ;
}

Here is an assembler program where the assembler will figure out how to use $gp, the global pointer, to access the global variables.

     lw     $t1,a                    #   b  = a ;
     sw     $t1,b

     la     $t1,a                    #   p  = &a ;
     sw     $t1,p

     aw     $t1,p                    #   a  = *p ;
     lw     $t1,0($t1)
     sw     $t1,a

     lw     $t1,a                    #   *p = a ;
     lw     $t0,p
     sw     $t1,0($t0)

     lw     $t1,q                    #   p  = q ;
     sw     $t1,p

     lw     $t1,q                    #   *p = *q ;
     lw     $t1,0($t1)
     lw     $t0,p
     sw     $t1,0($t0)

Here is an assembler program where variables are accessed at offsets from $gp, the global pointer.

     lw     $t1,offA($gp)            #   b  = a ;
     sw     $t1,offB($gp)

     addiu  $t1,$gp,offA             #   p  = &a ;
     sw     $t1,offP($gp)

     lw     $t1,offP($gp)            #   a  = *p ;
     lw     $t1,0($t1)
     sw     $t1,offA($gp)

     lw     $t1,offA($gp)            #   *p = a ;
     lw     $t0,offP($gp)
     sw     $t1,0($t0)

     lw     $t1,offQ($gp)            #   p  = q ;
     sw     $t1,offP($gp)

     lw     $t1,offQ($gp)            #   *p = *q ;
     lw     $t1,0($t1)
     lw     $t0,offP($gp)
     sw     $t1,0($t0)

The null value

Way back in the late-50’s the LISP programming language was created by John McCarthy. For years, LISP was considered the langauge for artificial intelligene applications.

LISP has a special atom nil which represents an empty list and a function null to test if an atom is nil. Here is an example of a recursive LISP program to sum all the elements of a list. The car operator returns the head, or first element, of a list and the cdr returns the tail, or remainder, of the list. By the way, car and cdr were assembly language macros on the IBM 704.

(define sum(l)
   (cond ((null l) 0)
         (plus (car l) (sum (cdr l)))))

Several years ago, everyone graduating in computer science would have known a little LISP. Although usage of LISP has faded, its spirit has been revived with the lambda, so popular in JavaScript.

When Pascal was invented in 1970, it also had a value nil, which stood for “not in list”. C, invented in 1972, had a NULL value, a pointer to nowhere. In C, the code for recursively summing a list might look something like the following:

int sum(struct node *l) {
  if (l==NULL) {
    return 0 ;
  } else {
    return l->head + sum(l->tail) ;
  }
}

C and C++ also have NUL, the null character, a special character value used to terminate a string. The null character must be implemented with the value. '\0'. The null pointer is usually implemented by using the value 0. Java also has a special value null which can be assigned to all objects.

Yes, null is confusing and leads to a lot of contorted programming and unreliable programs. In Java 8, the Optional class was introduced to allow null free programming. PHP uses the Elvis operator ?: (rotate it a quarter-turn clockwise) to provide alternate values to null.

Where are the pointers in Java?

Java is defined to have two types of variables, primitive variables and reference variables. The reference variables are implemented using pointers to objects, but Java does a good job of hiding this from the programmer.

Arrays and pointers in C

In C, an array declaration defines a constant pointer to a sequence of data values. For example:
        int A[] = {2, 3, 5} ;
is really the same as:
        int * const A = {2, 3, 5} ;
In neither case can A be assigned a value. However, A[2] can be modified.

Incidentally, these declarations are not the same as:
        int const A[] = {2, 3, 5} ;
or
        int const * A = {2, 3, 5} ;
Neither of these declarations allow assignments to A[2].

Finally, all you Java programmers should be warned that the following, while encouraged in Java, is forbidden in C:
        int[] A = {2, 3, 5} ;
In C the brackets must follow the variable name.

Pointer arithmetic in C

If p is the address of an integer in C, then p+i is the address of the i’th integer stored in memory after the place p points to. This interpretation means that p[i] can be considered an abbreviation for *(p+i).

However, because 32-bit integers require 4 8-bit bytes, the address of p[i] is 4*i memory locations from the address of p[0]. This can be very confusing as shown in the following examples.

Two equivalent C loops

for (i=0; i<1000; ++i) {
  A[i] = B[i] ;
}
int *pA = &A[0]  ;
int *pB = &B[0]  ;
for (i=0; i<1000; ++i) {
  *pA++ = *pB++ ;
}

In the old days programmers obsessed with efficiency would often write obfuscated loops using pointer arithmetic. Today, optimizing compiler use techniques, such as loop unrolling, that produce faster code than the “hand optimized” code.

Initialization vs Declarion

An initialization declares and assigns a variable. Assuming that a has been declared to be an int. Then the following meaningless function compiles without warning on recent C (C99) compilers.

void goodStuff() {
    int *pA ;
    *pA = a ;
    int pB[] = {2, 3, 5} ;
    int *pC = pB ;
}

However, both statements of the following function will receive a warning.

void badStuff() {
    int *pA = a ;
    int *pB = {2, 3, 5} ;
}

C Strings

In C a string is an array of characters; but, instead of having a size field, it is terminated by a null character. Here are two programs to count the number of times the character 'B' occurs in a string. Note the use of pointer arithmetic in the second.

char bCount = 0 ;
for (int i = 0; buff[i]; ++i) {
   if (buff[i] == 'A') {
      ++bCount ;
   }
}
char bCount = 0 ;
for (char *nextC = &buff[0]; *nextC; ++nextC) {
   if (*nextC == 'A') {
      ++bCount ;
   }
}

Examples of implementing arrays on the PIC

Assume that variables n, vC and vI have been declared as follows.

char vC[100] ;
int  vI[100] ;
int n ;

Then the statement
    vC[n] = '0' ;
would be implemented as follows:

       addi   $t0,$zero,'0'
       lw     $t1,n
       la     $t2,vC
       add    $t2,$t2,$t1
       sb     $t0,0($t2)

However the statement
    vI[n] = '0' ;
would be implemented by multiplying the array index by four:

       addi   $t0,$zero,'0'
       lw     $t1,n
       la     $t2,vI
       sll    $t1,$t1,2        # multiply $t1 by 4
       add    $t2,$t2,$t1
       sw     $t0,0($t2)

A brief look at structure

C has a non-heterogeneous data structure called the struct, resembling a method-less class in Java, where several fields are stored within a collection.

struct point3D {
  int x ;
  int y ;
  int z ;
}
struct point3D P ;
P.x = 5 ;
P.y = P.x + 1 ;
P.z = 7 ;

When implemented the fields are stored at fixed offsets from the beginning of the structure.

       lw      $t0,P           # P.y <- P.x+1
       addi    $t0,$t0,1
       sw      $t0,P+4

       addi    $t0,$zero,7     # P.z <- 7
       sw      $t0,P+8

Java and C++ have a similar way of storing fields within classes, but its more complicated due to the use of inheritance in both languages: Field declined in superclasses must be stored at the beginning of the implementing structure. In Java, the implementing structure also stores references to the object’s methods. In C++ this is only necessary for virtual methods. However, C++ must deal with multiple inheritance.

Trying out an example

int V[500] ;
int H[100] ;
int i ;
for (i=0; i<100; ++i) {
  H[i] = 0 ;
}
for (i=0; i<500; ++i) {
  ++H[V[i]] ;
}