C Pointers and arrays

Areas of memory

An executing C or C++ program is considered to have several segments, or sections, each a distinct area of memory.

Text: For program code
Data: For global and static variables
- Data (more specialized): For initialized variables
- BSS: For uninitialized variables — Required to be 0 in ANSI-C
Stack: For local variables
Heap: For dynamically allocated memory

Some compilers may place const variables in read-only segments. In some implementations each shared or dynamic-link library has its own segments for text and data. This allows several running programs to share common library routines.

Example C program using all areas

#include <stdint.h>   /* uint32_t */
#include <stdlib.h>   /* malloc, EXIT_SUCCESS */

uint32_t Aglobal[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
const uint32_t Aconst[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
uint32_t Abss[10] ;

void subr(void) {
  uint32_t ASlocal[10] ;
  return ;
}

int main(int argc, char** argv) {
  uint32_t AMlocal[10] ;
  uint32_t *Amalloc ;
  Amalloc = (uint32_t *)malloc(10*sizeof(uint32_t)) ;
  subr() ;
  return (EXIT_SUCCESS);
}

Some addresses for those variables

Variable	64-bit linux	16-bit PIC24	32-bit PIC32
`Aglobal`	`0000000000600B00`	`0850`	`A0000044`
`Aconst`	`00000000004007A0`	`9072`	`9D000B04`
`Abss`	`0000000000600B60`	`0878`	`A000001C`
`Amalloc`	`0000000000897010`	`09AE`	`A00000A8`
`AMlocal`	`00007FFF09C89C80`	`09F2`	`A0007FAC`
`ASlocal`	`00007FFF09C89C30`	`0A24`	`A0007F38`
`main`	`0000000000400541`	`1204`	`9D000888`
`subr`	`0000000000400514`	`11EA`	`9D000868`
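
Addresses like these can be collected with a short program. The following is a minimal sketch, not the program used to produce the table above; the helper names report, bss_is_zero and show_areas are made up for illustration, and the addresses printed will differ between systems and often between runs.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

uint32_t Aglobal[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};       /* initialized data */
const uint32_t Aconst[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};  /* often read-only data */
uint32_t Abss[10];                                           /* BSS, starts as zeros */

/* Hypothetical helper: print one labelled address. */
static void report(const char *name, const void *addr) {
  printf("%-8s %p\n", name, (void *)addr);
}

/* Returns 1 if every element of Abss is still zero, 0 otherwise. */
int bss_is_zero(void) {
  for (int i = 0; i < 10; ++i) {
    if (Abss[i] != 0) {
      return 0;
    }
  }
  return 1;
}

/* Print one address from each area; returns 0, or -1 if malloc fails. */
int show_areas(void) {
  uint32_t Alocal[10] = {0};                          /* stack */
  uint32_t *Amalloc = malloc(10 * sizeof *Amalloc);   /* heap */
  if (Amalloc == NULL) {
    return -1;
  }
  report("Aglobal", Aglobal);
  report("Aconst", Aconst);
  report("Abss", Abss);
  report("Amalloc", Amalloc);
  report("Alocal", Alocal);
  free(Amalloc);
  return 0;
}
```

Calling show_areas() from main prints one line per variable. On many desktop systems the stack address stands well apart from the others, much as in the 64-bit column of the table.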

Where are the global variables in Java?

The closest thing Java has to global variables are public static variables and references. You can think of System.in and System.out as globals, but don’t mention this in front of a Java programmer.

Pointers

Example C functions using pointers

void swap(int *p, int *q) {
  int t = *p ;
  *p = *q ;
  *q = t ;
}

void sort(int *mn, int *mx) {
  if (*mn > *mx) {
    swap(mn, mx) ;
  }
}
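
A caller passes addresses to these functions with the & operator. The sketch below repeats the two functions so that it is self-contained; the driver demo_sort is a made-up name used only for illustration.

```c
#include <stdio.h>

void swap(int *p, int *q) {
  int t = *p;
  *p = *q;
  *q = t;
}

/* After the call, *mn <= *mx. */
void sort(int *mn, int *mx) {
  if (*mn > *mx) {
    swap(mn, mx);
  }
}

/* Hypothetical driver: the callee changes the caller's variables. */
void demo_sort(void) {
  int a = 9;
  int b = 4;
  sort(&a, &b);            /* pass the addresses of a and b */
  printf("%d %d\n", a, b); /* prints "4 9" */
}
```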

Example C++ functions using references

void swap(int &p, int &q) {
  int t = p ;
  p = q ;
  q = t ;
}

void sort(int &mn, int &mx) {
  if (mn > mx) {
    swap(mn, mx) ;
  }
}

Examples of implementing pointers on the PIC

Here is the C program.

uint32_t a = 202 ;
uint32_t b = 255 ;
uint32_t *p, *q ;

int main() {
    b  = a ;
    p  = &a ;
    a  = *p ;
    *p = a ;
    p  = q ;
    *p = *q ;
}

Here is an assembler program where the assembler will figure out how to use $gp, the global pointer, to access the global variables.

     lw     $t1,a                    #   b  = a ;
     sw     $t1,b

     la     $t1,a                    #   p  = &a ;
     sw     $t1,p

     lw     $t1,p                    #   a  = *p ;
     lw     $t1,0($t1)
     sw     $t1,a

     lw     $t1,a                    #   *p = a ;
     lw     $t0,p
     sw     $t1,0($t0)

     lw     $t1,q                    #   p  = q ;
     sw     $t1,p

     lw     $t1,q                    #   *p = *q ;
     lw     $t1,0($t1)
     lw     $t0,p
     sw     $t1,0($t0)

Here is an assembler program where variables are accessed at offsets from $gp, the global pointer.

     lw     $t1,offA($gp)            #   b  = a ;
     sw     $t1,offB($gp)

     addiu  $t1,$gp,offA             #   p  = &a ;
     sw     $t1,offP($gp)

     lw     $t1,offP($gp)            #   a  = *p ;
     lw     $t1,0($t1)
     sw     $t1,offA($gp)

     lw     $t1,offA($gp)            #   *p = a ;
     lw     $t0,offP($gp)
     sw     $t1,0($t0)

     lw     $t1,offQ($gp)            #   p  = q ;
     sw     $t1,offP($gp)

     lw     $t1,offQ($gp)            #   *p = *q ;
     lw     $t1,0($t1)
     lw     $t0,offP($gp)
     sw     $t1,0($t0)

The null value

Way back in the late 1950s, the LISP programming language was created by John McCarthy. For years, LISP was considered the language for artificial intelligence applications.

LISP has a special atom nil which represents an empty list and a function null to test if an atom is nil. Here is an example of a recursive LISP program to sum all the elements of a list. The car operator returns the head, or first element, of a list and the cdr returns the tail, or remainder, of the list. By the way, car and cdr were assembly language macros on the IBM 704.

(define sum(l)
   (cond ((null l) 0)
         (t (plus (car l) (sum (cdr l))))))

Several years ago, everyone graduating in computer science would have known a little LISP. Although usage of LISP has faded, its spirit has been revived with the lambda, so popular in JavaScript.

When Pascal was invented in 1970, it also had a value nil, sometimes read as “not in list”. C, invented in 1972, had a NULL value, a pointer to nowhere. In C, the code for recursively summing a list might look something like the following:

int sum(struct node *l) {
  if (l==NULL) {
    return 0 ;
  } else {
    return l->head + sum(l->tail) ;
  }
}
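
The function above relies on a struct node that the text does not define. Here is one way it might be declared, together with a three-element list built on the stack; the field layout and the helper sum_example are assumptions made for illustration.

```c
#include <stddef.h>

/* Assumed layout: a value plus a pointer to the rest of the list. */
struct node {
  int head;
  struct node *tail;   /* NULL marks the end of the list */
};

int sum(struct node *l) {
  if (l == NULL) {
    return 0;
  } else {
    return l->head + sum(l->tail);
  }
}

/* Hypothetical example: builds the list (2 3 5) and sums it. */
int sum_example(void) {
  struct node c = {5, NULL};
  struct node b = {3, &c};
  struct node a = {2, &b};
  return sum(&a);      /* 2 + 3 + 5 */
}
```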

C and C++ also have NUL, the null character, a special character value used to terminate a string. The null character must be implemented with the value '\0'. The null pointer is usually implemented by using the value 0. Java also has a special value null, which can be assigned to any reference variable.

Yes, null is confusing and leads to a lot of contorted programming and unreliable programs. In Java 8, the Optional class was introduced to allow null-free programming. PHP uses the Elvis operator ?: (rotate it a quarter-turn clockwise) to provide alternate values to null.

Where are the pointers in Java?

Java is defined to have two types of variables, primitive variables and reference variables. The reference variables are implemented using pointers to objects, but Java does a good job of hiding this from the programmer.

Arrays and pointers in C

In C, an array name behaves much like a constant pointer to a sequence of data values. For example:
int A[] = {2, 3, 5} ;
is similar in effect to the following, written with a C99 compound literal:
int * const A = (int []){2, 3, 5} ;
In neither case can A be assigned a value. However, A[2] can be modified.

Incidentally, these declarations are not the same as:
int const A[] = {2, 3, 5} ;
or
int const * A = (int const []){2, 3, 5} ;
Neither of these declarations allows assignments to A[2].

Finally, all you Java programmers should be warned that the following, while encouraged in Java, is forbidden in C:
int[] A = {2, 3, 5} ;
In C the brackets must follow the variable name.
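
To see how an array name behaves in practice, the sketch below modifies an element through a pointer obtained from the array name. It also shows one place where an array and a pointer differ: sizeof applied to the array gives the size of the whole array. The helper names count_A and set_last are made up for illustration.

```c
#include <stddef.h>

int A[] = {2, 3, 5};

/* Number of elements in A; works because A is a real array here. */
size_t count_A(void) {
  return sizeof A / sizeof A[0];
}

/* Modify A[2] through a pointer taken from the array name. */
int set_last(int value) {
  int *p = A;     /* the array name converts to &A[0] */
  p[2] = value;   /* same storage as A[2] */
  return A[2];
}
```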

Pointer arithmetic in C

If p is the address of an integer in C, then p+i is the address of the i’th integer stored in memory after the place p points to. This interpretation means that p[i] can be considered an abbreviation for *(p+i).

However, because a 32-bit integer occupies four 8-bit bytes, the address of p[i] is 4*i memory locations from the address of p[0]. This can be very confusing, as shown in the following examples.

The expression &p[i] - &p[0] is i
The expression (char *)&p[i] - (char *)&p[0] is 4*i
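
The two ways of measuring distance can be checked with a pair of small functions. This is a sketch with made-up helper names; the byte distance is written with sizeof(int) in mind, so it is 4*i only where an int occupies four bytes.

```c
#include <stddef.h>

/* Distance from p[0] to p[i], counted in int elements. */
ptrdiff_t elem_distance(const int *p, size_t i) {
  return &p[i] - &p[0];
}

/* The same distance counted in bytes: compare char pointers instead. */
ptrdiff_t byte_distance(const int *p, size_t i) {
  return (const char *)&p[i] - (const char *)&p[0];
}
```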

Two equivalent C loops

for (i=0; i<1000; ++i) {
  A[i] = B[i] ;
}

int *pA = &A[0]  ;
int *pB = &B[0]  ;
for (i=0; i<1000; ++i) {
  *pA++ = *pB++ ;
}

In the old days, programmers obsessed with efficiency would often write obfuscated loops using pointer arithmetic. Today, optimizing compilers use techniques, such as loop unrolling, that produce faster code than the “hand optimized” code.
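
Wrapping each loop in a function makes it easy to confirm that the two forms copy the same data. The function names and the length parameter n are additions for illustration; the loops themselves follow the two fragments above.

```c
#include <stddef.h>

/* Copy using array indexing. */
void copy_indexed(int *A, const int *B, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    A[i] = B[i];
  }
}

/* Copy using pointers that advance through both arrays. */
void copy_pointer(int *A, const int *B, size_t n) {
  int *pA = &A[0];
  const int *pB = &B[0];
  for (size_t i = 0; i < n; ++i) {
    *pA++ = *pB++;
  }
}
```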

Initialization vs Declaration

An initialization declares a variable and assigns it a value in a single statement. Assume that a has been declared to be an int. Then the following meaningless function typically compiles without warning on recent C (C99) compilers under their default settings.

void goodStuff() {
    int *pA ;
    *pA = a ;
    int pB[] = {2, 3, 5} ;
    int *pC = pB ;
}

However, both statements of the following function will receive a warning.

void badStuff() {
    int *pA = a ;
    int *pB = {2, 3, 5} ;
}

C Strings

In C a string is an array of characters; but, instead of having a size field, it is terminated by a null character. Here are two programs to count the number of times the character 'B' occurs in a string. Note the use of pointer arithmetic in the second.

int bCount = 0 ;
for (int i = 0; buff[i]; ++i) {
   if (buff[i] == 'B') {
      ++bCount ;
   }
}

int bCount = 0 ;
for (char *nextC = &buff[0]; *nextC; ++nextC) {
   if (*nextC == 'B') {
      ++bCount ;
   }
}
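
The same two loops can be packaged as functions that take the string and the character to look for. This is a sketch; the function names and the target parameter are not part of the original fragments.

```c
#include <stddef.h>

/* Count occurrences of target using array indexing. */
size_t count_indexed(const char *buff, char target) {
  size_t count = 0;
  for (size_t i = 0; buff[i]; ++i) {
    if (buff[i] == target) {
      ++count;
    }
  }
  return count;
}

/* Count occurrences of target by walking a pointer to the terminating NUL. */
size_t count_pointer(const char *buff, char target) {
  size_t count = 0;
  for (const char *nextC = buff; *nextC; ++nextC) {
    if (*nextC == target) {
      ++count;
    }
  }
  return count;
}
```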

Examples of implementing arrays on the PIC

Assume that variables n, vC and vI have been declared as follows.

char vC[100] ;
int  vI[100] ;
int n ;

Then the statement
vC[n] = '0' ;
would be implemented as follows:

       addi   $t0,$zero,'0'
       lw     $t1,n
       la     $t2,vC
       add    $t2,$t2,$t1
       sb     $t0,0($t2)

However the statement
vI[n] = '0' ;
would be implemented by multiplying the array index by four:

       addi   $t0,$zero,'0'
       lw     $t1,n
       la     $t2,vI
       sll    $t1,$t1,2        # multiply $t1 by 4
       add    $t2,$t2,$t1
       sw     $t0,0($t2)
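
The address arithmetic in the two assembly fragments can be spelled out in C by working in bytes. This is a sketch with made-up function names; the scaling by sizeof(int) corresponds to the shift by 2 only where an int occupies four bytes.

```c
#include <stddef.h>

/* vC[n] = value : the byte offset equals the index. */
void store_char(char *base, size_t n, char value) {
  char *addr = base + n;     /* add  $t2,$t2,$t1 */
  *addr = value;             /* sb   $t0,0($t2)  */
}

/* vI[n] = value : the index is scaled by the element size first. */
void store_int(int *base, size_t n, int value) {
  char *addr = (char *)base + n * sizeof(int);  /* sll, then add */
  *(int *)addr = value;                         /* sw   $t0,0($t2) */
}
```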

A brief look at structure

C has a heterogeneous data structure called the struct, resembling a method-less class in Java, where several fields of possibly different types are stored together.

struct point3D {
  int x ;
  int y ;
  int z ;
} ;
struct point3D P ;
P.x = 5 ;
P.y = P.x + 1 ;
P.z = 7 ;

When implemented the fields are stored at fixed offsets from the beginning of the structure.

       lw      $t0,P           # P.y <- P.x+1
       addi    $t0,$t0,1
       sw      $t0,P+4

       addi    $t0,$zero,7     # P.z <- 7
       sw      $t0,P+8
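
The offsets used in the assembly code can be obtained in C with the standard offsetof macro. The sketch below stores into a field by adding its offset to the address of the structure, which is roughly what the sw instructions do; set_field is a made-up helper, and the offsets 4 and 8 assume a four-byte int with no padding between fields.

```c
#include <stddef.h>

struct point3D {
  int x;
  int y;
  int z;
};

/* Store value into the int field located offset bytes into *p. */
void set_field(struct point3D *p, size_t offset, int value) {
  *(int *)((char *)p + offset) = value;
}
```

For example, set_field(&P, offsetof(struct point3D, z), 7) has the same effect as P.z = 7.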

Java and C++ have a similar way of storing fields within classes, but it's more complicated due to the use of inheritance in both languages: fields declared in superclasses must be stored at the beginning of the implementing structure. In Java, the implementing structure also stores references to the object’s methods. In C++ this is only necessary for virtual methods. However, C++ must deal with multiple inheritance.

Trying out an example

int V[500] ;
int H[100] ;
int i ;
for (i=0; i<100; ++i) {
  H[i] = 0 ;
}
for (i=0; i<500; ++i) {
  ++H[V[i]] ;
}
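
The fragment above leaves V unfilled and assumes that every V[i] is a valid index into H. A version that can be run and tested might look like the following sketch, where the function name histogram, the NBINS constant and the range check are additions for illustration.

```c
#include <stddef.h>

#define NBINS 100   /* size of H, as in the fragment above */

/* Count how often each value 0..NBINS-1 occurs in V[0..n-1].
   Values outside that range are skipped. Returns the number counted. */
size_t histogram(const int *V, size_t n, int *H) {
  size_t counted = 0;
  for (size_t i = 0; i < NBINS; ++i) {
    H[i] = 0;
  }
  for (size_t i = 0; i < n; ++i) {
    if (V[i] >= 0 && V[i] < NBINS) {
      ++H[V[i]];
      ++counted;
    }
  }
  return counted;
}
```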