Areas of memory
Executing C and C++ programs are considered to have several segments or sections, distinct areas of memory.
- Text: For program code
- Data: For global and static variables
- Data (more specialized): For initialized variables
- BSS: For uninitialized global and static variables, which ANSI C requires to start out as 0
- Stack: For local variables
- Heap: For dynamically allocated memory
Some compilers may place const variables in read-only segments. In some implementations, each shared or dynamic-link library has its own segments for text and data. This allows several running programs to share common library routines.
Example C program using all areas
    #include <stdint.h>
    #include <stdlib.h>

    uint32_t Aglobal[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};        /* Data */
    const uint32_t Aconst[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};   /* possibly read-only */
    uint32_t Abss[10] ;                                           /* BSS */

    void subr(void) {
        uint32_t ASlocal[10] ;                                    /* Stack */
        return ;
    }

    int main(int argc, char** argv) {
        uint32_t AMlocal[10] ;                                    /* Stack */
        uint32_t *Amalloc ;
        Amalloc = (uint32_t *)malloc(10*sizeof(uint32_t)) ;       /* Heap */
        subr() ;
        return (EXIT_SUCCESS);
    }
Some addresses for those variables
Variable | 64-bit Linux | 16-bit PIC24 | 32-bit PIC32 |
---|---|---|---|
Aglobal | 0000000000600B00 | 0850 | A0000044 |
Aconst | 00000000004007A0 | 9072 | 9D000B04 |
Abss | 0000000000600B60 | 0878 | A000001C |
Amalloc | 0000000000897010 | 09AE | A00000A8 |
AMlocal | 00007FFF09C89C80 | 09F2 | A0007FAC |
ASlocal | 00007FFF09C89C30 | 0A24 | A0007F38 |
main | 0000000000400541 | 1204 | 9D000888 |
subr | 0000000000400514 | 11EA | 9D000868 |
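Addresses like those in the table can be printed with a short program. The following is a minimal sketch, not the program that produced the table: the function name print_areas is hypothetical, and the addresses it prints will differ between systems and, where address randomization is enabled, between runs.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

uint32_t Aglobal[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};       /* initialized data */
const uint32_t Aconst[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};  /* often read-only data */
uint32_t Abss[10];                                           /* BSS, starts as zeros */

/* Hypothetical helper: prints one address from each area.
   Returns 0 on success, -1 if malloc fails. */
int print_areas(void) {
    uint32_t Alocal[10];                                     /* stack */
    uint32_t *Amalloc = malloc(10 * sizeof *Amalloc);        /* heap */
    if (Amalloc == NULL) {
        return -1;
    }
    printf("Aglobal %p\n", (void *)Aglobal);
    printf("Aconst  %p\n", (void *)Aconst);
    printf("Abss    %p\n", (void *)Abss);
    printf("Amalloc %p\n", (void *)Amalloc);
    printf("Alocal  %p\n", (void *)Alocal);
    free(Amalloc);
    return 0;
}
```

Calling print_areas() from main on a desktop system would typically show the stack address far away from the data and heap addresses, as in the Linux column.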
Where are the global variables in Java?
The closest things Java has to global variables are public static variables and references. You can think of System.in and System.out as globals, but don’t mention this in front of a Java programmer.
Pointers
Example C functions using pointers
    void swap(int *p, int *q) {
        int t = *p ;
        *p = *q ;
        *q = t ;
    }

    void sort(int *mn, int *mx) {
        if (*mn > *mx) {
            swap(mn, mx) ;
        }
    }
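As a usage sketch, the two pointer-based functions can be combined to order three values. The function sort3 is hypothetical and not part of the original example; it simply calls sort on adjacent pairs, passing addresses so that the callee can modify the caller's variables.

```c
/* swap and sort are repeated here so that this block stands alone. */
void swap(int *p, int *q) { int t = *p; *p = *q; *q = t; }
void sort(int *mn, int *mx) { if (*mn > *mx) { swap(mn, mx); } }

/* Hypothetical helper: on return, *a <= *b <= *c. */
void sort3(int *a, int *b, int *c) {
    sort(a, b);   /* now *a <= *b */
    sort(b, c);   /* now *c holds the largest value */
    sort(a, b);   /* restore *a <= *b */
}
```

A caller with three int variables x, y and z would write sort3(&x, &y, &z).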
Example C++ functions using references
    void swap(int &p, int &q) {
        int t = p ;
        p = q ;
        q = t ;
    }

    void sort(int &mn, int &mx) {
        if (mn > mx) {
            swap(mn, mx) ;
        }
    }
Examples of implementing pointers on the PIC
Here is the C program.
    uint32_t a = 202 ;
    uint32_t b = 255 ;
    uint32_t *p, *q ;

    int main() {
        b = a ;
        p = &a ;
        a = *p ;
        *p = a ;
        p = q ;
        *p = *q ;
    }
Here is an assembler program where the assembler will figure out how to use $gp, the global pointer, to access the global variables.
    lw   $t1,a          # b = a ;
    sw   $t1,b
    la   $t1,a          # p = &a ;
    sw   $t1,p
    lw   $t1,p          # a = *p ;
    lw   $t1,0($t1)
    sw   $t1,a
    lw   $t1,a          # *p = a ;
    lw   $t0,p
    sw   $t1,0($t0)
    lw   $t1,q          # p = q ;
    sw   $t1,p
    lw   $t1,q          # *p = *q ;
    lw   $t1,0($t1)
    lw   $t0,p
    sw   $t1,0($t0)
Here is an assembler program where variables are accessed at offsets from $gp, the global pointer.
    lw    $t1,offA($gp)      # b = a ;
    sw    $t1,offB($gp)
    addiu $t1,$gp,offA       # p = &a ;
    sw    $t1,offP($gp)
    lw    $t1,offP($gp)      # a = *p ;
    lw    $t1,0($t1)
    sw    $t1,offA($gp)
    lw    $t1,offA($gp)      # *p = a ;
    lw    $t0,offP($gp)
    sw    $t1,0($t0)
    lw    $t1,offQ($gp)      # p = q ;
    sw    $t1,offP($gp)
    lw    $t1,offQ($gp)      # *p = *q ;
    lw    $t1,0($t1)
    lw    $t0,offP($gp)
    sw    $t1,0($t0)
The null value
Way back in the late 1950s, the LISP programming language was created by John McCarthy. For years, LISP was considered the language for artificial intelligence applications.
LISP has a special atom nil, which represents an empty list, and a function null to test if an atom is nil.
Here is an example of a recursive LISP program to sum all the elements of a list. The car operator returns the head, or first element, of a list and the cdr operator returns the tail, or remainder, of the list. By the way, car and cdr were assembly language macros on the IBM 704.
    (define sum(l)
      (cond ((null l) 0)
            (t (plus (car l) (sum (cdr l))))))
Several years ago, everyone graduating in computer science would have known a little LISP. Although usage of LISP has faded, its spirit has been revived with the lambda, so popular in JavaScript.
When Pascal was invented in 1970, it also had a value nil, which stood for “not in list”. C, invented in 1972, had a NULL value, a pointer to nowhere. In C, the code for recursively summing a list might look something like the following:
    int sum(struct node *l) {
        if (l==NULL) {
            return 0 ;
        }
        else {
            return l->head + sum(l->tail) ;
        }
    }
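The recursive sum needs a node type before it can be compiled. Here is a minimal sketch that assumes a singly linked node with the two fields the function uses, head and tail; the layout of struct node and the helper sum_demo are assumptions, not taken from the original.

```c
#include <stddef.h>

/* Assumed node layout; only the field names head and tail come from sum. */
struct node {
    int head;            /* value stored in this element */
    struct node *tail;   /* rest of the list, or NULL at the end */
};

int sum(struct node *l) {
    if (l == NULL) {
        return 0;
    }
    return l->head + sum(l->tail);
}

/* Hypothetical demo: builds the list 2 -> 3 -> 5 on the stack and sums it. */
int sum_demo(void) {
    struct node c = {5, NULL};
    struct node b = {3, &c};
    struct node a = {2, &b};
    return sum(&a);
}
```

The empty list is represented by NULL itself, which is why sum(NULL) is the base case of the recursion.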
C and C++ also have NUL, the null character, a special character value used to terminate a string. The null character must be implemented with the value '\0'. The null pointer is usually implemented by using the value 0.
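The two kinds of null in C can be seen side by side in a small sketch. The helper names my_strlen and is_null are hypothetical; the first relies on the null character to find the end of a string, and the second compares a pointer against NULL.

```c
#include <stddef.h>

/* Hypothetical helper: counts the characters before the terminating '\0'. */
size_t my_strlen(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

/* Hypothetical helper: returns 1 if p is the null pointer, 0 otherwise. */
int is_null(const void *p) {
    return p == NULL;
}
```

The null character is a value stored inside a character array, while the null pointer is a pointer value that refers to no object at all.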
Java also has a special value null, which can be assigned to any object reference. Yes, null is confusing and leads to a lot of contorted programming and unreliable programs. In Java 8, the Optional class was introduced to allow null-free programming. PHP uses the Elvis operator ?: (rotate it a quarter-turn clockwise) to provide alternate values to null.
Where are the pointers in Java?
Java is defined to have two types of variables, primitive variables and reference variables. The reference variables are implemented using pointers to objects, but Java does a good job of hiding this from the programmer.
Arrays and pointers in C
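In C, an array declaration defines what behaves, in most expressions, like a constant pointer to a sequence of data values. For example:
    int A[] = {2, 3, 5} ;
can be thought of as:
    int * const A = {2, 3, 5} ;
The second form only illustrates the idea; a compiler will complain about a brace list being used to initialize a pointer. In neither case can A be assigned a value. However, A[2] can be modified.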
Incidentally, these declarations are not the same as:
int const A[] = {2, 3, 5} ;
or
int const * A = {2, 3, 5} ;
Neither of these declarations allows assignments to A[2].
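The relationship between an array and a constant pointer can be tried out in a short sketch. The names P, array_length and set_last are hypothetical. The array A owns storage for three ints, while P is a constant pointer that merely refers to that storage, so sizeof sees them differently even though both can be indexed.

```c
#include <stddef.h>

int A[] = {2, 3, 5};   /* array: storage for three ints */
int *const P = A;      /* constant pointer to the first element of A */

/* Number of elements, computed from the array type itself.
   The same expression applied to P would not give the element count. */
size_t array_length(void) {
    return sizeof A / sizeof A[0];
}

/* Writes through the constant pointer: P cannot be changed, but P[2] can. */
void set_last(int value) {
    P[2] = value;
}
```

Assigning to A or to P is rejected by the compiler, while assigning to A[2] or P[2] is allowed because the elements themselves are not const.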
Finally, all you Java programmers should be warned that the following,
while encouraged in Java, is forbidden in C:
int[] A = {2, 3, 5} ;
In C the brackets must follow the variable name.
Pointer arithmetic in C
If p is the address of an integer in C, then p+i is the address of the i’th integer stored in memory after the place p points to. This interpretation means that p[i] can be considered an abbreviation for *(p+i). However, because 32-bit integers require four 8-bit bytes, the address of p[i] is 4*i memory locations from the address of p[0]. This can be very confusing, as shown in the following examples.
- The expression &p[i] - &p[0] is i
- The expression (char *)&p[i] - (char *)&p[0] is 4*i
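The difference between counting elements and counting bytes can be checked directly. In this sketch the function names are hypothetical, and the byte distance is compared against sizeof(int) rather than a hard-coded 4, since the size of int depends on the platform. Casting to char * is what makes the subtraction count bytes.

```c
#include <stddef.h>

/* Distance in elements between p[i] and p[0]. */
ptrdiff_t element_distance(int *p, int i) {
    return &p[i] - &p[0];
}

/* Distance in bytes between the same two addresses. */
ptrdiff_t byte_distance(int *p, int i) {
    return (char *)&p[i] - (char *)&p[0];
}
```

With 32-bit integers, element_distance(p, 5) is 5 while byte_distance(p, 5) is 20.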
Two equivalent C loops
    for (i=0; i<1000; ++i) {
        A[i] = B[i] ;
    }

    int *pA = &A[0] ;
    int *pB = &B[0] ;
    for (i=0; i<1000; ++i) {
        *pA++ = *pB++ ;
    }
In the old days, programmers obsessed with efficiency would often write obfuscated loops using pointer arithmetic. Today, optimizing compilers use techniques, such as loop unrolling, that produce faster code than the “hand optimized” code.
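The two loop styles can be wrapped in functions and compared. This is a sketch under assumptions: the function names and the length parameter n are not in the original, which uses fixed arrays of 1000 elements.

```c
#include <stddef.h>

/* Index version: copies n ints from B to A. */
void copy_indexed(int *A, const int *B, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        A[i] = B[i];
    }
}

/* Pointer version: the same copy, advancing two pointers instead. */
void copy_pointers(int *A, const int *B, size_t n) {
    int *pA = A;
    const int *pB = B;
    for (size_t i = 0; i < n; ++i) {
        *pA++ = *pB++;
    }
}
```

Both functions leave the destination array with the same contents; which one compiles to faster code is up to the optimizer.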
Initialization vs Declaration
An initialization declares and assigns a variable. Assuming that a has been declared to be an int, the following meaningless function compiles without warning on recent C (C99) compilers.
    void goodStuff() {
        int *pA ;
        *pA = a ;
        int pB[] = {2, 3, 5} ;
        int *pC = pB ;
    }
However, both statements of the following function will receive a warning.
    void badStuff() {
        int *pA = a ;
        int *pB = {2, 3, 5} ;
    }
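One way to repair the two initializations is to give each pointer an address rather than an int or a brace list. The sketch below is an assumption about what was probably intended, not the author's code; fixedStuff and the global a with value 42 are hypothetical.

```c
int a = 42;   /* assumed global int, standing in for the a mentioned above */

/* Hypothetical corrected version of badStuff. */
int fixedStuff(void) {
    int *pA = &a;            /* a pointer is initialized with an address */
    int pB[] = {2, 3, 5};    /* a brace list initializes an array */
    int *pC = pB;            /* the array name supplies the address of pB[0] */
    return *pA + pC[2];      /* 42 + 5 */
}
```

In a declaration such as int *pA = &a, the initializer sets pA itself, whereas the separate statement *pA = a stores into whatever pA already points to.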
C Strings
In C a string is an array of characters; but, instead of having a size field, it is terminated by a null character. Here are two programs to count the number of times the character 'B' occurs in a string. Note the use of pointer arithmetic in the second.
    int bCount = 0 ;
    for (int i = 0; buff[i]; ++i) {
        if (buff[i] == 'B') {
            ++bCount ;
        }
    }

    int bCount = 0 ;
    for (char *nextC = &buff[0]; *nextC; ++nextC) {
        if (*nextC == 'B') {
            ++bCount ;
        }
    }
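The two counting loops can be packaged as functions that take the string and the character to look for. The function names and the parameter c are hypothetical; the loops otherwise follow the index and pointer styles described above, and both stop at the terminating null character.

```c
/* Index version: counts occurrences of c in the string buff. */
int count_indexed(const char *buff, char c) {
    int count = 0;
    for (int i = 0; buff[i]; ++i) {
        if (buff[i] == c) {
            ++count;
        }
    }
    return count;
}

/* Pointer version: walks the string with a pointer instead of an index. */
int count_pointer(const char *buff, char c) {
    int count = 0;
    for (const char *nextC = buff; *nextC; ++nextC) {
        if (*nextC == c) {
            ++count;
        }
    }
    return count;
}
```

For the string "BOBBY" and the character 'B', both functions return 3.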
Examples of implementing arrays on the PIC
Assume that variables n, vC and vI have been declared as follows.
    char vC[100] ;
    int vI[100] ;
    int n ;
Then the statement
vC[n] = '0' ;
would be implemented as follows:
    addi $t0,$zero,'0'
    lw   $t1,n
    la   $t2,vC
    add  $t2,$t2,$t1
    sb   $t0,0($t2)
However the statement
vI[n] = '0' ;
would be implemented by multiplying the array index by four:
    addi $t0,$zero,'0'
    lw   $t1,n
    la   $t2,vI
    sll  $t1,$t1,2        # multiply $t1 by 4
    add  $t2,$t2,$t1
    sw   $t0,0($t2)
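The address computation done by the assembly can be written out by hand in C. This is only an illustrative sketch: store_scaled is a hypothetical helper, and it scales by sizeof(int) rather than a fixed 4 so that it does not depend on the size of int.

```c
#include <stddef.h>

/* Hypothetical helper that mirrors the assembly: scale the index,
   add it to the base address, then store the value there. */
void store_scaled(int *base, size_t n, int value) {
    char *addr = (char *)base + n * sizeof(int);   /* like sll followed by add */
    *(int *)addr = value;                          /* like sw */
}
```

Calling store_scaled(vI, n, '0') has the same effect as vI[n] = '0'; the compiler normally generates the scaling itself.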
A brief look at structure
C has a heterogeneous data structure called the struct, resembling a method-less class in Java, where several fields are stored within a collection.
    struct point3D {
        int x ;
        int y ;
        int z ;
    } ;
    struct point3D P ;
    P.x = 5 ;
    P.y = P.x + 1 ;
    P.z = 7 ;
When implemented the fields are stored at fixed offsets from the beginning of the structure.
    lw   $t0,P            # P.y <- P.x+1
    addi $t0,$t0,1
    sw   $t0,P+4
    addi $t0,$zero,7      # P.z <- 7
    sw   $t0,P+8
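The fixed offsets can be inspected from C with the standard offsetof macro. In this sketch print_offsets is a hypothetical helper. On a platform with 4-byte int and no padding between the fields, the offsets match the 0, 4 and 8 used by the assembly, although the C standard does allow a compiler to insert padding.

```c
#include <stddef.h>
#include <stdio.h>

struct point3D {
    int x;
    int y;
    int z;
};

/* Hypothetical helper: prints the byte offset of each field. */
void print_offsets(void) {
    printf("x at %zu, y at %zu, z at %zu\n",
           offsetof(struct point3D, x),
           offsetof(struct point3D, y),
           offsetof(struct point3D, z));
}
```

The first field is always at offset 0, so the address of a struct and the address of its first field are the same.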
Java and C++ have a similar way of storing fields within classes, but it's more complicated due to the use of inheritance in both languages: Fields declared in superclasses must be stored at the beginning of the implementing structure. In Java, the implementing structure also stores references to the object’s methods. In C++ this is only necessary for virtual methods. However, C++ must deal with multiple inheritance.
Trying out an example
    int V[500] ;
    int H[100] ;
    int i ;
    for (i=0; i<100; ++i) {
        H[i] = 0 ;
    }
    for (i=0; i<500; ++i) {
        ++H[V[i]] ;
    }
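The example builds a histogram: H[k] ends up holding the number of times the value k occurs in V. It only works if every element of V lies between 0 and 99. Here is a sketch of a more defensive version; the function histogram, its parameters and its return value are assumptions added for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: counts how often each value 0..nh-1 occurs in v.
   Returns the number of values skipped because they were out of range. */
size_t histogram(const int *v, size_t nv, int *h, size_t nh) {
    size_t skipped = 0;
    for (size_t i = 0; i < nh; ++i) {
        h[i] = 0;
    }
    for (size_t i = 0; i < nv; ++i) {
        if (v[i] >= 0 && (size_t)v[i] < nh) {
            ++h[v[i]];
        } else {
            ++skipped;
        }
    }
    return skipped;
}
```

With the arrays from the example, the call would be histogram(V, 500, H, 100).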