Other features of C

Structures in C

Structure declaration

The C struct constructor allows the definition of heterogeneous data structures with fields, or members, of different types. The name of each field within a structure definition must be unique, but different structure definitions often share field names.

struct state {
  char *Name ;
  char Abbrev[2] ;
  int Population ;
} ;

struct state NC ;

NC.Name       = "North Carolina" ;
NC.Abbrev[0]  = 'N' ;
NC.Abbrev[1]  = 'C' ;
NC.Population = 9535483 ;

Effectively every struct definition introduces a new programmer-defined type. Many programmers use typedef’s for structures.

typedef struct state State ;

State NC ;

By the way, the C structure is rather like a method-less class in either C++ or Java.
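
Here is a minimal sketch of the declaration and the typedef working together. The helper describe_state is an invented name for illustration, not a library function; it simply formats the three fields into a caller-supplied buffer.

```c
#include <stdio.h>

struct state {
  char *Name ;
  char Abbrev[2] ;      /* two letters, no terminating '\0' */
  int Population ;
} ;

typedef struct state State ;

/* Hypothetical helper, not part of any library: formats one State
   into buf and returns the number of characters snprintf produced. */
int describe_state(State s, char *buf, size_t n) {
  /* %.2s stops after two characters, so Abbrev needs no '\0'. */
  return snprintf(buf, n, "%s (%.2s): %d",
                  s.Name, s.Abbrev, s.Population) ;
}
```

Because of the typedef, the parameter can be written as State rather than struct state; the two names refer to the same type.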

The dot operator

The dot operator joins a structure expression with a field or member name.

The . operator is at the highest precedence level and associates left to right. The operators at this level are:

Function call with ( )
Array subscript with [ ]
Field selection from object .
Field selection from pointer ->
Postfix ++
Postfix --

A horrendous example of this would be f("United States")[30].Population++ used as an expression; however, you do often find long sequences such as USA[12].Name[0].
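
The left-to-right grouping can be seen in a small sketch. The function states_of and the two-entry table are assumptions made up for this example, and the second table entry holds placeholder values only.

```c
struct state {
  char *Name ;
  char Abbrev[2] ;
  int Population ;
} ;

/* Second entry is a placeholder, not real data. */
static struct state Table[2] = {
  { "North Carolina", {'N','C'}, 9535483 },
  { "Placeholder",    {'X','X'}, 100 }
} ;

/* Hypothetical lookup: ignores its argument and returns the one table. */
struct state *states_of(const char *country) {
  (void) country ;
  return Table ;
}

/* Groups as (((states_of("...")) [i]) .Population) ++ :
   call, subscript, field selection, then postfix increment.
   Returns the value before the increment. */
int bump_population(int i) {
  return states_of("United States")[i].Population++ ;
}

/* Call, subscript, field selection, subscript again. */
char first_letter(int i) {
  return states_of("United States")[i].Name[0] ;
}
```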

Structures as variables

In ANSI C, one structure can be assigned (with =) to another and structures can be passed to and returned from functions. This involves copying the entire structure and is reasonable only with very small structures.

C also has a syntax for initializing an entire structure in its declaration.

struct state NC = { "North Carolina", {'N','C'}, 9535483 } ;

In C you can’t return an array from a function, but you can return a structure containing an array.
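
A minimal sketch of both points: assignment copies the whole structure, and an array can travel in and out of a function when wrapped in a structure. The names triple, make_triple and copy_is_independent are invented for this example.

```c
/* Wrapper struct: the array is copied along with the struct. */
struct triple {
  int v[3] ;
} ;

/* Returns a struct by value; the caller receives its own copy of v. */
struct triple make_triple(int a, int b, int c) {
  struct triple t ;
  t.v[0] = a ;
  t.v[1] = b ;
  t.v[2] = c ;
  return t ;
}

/* Assignment with = copies every element, so changing the copy
   leaves the original untouched. Returns the original's first element. */
int copy_is_independent(void) {
  struct triple a = make_triple(1, 2, 3) ;
  struct triple b ;
  b = a ;
  b.v[0] = 99 ;
  return a.v[0] ;
}
```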

Arrays of structures

There is nothing unusual here.

struct state USA[50] ;

USA[12].Name       = "North Carolina" ;
USA[12].Abbrev[0]  = 'N' ;
USA[12].Abbrev[1]  = 'C' ;
USA[12].Population = 9535483 ;

Nested structures

Nested structures are both useful and common.

struct county {
  char *Name ;
  int   Population ;
  struct state State ;
} ;

struct county Haywood ;

Haywood.Name = "Haywood" ;
Haywood.State.Name = "North Carolina" ;

Recursive structure definitions

Structures are often defined to contain pointers to other structures, and some of those structures may even be of the same type. This is quite natural. After all, a structure representing a person will need a reference to another person, such as a mother!

typedef struct Person *PersonRef ;

typedef struct Person {
  char Name[80] ;
  PersonRef mother ;
} PersonNode ;

Dynamic allocation of structures using pointers

Recursive data structure definitions frequently lead to dynamically allocated structures.

PersonRef eve ;

Of course, you do need to use a pointer to refer to fields within dynamically allocated structures. For example, you’d need to use (*eve).Name to access the Name field of our dynamic person; the parentheses are required because . binds more tightly than *. Because dynamic structures are so popular, C has a special syntax for referring to their fields: P->F is a way of saying (*P).F. (Think of -> as a sign pointing in a direction.) This allows you to refer to our Eve’s name as eve->Name.

/* malloc is declared in <stdlib.h>, strcpy in <string.h> */
eve = (PersonRef) malloc (sizeof(struct Person)) ;
strcpy(eve->Name, "Eve") ;
eve->mother = eve ;
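
The fragments above can be pulled together into a sketch that also checks the result of malloc. The functions new_person and generations are invented for this example, and it assumes a NULL mother pointer marks the end of a family line (unlike the self-referencing Eve above, which would make the loop below run forever).

```c
#include <stdlib.h>
#include <string.h>

typedef struct Person *PersonRef ;

typedef struct Person {
  char Name[80] ;
  PersonRef mother ;
} PersonNode ;

/* Hypothetical constructor: returns NULL if allocation fails. */
PersonRef new_person(const char *name, PersonRef mother) {
  PersonRef p = malloc(sizeof *p) ;
  if (p == NULL)
    return NULL ;
  /* Copy at most 79 characters and always terminate the string. */
  strncpy(p->Name, name, sizeof p->Name - 1) ;
  p->Name[sizeof p->Name - 1] = '\0' ;
  p->mother = mother ;          /* same as (*p).mother = mother */
  return p ;
}

/* Counts people on the maternal line by following mother
   pointers until a NULL is reached. */
int generations(PersonRef p) {
  int n = 0 ;
  while (p != NULL) {
    ++n ;
    p = p->mother ;
  }
  return n ;
}
```

Every structure obtained this way should eventually be handed back with free.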

Bit fields

It is possible to place several fields of a structure within a single integer. This can be done to conserve storage (though at the cost of increased runtime) or to conform to external standards.

IP (Internet Protocol) packet header

The first four bits of an IPv4 packet contain the IP version of the packet (which is 4 for 99.9% of the packets) and the next four bits contain the length of the header in 32-bit words. Because these must be packed in big-endian order on the “wire,” the standard ip.h include file must contain a preprocessor conditional. This example is copyrighted by the Free Software Foundation.

#if __BYTE_ORDER == __LITTLE_ENDIAN
    unsigned int ihl:4;
    unsigned int version:4;
#elif __BYTE_ORDER == __BIG_ENDIAN
    unsigned int version:4;
    unsigned int ihl:4;
#else
# error	"Please fix <bits/endian.h>"
#endif

Syntax of the bit field specification

The type of the bit field can be either unsigned int, signed int, or int. Oddly enough, plain int’s may be considered signed or unsigned depending on the implementation. Because older C implementations considered all bit fields to be unsigned, it is safer to stick with unsigned int.

The field name can be omitted. This is useful when the programmer wishes to align following fields in a special manner, perhaps to conform with an external standard.

The fields are allocated within integers from left-to-right or right-to-left depending on the endianness of the implementation. If a field “straddles” an integer boundary, it is usually placed in the next integer. However, this is also implementation dependent. Finally, since the size of the integer is implementation dependent, it may be difficult to predict where the straddle occurs.

As a special case, if the field width is 0, the next bit field is placed at the beginning of the next integer.
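
A small sketch of the syntax, including an unnamed field and a zero-width field. The structure flags and the function store_level are invented for this example, and the actual layout in memory is still up to the implementation.

```c
struct flags {
  unsigned int ready : 1 ;   /* one-bit field */
  unsigned int       : 3 ;   /* unnamed: three bits of padding */
  unsigned int level : 4 ;   /* holds 0 through 15 */
  unsigned int       : 0 ;   /* width 0: next field starts a new unit */
  unsigned int count : 12 ;
} ;

/* Stores value in the four-bit field and reads it back.
   Unsigned bit fields keep only the low bits, so the result
   is value modulo 16. */
unsigned store_level(unsigned value) {
  struct flags f = { 0 } ;
  f.level = value ;
  return f.level ;
}
```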

One consequence of these rules is that it is difficult to write portable C code using bit fields. If you do, stick to unsigned integer fields and use 0 field widths liberally to obtain the desired alignment. Bit fields may seem more convenient than masking and shifting with bit operations, but the convenience comes at the price of portability.
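
For comparison, here is a sketch of the masking and shifting alternative applied to the first byte of an IPv4 header. The function names are invented for this example; the code depends only on the value of the byte, not on how the compiler lays out bit fields.

```c
/* First byte of an IPv4 header: version in the high four bits,
   header length (in 32-bit words) in the low four bits. */
unsigned ip_version(unsigned char first_byte) {
  return (first_byte >> 4) & 0x0Fu ;
}

unsigned ip_header_words(unsigned char first_byte) {
  return first_byte & 0x0Fu ;
}

/* Packs the two four-bit values back into one byte. */
unsigned char ip_pack(unsigned version, unsigned words) {
  return (unsigned char) (((version & 0x0Fu) << 4) | (words & 0x0Fu)) ;
}
```

A typical header with no options starts with the byte 0x45: version 4 and a header length of five 32-bit words.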

Restrictions on bit fields

Bit fields cannot be arrays.

Bit fields do not have addresses. The & operator cannot be applied to a bit field.

Structures and symbols

Consider each structure as having its own symbol table. Here’s a possible one for the 16-bit PIC24 architecture.

field	`struct`	offset	size
`Abbrev`	`state`	2	2
`Name`	`state`	0	2
`Population`	`state`	6	2

The address of NC.Population would be the address of NC plus the offset of Population.

In C you are allowed to assume that fields will be allocated in memory in the same order as they appear in the definition; however, because the compiler is free to insert padding between fields, this assumption is rarely useful.
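
The standard offsetof macro from stddef.h reports the offset the compiler actually chose, padding included. The sketch below uses it to compute a field address by hand; population_address is an invented name for this example.

```c
#include <stddef.h>   /* offsetof */

struct state {
  char *Name ;
  char Abbrev[2] ;
  int Population ;
} ;

/* Address of a field computed by hand: the address of the
   structure plus the offset of the field, counted in bytes. */
int *population_address(struct state *s) {
  return (int *) ((char *) s + offsetof(struct state, Population)) ;
}
```

The offsets themselves vary between compilers and architectures, so code should ask for them with offsetof rather than hard-code them.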

Unions

A union is a structure in which only one field can be active at a time. Unions are usually implemented by having the fields “share” the same memory. The syntax for accessing union fields is identical to the syntax for accessing structure fields.

Unions are confusing.

Color example

Color can be represented in several ways.

union Color {
  float hsb[3] ;
  int   rgb[3] ;
} ;

union Color Red, Blue ;
Red.rgb[0] = 255 ;
Red.rgb[1] = 0 ;
Red.rgb[2] = 0 ;
Blue.hsb[0] = 0.6666666 ;
Blue.hsb[1] = 1.0 ;
Blue.hsb[2] = 1.0 ;

But how can you tell RGB from HSB?

struct Color {
  int   type ;    /* 0 for HSB, 1 for RGB */
  union {
    float hsb[3] ;
    int   rgb[3] ;
  } uc ;
} ;

struct Color Red ;
Red.type = 1 ;
Red.uc.rgb[0] = 255 ;
Red.uc.rgb[1] = 0 ;
Red.uc.rgb[2] = 0 ;

Issues

Size of the union structure
Dangers of assigning the wrong field
ICMP include
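
The first two issues can be illustrated with a sketch of a tagged union. The functions make_rgb, make_hsb and red_component are invented for this example; the point is that every read checks the type field first, and that the union is at least as large as its largest field.

```c
struct Color {
  int type ;               /* 0 for HSB, 1 for RGB */
  union {
    float hsb[3] ;
    int   rgb[3] ;
  } uc ;
} ;

/* Sets the tag and the matching union field together. */
struct Color make_rgb(int r, int g, int b) {
  struct Color c ;
  c.type = 1 ;
  c.uc.rgb[0] = r ;
  c.uc.rgb[1] = g ;
  c.uc.rgb[2] = b ;
  return c ;
}

struct Color make_hsb(float h, float s, float b) {
  struct Color c ;
  c.type = 0 ;
  c.uc.hsb[0] = h ;
  c.uc.hsb[1] = s ;
  c.uc.hsb[2] = b ;
  return c ;
}

/* Returns the red component, or -1 when the active field is hsb.
   Reading rgb from an HSB color would reinterpret float bits as ints. */
int red_component(struct Color c) {
  return c.type == 1 ? c.uc.rgb[0] : -1 ;
}
```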

Enumerated types

Enumerated types are a great way to assign logical names to useful constants. They look better than those #define lines in the ICMP include file.

enum ResistorCode {black,  brown,  red,    orange, yellow,
                   green,  blue,   violet, grey,   white } ;
enum ResistorCode band1, band2, band3, color ;
band1 = orange ;
band2 = orange ;
band3 = brown ;
color = black ;
++color ;
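
Since the enumeration constants are just integers starting at 0, they can be used directly in arithmetic. The sketch below turns three bands into a resistance using the usual two-digits-and-multiplier reading of the color code; the function resistance is an invented name for this example.

```c
enum ResistorCode {black,  brown,  red,    orange, yellow,
                   green,  blue,   violet, grey,   white } ;

/* Resistance in ohms: the first two bands are digits, the
   third is the number of times to multiply by ten. */
double resistance(enum ResistorCode b1, enum ResistorCode b2,
                  enum ResistorCode b3) {
  double ohms = 10.0 * b1 + b2 ;
  int i ;
  for (i = 0 ; i < (int) b3 ; ++i)
    ohms *= 10.0 ;
  return ohms ;
}
```

With the bands assigned above (orange, orange, brown) this gives 33 times 10, or 330 ohms.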

Java did not have enumerations until version 1.5, when they were announced with great bravado.

Functions as parameters

In C, a function can receive a pointer to another function as a parameter.

/* Numeric integration of f over [0,1] by the trapezoid rule with N strips */
double Integrate0to1(double(*f)(double), int N) {
  double sum ;
  int i ;
  sum = f(0.0)*0.5 ;              /* endpoints count half */
  for(i=1; i < N; ++i) {
    sum += f((double)i/N) ;       /* interior points i/N */
  }
  sum += f(1.0)*0.5 ;
  return sum/N ;
}
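
On the calling side, the name of a function is passed without parentheses. The following self-contained sketch uses its own trapezoid-rule integrator so that it can stand alone; trapezoid0to1, square and identity are names invented for this example.

```c
/* Trapezoid rule on [0,1] with n equal strips. */
double trapezoid0to1(double (*f)(double), int n) {
  double sum = 0.5 * (f(0.0) + f(1.0)) ;   /* endpoints count half */
  int i ;
  for (i = 1 ; i < n ; ++i)
    sum += f((double) i / n) ;             /* interior points */
  return sum / n ;
}

double square(double x)   { return x * x ; }
double identity(double x) { return x ; }

/* The function name alone is converted to a pointer to the function. */
double area_under_square(void) {
  return trapezoid0to1(square, 100) ;
}
```

The trapezoid rule is exact for straight lines, so integrating identity gives 0.5; for square the result is close to, but slightly above, one third.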

C Preprocessor

The C preprocessor, generally called cpp, transforms your code before passing it to the C compiler.

The C preprocessor:

Removes most comments, but adds a few lines giving the names of files and line numbers within files. The C compiler uses these when generating error messages.
Allows the splitting of long lines.
Substitutes for trigraphs if you ask.
Allows the definition of manifest constants.
Allows the use of macros.
Allows the use of useful macros, if you add in enough parentheses.
Stringizes.
Glues.
Reads your include files.
Reads the standard include files.
"Branches" according to what is defined.
Has a common technique for avoiding multiple inclusion of include files.
Allows complicated tests on integers for branches.
Has some preloaded macros specified by the ANSI C standard.
Can stop compilation when something is wrong.
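
Several of these features fit into one short sketch. All of the macro names here (NOTES_EXAMPLE_H, MAX_BANDS, SQUARE_BAD, SQUARE, STR, GLUE) are invented for this example.

```c
#ifndef NOTES_EXAMPLE_H            /* guard against multiple inclusion */
#define NOTES_EXAMPLE_H

#define MAX_BANDS 3                /* manifest constant */
#define SQUARE_BAD(x)  x * x       /* SQUARE_BAD(1 + 2) becomes 1 + 2 * 1 + 2 */
#define SQUARE(x)      ((x) * (x)) /* enough parentheses */
#define STR(x)         #x          /* stringize: STR(hello) becomes "hello" */
#define GLUE(a, b)     a ## b      /* glue: GLUE(band, _count) becomes band_count */

#if MAX_BANDS < 1                  /* integer test evaluated by the preprocessor */
# error "MAX_BANDS must be positive"
#endif

#endif

/* After preprocessing this defines a function named band_count. */
int GLUE(band, _count)(void) {
  return MAX_BANDS ;
}
```

Running the compiler with its preprocess-only option (commonly -E) shows the text the compiler proper actually sees.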

By the way, .cpp is a common file extension for C++ programs.