Structures in C
Structure declaration
The C struct
constructor allows the definition of
heterogeneous data structures with fields or members
of different types.
The name of each field within a structure definition must be unique,
but different structure definitions often share a common field name.
    struct state {
        char *Name ;
        char Abbrev[2] ;
        int Population ;
    } ;
    struct state NC ;
    NC.Name = "North Carolina" ;
    NC.Abbrev[0] = 'N' ;
    NC.Abbrev[1] = 'C' ;
    NC.Population = 9535483 ;
Effectively every struct
definition
introduces a new programmer-defined type.
Many programmers use typedef
to give structure types shorter names.
    typedef struct state State ;
    State NC ;
By the way, the C structure is rather like a method-less class in either C++ or Java.
The dot operator
The dot operator joins a structure expression with a field or member name.
The .
operator has the highest precedence level and
associates left to right. Other operators at this level are:
- Function call with ( )
- Array subscript with [ ]
- Field selection from object .
- Field selection from pointer ->
- Postfix ++
- Postfix --
A horrendous example of this would be
f("United States")[30].Population++
used as an expression;
however, you do often find long sequences such as
USA[12].Name[0]
.
Structures as variables
In ANSI C, one structure can be assigned (with =
) to another
and structures can be passed to and returned from functions.
This involves copying the entire structure and is reasonable only with
very small structures.
C also has a syntax for initializing an entire structure in one declaration.
struct state NC = { "North Carolina", {'N','C'}, 9535483 } ;
In C you can’t return an array from a function, but you can return a structure containing an array.
Arrays of structures
There is nothing unusual here.
    struct state USA[50] ;
    USA[12].Name = "North Carolina" ;
    USA[12].Abbrev[0] = 'N' ;
    USA[12].Abbrev[1] = 'C' ;
    USA[12].Population = 9535483 ;
Nested structures
Nested structures are both useful and common.
    struct county {
        char *Name ;
        int Population ;
        struct state State ;
    } ;
    struct county Haywood ;
    Haywood.Name = "Haywood" ;
    Haywood.State.Name = "North Carolina" ;
Recursive structure definitions
Structures are often defined to contain pointers to other structures, and some of those structures may even be of the same type. This is quite natural. After all, a structure representing a person will need a reference to another person, such as a mother!
    typedef struct Person *PersonRef ;
    typedef struct Person {
        char Name[80] ;
        PersonRef mother ;
    } PersonNode ;
Dynamic allocation of structures using pointers
Recursive data structure definitions frequently lead to dynamically allocated structures.
    PersonRef eve ;
Of course, you do need to use a pointer to refer to fields within
dynamically allocated structures. For example,
you’d need to use (*eve).Name
to access the Name
field of our dynamic person.
Because dynamic structures are so popular, C has a special
syntax for referring to their fields:
P->F
is a
way of saying (*P).F
.
(Think of ->
as a sign pointing in a direction.)
This allows you to refer to our Eve’s name as
eve->Name
    eve = (PersonRef) malloc (sizeof(struct Person)) ;
    strcpy(&eve->Name[0], "Eve") ;
    eve->mother = eve ;
Bit fields
It is possible to place several fields of a structure within a single integer. This can be done to conserve storage (though at the cost of increased runtime) or to conform to external standards.
IP (Internet Protocol) packet header
The first four bits of an
IPv4 packet
contain the IP version of the packet (always 4 for IPv4)
and the next four bits contain the length of the header in 32-bit
words. Because these must be packed in big-endian order
on the “wire,”
the standard
ip.h
include file must contain a
preprocessor conditional.
This example is copyrighted by the Free Software Foundation.
    #if __BYTE_ORDER == __LITTLE_ENDIAN
        unsigned int flags:4;
        unsigned int overflow:4;
    #elif __BYTE_ORDER == __BIG_ENDIAN
        unsigned int overflow:4;
        unsigned int flags:4;
    #else
    # error "Please fix <bits/endian.h>"
    #endif
Syntax of the bit field specification
The type of the bit field can be either unsigned int
,
signed int
, or int
.
Oddly enough, plain int
’s may be considered signed
or unsigned depending on the implementation.
Because older C implementations considered all bit fields
to be unsigned, it is safer to stick with
unsigned int
.
The field name can be omitted. This is useful when the programmer wishes to align following fields in a special manner, perhaps to conform with an external standard.
The fields are allocated within integers from left-to-right or right-to-left depending on the endianness of the implementation. If a field “straddles” an integer boundary, it is usually placed in the next integer. However, this is also implementation dependent. Finally, since the size of the integer is implementation dependent, it may be difficult to predict where the straddle occurs.
As a special case, if the field width is 0, the next bit field is placed at the beginning of the next integer.
One consequence of these rules is that it is difficult to write portable C code using bit fields. If you do, stick to unsigned integer fields and use 0 field widths liberally to obtain the desired alignment. Bit fields may seem more convenient than masking and shifting with bit operations, but the convenience comes at the price of portability.
Restrictions on bit fields
Bit fields cannot be arrays.
Bit fields do not have addresses. The &
operator cannot be applied to a bit field.
Structures and symbols
Consider each structure as having its own symbol table. Here’s a possible one for the 16-bit PIC24 architecture.
field | struct | offset | size |
---|---|---|---|
Abbrev | state | 2 | 2 |
Name | state | 0 | 2 |
Population | state | 6 | 2 |
The address of NC.Population
would be the
address of NC
plus the offset of Population
.
In C you are allowed to assume that fields will be allocated in memory in the same order as they appear in the definition; however, because the compiler is free to insert padding between fields, this assumption is rarely useful.
Unions
The union
is a structure in which only
one field can be active at a time. It is usually implemented
by having all
fields “share” the same memory.
The syntax for accessing union
fields is identical to the syntax for accessing structure fields.
Unions are confusing.
Color example
Color can be represented in several ways.
    union Color {
        float hsb[3] ;
        int rgb[3] ;
    } ;
    union Color Red, Blue ;
    Red.rgb[0] = 255 ;
    Red.rgb[1] = 0 ;
    Red.rgb[2] = 0 ;
    Blue.hsb[0] = 0.6666666 ;
    Blue.hsb[1] = 1.0 ;
    Blue.hsb[2] = 1.0 ;
But how can you tell RGB from HSB?
    struct Color {
        int type ; /* 0 for HSB, 1 for RGB */
        union {
            float hsb[3] ;
            int rgb[3] ;
        } uc ;
    } ;
    struct Color Red ;
    Red.type = 1 ;
    Red.uc.rgb[0] = 255 ;
    Red.uc.rgb[1] = 0 ;
    Red.uc.rgb[2] = 0 ;
Issues
- Size of the union structure
- Dangers of assigning the wrong field
- ICMP include
Enumerated types
Enumerated types are a great way to assign logical names
to useful constants.
They look better than the #define constants in the
ICMP include file.
    enum ResistorCode { black, brown, red, orange, yellow,
                        green, blue, violet, grey, white } ;
    enum ResistorCode band1, band2, band3, color ;
    band1 = orange ;
    band2 = orange ;
    band3 = brown ;
    color = black ;
    ++color ;
Java did not have enumerations until version 1.5, when they were announced with great bravado.
Functions as parameters
In C, functions can be passed pointers to functions.
    /* Function to perform a numeric integration (trapezoid rule) */
    double Integrate0to1(double (*f)(double), int N)
    {
        double sum ;
        int i ;
        sum = f(0.0)*0.5 ;
        for (i = 1 ; i < N ; ++i) {
            sum += f((double) i / N) ;   /* sample the i-th interior point */
        }
        sum += f(1.0)*0.5 ;
        return sum/N ;
    }
C Preprocessor
The C preprocessor,
generally called cpp
,
transforms your code before passing it to the C compiler.
The C preprocessor:
- Removes most comments, but adds a few giving the name of files and line numbers within files. The C compiler uses these when generating error messages.
- Allows the splitting of long lines.
- Substitutes for trigraphs if you ask.
- Allows the definition of manifest constants.
- Allows the use of macros.
- Allows the use of useful macros, if you add in enough parentheses.
- Stringizes.
- Glues.
- Reads your include files.
- Reads the standard include files.
- "Branches" according to what is defined.
- Has a common technique for avoiding multiple inclusion of include files.
- Allows complicated tests on integers for branches.
- Has some preloaded macros specified by ANSI C standard.
- Can stop compilation when something is wrong.
By the way, .cpp
is a common file extension
for C++ programs.