C: The Dark Corners

Updated 2017-02-15

C was designed with simplicity in mind. Despite this, C has a lot of dark corners that are not necessarily well known. What follows is an incomplete collection of them.

For historical reasons, we also include some C++ intricacies that may be a source of confusion when writing C programs.

Constancy

Constant definition

The const keyword applies to whatever is immediately to its left; if there is nothing to its left, it applies to what is immediately to its right.

Both of the following lines declare a pointer to a constant integer.

const int * pi;
int const * pi;

The following, however, declares a constant pointer to an integer.

int * const pi;

Pay special attention when declaring pointers to arrays, because of operator precedence: in a declarator, [] binds more tightly than *. Here we have an array of 12 pointers to constant integers.

const int *pi[12];

The next one is a pointer to an array of 12 constant integers.

const int (*pi)[12];

It is always possible to add const implicitly (an int * converts to a const int * without a cast), but the opposite is not true: removing const requires an explicit cast.

In C++, the const keyword can be placed after a member function's parameter list to specify that the function does not modify the object's data members.

Constant pointers

The following is forbidden:

char *pc;
const char **ppc;

ppc = &pc; // Forbidden!

This would break const-correctness, since it would make it possible to change the value of **ppc through *pc.

Suppose it were allowed:

const char c = 'a';      // Constant variable.
char *pc;                // Pointer through which we will change c.
const char **ppc = &pc;  // Forbidden, but assume it is not.

*ppc = &c;               // Legal.
*pc = 'b';               // Change c.

Here ppc reaches c through pc. Since pc is not a pointer to a constant, c can be modified through it, so the constancy promised by ppc is broken.

C/C++ difference for const

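In C, the following compiles, usually with a warning that the initialization discards the const qualifier:

const int a = 10;
int *p = &a;
*p = 30;

printf("&a: %p, a: %d\n", (void *)&a, a);
printf("p: %p, *p: %d\n", (void *)p, *p);
return 0;

and typically outputs something like

&a: 0x3ce30364, a: 30
p: 0x3ce30364, *p: 30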

In C++, however, the previous code is rejected: the implicit conversion from const int * to int * is an error there, not just a warning. There is a workaround though:

const int a = 10;
int *p = (int*)(&a);
*p = 30;

printf("&a: %p, a: %d\n", (void *)&a, a);
printf("p: %p, *p: %d\n", (void *)p, *p);

but the output will be something like:

&a: 0x3ce30364, a: 10
p: 0x3ce30364, *p: 30

Yes, that is the same address and two different values!

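This is because the C++ compiler treats a const integer initialized with a constant expression as a compile-time constant and substitutes its value wherever a is read, much as with #define. The object still has an address, but reads of a no longer go through it. In both C and C++, modifying a const object is undefined behavior, so neither output is guaranteed.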

Constants as static array initializers

Semantically speaking, the const keyword denotes read-only variables rather than true constants (a "constant variable" would be an interesting oxymoron).

As such, const variables should not be used to specify the size of static arrays, since the standard requires an integer constant expression here, such as an integer literal, a sizeof expression, or a preprocessor macro that expands to one.

int array1[17];
const unsigned int sz = sizeof array1;
int array2[sizeof array1]; // OK
int array3[sz]; // Wrong

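In practice, many compilers accept const variables there anyway: at block scope, C99 treats array3 as a variable-length array; C++ treats sz as a constant expression; and some C compilers, such as Clang, fold it to a constant at file scope as an extension.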

Function argument evaluation order

From The C Programming Language:

The order in which function arguments are evaluated is unspecified, so the statement printf("%d %d\n", ++n, power(2, n)); can produce different results with different compilers, depending on whether n is incremented before power is called.

Thus it is good practice to keep side effects out of function arguments whenever another argument reads the same variable.

Arrays

Arrays are not pointers! In most expressions an array is converted to a pointer to its first element, which is why the following test is true, but there is a small number of cases where they behave differently:

array[0] == *array

From the C standard:

Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type “array of type” is converted to an expression with type “pointer to type” that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.

Using sizeof

The sizeof operator is evaluated at compile time (except for variable-length arrays) and follows its own set of rules, as described by the standard. When its operand is an array, it yields the total size of the array in bytes, not the size of a pointer.

long array[3];
long *p = array;
printf("%zu\n", sizeof(array));
printf("%zu\n", sizeof(p));

On machines where long is 8 bytes and pointers are 4 bytes, this will output:

24
4

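A function parameter declared as an array is adjusted to a pointer, so inside foo and bar, sizeof array yields the size of a pointer, whatever size is written in the brackets. sizeof gives the size of the whole array only where the array itself is declared.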

#include <stdio.h>

void foo(int array[]) {
	printf("foo: sizeof array == %zu\n", sizeof array);
}

void bar(int array[12]) {
	printf("bar: sizeof array == %zu\n", sizeof array);
}

int main() {
	int array[10];

	printf("main: sizeof array == %zu\n", sizeof array);
	foo(array);
	bar(array);

	return 0;
}

For multidimensional arrays, only the outermost dimension is converted to a pointer. For instance, int array[M][N] is converted to int (*)[N]. The following will output the size of a pointer.

#include <stdio.h>

void foo(int array[][3]) { /* adjusted to int (*array)[3] */
	printf("foo: sizeof array == %zu\n", sizeof array);
}

int main() {
	int arr[2][3] = {{10, 20, 30}, {40, 50, 60}};
	foo(arr);
	return 0;
}

Addressing arrays

Arrays have a type that differs from that of pointers. The type of a pointer to an array of n elements of type T is T (*)[n].

long array[3];
long *p;
long **pp;
long (*ap)[3];

p = &array;  // Wrong
pp = &array; // Wrong
ap = &array; // OK

Note that the type mismatch comes from the address-of operator (&), since the following code does not produce any warning:

long array[3];
long *p;
long (*ap)[3];

p = array;   // OK this time
ap = &array; // OK

Conversely, a pointer cannot be assigned to an array:

long array[3];
long *p = array;
array = p; // Wrong

Arrays as strings

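Arrays can only be initialized with a brace-enclosed list or, for character arrays, a string literal; a pointer or another array cannot serve as an initializer.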

char *p = "hello";
char t0[] = "world";
char t1[] = {'f', 'o', 'o'};
char t2[] = p; // Error.
char t3[] = (char*) "foo"; // Error.

There is another major difference between initializing a pointer and initializing an array. The pointer is merely set to the address of the string literal "hello", which has static storage duration (often in a read-only segment of the program), whereas the array receives its own copy of "world", including the terminating '\0'. The array can be modified afterwards; modifying the literal through the pointer is undefined behavior.

Implicit cast

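Integer types narrower than int are automatically promoted to int in arithmetic expressions, and when passed as variadic arguments to functions such as printf. Compare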

unsigned char a = 255;
a++;
printf("%d\n", a);

and

unsigned char a = 255;
printf("%d\n", a+1);

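An integer promotion never loses information. The pitfall is plain char: C leaves it implementation-defined whether char is signed, so a char holding a byte above 0x7F may promote to a negative int. Use signed char or unsigned char explicitly whenever the sign matters, to ensure portability.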

From The C Programming Language, section 2.7:

Conversion rules are more complicated when unsigned operands are involved. The problem is that comparisons between signed and unsigned values are machine-dependent, because they depend on the sizes of the various integer types. For example, suppose that int is 16 bits and long is 32 bits. Then -1L < 1U, because 1U, which is an unsigned int, is promoted to a signed long. But -1L > 1UL, because -1L is promoted to unsigned long and thus appears to be a large positive number.

See appendix A6 in the book for more implicit conversion rules.

Bit shifting

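Be wary of the difference between a logical shift and an arithmetic shift. See this Wikipedia article for more details. Note that it only matters for right shifting.

Right-shifting an unsigned value is always a logical shift, but right-shifting a negative signed value is implementation-defined in C; most compilers perform an arithmetic shift.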

Modulo operation

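Since C99, integer division truncates toward zero, so the result of a modulo operation has the sign of the dividend (C89 left this implementation-defined for negative operands):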

printf("-5 %% 2 = %d\n", -5 % 2);
printf("5 %% -2 = %d\n", 5 % -2);

To test whether an integer is odd, you must compare to 0, not 1. Otherwise, the result will be incorrect when the dividend is negative.

if (n % 2 == 1) // WRONG!
if (n % 2 != 0) // Correct.

Operator precedence

The choice of operator precedence in C can be counter-intuitive at times. The expression a & b == 7 is parsed as a & (b == 7), because == binds more tightly than &.

See this Wikipedia article for more details.

File reading

When a file is opened in text mode (e.g. with the "r" mode string) on a POSIX system, text and binary streams are identical, and POSIX specifies that the "b" flag is ignored. Some non-POSIX operating systems, however, may try to be too smart: on Windows, for instance, text mode converts "\r\n" to "\n" when reading and "\n" to "\r\n" when writing, which corrupts binary data and shifts byte offsets. The "b" flag does no harm and helps portability whenever you need the exact bytes of a file.

Globals

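In C, a file-scope declaration without an initializer, such as int global;, is a tentative definition: it may be repeated any number of times in the same translation unit, as long as at most one declaration has an initializer. C++ has no tentative definitions, so a C++ compiler will reject the following as a redefinition of global: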

#include <stdio.h>

int global;
int global;
int global = 3;

void change() {
	global = 17;
}

int main() {
	printf("%d\n", global);
	change();
	printf("%d\n", global);
	return 0;
}

In C, it will display the following:

3
17

Pointer arithmetic

It is not safe to assume that the result of pointer arithmetic fits in an int, or that a pointer fits in an int or a long. Some architectures use 64-bit memory addresses while int remains 32 bits (and, on 64-bit Windows, so does long). The difference between two pointers has type ptrdiff_t, defined in stddef.h, and a pointer converted to an integer should be stored in intptr_t or uintptr_t from stdint.h, where available.

Size of void

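With GCC, sizeof(void) == 1 is true. This is a GNU extension that also makes arithmetic on void * pointers behave like arithmetic on char *. Standard C does not allow it at all: void is an incomplete type, and sizeof cannot be applied to incomplete types. Using -pedantic (or -Wpointer-arith) will output a warning.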

Alignment

Do not expect the memory layout of a structure to match its declaration exactly: the compiler may insert padding between members, and after the last one, to satisfy alignment requirements.

This proves dangerous when serializing data. Use the offsetof macro from stddef.h to get the actual offset of each structure member.

struct {char a; int b;} foo;
struct {char a; char b;} bar;

printf("sizeof foo == %zu\n", sizeof foo);
printf("&foo == %p\n", (void *)&foo);
printf("&foo.a == %p\n", (void *)&foo.a);
printf("&foo.b == %p\n", (void *)&foo.b);

printf("sizeof bar == %zu\n", sizeof bar);
printf("&bar == %p\n", (void *)&bar);
printf("&bar.a == %p\n", (void *)&bar.a);
printf("&bar.b == %p\n", (void *)&bar.b);

Precompiled headers

Compiling a header file may yield an unexpected result: some compilers, such as GCC, recognize the .h extension and act accordingly. In that case, building a header does not produce an executable but a precompiled header, that is, an optimization for large headers.

If you want to force or prevent the build of precompiled headers, GCC allows for specifying the input language:

# The .xml file will be seen as a C header file.
gcc -x c-header myfile.xml
# The .h file will be compiled as a regular C source file, e.g. into an executable.
gcc -x c myfile.h

Final note

The numerous dark corners of C take some getting used to. It is helpful, and good practice, to make heavy use of your compiler's warning flags, together with some fine 'lint' (static analysis) tools.

References