5.2. Quick Tour

C is a compiled language without an interpreter which would allow us to explore the language interactively. Instead we have to go through the usual edit-compile-run loop. The code is placed into files with the suffix .c or .h (header files for declarations shared by multiple programs, see below).

5.2.1. Hello World

#include <stdio.h>

main() {
  printf("Hello World");
}

Compared to the simple print "Hello World" we have seen so far, we need to understand a number of language features to implement our favorite message in C: libraries, preprocessor instructions, statement blocks, functions, and the entry point main.

Let's start with C's module structure. The core language is rather small and contains just the bare minimum to define functions and structures. Everything else, from input and output to memory management and string processing is defined in libraries. We will use only a few standard libraries, the most important one being stdio which allows us to show the results of our programs.

The stdio library contains the function printf for formatted output. It is declared in the header file stdio.h. To use the printf function, we have to include this header file using the preprocessor directive #include. The translation of our little program happens in two steps. First, the C preprocessor executes all the preprocessor commands (starting with a hash character) and expands the macros (defined with #define, see Section 5.2.6>). The result is plain C code without any preprocessor instructions or macros. If you want to see the generated code, you can run the preprocessor cpp directly. In case of the #include directive, the included file is copied into the generated code. The process is recursive, that is, the included file is actually processed by the preprocessor before inserting it into the code.

The remaining part of the program defines the function main. The function definition consists of the name of the function followed by the empty argument list () and a block of statements enclosed in the characteristic curly braces. Each statement ends with a semicolon. The call of the printf function looks just like a mathematical function call. Like in many other languages, String constants are enclosed in double quotes.

The compiler takes the preprocessed C code and turns it into machine code. This machine code contains a table declaring the data and functions used and defined by the code (the symbol table which you can display on UNIX systems using the nm command). When executing this code, the operating system looks for a function called main and calls it.

The main function as shown above is actually a shortcut. The "correct" version would define the return type in front of the function name and actually return a value.

#include <stdio.h>

int main() {
  printf("Hello World");
  return 0;
}

Since this is the default behavior of a function in C, we can omit both, the return type int and the return statement. In the following examples, we will omit the scaffolding and just list the main function followed by the result printed on the screen.

Simple arithmetic uses the familiar mathematical infix notation.

main() {
  printf("result=%d\n", 3+4*5);
  printf("result=%f\n", 1.5 * 2 + 3.5);
}

result=23
result=6.500000

Here we use the formatting capabilities of the printf function which work just like Python's formatting operator (or better the other way round, since Python's formatting was derived from C's). C support a large number of arithmetic operators including bit operations and update shortcut operators such as += and ++ (equivalent to +=1). A very useful operator is the conditional expression (the one feature we miss the most in Python - it is going to be added soon). It is a ternary expression which corresponds to the if expression in functional languages and uses a question mark after the condition followed by the two alternatives separated by a colon.

int abs(int i) { return i<0? -i : i; }

main() {
  printf("result=%d\n", abs(-55));
  printf("result=%d\n", abs(55));
}

result=55
result=55

5.2.2. Control Flow

C provides the conditional and loop statements known from procedural languages such as if/else and do/while.

main() {
  int i;

  if (1 < 2) {
    printf("yes\n");
  }
  else {
    printf("no\n");
  }

  i = 0;
  while (i<3) {
    printf("i=%d\n", i);
    i += 1;
  }

  do {
    i -= 1;
    printf("i=%d\n", i);
  } while (i>0);  
}

yes
i=0
i=1
i=2
i=2
i=1
i=0

This example already introduces an integer variable i with a type declaration which we will cover in the next section in more detail. The most useful control statement is the for loop. It consists of four parts: initialization, test, update, and a statement. The initialization statement is executed before entering the loop, the test is performed before every iteration, and the update after every iteration. The statement, which is typically a block of statements is executed in between. If the test fails, the loop is left. The loop every C programmer has entered about a million times walks through the integers between zero and some upper bound:

main() {
  int i;
  for (i=0; i<5; i++) {
    printf("i=%d\n", i);
  }
}

5.2.3. Basic Types

To me, the key to understanding C is the type system in general and pointers and type declarations in particular. Once you can read a declaration such as the following one of the signal function without blinking twice, there is hardly anything that can stop you in C.

 void (*signal(int sig, void(* func)(int)))(int);

Before dissecting this declaration, let's start with the basics. C is a statically typed language with explicit type declarations. In contrast to dynamically typed languages, the type information is attached to variables rather than values. In C, the programmer has to use type declarations to tell the compiler that a variable or function is of a certain type. As we will see in the chapters about ML and Haskell, there are languages which leave most of this burden to the compiler using implicit types and type inference.

The following example introduces a simple integer function and uses it in the main function.

int times2(int x) { return 2*x; }

main() {
  int n = 5;
  double result;
  result = times2(n);
  printf("result=%f\n", result);
}

result=10.000000

In the simple cases, a C type declaration is put in front of the associated variable or function (for the return type). In the main function we introduce two local variables, the integer variable n and the floating point variable result. In C, local variables must be declared at the beginning of a block (in the newer members of the C family such as C++, Java, and C#, this rule was dropped and you can declare variables where they are used for the first time). In reality, most C programmers declare local variables at the beginning of a function. As you can see, variables can be initialized as part of the declaration. The example also shows that C converts between types automatically whenever it makes sense, for example, from integer to double, but not vice versa. In other words, type conversion if performed automatically if no information is lost. We can also force a type conversion by putting the required type in parentheses in front of the value, an operation known as casting.

main() {
  double x = 5.5;
  int n = (int)x;
  printf("n=%d", n);
}

n=5

The next example shows the first deviation from the rule that type declarations are always put in front.

static void squares(int a[], int n) {
  int i;
  for (i=0; i<n; i++) {
    a[i] = i*i;
  }
}

main() {
  int i;
  int a[5];

  squares(a, 5);
  for (i=0; i<5; i++) {
    printf("square(%d)=%d", i, a[i]);
  }
}

One of C's (few) built-in data structures are arrays. An array contains a fixed number of values of the same type. The type declaration uses the same syntax as array access, that is, the index operator (square brackets). The array part of the type declaration is put behind the variable in questions. When used to access the array, the square brackets contain the zero-based index (to be interpreted as an "offset"), and when used in a type declaration they contain the size of the array. If the size of the array is not known (as for the argument a to the squares function), the square brackets are left empty.

The example also give a first glimpse at the dangers of C. The array does not contain information about its size, and there is no boundary checking when accessing array elements. The application has to make sure that the arrays are used properly, or otherwise it will experience one of the infamous buffer overflow bugs. That's why we have to pass the length of the array explicitly to the squares function. We can make our example a little bit safer by defining a constant for the size of the array.

main() {
  const int n = 5;
  int i, a[n];

  squares(a, n);
  for (i=0; i<n; i++) {
    printf("square(%d)=%d\n", i, a[i]);
  }
}

This shows another type feature introduced with the ANSI standardization of C: constants which are declared by preceding the type with the const qualifier.

Since arrays may contain arbitrary types, we can define multidimensional arrays as arrays of arrays as in the next example.

main() {
  const int m = 2;
  const int n = 3;
  double a[m][n];

  a[1][2] = 55;
  printf("x=%d", a[1][2]);
}

The variable a is a matrix of two rows and three columns. The six elements are stored in memory as one block of six integers, and the array access is implemented by computing the offset as row index times column size plus column index. Obviously, this approach generalizes to arrays of dimension greater than two.

As you can tell we are moving closer and closer to the machine level. The next topic takes us directly to the computer memory. C's pointers are nothing but addresses in (nowadays typically virtual) memory, and C gives the developer a lot of freedom to manipulate pointers and use them in creative manners. What the index operator [] is for arrays, the asterisk * is for pointers. In type declarations, an asterisk preceding an expression indicates a pointer, and in statements the same operator retrieve the value the pointer is pointing to (that is, the contents of the memory at the address contained in the pointer). The ampersand & does the opposite by returning the address of a variable as a pointer.

main() {
  int n = 5;
  int *np = &n;

  *np = 10;
  printf("n=%d", n);
}

n=10

What makes things interesting is that you can calculate with pointers. As an example, we can access the elements of an array using pointer calculations.

main() {
  int a[5];
  int *ap = &a[0];

  a[2] = 1;
  printf("a[2]=%d\n", a[2]);
  *(ap + 2) = 2;
  printf("a[2]=%d\n", a[2]);
  ap[2] = 3;
  printf("a[2]=%d\n", a[2]);
}

a[2]=1
a[2]=2
a[2]=3

The example shows three different ways to access an element of an array. First, we let the pointer ap point to the first element of the array. The first assignment uses the standard array index operator. The second one, *(ap + 2) = 2 uses pointer arithmetic. We add the offset 2 to the pointer resulting in a pointer to the third element of the array. Using the asterisk, we access the value of this element and set it to two. The last assignment, a[2] = 3 demonstrated C's shortcut syntax for the same expression: For a pointer p, the expression p[i] is equivalent to *(p+i). Note that the pointer arithmetic takes the size of the underlying type (here integer) into account. At least, we don't have to compute the addresses in bytes.

Pointers are essential when dealing with dynamic data structures, for examples, arrays whose size is not known at compile time. The memory for these structures has to be allocated at run time (on the heap) and given back to the operating system once the memory is not used any more. Using C, those tasks are the responsibility of the programmer. New memory is allocated with the malloc function (and its siblings) and deallocated with free.

#include <stdio.h>
#include <malloc.h>

main() {
  const int n = 5;
  int* a = malloc(n * sizeof(int));
  a[2] = 55;
  printf("a[2]=", a[2]);
  free(a);
}

a[2]=55

The malloc function takes the number of bytes to be allocated and returns a generic pointer (of type void*). We therefore have to compute the actual size of our integer array in bytes using the sizeof function. The resulting pointer to the memory block allocated by the malloc function is automatically converted to an integer pointer.

With all the knowledge about arrays and pointers we can revisit strings. The standard strings in C are null-terminated arrays of 8-bit characters.

5.2.4. Structures and Type Definitions

Besides the basics types, arrays, and pointers, C allows you to define you own structures. A structure combines a fixed number of named fields.

main() {
  struct { const char* name; int age; } person;

  person.name = "Homer";
  person.age = 55;

  printf("person=%s, %d", person.name, person.age);
}
person=Homer, 55

The fields of structures are declared just like local variables of a functions. Like arrays, structures are static in the sense that the size and memory layout is completely determined at compile time. The fields of a structure are accessed using the dot notation. We can give the structure definition a name for reuse. In the following example, we use the named structure to define a pointer to the person.

main() {
  struct PERSON { const char* name; int age; } person;
  struct PERSON* person_ptr = &person;

  person.name = "Homer";
  person.age = 55;

  printf("person=%s, %d", person.name, person.age);
  (*person_ptr).name = "Bart";
  person_ptr->age = 10;
  printf("person=%s, %d", person.name, person.age);
}

person=Homer, 55
person=Bart, 10

As you can see, C provides the arrow operator -> as a shortcut for the dereference operator * combined with the field access.

When dealing with complex types such as structures, it is useful to give the type a new name using C's type definitions. A type definition looks like a variable definition preceded by the keyword typedef. The variable name become the name of the new type. This makes the previous example much more readable.

typedef struct {
  const char* name;
  int age;
} Person;

main() {
  Person person;
  Person *person_ptr = &person;

  ...
}

With this concept we can start to define a number of functions manipulating Person structures.

void Person_print(Person* person) {
  printf("name=%s, age=%d", person->name, person->age);
}

Note that just the pointer is passed to function instead of copying the whole structure.

5.2.5. Functions

We have already used the basic syntax of function definitions in the previous sections. Functions are defined with their return type followed by the function name, the argument list, and a block of statements. A function can have local variables which must be declared at the beginning of the block. In contrast to other precedural languages (e.g., the Pascal family), function definitions can not be nested. In other words, functions live in a single global scope. However, we can hide a definition of global variables and functions inside a file using the static keyword.

static n = 5;
static int timesN(int x) { return n*x; }

Preceding a definition with static prevents the compiler from adding the symbols to the symbol table so that the associated variables and functions can not be seen by another module. As a rule, define all objects static unless they are used by other modules. (This is a UNIX-centric presentation of the situation. On Windows, one has to explicitly export definitions in order to make them available in other modules.)

C supports recursive functions as well.

static int fac(int n) { return n<2? 1 : n*fac(n-1); }
main() {
  printf("fac(5)=%d", fac(5));
}

120

You can call a function before it is defined by using a function declaration (this is essential for defining mutually recursive functions). These declarations are basically function definitions without the statement block. Like other declarations they end with a semicolon.

static int fac(int n);

main() {
  printf("fac(5)=%d", fac(5));
}

static int fac(int n) { return n<2? 1 : n*fac(n-1); }

120

Without the forward declaration, this code would have lead to an error. The C compiler reads and translates the file sequentially (leading to fast compilers). Function declarations are also the main means to share information between multiple compilation units. The header files such as stdio.h contain just the declarations of the exported functions. The associated compiled function definitions of the standard I/O library are contained in the system library and linked to the executable during linking.

We have seen functions of elementary types and structures, but does C support higher order functions? Is it possible to define a function like map which applies another function to all elements, say, in an array? In other words, can we pass a function as an argument to another function? As you might have guessed by now, the typical answer in C involves pointers, in this case, function pointers. But how do we define a function pointer type? The syntax follows the symmetry between the declaration of a variable and the declaration of the corresponding type. In case of a function, we take the function declaration, e.g., int f(int i), replace the function name by the type name, and make it a pointer by preceding the name with an asterisk. All that's missing for our function pointer type definition is the typedef keyword and parentheses around the asterisk and the name to prevent C's preference rules from binding the asterisk to the int return type.

typedef int (*IntMap)(int i);

void map(IntMap f, int a[], int b[], int n) {
  int i;
  for (i=0; i<n; ++i) {
    b[i] = f(a[i]);
  }
}

static int times2(int i) { return 2*i; }

main() {
  const int n = 5;
  int i, a[n], b[n];
 
  for (i=0; i<n; i++) a[i] = i;
  map(times2, a, b, n);
  for (i=0; i<n; i++) {
    printf("b[%d]=%d\n", i, b[i]);
  }
}

b[0]=0
b[1]=2
b[2]=4
b[3]=6
b[4]=8

Note that the function times2 is passed to the map function just using the function's name. You may have thought that the name has to be preceded by an ampersand to turn the function into a function pointer, but the syntax makes sense since the parentheses required when calling a function clearly distinguish a function call from the function itself.

It is not surprising that C ows much of its flexibility to function pointers. They can be used in any other data structures such as arrays and structures. Functions can also return function pointers, which leads us to the explanation of the signal function.

 void (*signal(int sig, void(* func)(int)))(int);

The definition becomes are lot more readable by introducing a type definition for signal handlers which are pointers to functions taking a single integer argument (the signal number) and returning nothing.

typedef void (*SignalHandler)(int signal);

With this definition, the signal function looks like this:

SignalHandler signal(int signal, SignalHandler handler);

The signal function is used to bind a signal handler to a signal. It returns the old signal handler so that the application can keep it in mind and restore the old situation if necessary.

We finish this section with a slightly longer example showing one tiny step towards the implementation of object oriented features in C. It demonstrates how structures and function pointers can be used to implement polymorphism.

typedef void (*PrintFunction)(void* object);

typedef struct {
  PrintFunction print;
} Class;

typedef struct {
  Class *class;
} Object;

static void print(void* o) {
  Object* object = (Object*)o;
  object->class->print(object);
}

/* Cat */

typedef struct {
  Class* class;
  const char* name;
} Cat;

static void Cat_print(void* p) {
  Cat* cat = (Cat*)p;
  printf("Miouw, I'm a cat, my name is %s", cat->name);
}

static Class Cat_class = { Cat_print };

static Cat* Cat_new(const char* name) {
  Cat* cat = (Cat*)malloc(sizeof(Cat));
  cat->class = &Cat_class;
  cat->name = name;
  return cat;
}

/* Dog */

typedef struct {
  Class* class;
  const char* name;
  const char* breed;
} Dog;

static void Dog_print(void* p) {
  Dog* dog = (Dog*)p;
  printf("Wouff, I'm a %s, my name is %s", dog->breed, dog->name);
}

static Class Dog_class = { Dog_print };

static Dog* Dog_new(const char* name, const char* breed) {
  Dog* dog = (Dog*)malloc(sizeof(Dog));
  dog->class = &Dog_class;
  dog->name = name;
  dog->breed = breed;
  return dog;
}

main() {
  Cat* cat = Cat_new("Felix");
  Dog* dog = Dog_new("Alex", "Terrier");

  print(cat);
  print(dog);

  free(dog);
  free(cat);
}

Miouw, I'm a cat, my name is Felix.
Wouff, I'm a Terrier, my name is Alex.

How does it work? Objects as well as their classes are represented as structures. The object structures have a pointer to their associated class structure as the first field. The class structure contains a pointer to a print function. Note that there is a lot of casting going on which can have desastrous effects if the structure does no look like expected. This admittedly oversimplified example is not so far from the actual implementation of object oriented extensions of C such as C++ and Objective C which hide all the casting and additional pointers from the developer. C++ does not use a pointer to a class structure, but to an array of function pointers (the virtual method table).

5.2.6. Macros

Since a programming language is hardly ever complete, its success often depends on the ability to extend the language with means of the language itself (rather than writing a new compiler). For C, this ability is provided by the preprocessor. We have used it from the very beginning to include the declarations of the standard I/O library. Besides merging other files into the source code, the preprocessor has two more main tasks: macros and conditional compilation. Macros allow you to substitute arbitrary text for symbols used in the code.

#define HELLO printf("Hello World")

main() {
  HELLO;
}

Some symbols such as the current file name (__FILE__) and line number (__LINE__) are predefined.

#include <stdio.h>

main() {
  printf("file=%s, line=%d", __FILE__, __LINE__);
}

file=sample.c, line=4

They are very handy when defining diagnostic messages, and, since the macro definitions are resolved at compile time, come at no cost. Macros can also have arguments which are substituted verbatim into the macro expansion.

#include <stdio.h>

#define LOG(message) printf("%s:%d %s", __FILE__, __LINE__, message)

main() {
  LOG("here we are");
}

sample.c:6 here we are

The macro syntax uses white space to separate the macro declaration (here: LOG(message)) from the definition which is read until the end of the line. If you want to define macros spanning multiple lines, you need to end all but the last line with a backslash as C's continuation character.

Other typical examples are small generic functions. Since C's function symbols are unique (no overloading), macros are the only way to define a "function" for multiple types.

#define MAX(a, b) (((a)>(b))? (a) : (b))

main() {
  int n = 50;
  double x = 1.23;
  printf("MAX(n + 5, 10)=%d\n", MAX(n + 5, 10));
  printf("MAX(x, 10)=%f\n", MAX(x, 10));
}

MAX(n, 10)=50
MAX(x, 10)=10.000000

Note that the macro calls look like function calls, but they are not. Instead the macro expression is pasted into the code. In contrast to functions, the arguments are not computed once and passed by value. The argument expressions are substituted literally for the placeholders in the macro definition. That's why we put parentheses around the arguments in the macro definition. Hence, MAX(n+5,10) will be replaced by the expression (((n+5)>(10))? (n+5) : (10)). Another example is the SWAP macro which swaps the values of two variables.

#define SWAP(T, a, b) { T tmp=a; a=b; b=tmp; }

main() {
  int x=5, y=10;
  SWAP(int, x, y);
  printf("x=%d, y=%d", x, y);
}

x=10, y=5

Here, we need to pass the type so that the macro can declare the temporary variable tmp. Originally, macros were also used to define constants, but with the introduction of proper typed constants in ANSI C this is not necessary anymore. So, instead of macro

#define PI 3.1415926535897931

better use the constant:

static const double PI = 3.1415926535897931;

We won't cover conditional compilation in this short chapter apart from saying that it is mainly used to adapt the source code to the different environments, hardware, and operating systems.