A Guide to Undefined Behavior in C and C++

Bumped into this nice article by John Regehr about undefined behaviour in C and C++. The article is in three parts:

  1. A Guide to Undefined Behavior in C and C++ Part 1
  2. A Guide to Undefined Behavior in C and C++ Part 2
  3. A Guide to Undefined Behavior in C and C++ Part 3

Programming languages typically make a distinction between normal program actions and erroneous actions. For Turing-complete languages we cannot reliably decide offline whether a program has the potential to execute an error; we have to just run it and see.

In a safe programming language, errors are trapped as they happen. Java, for example, is largely safe via its exception system. In an unsafeprogramming language, errors are not trapped. Rather, after executing an erroneous operation the program keeps going, but in a silently faulty way that may have observable consequences later on. Luca Cardelli’s article on type systems has a nice clear introduction to these issues. C and C++ are unsafe in a strong sense: executing an erroneous operation causes the entire program to be meaningless, as opposed to just the erroneous operation having an unpredictable result. In these languages erroneous operations are said to have undefined behavior.

The C FAQ defines “undefined behavior” like this:

  • Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.
  • This is a good summary. Pretty much every C and C++ programmer understands that accessing a null pointer and dividing by zero are erroneous actions. On the other hand, the full implications of undefined behavior and its interactions with aggressive compilers are not well-appreciated. This post explores these topics.
 
C++