UofT ECE 467 (2022 Fall) - An Example of Optimizations Around Undefined Behaviour



The website and starter code are being pre-emptively updated!

I don't know if I'll be teaching the course again, but if I wait until next year I'll forget feedback I've received. Note that more test cases have been added (if you pull the new starter code); those are for the (potential) future!


Undefined behaviour is something that a program may exhibit during execution. Strictly speaking, it is not something that exists statically (during compilation). Optimizations in C++ are based on the as if rule: the observable behaviour of a compiled program must be indistinguishable from that defined by the semantics of the source code. However, the compiler does not need to preserve the behaviour of an execution that exhibits undefined behaviour. In other words, the compiler optimizes code assuming undefined behaviour never happens.

One of the classic examples of UB is a null pointer dereference; a compiler may assume you will never do this. I was trying to create a cool example in class using the Collatz conjecture; I wanted something like “the compiler cannot optimize this without solving the Collatz conjecture or reasoning about UB”. However, sitting down and thinking about it, the Collatz conjecture is not suitable for this type of example I was trying to create. Roughly, we need a problem that we know the answer to, but the compiler doesn’t. Unfortunately, we do not have a definite answer to the Collatz conjecture.

Instead, consider the following code.

#include <cstdio>

int* get_new_int();

int foo(int* p) {
	if (p == nullptr) {
		return 0;
	}
	return *p * 2;
}

void do_something(int* p) {
	if (p == nullptr) {
		printf("p is null.\n");
	} else {
		printf("some other code...\n");
	}
}

int main() {
	int* p = get_new_int();
	do_something(p);
	while (foo(p) == 0) {
		p = get_new_int();
	}
	*p += 1;
	do_something(p);
	return 0;
}

Note that if foo(p) returns a nonzero integer, p must not have been null. So indeed, *p += 1; is safe, because we may only break out of the loop if foo(p) != 0.

Inlining is a powerful transformation that effectively enables a limited form of inter-procedural optimization. Suppose the compiler inlines the definition of do_something into main, so main now looks like this.

int main() {
	int* p = get_new_int();
	// Inlined do_something(p).
	if (p == nullptr) {
		printf("p is null.\n");
	} else {
		printf("some other code...\n");
	}
	while (foo(p) == 0) {
		p = get_new_int();
	}
	*p += 1;
	// Inlined do_something(p).
	if (p == nullptr) {
		printf("p is null.\n");
	} else {
		printf("some other code...\n");
	}
	return 0;
}

It would be great if the compiler could remove the second null check for us. One way to know that this is safe is by realizing that foo(p) != 0 implies p != nullptr. Another way is to reason about UB: *p += 1; dereferences a pointer, and the compiler trusts that you would never dereference a null pointer. So if execution reaches that point in the program, p must not be null.

In general, totally determining whether a pointer may be null would be equivalent to solving the halting problem (can you see how to formulate it?). Intuitively, given a specific program, it should always be possible to write a sufficiently intelligent compiler that could optimize it (without reasoning about UB). However, it is impossible to have a single compiler that could optimize all such instances of maybe-null pointers in all possible programs. Forbidding UB is a (powerful) tool that the compiler uses to determine more “facts” about your input program in order to perform optimizations.



Last updated: 2022-12-23 09:56:36 -0500.