Recursive Macros in C/C++

References

Key Points

during expansion of a macro, the macro’s name is marked to not be further expanded
the arguments of a function-like macro invocation are expanded first, then substituted into the macro body, and then expanded again separately
if a marked symbol is encountered during expansion, it is marked permanently to not be further expanded
a function-like macro is only invoked if it is immediately followed by a left parenthesis

Consider the following code.

#define FOR_EACH(F, ...) \
	F(A __VA_OPT__(,) __VA_ARGS__) \
	F(B __VA_OPT__(,) __VA_ARGS__) \
	F(C __VA_OPT__(,) __VA_ARGS__) \

#define PRINT_CHAR(x) printf(#x);

FOR_EACH(PRINT_CHAR)
/*
 * printf("A");
 * printf("B");
 * printf("C");
 */

#define PRINT_CHAR_WITH_SUFFIX(x, suffix) printf(#x suffix);

FOR_EACH(PRINT_CHAR_WITH_SUFFIX, "\n")
/*
 * printf("A" "\n");
 * printf("B" "\n");
 * printf("C" "\n");
 */

The macro FOR_EACH takes a macro, and calls it for each of A, B, and C. This is useful, for example, for doing something for each type of a list/set, such as forward declaring them. Here, the example just stringifies the elements of the set, and prints them. The __VA_OPT__(,) __VA_ARGS__ just allows forwarding additional arguments to the supplied macro.

What if we wanted to do a double loop, performing an action/executing a macro for each possible pair of elements? This requires a recursive call to the FOR_EACH macro. The following is a naive solution.

#define PRINT_DOUBLE(b, a) printf(#a #b "\n");
#define PRINT_DOUBLE_HELPER(a) FOR_EACH(PRINT_DOUBLE, a)

FOR_EACH(PRINT_DOUBLE_HELPER)
/*
 * FOR_EACH(PRINT_DOUBLE, A)
 * FOR_EACH(PRINT_DOUBLE, B)
 * FOR_EACH(PRINT_DOUBLE, C)
 */

Unfortunately, the inner FOR_EACH did not get expanded. This is because if the expansion of a macro is self-referential, the inner occurence of the macro is not expanded. When expanding a macro, its name is said to be painted blue, and further occurences of this symbol are no longer expanded. Here is GCC’s page on self-referential macros. Here is a snippet from section 19.3.4 of the C++ standard:

If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s preprocessing tokens), it is not replaced.

We can emulate recursion by taking advantage of argument prescanning; when the preprocessor encounters a function-like macro, it first expands its arguments, substitutes the results into the macro body, and then expands that. The key idea is that as the 1st and 3rd steps are separate expansions; the “blue paint” from the first expansion is lifted before the second expansion.

However, here is the second half of the above snippet from the C++ standard:

Furthermore, if any nested replacements encounter the name of the macro being replaced, it is not replaced. These nonreplaced macro name preprocessing tokens are no longer available for further replacement even if they are later (re)examined in contexts in which that macro name preprocessing token would otherwise have been replaced.

Essentially, if a painted-blue symbol is encountered during expansion, it is permanently marked as no longer available for further replacement. GCC also mentions this:

The self-references that do not expand in the first scan are marked so that they will not expand in the second scan either.

There is one more fact we must exploit to achieve our solution: a function-like macro is only expanded if its name is immediately followed by a left parenthesis (ignoring whitespace).

Solution

I came up with the following solution after reading this post by Jonathan Heathcote.

#define EMPTY()
#define DEFER(m) m EMPTY()
#define EVAL(m) m

#define FOR_EACH_() FOR_EACH

#define PRINT_DOUBLE(b, a) printf(#a #b "\n");
#define PRINT_DOUBLE_HELPER(a) DEFER(FOR_EACH_)()(PRINT_DOUBLE, a)

EVAL(FOR_EACH(PRINT_DOUBLE_HELPER))
/*
 * printf("A" "A" "\n");
 * printf("A" "B" "\n");
 * printf("A" "C" "\n");
 * printf("B" "A" "\n");
 * printf("B" "B" "\n");
 * printf("B" "C" "\n");
 * printf("C" "A" "\n");
 * printf("C" "B" "\n");
 * printf("C" "C" "\n");
 */

With what we have learned in mind, we can begin carrying out the expansion by hand for the argument to EVAL.

EVAL(FOR_EACH(PRINT_DOUBLE_HELPER)) // initial invocation

EVAL(
	PRINT_DOUBLE_HELPER(A)
	PRINT_DOUBLE_HELPER(B)
	PRINT_DOUBLE_HELPER(C)
	) // expand FOR_EACH

EVAL(
	DEFER(FOR_EACH_)()(PRINT_DOUBLE, A)
	DEFER(FOR_EACH_)()(PRINT_DOUBLE, B)
	DEFER(FOR_EACH_)()(PRINT_DOUBLE, C)
	) // expand PRINT_DOUBLE_HELPER

EVAL(
	FOR_EACH_ EMPTY() ()(PRINT_DOUBLE, A)
	FOR_EACH_ EMPTY() ()(PRINT_DOUBLE, B)
	FOR_EACH_ EMPTY() ()(PRINT_DOUBLE, C)
	) // expand DEFER

EVAL(
	FOR_EACH_ ()(PRINT_DOUBLE, A)
	FOR_EACH_ ()(PRINT_DOUBLE, B)
	FOR_EACH_ ()(PRINT_DOUBLE, C)
	) // expand EMPTY

The argument expansion stops there. FOR_EACH_ is not expanded as it was not immediately followed by a left parenthesis. Even though the preprocessor sees a left parenthesis after expanding EMPTY(), it does not “look back” to realize that there is a function-like macro.

Now, substituting the argument for EVAL to its body, we simply get:

FOR_EACH_()(PRINT_DOUBLE, A)
FOR_EACH_()(PRINT_DOUBLE, B)
FOR_EACH_()(PRINT_DOUBLE, C)

It is easy to see that it expands to:

FOR_EACH(PRINT_DOUBLE, A)
FOR_EACH(PRINT_DOUBLE, B)
FOR_EACH(PRINT_DOUBLE, C)

PRINT_DOUBLE(A, A)
PRINT_DOUBLE(B, A)
PRINT_DOUBLE(C, A)
PRINT_DOUBLE(A, B)
PRINT_DOUBLE(B, B)
PRINT_DOUBLE(C, B)
PRINT_DOUBLE(A, C)
PRINT_DOUBLE(B, C)
PRINT_DOUBLE(C, C)
/*
 * printf("A" "A" "\n");
 * printf("A" "B" "\n");
 * printf("A" "C" "\n");
 * printf("B" "A" "\n");
 * printf("B" "B" "\n");
 * printf("B" "C" "\n");
 * printf("C" "A" "\n");
 * printf("C" "B" "\n");
 * printf("C" "C" "\n");
 */

Why does `DEFER` need an `EMPTY()`?

Without the EMPTY(), we would get the following expansion:

DEFER(FOR_EACH_)()(PRINT_DOUBLE, A)

FOR_EACH_()(PRINT_DOUBLE, A)

FOR_EACH(PRINT_DOUBLE, A)

As the expansion of DEFER(FOR_EACH_) to FOR_EACH_ is immediately followed by a left parenthesis, it is further expanded. Now that FOR_EACH, a blue token, is encountered, it is marked to be no longer available for further replacement, even in the expansion of EVAL’s body.

Why do we need `#define FOR_EACH_() FOR_EACH`?

Suppose we had this:

#define PRINT_DOUBLE_HELPER(a) DEFER(FOR_EACH)(PRINT_DOUBLE, a)

EVAL(FOR_EACH(PRINT_DOUBLE_HELPER))

EVAL(
	PRINT_DOUBLE_HELPER(A)
	PRINT_DOUBLE_HELPER(B)
	PRINT_DOUBLE_HELPER(C)
	) // expand FOR_EACH

EVAL(
	DEFER(FOR_EACH)(PRINT_DOUBLE, A)
	DEFER(FOR_EACH)(PRINT_DOUBLE, B)
	DEFER(FOR_EACH)(PRINT_DOUBLE, C)
	) // expand PRINT_DOUBLE_HELPER

EVAL(
	FOR_EACH EMPTY() (PRINT_DOUBLE, A)
	FOR_EACH EMPTY() (PRINT_DOUBLE, B)
	FOR_EACH EMPTY() (PRINT_DOUBLE, C)
	) // expand DEFER

EVAL(
	FOR_EACH (PRINT_DOUBLE, A)
	FOR_EACH (PRINT_DOUBLE, B)
	FOR_EACH (PRINT_DOUBLE, C)
	) // expand EMPTY

Again, the preprocessor does not try to expand FOR_EACH as it was not immediately followed by a parenthesis. However, recall the snippet:

Furthermore, if any nested replacements encounter the name of the macro being replaced, it is not replaced. These nonreplaced macro name preprocessing tokens are no longer available for further replacement even if they are later (re)examined in contexts in which that macro name preprocessing token would otherwise have been replaced.

Just the presence of FOR_EACH will permanently disable it. That is why we needed EVAL’s argument to expand to FOR_EACH_()(PRINT_DOUBLE, A) which does not contain the blue symbol FOR_EACH. Upon evaluation inside the body, where the blue paint from the argument expansion has been lifted, FOR_EACH_() will expand to FOR_EACH, will finally be expanded as it is followed by the parenthesis in (PRINT_DOUBLE, A).

References

Key Points

Solution

Why does DEFER need an EMPTY()?

Why do we need #define FOR_EACH_() FOR_EACH?

Why does `DEFER` need an `EMPTY()`?

Why do we need `#define FOR_EACH_() FOR_EACH`?