Trait Meta-Programming in Rust

Consider the following trait interface and implementation.

pub trait Rootable {
	type Rank: UsizeConstantT;

	type Rooted<'rooted>;

	fn root_into<'a>(self, guarded: StackRootedSlice<'a, Self::Rank>) -> Self::Rooted<'a>
	where
		[(); Self::Rank::VALUE]:;
}

impl<R, S> Rootable for (R, S)
where
	R: Rootable,
	S: Rootable,
	R::Rank: Add<S::Rank>,
	<R::Rank as Add<S::Rank>>::Output: UsizeConstantT,
	<R::Rank as Add<S::Rank>>::Output: Sub<R::Rank>,
	<<R::Rank as Add<S::Rank>>::Output as Sub<R::Rank>>::Output: UsizeConstantT,
	for<'a> StackRootedSlice<'a, <<R::Rank as Add<S::Rank>>::Output as Sub<R::Rank>>::Output>:
		Identity<StackRootedSlice<'a, S::Rank>>,
	[(); R::Rank::VALUE]:,
	[(); S::Rank::VALUE]:,
	[(); <R::Rank as Add<S::Rank>>::Output::VALUE]:,
	[(); <<R::Rank as Add<S::Rank>>::Output as Sub<R::Rank>>::Output::VALUE]:,
{
	type Rank = <R::Rank as Add<S::Rank>>::Output;

	type Rooted<'rooted> = (R::Rooted<'rooted>, S::Rooted<'rooted>);

	fn root_into<'a>(self, guarded: StackRootedSlice<'a, Self::Rank>) -> Self::Rooted<'a> {
		let (guarded_r, guarded_s): (
			StackRootedSlice<R::Rank>,
			StackRootedSlice<<<R::Rank as Add<S::Rank>>::Output as Sub<R::Rank>>::Output>,
		) = guarded.split::<R::Rank>();
		(
			self.0.root_into(guarded_r),
			self.1.root_into(Identity::<StackRootedSlice<S::Rank>>::identity(guarded_s)),
		)
	}
}

What the hell is that ugly mess? (Arguably, I should have written more idiomatic Rust which omits type annotations for local variables, but I like to be as explicit as possible.) More specifically, we will explore the meaning of the first 7 where bounds (up to and including the one starting with for<'a>). The remaining 4 bounds starting with [(); R::Rank::VALUE]: are explained quickly in the appendix.

What is `Rootable`?

Given objects managed by a garbage collector (GC), suppose we are given a pointer to an object. If GC runs concurrently with the application, we want to make sure the objects we use are not collected from under us. This is done by “rooting”; i.e. informing the GC that we are using a pointer/object (before we start using it).

In general, we can say there are certain types, where for each such type \(T\), we wish to require a prerequisite operation be performed on a \(T\) before other functions may be called on it. One way to implement this by creating a newtype (more generally some wrapper) \(T’\) for each \(T\), and defining a single public way of creating a \(T’\) - a “rooting” function. The rooting function takes (by value) a \(T\), possibly additional parameters required for the prerequisite operation, performs the operation, then produces a \(T’\) from the \(T\). Since “rooting” is the only way to publically get a \(T’\), we can define our “gated” functions on \(T’\) instead of \(T\).

The trait Rootable indicates that an implementing type \(T\) has an associated type Rooted<'a> (i.e. \(T’\) from previously), and a rooting function root_into which consumes a \(T\) and produces its Rooted<'a> counterpart. The <'a> is a generic lifetime parameter on the associated type. For those unfamiliar with Rust, lifetime parameters ('a and 'rooted in the original code) can be ignored for the purposes of this post. Whatever prerequisite operation \(T\) requires is implemented in root_into.

This “rootable property” is structurally transitive (I made that term up; there’s probably a proper, formal term); compositions of Rootable types are Rootable simply by rooting the individual parts. Of course, there are “atomic” Rootable types that are like the base case of this recursive definition. For example, in a GCed scenario, the pointer types of managed objects are the “atoms”. Then, given two Rootable types \(R\) and \(S\), a tuple \((R, S)\) is also rootable by rooting the \(R\) and rooting the \(S\), producing an \((R’, S’)\) in the obvious way. More generally, one can say that the product type or sum type of Rootable types is Rootable, and we wish to have a generic implementation for this. Even more abstractly, if we view uninhabited enums as unique (the “never” type) and zero-sized structs as unique (the “empty” type), I believe it is then correct to say that the set of rootable types forms a ring, where the additive identity \(0\) is the “never” type, and the multiplicative identity \(1\) is the “empty” type. (Forgive me if I’m wrong, I’m just a lowly math minor!)

We can actually extend Rootable to all other types \(U\) (that do not require any sort of “rooting”) with a blanket implementation where a \(U\) roots identically to itself (with no other operation). (Then we get that the set of all types forms a ring, which is probably a trivial result in type theory.)

More interestingly, and most messily in the implementation, a Rootable type \(T\) has a “rank”; for GC, the rank is the number of pointers needing rooting in a \(T\). The rank of an atomic Rootable type is 1; for GC, an atomic pointer type is, well, 1 pointer that needs to be recorded. The rank of an “inert” type (one that does not require rooting and roots identically to itself) is 0, because there are no relevant pointers. For two Rootable types \(R\) and \(S\), the tuple (product) \((R, S)\) has rank the sum of the ranks of \(R\) and \(S\), because we need to root the pointers in \(R\), plus the pointers in \(S\). The enum (sum) of either \(R\) or \(S\) has rank the maximum of their ranks. For example, in Rust, the rank of an Option<R> (for a rootable type R) is just the rank of R; formally, Option<R> is the sum of R and the \(1\)/empty type (). (The rank then follows from the definitions.)

Aside: From where does this seem familiar…

Those with certain familiarity with math may recognize rank as similar to the degree function on polynomials. In fact, (I believe) we can say more about the ring of all types; it is the polynomial ring with coefficients in the inert types, and indeterminates \(x_i\) for each atomic Rootable type \(T_i\)! Then, the rank of a type we defined is exactly the degree of the type viewed as an element in the polynomial ring! I think that is Very Cool(tm).

The function root_into<'a> takes parameters self, and a guarded: StackRootedSlice<'a, Self::Rank>. Conceptually, a StackRootedSlice<'a, Self::Rank> is a (wrapper around a) reference of lifetime 'a to an array of compile-time size Self::Rank. The function root_into<'a> then:

performs the “rooting operation”, which for GC is storing each pointer in self into guarded, and
produces the rooted associated type Self::Rooted<'a>.

Hopefully, you can now make a guess of how root_into<'a> for (R, S) should work. Recall that the rank of a tuple (R, S) is the rank of R plus the rank of S. So fn root_into<'a> splits guarded into two StackRootedSlices (of the same lifetime 'a); one with the rank of R, and another with the rank of S. It then recursively calls <R as Rootable>::root_into on the R, and <S as Rootable>::root_into on the S, with the two StackRootedSlices respectively.

Limitations of const generics

Curiously, in our trait Rootable, the rank is a type Rank: UsizeConstantT instead of a more obvious constant RANK: usize. In short, const generics are still an unstable feature of Rust, and their expressivity is limited. For example, when using const generics, we may want the array types [u32; N + M] and [u32; M + N] to be “the same”. However, in general this requires the compiler to prove the equality of expressions involving const parameters. Importantly, Rust chooses to try to prove this equality at the definition of a generic item, before we have concrete values for N and M - I believe the term is (pre-)monomorphization - that Rust does its work and validation before a concrete instantiation (monomorphization) of the generic.

The original RFC for const generics states that initially, distinct generic expressions will be considered distinct. At the time of this writing, the compiler is already smarter about determining equality, although following the relevant tracking issue and its links, it’s unclear to me the exact current rule. (In-development “valtrees” should further improve when the compiler can determine equality.) Currently, at least roughly, it seems that the compiler knows about syntactic equality. For example, the following does not compile, because N + M is syntactically different from M + N.

fn foo<const N: usize, const M: usize>() -> [u32; N + M] {
	[0; M + N]
}

Going over this next part quickly, if we implemented rank directly using const generics, the compiler would not be able to prove that an array of size <(R, S) as Rootable>::RANK splits evenly into two arrays of sizes <R as Rootable>::RANK and <S as Rootable>::RANK. Or more importantly, we will not be able to convince the compiler that it splits evenly; by wrapping const generics with generic types, we can use traits to tell the compiler that the corresponding equality will hold, until we actually reach monomorphization. (We could also use the typenum crate, which creatively implements integers entirely in Rust’s type system without any unstable features. However, we don’t need all of its capabilities and can use nightly Rust, so we can represent integers with “less code” using const generics.)

To those familiar with C++’s template meta-programming, the following equivalent code does compile in C++.

#include <cstddef>
#include <array>

template<size_t N, size_t M> std::array<unsigned int, N + M> foo() {
	return std::array<unsigned int, M + N> {};
}

Curiously, compiling the above on Compiler Explorer produces no output. This is because C++ really treats templates like “templates”; only when you use a templated item does it “copy-paste” the template definition with the concrete template parameters substituted, and compile it (i.e. when the template is monomorphized). If you add a concrete usage such as a call foo<1, 2>(), you would see some assembly. Has GCC proven that N + M == M + N? Well, no; it has only verified that it is true in this particular instance where we have substituted N = 1 and M = 2. The decision of when to validate/”compile” templates/generics is a philosophical difference between C++ and Rust, that has some interesting consequences which I hoped to discuss in this post, but after working on this I’ll leave it for another time/another post.

Wrapping const generics in generic types

For those familiar with C++’s std::integral_constant, conceptually that’s just what we have here (just that specifically the constant type is usize, as we cannot parameterize generic constants with a generic type in Rust). If you’re comfortable with that you can skip ahead to the next section.

We have the following trait UsizeConstantT.

pub trait UsizeConstantT {
	const VALUE: usize;
}

Pretty straightforward; it’s conceptually an interface that says: a type T that implements UsizeConstantT has an associated constant VALUE (of type usize), accessed like T::VALUE, or <T as UsizeConstantT>::VALUE in some places where Rust needs us to be explicit.

The “T” at the end of the trait name is for “Trait”, since it is used with the following struct with a similar name and I couldn’t think of anything better.

#[derive(Default, Eq, PartialEq)]
pub struct UsizeConstant<const N: usize>;

impl<const N: usize> UsizeConstantT for UsizeConstant<N> {
	const VALUE: usize = N;
}

Each distinct constant N: usize gives us a distinct struct/type UsizeConstant<N>. For example, to the type system, the only thing UsizeConstant<1> and UsizeConstant<2> have in common is that they both implement UsizeConstantT (through the generic implementation). And, for example, UsizeConstant<1>::VALUE (or more explicitly <UsizeConstant<1> as UsizeConstantT>::VALUE) is 1, and so on. So whenever we wanted to parameterize a type with a constant parameter const N: usize, we can parameterize it with a type parameter N: UsizeConstantT, and use N::VALUE (or <N as UsizeConstantT>::VALUE) when we actually need the corresponding usize value, e.g. for an array size.

Now, let’s implement addition in the type system!

use std::ops::Add;

impl<const X: usize, const Y: usize> Add<UsizeConstant<Y>> for UsizeConstant<X>
where
	[(); X + Y]:,
{
	type Output = UsizeConstant<{ X + Y }>;

	fn add(self, _rhs: UsizeConstant<Y>) -> Self::Output {
		Default::default()
	}
}

Thats… it! (As for the bound [(); X + Y], see the appendix at the bottom.) I mean, it’s kinda cheating; the typenum crate does something much cooler, but outside the scope of this post. That this implementation exists means that we can add a value of type UsizeConstant<Y> (for any const Y: usize) on the right hand side, to a value of type UsizeConstant<X> (for any const X: usize) on the left hand side. The associated Output type indicates that the sum is a value of type UsizeConstant<{ X + Y }>. The actual function implementation add simply produces the default value for the Output type, since struct UsizeConstant<const N: usize> is just a zero-sized struct we can #[derive(Default)] for. Note that what we really care about is the Output type, but let’s put that aside for now. We can do the following, to illustrate.

fn foo() {
	let x: UsizeConstant<2> = Default::default();
	let y: UsizeConstant<3> = Default::default();
	let z: UsizeConstant<5> = x + y;
	assert_eq!(z, Default::default());
}

Ok, that’s… not so interesting because they are zero-sized types; there is only one value (the default, empty value) for each of these types anyways. But maybe interestingly, we would get a compilation error if we changed the type of the local z to UsizeConstant<6>.

As mentioned, values of the zero-sized types UsizeConstant<N> (for constant N: usize) are not so interesting. Instead, the resulting Output type of adding a value of type UsizeConstant<Y> (for constant Y: usize) to a value of type UsizeConstant<X> (for constant X: usize) is conceptually the sum we want. Spelled out, such an output type is named like <UsizeConstant<X> as Add<UsizeConstant<Y>>>::Output. And in fact, we can get its usize value back when we really need a usize - through the associated constant VALUE because all UsizeConstant<N> (for constant N: usize) implement UsizeConstantT!

Subtraction is implemented similarly.

Finally visiting the `where` bounds on `impl<R, S> Rootable for (R, S)`

A simple trait bound on an item is of the form Type: Trait, meaning that the concrete type Type at monomorphization must implement the trait Trait. The body of the item can then be validated assuming that Type implements Trait (and therefore all the traits associated constants, types, and methods), without considering the concrete types it will be instantiated with. Instantiating the generic item can be validated by checking that the concrete types satisfy the bounds, without considering the body of the item.

The first three bounds are easiest to understand.

R: Rootable says that the concrete type R (at monomorphization) must implement the trait Rootable. In particular, it has a concrete associated type R::Rank. The trait Rootable also requires that R::Rank implements UsizeConstantT.
S: Rootable says that the concrete type S (at monomorphization) must implement the trait Rootable. In particular, it has a concrete associated type S::Rank. The trait Rootable also requires that S::Rank implements UsizeConstantT.
R::Rank: Add<S::Rank> says that R::Rank must implement Add<S::Rank>. In particular, it has a concrete associated type <R::Rank as Add<S::Rank>>::Output. Informally, Output = R::Rank + S::Rank. (If we just wrote R::Rank::Output, the Rust compiler would nicely ask us to be explicit about which trait’s Output we want.)

At this point, we can try to define the associated type Rank = <R::Rank as Add<S::Rank>>::Output;. Except the trait Rootable requires that Rank implements UsizeConstantT, and no-one ever said anything about what <R::Rank as Add<S::Rank>>::Output implements! So we add the fourth bound to ensure it implements UsizeConstantT: <R::Rank as Add<S::Rank>>::Output: UsizeConstantT.

Let’s turn our attention to the function root_into<'a> - specifically the call to guarded.split::<R::Rank>().

impl<R, S> Rootable for (R, S)
where
	// ...
{

	fn root_into<'a>(self, guarded: StackRootedSlice<'a, Self::Rank>) -> Self::Rooted<'a> {
		let (guarded_r, guarded_s): (
			StackRootedSlice<R::Rank>,
			StackRootedSlice<<<R::Rank as Add<S::Rank>>::Output as Sub<R::Rank>>::Output>,
		) = guarded.split::<R::Rank>();
		(
			self.0.root_into(guarded_r),
			self.1.root_into(Identity::<StackRootedSlice<S::Rank>>::identity(guarded_s)),
		)
	}

}

Recall that StackRootedSlice is (conceptually) a reference to an array of compile-time size. In particular, guarded has size corresponding to Self::Rank, which was informally R::Rank + S::Rank. We split guarded into two StackRootedSlices, one of size (corresponding to) R::Rank, and conceptually, one of size (corresponding to) S::Rank; at least, we need StackRootedSlices of these sizes since we call self.0.root_into(...) and self.1.root_into(...), where self.0 is an R and self.1 is an S, which are their own Rootables, so their function root_into<'a> takes StackRootedSlices of their ranks. Informally, the signature of split in StackRootedSlice looks like this:

impl<'a, N: UsizeConstantT> StackRootedSlice<'a, N> {

	pub fn split<I: UsizeConstantT>(self) -> (StackRootedSlice<'a, I>, StackRootedSlice<'a, N - I>)
	where
		// ...
	{
		// ...
	}

}

Informally, there are bounds saying we can subtract I from N, and that the Output of N - I implements UsizeConstantT, since we are returning a StackRootedSlice parameterized by N - I; these bounds are almost identical to the ones that let us add R::Rank to S::Rank and get UsizeConstantT back.

First, let’s check that in root_into<'a>, our call to the function split<I: UsizeConstantT>, which instantiates N (on the StackRootedSlice) with Self::Rank and I with R::Rank, satisfies its bounds.

N.B.: Somewhat surprisingly to me, as far as I can tell, the compiler does substitute <R::Rank as Add<S::Rank>>::Output for Self::Rank, to see that the bounds are satisfied (I would have expected the bounds on N to be required on Self::Rank directly, but I suppose it can do the substitution in this scenario because it is unambiguous that Self::Rank is <R::Rank as Add<S::Rank>>::Output).

So this call requires the following bounds be satisfied.

N: UsizeConstantT: Satisfied by <Self as Rootable>::Rank being required to implement UsizeConstantT (by the definition of the trait Rootable), which was enforced by our bound <R::Rank as Add<S::Rank>>:Output: UsizeConstantT.
I: UsizeConstantT: Similarly satisfied by <R as Rootable>::Rank being required to implement UsizeConstantT.
Subtracting N - I, i.e. N: Sub<I>: We need a bound Self::Rank: Sub<R::Rank>. Expanded, we get <R::Rank as Add<S::Rank>>::Output: Sub<R::Rank>; i.e. the 5th bound on impl<R, S> Rootable for (R, S).
The subtraction N - I implements UsizeConstantT, i.e. <N as Sub<I>>::Output: UsizeConstantT: We need a bound <Self::Rank as Sub<R::Rank>>::Output: UsizeConstantT. Expanded, we get <<R::Rank as Add<S::Rank>>::Output as Sub<R::Rank>>::Output: UsizeConstantT; i.e. the 6th bound on impl<R, S> Rootable for (R, S).

Now, consider that we return a StackRootedSlice of size (corresponding to) I, i.e. R::Rank, and a StackRootedSlice of size (corresponding to, informally) N - I, i.e. (R::Rank + S::Rank) - R::Rank. The first is exactly what we needed to call self.0.root_into(...). But while to us (R::Rank + S::Rank) - R::Rank should be equal to S::Rank, no bound has required that in the type system! More accurately, the second StackRootedSlice (stored in guarded_s) is a StackRootedSlice<'a, <<R::Rank as Add<S::Rank>>::Output as Sub<R::Rank>>::Output>, whereas self.1 is an S whose fn root_into<'a> requires a StackRootedSlice<'a, S::Rank>. If were using just const generics for the rank, we would get stuck here; the compiler can’t prove it, and we can’t express the requirement! But since <<R::Rank as Add<S::Rank>>::Output as Sub<R::Rank>>::Output and S::Rank are types, we can make a trait that requires they are equal!.

pub trait Identity<T> {
	fn identity(self) -> T;
}

impl<T> Identity<T> for T {
	fn identity(self) -> T {
		self
	}
}

Technically, we are not saying that the types are the same - rather that we can convert a value of type U implementing Identity<T> to T, and there is a single generic implementation for all T to identically convert a T to itself. We want to say StackRootedSlice<'a, <<R::Rank as Add<S::Rank>>::Output as Sub<R::Rank>>::Output> (what would be U previously) implements Identity<StackRootedSlice<'a, S::Rank>>, so that we can make the conversion. However, because of the lifetime parameter, we need a higher-ranked trait bound; the thing starting with for<'a>. Truly understanding higher-ranked trait bounds is outside the scope of this post, but it’s basically requiring that StackRootedSlice<'a, <<R::Rank as Add<S::Rank>>::Output as Sub<R::Rank>>::Output> implements Identity<StackRootedSlice<'a, S::Rank>> for any lifetime 'a. So given guarded_s of type StackRootedSlice<'a, <<R::Rank as Add<S::Rank>>::Output as Sub<R::Rank>>::Output>, we can identically convert it to a StackRootedSlice<'a, S::Rank> which self.1.root_into(...) needs - any lifetime will suffice!

So… you made it! Hopefully I explained it well enough :)

Final aside: Hmmm… from where does the trait `Identity<T>` look familiar?

Keen readers may notice that our trait Identity<T> is just a special case of the standard library trait Into<T> for performing general conversions. And in fact, any type T identically implements Into<T>, which is just what we did. Having a dedicated Identity trait just makes it clear that we’re not really intending a conversion.

Appendix

What are the array-lookalike bounds?

The last 4 bounds that look like [(); expr]: (with an empty righthand side after the colon) are somewhat the byproduct of Rust const-generics being incomplete. Quickly, it just encodes upfront in the generic item definition that expr must be valid. The Rust compiler can then validate a generic instantiation without looking at the body of the generic item. The final syntax is still being decided on.

What is Rootable?