I don't know if I'll be teaching the course again, but if I wait until next year I'll forget feedback I've received. Note that more test cases have been added (if you pull the new starter code); those are for the (potential) future!
Note: expect to do a lot of searching and reading for this lab!
In this final lab, you will implement intermediate representation (IR) generation for your compiler, using LLVM. Your task in this lab will be to:
codegen()
, we’ll use std::visit
to iterate through the AST, similar to lab 3.
LLVMContext TheContext
, IRBuilder<> Builder
, and std::unique_ptr<Module> TheModule
in our CodeGen
class.
All the LLVM API functions used to create the IR should involve (at least) one of those; the starter code will call LLVM to compile the IR to machine code and write it to a file
(i.e. you do not need to modify CodeGen::emit_object_file
).
IfExprAST
is equivalent to the ternary operator in C;
the control flow is the same as for an if-statement,
but instead of needing a phi node to select a value from each branch,
each block of the if-statement simply produces the statements within the block.
Assignments to variables are covered in chapter 7 of the LLVM tutorial.
int x
, we can instead treat it as something like int* x = malloc(sizeof(int));
(specifically, in LLVM IR, we will create a local variable AllocaInst*
, which is a subclass of Value*
, representing the address of the local variable).
x
become dereferences of x
; e.g. x = 3
becomes *x = 3
.
x
(which is now a pointer) doesn’t change, so this satisfies the SSA requirement.
CodeGen::generate
.
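The pointer view of local variables described above can be sketched in plain C++, without any LLVM calls. This is only an analogy: `slot` stands in for the stack slot that an alloca instruction would reserve, and both function names are made up for illustration.

```cpp
// Source-level view: x is reassigned, which a single SSA value cannot express.
int with_reassignment() {
    int x;
    x = 3;
    x = x + 1;
    return x;
}

// Pointer view: x is bound once to an address and never rebound.
// Every write becomes a store through x, every read becomes a load.
int with_memory_slot() {
    int slot;              // stands in for the stack slot of the variable
    int* const x = &slot;  // the pointer itself is assigned exactly once
    *x = 3;                // x = 3      becomes a store
    *x = *x + 1;           // x = x + 1  becomes load, add, store
    return *x;             // reading x  becomes a load
}
```

Both functions compute the same result; the second never reassigns `x` itself, which is the property SSA requires.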
If you traverse the AST as you did in lab 3, and call the corresponding LLVM functions to create basic blocks and Value*
s, the starter code should produce the corresponding machine code output.
You may additionally wish to read the LLVM manual, or LLVM docs, for a better idea of the LLVM infrastructure.
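As a rough illustration of the traversal described above, here is a minimal std::visit sketch that does not depend on LLVM. Everything in it is an assumption made for the example: the node types `IntLit` and `BinOp`, the helper `make_binop`, and the `Emitter` visitor are hypothetical, and the visitor appends text lines that merely resemble IR instead of calling the LLVM API. The returned string plays the role that a Value* plays in the lab: it names the result of the subtree just visited.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical two-node AST; the lab's real node types will differ.
struct IntLit { int value; };
struct BinOp;
using Expr = std::variant<IntLit, std::unique_ptr<BinOp>>;
struct BinOp { char op; Expr lhs; Expr rhs; };

// Convenience constructor for a binary node.
Expr make_binop(char op, Expr lhs, Expr rhs) {
    return std::make_unique<BinOp>(BinOp{op, std::move(lhs), std::move(rhs)});
}

// Visitor that appends one pseudo-IR line per operation and returns the
// name of the value it produced.
struct Emitter {
    std::vector<std::string> lines;
    int next_temp = 0;

    std::string operator()(const IntLit& lit) {
        return std::to_string(lit.value);  // constants need no instruction
    }

    std::string operator()(const std::unique_ptr<BinOp>& node) {
        // Visit children first, so their results exist before they are used.
        std::string lhs = std::visit(*this, node->lhs);
        std::string rhs = std::visit(*this, node->rhs);
        const char* opcode = node->op == '+' ? "add"
                           : node->op == '-' ? "sub"
                                             : "mul";
        std::string name = "%t" + std::to_string(next_temp++);
        lines.push_back(name + " = " + opcode + " i32 " + lhs + ", " + rhs);
        return name;
    }
};
```

Visiting the tree for `(1 + 2) * 3` with `std::visit(emitter, tree)` appends an `add` line followed by a `mul` line that uses the first result. In the lab, each `operator()` would call the builder instead of appending strings.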
For a program to do something interesting, it (probably) needs to make system calls to perform I/O.
In runtime.cpp
, there are two functions by default: one to read an int, and one to print an int.
You can define additional functions here, add the corresponding function declarations to the compiler's input file, and call them
(you don’t need to add anything for these labs, but it provides a way for you to extend your compiler if you wish).
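A minimal sketch of such an extension follows. The name `max_int` is made up for illustration, and the sketch assumes the runtime functions are exported with C linkage (unmangled names) so that code compiled from the input language can refer to them by their plain names.

```cpp
// Hypothetical addition to runtime.cpp; max_int is not part of the starter code.
// extern "C" keeps the symbol name unmangled, so a caller that declares
// `int max_int(int a, int b);` can link against it.
extern "C" int max_int(int a, int b) {
    return a > b ? a : b;
}

// Matching declaration to place at the top of the compiler's input file:
//     int max_int(int a, int b);
```

After rebuilding, the new function becomes part of libece467rt.a along with the default ones.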
The starter CMake files build libece467rt.a
(at <your build directory>/src/libece467rt.a
) from runtime.cpp
.
Later, you link this static archive to your compiled program, to be able to call the (arbitrary) functions you have defined.
By default, given an input file hi.c
, the starter code outputs hi.c.o
.
You will notice that this file is not executable, even if you have a main function.
Unfortunately, there’s a whole other dark side to compilers known as linking.
To keep things simple, we’ll just use the existing system linker, for which gcc
is actually just a frontend.
So given hi.c.o
, you want to run the following command (paths relative to inside the build directory):
gcc -o <hi.o> hi.c.o -Lsrc -lece467rt -lreadline
The flag -L
adds a path to the list of directories gcc searches for libraries.
The -l
flags link in libece467rt.a
and libreadline.so
respectively.
The readline library is just a slightly nicer input prompt, used by the provided read_int
function.
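Note that the order is important: the linker resolves symbols from left to right, so each library must appear after the files that depend on it.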
You can use the script do_compile.sh
provided in the starter code to compile an input file with your compiler (outputs a.out
in the current directory)
as well as gcc
/g++
(outputs b.out
in the current directory).
You are encouraged to read this script (it’s short) to become more familiar with invoking gcc
/g++
directly.
Now you can run ./hi.o
!
llvm::IRBuilderBase
to find the majority of its methods.
getType()
method may be helpful.
ConstantInt
/APInt
where you need to specify the bit width,
APFloat
deduces the size of the float based on its argument
(float
is always 32 bits, double
is always 64 bits, unlike types such as int
or long
in C/C++, which are actually only specified to be at least 16 and 32 bits respectively, and may be different sizes on different architectures).
IRBuilder::CreateRet
and IRBuilder::CreateRetVoid
may be used multiple times in a function.
Value*
should work as the condition for a branch.
The LLVM tutorial requires an explicit comparison against 0.0
to produce a (1-bit) integer to use as a condition, since all of its values are doubles.
The test cases used for marking are provided in the starter code’s test
directory.
See the comment at the top of each file for a description.
You can compile the tests with gcc to see the expected output.
gcc -c -o collatz.o test/lab4/collatz.c
g++ -c -o runtime.o src/runtime.cpp
g++ -o collatz runtime.o collatz.o -lreadline
./collatz
The following test case will be used to give part marks for the Fibonacci numbers test, which should not require chapter 7 of the LLVM tutorial (mutable variables) in order to pass.
// 2 points for correct fib1.
void put_int(int x);

int fib1(int n) {
    if (n <= 1) {
        return n;
    }
    return fib1(n - 1) + fib1(n - 2);
}

int main() {
    put_int(fib1(0)); // forgot to call fib1 in the original post...
    put_int(fib1(1));
    put_int(fib1(2));
    put_int(fib1(3));
    put_int(fib1(4));
    put_int(fib1(5));
    put_int(fib1(6));
    put_int(fib1(7));
    put_int(fib1(8));
    put_int(fib1(9));
    return 0; // forgot this in the original post
}
The original file fib.c
remains the same, but is only worth 2 marks for correct output.