Welcome to the ECE 467 labs homepage for 2020 Fall. See here for the latest version of the course.
You can run git pull origin master
to get the changes to the starter code.
Since you've began modifying the starter code,
there probably will be conflicts that you must manually resolve.
Alternatively, run git fetch origin
followed by git diff 3f90de49ecb0d37d5ae868d76cd39250a6dc48a0 fe15a958c061f55ab531fb7f324c101c248fb910
to get a diff of the changes I've made to lab 4.
The important changes are summarized below.
Note: I retroactively made these changes to the starter code after doing my solution. I may have missed something that will cause your code to not work; if you suspect this, please contact me so I can fix it.
runtime.cpp
from SOURCES
in src/CMakeLists.txt
add_library(ece467rt SHARED runtime.cpp)
to the bottom of src/CMakeLists.txt
compile
in compiler.cpp
:
std::optional<llvm::Error> e = unit->build(); if (e) { return nullptr; }
CompilationUnit::dump
in compiler.cpp
, replace llvm::WriteBitcodeToFile(*this->module, out)
with this->module->print(out, nullptr)
runtime.cpp
, replace putint(int64_t x)
with put_int(int32_t x)
(and change the format specifier from %ld
to %d
in the printf
)
main.cpp
, replace the end of the lab 4 driver code (the else if (lab == 4)
block) with:
root = optimize(std::move(root)); std::unique_ptr<CompilationUnit> u = compile(root.get()); if (u == nullptr) { printf("[error] failed to compile.\n"); return 1; } u->dump(argv[2] + ".ll"s); return 0;It should produce a text file which you can then compile through
clang
(see bottom of page).
Note: You will probably have to do a lot of reading (and googling) for this lab!
In this final lab, you will implement the intermediate representation (IR) for your compiler. In particular, you will be using LLVM's IR, which is based on static single assignment (SSA) form. LLVM can perform optimizations on IR, generate machine code from the IR and will produce the executables. Your task in this lab will be to:
CompilationUnit::process
function in compiler.cpp
(the LLVM tutorial does most of the constructs you need)A control-flow graph (CFG) is a directed graph with a designated start and end node. In the context of programs, the nodes are "basic blocks". A basic block is a sequence of instructions that can only start executing from the beginning (you cannot jump to and start in the middle of a basic block), and once it starts executing will continue executing until the end (there are no jumps in/from the middle of the basic block). A basic block has at most 2 successors, but can have any positive number of predecessors (think about how). With LLVM, you will be creating the CFG by making and connecting basic blocks, while filling in the basic blocks with instructions in SSA form.
SSA form is a property of a compiler's IR which requires that each variable is assigned to only once. The motivation for SSA form is that it greatly simplifies data-flow analysis; each variable has a single reaching definition. As a result, most modern compilers generate and operate on IR in SSA form. Of course, variables in real programs may be assigned to many times; to handle this in SSA, a new variable is created for each variable assignment (commonly, it is named after the source variable name, with an integer subscript). Special phi functions are used to merge multiple versions of a variable. Each phi function has 1 argument for each predecessor of the basic block it belongs to. The phi function is "evaluated" at runtime, and its value is the i-th argument when control flow enters the phi function's basic block from the i-th predecessor. Consider the following code:
int x = 3; x = 4; if (cond()) { x = 5; } print(x);In IR, this code would first be broken into basic blocks (pseudo-IR shown below):
B0: x0 = 3 x1 = 4 if cond() goto B1 else goto B2 B1 (B0): x2 = 5 goto B2 B2 (B0, B1): x3 = phi(x1, x2) print(x3)Let us analyze this IR:
B0
, B1
, and B2
are basic blocksx
creates a variable x0
,
and the second assignment to x
creates a variable x1
B2
has 2 arguments as B2
has 2 predecessorsB2
from B0
, x3
is assigned the value of x1
.
When control enters B2
from B1
, x3
is assigned the value of x2
The algorithm by Braun et al. is essentially for placing phi functions in the "correct place" to merge multiple definitions of a mutable variable. LLVM has an MemorySSA API which simplifies codegen for mutable variables (also described in chapter 7 of the tutorial). With MemorySSA, each local variable is a constant address. This address never gets assigned to outside of the initial definition; it is an SSA variable. To modify the local variable, instead of modifying the SSA variable, we will modify the contents of the address in the SSA variable. (I originally intended for students to use Braun et al.'s algorithm, but at this point I think it would be too much for most students).
Note: Not all the includes have been added for you! You can look them up yourself
Note: Use google to find more LLVM docs! e.g. the first link from searching "llvm constant int"
llvm::ConstantInt::get(llvm::Type::getInt32Ty(context), value, signed);
llvm::ConstantFP::get(llvm::Type::getDoubleTy(context), value);
std::vector<llvm::Type*> parameters; llvm::FunctionType* signature = llvm::FunctionType::get(return_type, parameters, /* isVarArg */ false); llvm::Function* f = llvm::Function::Create(signature, llvm::Function::ExternalLinkage, name, module); size_t i = 0; for (llvm::Argument& a : f->args()) { a.setName(name); }
llvm::BasicBlock* bb = llvm::BasicBlock::Create(context, name, function); builder.SetInsertPoint(bb);
llvm::Type::getIntNTy(context, num_bits)
To create a call to a function that returns void
, use builder.CreateCall(callee, args)
(no name/string).
The CFG for a for-loop should look like this:
before: // whatever happens before the loop goto loop_condition header: // loop induction statement goto loop_condition loop_condition: // evaluate loop condition branch body after body: // loop body goto header after: // continue generating code
To handle break and continue, you must keep track of the "header" and "after" block for each loop.
int
s should be 32 bitsfloat
s should be 32 bits
You can implement auxiliary functions in runtime.cpp
in the C namespace -
i.e. those declared with extern "C"
.
After implementing an extern "C"
function in runtime.cpp
,
to access it from your input program source code,
simply include a declaration for it with a matching signature.
Then, you should be able to call it.
For example, void put_int(int32_t);
has been declared in runtime.cpp
.
To access it in your input program source code,
first create a declaration void put_int(int x);
.
Now, you should be able to call it like so: put_int(3);
.
This way, you can arbitrarily extend your compiler to support I/O, interfacing with other libraries, etc.
Example: ./src/ece467c 4 foo.c
creates foo.c.ll
on success.
You can view this LLVM textual bitcode with a text editor.
To produce an executable, run clang foo.c.ll -Lpath/to/build/src -lece467rt -o foo.o
.
To run your executable, run LD_PRELOAD=path/to/build/src/libece467rt.so ./foo.o
(with bash).
To avoid specifying LD_PRELOAD
every time, you can do export LD_PRELOAD=path/to/src/libece467rt.so
in bash.
If you are using csh (the default on the ECF machines), you can do setenv LD_PRELOAD path/to/src/libece467rt.so
.
Additionally, Your compiler will be run through Valgrind to catch memory bugs.
The following test cases are provided for you.
Same as Lab 1.