I don't know if I'll be teaching the course again, but if I wait until next year I'll forget feedback I've received. Note that more test cases have been added (if you pull the new starter code); those are for the (potential) future!
Learning to use tmux, a terminal multiplexer, is recommended (a quick reference of usage is provided, but Google is your friend as well).
It is also recommended to use bash as a shell instead of the default sh on the UG machines; it includes quality-of-life improvements such as reverse command search (Ctrl+R).
While I cannot seem to change the default shell on the UG machines upon logging in over SSH, you can instruct tmux to use bash for its windows by adding the following to ~/.tmux.conf:
set -g default-shell /usr/bin/bash
Instructions are on Quercus (due to containing semi-sensitive information): Modules -> Remote Access to the Lab Machines -> ug-remote-access.txt.
C++ has a long history, not without major improvements to the language. Generally, “modern C++” refers to the C++11 standard (and later), which added “move semantics”. The labs are configured to use C++17 (C++20 is only partially supported by GCC and Clang). These labs make liberal use of modern C++ idioms.
Initialization in C++ is at best… a complex mess. If you care about the technical details, you can read about initialization on cppreference.com. Otherwise, here is a loose summary.
You may not like it, but this is what peak C++ initialization looks like.
int x { 0 };
// or, if you like `auto`
auto y = int { 0 };
You won’t find much of the above code in the wild given C++’s long history, but, while not perfect, you should generally prefer curly braces for initialization. Yep. That means for-loops should look something like this:
for (size_t i { 0 }; i < 10; i++) {
    printf("%zu\n", i); // %zu is the format specifier for size_t
}
Roughly, curly braces prevent implicit narrowing conversions of numeric types, although, for reasons, GCC only issues a warning. The following is an error on Clang, since an int doesn’t necessarily fit in a size_t (which is unsigned):
int x { 0 };
size_t y { x };
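If the conversion is intentional, one way to keep both compilers quiet is to spell it out with static_cast. The following is only a sketch; the helper name to_index is made up for illustration and is not part of the starter code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: converts a non-negative int to size_t explicitly.
std::size_t to_index(int value) {
    assert(value >= 0); // a negative int would wrap around to a huge size_t
    // std::size_t bad { value }; // narrowing: error on Clang, warning on GCC
    std::size_t index { static_cast<std::size_t>(value) }; // explicit, so no diagnostic
    return index;
}
```

The cast documents that the conversion was a deliberate choice, and the assertion guards the one case where it would silently produce a wrong value.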
When you want to call the default constructor, using curly braces also avoids the most vexing parse:
struct Foo {
    Foo() = default;
};
Foo f(); // Interpreted as a declaration of a function `f` returning a `Foo`.
Foo g{}; // Calls the default constructor of `Foo`.
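The one “gotcha” I’m aware of is that curly braces may perform aggregate initialization, which bypasses constructors (even deleted or private ones)! This applies to C++17, which the labs use; C++20 no longer treats a class with user-declared constructors as an aggregate.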
#include <string>

struct Foo {
    int x;
    std::string str;
    Foo() = delete;
};
Foo f { 1, "hello world" }; // Compiles in C++17.
However, aggregate initialization is useful for types that “just hold data”, such as those in the starter code’s src/nodes.hpp, as it allows you to initialize the fields (in declaration order) without defining a bunch of constructors.
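As a rough illustration, here is what that looks like for a small data-only type. SourceLocation and its fields are hypothetical and not taken from src/nodes.hpp.

```cpp
#include <cassert>
#include <string>

// Hypothetical "just holds data" type (not taken from src/nodes.hpp).
struct SourceLocation {
    std::string file;
    int line;
    int column;
};

void aggregate_example() {
    // No constructors needed: fields are filled in declaration order.
    SourceLocation loc { "dummy.c", 3, 14 };
    assert(loc.file == "dummy.c");
    assert(loc.line == 3);
    assert(loc.column == 14);
}
```

Because the initializers follow declaration order, reordering the fields changes the meaning of every such initializer, so it is worth keeping these structs small.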
C++ introduces four dedicated cast operations which are roughly explained as follows:
- const_cast: remove const qualifier;
- static_cast: perform numeric conversions;
- reinterpret_cast: convert between pointer types; and
- dynamic_cast: safely cast pointers and references across inheritance hierarchies.
These should be preferred over C-style casts, which essentially try the first 3 above to get the code to compile.
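The four casts can be seen side by side in the following sketch. The types Base, Derived, and Other are made up for illustration, and each case is deliberately the simplest safe use I could think of rather than a recommendation to cast often.

```cpp
#include <cassert>

struct Base { virtual ~Base() = default; };
struct Derived : Base { int value { 7 }; };
struct Other : Base {};

void cast_examples() {
    // static_cast: numeric conversion (truncates toward zero).
    double d { 3.9 };
    int truncated { static_cast<int>(d) };
    assert(truncated == 3);

    // const_cast: drop const. Only safe here because `n` itself is not const.
    int n { 1 };
    const int& view { n };
    const_cast<int&>(view) = 2;
    assert(n == 2);

    // reinterpret_cast: view an object's bytes through another pointer type.
    unsigned int word { 0 };
    auto* bytes = reinterpret_cast<unsigned char*>(&word);
    bytes[0] = 1; // allowed: unsigned char may alias any object
    assert(word != 0);

    // dynamic_cast: checked downcast; yields nullptr when the type does not match.
    Derived derived;
    Other other;
    Base* to_derived { &derived };
    Base* to_other { &other };
    assert(dynamic_cast<Derived*>(to_derived) != nullptr);
    assert(dynamic_cast<Derived*>(to_other) == nullptr);
}
```

Unlike a C-style cast, each of these names the kind of conversion being requested, so the compiler rejects it if a different kind would be needed.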
You should not need to call malloc or free in C++. You also should not need to call new and delete in C++. Instead, use std::unique_ptr and std::make_unique, or std::shared_ptr and std::make_shared.
Firstly, std::unique_ptr and std::shared_ptr take advantage of RAII to prevent resource leaks. Herb Sutter, the chair of the ISO C++ standards committee, explains why you should use std::make_unique and std::make_shared in this blog post.
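A minimal sketch of both smart pointers follows. The Node type is hypothetical; the point is only that no new or delete appears anywhere.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node type for illustration.
struct Node {
    std::string name;
    explicit Node(std::string n) : name { std::move(n) } {}
};

void smart_pointer_example() {
    // unique_ptr: exactly one owner; the Node is freed when its owner goes out of scope.
    std::unique_ptr<Node> owner { std::make_unique<Node>("root") };
    assert(owner->name == "root");

    // Ownership is transferred by moving; the source is left null.
    std::unique_ptr<Node> new_owner { std::move(owner) };
    assert(owner == nullptr);
    assert(new_owner->name == "root");

    // shared_ptr: reference counted; freed when the last owner goes away.
    std::shared_ptr<Node> a { std::make_shared<Node>("shared") };
    std::shared_ptr<Node> b { a };
    assert(a.use_count() == 2);
    assert(b->name == "shared");
}
```

For a tree such as an AST, where each child has exactly one parent, std::unique_ptr is usually the natural fit; std::shared_ptr is for cases where ownership is genuinely shared.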
C++17 introduced std::variant, which is known in type theory as a sum type. In practice, you can just think of it as a type-safe union; std::variant remembers “what it contains” with an internal tag, so it can enforce that you handle all its possible types (with std::visit), or throw an exception (std::bad_variant_access) if you incorrectly assume its type when accessing its internal value (with std::get).
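The tag checking can be seen in a small sketch. The alias Value is made up for illustration; the std:: calls are the standard C++17 ones.

```cpp
#include <cassert>
#include <string>
#include <variant>

using Value = std::variant<int, std::string>; // hypothetical alias for illustration

void variant_example() {
    Value v { 42 };
    assert(std::holds_alternative<int>(v));
    assert(std::get<int>(v) == 42);

    // Accessing the wrong alternative throws instead of reading garbage.
    bool threw { false };
    try {
        (void)std::get<std::string>(v);
    } catch (const std::bad_variant_access&) {
        threw = true;
    }
    assert(threw);

    // std::get_if returns nullptr rather than throwing.
    v = std::string { "hello" };
    assert(std::get_if<int>(&v) == nullptr);
    assert(*std::get_if<std::string>(&v) == "hello");
}
```

A plain union would have let the wrong access through silently; here it is either an exception or a null pointer you can test for.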
In general, we often have a heterogeneous collection of “things”.
In a compiler, you typically have an abstract syntax tree with different kinds of nodes.
Let’s consider how to represent a binary expression, containing a “left” and “right” expression (and typically an operator).
If you come from object-oriented programming, you will probably reach for inheritance: a base Expression class with multiple child subclasses, such as LiteralExpression and BinaryExpression. You can make the Expression class abstract to prevent instantiations of an ambiguous/incomplete expression, and use virtual functions to process Expression subclasses uniformly, e.g. with a visitor pattern.
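A minimal sketch of the inheritance approach is shown here. It uses a single virtual function rather than a full visitor, and it makes two simplifying assumptions that are mine, not the starter code's: the method is called evaluate, and the only operator is addition.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Abstract base: cannot be instantiated because `evaluate` is pure virtual.
struct Expression {
    virtual ~Expression() = default;
    virtual int evaluate() const = 0;
};

struct LiteralExpression : Expression {
    int value;
    explicit LiteralExpression(int v) : value { v } {}
    int evaluate() const override { return value; }
};

// Simplification for this sketch: the only operator is addition.
struct BinaryExpression : Expression {
    std::unique_ptr<Expression> left;
    std::unique_ptr<Expression> right;
    BinaryExpression(std::unique_ptr<Expression> l, std::unique_ptr<Expression> r)
        : left { std::move(l) }, right { std::move(r) } {}
    int evaluate() const override { return left->evaluate() + right->evaluate(); }
};

void inheritance_example() {
    // (1 + 2) + 3, processed only through the Expression interface.
    std::unique_ptr<Expression> tree { std::make_unique<BinaryExpression>(
        std::make_unique<BinaryExpression>(
            std::make_unique<LiteralExpression>(1),
            std::make_unique<LiteralExpression>(2)),
        std::make_unique<LiteralExpression>(3)) };
    assert(tree->evaluate() == 6);
}
```

Every new operation on the tree needs either a new virtual function in Expression or a visitor, which is the trade-off the following paragraphs discuss.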
With inheritance, the set of possible types is open; you cannot restrict the set of subtypes. Because the set of subtypes is unknown, the only way to process such a heterogeneous collection is to impose a common, or closed, interface; for example, you can only use functions defined in the Expression base class.
However, as a compiler writer, you really should know all the possible expression kinds (by the language’s grammar); the set of types is closed. Because you know all the possible types, you can explicitly handle them one by one, in a manner appropriate to each type. In some sense, the interface is open.
You might think, “with inheritance I can dynamic_cast to figure out the concrete type, and then do whatever I want with it”. However, there is no way to ensure you handle every possible subtype.
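By contrast, std::variant together with std::visit can give you that guarantee at compile time. The following is one possible sketch, not the starter code's design: the names Expr, Literal, Binary, literal, add, and evaluate are all hypothetical, addition is the only operator, and the overloaded helper is a common C++17 idiom rather than something from the standard library.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <variant>

struct Expr; // forward declaration so Binary can point to sub-expressions

struct Literal { int value; };
struct Binary { // simplification: the only operator is addition
    std::unique_ptr<Expr> left;
    std::unique_ptr<Expr> right;
};
struct Expr {
    std::variant<Literal, Binary> node; // the closed set of expression kinds
};

// Helper that merges several lambdas into one visitor (a common C++17 idiom).
template <class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

int evaluate(const Expr& expr) {
    // Removing either lambda makes this fail to compile: every kind must be handled.
    return std::visit(overloaded {
        [](const Literal& lit) { return lit.value; },
        [](const Binary& bin) { return evaluate(*bin.left) + evaluate(*bin.right); },
    }, expr.node);
}

// Hypothetical convenience constructors.
std::unique_ptr<Expr> literal(int value) {
    return std::make_unique<Expr>(Expr { Literal { value } });
}
std::unique_ptr<Expr> add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    return std::make_unique<Expr>(Expr { Binary { std::move(l), std::move(r) } });
}

void variant_ast_example() {
    auto tree = add(add(literal(1), literal(2)), literal(3)); // (1 + 2) + 3
    assert(evaluate(*tree) == 6);
}
```

If a new alternative is later added to the variant, every std::visit call that does not handle it stops compiling, which is exactly the check that dynamic_cast cannot provide.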
Run git clone http://individual.utoronto.ca/dfr/ece467/2022/ece467-starter.git ece467-compiler :)
While you don’t need to use git beyond this, it is recommended for your own sanity to use some proper version control system. A quick reference for git is provided: link.
1. cd into your project root (ece467-compiler above).
2. Create a build directory and cd into it: mkdir build && cd build (you can create more build directories with different names and configurations).
3. Configure the build with CMake: configure_cmake.sh in the project root contains the command with the correct path on the lab machines. On another machine, change what -DLLVM_DIR= points to accordingly.
4. Run make -j8 (the 8 means 8 parallel jobs; you can set this to something else, but the lab machine I checked has 8 cores).
Your executable is now in ./src/ece467c (relative to inside your build directory).
From your build directory, run ./src/ece467c <lab number: 1/2/3/4> <input file path>. You can try the starter code for labs 1 and 2 with the included dummy.c: (from your build directory) ./src/ece467c 1 ../dummy.c or ./src/ece467c 2 ../dummy.c.
I highly recommend getting used to gdb. The quick reference has some common commands.
1. cd out of your project root (cd ..).
2. tar your code (ece467-compiler above): tar -czvf <anything>.tar.gz --exclude <your build folder> ece467-compiler.
3. Run submitece467f <lab number> <anything>.tar.gz.
You can check your submitted file(s) with submitece467f -l <lab number>.