UofT ECE 467 (2022 Fall) - Things to Read Before Starting Lab 1



The website and starter code are being pre-emptively updated!

I don't know if I'll be teaching the course again, but if I wait until next year I'll forget feedback I've received. Note that more test cases have been added (if you pull the new starter code); those are for the (potential) future!


Using the Terminal

Learning to use tmux is recommended (a quick reference is provided, but Google is your friend as well). A terminal multiplexer:

  1. allows you to have multiple terminal instances within a single SSH connection, and
  2. allows you to detach from your terminal sessions and reattach later.

It is also recommended to use bash as your shell instead of the default sh on the UG machines; it includes quality-of-life improvements such as reverse command search (control + r). While I cannot seem to change the default login shell on the UG machines, you can instruct tmux to use bash for its windows by adding the following to ~/.tmux.conf:

set -g default-shell /usr/bin/bash

SSH Remote Access

Instructions are on Quercus (due to containing semi-sensitive information): Modules -> Remote Access to the Lab Machines -> ug-remote-access.txt.

Modern C++

C++ has a long history, not without major improvements to the language. Generally, “modern C++” refers to the C++11 standard (and later), which added “move semantics”. The labs are configured to use C++17 (C++20 is only partially supported by GCC and Clang). These labs make liberal use of modern C++ idioms.
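As a rough illustration of what move semantics buys you, here is a minimal sketch. The helper append_name is made up for this example and is not part of the starter code. std::move lets the vector take over the string's buffer instead of copying it.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: takes ownership of `names` and `name` (both passed by value).
std::vector<std::string> append_name(std::vector<std::string> names, std::string name) {
	// std::move lets the vector steal the string's buffer instead of copying it.
	names.push_back(std::move(name));
	// A by-value parameter is moved (not deep-copied) when returned.
	return names;
}
```

A caller that no longer needs its own vector can hand it over with names = append_name(std::move(names), "main");.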

Initialization

Initialization in C++ is at best… a complex mess. If you care about the technical details, you can read about initialization on cppreference.com. Otherwise, here is a loose summary.

You may not like it, but this is what peak C++ initialization looks like.

int x { 0 };
// or, if you like `auto`
auto y = int { 0 };

You won’t find much of the above code in the wild given C++’s long history, but, while not perfect, you should generally prefer curly braces for initialization. Yep. That means for-loops should look something like this:

for (size_t i { 0 }; i < 10; i++) {
	printf("%zu\n", i);
}

Roughly, curly braces prevent implicit narrowing conversions of numeric types, although, for reasons, GCC only issues a warning. The following is an error on Clang, since an int doesn’t necessarily fit in a size_t (which is unsigned):

int x { 0 };
size_t y { x };
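If you really do need to turn an int into a size_t, one option is to make the conversion explicit and check the value first. This is only a sketch; to_size is a hypothetical helper, not something from the starter code.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: convert a signed int to std::size_t, rejecting negative values.
std::size_t to_size(int value) {
	if (value < 0) {
		throw std::invalid_argument("negative value cannot be a size");
	}
	// The explicit cast documents the intent; `std::size_t y { value };` would be flagged as narrowing.
	return static_cast<std::size_t>(value);
}
```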

When you want to call the default constructor, using curly braces also avoids the most vexing parse:

struct Foo {
	Foo() = default;
};

Foo f(); // Interpreted as a declaration of a function `f` returning a `Foo`.
Foo g{}; // Calls the default constructor of `Foo`.

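The one “gotcha” I’m aware of is that curly braces may perform aggregate initialization, which bypasses constructors (even deleted or private ones)! This is the behaviour in C++17, which the labs use; C++20 closes the loophole by no longer treating a class with any user-declared constructor as an aggregate.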

struct Foo {
	int x;
	std::string str;

	Foo() = delete;
};

Foo f { 1, "hello world" }; // Compiles.

However, aggregate initialization is useful for types that “just hold data”, such as those in the starter code’s src/nodes.hpp, as it allows you to initialize the fields (in declaration order) without defining a bunch of constructors.
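For instance, a pair of data-only types might be initialized as in the sketch below. Location and Identifier are invented for illustration and are not taken from src/nodes.hpp.

```cpp
#include <string>

// Hypothetical data-only types; neither declares a constructor, so both are aggregates.
struct Location {
	int line;
	int column;
};

struct Identifier {
	std::string name;
	Location location;
};

// Fields are initialized in declaration order; a nested aggregate takes nested braces.
Identifier make_example_identifier() {
	return Identifier { "main", { 1, 5 } };
}
```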

Casts

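C++ introduces four dedicated cast operations, which are roughly explained as follows:

  1. static_cast performs conversions the compiler can check at compile time, such as between numeric types, or between related classes (without a run-time check).
  2. const_cast adds or removes const (or volatile).
  3. reinterpret_cast reinterprets the underlying bits as another type, such as a pointer as an integer.
  4. dynamic_cast performs a run-time-checked conversion within a polymorphic class hierarchy.

These should be preferred over C-style casts, which essentially try the first 3 above to get the code to compile.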
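Here is a small sketch of the four casts (static_cast, const_cast, reinterpret_cast, and dynamic_cast) in action. All types and function names are made up for this example.

```cpp
#include <cstdint>

// Hypothetical polymorphic hierarchy, only for demonstrating dynamic_cast.
struct Base { virtual ~Base() = default; };
struct Derived : Base { int value { 42 }; };
struct Other : Base {};

// static_cast: an explicit conversion checked at compile time (here, double to int).
int truncate_to_int(double d) { return static_cast<int>(d); }

// dynamic_cast: a run-time-checked downcast; yields nullptr if `base` is not a Derived.
int value_or_zero(Base* base) {
	if (auto* derived = dynamic_cast<Derived*>(base)) {
		return derived->value;
	}
	return 0;
}

// const_cast: removes const; only safe when the pointee was not declared const.
void increment_through_const(const int* p) { ++*const_cast<int*>(p); }

// reinterpret_cast: reinterprets a pointer as an integer (the value is implementation-defined).
std::uintptr_t address_of(const void* p) { return reinterpret_cast<std::uintptr_t>(p); }
```

Note how each cast states what kind of conversion is intended, whereas a C-style cast would silently pick whichever one compiles.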

Smart Pointers

You should not need to call malloc or free in C++. You also should not need to call new and delete in C++. Instead, use std::unique_ptr with std::make_unique, or std::shared_ptr with std::make_shared. Firstly, std::unique_ptr and std::shared_ptr take advantage of RAII to prevent resource leaks. Secondly, Herb Sutter, the chair of the ISO C++ standards committee, explains why you should prefer std::make_unique and std::make_shared over calling new directly in this blog post.
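A minimal sketch of both smart pointers follows. Token, make_token, and owners_after_copy are hypothetical names used only here.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical type used only for this illustration.
struct Token {
	std::string text;
};

// The returned unique_ptr is the sole owner; the Token is freed when that owner is destroyed.
std::unique_ptr<Token> make_token(std::string text) {
	auto token = std::make_unique<Token>();
	token->text = std::move(text);
	return token;
}

// shared_ptr keeps the Token alive until the last owner goes away.
long owners_after_copy(const std::shared_ptr<Token>& token) {
	std::shared_ptr<Token> copy { token }; // The reference count goes up by one.
	return copy.use_count();               // `copy` is destroyed on return, dropping the count again.
}
```

Ownership of a std::unique_ptr is transferred with std::move, which leaves the source pointer null; there is no delete anywhere.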

Sum Types

C++17 introduced std::variant, which is known in type theory as a sum type. In practice, you can just think of it as a type-safe union; std::variant remembers “what it contains” with an internal tag, so it can enforce that you handle all its possible types (with std::visit), or throw an exception (std::bad_variant_access, from std::get) if you assume the wrong type when accessing its value.
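The sketch below shows the basic ways of asking a std::variant what it holds. Value and the helper functions are made-up names for illustration.

```cpp
#include <string>
#include <variant>

// Hypothetical alias: a value that is either an int or a string.
using Value = std::variant<int, std::string>;

// std::holds_alternative checks the internal tag without accessing the value.
bool is_int(const Value& v) { return std::holds_alternative<int>(v); }

// std::get throws std::bad_variant_access if the variant does not hold the requested type.
bool get_int_throws(const Value& v) {
	try {
		(void) std::get<int>(v);
		return false;
	} catch (const std::bad_variant_access&) {
		return true;
	}
}

// std::get_if returns nullptr instead of throwing.
int int_or(const Value& v, int fallback) {
	const int* p = std::get_if<int>(&v);
	return p ? *p : fallback;
}
```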

In general, we often have a heterogeneous collection of “things”. In a compiler, you typically have an abstract syntax tree with different kinds of nodes. Let’s consider how to represent a binary expression, containing a “left” and “right” expression (and typically an operator). If you come from object-oriented programming, you will probably reach for inheritance: a base Expression class with multiple child subclasses, such as LiteralExpression and BinaryExpression. You can make the Expression class abstract to prevent instantiations of an ambiguous/incomplete expression, and use virtual functions to process Expression subclasses uniformly, e.g. with a visitor pattern.
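A sketch of the inheritance approach might look like the following. The class names come from the paragraph above, but the evaluate function and the single-character operator are assumptions made for this example.

```cpp
#include <memory>
#include <utility>

// Abstract base: cannot be instantiated; every subclass must implement evaluate().
struct Expression {
	virtual ~Expression() = default;
	virtual int evaluate() const = 0;
};

struct LiteralExpression : Expression {
	int value;
	explicit LiteralExpression(int v) : value { v } {}
	int evaluate() const override { return value; }
};

struct BinaryExpression : Expression {
	char op; // '+' or '*', for this sketch only
	std::unique_ptr<Expression> left;
	std::unique_ptr<Expression> right;
	BinaryExpression(char o, std::unique_ptr<Expression> l, std::unique_ptr<Expression> r)
		: op { o }, left { std::move(l) }, right { std::move(r) } {}
	int evaluate() const override {
		int l = left->evaluate();
		int r = right->evaluate();
		return op == '+' ? l + r : l * r;
	}
};
```

Code holding a std::unique_ptr<Expression> can call evaluate() without knowing which subclass it has, but it can only use what the base class exposes.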

With inheritance, the set of possible types is open; you cannot restrict the set of subtypes. Because the set of subtypes is unknown, the only way to process such a heterogeneous collection is to impose a common, or closed, interface; for example, you can only use functions defined in the Expression base class.

However, as a compiler writer, you really should know all the possible expression kinds (by the language’s grammar); the set of types is closed. Because you know all the possible types, you can explicitly handle them one by one, in a manner appropriate to each type. In some sense, the interface is open.

You might think, “with inheritance I can dynamic_cast to figure out the concrete type, and then do whatever I want with it”. However, there is no way to ensure you handle every possible subtype.
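By contrast, here is a sketch of the closed-set approach with std::variant and std::visit. Literal, Binary, Expr, evaluate, and the overloaded helper are illustrative names, not the starter code's. If a new alternative is added to Expr without a matching handler, the std::visit call no longer compiles.

```cpp
#include <memory>
#include <variant>

struct Literal { int value; };
struct Binary;

// The closed set of expression kinds. Binary is recursive, so it is held by pointer.
using Expr = std::variant<Literal, std::unique_ptr<Binary>>;

struct Binary {
	char op; // '+' or '*', for this sketch only
	Expr left;
	Expr right;
};

// Common C++17 helper: merges several lambdas into one overload set for std::visit.
template <typename... Fs> struct overloaded : Fs... { using Fs::operator()...; };
template <typename... Fs> overloaded(Fs...) -> overloaded<Fs...>;

int evaluate(const Expr& expr) {
	// std::visit dispatches on the variant's tag; every alternative needs a handler.
	return std::visit(overloaded {
		[](const Literal& lit) { return lit.value; },
		[](const std::unique_ptr<Binary>& bin) {
			int l = evaluate(bin->left);
			int r = evaluate(bin->right);
			return bin->op == '+' ? l + r : l * r;
		}
	}, expr);
}
```

Each handler receives the concrete type, so it can use that type's fields directly rather than going through a shared base-class interface.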

Finally, Getting the Code

Run git clone http://individual.utoronto.ca/dfr/ece467/2022/ece467-starter.git ece467-compiler :)

While you don’t need to use git beyond this, it is recommended for your own sanity to use some proper version control system. A quick reference for git is provided: link.

Building the Code

Your executable is now in ./src/ece467c (relative to inside your build directory).

Running your Compiler

From your build directory, run ./src/ece467c <lab number: 1/2/3/4> <input file path>.

You can try the starter code for labs 1 and 2 with the included dummy.c: (from your build directory) ./src/ece467c 1 ../dummy.c or ./src/ece467c 2 ../dummy.c.

Debugging your Compiler

I highly recommend getting used to gdb. The quick reference has some common commands.

Submitting your Code

You can check your submitted file(s) with submitece467f -l <lab number>.


Last updated: 2022-12-23 09:56:36 -0500.