The first task of a compiler is tokenizing its input, also known as lexical analysis. Tokens are typically defined by regular expressions, which can be converted to finite automata (state machines). As such, a lexer generator takes a list of tokens with their associated regexes and produces a lexer that converts its input into a sequence of those tokens. Flex is one such lexer generator. Your task in this lab is to declare the tokens listed below in `parser.y` and write Flex rules in `lexer.l` that produce them.
The Flex manual will be useful for this lab. While you don't need much to complete this lab, reading more of the documentation on your own will give you a better understanding of how these tools work.
The Flex input file in the starter code is `src/lexer.l`. The Flex manual contains everything you need to know about this file. We bring your attention to a few things.
A lexer produces a stream of tokens from an input string. These tokens are processed by a parser, so the lexer and parser must use the same representation for tokens. With Flex and Bison, Bison defines the tokens, which get written to a parser header file, and the Flex-generated lexer `#include`s this header for the token definitions.
When you run `make`, the parser header is written to `build/src/parser.hpp` (a corresponding `.cpp` file exists in the same directory). You may want to look through this file for a better understanding of how it works. In particular, the tokens are defined in an enum `token_kind_type`, namespaced inside a struct `token` inside the `yy::parser` class. The type of these enum values is `typedef`'d to `yy::parser::token_kind_type`, and you would access a token `HI` with `yy::parser::token::HI`.
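For orientation, the relevant part of the generated header has roughly the shape sketched below. This is a simplified illustration of the structure described above, not the literal generated file; the enumerator value is made up.

```cpp
// Simplified sketch of the token definitions in build/src/parser.hpp.
// The real generated file contains much more; the enumerator value here
// is illustrative only.
namespace yy {
  class parser {
  public:
    struct token {
      enum token_kind_type {
        HI = 258   // one enumerator per %token declaration in parser.y
      };
    };
    typedef token::token_kind_type token_kind_type;
  };
}

// A token named HI is therefore referenced as yy::parser::token::HI,
// and its type is yy::parser::token_kind_type.
```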
You define tokens in the `src/parser.y` file by entering a line like `%token <std::string> HI`. Here, `std::string` is the type of the data associated with the `HI` token; you can specify any C++ type here. Tokens such as numbers or strings require this extra data for compilation. For tokens that don't require accompanying data, such as a `PLUS` operator token, you may omit the `<type>` part of the token declaration. Often, different tokens have different types of associated data; for example, an integer token would carry an `int`, while a string token would carry a `char*`.
Traditionally in Bison, this was implemented using a C-style union, and most tutorials you find online still use it. Unfortunately, the limitations of C-style unions prevent you from using more complex C++ types such as `std::string` or `std::unique_ptr`. A newer feature of Bison is variants, essentially a type-safe union implemented using metaprogramming tricks. Unfortunately, the Bison manual's coverage of this is a bit scattered; in particular, we point you to here and here.
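Putting the pieces together, the token section of `parser.y` might look something like the sketch below. The variant-related directives are standard Bison, but the starter file may already set them up for you, and the token names and types here are only illustrative; check the starter code before copying anything.

```
/* Hypothetical excerpt from parser.y; adapt to the starter file's setup. */
%require "3.2"
%language "c++"
%define api.value.type variant

%code requires {
  #include <string>   /* types used in %token declarations must be visible here */
}

/* Tokens carrying data specify a C++ type in angle brackets. */
%token <std::string> TOK_IDENTIFIER
%token <int>         TOK_INTEGER
%token <float>       TOK_FLOAT

/* Tokens with no associated data omit the <type> part. */
%token TOK_PLUS TOK_MINUS TOK_STAR TOK_SLASH TOK_ASSIGN
```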
The starter `lexer.l` includes macros for simplifying the creation of tokens in the lexer. In particular, if you define a token `HI` with type `std::string`, you can use the `TOK` macro like `return TOK(HI, <column number>, my_string)` to return the corresponding token. For a token defined with no associated data, such as the built-in `YYUNDEF` type symbolizing an invalid token, you can call it simply as `TOK(YYUNDEF, <column number>)`. In other words, you should have a `%token` declaration in your `parser.y` file for each type of token listed below.
Certain tokens will require you to store additional data. When Flex matches a rule, the matched text is available as `char* yytext` with length `int yyleng`.
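For example, a rule for a token that carries data can build its value from `yytext` before handing it to the `TOK` macro. The sketch below uses the illustrative token names from the `parser.y` sketch above, and `col` is only a placeholder for however the starter code obtains the current column (see the column-tracking discussion further down).

```
    /* Hypothetical rules-section excerpt from lexer.l; token names and the
       `col` placeholder are assumptions, not part of the starter code. */
[a-zA-Z_][a-zA-Z0-9_]*    { return TOK(TOK_IDENTIFIER, col, std::string(yytext, yyleng)); }
[0-9]+                    { return TOK(TOK_INTEGER, col, std::stoi(yytext)); }
```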
When declaring your tokens in the `parser.y` file, the names should have a `TOK_` prefix and be all uppercase. This will simplify proper output later on.
Name | Description |
---|---|
identifier | letter or underscore, followed by 0 or more letters, numbers, or underscores |
integer | a decimal integer (no hex, binary, or octal literals) |
float | a C floating point literal as described here. You do not need to implement hex literals or suffixes. |
true, false | boolean literals |
lparen, rparen | left and right parenthesis |
lbrace, rbrace | left and right curly braces |
eq, ne, lt, gt, le, ge | (in)equality relational operators |
plus, minus, star, slash | basic binary arithmetic operators |
log_and, log_or | logical and (&&), logical or (||) |
if, while, for, break, continue, return | control-flow keywords |
comma, semicolon, colon, question_mark | punctuation |
assign | single equals sign |
Update for Lab 3: plus_assign, minus_assign, star_assign, slash_assign | augmented assignments |
Update for Lab 3: type | the literal tokens "int", "float", "bool", and "void" |
The starter `lexer.l` has a `LOC` macro which creates the location object used for tokens; this is used by the `TOK` macro. Due to the structure of control flow, the `TOK` macro takes a column number as its second parameter. Flex already tracks line information in an `int yylineno`, but it does not track columns. Using Flex's extra data feature for reentrant scanners, the starter code contains macros for getting, setting, and incrementing a column counter. To implement column tracking, you should use the `SET_COLUMN` and `INC_COLUMN` macros.
Other notes:

- For `yyterminate` to work properly, the location should point to the first character after the token being returned.
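A rough sketch of how the column macros might be used in `lexer.l` rules is shown below. The exact arguments of `SET_COLUMN` and `INC_COLUMN`, whether columns start at 0 or 1, and how the current column is read when calling `TOK` are all defined by the starter code, so treat this only as an illustration of the idea that the column is advanced past the matched text before the token's location is built.

```
    /* Hypothetical column-tracking rules; macro arguments and the `col`
       placeholder must be adapted to the starter code's actual definitions. */
\n          { SET_COLUMN(1); }        /* newline: reset the column counter       */
[ \t\r]+    { INC_COLUMN(yyleng); }   /* whitespace: advance past matched text   */
"+"         {
              INC_COLUMN(yyleng);     /* advance first, so the stored location    */
              return TOK(TOK_PLUS, col);  /* points just past the returned token  */
            }
```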
The usage for your compiler is `ece467c <lab #> <input file>`.
If you named your tokens properly, there should be nothing else to do. The driver in `src/main.cpp`, which calls `lex` in `src/compiler.cpp`, handles printing the type of tokens lexed. You do not need to change anything for error handling; the starter code's error handling routine is expected. For this lab, lines containing `output` or `error` will be grep'd and compared against the desired output.
The following test cases are provided for you:
Please fill out your student number, first name, and last name at the top of `src/main.cpp`. Include a `doc.txt` that summarizes your contributions. Additionally, include any major design/implementation decisions; this will be used to award partial marks if there are failing test cases. Please name your project directory `ece467-<student number 1>-<student number 2>` before compressing it (`tar -czvf ece467-<student numbers>.tar.gz --exclude build ece467-<student numbers>`). Submit the resulting archive on Quercus.