Operating Systems

ECE344, Spring 2019
University of Toronto

Labs and website created by: Ashvin Goel

Lab 2: A Cooperative, User-Level Thread Package

Due Date: Feb 10, 11:59pm

In this lab, you will create a library of functions that define a user-level threads package. Using your library, a program will be able to create threads, destroy them, and run them concurrently on a single processor.

Background on Threads

Threads and processes are key abstractions for enabling concurrency in operating systems. To gain a deeper understanding of how these abstractions are constructed, this project asks you to build the core of a user-level threads package. Building kernel-level threads and processes is not much different, but we'll do this project at user level since installing new kernels on the lab machines is problematic.

Threads provide the illusion that different parts of your program are executing concurrently. Through the years, a model of executing multithreaded programs has emerged as a de facto standard. In this model, threads share the code, heap, and the runtime system. Each thread, however, has a separate stack and, naturally, a separate set of CPU registers. This programming model also provides synchronization primitives so that different threads can coordinate access to shared resources.

User-Level vs. Kernel Threads

For practical reasons, this project is done at user level: you will construct user threads by implementing a set of functions that your program will call directly to provide the illusion of concurrency. In contrast, modern operating systems provide kernel threads, and a user program invokes the corresponding kernel thread functions via system calls. Both types of threads use the same core techniques for providing the concurrency abstraction; you would build kernel threads in essentially the same way you build user threads in this project. Also, kernel processes are built using these techniques.

However, there are a few differences between kernel and user threads:

Multiprocessing
Whereas user-level threads provide the illusion of concurrency, on machines with multiple processors, kernel-level threads can provide actual concurrency. With user-level threads, the kernel schedules the user process on one CPU, and the user-level threads package multiplexes the (kernel) thread associated with the process between one or more (user-level) threads. With kernel-level threads, the kernel is aware of the different (kernel) threads, and it can simultaneously schedule these threads from the same process on different processors.

A key simplifying assumption for this project is that you will allow programs to multiplex some number (e.g., m) of user-level threads on one kernel thread. This means that at most one user-level thread is running at a time and that your runtime system has complete control over the interleaving of user-level threads with each other. More sophisticated systems implement m on n threads packages where m user-level threads are multiplexed across n kernel threads.
Asynchronous I/O
When a user-level thread makes a system call that blocks (e.g., reading a file from disk), the kernel scheduler moves the process to the Blocked state and will not schedule it until the I/O has completed. Thus, even if there are other user-level threads within that process, they have to wait, too. Conversely, when a kernel thread blocks for a system call, the kernel scheduler is aware that other threads in the same process may be ready to run. Thus, some kernel threads may be running while others are waiting for I/O.
Timer interrupts
In the next lab, we will simulate timer interrupts that cause the scheduler to switch from one thread or process to another by using POSIX signals. In your implementation, the threads library will "turn off interrupts" by blocking delivery of these signals using system calls. However, there is nothing to prevent the threads, themselves, from "turning off interrupts" the same way. Thus, even though we will implement "preemptive" threads, a "malicious" thread could turn off interrupts and not be preempted until it calls yield, thus hogging the CPU. Note that kernel schedulers don't have this problem. Only the privileged code in the kernel can turn off the real timer interrupts.
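For reference, blocking signal delivery from user space is done with a standard POSIX call. Here is a minimal sketch of the idea (the choice of SIGALRM below is only an assumption for illustration; the actual signal used to simulate timer interrupts will be introduced in the next lab):

/* Sketch: "turn off interrupts" at user level by blocking delivery of a
 * signal. SIGALRM is assumed here purely for illustration. */
#include <signal.h>

static void
block_timer_signal(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGALRM);           /* the assumed timer signal */
    sigprocmask(SIG_BLOCK, &set, NULL); /* delivery deferred until unblocked */
}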

Using Threads

With your threads library, a typical program will look like:

int
main(int argc, char **argv)
{
    // Some initialization
    // Create some threads
    // wait for threads to finish
    // exit
}

// "main" function for thread i
void
thread_main_i(void *arg)
{
    // do some work
    // yield
    // do some more work
    // return (implicit thread exit)
}

Here thread_main_i is a programmer-supplied "main" function that the ith thread starts executing (note that different threads may have the same "main" function). The thread can perform useful work by calling any other functions in the program, or by voluntarily yielding to other threads.

A thread exits either explicitly or implicitly. It exits explicitly when it calls the thread_exit function in the thread library. It exits implicitly when its thread_main function returns. Additionally, to add more control to the program, a thread may call thread_kill to force other threads to exit as well.
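To make the skeleton above concrete, here is a small sketch that uses the library functions summarized later in this handout (thread_init, thread_create, thread_yield, and the THREAD_ANY/THREAD_NONE constants from thread.h). The number of threads and the way main "waits" by yielding until thread_yield reports THREAD_NONE are illustrative choices, not requirements:

#include <stdio.h>
#include "thread.h"   /* thread API and constants provided with the lab */

#define NTHREADS 4    /* illustrative */

/* "main" function for each thread */
static void
worker(void *arg)
{
    long id = (long)arg;

    printf("thread %ld: first half of work\n", id);
    thread_yield(THREAD_ANY);     /* voluntarily let other threads run */
    printf("thread %ld: second half of work\n", id);
    /* returning here exits the thread implicitly (via the stub function) */
}

int
main(int argc, char **argv)
{
    long i;

    thread_init();                /* sets up the first thread (tid 0) */
    for (i = 0; i < NTHREADS; i++)
        thread_create(worker, (void *)i);

    /* "wait" by yielding until no other thread is left to run */
    while (thread_yield(THREAD_ANY) != THREAD_NONE)
        ;
    return 0;
}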

Thread Context

Each thread has per-thread state that represents the working state of the thread -- the thread's program counter, local variables, stack, etc. A thread context is a subset of this state that must be saved/restored from the processor when switching threads. (To avoid copying the entire stack, the thread context includes a pointer to the stack, not the entire stack.) Your library will store the thread context in a per-thread data structure (this structure is sometimes called the "thread control block").

Saving/Restoring Thread Context

When a thread yields the CPU, the threads library must save the current thread's context, which contains the processor register values at the time the thread yields the CPU. The library restores the saved context later when the thread gets its turn to run on the processor. Additionally, the library creates a fresh context and a new stack when it creates a thread.

Fortunately, the C runtime system allows an application to retrieve its current context and store it in a memory location, and to set its current context to a predetermined value from a memory location. Your library will make use of these two existing library calls: getcontext and setcontext.

Study the manual pages of these two calls. Notice that getcontext saves the current context into a structure of type ucontext_t (a struct ucontext). So, if you allocate a ucontext_t and pass a pointer to that memory to a call to getcontext, the current registers and other context will be stored to that memory. Later, you can call setcontext to copy that state from that memory to the processor, restoring the saved state.

The struct ucontext is defined in /usr/include/sys/ucontext.h. Look at the fields of this struct in detail, especially the uc_mcontext and the uc_sigmask fields.

You will use getcontext and setcontext in two ways. First, to suspend a currently running thread (to run another one), you will use getcontext to save its state and later use setcontext to restore its state. Second, to create a new thread, you will use getcontext to create a valid context, but you will leave the current thread running; you (the current thread, actually) will then change a few registers in this valid context to initialize it as a new thread, and put this new thread into the ready queue; finally, at some point, the new thread will be chosen by the scheduler, and it will run when setcontext is called on this new thread's context.
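As a rough sketch of the first use (suspending one thread and resuming another), assuming a per-thread structure with a ucontext_t field and omitting all scheduler bookkeeping and error handling:

#include <ucontext.h>

/* Minimal illustrative thread structure; your real one will have more
 * fields (see the solution requirements below). */
struct thread {
    ucontext_t context;    /* registers saved by getcontext */
};

void
switch_to(struct thread *current, struct thread *next)
{
    /* volatile: the flag must live in memory, not in a saved register,
     * so that its updated value is seen when this thread resumes */
    volatile int resumed = 0;

    getcontext(&current->context);     /* save the caller's registers */
    if (!resumed) {
        resumed = 1;
        setcontext(&next->context);    /* does not return on success */
    }
    /* getcontext "returns" a second time here when some other thread
     * later calls setcontext(&current->context); the caller resumes. */
}

Note how getcontext effectively returns twice: once right after saving the context, and again when the saved context is later restored. The volatile flag is one way of telling the two cases apart.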

Changing Thread Context

As noted above, when creating a thread, you can't just make a copy of the current thread's context (using getcontext). You need to make a copy and then change four things:

  • the program counter, so that the new thread starts executing in the thread_stub function described below;
  • the stack pointer, so that it points to the top of a newly allocated stack (remember that stacks grow down);
  • the first argument register, so that thread_stub receives a pointer to the thread's thread_main function;
  • the second argument register, so that thread_stub receives the argument that should be passed to thread_main.

In the real world, you would take advantage of an existing library function, makecontext, to make the first, third and fourth changes. The advantage of using this function is that it abstracts away the details of how a context is saved in memory, which simplifies things and helps portability. The disadvantage is that it abstracts away the details of how a context is saved in memory, which might leave you vague on exactly what's going on.

In the spirit of "there is no magic", for this lab you should not use makecontext or swapcontext. Instead, you must manipulate the fields in the saved ucontext_t directly.
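On the Linux x86-64 lab machines with glibc, the saved registers live in the uc_mcontext.gregs array of the ucontext_t, indexed by constants such as REG_RIP (these names require _GNU_SOURCE and come from /usr/include/sys/ucontext.h). The helper below is only a sketch: its name and parameters are made up, error handling is omitted, and the stack-pointer value shown deliberately ignores the alignment adjustment discussed under "Contexts and Calling Conventions" below.

#define _GNU_SOURCE
#include <stddef.h>
#include <ucontext.h>

void thread_stub(void (*thread_main)(void *), void *arg);  /* defined below */

/* Sketch: initialize a context so that, when setcontext is later called
 * on it, the new thread starts running thread_stub(fn, arg) on its own
 * stack. */
static void
setup_new_context(ucontext_t *uc, void *stack, size_t stack_size,
                  void (*fn)(void *), void *arg)
{
    getcontext(uc);                        /* start from a valid context */

    /* 1. program counter: start execution in thread_stub */
    uc->uc_mcontext.gregs[REG_RIP] = (greg_t)thread_stub;
    /* 2. stack pointer: top of the new stack (stacks grow down); adjust
     *    for alignment as discussed below before using this literally */
    uc->uc_mcontext.gregs[REG_RSP] = (greg_t)((char *)stack + stack_size);
    /* 3. first argument register: the thread_main function to run */
    uc->uc_mcontext.gregs[REG_RDI] = (greg_t)fn;
    /* 4. second argument register: the argument passed to thread_main */
    uc->uc_mcontext.gregs[REG_RSI] = (greg_t)arg;
}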

What is the stub function? How do the arguments work? Read on.

Stub Function

When you create a new thread, you want it to run the thread_main function that defines the work you want the thread to do. A thread exits implicitly when it returns from its thread_main function, much like the main program thread is destroyed by the OS when it returns from its main function in C, even when the main function doesn't invoke the exit system call. To implement a similar implicit thread exit, rather than having your thread begin by running the thread_main function directly, you should start the thread initially in a "stub" function that calls the thread_main function of the thread (much like main is actually called from the crt0 stub function in UNIX). In other words, thread_create should set up a thread so that it starts running the thread_stub function shown below. Then, when the thread runs for the first time, it will execute thread_stub, which will call thread_main. If thread_main ever returns, it returns to the stub function.

/* thread starts by calling thread_stub. The arguments to thread_stub are the
 * thread_main() function, and one argument to the thread_main() function. */
void
thread_stub(void (*thread_main)(void *), void *arg)
{
	Tid ret;

	thread_main(arg); // call thread_main() function with arg
	ret = thread_exit();
	// we should only get here if we are the last thread. 
	assert(ret == THREAD_NONE);
	// all threads are done, so process should exit
	exit(0);
}

In the above code, the argument thread_main is a pointer to the thread_main function that describes the real work the thread should do. Notice that in C, a function's name refers to the address of its code in memory. The second argument to thread_stub (arg) is the argument to pass to the thread_main function. We'll have the thread_main function take an argument that is a pointer to an arbitrary type so that you can pass it whatever you want. thread_exit, THREAD_SELF, and THREAD_NONE are defined below.

Contexts and Calling Conventions

The context structure contains many data fields, but you only need to deal with four of them when creating new threads: the stack pointer, the program counter, and two argument registers. Other than that, you don't need to worry about the fields within the context variable, as long as you do not tamper with them. Also, it is a good idea to work only with context variables that have been initialized through a getcontext call, to avoid bizarre behavior.

Under the POSIX C calling convention on x86-64, here's what things look like while any given function is executing:

[Figure: x86-64 calling convention, showing the stack frame layout and the argument-passing registers]

Notice that while a procedure executes, it can allocate stack space by moving the stack pointer down (the stack grows downwards). However, it can find local variables, parameters, return addresses, and the old frame pointer (old %rbp) by indexing relative to the frame pointer (%rbp) register, because its value does not change for the duration of the function call.

When a function needs to make a function call, it copies the arguments of the "callee" function (the function to be called) to the registers shown on the right in the x86-64 architecture. For example, the %rdi register will contain the first argument, the %rsi register will contain the second argument, etc. Then the caller saves the current instruction pointer (%rip) into the stack (shown as "return address" in the figure), and changes the instruction pointer to the callee function. At this point the stack pointer (%rsp) points to the return address (shown in the figure). Note that the stack pointer points to the last pushed value in the stack.

The callee function then starts executing. It first pushes the frame pointer value of the caller function (shown as old %rbp) onto the stack, and then sets the frame pointer (%rbp) to the current stack pointer (%rbp = %rsp), so that it points to the old frame pointer value. Then the callee function decrements the stack pointer (shown as %rsp), and uses the space between the frame pointer and the stack pointer for its local variables, for saving or spilling other registers, etc. As an example, these three steps are performed by the first three instructions (push, mov and sub) in the main function shown below. The callee locates its variables, parameters, and the return address by using addresses relative to the fixed frame pointer (%rbp).

To return to the caller, a procedure simply copies the frame pointer (%rbp) to the stack pointer (%rsp = %rbp), effectively releasing the current frame. Then it pops the top stack item into %rbp to restore the %rbp of the caller function, and then uses the ret instruction to pop the old instruction pointer off the stack into the instruction register (%rip), returning control to the caller function. These steps are performed by the last two instructions (leaveq, retq) in the main function shown below.

This is a gdb listing for the test_basic program that we will provide you for this lab. Run gdb on it (or on any program that you have compiled on the lab machines), as shown below, to see the instructions at the start and the end of a function (e.g., main function). Make sure that you understand what these instructions are doing, and are able to answer the questions in the listing below.

$ cd threads
$ gdb test_basic
...
(gdb) b main
Breakpoint 1 at 0x4009ab: file test_basic.c, line 7.
(gdb) r
...

Breakpoint 1, main (argc=1, argv=0x7fffffffe868) at test_basic.c:7
7               thread_init(0);
(gdb) disassemble main
Dump of assembler code for function main:
   0x000000000040099c <+0>:     push   %rbp
   0x000000000040099d <+1>:     mov    %rsp,%rbp
   0x00000000004009a0 <+4>:     sub    $0x10,%rsp
   0x00000000004009a4 <+8>:     mov    %edi,-0x4(%rbp)  // what is happening here?
   0x00000000004009a7 <+11>:    mov    %rsi,-0x10(%rbp) // what is happening here?
=> 0x00000000004009ab <+15>:    mov    $0x0,%edi
   0x00000000004009b0 <+20>:    callq  0x401edc <thread_init>
   0x00000000004009b5 <+25>:    mov    $0x0,%eax
   0x00000000004009ba <+30>:    callq  0x4009dc <test_basic>
   0x00000000004009bf <+35>:    mov    $0x0,%eax
   0x00000000004009c4 <+40>:    leaveq
   0x00000000004009c5 <+41>:    retq
End of assembler dump.

One complication with the POSIX C x86-64 calling convention is that it requires the frame pointer %rbp to be aligned to 16 bytes. This byte alignment means that the value of %rbp, i.e., the stack address to which %rbp points, must be a multiple of 16. Otherwise, system libraries may crash.

When you are creating a new thread that will execute the thread_stub function, you will need to setup the stack so that this calling convention is followed. In particular, you will need to think about the byte alignment for the stack pointer and the frame pointer when control is transferred to thread_stub.
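As a starting point, here is a sketch of how a thread's stack might be allocated dynamically and its top rounded down to a 16-byte boundary. The helper name is made up, and the final offset you place in the saved stack pointer register is deliberately left for you to work out from the calling convention above:

#include <stdint.h>
#include <stdlib.h>
#include "thread.h"   /* THREAD_MIN_STACK, provided with the lab */

/* Sketch of the stack-allocation step of thread_create. */
static void *
allocate_stack(uintptr_t *top_out)
{
    void *stack = malloc(THREAD_MIN_STACK);   /* one stack per thread */
    if (stack == NULL)
        return NULL;   /* caller reports THREAD_NOMEMORY */

    uintptr_t top = (uintptr_t)stack + THREAD_MIN_STACK;  /* stacks grow down */
    top -= top % 16;   /* round down to a 16-byte boundary */

    *top_out = top;
    return stack;      /* keep the base pointer so it can be free()d on exit */
}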

Setup

Make sure to go over the setup instructions for Lab 1, if you have not done so previously.

Add the source files for this lab, available in threads.tar, to your repository, and run make in the newly created threads directory.

cd ~/ece344
tar -xf /cad2/ece344s/src/threads.tar
git status # should say that "threads/" directory is untracked
git add threads
git commit -m "Initial code for Lab 2"
git tag Lab2-start
cd threads
make

Start by implementing show_ucontext.c. Replace all instances of "-1" in the various printf function calls with the appropriate code (also, make sure not to modify any code at the beginning of main, as suggested by the comment there). Make sure that you are able to answer all the questions mentioned in the source file, which will help you understand the stack layout and how a thread's context is saved and restored. It will also clarify the thread context concepts introduced above.

The implementation will also give you a brief exposure to signals, an OS concept similar to hardware interrupts. The code in show_ucontext.c will show you how to disable or enable signals, and how these actions affect the saved thread context.

Make sure to finish implementing show_ucontext.c before moving to the implementation of the threads library. You can use the tester to test the output of your modified show_ucontext.c.

Cooperative Threads API

In this project, you will build a user-level threads package. A key simplifying assumption in this lab is that the threads are cooperative, i.e., each thread runs until it explicitly releases the CPU to another thread by yielding or by exiting. In contrast, preemptive threading systems allow a scheduler to interrupt a running thread at any time and switch the CPU to running a different thread.

The thread package provides several function calls to allow application programs to perform thread management. In addition, there are a few conventions that application programs must adhere to in order to ensure proper and safe operation. A list of the functions that constitute the user-level threads API can be found in the thread.h file. The functions that you will be implementing for this lab are summarized here:

void thread_init(void):
You can use this function to perform any initialization that is needed by your threading system. Here, you should also "custom" create the first user thread in the system. To do so, you should setup the (kernel) thread that is running when your program begins (before any calls to thread_create) as the first user thread in the system (with tid = 0). You will not need to allocate a stack for this thread, because it will run on the (user) stack allocated for this kernel thread by the OS.
Tid thread_id():
This function returns the thread identifier of the currently running thread. The return value should lie between 0 and THREAD_MAX_THREADS - 1. See the solution requirements below.
Tid thread_yield(Tid tid):
This function suspends the caller and activates the thread given by the identifier tid. The caller is put on the ready queue and can be run later in a similar fashion. A reasonable policy is to add the caller to the tail of the ready queue to ensure fairness (so all other threads are run before this thread is scheduled again - see the THREAD_ANY argument below). The value of tid may take the identifier of any available thread. It also can take any of the following constants:
  • THREAD_ANY: tells the thread system to run any thread in the ready queue. A reasonable policy is to run the thread at the head of the ready queue, which ensures fairness. This policy is called FIFO (first-in, first-out), since the thread that first entered the ready queue (among the threads that are currently in the ready queue) is scheduled first. With preemptive scheduling, as introduced in Lab 3, your FIFO scheduler will ensure round-robin scheduling because all threads in the run queue will be scheduled for (roughly) an equal amount of time (a time slot) in a round-robin fashion.
  • THREAD_SELF: tells the thread system to continue the execution of the caller. This function could be implemented as a no-op, but it may be useful to explicitly switch to the current thread for debugging purposes.
The thread_yield function returns the identifier of the thread that took control as a result of the function call. Note that the caller does not get to see the result until it gets its turn to run (later). The function may also fail, in which case the caller resumes immediately. To indicate the reason for failure, the call returns one of these constants:
  • THREAD_INVALID: alerts the caller that the identifier tid does not correspond to a valid thread.
  • THREAD_NONE: alerts the caller that there are no more threads, other than the caller, that are available to run, in response to a call with tid set to THREAD_ANY.
Tid thread_create(void (*fn)(void *), void *arg):
This function creates a thread whose starting point is the function fn. The second argument, arg, is a pointer that will be passed to the function fn when the thread starts executing. The created thread is put on a ready queue but does not start execution yet. The caller of the function continues to execute after the function returns. Upon success, the function returns a thread identifier of type Tid. If the function fails, it returns a value that indicates the reason for failure as follows:
  • THREAD_NOMORE: alerts the caller that the thread package cannot create more threads. See solution requirements below.
  • THREAD_NOMEMORY: alerts the caller that the thread package could not allocate memory to create a stack of the desired size. See solution requirements below.
Tid thread_exit():
This function ensures that the current thread does not run after this call, i.e., this function should not return, unless it fails as described below. A thread that is created later should be able to reuse this thread's identifier, but only after this thread has been destroyed. The function may fail, in which case this thread continues running and the call returns the following:
  • THREAD_NONE: alerts the caller that there are no more other threads that can exit, i.e., this is the last thread in the system.
Tid thread_kill(Tid tid):
This function kills another thread whose identifier is tid. The tid can be the identifier of any available thread. The killed thread should not run any further and the calling thread continues to execute. Upon success, this function returns the identifier of the thread that was killed. Upon failure, it returns the following:
  • THREAD_INVALID: alerts the caller that the identifier tid does not correspond to a valid thread, or is the current thread.

Solution Requirements

The first thread in the system (before the first call to thread_create) should have a thread identifier of 0. Your threads system should support the creation of a maximum of THREAD_MAX_THREADS concurrent threads by a program (including the initial main thread). Thus, the maximum value of the thread identifier should be THREAD_MAX_THREADS - 1 (since thread identifiers start from 0). Note that when a thread exits, its thread identifier can be reused by another thread created later.

Your library must maintain a "thread control block" (a thread structure) for each thread that is running in the system. This is similar to the process control block that an operating system implements to support process management. In addition, your library must maintain a queue of the threads that are ready to run, so that when the current thread yields, the next thread in the ready queue can be run. Your library allows running a fixed number of threads (THREAD_MAX_THREADS threads), so if it is helpful, you could allocate these structures statically (e.g., as a global array).
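For instance (purely as an illustrative sketch; the struct fields, state names, and queue representation below are made up and not required by the lab), the bookkeeping might look like this:

#include <ucontext.h>
#include "thread.h"   /* Tid, THREAD_MAX_THREADS, provided with the lab */

/* Illustrative per-thread states */
enum thread_state { T_FREE, T_READY, T_RUNNING, T_EXITED };

/* Illustrative thread control block */
struct thread {
    Tid id;
    enum thread_state state;
    void *stack;          /* malloc'd stack base, freed when destroyed */
    ucontext_t context;   /* registers saved/restored on a switch */
};

/* Fixed-size table of thread structures, indexed by thread identifier */
static struct thread threads[THREAD_MAX_THREADS];

/* A simple FIFO ready queue of thread identifiers (empty/overflow checks
 * omitted in this sketch) */
static Tid ready_queue[THREAD_MAX_THREADS];
static int rq_head = 0, rq_tail = 0;

static void
ready_queue_push(Tid tid)
{
    ready_queue[rq_tail] = tid;
    rq_tail = (rq_tail + 1) % THREAD_MAX_THREADS;
}

static Tid
ready_queue_pop(void)
{
    Tid tid = ready_queue[rq_head];
    rq_head = (rq_head + 1) % THREAD_MAX_THREADS;
    return tid;
}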

Each thread should have a stack of size at least THREAD_MIN_STACK. Your implementation must not statically allocate all stacks at initialization time (e.g., using a global data structure). Instead, you must dynamically allocate a stack (e.g., using malloc()) whenever a new thread is created (and free it each time a thread is destroyed).

As noted above, your library must use getcontext and setcontext to save and restore thread context state, but it may not use makecontext, or swapcontext or any other existing code to manipulate a thread's context; you need to write the code to do that yourself.

Your code must not make calls to any existing threads libraries (e.g., Linux pthreads), or borrow code from these libraries for this lab.

You may borrow any code for data structures, e.g., linked list, that you have implemented in the previous ECE344 labs. Do not use any code from other students, or from code available on the Internet. When in doubt, please ask us.

Hints and Advice

This project does not require writing a large number of lines of code. It does require you to think carefully about the code you write. Before you dive into writing code, it will pay to spend time planning and understanding the code you are going to write. If you think the problem through from beginning to end, this project will not be too hard. If you try to hack your way out of trouble, you will spend many frustrating nights in the lab.

As a start, here are some questions you should answer before you write code.

We strongly recommend that your first milestone be to get thread_yield(THREAD_SELF) working for the initial thread (i.e., your implementation stores and then restores the caller's state). Get this working before you try to implement thread_create or thread_exit.

Use a debugger. As an exercise, put a breakpoint at the instruction after you copy the current thread's state using getcontext. You can print the current values of the registers (in gdb, type info registers).

(gdb) info registers
rax            0x0      0
rbx            0x0      0
rcx            0x7ffff7a926b1   140737348445873
rdx            0x605910 6314256
rsi            0x0      0
rdi            0x0      0
rbp            0x7fffffffdec0   0x7fffffffdec0
rsp            0x7fffffffdea0   0x7fffffffdea0
r8             0x7ffff7dd8e60   140737351880288
r9             0x7ffff7a55c60   140737348197472
r10            0x8      8
r11            0x246    582
r12            0x400970 4196720
r13            0x7fffffffe850   140737488349264
r14            0x0      0
r15            0x0      0
rip            0x4024de 0x4024de
eflags         0x246    [ PF ZF IF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0

You can print the values stored in your thread struct and the thread context. For example, say current is a pointer to the thread structure associated with the currently running thread, and context is a field in this structure that stores the thread context. Then, in gdb, you can use p/x current->context to print the context stored by a call to getcontext.

You may find this particularly useful in making sure that the state you "restore" when you run a newly-created thread for the first time makes sense.

Frequently Asked Questions

We have provided answers to various frequently asked questions (FAQ) about the lab, including many questions that students have asked in previous years. Make sure to go over them as you start working on the lab; it will save you time.

Start early, we mean it!

Testing Your Code

We have provided the program test_basic.c for testing this lab. Use it to test your code.

You can also test your code by using our auto-tester program at any time by following the testing instructions.

Using Git

You should only modify the following files in this lab.

show_ucontext.c
thread.c

You can find the files you have modified by running the git status command.

You can commit your modified files to your local repository as follows:

git add show_ucontext.c thread.c
git commit -m "Committing changes for Lab 2"

We suggest committing your changes frequently by rerunning the commands above (with different meaningful messages to the commit command), so that you can go back to see the changes you have made over time, if needed.

Once you have tested your code, and committed it (check that by running git status), you can tag the assignment as done.

git tag Lab2-end

This tag names the last commit, and you can see that using the git log or the git show commands.

If you want to see all the changes you have made in this lab, you can run the following git diff command.

git diff Lab2-start Lab2-end

More information for using the various git commands is available in the Lab 1 instructions.

Code Submission

Make sure to add the Lab2-end tag to your local repository as described above. Then run the following command to update your remote repository:

git push
git push --tags

For more details regarding code submission, please follow the lab submission instructions.

Please also make sure to test whether your submission succeeded by simulating our automated marker.