Coroutine

A coroutine is a function or routine that can pause its execution, yield control back to the caller, and later be resumed from where it left off. Unlike traditional subroutines or functions that run from start to finish, coroutines have multiple entry and exit points, enabling cooperative multitasking or asynchronous processing. Coroutines can be implemented using a variety of techniques, each offering different levels of control, complexity, and performance.

Coroutines can suspend execution (yield) at certain points and resume from there later. They rely on the program's flow or a scheduler to determine when to yield or resume, instead of being preemptively managed by the operating system like threads. When a coroutine is paused, its state (e.g. variables, execution context) is saved, so it can pick up exactly where it left off. Coroutines are more lightweight than threads or processes because they don't require the overhead of system-managed context switching.

Coroutines are widely used in scenarios that require efficient handling of asynchronous or I/O-bound tasks. They allow for non-blocking operations and offer lightweight multitasking.

Stackful vs. Stackless

Stackful coroutines

Stackful coroutines resemble traditional threads: each coroutine has its own separate stack. When a stackful coroutine is paused (typically via yield), its entire state, including the call stack, is preserved, so it resumes exactly where it left off with its local state intact. Because the whole stack is kept, a stackful coroutine can yield from inside nested or recursive function calls, and each coroutine's state is isolated from the others. The price is memory, since every coroutine typically needs its own stack allocated up front.

Languages like Lua and C++ (with libraries like Boost.Coroutine) use stackful coroutines.
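To make the "yield from inside nested calls" point concrete, here is a minimal sketch using ucontext. It assumes a Linux/glibc system; the names co_yield_value, count_down and run_nested_yield_demo, as well as the 64 KiB stack size, are made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

#define CO_STACK_SIZE (64 * 1024) /* assumed size, generous enough for printf */

static ucontext_t caller_ctx; /* context of whoever resumed the coroutine */
static ucontext_t co_ctx;     /* context of the coroutine itself */
static int yielded_value;     /* value handed to the caller on each yield */

/* Suspend the coroutine and hand `value` to the caller. */
static void co_yield_value(int value) {
  yielded_value = value;
  swapcontext(&co_ctx, &caller_ctx);
}

/* Recursive helper: every yield happens one frame deeper than the last. */
static void count_down(int n) {
  if (n == 0) return;
  co_yield_value(n); /* suspends here, in the middle of the recursion */
  count_down(n - 1);
}

static void coroutine_entry(void) {
  count_down(3);
}

/* Hypothetical driver: resumes the coroutine three times and sums the values. */
int run_nested_yield_demo(void) {
  void *stack = malloc(CO_STACK_SIZE);
  assert(stack != NULL);

  getcontext(&co_ctx);
  co_ctx.uc_link = &caller_ctx; /* where to go when coroutine_entry returns */
  co_ctx.uc_stack.ss_sp = stack;
  co_ctx.uc_stack.ss_size = CO_STACK_SIZE;
  makecontext(&co_ctx, coroutine_entry, 0);

  int sum = 0;
  for (int i = 0; i < 3; i++) {
    swapcontext(&caller_ctx, &co_ctx); /* resume until the next yield */
    printf("yielded %d\n", yielded_value);
    sum += yielded_value;
  }

  swapcontext(&caller_ctx, &co_ctx); /* let the coroutine run to completion */
  free(stack);
  return sum;
}
```

Calling run_nested_yield_demo() from main prints "yielded 3", "yielded 2" and "yielded 1" and returns 6. The recursion frames of count_down stay alive on the coroutine's private stack between resumes, which is exactly what a stackless coroutine cannot do.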

Stackless coroutines

Stackless coroutines, unlike stackful ones, do not have their own call stack. Instead, they run on the stack of whoever resumes them and, when yielding, save only minimal state: the variables that must survive the suspension and the program's execution position. This makes them lightweight, using minimal memory, which is ideal for handling many concurrent tasks efficiently. Because there's no need to save and restore a full call stack, context switching is faster. However, they are more limited: a stackless coroutine can only suspend from its own body, not from inside a regular function it calls. Patterns that yield from deep within nested or recursive calls are therefore not feasible unless every function along the call chain is itself a coroutine.

Languages like Python and JavaScript use stackless coroutines.
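C has no built-in stackless coroutines, but the idea of "saving only the variables and the execution position" can be sketched by hand. The example below is an illustration, not how Python or JavaScript implement it; range_co, range_next and sum_range are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Everything the coroutine needs to survive a suspension lives here,
   not on a private stack. */
typedef struct {
  int state; /* resume position: 0 = start, 1 = inside the loop, 2 = finished */
  int i;     /* loop variable that must survive across yields */
  int limit; /* the coroutine produces 0 .. limit-1 */
} range_co;

/* Resumes the coroutine. Returns true and stores the next value in *out,
   or returns false once the range is exhausted. */
static bool range_next(range_co *co, int *out) {
  assert(co != NULL && out != NULL);
  switch (co->state) {
  case 0:
    for (co->i = 0; co->i < co->limit; co->i++) {
      co->state = 1;
      *out = co->i;
      return true; /* "yield": hand a value back to the caller... */
  case 1:;         /* ...and re-enter the loop body here on the next call */
    }
    co->state = 2;
    /* fall through */
  default:
    return false;
  }
}

/* Hypothetical driver: drains the coroutine and sums what it yields. */
int sum_range(int limit) {
  range_co co = { .state = 0, .i = 0, .limit = limit };
  int value = 0;
  int sum = 0;
  while (range_next(&co, &value)) {
    printf("got %d\n", value);
    sum += value;
  }
  return sum;
}
```

sum_range(4) prints "got 0" through "got 3" and returns 6. Note that range_next can only "yield" with a return from its own body; a helper function called from it would have no way to suspend the coroutine, which is the limitation described above.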

Implementations

Coroutines can be implemented in various ways. Low-level approaches like setjmp/longjmp and ucontext in C save and restore execution state for context switching, while more modern, high-level approaches like async/await in languages like Python and JavaScript manage suspension and resumption automatically via event loops. Low-level techniques, including assembly and manual stack management, offer fine control but are complex and error-prone. Green threads and coroutine libraries provide lightweight, cooperative multitasking with simpler management. OS-level threads add preemptive scheduling and true parallelism across cores, but at the cost of higher per-thread overhead. Each method balances control, complexity, performance, and portability differently, with higher-level solutions generally being easier to use and maintain.
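The role of the scheduler or event loop can be sketched in a few lines: it repeatedly resumes every task that still has work, and each task returns to the loop after one slice. This is a simplified round-robin stand-in for a real event loop, which would also wait for I/O readiness; the task struct, task_resume, run_round_robin and the trace buffer are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* A task is a tiny stackless coroutine: its whole state is the step counter. */
typedef struct {
  const char *name;
  int step;        /* how many slices the task has run so far */
  int total_steps; /* the task is finished after this many slices */
} task;

/* Runs one slice of work, then returns to the scheduler (the "yield").
   Returns true while the task still has work left. */
static bool task_resume(task *t, char *trace, size_t trace_size) {
  t->step++;
  printf("%s: step %d of %d\n", t->name, t->step, t->total_steps);

  /* Append the task's initial so the interleaving can be inspected. */
  size_t used = strlen(trace);
  if (used + 1 < trace_size) {
    trace[used] = t->name[0];
    trace[used + 1] = '\0';
  }
  return t->step < t->total_steps;
}

/* Round-robin scheduler: keeps resuming unfinished tasks until none remain. */
void run_round_robin(task *tasks, int count, char *trace, size_t trace_size) {
  assert(trace_size > 0);
  trace[0] = '\0';

  bool any_alive = true;
  while (any_alive) {
    any_alive = false;
    for (int i = 0; i < count; i++) {
      if (tasks[i].step >= tasks[i].total_steps) {
        continue; /* this task has already finished */
      }
      if (task_resume(&tasks[i], trace, trace_size)) {
        any_alive = true;
      }
    }
  }
}
```

With three tasks A, B and C that need 2, 3 and 1 slices, the trace comes out as "ABCABB": the tasks interleave on a single thread, and nothing switches away from a task until it returns on its own.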

setjmp / longjmp

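While setjmp / longjmp can be used to simulate coroutines in C or C++, they come with many challenges. The saved context is only valid while the function that called setjmp is still active, so jumping back into a function that has already returned corrupts the stack. State has to be managed by hand, behavior differs across platforms, and resources are not cleaned up properly (in C++, for example, destructors of the skipped frames are not reliably run).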

#include <setjmp.h>
#include <stdio.h>

// define a global jmp_buf variable to store the execution context
jmp_buf buf;

void function() {
  printf("calling setjmp\n");

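  // saves the current execution context; the first (direct) call returns 0
  // note: ISO C only guarantees setjmp in simple contexts such as a comparison,
  // assigning its result like this works on common compilers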
  int res = setjmp(buf);

  printf("setjmp res: %d\n", res);

  if (res != 0) {
    printf("done\n");
    return;
  }


  // calling longjmp to jump back to the setjmp point with a non-zero return value
  printf("calling longjmp\n");
  longjmp(buf, 1);
}

int main() {
  function();
  return 0;
}

Output:
calling setjmp
setjmp res: 0
calling longjmp
setjmp res: 1
done
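The one direction in which longjmp is well defined is jumping out of nested calls back to a function that is still active. The sketch below shows that case, and with it the reason setjmp / longjmp alone make poor coroutines: once the jump has happened, the abandoned frames are gone and cannot be resumed. scan, find_index and found_index are hypothetical names.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdio.h>

static jmp_buf escape_point; /* context saved inside find_index */
static int found_index;      /* result passed across the jump */

/* Recursive search: one stack frame per element examined. */
static void scan(const int *items, int count, int target, int i) {
  if (i == count) {
    return;
  }
  if (items[i] == target) {
    found_index = i;
    longjmp(escape_point, 1); /* abandons every nested scan() frame at once */
  }
  scan(items, count, target, i + 1);
}

/* Returns the index of target in items, or -1 if it is not present. */
int find_index(const int *items, int count, int target) {
  assert(items != NULL || count == 0);
  found_index = -1;

  if (setjmp(escape_point) == 0) {
    /* Direct return from setjmp: start the search. */
    scan(items, count, target, 0);
    printf("target %d not found\n", target);
  } else {
    /* We arrived here through longjmp; the scan() frames no longer exist. */
    printf("target %d found at index %d\n", target, found_index);
  }
  return found_index;
}
```

For the array {4, 8, 15, 16}, find_index returns 2 for target 15 and -1 for target 42. The jump is safe only because find_index is still on the stack when longjmp runs; going the other way, back into scan after it was abandoned, is what a real coroutine would need and what setjmp / longjmp cannot do portably.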

ucontext

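ucontext in C provides functions (getcontext, setcontext, makecontext, swapcontext) to get and set execution contexts and control the flow of execution. It is similar to setjmp / longjmp but gives more control over the saved state, including registers and the stack, so each context can run on its own stack. Note that these functions were removed from POSIX.1-2008, although glibc on Linux still provides them.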

#include <stdlib.h>
#include <stddef.h>
#include <stdio.h>
#include <ucontext.h>

// printf alone can need several KiB of stack, so 2048 bytes risks an overflow
#define STACK_SIZE (64 * 1024)

ucontext_t main_ctx;
ucontext_t ctx[3];

void func1() {
  printf("func1 first\n");
  // saves the current context into the first argument
  // and switches to the context provided as the second argument (main context)
  swapcontext(&ctx[0], &main_ctx);
  printf("func1 second\n");
}

void func2() {
  printf("func2 first\n");
  swapcontext(&ctx[1], &main_ctx);
  printf("func2 second\n");
}

void func3() {
  printf("func3 first\n");
  swapcontext(&ctx[2], &main_ctx);
  printf("func3 second\n");
}

int main() {
  // init main context
  getcontext(&main_ctx);

  // setup context for func1
  getcontext(&ctx[0]);
  ctx[0].uc_link = &main_ctx; // after func1 finishes, it will return to main
  ctx[0].uc_stack.ss_sp = malloc(STACK_SIZE); // allocate stack for func1, the size depends on what we do in the function
  ctx[0].uc_stack.ss_size = STACK_SIZE;
  makecontext(&ctx[0], func1, 0); // set func1 to run in this context

  // setup context for func2
  getcontext(&ctx[1]);
  ctx[1].uc_link = &main_ctx;
  ctx[1].uc_stack.ss_sp = malloc(STACK_SIZE);
  ctx[1].uc_stack.ss_size = STACK_SIZE;
  makecontext(&ctx[1], func2, 0); 

  // setup context for func3
  getcontext(&ctx[2]);
  ctx[2].uc_link = &main_ctx;
  ctx[2].uc_stack.ss_sp = malloc(STACK_SIZE);
  ctx[2].uc_stack.ss_size = STACK_SIZE;
  makecontext(&ctx[2], func3, 0); 

  // acts as a simple scheduler
  int i;
  for (i = 0; i < 6; i++) {
    printf("switching from main to ctx %d\n", i % 3);
    swapcontext(&main_ctx, &ctx[i % 3]);
  }

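  // release the coroutine stacks now that every context has finished
  for (i = 0; i < 3; i++) {
    free(ctx[i].uc_stack.ss_sp);
  }

  printf("done\n");
  return 0;
}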
ctx2                  ----
                      |  |     
ctx1         ----     |  |     
             |  |     |  |     
ctx0  ----   |  |     |  |     
      |  |   |  |     |  |     
main -----------------------→→→

Output:
switching from main to ctx 0
func1 first
switching from main to ctx 1
func2 first
switching from main to ctx 2
func3 first
switching from main to ctx 0
func1 second
switching from main to ctx 1
func2 second
switching from main to ctx 2
func3 second
done
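The three nearly identical functions above can be folded into one, because makecontext can pass int arguments to the entry function. The following variation is a sketch under the same Linux/glibc assumption; worker, run_workers, the resumes counter and the 64 KiB stack size are hypothetical choices for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

#define WORKERS 3
#define WORKER_STACK_SIZE (64 * 1024) /* assumed size, generous enough for printf */

static ucontext_t scheduler_ctx;
static ucontext_t worker_ctx[WORKERS];
static int resumes[WORKERS]; /* how many slices each worker has run */

/* One entry function for every worker; id selects its context slot. */
static void worker(int id) {
  printf("worker %d first\n", id);
  resumes[id]++;
  swapcontext(&worker_ctx[id], &scheduler_ctx); /* yield to the scheduler */
  printf("worker %d second\n", id);
  resumes[id]++;
}

/* Hypothetical driver: returns the total number of slices that were run. */
int run_workers(void) {
  void *stacks[WORKERS];

  for (int id = 0; id < WORKERS; id++) {
    stacks[id] = malloc(WORKER_STACK_SIZE);
    assert(stacks[id] != NULL);
    resumes[id] = 0;

    getcontext(&worker_ctx[id]);
    worker_ctx[id].uc_link = &scheduler_ctx; /* return here when worker ends */
    worker_ctx[id].uc_stack.ss_sp = stacks[id];
    worker_ctx[id].uc_stack.ss_size = WORKER_STACK_SIZE;
    /* The third argument is the number of int arguments that follow. */
    makecontext(&worker_ctx[id], (void (*)(void))worker, 1, id);
  }

  /* Same round-robin schedule as before: two passes over all workers. */
  for (int i = 0; i < 2 * WORKERS; i++) {
    swapcontext(&scheduler_ctx, &worker_ctx[i % WORKERS]);
  }

  int total = 0;
  for (int id = 0; id < WORKERS; id++) {
    total += resumes[id];
    free(stacks[id]);
  }
  return total;
}
```

run_workers() prints the "first" line of workers 0, 1 and 2, then their "second" lines in the same order, and returns 6. Arguments passed through makecontext must be of type int, so a pointer would have to be passed indirectly, for example as an index into a table as done here.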

assembly

Coroutines can also be implemented using assembly. In assembly, we need to manage things like saving and restoring registers, switching stack pointers, and maintaining the state of execution across different coroutine calls or execution contexts.

; assume on x86-64
; non-functional, just to show some ideas
; coroutine_start: Coroutine entry point
coroutine_start:
    ; Save current state (e.g., registers, stack pointer)
    push rbx        ; Save registers
    push rdi
    push rsi
    push rdx

    ; Simulate some coroutine work
    ; Your coroutine code goes here

    ; Restore the registers saved above, in reverse order of the pushes
    pop rdx
    pop rsi
    pop rdi
    pop rbx

    ; Return to the caller (a real coroutine would call coroutine_switch to yield instead)
    ret

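; coroutine_switch: Coroutine switch function
; state_ptr / other_state_ptr are assumed to hold the addresses of the slots
; where each coroutine's stack pointer is stored
coroutine_switch:
    ; Save the callee-saved registers (System V ABI) on the current stack
    push rbp
    push rbx
    push r12
    push r13
    push r14
    push r15

    ; Save the current coroutine state
    mov rdi, [state_ptr]   ; Load state pointer
    mov [rdi], rsp         ; Save the current stack pointer

    ; Switch to the other coroutine's state
    mov rsi, [other_state_ptr]  ; Load other state pointer
    mov rsp, [rsi]             ; Restore the other coroutine's stack pointer

    ; Restore the registers the other coroutine pushed when it was suspended
    pop r15
    pop r14
    pop r13
    pop r12
    pop rbx
    pop rbp

    ; Return control to the other coroutine: ret pops the return address
    ; that was left on its stack when it called coroutine_switch
    ret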