Sunday, February 2, 2025

Asynchronous programming 3 - some assistance

I was saying it 20 years ago, and 15 years ago in the TPOPP book, and I'm still saying it now: asynchronous programming has to be assisted by a compiler, otherwise it's just a huge pain of manually doing things that a compiler normally does. Fortunately, I think we now have an out-of-the-box solution: the coroutines in C++20. I haven't yet tried to do an actual implementation with them, but they look like the right thing. You define your Promise class (note that coroutines don't differentiate between the Future and Promise sides and call everything a Promise), and then the coroutine statements take that Promise class as a template argument and arrange the splitting of the sequential code into fragments. The explicit parallelism you still do on your own.

Another solution that I've played with, doing a partial implementation, would work with plain C too: a preprocessor. It can be done in some smart way, as a whole pre-parser like cfront of yore, but a lot can be achieved even with the standard C preprocessor. The only trick is to generate unique function names, and this can be done with the macro __LINE__. Since the line number stays the same within a macro invocation, each invocation gets a unique number that can be used repeatedly within the macro body. In modern C++, of course, we could also use lambdas, making the naming issue moot; it's more of a plain C issue.
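One detail that is easy to trip over: the ## operator pastes its operands before they get macro-expanded, so writing prefix##__LINE__ directly produces the literal name prefix__LINE__. The usual way around it is a two-level helper macro. Here is a minimal sketch of the naming trick; ASYNC_CAT, UNIQUE_NAME and DEFINE_STEP are names made up for this illustration, not part of any library:

```cpp
#include <cassert>

// Two levels are needed: ASYNC_CAT expands its arguments (turning
// __LINE__ into a number), and only then ASYNC_CAT2 pastes them.
#define ASYNC_CAT2(a, b) a##b
#define ASYNC_CAT(a, b) ASYNC_CAT2(a, b)
#define UNIQUE_NAME(prefix) ASYNC_CAT(prefix, __LINE__)

// Hypothetical macro that uses the same generated name three times
// within one invocation: to declare the function, to take its
// address, and to define it.
#define DEFINE_STEP(prefix, ptr_name, value) \
  static int UNIQUE_NAME(prefix)(void); \
  static int (*ptr_name)(void) = UNIQUE_NAME(prefix); \
  static int UNIQUE_NAME(prefix)(void) { return (value); }

// Each invocation sits on its own line, so each gets its own name.
DEFINE_STEP(cont, first_step, 10)
DEFINE_STEP(cont, second_step, 20)
```

Calling first_step() and second_step() reaches two different generated functions. The obvious limitation is that two invocations on the same source line, or in two files that end up in the same namespace, would collide.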

The most difficult part is that we'll need to use the same call and return macros in both the "header" part of the function and the "continuation" parts, which means that all these functions have to have the same result type and return the value in the same way. So let's take the example from the last post and reformat it to fit this approach. The original example from the previous installment was:

struct FuncContext {
  int a;
};

shared_ptr<Future<int>> func()
{
  auto ctx = make_shared<FuncContext>();
  shared_ptr<Future<int>> futa = get_a();
  return futa->chain(func2, ctx) // use default executor
    ->chain(func3, ctx);
}

void func2(shared_ptr<FuncContext> ctx, shared_ptr<Future<int>> arg, shared_ptr<Promise<int>> result) {
  ctx->a = arg->value();
}

void func3(shared_ptr<FuncContext> ctx, shared_ptr<Future<int>> arg, shared_ptr<Promise<int>> result) {
  int b = arg->value();
  result->return_value(ctx->a + b);
}

To get the same return type throughout, we change the "header" part to return void and pass the returned future back via an argument.

The other problem is the type of that return promise's value: carrying it through all the "continuation" parts is difficult, so we'd have to revert to the base promise type that doesn't care about the return value and cast it only when setting the value. This base type has to exist for the scheduler to juggle all these promises in its queues. Also, remember, the premise here is that coroutines are not available, which would often mean plain C, and there the promises can't be templatized in the first place.
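To make the base-type idea concrete, here is a minimal sketch of how a continuation can hold only the untyped base pointer and cast it at the moment of setting the value. The PromiseBase and Promise classes below are stripped-down stand-ins written for this illustration (no scheduling, no chaining), and finish_with_sum is a made-up name:

```cpp
#include <cassert>
#include <memory>

// Stand-in for the type that a scheduler would keep in its queues:
// it knows whether the promise is completed but not the value type.
struct PromiseBase {
  virtual ~PromiseBase() = default;
  bool completed = false;
};

// Stand-in for the typed promise that carries the value.
template <typename T>
struct Promise : PromiseBase {
  T stored{};
  void return_value(const T& v) {
    stored = v;
    completed = true;
  }
};

// A continuation sees only the base type and casts at the last moment.
// The cast is unchecked: the caller has to guarantee that the object
// really is a Promise<int>.
void finish_with_sum(int a, int b, std::shared_ptr<PromiseBase> result) {
  static_cast<Promise<int>*>(result.get())->return_value(a + b);
}
```

The function that created the promise still holds the typed pointer, so after something like finish_with_sum(2, 3, p) it can read the value without any casts. In C++, std::static_pointer_cast would do the same cast while keeping the result a shared_ptr; in plain C the equivalent would be a struct that embeds the base struct as its first member.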

The code becomes:

struct FuncContext {
  int a;
};

void func(shared_ptr<Future<int>>* result_future)
{
  auto ctx = make_shared<FuncContext>();
  auto result = make_shared<Promise<int>>();
  *result_future = result->to_future();
  shared_ptr<Future<int>> fut_cont;
  get_a(&fut_cont);
  fut_cont->chain(func2, ctx)->chain(result);
}

void func2(shared_ptr<FuncContext> ctx, shared_ptr<Future<int>> arg, shared_ptr<PromiseBase> result) {
  ctx->a = arg->value();
  shared_ptr<Future<int>> fut_cont;
  get_b(ctx->a, &fut_cont);
  fut_cont->chain(func3, ctx)->chain(result);
}

void func3(shared_ptr<FuncContext> ctx, shared_ptr<Future<int>> arg, shared_ptr<PromiseBase> result) {
  int b = arg->value();
  static_cast<Promise<int>*>(result.get())->return_value(ctx->a + b);
}

Then we want to make it look like this:

ASYNC_FUNC_0ARG(func, int, {
  int a; // this is the context
}) {
  ASYNC_CALL_0ARG(func, ctx->a, int, get_a);
  ASYNC_CALL_1ARG(func, int b, int, get_b, ctx->a);
  ASYNC_FUNC_RETURN(int, ctx->a + b);
} ASYNC_FUNC_END

Here for simplicity I've just used separate macros for definitions and calls of functions with different numbers of arguments. It's definitely possible to use macros with a variable number of arguments; it's just not something that I use often, and I'm too lazy to look it up now. The invocation of ASYNC_FUNC_END is needed to balance out the curly braces. The name of the calling function is needed in the CALL macros to refer to the context type name; this unfortunately can't be avoided, and then incidentally it can be used to generate the names of the continuation functions. Alternatively, we could define the function name as a macro before the function definition and undef it afterwards, then everything in between could just use that macro for the function name.
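For the record, here is a sketch of what the variable-argument version of the call could look like. It relies on __VA_OPT__, which is standard since C++20 and C23 and is accepted by recent GCC and Clang in older modes too (possibly with a warning). CALL_WITH_OUT and the three get_* functions are made up for this illustration; they only follow the convention used above, inputs first and the pointer for the result last:

```cpp
#include <cassert>

// Hypothetical callees: inputs first, pointer to the result last.
static void get_a(int* out) { *out = 5; }
static void get_b(int a, int* out) { *out = a * 2; }
static void get_c(int a, int b, int* out) { *out = a + b; }

// __VA_OPT__(,) emits the comma only when there is at least one
// variadic argument, so the zero-argument call doesn't end up as
// call(, out_ptr).
#define CALL_WITH_OUT(call, out_ptr, ...) \
  call(__VA_ARGS__ __VA_OPT__(,) out_ptr)
```

With this, CALL_WITH_OUT(get_a, &a), CALL_WITH_OUT(get_b, &b, a) and CALL_WITH_OUT(get_c, &c, a, b) all go through the same macro, so one ASYNC_CALL could replace the whole _0ARG, _1ARG family. On compilers without __VA_OPT__ the GNU extension ##__VA_ARGS__ plays a similar role, but it isn't portable.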

There is a bit of ugliness but still, looks much shorter and simpler than before, doesn't it? Now all we do is to define the macros that will translate one into another by copy-pasting from the long example (I haven't actually tried these macros right now, so they might contain small bugs but it shows the idea, and I did get a similar system working in the past):

#define ASYNC_FUNC_0ARG(fname, func_return_type, context_body) \
struct fname##Context context_body; \
void fname(shared_ptr<Future<func_return_type>>* result_future) \
{ \
  using return_type = func_return_type; \
  auto ctx = make_shared<fname##Context>(); \
  auto result = make_shared<Promise<return_type>>(); \
  *result_future = result->to_future();

#define ASYNC_FUNC_END }

// __LINE__ has to be expanded before pasting, hence the two-level helper.
#define ASYNC_CAT2(a, b) a##b
#define ASYNC_CAT(a, b) ASYNC_CAT2(a, b)

// The continuation is declared without "static" because a function
// declaration inside a block can't have that storage class.
#define ASYNC_CALL_0ARG(fname, assign_to, call_return_type, call) \
    shared_ptr<Future<call_return_type>> fut_cont; \
    call(&fut_cont); \
    void ASYNC_CAT(fname, __LINE__)(shared_ptr<fname##Context> ctx, shared_ptr<Future<call_return_type>> arg, shared_ptr<PromiseBase> result); \
    fut_cont->chain(ASYNC_CAT(fname, __LINE__), ctx)->chain(result); \
  } \
} \
void ASYNC_CAT(fname, __LINE__)(shared_ptr<fname##Context> ctx, shared_ptr<Future<call_return_type>> arg, shared_ptr<PromiseBase> result) { \
  { \
    assign_to = arg->value();

#define ASYNC_CALL_1ARG(fname, assign_to, call_return_type, call, call_arg1) \
    shared_ptr<Future<call_return_type>> fut_cont; \
    call(call_arg1, &fut_cont); \
    void ASYNC_CAT(fname, __LINE__)(shared_ptr<fname##Context> ctx, shared_ptr<Future<call_return_type>> arg, shared_ptr<PromiseBase> result); \
    fut_cont->chain(ASYNC_CAT(fname, __LINE__), ctx)->chain(result); \
  } \
} \
void ASYNC_CAT(fname, __LINE__)(shared_ptr<fname##Context> ctx, shared_ptr<Future<call_return_type>> arg, shared_ptr<PromiseBase> result) { \
  { \
    assign_to = arg->value();

#define ASYNC_FUNC_RETURN(return_type, expr) \
  static_cast<Promise<return_type>*>(result.get())->return_value(expr)

There are a couple more things to explain in the ASYNC_CALL macros. One is that they have to declare the continuation function before using it. This is something that I've glossed over before, because if you write these continuation functions manually, you'd collect all the declarations up front. But if they're generated on the fly, the declarations also have to come on the fly. In principle these functions could be static because they're not called from outside the file, but neither C nor C++ accepts the static keyword on a function declaration inside a block, so with the declaration generated in the middle of the function body they have to keep external linkage. The second thing is that the current function gets closed with two curly braces, and the next one gets opened with two curly braces. This is because ASYNC_FUNC opens the function with a curly brace for the generated definitions, then another brace comes after the macro, and then we need to maintain the same brace depth throughout.
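Since the macros above depend on a futures library that isn't shown here, the two tricks (declaring the next fragment on the fly and keeping the brace depth at two) can be tried out on a toy version. In this sketch the futures are replaced with a trivial run queue, and everything in it (Ctx, run_later, run_all, and the FRAG_* macros) is made up for the illustration:

```cpp
#include <cassert>
#include <deque>
#include <functional>

// Two-level pasting so that __LINE__ is expanded before ## is applied.
#define CAT2(a, b) a##b
#define CAT(a, b) CAT2(a, b)

struct Ctx { int a = 0; int b = 0; int result = 0; };

// Toy scheduler: fragments wait in a queue until run_all() is called.
static std::deque<std::function<void()>> run_queue;
static void run_later(void (*frag)(Ctx*), Ctx* ctx) {
  run_queue.push_back([frag, ctx] { frag(ctx); });
}
static void run_all() {
  while (!run_queue.empty()) {
    auto f = std::move(run_queue.front());
    run_queue.pop_front();
    f();
  }
}

// The macro opens the function, and the user's own brace follows,
// so the user code always sits at brace depth two.
#define FRAG_FUNC(fname) void fname(Ctx* ctx) {
#define FRAG_END }

// Declare the next fragment, schedule it, close both braces of the
// current fragment, and open the next fragment with two braces.
#define FRAG_SPLIT(fname) \
      void CAT(fname, __LINE__)(Ctx*); \
      run_later(CAT(fname, __LINE__), ctx); \
    } \
  } \
  void CAT(fname, __LINE__)(Ctx* ctx) { \
    {

FRAG_FUNC(compute) {
  ctx->a = 1;
  FRAG_SPLIT(compute)
  ctx->b = 2;
  FRAG_SPLIT(compute)
  ctx->result = ctx->a + ctx->b;
} FRAG_END
```

Calling compute(&c) runs only the first fragment and queues the second one; run_all() then drains the queue, running the remaining fragments in order. The real macros differ in that the next fragment is attached to a future instead of going directly into a queue, but the brace bookkeeping is the same.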

Note that the execution of the asynchronous functions here is strictly sequential, with no ifs or loops. However, similar macros can be made for ifs and loops, and if I ever get around to transforming this text into a chapter for a newer version of my book on parallel programming, I'll do them too. They'd be ugly but still better than writing things manually. And a specialized preprocessor like cfront could reduce the ugliness of having to repeat the names that can't be remembered between the C preprocessor macros and of having to explicitly specify the level of nesting for the ifs and loops.

