Sergey Babkin on CEP and stuff: Asynchronous programming 2

For the sake of a quick introduction, I've glossed over some things in part 1. Here I want to come back and show them.

Let's start with a small code snippet similar to what was shown in part 1:

func(context, arg)
{
...
}

Future fut1;
fut1.chain(func, context, executor);

Note that there are no types in this snippet: I've dropped them to avoid getting mired in them. Let's fill them in now, keeping in mind that the answer might vary depending on the specific asynchronous library.

Let's start with the function argument arg. Note that it's not explicitly mentioned anywhere in chain(). That's because the argument comes from the future fut1: it's the value that gets stored in it. So if, say, the type of fut1 is actually Future<int>, the argument might actually be

int arg

but the more typical solution is to pass the whole input future as an argument:

Future<int> arg

Except that normally the futures wouldn't be copied but passed by reference. And considering in how many places they get referred to, the only reasonable way is to use either reference counting or garbage collection. Reference counting is more natural for C++ and C, so the type would become:

shared_ptr<Future<int>> arg

Next, what is the function's return value? Being an asynchronous function, its return value must be returned through a Promise. Moreover, that Promise's Future side needs to be returned at the time of the chaining, so the chaining becomes (assuming that the returned value is of type double):

shared_ptr<Future<int>> fut1;
shared_ptr<Future<double>> fut2 = fut1->chain(func, context, executor);

But how will the function know where to return that value? It has to receive that result Promise as an argument too:

void func(context, shared_ptr<Future<int>> arg, shared_ptr<Promise<double>> result);

Since the result is returned through an argument, the function's normal return type becomes void. It's the responsibility of the function to make sure that the result promise will be completed, but this doesn't have to happen by the time the function returns. Instead it can schedule some other code that will complete this promise later (for example, by chaining it from some other future that it creates). Which, yes, is another potential source of errors: the promise completion gets forgotten in one of the branches of execution, and the rest of the logic gets deadlocked waiting for it.

The way to debug this is to have the library keep track of the futures that have some dependency chained to them but haven't been completed, haven't been scheduled to run, and haven't been chained to something else. However this can also be a normal intermediate state of a future that is still being prepared, or of a future stored in some data structure to be found and completed later, so the program can't just abort every time it sees such a future. Instead it has to be a tool that can run and list all the suspicious futures whenever a deadlock is suspected. Or there can be a special flag that lets the future be temporarily exempted, and that gets cleared on exiting the constructing scope unless explicitly preserved. Then any non-compliant future without this flag can be an immediate reason for a program abort, but if the flag gets mismanaged, the deadlocks could still happen. As I've said many times before, asynchronous programming is fragile and hard to debug.
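As a rough illustration of the "list the suspicious futures" tool, here is a minimal C++ sketch. Everything in it (TrackedFuture, FutureRegistry, the flag names) is made up for this illustration and isn't taken from any real library; a real implementation would presumably do this bookkeeping inside the future class itself and would need locking for multithreaded use.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical bookkeeping record for one future (not a real library type).
struct TrackedFuture {
    std::string name;
    bool completed = false;     // a value has been stored
    bool scheduled = false;     // code that will complete it is queued to run
    bool chained_from = false;  // it will be completed by another future
    int dependents = 0;         // how many things are chained to it
};

// Hypothetical registry that remembers every live future by weak reference,
// so that the registry itself never keeps a future alive.
class FutureRegistry {
public:
    std::shared_ptr<TrackedFuture> create(const std::string& name) {
        auto fut = std::make_shared<TrackedFuture>();
        fut->name = name;
        all_.push_back(fut);
        return fut;
    }

    // List the futures that something waits on but nothing will complete.
    std::vector<std::string> suspicious() {
        std::vector<std::string> out;
        std::vector<std::weak_ptr<TrackedFuture>> alive;
        for (auto& weak : all_) {
            auto fut = weak.lock();
            if (!fut) continue;  // already destroyed, forget about it
            alive.push_back(weak);
            if (fut->dependents > 0 && !fut->completed &&
                !fut->scheduled && !fut->chained_from) {
                out.push_back(fut->name);
            }
        }
        all_.swap(alive);
        return out;
    }

private:
    std::vector<std::weak_ptr<TrackedFuture>> all_;
};
```

Since suspicious() only reports and never aborts, it can be called from a debugger or a diagnostic command when a deadlock is suspected, without tripping over futures that are legitimately still being prepared.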

The executor would generally also be a shared_ptr. The final part is the context, which is normally also a shared_ptr to some object. What object it is depends on the function. Consider a classic function:

int func()
{
    int a = get_a();
    int b = get_b(a);
    return a+b;
}

If the functions get_a() and get_b() can block (and I've made get_b() dependent on a to make the execution sequential), in the asynchronous form this function gets split:

struct FuncContext {
    int a;
};

// The head part: starts the first call and pre-chains the fragments.
shared_ptr<Future<int>> func()
{
    auto ctx = make_shared<FuncContext>();
    shared_ptr<Future<int>> futa = get_a();
    return futa->chain(func2, ctx) // use default executor
        ->chain(func3, ctx);
}

// Runs when get_a() completes: saves a and starts get_b().
void func2(shared_ptr<FuncContext> ctx, shared_ptr<Future<int>> arg, shared_ptr<Promise<int>> result) {
    ctx->a = arg->value();
    get_b(ctx->a)->chain(result);
}

// Runs when get_b() completes: computes the final result.
void func3(shared_ptr<FuncContext> ctx, shared_ptr<Future<int>> arg, shared_ptr<Promise<int>> result) {
    int b = arg->value();
    result->return_value(ctx->a + b);
}

This highlights how the asynchronous code is typically written:

There are two kinds of asynchronous functions: the "head parts" of the actual meaningful high-level functions, like func(), and the split-out internal fragments of the meaningful functions, like func2() and func3(). They're usually written differently. The heads take their arguments just like common functions do and return a future with the result. The fragments are tied to some future representing the result of another asynchronous function call; they do the next step of computation up to the call of another asynchronous function, and then return the result of that function as their own result (at least in this pattern where the fragments are pre-chained in advance).
The context carries the values between the fragments, and is an analog of a stack frame in a normal function. It's possible to fine-tune each step's context but that's usually more trouble than it's worth, so other than for some obvious optimizations (such as b here not getting stored in the context because it's used in only one fragment), it's much easier and better to just carry the same context throughout all the fragments.
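To see the head-and-fragments pattern actually run, here is a toy single-threaded sketch in C++17. It is not any real library: the Executor, default_executor(), later(), the choice to make Promise an alias of Future, and the values produced by get_a() and get_b() are all assumptions made up for the illustration. Only func(), func2(), func3() and FuncContext follow the example above.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Toy single-threaded executor: a FIFO queue of ready-to-run jobs.
struct Executor {
    std::deque<std::function<void()>> queue;
    void run() {
        while (!queue.empty()) {
            auto job = std::move(queue.front());
            queue.pop_front();
            job();
        }
    }
};

// Stand-in for a library's default executor (an assumption of this sketch).
inline Executor& default_executor() {
    static Executor ex;
    return ex;
}

// Toy future; the promise is the same object seen from the completing side.
template <typename T>
struct Future : std::enable_shared_from_this<Future<T>> {
    using Ptr = std::shared_ptr<Future<T>>;
    using Waiter = std::function<void(Ptr)>;

    bool completed() const { return val.has_value(); }
    const T& value() const { return *val; }

    // Promise side: store the value, then schedule everything chained so far.
    void return_value(T v) {
        val = std::move(v);
        std::vector<Waiter> ready;
        ready.swap(waiters);
        for (auto& w : ready) schedule(std::move(w));
    }
    // Chain a promise: it gets completed with this future's value.
    void chain(Ptr result) {
        on_complete([result](Ptr self) { result->return_value(self->value()); });
    }
    // Chain a fragment; returns the future side of the fragment's result.
    template <typename R, typename Ctx>
    std::shared_ptr<Future<R>> chain(
        void (*fn)(std::shared_ptr<Ctx>, Ptr, std::shared_ptr<Future<R>>),
        std::shared_ptr<Ctx> ctx) {
        auto result = std::make_shared<Future<R>>();
        on_complete([fn, ctx, result](Ptr self) { fn(ctx, self, result); });
        return result;
    }

private:
    void on_complete(Waiter w) {
        if (completed()) schedule(std::move(w));
        else waiters.push_back(std::move(w));
    }
    void schedule(Waiter w) {
        default_executor().queue.push_back(
            [w = std::move(w), self = this->shared_from_this()] { w(self); });
    }
    std::optional<T> val;
    std::vector<Waiter> waiters;
};
template <typename T> using Promise = Future<T>;

// Toy "blocking" calls: their results arrive later, via the executor.
std::shared_ptr<Future<int>> later(int v) {
    auto fut = std::make_shared<Future<int>>();
    default_executor().queue.push_back([fut, v] { fut->return_value(v); });
    return fut;
}
std::shared_ptr<Future<int>> get_a() { return later(10); }
std::shared_ptr<Future<int>> get_b(int a) { return later(a * 2); }

struct FuncContext { int a = 0; };

void func2(std::shared_ptr<FuncContext> ctx, std::shared_ptr<Future<int>> arg,
           std::shared_ptr<Promise<int>> result) {
    ctx->a = arg->value();
    get_b(ctx->a)->chain(result);  // result completes when get_b() does
}

void func3(std::shared_ptr<FuncContext> ctx, std::shared_ptr<Future<int>> arg,
           std::shared_ptr<Promise<int>> result) {
    result->return_value(ctx->a + arg->value());
}

std::shared_ptr<Future<int>> func() {
    auto ctx = std::make_shared<FuncContext>();
    return get_a()->chain(func2, ctx)->chain(func3, ctx);
}
```

Calling func() only builds the chain and returns an incomplete future; nothing is computed until default_executor().run() drains the queue, after which the returned future holds a + b. In this sketch the context and the intermediate futures are referenced only from the queued jobs and the chains, so they get freed as the chain unwinds.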

Note that all the dynamic objects are nicely auto-destroyed by reference counting after the function completes, and in the meantime are held alive in the scheduling queues and future chains. However the implication there is that a value stays alive as long as the future containing it stays alive, and if that future is kept for a long time, the value would also be kept.

Why would a future be kept for a long time? Because a future represents both a value and the fact of completion, and the fact of completion might be interesting for much longer than the value, as will be shown in future installments. In this case it might be useful to chain a future with a value to a future without a value:

shared_ptr<Future<SomeType>> fut_a;
shared_ptr<Promise<void>> prom_b;
...
fut_a->chain(prom_b);

However normally the chaining expects that the types of values on both sides are the same. So this is a special case of converting to void that should be included in the library. If it isn't in the library, it can be implemented as:

template<typename T>
void convert_to_void_impl(shared_ptr<void> ctx, shared_ptr<Future<T>> input, shared_ptr<Promise<void>> result)
{
    // Ignore the input value, only signal the completion.
    result->return_value();
}

template<typename T>
shared_ptr<Future<void>> chain_to_void(shared_ptr<Future<T>> input) {
    // The template argument has to be spelled out when passing the function.
    return input->chain(convert_to_void_impl<T>, nullptr, input->getExecutor());
}

This uses an intermediate function to change the type. And if some particular library supports no void future, you can always use an int future instead and never look at its value, just be satisfied that it has some value.
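Here is a small C++17 sketch of that int-future fallback, which also shows the point about lifetimes: once the value future is dropped, its value is freed, while the completion-only future can be kept around. The Future type is a deliberately tiny stand-in that runs its continuations inline rather than through an executor, and chain_to_completion() is a hypothetical helper name, not a call from any real library.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Deliberately tiny future for this sketch: continuations run inline at
// completion time instead of going through an executor.
template <typename T>
struct Future {
    std::optional<T> val;
    std::vector<std::function<void(const T&)>> waiters;

    bool completed() const { return val.has_value(); }

    void return_value(T v) {
        val = std::move(v);
        std::vector<std::function<void(const T&)>> ready;
        ready.swap(waiters);
        for (auto& w : ready) w(*val);
    }

    void on_complete(std::function<void(const T&)> w) {
        if (val) w(*val);
        else waiters.push_back(std::move(w));
    }
};

// Hypothetical helper: an int future that only signals "input is done".
// The stored 0 is a dummy and is never meant to be looked at.
template <typename T>
std::shared_ptr<Future<int>> chain_to_completion(std::shared_ptr<Future<T>> input) {
    auto done = std::make_shared<Future<int>>();
    // Capture only the result, so the input's value is not kept alive by it.
    input->on_complete([done](const T&) { done->return_value(0); });
    return done;
}
```

A caller would keep the future returned by chain_to_completion() for as long as the fact of completion matters, and let go of the original future as soon as its value has been consumed.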

Saturday, January 25, 2025

Asynchronous programming 2 - filling in the types
