Consider this traditional function:
void writeHeader()
{
char buf[512];
// ... populate the buffer ...
write(buf, 512);
}
and its asynchronous version:
void writeHeader(
shared_ptr<FutureBase> input,
shared_ptr<HeaderCtx> ctx,
shared_ptr<WritePromise> result)
{
char buf[512];
// ... populate the buffer ...
write(buf, 512)->chain(result);
}
What is wrong with it? The buffer on the stack gets freed before the write completes, and the next scheduled function fills it with garbage. The next version:
struct HeaderCtx {
...
char buf[512];
};
void writeHeader(
shared_ptr<FutureBase> input,
shared_ptr<HeaderCtx> ctx,
shared_ptr<WritePromise> result)
{
// ... populate the buffer ...
write(ctx->buf, 512)->chain(result);
}
Potentially better, with the buffer in the context (remember, the context is the analog of a stack frame in a normal function), but now the context itself gets freed immediately after writeHeader() returns! So no, not really better. What we need is to keep the context alive until the write completes. It can be done like this:
void empty(
shared_ptr<FutureBase> input,
shared_ptr<void> ctx,
shared_ptr<Promise<void>> result)
{}
void writeHeader(
shared_ptr<FutureBase> input,
shared_ptr<HeaderCtx> ctx,
shared_ptr<WritePromise> result)
{
// ... populate the buffer ...
auto wres = write(ctx->buf, 512);
wres->chain(result);
wres->chain(empty, ctx);
}
or in a slightly different version:
void empty(
shared_ptr<FutureBase> input,
shared_ptr<void> ctx,
shared_ptr<PromiseBase> result)
{
input->chain(result);
}
void writeHeader(
shared_ptr<FutureBase> input,
shared_ptr<HeaderCtx> ctx,
shared_ptr<WritePromise> result)
{
// ... populate the buffer ...
auto wres = write(ctx->buf, 512);
wres->chain(empty, ctx)->chain(result);
}
The empty function does nothing; it's just a placeholder that keeps the context alive in a chained promise until the write completes. Note that the same empty function can be reused in all the places where this functionality is needed.
Which brings us to the point that instead of writing custom snippets for everything, we might be able to compose a good deal of computation out of pre-defined functions.
One repeating example has been storing the result of a computation in a variable. It can be done as a reusable function that receives the destination address as its context (and that's one of the cases where the context would be better as a plain pointer instead of a shared_ptr) and stores there the value of a given type from its input future. Considering that a future carries two separate meanings, returning the value and signaling the completion, we could even define a separate specialized kind of future that stores the value at a given address instead of keeping it internally.
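A minimal sketch of that reusable store-to-address function might look like this. The Future type here is a hypothetical stand-in holding an already-ready value (the real FutureBase/chain() machinery from this series is assumed, not reproduced); the context is a plain pointer to the destination, as suggested above:

```cpp
#include <cassert>
#include <memory>

// Stand-in for the series' future: just holds a ready value.
template <typename T>
struct Future {
    T value;
};

// Reusable continuation: copies the value from its input future to the
// address given as context (a plain pointer, not a shared_ptr).
template <typename T>
void storeResult(const std::shared_ptr<Future<T>> &input, T *ctx)
{
    *ctx = input->value;
}

// Demonstration: store an int result into a caller-owned variable.
inline bool storeResultDemo()
{
    int dest = 0;
    auto f = std::make_shared<Future<int>>(Future<int>{42});
    storeResult<int>(f, &dest);
    return dest == 42;
}
```

The same template instantiates for any copyable type, so one definition covers all the places where a result needs to land in a variable.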
Another obvious possible composition is in collecting the arguments of asynchronous functions. It would make sense to be able to compute the arguments in parallel, then call the function. And it's not that hard to do. An asynchronous function in any case consists of multiple plain functions: a "header function" and "continuation functions", with the context passed to the continuation functions serving as the stack frame of the asynchronous function; the context gets allocated, and the needed arguments copied into it, by the header part. How about we make the function arguments into a structure and pass it as the context to the header part of the asynchronous function? The header would then not be called directly but chained to the completion of that context, which in turn would be driven by AllOf over the completion of all the argument computations (each stored into the structure on completion, as discussed above), and sometimes perhaps one more future telling that the previous computation in the sequence has completed. Not every argument has to be computed asynchronously; some can be assigned synchronously, and then there just won't be a future for that argument to include into AllOf. To reduce the overhead, the arguments structure can potentially be passed not as a shared_ptr but as a plain pointer owned by the calling function (just as the arguments live on the stack for plain functions); then of course the calling function needs to make sure that the argument structure lives throughout the call, as was shown above with the buffer.
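The shape of that argument-gathering can be sketched without the real library. Here AllOf is simulated by a simple countdown that fires the "header part" once every argument slot has reported completion; the arguments structure is owned by the caller, like a stack frame. All names here (AddArgs, AllOfSim) are hypothetical:

```cpp
#include <cassert>
#include <functional>

// The arguments of a hypothetical async add(), gathered into a structure
// that doubles as its context.
struct AddArgs {
    int a = 0;
    int b = 0;
};

// Stand-in for AllOf: runs the header part once all the pending argument
// computations have completed, in whatever order they finish.
struct AllOfSim {
    int pending;
    std::function<void()> onDone;
    void argDone()
    {
        if (--pending == 0)
            onDone();
    }
};

// Demonstration: two "parallel" argument computations complete out of
// order, then the function body runs over the filled-in structure.
inline bool gatherArgsDemo()
{
    AddArgs args;  // owned by the caller, like arguments on the stack
    int sum = 0;
    AllOfSim all{2, [&] { sum = args.a + args.b; }};  // the header part
    args.b = 2;
    all.argDone();  // second argument finished first
    args.a = 40;
    all.argDone();  // last one in triggers the call
    return sum == 42;
}
```

In the real library the countdown would be an AllOf future chained to the header function, but the ownership and ordering logic is the same.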
Well, if you're using coroutines, the compiler would probably do all that for you, generating just enough of the small functions on the fly. If the coroutines don't work for you, the missing custom fragments can probably be filled in modern C++ with lambdas. Lambdas can be combined with macros too, if you really want to.
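For instance, a lambda capturing the context shared_ptr can replace the empty() placeholder from earlier: the capture alone keeps the context alive until the chained callback is destroyed. A minimal sketch, with a plain callback vector standing in for the scheduler's chain of completions:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical context with the write buffer, as in HeaderCtx above.
struct Ctx {
    char buf[512];
};

// Stand-in scheduler queue: completion callbacks run later, after the
// function that registered them has returned.
inline std::vector<std::function<void()>> &pendingCallbacks()
{
    static std::vector<std::function<void()>> cbs;
    return cbs;
}

// Demonstration: the context outlives its local shared_ptr because the
// chained lambda owns a copy, and gets freed when the callback does.
inline bool lambdaKeepAliveDemo()
{
    std::weak_ptr<Ctx> watch;
    {
        auto ctx = std::make_shared<Ctx>();
        watch = ctx;
        // A do-nothing lambda whose only job is to own ctx:
        pendingCallbacks().push_back([ctx] { /* keep ctx alive */ });
    }  // the local ctx is gone, but the capture still holds the object
    bool aliveBeforeRun = !watch.expired();
    pendingCallbacks().clear();  // "write completes", callbacks released
    bool freedAfterRun = watch.expired();
    return aliveBeforeRun && freedAfterRun;
}
```

The lambda version saves defining a named empty() but does the exact same job: it parks a reference to the context in the chain until the operation completes.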
One thing that can be said about large collections of small functions calling each other through a scheduler is that they'll never be very efficient. Although they can be made a little more efficient if, instead of returning, they jump straight to the next function in the chain, for example if the entry address of the next function is pushed onto the stack instead of the return address. In fact, I've started writing this series because of something I've read recently about a virtual machine in an ancient DBMS that worked exactly like this: instead of returning, a function would end with an instruction (PC = (RCHAIN)+), so a sequence of function addresses to call would be prepared in memory, RCHAIN initialized to point to the start of it, and then this instruction executed to jump to the first function in the sequence.
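The dispatch scheme can be approximated in portable C++ like this. The chain of routine addresses sits in an array, and each routine ends by fetching the next address and transferring control to it, the moral equivalent of PC = (RCHAIN)+. (A real threaded-code VM would replace the return with a true jump; here the "jump" is an ordinary tail call, so this is only a sketch of the control flow, and all the names are hypothetical.)

```cpp
#include <cassert>
#include <cstddef>

struct Chain;
typedef void (*Routine)(const Chain *chain, int *acc);

// One slot in the prepared sequence of routine addresses.
struct Chain {
    Routine r;
};

// The moral equivalent of PC = (RCHAIN)+: fetch the routine at the
// chain pointer, advance the pointer, transfer control.
inline void next(const Chain *chain, int *acc)
{
    Routine r = chain->r;
    if (r != nullptr)
        r(chain + 1, acc);  // a real VM would jump, not call
}

// Two tiny routines; each ends by dispatching to the next in the chain.
inline void addTen(const Chain *chain, int *acc)
{
    *acc += 10;
    next(chain, acc);
}

inline void doubleIt(const Chain *chain, int *acc)
{
    *acc *= 2;
    next(chain, acc);
}

// Prepare a null-terminated sequence of addresses and run it.
inline int runChainDemo()
{
    const Chain chain[] = {{addTen}, {doubleIt}, {addTen}, {nullptr}};
    int acc = 1;
    next(chain, &acc);
    return acc;  // (1 + 10) * 2 + 10
}
```

No routine ever "returns to a caller" in the logical sense; control just flows down the prepared list, which is exactly what made the DBMS's dispatch cheap.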