Sunday, May 3, 2020

TLB coherence

The discussion described in the last post got me thinking, why don't we have a hardware consistency for the page address translation cache (TLB), done through the same bus transactions as the usual cache snooping? In a multi-threaded environment, invalidating the TLB across all the threads is a pain that requires the cross-CPU interrupts.

In the simplest case the CPU that drives a TLB invalidation could just do a bus transaction that specifies a virtual address, and every CPU would invalidate its TLB entry that matches this address. If the user processes use the address space randomization, there would be few conflicts between processes, and the kernel address space is common for everyone. In a more complicated way, the virtual address can be accompanied by the process or memory protection id, so that only the pages of the CPUs running that process would be invalidated. And finally, each page translation entry has a unique identifier: the physical memory address where the CPU had read it from. If the TLB keeps this address as a tag (keeping enough bits to identify the page of the page directory is enough, since the rest of bits are determined by the virtual address), it can be used as a tag in the TLB invalidation transaction, and only the exact TLB entrues matching this tag would be invalidated.

Which made me wonder if anyone else had thought of this before. And after a bit of searching, it has turned out that they did: https://arxiv.org/pdf/1701.07517.pdf. These guys have approached the problem from the hypervisor standpoint, but the implementation is exactly the same, using the physical memory address of the entry as a tag. There is no date on the article, but judging by the citation list, it could not have been written earlier than 2016.

No comments:

Post a Comment