Thursday, September 6, 2018

COW filesystems

I've recently learned the term "Copy-On-Write filesystem". I guess I've been living under a rock. Not that it's a new concept , NetApp was doing this kind of a filesystem 20 years ago, but I've never heard before (or maybe heard and forgot) the term "COW filesystem".

I've also realized an interesting things: the COW filesystems seem to be essentially the log-structured filesystems but done better. A major problem with the log-structured filesystems is the periodic cleaning of the log, when the dead data gets discarded and the live data gets compacted. If the filesystem contains a large amount of data, with only a small fraction of it changing (which is fairly typical), the compaction ends up moving vast amounts of data around to fill the small gaps left by the changed data.

A COW filesystem doesn't have a contiguous log, instead it collects the unreferenced blocks into a free list and can reuse them without moving the rest of the data. Yes, the writes become non-sequential on the device, but with the modern devices, especially SSDs, this doesn't matter much any more.