NVMe namespaces and complexities of flash controllers
Sunday, October 27, 2024
Today, I was watching a video from Jeff Geerling on Raspberry Pi SSDs, where he was using the nvme command to show information on the SSDs.
Not having used nvme much (after all, it’s not like you go around switching NVMe devices every day), I figured I’d go list the contents of the M.2 drives installed on my machine:
$ sudo nvme list
Node SN Model Namespace Usage Format FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 xxxxxxxxxxxx WDC WDS500G2B0C-00PXH0 1 500.11 GB / 500.11 GB 512 B + 0 B xxxxxxxx
/dev/nvme1n1 xxxxxxxxxxxx WD Blue SN570 1TB 1 1.00 TB / 1.00 TB 512 B + 0 B xxxxxxxx
(Serial numbers and firmware revisions redacted.)
There, I noticed the Namespace field.
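Out of curiosity, nvme can also dump the details of a single namespace. A quick sketch (the device path is from my machine; your drive will report its own values):

$ sudo nvme id-ns /dev/nvme0n1
# reports nsze/ncap/nuse (size, capacity, and utilization in logical blocks)
# plus the supported LBA formats, e.g. "lbaf 0 : ms:0 lbads:9 rp:0 (in use)"
# lbads:9 means 2^9 = 512-byte logical blocks - the "512 B" in the Format column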
I’ve never really paid much attention to the specifics of NVMe, so it was time to read up from the official source: a technical resource on namespaces from NVM Express.
Turns out, I’ve already been exposed to namespaces – through the device names in Linux.
NVMe devices in Linux are named /dev/nvmeXnY, where X is the device number and Y is the namespace number.
I’d noticed that all the NVMe devices I’ve seen end in n1, so I figured they were something similar to partition numbers.
(Note that partitions on NVMe devices are pZ, e.g., /dev/nvme0n1p1.)
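For the curious, here’s how those pieces map to device nodes on Linux (the exact listing depends on your hardware, so treat this as an illustration):

$ ls -l /dev/nvme0*
# /dev/nvme0      - character device for the controller itself
# /dev/nvme0n1    - block device for namespace 1 on that controller
# /dev/nvme0n1p1  - first partition within namespace 1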
The reason why namespaces exist is where things get more interesting. From the technical resource description, it’s basically a way to achieve multi-tenancy through logical separation of an NVMe device. It’s obvious that this is in the specification for data center usage (like AWS, GCP and Azure).1 Little did I know that NVMe made such provisions.
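On controllers that support namespace management, you can even carve these out yourself with nvme. My consumer drives only expose a single fixed namespace, so the following is just a sketch; the sizes and IDs are made up for illustration:

$ sudo nvme id-ctrl /dev/nvme0 | grep ^nn    # max number of namespaces supported
$ sudo nvme create-ns /dev/nvme0 --nsze=97656250 --ncap=97656250 --flbas=0
# nsze/ncap are counted in logical blocks: 97656250 * 512 B is roughly 50 GB
$ sudo nvme attach-ns /dev/nvme0 --namespace-id=2 --controllers=0
# the controller ID for --controllers comes from the cntlid field of id-ctrl
$ sudo nvme ns-rescan /dev/nvme0             # have Linux pick up /dev/nvme0n2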
This makes a lot of sense, given that these days we’re seeing much higher storage densities realized with flash devices than with traditional hard drives. With 60+ TB of storage on a single device, splitting one physical device into multiple pieces for use by different customers is quite logical. (Excuse the pun.)
The same technical resource page also goes into the relationship between namespaces and over-provisioned space.
Flash devices have over-provisioned space that’s used by the flash controller for wear leveling and housekeeping tasks like garbage collection. Namespaces are the only areas that software on the host has access to; over-provisioned areas can therefore serve as shared space where the controller does its work.
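The controller’s identify data hints at this split, at least on drives that support namespace management (on many consumer drives these fields simply read zero):

$ sudo nvme id-ctrl /dev/nvme0 | grep -E 'tnvmcap|unvmcap'
# tnvmcap - total NVM capacity of the device, in bytes
# unvmcap - unallocated capacity, i.e., not assigned to any namespace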
These points illustrate how much intelligence these storage controllers have. Let’s quickly recap just how complex these devices are…
First, they abstract away the physical layer of flash devices. NAND flash devices (which are used by most, if not all, large solid-state storage devices) consist of flash memory cells that can only be written at the block level, not as individual bytes. Furthermore, each cell has finite write endurance, which means that a cell written too many times will wear out. To prevent that, the controller essentially “moves around” data being written (whether through new writes or “overwrites”) to perform wear leveling.
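You can actually watch the result of all that wear management from the host. The NVMe SMART log exposes a few counters; for example:

$ sudo nvme smart-log /dev/nvme0 | grep -E 'percentage_used|data_units_written'
# percentage_used    - the vendor's estimate of device life consumed so far
# data_units_written - counted in units of 1000 * 512-byte blocks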
Then, we have the concept of namespaces, which allows logical separation of areas of the storage device. These namespaces are the only areas the host device can actually access. And there’s the over-provisioned space (if allocated) for the controller to perform wear leveling and other housekeeping. Hence, controllers keep track of how memory cells are being utilized.
Oh, and we’re glossing over the fact that some SSDs have caches. These can be DRAM-based, or the drive can use a pseudo-SLC mode to make a fast write cache out of the flash memory itself on MLC devices.
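The DRAM side of this is even visible from the host: the volatile write cache is a standard NVMe feature (feature ID 0x06) that can be queried, assuming the drive implements it:

$ sudo nvme get-feature /dev/nvme0 --feature-id=0x06
# bit 0 of the returned value indicates whether the volatile write cache is enabled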
The list of things that happen on solid-state storage devices goes on and on… And all of this happens transparently, so the host doesn’t even know it’s happening.
Essentially, these flash devices contain computers of their own to handle complex tasks at very fast speeds. Pretty fascinating stuff.
Gone are the days of hard drives, which required setting the physical geometry (i.e., cylinder-head-sector) to allow software to access them!
-
1. Which shouldn’t come as a surprise, as most of them are among the promoters of NVMe. ↩