A Tour of NVMe

Content: NVMe Specification (Section 2-2.2.1 [p15-p19] + Section 3.3.1 [p79-p83]), NVMe-tutorial (slides 1-11)

Recently, I’ve been working on making the metadata for QEMU NVMe ZNS emulation persistent across QEMU restarts. Currently, the state of the namespace zones does not persist: every zone reverts to the empty state once QEMU is restarted. With the support of the mentors in the QEMU community through weekly discussions, my search graph for this task looks as follows:

  1. What is this task and why? I’ve read a little related work on NVMe emulation metadata: 1) Dmitry Fomichev[1] uses a separate file to store the metadata for NVMe ZNS emulation; if the file is not specified, the metadata is not persistent, as before. 2) Klaus Jensen[2] stores the metadata for NVMe emulation at the end of the namespace’s backing block device.

    (I didn’t quite understand NVMe-specific things mentioned in the source code back then)

    -> read about NVMe ZNS spec, NVMe spec, slides, tutorials

  2. How to design? Two ways: 1) an extensible design that uses feature bits, which can represent different zone descriptors or APIs; 2) different sets of interfaces. What I was not clear about was the aim of an interface. Now I think it means unifying the interfaces for the cases that need them. For zoned storage emulation, there are three cases to take into consideration: zoned emulation, full emulation and NVMe ZNS emulation. Think about the design semantically and settle on a thorough plan through discussion before implementing anything. Further along, the design skeleton must be based on the implementations in the source code. This reminds me of the idea Damien talked about for the zone append command of the zoned format driver. Unlike the file-posix driver, where the append write command is emulated with a regular write and the wp array model, a new approach could be something similar to file append. The old way still works, but a new method can bring more insights.

    ? However, I didn’t think through the relation between file append and zone append write…

  3. What can NVMe ZNS emulation in QEMU emulate, and how much of that is doable with the new zoned.c block driver? This question was raised by Stefan in the first week of the metadata design discussion. I was unable to answer it until I looked further into the spec and the source code.

    -> Existing NVMe ZNS metadata support

    -> How to make the file-posix and zoned format drivers the underlying storage of NVMe ZNS devices?

    -> Consider supporting persistent metadata for one zoned namespace before enabling multiple zoned namespaces.

  4. Very naive design of the task. Checks and new problems.

    Store the metadata (write pointers and configurations) at the end of the zoned namespace.

    -> Where to put the metadata inside the disk? I am not in favor of using another file to store the metadata for each namespace. Would it lead to one file per namespace, or to N namespaces sharing one file?

  5. Review for debugging zoned emulation. New tests. Think back to the command line for zoned emulation, considering how real applications use QEMU. (Going back to the starting point)

  6. Stuck at the 4th step. What’s the next step to implement? Found the gap…
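
A minimal sketch of the naive design from step 4, combined with the feature-bit idea from step 2: a small header followed by one write pointer per zone, placed at the very end of the namespace backing image. Everything here is an assumption made for illustration. The magic value, the field order, the `FEAT_ZONE_APPEND` bit and the helper names are hypothetical, and this is not the on-disk format used by QEMU or by either of the patches cited above.

```python
import struct

# Hypothetical layout, for illustration only (not QEMU's actual format).
MAGIC = b"ZNSM"
# Little-endian, no padding: magic, version, feature bits, nr_zones, zone_size
HEADER_FMT = "<4sIQQQ"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 32 bytes

# Example feature bit; a reader can refuse or adapt when it sees unknown bits.
FEAT_ZONE_APPEND = 1 << 0


def metadata_size(nr_zones):
    """Bytes needed for the header plus one 64-bit write pointer per zone."""
    return HEADER_SIZE + 8 * nr_zones


def metadata_offset(image_size, nr_zones):
    """Byte offset of the metadata region when it sits at the end of the image."""
    offset = image_size - metadata_size(nr_zones)
    if offset < 0:
        raise ValueError("image too small to hold the metadata region")
    return offset


def pack_metadata(features, zone_size, write_pointers):
    """Serialize the header and the write pointer array into bytes."""
    header = struct.pack(HEADER_FMT, MAGIC, 1, features,
                         len(write_pointers), zone_size)
    body = struct.pack("<%dQ" % len(write_pointers), *write_pointers)
    return header + body


def unpack_metadata(blob):
    """Parse bytes produced by pack_metadata; returns (features, zone_size, wps)."""
    magic, _version, features, nr_zones, zone_size = struct.unpack_from(
        HEADER_FMT, blob, 0)
    if magic != MAGIC:
        raise ValueError("bad magic: no persistent zone metadata found")
    wps = list(struct.unpack_from("<%dQ" % nr_zones, blob, HEADER_SIZE))
    return features, zone_size, wps


if __name__ == "__main__":
    image = bytearray(4096)          # stand-in for the backing file
    wps = [0, 1100, 2048]            # one write pointer per zone
    blob = pack_metadata(FEAT_ZONE_APPEND, 1024, wps)
    off = metadata_offset(len(image), len(wps))
    image[off:off + len(blob)] = blob
    print(unpack_metadata(bytes(image[off:])))
```

On restart, the emulation would read the region back and restore each zone's write pointer instead of resetting it to empty. A real implementation would probably also align the region to the logical block size and shrink the capacity reported to the guest accordingly; both are left out of this sketch.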


  1. https://lists.nongnu.org/archive/html/qemu-block/2020-06/msg00738.html

  2. https://gitlab.eduxiji.net/duskmoon314/qemu/-/commit/bc3a65e99254cfe001bd16a569a5aa7d20f930e8