Brian Swetland
Writing the Codes


So one thing that's interesting about many of these NVME devices is they have relatively small max transfer sizes (128K is common, 2M is the largest I've seen).

So I made the choice to advertise a transfer limit of 4GB to the block middle layer and do the chunking of large transfers in the nvme driver itself.

The block middle layer, if it has to sub-chunk a transfer, will not send the next chunk until the current chunk completes, which I suspected was not going to be an optimal strategy here.
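The driver-side chunking idea can be sketched roughly as follows. This is an illustrative Python model, not the driver's code: `split_transfer` and its parameters are made-up names, and the only figures taken from the post are the 128KB device limit and the 64MB read size.

```python
def split_transfer(offset, length, max_xfer):
    """Yield (offset, length) pieces, each no larger than max_xfer bytes."""
    if max_xfer <= 0:
        raise ValueError("max_xfer must be positive")
    end = offset + length
    while offset < end:
        n = min(max_xfer, end - offset)
        yield offset, n
        offset += n


# A 64MB read against a device with a 128KB max transfer size.
chunks = list(split_transfer(0, 64 * 1024 * 1024, 128 * 1024))
print(len(chunks))  # 512
```

Each piece is independent of the others, so nothing forces the driver to wait for one to complete before handing the next to the hardware.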

So with the NVME driver doing the chunking, the large streaming read test has results (per individual read size) like:
64MB -> 1945.48 MB/s
32MB -> 1924.9 MB/s
8MB -> 1811.28 MB/s
1MB -> 1336.5 MB/s
512KB -> 1088.41 MB/s
128KB -> 536.513 MB/s
32KB -> 193.163 MB/s

I adjusted the driver to advertise the true max transfer (128KB in this device's case) and re-ran the tests...
64MB -> 561.534 MB/s
32MB -> 560.791 MB/s
8MB -> 562.055 MB/s
1MB -> 558.7 MB/s
512KB -> 555.748 MB/s
128KB -> 535.055 MB/s
32KB -> 192.883 MB/s
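
For a sense of scale, the ratio between the two runs can be computed directly from the figures above. The dictionary and variable names here are only illustrative.

```python
# Throughput in MB/s per read size, copied from the two test runs above.
driver_chunking = {
    "64MB": 1945.48, "32MB": 1924.9, "8MB": 1811.28, "1MB": 1336.5,
    "512KB": 1088.41, "128KB": 536.513, "32KB": 193.163,
}
middle_layer_chunking = {
    "64MB": 561.534, "32MB": 560.791, "8MB": 562.055, "1MB": 558.7,
    "512KB": 555.748, "128KB": 535.055, "32KB": 192.883,
}

speedup = {
    size: driver_chunking[size] / middle_layer_chunking[size]
    for size in driver_chunking
}
for size, ratio in speedup.items():
    print(f"{size:>6}: {ratio:.2f}x")
```

Reads at or below the 128KB limit never get split, so the two runs are effectively identical there; the gap only opens up once a read spans multiple chunks.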

I suspect we may be able to get some improvements out of the AHCI/SATA driver using this tactic as well.

In the case of the NVME driver, the submit queue to the hw has 63 entries. With the driver internal chunking I can saturate that.

With the chunking at the middle layer we never have more than one IO pending in the queue.
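
A toy model of that difference, assuming a simple loop that submits chunks whenever a queue slot is free and then retires one completion at a time. `peak_in_flight` is a hypothetical helper, not anything from the driver; passing `queue_slots=1` stands in for the middle layer's one-chunk-at-a-time behavior.

```python
def peak_in_flight(total_bytes, max_xfer, queue_slots):
    """Return the largest number of chunks outstanding at any one time."""
    chunks = -(-total_bytes // max_xfer)  # ceiling division
    in_flight = 0
    peak = 0
    while chunks or in_flight:
        # Submit as many chunks as there are free slots.
        while chunks and in_flight < queue_slots:
            chunks -= 1
            in_flight += 1
        peak = max(peak, in_flight)
        in_flight -= 1  # one completion arrives
    return peak


MB = 1024 * 1024
KB = 1024
print(peak_in_flight(64 * MB, 128 * KB, queue_slots=63))  # 63
print(peak_in_flight(64 * MB, 128 * KB, queue_slots=1))   # 1
```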

A more extensive version of this post is available here:

And yet more is here:

. . .

This week I wrote a minimal NVME storage driver for Zircon. As usual, I used gerrit as a place to back up my work-in-progress from time to time. The end results are a (possibly interesting) window into how I go from zero to a functional driver...

A minimal shell of a driver that simply dumps parameters from the device and resets it. Useful to start getting some data from real hardware (since NVME has a lot of controller-specific parameters to look at):

First interaction with the hardware! Submit an IDENTIFY command to the Admin Submission Queue and observe a reply from the Admin Completion Queue. Hexdump the results for inspection:
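
For readers unfamiliar with the queue mechanics, here is a toy Python model of a submission ring and a completion ring, following the NVMe specification's general description rather than this driver's code. The class and method names are hypothetical. It models two details: a ring with N slots accepts at most N-1 commands, and the driver recognizes a new completion by a phase bit that flips on every pass around the ring.

```python
class SubmissionRing:
    def __init__(self, slots):
        self.slots = [None] * slots
        self.head = 0  # next entry the controller will consume
        self.tail = 0  # next entry the driver will fill

    def space(self):
        # One slot always stays empty so full and empty are distinguishable.
        return (self.head - self.tail - 1) % len(self.slots)

    def submit(self, command):
        if self.space() == 0:
            return False
        self.slots[self.tail] = command
        self.tail = (self.tail + 1) % len(self.slots)
        return True  # a real driver would now write the tail doorbell


class CompletionRing:
    def __init__(self, slots):
        self.entries = [(None, 0)] * slots  # (command id, phase bit)
        self.head = 0            # driver side: next entry to inspect
        self.expected_phase = 1  # driver side: phase that marks "new"
        self.ctrl_tail = 0       # controller side: next entry to write
        self.ctrl_phase = 1      # controller side: phase for this pass

    def controller_post(self, command_id):
        """Stand-in for the hardware writing a completion entry."""
        self.entries[self.ctrl_tail] = (command_id, self.ctrl_phase)
        self.ctrl_tail += 1
        if self.ctrl_tail == len(self.entries):
            self.ctrl_tail = 0
            self.ctrl_phase ^= 1

    def poll(self):
        """Return the next completed command id, or None if nothing is new."""
        command_id, phase = self.entries[self.head]
        if phase != self.expected_phase:
            return None
        self.head += 1
        if self.head == len(self.entries):
            self.head = 0
            self.expected_phase ^= 1
        return command_id
```

Spinning on `poll()` is the "spin on the completion status" approach; an interrupt-driven driver would call it from the IRQ path instead.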

Factor Admin Queue processing out into dedicated functions, provide a convenience function for transactions, wire up interrupts so we don't have to spin on the completion status. Decode some of the information from the IDENTIFY command and display it. Issue an IDENTIFY NAMESPACE command as well. Actually publish a device instead of just failing.

Set up an IO submission and completion queue as well (in preparation for doing actual disk IO) and fiddle with IRQ setup a bit while trying to figure out why IRQs work on HW but not in Qemu.

Some #if 0'd code down in init() where I experimented with IO READ ops to verify that I understood how the command structure and prp list worked. Added a QEMU_IRQ_HACK to use polling instead of IRQs so I could test with Qemu as well. Start sketching out IO operation processing, with the concept of breaking iotxns down into utxns that are 1:1 with nvme io operations. Introduce #defines for a bunch of magic numbers, some more comments, and an IO processing thread. Wire up the device ops get_size(), ioctl(), and queue_iotxn() which will be needed for this to act as a real block device.
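
As background on the prp list, here is a rough Python sketch of how the two PRP fields of a command are filled in, based on the NVMe specification's general rules and not on the driver's code. `build_prps` and `list_addr` are hypothetical names, a 4096-byte page size is assumed, and chaining of multiple list pages is not modeled.

```python
PAGE_SIZE = 4096            # assumed memory page size
PRP_ENTRY_SIZE = 8          # each PRP entry is a 64-bit address


def build_prps(pages, list_addr):
    """Return (prp1, prp2, prp_list) for a transfer.

    pages: physical addresses of the pages backing the buffer, in order.
           Only the first may carry an offset into its page.
    list_addr: physical address of a page available to hold the PRP list.
    """
    if not pages:
        raise ValueError("transfer needs at least one page")
    rest = pages[1:]
    if any(addr % PAGE_SIZE for addr in rest):
        raise ValueError("pages after the first must be page-aligned")
    if len(rest) > PAGE_SIZE // PRP_ENTRY_SIZE:
        raise ValueError("chained PRP lists are not modeled here")

    prp1 = pages[0]
    if not rest:
        return prp1, 0, []          # one page: PRP2 unused
    if len(rest) == 1:
        return prp1, rest[0], []    # two pages: PRP2 is the second page
    return prp1, list_addr, rest    # more: PRP2 points at the list


# A page-aligned 128KB transfer spans 32 pages: PRP1 plus a 31-entry list.
pages = [0x10000 + i * PAGE_SIZE for i in range(32)]
prp1, prp2, prp_list = build_prps(pages, list_addr=0x9000)
print(hex(prp1), hex(prp2), len(prp_list))
```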

Build out the IO processing with io_process_cpls() to handle completion messages from the HW and io_process_txns() to handle subdividing iotxns into utxns and issuing IO commands to the hardware. Not done yet, and not code reviewed, but the iochk multithreaded disk io exerciser runs against devices published by this driver without failing or causing the driver to crash, so yay!
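
The split between issuing and completing can be modeled roughly like this. It is a Python sketch under stated assumptions, not the driver's logic: `IoTxn`, `IoEngine`, and their fields are invented for illustration, with `process_txns` and `process_cpl` loosely playing the roles of io_process_txns() and io_process_cpls(). A parent transaction finishes only once every piece has been issued and every piece has completed.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class IoTxn:
    offset: int
    length: int
    issued: int = 0    # bytes already handed to the hardware
    pending: int = 0   # pieces issued but not yet completed
    done: bool = False


class IoEngine:
    def __init__(self, max_xfer, queue_slots):
        self.max_xfer = max_xfer
        self.free_slots = list(range(queue_slots))
        self.in_flight = {}          # slot id -> parent IoTxn
        self.pending_txns = deque()

    def queue(self, txn):
        if txn.length <= 0:
            raise ValueError("transaction must have a positive length")
        self.pending_txns.append(txn)

    def process_txns(self):
        """Issue pieces until we run out of work or free slots."""
        while self.pending_txns and self.free_slots:
            txn = self.pending_txns[0]
            n = min(self.max_xfer, txn.length - txn.issued)
            slot = self.free_slots.pop()
            self.in_flight[slot] = txn  # hardware command would go out here
            txn.issued += n
            txn.pending += 1
            if txn.issued == txn.length:
                self.pending_txns.popleft()

    def process_cpl(self, slot):
        """Handle one completion reported for the given slot."""
        txn = self.in_flight.pop(slot)
        self.free_slots.append(slot)
        txn.pending -= 1
        if txn.pending == 0 and txn.issued == txn.length:
            txn.done = True
```

Keeping the slot-to-parent map in one place means a completion can be matched back to its transaction without searching.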

Fix a bug where the io thread would spin instead of wait when there was no io pending. Add some simple stat counters (which helped detect this bug).
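
The spin-versus-wait distinction, and how a counter exposes it, can be illustrated with a small Python sketch. This is only an analogy using Python's `queue` and `threading` modules, with made-up names; the driver's own synchronization primitives are not shown in this post.

```python
import queue
import threading


def io_thread(work, stats):
    while True:
        stats["loops"] += 1
        item = work.get()  # blocks: the thread sleeps while the queue is empty
        if item is None:   # sentinel: shut down
            return
        stats["processed"] += 1


def run_demo(items):
    work = queue.Queue()
    stats = {"loops": 0, "processed": 0}
    worker = threading.Thread(target=io_thread, args=(work, stats))
    worker.start()
    for item in items:
        work.put(item)
    work.put(None)
    worker.join()
    return stats


stats = run_demo([1, 2, 3])
print(stats)  # {'loops': 4, 'processed': 3}
```

With a blocking wait the loop count tracks the amount of work (plus one for the shutdown sentinel). A thread that polled an empty queue instead would show a loop count far larger than the number of items processed, which is the kind of mismatch a stat counter makes visible.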

Hey Plus,

I'm looking for an off-the-shelf, ready-to-go 4-8 tray (2.5") NAS device that I can install a stock linux distro on. Any suggestions? Just looking for some reliable hardware to migrate some drives to and don't want to run some proprietary "RAID OS" or whatnot. Mid-range modern-ish Intel cpu, 8+GB ram preferable.

This came out of an extended conversation about traitor games (Werewolf/Mafia, The Resistance, Secret Hitler, Battlestar Galactica, etc):

The treacherous thing about traitors,
Is traitors are treacherous things!
Their mouths say nothing but libel,
Their treason does nothing but stings!
They're tricky, slippery, slimy, treasonous,
Fun, fun, fun, fun, fun!
But the most treacherous thing about traitors is,
I'm not the only one!

So here's a thought...

Game system emulators are pretty good, but they often don't have full fidelity to the original hardware, and while they're getting better at emulating it, the actual persistence, color, etc effects of SD NTSC displays are not entirely reproduced.

I think somebody should dig up some 1980s SD CRT TVs, VHS decks, and game consoles, and set things up to record high quality, 60+fps, 4K+ resolution HD video of what "content" really looked like back in the day.

For the benefit of current and future generations who have not experienced it as it was (and someday may never be able to).

Dear Lazy Plus,

I'm looking for a halfway decent webcam (ideally w/ mic) that works with Ubuntu and Google Chrome for Hangouts video conferencing. 720p or 1080p. Doesn't need to be dirt cheap but ideally not absurdly expensive.

So, Dishonored 2 became available at 9pm on the 9th.

It is good, good to be back in Dunwall. Taking my time, enjoying the sights, executing a not-quite-entirely-stealthy run through the game.

Zachtronics launched their latest engineering puzzle game this week and it is pretty entertaining. SHENZHEN-I/O is sort of a fusion of mechanics from SpaceChem and TIS-100. It represents yet another step forward in Zachtronics producing a game that tries to emulate my job.

One of the later game puzzles (spoilers!):

Entirely too much fun.

The Species Editor in Stellaris is pretty nifty...