For this to make sense, you'll need a pretty deep understanding of Dmedia, but this is roughly how we'll migrate from the current version zero (V0) schema and hashing protocol to version one (V1).

For a lot of reasons, I feel the D-Base32 encoding I've been working on is the right move for us:
http://docs.novacut.com/dbase32/dbase32.html

So the V0 => V1 protocol migration isn't just a tweak to the hashing protocol; it also involves a different base32 encoding. This means we can't do this migration the way we plan to handle protocol migrations in the future.

In the future, the protocol version will be identified by the digest length, and different versions of the protocol will be able to exist side by side in the same FileStore. From the V1 protocol forward, we will support each protocol version indefinitely, but newly added files will be hashed according to the newest protocol. Even if we rehash files to ID them according to the newest protocol, it's important to always be able to resolve and verify a file according to its old IDs.

Because of the switch to D-Base32 encoding, we won't be supporting the V0 protocol alongside the V1 protocol, but we will provide a way to migrate from V0 to V1. I think this actually worked out for the best: since V0 and V1 will never coexist, they don't need different digest lengths to be told apart, so V1 can use the same 240-bit digest size used by V0, which I do think is the best digest size at this point.

FileStore migration

Because of the different base32 encoding, the subdirectories in .dmedia/files/ will be slightly different. There are 1024 subdirectories there (32 x 32), named by the first two characters of the file ID.

Compared to Base32, the D-Base32 alphabet includes "8" and "9", and removes "2" and "Z". So it's easy to tell whether the file layout was created before or after the encoding switch.

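There are 124 directories that will be present in the old layout but not in the new one (all 1024 names minus the 30 x 30 = 900 names built only from the 30 symbols both alphabets share). If any of them are present, we'll move "files" to "files0":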

.dmedia/files/ => .dmedia/files0/
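A minimal sketch of that detection and move follows. It assumes the two alphabets are RFC 4648 Base32 and the D-Base32 alphabet from the dbase32 docs, and that the layout is just the 1024 two-character subdirectories described above. The function names are hypothetical, not Dmedia's actual API.

```python
from pathlib import Path

# RFC 4648 Base32 alphabet (V0 layout) and D-Base32 alphabet (V1 layout).
B32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
DB32_ALPHABET = "3456789ABCDEFGHIJKLMNOPQRSTUVWXY"


def two_char_names(alphabet):
    """All 1024 two-character subdirectory names for a 32-symbol alphabet."""
    return {a + b for a in alphabet for b in alphabet}


# Names only the old layout can contain: any name using "2" or "Z".
V0_ONLY = two_char_names(B32_ALPHABET) - two_char_names(DB32_ALPHABET)
assert len(V0_ONLY) == 124  # 1024 - 30 * 30


def is_v0_layout(files_dir):
    """True if any V0-only subdirectory exists under files_dir."""
    return any((Path(files_dir) / name).is_dir() for name in V0_ONLY)


def move_v0_layout(dotdmedia):
    """Hypothetical helper: files -> files0, then create a fresh V1 layout."""
    files = Path(dotdmedia) / "files"
    if not (files.is_dir() and is_v0_layout(files)):
        return False
    files.rename(Path(dotdmedia) / "files0")
    for name in two_char_names(DB32_ALPHABET):
        (files / name).mkdir(parents=True)
    return True
```

Calling it a second time is harmless. The new layout contains no "2" or "Z" names, so it is never mistaken for a V0 layout.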

And then we'll create a modern .dmedia/files/ layout. The full FileStore API won't be supported for the old layout; it will pretty much just be:

FileStore.__iter__()
FileStore.path(_id)
FileStore.verify_iter(_id)

The migration is expensive because we need to verify according to the old protocol leaf-by-leaf (so we know we have the correct data) and then re-hash according to the new protocol leaf-by-leaf. But aside from a few core Novacut folks, I don't think many people have large Dmedia libraries (yet), so hopefully this won't be too painful for most people.
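The sketch below shows the single-pass structure: each leaf is read once, checked against its V0 leaf hash, and only then fed to the V1 hash. The two leaf-hash functions are hypothetical hashlib stand-ins, not the actual V0/V1 protocols. The leaf size is a parameter. Computing the final V1 root hash from the leaf hashes is protocol-specific and not shown.

```python
import hashlib

# Hypothetical stand-ins: the real V0 and V1 leaf hashes are defined by the
# Dmedia protocols, not by these hashlib calls. Both return 240-bit digests.
def v0_leaf_hash(index, data):
    return hashlib.sha256(index.to_bytes(8, "big") + data).digest()[:30]


def v1_leaf_hash(index, data):
    return hashlib.blake2b(index.to_bytes(8, "big") + data, digest_size=30).digest()


class CorruptFile(Exception):
    pass


def verify_and_rehash(fp, v0_leaf_hashes, leaf_size):
    """Read each leaf once: verify it under V0, then hash it under V1."""
    v1_leaf_hashes = []
    for index, expected in enumerate(v0_leaf_hashes):
        data = fp.read(leaf_size)
        if v0_leaf_hash(index, data) != expected:
            raise CorruptFile(f"leaf {index} failed V0 verification")
        # Only data that just passed V0 verification is fed to V1.
        v1_leaf_hashes.append(v1_leaf_hash(index, data))
    if fp.read(1):
        raise CorruptFile("file has data beyond its last V0 leaf")
    return v1_leaf_hashes
```

Reading each leaf only once keeps the cost at roughly one full read of the library, plus the hashing work for both protocols.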

Once we have the V1 file ID, we'll move each file into its modern canonical location, and create a symlink in the old layout pointing to the file in the new layout.
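Here is a sketch of that move-and-symlink step. It assumes a file's canonical path is `<files dir>/<first two ID chars>/<rest of ID>`, consistent with the two-character subdirectories described above. `canonical_path` and `relocate` are hypothetical names.

```python
import os
from pathlib import Path


def canonical_path(files_dir, _id):
    """Assumed layout: the first two ID characters name the subdirectory."""
    return Path(files_dir) / _id[:2] / _id[2:]


def relocate(dotdmedia, old_id, new_id):
    """Move a verified file from files0/ into files/ and leave a symlink behind."""
    old = canonical_path(Path(dotdmedia) / "files0", old_id)
    new = canonical_path(Path(dotdmedia) / "files", new_id)
    new.parent.mkdir(parents=True, exist_ok=True)
    old.rename(new)  # files0/ and files/ share a filesystem, so this is cheap
    # Relative link, so it still resolves if the drive is mounted elsewhere.
    old.symlink_to(os.path.relpath(new, old.parent))
    return new
```

A relative symlink is a deliberate choice here: a removable drive may be mounted at a different path next time, and an absolute link would then break.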

Database migration

The CouchDB database names are all versioned, for example:

dmedia-0
dmedia-0-<PROJECT_ID>
novacut-0
novacut-0-<PROJECT_ID>

When the migration takes place, the old databases will be left as-is, and we'll migrate document by document into the new databases:

dmedia-1
dmedia-1-<NEW_PROJECT_ID>
novacut-1
novacut-1-<NEW_PROJECT_ID>
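A sketch of mapping old database names to new ones follows. It assumes the project ID appears lowercased in the name, because CouchDB only allows lowercase database names. The ID conversion itself is passed in as `convert_id` and takes and returns uppercase IDs. `v1_db_name` is a hypothetical helper.

```python
import re

# Matches "dmedia-0", "novacut-0", and "<app>-0-<lowercased 24-char project ID>".
DB_NAME = re.compile(r"(dmedia|novacut)-0(?:-([a-z2-7]{24}))?")


def v1_db_name(old_name, convert_id):
    match = DB_NAME.fullmatch(old_name)
    if match is None:
        raise ValueError(f"not a V0 database name: {old_name!r}")
    app, project_id = match.groups()
    if project_id is None:
        return f"{app}-1"
    # IDs are uppercase; CouchDB database names must be lowercase.
    return f"{app}-1-{convert_id(project_id.upper()).lower()}"
```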

Dmedia and Novacut use two different types of IDs: random IDs, which are 120-bits (24 characters when base32 encoded); and intrinsic IDs, which are based on a content hash and are 240-bits (48 characters when base32 encoded).

As we migrate the schema, we need to replace old IDs with the new IDs. This includes the doc._id, but also references to other doc IDs that occur inside a document. Novacut especially uses a lot of references like this, but Dmedia uses them quite a bit too.

Migrating the random IDs is easy: we just decode the Base32 ID, then encode it with D-Base32:

new_id = db32enc(b32dec(old_id))
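Here is a self-contained version of that one-liner. It assumes D-Base32 uses the same 5-bit grouping as RFC 4648 Base32 with only the alphabet swapped, so the standard library's `base64` plus a translation table can stand in for the real `db32enc`. In practice you'd use the dbase32 package itself. `db32enc_sketch` and `migrate_random_id` are hypothetical names.

```python
import base64

B32_ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
DB32_ALPHABET = b"3456789ABCDEFGHIJKLMNOPQRSTUVWXY"
_TO_DB32 = bytes.maketrans(B32_ALPHABET, DB32_ALPHABET)


def db32enc_sketch(data: bytes) -> str:
    """D-Base32 encode, assuming RFC 4648 bit grouping with a swapped alphabet."""
    if len(data) % 5 != 0:
        raise ValueError("length must be a multiple of 5 bytes (no padding)")
    return base64.b32encode(data).translate(_TO_DB32).decode("ascii")


def migrate_random_id(old_id: str) -> str:
    """120-bit random ID: decode Base32, re-encode with D-Base32."""
    data = base64.b32decode(old_id)
    if len(data) != 15:
        raise ValueError("expected a 120-bit (24-character) random ID")
    return db32enc_sketch(data)
```

For example, `migrate_random_id("A" * 24)` returns `"3" * 24`, since both strings encode 15 zero bytes. The D-Base32 alphabet is in ASCII order, so migrated IDs also sort in the same order as their binary values.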

Migrating the intrinsic IDs takes a bit more. For file IDs, we need a table mapping old IDs to new ones, and to compute the new ID we need to actually have the file available so we can re-hash it according to the new protocol. I still need to put more thought into how to handle the case where the FileStore containing a given file isn't currently connected (say, it's on a removable drive, or it's on the internal drive of another computer).
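The sketch below rewrites a whole document, including references nested in values and dict keys. It uses a length-based heuristic to spot old IDs: 24 Base32 characters for a random ID and 48 for a file ID. Random IDs are re-encoded directly. File IDs are looked up in the old-to-new table (`file_ids`), and a missing entry raises `FileNotAvailable` so the document can be deferred and retried later. All names here are hypothetical. The peering IDs (user and machine) are also 48 characters and would need to be recognized separately, since they're handled by re-peering rather than a table.

```python
import base64
import re

_TO_DB32 = bytes.maketrans(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
                           b"3456789ABCDEFGHIJKLMNOPQRSTUVWXY")
# Heuristic: an old random ID is 24 Base32 characters, an old file ID is 48.
OLD_ID = re.compile(r"[A-Z2-7]{24}|[A-Z2-7]{48}")


class FileNotAvailable(Exception):
    """The file behind an old file ID hasn't been re-hashed yet."""


def convert(value, file_ids):
    if not isinstance(value, str) or not OLD_ID.fullmatch(value):
        return value  # doesn't look like an old ID; leave it alone
    if len(value) == 24:
        # Random ID: decode Base32, re-encode with D-Base32.
        return base64.b32encode(base64.b32decode(value)).translate(_TO_DB32).decode()
    try:
        return file_ids[value]  # file ID: needs the old -> new table
    except KeyError:
        raise FileNotAvailable(value) from None


def rewrite(obj, file_ids):
    """Recursively convert old IDs found in values and in dict keys."""
    if isinstance(obj, dict):
        return {convert(k, file_ids): rewrite(v, file_ids) for k, v in obj.items()}
    if isinstance(obj, list):
        return [rewrite(v, file_ids) for v in obj]
    return convert(obj, file_ids)


def migrate_document(doc, file_ids):
    # "_rev" belongs to the old database; the new doc is a fresh insert.
    return rewrite({k: v for k, v in doc.items() if k != "_rev"}, file_ids)
```

Because unresolved file IDs raise instead of being silently skipped, a caller can migrate whatever is available now and retry the deferred documents once the relevant FileStore is connected.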

Fortunately, it's quite easy to do the migration incrementally. But we might need some user prompting for this; I'm not sure it's something we can (or should) do transparently in the background.

The other type of intrinsic ID is the user and machine IDs used for the secure device peering (and, soon, the Novacut account infrastructure). These IDs are based on the hash of the RSA public key. Unfortunately, there isn't really a way to do this migration without requiring you to re-peer your devices. But as this is really easy, I don't feel too bad about that :P

Extra Novacut migration

During this migration, we're switching to the newer Novacut schema that I've been working on here:

https://code.launchpad.net/~jderose/novacut/gst-1.0

This is a good example of why we version the databases rather than individual documents: it's so we can gracefully do migrations that aren't 1 document to 1 document.

The newer Novacut schema supports generalized multi-track audio, whereas the old schema was kinda a hack that could only use the audio track located in the video, and only along the exact same time-slice as the video.

So when the migration happens, each Novacut slice with {"stream": "both"} will result in 2 documents in the new database: one node for the video, another for the audio.
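A minimal sketch of that one-to-two split follows. Only the `"stream"` field comes from the description above. The `"video"`/`"audio"` values and the choice to keep the existing `_id` on the video doc are assumptions. Keeping that `_id` means existing references still resolve. The audio doc gets a fresh ID from a caller-supplied `make_id`. `split_slice` is a hypothetical name, and the real new schema likely differs.

```python
import copy


def split_slice(slice_doc, make_id):
    """Turn one V0 slice with stream "both" into separate video and audio docs."""
    if slice_doc.get("stream") != "both":
        return [slice_doc]
    video = copy.deepcopy(slice_doc)
    video["stream"] = "video"  # keeps the original _id (assumed design choice)
    audio = copy.deepcopy(slice_doc)
    audio["_id"] = make_id()  # the audio node needs its own new random ID
    audio["stream"] = "audio"
    return [video, audio]
```

This kind of one-to-many transformation is easier when it happens during a whole-database migration, as described above, than when each document is upgraded in place.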