Linus Torvalds
Linus's posts

Congrats to +SpaceX for the successful re-use and re-landing of the first stage.

Following the live feed is really quite amazing, especially when the SpaceX crowd ends up cheering on success. 

I thought I'd write an update on git and SHA1, since the SHA1 collision attack was so prominently in the news.

Quick overview first, with more in-depth explanation below:

(1) First off - the sky isn't falling. There's a big difference between using a cryptographic hash for things like security signing, and using one for generating a "content identifier" for a content-addressable system like git.

(2) Secondly, the nature of this particular SHA1 attack means that it's actually pretty easy to mitigate against, and there have already been two sets of patches posted for that mitigation.

(3) And finally, there's actually a reasonably straightforward transition to some other hash that won't break the world - or even old git repositories.

Anyway, that's the high-level overview, you can stop there unless you are interested in some more details (keyword: "some". If you want more, you should participate in the git mailing list discussions - I'm posting this for the casual git users that might just want to see some random comments).

Anyway, on to the "details":

(1) What's the difference between using a hash for security vs using a hash for object identifiers in source control management?

Both want to use cryptographic hashes, but they want to use them for different reasons.

A hash that is used for security is basically a statement of trust: and if you can fool somebody, you can make them trust you when they really shouldn't. The point of a cryptographic hash there is to basically be the source of trust, so in many ways the hash is supposed to fundamentally protect against people you cannot trust other ways. When such a hash is broken, the whole point of the hash basically goes away.

In contrast, in a project like git, the hash isn't used for "trust". I don't pull on people's trees because they have a hash of a4d442663580. Our trust is in people, and then we end up having lots of technology measures in place to secure the actual data.

The reason for using a cryptographic hash in a project like git is because it pretty much guarantees that there are no accidental clashes, and it's also a really really good error detection thing. Think of it like "parity on steroids": it's not able to correct for errors, but it's really really good at detecting corrupt data.
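Just to make the "content identifier" part concrete, here's a quick standalone Python sketch (not git's actual code) of how git names a blob: SHA1 over a small "blob <size>" header plus the raw contents. Flip a single bit anywhere and you get a completely different name, which is exactly the error-detection property that matters here.

import hashlib

def git_blob_id(data):
    # git hashes the header "blob <size>\0" followed by the raw file contents
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()

good = b"int main(void) { return 0; }\n"
corrupt = bytearray(good)
corrupt[0] ^= 0x01   # a single flipped bit

print(git_blob_id(good))             # the same id "git hash-object" would give you
print(git_blob_id(bytes(corrupt)))   # a wildly different id - the corruption is obvious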

Other SCMs have used things like CRCs for error detection, although honestly the most common error handling method in most SCMs tends to be "tough luck, maybe your data is there, maybe it isn't, I don't care".

So in git, the hash is used for de-duplication and error detection, and the "cryptographic" nature is mainly because a cryptographic hash is really good at those things.

I say "mainly", because yes, in git we also end up using the SHA1 when we use "real" cryptography for signing the resulting trees, so the hash does end up being part of a certain chain of trust. So we do take advantage of some of the actual security features of a good cryptographic hash, and so breaking SHA1 does have real downsides for us.

Which gets us to ...

(2) Why is this particular attack fairly easy to mitigate against, at least within the context of using SHA1 in git?

There are two parts to this one: one is simply that the attack is not a pre-image attack, but an identical-prefix collision attack. That, in turn, has two big effects on mitigation:

(a) the attacker can't just generate any random collision, but needs to be able to control and generate both the "good" (not really) and the "bad" object.

(b) you can actually detect the signs of the attack in both sides of the collision.

In particular, (a) means that it's really hard to hide the attack in data that is transparent. What do I mean by "transparent"? I mean that you actually see and react to all of the data, rather than having some "blob" of data that acts like a black box, and you only see the end results.

In the pdf examples, the pdf format acted as the "black box", and what you see is the printout, which has only a very indirect relationship to the pdf encoding.

But if you use git for source control like in the kernel, the stuff you really care about is source code, which is very much a transparent medium. If somebody inserts random odd generated crud in the middle of your source code, you will absolutely notice.

Similarly, the git internal data structures are actually very transparent too, even if most users might not consider them so. There are places you could try to hide things in (in particular, things like commits that have a NUL character that ends printout in "git log"), but "git fsck" already warns about those kinds of shenanigans.
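As an illustration of how mundane that kind of check is (this is just a rough standalone sketch, not the actual fsck code), you can walk the commits yourself and flag any commit object whose body contains a NUL byte:

import subprocess

def commits_with_nul(repo="."):
    # every commit reachable from any ref
    revs = subprocess.run(["git", "-C", repo, "rev-list", "--all"],
                          capture_output=True, check=True).stdout.decode().split()
    bad = []
    for rev in revs:
        # the raw commit object body, i.e. what "git log" ends up formatting
        body = subprocess.run(["git", "-C", repo, "cat-file", "commit", rev],
                              capture_output=True, check=True).stdout
        if b"\0" in body:
            bad.append(rev)
    return bad

print(commits_with_nul() or "no NUL bytes hiding in any commit")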

So fundamentally, if the data you primarily care about is that kind of transparent source code, the attack is pretty limited to begin with. You'll see the attack. It's not silently switching your data out from under you.

"But I track pdf files in git, and I might not notice them being replaced under me?"

That's a very valid concern, and you'd want your SCM to help you even with that kind of opaque data where you might not see how people are doing odd things to it behind your back. Which is why the second part of mitigation is that (b): it's fairly trivial to detect the fingerprints of using this attack.

So we already have patches on the git mailing list which will detect when somebody has used this attack to bring down the cost of generating SHA1 collisions. They haven't been merged yet, but the good thing about those mitigation measures is that not everybody needs to even run them: if you host your project on something like http://github.com or kernel.org, it's already sufficient if the hosting place runs the checks every once in a while - you'll get notified if somebody poisoned your well.
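To give a feel for what such a hosting-side sweep could look like, here's a hand-wavy Python sketch. The sha1_is_suspicious() helper is a made-up placeholder for whatever collision-fingerprint check the actual patches implement; the point is only that the hosting side can periodically walk every object and flag the ones that show the attack's fingerprints.

import subprocess

def sha1_is_suspicious(data):
    # Placeholder only: a real check would look for the known collision
    # "disturbance vector" fingerprints in the SHA1 computation over the data.
    return False

def scan_repository(repo="."):
    # enumerate every object in the repository, whatever its type
    listing = subprocess.run(
        ["git", "-C", repo, "cat-file", "--batch-all-objects",
         "--batch-check=%(objectname) %(objecttype)"],
        capture_output=True, check=True).stdout.decode()
    flagged = []
    for line in listing.splitlines():
        name, objtype = line.split()
        raw = subprocess.run(["git", "-C", repo, "cat-file", objtype, name],
                             capture_output=True, check=True).stdout
        if sha1_is_suspicious(raw):
            flagged.append(name)
    return flagged

print(scan_repository() or "nothing suspicious found")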

(3) And finally, the "yes, git will eventually transition away from SHA1" part. There's a plan, it doesn't look all that nasty, and you don't even have to convert your repository. There are a lot of details to this, and it will take time, but because of the issues above, it's not like this is a critical "it has to happen now" thing.

Today is exactly twenty years since I moved to the US.

So has anybody tried out the new Credit Karma tax filing yet? I'm going to try regardless (they've been great for checking credit history), but it would be lovely to hear about experiences...

The first step is to admit you have a problem.
Photo

I've posted model builds here before.

This time I decided to try something different. One of the laser cut metal models - "Fascinations Metal Earth".

They make a big deal about how there is no glue or solder needed to build them.

And it's true. What keeps these things together is the tears of frustration when you try to fit the invisibly small tabs into the invisibly small holes.

The thing looked much bigger in pictures. It is tiny. 
Photo

Post has shared content
It's almost a year since I made a post here on G+ asking for people's comments about gas compressibility calculations in scuba diving.

The resulting more accurate breathing gas compressibility calculations are one (very small) part of the new +Subsurface 4.6 release.

It has already successfully confused several people who learnt to do their SAC rate calculations using the ideal gas law.

Realistically, all the other changes listed in the announcement are much more important, but I had nothing to do with them, so they don't count.
The Subsurface developer team is happy to announce release 4.6 of the Subsurface dive log program.
You will find many improvements to the user experience (including a fix for spurious errors saving to cloud storage, improvements to the Facebook integration, many improvements to the dive planner and a really cool new heatmap for visualization of tissue loading during a dive). Subsurface can now download data directly from a number of new dive computers (thanks to Jef and the rest of the libdivecomputer developers) and also import several new data formats.
And of course there are a whole lot of bugfixes.
Look at the full announcement below, which also includes links to the binaries for Windows, Mac, and Linux.
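Since the SAC rate point above keeps tripping people up, here's a rough Python sketch of where the difference comes from (the Z factor below is completely made up, purely to show the shape of the correction; the real compressibility data is whatever Subsurface uses): the ideal gas law says the gas you used is just delta-P times the tank volume, while real air needs a pressure-dependent compressibility factor before you normalize by time and ambient pressure.

def surface_volume(p_bar, tank_liters, z):
    # gas in the tank expressed as liters at 1 bar; ideal gas is just p * V,
    # the Z factor corrects for how real air packs at high pressure
    return p_bar * tank_liters / z

def sac_rate(p_start, p_end, tank_liters, avg_depth_m, minutes, z=lambda p: 1.0):
    used = (surface_volume(p_start, tank_liters, z(p_start))
            - surface_volume(p_end, tank_liters, z(p_end)))
    ambient_bar = 1.0 + avg_depth_m / 10.0     # rough seawater approximation
    return used / (minutes * ambient_bar)      # surface liters per minute

# ideal gas law vs a purely illustrative compressibility correction
print(sac_rate(200, 50, 12, 15, 45))
print(sac_rate(200, 50, 12, 15, 45, z=lambda p: 1.0 + 0.0002 * p))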

Google WiFi update.

So I talked about my Google WiFi experience a couple of weeks ago; here are some more updates from having a bit more experience with it.

Notably, I got my fourth unit, and what remains the real bright spot with Google WiFi is the easy setup. Adding the unit to the network was truly trivial, and when I then plugged it into ethernet at my office, things "Just Worked(tm)".

That's what I like to see.

That experience also just reinforced my opinion that these mesh systems really should be able to mesh both wirelessly and over ethernet - the wireless mesh is great for the trivial cases and for easy setup, and the wired mesh is what makes it so good to expand on the system and get to places that would otherwise be unreachable or cause unnecessary extra wireless hops.

So I continue to be a fan. I don't think I ever want to deal with a traditional wireless router again (but I'll make a separate post about using the Ubiquiti system in more challenging environments).

Small details that have cropped up in the meanwhile:

DHCP Reservations:

As with every system before, I did end up having to make dhcp address reservations for the printers after all. Without doing that, discoverability is just too flaky. I'm sure the whole PnP experience works for some people without it, and maybe it's the particular printers I have, but giving the printers a static IP address just helps with all those situations where they otherwise don't seem to be discoverable.

This isn't specific to Google WiFi, and the App makes it fairly easy to do. But it would be more natural to do it when looking at the device in the network overview screen, rather than having to go into "Settings" and "Advanced networking".

What I'm left missing (and nobody else seems to do that either) is to add a new DNS entry for the device when I do this. The dhcp names are good as far as they go, and the DNS client on the router does the right thing with them, but having a printer named something like "HP874661" is not exactly a human-friendly name.

In fact, the IP address is easier to remember than the odd dhcp name. So I'd like to be able to add an "office-printer" DNS alias when I assign the IP to the device (or even without assigning an IP to it - some things are fine to leave as dynamic addresses, but you might still want to have a local name to reach them).

And on that note:

Like a number of other fancier routers, Google WiFi does traffic tracking, and lets you name your devices so that it's easier to see exactly which device does what (so you can have "Linus' Pixel Phone" instead of some ambiguous "Android-2" device). This isn't the DNS alias I'm asking for, but it makes it much easier to read the statistics. Good.

And what I found interesting was how much more useful this was when you just carry your phone around with an App, rather than having a web interface on your computer. I've used routers with per-device network statistics etc before, and I've named the major devices before, but Google WiFi made it really easy to just walk around and see "ok, that name refers to this piece of equipment" and give them all more useful human-legible names.

As a result, I ended up naming everything, including things like my Rachio sprinkler controller etc. Things that I've never bothered with before, because it was just not very convenient. Walking around with a phone I could just go to the kids and say "ok, show me your phone settings screen so that I can tell which IP is your phone".

So the "configure everything on your phone with the App" clearly has some secondary convenience advantages. I'd have liked to be able to filter devices (by IP address and by which unit they were connected to), but even without that, the app just made some things much simpler.

However, I do note that not having a traditional web interface at all then makes the "what the hell was the printer called again" problem much worse. If I'm at the kids' computer, configuring the printer setup, and I don't have my phone with me, I can't just look it up on the router config web page on the same device that I'm trying to configure the printer on.

So you win some, you lose some.

But that issue made me really want those DNS aliases, because it's so hard remembering what IP address you picked for the printer, or what the crazy dhcp name for the printer was. Let me just call it "office-printer" or something. 

I take back everything I said about the nasty weather.

It got a bit colder and actually snowed. We got almost a foot overnight, and now outdoor activities are on. It's actually pleasant outside.

Of course, in our family, outdoor activities are apparently limited to not even bothering to clear the hot tub cover. 
Photo

So a winter storm warning is in effect here, with freezing rain and just generally miserable conditions. Everybody is staying inside, because outside is basically trying to kill you by having cars sliding around like greased pumpkins.

But somebody always thinks there is a silver lining.

The mobile weather information from http://weather.com happily tells you:

Air quality: good
Ideal air quality for outdoor activities

No, http://weather.com. Freezing weather with ambiguous snow/rain/ice falling from the sky, and the ground covered with ice is definitely not "Ideal air quality for outdoor activities".

The particulate counts don't really come into the picture at all, in fact.