Shared publicly  - 
Do I get a prize for obfuscated (but efficient!) code?

I'd like to get rid of that loop that removes trailing slashes, it predicts horribly badly (ie common case is "exactly once through the loop") but while I could play with separate masks for NUL and '/' characters, that runs into register pressure issues. I suspect I should just special-case the "one slash" thing and avoid the mispredict that way.


Resulting inner loop:
#define ONEBYTES 0x0101010101010101ul
#define SLASHBYTES 0x2f2f2f2f2f2f2f2ful
#define HIGHBITS 0x8080808080808080ul

/* Return the high bit set in the first byte that is a zero */
static inline unsigned long has_zero(unsigned long a)
return ((a - ONEBYTES) & ~a) & HIGHBITS;

* Calculate the length and hash of the path component, and
* return the beginning of the next one (or the pointer to the
* final NUL character if none).
static inline const char *hash_name(struct qstr *str, const char *name)
unsigned long a, mask, hash, len;

str->name = name;
hash = a = 0;
len = -sizeof(unsigned long);
do {
hash = (hash + a) * 11;
len += sizeof(unsigned long);
a = *(unsigned long *)(name+len);
/* Do we have any NUL or '/' bytes in this word? */
mask = has_zero(a) | has_zero(a ^ SLASHBYTES);
} while (!mask);

/* The mask below the first high bit set */
mask = (mask - 1) & ~mask;
mask >>= 7;
hash += a & mask;
str->hash = fold_hash(hash);

/* Get the final path component length */
len += ffz(mask) / 8;
str->len = len;

/* remove trailing slashes */
while (name[len] == '/')

return name+len;
Marcin Baczyński's profile photoMichael Clements's profile photoMiguel Angel Gordian's profile photoYang Yuxi's profile photo
Ugh. G+ does something bad to formatting. Is there a way to quote code in G+ postings?
I don't think that "prize" is quite the right word for what that function needs. :)
Lower-left corner, "send feedback"! ;)
There really ought to be a way to have code formatted correctly, but I don't think there is. It would be nice if Google+ supported markdown or something similar. If it were FLOSS you could submit a patch, but...
Nice one, I like it. Kind of job security (which is non-issue for you, but well, I could use a few tricks ;)
I do have to agree with you that / mis-predicts things terribly.

As far as showing code inside G+, perhaps +Dave Besbris or +Chee Chew can provide some insight on that one. 
Maybe if Google shared their code with us we could fix that quoting for them.
Hmm, althought there are like zero people here in google+, I still like getting a source code as an status update rather than "I just had breakfast!"
As for sharing code snippets, IMO GitHub works great.
+Linus Torvalds I wish there was, but sadly G+ is lacking in rich text formatting features. This has been a problem since the beginning and Google really needs to address it.
Ken Wills
Would be great if {{{ ... }}} wiki syntax could be used for code / preformatted.
G+ needs to support markdown like GitHub and Stack Exchange, not just baby markdown.
Make a screenshot. Perhaps hide it behind some layers of other stuff. You wanted obfuscated...
Aside from the possibly unaligned load of a long, this looks pretty interesting. I've been meaning to speed up some code I wrote that needs to search for ':', '?' and '#' (parsing of URLs) and this might come in handy.
If it's well-documented, it's not obfuscated.
I think I just threw up a little in my mouth, Linus.
Pretty darn efficient as it is. I tend to try things like that in reverse just to see how it comes out.
Ah damn, someone else has already beaten me to the IOCCC page link...
I'm betting my money there will be a formatting feature in the next Google plus update! Send feedback everybody!
+Thiago Macieira: it's not only limited to architectures that can do unaligned loads efficiently, it also requires little-endian, and that you can overshoot the string by up to sizeof(long)-1 bytes.

So no, it's not portable. It's code I'm playing around with that will need a config option to be enabled, but it does speed up pathname lookup quite noticeably.

It used to be that our pathname lookup issues were all about locking, we largely solved those last year, so I was looking at profiles and thinking "what if.."
+Linus Torvalds, I have to agree that a paste bin or git link would be your best bet. Also, so glad I saw your rant and hit "follow" the other day!
I dont know anything about code, what I do know, is that you have to use Git, to avoid being ugly and stupid.
Did you use Git;-)?
speaking of bad code, the right usage of ie is i.e., not ie.
+Linus Torvalds I'm going to embarass myself here, and maybe I just haven't had enough caffeine, but shouldn't the last loop be:

/* remove trailing slashes */
while (name[len] == '/')

An ugly, but possibly workable convention:
sed 's/ /_./g' (replace all pairs of spaces with "_." or some such.
I have seen worse, so I don't know if is the code I read or something else.

At least it has comments in it :D
+Peter Sitterly: no, when it does the "remove trailing slashes" it really means "skip them" - we want to ignore them. They shouldn't be part of the hash calculation, and we want to return the pointer to past them.
+Linus Torvalds Gotcha! I think I'm just confusing "leading" with "trailing" or something. Let me grab another soda.
+Ben Kevan: no, the grammar people are just wrong on that one. It's a shorthand. I'm not adding two pointless characters to something that is meant to be short.

Putting commas/periods inside of quotes is also completely bogus. So it should "look like this", not "look like this."
Screenshot of code in a proper editor or a link is your best bet, I sometimes dump code to a presentation. So far our G+ formatting options are limited to bold *, italic _, strikethrough -, and relinquish all whitespace to the whims of the machine. I've gone so far as to replace tabs with periods, but even scripted in app, its ugly and cumbersome. Something tells me Googlers' share code with a secret blockquote tag.
+Linus Torvalds I get it now. It was the comment that threw me off. I suppose I imagined it more like "remove leading slashes" but, really, it's a misnomer. In that case, shouldn't the comment be:

/* remove slashes */


Submitted request for markdown/up/sideways for code.
You're the first one I saw posting code in Google+. With formatting it could become a social snippet brainstorming tool. Or a mess.
As long as individual comments are expandable/collapsible.
Use some Unicode whitespace character, e.g. U+2002. This is like +Eric Blossom 's idea in that it requires conversion, but at least it is readable without tricks. Windows victims might have problems entering the character, on Linux just type Ctrl-Shift-u, then hex characters, then space to finish.
 single leading space
  two leading spaces
no leading space
G+ really needs fixed-width code format mode! =(
+Carl Myers Is just took a piece of code and changed leading spaces to U+2002 in vi, then copied it here:

(defun modpow (b e n)
  (if (= e 0) 1
    (if (< e 0) (error "modpow with neg. exponent")
      (if (evenp e) (modpow (mod (* b b) n) (ash e -1) n)
        (mod (* b (modpow b (1- e) n)) n)))))
And? What does this (dirty) optimization look like in terms of performance gains? I'm curious ;)
I feel useless for not knowing any programming language... What a shame...
Nothing wrong with optimizing for special cases. And comments remove some of the confusion. Can you ever have too many comments?
+Ralf Muschall In Windows, if you're using Chrome, it's a piece of cake.

Just hit CTRL-SHIFT-J, then type document.getElementById(':m.f').innerHTML = document.getElementById(':m.f').innerHTML + '\u2002';

     It couldn't be simpler. :/
Uau man. You're amazing doing the real code!
+Baka Lea Not even a little. It's because I have to dedicate to study and to enter a Computer Science university.
+Baka Lea Ah, shucks, G+ dynamically creates a new id for the comment box, so a bit more code would be needed. Why do I use Windows again?
Okay, that C code looked far easier to understand than code in comments imho. :) From Lisp to JavaScript... somehow. :)
+Ralf Muschall IIRC you can hold alt and type the decimal number to get non-typeable characters. I would test it, but I haven't had a Windows machine in years.
+Thomas McGrew This has been working less and less these days. For instance, this is supposedly U+2002 ~~~>☻<~~~
Didn't think so.
Keep writing obfuscated code like that and you'll be the only one working on the kernel again. ;)
I wish I had more context of what Aptenodytes Rex is working on ...
I was confused at first with the name has_zero() because I looked at the use of it before looking at the function itself. The name has_zero() sounds like it would either return true or false if a zero exists, but that didn't make sense with its use. Then I thought that it would set the the bytes with zero in it. That looked better, but the code failed later on. Then I realized that the function was defined above (I hate the way g+ posts code), and after reading the code for has_zero() and the comment you wrote, I see that it only set the highest byte. It should probably be called "fist_zero()". Then the use of it wouldn't be so confusing. Unless you want to make it even more confusing than it already is :-)
Without having spent any time thinking about the problem, perhaps:

len += name[len] == '/';
while (unlikely(name[len] == '/'))

... can get gcc to optimize for the common case (0 or 1)...
I wonder how good or bad using repe scasb for the slash at the end would be...
The last line is definitely not intuitive, but it does turn into some nice assembly.
+H. Peter Anvin: rep scasb is probably not great for this - the overheads just kill you.

That said, the best thing I have found so far is pretty disgusting too, and replaces that final part with


name += len;

/* remove trailing slash: very unpredictable, don't make it a conditional */
name += *name == '/';

/* More trailing slashes? Very predictably not the case ;) */
while (unlikely(*name == '/'))

return name;
instead. Yeah, it actually seems to be measurably faster that way on the microbenchmark test I used. But at the point where I start playing games with avoiding badly predicted branches, I don't think I really care that deeply any more.
Hm, that looks slightly similar to what I had above ;)
You can take the & HIGHBITS out of has_zero(), if that hasn't been optimized away.
I'll ask a candidate about this code next time, thanks!
I'll give you a high five as a prize
Truly, obfuscated yet efficient (obfficient?) code is where the fun is at
Glad I don't maintain Linus's code. First thing I'd do is re-write it to be more obvious and less intellectual masturbation.
First saw the has_zero() style in code that was already old in the 80's (on a system with 60-bit words and 6-bit characters). Somewhat disappointing that compilers still don't implement this this optimization 25+ years later.
+Thefclef sevensevenseven I think +Linus Torvalds has this all figured out now. Just wait till he posts a YouTube video of his cat ranting on security people, while improving on his obfuscated C code. After that, surely the world will be his oyster.
Funny, but what will call hash_name? I can't seem to find it from lxe free-electronic...
+Linus Torvalds, can upload your paste as an image to the instant upload album in picasa, making it easy to share on g+. But only for pastes up to 80x24 so wouldn't help you this time.
Jar Wo
just, looking!
0 people on g+? Oh no, you are all just the voices in my head. XD
+Linus Torvalds: oops, I hadn't seen the endianness dependency. But now that you mention it, it's clear. Side question: is SSE usage in the kernel limited? Because this algorithm would work best in SSE and at 16-byte steps. In fact, it's so close that I guess you wrote SSE and then tried to figure out how to do the same without.

EDIT: the scanning would work in SSE, but the hashing not so much.
G+ should get gist support, in the same way they currently have youtube support, I think.
I could be wrong but: You should never try to optimize branch prediction using a compiled language. Inline the ASM if you must optimize a BP.

1. on a different CPU-ARCH it could actually become more expensive.
2. using a different compiler, it could also be more expensive.
3. in the future the gains will be insignifficant.
4. novice coder's are looking at your code like: WTF!!!?

On the other hand, you can probly never over optimize using bit hacks. Its the closest you can get to assembly in C without writing inline assembly.

If I am incorrect, please educate my ignorance.

If there were a prize it would be the "Bit Hackers" achievement award but in light of the context you are quite obviously due the "Bit Hacking Expert" award.

I like the idea that you will put this in misc.conf but if you are indeed screwing around with arch code and BPs it belongs in arch.conf along with any optimizations that are particular to that arch. -IMHO

Actually I don't even know what I'm talking about! I can read C code but can't write it just yet. I don't have the time to code with her day and night ATM. But that's all stuff I learned in my x86 assembler days ;)

Hmmm I think it suits a coder well to learn ASM before any compiled language.
+Linus Torvalds you got news reports from your security rant, loads of +1's for your cat picture: I just don't see the same level of support for masking for a '/', you needed a picture of a fluffy bird next (possibly after been eaten by the afore mentioned cat)! :)
I always thought I know how to program, until I saw the qsort function in glibc about 15 years ago. Since then I knew there are coders, and there are uber-hackers. And boy is it nice to read stuff like that.
I now know I am a mere coder and not an uber-hacker :-(.
+Thomas McGrew Alt+numbers is rather broken. I guess it was invented back then for latin1, i.e. characters below 256. In addition it behaves differently depending on whether you include a leading zero or not, and in my experiments the codepoints of neither results seemed to coincide with the number I typed. The best method is to get a real real editor (use M-x ucsinsert) or at least vi (use C-v u exactly_four_hexes), then C&P.
That's not obfuscated, that's a normal string function with a little bit hacks, prop. Linus stumbled over

Making it work for big-endian is also possible (and if only by swapping the read long), and if it wasn't for the hash calculation, you could make it work with strict-align targets (calc the missalignedment, fetch from an aligned address under the real pointer, mask out the data that is irrelevant), actually you always want to do the hard alignment anyway to get in swing with the page boundary even on unaligned targets, so you get no problems with the over read at a page boundary.
Or maybe you could make the hash work with strict alignment, i don't feel like understanding that hash now.
Well documented code that is faster is always preferable to slow code as far as I'm concerned.
"Do I get a prize for obfuscated (but efficient!) code?" <-- You'll get flayed.
Depending on how you want to write it, you could probably also do something like:

if (likely(name[len] == '/')) do {
} while (unlikely(name[len] == '/'));

In this style, it looks half-way good.
Hey, Linus! Use this config for
Just create it, and paste these strings and Policykit never anymore disturb you!
[Per user]

You don't need to change distro :)
ajeeee gileeeee bang linus jago bangettt
Oh please write the readable code and patch the compiler to generate the efficient code from it.
ie => id est or internet explorer?

you are not normative.
+Linus Torvalds if you have any interest in formatting your source code on Google+ you can paste it in's "New Paste" box and then copy it out and paste into you Google+ post. The tabs get converted in such a manner that the formating is preserved. Of course no indentation does add nicely to the obfuscation!

/* Return the high bit set in the first byte that is a zero */
static inline unsigned long has_zero(unsigned long a)
return ((a - ONEBYTES) & ~a) & HIGHBITS;
Code? WTF? We want you to post more pussy pictures!
Some of us have entered octal codes with three fingers at a time on the the front panel of a GEPAC 4020 with heated core memory. It takes more than a few XORs to clear the "obfuscated" threshold.
+1 obficient. Thanks for the challenge. I would like to see code to avoid crossing the boundary after end of string? Someone hinted at a solution above but I fail to figure it out in me head.
+Thiago Macieira: it really is the other way around: the new string instructions in SSE4 came from these kinds of bitwise tricks that have been played for decades.

SSE doesn't work in the kernel, and the SSE4.2 "integer vector instructions" are really mostly kind of sad. Many of them would be lovely on regular registers, in the SSE4.2 world they actually generate extra trouble (that whole "unaligned 8-byte load" is noticeably easier than "unaligned 16-byte load" on many microarchitectures, I bet).

And as you noticed, you do often want to mix integer stuff with the code. Although maybe my "use vectors to create hashes" can't be called "common".
+Linus Torvalds: Yup, I understand that. I've had many instances where I just wanted to do something simple in SSE but there was no instruction available.

At least SSE has the "movemask" instruction, which makes it possible to continue some operations in integer registers.
+Kalle Hallivuori It's "easy".
You still over read the string, but it's ok, input after the end is ignored (you check for the terminator), and you won't hit a wrongly page fault because your pointers are aligned:
When you align your load pointer natural (alignment == type size), as long as you use that pointer, you are either within a page (and check for the terminator), or are clear to go to the next page. All you have to do is to get into the swing at start.

For this you align the pointer "down":
ptr = (const unsigned long *)(((intptr_t)src) & ~((intptr_t)(sizeof(unsigned long)-1))
You also have to figure out how many bytes of the next load will be "garbage":
shift = src - (const char*)ptr;
which can be anything from 0 (pointer was aligned) to sizeof(type)-1 (you only have 1 "good" byte).
now you can load:
r = *ptr;
mask out the garbage:
r |= shift ? ONEBYTES >> ((sizeof(unsigned long) - shift)CHAR_BIT) : 0; / little endian */
(for big endian shift direction is then to the left)
and make the test:
r = has_zero(r);
After that you don't need the shift any longer (at least in most string routines, this hash business in Linus function could make it a little bit more complicated). Depending on what you are doing, this shift business can be simplified.
+Jan Seiffert: sadly, exactly because the hash is as important as the length, in my example you really can't do the "round address down" thing, unless you then shift the data back up and re-generate the words (which is very expensive to do without hardware support, although some CPU's do have that support - it's how they do unaligned loads to begin with)

And making the hash be something that is independent of the position of the bytes in the world would make the hash much weaker, so that's not an option either.

So you could do it (hey, x86 has double-word shifts that can be used to do that renormalization), but it would likely make things much less efficient.

Accessing the up-to-seven bytes after the string limits us a bit, but it's mostly acceptable for the kernel. It means you can't use this optimization together with some debugging options, but those debug options make everything else so much slower that nobody will ever even notice that they also disable this particular thing.
+Linus Torvalds Yeah, that you have to keep shifting because of the hash, i figured that out. But that's the point where you have two versions: One for UNALIGNED_OK for sane CPUs and one which does the shifting, to punish bad CPU design ;).
But for the over read: are you sure it only triggers some debug options? Is there never a situation in the Kernel where the string ends shortly before the page boundary and the next page is not existent? Wouldn't the kernel also get a SIGSEGV^w ooops?
oh, right, that's what Linus Torvalds needs: another prize for his code. :P
Don't know if you noticed, but your second constant ends with the swedish word for 'ugly' ;).
Can someone perhaps enlighten me? IMO, has_zero returns the high bit set in ALL bytes that are zero, not just the first one as stated in the comment.
Also, the "mask = (mask - 1) & ~mask; mask >>= 7;" part works as intended (cutting away bytes behind the end of the string) only because of little-endianness, right?
+Triston Taylor there are certainly plenty of things that predict differently on different architectures and sub-architectures, but reducing conditional operations is a universal performance improvement, provided you don't increase other work to the point that it offsets that benefit. A great many bit hacks work exactly this way.
I think I'd write SLASHBYTES as ONEBYTES * '/'
or avoid loop by getting Slashdot'd. ok, it's friday and i ran out of pun for the week...
You'll need root access to be able to +1 this post. Fortunately I can. ;)
Ok, so the terminal tells me that a package is not installed, but knows the apt-get command to resolve this. Can I just type Yes to install it, instead of "Opps, apt-get install whatnot, enter"? Or is this a Ubuntu feature?
+Michael Mattes Yes, has_zero sets the high bit for every zero, but because of carry propagation, since we do not use real SIMD instructions, the detected zeros after the first one are unreliable.
And "mask = (mask - 1) & ~mask; mask >>= 7;" creates the "same" mask on big and little endian, but you can only use it in a useful manner on little endian.
+Jan Seiffert Thanks a lot for the clarification. I was beginning to question myself ;)
Also thank you for confirming the second part about the mask. In big endian, you can get problems with that code if there are two NUL bytes within an 8-byte window. In that case, the code would mask out everything behind the last NUL byte instead of behinde the first one.
+Michael Mattes In big endian the main problem is that the bytes are the "wrong way round" within your register for the mask.
"mask = (mask - 1) & ~mask" is a well known bithack to "single out" and extent the first set bit, problem is: this bithack looks from lower to higher bytes, but in big endian you want to look from higher bytes to lower (within your register, that is).
This code is very readable to me. I've seen a lot worse. Just try looking at the JS code of random websites on the Web. Especially old ones.
+William Chambers you missed the sense of my message. It was less the, how you called 'typical F/OSS fanatic' (how can you call me that when you don't know me!) as the message, that Linus could have pasted his code in Diaspora*. Linus wrote: "Ugh. G+ does something bad to formatting. Is there a way to quote code in G+ postings?"
Of course is everybody free to use the 'platform' of his choice. But this is not a platform, this is a social nettwork. And maybe you noticed that I am here as well.
+Raphael Gradenwitz I kind of get what you're saying. But it's not just G+ though. All of the microblogs I've used, Twitter, MySpace, Facebook, Google Plus, none of them care about formatting text. It's like they just thought we'd never need to format text anymore. I feel fortunate enough that Google Plus thinks it to be important enough to allow you to edit a post, thank you Google, but I think we need formatting to come back around. There are plenty of use cases for social networking, and the rest of us need more options. :)
+David Ball Google+ does not have a lot of formatting... But still more than most other social networks/micro blogging services.
The teacher gave us introduce you. You are a genius.good boy
+Linus Torvalds: Out of curiosity I studied your code, and if I'm not mistaken your has_zero() function doesn't do what is expected. If there are leading 0x01 bytes in front of a NUL byte, they are also marked in the mask because of the borrow bit. You could argue, that there are no 0x01 bytes in path stings, and I agree (even with UTF-8). But you also use it for slash detection, and there you have the same effect with the '.' char, since '.' ^ '/' == 0x01. So if you have a directory name like "foobar.../" it will get handled the same as "foobar////".
Edit: Ok, my mistake was to read the integer literals from left to right, which is of course only valid for big endian.
你好,Linus!我在研究linux 0.11。
"horribly badly" english teacher would have a field's either horrible, or bad..not both ;)
There is one way to post code properly on G+, although it is a bit convoluted. Upload the code to Google Drive with the original extension. Remember to deny the conversion if a banner asks you (no effect on C, but important for html). Then share the link via G+. Viewers will need to click on the link, but the indentation is retained and code is highlighted as an added bonus.
Add a comment...