Shared publicly  - 
 
Almost as surprising as "#define struct union".
A hash function for a particular hash table should always be deterministic, right? At least, that's what I thought until a few weeks ago, when I was able to fix a performance problem by calling ra...
16
8
Toke Eskildsen's profile photocongwu chen's profile photoElazar Leibovich's profile photoAlbert Strasheim's profile photo
33 comments
 
Oddly in python, you can retrieve a value with a NaN key. It has just has to be the same instance of NaN; the runtime seems to use `is` instead of `==` when matching the keys.

Not sure how useful this would be though, especially not at the cost of quadratic time growth.
 
One issue with this solution is that "while (1) table[x] = 0;" can now suck up unbounded memory for certain values of x, whereas normally it's guaranteed to use O(1) storage.
 
Yes, testing with Python is very difficult, because once the NaN turns into an allocated Python object, the comparisons assumes that an object is always equal to itself. If it is given the same object twice, it sees the pointers are the same and doesn't do the full compare. Since that's not actually true it can lead to very surprising behavior. I was careful to work around that in the snippet.
 
+Geoffrey Irving thats already an issue because NaN != NaN, the solution prevents it from effectively turning the hash table into a list
 
How does one remove a value from a map with an NaN key? It seems like NaN is the only key where iterating over the map will not let you remove the associated value with delete, it just is a no-op. This seems unexpected, and a likely source of bugs, though I don't see a clearly better approach: http://play.golang.org/p/i60e854DOy
 
+Russ Cox - did I miss the discussion of the 4th alternative? Why not leave it the way it is - allow people to put nan into a hash table, but with bad performance if they do it multiple times - it requires the least amount of extra code, and I find it no less intuitive than the other options.
 
The semantics are unchanged, and completely implied by Go's definition of == and maps. I just made it not take forever to execute.
 
The real WTF is that you're using floating point numbers as keys in a dictionary.
 
Nice fix, though I have to say that I can't actually remember ever having seen a floating point type used as a map key in practice. (I'm sure it happens, but it's not common.)
 
Why should we allow map with floating point key anyhow? If the keys does not have a reflective equality operation, it doesn't make sense to use them as keys. There's no harm with disallowing it, client who really wants to use them as keys, can cast them to byte array, and use them like that (and if he cares about +/-0, he should fix it himself). Sounds better to me than special casing float in a map. (anyhow great post).
 
+Gareth Rees you missed the end of +Russ Cox 's post: "(Note: this is different from the hash table performance problem that was circulating in December 2011. In that case, the predictability of collisions on ordinary data was solved by making each different table use a randomly chosen hash function; there's no randomness inside the function itself.)"
 
Are you sure that code isn't in on the joke? I heard NaNs were an April Fool's joke that got away.
 
The code is demanded by the semantics of NaN, ==, and maps. It is not a joke, unless NaNs are.
 
+Russ Cox it makes more sense IMHO to disallow using NaNs as keys, panicing if someone tries too. After all, I can't see legitimate reason to use NaN as key, and allowing that just gives the user rope to hang himself on.
 
+Vaughn Wood he want it to be available when iterating over the list. "for nan,v := range nansmap {println(v)}" should print all possible values.
 
It's very dangerous to start poking holes in your abstractions. Right now the set of types valid for map keys and the set of types valid for == are exactly the same set. If you remove float64 from the former then you create a new kind of type, which makes the language more complex. Also, you're not just excluding float64 but also any struct or array containing a float64 field or element. It's a very dangerous road to explore, introducing complexity into your language just because you think someone shouldn't do something. Usually only reductions in complexity justifies such a judgment. Contrary to popular belief, floating-point numbers are not always approximations, and it is possible to use them correctly.
 
+Russ Cox Thanks, you have a point. However the reason is not only "someone shouldn't do something", but more "hash key must have an equality operator, and == is not equality operator, since it's not reflective".

But I'm not sure if I understood you correctly.

Do you argue that indeed there are use cases where one will want to use NaN as a key (or any other object with non-reflective or non-transitive equality operation), or do you think that indeed there's no such use case, however since == is defined, we must support them as keys?
 
I think using NaN as a key is dumb, but the language allows it, because trying to disallow it pokes out a large, irregularly-shaped hole. It's unfortunate, but as I said in the post, it is the least unfortunate of the alternatives. Given that, might as well make sure that map operations are still something like O(1).
 
Seems a fair-enough solution; if one’s using NaN-tagging then one can key as bytes rather than floating-point.

> reading m[NaN] never finds any data

It could cause confusion that Python does let one lookup a NaN key sometimes because of the id() check, e.g.

$ cat nan.py
def f(): return float('nan')
d = dict((k, id(k)) for k in (f(), f(), f()))
print repr(d)
for k in d.keys(): print k, d[k], id(k)
$ python nan.py
{nan: 9610768, nan: 9610744, nan: 9610720}
nan 9610768 9610768
nan 9610744 9610744
nan 9610720 9610720
$

Has anyone pinged the Python crowd so they know about the long hash chains? +Guido van Rossum
 
Huh, made me think. I thought about a NaK (non-available key) implementation instead, but the best I could make of it was nowhere near as simple as using rand.
 
The article's idea to call rand() for NaN is flawed. The right thing to do in $GOROOT/src/pkg/runtime/alg.c instead of IF(f != f) THEN {hash=runtime·fastrand1();...} is to do IF(f != f) THEN {runtime·memhash(h,sizeof(f),&f); return;} - that is: to fall back to the binary representation of the floating point value.
 
Then you will get long hash chains leading to quadratic behavior or hash table overflows.
 
... not after adding IF(✱(float64✱)a != ✱(float64✱)b) THEN {runtime·memequal64(...);return;} in the runtime·f64equal() function.
 
I failed to correctly write my last post. What I meant was: to put IF(IsNaN(✱(float64✱)a) || IsNaN(✱(float64✱)b)) THEN {runtime·memequal64(a,b);return;} in the runtime·f64equal() function. I apologize for the confusion.
 
But then you can have a float64 f such that f == f is false but [1]float{f} == [1]float{f} is true. That's pretty surprising.
 
f==f is either true or undefined. It cannot be false. When it is undefined (NaN), the implementation should fall back to comparing raw bits (which is always defined and yields true or false).
 
f==f is defined to be false when isNaN(f). that's just how NaNs work.
Add a comment...