Ben Tilly
Ben Tilly's posts

Google just decided to push their new chat to my work account.  And when they did, they broke the old one so that it no longer knows who is online.

Google UI idiots, stop making me suffer through your idea of a cool interface that makes it really painful to do basic things, like scrolling back to find something that was said 20 messages ago.  And stop breaking simple-looking things that Just Work Really Well.

When this arrives for my personal account, I'm going to be in the market for an alternative webmail account.  If I find a decent one, I'll just forward the old gmail to it.  Really, it is that unusable.

(Seriously, I have not seen a single change to gmail that I have liked since I started using the original chat in 2006.)

Post has attachment
I think that raises a very real possibility: Google, Microsoft, FB, etc could be providing access to the NSA without knowing that they do, because the NSA is working through moles that it has planted.

Were I an executive at any of those companies, I would take this very seriously.  And if the theory is true, then I, as an outsider, would give kudos to the first company to come forward with a statement along the lines of, "We have identified and fired rogue employees who provided the NSA with access to our data.  We are searching for more and are considering our legal options at this point."

This is annoying.

I received a chat on gmail that I looked at, then clicked on.  That used to mark it as seen, so that it would stop notifying me by flashing my gmail.  But this time it didn't.  It kept trying to notify me about a chat that wasn't new.  So I closed the chat window and went back to another tab.  The chat window decided that the message I'd already seen was a new message, opened itself again, and continued to notify me.

Of course Google doesn't do customer support or provide an easy way to send bug reports.  So I'll just broadcast this to everyone I know, which includes several people at Google.  As of today your gmail chat can no longer remember that I've seen a particular message, and it is really annoying.  Please fix it.

Post has attachment
Stupid lesson I figured out in the last couple of days: you don't want to optimize a probability model for the overall probability.  I'm sure it is old hat and tons of people know it, but it is still a useful fact that was new to me.

Here is what I mean.  Suppose that you have a series of independent data points, and some model that assigns a probability p to each data point being a success.  What makes a model good?  One reasonable-seeming answer is that you want to maximize the probability of the sequence of yes/no answers that you have.  Which is equivalent to maximizing the log of the probability.  Which means that you maximize the sum over all of the data points of log(p) for successes, or log(1 - p) for failures.
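As a concrete illustration of that objective, here is a small sketch of my own (not code from the post) that scores a model by the total log-probability of the observed yes/no outcomes:

```python
# Sketch: total log-probability of a sequence of independent yes/no
# outcomes, given the model's predicted success probabilities.
import math

def log_likelihood(predictions, outcomes):
    """Sum log(p) for successes and log(1 - p) for failures."""
    total = 0.0
    for p, success in zip(predictions, outcomes):
        total += math.log(p) if success else math.log(1 - p)
    return total

# Example: three predictions scored against their observed outcomes.
score = log_likelihood([0.9, 0.2, 0.5], [True, False, True])
```

Note that a prediction of 0.5 contributes log(0.5) whichever way the outcome goes, which is exactly the blind spot discussed below.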

So far, so good.  And as a reasonable sanity check of a model you can look at how the observed log of the probability of everything seen compares with the expected log of the probability.  Which is easy to do because you added together a bunch of binary variables, with known possible outcomes and predicted probabilities.

This is a reasonable way to arrive at a bad solution.  Why is it bad?  I could give several reasons, but the one that I want to focus on is that your sanity check basically ignores any data points that received a prediction near 50%.  Why?  Because in that case log(p) and log(1 - p) are the same.  So no matter what outcome you have, you get the same answer.

But can we do better?  And if so, then how?

Well let's go back to first principles.  Right now we have two functions, x(p) = log(p) and y(p) = log(1 - p).  The function we are trying to optimize is the sum, over all data points, of x(p) for the successes, and y(p) for the failures where p is the prediction from our model.  What we want are a different pair of functions x(p), y(p), that we can use the same way, but which "work better".  In particular we have two conditions that we care about:

1. If the true probability of a particular data point being marked success is q, then the optimal prediction is p = q.

2. If predictions are consistently off (high or low), then the function that we are trying to optimize will come in statistically different from what we expect.

The log(probability) rule satisfies #1 admirably, but fails #2.  We want both.

Let's start with basic definitions/facts.  I've already described x and y.  p is always the predicted probability of success, and q is the (unknown) true probability of success.

The expected value of a single observation is: q x(p) + (1 - q) y(p)

The variance of a single observation is: q (1 - q) (x(p) - y(p))^2
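These are the standard moments of a two-point (Bernoulli-style) distribution.  As a quick check, here is a sketch of mine (the values of x, y, and q are arbitrary illustrations) comparing the closed-form variance against a direct moment computation:

```python
# Sanity check of the formulas above: the score equals x with
# probability q and y with probability 1 - q.  Compute the mean and
# variance directly from the first two moments.

def moments(x, y, q):
    """Mean and variance of a score that is x with prob q, y with prob 1 - q."""
    mean = q * x + (1 - q) * y
    second_moment = q * x * x + (1 - q) * y * y
    return mean, second_moment - mean * mean

# The variance should match the closed form q (1 - q) (x - y)^2.
mean, var = moments(-0.3, -1.2, 0.7)
```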

(Sorry, you'll have to understand elementary calculus to follow some of the steps from here on in.)

The rate of change in the expected value as q changes is: x(p) - y(p)

From now on, unless it is ambiguous, I'll drop the (p) as noise.  x and y are always functions of p.

Therefore, as long as x and y are always different, condition #2 is satisfied.  (The log(probability) measure behaves poorly because at p = 0.5 they are not different.)  We have an infinite number of ways to arrange that.  The choice that I will make is to have the ratio between the rate of change in the expected value and the variance be 1.  That condition means that the variation in the function we are optimizing stays in proportion to the size of prediction error that we expect to be able to identify.

In short, if the prediction is accurate and p = q, then: x - y = p (1 - p) (x - y)^2

Divide both sides by p (1 - p) (x - y) and we get: 1/(p (1 - p)) = x - y

Solve for y and: y = x - 1/(p (1 - p))

We still don't know what x should be.  But we have condition #1, which means that the prediction that maximizes the expected value q x + (1 - q) y should be p = q.

Substituting for y, expected value = q x + (1 - q) y = q x + (1 - q) (x - 1/(p (1 - p))) = x - (1 - q)/(p (1 - p))

Now a necessary (but not sufficient) condition for the maximum is that the derivative is 0.  So differentiating the expected value by p we get:

d/dp(x - (1 - q)/(p (1 - p))) = x' + (1 - q) ( 1/(p^2 (1 - p)) - 1/(p (1 - p)^2) )

We want this to be 0 when p = q.  So we want:

0 = x' + 1/p^2 - 1/(p ( 1-p))
x' = 1/(p (1 - p)) - 1/p^2 = 1/p + 1/(1 - p) - 1/p^2

We'll want to integrate that, which is easier with the terms separated out as above.  So, up to an irrelevant constant of integration, we get

x(p) = log(p) - log(1 - p) + 1/p

y(p) = log(p) - log(1 - p) + 1/p - 1/(p (1 - p))
    = log(p) - log(1 - p) - 1/(1 - p)

Now, one should never trust a calculation like this without a sanity check.  Working it out, the derivative of x with respect to p is (1 - 2p)/((p - 1) p^2).  The derivative of y is similarly (1 - 2p)/((p - 1)^2 p).  If you plug those back into the partial derivative of the expected value with respect to p and set p = q, you get 0.  (If you try it, don't forget that (1 - q)/(p - 1) will work out to -1...)
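As an extra numeric spot check (my own sketch, not part of the original derivation), one can confirm that the derivative of the expected value vanishes at p = q for these x and y:

```python
# With the derived x(p) and y(p), the expected score q x(p) + (1 - q) y(p)
# should be stationary at p = q.  Confirm with a central-difference
# approximation to the derivative.
import math

def x(p):
    return math.log(p) - math.log(1 - p) + 1 / p

def y(p):
    return math.log(p) - math.log(1 - p) - 1 / (1 - p)

def expected_score(p, q):
    return q * x(p) + (1 - q) * y(p)

def d_expected(p, q, h=1e-6):
    # Central difference approximation to d/dp of the expected score.
    return (expected_score(p + h, q) - expected_score(p - h, q)) / (2 * h)
```

For q values like 0.2, 0.5, and 0.8, d_expected(q, q) comes out numerically zero (within floating-point noise), matching the calculation above.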

Now we do a final sanity check.  Is this a maximum?  Well, plug in some numbers and... shoot.  It is an inflection point.

But some methods of finding a maximum work just fine on inflection points too.  For the cost of using one of those, and of knowing that you are looking for an inflection point, you can use this choice of x and y, and what you gain in consistent sensitivity to prediction errors over the entire range is worth the complication.

Now I just need to translate this into code for the program that I'm writing...

This was kind of a cool trick.

I have a program with a clear bug.  Unfortunately, the easiest test case that I had only showed the bug at the end of a 40 minute run that processed half a million records.  But if I ran my program again, it was able to notice things that it could prove it had done wrong the first time.  (It probably did more things wrong, but it only detected some of them.)

I decided to evolve a faster test case.

I first wrote a script to run the program twice, and report how many things it found wrong on the second run.

I wrote a second script that would take a record set, sample half of it randomly, and then run the first script.

I wrote a third script that would take all of the records, run 4 copies of my test program, and then save the smaller record set that best demonstrated the bug.  Wash, rinse, and repeat.  Once it was no longer able to demonstrate the bug, it would run through its list of "best small examples" from smallest to biggest, trying again, and any time it made progress it would repeat the process.

All three scripts combined come to 150 lines.  I left them running for a few hours and just checked how they were doing.

I now have a dozen acceptably small reproduction cases for my bug, including one with 362 records that runs in a fraction of a second.  (The program is doing statistical analysis, so there is never going to be a single record that shows the bug.  362 records to reproduce it is tiny.)
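The actual scripts aren't shown here, but the shrinking loop they describe can be sketched roughly like this (my own sketch, with `shows_bug` standing in for the "run the program twice and count self-detected errors" check):

```python
# Rough sketch of the test-case shrinking loop: repeatedly sample half
# the records, keep any smaller set that still reproduces the bug, and
# give up a size level only after many consecutive failures.
import random

def shrink(records, shows_bug, tries_per_round=4, max_failures=20):
    """Return a small subset of `records` for which shows_bug() is True."""
    best = list(records)  # assumed to reproduce the bug already
    failures = 0
    while failures < max_failures:
        # Try several random halves in a round; keep the smallest winner.
        candidates = [random.sample(best, len(best) // 2)
                      for _ in range(tries_per_round)]
        winners = [c for c in candidates if shows_bug(c)]
        if winners:
            best = min(winners, key=len)
            failures = 0
        else:
            failures += 1
        if len(best) <= 1:
            break
    return best
```

In the real setup, `shows_bug` would shell out to run the program twice over the candidate record set; here it is just a predicate.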

Google, why did you destroy my gmail with your new reply interface?  I honestly want plain text, with > to indicate quoted stuff.  Don't ruin that for me.

For a long time that has worked reliably for me.  But now you suddenly told me that I'm getting a new reply interface.  I found the option to switch back, then went back to plain text mode, only to find that your quoting doesn't work as well as I am used to.  If the original was a quoted reply with a long line, you only know to insert "> >" on the first line, not after the line breaks.  And the line break/reflow of text now works worse than it has in years.

I dislike.  Intensely.

Post has attachment
I was in a discussion where I was asked to justify my claim that STL iterators are a bad realization of the idea that iteration and data structures are two concepts that should be separated.  See the attached link for my explanation of the shortcomings that I see in STL iterators.

Post has attachment
I've long known that unit testing goes back further than most people think.  For example, Perl 1 in 1987 came with unit tests, and this is part of how Perl had such a good early portability history.

But tonight I got curious about how far back unit testing goes.  The oldest reference that I can find places unit testing in an IBM proposal from 1962.  According to accounts of that project, the developers would write unit tests in the morning and make them pass by the end of the day - so they were actually doing test-driven development!

Does anyone know of an older unambiguous use of unit testing in software development?

Post has attachment
The best thing that I've read on the entire Adria Richards mess has got to be the attached link.  It has everything from a mature perspective to inside information on Adria's personal history of overreacting and publicly escalating over perceived sexism in other situations.