Patrick Honner
Math Teacher in Brooklyn, NY


Post has attachment
It was an honor to appear on My Favorite Theorem, and I had a blast talking about geometry and teaching with +Evelyn Lamb and Kevin Knudson!
Patrick Honner's Favorite Theorem

Post has attachment
Start the new year off with a little number theory!

Post has attachment
Happy Pythagorean Triple Day!

Post has attachment
The existence of this is truly amazing.

Post has attachment
There were at least three mathematically erroneous questions on the New York State Geometry exam this June. Here's an analysis of one of them:

Post has attachment
Had fun figuring out how to implement dynamic Voronoi diagrams in Scratch. I currently have a couple of students working on this as a project, so I figured I'd better see if it was actually possible!

Code here:

Post has attachment
Thanks to Sir +Timothy Gowers, I always preface integration by parts with the method of successive approximation.

Post has attachment
Apparently 20 years ago Bret Victor made an ASCII-based Street Fighter game. Because he's Bret Victor.

Post has shared content
Predicting the election: a bit of information

Right now Nate Silver's site FiveThirtyEight says Hillary Clinton has a 64.2% chance of winning the election, while Sam Wang's Princeton Election Consortium says she has more than a 99% chance.  

The obvious question is: who is right? 

But that's not a very good question.   For starters, maybe neither is right!

A more reasonable question is: who is closer to being right?

But even this is very tricky.   For starters, it's hard to define what it means to be right about such an estimate. Probability and statistics are slippery subjects.    And a probabilistic prediction about a single event that will never be repeated is about as slippery as it gets.

If you have technical ideas about why Nate Silver and Sam Wang get such different results, I'd be happy to hear them.  Silver keeps saying that outcomes for different states are highly correlated, so that if the polls are wrong enough to change the outcome in one state, it's more likely to happen in others.  It's true that if the states were completely uncorrelated, we could be extremely sure about the election by now.  But what exactly is Silver's model of correlation?  And what is Wang's?  And is this enough to explain the difference?

But if you want to discuss politics, don't do it here.   There are enough other places to do that - like, the whole fucking internet.   Any word endorsing or criticizing a candidate will be enough to get your comment deleted.

Personally I just have a tiny contribution to make here.   I can only answer this question:

How much information would you get if you suddenly learned that Hillary Clinton had a 64.2% chance of being elected?  Or a 99% chance?

This is what the concept of relative information is good for.  It's relative, because how much information you get depends on what you believe beforehand.

If you thought that Clinton had a probability q of winning, and you learn that Clinton has a probability p of winning, you have gained this much information:

p log(p/q) + (1-p) log((1-p)/(1-q))

For example, suppose you started out thinking the candidates each have a 50% chance of winning.   That's a reasonable assumption if you just came from Mars and are completely ignorant about the situation.  Then q = 1/2, so the formula above becomes

p log(2p) + (1-p) log(2(1-p))

If we do the logarithms here in base 2, we are measuring the information in bits.   Then we can simplify the formula and get

1 + p log(p) + (1-p) log(1-p)

For example, suppose you learn that yes, Clinton indeed has a 50% chance of winning!  Then p = 1/2 and the formula above gives 0.  I will spare you the calculation, but this makes sense: you have gained no information.  You thought Clinton had a 50% chance of winning, and you learned that's right, so you learned nothing new.
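The formula above is easy to check numerically. Here's a short sketch (the function name is mine, not anything standard), with logs taken in base 2 so the answer comes out in bits:

```python
import math

def relative_info(p, q):
    """Bits of information gained on updating a prior belief q
    to a new probability p for a two-outcome event.

    Terms with probability 0 contribute nothing, since x log x -> 0.
    """
    total = 0.0
    for prob, prior in ((p, q), (1 - p, 1 - q)):
        if prob > 0:
            total += prob * math.log2(prob / prior)
    return total

# Learning that your 50-50 prior was exactly right carries no information:
print(relative_info(0.5, 0.5))  # → 0.0
```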

Or, suppose you read Nate Silver's blog and discover that Clinton has a 64.2% chance of winning.    Then you've gotten this many bits of information:

1 + 0.642 log(0.642) + (1 - 0.642) log(1 - 0.642)

That's about 0.06 bits of information!  Not much! 

And I think it's very fascinating that such a smart guy could analyze so much polling data about the election and only feel able to extract 0.06 bits of information about this all-important question: who will win? 

It shows an amazing lack of confidence.  But that may be a good thing.  I wish I knew.

If you believe Sam Wang's blog, on the other hand, you'll discover that Clinton has an over 99% chance of winning.  If we say it's 99%, then you've gotten this many bits of information:

1 + 0.99 log(0.99) + (1 - 0.99) log(1 - 0.99)

That's about 0.92 bits.  In other words, almost the complete answer to the question.
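Both headline numbers drop out of the simplified formula for a 50-50 prior. A quick check (again with base-2 logs, so the units are bits):

```python
import math

def bits_from_even_prior(p):
    # 1 + p log2(p) + (1 - p) log2(1 - p), valid for 0 < p < 1
    return 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

print(round(bits_from_even_prior(0.642), 2))  # Silver's 64.2% → 0.06 bits
print(round(bits_from_even_prior(0.99), 2))   # Wang's 99%    → 0.92 bits
```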

All this illustrates the power, and value, of a single bit of information.  Claude Shannon realized early on that if you have one bit of information that other people don't know, and you can get them to bet on it, you can double your money - on average. 

Similarly, an investor who gets 1/5 of a bit of such information can expect to multiply her money by 2 to the 1/5 power, which is almost 1.15.  So, one fifth of a bit of 'actionable information' per year is enough to make a 15% annual return!
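The arithmetic behind that claim is just exponentiation: under this Kelly-style model, b bits of bettable information multiply your capital by 2^b on average. Checking the 1/5-bit case:

```python
# Growth factor from b bits of private, bettable information per period,
# assuming capital multiplies by 2**b on average (the claim in the text).
growth = 2 ** (1 / 5)
print(round(growth, 4))  # → 1.1487, i.e. roughly a 15% annual return
```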

For more on relative information, see this:

It goes by a lot of other names, like Kullback-Leibler divergence - but I find that name hopelessly obscure.   For Nate Silver's election predictions, go here:

It's changed a bit while I was writing this post!  For Sam Wang's, go here:

#information #informationtheory