John Moeller
PhD Student in Computer Science


Post has shared content
Just realized with some horror:

sed 's/Santa\ Claus/NSA/g' < Santa-Claus-coming-to-town.lyrics > history-of-2015

Post has shared content
This is one of those news items that hasn't gotten nearly enough coverage -- because it's the sort of thing that makes professionals go OH YOU HAVE GOT TO BE FUCKING KIDDING ME.

What happened? Back in 2005, the Bureau of Justice Statistics (a branch of the DOJ) did a study on recidivism, and found out that the rate is tremendously high: 68% of state prisoners end up back behind bars within three years of release. Once a criminal, always a criminal, they concluded -- and people have been shaping policy to match.

But a team read through it carefully, and it turns out that the BJS made a basic, bonehead mistake in their statistical analysis. They thought they were measuring whether people who go to prison will reoffend; what they actually measured was that most people in prison, on any given day, are repeat offenders.

Which makes sense, because repeat offenders spend a lot more time in prison than one-time offenders. 

These are not the same thing. At all. It turns out that if you do the analysis right, only 30% or so of prisoners will ever re-offend, and only 11% will do so multiple times. In fact, this "once a criminal, always a criminal" rule appears to be completely false -- unless, that is, you structure policies so that anyone with a criminal conviction is treated like a permanent criminal, and so not allowed to (say) get virtually any job other than "criminal." In which case, you will in fact end up with lots of criminals.
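The size of the gap is easy to reproduce with a toy model. A quick sketch with made-up numbers (not the BJS's actual data): suppose only 30% of a release cohort ever reoffends, but repeat offenders accumulate about five times as much total prison time. Then a snapshot of who is in prison on any given day is dominated by repeat offenders:

```shell
# Length-biased sampling in miniature (all numbers are illustrative).
# Cohort view: 30 of 100 released prisoners ever reoffend.
# Snapshot view: weight each group by how much prison time it accumulates.
awk 'BEGIN {
  once = 70; rep = 30      # one-time vs. repeat offenders in the cohort
  t_once = 1; t_rep = 5    # relative prison time accumulated by each group
  printf "cohort reoffense rate: %.0f%%\n", 100 * rep / (once + rep)
  printf "repeat share of daily snapshot: %.1f%%\n",
         100 * rep * t_rep / (once * t_once + rep * t_rep)
}'
```

With those invented weights, the cross-sectional sample reports a 68%-style number even though only 30% of the cohort ever returns — which is exactly the shape of the gap between the BJS figure and the corrected one.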

In the post linked below, +Andreas Schou gives some of the explanation of what went wrong in the study. You can read more in the linked Slate article, and even more in the paper that actually found the mistake.

The most important lesson in all of this is that it's easy to make bonehead mistakes in statistics. If the statistics matter -- if you're going to use them to prescribe drugs or set public policy or something like that -- it's very important to have people check your work, repeatedly, and ask the right questions. The most important question is "have you actually measured what you think you measured," because there are all sorts of ways to screw that up. 

There's also a great new book on that subject: Alex Reinhart's Statistics Done Wrong. Please, if you do statistics in your daily life, read it.

Apparently the writers of gawk put internet sockets in it. So you can make servers. Instead of, you know, running it over an ssh session like a normal person.

In the process of forcing myself to learn awk. It dawned on me as I was skimming the user guide that it's practically custom-built for mapreduce (albeit in a time-reversed way). I wonder why more people don't use it for mapreduce jobs.
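As a sketch of the analogy: the classic MapReduce word count fits in one pipeline, with awk playing both the mapper and the reducer (the toy input line is arbitrary):

```shell
# map:     emit one (word, 1) pair per token
# shuffle: sort groups equal keys together
# reduce:  sum the counts over each run of equal keys
printf 'to be or not to be\n' |
awk '{ for (i = 1; i <= NF; i++) print $i "\t" 1 }' |
sort |
awk -F'\t' '
  $1 != prev { if (NR > 1) print prev, sum; prev = $1; sum = 0 }
             { sum += $2 }
  END        { print prev, sum }'
```

This prints "be 2", "not 1", "or 1", "to 2". An associative array (count[$1] += $2) would do the reduce in one line, but the streaming form above is what a real reducer sees after the shuffle: sorted keys, one group at a time.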

I have got to add `git checkout -p` and `git reset -p` to my git vocabulary. I love `git add -p` so much, I didn't realize there was a way to go backwards the same way.

Post has shared content
Is that a Panda, or a Gibbon? Investigating the mystery of Adversarial Examples

Machine Learning (ML) models show great promise in the field of computer vision, which focuses on enabling systems to model and understand digital images automatically.

But sometimes those ML systems get it wrong. As it turns out, many machine learning models, including Neural Networks, have intriguing properties. One such property, “blind spots”, causes them to misclassify adversarial examples: images that are formed by applying very small, but intentional, perturbations to existing correctly labeled examples. Moreover, when different models misclassify an adversarial example, they often agree with each other on its class. But why does this happen?

At the 2015 International Conference on Learning Representations, Google Research Scientists +Ian Goodfellow, +Jon Shlens, and +Christian Szegedy presented Explaining and Harnessing Adversarial Examples, where they investigate neural networks’ vulnerability to adversarial perturbation.

Previously, the thinking was that adversarial examples were due to overfitting and the non-linear nature of Deep Neural Networks. In this paper, the authors argue that, rather, existing models are too linear, and that the generalization of adversarial examples across different models can be explained as a result of the different models learning similar functions when trained to perform the same task. In doing so, they propose a fast method of generating adversarial examples that can be used to help train models to resist adversarial perturbation.
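The fast method in question is the paper's fast gradient sign method: for an input x with label y, loss J, and model parameters θ, the adversarial example is built by stepping a distance ε in the direction of the sign of the loss gradient with respect to the input:

```latex
\tilde{x} = x + \epsilon \cdot \mathrm{sign}\left( \nabla_x J(\theta, x, y) \right)
```

Because the step only needs one gradient computation (the same backprop used in training, just taken with respect to x instead of θ), generating adversarial examples this way is cheap enough to fold into the training loop.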

Post has shared content
Did you know you could get BibTeX directly from a DOI? It's called DOI content negotiation and it can do a lot of other really cool tricks.

I don't know how to get BibTeX from the browser, but this works on the command line:

curl -LH "Accept: application/x-bibtex" https://doi.org/10.1007/s11083-012-9252-6

Here is the magic output:

@article{Dorais_2012,
  doi = {10.1007/s11083-012-9252-6},
  url = {https://doi.org/10.1007/s11083-012-9252-6},
  year = 2012,
  month = {mar},
  publisher = {Springer Science $\mathplus$ Business Media},
  volume = {30},
  number = {2},
  pages = {415--426},
  author = {Fran{\c{c}}ois Gilbert Dorais and Steven Gubkin and Daniel McDonald and Manuel Rivera},
  title = {Automorphism Groups of Countably Categorical Linear Orders are Extremely Amenable},
  journal = {Order}
}

Post has shared content
Do you want machine learning to be fair? Accountable to more than its masters? And transparent for all to understand and interpret? Submit your abstracts TODAY for the ICML workshop on Fairness, Accountability and Transparency in Machine Learning (FATML)!