John Moeller
PhD Student in Computer Science

John's posts

Just realized with some horror:

sed 's/Santa\ Claus/NSA/g' < Santa-Claus-coming-to-town.lyrics > history-of-2015

This is one of those news items that hasn't gotten nearly enough coverage -- because it's the sort of thing that makes professionals go OH YOU HAVE GOT TO BE FUCKING KIDDING ME.

What happened? Back in 2005, the Bureau of Justice Statistics (a branch of the DOJ) did a study on recidivism, and found out that the rate is tremendously high: 68% of state prisoners end up back behind bars within three years of release. Once a criminal, always a criminal, they concluded -- and people have been shaping policy to match.

But a team read through it carefully, and it turns out that the BJS made a basic, bonehead mistake in their statistical analysis. They thought they were measuring whether people who go to prison will reoffend; what they actually measured was that most people in prison, on any given day, are repeat offenders.

Which makes sense, because repeat offenders spend a lot more time in prison than one-time offenders. 

These are not the same thing. At all. It turns out that if you do the analysis right, only 30% or so of prisoners will ever re-offend, and only 11% will do so multiple times. In fact, this "once a criminal, always a criminal" rule appears to be completely false -- unless, that is, you structure policies so that anyone with a criminal conviction is treated like a permanent criminal, and so not allowed to (say) get virtually any job other than "criminal." In which case, you will in fact end up with lots of criminals.
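To make the difference concrete, here is a toy calculation. The numbers are made up for illustration and are not the study's data: 100 people who ever go to prison, 30 of whom come back, each returner serving 4 spells in total, and all spells the same length. It counts the share of repeat offenders two ways: per person, and per prison spell (which is roughly what a snapshot of who is inside on a given day sees).

```python
from fractions import Fraction

def repeat_share_by_person(one_timers, repeaters):
    """Share of people who ever return to prison."""
    return Fraction(repeaters, one_timers + repeaters)

def repeat_share_by_spell(one_timers, repeaters, spells_per_repeater):
    """Share of prison spells that belong to repeat offenders.

    Assumes every spell has the same length, so counting spells is a
    stand-in for counting who is behind bars on a given day.
    """
    repeat_spells = repeaters * spells_per_repeater
    return Fraction(repeat_spells, one_timers + repeat_spells)

# Hypothetical cohort: 70 one-time offenders, 30 repeaters with 4 spells each.
by_person = repeat_share_by_person(70, 30)
by_spell = repeat_share_by_spell(70, 30, 4)

print(f"repeat offenders among people: {float(by_person):.0%}")
print(f"repeat offenders among spells: {float(by_spell):.0%}")
```

With these made-up numbers, 30% of the people are repeat offenders, but they account for 120 of 190 spells, or about 63%. Same population, very different headline number, depending on what you sample.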

In the post linked below, +Andreas Schou gives some of the explanation of what went wrong in the study. You can read more at the linked Slate article, and even more in the paper that actually found the mistake.

The most important lesson in all of this is that it's easy to make bonehead mistakes in statistics. If the statistics matter -- if you're going to use them to prescribe drugs or set public policy or something like that -- it's very important to have people check your work, repeatedly, and ask the right questions. The most important question is "have you actually measured what you think you measured," because there are all sorts of ways to screw that up. 

There's also a great new book on that subject: Alex Reinhart's Statistics Done Wrong. Please, if you do statistics in your daily life, read it.

Apparently the writers of gawk put internet sockets in it. So you can make servers. Instead of, you know, running it over an ssh session like a normal person.

In the process of forcing myself to learn awk. It dawned on me as I'm skimming the user guide that it's practically custom-built for mapreduce (albeit in a time-reversed way). I wonder why more people don't use it for mapreduce jobs.

I have got to add `git checkout -p` and `git reset -p` to my git vocabulary. I love `git add -p` so much, I didn't realize there was a way to go backwards the same way.

Is that a Panda, or a Gibbon? Investigating the mystery of Adversarial Examples

Machine Learning (ML) models show great promise in the field of computer vision, which focuses on enabling systems to model and understand digital images automatically.

But sometimes those ML systems get it wrong. As it turns out, many machine learning models, including Neural Networks, have intriguing properties. One such property, “blind spots”, causes them to misclassify adversarial examples - images that are formed by applying very small, but intentional, perturbations to existing correctly labeled examples. Moreover, when these different models misclassify an adversarial example, they often agree with each other on its class. But why does this happen?

At the 2015 International Conference on Learning Representations, Google Research Scientists +Ian Goodfellow, +Jon Shlens, and +Christian Szegedy presented Explaining and Harnessing Adversarial Examples, where they investigate neural networks’ vulnerability to adversarial perturbation.

Previously, the thinking was that adversarial examples were due to overfitting and the non-linear nature of Deep Neural Networks. In this paper, the authors argue that, rather, existing models are too linear, and that generalization of adversarial examples across different models can be explained as a result of the different models learning similar functions when trained to perform the same task. In doing so, they propose a fast method of generating adversarial examples that can be used to help train models to resist adversarial perturbation.
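The fast method in the paper is the fast gradient sign method: nudge every input coordinate by a small eps in the direction of the sign of the loss gradient. Below is a minimal sketch on a toy logistic-regression classifier. The weights, input, and eps are hypothetical values chosen for illustration, not anything from the paper, and a real attack would target a trained network rather than a four-weight linear model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Fast gradient sign perturbation: x + eps * sign(d loss / d x).

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input is (p - y) * w.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model and input (true label 1).
w = [1.0, -2.0, 0.5, 3.0]
bias = 0.0
x = [0.2, -0.1, 0.4, 0.1]
eps = 0.25

x_adv = fgsm(w, bias, x, y=1, eps=eps)
print("clean prediction:      ", round(predict(w, bias, x), 3))
print("adversarial prediction:", round(predict(w, bias, x_adv), 3))
```

No coordinate moves by more than eps, yet the logit shifts by eps times the sum of the absolute weights. That is the "too linear" argument in miniature: many small per-coordinate changes add up, and here they push the prediction across the 0.5 decision boundary.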

Did you know you could get bibtex directly from a doi? It's called DOI content negotiation and it can do a lot of other really cool tricks.

I don't know how to get bibtex from the browser but this works on the command line:

curl -LH "Accept: application/x-bibtex"

Here is the magic output:

doi = {10.1007/s11083-012-9252-6},
url = {},
year = 2012,
month = {mar},
publisher = {Springer Science $\mathplus$ Business Media},
volume = {30},
number = {2},
pages = {415--426},
author = {Fran{\c{c}}ois Gilbert Dorais and Steven Gubkin and Daniel McDonald and Manuel Rivera},
title = {Automorphism Groups of Countably Categorical Linear Orders are Extremely Amenable},
journal = {Order}
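The same trick can be sketched in Python with just the standard library. Assumptions here: the resolver base `https://doi.org/` and the helper names `bibtex_request` and `fetch_bibtex` are mine, not from the post; the DOI is the one that appears in the output above. `urlopen` follows redirects by default, which plays the role of curl's `-L`.

```python
import urllib.request

def bibtex_request(doi, resolver="https://doi.org/"):
    """Build a request asking the DOI resolver for BibTeX.

    The resolver URL is an assumption; the Accept header is the one
    used in the curl command.
    """
    return urllib.request.Request(
        resolver + doi,
        headers={"Accept": "application/x-bibtex"},
    )

def fetch_bibtex(doi):
    """Send the request (needs network access) and return the response text."""
    with urllib.request.urlopen(bibtex_request(doi)) as resp:
        return resp.read().decode("utf-8")

# Building the request does not touch the network.
req = bibtex_request("10.1007/s11083-012-9252-6")
print(req.full_url)
print(req.get_header("Accept"))
```

Calling `fetch_bibtex("10.1007/s11083-012-9252-6")` should then return an entry like the one above, provided the resolver still honors that Accept type.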

Do you want machine learning to be fair? Accountable to more than its masters? And transparent for all to understand and interpret? Submit your abstracts TODAY for the ICML workshop on Fairness, Accountability and Transparency in Machine Learning (FATML)!

2nd Workshop on Fairness, Accountability, and Transparency in Machine Learning

ICML 2015
July 11, Lille, France
Submission Deadline: May 1, 2015



Machine learning is increasingly part of our everyday lives, influencing not only our individual interactions with online websites and platforms, but even national policy decisions that shape society at large. When algorithms make automated decisions that can affect our lives so profoundly, how do we make sure that their decisions are fair, verifiable, and accountable? This workshop will explore how to integrate these concerns into machine learning and how to address them with computationally rigorous methods.

The workshop takes place at an important moment. The debate about ‘big data’ on both sides of the Atlantic has begun to expand beyond issues of privacy and data protection. Policymakers, regulators, and advocates have recently expressed fears about the potentially discriminatory impact of analytics, with many calling for further technical research into the dangers of inadvertently encoding bias into automated decisions. At the same time, there is growing alarm that the complexity of machine learning may reduce the justification for consequential decisions to “the algorithm made me do it”. Decision procedures perceived as fundamentally inscrutable have drawn special scrutiny.

The workshop will bring together an interdisciplinary group of researchers to address these challenges head-on.


We welcome contributions on theoretical models, empirical work, and everything in between, including (but not limited to) contributions that address the following open questions:

* How can we achieve high classification accuracy while preventing discriminatory biases?

* What are meaningful formal fairness properties?

* What is the best way to represent how a classifier or model has generated a particular result?

* Can we certify that some output has an explanatory representation?

* How do we balance the need for knowledge of sensitive attributes for fair modeling and classification with concerns and limitations around the collection and use of sensitive attributes?

* What ethical obligations does the machine learning community have when models affect the lives of real people?


Papers are limited to four content pages, including figures and tables, and must follow the ICML 2015 format; however, an additional fifth page containing only cited references is permitted. Papers SHOULD be anonymized. Accepted papers will be made available on the workshop website; however, the workshop's proceedings can be considered non-archival, meaning contributors are free to publish their work in archival journals or conferences. Accepted papers will be either presented as a talk or poster (to be determined by the workshop organizers). 

Papers should be submitted here:

Deadline for submissions: May 1, 2015
Notification of acceptance: May 10, 2015


Workshop Organizers:

Solon Barocas, Princeton University
Sorelle Friedler, Haverford College
Moritz Hardt, IBM Almaden Research Center
Joshua Kroll, Princeton University
Carlos Scheidegger, University of Arizona
Suresh Venkatasubramanian, University of Utah
Hanna Wallach, Microsoft Research NYC