Stats With Neil
Making your research make sense

Stats With Neil's posts

Age Distribution of Olympians by Sport

Athletics is a wonderful source of data for statistical analysis. Inspired by a story at the Washington Post, Gregory Matthews of the University of Massachusetts created more detailed graphics from data he was able to scrape from the web. He used the humble boxplot to summarize the ages of Olympians by sport.

Female gymnasts are pretty clearly the youngest athletes, with shooting and equestrian events holding up the top end. As always with boxplots, some of the outliers are the most interesting, like the 11-year-old female swimmer and the archers and sailors over age 60. Most interesting to me is the way that different juxtapositions suggest different conclusions: sorting by sport gives a clear picture of male vs. female ages within the same sport, while sorting by median age gives a good overview.

(via +Derek Bruff)
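For readers who want to see what a boxplot actually summarizes, here is a minimal sketch that computes the quartiles, whisker ends, and outliers for a list of ages. It assumes Tukey's common 1.5 × IQR rule for flagging outliers and uses made-up ages, not the scraped Olympic data; the graphics in the post may have used a different quantile method or outlier rule.

```python
import statistics

def boxplot_summary(values):
    """Quartiles, whisker ends and outliers using Tukey's 1.5 * IQR rule (an assumed convention)."""
    # "inclusive" interpolation matches NumPy's default linear percentile method.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme values still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

# Made-up ages for a single hypothetical event, purely for illustration.
ages = [14, 15, 15, 16, 16, 17, 17, 18, 19, 27]
print(boxplot_summary(ages))
```

Here the lone 27-year-old lands outside the upper fence and would be drawn as an individual point, which is exactly where the interesting stories in the post come from.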

Certainty about model parameters is not certainty about reality

Over at his blog, William Briggs discusses the fact that most statistical tests concern the values of parameters in our models of reality. (Don't worry, I do it too.) However, that fact can lead to seemingly contradictory results, such as the situation he describes, where two analyses of the same data gave statistically significant evidence for two opposing conclusions.

His point, however, is that each study found statistically significant evidence about the parameters of its own model. Because the studies used different models, it is entirely reasonable that they reached different conclusions. The mismatch between model and reality is what explains this situation: if neither model accurately reflects reality, then data collected from the real world can easily give nonsensical results when fitted to the two different models.
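To make the idea concrete, here is a small hypothetical illustration (not the data Briggs discusses) of how two reasonable models of the same numbers can disagree about the sign of an effect. The data contain two groups in which y falls as x rises; a model that ignores the groups estimates a positive slope, while a model with a separate intercept per group estimates a negative one. This is the classic Simpson's paradox setup, swapped in here as an example, and it only estimates slopes; it does not run significance tests.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data: within each group, y decreases by 1 for every unit of x.
groups = {
    "A": ([1, 2, 3], [10, 9, 8]),
    "B": ([6, 7, 8], [20, 19, 18]),
}

# Model 1: a single line through all points, ignoring group membership.
all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]
pooled_slope = ols_slope(all_x, all_y)

# Model 2: a separate intercept per group and a common slope,
# fitted by subtracting each group's means before computing the slope.
dx, dy = [], []
for xs, ys in groups.values():
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx += [x - mx for x in xs]
    dy += [y - my for y in ys]
within_slope = ols_slope(dx, dy)

print(f"Model 1 (pooled) slope: {pooled_slope:+.2f}")
print(f"Model 2 (per-group intercepts) slope: {within_slope:+.2f}")
```

Both estimates are computed correctly from the same numbers; they answer questions about different models. Which one says anything about reality depends on whether the grouping reflects something real, which the arithmetic alone cannot tell you.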

As a takeaway, I suggest that we remain humble about what the results of our statistical tests tell us. They are statements of certainty about the parameters of our models, which may (or may not) be an accurate reflection of reality.

Everyone Should Learn Statistics

In a blog post at the Chronicle of Higher Education, the author discusses a use of statistics in a jury trial in Washington, D.C. His point is about the defense attorney's misuse of statistics: comparing the score on a test to a probability that two things are related. I completely agree with his point that statistics (or data literacy, as I like to think of it) should be part of the school curriculum. This will only become more relevant as larger swaths of our lives are measured and quantified.

Estimation is an important quantitative literacy skill!
Guesstimation #2—How massive is a mole of cats?
A mole is the number of atoms in a sample of an element whose mass in grams equals that element's atomic weight. For example, a mole of hydrogen atoms weighs about 1 gram and a mole of carbon atoms weighs about 12 grams. It is used in chemistry to make sure that there are equivalent numbers of atoms for a chemical reaction. Compare the mass of a mole of cats to the mass of a mountain, a continent, the Moon, and the Earth.
Post your answer on our blog:

#math #MathAwareness #mathematics
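One way to work the guesstimate is a single multiplication: Avogadro's number times the mass of one cat. The sketch below assumes a typical house cat weighs about 4 kg (an assumption; change it and the answer scales linearly) and compares the result with the Moon and the Earth. Mountains and continents are left out because their masses vary too widely for a single reference value.

```python
AVOGADRO = 6.02214076e23   # entities per mole (exact SI value)
CAT_MASS_KG = 4.0          # assumed mass of one house cat
MOON_MASS_KG = 7.342e22    # approximate mass of the Moon
EARTH_MASS_KG = 5.972e24   # approximate mass of the Earth

# Mass of one mole of cats.
mole_of_cats_kg = AVOGADRO * CAT_MASS_KG

print(f"A mole of cats: about {mole_of_cats_kg:.2e} kg")
print(f"That is roughly {mole_of_cats_kg / MOON_MASS_KG:.0f} Moons")
print(f"or roughly {mole_of_cats_kg / EARTH_MASS_KG:.0%} of the Earth's mass")
```

Under the 4 kg assumption, a mole of cats comes out at a few dozen Moon masses and a sizable fraction of the Earth's mass.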

This article mentions cluster analysis in a very interesting reanalysis context, this time applied to unreproducible data from the surface of Mars.
This re-evaluation of the Viking mission data uses one of my favorite statistical techniques, cluster analysis. It's great to see cluster analysis used to classify the control and active experiments. What do you think about these new analyses?
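The post does not say which clustering method the Viking reanalysis used, so as a generic illustration here is a minimal sketch of k-means, one common cluster-analysis technique. It alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points. The two-feature "readings" are made up and only stand in for the kind of control versus active split the article describes; the function name and initial centroids are hypothetical choices.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate nearest-centroid assignment and centroid updates."""
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)
        centroids = new_centroids
    return labels, centroids

# Hypothetical two-feature measurements, not Viking data.
readings = [(0.1, 0.2), (0.3, 0.0), (0.0, 0.4), (5.0, 5.2), (4.8, 5.1), (5.3, 4.9)]
labels, centers = kmeans(readings, centroids=[readings[0], readings[3]])
print(labels)
```

k-means needs the number of clusters and starting centroids up front; hierarchical clustering, which builds a tree of merges instead, is a common alternative when that number is not known in advance.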

I am beginning to sense that +Randall Munroe, besides being a funny geek, is one of the best data designers of his generation. See:

Gravity Wells:

Leave a comment if you have some other favorites.

This article gives a great application of probability theory to a fundamentally statistical question: how likely is it that data at least as extreme as yours would arise by chance under a null distribution?

The calculation is presented at the end, but the essence of his point is that the researcher misstated what was essentially an attained significance level (a p-value) in his interview with the New York Times. Nice work keeping the original researcher honest.
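The article's own calculation is not reproduced here. As a generic sketch of what an attained significance level is, assume a hypothetical experiment in which a coin lands heads 60 times in 100 tosses. The one-sided p-value is the probability, assuming the coin is fair, of seeing 60 or more heads; it is computed exactly from the binomial distribution.

```python
from math import comb

def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): a one-sided attained significance level."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical data: 60 heads in 100 tosses, null hypothesis p = 0.5.
p_value = binomial_upper_tail(60, 100)
print(f"P(at least 60 heads | fair coin) = {p_value:.4f}")
```

The number is a statement about the data given the null hypothesis. It is not the probability that the coin is fair, which is one common way p-values get misstated in interviews.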

Used this comic in class on Monday to explain the perils of multiple comparisons. Thanks +Randall Munroe!

Stephen Wolfram provides a fascinating look at his own life through the lens of big data.

Dr. Judea Pearl wins the ACM's A.M. Turing Award for advances in artificial intelligence. His research on reasoning with Bayesian networks is extremely relevant to the relationship between causation and correlation that we wish to explore in statistics.

See for slides from an interesting lecture on the topic or for an introductory explanation of the ideas.

Via +Daniel Kaplan.
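Pearl's work goes well beyond this, but here is a minimal sketch of the basic idea of reasoning in a Bayesian network: exact inference by enumeration in a hypothetical three-node network (Rain → WetGrass ← Sprinkler) with made-up probabilities. It also shows "explaining away": Rain and Sprinkler are independent a priori, yet once we know the grass is wet, learning that the sprinkler was on lowers the probability of rain. Conditioning on a common effect creates a correlation between independent causes, one reason correlation alone can mislead.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler, with Rain and Sprinkler independent.
P_RAIN = 0.2
P_SPRINKLER = 0.3
# P(WetGrass = True | Rain, Sprinkler); illustrative numbers only.
P_WET = {(True, True): 0.99, (True, False): 0.9, (False, True): 0.8, (False, False): 0.05}

def joint(rain, sprinkler, wet):
    """Joint probability, factorised along the network's edges."""
    p = (P_RAIN if rain else 1 - P_RAIN) * (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
    p_wet = P_WET[(rain, sprinkler)]
    return p * (p_wet if wet else 1 - p_wet)

def prob_rain_given_wet():
    """P(Rain | WetGrass) by summing the joint over the unobserved Sprinkler."""
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return num / den

def prob_rain_given_wet_and_sprinkler():
    """P(Rain | WetGrass, Sprinkler on): the sprinkler 'explains away' the wet grass."""
    num = joint(True, True, True)
    return num / (num + joint(False, True, True))

print(f"P(rain | wet grass) = {prob_rain_given_wet():.3f}")
print(f"P(rain | wet grass, sprinkler on) = {prob_rain_given_wet_and_sprinkler():.3f}")
```

Enumeration is exponential in the number of variables, so it only suits toy networks like this one; larger networks rely on methods such as variable elimination or sampling.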