John Mount
Works at Win-Vector LLC


John Mount

Shared publicly  - 
New technical article: "Using closures as objects in R". Some writing on R as a programming language.

John Mount

Shared publicly  - 
Don’t use the Sharpe ratio to A/B test email campaigns 
Having worked in finance I am a public fan of the Sharpe ratio. I have written about this here and here. One thing I have often forgotten (driving some bad analyses) is: the Sharpe ratio isn't appropriate for models of repeated events that already have linked mean and variance (such as Poisson ...

John Mount

Shared publicly  - 
Win-Vector LLC is proud to announce the R data science value pack. 50% off our video course Introduction to Data Science (available at Udemy) and 30% off Practical Data Science with R (from Manning). Pick any combination of video, e-book, and/or print-book you want. Instructions below.

John Mount

Shared publicly  - 
Nina Zumel and I are proud to announce our new data science video course: Introduction to Data Science.  Here is a half-off coupon for those of you who want to check it out (should be good for about 2 weeks): 
Michael Witbrock: +ann this may be of interest to you

John Mount

Shared publicly  - 
I have said positive things about Apple Macs/OSX in the past, but I am beginning to regret that more and more often.

Now Spotlight no longer even pretends to work in OSX Yosemite. When I place the Finder one directory above a directory called DataScienceCourse, a search on my Mac swears there is no such directory anywhere on the machine. The exact same search finds the directory when restricted to the parent folder. This is after following all sorts of online guides on how to delete all Spotlight content to force a re-index.
Andrej Bauer: I have found that Spotlight finds folders if you tell it to look for a folder, i.e., instead of "DataCourse" I would write "DataCourse folder".

John Mount

Shared publicly  - 
Incredibly proud of this moment!
In his circles
124 people

John Mount

Shared publicly  - 
Deal of the Day March 15: Half off my book Practical Data Science with R. Use code dotd031515au at

John Mount

Shared publicly  - 
The Win-Vector LLC value pack!

Half off Introduction to Data Science video course:

10% off Practical Data Science with R book

Free in-depth blog content:

And Win-Vector LLC consulting services:

John Mount

Shared publicly  - 
I don't just write about ghosty folklore. I write about folk theorems, too.
It's a folk theorem I sometimes hear from colleagues and clients: that you must balance the class prevalence before training a classifier. Certainly, I believe that classification tends to be easier when the classes are nearly balanced, especially when the class you are actually interested in is ...

John Mount

Shared publicly  - 
Deal of the Day February 24: Half off my book Practical Data Science with R. Use code dotd022415au at

John Mount

Shared publicly  - 
A very long, very technical analysis of an important machine learning algorithm on my part.

How sure are you that large margin implies low VC dimension?
How sure are you that large margin implies low VC dimension (and good generalization error)? It is true. But even if you have taken a good course on machine learning you may not have seen the actual proof (with all of the caveats and conditions). I worked through the literature proofs over the ...
Francisco Pereira, Ken Bury
Thank you! Earlier this week I found myself hand-waving about this to a very bright student, and I'm glad that I can now point him to your notes.
Principal Consultant, Win-Vector LLC
  • Win-Vector LLC
    Principal Consultant, present
I produce applied research, prototyping and training in information extraction, algorithms and data-mining for web-scale businesses, hedge funds and start ups. Right now I do this as a consultant at Win-Vector LLC. 

Please check out our book Practical Data Science with R

Also check out the Win-Vector LLC blog and our Twitter feed.
John Mount's +1's are the things they like, agree with, or want to recommend.
Estimating Generalization Error with the PRESS statistic

As we’ve mentioned on previous occasions, one of the defining characteristics of data science is the emphasis on the availability of “large”

Factors are not first-class citizens in R

The primary user-facing data types in the R statistical computing environment behave as vectors. That is: one dimensional arrays of scalar v

Frequentist inference only seems easy

Two of the most common methods of statistical inference are frequentism and Bayesianism (see Bayesian and Frequentist Approaches: Ask the Ri

R style tip: prefer functions that return data frames

While following up on Nina Zumel’s excellent Trimming the Fat from glm() Models in R I got to thinking about code style in R. And I realized

Trimming the Fat from glm() Models in R

One of the attractive aspects of logistic regression models (and linear models in general) is their compactness: the size of the model grows

Save 45% on Practical Data Science with R (expires May 21, 2013)

Please share this generous deal from Manning publications: save 45% on Practical Data Science with R through May 21, 2014. Please tweet, for

R has some sharp corners

R is definitely our first choice go-to analysis system. In our opinion you really shouldn’t use something else until you have an articulated

Some R Resources for GLMs

by Joseph Rickert Generalized Linear Models have become part of the fabric of modern statistics, and logistic regression, at least, is a “go

You don’t need to understand pointers to program using R

R is a statistical analysis package based on writing short scripts or programs (versus being based on GUIs like spreadsheets or directed wor

Oldies but Goldies: Statistical Graphics Books

I just wanted to plug for three classical books on statistical graphics that I really enjoyed reading. The books are old (that is, older tha

I can haz buzzwords?

Catty title aside, this post takes a good swing at defining terms we hear thrown around about data these days and they mostly do a good job.

Practical Data Science with R October 2013 update

A quick status update on our upcoming book “Practical Data Science with R” by Nina Zumel and John Mount. We are really happy with how the bo

[Book] Practical Data Science with R

Nina Zumel and John Mount have been working very hard on producing an exciting new book called “Practical Data Science with R.” The book has

Prefer = for assignment in R

We share our opinion that = should be preferred to the more standard <- for assignment in R. This is from a draft of the appendix of our upc

Win-Vector Blog » How to outrun a crashing alien spaceship

Hollywood movies are obsessed with outrunning explosions and outrunning crashing alien spaceships. For explosions the movies give the optima

Allen Bushnell, Fish Rap: Salmon on the prowl near the shore in Monterey...

Allen Bushnell Fish Rap The weather forecasts a 4- to 6-foot northwest swell this weekend, but that shouldn't slow fishing down.