Scott B. Weingart
Scott B.'s posts

Post has shared content
New open-access mandates in the US.

The US federal government has a new set of OA mandates. They were adopted as part of the new Consolidated Appropriations Act of 2014 (HR 3547), which was introduced in Congress two days ago, passed by the House the same day, passed by the Senate the next day, and signed by President Obama today, less than one hour ago.

The new OA mandates apply to all agencies in the Departments of Education, Health and Human Services, and Labor that spend $100 million or more per year on research and development. I'll post a list of covered agencies as soon as I find a good one.

Here's the new law in THOMAS.

Here's the OA language:

Sec. 527. Each Federal agency, or in the case of an agency with multiple bureaus, each bureau (or operating division) funded under this Act that has research and development expenditures in excess of $100,000,000 per year shall develop a Federal research public access policy that provides for

(1) the submission to the agency, agency bureau, or designated entity acting on behalf of the agency, a machine-readable version of the author's final peer-reviewed manuscripts that have been accepted for publication in peer-reviewed journals describing research supported, in whole or in part, from funding by the Federal Government;

(2) free online public access to such final peer-reviewed manuscripts or published versions not later than 12 months after the official date of publication; and

(3) compliance with all relevant copyright laws.

Here's how the new OA rules compare to three earlier, well-known federal OA policies (two actual and one potential):

Compared to the NIH policy (April 2008), the new policies are:

--the same in not allowing the maximum embargo to be longer than 12 months
--the same in applying to all work funded "in whole or in part" by a covered federal agency
--the same in applying to the author's peer-reviewed manuscript 
--the same in not requiring OA for data
--the same in not requiring reuse rights or open licenses
--the same in requiring deposit in repositories (green OA), not submission to OA journals (gold OA)
--the same in resting on enacted legislation
--stronger by applying to more agencies

Compared to the Obama White House directive (February 2013), the new policies are:

--stronger by not allowing the maximum embargo to be longer than 12 months (when stakeholders provide evidence that it ought to be longer)
--weaker by not allowing the maximum embargo to be shorter than 12 months (when stakeholders provide evidence that it ought to be shorter)
--stronger by applying to all work funded "in whole or in part" by a covered federal agency
--the same in applying to the author's peer-reviewed manuscript 
--weaker by not applying to data
--weaker by not requiring reuse rights or open licenses
--the same in requiring deposit in repositories (green OA), not submission to OA journals (gold OA)
--stronger by being enacted legislation
--weaker by applying to just one fiscal year (though it could later be modified to become permanent, as happened to the NIH policy)
--weaker by applying to fewer agencies

Compared to the Fair Access to Science and Technology Research Act (FASTR) (February 2013), the new policies are:

--weaker by allowing embargoes up to 12 months (FASTR caps embargoes at 6 months)
--the same in applying to all work funded "in whole or in part" by a covered federal agency
--the same in applying to the author's peer-reviewed manuscript 
--the same in not applying to data
--weaker by not requiring reuse rights or open licenses
--the same in requiring deposit in repositories (green OA), not submission to OA journals (gold OA)
--stronger by being enacted legislation
--weaker by applying to just one fiscal year (though it could later be modified to become permanent, as happened to the NIH policy)
--weaker by applying to fewer agencies

BONUS. The new appropriations act marginalizes the Tea Party.

#oa #openaccess #fastr #obama_directive  

Post has shared content
A fascinating study of productivity after receiving the highest prize in a field (in this case, the Fields medal).  I can't agree with the stated interpretation of the results - it's a classic economic explanation, more or less "having received the top prize, there's little more incentive to work hard".  Many other explanations are also consistent with the raw data, and not discussed.  But the raw data itself is fascinating - click through to see the graphs.

Here are some excerpts from the post:

"Prizes and rewards are designed to produce more effort, to give people something to strive towards. But what happens once they actually get it? According to a new study by Harvard's George Borjas and Notre Dame's Kirk Doran of recipients of the Fields Medal, the most prestigious prize in mathematics, winning big actually kills productivity.

Mathematicians who win it publish far less in the years afterwards than similarly brilliant "contenders"—highly cited mathematicians who won other prestigious awards before the age of 40 (the cutoff for the Fields), but not the prize itself. The prize is awarded every four years to two, three, or four mathematicians. It goes to show that major awards and recognition can have unintended consequences.  

The drop off is pretty massive [see graph in the article]...

The authors did find one surprising positive effect. Though they publish less, winners also take more risks in the future. They've already reached the pinnacle of their fields, so they feel free to pursue moonshots, new areas of mathematics that they think are fascinating or vital. The risk is quite large. The winners know they're capable of doing extraordinary work in a particular area. Moving outside of it makes future results far less certain. Their particular gifts or talent might not translate well, and they're going to have to learn new skills and an entirely new body of research in the new area, which is all very time consuming. 

The researchers term movement outside the core field, "cognitive mobility," and its increase explains about half of the drop off in productivity for medal winners. The prize frees them up in an extremely significant way. In the years leading up to the medal year, the likelihood of a mathematician straying from their comfort zone is very rare, at just 5 percent. For prize-winners, the rate quintuples to 25 percent."

+Moshe Vardi

Post has shared content
I debuted my cyballs (cyborg juggling balls) at the Museum of Math's MOVES conference on Monday.  The balls got a little confused about what pattern I was doing, but at least the hardware worked.  It's all pretty alpha-stage.  Here's a picture by Colm Mulcahy:

And here's a low-quality demo video (if you turn the sound on you can hear the computer speak the throws, though at this stage there's a several-beat delay):

Post has shared content
Interpretation of recent pseudonymous novel with reference to the Yahoo music experiments.

Post has shared content
Reed Elsevier, supporter of fair use.  (People interested in data mining may wish to hold them to their words here.)

Post has shared content
This will not end well for the journal in question:

"Dear Andrew Gelman

You are receiving this notice because you have published a paper with the American Journal of Public Health within the last few years. Currently, content on the Journal is closed access for the first 2 years after publication, and then freely accessible thereafter. On June 1, 2013, the Journal will be extending its closed-access window from 2 years to 10 years. Extending this window will close public access to your article via the Journal web portal, but public access will still be available via the National Institutes of Health PubMedCentral web portal.

If you would like to make your article available to the public for free on the Journal web portal, we are extending this limited time offer of open access at a steeply discounted rate of $1,000 per article. If interested in purchasing this access, please contact Brian Selzer, Publications Editor, at

Additionally, you may purchase a Noncommercial Common Use License (NCUL) for $500. This license enables readers to use your article for noncommercial purposes without the need to purchase permissions, and it also permits free reproduction of your article. The NCUL does NOT permit reproduction in commercial products such as book chapters or Journal articles. Permission must still be purchased for such use. If interested, please contact Brian Selzer, Publications Editor, at


Brian Selzer
Publications Editor
American Public Health Association"

Post has shared content
"At what sample size do correlations stabilize?", Schönbrodt & Perugini 2013

Sample correlations converge to the population value with increasing sample size, but the estimates are often inaccurate in small samples. In this report we use Monte-Carlo simulations to determine the critical sample size from which on the magnitude of a correlation can be expected to be stable. The necessary sample size to achieve stable estimates for correlations depends on the effect size, the width of the corridor of stability (i.e., a corridor around the true value where deviations are tolerated), and the requested confidence that the trajectory does not leave this corridor any more. Results indicate that in typical scenarios the sample size should approach 250 for stable estimates.

...Consider, for example, a correlation of r = .40 in a sample of 25 participants. This correlation is significantly different from zero (p=.047). Hence, it might be concluded with some confidence that there is "something > 0" in the population, and the study would be counted as a success from the NHST perspective. However, plausible values of the true correlation ρ, as expressed by a 90% confidence interval, range from .07 to .65. The estimate is quite unsatisfactory from an accuracy point of view – in any scenario beyond the NHST ritual it will make a huge difference whether the true correlation in the population is .07, which would be regarded as a very small effect in most research contexts, or .65, which would be a very large effect in many contexts. Moreover, precise point estimates are relevant for a priori sample size calculations. Given the huge uncertainty in the true magnitude of the effect, it is hard to determine the necessary sample size to replicate the effect (e.g., for an intended power of 80% and ρ = .07: n = 1599 [!]; ρ = .40: n = 46; and for ρ = .65: n = 16).
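The sample sizes quoted in that excerpt (n = 1599 for ρ = .07, n = 46 for ρ = .40, n = 16 for ρ = .65) follow from the standard Fisher z-transform power approximation. A quick sketch (my own illustration, not the authors' code) reproduces them up to a unit of rounding:

```python
import math
from statistics import NormalDist

def n_for_power(rho, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a true correlation rho
    with a two-sided test at level alpha and the requested power, using
    Fisher's z-transform: n ~ ((z_{1-alpha/2} + z_power) / atanh(rho))^2 + 3.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # ~1.96 for alpha = .05
    z_beta = z(power)           # ~0.84 for 80% power
    return math.ceil(((z_alpha + z_beta) / math.atanh(rho)) ** 2 + 3)

for rho in (0.07, 0.40, 0.65):
    print(rho, n_for_power(rho))
```

The ceiling convention here can differ from the paper's rounding by one, but the point survives: halving the assumed effect size roughly quadruples the required sample.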

...the following empirical example demonstrates. Multiple questionnaire scales have been administered in an open online study (Schönbrodt & Gerstenberg, 2012; Study 3). The thick black line in Figure 1 shows the evolution of the correlation between two scales, namely “hope of power” and “fear of losing control” when after each new participant the correlation is recalculated. It can be seen that the correlation evolved from r = .69 (n = 20, p < .001) to r = .26 (n = 274, p < .001). From a visual inspection, the trajectory did not stabilize up to a sample size of around 150. Data have not been rearranged – it is simply the order how participants dropped into the study. Some other correlations in this data set evolved from significantly negative to non-significant, others changed from one significant direction into the significant opposite, and some correlations were stable right from the beginning with only few fluctuations around the final estimate. But how do we get to know when a correlation estimate is sufficiently stable?
Figure 1: Actual (thick black line) and bootstrapped (thin gray lines) trajectories of a correlation. The dotted curved lines show the 95% confidence interval for the final correlation of r = .26 at each n. Dashed lines show the ± .1 corridor of stability (COS) around the final correlation. The point of stability (POS) is at n = 161. After that sample size the actual trajectory does not leave the COS.

...To assess the variability of possible trajectories, bootstrap samples of the final sample size can be drawn from the original raw data, and the evolutions of correlation for the new data sets are calculated. Figure 1 shows some exemplary bootstrapped trajectories. It can be seen that some trajectories start well above the final value (as the original trajectory), some start even with a significant negative value, and some start already within the COS without ever leaving it.

...The desired width of the corridor depends on the specific research context (see Figure 1 for a COS with w = .10). In this paper, three widths are used: ± .10, ± .15, and ± .20. Following the rules of thumb proposed by Cohen (1992), a value of .10 for w corresponds to a small effect size. Hence, if the sample correlation r stays within a corridor with w = ± .10, the resulting deviations only have a small effect size.

...In an analysis of 440 large-scale real world data sets in psychology only 4.3% could be considered as reasonable approximations to a Gaussian normal distribution (Micceri, 1989). Hence, deviations from normality are rather the rule than the exception in psychology. [Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166. doi:10.1037/0033-2909.105.1.156 ]

...If Table 1 should be boiled down to simple answers, one can ask what effect size typically can be expected in personality. In a meta-meta-analysis summarizing 322 meta-analyses with more than 25'000 published studies in the field of personality and social psychology, Richard, Bond, and Stokes-Zoota (2003) ["One Hundred Years of Social Psychology Quantitatively Described"] report that the average published effect is r = .21, less than 25% of all meta-analytic effects sizes are greater than .30, and only 5.28% of all effects are greater than .50. Hence, without any specific prior knowledge it would be sensible to assume an effect size of .214. Further let's assume that a confidence level of 80% is requested (a level that is typically used for statistical power analyses), and only small effect sizes (w < .10) are considered as acceptable fluctuations. By applying these values on Table 1 the required sample size is around n = 238.
Of course, what is a meaningful or expected correlation can vary depending on the research context and questions. In some research contexts even small correlations of .10 might be meaningful and with consequential implications. In this case, larger samples are needed for stable correlations. In other research contexts the expected correlation can be greater (e.g., convergent validity between different measures of the same trait) or the researcher is willing to accept a slightly less stable estimate, perhaps compensating with an increased level of confidence. This would reduce the necessary sample size. But even under these conditions there are few occasions in which it may be justifiable to go below n = 150 and for typical research scenarios reasonable trade-offs between accuracy and confidence start to be achieved when n approaches 250.
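The corridor-of-stability (COS) and point-of-stability (POS) idea from the excerpt is easy to play with directly. Here is a minimal simulation in plain Python (my own sketch, not the authors' code; the true ρ = .26 and window w = .10 are borrowed from the example in the excerpt): it tracks the running correlation as "participants" arrive one at a time and reports the last sample size at which the trajectory leaves the ± w corridor around the final estimate.

```python
import math
import random

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def point_of_stability(x, y, w=0.10, n_min=20):
    """Smallest n after which the running correlation r(x[:k], y[:k])
    never again leaves the corridor [r_final - w, r_final + w]."""
    n = len(x)
    r_final = corr(x, y)
    pos = n_min
    for k in range(n_min, n + 1):
        if abs(corr(x[:k], y[:k]) - r_final) > w:
            pos = k + 1  # corridor violated at k; stability starts later
    return pos

# Simulate bivariate normal data with true correlation rho = .26.
random.seed(1)
rho = 0.26
x = [random.gauss(0, 1) for _ in range(300)]
y = [rho * xi + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for xi in x]
print("point of stability:", point_of_stability(x, y))
```

Rerunning with different seeds shows exactly the behavior the paper describes: early trajectories swing wildly, and stabilization within a small-effect-size corridor often only arrives somewhere between n = 150 and n = 250.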

Post has shared content
Neat idea: "One person a day wins a chance to write to the growing list of subscribers. It could be you."