Simple explanation of meta-analysis; below is a copy of my attempt to explain basic meta-analysis on the DNB ML. I thought I might reuse it elsewhere, and I'd like to know whether it really is a good explanation or needs fixing.

---

Hm, I don't really know of any such explanation; there's Wikipedia, of course: http://en.wikipedia.org/wiki/Meta-analysis

Meta-analyses usually presume you know what an 'effect size' is. This is different from stuff like p-values. A rough summary: p-values say whether there is a difference between the control and experimental groups, while effect sizes say how big the difference is.

Each study gives you an effect size, based on the averages and standard deviation (how variable or jumpy the data is). What do you do with 10 effect sizes? How do you combine or add or aggregate them? That's where meta-analysis comes in.
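To make "effect size" concrete: one common standardized effect size is Cohen's d, the difference between the two group means divided by their pooled standard deviation. A minimal sketch in Python, with made-up numbers (the means, SDs, and sample sizes below are invented for illustration):

```python
import math

def cohens_d(mean_exp, mean_ctrl, sd_exp, sd_ctrl, n_exp, n_ctrl):
    """Standardized mean difference: (difference in means) / pooled SD."""
    pooled_sd = math.sqrt(((n_exp - 1) * sd_exp**2 + (n_ctrl - 1) * sd_ctrl**2)
                          / (n_exp + n_ctrl - 2))
    return (mean_exp - mean_ctrl) / pooled_sd

# Made-up example: experimental group scores a bit higher than control.
d = cohens_d(mean_exp=105, mean_ctrl=100, sd_exp=15, sd_ctrl=15,
             n_exp=30, n_ctrl=30)
print(round(d, 2))  # 0.33: a small-to-medium effect
```

Because d is in standard-deviation units, effect sizes from studies that used different tests or scales can be compared on one scale, which is what makes combining them possible at all.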

Well, you could just treat each study as a vote: if 6 of the effect sizes are positive, and 4 are negative, then declare victory: there's an effect, and it's whatever the majority of studies found.

But what if some of the effects are huge, like 0.9, while all the others are 0.1? If we just go with the majority, we get 0.1, since most studies found 0.1. But is 0.1 really the right answer here? It doesn't seem like it.

So instead of voting, let's average! We add up the 10 effect sizes and get something like +5; divide by 10 and get ~0.5 as our estimate. Much more reasonable: 0.9 seems too high, since those studies may be outliers, but 0.1 seems too low, since we did get some 0.9s; so we split the difference.
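The voting-versus-averaging contrast can be sketched with invented numbers (say, four studies found 0.9 and six found 0.1):

```python
from statistics import mode, mean

# Hypothetical effect sizes from 10 studies: four large, six small.
effects = [0.9] * 4 + [0.1] * 6

# "Voting": go with the most common value.
print(mode(effects))            # 0.1

# Averaging: add them up and divide.
print(round(mean(effects), 2))  # 0.42
```

Voting throws away the four 0.9s entirely; averaging lets every study pull the estimate toward itself.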

But studies don't always have the same number of subjects, and as we all know, the more subjects or data you have, the better your estimate of the true value. A study with 10 students in it is worth much less than a study which used 10,000 students! A simple average ignores this.

So let's weight each effect size by how many subjects/datapoints went into it: the effect size from the study with 10 students gets much less weight\* than the one from 10,000 students. Now if the first 9 studies have ~10 datapoints each, and the 10th study has 1,000 datapoints, those 9 together count as, say, 1/10th\* of the last study, since they total ~100 subjects to its 1,000.

So each effect size gets weighted by how many datapoints went into making it, and then they're essentially averaged together to give One Effect Size To Rule Them All.
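The weighting step can be sketched like this, with invented numbers matching the scenario above (note this uses raw sample sizes as weights for simplicity; real meta-analyses weight by inverse variance, which behaves roughly proportionally to sample size):

```python
# Hypothetical studies: (effect size, number of subjects).
studies = [(0.9, 10)] * 9 + [(0.1, 1000)]  # nine tiny studies, one big one

# Weight each effect size by its sample size, then average.
total_n = sum(n for _, n in studies)
weighted = sum(d * n for d, n in studies) / total_n
print(round(weighted, 2))  # 0.17
```

The nine small studies all said 0.9, but the one big study drags the pooled estimate down near its 0.1, because it contributed ~10x the data.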

Then you can start looking at other questions like confidence intervals (this One Effect Size is not exactly right, of course, but how far away is it from the true effect size?), heterogeneity (are we comparing apples and apples, or did we include some oranges?), or biases (funnel plots and trim-and-fill: does it look like some studies are missing?).
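As a sketch of where the confidence interval comes from: in a fixed-effect meta-analysis, each study contributes its effect size and the variance of that estimate, and the pooled estimate gets its own standard error. All the numbers here are invented; real tools (e.g. the R `metafor` package) do this and much more:

```python
import math

# Invented (effect size, variance of that estimate) pairs for 4 studies.
studies = [(0.8, 0.04), (0.5, 0.02), (0.2, 0.01), (0.4, 0.03)]

# Fixed-effect pooling: weight each study by 1/variance.
weights = [1 / v for _, v in studies]
pooled = sum(w * d for w, (d, _) in zip(weights, studies)) / sum(weights)

# The pooled estimate has its own standard error, hence a confidence interval.
se = math.sqrt(1 / sum(weights))
ci = (pooled - 1.96 * se, pooled + 1.96 * se)  # approximate 95% CI
print(round(pooled, 2), tuple(round(x, 2) for x in ci))  # 0.38 (0.24, 0.51)
```

Note the interval: the pooled estimate is a best guess, and the CI quantifies how far off it plausibly is.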

In the case of the [DNB meta-analysis](http://www.gwern.net/DNB%20FAQ#meta-analysis), we can look at the One Effect Size over all studies which was something like 0.5. But some studies are high and some are low; is there any way to predict which are high and low? Is there some characteristic that might cause the effect sizes to be high or low? I suspected that there was: the methodological critique of active vs passive control groups. (I actually suspected this before the Melby meta-analysis came out, which did the same thing over a larger selection of WM-related studies.)

So I subcategorized the effect sizes into those from studies with active control groups and those with passive control groups, and ran 2 smaller separate meta-analyses, one on each category. Did the 2 smaller meta-analyses spit out roughly the same answer as the full meta-analysis? No, they did not! They spat out quite different answers: studies with passive control groups found a large effect size, and studies with active control groups found a small one. This is very good evidence that yes, the critique is right, since it's not likely that a random split of studies would separate them so cleanly.
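The subgroup comparison can be sketched the same way as before. These studies and numbers are invented, not the actual DNB studies, and a real subgroup analysis would use proper meta-analytic pooling rather than this simple weighted mean:

```python
# Hypothetical studies: (effect size, n, control-group type).
studies = [
    (0.6, 40, "passive"), (0.7, 25, "passive"), (0.5, 60, "passive"),
    (0.1, 30, "active"),  (0.2, 50, "active"),  (0.0, 45, "active"),
]

def pooled(subset):
    """Sample-size-weighted mean effect size for a subset of studies."""
    total_n = sum(n for _, n, _ in subset)
    return sum(d * n for d, n, _ in subset) / total_n

for group in ("passive", "active"):
    sub = [s for s in studies if s[2] == group]
    print(group, round(pooled(sub), 2))  # passive 0.57, active 0.1
```

If the split were random noise, the two subgroup estimates should land near each other (and near the overall estimate); a large, consistent gap between them is what supports the active-vs-passive critique.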

And that's the meat of my meta-analysis. I hope this was helpful?

\* How much smaller? Well, that's where statistics comes in. It's not a simple linear sort of thing: precision grows with the square root of the sample size, so 100 subjects is not 10x better than 10 subjects, only about 3x better. Diminishing returns.
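This square-root law is easy to see from the standard error of a mean, which is sd/sqrt(n) (the sd of 15 below is just an arbitrary illustrative value):

```python
import math

sd = 15  # hypothetical standard deviation of the outcome measure
for n in (10, 100, 1000):
    se = sd / math.sqrt(n)  # standard error of the mean
    print(n, round(se, 2))
# 10   4.74
# 100  1.5
# 1000 0.47
```

Multiplying the sample size by 10 shrinks the error by only sqrt(10) ≈ 3.2x each time: diminishing returns.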