A meta-analysis paper looking at the association of published and random gene
sets with breast cancer outcome. They find that you can choose a
random set of genes and, in cases where more than 100 genes are
involved, 90% of those sets are "associated" with breast cancer outcome.
This is another very interesting paper related to how we do our statistical
analyses. It's essentially the problem of correlation != causation.
Moreover, as the paper puts it:
"the question is not whether a given set of genes is related to survival, but
whether it is more related to survival than random sets of genes"
Because of this problem, they find that their random gene sets, and signatures
of unrelated processes (including "postprandial laughter"), are associated with
breast cancer outcome. And 28 of 47 published studies had an association that
was not stronger than expected by chance.
Only 18 of those 47 were more significant than all but 5% of the random
signatures (p < 0.05).
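
To make the "compare against random gene sets" idea concrete, here is a minimal sketch of an empirical p-value. It is not the paper's code: the function names, the (k + 1) / (n + 1) correction, and the toy scoring function are all assumptions for illustration. In a real analysis the score would be something like a survival-association statistic.

```python
import random


def empirical_p_value(observed_score, random_scores):
    """Fraction of random signatures scoring at least as high as the observed one.

    Uses the (k + 1) / (n + 1) correction so the estimate is never exactly 0.
    """
    at_least_as_high = sum(1 for s in random_scores if s >= observed_score)
    return (at_least_as_high + 1) / (len(random_scores) + 1)


def random_signature_scores(score_fn, all_genes, signature_size,
                            n_random=1000, seed=0):
    """Score n_random gene sets of the given size, drawn without replacement."""
    rng = random.Random(seed)
    return [score_fn(rng.sample(all_genes, signature_size))
            for _ in range(n_random)]


# Toy data (entirely made up): most genes carry some signal, so even
# random gene sets get a score well above zero.
genes = list(range(2000))
toy_rng = random.Random(1)
signal = {g: toy_rng.gauss(1.0, 1.0) for g in genes}


def toy_score(gene_set):
    """Hypothetical stand-in for an association statistic: mean per-gene signal."""
    return sum(signal[g] for g in gene_set) / len(gene_set)


published = genes[:100]  # stand-in for a published 100-gene signature
null_scores = random_signature_scores(toy_score, genes, len(published))
print(empirical_p_value(toy_score(published), null_scores))
```

The point of the design is that the null distribution is built from random sets of the same size as the signature being tested, so a signature only counts as interesting if it beats most of them, not merely if its score is different from zero.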
This problem should be somewhat specific to cancer, because cancer causes crazy expression changes in lots of genes.