"Epidemiology, genetics and the ‘Gloomy Prospect’: embracing randomness in population health research and practice", Davey Smith 2011; excerpts:
"Epidemiologists aim to identify modifiable causes of disease, this often being a prerequisite for the application of epidemiological findings in public health programmes, health service planning and clinical medicine. Despite successes in identifying causes, it is often claimed that there are missing additional causes for even reasonably well-understood conditions such as lung cancer and coronary heart disease. Several lines of evidence suggest that largely chance events, from the biographical down to the sub-cellular, contribute an important stochastic element to disease risk that is not epidemiologically tractable at the individual level. Epigenetic influences provide a fashionable contemporary explanation for such seemingly random processes. Chance events—such as a particular lifelong smoker living unharmed to 100 years—are averaged out at the group level. As a consequence, population-level differences (for example, secular trends or differences between administrative areas) can be entirely explicable by causal factors that appear to account for only a small proportion of individual-level risk. In public health terms, a modifiable cause of the large majority of cases of a disease may have been identified, with a wild goose chase continuing in an attempt to discipline the random nature of the world with respect to which particular individuals will succumb. The quest for personalized medicine is a contemporary manifestation of this dream. An evolutionary explanation of why randomness exists in the development of organisms has long been articulated, in terms of offering a survival advantage in changing environments. Further, the basic notion that what is near-random at one level may be almost entirely predictable at a higher level is an emergent property of many systems, from particle physics to the social sciences. These considerations suggest that epidemiological approaches will remain fruitful as we enter the decade of the epigenome."
"We cannot imagine these diseases, they are called idiopathic, spontaneous in origin, but we know instinctively there must be something more, some invisible weakness they are exploiting. It is impossible to think they fall at random, it is unbearable to think it."
--James Salter, Light Years, 1975
Despite many successes, even with respect to the most celebrated—such as the identification of cigarette smoking as a major cause of lung cancer and other chronic diseases—it can appear that much remains to be done. Consider Winnie, lighting a cigarette from the candles on her centenary birthday cake, who, after 93 years of smoking, is not envisaging giving up the habit (Figure 1).
...loom large in the popular imagination 2 and are reflected in the low positive predictive values and C statistics in many formal epidemiological prediction models. In general, epidemiologists do a rather poor job of predicting who is and who is not going to develop disease.
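To make the two summary measures just mentioned concrete, here is a minimal Python sketch. It computes the positive predictive value (the share of people flagged as high risk who go on to develop disease) and the C statistic (the probability that a randomly chosen case has a higher risk score than a randomly chosen non-case, with ties counted as one half). The cohort is entirely hypothetical: 300 men, 60 of them classified as high risk, with counts chosen so that the high-risk group has a 1 in 6 risk. These are not data from any study cited here.

```python
from itertools import product


def positive_predictive_value(flagged, outcomes):
    """Share of people flagged as high risk who go on to develop disease."""
    flagged_outcomes = [y for f, y in zip(flagged, outcomes) if f]
    if not flagged_outcomes:
        raise ValueError("nobody was flagged as high risk")
    return sum(flagged_outcomes) / len(flagged_outcomes)


def c_statistic(scores, outcomes):
    """Probability that a random case outscores a random non-case (ties count 0.5)."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordance = 0.0
    for case_score, non_case_score in product(cases, non_cases):
        if case_score > non_case_score:
            concordance += 1.0
        elif case_score == non_case_score:
            concordance += 0.5
    return concordance / (len(cases) * len(non_cases))


# Hypothetical cohort: 60 high-risk men (10 develop disease, 50 do not)
# and 240 lower-risk men (20 develop disease, 220 do not).
flagged = [True] * 60 + [False] * 240
outcomes = [1] * 10 + [0] * 50 + [1] * 20 + [0] * 220
scores = [1 if f else 0 for f in flagged]  # a crude binary risk score

ppv = positive_predictive_value(flagged, outcomes)
auc = c_statistic(scores, outcomes)
print(f"PPV = {ppv:.3f}, C statistic = {auc:.3f}")  # PPV = 0.167, C statistic = 0.574
```

In this made-up cohort the high-risk group has a 1 in 6 risk, yet two-thirds of the cases (20 of 30) arise among men classified as lower risk, and the C statistic is only modestly above the 0.5 expected from a coin toss.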
This apparent failing of epidemiology has long been recognized. Writing about ischaemic heart disease (IHD) 40 years ago, Tom Meade and Ranjan Chakrabarti reported that ‘within any risk group, prediction is poor; it is not at present possible to express individual risk more precisely than as about a 1 in 6 chance of a hitherto healthy man developing clinical IHD in the next 5 years if he is at high risk’. 3
I have certainly promulgated such views in the (usually unsuccessful) pursuit of pounds or dollars, although the exact percentage of ‘explanation’ by established causes would fall and rise in relation to the degree of desperation. The most feted contemporary candidate for better prediction is probably genetics. With the perception (in my view exaggerated) that genome-wide association studies (GWASs) have failed to deliver on initial expectations, 5 the next phase of enhanced risk prediction will certainly shift to ‘epigenetics’ 6,7 —the currently fashionable response to any question to which you do not know the answer.
Exposures of this kind are, in the terminology popularized within behavioural genetics, shared (or common) environmental factors. It is therefore perhaps surprising that the groundbreaking 1987 paper by Robert Plomin and Denise Daniels, 16 ‘Why are children in the same family so different from one another?’, recently reprinted with commentaries in the IJE, 17–21 has apparently had little influence within epidemiology. The implication of the paper—which expanded upon an earlier analysis 22 —was that, genetics aside, siblings are little more similar than two individuals of roughly the same age randomly selected from the source population from which the siblings originate. This may be an intuitive observation for many people who have siblings themselves or have more than one child. Arising from the field of behavioural genetics, the paper focused on measures of child behaviour, personality, cognitive function and psychopathology, but, as Plomin points out, the same basic finding is observed for many physical health outcomes: obesity, cardiovascular disease, diabetes, peptic ulcers, many cancers, asthma, longevity and various biomarkers assayed in epidemiological studies. 18 These findings come from studies of twins, adoptees and extended pedigrees, in which the variance in an outcome is partitioned into a genetic component, the contribution of common environment (i.e. that shared between people brought up in the same home environment) and the non-shared environment (i.e. exposures that are not correlated between people brought up in the same family). The shared environment—which is the domain of many of the exposures of interest to lifecourse epidemiologists—is reported to make at best small contributions to the variance of most outcomes. The non-shared environment—exposures which (genetic influences apart) show no greater concordance between siblings than between non-related individuals of a similar age from the same population—constitutes by far the dominant class of non-genetic influences on most health and health-related outcomes (Box 1). Table 1 presents data from a large collaborative twin study of 11 cancer sites, with universally large non-shared environmental influences (58–82%), heritabilities in the range 21–42% (excluding uterine cancer, for which a value of 0% is reported) and smaller shared environmental effects, zero for four sites and ranging from 5% to 20% for the remainder. 23 Many other diseases show a similar dominance of non-shared over shared environmental influences. 18 Indeed, a greater non-shared than shared environmental component appears to apply to some, 24–28 although not all, 29 childhood-acquired infections and the diseases they cause. This is such a counter-intuitive observation that one commentator on an earlier draft of this paper used childhood infectious disease epidemiology as an example of a situation in which the shared environment must be dominant.
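Twin studies usually perform the variance partitioning described above by fitting structural equation models. A cruder back-of-envelope version, Falconer's formulas, shows the underlying logic. Under the classical twin model (additive genetic effects, no assortative mating, equal shared environments for both twin types), monozygotic (MZ) twins share all their genes and dizygotic (DZ) twins on average half, while both share their common environment, so r_MZ = a² + c² and r_DZ = a²/2 + c². The sketch below assumes that model; the twin correlations in the example are invented for illustration and are not taken from Table 1.

```python
from typing import NamedTuple


class AceEstimate(NamedTuple):
    a2: float  # additive genetic share ("heritability")
    c2: float  # shared (common) environment share
    e2: float  # non-shared environment share, which also absorbs measurement error


def falconer_ace(r_mz: float, r_dz: float) -> AceEstimate:
    """Falconer's moment estimates from MZ and DZ twin correlations.

    Assumes the classical twin model: r_mz = a2 + c2 and r_dz = a2 / 2 + c2.
    Estimates are not constrained, so sampling noise can push c2 (or a2)
    below zero; model-fitting software would instead bound them at zero.
    """
    a2 = 2 * (r_mz - r_dz)
    c2 = 2 * r_dz - r_mz
    e2 = 1 - r_mz
    return AceEstimate(a2, c2, e2)


# Invented correlations for a trait on which twins resemble each other mainly through genes.
estimate = falconer_ace(r_mz=0.45, r_dz=0.25)
print({name: round(value, 3) for name, value in estimate._asdict().items()})
# {'a2': 0.4, 'c2': 0.05, 'e2': 0.55}
```

With these hypothetical inputs the non-shared component is the largest of the three, reproducing the qualitative pattern the text describes.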
...However, as Neven Sesardic points out, even within behavioural genetics the central, rather momentous, finding regarding the apparently small or non-existent contribution of family background to child outcomes went under-appreciated; it was ‘an explosion without a bang’. 19 For epidemiologists, the fact that the generally small shared environmental influences on many outcomes appeared to get even smaller (or disappear completely) with age—as is seen, for example, with respect to body mass index and obesity 31 —increases the relevance of the message, since later life health outcomes are often what we study. Yet, within epidemiology, the impact of this work has been minimal; of the 607 citations of the Plomin and Daniels paper on ISI Web of Science (as of May 2011), only a handful fall directly within the domain of epidemiology or population health. In the recent book, Family Matters: Designing, Analysing and Understanding Family-based Studies in Lifecourse Epidemiology, 32 the issue is barely touched upon; the balanced one page it receives near the end of the 340-page book being perhaps too little, too late. 33 Between-sibling studies as a way of controlling for potential confounding have been widely discussed within epidemiology, both in the book in question 34 and elsewhere. 35,36 Certainly, this is a useful method for taking into account shared aspects of the childhood environment. But if shared environment has little impact on many outcomes then, on the face of it, the approach might be missing the issue of real concern—the more important non-shared environmental factors. Despite this, the use of sibling controls sometimes appears to uncover substantial confounding. For example, maternal smoking during pregnancy was found in a large Swedish study to be associated with lower offspring IQ, even after adjustment for many potential confounding factors. 37 In a between-sibs comparison, however, there was no association of maternal smoking with IQ of offspring, which the authors interpreted as indicating that the association seen for unrelated individuals was due to residual confounding. If shared environment is of such little importance, how can it generate meaningful confounding in epidemiological studies? We will return to this issue later.
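The mechanics of a between-sibling comparison can be illustrated with a small simulation. This is a toy model under stated assumptions, not a reanalysis of the Swedish study: a hypothetical family-level factor (standing in for shared background) raises exposure and lowers the outcome, while the exposure itself has no causal effect. Pooled across individuals, exposure and outcome are associated; regressing the sibling difference in outcome on the sibling difference in exposure removes everything shared within the family, and the association disappears. In this setup the family factor accounts for under a tenth of outcome variance (0.09 of 1.09), so the sketch also hints that a modest shared component can still produce detectable confounding in a large sample; it does not settle the question the text leaves open.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_sibling_study(n_families=20000, family_effect_on_outcome=0.3, seed=1):
    """Return (pooled slope, within-sibling-pair slope) for an exposure with no causal effect.

    Hypothetical model: each family has a shared factor F ~ N(0, 1);
    exposure x = F + noise; outcome y = -family_effect_on_outcome * F + noise.
    """
    rng = random.Random(seed)
    pooled_x, pooled_y = [], []
    diff_x, diff_y = [], []
    for _ in range(n_families):
        family = rng.gauss(0, 1)
        x1, x2 = family + rng.gauss(0, 1), family + rng.gauss(0, 1)
        y1 = -family_effect_on_outcome * family + rng.gauss(0, 1)
        y2 = -family_effect_on_outcome * family + rng.gauss(0, 1)
        pooled_x += [x1, x2]
        pooled_y += [y1, y2]
        # Differencing within the pair cancels the shared family factor.
        diff_x.append(x1 - x2)
        diff_y.append(y1 - y2)
    return ols_slope(pooled_x, pooled_y), ols_slope(diff_x, diff_y)


pooled, within = simulate_sibling_study()
print(f"pooled slope {pooled:.3f}, within-sibling slope {within:.3f}")
```

Under these assumptions the pooled slope should come out near -0.15 (the family effect divided by the exposure variance of 2) and the within-sibling slope near zero.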
An extensive research programme in the behavioural and social sciences consequent on the Plomin and Daniels review focused on the direct assessment of effects of the systematic aspects of the non-shared environment. Instruments were developed to collect detailed data on sibling-specific parenting practice, sib–sib interactions and the influence of schools and peer groups, and studies including more than one child per family were explicitly established to allow investigation of why siblings differ. However, a decade ago, a meta-analytical overview of such studies concluded that there was little direct evidence of important influences of specific non-shared environmental characteristics on behavioural and social outcomes mainly assessed during the first two decades of life. 38 At best, only small proportions of the phenotypic variance attributed to the non-shared environment related to directly measured influences. The effects were rarely statistically robust and the median value of the proportion of variation accounted for was 3%. In the behavioural genetic studies, estimates of the proportion of the overall phenotypic variance accounted for by the non-shared environment are almost always over 50%, and often substantially so; similar findings apply to cancers (Table 1). There are more optimistic assessments of the current status of studies directly assessing the effects of non-shared environment, 18,39 but in these the magnitude of the effects appears small. In an example presented in Plomin’s assessment of three decades of research on this issue, 18 non-shared aspects of maternal negativity do have a statistically robust association with offspring depressive symptoms, but account for only around 1% of the variance. 40
Systematic aspects of the non-shared environments of adults that have large effects on disease outcomes may await identification. However, the inability to identify such effects using intensive assessments of exposure and outcomes in childhood is sobering. Furthermore, in longitudinal twin studies, in which twin pairs have repeat assessments, the general finding is that the non-shared environmental variance at one age overlaps little with that at a later age—i.e. there appear to be unique and largely uncorrelated factors acting at different ages. For example, with respect to body mass index, the non-shared environmental components at ages 20, 48, 57 and 63 years are largely uncorrelated with each other. 52 This suggests that exposures contributing to non-shared environmental influences are often unsystematic and of a time- or context-dependent nature. Similar findings have emerged from studies of various other outcomes, with non-shared environmental influences contributing little, if anything, to tracking of phenotypes over time. 53 A distinction can be drawn between the stable and unstable aspects of the non-shared environment, with studies tending to point to the latter as being of more statistical importance in terms of explaining variance in the distribution of disease risk. This is a crucial issue, since some environmental exposures which are partly non-shared in adulthood (such as cigarette smoking and occupational exposures) tend to track over time—and thus be stable components of the non-shared environment.
Currently, there is largely an absence of evidence—rather than evidence of absence—of directly assessed systematic non-shared environmental influences on health, and little active research in the biomedical field. However, as the phenotypic decomposition of variance shows similar patterns in the medical, behavioural and social domains, it seems prudent to assume that similar causal structures exist, and equivalent conclusions should be drawn: a large component of variation in health-related traits cannot be accounted for by measurable systematic aspects of the non-shared environment.
Many features of twin study analysis can be problematic. For example, twin study analysis often assumes that genetic contributions are additive, and that genetic dominance (in the classic Mendelian sense) or gene–gene interactions (epistasis) do not contribute to the genetic variance. Such an assumption can lead to under-estimation of the shared environmental component. 55–57 Conversely, twin studies also assume no assortative mating (i.e. parents are no more genetically similar than if randomly sampled from the population) and no gene–environment covariation, both of which can lead to over-estimation of the shared environmental component. 55 Different study designs for estimating components of phenotypic variation make different assumptions, however. Conventional twin studies, studies of twins reared apart, extended twin-family studies (in which other family members are included), other extended pedigree studies and adoption studies (including those in which there is quasi-random assignment of particular adoptees) generally come to the same basic conclusions about the relative magnitude of these components. 58 All these designs have been applied to the study of body mass index and obesity, with the findings indicating roughly the same magnitude of heritability. 55,59–64 This makes it less likely that these estimates are seriously biased, because different biases would all have to generate the same effects, which is not a plausible scenario.
With respect to the ‘missing heritability’, to take the example of height—referred to by both Plomin 18 and Turkheimer 25 —the estimate they give of the proportion of heritability explained by identified variants, <5%, has already increased to >10%, 65 and directly estimated heritability (relating phenotypic similarity to stochastic variation in the proportion of the genome shared between siblings) indicates similar heritabilities to those seen in twin studies. 66 Genome-wide prediction using common genetic variation across the genome also points to the effects of measured genetic variation moving towards the expectation from conventional heritability estimates. 67 Such data suggest there are large numbers of variants as yet not robustly characterized that are contributing to the heritability of height, with rare variants not identifiable through GWAS probably accounting for much of the remainder... In summary, it seems improbable that heritability has been substantially over-estimated at the expense of shared environment. The basic message that a larger non-shared than shared environmental component to phenotypic variance is the norm is unlikely to be overturned.
Shared environmental effects, although generally small, are more substantial for some outcomes, including musical ability 69 and criminality in adolescents and young adults; 70 respiratory syncytial virus infection, 29 anti-social behaviour, 53,71 mouth ulcers 72 and physical activity 73 in children; and lung function in adults. 74 Furthermore, findings with respect to shared environmental contributions have face validity. For example, in a twin study applying behavioural genetic variance decomposition to behaviours, dispositions and experiences, shared environmental effects were found for only 9 of the 33 factors investigated. 75 However, they were identified for those aspects of life that would appear to depend on shared family characteristics, for example, for a child being read to by a parent, but not for the child reading books on their own. Similarly, the number of years a child had music lessons had a substantial shared environmental component, as might be expected, since this will initially depend on the parents organizing such lessons. Continuing to play an instrument into adulthood, however, did not have an identified shared environmental contribution.
Shared environment can be addressed through analysis of spousal similarities in health outcomes, as environments are shared to an extent by cohabiting couples, and these also yield what on the face of it are rather small effect estimates. For example, the cross-spousal correlation for body mass index does not change from when couples initially come together (reflecting assortative mating) over many years of them living together in an at least partially shared environment. 61
Of most relevance to epidemiological approaches, however, is that models generally fix the shared environmental component to zero if it is not ‘statistically significantly’ different from zero. This is evident in Table 1; with respect to pancreatic cancer, for example, the shared environmental component is given as 0, with a 95% confidence interval (CI) 0–0.35 (i.e. the upper limit being 35% of phenotypic variance). In many cases, it is simply stated that these studies find no effect of shared environmental influences, even though the findings are compatible with quite substantial contributions that cannot be reliably estimated in the generally small samples available in twin and adoption studies. Thus, a twin study of aortic aneurysm reported that there was ‘no support for a role of shared environmental influences’, 78 with the 95% CI around the effect estimate being 0–27%. A recent meta-analysis found that for various aspects of child and adolescent psychopathology, shared environment makes a non-negligible contribution in adequately powered analyses. 79 The claims of there being ‘no shared environmental influence’, which are often made (Box 2), might more realistically be seen as an indication of inadequate sample size and the fetishization of ‘statistical significance’. 80
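Why small twin samples struggle to pin down the shared environmental component can be seen by simulation. The sketch below is a hypothetical illustration rather than a reproduction of any cited study: it generates twin pairs under an ACE model with an assumed true shared environmental share of 20%, estimates that share with Falconer's formula (2 r_DZ − r_MZ, a crude stand-in for the maximum-likelihood models twin studies actually fit), and repeats this many times to see how widely the estimate scatters at two sample sizes. All parameter values are assumptions chosen for illustration.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def twin_correlation(rng, n_pairs, a2, c2, e2, gene_sharing):
    """Simulate n_pairs twin pairs under an ACE model and return their correlation.

    gene_sharing is 1.0 for MZ pairs and 0.5 for DZ pairs; each component has
    unit variance before scaling, so phenotype variance is a2 + c2 + e2.
    """
    first, second = [], []
    for _ in range(n_pairs):
        family_genes = rng.gauss(0, 1)
        common_env = rng.gauss(0, 1)  # identical for both twins
        pair = []
        for _ in range(2):
            genes = (math.sqrt(gene_sharing) * family_genes
                     + math.sqrt(1 - gene_sharing) * rng.gauss(0, 1))
            unique_env = rng.gauss(0, 1)  # independent for each twin
            pair.append(math.sqrt(a2) * genes + math.sqrt(c2) * common_env
                        + math.sqrt(e2) * unique_env)
        first.append(pair[0])
        second.append(pair[1])
    return pearson(first, second)


def spread_of_c2(n_pairs, replicates=100, a2=0.4, c2=0.2, e2=0.4, seed=3):
    """SD of Falconer c2 estimates across replicate studies, and share at or below zero."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        r_mz = twin_correlation(rng, n_pairs, a2, c2, e2, gene_sharing=1.0)
        r_dz = twin_correlation(rng, n_pairs, a2, c2, e2, gene_sharing=0.5)
        estimates.append(2 * r_dz - r_mz)
    return statistics.stdev(estimates), sum(e <= 0 for e in estimates) / replicates


for n in (100, 800):
    sd, share_at_or_below_zero = spread_of_c2(n)
    print(f"{n} MZ and {n} DZ pairs: SD of c2 estimate {sd:.3f}, "
          f"share of estimates <= 0: {share_at_or_below_zero:.2f}")
```

With 100 pairs of each zygosity the scatter should be several times wider than with 800, and some small-sample estimates may fall at or below zero even though the simulated shared component is 20%. Software that bounds the estimate at zero would report such results as 0.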
The stochastic nature of phenotypic development is something we should not be surprised to encounter (Box 3). In his 1920 paper, ‘The relative importance of heredity and environment in determining the piebald pattern of guinea pigs’, Sewall Wright (Figure 2) presented a seminal path analysis (Figure 3), which has frequently been cited as a source of this particular statistical method. 87 Wright observed that ‘nearly all tangible environmental conditions—feed, weather, health of dam, etc., are identical for litter mates’; in current terminology, they are part of the shared environment. Such factors were found to be of minor importance; instead, most of the non-genetic variance ‘must be due to irregularities in development due to the intangible sort of causes to which the word chance is applied’. 87 Wright pointed out that measurement error could not be separated from this intangible variance, as is the case with non-shared environment in current parlance. In a later paper, 88 Wright and his PhD student Herman Chase independently graded the guinea pig coat patterns, and demonstrated that measurement error was only a minor contributor (Figure 4). A summary table (Table 2) included a shared environmental influence on littermates—age of the mother—but the intangible variance dominated, with the estimate of its magnitude being similar to estimates seen for the contribution of the non-shared environment in relation to many human traits. 16 In humans, of course, age of mother at conception could be a non-shared environmental factor influencing differences between siblings. In the inbred guinea pig strain, where genetic differences were minor, heredity was not an issue, and the intangible (‘non-shared environmental’) factors were even more dominant.
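Independent grading offers a simple way to gauge measurement error. If two graders score the same animals with independent errors, then in large samples the correlation between their scores approaches the share of observed variance that reflects the animals' true patterns, and one minus that correlation estimates the share due to grading error. The sketch below simulates this with made-up variances (true-pattern variance 1, grading-error variance 0.1 per grader). It illustrates the general reliability logic and is not a reconstruction of Wright and Chase's own calculation.

```python
import math
import random


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def inter_rater_reliability(n_animals=5000, true_var=1.0, error_var=0.1, seed=11):
    """Correlation between two independent gradings of the same animals.

    Hypothetical model: grade = true pattern + independent grader error, so the
    correlation tends to true_var / (true_var + error_var) in large samples.
    """
    rng = random.Random(seed)
    grader_1, grader_2 = [], []
    for _ in range(n_animals):
        true_pattern = rng.gauss(0, math.sqrt(true_var))
        grader_1.append(true_pattern + rng.gauss(0, math.sqrt(error_var)))
        grader_2.append(true_pattern + rng.gauss(0, math.sqrt(error_var)))
    return correlation(grader_1, grader_2)


reliability = inter_rater_reliability()
print(f"reliability {reliability:.3f}; share of variance from grading error {1 - reliability:.3f}")
```

With these assumed variances the reliability should be close to 1/1.1, roughly 0.91, leaving about 9% of the graded variance attributable to grading error.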
In genetically identical Caenorhabditis elegans reared in the same environments, there are large differences in age-related functional declines, attributable to purely stochastic events. 89 In the case of genetically similar inbred laboratory rats, Klaus Gartner noted the failure to materially reduce variance for a wide variety of phenotypes, despite several decades of standardizing the environment. 90,91 Indeed, there was hardly any reduction in variance compared with that seen in wild-living rats experiencing considerably more variable environments... Embryo splitting and transfer experiments in rodents and cattle demonstrated that the prenatal environment was also not a major source of phenotypic variation. 90,91 In genetically identical marbled crayfish raised in highly controlled environments, considerable phenotypic differences emerge. 94 These and numerous other examples from nearly a century of research 87,93–98 demonstrate the substantial contribution of what appear to be chance or stochastic events—which in the behavioural genetics field would fall into the category of non-shared environmental influences—to a wide range of outcomes.
If such a substantial role for chance exists in the emergence of phenotypic (including pathological) profiles, why is this? One possible answer, with a long pedigree, 121–123 is that it provides for evolutionary bet-hedging. 124 Fixed phenotypes may be tuned to a given environment, but in changing conditions a phenotype optimized for propagation in one situation may rapidly become suboptimal, 125 a proposition supported by experimental evidence. 126,12
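The bet-hedging argument is usually formalized through long-run growth: a lineage's numbers are multiplied by its average fitness each generation, so over many generations what matters is the expected logarithm of that factor (a geometric rather than arithmetic mean). The toy model below compares a lineage that always produces one fixed phenotype with one that produces either phenotype at random, half the time each. The two environments, their equal probabilities and the fitness values are invented for illustration; this is a textbook-style sketch under these assumptions, not a model taken from the cited studies.

```python
import math

# Hypothetical fitness (offspring per individual) of two phenotypes in two environments.
FITNESS = {
    "A": {"wet": 2.0, "dry": 0.1},
    "B": {"wet": 0.1, "dry": 2.0},
}
ENV_PROBABILITY = {"wet": 0.5, "dry": 0.5}


def mean_fitness(share_a, env):
    """Average fitness of a lineage producing phenotype A with probability share_a."""
    return share_a * FITNESS["A"][env] + (1 - share_a) * FITNESS["B"][env]


def arithmetic_mean_fitness(share_a):
    return sum(p * mean_fitness(share_a, env) for env, p in ENV_PROBABILITY.items())


def long_run_growth_rate(share_a):
    """Expected log growth per generation when environments occur at random."""
    return sum(p * math.log(mean_fitness(share_a, env)) for env, p in ENV_PROBABILITY.items())


for share in (1.0, 0.5):
    print(f"share A = {share}: arithmetic mean fitness {arithmetic_mean_fitness(share):.2f}, "
          f"long-run log growth {long_run_growth_rate(share):+.3f}")
```

Both lineages have the same arithmetic mean fitness (1.05), but the fixed lineage's long-run log growth is negative (about -0.80, so it dwindles) while the randomizing lineage's is positive (about 0.05, the log of 1.05). Randomness in which phenotype is produced is what lets the second lineage persist.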
Reflecting on their demonstration of considerable phenotypic—including epigenetic—differences between genetically identical crayfish, the investigators conclude that such variation may ‘act as a general evolution factor by contributing to the production of a broader range of phenotypes that may occupy different micro-niches’. 94 The substantial non-shared environmental contribution to many outcomes could, therefore, include an element—perhaps substantial—of random phenotypic noise, consequent on stochastic epigenetic processes. At the molecular level, the potential existence of such processes has been observed within twin studies, with the formal demonstration of non-shared environmental contributions to epigenetic profiles 130 and of substantial differences in epigenetic markers between monozygotic twins. 119
Other mechanisms can also contribute to phenotypic diversity, including meiotic recombination and Mendelian assortment of genetic variants acting on highly polygenic traits, with such genetic variants having small individual effects. Mutation will also increase phenotypic variation. Sibling contrast effects—siblings becoming less similar than their genetic and shared environmental commonalities would predict—could also provide for such evolutionary bet-hedging. 129 Although evidence supporting such a process is sparse, it could lead to inflation of non-shared environmental influences and deflation of shared environment estimates from twin studies.
Most cases of lung cancer are attributable to smoking, but many smokers do not develop lung cancer. Thus, in the Whitehall Study of male civil servants in London, cigarette smoking accounts for <10% of the variance (estimated as the pseudo-R²) 141 in lung cancer mortality. 102 At the population level, however, smoking accounts for virtually all of the variance—over 90% with respect to lung cancer mortality over time in the USA, 142 and virtually all of the differences in rates between areas in Pennsylvania. 143 It is in relation to this large contribution of smoking to the population burden of lung cancer that the <10% of variance accounted for by cigarette smoking among individuals observed in prospective epidemiological studies, and the 12% shared environmental variance reported in Table 1, should be considered. The shared environmental component will in part reflect shared environmental differences in cigarette smoking initiation. 144 The non-shared environmental component (62% of the variance in Table 1) will include the non-shared environmental influence on initiation, amount and persistence of smoking. 144 However, as discussed earlier, stable aspects of the non-shared environment—which smoking would tend to be—are generally small contributors to the total non-shared environmental effect, and thus much of this will also reflect the substantial contribution of the kinds of chance events—
These reflections will be unexceptional to epidemiologists, as they merely illustrate a key point made by Geoffrey Rose in his contributions to the theoretical basis of population health 148,149 —that the determinants of the incidence rate experienced by a population may explain little of the variation in risk between individuals within the population. Accounting for incidence differs from understanding particular incidents. Consider obesity in this regard; 150 its prevalence has increased dramatically over the past few decades, yet estimates of the shared environmental contributions to obesity are small. Clearly, germline genetic variation in the population has not changed dramatically enough to produce this increase in obesity. However, as Table 3 demonstrates, the prevalence of obesity has increased in both genders, all ages, all ethnic and socio-economic groups, and in both smokers and non-smokers. 151 The most likely reason for this is that there has been an across-the-board shift in the ratio of energy intake to energy expenditure. Study designs utilized to estimate heritability cannot pick this up—twins, for example, are perfectly matched by birth cohort. 150 Thus, although energy balance may underlie the burden of obesity in a population—and behind this, the social organization of food production, distribution and promotion, together with policies influencing transportation, urban planning and leisure opportunities—the determinants of who, against this background, is obese within a population could be largely dependent on a combination of genetic factors and chance... Rose illustrated this point with the thought experiment of a population in which all the individuals smoke 20 cigarettes a day, in which ‘clinical, case–control and cohort studies alike would lead us to conclude that lung cancer was a genetic disease; and in one sense that would be true, since if everyone is exposed to the necessary agent, then the distribution of cases is wholly determined by individual susceptibility’. 134
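Rose's distinction between explaining individual cases and explaining population rates can be reproduced with a simple simulation. The assumptions are all hypothetical: six areas with smoking prevalence from 10% to 60%, and an invented risk of death over follow-up of 10% for smokers and 0.5% for non-smokers, with risk depending on nothing else. Under them, smoking explains only a small share of individual-level variation in death but almost all of the variation in death rates between areas. For simplicity the sketch uses the squared correlation (ordinary R²) rather than the pseudo-R² reported for the Whitehall Study.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def simulate_areas(prevalences, n_per_area=20000, risk_smoker=0.10,
                   risk_non_smoker=0.005, seed=7):
    """Return (individual-level R^2, area-level R^2) for smoking and death."""
    rng = random.Random(seed)
    smokers, deaths, area_rates = [], [], []
    for prevalence in prevalences:
        area_deaths = 0
        for _ in range(n_per_area):
            smoker = 1 if rng.random() < prevalence else 0
            risk = risk_smoker if smoker else risk_non_smoker
            died = 1 if rng.random() < risk else 0
            smokers.append(smoker)
            deaths.append(died)
            area_deaths += died
        area_rates.append(area_deaths / n_per_area)
    individual = r_squared(smokers, deaths)
    between_areas = r_squared(list(prevalences), area_rates)
    return individual, between_areas


individual, between_areas = simulate_areas([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
print(f"individual-level R^2 {individual:.3f}; area-level R^2 {between_areas:.3f}")
```

With these inputs the individual-level figure should be around 0.06, while the area-level figure is close to 1: the same causal factor looks weak for predicting who dies and overwhelming for explaining how many die.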
Even Francis Galton—the sometime bogeyman of the eugenics movement—wrote ‘Nature prevails enormously over nurture when the differences of nature do not exceed what is commonly to be found among persons of the same rank of society and in the same country’. 189 In other words, the contribution of genetic inheritance to differences within a population is large when there is limited environmental variation between people within a particular context. If the context were broadened, the contribution of such environmental factors would be greater. Heritability is not a fixed characteristic, nor does high heritability within a particular situation indicate that environmental change cannot lead to dramatic modification of outcomes. Height—the topic of much of Galton’s own work—is both highly heritable and highly malleable, as changes over time in height make clear. 190 Wilhelm Johannsen, the coiner of the term ‘gene’, recognized that in a genetically highly homogeneous group ‘hereditary may be vanishingly small within the pure line’, 191 and that in this situation ‘all the variations are consequently purely somatic and therefore non-heritable’. 191 Conversely, in a highly standardized environment, the contribution of genetic factors will be increased. It is traditional in epidemiological and related fields to hark back to such trusted thought experiments as how phenylketonuria (PKU) would be expressed against the background of different levels of phenylalanine intake within populations, to demonstrate that the same outcome can be 100% heritable and 100% environmental in different contexts. 5,192–197 The point is well made that the presence of a clear genetic predisposition does not mean that environmental change cannot have major effects on disease risk. Perhaps reflecting the contested nature of this area, however, public health academics are sometimes asymmetrical in their reasoning, and after having presented the clear example of PKU they then claim that secular trends and migrant studies—with their unambiguous demonstrations of environmental influences on disease—provide arguments against strong genetic predisposition to common disease. 5 This is equivalent to saying that the clear demonstration that genetic lesions underlie PKU in permissive environments argues against any major environmental contribution to PKU.
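The PKU thought experiment can be made explicit by computing, in two hypothetical populations, how much of the variance in outcome is accounted for by genotype and how much by diet. The model is a deliberate caricature (disease occurs only when the genetic lesion meets a high-phenylalanine diet), the population sizes are arbitrary, and the between-group share of variance (eta squared) is used here as a simple stand-in for the "heritable" and "environmental" shares.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def variance_share(factor, outcome):
    """Share of outcome variance accounted for by a grouping factor (eta squared).

    Returns 0.0 when the outcome does not vary at all.
    """
    total = pvariance(outcome)
    if total == 0:
        return 0.0
    groups = defaultdict(list)
    for level, value in zip(factor, outcome):
        groups[level].append(value)
    overall = fmean(outcome)
    between = sum(len(v) * (fmean(v) - overall) ** 2 for v in groups.values()) / len(outcome)
    return between / total


def disease(has_lesion, high_intake):
    # Caricature: disease only when the genetic lesion meets a permissive diet.
    return 1 if has_lesion and high_intake else 0


# Population 1: everyone eats a high-phenylalanine diet; 1% carry the lesion.
lesion_1 = [True] * 10 + [False] * 990
intake_1 = [True] * 1000
# Population 2: everyone carries the lesion; half follow a low-phenylalanine diet.
lesion_2 = [True] * 1000
intake_2 = [True] * 500 + [False] * 500

for name, lesion, intake in (("uniform diet", lesion_1, intake_1),
                             ("uniform genotype", lesion_2, intake_2)):
    outcome = [disease(g, d) for g, d in zip(lesion, intake)]
    print(f"{name}: genotype explains {variance_share(lesion, outcome):.0%}, "
          f"diet explains {variance_share(intake, outcome):.0%}")
```

The same causal structure yields an outcome that is entirely "genetic" in the first population (genotype 100%, diet 0%) and entirely "environmental" in the second (genotype 0%, diet 100%), because each share depends on what actually varies in that population.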
A second popular thought experiment relates to the possession of two eyes or two legs. That humans are almost always born with two of each is genetically determined. However, within a population the trait would not be highly heritable—and certainly not 100% heritable—with loss of a leg or eye generally reflecting accidental events. The distinction between explaining individual trajectories (genes are responsible for the development of two eyes and two legs) and variation in a population is clear, and reflects the distinction between ‘who?’ (why does one person have a disorder or problem rather than another?) and ‘how many?’ (what proportion of the population are affected?) questions. 198
Within sociology, for example,
the perhaps under-appreciated role of chance has been emphasised, 206 illustrated with entertaining examples
from the sporting world. A striking example of what is known as Stein’s paradox in statistics is that within-
season prediction of the end of season batting averages for particular baseball players is generally better if
strongly weighted towards the average of all players at that stage in the season. 207 The best guess at what
will happen to an individual can often be made by largely discounting individual characteristics. The popular
recognition of the importance of chance in people’s lives 164 can also influence response to cultural artefacts.
Thus in films, novels or plays explanation of events is often near-deterministic, which in certain circum-
stances appears satisfying. Consider Alfred Hitchcock’s film Marnie. The behaviour of the eponymous
character—fear of thunderstorms, the colour red and men, together with her thieving and frigidity—is all
explained at the end of the film by a particular event occurring when Marnie was six. She discovered her
prostitute mother with a client during a thunderstorm and ended up killing him (in a cinematic shock of
bright red blood) with a poker. Everything seamlessly rolled on from this event. In crime stories this is often
what the reader wants. As Stephen Kern entertainingly demonstrates, 208 the causal models in such
narratives span a range similar to that in epidemiology, from the long arm of early-life (or prenatal) events
through to primarily psychological and social causation. Outside of murder novels, however, the factitious
nature of such explanations can be entirely unsatisfactory. The apparent reality of the well-told narrative
appears unreal precisely because everything is tied up and explained, a notion that resonates with David
Shields’s literary manifesto Reality Hunger. 209 To take one example, the clunking plots of the novels of Ian
McEwan (Saturday, for example) revolve around such faux ‘explanations’. The work of McEwan, and of similar
purveyors of book club fare such as Jonathan Franzen, appears, paradoxically, much less true than such
novels as Laurence Sterne’s Tristram Shandy, Machado de Assis’ Epitaph of a Small Winner, Blaise Cendrars’
Moravagine or Alasdair Gray’s Lanark, which are apparently not seeking such realism. In these works,
explanations, when offered, become things to be explained, and the often random nature of the world as
codified in people’s experience is respected.
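The batting-average illustration of Stein’s paradox can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not the analysis in the cited source: the player abilities, the number of at-bats and the helper names (`james_stein`, `simulate`) are hypothetical, and it uses a simplified positive-part James-Stein estimator that pulls each early-season average towards the grand mean, treating the binomial sampling variance as known and equal for every player.

```python
import random
import statistics


def james_stein(estimates, sampling_var):
    """Positive-part James-Stein shrinkage towards the grand mean.

    Assumes every estimate has the same, known sampling variance.
    Needs at least four estimates for the (k - 3) factor to be positive.
    """
    k = len(estimates)
    grand_mean = statistics.fmean(estimates)
    spread = sum((y - grand_mean) ** 2 for y in estimates)
    # Shrinkage weight: the larger the sampling noise relative to the observed
    # spread, the more each estimate is pulled towards the grand mean (capped at 1).
    b = min(1.0, (k - 3) * sampling_var / spread)
    return [grand_mean + (1 - b) * (y - grand_mean) for y in estimates]


def simulate(n_players=200, at_bats=45, seed=1):
    """Return (MSE of raw averages, MSE of shrunk averages) against true ability."""
    rng = random.Random(seed)
    # Hypothetical 'true' abilities: a narrow spread of hit probabilities around .265
    truth = [rng.gauss(0.265, 0.015) for _ in range(n_players)]
    # Early-season averages: hits from a small number of at-bats (binomial noise)
    early = [sum(rng.random() < p for _ in range(at_bats)) / at_bats for p in truth]
    # Plug-in binomial sampling variance, treated as the same for every player
    pooled = statistics.fmean(early)
    sampling_var = pooled * (1 - pooled) / at_bats
    shrunk = james_stein(early, sampling_var)
    mse_raw = statistics.fmean((e - t) ** 2 for e, t in zip(early, truth))
    mse_js = statistics.fmean((s - t) ** 2 for s, t in zip(shrunk, truth))
    return mse_raw, mse_js


mse_raw, mse_js = simulate()
print(f"MSE of raw early-season averages: {mse_raw:.5f}")
print(f"MSE after shrinkage:              {mse_js:.5f}")
```

With only a few dozen at-bats, most of the spread in early-season averages is noise rather than differences in ability, so the shrunk estimates should track each player's underlying ability much more closely than the raw averages: the individual record is largely discounted in favour of the group average.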
Rowe and Plomin
noted that after the birth of a second child parents are
often struck by how different their two children are,
despite upbringing being in common. In relation to
health, non-professional understandings of the causes of
disease regularly identify the role of chance (or
fate) 164 and heritable factors 165 as being of
considerable importance. Indeed, I have to confess that when I
was involved in a cross-disciplinary project exploring
the construction of models of disease causation held
by the general public—which we referred to as ‘lay
epidemiology’ 2 —I was disappointed that, for the
public at large, there appeared to be a concentration
on such apparently individual factors as inheritance
and fate, rather than my preferred model of the
socio-political determinants of health. 166
One perhaps counter-intuitive
way is to embrace the findings of quantitative
genetics and realize they actually enhance the importance
of the insights that epidemiology brings. First, most
traits have a non-trivial genetic component. This is
good news: it means that genetic variants can be
utilized as instrumental variables for the near-alchemic
act of turning observational into experimental data,
and allow the strengthening of causal inference with
respect to environmentally modifiable exposures, in
the absence of randomized trials. 162,167 Indeed, we
might even enter the age of hypothesis-free
causality. 163
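The use of genetic variants as instrumental variables can be sketched with simulated data. Everything here is an assumption for illustration: the allele frequency, the effect sizes, the linear model and the helper names (`cov`, `simulate_mr`) are hypothetical, and the genotype is built to satisfy the core instrument assumptions by construction (it influences the exposure, is independent of the unmeasured confounder, and affects the outcome only through the exposure). Under those assumptions the ratio of the genotype-outcome covariance to the genotype-exposure covariance (the Wald ratio) recovers the causal effect, whereas the naive observational slope is distorted by confounding.

```python
import random
import statistics


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate_mr(n=50_000, true_effect=0.3, allele_freq=0.3, seed=2):
    """Return (naive OLS slope, Wald-ratio IV estimate) on simulated data."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        # Hypothetical genotype: count of effect alleles (0, 1 or 2)
        gi = sum(rng.random() < allele_freq for _ in range(2))
        # Unmeasured confounder affecting both exposure and outcome
        u = rng.gauss(0, 1)
        # Exposure depends on genotype and on the confounder
        xi = 0.5 * gi + u + rng.gauss(0, 1)
        # Outcome depends on the exposure (the causal effect) and the confounder,
        # but not directly on genotype (exclusion restriction holds by construction)
        yi = true_effect * xi + u + rng.gauss(0, 1)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    naive = cov(x, y) / cov(x, x)  # observational slope, confounded by u
    wald = cov(g, y) / cov(g, x)   # instrumental-variable (Wald) ratio
    return naive, wald


naive, wald = simulate_mr()
print(f"naive slope {naive:.2f}, IV estimate {wald:.2f} (true effect 0.3)")
```

For this model the naive slope converges to roughly 0.78 rather than the true 0.3, while the Wald ratio should land close to 0.3; its sampling error shrinks as `n` grows or as the genotype explains more of the variation in the exposure.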
- 163: Davey Smith G. "Random allocation in observational data: how small but robust effects could facilitate hypothesis-free causal inference". Epidemiology 2011;22: 460–63."
"Epidemiologists aim to identify modifiable causes of disease, this often being a
prerequisite for the application of epidemiological findings in public health pro-
grammes, health service planning and clinical medicine. Despite successes in
identifying causes, it is often claimed that there are missing additional causes
for even reasonably well-understood conditions such as lung cancer and coronary
heart disease. Several lines of evidence suggest that largely chance events, from
the biographical down to the sub-cellular, contribute an important stochastic
element to disease risk that is not epidemiologically tractable at the individual
level. Epigenetic influences provide a fashionable contemporary explanation for
such seemingly random processes. Chance events—such as a particular lifelong
smoker living unharmed to 100 years—are averaged out at the group level. As a
consequence population-level differences (for example, secular trends or differ-
ences between administrative areas) can be entirely explicable by causal factors
that appear to account for only a small proportion of individual-level risk. In
public health terms, a modifiable cause of the large majority of cases of a disease
may have been identified, with a wild goose chase continuing in an attempt to
discipline the random nature of the world with respect to which particular indi-
viduals will succumb. The quest for personalized medicine is a contemporary
manifestation of this dream. An evolutionary explanation of why randomness
exists in the development of organisms has long been articulated, in terms of
offering a survival advantage in changing environments. Further, the basic notion
that what is near-random at one level may be almost entirely predictable at a
higher level is an emergent property of many systems, from particle physics to the
social sciences. These considerations suggest that epidemiological approaches will
remain fruitful as we enter the decade of the epigenome.
"We cannot imagine these diseases, they are called idiopathic, spontaneous in origin, but we know instinctively there must be something more, some invisible weakness they are exploiting. It is impossible to think they fall at random, it is unbearable to think it."
--James Salter, Light Years, 1975
Despite many suc-
cesses, even with respect to the most celebrated—
such as the identification of cigarette smoking as a
major cause of lung cancer and other chronic dis-
eases—it can appear that much remains to be done.
Consider Winnie, lighting a cigarette from the candles
on her centenary birthday cake, who, after 93 years
of smoking, is not envisaging giving up the habit
(Figure 1).
...loom large in the popular imagination 2 and are
reflected in the low positive predictive values and C
statistics in many formal epidemiological prediction
models. In general, epidemiologists do a rather poor
job of predicting who is and who is not going to de-
velop disease.
This apparent failing of epidemiology has long been
recognized. Writing about ischaemic heart disease
(IHD) 40 years ago, Tom Meade and Ranjan
Chakrabarti reported that ‘within any risk group, pre-
diction is poor; it is not at present possible to express
individual risk more precisely than as about a 1 in 6
chance of a hitherto healthy man developing clinical
IHD in the next 5 years if he is at high risk’. 3
I have certainly promulgated such views in the
(usually unsuccessful) pursuit of pounds or dollars,
although the exact percentage of ‘explanation’ by es-
tablished causes would fall and rise in relation to
degree of desperation. The most feted contemporary
candidate for better prediction is probably genetics.
With the perception (in my view exaggerated) that
genome-wide association studies (GWASs) have
failed to deliver on initial expectations, 5 the next
phase of enhanced risk prediction will certainly shift
to ‘epigenetics’ 6,7 —the currently fashionable response
to any question to which you do not know the
answer.
Exposures
of this kind are, in the terminology popularized
within behavioural genetics, shared (or common) en-
vironmental factors. It is therefore perhaps surprising
that the groundbreaking 1987 paper by Robert Plomin
and Denise Daniels, 16 ‘Why are children in the same
family so different from one another?’, recently re-
printed with commentaries in the IJE, 17–21 has appar-
ently had little influence within epidemiology. The
implication of the paper—which expanded upon an
earlier analysis 22 —was that, genetics aside, siblings
are little more similar than two randomly selected
individuals of roughly the same age selected from
the source population that the siblings originate
from. This may be an intuitive observation for many
people who have siblings themselves or have more
than one child. Arising from the field of behavioural
genetics, the paper focused on measures of child be-
haviour, personality, cognitive function and psycho-
pathology, but, as Plomin points out, the same basic
finding is observed for many physical health out-
comes: obesity, cardiovascular disease, diabetes,
peptic ulcers, many cancers, asthma, longevity and
various biomarkers assayed in epidemiological stu-
dies. 18 These findings come from studies of twins,
adoptees and extended pedigrees, in which the vari-
ance in an outcome is partitioned into a genetic com-
ponent, the contribution of common environment (i.e.
that shared between people brought up in the same
home environment) and the non-shared environment
(i.e. exposures that are not correlated between people
brought up in the same family). The shared environ-
ment—which is the domain of many of the exposures
of interest to lifecourse epidemiologists—is reported
to make at best small contributions to the variance
of most outcomes. The non-shared environment—ex-
posures which (genetic influences apart) show no
greater concordance between siblings than between
non-related individuals of a similar age from the
same population—constitute by far the dominant
class of non-genetic influences on most health and
health-related outcomes (Box 1). Table 1 presents
data from a large collaborative twin study of 11
cancer sites, with universally large non-shared envir-
onmental influences (58–82%), heritabilities in the
range 21–42% (excluding uterine cancer, for which a
value of 0% is reported) and smaller shared environ-
mental effects, zero for four sites and ranging from
5% to 20% for the remainder. 23 Many other diseases
show a similar dominance of non-shared over shared
environmental influences. 18 Indeed, a greater non-
shared than shared environmental component ap-
pears to apply to some, 24–28 although not all, 29
childhood-acquired infections and the diseases they
cause. This is such a counter-intuitive observation
that one commentator on an earlier draft of this
paper used childhood infectious disease epidemiology
as an example of a situation in which the shared en-
vironment must be dominant.
...However, as Neven
Sesardic points out, even within behavioural genetics
the central, rather momentous, finding regarding the
apparently small or non-existent contribution of
family background to child outcomes went under-
appreciated; it was ‘an explosion without a bang’. 19
For epidemiologists, the fact that the generally small
shared environmental influences on many outcomes
appeared to get even smaller (or disappear com-
pletely) with age—as is seen, for example, with re-
spect to body mass index and obesity 31 —increases the
relevance of the message, since later life health out-
comes are often what we study. Yet, within epidemi-
ology, the impact of this work has been minimal; of
the 607 citations of the Plomin and Daniels paper on
ISI Web of Science (as of May 2011), only a handful
fall directly within the domain of epidemiology or
population health. In the recent book, Family Matters:
Designing, Analysing and Understanding Family-based
Studies in Lifecourse Epidemiology, 32 the issue is barely
touched upon; the balanced one page it receives near
the end of the 340-page book being perhaps too little,
too late. 33 Between-sibling studies as a way of con-
trolling for potential confounding have been widely
discussed within epidemiology, both in the book in
question 34 and elsewhere. 35,36 Certainly, this is a
useful method for taking into account shared aspects
of the childhood environment. But if shared environ-
ment has little impact on many outcomes then, on
the face of it, the approach might be missing the
issue of real concern—the more important non-
shared environmental factors. Despite this, the use
of sibling controls sometimes appears to uncover
substantial confounding. For example, maternal
smoking during pregnancy was found in a large
Swedish study to be associated with lower offspring
IQ, even after adjustment for many potential con-
founding factors. 37 In a between-sibs comparison,
however, there was no association of maternal smok-
ing with IQ of offspring, which the authors inter-
preted as indicating that the association seen for
unrelated individuals was due to residual confound-
ing. If shared environment is of such little import-
ance, how can it generate meaningful confounding
in epidemiological studies? We will return to this
issue later.
An extensive research programme in the behavioural
and social sciences consequent on the Plomin and
Daniels review focused on the direct assessment of
effects of the systematic aspect of the non-shared en-
vironment. Instruments were developed to collect de-
tailed data on sibling-specific parenting practice, sib–
sib interactions and the influence of schools and peer
groups, and studies including more than one child per
family were explicitly established to allow investiga-
tion of why siblings differ. However, a decade ago,
a meta-analytical overview of such studies concluded
that there was little direct evidence of important in-
fluences of specific non-shared environmental charac-
teristics on behavioural and social outcomes mainly
assessed during the first two decades of life. 38 At
best, only small proportions of the phenotypic vari-
ance attributed to the non-shared environment
related to directly measured influences. The effects
were rarely statistically robust and the median value
of the proportion of variation accounted for was 3%.
In the behavioural genetic studies, estimates of the
proportion of the overall phenotypic variance ac-
counted for by the non-shared environment are
almost always over 50%, and often substantially so;
similar findings apply to cancers (Table 1). There are
more optimistic assessments of the current status of
studies directly assessing the effects of non-shared
environment, 18,39 but in these the magnitude of the
effects appears small. In an example presented in
Plomin’s assessment of three decades of research on
this issue 18 non-shared aspects of maternal negativity
does have a statistically robust association with off-
spring depressive symptoms, but accounts for only
around 1% of the variance. 40
Systematic aspects of the non-shared
environments of adults that have large effects on dis-
ease outcomes may await identification. However, the
inability to identify such effects using intensive as-
sessments of exposure and outcomes in childhood is
sobering. Furthermore, in longitudinal twin studies,
in which twin pairs have repeat assessments, the gen-
eral finding is that the non-shared environmental
variance at one age overlaps little with that at a
later age—i.e. there appear to be unique and largely
uncorrelated factors acting at different ages. For ex-
ample, with respect to body mass index, the
non-shared environmental components at age 20,
48, 57 and 63 years are largely uncorrelated with
each other. 52 This suggests that exposures contribut-
ing to non-shared environmental influences are often
unsystematic and of a time- or context-dependent
nature. Similar findings have emerged from studies
of various other outcomes, with non-shared environ-
mental influences contributing little, if anything, to
tracking of phenotypes over time. 53 A distinction
can be drawn between the stable and unstable aspects
of the non-shared environment, with studies tending
to point to the latter as being of more statistical
importance in terms of explaining variance in the
distribution of disease risk. This is a crucial issue,
since some environmental exposures which are
partly non-shared in adulthood (such as cigarette
smoking and occupational exposures) tend to track
over time—and thus be stable components of the
non-shared environment.
Currently, there is largely an absence of evidence—
rather than evidence of absence—of directly assessed
systematic non-shared environmental influences on
health, and little active research in the biomedical
field. However, as the phenotypic decomposition of
variance shows similar patterns in the medical, be-
havioural and social domains, it seems prudent to
assume that similar causal structures exist, and
equivalent conclusions should be drawn: a large com-
ponent of variation in health-related traits cannot be
accounted for by measureable systematic aspects of
the non-shared environment.
Many features of twin study ana-
lysis can be problematic. For example, twin study
analysis often assumes that genetic contributions are
additive, and that genetic dominance (in the classic
Mendelian sense) or gene–gene interactions (epista-
sis) do not contribute to the genetic variance. Such
an assumption can lead to under-estimation of the
shared environmental component. 55–57 Conversely,
twin studies also assume no assortative mating (i.e.
parents are no more genetically similar than if ran-
domly sampled from the population) and no gene–
environment covariation, both of which can lead to
over-estimation of the shared environmental compo-
nent. 55 Different study designs for estimating compo-
nents of phenotypic variation make different
assumptions, however. Conventional twin studies,
studies of twins reared apart, extended twin-family
studies (in which other family members are
included), other extended pedigree studies and adop-
tion studies (including those in which there is quasi-
random assignment of particular adoptees) generally
come to the same basic conclusions about the relative
magnitude of these components. 58 All these designs
have been applied to the study of body mass index
and obesity, with the findings indicating roughly the
same magnitude of heritability. 55,59–64 This makes it
less likely that these are seriously biased, because dif-
ferent biases would all have to generate the same ef-
fects, which is not a plausible scenario.
With respect to the ‘missing heritability’, to take the
example of height—referred to by both Plomin 18 and
Turkheimer 25 —the estimate of the proportion of her-
itability explained by identified variants they give, of
<5%, has already increased to 410%, 65 and directly
estimated heritability (relating phenotypic similarity
to stochastic variation in the proportion of the
genome shared between siblings) indicates similar
heritabilities to those seen in twin studies. 66
Genome-wide prediction using common genetic varia-
tion across the genome also points to the effects of
measured genetic variation moving towards the
expectation from conventional heritability estimates. 67
Such data suggest there are large numbers of variants
as yet not robustly characterized that are contributing
to the heritability of height, with rare variants not
identifiable through GWAS probably accounting for
much of the remainder...In summary, it
seems improbable that heritability has been substan-
tially over-estimated at the expense of shared envir-
onment. The basic message that a larger non-shared
than shared environmental component to phenotypic
variance is the norm is unlikely to be overturned.
Shared environmental effects, although generally
small, are more substantial for some outcomes,
including musical ability 69 and criminality in adoles-
cents and young adults; 70 respiratory syncytial virus
infection, 29 anti-social behaviour, 53,71 mouth ulcers 72
and physical activity 73 in children and lung function
in adults. 74 Furthermore, findings with respect to
shared environmental contributions have face validity.
For example, in a twin study applying behavioural
genetic variance decomposition to behaviours, dis-
positions and experiences, shared environmental
effects were found for only 9 of the 33 factors inves-
tigated. 75 However, they were identified for those as-
pects of life that would appear to depend on shared
family characteristics, for example, for a child being
read to by a parent, but not for the child reading
books on their own. Similarly, the number of years
a child had music lessons had a substantial shared
environmental component, as might be expected as
this will initially depend on the parents organizing
such lessons. Continuing to play an instrument into
adulthood, however did not have an identified shared
environmental contribution.
Shared environ-
ment can be addressed through analysis of spousal
similarities in health outcomes, as environments are
shared to an extent by cohabiting couples, and these
also yield what on the face of it are rather small effect
estimates. For example, the cross-spousal correlation
for body mass index does not change from when cou-
ples initially come together (reflecting assortative
mating) over many years of them living together in
an at least partially shared environment. 61
Of most relevance to epidemiological approaches,
however, is that models generally fix the shared en-
vironmental component to zero if it is not ‘statistically
significantly’ different from zero. This is evident
in Table 1; with respect to pancreatic cancer, for ex-
ample, the shared environmental component is given
as 0, with a 95% confidence interval (CI) 0–0.35 (i.e.
the upper limit being 35% of phenotypic variance). In
many cases, it is simply stated that these studies find
no effect of shared environmental influences, even
though the findings are compatible with quite sub-
stantial contributions, but these cannot be reliably
estimated in the generally small samples available in
twin and adoption studies. Thus, a twin study of
aortic aneurysm reported that there was ‘no support
for a role of shared environmental influences’, 78 with
the 95% CI around the effect estimate being 0–27%. A
recent meta-analysis found that for various aspects of
child and adolescent psychopathology, shared envir-
onment makes a non-negligible contribution in ad-
equately powered analyses. 79 The claims of there
being ‘no shared environmental influence’, which
are often made (Box 2), might more realistically be
seen as an indication of inadequate sample size and
the fetishization of ‘statistical significance’. 80
The stochastic nature of phenotypic development is
something we should not be surprised to encounter
(Box 3). In his 1920 paper, ‘The relative importance of
heredity and environment in determining the piebald
pattern of guinea pigs’, Sewall Wright (Figure 2) pre-
sented a seminal path analysis (Figure 3), that has
frequently been cited as a source of this particular
statistical method. 87 Wright observed that ‘nearly all
tangible environmental conditions—feed, weather,
health of dam, etc., are identical for litter mates’; in
current terminology, they are part of the shared en-
vironment. Such factors were found to be of minor
importance; instead, most of the non-genetic variance
‘must be due to irregularities in development due to
the intangible sort of causes to which the word
chance is applied’. 87 Wright pointed out that meas-
urement error could not be separated from this intan-
gible variance, as is the case with non-shared
environment in current parlance. In a later paper, 88
Wright and his PhD student Herman Chase independ-
ently graded the guinea pig coat patterns, and demon-
strated that measurement error was only a minor
contributor (Figure 4). A summary table (Table 2)
included a shared environmental influence on litter-
mates—age of the mother—but the intangible vari-
ance dominated, with the estimate of the magnitude
of this being similar to estimates seen for the
contribution of the non-shared environment in rela-
tion to many human traits. 16 In humans, of course,
age of mother at conception could be a non-shared
environmental factor influencing differences between
siblings. In the inbred guinea pig strain, where gen-
etic differences were minor, heredity was not an issue,
and the intangible (‘non-shared environmental’) fac-
tors were even more dominant.
In genetically identical Caenorhabditis elegans reared in the same environments there are large differences in age-related functional declines, attributable to purely stochastic events. 89 In the case of genetically similar inbred laboratory rats, Klaus Gartner noted the failure to materially reduce variance for a wide variety of phenotypes, despite several decades of standardizing the environment. 90,91 Indeed, there was hardly any reduction in variance compared with that seen in wild living rats experiencing considerably more variable environments...Embryo splitting and transfer experiments in rodents and cattle demonstrated that the prenatal environment was also not a major source of phenotypic variation. 90,91 In genetically identical marbled crayfish raised in highly controlled environments considerable phenotypic differences emerge. 94 These and numerous other examples from over nearly a century 87,93–98 demonstrate the substantial contribution of what appear to be chance or stochastic events—which in the behavioural genetics field would fall into the category of non-shared environmental influences—on a wide range of outcomes.
If such a substantial role for chance exists in the
emergence of phenotypic (including pathological) pro-
files, why is this? One possible answer, with a long
pedigree, 121–123 is that it provides for evolutionary
bet-hedging. 124 Fixed phenotypes may be tuned to a
given environment, but in changing conditions a
phenotype optimized for propagation in one situation
may rapidly become suboptimal, 125 a proposition sup-
ported by experimental evidence. 126,12
Reflecting on their
demonstration of considerable phenotypic—including
epigenetic—differences between genetically identical
crayfish, they conclude that such variation may ‘act
as a general evolution factor by contributing to the
production of a broader range of phenotypes that
may occupy different micro-niches’. 94 The substantial
non-shared environmental contribution to many out-
comes could, therefore, include an element—perhaps
substantial—of random phenotypic noise, consequent
on stochastic epigenetic processes. At the molecular
level, the potential existence of such processes has
been observed within twin studies, with the formal
demonstration of non-shared environmental contribu-
tions to epigenetic profiles 130 and of substantial dif-
ferences in epigenetic markers between monozygotic
twins. 119
Other mechanisms can also contribute to pheno-
typic diversity, including meiotic recombination and
Mendelian assortment of genetic variants acting on
highly polygenic traits, with such genetic variants
having small individual effects. Mutation will also
increase phenotypic variation. Sibling contrast ef-
fects—siblings becoming less similar than their gen-
etic and shared environmental commonalities would
suppose—could also provide for such evolutionary
bet-hedging. 129 Although evidence supporting such a
process is sparse, it could lead to inflation of
non-shared environmental influences and deflation
of shared environment estimates from twin studies.
Most cases of lung cancer
are attributable to smoking, but many smokers do not
develop lung cancer. Thus, in the Whitehall Study of
male civil servants in London cigarette smoking ac-
counts for <10% of the variance (estimated as the
pseudo-R 2 ) 141 in lung cancer mortality. 102 At the
population level, however, smoking accounts for vir-
tually all of the variance—over 90% with respect to
lung cancer mortality over time in the USA, 142 and
virtually all of the differences in rates between areas
in Pennsylvania. 143 It is in relation to this large
contribution of smoking to the population burden of
lung cancer that <10% of variance accounted for by
cigarette smoking among individuals observed in pro-
spective epidemiological studies, and the 12% shared
environmental variance reported in Table 1, should be
considered. The shared environmental component will
in part reflect shared environmental differences in
cigarette smoking initiation. 144 The non-shared
environmental component (62% of the variance in
Table 1) will include the non-shared environmental
influence on initiation, amount and persistence of
smoking. 144 However, as discussed earlier, stable
aspects of the non-shared environment—which smok-
ing would tend to be—are generally small contribu-
tors to the total non-shared environmental effect,
and thus much of this will also reflect the substantial
contribution of the kinds of chance events—
These reflections will be unexceptional to epidemi-
ologists, as they merely illustrate a key point made by
Geoffrey Rose in his contributions to the theoretical
basis of population health 148,149 —that the determin-
ants of the incidence rate experienced by a population
may explain little of the variation in risk between
individuals within the population. Accounting for
incidence differs from understanding particular inci-
dents. Consider obesity in this regard; 150 its preva-
lence has increased dramatically over the past few
decades, yet estimates of the shared environmental
contributions to obesity are small. Clearly germline
genetic variation in the population has not changed
dramatically to produce this increase in obesity.
However, as Table 3 demonstrates, the prevalence of
obesity has increased in both genders, all ages, all
ethnic and socio-economic groups, and in both smo-
kers and non-smokers. 151 The most likely reason for
this is that there has been an across the board shift
in the ratio of energy intake to energy expenditure.
Study designs utilized to estimate heritability cannot
pick this up—twins, for example, are perfectly
matched by birth cohort. 150 Thus, although energy
balance may underlie the burden of obesity in a popu-
lation—and behind this, the social organization of
food production, distribution and promotion, together
with policies influencing transportation, urban plan-
ning and leisure opportunities—the determinants of
who, against this background, is obese within a popu-
lation could be largely dependent on a combination of
genetic factors and chance...Rose illustrated this point with the thought experi-
ment of a population in which all the individuals
smoke 20 cigarettes a day, in which ‘clinical, case–
control and cohort studies alike would lead us to
conclude that lung cancer was a genetic disease;
and in one sense that would be true, since if everyone
is exposed to the necessary agent, then the distribu-
tion of cases is wholly determined by individual sus-
ceptibility’. 134
Even Francis
Galton—the sometime bogeyman of the eugenics movement—wrote ‘Nature prevails enormously over nur-
ture when the differences of nature do not exceed what is commonly to be found among persons of the
same rank of society and in the same country’. 189 In other words, the contribution of genetic inheritance to
differences within a population is large when there is limited environmental variation between people within
a particular context. If the context were broadened, the contribution of such environmental factors would be
greater. Heritability is not a fixed characteristic, nor does high heritability within a particular situation
indicate that environmental change cannot lead to dramatic modification of outcomes. Height—the topic
of much of Galton’s own work—is both highly heritable and highly malleable, as changes over time in
height make clear. 190 Wilhelm Johannsen, the coiner of the term ‘gene’ recognized that in a genetically
highly homogeneous group ‘hereditary may be vanishingly small within the pure line’, 191 and that in this
situation ‘all the variations are consequently purely somatic and therefore non-heritable’. 191 Conversely, in a
highly standardized environment, the contribution of genetic factors will be increased. It is traditional in
epidemiological and related fields to hark back to such trusted thought experiments as how phenylketonuria
(PKU) would be expressed against the background of different levels of phenylalanine intake within popu-
lations, to demonstrate that the same outcome can be 100% heritable and 100% environmental in different
contexts. 5,192–197 The point is well made that the presence of a clear genetic predisposition does not mean
that environmental change cannot have major effects on disease risk. Perhaps reflecting the contested nature
of this area, however, public health academics are sometimes asymmetrical in their reasoning, and after
having presented the clear example of PKU they then claim that secular trends and migrant studies—with
their unambiguous demonstrations of environmental influences on disease—provide arguments against
strong genetic predisposition to common disease. 5 This is equivalent to saying that the clear demonstration
that genetic lesions underlie PKU in permissive environments argues against any major environmental
contribution to PKU.
A second popular thought experiment relates to the possession of two eyes or two legs. That humans are
almost always born with two of each is genetically determined. However, within a population
the trait would not be highly heritable—and certainly not 100% heritable—with loss of a leg or eye generally
reflecting accidental events. The distinction between explaining individual trajectories (genes are responsible
for the development of two eyes and two legs) and variation in a population is clear, and reflects the
distinction between ‘who?’ (why does one person have a disorder or problem rather than another?) and
‘how many?’ (what proportion of the population are affected?) questions. 198
Within sociology, for example, the perhaps under-appreciated role of chance has been emphasised 206 and
illustrated with entertaining examples from the sporting world. A striking example of what is known as
Stein’s paradox in statistics is that within-season prediction of the end-of-season batting averages of
individual baseball players is generally more accurate if each player’s current average is strongly weighted
towards the average of all players at that stage in the season. 207 The best guess at what will happen to an
individual can often be made by largely discounting individual characteristics. The popular recognition of
the importance of chance in people’s lives 164 can also influence responses to cultural artefacts.
Thus in films, novels or plays the explanation of events is often near-deterministic, which in certain
circumstances appears satisfying. Consider Alfred Hitchcock’s film Marnie. The behaviour of the eponymous
character (her fear of thunderstorms, of the colour red and of men, together with her thieving and frigidity)
is all explained at the end of the film by a particular event that occurred when Marnie was six. She discovered
her prostitute mother with a client during a thunderstorm and ended up killing him (in a cinematic shock of
bright red blood) with a poker. Everything seamlessly rolled on from this event. In crime stories this is often
what the reader wants. As Stephen Kern entertainingly demonstrates, 208 the causal models in such narratives
span a range similar to that of epidemiology, from the long arm of early-life (or prenatal) events through to
primarily psychological and social causation. Outside of murder novels, however, the factitious nature of
such explanations can be entirely unsatisfactory. The apparent reality of the well-told narrative appears
unreal precisely because everything is tied up and explained, a notion that resonates with David Shields’s
literary manifesto Reality Hunger. 209 To take one example, the clunking plots of Ian McEwan’s novels,
Saturday among them, revolve around such faux ‘explanations’. The work of McEwan, and of similar
purveyors of book club fare such as Jonathan Franzen, appears, paradoxically, much less true than such
novels as Laurence Sterne’s Tristram Shandy, Machado de Assis’ Epitaph of a Small Winner, Blaise Cendrars’
Moravagine or Alasdair Gray’s Lanark, which are apparently not seeking such realism. In these works
explanations, when offered, become things to be explained, and the often random nature of the world as
codified in people’s experience is respected.
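Stein’s paradox, invoked above in connection with baseball batting averages, can be illustrated with the James–Stein estimator, which shrinks each observed value towards the grand mean. The following is a minimal sketch under stated assumptions: ‘true’ abilities are simulated rather than taken from real data, sampling error uses a normal approximation to the binomial with a known standard deviation, and the number of players, league mean, spread of ability, number of at-bats and the function names `james_stein` and `compare` are all hypothetical choices for illustration.

```python
import random
import statistics


def james_stein(observed, sigma):
    """Positive-part James-Stein estimate, shrinking each value towards the grand mean.

    Assumes independent normal errors with a known standard deviation sigma.
    """
    k = len(observed)
    grand_mean = statistics.fmean(observed)
    spread = sum((y - grand_mean) ** 2 for y in observed)
    # (k - 3) rather than (k - 2) because the shrinkage target is itself estimated.
    factor = max(0.0, 1.0 - (k - 3) * sigma ** 2 / spread)
    return [grand_mean + factor * (y - grand_mean) for y in observed]


def compare(k=18, true_mean=0.265, true_sd=0.015, at_bats=45, reps=2000, seed=1):
    """Return mean total squared error of raw versus shrunken early-season averages.

    All parameter values are hypothetical; abilities are simulated, not real data.
    """
    rng = random.Random(seed)
    # Binomial sampling SD of an average over at_bats, via a normal approximation.
    sigma = (true_mean * (1 - true_mean) / at_bats) ** 0.5
    raw_err = js_err = 0.0
    for _ in range(reps):
        truth = [rng.gauss(true_mean, true_sd) for _ in range(k)]
        observed = [t + rng.gauss(0.0, sigma) for t in truth]
        shrunk = james_stein(observed, sigma)
        raw_err += sum((o - t) ** 2 for o, t in zip(observed, truth))
        js_err += sum((s - t) ** 2 for s, t in zip(shrunk, truth))
    return raw_err / reps, js_err / reps


if __name__ == "__main__":
    raw, js = compare()
    print(f"raw averages: {raw:.4f}, James-Stein: {js:.4f}")
```

In repeated simulations under these assumptions the shrunken estimates have a lower total squared error than each player’s own early-season average, because much of the between-player spread in early averages is sampling noise rather than a stable individual characteristic.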
Rowe and Plomin noted that after the birth of a second child parents are often struck by how different their
two children are, despite their shared upbringing. In relation to health, non-professional understandings of
the causes of disease regularly identify the role of chance (or fate) 164 and heritable factors 165 as being of
considerable importance. Indeed I have to confess that when I was involved in a cross-disciplinary project
exploring the construction of models of disease causation held by the general public (which we referred to
as ‘lay epidemiology’ 2 ), I was disappointed that, for the public at large, there appeared to be a
concentration on such apparently individual factors as inheritance and fate, rather than my preferred model
of the socio-political determinants of health. 166
One perhaps counter-intuitive way is to embrace the findings of quantitative genetics and realize that they
actually enhance the importance of the insights that epidemiology brings. First, most traits have a
non-trivial genetic component. This is good news: it means that genetic variants can be utilized as
instrumental variables for the near-alchemic act of turning observational into experimental data, thereby
strengthening causal inference with respect to environmentally modifiable exposures in the absence of
randomized trials. 162,167 Indeed, we might even enter the age of hypothesis-free causality. 163
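The use of genetic variants as instrumental variables, usually called Mendelian randomization, can be sketched with simulated data. In the hypothetical example below an unmeasured confounder `u` raises both an exposure `x` and an outcome `y`, so the ordinary regression slope of `y` on `x` is biased. A genotype `g` that affects `y` only through `x` recovers the causal effect through the Wald ratio cov(g, y)/cov(g, x). That exclusion assumption is built into the simulation here and must be argued for in any real application. All effect sizes, the allele frequency and the function names are illustrative assumptions, not values from the cited studies.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n=50_000, beta=0.3, allele_freq=0.3, seed=2):
    """Hypothetical data: genotype g -> exposure x -> outcome y, with a confounder u."""
    rng = random.Random(seed)
    # Allele count (0, 1 or 2), assigned at random to mimic allocation at conception.
    g = [(rng.random() < allele_freq) + (rng.random() < allele_freq) for _ in range(n)]
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unmeasured confounder
    x = [0.5 * gi + ui + rng.gauss(0.0, 1.0) for gi, ui in zip(g, u)]
    # g affects y only through x: the key instrumental-variable assumption.
    y = [beta * xi + 0.8 * ui + rng.gauss(0.0, 1.0) for xi, ui in zip(x, u)]
    return g, x, y


def naive_slope(x, y):
    """Ordinary least-squares slope of y on x (biased by the confounder)."""
    return cov(x, y) / cov(x, x)


def wald_ratio(g, x, y):
    """Instrumental-variable (Wald ratio) estimate of the causal effect of x on y."""
    return cov(g, y) / cov(g, x)


if __name__ == "__main__":
    g, x, y = simulate()
    print(f"true effect 0.3; naive {naive_slope(x, y):.2f}; Wald ratio {wald_ratio(g, x, y):.2f}")
```

Under these assumptions the naive slope is inflated by the confounder, while the Wald ratio lands close to the simulated causal effect of 0.3.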
- 163: Davey Smith G. "Random allocation in observational data: how small but robust effects could facilitate hypothesis-free causal inference". Epidemiology 2011;22: 460–63."