"Genetic link between family socioeconomic status and children's educational achievement estimated from genome-wide SNPs", Krapohl & Plomin 2016:
"One of the best predictors of children’s educational achievement is their family’s socioeconomic status (SES), but the degree to which this association is genetically mediated remains unclear. For 3000 UK-representative unrelated children we found that genome-wide single-nucleotide polymorphisms could explain a third of the variance of scores on an age-16 UK national examination of educational achievement and half of the correlation between their scores and family SES. Moreover, genome-wide polygenic scores based on a previously published genome-wide association meta-analysis of total number of years in education accounted for ~3.0% of the variance in educational achievement and ~2.5% of the variance in family SES. This study provides the first molecular evidence for substantial genetic influence on differences in children’s educational achievement and its association with family SES.
Here we report the first investigation of genetic influence on the variance of children’s educational achievement using DNA alone. The same DNA-based methods can also be used to estimate genetic influence on the covariance between traits.17 This enabled us to investigate possible genetic mediation of the best predictor of children’s educational achievement, their family’s SES.18, 19 This correlation is often interpreted causally as family SES causing differences in children’s educational achievement.20 However, it remains unclear whether and to what extent the association between family SES and children’s educational achievement is genetically mediated, because twin and family research is limited to studying phenotypes that can vary within a family. Key aspects of children’s environment such as poverty, parental education and neighborhood cannot be investigated using the twin method because it is methodologically impossible to decompose variance in phenotypes shared within twin pairs.
GWA attempts aimed at identifying individually significant SNPs have generally captured only extremely small fractions of the genetic variance of complex traits, the so-called missing heritability problem.22 However, evidence has been accumulating that significant portions of phenotypic variation can be explained by the ensemble of markers that do not reach genome-wide significance.23 Markers identified by a GWA in an initial discovery sample are used to construct a genome-wide polygenic score (GPS) in an independent replication sample, by calculating the effect-size-weighted sum of trait-associated alleles for each individual. The aggregate GPS can then be used to assess genetic influence on trait variation.
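As an illustration that is not part of the paper, the effect-size-weighted sum can be sketched in a few lines of Python. The SNP identifiers, effect sizes and the `polygenic_score` helper are hypothetical; the study itself computed scores with PLINK over many thousands of SNPs.

```python
from __future__ import annotations


def polygenic_score(dosages: dict[str, int], betas: dict[str, float]) -> float:
    """Effect-size-weighted sum of trait-associated alleles for one individual.

    dosages: SNP id -> number of effect (trait-associated) alleles carried: 0, 1 or 2.
    betas:   SNP id -> effect size (beta) for that allele from a discovery GWA.
    SNPs absent from `betas` are skipped; handling of missing genotypes
    (e.g. mean imputation in real pipelines) is not modelled here.
    """
    return sum(betas[snp] * count for snp, count in dosages.items() if snp in betas)


# Hypothetical discovery-GWA effect sizes and one hypothetical child's genotypes.
betas = {"rs_a": 0.020, "rs_b": -0.010, "rs_c": 0.015}
child = {"rs_a": 2, "rs_b": 1, "rs_c": 0}
score = polygenic_score(child, betas)
print(round(score, 3))  # 0.03
```

Because every allele is weighted by its discovery effect size, many SNPs that are individually far from genome-wide significance can still add up to a measurable signal.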
As they are tapping into the same genetic signal, GPS based on GWA results and GCTA can be applied to the same data sets, with both estimating the polygenic contribution to trait variance or a shared polygenic covariance between traits captured by the additive effects of common SNPs. We therefore employ a two-method approach using GCTA and GPS to explore the genetic influence on the variance of children’s educational achievement and on the covariance between family SES and children’s educational achievement. Our study had four objectives:
(1) To estimate, for the first time using DNA data, genetic influences on children’s educational achievement on an age-16 UK national examination, using genome-wide genotypes from >3000 conventionally unrelated children. Specifically, we use GCTA,11 which quantifies the genomic similarity between each pair of individuals across millions of SNPs throughout the genome, to estimate the proportion of phenotypic variation in children’s educational achievement captured by all SNPs simultaneously.
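A minimal sketch of the pairwise genomic similarity that GCTA quantifies, assuming the standard standardised-genotype form of the genomic relationship matrix (GRM): each entry averages, across SNPs, the product of two people's standardised allele counts. The function name and toy genotypes are hypothetical. GCTA itself uses a slightly different estimator on the diagonal, whereas this sketch applies one formula to every entry, and estimating heritability from the GRM additionally requires REML, which is not shown.

```python
from __future__ import annotations

import math


def genomic_relationship_matrix(genotypes: list[list[int]]) -> list[list[float]]:
    """Genomic relationship matrix from a person-by-SNP genotype table (sketch).

    genotypes[j][i] is the count (0, 1 or 2) of the reference allele carried by
    individual j at SNP i. Entry (j, k) is
        A_jk = (1/M) * sum_i (x_ij - 2p_i)(x_ik - 2p_i) / (2p_i(1 - p_i))
    """
    n_people = len(genotypes)
    n_snps = len(genotypes[0])
    # Reference-allele frequency of each SNP, estimated from this sample.
    freqs = [sum(person[i] for person in genotypes) / (2 * n_people) for i in range(n_snps)]
    # Monomorphic SNPs (p = 0 or 1) carry no information and would divide by zero.
    keep = [i for i, p in enumerate(freqs) if 0 < p < 1]
    # Standardise every genotype: (x - 2p) / sqrt(2p(1 - p)).
    z = [
        [(person[i] - 2 * freqs[i]) / math.sqrt(2 * freqs[i] * (1 - freqs[i])) for i in keep]
        for person in genotypes
    ]
    m = len(keep)
    return [
        [sum(a * b for a, b in zip(z[j], z[k])) / m for k in range(n_people)]
        for j in range(n_people)
    ]


# Four hypothetical individuals genotyped at five SNPs.
toy = [
    [0, 1, 2, 1, 0],
    [1, 1, 2, 0, 0],
    [2, 0, 1, 1, 1],
    [1, 2, 0, 2, 1],
]
grm = genomic_relationship_matrix(toy)
for row in grm:
    print([round(value, 2) for value in row])
```

Because allele frequencies are estimated from the same sample, each row of this matrix sums to zero; with many SNPs, unrelated people have off-diagonal values close to zero.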
(2) To investigate genetic mediation of the phenotypic correlation between family SES and children’s educational achievement, we conduct bivariate GCTA to estimate the proportion of phenotypic covariation between children’s family SES and children’s educational achievement captured by children's genotypes.
(3) To create a GPS based on the results of a large GWA study on adults’ total years of schooling13 and investigate its association with variance in children’s educational achievement and their family SES.
(4) To examine the role of general cognitive ability (intelligence) in the genetic nexus between children’s educational achievement and their family SES. Molecular evidence as well as twin studies have shown that cognitive ability is heritable and accounts for a substantial portion of the genetic variance in educational achievement.7, 24, 25, 26 In addition, recent molecular evidence from the present sample of unrelated individuals showed a high genetic correlation between family SES and children’s intelligence at ages 7 and 12 years.27 Based on this evidence, it is important to ask to what extent the genetic link between family SES and children’s educational achievement is mediated by intelligence. For this reason, we perform GCTA mediation analyses to test for a direct genetic link between family SES and children’s educational achievement independent of cognitive ability. Complementarily, we test whether the GPS of adults’ total years of schooling explains variance in children’s educational achievement independently of cognitive ability.
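One way to express "explains variance independently of cognitive ability" is as an incremental R²: the variance explained by a regression on both the GPS and intelligence, minus the variance explained by intelligence alone. This is an assumption about the analysis, since the excerpt does not give the model. The sketch also omits the eight ancestry principal components used as covariates, its function names are hypothetical, and its data are simulated.

```python
from __future__ import annotations

import math
import random


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def incremental_r2(y: list[float], gps: list[float], iq: list[float]) -> float:
    """Extra variance in y explained by the GPS over and above IQ.

    Uses the closed-form R^2 of an OLS regression on two predictors,
        R^2(y ~ gps + iq) = (r_yg^2 + r_yi^2 - 2 r_yg r_yi r_gi) / (1 - r_gi^2),
    and subtracts R^2(y ~ iq) = r_yi^2.
    """
    r_yg, r_yi, r_gi = pearson(y, gps), pearson(y, iq), pearson(gps, iq)
    r2_full = (r_yg**2 + r_yi**2 - 2 * r_yg * r_yi * r_gi) / (1 - r_gi**2)
    return r2_full - r_yi**2


# Simulated (hypothetical) data in which the GPS partly overlaps with IQ.
rng = random.Random(1)
iq = [rng.gauss(0, 1) for _ in range(2000)]
gps = [0.3 * i + rng.gauss(0, 1) for i in iq]
achievement = [0.5 * i + 0.1 * g + rng.gauss(0, 1) for i, g in zip(iq, gps)]
print(f"R^2 from GPS alone:     {pearson(achievement, gps) ** 2:.4f}")
print(f"R^2 from GPS beyond IQ: {incremental_r2(achievement, gps, iq):.4f}")
```

When the GPS and intelligence are correlated, the incremental R² is smaller than the GPS's R² on its own, which is exactly the distinction this objective is probing.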
DNA data were available for 3747 children whose first language was English and who had no major medical or psychiatric problems. From that sample, 3665 DNA samples were successfully hybridized to Affymetrix GeneChip 6.0 SNP genotyping arrays (Affymetrix, Santa Clara, CA, USA) using standard experimental protocols as part of the WTCCC2 project (for details see Trzaskowski et al.).31 In addition to nearly 700 000 genotyped SNPs, more than one million other SNPs were imputed from HapMap 2, 3 and WTCCC controls using IMPUTE v.2 software.32 A total of 3152 DNA samples (1446 males and 1706 females) survived quality control criteria for ancestry, heterozygosity, relatedness and hybridization intensity outliers. To control for ancestral stratification, we performed principal component analyses on a subset of 100 000 quality-controlled SNPs after removing SNPs in linkage disequilibrium (r²>0.2).33 Using the Tracy–Widom test,34 we identified 8 axes with P<0.05 that were used as covariates in GCTA and polygenic score analyses.
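The stratification step can be sketched as an EIGENSTRAT-style principal component analysis: standardise the LD-pruned genotypes, form the individual-by-individual covariance matrix and extract its leading eigenvectors, whose values per person then serve as covariates. This pure-Python version uses power iteration, assumes LD pruning has already been done, does not implement the Tracy–Widom test used to choose the number of axes, and runs on simulated two-group data rather than the study's genotypes. All names are hypothetical.

```python
import math
import random


def standardise(genotypes):
    """Person-by-SNP standardised genotypes, dropping monomorphic SNPs."""
    n = len(genotypes)
    columns = []
    for i in range(len(genotypes[0])):
        p = sum(person[i] for person in genotypes) / (2 * n)
        if 0 < p < 1:
            sd = math.sqrt(2 * p * (1 - p))
            columns.append([(person[i] - 2 * p) / sd for person in genotypes])
    return [list(row) for row in zip(*columns)]


def top_pcs(genotypes, k=2, iters=300, seed=0):
    """Top-k axes of the individual-by-individual genetic covariance matrix.

    Power iteration with deflation: each new axis is kept orthogonal to the
    axes already found. Returns a list of (eigenvector, eigenvalue) pairs.
    """
    z = standardise(genotypes)
    n, m = len(z), len(z[0])
    cov = [[sum(a * b for a, b in zip(z[j], z[l])) / m for l in range(n)] for j in range(n)]
    rng = random.Random(seed)
    found = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(n)]
        for _ in range(iters):
            w = [sum(cov[j][l] * v[l] for l in range(n)) for j in range(n)]
            for axis, _eigenvalue in found:  # remove components along earlier axes
                dot = sum(a * b for a, b in zip(w, axis))
                w = [a - dot * b for a, b in zip(w, axis)]
            norm = math.sqrt(sum(a * a for a in w))
            v = [a / norm for a in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        eigenvalue = sum(v[j] * sum(cov[j][l] * v[l] for l in range(n)) for j in range(n))
        found.append((v, eigenvalue))
    return found


# Hypothetical sample: 10 people from each of two ancestral groups whose
# allele frequencies differ strongly at 200 SNPs.
rng = random.Random(42)
freqs_a = [rng.uniform(0.1, 0.5) for _ in range(200)]
group_a = [[sum(rng.random() < p for _ in range(2)) for p in freqs_a] for _ in range(10)]
group_b = [[sum(rng.random() < 1 - p for _ in range(2)) for p in freqs_a] for _ in range(10)]
pcs = top_pcs(group_a + group_b, k=2)
print([round(x, 2) for x in pcs[0][0]])  # the sign of PC1 separates the two groups
```

Entering such axes as covariates absorbs ancestry-related differences in allele frequency that could otherwise masquerade as SNP effects.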
Educational achievement: Educational achievement was operationalized as performance on the standardized UK-wide examination, the General Certificate of Secondary Education (GCSE), taken by almost all (>99%) pupils at the end of compulsory education, typically at the age of 16 years. English, mathematics and science are compulsory subjects. Five or more GCSEs with grades A*–C, including GCSE English and GCSE mathematics, are required for further education. The joint performance on these three compulsory subjects determines admission to further education and employability... The GCSE measure for the present analyses was the mean grade of the three compulsory core subjects: mathematics, English (mean grade of ‘English Language’ and ‘English Literature’), and science (mean of any science subjects taken), requiring at least two of the three measures to be nonmissing. Scores on the three compulsory core subjects were highly correlated (0.65–0.81).
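The composite rule can be written as a small function. The numeric grade coding and the function names are hypothetical (the excerpt does not give them), as is the choice to average whichever English or science grades are present when one of them is missing.

```python
from __future__ import annotations

from statistics import mean
from typing import Optional

# Hypothetical numeric coding of GCSE letter grades (higher is better).
GRADE_POINTS = {"A*": 11, "A": 10, "B": 9, "C": 8, "D": 7, "E": 6, "F": 5, "G": 4}


def _points(grade: Optional[str]) -> Optional[int]:
    """Numeric value of a letter grade, or None if the grade is missing."""
    return None if grade is None else GRADE_POINTS[grade]


def _mean_available(values: list[Optional[float]]) -> Optional[float]:
    """Mean of the non-missing values, or None if all are missing."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


def gcse_core_score(maths, english_language, english_literature, sciences):
    """Mean of the three core subjects, or None if fewer than two are present.

    English is the mean of Language and Literature; science is the mean of
    all science grades taken (`sciences` is a list of letter grades).
    """
    components = [
        _points(maths),
        _mean_available([_points(english_language), _points(english_literature)]),
        _mean_available([_points(g) for g in sciences]),
    ]
    available = [c for c in components if c is not None]
    return mean(available) if len(available) >= 2 else None


print(gcse_core_score("A", "B", "A*", ["B", "C"]))  # (10 + 10 + 8.5) / 3 = 9.5
print(gcse_core_score(None, "C", None, []))         # only English present -> None
```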
Intelligence (IQ): Individuals were assessed at the ages of 2, 3, 4, 7, 9, 10, 12, 14, and 16 years on general cognitive ability using a battery of parent-administered and phone- and web-based tests. At ages 2, 3, and 4, tests were parent-administered and validated against standard tests administered by a trained tester. At age 7, tests were administered over the phone; at age 9, parents administered the tests; and at ages 10–16, tests were web-based. At each testing age, individuals completed at least two ability tests that assessed verbal and nonverbal intelligence. Psychometric properties of the tests have been described in detail elsewhere,36 with the exception of the measurements used at age 16 years, where subjects completed a web-based adaptation of Raven’s Standard and Advanced Progressive Matrices and the Mill-Hill Vocabulary Scale.37, 38, 39
The present sample size of ~3000 yields 80% power to detect a GCTA heritability estimate of 30% (α=0.05) and a genetic correlation estimate of 0.6 (α=0.05; VG1=0.20; VG2=0.30; rPh=0.50).
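For intuition about the heritability part of this power statement, a commonly used approximation for unrelated samples puts the standard error of a GCTA heritability estimate at about sqrt(2 / (N² × var(A))), where var(A) is the variance of the off-diagonal genomic relationships, often taken to be about 2 × 10⁻⁵. Both the approximation and that value are assumptions here, not the authors' stated method; the function name is hypothetical, and the genetic-correlation power calculation is not covered.

```python
from statistics import NormalDist


def gcta_h2_power(n: int, h2: float, alpha: float = 0.05, var_grm: float = 2e-5) -> float:
    """Approximate power to detect SNP heritability h2 in n unrelated people.

    Assumes SE(h2) ~ sqrt(2 / (n**2 * var_grm)) and a two-sided z-test of
    h2 / SE against zero (equivalent to a 1-df chi-square test whose
    non-centrality is (h2 / SE)**2).
    """
    se = (2 / (n**2 * var_grm)) ** 0.5
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = h2 / se
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


print(round(gcta_h2_power(3000, 0.30), 2))  # 0.81
```

With N = 3000 this gives a standard error of roughly 0.105 and power of about 81%, in line with the 80% quoted above.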
Polygenic scores: We created polygenic scores from genome-wide data of over 3000 unrelated children using GWA results for total years of schooling from an independent discovery sample.13 The same quality control criteria as for the GCTA analyses were applied to the data. Polygenic risk scores were constructed using the P-values and β-weights from the recent large (N=126 559) GWA based on years of education.6 Quality-controlled SNPs were pruned for linkage disequilibrium based on P-value informed clumping in PLINK,44 using an r²=0.25 cutoff within a 200-kb window. We removed the major histocompatibility complex region of the genome because of its complex linkage disequilibrium structure. A total of 144 890 SNPs survived linkage disequilibrium pruning. For each individual, multiple polygenic scores were generated using the PLINK score option based on the top SNPs from the GWA analysis of educational attainment for varying significance thresholds (from 0.01 to 0.50). Numbers of SNPs per threshold are summarized in Supplementary Table 3. The scores were calculated as the sum across SNPs of the number of reference alleles for each SNP multiplied by the effect size (β-coefficient) derived from the GWA analysis of years of education.
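The P-value-informed clumping and thresholding steps can be sketched as below. This is a simplification of what PLINK does: pairwise r² values come from a hypothetical lookup instead of being computed from genotypes, every SNP is eligible to be an index SNP, and removal of the MHC region is not shown. The thresholds mirror the 0.01 to 0.50 range described above; the summary statistics and all names are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Snp:
    snp_id: str
    chrom: int
    pos: int  # base-pair position
    p: float  # P-value in the discovery GWA


def clump(snps: list[Snp], r2: Callable[[str, str], float],
          r2_max: float = 0.25, window_bp: int = 200_000) -> list[Snp]:
    """Greedy clumping: keep the most significant SNP, drop its LD proxies, repeat."""
    ordered = sorted(snps, key=lambda s: s.p)
    removed: set[str] = set()
    index_snps = []
    for snp in ordered:
        if snp.snp_id in removed:
            continue
        index_snps.append(snp)
        for other in ordered:
            if (other is not snp and other.snp_id not in removed
                    and other.chrom == snp.chrom
                    and abs(other.pos - snp.pos) <= window_bp
                    and r2(snp.snp_id, other.snp_id) > r2_max):
                removed.add(other.snp_id)
    return index_snps


def snps_per_threshold(index_snps: list[Snp], thresholds: list[float]) -> dict[float, list[str]]:
    """SNP ids entering the score at each P-value threshold (P <= threshold)."""
    return {t: [s.snp_id for s in index_snps if s.p <= t] for t in thresholds}


# Hypothetical GWA summary statistics and pairwise LD (r^2) values.
snps = [
    Snp("rs1", 1, 1_000_000, 1e-5),
    Snp("rs2", 1, 1_050_000, 0.002),
    Snp("rs3", 1, 1_500_000, 0.03),
    Snp("rs4", 2, 500_000, 0.2),
    Snp("rs5", 2, 520_000, 0.45),
]
ld = {frozenset({"rs1", "rs2"}): 0.8, frozenset({"rs4", "rs5"}): 0.1}


def r2_lookup(a: str, b: str) -> float:
    return ld.get(frozenset({a, b}), 0.0)


index_snps = clump(snps, r2_lookup)
print(snps_per_threshold(index_snps, [0.01, 0.05, 0.5]))
```

Here rs2 is dropped as a close, strongly correlated proxy of the more significant rs1, while rs5 survives because its r² with rs4 is below the 0.25 cutoff.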
Phenotypically, children’s educational achievement correlated 0.50 (0.02 s.e.) with their family SES. Both variables also correlated with intelligence: 0.55 (0.02 s.e.) for educational achievement and 0.38 (0.02 s.e.) for family SES (Supplementary Table 1).
Bivariate GCTA: Bivariate GCTA showed that the estimated proportion of variance tagged by the sampled SNPs was 0.31 (0.12 s.e.) in educational achievement, and 0.20 (0.11 s.e.) in family SES (Figure 1). The genetic correlation, indicating the extent to which the same SNPs are associated with family SES and children’s educational achievement, was near unity (rG=1.02 (0.25 s.e.)).
Based on the genetic correlation between the two traits and the genetic contribution to the variance of each trait, GCTA estimates the genetic contribution to the phenotypic correlation between the two traits as $C_G = r_{G(1,2)} \sqrt{V_{G(1)} \times V_{G(2)}}$; applied to the data, $1.02 \times \sqrt{0.31 \times 0.20} \approx 0.25$. Hence, GCTA estimated the genetic contribution to the phenotypic correlation between family SES and children’s educational achievement as 0.25 (0.09 s.e.), indicating that the proportion of the observed correlation tagged by the additive effects of available SNPs was 50% (that is, 0.25/0.50; Figure 1). This suggests that approximately half of the phenotypic correlation between children’s family SES and their educational achievement was mediated genetically.
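The arithmetic above can be checked directly, assuming standardised traits so that the genetic covariance is on the same scale as the phenotypic correlation. The helper name is hypothetical.

```python
import math


def genetic_contribution(r_g: float, h2_1: float, h2_2: float) -> float:
    """Genetic contribution to a phenotypic correlation: r_g * sqrt(h2_1 * h2_2)."""
    return r_g * math.sqrt(h2_1 * h2_2)


c_g = genetic_contribution(1.02, 0.31, 0.20)  # values reported in the bivariate GCTA
share = c_g / 0.50  # divided by the phenotypic correlation of 0.50
print(round(c_g, 2), round(share, 2))  # 0.25 0.51
```

Without rounding, the genetic contribution is about 0.254, roughly 51% of the 0.50 phenotypic correlation, consistent with the "approximately half" reported.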
Our GCTA heritability estimate of 20% for family SES tagged by children’s genotypes is very similar to GCTA heritability estimates of years of education in adulthood and socioeconomic measures tagged by adults’ genotypes themselves in previous studies.13, 14, 15 This is remarkable as children’s genotypes are only a proxy for their parents’ genotypes. In other words, GCTA effects on family SES estimated from children’s DNA only reflect the extent to which children inherit parental characteristics associated with the family SES created by the parents. One such factor is intelligence, and we find that children’s intelligence accounts for about one-third of the GCTA association between family SES and children’s educational achievement. However, it is interesting that two-thirds of the GCTA association is not accounted for by children’s intelligence. This finding of intelligence-independent shared genetic variance between family SES and children’s educational achievement suggests that differences in educational achievement at the end of compulsory education and the level of education and occupation attained in adulthood are not merely the manifestation of differences in intelligence. This is in line with twin research that suggests that the heritability of educational achievement reflects many genetically influenced traits such as personality and self-efficacy, not just intelligence.48
Our results also contribute to the extensive debate about meritocracy and social mobility,62 which has largely ignored the fact that parents and their offspring are genetically related. Usually a lower correlation between parental and offspring SES is seen as an index of social mobility.63 However, considering genetics, we know that removing environmental sources of variation will not remove genetically driven resemblance between parents and offspring. On the contrary, as environmental differences diminish, a larger proportion of the individual differences that remain will be due to genetic differences; that is, heritability would increase, which has also been demonstrated empirically.55 In that sense, heritability could be seen as an index of social mobility."
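The claim that heritability rises as environmental differences shrink follows from the ratio h² = V_G / (V_G + V_E). A few hypothetical numbers illustrate it, under the simplifying assumption of no gene–environment correlation or interaction, with genetic variance held fixed.

```python
def heritability(v_g: float, v_e: float) -> float:
    """Share of phenotypic variance due to genetic variance: V_G / (V_G + V_E)."""
    return v_g / (v_g + v_e)


# Hypothetical variance components: V_G fixed at 0.2 while V_E shrinks.
for v_e in (0.8, 0.5, 0.2):
    print(f"V_E = {v_e}: h2 = {heritability(0.2, v_e):.2f}")
# h2 rises from 0.20 to 0.29 to 0.50
```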