gwern branwen
2,831 followers

Post has attachment
Are power estimates suggesting that GWASes for IQ or other traits need n=1-2 million a cause for pessimism? No; 23andMe and http://Ancestry.com are each genotyping around that many people a year now, and genomics in general continues to follow exponential projections. In another 5-10 years, there may be enough sequencing capacity to sequence the entire global population once, and potentially billions of cumulative raw genomes available. Against that, phenotyping a few million will not be a big deal. When, not if. From "Big Data: Astronomical or Genomical?", Stephens et al 2015: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195

"For genomics, data acquisition is highly distributed and involves heterogeneous formats. The rate of growth over the last decade has also been truly astonishing, with the total amount of sequence data produced doubling approximately every seven months (Fig 1). The OmicsMaps catalog of all known sequencing instruments in the world [11] reports that currently there are more than 2,500 high-throughput instruments, manufactured by several different companies, located in nearly 1,000 sequencing centers in 55 countries in universities, hospitals, and other research laboratories. These centers range in size from small laboratories with a few instruments generating a few terabases per year to large dedicated facilities producing several petabases a year. (An approximate conversion factor to use in interpreting these numbers is 4 bases = 1 byte, though we will revisit this below.)

...Over the next ten years, we expect sequencing capacities will continue to grow very rapidly, although the project growth becomes more unpredictable the further out we consider. If the growth continues at the current rate by doubling every seven months, then we should reach more than one exabase of sequence per year in the next five years and approach one zettabase of sequence per year by 2025 (Fig 1, Table 1). Interestingly, even at the more conservative estimates of doubling every 12 months (Illumina’s current own estimate [12]) or every 18 months (equivalent to Moore’s law), we should reach exabase-scale genomics well within the next decade. We anticipate this sequencing will encompass genome sequences for most of the approximately 1.2 million described species of plants and animals [15]. With these genomes, plus those of thousands of individuals of “high value” species for energy, environmental, and agricultural reasons, we estimate that there will be at least 2.5 million plant and animal genome sequences by 2025. For example, the genomics powerhouse BGI, in conjunction with the International Rice Research Institute and the Chinese Academy of Agricultural Sciences, has already sequenced 3,000 varieties of rice [16] and announced a massive project of their own to sequence one million plant and animal genomes [17]. The Smithsonian Institute also has similar plans to “capture and catalog all the DNA from the world’s flora and fauna.” There also will be genomes for several millions of microbes, with explosive growth projected for both medical and environmental microbe metagenomic sequencing [18,19].

These estimates, however, are dwarfed by the very reasonable possibility that a significant fraction of the world’s human population will have their genomes sequenced. The leading driver of this trend is the promise of genomic medicine to revolutionize the diagnosis and treatment of disease, with some countries contemplating sequencing large portions of their populations: both England [20] and Saudi Arabia [21] have announced plans to sequence 100,000 of their citizens, one-third of Iceland’s 320,000 citizens have donated blood for genetic testing [22], and researchers in both the US [23] and China [17] both aim to sequence 1 million genomes in the next few years. With the world’s population projected to top 8 billion by 2025, it is possible that as many as 25% of the population in developed nations and half of that in less-developed nations will have their genomes sequenced (comparable to the current worldwide distribution of Internet users [24]).

We therefore estimate between 100 million and as many as 2 billion human genomes could be sequenced by 2025, representing four to five orders of magnitude growth in ten years and far exceeding the growth for the three other Big Data domains. Indeed, this number could grow even larger, especially since new single-cell genome sequencing technologies are starting to reveal previously unimagined levels of variation, especially in cancers, necessitating sequencing the genomes of thousands of separate cells in a single tumor [10].

...Data storage requirements for all four domains are projected to be enormous. Today, the largest astronomy data center devotes ~100 petabytes to storage, and the completion of the Square Kilometre Array (SKA) project is expected to lead to a storage demand of 1 exabyte per year. YouTube currently requires from 100 petabytes to 1 exabyte for storage and may be projected to require between 1 and 2 exabytes additional storage per year by 2025. Twitter’s storage needs today are estimated at 0.5 petabytes per year, which may increase to 1.5 petabytes in the next ten years. (Our estimates here ignore the “replication factor” that multiplies storage needs by ~4, for redundancy.) For genomics, we have determined more than 100 petabytes of storage are currently used by only 20 of the largest institutions (S1 Table)."
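
The doubling-time arithmetic in the excerpt is easy to check by hand. A minimal sketch, using only figures quoted above (doubling every 7, 12, or 18 months; 4 bases = 1 byte); the function names are mine, not the paper's, and the starting capacity is deliberately left out since the excerpt does not state it:

```python
import math

def growth_factor(years, doubling_months):
    """Multiplicative growth after `years` if output doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)

def years_to_grow(factor, doubling_months):
    """Years needed to grow by `factor` at a fixed doubling time."""
    return math.log2(factor) * doubling_months / 12

def bases_to_bytes(bases, bases_per_byte=4):
    """Rough storage estimate using the excerpt's 4 bases = 1 byte conversion."""
    return bases / bases_per_byte

# Ten-year growth under the three doubling times mentioned in the excerpt.
for months in (7, 12, 18):
    print(f"doubling every {months:>2} months: x{growth_factor(10, months):,.0f} in 10 years")

# Exabase -> zettabase is a factor of 1,000.
print(f"x1000 at 7-month doubling takes {years_to_grow(1000, 7):.1f} years")

# One exabase (1e18 bases) at 4 bases per byte.
print(f"1 exabase is about {bases_to_bytes(1e18):.2e} bytes")
```

At a 7-month doubling time, ten years is a bit over 17 doublings (a factor of roughly 10^5), which is where the "four to five orders of magnitude" comes from; going from an exabase to a zettabase takes a little under 6 years at that rate, consistent with the quoted "exabase in five years, zettabase by 2025" timeline.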

Post has attachment
'Bioethicist' frustrated they can't come up with a pretext for outlawing having healthier kids or choosing your mates:

"Shady Grove Fertility, the nation’s largest clinic, offers refunds if couples don’t go home with a baby. New Hope Fertility in New York City held a lottery earlier this year that awarded 30 couples a $30,000 round of IVF. And the California IVF Fertility Center is pioneering what some refer to as the “Costco model” of babymaking, creating batches of embryos using donor eggs and sperm that can be shared among several different families.

That model has served to highlight a preference among many would-be parents for tall, thin, highly-educated donors.

“It’s a little unsettling to be marketing characteristics as potentially positive in a future child,” said Rebecca Dresser, a bioethicist at Washington University in St. Louis and a member of the President’s Council on Bioethics under George W. Bush. “But it’s hard to think on what basis to prohibit that.”

And so, Dresser said, “what we have now is prospective parents making judgments about what they think ‘good’ genes are” — decisions that are literally changing the face of the next generation."

This isn't even about embryo selection, just sperm/egg selection; if you choose to marry someone who is taller rather than shorter, that choice of mate itself is 'literally changing the face of the next generation', and this is apparently problematic. Truly beyond parody.

Post has attachment
GWASes show everything is heritable, including participating in GWASes: "The molecular genetics of participation in the Avon Longitudinal Study of Parents and Children", Taylor et al 2017 https://www.biorxiv.org/content/early/2017/10/20/206698

"Background: It is often assumed that selection (including participation and dropout) does not represent an important source of bias in genetic studies. However, there is little evidence to date on the effect of genetic factors on participation. Methods: Using data on mothers (N=7,486) and children (N=7,508) from the Avon Longitudinal Study of Parents and Children, we 1) examined the association of polygenic risk scores for a range of socio-demographic, lifestyle characteristics and health conditions related to continued participation, 2) investigated whether associations of polygenic scores with body mass index (BMI; derived from self-reported weight and height) and self-reported smoking differed in the largest sample with genetic data and a sub-sample who participated in a recent follow-up and 3) determined the proportion of variation in participation explained by common genetic variants using genome-wide data. Results: We found evidence that polygenic scores for higher education, agreeableness and openness were associated with higher participation and polygenic scores for smoking initiation, higher BMI, neuroticism, schizophrenia, ADHD and depression were associated with lower participation. Associations between the polygenic score for education and self-reported smoking differed between the largest sample with genetic data (OR for ever smoking per SD increase in polygenic score:0.85, 95% CI:0.81,0.89) and sub-sample (OR:0.95, 95% CI:0.88,1.02). In genome-wide analysis, single nucleotide polymorphism based heritability explained 17-31% of variability in participation. Conclusions: Genetic association studies, including Mendelian randomization, can be biased by selection, including loss to follow-up. Genetic risk for dropout should be considered in all analyses of studies with selective participation."
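
How different are the two odds ratios in the abstract? A rough back-of-the-envelope check, recovering standard errors from the reported 95% CIs on the log-odds scale. This is my own sketch, not the paper's analysis, and it assumes the two estimates are independent, which they are not (the follow-up sub-sample is nested inside the full sample), so treat the z-score as indicative only:

```python
import math

def log_or_and_se(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover log(OR) and its standard error from a reported 95% CI."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return log_or, se

def z_difference(est_a, est_b):
    """z-score for the difference of two (log_or, se) estimates.

    ASSUMPTION: the estimates are independent. In the abstract's case they
    come from nested samples, so this is only a rough guide.
    """
    (a, se_a), (b, se_b) = est_a, est_b
    return (b - a) / math.sqrt(se_a ** 2 + se_b ** 2)

# Figures quoted in the abstract above.
full_sample = log_or_and_se(0.85, 0.81, 0.89)
sub_sample = log_or_and_se(0.95, 0.88, 1.02)

z = z_difference(full_sample, sub_sample)
print(f"z for the difference in log-odds: {z:.2f}")
```

Under that independence assumption the difference comes out at around 2.5 standard errors: the education score's protective association with smoking is visibly attenuated among those who kept participating.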

Post has attachment
A large pre-registered randomized trial of police bodycams shows tiny effects at best despite extremely promising initial results in other places & experiments.

Rossi's Metallic Laws of program evaluation ( https://www.gwern.net/docs/sociology/1987-rossi ):

- The Iron Law of Evaluation: "The expected value of any net impact assessment of any large scale social program is zero."
- The Stainless Steel Law of Evaluation: "The better designed the impact assessment of a social program, the more likely is the resulting estimate of net impact to be zero."
- The Brass Law of Evaluation: "The more social programs are designed to change individuals, the more likely the net impact of the program will be zero."
- The Zinc Law of Evaluation: "Only those programs that are likely to fail are evaluated."
