gwern's posts
Post has attachment
Public
New Wikipedia article; a follow-up to https://en.wikipedia.org/wiki/Genome-wide_complex_trait_analysis
Home steam distillation of 99%-pure nepetalactone from catnip leaves (a 0.03% yield versus the theoretical maximum of 0.3%: 0.45 kg of leaves yielded 143 mg).
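A quick check that the figures in parentheses agree. This sketch treats both percentages as mass fractions of the starting leaves, which is an assumption: the post doesn't say whether the leaves were fresh or dried.

```python
def percent_yield(product_mg: float, feedstock_kg: float) -> float:
    """Product mass as a percentage of the starting plant mass."""
    feedstock_mg = feedstock_kg * 1_000_000  # 1 kg = 1,000,000 mg
    return 100 * product_mg / feedstock_mg


def product_at_yield_mg(feedstock_kg: float, yield_pct: float) -> float:
    """Product mass (mg) that a given percentage yield would give."""
    return feedstock_kg * 1_000_000 * yield_pct / 100


actual = percent_yield(143, 0.45)          # about 0.032%
ceiling = product_at_yield_mg(0.45, 0.3)   # about 1350 mg at the 0.3% maximum
print(f"actual yield {actual:.3f}%, {actual / 0.3:.0%} of the theoretical maximum")
print(f"0.3% of 0.45 kg would be {ceiling:.0f} mg")
```

The run therefore recovered roughly a tenth of the nepetalactone the leaves could theoretically have contained.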
"Achieving Human Parity in Conversational Speech Recognition", Xiong et al 2016:
"Conversational speech recognition has served as a flagship speech recognition task since the release of the DARPA Switchboard corpus in the 1990s. In this paper, we measure the human error rate on the widely used NIST 2000 test set, and find that our latest automated system has reached human parity. The error rate of professional transcriptionists is 5.9% for the Switchboard portion of the data, in which newly acquainted pairs of people discuss an assigned topic, and 11.3% for the CallHome portion where friends and family members have open-ended conversations. In both cases, our automated system establishes a new state-of-the-art, and edges past the human benchmark. This marks the first time that human parity has been reported for conversational speech. The key to our system's performance is the systematic use of convolutional and LSTM neural networks, combined with a novel spatial smoothing method and lattice-free MMI acoustic training."
An ensemble of residual networks, CNNs, and LSTMs, implemented in CNTK.
This was reported some time ago when it was released at a conference, but now the paper is out too. Press release: https://blogs.microsoft.com/next/2016/10/18/historic-achievement-microsoft-researchers-reach-human-parity-conversational-speech-recognition/ Discussion: https://news.ycombinator.com/item?id=12736409 https://news.ycombinator.com/item?id=12501036 https://www.reddit.com/r/MachineLearning/comments/58414p/r_achieving_human_parity_in_conversational_speech/ You may also have seen Baidu's earlier, and also impressive, work on Mandarin and English transcription.
The dataset in question uses low-quality audio (e.g. https://catalog.ldc.upenn.edu/desc/addenda/LDC97S62.wav ), so humans often miss words while trying to transcribe it. For comparison, I understand that Dragon NaturallySpeaking, used in a quiet environment with a microphone and trained on your own voice, tends to have a transcription error rate of ~5%.
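The 5.9% and 11.3% figures are word error rates (WER). WER is the minimum number of word substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the number of reference words. Below is a minimal sketch of that definition as a word-level edit distance. Real scoring tools such as NIST's sclite first normalize spellings, contractions, and filler words; this sketch skips that step, and the function name is illustrative.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words (Levenshtein DP, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # reference word deleted
                curr[j - 1] + 1,     # extra word inserted
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words gives WER = 1/6.
print(word_error_rate("i think it was pretty good", "i think it was good"))
```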
An A3C RL learner run on a supercomputer, with impressive results in learning speed: "Wow! This chart has a logarithmic scale because the pace of progress the last few years has been so rapid, but I never even dreamt that our little parallel algorithm would do this well. With a mere 1536 cores, our network learns to defeat Atari Pong from scratch in just 3.9 minutes. That’s less time than it takes a human to play a single game!"
Previous: http://www.allinea.com/blog/201610/deep-learning-episode-3-supercomputer-vs-pong
I guess you could call this a hardware overhang + fast takeoff for playing Pong.
(If you're wondering how it can learn in less time than a single game takes, even though AFAIK Pong is played in realtime and is not being sped up: he notes that Pong is a stationary game, and there's nothing special about playing a full game to 21 points as in the default settings. You can instead treat it as a simple game a few seconds long, which helps a lot with parallelism. So while a human is finishing one 'full' game over 4 minutes, the network has already played thousands of short games, and that experience transfers 100% to the full game. Of course, Pong is so undemanding a game that it could almost certainly be played many times faster than realtime, at which point retraining the NN probably becomes the bottleneck.)
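Here is a minimal sketch of the per-point framing described above. It assumes the Arcade Learning Environment convention for Pong: the reward is +1 or -1 whenever a point is scored and 0 on every other frame. `split_into_point_episodes` is a hypothetical helper, not code from the Allinea post.

```python
from typing import List, Sequence


def split_into_point_episodes(rewards: Sequence[float]) -> List[List[float]]:
    """Split one Pong reward stream into per-point 'episodes'.

    Assumes every nonzero reward (+1 or -1) marks a scored point, so each
    point can be treated as its own short episode instead of waiting for
    a full game to end at 21 points.
    """
    episodes: List[List[float]] = []
    current: List[float] = []
    for r in rewards:
        current.append(r)
        if r != 0:  # a point was scored: close this mini-episode
            episodes.append(current)
            current = []
    if current:  # trailing frames where no point has been scored yet
        episodes.append(current)
    return episodes


print(split_into_point_episodes([0, 0, 1, 0, -1, 0, 0]))
```

Each segment is only a few seconds of play, so many parallel workers can each finish complete episodes, and learn from them, long before one full game would end.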
CMV (cytomegalovirus) also comes up a lot in anti-aging discussions: CMV seems to uselessly clog up the immune system.
A3C, the current state of the art in RL: "Asynchronous Methods for Deep Reinforcement Learning", Mnih et al 2016:
"We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training allowing all four methods to successfully train neural network controllers. The best performing method, an asynchronous variant of actor-critic, surpasses the current state-of-the-art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU. Furthermore, we show that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input."
Yes, I know this is relatively old, but it's still the state of the art, so it's worth rereading, especially if you missed it the first time around.
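In the paper's asynchronous advantage actor-critic, each actor-learner runs its own copy of the environment for up to t_max steps. It then walks backwards through that segment, computing two things for each step:
- n-step returns, bootstrapped from the critic's value estimate unless the episode ended;
- advantages, which weight the policy-gradient update it applies to the shared parameters.

Below is a minimal sketch of just that backward pass. The networks, gradient computation, and asynchronous updates are left out, and the function name and arguments are illustrative, not from the paper.

```python
from typing import List, Sequence, Tuple


def n_step_returns_and_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> Tuple[List[float], List[float]]:
    """Backward pass over one actor-learner's rollout segment.

    rewards[i] and values[i] are the reward and the critic's estimate V(s_i) at step i.
    bootstrap_value is V(s) for the state after the segment, or 0.0 if the episode ended.
    Returns the n-step returns R_i and the advantages R_i - V(s_i).
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    R = bootstrap_value
    returns = [0.0] * len(rewards)
    for i in reversed(range(len(rewards))):
        # Discounted sum of this and all later rewards in the segment, plus the bootstrap.
        R = rewards[i] + gamma * R
        returns[i] = R
    advantages = [ret - v for ret, v in zip(returns, values)]
    return returns, advantages


# A 3-step segment that ends the episode with a reward of 1 on the last step.
returns, advantages = n_step_returns_and_advantages(
    [0.0, 0.0, 1.0], [0.5, 0.6, 0.7], bootstrap_value=0.0, gamma=0.9)
print(returns, advantages)
```

Each actor-learner would then do three things:
- weight its policy-gradient update with the advantages;
- use the returns as targets for its value estimate;
- apply the accumulated gradients to the shared network and resync its local copy.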
"The negative Flynn Effect: A systematic literature review", Dutton et al 2016:
"The Flynn Effect (rising performance on intelligence tests in the general population over time) is now an established phenomenon in many developed and less developed countries. Recently, evidence has begun to amass that the Flynn Effect has gone into reverse; the so-called ‘Negative Flynn Effect.’ In this study, we present a systematic literature review, conducted in order to discover in precisely how many countries this reverse phenomenon has been uncovered. Using strict criteria regarding quality of the sample and the study, we found nine studies reporting negative Flynn Effects in seven countries. We also discuss several possible explanations for the negative Flynn Effect as an attempt to understand its most probable causes.
The nine articles included, draw upon the following tests:
(1) Sundet et al. (2004) used the General Ability Test, an IQ test developed by the Norwegian army in 1954. It is composed of Words, Numbers and Shapes and conscripts are given a GA (General Ability) score, which corresponds to an IQ score.
(2) Woodley and Meisenberg's (2013) meta-analysis of tests of Dutch adults used the GATB = General Aptitude Test Battery. This measures 9 different ‘aptitudes’ among which are verbal aptitude, numerical aptitude and spatial aptitude.
(3) Teasdale and Owen's (2008) study drew upon the Borge Prien's Prove, which is an IQ test used by the Danish army on recruits since 1961. It is comprised of logical, verbal, numerical and spatial reasoning tests.
(4) Shayer and Ginsburg's (2007, 2009) studies drew upon the Piagetian test: An IQ test developed for children. Piaget's theory focuses on interviewing the subjects to discover why they answered in a particular way.
(5) Dutton and Lynn's (2013) study drew upon annual average results of the Finnish Peruskoe, which literally translates as ‘Basic test.’ This is an IQ test developed by the Finnish army composed of Numbers, Words and Shapes tests. These results were reported in Koivunen (2007) up to 2001, a thesis which was sent to them by a Finnish army researcher, as well as in correspondence with the same Finnish army researcher for 2008–9.
(6) Korgesaar's (2013) Estonian study drew upon the Raven Standard Progressive Matrices (SPM) test, which is a widely-accepted test of general intelligence and, as such, the study is within our inclusion criteria.
(7) Dutton and Lynn (2015) drew upon the French WAIS (Wechsler Adult Intelligence Test) IV manual.
From Table 1 it can be seen that in the majority of the studies the decline ranges between 0.38 and 4.30 IQ points per decade. The Estonian study seems to be somewhat of an outlier with a decline of 8.4 IQ points per decade. Taking the un-weighted average of all the studies, the mean decline per decade in the studies would be 3.18 points. When excluding the rather high value of Estonia, the average decline in the remaining seven studies becomes 2.44 IQ points per decade.
Focussing on the Nordic data, Sundet (correspondence quoted in Dutton, 2014, p.243) has noted that: “Men from (South) Asian and African countries have around 5–6 IQ points lower than non-immigrants. Yet, they seem to comprise no more than around 2–3% of the conscripts in this period. If there would be any effect then this would deflate the total mean IQ by around a maximum of 0.1–0.2 IQ points.” As such, it simply cannot fully explain the decline. Also, conscript data from Finland is particularly important in assessing the immigration hypothesis. Finland did not experience any significant third world immigration until around 1992 (see Dutton & Lynn, 2013). However, the conscripts in 1997 would have been mainly born in 1978 when the non-white population of Finland was vanishingly small.
Despite the limitations outlined above and the small N (= 7) for country level, we decided to calculate the correlation between the IQ decline per decade and average immigration between 1950 and 2015 (Migration Policy, 2016). When including all countries in the review, the correlation was virtually zero (r(7) = 0.033, p = 0.94). However, it has to be noted that Estonia did not only appear to be an outlier regarding the IQ decline, but also regarding immigration, because it was the only country that showed negative immigration numbers. When conducting the calculations again, but this time excluding Estonia, the correlation became r(6) = 0.802 and reached marginal significance (p = 0.055). Nevertheless, we have already noted that, based on the percentage of immigrants in the most reliable samples, immigration is unlikely to have a large influence. Accordingly, this association may be underpinned by a factor, which underlies both dysgenics and high immigration levels, such as degree of societal development or putative amount of time since industrialization.
[...] something precipitated, possibly, by living in more educated society, as argued by Flynn (2012). He argues that modern society makes us increasingly look at the world through ‘scientific spectacles’ and, accordingly, examine it analytically, boosting performance IQ, especially on similarities. As this is ultimately underpinned by intelligence, it would have a genotypic limit and if genotypic intelligence were declining then the imperfect nature of the IQ test as a measure of intelligence would mask this, but only up to the genotypic limit. Once this limit was reached, any genotypic decline in IQ would become visible on the IQ tests. As mentioned above, this is known as the Co-occurrence Model. It is possible that this is what has happened because dysgenic fertility - a negative association between intelligence and numbers of children - has been observed in Denmark, Sweden, Finland, and a number of other countries reviewed in Lynn (2011). Indeed, if Flynn's ‘scientific spectacles’ explanation is accurate then we would expect to see, prior to an overall negative Flynn Effect, a negative effect on verbal and mathematical IQ concomitant with a positive effect on other parts of the test. This is, indeed, what we see in the studies we excluded. Khaleefa, Sulman, and Lynn (2009) found that Sudanese Full-scale IQ increased 2.05 points per decade between 1987 and 2007, but Verbal IQ decreased by 1.65 points over the period. Colom, Andres-Pueyo, and Juan-Espinosa (1998) reported a decline in Spanish verbal reasoning (male and female −0.3) and mathematical reasoning (male − 2.4; female − 2.1) between 1979 and 1995 but a rise on abstract reasoning (and also Ravens) sufficient to create an overall Flynn Effect.
Besides such differential effects on subtests, we would also expect to see a slowing down of the Flynn Effect before it ultimately ceased, because the Flynn Effect itself would be partly g-loaded (with g in decline) and there would be a limit to the extent to which the environment can raise IQ scores. The meta-analysis of the Flynn Effect by Pietschnig and Voracek (2015) does indeed show that IQ gains since the 1980s had considerably slowed down. The gains were also increasingly non-linear in this period."
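This is a quick consistency check of figures in the excerpt, using only numbers quoted there. It rests on three assumptions:
- Table 1 holds 8 values (the 'remaining seven studies' once Estonia is dropped).
- The p-values are two-tailed t-tests of a Pearson correlation with n - 2 degrees of freedom.
- The 7 and 6 in r(7) and r(6) count countries, not degrees of freedom.

The incomplete-beta routine is the standard continued-fraction method, written out so the block needs no SciPy. All function names are illustrative.

```python
import math


def mean_shift(fraction: float, gap: float) -> float:
    """Drop in a population mean when `fraction` of it scores `gap` points lower."""
    return fraction * gap


def mean_without(mean_all: float, n_all: int, removed: float) -> float:
    """Mean of the remaining values after removing one value from a set of known mean and size."""
    return (mean_all * n_all - removed) / (n_all - 1)


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the regularized incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # converged on the odd step
            break
    return h


def regularized_incomplete_beta(a: float, b: float, x: float) -> float:
    """I_x(a, b) via the continued fraction, using the symmetry relation for large x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def pearson_p_value(r: float, n: int) -> float:
    """Two-tailed p-value for a Pearson correlation r from n pairs (t-test with n - 2 df)."""
    if abs(r) >= 1.0:
        return 0.0
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    # P(|T| >= |t|) for Student's t with df degrees of freedom.
    return regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


print(mean_shift(0.02, 5), mean_shift(0.03, 6))  # Sundet's range: 2-3% of conscripts, 5-6 points
print(round(mean_without(3.18, 8, 8.4), 2))      # excerpt reports 2.44 without Estonia
print(round(pearson_p_value(0.033, 7), 2))       # excerpt reports p = 0.94
print(round(pearson_p_value(0.802, 6), 3))       # excerpt reports p = 0.055
```

The recomputed values match the excerpt up to rounding:
- the immigration effect comes to roughly 0.1 and 0.18 points;
- the mean without Estonia comes to 2.43 (the excerpt reports 2.44);
- the p-values come to 0.94 and 0.055.

They also show how little data is behind the immigration correlation: with only six countries, an r of 0.80 is needed just to approach p = 0.05.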
Active learning demo: interactively drag and drop photos to train a CNN + random forest binary classifier for some trait. The random forest is trained on the CNN's features, which makes retraining fast enough for interactive active learning to be feasible (you don't strictly need the RF, since you could fine-tune the CNN itself, but that would typically take too long).
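Here is a minimal sketch of that loop. Image features are computed once and then frozen, and a cheap classifier is refit on them after every new label. It assumes an uncertainty-sampling query strategy: the next photo to label is the one the classifier is least sure about. The post doesn't say how the demo picks photos, so that part is an assumption. To stay dependency-free, the sketch also makes two substitutions, and all names here are hypothetical, not the demo's code:
- a nearest-centroid classifier stands in for the random forest;
- toy 1-D vectors stand in for real CNN features.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], List[float]]  # (positive centroid, negative centroid)


def centroid(points: List[Vector]) -> List[float]:
    return [sum(p[k] for p in points) / len(points) for k in range(len(points[0]))]


def fit(features: List[Vector], labels: Dict[int, int]) -> Model:
    """Cheap refit on frozen features: one centroid per class."""
    pos = [features[i] for i, y in labels.items() if y == 1]
    neg = [features[i] for i, y in labels.items() if y == 0]
    return centroid(pos), centroid(neg)


def prob_positive(model: Model, x: Vector) -> float:
    """Closer to the positive centroid means a probability nearer 1."""
    d_pos, d_neg = math.dist(x, model[0]), math.dist(x, model[1])
    total = d_pos + d_neg
    return 0.5 if total == 0 else d_neg / total


def active_learning(features: List[Vector], oracle: Callable[[int], int],
                    seed_labels: Dict[int, int], n_queries: int) -> Tuple[Model, List[int]]:
    """Uncertainty sampling: repeatedly ask for the label the model is least sure of."""
    labels = dict(seed_labels)  # needs at least one example of each class
    queried: List[int] = []
    for _ in range(n_queries):
        model = fit(features, labels)
        unlabeled = [i for i in range(len(features)) if i not in labels]
        if not unlabeled:
            break
        i = min(unlabeled, key=lambda j: abs(prob_positive(model, features[j]) - 0.5))
        labels[i] = oracle(i)  # label supplied by the user (simulated here)
        queried.append(i)
    return fit(features, labels), queried


# Toy "CNN features": 11 points on a line; the hidden trait is x >= 1.
feats = [(float(x),) for x in range(-5, 6)]
truth = [1 if f[0] >= 1 else 0 for f in feats]
model, asked = active_learning(feats, truth.__getitem__, {0: 0, 10: 1}, n_queries=3)
accuracy = sum((prob_positive(model, f) > 0.5) == bool(t) for f, t in zip(feats, truth)) / len(feats)
print(asked, accuracy)
```

This mirrors the design choice in the post. Because the expensive feature extractor is never retrained, each refit is nearly instant, and that is what keeps the loop interactive.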
"Genome-wide analyses of empathy and systemizing: heritability and correlates with sex, education, and psychiatric risk", Warrier et al 2016a:
"Empathy is the drive to identify the mental states of others and respond to these with an appropriate emotion. Systemizing is the drive to analyse or build lawful systems. Difficulties in empathy have been identified in different psychiatric conditions including autism and schizophrenia. In this study, we conducted genome-wide association studies of empathy and systemizing using the Empathy Quotient (EQ) (n = 46,861) and the Systemizing Quotient-Revised (SQ-R) (n = 51,564) in participants from 23andMe, Inc. We confirmed significant sex-differences in performance on both tasks, with a male advantage on the SQ-R and female advantage on the EQ. We found highly significant heritability explained by single nucleotide polymorphisms (SNPs) for both the traits (EQ: 0.11 ± 0.014; P = 1.7×10⁻¹⁴ and SQ-R: 0.12 ± 0.012; P = 1.2×10⁻²⁰) and these were similar for males and females. However, genes with higher expression in the male brain appear to contribute to the male advantage for the SQ-R. Finally, we identified significant genetic correlations between high score for empathy and risk for schizophrenia (P = 2.5×10⁻⁵), and correlations between high score for systemizing and higher educational attainment (P = 5×10⁻⁴). These results shed light on the genetic contribution to individual differences in empathy and systemizing, two major cognitive functions of the human brain."
Another version of the empathy analysis uses a different measure: where this one uses a questionnaire of self-rated empathy, the other uses performance on a facial recognition test. The results are similar but not identical.
