Nate Silver 2012, The Signal and the Noise (featuring Robin Hanson):
# ch6
In April 1997, the Red River of the North flooded Grand Forks, North Dakota, overtopping the town's levees and spilling more than two miles into the city.*4 Although there was no loss of life, nearly all of the city's 50,000 residents had to be evacuated, cleanup costs ran into the billions of dollars,5 and 75 percent of the city's homes were damaged or destroyed.6
Unlike a hurricane or an earthquake, the Grand Forks flood may have been a preventable disaster. The city's floodwalls could have been reinforced using sandbags.7 It might also have been possible to divert the overflow into depopulated areas-into farmland instead of schools, churches, and homes.
Residents of Grand Forks had been aware of the flood threat for months. Snowfall had been especially heavy in the Great Plains that winter, and the National Weather Service, anticipating runoff as the snow melted, had predicted the waters of the Red River would crest to forty-nine feet, close to the all-time record.
There was just one small problem. The levees in Grand Forks had been built to handle a flood of fifty-one feet. Even a small miss in the forty-nine-foot prediction could prove catastrophic.
In fact, the river crested to fifty-four feet. The Weather Service's forecast hadn't been perfect by any means, but a five-foot miss, two months in advance of a flood, is pretty reasonable-about as well as these predictions had done on average historically. The margin of error on the Weather Service's forecast-based on how well their flood forecasts had done in the past-was about plus or minus nine feet. That implied about a 35 percent chance of the levees being overtopped.8
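The 35 percent figure can be roughly reproduced by treating the forecast error as normally distributed. A minimal sketch, under the assumption (not stated in the text) that the plus-or-minus-nine-foot margin of error corresponds to a 90 percent interval, i.e. about 1.645 standard deviations:

```python
import math

def overtop_probability(forecast_ft, levee_ft, margin_ft, interval_z=1.645):
    """P(crest exceeds levee) under a normal error model.

    margin_ft is the stated margin of error; interval_z converts it
    to a standard deviation (1.645 for a 90 percent interval).
    """
    sigma = margin_ft / interval_z
    z = (levee_ft - forecast_ft) / sigma
    # Standard normal survival function via the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2))

p = overtop_probability(forecast_ft=49, levee_ft=51, margin_ft=9)
```

Under these assumptions p comes out near 0.36, close to the 35 percent in the text; the exact figure depends on how the margin of error is defined.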
...Left to their own devices, many residents became convinced they didn't have anything to worry about. (Very few of them bought flood insurance.10) A prediction of a forty-nine-foot crest in the river, expressed without any reservation, seemed to imply that the flood would hit forty-nine feet exactly; the fifty-one-foot levees would be just enough to keep them safe. Some residents even interpreted the forecast of forty-nine feet as representing the maximum possible extent of the flood.11
An oft-told joke: a statistician drowned crossing a river that was only three feet deep on average.
As I mentioned, the economists in this survey thought that GDP would end up at about 2.4 percent in 2008, slightly below its long-term trend. This was a very bad forecast: GDP actually shrank by 3.3 percent once the financial crisis hit. What may be worse is that the economists were extremely confident in their bad prediction. They assigned only a 3 percent chance to the economy's shrinking by any margin over the whole of 2008.15 And they gave it only about a 1-in-500 chance of shrinking by at least 2 percent, as it did.16
Indeed, economists have for a long time been much too confident in their ability to predict the direction of the economy. In figure 6-4, I've plotted the forecasts of GDP growth from the Survey of Professional Forecasters for the eighteen years between 1993 and 2010.17 The bars in the chart represent the 90 percent prediction intervals as stated by the economists.
A prediction interval is a range of the most likely outcomes that a forecast provides for, much like the margin of error in a poll. A 90 percent prediction interval, for instance, is supposed to cover 90 percent of the possible real-world outcomes, leaving only the 10 percent of outlying cases at the tail ends of the distribution. If the economists' forecasts were as accurate as they claimed, we'd expect the actual value for GDP to fall within their prediction interval nine times out of ten, or all but about twice in eighteen years.
In fact, the actual value for GDP fell outside the economists' prediction interval six times in eighteen years, or fully one-third of the time. Another study,18 which ran these numbers back to the beginnings of the Survey of Professional Forecasters in 1968, found even worse results: the actual figure for GDP fell outside the prediction interval almost half the time. There is almost no chance19 that the economists have simply been unlucky; they fundamentally overstate the reliability of their predictions.
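The claim that the economists could not simply have been unlucky can be checked with a binomial calculation: if the 90 percent intervals were honest, each year would miss with probability 0.1, and six or more misses in eighteen years would be very unlikely. A minimal sketch, assuming independent years:

```python
from math import comb

def prob_at_least(misses, years, p_miss=0.1):
    """P(at least `misses` interval failures in `years` trials),
    assuming honest coverage and independent years."""
    return sum(
        comb(years, k) * p_miss**k * (1 - p_miss)**(years - k)
        for k in range(misses, years + 1)
    )

p = prob_at_least(6, 18)  # chance of six or more misses by luck alone
```

The probability comes out under 1 percent, consistent with the conclusion that the stated intervals were too narrow.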
In reality, when a group of economists give you their GDP forecast, the true 90 percent prediction interval-based on how these forecasts have actually performed20 and not on how accurate the economists claim them to be-spans about 6.4 points of GDP (equivalent to a margin of error of plus or minus 3.2 percent).*
When you hear on the news that GDP will grow by 2.5 percent next year, that means it could quite easily grow at a spectacular rate of 5.7 percent instead. Or it could fall by 0.7 percent-a fairly serious recession. Economists haven't been able to do any better than that, and there isn't much evidence that their forecasts are improving. The old joke about economists' having called nine out of the last six recessions correctly has some truth to it; one actual statistic is that in the 1990s, economists predicted only 2 of the 60 recessions around the world a year ahead of time.21
The government produces data on literally 45,000 economic indicators each year.24 Private data providers track as many as four million statistics.25 The temptation that some economists succumb to is to put all this data into a blender and claim that the resulting gruel is haute cuisine. There have been only eleven recessions since the end of World War II.26 If you have a statistical model that seeks to explain eleven outputs but has to choose from among four million inputs to do so, many of the relationships it identifies are going to be spurious. (This is another classic case of overfitting-mistaking noise for a signal-the problem that befell earthquake forecasters in chapter 5.)
Consider how creative you might be when you have a stack of economic variables as thick as a phone book. A once-famous "leading indicator" of economic performance, for instance, was the winner of the Super Bowl. From Super Bowl I in 1967 through Super Bowl XXXI in 1997, the stock market27 gained an average of 14 percent for the rest of the year when a team from the original National Football League (NFL) won the game.28 But it fell by almost 10 percent when a team from the original American Football League (AFL) won instead.
Through 1997, this indicator had correctly "predicted" the direction of the stock market in twenty-eight of thirty-one years. A standard test of statistical significance,29 if taken literally, would have implied that there was only about a 1-in-4,700,000 possibility that the relationship had emerged from chance alone.
It was just a coincidence, of course. And eventually, the indicator began to perform badly. In 1998, the Denver Broncos, an original AFL team, won the Super Bowl-supposedly a bad omen. But rather than falling, the stock market gained 28 percent amid the dot-com boom. In 2008, the NFL's New York Giants came from behind to upset the AFL's New England Patriots on David Tyree's spectacular catch-but Tyree couldn't prevent the collapse of the housing bubble, which caused the market to crash by 35 percent. Since 1998, in fact, the stock market has done about 10 percent better when the AFL team won the Super Bowl, exactly the opposite of what the indicator was fabled to predict.
How does an indicator that supposedly had just a 1-in-4,700,000 chance of failing flop so badly? For the same reason that, even though the odds of winning the Powerball lottery are only 1 chance in 195 million,30 somebody wins it every few weeks. The odds are hugely against any one person winning the lottery-but millions of tickets are bought, so somebody is going to get lucky. Likewise, of the millions of statistical indicators in the world, a few will have happened to correlate especially well with stock prices or GDP or the unemployment rate. If not the winner of the Super Bowl, it might be chicken production in Uganda. But the relationship is merely coincidental.
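The lottery logic can be made concrete: even at the stated 1-in-4,700,000 significance level, screening millions of candidate indicators makes at least one fluke hit likely. A minimal sketch, using the four million statistics mentioned above and assuming independent indicators:

```python
def prob_any_fluke(n_indicators, p_single):
    """P(at least one indicator clears the significance bar by chance),
    assuming the indicators are independent."""
    return 1 - (1 - p_single) ** n_indicators

p = prob_any_fluke(4_000_000, 1 / 4_700_000)
```

The result is better than even odds that some indicator somewhere looks as impressive as the Super Bowl rule purely by chance.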
...It's much harder to find something that identifies the signal; variables that are leading indicators in one economic cycle often turn out to be lagging ones in the next. Of the seven so-called leading indicators in a 2003 Inc. magazine article,33 all of which had been good predictors of the 1990 and 2001 recessions, only two-housing prices and temporary hiring-led the recession that began in 2007 to any appreciable degree. Others, like commercial lending, did not begin to turn downward until a year after the recession began.
Even the well-regarded Leading Economic Index, a composite of ten economic indicators published by the Conference Board, has had its share of problems. The Leading Economic Index has generally declined a couple of months in advance of recessions. But it has given roughly as many false alarms-including most infamously in 1984, when it sharply declined for three straight months,34 signaling a recession, but the economy continued to zoom upward at a 6 percent rate of growth. Some studies have even claimed that the Leading Economic Index has no predictive power at all when applied in real time.35
Historically, for instance, there has been a reasonably strong correlation between GDP growth and job growth. Economists refer to this as Okun's law. During the Long Boom of 1947 through 1999, the rate of job growth40 had normally been about half the rate of GDP growth, so if GDP increased by 4 percent during a year, the number of jobs would increase by about 2 percent.
The relationship still exists-more growth is certainly better for job seekers. But its dynamics seem to have changed. After each of the last couple of recessions, considerably fewer jobs were created than would have been expected during the Long Boom years. In the year after the stimulus package was passed in 2009, for instance, GDP was growing fast enough to create about two million jobs according to Okun's law.41 Instead, an additional 3.5 million jobs were lost during the period.
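The Long Boom version of Okun's law described above amounts to a one-line rule of thumb. A minimal sketch (the 0.5 ratio is the historical value from the text; actual estimates of the coefficient vary):

```python
def okun_job_growth(gdp_growth_pct, ratio=0.5):
    """Expected job growth under the Long Boom version of Okun's law:
    jobs grow at about half the rate of GDP."""
    return ratio * gdp_growth_pct

okun_job_growth(4.0)  # a 4 percent GDP year implies about 2 percent job growth
```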
Economists often debate about what the change means. The most pessimistic interpretation, advanced by economists including Jeffrey Sachs of Columbia University, is that the pattern reflects profound structural problems in the American economy: among them, increasing competition from other countries, an imbalance between the service and manufacturing sectors, an aging population, a declining middle class, and a rising national debt. Under this theory, we have entered a new and unhealthy normal, and the problems may get worse unless fundamental changes are made. "We were underestimating the role of global change in causing U.S. change," Sachs told me. "The loss of jobs internationally to China and emerging markets have really jolted the American economy."
The bigger question is whether the volatility of the 2000s is more representative of the long-run condition of the economy-perhaps the Long Boom years had been the outlier. During the Long Boom, the economy was in recession only 15 percent of the time. But the rate was more than twice that-36 percent-from 1900 through 1945.42
"I think the most interesting question is how little effort we actually put into forecasting, even on the things we say are important to us," Robin Hanson told me as the food arrived.
"In an MBA school you present this image of a manager as a great decision maker-the scientific decision maker. He's got his spreadsheet and he's got his statistical tests and he's going to weigh the various options. But in fact real management is mostly about managing coalitions, maintaining support for a project so it doesn't evaporate. If they put together a coalition to do a project, and then at the last minute the forecasts fluctuate, you can't dump the project at the last minute, right?
Even academics aren't very interested in collecting a track record of forecasts-they're not very interested in making clear enough forecasts to score," he says later. "What's in it for them? The more fundamental problem is that we have a demand for experts in our society but we don't actually have that much of a demand for accurate forecasts."
I personally believe there is a degree of reflexivity to these spuriously data-mined rules, proportional to the publicity each rule receives.
For example, it could be that at first the 200-day moving average, or the Dow transports index rising by a certain percent, had no predictive power, and that market prices appearing to bounce off the line was overfitting from trying different averages.
But once the indicator or anecdotal rule becomes popular, traders watch that line on their charts, and place orders and make investment decisions based on an initially meaningless line.
Because they do, the 200-day average suddenly starts to affect the real world, certainly the earnings of the financial sector of the economy.
The meaningless data-mined number suddenly works, only because of the multitude of people who believe in it.
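The 200-day moving average itself is just an arithmetic mean over a trailing window; any predictive power the line has would then come from traders acting on it, not from the statistic. A minimal sketch of the indicator (the prices and the short window are illustrative only):

```python
def moving_average(prices, window=200):
    """Trailing simple moving average; returns None until the
    window has filled."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window : i + 1]) / window)
    return out

# With a short illustrative window: each value averages the last 3 prices
ma = moving_average([10, 11, 12, 13, 14], window=3)
```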
Of course nothing can override fundamentals, so eventually prices go where they should, but later than everyone expects.

Nov 22, 2012