Nate Silver 2012, The Signal and the Noise:
# ch4
The public at large became more interested in weather forecasting after the Schoolhouse Blizzard of January 1888. On January 12 that year, initially a relatively warm day in the Great Plains, the temperature dropped almost 30 degrees in a matter of a few hours and a blinding snowstorm came.26 Hundreds of children, leaving school and caught unaware as the blizzard hit, died of hypothermia on their way home. As crude as early weather forecasts were, it was hoped that they might at least be able to provide some warning about an event so severe. So the National Weather Service was moved to the Department of Agriculture and took on a more civilian-facing mission.*
What is it, exactly, that humans can do better than computers that can crunch numbers at seventy-seven teraFLOPS? They can see. Hoke led me onto the forecasting floor, which consisted of a series of workstations marked with blue overhanging signs with such legends as MARITIME FORECAST CENTER and NATIONAL CENTER. Each station was manned by one or two meteorologists-accompanied by an armada of flat-screen monitors that displayed full-color maps of every conceivable type of weather data for every corner of the country. The forecasters worked quietly and quickly, with a certain amount of Grant's military precision.30
Some of the forecasters were drawing on these maps with what appeared to be a light pen, painstakingly adjusting the contours of temperature gradients produced by the computer models-fifteen miles westward over the Mississippi Delta, thirty miles northward into Lake Erie. Gradually, they were bringing them one step closer to the Platonic ideal they were hoping to represent.
The forecasters know the flaws in the computer models. These inevitably arise because-as a consequence of chaos theory-even the most trivial bug in the model can have potentially profound effects. Perhaps the computer tends to be too conservative on forecasting nighttime rainfalls in Seattle when there's a low-pressure system in Puget Sound. Perhaps it doesn't know that the fog in Acadia National Park in Maine will clear up by sunrise if the wind is blowing in one direction, but can linger until midmorning if it's coming from another. These are the sorts of distinctions that forecasters glean over time as they learn to work around the flaws in the model, in the way that a skilled pool player can adjust to the dead spots on the table at his local bar.
...The NWS keeps two different sets of books: one that shows how well the computers are doing by themselves and another that accounts for how much value the humans are contributing. According to the agency's statistics, humans improve the accuracy of precipitation forecasts by about 25 percent over the computer guidance alone,31 and temperature forecasts by about 10 percent.32 Moreover, according to Hoke, these ratios have been relatively constant over time: as much progress as the computers have made, his forecasters continue to add value on top of it. Vision accounts for a lot.
When Hoke began his career, in the mid-'70s, the jokes about weather forecasters had some grounding in truth. On average, for instance, the NWS was missing the high temperature by about 6 degrees when trying to forecast it three days in advance (figure 4-4). That isn't much better than the accuracy you could get just by looking up a table of long-term averages. The partnership between man and machine is paying big dividends, however. Today, the average miss is about 3.5 degrees, meaning that almost half the inaccuracy has been stripped out.
Weather forecasters are also getting better at predicting severe weather. What are your odds of being struck-and killed-by lightning? Actually, the odds are not constant; they depend on how likely you are to be outdoors when lightning hits and unable to seek shelter in time because you didn't have a good forecast. In 1940, the chance of an American being killed by lightning in a given year was about 1 in 400,000.33 Today, it's just 1 chance in 11,000,000, making it almost thirty times less likely. Some of this reflects changes in living patterns (more of our work is done indoors now) and improvement in communications technology and medical care, but it's also because of better weather forecasts.
Perhaps the most impressive gains have been in hurricane forecasting. Just twenty-five years ago, when the National Hurricane Center tried to forecast where a hurricane would hit three days in advance of landfall, it missed by an average of 350 miles.34 That isn't very useful on a human scale. Draw a 350-mile radius outward from New Orleans, for instance, and it covers all points from Houston, Texas, to Tallahassee, Florida (figure 4-5). You can't evacuate an area that large.
Today, however, the average miss is only about one hundred miles, enough to cover only southeastern Louisiana and the southern tip of Mississippi. The hurricane will still hit outside that circle some of the time, but now we are looking at a relatively small area in which an impact is even money or better-small enough that you could plausibly evacuate it seventy-two hours in advance. In 1985, by contrast, it was not until twenty-four hours in advance of landfall that hurricane forecasts displayed the same skill. What this means is that we now have about forty-eight hours of additional warning time before a storm hits-and as we will see later, every hour is critical when it comes to evacuating a city like New Orleans.*
What does bitterly cold mean? A chance of flurries? Just where is the dividing line between partly cloudy and mostly cloudy? The Weather Channel needs to figure this out, and it needs to establish formal rules for doing so, since it issues far too many forecasts for the verbiage to be determined on an ad hoc basis.
Sometimes the need to adapt the forecast to the consumer can take on comical dimensions. For many years, the Weather Channel had indicated rain on their radar maps with green shading (occasionally accompanied by yellow and red for severe storms). At some point in 2001, someone in the marketing department got the bright idea to make rain blue instead-which is, after all, what we think of as the color of water. The Weather Channel was quickly besieged with phone calls from outraged-and occasionally terrified-consumers, some of whom mistook the blue blotches for some kind of heretofore unknown precipitation (plasma storms? radioactive fallout?). "That was a nuclear meltdown," Dr. Rose told me. "Somebody wrote in and said, 'For years you've been telling us that rain is green-and now it's blue? What madness is this?'"
In 2002 an entrepreneur named Eric Floehr, a computer science graduate from Ohio State who was working for MCI, changed that. Floehr simply started collecting data on the forecasts issued by the NWS, the Weather Channel, and AccuWeather, to see if the government model or the private-sector forecasts were more accurate. This was mostly for his own edification at first-a sort of very large-scale science-fair project-but it quickly evolved into a profitable business, ForecastWatch.com, which repackages the data into highly customized reports for clients ranging from energy traders (for whom a fraction of a degree can translate into tens of thousands of dollars) to academics.
Floehr found that there wasn't any one clear overall winner. His data suggests that AccuWeather has the best precipitation forecasts by a small margin, that the Weather Channel has slightly better temperature forecasts, and the government's forecasts are solid all around. They're all pretty good.
But the further out in time these models go, the less accurate they turn out to be (figure 4-6). Forecasts made eight days in advance, for example, demonstrate almost no skill; they beat persistence but are barely better than climatology. And at intervals of nine or more days in advance, the professional forecasts were actually a bit worse than climatology. After a little more than a week, Loft told me, chaos theory completely takes over, and the dynamic memory of the atmosphere erases itself.
...Floehr's finding raises a couple of disturbing questions. It would be one thing if, after seven or eight days, the computer models demonstrated essentially zero skill. But instead, they actually display negative skill: they are worse than what you or I could do sitting around at home and looking up a table of long-term weather averages. How can this be? It is likely because the computer programs, which are hypersensitive to the naturally occurring feedbacks in the weather system, begin to produce feedbacks of their own. It's not merely that there is no longer a signal amid the noise, but that the noise is being amplified.
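To make "skill" concrete: a forecast is usually scored against a naive baseline such as climatology or persistence, and a score below zero means the baseline would have done better on its own. Here is a minimal sketch of that bookkeeping in Python, using invented numbers rather than Floehr's data; the error sizes are chosen only so that a hypothetical long-range forecast beats persistence but loses to climatology, mirroring the pattern described above.

```python
import numpy as np

# Minimal sketch of how forecast "skill" is scored against naive baselines.
# All numbers are invented for illustration; this is not Floehr's data.
rng = np.random.default_rng(0)

observed    = rng.normal(55, 10, size=365)            # observed daily high temperatures
climatology = np.full(365, 55.0)                      # long-term average, same every day
persistence = np.roll(observed, 1)                    # "tomorrow will be like today"
long_range  = observed + rng.normal(0, 12, size=365)  # hypothetical 9-day-out forecast

def mae(forecast, actual):
    """Mean absolute error of a forecast series."""
    return np.mean(np.abs(forecast - actual))

def skill(forecast, baseline, actual):
    """1.0 is perfect, 0.0 merely matches the baseline, below zero is worse than it."""
    return 1.0 - mae(forecast, actual) / mae(baseline, actual)

print("skill vs. persistence: %+.2f" % skill(long_range, persistence, observed))
print("skill vs. climatology: %+.2f" % skill(long_range, climatology, observed))
```

Negative skill against climatology is exactly the situation the paragraph above describes: the table of long-term averages, which requires no computation at all, outperforms the model.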
The bigger question is why, if these longer-term forecasts aren't any good, outlets like the Weather Channel (which publishes ten-day forecasts) and AccuWeather (which ups the ante and goes for fifteen) continue to produce them. Dr. Rose took the position that doing so doesn't really cause any harm; even a forecast based purely on climatology might be of some interest to their consumers.
The statistical reality of accuracy isn't necessarily the governing paradigm when it comes to commercial weather forecasting. It's more the perception of accuracy that adds value in the eyes of the consumer.
For instance, the for-profit weather forecasters rarely predict exactly a 50 percent chance of rain, which might seem wishy-washy and indecisive to consumers.41 Instead, they'll flip a coin and round up to 60, or down to 40, even though this makes the forecasts both less accurate and less honest.42
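The cost of that rounding can be made visible with the Brier score, a standard accuracy measure that penalizes a probability forecast by its squared distance from the outcome (1 if it rained, 0 if it did not), with lower being better. The sketch below uses fabricated days on which the true chance of rain really is 50 percent, not any forecaster's actual record:

```python
import numpy as np

def brier(probs, outcomes):
    """Brier score: mean squared gap between stated probability and outcome. Lower is better."""
    return np.mean((probs - outcomes) ** 2)

rng = np.random.default_rng(1)
n = 100_000

# Fabricated days on which the true chance of rain is exactly 50 percent.
rained = (rng.random(n) < 0.5).astype(float)

honest  = np.full(n, 0.50)                  # say what you mean
rounded = rng.choice([0.40, 0.60], size=n)  # "flip a coin" and steer away from 50

print("honest 50%% forecast: %.3f" % brier(honest, rained))   # about 0.250
print("rounded to 40/60:    %.3f" % brier(rounded, rained))   # about 0.260, reliably worse
```

On a true 50 percent day the honest forecast scores 0.25 every time, while the rounded one averages about 0.26; the rounding buys an air of decisiveness at the expense of measurable accuracy.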
Floehr also uncovered a more flagrant example of fudging the numbers, something that may be the worst-kept secret in the weather industry. Most commercial weather forecasts are biased, and probably deliberately so. In particular, they are biased toward forecasting more precipitation than will actually occur43-what meteorologists call a "wet bias." The further you get from the government's original data, and the more consumer facing the forecasts, the worse this bias becomes. Forecasts "add value" by subtracting accuracy.
...The National Weather Service's forecasts are, it turns out, admirably well calibrated46 (figure 4-7). When they say there is a 20 percent chance of rain, it really does rain 20 percent of the time. They have been making good use of feedback, and their forecasts are honest and accurate. The meteorologists at the Weather Channel will fudge a little bit under certain conditions. Historically, for instance, when they say there is a 20 percent chance of rain, it has actually only rained about 5 percent of the time.47 In fact, this is deliberate and is something the Weather Channel is willing to admit to. It has to do with their economic incentives.
People notice one type of mistake-the failure to predict rain-more than another kind, false alarms. If it rains when it isn't supposed to, they curse the weatherman for ruining their picnic, whereas an unexpectedly sunny day is taken as a serendipitous bonus. It isn't good science, but as Dr. Rose at the Weather Channel acknowledged to me: "If the forecast was objective, if it has zero bias in precipitation, we'd probably be in trouble."
Still, the Weather Channel is a relatively buttoned-down organization-many of their customers mistakenly think they are a government agency-and they play it pretty straight most of the time. Their wet bias is limited to slightly exaggerating the probability of rain when it is unlikely to occur-saying there is a 20 percent chance when they know it is really a 5 or 10 percent chance-covering their butts in the case of an unexpected sprinkle. Otherwise, their forecasts are well calibrated (figure 4-8). When they say there is a 70 percent chance of rain, for instance, that number can be taken at face value.
...Kansas City ought to be a great market for weather forecasting-it has scorching-hot summers, cold winters, tornadoes, and droughts, and it is large enough to be represented by all the major networks. A man there named J. D. Eggleston began tracking local TV forecasts to help his daughter with a fifth-grade classroom project. Eggleston found the analysis so interesting that he continued it for seven months, posting the results to the Freakonomics blog.48
The TV meteorologists weren't placing much emphasis on accuracy. Instead, their forecasts were quite a bit worse than those issued by the National Weather Service, which they could have taken for free from the Internet and reported on the air. And they weren't remotely well calibrated. In Eggleston's study, when a Kansas City meteorologist said there was a 100 percent chance of rain, it failed to rain about one-third of the time (figure 4-9).
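The calibration comparisons behind figures 4-7 through 4-9 come down to a simple tally: group the forecasts by the probability that was stated, then check how often it actually rained within each group. A rough sketch of that tally, using a fabricated forecast log rather than the NWS, Weather Channel, or Kansas City archives:

```python
from collections import defaultdict

# Fabricated forecast log: (stated probability of rain, did it actually rain?)
# A real calibration study would use thousands of archived forecasts.
forecast_log = [
    (0.2, False), (0.2, False), (0.2, True), (0.2, False), (0.2, False),
    (0.7, True), (0.7, True), (0.7, False), (0.7, True),
    (1.0, True), (1.0, True), (1.0, False),   # a "sure thing" that busts
]

# Group outcomes by the probability that was stated.
bins = defaultdict(list)
for stated_prob, rained in forecast_log:
    bins[stated_prob].append(rained)

# A forecaster is well calibrated when the observed frequency in each bin
# matches the stated probability.
for stated_prob in sorted(bins):
    outcomes = bins[stated_prob]
    observed = sum(outcomes) / len(outcomes)
    print(f"said {stated_prob:.0%}: rained {observed:.0%} of the time ({len(outcomes)} forecasts)")
```

A well-calibrated forecaster's observed frequencies track the stated probabilities in every bin; a wet bias shows up as bins where it rained far less often than advertised, and the Kansas City result is the extreme case where even "100 percent" busts a third of the time.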
No one in New York City died from Hurricane Irene in 2011 despite the massive media hype surrounding the storm, but three people died from flooding in landlocked Vermont52 once the TV cameras were turned off.
Evacuation decisions are not easy, in part because evacuations themselves can be deadly; a bus carrying hospital evacuees from another 2005 storm, Hurricane Rita, burst into flames while leaving Houston, killing twenty-three elderly passengers.53
Studies from Katrina and other storms have found that having survived a hurricane makes one less likely to evacuate the next time one comes.57