How can we get the USDA Plant Hardiness GIS data released as Open Data?

At the beginning of this year, the USDA released a new Plant Hardiness Zone Map: http://planthardiness.ars.usda.gov/PHZMWeb/

Images of the maps are available for free download, but the underlying GIS data is not: http://planthardiness.ars.usda.gov/PHZMWeb/DownloadsGIS.aspx

This data set would be incredibly useful for anyone creating a gardening or agriculture-related app, but it is only available through a vendor that provides commercial licenses that are too expensive and restrictive for most independent developers, and useless to anyone who wants to create an open data set. I've sent in several inquiries through the USDA contact form, but haven't heard back. Any suggestions or assistance on getting this data released?

Additional background and context

Climate Source, the data vendor, only sells single user licenses online:
http://www.climatesource.com/cgi-bin/csshop/index.html

Here is their single user license, which, unfortunately, disallows just about anything I could possibly want to do with the data, such as build a web app: http://climatesource.com/cgi-bin/csshop/license.html?id=CYRaLtuj

I contacted Climate Source a few months back for a license that would allow the uses I wanted to make of the data. The following is a summary of the salient points from that email exchange:

1) 'Server' licenses are $6k for a one-year agreement. Additional years are $4k, and they won't do anything longer than a four-year agreement for $18k, certainly not a perpetual and irrevocable license. According to the sales person, these are standard fees and terms for a commercial license.

2) The licensed data may not be republished in bulk. Licensees must put in place measures (such as a CAPTCHA) to prevent 3rd parties from scraping the data, and Climate Source requires prior approval of these measures before the app is live (so much for using this data in creating any open data sets or open source apps). According to the salesperson, these measures are supposedly being required by the USDA and OSU, rather than Climate Source. The salesperson points out the CAPTCHA required to do a zipcode-based lookup on the USDA website as 'proof' of this requirement. 

3) Further, according to the salesperson, the USDA (due to budget cuts) can no longer afford to create new data sets like the plant hardiness maps and just give them away. However, the raw climate data used was already collected by NOAA and is freely available, and I don't think that processing the data to produce the maps could have been all that expensive, regardless of how sophisticated the algorithms used may have been.

To add final insult to injury, Climate Source has themselves been using this data to create various mobile apps, so essentially the licensing fees are a barrier to entry to anyone who might compete with their 'Climate Wise' apps:
http://climatesource.com/climatewise.html
 
One of the challenges I see is getting government to switch from always thinking it has to deliver a service (like a website or an infographic such as this one) to providing the community with the raw data so it can create its own service.

Sounds to me like that is one of the problems being faced here, with the department spending $$$ on developing apps when others could have done it just as well...
 
Not sure on the background of that agreement, Michael, but let me check into the problem.  I'll post back here and get back to you directly with any additional questions.  Thanks for the call out, Alex.
 
[Joining the community just to be allowed to comment on this post]
The 1971-2000 data is available free for non-commercial use
http://prism.oregonstate.edu/products/viewer.phtml?file=/pub/prism/us_30s/grids/tmin/Normals/us_tmin_1971_2000.14.gz&year=1971_2000&vartype=tmin&month=14&status=final

You don't have to speculate as to how complex the algorithms used are.  They're all described in the user's guide: http://www.prism.oregonstate.edu/pub/prism/docs/prisguid.pdf and other papers: http://www.prism.oregonstate.edu/docs/

While the closed source software used to generate the data was almost certainly developed mostly or entirely with public funds, you're unlikely to pry it out of the researcher's hands without a fight after two decades of developing it under a different set of assumptions.
 
+Tom Morris, the 2012 zone maps are based on 1976-2005 data, rather than 1971-2000. Thanks for pointing out the documentation and research papers. However, I am not trying to get access to the software at all (though I suppose that would be nice), much less 'pry it out of the researcher's hands'. I just want the GIS data for the resulting maps.

As for the name, that is interesting, but even if there is a familial relationship between the two, it doesn't seem that it would necessarily be improper, unless one of them was somehow involved in the decision to award the other a USDA contract or grant.
 
Michael, I'm working on this but it's a bit more complicated than originally thought. I will post again tomorrow with more information.

In the meantime, the source raw data is referenced in the paper here http://journals.ametsoc.org/doi/full/10.1175/2010JAMC2536.1 but not in a machine readable format.  Some of the data referenced comes from http://www.census.gov/population/www/cen2000/briefs/phc-t29/index.html and http://prism.oregonstate.edu/

I will keep pursuing how to get this.
 
Data liberation in action. Love it. Will the result be semantic data available via data.gov?
 
Thanks, Jeanne! I am naturally very curious about the complications, and glad that you've actually made progress. 
 
Michael--sorry for the delay, I've been out sick for a few weeks.

The issue in releasing that data is that USDA does not actually own the rights to release the data in its machine readable format beyond the government.  The extended answer directly from USDA is below.

There are many instances of something like this across the government.  For reasons that make sense within a project and funding, agencies may work with a company to add value to data elements (aggregating, visualizing, organizing, etc.) and that company may then essentially own the non-governmental use rights to that data (although most of the time the value-added data is made freely accessible to the public).  Part of the reason to open up data is to allow companies to create economic value from it while at the same time allowing the broadest possible access to the underlying data itself.

The short-term solution is that I could extract the individual tables from the paper referenced below and load those into Data.gov.  And I can do that if it will be helpful.  The long-term solution is to liberate data that is gathered or created by the government under the emerging policy changes you can see at the White House Open Data Initiatives that Data.gov is part of: http://www.whitehouse.gov/innovationfellows/opendata and http://project-open-data.github.com/  We are working aggressively to make this happen.  Stay tuned.

From USDA:  "The Agricultural Research Service (ARS) of the U.S. Department of Agriculture entered into a Specific Cooperative Agreement (SCA) with Oregon State University (OSU) for OSU’s PRISM Climate Group to produce the high resolution GIS data (“shape files” and “grids” – layers of map data) and associated data sets that are the bases for the 2012 USDA Plant Hardiness Zone Map.  The SCA-based partnership allowed the USDA Plant Hardiness Zone Map to make use of a sophisticated proprietary algorithm developed by PRISM Climate Group that was able to interpolate between actual data points, which provided never achieved before accuracy and resolution in a plant hardiness map.
 
Under the SCA, all of the parties contributed resources and worked collaboratively on the project.  The high resolution GIS data, developed solely by the PRISM Group at OSU, are not owned by USDA but, rather, they are solely owned by OSU.  This is different from a Procurement or Services Agreement such as a “work for hire” in which the U.S. Government funds the project and owns all of the results.
 
OSU has granted Climate Source, Inc. exclusive rights to serve the data and documents to requesters.  OSU granted the U.S. Government nonexclusive rights to download GIS data for U.S. Government purposes only; however, the data may not be re-distributed to non-U.S. Government users.  All non-U.S. Government users need to contact Climate Source, Inc. if they wish to obtain the high resolution GIS data.
 
Non-U.S. Government users may also contact the agencies that hold the raw station data incorporated into the USDA Plant Hardiness Zone Map.  A scientific paper describing how the USDA Plant Hardiness Zone Map was created by OSU lists these data sources in Table 2, with URLs for accessing the data (http://journals.ametsoc.org/doi/abs/10.1175/2010JAMC2536.1)."
 
+Jeanne Holm, thanks very much for looking into this. The answer you got from the USDA was very disappointing, but at least I have an answer now.

To answer your question, I think getting the data from the sources listed in Table 2 and putting it on data.gov would be useful to someone, though I am not sure it would be helpful to me. However,  I'll start looking into whether the work done by OSU is replicable in even a crude sense without access to their proprietary software or algorithm. A truly free (and freely reusable) PHZM likely doesn't have to be as accurate as the 'official' one to be better than nothing.

Thanks again, and like Alex, I hope you get well soon.
 
I am somewhat baffled by all this. It looks like the Hardiness Zones are derived by simply applying a lookup table to the normalized "mean coldest month" temperature raster for a specified climate period. This is just a modeling exercise, and the labor involved in applying a pre-existing model does not remotely reflect the cost recovery that USDA is claiming.
Daly et al. (2012) structured the analysis by running the PRISM model on the raw station data to derive the base "mean coldest month" raster used in the classification. You could approximate the model by summarizing the freely available monthly temperature data. You would certainly lose some model precision, but I cannot imagine that it would be enough to change the delineation of the regions in the slightest.
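
For anyone curious about that lookup-table step, here is a minimal Python sketch of the zone classification. It assumes you already have a gridded estimate of mean annual extreme minimum temperature in degrees F as a NumPy array; the PRISM interpolation itself, and any station adjustments behind the official 2012 map, are not reproduced here.

```python
import numpy as np

# USDA hardiness zones are 10 degF bands of mean annual extreme minimum
# temperature, each split into 5 degF "a"/"b" half-zones; zone 1a starts at -60 F.
HALF_ZONE_EDGES_F = np.arange(-60, 71, 5)  # -60, -55, ..., 70

def classify_hardiness(tmin_extreme_f):
    """Map a grid of mean annual extreme minimum temperature (degF)
    to half-zone labels such as '5a' or '5b'. Returns a string array."""
    tmin = np.asarray(tmin_extreme_f, dtype=float)
    idx = np.digitize(tmin, HALF_ZONE_EDGES_F)          # 0 .. len(edges)
    idx = np.clip(idx, 1, len(HALF_ZONE_EDGES_F) - 1)   # clamp out-of-range cells
    zone_number = (idx - 1) // 2 + 1                     # 1 .. 13
    half = np.where((idx - 1) % 2 == 0, "a", "b")
    return np.char.add(zone_number.astype(str), half)

# Toy 2x2 "raster" of extreme minimum temperatures (degF)
print(classify_hardiness([[-22.5, -14.0], [3.2, 41.0]]))
# [['4b' '5b']
#  ['7a' '11a']]
```

The hard part, in other words, is producing a defensible input grid, not the classification itself.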

Granted, Chris Daly has built quite an empire around PRISM but the monthlies are freely available for download (ftp://prism.oregonstate.edu/pub/prism/us/grids).

For some of our published climate modeling I have developed scripts in R that download and summarize PRISM data. It would be quite easy to add the Hardiness Zone classification. This would allow you to summarize any climate period you would like. If anybody is interested I have some of these scripts on my website (http://evansmurphy.wix.com/evansspatial) in the tools section. 
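
As a rough Python analogue of that summarize-then-classify idea, here is a sketch. The monthly_tmin_f array below is a random placeholder standing in for monthly minimum-temperature grids already read from the PRISM downloads (the file reading is omitted), and summarizing monthly means will run warmer than true nightly extremes.

```python
import numpy as np

# Placeholder stack of PRISM-style monthly minimum-temperature grids (degF),
# shaped (years, months, rows, cols). Random values are used only so the
# sketch runs end to end; real values would come from the downloaded rasters.
rng = np.random.default_rng(0)
monthly_tmin_f = rng.normal(loc=20.0, scale=15.0, size=(30, 12, 50, 50))

# Per cell: coldest month of each year, then the average of those annual
# minima across the period. This is only a crude proxy for the "mean annual
# extreme minimum temperature" the official map uses, since monthly means
# understate single-night extremes, so the result will be biased warm.
annual_min_f = monthly_tmin_f.min(axis=1)          # (years, rows, cols)
mean_annual_min_f = annual_min_f.mean(axis=0)      # (rows, cols)

# The summary grid can then go through a classifier like the
# classify_hardiness() sketch earlier in the thread.
print(mean_annual_min_f.shape)
```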

There are also better normalized climate data available (e.g., http://forest.moscowfsl.wsu.edu/climate/). I have never been happy with the lapse rates of PRISM in areas of high topographic relief and have found that nonlinear spline regression approaches outperform the PRISM model. We are currently working on digesting the hourly US weather station data to derive a suite of climate metrics, for any given summary period, using a thin plate spline regression for surface interpolation. This code will be freely available!
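
To give a feel for that interpolation step, here is a minimal Python sketch using SciPy's thin plate spline interpolator. The station coordinates and temperatures are invented purely for illustration, it ignores the elevation and lapse-rate handling a serious model needs, and it is not Jeffrey's code (his scripts are in R).

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Invented station records for illustration: (lon, lat) and an observed
# minimum-temperature value (degF). Real inputs would come from the station
# sources listed in Table 2 of Daly et al. (2012).
stations = np.array([
    [-123.1, 44.0],
    [-122.7, 45.5],
    [-121.3, 44.1],
    [-120.5, 46.6],
    [-118.4, 45.9],
])
tmin_obs_f = np.array([18.0, 22.0, -4.0, 3.0, -8.0])

# Thin plate spline surface through the station values. A smoothing value
# greater than zero turns exact interpolation into a regression-style fit.
tps = RBFInterpolator(stations, tmin_obs_f,
                      kernel="thin_plate_spline", smoothing=0.0)

# Evaluate on a coarse lon/lat grid to get a gridded temperature surface.
lon = np.linspace(-123.5, -118.0, 60)
lat = np.linspace(43.5, 47.0, 40)
lon_g, lat_g = np.meshgrid(lon, lat)
grid_points = np.column_stack([lon_g.ravel(), lat_g.ravel()])
tmin_surface_f = tps(grid_points).reshape(lat.size, lon.size)

print(tmin_surface_f.shape)  # (40, 60)
```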
 
+Jeffrey Evans, that is very welcome news! I am completely overcommitted at the present time, but please keep me posted on your progress, and I'll jump in to help as and where I can!
 
Jeff, thanks for the contribution and help on this thread and bringing it back to life.