This is a great initiative (one that, sadly, I can contribute absolutely nothing to myself, but if you can help, please do!), and +Eric Haines brings up a good point in general: a lot of valuable digital resources go 404 after a number of years.
However, one problem with archiving is that the web is a living, changing thing. Yes, storage is cheap, and we should archive things. But without digging into a particular case, you can't tell whether an article went 404 because the author removed it as inaccurate, because the domain was sold, or because the storage device and its backups failed. So context, and seeing the change over time, is important. I think that to effectively back up your favourite parts of the web, you have to put them in version control instead of zips, though zips are still a better alternative than nothing.
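To make that concrete, here is a minimal sketch, in Python, of what version control buys you over zips: each run becomes a commit, so "git log -p" later shows exactly when and how a page changed. The repo path and URL list are hypothetical placeholders, and it assumes a git client is on the PATH.

```python
# Minimal sketch: snapshot a few pages into a git repo, one commit per run.
# REPO_DIR and URLS are hypothetical placeholders; assumes `git` is installed.
import subprocess
import urllib.request
from pathlib import Path

REPO_DIR = Path("web-archive")
URLS = ["https://example.com/some-article"]

def git(*args):
    subprocess.run(["git", "-C", str(REPO_DIR), *args], check=True)

REPO_DIR.mkdir(exist_ok=True)
if not (REPO_DIR / ".git").exists():
    git("init")

for url in URLS:
    # Naive but stable filename derived from the URL.
    name = url.split("//", 1)[-1].replace("/", "_") + ".html"
    with urllib.request.urlopen(url) as resp:
        (REPO_DIR / name).write_bytes(resp.read())

git("add", "-A")
# `git commit` exits non-zero when nothing changed, which is fine here,
# so no check=True on this call.
subprocess.run(["git", "-C", str(REPO_DIR), "commit", "-m", "page snapshot"])
```

Run that from cron once a day and the repo becomes a timeline of your favourite pages, diffs included.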
So, I looked around and found this site: http://contentmine.org/. They also have a scraper (https://github.com/ContentMine/quickscrape) with parser definitions for some scientific journals, and you can write your own. Something like this, saving its output in a git repository on a large storage device (with a working git client on the disk as well), would work quite nicely; a rough sketch is at the end of this comment. It could also be super helpful in case of a Zombie Apocalypse, or if you find yourself stranded on Mars.
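For what it's worth, wiring quickscrape into the same archive repo might look roughly like this. The flag names are from memory of the quickscrape README (verify with "quickscrape --help"), and the article URL and scraper path are just examples, not anything canonical:

```python
# Hedged sketch of the quickscrape-plus-git idea. Flag names below are from
# memory of the quickscrape README (check `quickscrape --help`); the scraper
# path assumes a checkout of ContentMine's journal-scrapers repo.
import subprocess

ARCHIVE = "web-archive"  # same hypothetical repo as in the sketch above

subprocess.run([
    "quickscrape",
    "--url", "https://peerj.com/articles/384",            # example article
    "--scraper", "journal-scrapers/scrapers/peerj.json",  # a parser definition
    "--output", f"{ARCHIVE}/peerj-384",
], check=True)

# Record the scrape as a commit so a later re-scrape shows up as a diff.
subprocess.run(["git", "-C", ARCHIVE, "add", "-A"], check=True)
subprocess.run(["git", "-C", ARCHIVE, "commit", "-m", "quickscrape snapshot"])
```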