Interesting topic. Some thoughts (but no punch line, sorry):
* Size matters, but the degree varies with how the data is used, particularly, as Rasmus notes, how ephemeral or persistent it is. Breaking the tokener's operation on new text is one thing, but imagine if all the JSON data out there were potentially impacted and had to be adapted. Y2K fun!
* So, agreed, in some situations carefully engineering data formats is vital. But one could argue that a great proportion of the time it's actually OVER-engineering. I recall Alan Kay complaining bitterly long ago about the vast amount of human time and talent that's been wasted worrying about packing bits into boxes instead of delivering value, and I feel he has a point.
* For instance, I suspect in Doug's case here all the code REALLY cares about is the semantics of mathematical integers--incrementing, comparing, etc. That an "int" declaration also carries a size limit is often an overspecification, a gratuitous gotcha clause in the canned type's contract that's just there waiting to be broken. Ideally, declarations should be exactly necessary and sufficient (the converse problem is under-specification: being unable to assert that, say, a value is a POSITIVE integer).
* Moreover, how often does anyone actually want bizarre mod-256 semantics? (There's a tiny illustration of both gotchas after this list.)
* One could imagine a runtime environment where overflow would simply cause the dataflow to adapt, preserving the "integer" semantics without the size encumbrance.
* In fact, the tagged-data architecture used in the old Lisp machines et al. supported exactly that. Most small operands just zipped through the hardware ALU, with overflow exceptions triggering a graceful roll-over into bignums. Of course, back then folks also argued memory miserliness was a virtue. Today the net advantage of sprinkling in a few tag bits to make data more self-describing seems like it might be worth revisiting. (The second sketch below gestures at how that roll-over might look.)
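
To make the "gotcha clause" concrete, here's a minimal illustration (Java, chosen only because its fixed-size ints make the point tersely; the class name is made up): the declared size quietly changes the arithmetic the code thinks it is doing, once for a 32-bit int and once with the classic mod-256 byte.

    // Hypothetical demo, just to show the hidden size limit biting.
    public class OverflowGotcha {
        public static void main(String[] args) {
            int counter = Integer.MAX_VALUE;     // 2147483647, the unasked-for ceiling
            counter++;                           // silently wraps to -2147483648
            System.out.println(counter);

            byte b = (byte) 200;                 // mod-256 semantics: prints -56
            System.out.println(b);
        }
    }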
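And a rough sketch, in the same vein, of the fixnum/bignum roll-over idea: stay on the fast machine-word path until an overflow is detected, then widen without the caller noticing. This is only an assumed shape (the class and method names are invented for the example), not how any particular Lisp machine or runtime actually did it.

    import java.math.BigInteger;

    // Illustrative only: a value that keeps "integer" semantics and sheds the size limit.
    final class AdaptiveInt {
        private final long small;        // fast path, the "fixnum" case
        private final BigInteger big;    // slow path, the "bignum" case (null while small is valid)

        private AdaptiveInt(long small, BigInteger big) { this.small = small; this.big = big; }

        static AdaptiveInt of(long v) { return new AdaptiveInt(v, null); }

        AdaptiveInt add(AdaptiveInt other) {
            if (big == null && other.big == null) {
                try {
                    // Hardware-speed add; Math.addExact throws on overflow.
                    return of(Math.addExact(small, other.small));
                } catch (ArithmeticException overflow) {
                    // Graceful roll-over: fall through and widen.
                }
            }
            return new AdaptiveInt(0, toBig().add(other.toBig()));
        }

        BigInteger toBig() { return big != null ? big : BigInteger.valueOf(small); }

        @Override public String toString() { return toBig().toString(); }
    }

The caller never sees a size limit; the only trace of it is the cheap overflow check, which is roughly what the tag bits bought the Lisp machines in hardware.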