Interesting post on HTML escaping. While it's a good thing to keep in mind, I do find that it is somewhat over-hyped.

The feeling that I get from a quick glance at the article is "warning: if you escape only &, <, > and ", then a clever attacker can use a tricky sequence of characters to pwn your page." But really, after reading it all in detail, the message is a far less alarming "warning: if you escape only &, <, > and ", then a stupid programmer can insert the result into a bare (or single-quoted) attribute value, and then open up the program to a very obvious attack."
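To make that concrete, here is a rough Java sketch of the failure mode as I understand it. The class and method names are made up for illustration, and `escapeFour` is just my guess at what a four-character escaper looks like, not code from the article.

```java
// Hypothetical demo: a four-character escaper used in three attribute contexts.
final class FourCharEscapeDemo {
    private FourCharEscapeDemo() {}

    // Escapes only &, <, > and ". The ampersand is replaced first so the
    // entities produced by the later replacements are not escaped twice.
    static String escapeFour(String input) {
        return input.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace("\"", "&quot;");
    }

    // Double-quoted attribute: a " in the input becomes &quot;, so the value cannot end early.
    static String doubleQuoted(String value) {
        return "<input value=\"" + escapeFour(value) + "\">";
    }

    // Single-quoted attribute: ' is not escaped, so the input can close the value.
    static String singleQuoted(String value) {
        return "<input value='" + escapeFour(value) + "'>";
    }

    // Bare (unquoted) attribute: a plain space is enough to end the value.
    static String bare(String value) {
        return "<input value=" + escapeFour(value) + ">";
    }
}
```

With the input `x' onmouseover='alert(1)`, `singleQuoted` yields `<input value='x' onmouseover='alert(1)'>`, which now carries an extra attribute chosen by the attacker. The equivalent input with double quotes stays inside the value when passed to `doubleQuoted`.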

In other words, if you escape only those four characters and then only ever insert the result into ordinary HTML text content or double-quoted attribute values, then you should be 100% safe. (I THINK.) For good measure, you can also escape ' so that the result can be inserted into single-quoted attribute values as well. I don't see a need to escape a ton of other characters just in case someone stupidly puts it in a bare (unquoted) attribute value.
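For what it's worth, a minimal sketch of the kind of escape function I have in mind, in Java. The class name `HtmlEscaper` is hypothetical, and this assumes the output only ever goes into HTML text content or a quoted attribute value. It makes no attempt to cover unquoted attributes, script blocks, URLs or encoding tricks.

```java
// Hypothetical helper: escapes the five characters discussed above.
final class HtmlEscaper {
    private HtmlEscaper() {}

    // Returns a copy of input with &, <, >, " and ' replaced by character references.
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break; // numeric reference for the single quote
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

Something like `"<input value='" + HtmlEscaper.escape(userInput) + "'>"` is the intended use. Walking the string once means the ampersands produced by the escaper itself are never escaped a second time.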
This was exaggeration for effect—there aren't many cases where a simple XSS injection could actually empty a bank account—but I wanted to make a point. By some coincidence, I've found myself w...
6 comments
 
The UTF-7 hack (last item) is a trick I'd never heard of before, and certainly worth knowing about.
 
What? I didn't think that was a good thing... For one thing, it means more options for exploits. For another, it seems IE-specific.

Or when you say "trick" do you mean the bad kind?
 
Yeah, the worst kind of trick - the kind it's good to be told about so it never happens to you ;)
 
This type of issue is one of the best arguments against "rolling your own" code instead of making the effort to find a well-tested library and use it. Your sentence "if you only do this and then only use it like this, you should be 100% safe (I THINK)" sums it up perfectly :)
 
True, sort of. First, it doesn't nullify the academic exercise of working out exactly what needs to be escaped. I still want to know precisely which characters are being escaped before I use a third-party library. Second, I was writing a Java tutorial, and since Java has shit-all built-in support for HTML escaping, I thought it might be easier to provide a simple and correct HTML escape function than to bring in a full third-party library for the sake of one example.
 
Ah, well, for an academic exercise this is all fine. And if I had to trust anyone to re-write something difficult, I'd be happy if it was you.