So Fedora 19 is named "Schrödinger's Cat", but it turns out that Fedora's bug reporting system can handle neither the apostrophe nor the umlaut.  And they're afraid that this kind of issue will turn up in any number of other places, so the release may be renamed to "Schrodingers Cat".

I'd laugh, if I believed that I had a handle on all the Unicode issues in my own web site.  It's surprising how hard this stuff can be.
+Kam-Yung Soh Okay, apostrophes may be uncommon (in the west), but hyphens?

I thought this was 2013.  You know, the 3rd Millennium and all?  Sheesh.
And I don't even bother to count web sites that don't accept a perfectly legitimate email address with a plus sign in it.
Well if they're really going ASCII-safe, it should be "Schroedingers".  But maybe this is the opportunity they needed to really embrace UTF.
also worth mentioning - an apostrophe in a string that is commonly grep'd out of a file will be a special kind of QA for lots of people's shell scripts. :)
A friend of mine from Germany caused problems there, way back, when he gave his son a middle name containing the Norwegian letter å.
+Jonathan Corbet your web site is currently in a dual state. It can and cannot handle the name "Schrödinger's Cat". But once you try to use it, then you will observe whether or not it does.
Don Marti
I say call it "Schrödinger's 😻" and be done with it. If you're going to do Unicode, do Unicode.
+Don Marti can  we turn the ö into ☠ ?

Schr☠dinger's 😻
Wow. This is lame in so many ways. Instead of fixing the internationalization problems, they change the name...

Unicode has been around for a while now, and we can't have this working properly?
But, actually, it's worse than that. Those characters are part of 8-bit extended ASCII (ISO 8859-1). Is there any system that can run Fedora that doesn't support those???
Ah the fun of asking teachers 'so if the cat and dead cat are both there how is energy conserved'

Our government has enough trouble with ŵ, let alone other symbols. Maybe it's time to start a business called %s.
+Alan Cox Or time to make a web site about a company called AT&T.
+Marc-André Laverdière so - it's not like Fedora is alone in this. Go ahead - put an apostrophe into Debian's release name or Ubuntu's and see what things slide sideways. :)

seriously I think  it is less the unicode and more the apostrophe :)
Mr. Jørgensen, Jørgensen, Jörgensen, J\370rgensen and many more all live here and receive mail, oh yes. And those are just the senders who didn't go for "You have entered an invalid name". Luckily Mr. Jorgensen also lives here in order to receive mail from those.
Too many programmers fell asleep in their quotology classes.
UTF-8 was first mapped out in 1992 by Ken Thompson on a New Jersey diner place mat. It was a refinement of another similar encoding that lacked self-synchronisation.. it really is a simple concept and took no time for Ken Thompson to update Plan 9 to use it.. yet 21 years later, we're still trying to get to grips with it everywhere else. 
+Jürgen Erhard A friend has a compound last name containing a space.  Trying to get this properly recorded in various systems has become a part-time occupation.
+Cormac Long and there lies one of the great lessons of software: Thompson didn't have a large established userbase !
My guess is that many programmers default to simple regexps when validating input, such as [A-Za-z]+ or something like that. I guess what that really means is that we need good input validation/escaping libraries, and lots and lots of test cases :)
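A minimal Python sketch of that failure mode. Both validators are hypothetical, written only for illustration; the second one is still full of assumptions (it would reject a curly apostrophe, for example), which is rather the point about needing test cases.

```python
import re

# Hypothetical validators, for illustration only.
ASCII_NAME = re.compile(r"[A-Za-z]+")
# For str patterns in Python 3, \w is Unicode-aware, so [^\W\d_] means
# "any letter". Letters may be joined by a space, apostrophe or hyphen.
UNICODE_NAME = re.compile(r"[^\W\d_]+(?:[ '\-][^\W\d_]+)*")


def accepts(pattern, name):
    """Return True if the whole name matches the pattern."""
    return pattern.fullmatch(name) is not None


names = ["Jorgensen", "Jørgensen", "Schrödinger's Cat", "Marc-André"]

# Maps each name to (accepted by ASCII rule, accepted by Unicode rule).
results = {n: (accepts(ASCII_NAME, n), accepts(UNICODE_NAME, n)) for n in names}

for name, (ascii_ok, unicode_ok) in results.items():
    print(f"{name!r}: ascii={ascii_ok} unicode={unicode_ok}")
```

Only "Jorgensen" passes the ASCII-only rule; all four names pass the letter-based one.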
+Marc-André Laverdière regexp is pretty much a total botch when it comes to parsing UTF-8.. it literally implies all bets are off when you throw UTF-8 into the expression or source text, as it can see the extended octet sequence as separate octet members of a class instead of as a single character of that class.
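The octet-versus-character confusion can be reproduced in Python by running the same pattern over bytes and over str. This is only a sketch of the effect described above; engines that decode the input first (as Python does for str) do not have the problem.

```python
import re

text = "Schrödinger"
raw = text.encode("utf-8")

# On bytes, "." matches one octet, so the two-octet "ö" counts twice.
byte_hits = re.findall(rb".", raw)
# On str, "." matches one code point.
char_hits = re.findall(r".", text)

# A bytes character class built from "ö" really contains the two octets
# 0xC3 and 0xB6, so it also matches the first octet of "Ã" (0xC3 0x83).
octet_class = re.compile(b"[" + "ö".encode("utf-8") + b"]")
false_hit = octet_class.search("Ã".encode("utf-8")) is not None

# The same class on str treats "ö" as a single member and does not match "Ã".
true_miss = re.search("[ö]", "Ã") is None

print(len(byte_hits), len(char_hits), false_hit, true_miss)
```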
+Alan Cox Good point.. it's a pity we didn't throw more weight into this earlier on. In my day job I deal with the GSM charset for SMS, which is nothing more than a 7-bit abomination. But it was defined long before UTF-8 and at a time when they wanted to cram 160 characters into 140 octets.. and those extra 20 characters mattered. So came the idea of picking a 128-entry pan-European alphabet.. apparently many fights ensued over this.. for example it has 10 uppercase Greek characters.. so we can formulate maths equations over SMS.. you sure as hell cannot text in Greek unless you do so in UCS2.. then it's 2 octets per character, giving you only 70 per message. If we had our time back, UTF-8 would have been the choice... but it's another illustration of your point.
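The arithmetic behind those message lengths, as a small Python sketch. The constant and variable names are made up for illustration; the figures are the ones quoted above.

```python
PAYLOAD_OCTETS = 140                 # size of one SMS payload, as quoted above

payload_bits = PAYLOAD_OCTETS * 8    # 1120 bits available
gsm7_chars = payload_bits // 7       # 7-bit packing: 160 characters
octet_chars = payload_bits // 8      # one octet per character: 140
ucs2_chars = payload_bits // 16      # UCS2, two octets per character: 70

# The "extra 20 characters" bought by packing 7-bit codes tightly.
extra_from_packing = gsm7_chars - octet_chars

print(gsm7_chars, octet_chars, ucs2_chars, extra_from_packing)
```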
I think we should just name everyone "Bruce".  Problem solved.
(Morning Bruce.)
It would be more appropriate if the system was simultaneously able to handle special characters and unable to handle them. :P
+Edward Morbius What about
"First&nbsp;Surname1&nbsp;Surname2"?

Perhaps we can replace "there's an app for that" with "there's a codepoint for that"? :-)
+Ketil Malde when G+ emailed me your comment the email had a text/plain alternative and a text/html alternative.  Naturally my email reader showed me the text/plain by default and it just contains "Firstname Surname1 Surname2" which misses the joke completely (or perhaps that is the joke).  The text/html alternative quoted it properly so I saw the &nbsp; there.
+Ketil Malde That's fine if you're directly inputting the values in a system which allows HTML character entities, but that's an assumption that doesn't hold for all data systems.  More generally:  many systems assume that a space character, or anything that's not an alpha character, is invalid in a name, or is a separator, or ....   So validation rules tend to get in the way.

If you go back to some of the #nymwar  discussions about names, you'll find a number of common assumptions which are frequently wrong.  This article highlights the more general case of false assumptions made regarding names:
+Edward Morbius I wasn't entirely serious, you know.  But if you want to wreak havoc, Unicode probably allows a plethora of ways to spell any name with a variation of codepoints - lots of different whitespace, titlecase, combining characters, and whatnot.
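A short Python sketch of the "many spellings" problem, using the standard `unicodedata` module. The sample strings are illustrative; which normalization form a real system should apply (if any) is a design decision this sketch does not settle.

```python
import unicodedata

composed = "Schr\u00f6dinger"       # ö as a single code point
decomposed = "Schro\u0308dinger"    # o followed by COMBINING DIAERESIS
nbsp_name = "First\u00a0Surname1"   # joined with a NO-BREAK SPACE

# The two spellings render the same but compare unequal as raw strings.
same_raw = composed == decomposed

# NFC composes the combining sequence, so the spellings compare equal.
same_nfc = unicodedata.normalize("NFC", decomposed) == composed

# NFKC additionally folds compatibility characters such as NO-BREAK SPACE
# into their plain equivalents.
folded = unicodedata.normalize("NFKC", nbsp_name)

print(same_raw, same_nfc, folded == "First Surname1")
```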
I don't think this is the normal kind of 'Unicode issue'. This one appears to be pure stupidity. It looks like it would have broken with a legacy 8-bit encoding of "Schrödinger's Cat" too.

The problem was a function which applies heuristics to detect whether a given file contains text or binary. The function just has a fixed percentage of "non-ASCII" bytes which it will tolerate: any more than 2% of bytes >= 0x80 and it considers it binary. For a very small file, that doesn't even allow one character to be outside the ASCII range.

(The only difference that UTF-8 makes here is to adjust the value of "very small" in the above explanation. With ISO8859-1 you need a 50-byte file before it'll tolerate a single non-ASCII character. With UTF-8 you need a 100-byte file before it'll tolerate the same single non-ASCII character.)
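A rough Python reconstruction of that heuristic, to make the 50-byte and 100-byte figures concrete. This is not the actual function from the affected tool; `looks_binary` and its `max_percent` parameter are hypothetical names, and only the 2% rule comes from the description above.

```python
def looks_binary(data: bytes, max_percent: int = 2) -> bool:
    """Hypothetical heuristic: binary if more than max_percent of the
    bytes are >= 0x80. Integer arithmetic avoids rounding surprises."""
    high = sum(1 for b in data if b >= 0x80)
    return high * 100 > max_percent * len(data)


def padded(name: str, encoding: str, size: int) -> bytes:
    """Encode name and pad it with ASCII spaces up to size bytes."""
    raw = name.encode(encoding)
    return raw + b" " * (size - len(raw))


name = "Schrödinger's Cat"

# ISO 8859-1: "ö" is one high byte, tolerated only from 50 bytes upwards.
latin1_49 = looks_binary(padded(name, "latin-1", 49))   # flagged as binary
latin1_50 = looks_binary(padded(name, "latin-1", 50))   # accepted as text

# UTF-8: "ö" is two high bytes, tolerated only from 100 bytes upwards.
utf8_99 = looks_binary(padded(name, "utf-8", 99))       # flagged as binary
utf8_100 = looks_binary(padded(name, "utf-8", 100))     # accepted as text

print(latin1_49, latin1_50, utf8_99, utf8_100)
```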

In fact, in the 21st century this function can be a whole lot saner. Instead of just checking for bytes >= 0x80, it should be checking for bytes >= 0x80 which are not part of a valid UTF-8 byte sequence.

This is one of the things that actually get easier with the switch to using UTF-8 everywhere.
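One way the saner check could look in Python, as a sketch only. `looks_binary_utf8_aware` is a hypothetical name, and keeping the same 2% tolerance is an assumption. It relies on the standard `surrogateescape` error handler, which maps each undecodable byte to a lone surrogate in the range U+DC80 to U+DCFF, so counting those surrogates counts exactly the bytes that are not part of a valid UTF-8 sequence.

```python
def looks_binary_utf8_aware(data: bytes, max_percent: int = 2) -> bool:
    """Hypothetical heuristic: binary if more than max_percent of the
    bytes are high bytes that do not belong to valid UTF-8 sequences."""
    decoded = data.decode("utf-8", errors="surrogateescape")
    # Each undecodable byte 0xNN shows up as the code point U+DC00 + 0xNN.
    stray = sum(1 for ch in decoded if "\udc80" <= ch <= "\udcff")
    return stray * 100 > max_percent * len(data)


name = "Schrödinger's Cat"

# Valid UTF-8 has no stray high bytes, however short the file is.
utf8_verdict = looks_binary_utf8_aware(name.encode("utf-8"))

# The same name in ISO 8859-1 has one stray byte (0xF6) and is still flagged.
latin1_verdict = looks_binary_utf8_aware(name.encode("latin-1"))

# A run of arbitrary high bytes is flagged as before.
junk_verdict = looks_binary_utf8_aware(bytes(range(0x80, 0x100)))

print(utf8_verdict, latin1_verdict, junk_verdict)
```

A real text/binary detector would presumably also look at NUL and control bytes; that is left out here to keep the focus on the UTF-8 check.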