Unicode characters in URLs
What Tgr said. Background:
http://www.example.com/düsseldorf?neighbourhood=Lörick
That's not a URI. But it is an IRI.
You can't include an IRI in an HTML4 document; the type of attributes like href
is defined as URI and not IRI. Some browsers will handle an IRI here anyway, but it's not really a good idea.
To encode an IRI into a URI, take the path and query parts, UTF-8-encode them then percent-encode the non-ASCII bytes:
http://www.example.com/d%C3%BCsseldorf?neighbourhood=L%C3%B6rick
If there are non-ASCII characters in the hostname part of the IRI, eg. http://例え.テスト/
, they have be encoded using Punycode instead.
Now you have a URI. It's an ugly URI. But most browsers will hide that for you: copy and paste it into the address bar or follow it in a link and you'll see it displayed with the original Unicode characters. Wikipedia have been using this for years, eg.:
http://en.wikipedia.org/wiki/ɸ
The one browser whose behaviour is unpredictable and doesn't always display the pretty IRI version is...
...well, you know.
Use percent encoding. Modern browsers will take care of display & paste issues and make it human-readable. E. g. http://ko.wikipedia.org/wiki/위키백과:대문
Edit: when you copy such an url in Firefox, the clipboard will hold the percent-encoded form (which is usually a good thing), but if you copy only a part of it, it will remain unencoded.