What are the valid characters that can show up in a URL host?
Please see Restrictions on valid host names:
Hostnames are composed of a series of labels concatenated with dots, as are all domain names. For example, "en.wikipedia.org" is a hostname. Each label must be between 1 and 63 characters long, and the entire hostname, including the dots, has a maximum of 253 characters in its textual form (255 octets in the DNS wire format).
RFCs mandate that a hostname's labels may contain only the ASCII letters 'a' through 'z' (case-insensitive), the digits '0' through '9', and the hyphen. Hostname labels cannot begin or end with a hyphen. No other symbols, punctuation characters, or blank spaces are permitted.
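The rules quoted above can be sketched as a small Python check. This is only an illustration: the names `is_valid_hostname`, `LABEL_PATTERN`, and `MAX_HOSTNAME_LENGTH` are made up for this example, and the overall limit of 253 assumes the dotted text form without a trailing dot.

```python
import re

# One label: 1 to 63 characters from a-z, 0-9 and '-', case-insensitive,
# not starting or ending with a hyphen.
LABEL_PATTERN = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")

# Assumption: limit for the dotted text form (255 octets on the wire).
MAX_HOSTNAME_LENGTH = 253


def is_valid_hostname(hostname: str) -> bool:
    """Return True if every label obeys the letter/digit/hyphen rules."""
    if not hostname or len(hostname) > MAX_HOSTNAME_LENGTH:
        return False
    # fullmatch rejects anything outside the allowed set, including
    # spaces, underscores, and empty labels produced by "a..b".
    return all(LABEL_PATTERN.fullmatch(label) for label in hostname.split("."))


print(is_valid_hostname("en.wikipedia.org"))  # True
print(is_valid_hostname("-bad.example"))      # False
```

It only checks the already-ASCII form of a name; it says nothing about whether the host exists or resolves.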
It depends on the level at which you do the validation (before or after the URL escaping). If you are validating user input, the host can go way beyond ASCII (with big chunks of Unicode).
See http://en.wikipedia.org/wiki/Internationalized_domain_name
If you validate after all the escaping and the Punycode conversion is done, there is little point in checking the characters, since the result is already guaranteed to contain only the characters permitted by the old RFC.
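To illustrate the "before versus after" distinction, here is a minimal sketch using Python's built-in `idna` codec. The helper name `to_ascii_hostname` is hypothetical, and the built-in codec follows the older IDNA 2003 rules, so treat it as a demonstration rather than a complete IDN implementation.

```python
def to_ascii_hostname(user_input: str) -> str:
    """Convert a possibly non-ASCII hostname to its ASCII (Punycode) form."""
    # The "idna" codec encodes each label separately; labels that are
    # already ASCII pass through unchanged.
    return user_input.encode("idna").decode("ascii")


print(to_ascii_hostname("bücher.example"))    # xn--bcher-kva.example
print(to_ascii_hostname("en.wikipedia.org"))  # en.wikipedia.org
```

The converted name contains only letters, digits, hyphens, and dots, which is why character validation at that stage adds little.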
No, that is all that is allowed.
Here is a reference if you would like to read more: http://www.ietf.org/rfc/rfc1034.txt