What characters are allowed in an email address?
Watch out! There is a bunch of knowledge rot in this thread (stuff that used to be true and now isn't).
To avoid false-positive rejections of actual email addresses in the current and future world, and from anywhere in the world, you need to know at least the high-level concept of RFC 3490, "Internationalizing Domain Names in Applications (IDNA)". I know folks in US and A often aren't up on this, but it's already in widespread and rapidly increasing use around the world (mainly the non-English dominated parts).
The gist is that you can now use addresses like mason@日本.com and wildwezyr@fahrvergnügen.net. No, this isn't yet compatible with everything out there (as many have lamented above, even simple qmail-style +ident addresses are often wrongly rejected). But there is an RFC, there's a spec, it's now backed by the IETF and ICANN, and--more importantly--there's a large and growing number of implementations supporting this improvement that are currently in service.
I didn't know much about this development myself until I moved back to Japan and started seeing email addresses like hei@やる.ca and Amazon URLs like this:
http://www.amazon.co.jp/エレクトロニクス-デジタルカメラ-ポータブルオーディオ/b/ref=topnav_storetab_e?ie=UTF8&node=3210981
I know you don't want links to specs, but if you rely solely on the outdated knowledge of hackers on Internet forums, your email validator will end up rejecting email addresses that non-English-speaking users increasingly expect to work. For those users, such validation will be just as annoying as the commonplace brain-dead form that we all hate, the one that can't handle a + or a three-part domain name or whatever.
So I'm not saying it's not a hassle, but the full list of characters "allowed under some/any/none conditions" is (nearly) all characters in all languages. If you want to "accept all valid email addresses (and many invalid too)" then you have to take IDN into account, which basically makes a character-based approach useless (sorry), unless you first convert the internationalized email addresses (dead since September 2015, used to be like this—a working alternative is here) to Punycode.
After doing that you can follow (most of) the advice above.
See RFC 5322: Internet Message Format and, to a lesser extent, RFC 5321: Simple Mail Transfer Protocol.
RFC 822 also covers email addresses, but it deals mostly with its structure:
addr-spec = local-part "@" domain ; global address
local-part = word *("." word) ; uninterpreted
; case-preserved
domain = sub-domain *("." sub-domain)
sub-domain = domain-ref / domain-literal
domain-ref = atom ; symbolic reference
And as usual, Wikipedia has a decent article on email addresses:
The local-part of the email address may use any of these ASCII characters:
- uppercase and lowercase Latin letters
A
toZ
anda
toz
;- digits
0
to9
;- special characters
!#$%&'*+-/=?^_`{|}~
;- dot
.
, provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (e.g.[email protected]
is not allowed but"John..Doe"@example.com
is allowed);- space and
"(),:;<>@[\]
characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash);- comments are allowed with parentheses at either end of the local-part; e.g.
john.smith(comment)@example.com
and(comment)[email protected]
are both equivalent to[email protected]
.
In addition to ASCII characters, as of 2012 you can use international characters above U+007F
, encoded as UTF-8 as described in the RFC 6532 spec and explained on Wikipedia. Note that as of 2019, these standards are still marked as Proposed, but are being rolled out slowly. The changes in this spec essentially added international characters as valid alphanumeric characters (atext) without affecting the rules on allowed & restricted special characters like !#
and @:
.
For validation, see Using a regular expression to validate an email address.
The domain
part is defined as follows:
The Internet standards (Request for Comments) for protocols mandate that component hostname labels may contain only the ASCII letters
a
throughz
(in a case-insensitive manner), the digits0
through9
, and the hyphen (-
). The original specification of hostnames in RFC 952, mandated that labels could not start with a digit or with a hyphen, and must not end with a hyphen. However, a subsequent specification (RFC 1123) permitted hostname labels to start with digits. No other symbols, punctuation characters, or blank spaces are permitted.