Allow + in regex email validation
Save your sanity. Get a pre-made PHP RFC 822 Email address parser
It seems like you aren't really familiar with what your regex is currently doing, and understanding that is a good first step before modifying it. Let's walk through your regex using the email address john.robert.smith@mail.com (each section below ends with a note on the part of the address that section matches):
1. `^` is the start-of-string anchor. It specifies that any match must begin at the beginning of the string. If the pattern is not anchored, the regex engine can match a substring, which is often undesired. Anchors are zero-width, meaning that they do not consume any characters.
2. `[_a-z0-9-]+` is made up of two elements, a character class and a repetition modifier:
   - `[...]` defines a character class, which tells the regex engine that any one of these characters is a valid match. In this case the class contains the letters a-z, the digits 0-9, the dash and the underscore. (In general, a dash in a character class defines a range, so you can use `a-z` instead of `abcdefghijklmnopqrstuvwxyz`; when given as the last character in the class, it acts as a literal dash.)
   - `+` is a repetition modifier that specifies that the preceding token (in this case, the character class) can be repeated one or more times. Two other common repetition operators are `*`, which matches zero or more times, and `?`, which matches zero or one time (i.e. makes something optional).

   (matches `john` in john.robert.smith@mail.com)
3. `(\.[_a-z0-9-]+)*` again contains a repeated character class. It also contains a group and an escaped character:
   - `(...)` defines a group, which allows you to group multiple tokens together (in this case, the group will be repeated as a whole). Let's say we wanted to match 'abc' zero or more times (i.e. abcabcabc matches, abcccc doesn't). If we tried to use the pattern `abc*`, the repetition modifier would only apply to the `c`, because `c` is the last token before the modifier. To get around this, we can group abc (`(abc)*`), in which case the modifier applies to the entire group, as if it were a single token.
   - `\.` specifies a literal dot character. This is needed because `.` is a special character in regex, meaning any character. Since we want to match an actual dot character, we need to escape it.

   (matches `.robert.smith` in john.robert.smith@mail.com)
4. `@` is not a special character in regex, so, like all other non-special characters, it matches literally.

   (matches `@` in john.robert.smith@mail.com)
5. `[a-z0-9-]+` again defines a repeated character class, like item #2 above.

   (matches `mail` in john.robert.smith@mail.com)
6. `(\.[a-z0-9-]+)*` is almost exactly the same pattern as #3 above.

   (matches `.com` in john.robert.smith@mail.com)
7. `$` is the end-of-string anchor. It works the same as `^` above, except that it matches at the end of the string.
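
To make the walkthrough concrete, here is a minimal sketch that applies the seven pieces one after another to the sample address and records what each one matches. It uses Python's `re` module purely for illustration (the pieces themselves are unchanged), and it matches each piece greedily without backtracking between pieces, which happens to be enough for this particular address. The last few lines also check the `abc*` versus `(abc)*` example from the grouping explanation.

```python
import re

address = "john.robert.smith@mail.com"

# The original pattern, split into the seven pieces discussed above.
pieces = [
    r"^",
    r"[_a-z0-9-]+",
    r"(\.[_a-z0-9-]+)*",
    r"@",
    r"[a-z0-9-]+",
    r"(\.[a-z0-9-]+)*",
    r"$",
]

# Match each piece in turn, starting where the previous piece stopped.
pos = 0
matched = []
for piece in pieces:
    m = re.compile(piece, re.IGNORECASE).match(address, pos)
    matched.append(m.group(0))
    pos = m.end()

# Anchors are zero-width, so the first and last entries are empty strings.
print(matched)

# Grouping: without the group, * applies only to the final "c".
abc_star_repeats_c = re.fullmatch(r"abc*", "abcccc") is not None
group_repeats_abc = re.fullmatch(r"(abc)*", "abcabcabc") is not None
group_rejects_cccc = re.fullmatch(r"(abc)*", "abcccc") is None
```

The printed list has one entry per piece, in order, so the entries for pieces 2 through 6 correspond to the notes under items 2 through 6.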
With that in mind, it should be a bit clearer how to add a section that captures a plus segment. As we saw above, `+` is a special character, so it has to be escaped. Then, since the + has to be followed by some characters, we can define a character class with the characters we want to match and define its repetition. Finally, we should make the whole group optional, because email addresses don't need to have a + segment:
(\+[a-z0-9-]+)?
When inserted into your regex, it'd look like this:
/^[_a-z0-9-]+(\.[_a-z0-9-]+)*(\+[a-z0-9-]+)?@[a-z0-9-]+(\.[a-z0-9-]+)*$/i
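
As a quick sanity check, the final pattern can be tried against a few addresses. This is only a sketch: it uses Python's `re` module rather than PHP, with `re.IGNORECASE` standing in for the `/i` flag, and the `is_valid` helper and the sample addresses are made up for illustration.

```python
import re

# The final pattern from above, without the / delimiters.
# re.IGNORECASE plays the role of the trailing i flag.
EMAIL_RE = re.compile(
    r"^[_a-z0-9-]+(\.[_a-z0-9-]+)*(\+[a-z0-9-]+)?@[a-z0-9-]+(\.[a-z0-9-]+)*$",
    re.IGNORECASE,
)


def is_valid(address: str) -> bool:
    """Hypothetical helper: True if the address matches the pattern."""
    return EMAIL_RE.match(address) is not None


samples = [
    "john.robert.smith@mail.com",      # no plus segment: still accepted
    "john.smith+newsletter@mail.com",  # one plus segment: accepted
    "John+Tag@Mail.COM",               # mixed case: accepted via IGNORECASE
    "john+@mail.com",                  # "+" with nothing after it: rejected
    "john+a+b@mail.com",               # two plus segments: rejected
    "john+my_tag@mail.com",            # "_" is not in the plus class: rejected
]

for sample in samples:
    print(sample, is_valid(sample))
```

The last two samples show a consequence of the character class chosen for the plus segment: it allows only one segment, made of letters, digits and dashes. If underscores or dots should be allowed after the +, they would need to be added to that class.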