Extracting email from text string in excel/google spreadsheet
For Google Spreadsheets
Google Spreadsheets has these cool already-builtin regex formulas
- REGEXEXTRACT, REGEXMATCH, REGEXREPLACE
We use the first one to extract mail adresses. Put this formula in a Google Spreadsheet cell:
=iferror(Regexextract(A1;"[A-z0-9._%+-]+@[A-z0-9.-]+\.[A-z]{2,4}");"")
A1
is the cell where the string to check (mail body in your case) should reside- Complete formula consist of 2 parts. The inner part is the regex formula and the outer part is for error prevention
Regexextract(A1,"\[A-z0-9._%+-\]+@\[A-z0-9.-\]+\.\[A-z\]{2,4}")
returns the mail addressiferror(innerformula,"")
prevents#N/A
when the regex formula wasn't able to return anything e.g no valid mail address was found
How does the regex pattern work?
[A-z0-9._%+-]+@[A-z0-9.-]+.[A-z]{2,4}
A-z
represents any character between anA
and az
.
Note the uppercase and lowercase. This way it's case-insensitive0-9
represents any digit._%+-
represent those signs itself[ ]
represents a single character which is allowed inside the brackets- Putting a
+
sign behind[ ]
allows the previous pattern to be repeated infinitely @
has no special meaning. It literally searches for a@
sign[A-z0-9.-]+
is the same as above. But_%+
are not allowed behind a@
sign this time\.
searches for a single point. It has to be escaped with a preceeding\
because.
normally is a placeholder any character- At last
[A-z]{2,4}
searches for 2,3 or 4 case-insensitive characters
Used ressources
- http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/