Extracting email from text string in excel/google spreadsheet

For Google Spreadsheets

Google Spreadsheets has these cool already-builtin regex formulas

  • REGEXEXTRACT, REGEXMATCH, REGEXREPLACE

We use the first one to extract mail adresses. Put this formula in a Google Spreadsheet cell:

=iferror(Regexextract(A1;"[A-z0-9._%+-]+@[A-z0-9.-]+\.[A-z]{2,4}");"")
  • A1 is the cell where the string to check (mail body in your case) should reside
  • Complete formula consist of 2 parts. The inner part is the regex formula and the outer part is for error prevention
  • Regexextract(A1,"\[A-z0-9._%+-\]+@\[A-z0-9.-\]+\.\[A-z\]{2,4}") returns the mail address
  • iferror(innerformula,"") prevents #N/A when the regex formula wasn't able to return anything e.g no valid mail address was found

enter image description here

How does the regex pattern work?

[A-z0-9._%+-]+@[A-z0-9.-]+.[A-z]{2,4}

  • A-z represents any character between an A and a z.
    Note the uppercase and lowercase. This way it's case-insensitive
  • 0-9 represents any digit
  • ._%+- represent those signs itself
  • [ ] represents a single character which is allowed inside the brackets
  • Putting a + sign behind [ ] allows the previous pattern to be repeated infinitely
  • @ has no special meaning. It literally searches for a @ sign
  • [A-z0-9.-]+ is the same as above. But _%+ are not allowed behind a @ sign this time
  • \. searches for a single point. It has to be escaped with a preceeding \ because . normally is a placeholder any character
  • At last [A-z]{2,4} searches for 2,3 or 4 case-insensitive characters

Used ressources

  • http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/