Guidelines for accepting email messages as input to an application

Some guidelines and considerations:

The address question: The best approach is to use the "+" extension part of an email address (myaddr**+custom**@gmail.com). This makes messages easier to route and, above all, makes it easier to keep track of which address routed to your system. Other techniques use a token in the subject.
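
As a rough sketch of the "+" extension idea, the snippet below splits a recipient address into base mailbox, tag, and domain using Python's standard `email.utils.parseaddr`. The function name `split_plus_address` is made up for illustration, and whether "+" tags are delivered at all depends on how your mail server is configured.

```python
from email.utils import parseaddr


def split_plus_address(header_value):
    """Return (base, tag, domain) for an address, or None if it has no '@'.

    tag is None when the local part carries no '+' extension.
    """
    _display_name, addr = parseaddr(header_value)
    if "@" not in addr:
        return None
    local, _, domain = addr.rpartition("@")
    base, plus, tag = local.partition("+")
    return base, (tag if plus else None), domain.lower()
```

The tag (here `custom`) can then be used as the routing key inside the application.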

Spam: Do spam processing outside the app, and have the app filter based on a header.
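
A minimal sketch of the header-based filter, assuming the upstream spam filter marks messages with an `X-Spam-Flag: YES` header (SpamAssassin does this; other filters use other headers, so treat the header name as an assumption and adjust it):

```python
from email import message_from_string


def is_flagged_spam(msg, header_name="X-Spam-Flag"):
    """True when the upstream filter marked this message as spam."""
    value = msg.get(header_name)
    return value is not None and str(value).strip().upper() == "YES"
```

The app itself never scores anything; it only trusts the verdict written by the filter in front of it.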

Queuing failed messages: Don't, for the most part. Standard mail servers keep retrying delivery for days (RFC 5321 suggests giving up only after at least 4-5 days). For an application email server, all this does is create giant spool files of mail you'll most likely never process. Only queue messages if the reason for the failure is out of your control (e.g., a server is down).
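
One way to picture that rule is to split failures into two kinds and queue only one of them. The exception classes and the `receive` function below are hypothetical names for this sketch, not part of any mail library:

```python
class TransientFailure(Exception):
    """Hypothetical: the cause is outside our control (database down, disk full)."""


class PermanentFailure(Exception):
    """Hypothetical: the message itself can never be processed."""


def receive(raw_message, process, retry_queue):
    """Run process(raw_message); queue it only for transient failures."""
    try:
        process(raw_message)
        return "processed"
    except TransientFailure:
        retry_queue.append(raw_message)  # worth retrying once we recover
        return "queued"
    except PermanentFailure:
        return "dropped"  # retrying would only grow the spool
```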

Invalid message handling: There are multiple ways a message can be invalid. Some are limitations of the library (it can't parse the address, even though it's RFC-valid). Others are caused by broken clients (e.g., omitting quotes around certain headers). Others might be too large, use an unknown encoding, be missing critical headers, have multiple values where there should only be one, violate some semantic specific to your application, and so on. Basically, wherever the Java mail API could throw an exception is an error case you must decide how to handle.
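
A few of those checks, sketched with Python's standard `email` package rather than the Java mail API. The size limit, the header lists, and the name `find_problems` are all assumptions for illustration; a real application would pick its own:

```python
import codecs
from email import message_from_bytes

REQUIRED_HEADERS = ("From", "To")                  # assumed minimum
SINGLE_HEADERS = ("From", "To", "Subject", "Date")  # should appear at most once


def find_problems(raw, max_bytes=10 * 1024 * 1024):
    """Return a list of human-readable problems; empty means acceptable."""
    problems = []
    if len(raw) > max_bytes:
        problems.append(f"too large ({len(raw)} bytes)")
    msg = message_from_bytes(raw)
    for name in REQUIRED_HEADERS:
        if msg.get(name) is None:
            problems.append(f"missing {name}")
    for name in SINGLE_HEADERS:
        if len(msg.get_all(name, [])) > 1:
            problems.append(f"multiple {name}")
    for part in msg.walk():
        charset = part.get_content_charset()
        if charset:
            try:
                codecs.lookup(charset)
            except LookupError:
                problems.append(f"unknown charset {charset}")
    return problems
```

Collecting every problem instead of stopping at the first makes it easier to decide afterwards which ones deserve a rejection and which can be tolerated.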

Error responses: Not every error deserves a response. Some are generated because of spam, and you should avoid sending messages back to those addresses. Others are from automated systems (yourself, a vacation responder, another application mail system, etc), and if you reply, it'll send you another message, repeating the cycle.
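
A hedged sketch of a "should I reply at all?" check. It relies on conventions rather than guarantees: the `Auto-Submitted` header from RFC 3834, the de facto `Precedence` values, `List-Id` for mailing lists, and an empty `Return-Path` (`<>`) for bounces. The function name is illustrative, and well-behaved senders set these headers while many others don't:

```python
from email import message_from_string


def should_reply(msg):
    """False when replying would likely feed a loop or answer a machine."""
    auto = str(msg.get("Auto-Submitted") or "no").strip().lower()
    if auto != "no":
        return False  # auto-generated or auto-replied
    precedence = str(msg.get("Precedence") or "").strip().lower()
    if precedence in {"bulk", "junk", "list"}:
        return False
    if msg.get("List-Id") is not None:
        return False  # came through a mailing list
    if str(msg.get("Return-Path") or "").strip() == "<>":
        return False  # bounce: null envelope sender
    return True
```

When the app does send an error response, marking it with `Auto-Submitted: auto-replied` gives the other side the same chance to break the cycle.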

Client-specific hacks: As above, each client has little differences that'll complicate your code. Keep this in mind any time you traverse the structure of a message.
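
For example, a defensive traversal might look like the sketch below, which uses Python's standard `email` package and assumes nothing about nesting depth, part order, or the declared charset being real. The name `first_text_body` is made up for this example:

```python
from email.message import EmailMessage


def first_text_body(msg):
    """Return the first inline text/plain part as str, or None."""
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no text of their own
        if part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:  # client declared a charset Python doesn't know
            return payload.decode("utf-8", errors="replace")
    return None
```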

Senders, replies, and loops: Depending on your situation, you might receive mail from some of the following sources:

  • Real people, maybe from external sources
  • Mailing lists
  • Yourself, or one of your own recipient addresses
  • Other mail servers (bounces, failures, etc)
  • Entity in another system ([email protected], system-monitor@localhost)
  • An automated system
  • An alias to one of the above
  • An alias to an alias

Now, your first instinct is probably "Only accept mail from correct sources!", but that'll cause you lots of headaches down the line, because people will send the damnedest things to an application mail server. I find it's better to accept everything and explicitly deny the exceptions.
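
"Accept everything, deny the exceptions" can be sketched as a short list of deny rules with acceptance as the default. The addresses, rule names, and the `deny_reason` function are hypothetical; the point is only the shape:

```python
from email import message_from_string
from email.utils import parseaddr

OWN_ADDRESSES = {"myaddr@example.com"}             # hypothetical app address
SYSTEM_MAILBOXES = {"mailer-daemon", "postmaster"}  # assumed list


def _sender(msg):
    """Lower-cased (local part, domain) of the From header."""
    _name, addr = parseaddr(str(msg.get("From", "")))
    local, _, domain = addr.lower().rpartition("@")
    return local, domain


def from_ourselves(msg):
    local, domain = _sender(msg)
    return f"{local.partition('+')[0]}@{domain}" in OWN_ADDRESSES


def from_mail_system(msg):
    return _sender(msg)[0] in SYSTEM_MAILBOXES


DENY_RULES = [("from ourselves", from_ourselves),
              ("from a mail system", from_mail_system)]


def deny_reason(msg):
    """Return why the message is denied, or None to accept it."""
    for reason, rule in DENY_RULES:
        if rule(msg):
            return reason
    return None
```

New exceptions become new entries in `DENY_RULES`; anything nobody anticipated still gets through.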

Debugging: Save a copy of the headers of any message you receive. This will help out tremendously anytime you have a problem.
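
A small sketch of pulling just the headers out of a raw message so they can be written to a log or a database row. It uses `BytesHeaderParser` from Python's standard library, which stops at the end of the header block; where the result is stored is left open, and `headers_for_log` is an illustrative name:

```python
from email.parser import BytesHeaderParser


def headers_for_log(raw):
    """Return the message headers as a list of (name, value) strings."""
    msg = BytesHeaderParser().parsebytes(raw)
    return [(name, str(value)) for name, value in msg.items()]
```

Keeping the original order and duplicates (rather than building a dict) preserves the `Received:` chain, which is often what you need when tracing a problem.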

--Edit--

I bought the book, Building Scalable Web Sites, mentioned by rossfabricant. It -does- have a good email section. Two important points it covers are handling email from wireless carriers and authenticating emails.


You can set the address that the email is sent from, which is what ends up in the To: field when someone just presses 'Reply'. Make that unique, and you'll be able to tell which message a reply came from and where it must be directed.

When it comes to the display name shown beside the address ('"something here"'), pick something that invites people to just reply to the mail. I've seen one major web app with email capturing that uses 'do not reply', which turns people off from actually sending anything to it.
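
Put together, an outgoing notification might be built like the sketch below, using Python's standard `email` package. The address, the display name, and the token format are placeholders invented for the example:

```python
from email.message import EmailMessage
from email.utils import formataddr


def build_notification(reply_token, body):
    """Build a message whose From address is unique to this conversation."""
    msg = EmailMessage()
    # A friendly display name, plus a per-message tag in the address itself.
    msg["From"] = formataddr(
        ("Reply to this comment", f"myaddr+{reply_token}@example.com")
    )
    msg["Subject"] = "New comment"
    msg.set_content(body)
    return msg
```

When the reply arrives, the tag in its To: address says which conversation it belongs to. Setting a separate `Reply-To:` header with the tagged address is a common variation if the From address has to stay fixed.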


Building Scalable Web Sites has a nice section on handling email. It's written by a Flickr developer.
