Where to find the body of an email depending on its mimeType
I think it will make sense if you think of the payload as a part in and of itself. Let's say I send a message with just a subject and a plain text body:
From: [email protected]
To: [email protected]
Subject: Example Subject

This is the plain text message
This will result in the following parsed message:
{
  "id": "154ecb53c10b74d8",
  "threadId": "154ecb53c10b74d8",
  "labelIds": [
    "INBOX",
    "SENT"
  ],
  "snippet": "This is the plain text message",
  "historyId": "38877",
  "internalDate": "1464260181000",
  "payload": {
    "partId": "",
    "mimeType": "text/plain",
    "filename": "",
    "headers": [
      ...
    ],
    "body": {
      "size": 31,
      "data": "VGhpcyBpcyB0aGUgcGxhaW4gdGV4dCBtZXNzYWdlCg=="
    }
  },
  "sizeEstimate": 355
}
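To make the body.data field concrete, here is a minimal sketch that decodes the value from the message above. It assumes a Node.js runtime (for Buffer), and decodeBody is a hypothetical helper name, not part of the Gmail API.

```javascript
// Decode the base64url string found in payload.body.data into UTF-8 text.
// Assumes Node.js (Buffer); decodeBody is a hypothetical helper name.
function decodeBody(data) {
  // Gmail uses the URL-safe alphabet: '-' and '_' instead of '+' and '/'.
  const base64 = data.replace(/-/g, '+').replace(/_/g, '/');
  return Buffer.from(base64, 'base64').toString('utf8');
}

// Only the fields needed for the example, copied from the parsed message.
const message = {
  payload: {
    mimeType: 'text/plain',
    body: { size: 31, data: 'VGhpcyBpcyB0aGUgcGxhaW4gdGV4dCBtZXNzYWdlCg==' }
  }
};

const text = decodeBody(message.payload.body.data);
console.log(JSON.stringify(text)); // "This is the plain text message\n"
```

The decoded text is 31 bytes long, which matches the size reported in body.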
If I send a message with a plain text part, an HTML part, and an image, it will look like this when parsed:
{
  "id": "154ed5ccaa12f3df",
  "threadId": "154ed5ccaa12f3df",
  "labelIds": [
    "SENT",
    "INBOX",
    "IMPORTANT"
  ],
  "snippet": "This is a plain/html message with an image.",
  "historyId": "841379",
  "internalDate": "1464271162000",
  "payload": {
    "mimeType": "multipart/mixed",
    "filename": "",
    "headers": [
      ...
    ],
    "body": {
      "size": 0
    },
    "parts": [
      {
        "mimeType": "multipart/alternative",
        "filename": "",
        "headers": [
          {
            "name": "Content-Type",
            "value": "multipart/alternative; boundary=089e0122896c7c80d80533bf3205"
          }
        ],
        "body": {
          "size": 0
        },
        "parts": [
          {
            "partId": "0.0",
            "mimeType": "text/plain",
            "filename": "",
            "headers": [
              {
                "name": "Content-Type",
                "value": "text/plain; charset=UTF-8"
              }
            ],
            "body": {
              "size": 47,
              "data": "VGhpcyBpcyBhIHBsYWluL2h0bWwgKm1lc3NhZ2UqIHdpdGggYW4gaW1hZ2UuDQo="
            }
          },
          {
            "partId": "0.1",
            "mimeType": "text/html",
            "filename": "",
            "headers": [
              {
                "name": "Content-Type",
                "value": "text/html; charset=UTF-8"
              }
            ],
            "body": {
              "size": 73,
              "data": "PGRpdiBkaXI9Imx0ciI-VGhpcyBpcyBhIHBsYWluL2h0bWwgPGI-bWVzc2FnZTwvYj4gd2l0aCBhbiBpbWFnZS48L2Rpdj4NCg=="
            }
          }
        ]
      },
      {
        "partId": "1",
        "mimeType": "image/png",
        "filename": "smile.png",
        "headers": [
          ...
        ],
        "body": {
          "attachmentId": "ANGjdJ-OrSy7VAYL-UbRyNtmySbZLlV-fV43zJF0_neNGZ8yKugsZAxb32eSb-CrbYIhF9NvjGwBVEjSkRrUWoCS7aDpgoQnt9WR7f2sa17qVEyOg_JVSbrGrunirvQw2dY-SxxB3Y0JP3aYDHSBXpNO6fFCByVFWQDw1et5Mh9di7bGO4AWOLKFVe_Yb2RmdDwuazGXGb8zA88TTMaiEPIacPTNiVtBrIWG0EKGxHBhep9j8ujyWeCS5P9X80dBHvBNj4T9XjUwcrN6FvwegRewRMM9cBupY7jQESR7915OcbhCNyi5l64x6vVh1ZU",
          "size": 2002
        }
      }
    ]
  },
  "sizeEstimate": 3077
}
You will see it's just the RFC 822 message parsed to JSON. If you traverse the parts, and treat the payload as a part itself, you will find what you are looking for.
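var parts = [response.payload];
while (parts.length) {
  var part = parts.shift();
  // Queue nested parts so every level of the tree gets visited.
  if (part.parts) {
    parts = parts.concat(part.parts);
  }
  // Skip parts without inline data (e.g. multipart containers).
  if (part.mimeType === 'text/html' && part.body && part.body.data) {
    // body.data is base64url: convert to standard base64, then decode as UTF-8.
    var base64 = part.body.data.replace(/-/g, '+').replace(/_/g, '/');
    var decodedPart = decodeURIComponent(escape(atob(base64)));
    console.log(decodedPart);
  }
}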
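The same breadth-first traversal can also pick out attachment parts, which carry an attachmentId in body instead of inline data. This is only a sketch: collectAttachments is a hypothetical helper, and the sample payload is a trimmed copy of the multipart message shown earlier, with the attachmentId shortened to a placeholder.

```javascript
// Collect every part that references an attachment (body.attachmentId is set).
// collectAttachments is a hypothetical helper, not part of the Gmail API.
function collectAttachments(payload) {
  const found = [];
  const queue = [payload]; // treat the payload as a part itself
  while (queue.length) {
    const part = queue.shift();
    if (part.parts) {
      queue.push(...part.parts);
    }
    if (part.body && part.body.attachmentId) {
      found.push({
        filename: part.filename,
        mimeType: part.mimeType,
        attachmentId: part.body.attachmentId,
        size: part.body.size
      });
    }
  }
  return found;
}

// Trimmed sample; 'PLACEHOLDER_ATTACHMENT_ID' stands in for the real, much longer id.
const samplePayload = {
  mimeType: 'multipart/mixed',
  filename: '',
  body: { size: 0 },
  parts: [
    {
      mimeType: 'multipart/alternative',
      filename: '',
      body: { size: 0 },
      parts: [
        { partId: '0.0', mimeType: 'text/plain', filename: '', body: { size: 47 } },
        { partId: '0.1', mimeType: 'text/html', filename: '', body: { size: 73 } }
      ]
    },
    {
      partId: '1',
      mimeType: 'image/png',
      filename: 'smile.png',
      body: { attachmentId: 'PLACEHOLDER_ATTACHMENT_ID', size: 2002 }
    }
  ]
};

const attachments = collectAttachments(samplePayload);
console.log(attachments);
```

The attachment bytes themselves are not in the message; they have to be fetched separately using the attachmentId.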
There are many MIME types that can be returned; here are a few:
- text/plain: the message body in plain text only
- text/html: the message body in HTML only
- multipart/alternative: contains two or more parts that are alternatives for each other, for example:
  - a text/plain part for the message body in plain text
  - a text/html part for the message body in HTML
- multipart/mixed: contains several unrelated parts, which can be:
  - multipart/alternative as above, or text/plain or text/html as above
  - application/octet-stream, or other application/* types, for application-specific attachments
  - image/png or other image/* types for images, which could be embedded in the message
The definitive reference for all this is RFC 2046, https://www.ietf.org/rfc/rfc2046.txt (you might also want to see RFC 2045, which defines the MIME header fields such as Content-Type).
To answer your question, build a tree of the message, and look for either:
- the first text/plain or text/html part (either in the message body or in a multipart/mixed)
- the first text/plain or text/html inside of a multipart/alternative, which may be part of a multipart/mixed.
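The lookup described above can be sketched as a depth-first search over the part tree. The function names are hypothetical, preferring text/html over text/plain is just one possible choice, and skipping parts that have a filename is an assumption (based on body parts having an empty filename in the parsed messages shown earlier).

```javascript
// Depth-first search for the first part of the given MIME type that holds inline data.
// findFirstPart and findBodyPart are hypothetical helper names.
function findFirstPart(part, mimeType) {
  // Assumption: a non-empty filename means the part is an attachment, not the body.
  if (part.mimeType === mimeType && !part.filename && part.body && part.body.data) {
    return part;
  }
  for (const child of part.parts || []) {
    const hit = findFirstPart(child, mimeType);
    if (hit) {
      return hit;
    }
  }
  return null;
}

// Prefer the HTML body; fall back to plain text. Swap the order to prefer plain text.
function findBodyPart(payload) {
  return findFirstPart(payload, 'text/html') || findFirstPart(payload, 'text/plain');
}

const multipartPayload = {
  mimeType: 'multipart/mixed',
  filename: '',
  body: { size: 0 },
  parts: [
    {
      mimeType: 'multipart/alternative',
      filename: '',
      body: { size: 0 },
      parts: [
        {
          partId: '0.0',
          mimeType: 'text/plain',
          filename: '',
          body: { size: 47, data: 'VGhpcyBpcyBhIHBsYWluL2h0bWwgKm1lc3NhZ2UqIHdpdGggYW4gaW1hZ2UuDQo=' }
        },
        {
          partId: '0.1',
          mimeType: 'text/html',
          filename: '',
          body: { size: 73, data: 'PGRpdiBkaXI9Imx0ciI-VGhpcyBpcyBhIHBsYWluL2h0bWwgPGI-bWVzc2FnZTwvYj4gd2l0aCBhbiBpbWFnZS48L2Rpdj4NCg==' }
        }
      ]
    },
    { partId: '1', mimeType: 'image/png', filename: 'smile.png', body: { size: 2002 } }
  ]
};

const plainPayload = {
  partId: '',
  mimeType: 'text/plain',
  filename: '',
  body: { size: 31, data: 'VGhpcyBpcyB0aGUgcGxhaW4gdGV4dCBtZXNzYWdlCg==' }
};

const htmlBody = findBodyPart(multipartPayload);
const plainBody = findBodyPart(plainPayload);
console.log(htmlBody.partId, plainBody.mimeType);
```

Because the payload is treated as a part itself, the same function handles both the single-part message and the nested multipart one.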
An example of a complex message:
multipart/mixed
  - multipart/alternative
    - text/plain <- message body in plain text
    - text/html <- message body in HTML
  - application/zip <- a zip file attachment
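To inspect the shape of a message like this one, a small recursive function can print the tree as an indented outline. This is a sketch: outline is a hypothetical helper, and the sample object (including the archive.zip filename) is made up to mirror the example structure.

```javascript
// Return one line per part, indented by nesting depth.
// outline is a hypothetical helper name.
function outline(part, depth = 0) {
  const suffix = part.filename ? ' (' + part.filename + ')' : '';
  const lines = ['  '.repeat(depth) + part.mimeType + suffix];
  for (const child of part.parts || []) {
    lines.push(...outline(child, depth + 1));
  }
  return lines;
}

// Hypothetical payload mirroring the complex message; archive.zip is an invented name.
const complexMessage = {
  mimeType: 'multipart/mixed',
  parts: [
    {
      mimeType: 'multipart/alternative',
      parts: [
        { mimeType: 'text/plain' },
        { mimeType: 'text/html' }
      ]
    },
    { mimeType: 'application/zip', filename: 'archive.zip' }
  ]
};

const treeLines = outline(complexMessage);
console.log(treeLines.join('\n'));
```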
I know this question is not new, but I've written a PHP script which correctly parses messages pulled from the Gmail API, including any type of attachment.
The script includes a recursive "iterateParts" function which iterates over all message parts, so we can be sure we extracted all available data from each message.
Script steps are:
- Pull all message ids from API
- Get some important headers (subject & from address)
- If the body is directly on the payload, use it; otherwise, pass the payload to iterateParts
- iterateParts parses each message part into $msgArr, keeping its data base64-encoded
- Push $msgArr to master array $allmsgArr
- Traverse the master array and save each part as a file according to its MIME type and filename
$maxToPull = 1;
$gmailQuery = "ALL";

// Initializing Google API ($client is an already authorized Google_Client)
$service = new Google_Service_Gmail($client);

// Pulling all gmail messages into $messages array
$user = 'me';
$msglist = $service->users_messages->listUsersMessages($user, ["maxResults" => $maxToPull, "q" => $gmailQuery]);
$messages = $msglist->getMessages();

// Master array that will hold all parsed messages data, including attachments
$allmsgArr = array();

// Traverse each message
foreach ($messages as $message) {
    $msgArr = array();
    $single_message = $service->users_messages->get('me', $message->getId());
    $payload = $single_message->getPayload();

    // Nice to have the gmail msg id, can be used to direct access the message in Gmail's web gui
    $msgArr['gmailmsgid'] = $message->getId();

    // Retrieving the subject and "from" email address
    foreach ($payload->getHeaders() as $oneheader) {
        if ($oneheader['name'] == 'Subject') {
            $msgArr['subject'] = $oneheader['value'];
        }
        if ($oneheader['name'] == 'From') {
            $msgArr['fromaddress'] = substr($oneheader['value'], strpos($oneheader['value'], '<') + 1, -1);
        }
    }

    // If body is directly in the message payload (only for plain text messages where
    // there's no HTML part and no attachments, normally this is not the case)
    if ($payload['body']['size'] > 0) {
        $msgArr['textplain'] = $payload['body']['data'];
    }
    // Else, iterate over each message part and continue to dig if necessary
    else {
        iterateParts($payload, $message->getId());
    }

    // Push the parsed $msgArr (parsed by iterateParts) to master array
    array_push($allmsgArr, $msgArr);
}

// Traverse each parsed message and save its content and attachments to files
foreach ($allmsgArr as $onemsgArr) {
    $folder = "messages/" . $onemsgArr['gmailmsgid'];
    // Create the folder (and the parent "messages" folder) if it does not exist yet
    if (!is_dir($folder)) {
        mkdir($folder, 0777, true);
    }

    // !empty() avoids notices when a message lacks one of these keys
    if (!empty($onemsgArr['textplain'])) {
        file_put_contents($folder . "/textplain.txt", decodeData($onemsgArr['textplain']));
    }
    if (!empty($onemsgArr['texthtml'])) {
        file_put_contents($folder . "/texthtml.html", decodeData($onemsgArr['texthtml']));
    }
    if (!empty($onemsgArr['attachments'])) {
        foreach ($onemsgArr['attachments'] as $oneattachment) {
            if (!empty($oneattachment['filename'])) {
                $filename = $oneattachment['filename'];
            } else if ($oneattachment['mimetype'] == "message/rfc822") {
                // email attachments
                $filename = "noname.eml";
            } else {
                $filename = "unknown";
            }
            file_put_contents($folder . "/" . $filename, decodeData($oneattachment['data']));
        }
    }
}

function iterateParts($obj, $msgid) {
    global $msgArr;
    global $service;

    foreach ($obj as $parts) {
        // if found body data
        if ($parts['body']['size'] > 0) {
            // plain text representation of message body
            if ($parts['mimeType'] == 'text/plain') {
                $msgArr['textplain'] = $parts['body']['data'];
            }
            // html representation of message body
            else if ($parts['mimeType'] == 'text/html') {
                $msgArr['texthtml'] = $parts['body']['data'];
            }
            // if it's an attachment
            else if (!empty($parts['body']['attachmentId'])) {
                // start from a clean array so values never leak between attachments
                $attachArr = array();
                $attachArr['mimetype'] = $parts['mimeType'];
                $attachArr['filename'] = $parts['filename'];
                $attachArr['attachmentId'] = $parts['body']['attachmentId'];

                // the message holds the attachment id, retrieve its data from users_messages_attachments
                $attachmentId_base64 = $parts['body']['attachmentId'];
                $single_attachment = $service->users_messages_attachments->get('me', $msgid, $attachmentId_base64);
                $attachArr['data'] = $single_attachment->getData();

                $msgArr['attachments'][] = $attachArr;
            }
        }

        // if there are other parts inside, go get them
        if (!empty($parts['parts']) && !empty($parts['mimeType']) && empty($parts['body']['attachmentId'])) {
            iterateParts($parts->getParts(), $msgid);
        }
    }
}

// All data returned from API is base64 encoded (URL-safe alphabet)
function decodeData($data) {
    $sanitizedData = strtr($data, '-_', '+/');
    return base64_decode($sanitizedData);
}
This is what $allmsgArr will look like (where only one message was pulled):
Array
(
    [0] => Array
        (
            [gmailmsgid] => 25k1asfa556x2da
            [fromaddress] => [email protected]
            [subject] => Fwd: Sea gulls picture
            [textplain] => UE5SIDQxQzAwMg0KDQpBUkJFTFRFU1QxDQoNCg0K
            [texthtml] => PGRpdiBkaXI9Imx0ciI-PHNwYW4gc3R5bGU9ImZi
            [attachments] => Array
                (
                    [0] => Array
                        (
                            [mimetype] => image/png
                            [filename] => sea_gulls.png
                            [attachmentId] => ANGjdJ9tmy4d8vPXhU_BjNEFEaDODOpu29W2u5OTM7a0
                            [data] => iVBORw0KGgoAAAANSUhEUgAABSYAAAKWCAYAAABUP
                        )
                    [1] => Array
                        (
                            [mimetype] => image/jpeg
                            [filename] => Outlook_Signature.jpg
                            [attachmentId] => ANGjdJ-CgZTK0oK44Q8j7TlN_JlaexxGKZ_wHFfoEB
                            [data] => 6jRXhpZgAATU0AKgAAAAgABwESAAMAAAABAAEAAAEa
                        )
                )
        )
)
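As a quick sanity check on attachment data like the values above, the base64url string can be decoded to raw bytes and compared against a known file signature. The sketch below assumes Node.js; decodeToBytes and looksLikePng are hypothetical helpers, and only the first 12 characters of the (truncated) sample values are decoded.

```javascript
// Decode base64url data (as found in an attachment's "data" field) to raw bytes.
// Assumes Node.js (Buffer); both helper names are hypothetical.
function decodeToBytes(data) {
  return Buffer.from(data.replace(/-/g, '+').replace(/_/g, '/'), 'base64');
}

// True when the bytes start with the 8-byte PNG signature.
function looksLikePng(bytes) {
  const signature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
  return bytes.length >= 8 && bytes.subarray(0, 8).equals(signature);
}

// The sample values are truncated, so only a 12-character prefix (9 bytes) is used.
const pngPrefix = 'iVBORw0KGgoAAAANSUhEUgAABSYAAAKWCAYAAABUP'.slice(0, 12);
const otherPrefix = '6jRXhpZgAATU0AKgAAAAgABwESAAMAAAABAAEAAAEa'.slice(0, 12);

const pngCheck = looksLikePng(decodeToBytes(pngPrefix));
const otherCheck = looksLikePng(decodeToBytes(otherPrefix));
console.log(pngCheck, otherCheck); // true false
```

A check like this can help pick a file extension when an attachment arrives without a filename.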