AmazonS3 putObject with InputStream length example

Because the original question was never answered, and I had to run into this same problem, the solution for the MD5 problem is that S3 doesn't want the Hex encoded MD5 string we normally think about.

Instead, I had to do this.

// content is a passed in InputStream
byte[] resultByte = DigestUtils.md5(content);
String streamMD5 = new String(Base64.encodeBase64(resultByte));
metaData.setContentMD5(streamMD5);

Essentially what they want for the MD5 value is the Base64 encoded raw MD5 byte-array, not the Hex string. When I switched to this it started working great for me.

If all you are trying to do is solve the content length error from amazon then you could just read the bytes from the input stream to a Long and add that to the metadata.

/*
 * Obtain the Content length of the Input stream for S3 header
 */
try {
    InputStream is = event.getFile().getInputstream();
    contentBytes = IOUtils.toByteArray(is);
} catch (IOException e) {
    System.err.printf("Failed while reading bytes from %s", e.getMessage());
} 

Long contentLength = Long.valueOf(contentBytes.length);

ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(contentLength);

/*
 * Reobtain the tmp uploaded file as input stream
 */
InputStream inputStream = event.getFile().getInputstream();

/*
 * Put the object in S3
 */
try {

    s3client.putObject(new PutObjectRequest(bucketName, keyName, inputStream, metadata));

} catch (AmazonServiceException ase) {
    System.out.println("Error Message:    " + ase.getMessage());
    System.out.println("HTTP Status Code: " + ase.getStatusCode());
    System.out.println("AWS Error Code:   " + ase.getErrorCode());
    System.out.println("Error Type:       " + ase.getErrorType());
    System.out.println("Request ID:       " + ase.getRequestId());
} catch (AmazonClientException ace) {
    System.out.println("Error Message: " + ace.getMessage());
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
}

You'll need to read the input stream twice using this exact method so if you are uploading a very large file you might need to look at reading it once into an array and then reading it from there.

For uploading, the S3 SDK has two putObject methods:

PutObjectRequest(String bucketName, String key, File file)

and

PutObjectRequest(String bucketName, String key, InputStream input, ObjectMetadata metadata)

The inputstream+ObjectMetadata method needs a minimum metadata of Content Length of your inputstream. If you don't, then it will buffer in-memory to get that information, this could cause OOM. Alternatively, you could do your own in-memory buffering to get the length, but then you need to get a second inputstream.

Not asked by the OP (limitations of his environment), but for someone else, such as me. I find it easier, and safer (if you have access to temp file), to write the inputstream to a temp file, and put the temp file. No in-memory buffer, and no requirement to create a second inputstream.

AmazonS3 s3Service = new AmazonS3Client(awsCredentials);
File scratchFile = File.createTempFile("prefix", "suffix");
try {
    FileUtils.copyInputStreamToFile(inputStream, scratchFile);    
    PutObjectRequest putObjectRequest = new PutObjectRequest(bucketName, id, scratchFile);
    PutObjectResult putObjectResult = s3Service.putObject(putObjectRequest);

} finally {
    if(scratchFile.exists()) {
        scratchFile.delete();
    }
}

While writing to S3, you need to specify the length of S3 object to be sure that there are no out of memory errors.

Using IOUtils.toByteArray(stream) is also prone to OOM errors because this is backed by ByteArrayOutputStream

So, the best option is to first write the inputstream to a temp file on local disk and then use that file to write to S3 by specifying the length of temp file.

AmazonS3 putObject with InputStream length example

Tags:

Java

Amazon S3

Google App Engine

Md5

Inputstream

Related

Recent Posts