AWS lambda memory usage with temporary files in python code

Update

Note: To answer this question, I used lambdash, although I had to modify the Lambda runtime it uses to node8.10. lambdash is a simple little library that you can use to run shell commands on a Lambda from your local terminal.

The /tmp directory on AWS Lambda is mounted as a loop device. You can verify this (after following the setup instructions for lambdash) by running the following command:

./lambdash df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       30G  4.0G   26G  14% /
/dev/loop0      526M  872K  514M   1% /tmp
/dev/loop1      6.5M  6.5M     0 100% /var/task

According to https://unix.stackexchange.com/questions/278647/overhead-of-using-loop-mounted-images-under-linux,

data accessed through the loop device has to go through two filesystem layers, each doing its own caching so data ends up cached twice, wasting much memory (the infamous "double cache" issue)

However, my guess is that /tmp is actually kept in memory. To test this, I ran the following commands:

./lambdash df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       30G  4.0G   26G  14% /
/dev/loop0      526M  1.9M  513M   1% /tmp
/dev/loop1      6.5M  6.5M     0 100% /var/task

./lambdash dd if=/dev/zero of=/tmp/file.txt count=409600 bs=1024
409600+0 records in
409600+0 records out
419430400 bytes (419 MB) copied, 1.39277 s, 301 MB/s

./lambdash df -h
 Filesystem      Size  Used Avail Use% Mounted on
 /dev/xvda1       30G  4.8G   25G  17% /
 /dev/loop2      526M  401M  114M  78% /tmp
 /dev/loop3      6.5M  6.5M     0 100% /var/task

./lambdash df -h
 Filesystem      Size  Used Avail Use% Mounted on
 /dev/xvda1       30G  4.8G   25G  17% /
 /dev/loop2      526M  401M  114M  78% /tmp
 /dev/loop3      6.5M  6.5M     0 100% /var/task

Keep in mind that each command above triggered a separate Lambda invocation. Below is the output from the Lambda's CloudWatch logs:

07:06:30 START RequestId: 4143f502-14a6-11e9-bce4-eff8b92bf218 Version: $LATEST
07:06:30 END RequestId: 4143f502-14a6-11e9-bce4-eff8b92bf218
07:06:30 REPORT RequestId: 4143f502-14a6-11e9-bce4-eff8b92bf218 Duration: 3.60 ms Billed Duration: 100 ms Memory Size: 1536 MB Max Memory Used: 30 MB

07:06:32 START RequestId: 429eca30-14a6-11e9-9b0b-edfabd15c79f Version: $LATEST
07:06:34 END RequestId: 429eca30-14a6-11e9-9b0b-edfabd15c79f
07:06:34 REPORT RequestId: 429eca30-14a6-11e9-9b0b-edfabd15c79f Duration: 1396.29 ms Billed Duration: 1400 ms Memory Size: 1536 MB Max Memory Used: 430 MB

07:06:36 START RequestId: 44a03f03-14a6-11e9-83cf-f375e336ed87 Version: $LATEST
07:06:36 END RequestId: 44a03f03-14a6-11e9-83cf-f375e336ed87
07:06:36 REPORT RequestId: 44a03f03-14a6-11e9-83cf-f375e336ed87 Duration: 3.69 ms Billed Duration: 100 ms Memory Size: 1536 MB Max Memory Used: 431 MB

07:06:38 START RequestId: 4606381a-14a6-11e9-a32d-2956620824ab Version: $LATEST
07:06:38 END RequestId: 4606381a-14a6-11e9-a32d-2956620824ab
07:06:38 REPORT RequestId: 4606381a-14a6-11e9-a32d-2956620824ab Duration: 3.63 ms Billed Duration: 100 ms Memory Size: 1536 MB Max Memory Used: 431 MB

What happened and what does this mean?

The Lambda was executed 4 times. On the first execution, I displayed the mounted devices. On the second execution, I wrote a file into the /tmp directory, utilizing 401 MB of the ~512 MB allowed. On the subsequent executions, I listed the mounted devices again, displaying their available space.

The memory utilization on the first execution was 30 MB. The memory utilization for the subsequent executions was around 430 MB.

This confirms that /tmp utilization does in fact contribute to memory utilization.
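The same experiment can be reproduced from Python instead of dd. Below is a minimal sketch of a handler that writes zero-filled blocks to /tmp; the file name, handler name, and `count` event field are assumptions for illustration, and the defaults mirror the dd command above (409600 × 1 KiB = 400 MiB):

```python
import os

TMP_PATH = "/tmp/file.bin"  # hypothetical file name for this sketch

def handler(event, context):
    # Write `count` 1 KiB blocks of zeros to /tmp, mirroring
    # `dd if=/dev/zero of=/tmp/file.txt count=409600 bs=1024`.
    count = int(event.get("count", 409600))
    chunk = b"\0" * 1024
    with open(TMP_PATH, "wb") as f:
        for _ in range(count):
            f.write(chunk)
    # Report how much of /tmp the file now occupies.
    return {"tmp_bytes": os.path.getsize(TMP_PATH)}
```

If the observation above holds, invoking this handler should push Max Memory Used in the REPORT line up by roughly the size of the file written.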

Original Answer

My guess is that what you are observing is python, or the lambda container itself, buffering the file in memory during write operations.

According to https://docs.python.org/3/library/functions.html#open,

buffering is an optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size in bytes of a fixed-size chunk buffer. When no buffering argument is given, the default buffering policy works as follows:

Binary files are buffered in fixed-size chunks; the size of the buffer is chosen using a heuristic trying to determine the underlying device’s “block size” and falling back on io.DEFAULT_BUFFER_SIZE. On many systems, the buffer will typically be 4096 or 8192 bytes long. “Interactive” text files (files for which isatty() returns True) use line buffering. Other text files use the policy described above for binary files.

The tempfile.TemporaryFile() function has a keyword parameter, buffering, which is basically passed directly into the open call described above.

So my guess is that the tempfile.TemporaryFile() function uses the default open() function's buffering setting. You might try something like tempfile.TemporaryFile(buffering=0) to disable buffering, or tempfile.TemporaryFile(buffering=512) to explicitly set the maximum amount of memory that will be utilized while writing data to a file.
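As a sketch of those two options (the function names here are illustrative; `buffering=0` is only valid because `TemporaryFile` opens in binary mode, `'w+b'`, by default):

```python
import tempfile

def roundtrip_unbuffered(data: bytes) -> bytes:
    # buffering=0 disables Python's userspace write buffer entirely;
    # only allowed in binary mode, which is TemporaryFile's default.
    with tempfile.TemporaryFile(buffering=0) as f:
        f.write(data)
        f.seek(0)
        return f.read()

def roundtrip_small_buffer(data: bytes) -> bytes:
    # buffering=512 caps the buffer at 512 bytes instead of the
    # default io.DEFAULT_BUFFER_SIZE (typically 8192 bytes).
    with tempfile.TemporaryFile(buffering=512) as f:
        f.write(data)
        f.seek(0)
        return f.read()
```

Note that disabling buffering trades memory for more (and smaller) write syscalls, so a small fixed buffer is usually the better compromise.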


Usage of /tmp does not count towards memory usage. The only case in which the two are correlated is when you read file content into memory.
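If reading file content into memory is the concern, processing a large file from /tmp in fixed-size chunks keeps peak memory near the chunk size regardless of the file's size. A hedged sketch (the function name and chunk size are illustrative), using hashing as the example workload:

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 64 * 1024) -> str:
    # Read in fixed-size chunks so peak memory stays near chunk_size,
    # instead of f.read() pulling the entire file into memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```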