'safe' json_decode( ,,, ) to prevent exhausting memory
You must be getting some massive JSON responses if they manage to exhaust your server's memory. Here are some metrics with a 1 MB file containing a multidimensional associative array (data prepared for entry into three MySQL tables with diverse data types).
When I include() the file and it is loaded into memory as an array, my memory usage goes to 9 MB. If I get the raw data with file_get_contents(), it takes 1 MB of memory as expected. A PHP array thus has an approximate ratio of 1:9 to the strlen() of the data (as originally output with var_export()).
When I run json_encode(), peak memory usage doesn't increase. (PHP allocates memory in blocks, so there's often a bit of overhead; in this case enough to include the string data of the JSON, but it could bump you up one block more.) The resulting JSON data as a string takes 670 KB.
When I load the JSON data with file_get_contents() into a string, it takes an expected 0.75 MB of memory. When I run json_decode() on it, it takes 7 MB of memory. I would then factor a minimum ratio of 1:10, JSON data byte size to decoded native PHP array or object, for the RAM requirement.
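If you want to reproduce this kind of measurement on your own data, a minimal sketch like the one below works; the file name is just a placeholder, and memory_get_peak_usage(true) is used so the block-allocated ("real") size is reported.

```php
<?php
// Rough measurement sketch: compare the raw JSON size against the extra
// memory that json_decode() needs. 'big_response.json' is a placeholder.
$raw = file_get_contents('big_response.json');

$peakBefore = memory_get_peak_usage(true);   // real (block-allocated) peak
$data       = json_decode($raw, true);       // decode to associative arrays
$peakAfter  = memory_get_peak_usage(true);

printf(
    "JSON size: %.2f MB, decode cost: %.2f MB, ratio ~1:%.1f\n",
    strlen($raw) / 1048576,
    ($peakAfter - $peakBefore) / 1048576,
    ($peakAfter - $peakBefore) / max(strlen($raw), 1)
);
```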
To run a test on your JSON data before decoding it, you could then do something like this:
if (strlen($my_json) * 10 > ($my_mb_memory * 1024 * 1024)) {
die ('Decoding this would exhaust the server memory. Sorry!');
}
...where $my_json is the raw JSON response, and $my_mb_memory is your allocated RAM in megabytes, converted into bytes for comparison with the incoming data. (You can of course also use intval(ini_get('memory_limit')) to get your memory limit as an integer.)
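If you'd rather derive the limit from the configuration automatically, here is a sketch of a small guard built on the ~1:10 heuristic above. Note that ini_get('memory_limit') may return shorthand such as '128M', or '-1' for unlimited, so it needs a little parsing; the function names here are just illustrative, not a standard API.

```php
<?php
// Sketch of a guard around json_decode(), using the ~1:10 heuristic above.
// Function names are illustrative, not a standard API.

function memory_limit_bytes(): int
{
    $limit = ini_get('memory_limit');
    if ($limit === false || (int) $limit === -1) {
        return PHP_INT_MAX;                  // no limit configured
    }
    $units  = ['K' => 1024, 'M' => 1048576, 'G' => 1073741824];
    $suffix = strtoupper(substr(trim($limit), -1));
    return isset($units[$suffix])
        ? (int) $limit * $units[$suffix]     // e.g. "128M" -> 134217728
        : (int) $limit;                      // plain byte value
}

function safe_json_decode(string $json, bool $assoc = true)
{
    $headroom = memory_limit_bytes() - memory_get_usage(true);
    if (strlen($json) * 10 > $headroom) {    // assumed 1:10 decode ratio
        throw new RuntimeException('Decoding this would exhaust the server memory.');
    }
    return json_decode($json, $assoc);
}
```

Subtracting memory_get_usage(true) simply accounts for whatever the script has already consumed before decoding.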
As pointed out below, the RAM usage will also depend on your data structure. For contrast, a few more quick test cases, because I'm curious myself (a quick generator sketch follows the list):
- If I create a uni-dimensional array with integers 1-60000, the saved PHP array size is 1 MB, but peak RAM usage is between 10.5 and 12.5 MB (curious oscillation), or a ratio of 1:12-ish.
- If I create a 1 MB file's worth of data as 12000 random strings in a basic associative array, memory usage is only 5 MB when loaded; a ratio of 1:5.
- If I create a 1 MB file's worth as a similar associative array, where half the entries are arrays of strings with numeric indexes, memory usage is 7 MB; a ratio of 1:7.
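In case you want to rerun comparisons like these yourself, here's the quick generator sketch mentioned above; the counts and string lengths are arbitrary, and memory_get_usage(true) is only a rough gauge.

```php
<?php
// Quick generator for test structures like the ones listed above;
// counts and string lengths are arbitrary.

$before = memory_get_usage(true);
$ints   = range(1, 60000);                               // uni-dimensional integer array
printf("60000 ints: %.1f MB\n", (memory_get_usage(true) - $before) / 1048576);

$before  = memory_get_usage(true);
$strings = [];
for ($i = 0; $i < 12000; $i++) {
    $strings['key_' . $i] = bin2hex(random_bytes(40));   // ~80-char random string
}
printf("12000 strings: %.1f MB\n", (memory_get_usage(true) - $before) / 1048576);
```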
So your actual RAM mileage may vary a good deal. Also be aware that if you pass that bulk of data around in circles and do a bit of this and that with it, your memory usage may get much higher (or grow exponentially, depending on your code economy) than what json_decode() alone will cause.
To debug memory usage, you can use memory_get_usage() and/or memory_get_peak_usage() at major intervals in your code to log or output how much memory the different parts of your script consume.
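For example, a tiny checkpoint logger (the function name and checkpoints are just an example) makes it obvious which step eats the memory:

```php
<?php
// Minimal checkpoint logger; call it before and after suspect steps.
function log_memory(string $label): void
{
    error_log(sprintf(
        '%s: current %.1f MB, peak %.1f MB',
        $label,
        memory_get_usage(true) / 1048576,
        memory_get_peak_usage(true) / 1048576
    ));
}

log_memory('before decode');
$data = json_decode($raw_json, true);   // $raw_json: your JSON string
log_memory('after decode');
```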
Rather than simply quitting if the JSON file is too large, you can process JSON files of arbitrary size by using an event-based (streaming) JSON parser such as https://github.com/salsify/jsonstreamingparser. Only a small chunk of the object/array is loaded into memory at a time.
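A rough sketch of how that library is typically wired up, going by its documented Parser/Listener pattern; the class names below (Parser, IdleListener) follow its README but may differ between versions, and the file and listener are placeholders:

```php
<?php
// Sketch only: streaming parse with salsify/jsonstreamingparser (via Composer).
// Check the README of the version you install for the exact class names.
require 'vendor/autoload.php';

use JsonStreamingParser\Listener\IdleListener;
use JsonStreamingParser\Parser;

// A listener that handles one value at a time instead of the whole document.
class CountingListener extends IdleListener
{
    public $values = 0;

    public function value($value): void
    {
        $this->values++;   // e.g. buffer a row and insert into MySQL here
    }
}

$stream   = fopen('big_response.json', 'r');   // placeholder file
$listener = new CountingListener();

try {
    (new Parser($stream, $listener))->parse();
} finally {
    fclose($stream);
}

echo $listener->values . " scalar values seen without loading the whole file\n";
```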
My first answer above is purely about avoiding the memory limit. But how do you deal with the data if you hate to discard any of it, yet it keeps being occasionally bulky beyond your memory limit?
Presuming that you don't need to have the response parsed in one shot and in absolute real time, you could simply split the response into suitably sized chunks, for example with explode() or preg_split(), save them into a temporary directory, and process them later in a batch operation.
I presume the large API responses return multiple data-sets at once; if not, you could also splice a single multi-dimensional entry into more manageable chunks that are later rejoined, although that would require much more surgical precision in crafting your JSON-string splitter function.
If the multiple data-sets need to be associated in later processing (such as database-entry), you would also want to have an aggregator file containing the metadata for the batch op. (Or otherwise stick it all into a database.) You would of course have to ensure that the chunked data is well-formed. It's not ideal, but not having gigs of memory isn't ideal either. Batching is one way of dealing with it.
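For illustration, here is one crude way to chunk and batch, assuming the response is a flat JSON array of objects with no nested braces (exactly the kind of "surgical precision" caveat mentioned above); the file paths, chunk size, and manifest layout are made up for the example:

```php
<?php
// Crude chunking sketch: split a flat JSON array of objects into smaller,
// well-formed JSON files plus a manifest, for later batch processing.
// ASSUMES no nested objects/arrays inside the records -- a real splitter
// would need to track brace depth.

$raw     = file_get_contents('big_response.json');       // placeholder input
$records = preg_split('/\},\s*\{/', trim($raw, "[] \r\n\t"));

$dir = sys_get_temp_dir() . '/json_batch_' . uniqid();
mkdir($dir);

$chunkSize = 500;                                         // records per chunk file
$manifest  = [];

foreach (array_chunk($records, $chunkSize) as $i => $chunk) {
    // Re-wrap each piece so every chunk file is valid JSON on its own.
    $json = '[{' . implode('},{', array_map(
        function ($r) { return trim($r, "{} \r\n\t"); },
        $chunk
    )) . '}]';

    $file = sprintf('%s/chunk_%04d.json', $dir, $i);
    file_put_contents($file, $json);
    $manifest[] = ['file' => $file, 'records' => count($chunk)];
}

// Aggregator file with metadata for the batch operation.
file_put_contents($dir . '/manifest.json', json_encode($manifest));

echo 'Wrote ' . count($manifest) . " chunks to $dir\n";
```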