Processing large JSON files in PHP

I decided to work on an event-based parser. It's not quite done yet; I will edit the question with a link to my work when I roll out a satisfactory version.

EDIT:

I finally worked out a version of the parser that I am satisfied with. It's available on GitHub:

https://github.com/kuma-giyomu/JSONParser

There's probably room for improvement, and I welcome feedback.
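For readers who haven't met the event-based style, here is a rough sketch of the idea. It is not how the JSONParser project above is implemented. `streamJsonObjects` is a hypothetical helper, and it assumes the document is a top-level array whose elements are all objects, running on PHP 7.3 or later (for `JSON_THROW_ON_ERROR`). The stream is read in small chunks and a callback fires whenever one complete object has been collected, so only one element sits in memory at a time.

```php
<?php
/**
 * Hypothetical helper: invokes $onItem once per object found in a
 * top-level JSON array, reading the stream chunk by chunk.
 */
function streamJsonObjects($stream, callable $onItem, int $chunkSize = 8192): void
{
    $buffer = '';       // text of the object currently being collected
    $depth = 0;         // 0 means "between objects of the outer array"
    $inString = false;  // braces inside string literals must be ignored
    $escaped = false;   // true right after a backslash inside a string

    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $length = strlen($chunk);
        for ($i = 0; $i < $length; $i++) {
            $char = $chunk[$i];
            if ($depth === 0) {
                // Skip '[', ']', ',' and whitespace until an object starts.
                if ($char === '{') {
                    $depth = 1;
                    $buffer = '{';
                }
                continue;
            }
            $buffer .= $char;
            if ($inString) {
                if ($escaped) {
                    $escaped = false;
                } elseif ($char === '\\') {
                    $escaped = true;
                } elseif ($char === '"') {
                    $inString = false;
                }
                continue;
            }
            if ($char === '"') {
                $inString = true;
            } elseif ($char === '{') {
                $depth++;
            } elseif ($char === '}' && --$depth === 0) {
                // One complete object collected: emit the "event".
                $onItem(json_decode($buffer, true, 512, JSON_THROW_ON_ERROR));
                $buffer = '';
            }
        }
    }
}

// Usage with an in-memory stream; the tiny chunk size forces objects
// to span several reads.
$stream = fopen('php://memory', 'w+');
fwrite($stream, '[{"name":"Ann","tags":["a{b","c"]}, {"name":"Bob \"B\"","boss":{"name":"Cy"}}]');
rewind($stream);

$names = [];
streamJsonObjects($stream, function (array $employee) use (&$names) {
    $names[] = $employee['name'];
}, 7);
fclose($stream);
```

A real event-based parser reports finer-grained events (start of object, property name, scalar value, and so on) and validates the input as it goes; this sketch only shows the inversion of control, where the parser drives and your callback reacts.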


I recently wrote a library called JSON Machine, which efficiently parses unpredictably large JSON files. You use it with a simple foreach. I use it in my own project.

Example:

foreach (JsonMachine::fromFile('employees.json') as $employee) {
    $employee['name']; // etc
}

See https://github.com/halaxa/json-machine


I've written a streaming JSON pull parser, pcrov/JsonReader, for PHP 7, with an API based on XMLReader.

It differs significantly from event-based parsers in that instead of setting up callbacks and letting the parser do its thing, you call methods on the parser to move along or retrieve data as desired. Found your desired bits and want to stop parsing? Then stop parsing (and call close() because it's the nice thing to do.)

(For a slightly longer overview of pull vs event-based parsers see XML reader models: SAX versus XML pull parser.)
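To make the pull model concrete without depending on any library, here is a minimal sketch built on a plain PHP generator. It is an illustration only, not how JsonReader works internally. `pullJsonObjects` is a hypothetical name, and it assumes a top-level array of objects. The consumer asks for one item at a time and can simply stop, at which point no further input is read.

```php
<?php
// Hypothetical generator: lazily yields each object of a top-level JSON array.
function pullJsonObjects($stream): Generator
{
    $buffer = '';
    $depth = 0;
    $inString = false;
    $escaped = false;

    // Byte-at-a-time reads are slow, but they keep the stream position
    // exactly at the last character consumed, which the usage below inspects.
    while (($char = fgetc($stream)) !== false) {
        if ($depth === 0) {
            if ($char === '{') {
                $depth = 1;
                $buffer = '{';
            }
            continue;
        }
        $buffer .= $char;
        if ($inString) {
            if ($escaped) {
                $escaped = false;
            } elseif ($char === '\\') {
                $escaped = true;
            } elseif ($char === '"') {
                $inString = false;
            }
            continue;
        }
        if ($char === '"') {
            $inString = true;
        } elseif ($char === '{') {
            $depth++;
        } elseif ($char === '}' && --$depth === 0) {
            yield json_decode($buffer, true);
            $buffer = '';
        }
    }
}

$json = '[{"id":1},{"id":2,"match":true},{"id":3},{"id":4}]';
$stream = fopen('php://memory', 'w+');
fwrite($stream, $json);
rewind($stream);

$found = null;
foreach (pullJsonObjects($stream) as $item) {
    if (!empty($item['match'])) {
        $found = $item;
        break; // the consumer decides when to stop
    }
}
$bytesRead = ftell($stream); // how far into the input we actually got
fclose($stream);
```

After the `break`, `$bytesRead` is smaller than `strlen($json)`: the objects after the match were never read or decoded, which is the practical benefit of pulling over being called back.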


Example 1:

Read each object as a whole from your JSON.

use pcrov\JsonReader\JsonReader;

$reader = new JsonReader();
$reader->open("data.json");

$reader->read(); // Outer array.
$depth = $reader->depth(); // Check in a moment to break when the array is done.
$reader->read(); // Step to the first object.
do {
    print_r($reader->value()); // Do your thing.
} while ($reader->next() && $reader->depth() > $depth); // Read each sibling.

$reader->close();

Output:

Array
(
    [property] => value
    [property2] => value2
)
Array
(
    [prop] => val
)
Array
(
    [foo] => bar
)

Objects get returned as stringly-keyed arrays due (in part) to edge cases where valid JSON would produce property names that are not allowed in PHP objects. Working around these conflicts isn't worthwhile as an anemic stdClass object brings no value over a simple array anyway.
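One such edge case can be reproduced with PHP's built-in json_decode() alone. The snippet below is a small illustration using only core functions. The JSON key starts with a NUL character, which is legal in JSON but cannot be used as a PHP object property name, so decoding to an object fails while decoding to an array works.

```php
<?php
// Single quotes keep \u0000 as a literal JSON escape sequence.
$json = '{"\u0000hidden": 1}';

// Decoding into a stdClass object: the property name is rejected.
$asObject = json_decode($json);
$objectError = json_last_error();

// Decoding into an associative array: any string is a valid key.
$asArray = json_decode($json, true);
$arrayError = json_last_error();

var_dump($objectError === JSON_ERROR_INVALID_PROPERTY_NAME);
var_dump($arrayError === JSON_ERROR_NONE);
```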


Example 2:

Read each named element individually.

$reader = new pcrov\JsonReader\JsonReader();
$reader->open("data.json");

while ($reader->read()) {
    $name = $reader->name();
    if ($name !== null) {
        echo "$name: {$reader->value()}\n";
    }
}

$reader->close();

Output:

property: value
property2: value2
prop: val
foo: bar

Example 3:

Read each property of a given name. Bonus: read from a string instead of a URI, plus get data from properties with duplicate names in the same object (which is allowed in JSON, how fun.)

$json = <<<'JSON'
[
    {"property":"value", "property2":"value2"},
    {"foo":"foo", "foo":"bar"},
    {"prop":"val"},
    {"foo":"baz"},
    {"foo":"quux"}
]
JSON;

$reader = new pcrov\JsonReader\JsonReader();
$reader->json($json);

while ($reader->read("foo")) {
    echo "{$reader->name()}: {$reader->value()}\n";
}

$reader->close();

Output:

foo: foo
foo: bar
foo: baz
foo: quux

How exactly to best read through your JSON depends on its structure and what you want to do with it. These examples should give you a place to start.