How to json_decode invalid JSON with apostrophe instead of quotation mark

Here's an alternative solution to this problem:

function fixJSON($json) {
    $regex = <<<'REGEX'
~
    "[^"\\]*(?:\\.|[^"\\]*)*"
    (*SKIP)(*F)
  | '([^'\\]*(?:\\.|[^'\\]*)*)'
~x
REGEX;

    return preg_replace_callback($regex, function($matches) {
        return '"' . preg_replace('~\\\\.(*SKIP)(*F)|"~', '\\"', $matches[1]) . '"';
    }, $json);
}

This approach is more robust than h2ooooooo's function in two respects:

  • It preserves double quotes occurring in a single-quoted string by escaping them. h2o's variant replaces them with single quotes instead, thus changing the value of the string.
  • It properly handles escaped double quotes \", for which h2o's version seems to go into an infinite loop.

Test:

$brokenJSON = <<<'JSON'
['foo', {"bar": "hel'lo", "foo": 'ba"r ba\"z', "baz": "wor\"ld ' test"}]
JSON;

$fixedJSON = fixJSON($brokenJSON);
$decoded = json_decode($fixedJSON);

var_dump($fixedJSON);
print_r($decoded);

Output:

string(74) "["foo", {"bar": "hel'lo", "foo": "ba\"r ba\"z", "baz": "wor\"ld ' test"}]"
Array
(
    [0] => foo
    [1] => stdClass Object
        (
            [bar] => hel'lo
            [foo] => ba"r ba"z
            [baz] => wor"ld ' test
        )
)
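
For readers unfamiliar with the (*SKIP)(*F) backtracking verbs, here is a minimal sketch of what the pattern does on its own. It reuses the regex from fixJSON() unchanged; the sample input is made up for illustration.

```php
<?php
// The first alternative consumes a complete double-quoted string and then
// fails on purpose, so the engine skips past it. Only single-quoted strings
// are left over as real matches (captured in group 1).
$pattern = <<<'REGEX'
~
    "[^"\\]*(?:\\.|[^"\\]*)*"
    (*SKIP)(*F)
  | '([^'\\]*(?:\\.|[^'\\]*)*)'
~x
REGEX;

// Hypothetical input: the apostrophe in "it's" must not count as a quote.
$input = <<<'TXT'
{"a": "it's", "b": 'single', "c": 'x "y" z'}
TXT;

preg_match_all($pattern, $input, $matches);
print_r($matches[1]);
```

Only the two single-quoted values are captured; the apostrophe inside the double-quoted "it's" is never looked at, because that whole string was skipped.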

NikiC's answer is already spot on. Your input seems to be manually generated, so it's entirely possible that within ' single-quoted strings you'll receive unescaped " double quotes. A regex assertion is therefore advisable instead of a plain search and replace.
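
To illustrate why a plain search and replace falls short, here is a small sketch with a made-up input that mixes both quote styles:

```php
<?php
// Hypothetical input: an apostrophe inside a double-quoted string and a
// double quote inside a single-quoted string.
$jsol = <<<'JSOL'
{"bar": "hel'lo", "foo": 'ba"r'}
JSOL;

// Naive approach: swap every apostrophe for a double quote.
$naive = str_replace("'", '"', $jsol);

var_dump($naive);               // both strings now contain a stray "
var_dump(json_decode($naive));  // NULL, the result is not valid JSON
var_dump(json_last_error_msg());
```

The replacement cannot tell string delimiters from apostrophes that belong to the data, so both values end up broken.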

But there are also a few userland JSON parsers which support a bit more of JavaScript's expression syntax. It's probably best to speak of JSOL, JavaScript Object Literals, at this point.

PEAR's Services_JSON

Services_JSON can decode:

  • unquoted object keys
  • and strings enclosed in single quotes.

No additional options are required, just call (new Services_JSON)->decode($jsol);

up_json_decode() in upgradephp

This was actually meant as a fallback for early PHP versions without the JSON extension. It reimplements PHP's json_decode(). But there's also the upgrade.php.prefixed version, which you'd use here.
It introduces an additional flag, JSON_PARSE_JAVASCRIPT.

up_json_decode($jsol, false, 512, JSON_PARSE_JAVASCRIPT);

And I totally forgot about mentioning this in the docs, but it also supports single-quoted strings.
For instance:

{ num: 123, "key": "value", 'single': 'with \' and unquoted " dbls' } 

Will decode into:

stdClass Object
(
    [num] => 123
    [key] => value
    [single] => with ' and unquoted " dbls
)

Other options

  • JasonDecoder by @ArtisticPhoenix does support unquoted keys and literals, though no '-quoted strings. It's easy to understand or extend however.

  • YAML (1.2) is a superset of JSON, and most parsers support both unquoted keys and single-quoted strings. See also PHP YAML Parsers

Obviously any JSOL tokenizer/parser in userland is measurably slower than just preprocessing malformed JSON. If you expect no further gotchas from your webservice, go for the regex/quote conversion instead.


Here's a simple parser that'll fix your quotes for you. If it encounters a ' quote that isn't inside a double-quoted string, it assumes the quoting is wrong: it replaces any double quotes inside that string with single quotes and turns the enclosing single quotes into double quotes:

Example:

<?php
    function fixJSON($json) {
        $newJSON = '';

        $jsonLength = strlen($json);
        for ($i = 0; $i < $jsonLength; $i++) {
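            if ($json[$i] == '"' || $json[$i] == "'") {
                // Find the matching closing quote. Note: escaped quotes (\" or \')
                // and unterminated strings are not handled here.
                $nextQuote = strpos($json, $json[$i], $i + 1);
                $quoteContent = substr($json, $i + 1, $nextQuote - $i - 1);
                // Re-wrap in double quotes; inner double quotes become apostrophes.
                $newJSON .= '"' . str_replace('"', "'", $quoteContent) . '"';
                $i = $nextQuote;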
            } else {
                $newJSON .= $json[$i];
            }
        }

        return $newJSON;
    }

    $brokenJSON = "['foo', {\"bar\": \"hel'lo\", \"foo\": 'ba\"r'}]";
    $fixedJSON = fixJSON( $brokenJSON );

    var_dump($fixedJSON);

    print_r( json_decode( $fixedJSON ) );
?>

Output:

string(41) "["foo", {"bar": "hel'lo", "foo": "ba'r"}]"
Array
(
    [0] => foo
    [1] => stdClass Object
        (
            [bar] => hel'lo
            [foo] => ba'r
        )

)



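One solution would be to build a proxy using Node.js. Node.js can evaluate the faulty JSON as a JavaScript literal, which permits single-quoted strings (JSON.parse() itself would reject it), and JSON.stringify() then returns a clean version: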

johan:~ # node
> JSON.stringify(['foo', 'bar']);
'["foo","bar"]'

Maybe write a simple Node script that reads the data from STDIN and writes the cleaned-up JSON to STDOUT. That way you can call it from PHP.

The downside is that your server would need NodeJS. Not sure if that is a problem for you.
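
The PHP side of such a setup could look roughly like the sketch below. The helper name pipeThrough() is made up for illustration, and passing the command as an array to proc_open() assumes PHP 7.4 or later. It is a minimal sketch without timeouts; very large inputs may need non-blocking I/O to avoid stalling on full pipes.

```php
<?php
/**
 * Pipe $input through an external command and return what it wrote to STDOUT.
 * Hypothetical helper, not part of any library.
 */
function pipeThrough(array $command, string $input): string
{
    $spec = [
        0 => ['pipe', 'r'], // child's STDIN
        1 => ['pipe', 'w'], // child's STDOUT
        2 => ['pipe', 'w'], // child's STDERR
    ];
    $process = proc_open($command, $spec, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start ' . $command[0]);
    }

    // Send the data and close STDIN so the child sees end-of-file.
    fwrite($pipes[0], $input);
    fclose($pipes[0]);

    $stdout = stream_get_contents($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);

    $exitCode = proc_close($process);
    if ($exitCode !== 0) {
        throw new RuntimeException("Command failed ($exitCode): $stderr");
    }
    return $stdout;
}
```

Assuming a Node script named fix-json.js exists (a hypothetical file name), the call would be something like json_decode(pipeThrough(['node', 'fix-json.js'], $brokenJSON)). Keep in mind that evaluating input as JavaScript is only acceptable for data from a trusted source.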
