Parse only one level of JSON
I think you can solve this with a regex; it works for me:
import re
pattern = re.compile(r'"([a-zA-Z0-9]+)"\s*:\s*(".*"|\[.*\]|\{.*\})')
dict(re.findall(pattern, json_string))
I don't know whether this is faster, though; you would need to try it on your data.
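As a quick check of what the pattern returns, here is a minimal sketch run on a small sample. The sample document and the helper name `one_level` are made up for illustration. It assumes each top-level key/value pair sits on its own line (`.` does not match a newline, and `.*` is greedy) and that every value is a string, array, or object; a bare number, `true`, or `null` would not be matched by this pattern.

```python
import re

# Same pattern as above, written as a raw string so the backslashes
# reach the regex engine unchanged.
PATTERN = re.compile(r'"([a-zA-Z0-9]+)"\s*:\s*(".*"|\[.*\]|\{.*\})')


def one_level(json_string):
    """Map each top-level key to the raw text of its value (hypothetical helper)."""
    return dict(PATTERN.findall(json_string))


# Illustrative input: one key/value pair per line.
sample = '''{
"key1": "val1",
"key2": ["a","b", 3],
"key3": {"foo": 27, "bar": [1, 2, 3]}
}'''

result = one_level(sample)
for key, raw_value in result.items():
    print(key, "->", raw_value)
```

If the whole document is on a single line, the greedy `".*"` runs to the last quote on that line and swallows the following keys, so it is worth checking the pattern against the real input format first.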
[EDIT]
Yes, it is faster. I tried the scripts below and the regex version is 5 times faster.
Using the json module:
import json
val='''
{
"key1": "val1",
"key2": ["a","b", 3],
"key3": {"foo": 27, "bar": [1, 2, 3]}
}
'''
for n in range(100000):
    dict((k, json.dumps(v)) for k, v in json.loads(val).items())
Using regex:
import re
val='''{
"key1": "val1",
"key2": ["a","b", 3],
"key3": {"foo": 27, "bar": [1, 2, 3]}
}'''
pattern = re.compile(r'"([a-zA-Z0-9]+)"\s*:\s*(".*"|\[.*\]|\{.*\})')
for n in range(100000):
    dict(pattern.findall(val))
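To repeat the comparison on your own data, a small `timeit` harness along these lines may be more convenient than two separate scripts. This is only a sketch: the function names `with_json`, `with_regex`, and `compare` are invented here, and the timings depend on the machine and on the input.

```python
import json
import re
import timeit

# Same sample document as in the scripts above.
val = '''{
"key1": "val1",
"key2": ["a","b", 3],
"key3": {"foo": 27, "bar": [1, 2, 3]}
}'''

pattern = re.compile(r'"([a-zA-Z0-9]+)"\s*:\s*(".*"|\[.*\]|\{.*\})')


def with_json():
    # Full parse, then serialize each top-level value back to a string.
    return {k: json.dumps(v) for k, v in json.loads(val).items()}


def with_regex():
    # Slice the raw value text out of the document without parsing it.
    return dict(pattern.findall(val))


def compare(runs=100000):
    """Return (json_seconds, regex_seconds) for the given number of runs."""
    t_json = timeit.timeit(with_json, number=runs)
    t_regex = timeit.timeit(with_regex, number=runs)
    return t_json, t_regex


t_json, t_regex = compare()
print(f"json:  {t_json:.3f} s")
print(f"regex: {t_regex:.3f} s")
```

The two functions do not return identical strings: `json.dumps` normalizes whitespace, while the regex keeps the original text of each value.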
Hardly an answer, but I only see two possibilities:
- Load the full JSON and dump back the values, which you have ruled out in your question
- Modify the content by wrapping the values in quotes, so that the JSON load yields string values
To be honest, I don't think there is such a thing as 'performance critical JSON parsing code'; it just sounds wrong, so I'd go with the first option.