Regex with sed command to parse json text

Do not parse complex nested data structures like JSON or XML with regular expressions. Use a proper JSON parser instead, such as jshon.

First you need to install it:

sudo apt-get install jshon

Then you have to provide it the JSON data to parse via standard input, so you can either redirect another command's output there with a pipe (|) or redirect a file to it (< filename).

The arguments it needs to extract the data you want look like this:

jshon -e "buildStatus" -e "status" -u
  • -e "buildStatus" picks the element with the "buildStatus" key from the top-level dictionary.
  • -e "status" picks the element with the "status" key from the second-level dictionary picked above.
  • -u converts the selected data from JSON to plain data (here, it removes the quotes around the string).

So the command you run, depending on where you get the data from, looks like one of these:

jshon -e "buildStatus" -e "status" -u < YOUR_INPUT_FILE
YOUR_JSON_PRODUCING_COMMAND | jshon -e "buildStatus" -e "status" -u
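
If it helps to see the idea in code, the chain of -e options amounts to walking down the parsed structure one key at a time, and -u amounts to printing the plain string instead of its JSON-encoded form. A rough Python sketch of that idea follows. The helper name pick and the sample text are made up for illustration, and this is not how jshon itself is implemented:

```python
import json

# Sample input with the same shape the commands above expect (illustrative only).
text = '{"buildStatus": {"status": "ERROR", "periods": []}}'

def pick(data, *keys):
    """Walk down nested dictionaries one key at a time, like chained -e options."""
    for key in keys:
        data = data[key]
    return data

value = pick(json.loads(text), "buildStatus", "status")
print(json.dumps(value))  # JSON-encoded, quotes included: "ERROR"
print(value)              # plain string, comparable to using -u: ERROR
```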

To learn more about jshon, you can read its manpage, which is available online or by running man jshon.


This is a job for jq:

jq -r '.["buildStatus"]["status"]' file.json

Can be shortened to:

jq -r '.buildStatus.status' file.json

-r (--raw-output) outputs the string without JSON string formatting, i.e. without the surrounding quotes.

Example:

% cat file.json                   
{
    "buildStatus" : {
        "status" : "ERROR",
        "conditions" : [{
                "status" : "OK",
                "metricKey" : "bugs"
            }, {
                "status" : "ERROR",
                "metricKey" : "test_success_density"
            }, {
                "status" : "OK",
                "metricKey" : "vulnerabilities"
            }
        ],
        "periods" : []
    }
}

% jq -r '.["buildStatus"]["status"]' file.json
ERROR

% jq -r '.buildStatus.status' file.json       
ERROR

If jq is not installed already, install it with (it is available in the Universe repository):

sudo apt-get install jq 

As has been mentioned, complex structured data is best parsed with an appropriate API. Python has the json module for that, which I personally use quite a lot in my scripts, and it makes extracting the fields you want quite easy:

$ python3 -c 'import sys,json; print(json.load(sys.stdin)["buildStatus"]["status"])' < input.txt
ERROR

What happens here is that we redirect the input file to Python's stdin and read it with json.load(). The result is a Python dictionary whose "buildStatus" key holds another dictionary with a "status" key. Thus, we are merely printing the value of a key in a dictionary that is stored within another dictionary. Fairly simple.
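
For anything longer than a one-liner, the same lookup can go into a small function that also copes with a missing key. This is only a sketch under a few assumptions: the function name build_status is made up, the top level of the input is assumed to be a JSON object, and an in-memory stream stands in for the real input so the example runs on its own:

```python
import io
import json

def build_status(stream):
    """Return the value of buildStatus -> status, or None if either key is missing."""
    data = json.load(stream)
    # .get() avoids a KeyError when a key is absent from the input.
    return data.get("buildStatus", {}).get("status")

# An in-memory stream stands in for a real file or sys.stdin here.
sample = io.StringIO('{"buildStatus": {"status": "ERROR"}}')
print(build_status(sample))  # ERROR
```

In an actual script you would pass sys.stdin (after import sys) instead of the StringIO object and redirect the input file to it, just as in the one-liner.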

Aside from simplicity, another advantage is that Python and its json module (part of the standard library) come preinstalled with Ubuntu by default.