How to write a basic JSON parsing class

If your "JSON" is really like this, you should first take a baseball bat and go knock its producer over the head. Seriously.

If you really insist on writing your own class (why?), you can for instance use an interface like this:

public interface MyParser
{
    boolean parse()
        throws MyParsingException;
    MyParser next();
}

Implementations would take a CharBuffer and a map builder as constructor arguments; to parse, you would do:

final CharBuffer buf = CharBuffer.wrap(yourSource);
final MyMapBuilder builder = new MyMapBuilder();

MyParser parser = new OpenBracketParser(buf, builder);

while (parser.parse())
    parser = parser.next();

// result is builder.build()

This is but one example...
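To make the idea a bit more concrete, here is a minimal sketch of how such a chain of parsers could look. Everything except the `MyParser` interface and the driver loop is an assumption made up for illustration: `BaseParser`, `PairParser`, the body of `MyMapBuilder` and the `parseFlat` helper are hypothetical, and the sketch only handles a flat object of double-quoted string pairs with no whitespace, no escapes, no nesting and no check for trailing input.

```java
import java.nio.CharBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical checked exception matching the interface above.
class MyParsingException extends Exception {
    MyParsingException(String message) { super(message); }
}

interface MyParser {
    // Consumes some input; returns true if next() should run afterwards.
    boolean parse() throws MyParsingException;
    MyParser next();
}

// Hypothetical builder: collects key/value pairs in insertion order.
class MyMapBuilder {
    private final Map<String, String> map = new LinkedHashMap<>();
    private String pendingKey;
    void key(String key) { pendingKey = key; }
    void value(String value) { map.put(pendingKey, value); }
    Map<String, String> build() { return map; }
}

// Shared state and helpers for the concrete parsers (an assumption of this sketch).
abstract class BaseParser implements MyParser {
    protected final CharBuffer buf;
    protected final MyMapBuilder builder;
    protected MyParser next;

    BaseParser(CharBuffer buf, MyMapBuilder builder) {
        this.buf = buf;
        this.builder = builder;
    }

    @Override
    public MyParser next() { return next; }

    protected void expect(char c) throws MyParsingException {
        if (!buf.hasRemaining() || buf.get() != c) {
            throw new MyParsingException("expected '" + c + "'");
        }
    }

    // Reads a double-quoted string; escapes are not handled here.
    protected String quoted() throws MyParsingException {
        expect('"');
        StringBuilder sb = new StringBuilder();
        while (buf.hasRemaining()) {
            char c = buf.get();
            if (c == '"') return sb.toString();
            sb.append(c);
        }
        throw new MyParsingException("unterminated string");
    }
}

class OpenBracketParser extends BaseParser {
    OpenBracketParser(CharBuffer buf, MyMapBuilder builder) { super(buf, builder); }

    @Override
    public boolean parse() throws MyParsingException {
        expect('{');
        next = new PairParser(buf, builder);
        return true;
    }
}

// Parses one "key":"value" pair, then decides whether another pair follows.
class PairParser extends BaseParser {
    PairParser(CharBuffer buf, MyMapBuilder builder) { super(buf, builder); }

    @Override
    public boolean parse() throws MyParsingException {
        builder.key(quoted());
        expect(':');
        builder.value(quoted());
        if (!buf.hasRemaining()) throw new MyParsingException("unexpected end of input");
        char c = buf.get();
        if (c == ',') { next = new PairParser(buf, builder); return true; }
        if (c == '}') return false; // end of object: stop the driver loop
        throw new MyParsingException("expected ',' or '}'");
    }
}

class ChainedParserDemo {
    // Same driver loop as above, wrapped in a helper.
    static Map<String, String> parseFlat(String source) throws MyParsingException {
        final CharBuffer buf = CharBuffer.wrap(source);
        final MyMapBuilder builder = new MyMapBuilder();
        MyParser parser = new OpenBracketParser(buf, builder);
        while (parser.parse())
            parser = parser.next();
        return builder.build();
    }
}
```

Each parser only knows which parser may legally follow it, so adding support for, say, nested objects would mean adding new parser classes rather than growing one large method.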

A second option is to use an existing parsing tool; in that case, have a look at Parboiled. It is much easier to use than ANTLR, JFlex or others, since you write your grammars in pure Java.

Finally, if you decide that enough is enough and go with a JSON library (you really should), use Jackson, which can read even such malformed JSON:

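import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;

// Class name is illustrative; package names assume Jackson 2.x.
public final class LenientJson
{
    public static void main(final String... args)
        throws IOException
    {
        final ObjectMapper mapper = new ObjectMapper()
            .configure(JsonParser.Feature.ALLOW_UNQUOTED_FIELD_NAMES, true)
            .configure(JsonParser.Feature.ALLOW_SINGLE_QUOTES, true);

        final JsonNode node = mapper.readTree("{name: 'John'}");
        System.out.println(node); // {"name":"John"}
    }
}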

I've written one before. Steps:

  1. Take a string representing JSON text.

  2. Create a JsonToken class. I call mine JToken.

  3. Go over the entire text from step 1 and parse out the JToken(s).

  4. Recursively group and nest your JToken(s).

  5. Attempt to keep it simple and uniform. Every JToken node has a child array that can hold 0 or more children. If the node is an array, flag it as an array. The child array holds the children of the node if it is an OBJECT or ARRAY; the only thing that changes is what the node is flagged as. Also keep all values as strings. That way you only need a single member on the node, called "value", which can be interpreted as the correct data type after all the hard work is done.

  6. Use defensive coding and unit tests. Write tests for all of the components of the parser. Better to spend an extra 3 hours writing code in a paranoid fashion where you assume you are making mistakes every second than to have to spend 3 hours hunting down bugs. Code paranoid enough, and you'll very rarely spend time being frustrated when debugging.
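The steps above could be sketched in Java roughly as follows. This is not the JavaScript code linked below, just a small illustration under my own assumptions: the names `JKind` and `JTokenParser` are hypothetical, object keys and values simply alternate in the child list, string escapes are left undecoded, and number text is not validated.

```java
import java.util.ArrayList;
import java.util.List;

enum JKind { OBJECT, ARRAY, STRING, NUMBER, LITERAL, PUNCT }

// One node class for everything; only the kind flag differs (step 5).
class JToken {
    final JKind kind;
    final String value;                               // always text, interpreted later
    final List<JToken> children = new ArrayList<>();  // filled for OBJECT and ARRAY only

    JToken(JKind kind, String value) { this.kind = kind; this.value = value; }
}

class JTokenParser {
    private final List<JToken> tokens;
    private int pos = 0;

    private JTokenParser(List<JToken> tokens) { this.tokens = tokens; }

    static JToken parse(String text) {
        JTokenParser p = new JTokenParser(tokenize(text));
        JToken root = p.nest();
        if (p.pos != p.tokens.size()) throw new IllegalArgumentException("trailing tokens");
        return root;
    }

    // Step 3: split the text into a flat list of tokens.
    static List<JToken> tokenize(String text) {
        List<JToken> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if ("{}[]:,".indexOf(c) >= 0) {
                out.add(new JToken(JKind.PUNCT, String.valueOf(c)));
                i++;
            } else if (c == '"') {
                int end = i + 1;
                while (end < text.length() && text.charAt(end) != '"') {
                    end += text.charAt(end) == '\\' ? 2 : 1; // skip the escaped character
                }
                if (end >= text.length()) throw new IllegalArgumentException("unterminated string");
                out.add(new JToken(JKind.STRING, text.substring(i + 1, end)));
                i = end + 1;
            } else {
                int end = i;
                while (end < text.length() && !Character.isWhitespace(text.charAt(end))
                        && "{}[]:,\"".indexOf(text.charAt(end)) < 0) {
                    end++;
                }
                String word = text.substring(i, end);
                boolean literal = word.equals("true") || word.equals("false") || word.equals("null");
                out.add(new JToken(literal ? JKind.LITERAL : JKind.NUMBER, word));
                i = end;
            }
        }
        return out;
    }

    // Step 4: recursively group the flat tokens into a tree.
    private JToken nest() {
        JToken t = take();
        if (t.kind != JKind.PUNCT) return t; // scalar value
        if (t.value.equals("{")) return group(JKind.OBJECT, "}");
        if (t.value.equals("[")) return group(JKind.ARRAY, "]");
        throw new IllegalArgumentException("unexpected '" + t.value + "'");
    }

    private JToken group(JKind kind, String close) {
        JToken node = new JToken(kind, "");
        if (pos < tokens.size() && isPunct(tokens.get(pos), close)) { pos++; return node; }
        while (true) {
            if (kind == JKind.OBJECT) {
                JToken key = take();
                if (key.kind != JKind.STRING) throw new IllegalArgumentException("object key must be a string");
                node.children.add(key); // keys at even indexes, values at odd indexes
                if (!isPunct(take(), ":")) throw new IllegalArgumentException("expected ':'");
            }
            node.children.add(nest());
            JToken sep = take();
            if (isPunct(sep, close)) return node;
            if (!isPunct(sep, ",")) throw new IllegalArgumentException("expected ',' or '" + close + "'");
        }
    }

    private JToken take() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of input");
        return tokens.get(pos++);
    }

    private static boolean isPunct(JToken t, String s) {
        return t.kind == JKind.PUNCT && t.value.equals(s);
    }
}
```

Because every value stays a string, the caller decides on the type at the very end, for example with `Integer.parseInt(node.value)` on a NUMBER node.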

Example code: I was doing an "easy" (ironically) challenge on CodeEval, a JSON menu parsing challenge. I thought it would be cheating to use any built-in functions, because to me the whole point of code challenges is to test your algorithmic problem-solving abilities. The challenge is here: https://www.codeeval.com/open_challenges/102/

My code, which passes this challenge using a parser built from scratch in JavaScript:

https://pastebin.com/BReK9iij

I was not able to post it on Stack Overflow because it is too much code, so I put it in a non-expiring Pastebin post.

Note: This code could use some improvement. Some of it is very inefficient and it won't work with Unicode.

I wouldn't recommend writing your own JSON parser unless you are interpreting the JSON in some type of non-standard way.

For example: I am currently using JSONedit to organize branches for a text-based adventure. I am only using the JSON file format because it is compact and the viewer allows me to expand and collapse items. The standard parser that comes with Go doesn't interpret the information the way I want it to be interpreted, so I am writing my own parser.


This answer assumes that you really want to write a parser and are prepared to put in the effort required.

You MUST start with the formal specification of JSON. I have found http://www.ietf.org/rfc/rfc4627.txt (RFC 4627, since superseded by RFC 8259). This defines the language precisely. You MUST implement everything in the spec and write tests for it. Your parser MUST cater for incorrect JSON (like yours) and throw exceptions.

If you wish to write a parser, stop, think and then don't. It's a lot of work to get it working correctly. Whatever you do, make a proper job of it - incomplete parsers are a menace and should never be distributed.

You MUST write code that conforms. Here are some phrases from the spec. If you don't understand them you will have to research carefully and make sure you understand:

"JSON text SHALL be encoded in Unicode. The default encoding is UTF-8."

"A JSON parser MUST accept all texts that conform to the JSON grammar."

"Encoding considerations: 8bit if UTF-8; binary if UTF-16 or UTF-32

  JSON may be represented using UTF-8, UTF-16, or UTF-32.  When JSON
  is written in UTF-8, JSON is 8bit compatible.  When JSON is
  written in UTF-16 or UTF-32, the binary content-transfer-encoding
  must be used."

"Any character may be escaped. If the character is in the Basic
Multilingual Plane (U+0000 through U+FFFF), then it may be
represented as a six-character sequence: a reverse solidus, followed
by the lowercase letter u, followed by four hexadecimal digits that
encode the character's code point. The hexadecimal letters A though
F can be upper or lowercase. So, for example, a string containing
only a single reverse solidus character may be represented as
"\u005C". "

If you understand these and still want to write a parser, then review some other parsers, and see if any of them have conformance tests. Borrow these for your own application.
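As a small illustration of what such conformance tests would exercise, here is a sketch of a decoder for the escaping rule quoted above. The class and method names (`JsonStringDecoder`, `unescape`) are hypothetical; the sketch takes only the text between the quotes and does not reject raw control characters, which a conforming parser would also have to do.

```java
final class JsonStringDecoder {
    // Decodes the text found between the quotes of a JSON string.
    static String unescape(String body) {
        StringBuilder out = new StringBuilder(body.length());
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i++);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i >= body.length()) throw new IllegalArgumentException("dangling backslash");
            char e = body.charAt(i++);
            switch (e) {
                case '"': case '\\': case '/': out.append(e); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case 'u':
                    // Six-character form: backslash, 'u', four hex digits.
                    if (i + 4 > body.length()) throw new IllegalArgumentException("truncated unicode escape");
                    int code = 0;
                    for (int k = 0; k < 4; k++) code = (code << 4) | hex(body.charAt(i++));
                    // Two consecutive escapes forming a surrogate pair simply end up
                    // next to each other, which is how Java strings store such characters.
                    out.append((char) code);
                    break;
                default:
                    throw new IllegalArgumentException("unknown escape character: " + e);
            }
        }
        return out.toString();
    }

    // Accepts upper and lower case hex letters, as the spec requires.
    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw new IllegalArgumentException("not a hex digit: " + c);
    }
}
```

With this, the spec's own example decodes as expected: the six characters `\u005C` (upper or lower case hex) become a single reverse solidus.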

If you are still keen, you should strongly consider using a parser generator. Examples are JavaCC, CUP and my preferred tool, ANTLR. ANTLR is very powerful but can be difficult to start with. See also the suggestion of Parboiled, which I would now recommend. JSON is relatively simple and it would be a useful exercise. Most parser generators generate a complete parser which can create executable code or generate the parse tree of your JSON.

There is a JSON interpreter built with ANTLR at http://www.antlr.org/wiki/display/ANTLR3/JSON+Interpreter if you are allowed to peek at it. I have also just discovered a JSON parser written with Parboiled. If your main reason for writing a parser is learning how to do it, this is probably a good starting point.

If you are not allowed to (or don't want to) use a parser generator, then you will have to create your own parser. This generally comes in two parts:

  1. A lexer/tokenizer. This recognizes the basic primitives defined in the language spec. In this case it would have to recognize curly brackets, quotes, etc. It would probably also build the representation of numbers.

  2. An abstract syntax tree (AST, http://en.wikipedia.org/wiki/Abstract_syntax_tree) builder. Here you write code to assemble a tree representing the abstraction of your JSON (e.g. whitespace and curlies have been discarded).

When you have the AST it should be easy to iterate over the nodes and create your desired output.
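For the first of the two parts, a lexer might look roughly like the sketch below. The names (`TokenType`, `Token`, `JsonLexer`) are my own, not from any library. It recognizes the structural characters, strings, the three literals and numbers, and it builds the number representation as a `BigDecimal`. String escapes are left undecoded, and the tree-building stage that would consume these tokens is omitted.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The first six constants are in the same order as the characters in "{}[]:,".
enum TokenType { LBRACE, RBRACE, LBRACKET, RBRACKET, COLON, COMMA, STRING, NUMBER, TRUE, FALSE, NULL }

final class Token {
    final TokenType type;
    final Object value; // String for STRING, BigDecimal for NUMBER, otherwise null

    Token(TokenType type, Object value) { this.type = type; this.value = value; }
}

final class JsonLexer {
    // Optional minus, integer part without leading zeros, optional fraction, optional exponent.
    private static final Pattern NUMBER =
        Pattern.compile("-?(0|[1-9][0-9]*)(\\.[0-9]+)?([eE][+-]?[0-9]+)?");
    private static final String[] LITERALS = {"true", "false", "null"};
    private static final TokenType[] LITERAL_TYPES = {TokenType.TRUE, TokenType.FALSE, TokenType.NULL};

    static List<Token> lex(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher number = NUMBER.matcher(text);
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            int structural = "{}[]:,".indexOf(c);
            int literal = literalAt(text, i);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                i++; // only these four characters count as whitespace
            } else if (structural >= 0) {
                tokens.add(new Token(TokenType.values()[structural], null));
                i++;
            } else if (c == '"') {
                int end = i + 1;
                while (end < text.length() && text.charAt(end) != '"') {
                    end += text.charAt(end) == '\\' ? 2 : 1; // skip the escaped character
                }
                if (end >= text.length()) throw error("unterminated string", i);
                tokens.add(new Token(TokenType.STRING, text.substring(i + 1, end)));
                i = end + 1;
            } else if (literal >= 0) {
                tokens.add(new Token(LITERAL_TYPES[literal], null));
                i = requireDelimiter(text, i + LITERALS[literal].length());
            } else if (number.region(i, text.length()).lookingAt()) {
                tokens.add(new Token(TokenType.NUMBER, new BigDecimal(number.group())));
                i = requireDelimiter(text, number.end());
            } else {
                throw error("unexpected character '" + c + "'", i);
            }
        }
        return tokens;
    }

    // Returns the index into LITERALS of the literal starting at i, or -1.
    private static int literalAt(String text, int i) {
        for (int k = 0; k < LITERALS.length; k++) {
            if (text.startsWith(LITERALS[k], i)) return k;
        }
        return -1;
    }

    // A number or literal must be followed by whitespace, a structural character or the end.
    private static int requireDelimiter(String text, int i) {
        if (i < text.length() && " \t\n\r{}[]:,".indexOf(text.charAt(i)) < 0) {
            throw error("unexpected character '" + text.charAt(i) + "'", i);
        }
        return i;
    }

    private static IllegalArgumentException error(String message, int index) {
        return new IllegalArgumentException(message + " at index " + index);
    }
}
```

Being strict at this level pays off: an input such as `{name: 'John'}` is rejected by the lexer before any tree building starts, and so is a number with a leading zero such as `01`.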

But writing a parser, even for a simple language like JSON, is a lot of work.
