Parsing large JSON file in .NET

This might still be relevant to some now that the "new" System.Text.Json is out.

await using FileStream file = File.OpenRead("files/data.json");
var options = new JsonSerializerOptions {
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase
};

// Switch the JsonNode type with one of your own if
// you have a specific type you want to deserialize to.
IAsyncEnumerable<JsonNode?> enumerable = JsonSerializer.DeserializeAsyncEnumerable<JsonNode>(file, options);

await foreach (JsonNode? obj in enumerable) {
    var firstname = obj?["firstname"]?.GetValue<string>();
}

If you're interested in more, such as how to parse zipped JSON, there's this blog post that I wrote: Parsing 60GB Json Files using Streams in .NET.

As you've correctly diagnosed in your update, the issue is that the JSON has a closing ] followed immediately by an opening [ to start the next set. This format makes the JSON invalid when taken as a whole, and that is why Json.NET throws an error.

Fortunately, this problem seems to come up often enough that Json.NET actually has a special setting to deal with it. If you use a JsonTextReader directly to read the JSON, you can set the SupportMultipleContent flag to true, and then use a loop to deserialize each item individually.

This should allow you to process the non-standard JSON successfully and in a memory efficient manner, regardless of how many arrays there are or how many items in each array.

    using (WebClient client = new WebClient())
    using (Stream stream = client.OpenRead(stringUrl))
    using (StreamReader streamReader = new StreamReader(stream))
    using (JsonTextReader reader = new JsonTextReader(streamReader))
    {
        reader.SupportMultipleContent = true;

        var serializer = new JsonSerializer();
        while (reader.Read())
        {
            if (reader.TokenType == JsonToken.StartObject)
            {
                Contact c = serializer.Deserialize<Contact>(reader);
                Console.WriteLine(c.FirstName + " " + c.LastName);
            }
        }
    }

Full demo here: https://dotnetfiddle.net/2TQa8p

I have done a similar thing in Python for the file size of 5 GB. I downloaded the file in some temporary location and read it line by line to form an JSON object similar on how SAX works.

For C# using Json.NET, you can download the file, use a stream reader to read the file, and pass that stream to JsonTextReader and parse it to JObject using JTokens.ReadFrom(your JSonTextReader object).

Json.NET supports deserializing directly from a stream. Here is a way to deserialize your JSON using a StreamReader reading the JSON string one piece at a time instead of having the entire JSON string loaded into memory.

using (WebClient client = new WebClient())
{
    using (StreamReader sr = new StreamReader(client.OpenRead(stringUrl)))
    {
        using (JsonReader reader = new JsonTextReader(sr))
        {
            JsonSerializer serializer = new JsonSerializer();

            // read the json from a stream
            // json size doesn't matter because only a small piece is read at a time from the HTTP request
            IList<Contact> result = serializer.Deserialize<List<Contact>>(reader);
        }
    }
}

Reference: JSON.NET Performance Tips

Parsing large JSON file in .NET

Tags:

C#

Json.Net

Deserialization

Json Deserialization

Related

Recent Posts