Removing elements from JSON based on a condition in C#
var jObj = (JObject)JsonConvert.DeserializeObject(json);
HashSet<string> idsToDelete = new HashSet<string>() { "2f7661ae3c7a42dd9f2eb1946262cd24" };
jObj["response"]["docs"]
.Where(x => idsToDelete.Contains((string)x["id"]))
.ToList()
.ForEach(doc=>doc.Remove());
var newJson = jObj.ToString();
I've been attempting to compress this into a nicer LINQ statement for the last 10 minutes or so, but the fact that the list of known Ids is inherently changing how each element is evaluated means that I'm probably not going to get that to happen.
var jObj = (JObject)JsonConvert.DeserializeObject(json);
var docsToRemove = new List<JToken>();
foreach (var doc in jObj["response"]["docs"])
{
var id = (string)doc["id"];
if (knownIds.Contains(id))
{
docsToRemove.Add(doc);
}
else
{
knownIds.Add(id);
}
}
foreach (var doc in docsToRemove)
doc.Remove();
This seems to work well with the crappy little console app I spun up to test, but my testing was limited to the sample data above so if there's any problems go ahead and leave a comment so I can fix them.
For what it's worth, this will basically run in linear time with respect to how many elements you feed it, which is likely all the more algorithmic performance you're going to get without getting hilarious with this problem. Spinning each page of ~100 records off into its own task using the Task Parallel Library invoking a worker that will handle its own little page and returned the cleaned JSON string comes to mind. That would certainly make this faster if you ran it on a multi-cored machine, and I'd be happy to provide some code to get you started on that, but it's also a huge overengineering for the scope of the problem as it's presented.