XML sorting/formatting tool

I liked this tool: https://xmlsorter.codeplex.com/

It can sort by tag name and by attributes. I like to run it before comparing XML files.

[Screenshot: XML Sorter main window]
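The "sort first, then compare" idea can be sketched in a few lines of C# with System.Xml.Linq. This is only a minimal sketch, not XML Sorter's code: `Normalize` is a hypothetical helper, attributes are ordered by name, child elements by tag name and then by their normalized text, and mixed content, comments and processing instructions are simply dropped.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not XML Sorter's code): returns a sorted copy of `e`.
// - attributes are ordered by name
// - child elements are ordered by tag name, then by their normalized XML text
// - non-empty text of leaf elements is kept; mixed content, comments and PIs are dropped
static XElement Normalize(XElement e)
{
    var copy = new XElement(e.Name,
        e.Attributes().OrderBy(a => a.Name.ToString(), StringComparer.Ordinal));
    if (!e.HasElements && e.Value.Length > 0)
        copy.Value = e.Value;
    copy.Add(e.Elements()
        .Select(Normalize)
        .OrderBy(c => c.Name.ToString(), StringComparer.Ordinal)
        .ThenBy(c => c.ToString(SaveOptions.DisableFormatting), StringComparer.Ordinal));
    return copy;
}

// Two "files" with the same content in a different order (inline for brevity).
var left  = XElement.Parse("<config><b id=\"2\"/><a x=\"1\" y=\"2\"/></config>");
var right = XElement.Parse("<config><a y=\"2\" x=\"1\"/><b id=\"2\"/></config>");

string leftText  = Normalize(left).ToString();
string rightText = Normalize(right).ToString();
Console.WriteLine(leftText == rightText ? "Same after sorting" : "Different");
```

For real files, load both with `XElement.Load(path)`, then save or diff the two normalized strings with your usual comparison tool.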


I was looking for a similar utility and didn't find what I needed, so I wrote one myself. It's very simple (node sorting doesn't take attributes into account), but it works.

Maybe it'll be useful to others. It's on GitHub.
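To illustrate what node sorting that ignores attributes means in practice, here is a small C# sketch (my own illustration, not sortxml's code; `SortByTagOnly` is a hypothetical name). Siblings are ordered by tag name only; because LINQ's `OrderBy` is a stable sort, elements that share a tag name keep their input order whatever their attributes are.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Illustration only (not sortxml's code): sort child elements by tag name alone.
// Attributes are not part of the sort key, and Enumerable.OrderBy is stable,
// so siblings with the same tag name keep their original relative order.
static void SortByTagOnly(XElement parent)
{
    foreach (var child in parent.Elements().ToList())
        SortByTagOnly(child);                 // sort deeper levels first

    var sorted = parent.Elements()
        .OrderBy(x => x.Name.ToString(), StringComparer.Ordinal)
        .ToList();                            // materialize before detaching
    foreach (var child in sorted)
        child.Remove();                       // detach from the parent...
    parent.Add(sorted);                       // ...and re-append in sorted order
    // Text and comment nodes are not moved, so they end up before the re-added elements.
}

var root = XElement.Parse("<root><item v=\"2\"/><group/><item v=\"1\"/></root>");
SortByTagOnly(root);
string sortedText = root.ToString(SaveOptions.DisableFormatting);
Console.WriteLine(sortedText); // <root><group /><item v="2" /><item v="1" /></root>
```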

Here's an excerpt from the GitHub page:

USAGE: sortxml.exe [options] infile [outfile]

  infile      The name of the file to sort, etc.
  outfile     The name of the file to save the output to.
              If this is omitted, then the output is written to stdout.

OPTIONS:

  --pretty    Ignores the input formatting and makes the output look nice.
  --sort      Sort both the nodes and attributes.
  --sortnode  Sort the nodes.
  --sortattr  Sort the attributes.

(prefix an option with ! to turn it off.)
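A parser for options like these might look roughly like the C# sketch below. It is a hypothetical illustration, not sortxml's actual source: `ParseArgs` is a made-up name, every option is assumed to start out on (the default described after the option list), and since it isn't clear whether the "!" goes before or after the dashes, both `!--sortattr` and `--!sortattr` are accepted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser for the options listed above; this is NOT sortxml's source.
// Assumptions: every option is on by default, and "!" turns an option off.
// The exact spelling ("!--sortattr" or "--!sortattr") is a guess, so both are accepted.
static (bool Pretty, bool SortNode, bool SortAttr, List<string> Files) ParseArgs(string[] argv)
{
    bool pretty = true, sortNode = true, sortAttr = true;
    var files = new List<string>();
    foreach (var raw in argv)
    {
        string arg = raw;
        bool on = true;
        if (arg.StartsWith("!--", StringComparison.Ordinal)) { on = false; arg = arg.Substring(1); }
        else if (arg.StartsWith("--!", StringComparison.Ordinal)) { on = false; arg = "--" + arg.Substring(3); }

        switch (arg)
        {
            case "--pretty":   pretty = on; break;
            case "--sort":     sortNode = on; sortAttr = on; break;   // nodes and attributes
            case "--sortnode": sortNode = on; break;
            case "--sortattr": sortAttr = on; break;
            default:
                if (arg.StartsWith("--", StringComparison.Ordinal))
                    throw new ArgumentException($"Unknown option: {raw}");
                files.Add(arg);   // first is infile, optional second is outfile
                break;
        }
    }
    return (pretty, sortNode, sortAttr, files);
}

var opts = ParseArgs(new[] { "!--sortattr", "sample.xml" });
Console.WriteLine($"pretty={opts.Pretty} sortnode={opts.SortNode} sortattr={opts.SortAttr} infile={opts.Files[0]}");
// pretty=True sortnode=True sortattr=False infile=sample.xml
```

Treating `--sort` as shorthand that flips both flags keeps the more specific options meaningful: a later `!--sortattr` can still switch attribute sorting back off.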

By default the output is pretty-printed, with both nodes and attributes sorted. Here is an example:

> type sample.xml
<?xml version="1.0" encoding="utf-8" ?><root><node value="one" attr="name"/></root>

> sortxml.exe sample.xml
<?xml version="1.0" encoding="utf-8"?>
<root>
  <node attr="name" value="one" />
</root>
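The default run above (pretty output, attributes sorted by name) can be approximated with System.Xml.Linq. This is a rough sketch under those assumptions, not sortxml's implementation; it writes through an `XmlWriter` with `Indent = true` and a BOM-less UTF-8 encoding so the declaration reads `encoding="utf-8"` as in the sample.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Rough approximation of the default run above; not sortxml's implementation.
string input = "<?xml version=\"1.0\" encoding=\"utf-8\" ?><root><node value=\"one\" attr=\"name\"/></root>";
var doc = XDocument.Parse(input);

// Sort the attributes of every element by name (ordinal, so the result is culture-independent).
foreach (var el in doc.Descendants().ToList())
{
    var sortedAttrs = el.Attributes().OrderBy(a => a.Name.ToString(), StringComparer.Ordinal).ToList();
    el.ReplaceAttributes(sortedAttrs);
}

// "Pretty" output: indented, with a UTF-8 declaration and no BOM.
var settings = new XmlWriterSettings { Indent = true, IndentChars = "  ", Encoding = new UTF8Encoding(false) };
using var buffer = new MemoryStream();
using (var writer = XmlWriter.Create(buffer, settings))
{
    doc.Save(writer);
}
string output = Encoding.UTF8.GetString(buffer.ToArray());
Console.WriteLine(output);
```

Writing to a `MemoryStream` rather than a `StringWriter` matters here: a `StringWriter` is UTF-16, so the declaration would say `encoding="utf-16"`.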

Out of frustration with Visual Studio, which seems to reorder and rewrite EDMX files (Entity Framework) all the time (see also this UserVoice item), I wrote some LINQPad code to reorder them. It is, however, easy (and obvious) to use outside of LINQPad.

It orders the elements by element type (tag), then by the value of the "Name" attribute, and then by a few other properties to make the result roughly deterministic (different XML with the same meaning usually gives the same output; see the code).

It also orders the attributes. Semantically, XML attributes have no (relevant) order, but textually they do, and version control systems still treat them as plain text...
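A quick way to see the point about attribute order: the two elements below mean the same thing but differ as text, so a line-based diff flags them; after ordering the attributes by name the text is identical. Minimal C# sketch; `SortAttributes` is a hypothetical helper that only sorts the element's own attributes, and the element and attribute names are just sample data.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: copy of `e` with its own attributes ordered by name
// (child nodes are copied unchanged).
static XElement SortAttributes(XElement e) =>
    new XElement(e.Name,
        e.Attributes().OrderBy(a => a.Name.ToString(), StringComparer.Ordinal),
        e.Nodes());

// Same element, same meaning, attributes written in a different order.
var first  = XElement.Parse("<EntityType Name=\"Customer\" Abstract=\"false\" />");
var second = XElement.Parse("<EntityType Abstract=\"false\" Name=\"Customer\" />");

bool sameText       = first.ToString() == second.ToString();
bool sameTextSorted = SortAttributes(first).ToString() == SortAttributes(second).ToString();
Console.WriteLine(sameText);        // False: a text diff reports a change
Console.WriteLine(sameTextSorted);  // True
```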

(Note that it does not fix the differing aliases mentioned in "Entity Framework edmx file regenerating differently amongst team".)

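// LINQPad "C# Program" query. Outside LINQPad, add
// using System.Collections.Generic; using System.Linq; using System.Xml.Linq;
// and place these methods in a class.
void Main()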
{
    XDocument xdoc = XDocument.Load(@"\\filepath1\file1.edmx");

    var orderedElements = CopyAndSortElements(xdoc.Elements());

    var newDoc = new XDocument();
    newDoc.Add(orderedElements);
    newDoc.Save(@"\\filepath1\file1.Ordered.edmx");
}

public IEnumerable<XElement> CopyAndSortElements(IEnumerable<XElement> elements)
{
    var newElements = new List<XElement>();
    // Sort XElements by Tag & name-attribute (and some other properties)
    var orderedElements = elements.OrderBy(elem => elem.Name.LocalName) // element-tag
                                  .ThenByDescending(elem => elem.Attributes("Name").Count()) // can be 0, more than 1 is invalid XML
                                  .ThenBy(elem => (string)elem.Attribute("Name") ?? string.Empty) // Name-attribute value, or "" if absent
                                   // in case of no Name-Attributes, try to sort by (number of) children
                                  .ThenBy(elem => elem.Elements().Count())
                                  .ThenBy(elem => elem.Attributes().Count())
                                  // next line may vary for textually different but semantically equal input when elem & attr were unordered on input, but I need to restrain myself...
                                  .ThenBy(elem => elem.ToString());
    foreach (var oldElement in orderedElements)
    {
        var newElement = new XElement(oldElement.Name);
        if (oldElement.HasElements == false && string.IsNullOrEmpty(oldElement.Value) == false)
        {
            // (EDMX does not have textual nodes, but SO-users may use it for other XML-types ;-) )
            // IsNullOrEmpty-check: not setting empty value keeps empty-element tag, setting value (even empty) causes start-tag immediately followed by an end-tag
            // (empty-element tags may be a matter of taste, but for textual comparison it will matter!)
            newElement.Value = oldElement.Value;
        }
        var orderedAttrs = oldElement.Attributes().OrderBy(attr => attr.Name.LocalName).ThenBy(attr => attr.Value);
        newElement.Add(orderedAttrs);
        newElement.Add(CopyAndSortElements(oldElement.Elements()));
        newElements.Add(newElement);
    }
    return newElements;
}
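The comment on the final `ThenBy` points out that the `ToString()` tie-breaker can still vary, because it serializes the not-yet-sorted subtree. One possible variation (a sketch of my own, not the code above) is to copy and sort each subtree first and only then order the siblings, so the tie-breaker compares normalized text. `CopyAndSortBottomUp` is a hypothetical name; the sort keys mirror `CopyAndSortElements`, while the ordinal string comparison is an added assumption that keeps the order independent of the machine's culture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Variation (not the author's code): copy and sort each subtree *before* ordering
// its siblings, so the final ToString() tie-breaker compares already-normalized text.
// Sort keys mirror CopyAndSortElements; ordinal comparison is an added assumption.
static List<XElement> CopyAndSortBottomUp(IEnumerable<XElement> elements)
{
    var copies = new List<XElement>();
    foreach (var old in elements)
    {
        var copy = new XElement(old.Name,
            old.Attributes().OrderBy(a => a.Name.LocalName, StringComparer.Ordinal)
                            .ThenBy(a => a.Value, StringComparer.Ordinal));
        if (!old.HasElements && !string.IsNullOrEmpty(old.Value))
            copy.Value = old.Value;                      // keep leaf text, as the original does
        copy.Add(CopyAndSortBottomUp(old.Elements()));   // children are normalized first
        copies.Add(copy);
    }
    return copies
        .OrderBy(e => e.Name.LocalName, StringComparer.Ordinal)
        .ThenByDescending(e => e.Attributes("Name").Count())
        .ThenBy(e => (string)e.Attribute("Name") ?? string.Empty, StringComparer.Ordinal)
        .ThenBy(e => e.Elements().Count())
        .ThenBy(e => e.Attributes().Count())
        .ThenBy(e => e.ToString(SaveOptions.DisableFormatting), StringComparer.Ordinal)
        .ToList();
}

// Two inputs with the same content; only the order of siblings differs.
var inputA = XElement.Parse("<r><e><c id=\"2\"/><c id=\"1\"/></e><e><d/><b/></e></r>");
var inputB = XElement.Parse("<r><e><b/><d/></e><e><c id=\"1\"/><c id=\"2\"/></e></r>");
string outA = CopyAndSortBottomUp(new[] { inputA }).Single().ToString();
string outB = CopyAndSortBottomUp(new[] { inputB }).Single().ToString();
Console.WriteLine(outA == outB); // True
```

With the top-down `CopyAndSortElements`, these two inputs come out with the two `<e>` elements in opposite orders, because each `<e>` is compared by its unsorted text; sorting bottom-up makes both outputs identical.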

PS: We ended up using an XSLT that somebody else wrote at the same time. I think it fit more easily into everyone's build process. But hopefully this is of some use to somebody.
