Linq-to-XML XElement.Remove() leaves unwanted whitespace
It's not easy to answer in a portable way, because the solution heavily depends on how XDocument.Load()
generates whitespace text nodes (and there are several implementations of LINQ to XML around that might disagree about that subtle detail).
That said, it looks like you're never removing the last child (<description>) from the <book> elements. If that's indeed the case, then we don't have to worry about the indentation of the parent element's closing tag, and we can just remove the element and all its following text nodes until we reach another element. TakeWhile() will do the job.
EDIT: Well, it seems you need to remove the last child after all. Therefore, things will get more complicated. The code below implements the following algorithm:
- If the element is not the last element of its parent:
  - Remove all following text nodes until we reach the next element.
- Otherwise:
  - Remove all following text nodes until we find one containing a newline,
  - If that node only contains a newline:
    - Remove that node.
  - Otherwise:
    - Create a new node containing only the whitespace found after the newline,
    - Insert that node after the original node,
    - Remove the original node.
- Remove the element itself.
The resulting code is:
// Requires: using System.Collections.Generic; using System.Linq; using System.Xml.Linq;
// As an extension method, this must be declared inside a static class.
public static void RemoveWithNextWhitespace(this XElement element)
{
    // Text nodes that directly follow the element, up to the next non-text node.
    IEnumerable<XText> textNodes
        = element.NodesAfterSelf()
                 .TakeWhile(node => node is XText).Cast<XText>();
    if (element.ElementsAfterSelf().Any()) {
        // Easy case, remove following text nodes.
        textNodes.ToList().ForEach(node => node.Remove());
    } else {
        // Remove trailing whitespace.
        textNodes.TakeWhile(text => !text.Value.Contains("\n"))
                 .ToList().ForEach(text => text.Remove());
        // Fetch text node containing newline, if any.
        XText newLineTextNode
            = element.NodesAfterSelf().OfType<XText>().FirstOrDefault();
        if (newLineTextNode != null) {
            string value = newLineTextNode.Value;
            if (value.Length > 1) {
                // Composite text node, trim until newline (inclusive).
                newLineTextNode.AddAfterSelf(
                    new XText(value.Substring(value.IndexOf('\n') + 1)));
            }
            // Remove original node.
            newLineTextNode.Remove();
        }
    }
    element.Remove();
}
From there, you can do:
if (Author != null) Author.RemoveWithNextWhitespace();
if (Title != null) Title.RemoveWithNextWhitespace();
if (Genre != null) Genre.RemoveWithNextWhitespace();
Though I would suggest you replace the above with something like a loop fed from an array or a params method call, to avoid code redundancy.
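As a rough sketch of the params idea, something like the following could work. The names XElementBatchRemoval and RemoveAllWithNextWhitespace are made up for illustration, and the stand-in extension only covers the easy case so that the snippet compiles on its own; in real code you would drop the stand-in class and use the full RemoveWithNextWhitespace() instead.

```csharp
using System.Linq;
using System.Xml.Linq;

// Stand-in (easy case only) so this sketch compiles by itself.
// Omit this class if the full RemoveWithNextWhitespace() is already in scope.
public static class XElementRemovalStandIn
{
    public static void RemoveWithNextWhitespace(this XElement element)
    {
        // Remove the text nodes that directly follow the element, then the element.
        element.NodesAfterSelf()
               .TakeWhile(node => node is XText)
               .ToList()
               .ForEach(node => node.Remove());
        element.Remove();
    }
}

// Hypothetical helper: accepts any number of elements and skips null ones.
public static class XElementBatchRemoval
{
    public static void RemoveAllWithNextWhitespace(params XElement[] elements)
    {
        foreach (XElement element in elements.Where(e => e != null))
        {
            element.RemoveWithNextWhitespace();
        }
    }
}
```

The three if statements then collapse into a single call: XElementBatchRemoval.RemoveAllWithNextWhitespace(Author, Title, Genre);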
I have a simpler solution than the accepted answer that works for my case and appears to work for yours too. There may be more complicated cases where it doesn't work, though; I'm not sure.
Here is the code:
public static void RemoveWithNextWhitespace(this XElement element)
{
    // Remove the text node (indentation) that directly precedes the element, if any.
    if (element.PreviousNode is XText textNode)
    {
        textNode.Remove();
    }
    element.Remove();
}
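One case where this could plausibly go wrong is mixed content, where the preceding text node holds real text rather than indentation. A hedged variant, assuming you only ever want to drop whitespace-only text nodes, might look like this (the class name and the method name RemoveWithPreviousWhitespace are invented for this sketch):

```csharp
using System.Xml.Linq;

public static class XElementWhitespaceExtensions
{
    // Hypothetical variant: only removes the preceding text node
    // when it consists entirely of whitespace.
    public static void RemoveWithPreviousWhitespace(this XElement element)
    {
        if (element.PreviousNode is XText textNode
            && string.IsNullOrWhiteSpace(textNode.Value))
        {
            textNode.Remove();
        }
        element.Remove();
    }
}
```

With this guard, removing <b> from <p>Hello <b>x</b></p> keeps the "Hello " text, while indentation-only nodes are still removed.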
Here is my LINQPad query with your use case:
void Main()
{
    var xDoc = XDocument.Parse(@"<?xml version=""1.0""?>
<catalog>
   <book id=""bk101"">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
</catalog>", LoadOptions.PreserveWhitespace);

    XElement Author = xDoc.Root.Descendants("author").FirstOrDefault();
    XElement Title = xDoc.Root.Descendants("title").FirstOrDefault();
    XElement Genre = xDoc.Root.Descendants("genre").FirstOrDefault();

    // Do something with Author, Title, and Genre here...

    if (Author != null) Author.RemoveWithNextWhitespace();
    if (Title != null) Title.RemoveWithNextWhitespace();
    if (Genre != null) Genre.RemoveWithNextWhitespace();

    xDoc.ToString().Dump();
}

static class Ext
{
    public static void RemoveWithNextWhitespace(this XElement element)
    {
        if (element.PreviousNode is XText textNode)
        {
            textNode.Remove();
        }
        element.Remove();
    }
}
The main reason I didn't just use the accepted answer myself is that it did not leave my XML properly formatted in some cases. For example, in your use case, if I removed the "description" element, it would leave something that looked like this:
<catalog>
<book id="bk101">
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
</book>
</catalog>
Reading XML via an XmlReader preserves whitespace by default, including insignificant whitespace, as you see here.
You should instead read it with whitespace ignored, by setting the appropriate XmlReaderSettings option:
using (var reader = XmlReader.Create(xmlStream, new XmlReaderSettings { IgnoreWhitespace = true }))
Note that this doesn't remove significant whitespace (such as whitespace in mixed content or in a scope that preserves whitespace), so your formatting will remain.
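To make that concrete, here is a minimal sketch of loading an XDocument through such a reader. The class and method names (WhitespaceFreeLoader, LoadIgnoringWhitespace) are invented for illustration, and the TextReader parameter is just one of several sources XmlReader.Create() accepts.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class WhitespaceFreeLoader
{
    // Hypothetical helper: loads XML while dropping insignificant whitespace,
    // so the tree contains no indentation-only text nodes.
    public static XDocument LoadIgnoringWhitespace(TextReader source)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (var reader = XmlReader.Create(source, settings))
        {
            return XDocument.Load(reader);
        }
    }
}
```

With a document loaded this way, a plain element.Remove() should leave no stray whitespace behind, because there are no indentation text nodes to orphan, and ToString() (or Save() with default options) re-indents the output when writing.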