C# .net converting HTML to RTF
Create a WebBrowser control and load the HTML content into it. Select everything, copy it to the clipboard, and paste it into a RichTextBox. The RichTextBox then holds the RTF.
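string html = "...."; // html content
RichTextBox rtbTemp = new RichTextBox();
WebBrowser wb = new WebBrowser();
// Load an empty page so that wb.Document is available, then write the HTML into it
wb.Navigate("about:blank");
wb.Document.Write(html);
// Copy the rendered document to the clipboard
wb.Document.ExecCommand("SelectAll", false, null);
wb.Document.ExecCommand("Copy", false, null);
// Paste the clipboard content into the RichTextBox
rtbTemp.SelectAll();
rtbTemp.Paste();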
Now rtbTemp.Rtf has the RTF converted from the HTML.
TL;DR: I recommend using the OpenXml format and the HtmlToOpenXml nuget package if possible.
Microsoft Word COM
I haven't looked into this option much, because my use case runs on a server, which makes COM components a poor choice.
XHTML2RTF
As @IAmTimCorey mentioned, you can use this CodeProject library.
Disadvantages are:
- Limited supported HTML and CSS
- Not really .NET
- ...
Windows Forms Web Browser
As @Jerry mentioned, you can use the Windows Forms WebBrowser control.
Disadvantages are:
- Reference to System.Windows.Forms
- Uses copy & paste (problematic for multithreading)
- Only works in an STA thread
Not supported features include:
- Fonts
- Colors
- Numbered lists
- Strikethrough (del element)
- ...
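Since the WebBrowser approach only works in an STA thread, a server or worker-thread caller has to create one explicitly. The following is a minimal sketch of such a wrapper, not part of any of the libraries mentioned here; the StaRunner class and its Run method are hypothetical names chosen for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical helper: runs a delegate on a dedicated STA thread and returns its result.
public static class StaRunner
{
    public static T Run<T>(Func<T> work)
    {
        T result = default(T);
        Exception error = null;

        var thread = new Thread(() =>
        {
            try { result = work(); }
            catch (Exception ex) { error = ex; } // capture so it can be rethrown on the caller's thread
        });

        // The apartment state must be set before the thread is started.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join(); // block until the conversion has finished

        if (error != null)
            throw new InvalidOperationException("Work on the STA thread failed.", error);
        return result;
    }
}
```

Assuming the copy and paste code is wrapped in a method of your own (called ConvertWithWebBrowser here purely as an example), the call would look like `string rtf = StaRunner.Run(() => ConvertWithWebBrowser(html));`. This does not solve the clipboard problem: concurrent conversions still share one clipboard and would need to be serialized.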
DevExpress
Code sample by "Paul V" from the DevExpress support center (03.02.2015):
public String ConvertRTFToHTML(String RTF)
{
MemoryStream ms = new MemoryStream();
StreamWriter writer = new StreamWriter(ms);
writer.Write(RTF);
writer.Flush();
ms.Position = 0;
String output = "";
HtmlEditorExtension.Import(HtmlEditorImportFormat.Rtf, ms, (s, enumerable) => output = s);
return output;
}
public String ConvertHTMLToRTF(String html)
{
MemoryStream ms = new MemoryStream();
var editor = new ASPxHtmlEditor { Html = html };
editor.Export(HtmlEditorExportFormat.Rtf, ms);
ms.Position = 0;
StreamReader reader = new StreamReader(ms);
return reader.ReadToEnd();
}
Or you could use the RichEditDocumentServer type as shown in this example.
- A DevExpress license costs roughly 1,500 to 2,200 USD.
It is unknown what exactly is supported.
Disadvantages are:
- Price
- Quite a lot of references for one small thing
- More?
Not supported features include:
- Strikethrough (del element)
Sautinsoft
public string ConvertHTMLToRTF(string html)
{
SautinSoft.HtmlToRtf h = new SautinSoft.HtmlToRtf();
return h.ConvertString(html);
}
public string ConvertRTFToHTML(string rtf)
{
SautinSoft.RtfToHtml r = new SautinSoft.RtfToHtml();
byte[] bytes = Encoding.ASCII.GetBytes(rtf);
r.OpenDocx(bytes );
return r.ToHtml();
}
More examples and configuration options can be found here and here.
- A license for this component costs from 400 USD to 2,000 USD.
Supported is the following:
- HTML 3.2
- HTML 4.01
- HTML 5
- CSS
- XHTML
Disadvantages are:
- I'm not sure how active the development is
- Price
Usage knowledgebase:
- Converting numbered lists from the Trix Angular editor destroys the indentation
DIY
If you only need to support limited functionality, you could write your own converter. I would not recommend this if the required feature set is large (Sautinsoft claims to have written over 20,000 lines of code).
I have a small sample project here, but it is only for educational purposes in its current state.
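To give an idea of what a limited DIY converter involves, here is a minimal sketch. It is not the sample project mentioned above; the TinyHtmlToRtf class is a hypothetical name, and the sketch assumes only a handful of inline tags (b, strong, i, em, u, del, s, p, br) need to be mapped. All other tags are dropped, and CSS, attributes, lists, tables, and HTML comments are not handled.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical, deliberately tiny converter: maps a few tags to RTF control words.
public static class TinyHtmlToRtf
{
    public static string Convert(string html)
    {
        var rtf = new StringBuilder(@"{\rtf1\ansi\deff0{\fonttbl{\f0 Arial;}}\f0 ");

        // Group 1: "/" for closing tags, group 2: tag name, group 3: a run of plain text.
        foreach (Match m in Regex.Matches(html, @"<(/?)(\w+)[^>]*>|([^<]+)"))
        {
            if (m.Groups[3].Success)
            {
                AppendEscaped(rtf, WebUtility.HtmlDecode(m.Groups[3].Value));
                continue;
            }

            bool closing = m.Groups[1].Value == "/";
            switch (m.Groups[2].Value.ToLowerInvariant())
            {
                case "b": case "strong": rtf.Append(closing ? @"\b0 " : @"\b "); break;
                case "i": case "em": rtf.Append(closing ? @"\i0 " : @"\i "); break;
                case "u": rtf.Append(closing ? @"\ulnone " : @"\ul "); break;
                case "del": case "s": rtf.Append(closing ? @"\strike0 " : @"\strike "); break;
                case "p": if (closing) rtf.Append(@"\par "); break;
                case "br": rtf.Append(@"\line "); break;
                // Any other tag is silently ignored.
            }
        }
        return rtf.Append('}').ToString();
    }

    private static void AppendEscaped(StringBuilder rtf, string text)
    {
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}') rtf.Append('\\').Append(c);
            // Non-ASCII characters become \uN? with N as a signed 16-bit value.
            else if (c > 127) rtf.Append(@"\u").Append((short)c).Append('?');
            else rtf.Append(c);
        }
    }
}
```

For example, `TinyHtmlToRtf.Convert("<p>Hello <b>world</b></p>")` returns `{\rtf1\ansi\deff0{\fonttbl{\f0 Arial;}}\f0 Hello \b world\b0 \par }`. Even this small feature set needs escaping and Unicode handling, which hints at why a complete converter grows so large.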
OpenXml
If the OpenXml format is also OK for your use case, you can use the HtmlToOpenXml nuget package. It's free and supported all the features I tested the other solutions against.
The project is based on the Open XML SDK by Microsoft and seems active.
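// Required namespaces:
// using System.IO;
// using DocumentFormat.OpenXml;
// using DocumentFormat.OpenXml.Packaging;
// using DocumentFormat.OpenXml.Wordprocessing;
// using HtmlToOpenXml;
public static byte[] ConvertHtmlToOpenXml(string html)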
{
using (var generatedDocument = new MemoryStream())
{
using (var package = WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
{
var mainPart = package.MainDocumentPart;
if (mainPart == null)
{
mainPart = package.AddMainDocumentPart();
new Document(new Body()).Save(mainPart);
}
var converter = new HtmlConverter(mainPart);
converter.ParseHtml(html);
mainPart.Document.Save();
}
return generatedDocument.ToArray();
}
}
- Link to example gist
The ExpertsExchange article is a poor one at best. Basically the OP gave up because they couldn't give a good answer. They list a link to the CodeProject article ( http://www.codeproject.com/KB/HTML/XHTML2RTF.aspx ) that shows you how to convert HTML to RTF but it isn't really a .NET solution. Instead, it would be something that would need to be highly adapted.
From my experience, there isn't a good open source converter out there. The pieces all seem to be there but it is waiting for someone to do the legwork of putting it all together. However, the immediate answer to your question is that there is not a converter already out there.