dotnet core System.Text.Json unescape unicode string
You need to configure the JsonSerializer options so that those characters are not escaped.
JsonSerializerOptions jso = new JsonSerializerOptions();
jso.Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping;
Then pass these options when you call the Serialize method.
var s = JsonSerializer.Serialize(a, jso);
Full code:
using System;
using System.Text.Json;
// A is assumed to be a class with a public string Name property.
JsonSerializerOptions jso = new JsonSerializerOptions();
jso.Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping;
var a = new A { Name = "你好" };
var s = JsonSerializer.Serialize(a, jso);
Console.WriteLine(s);
Result: the Name value is written as 你好 rather than as \uXXXX escape sequences.
If you need to print the result to the console, you may need to install an additional language pack so that the characters display correctly.
To change the escaping behavior of the JsonSerializer, you can pass in a custom JavaScriptEncoder by setting the Encoder property on the JsonSerializerOptions.
https://docs.microsoft.com/en-us/dotnet/api/system.text.json.jsonserializeroptions.encoder?view=netcore-3.0#System_Text_Json_JsonSerializerOptions_Encoder
The default behavior is designed with security in mind, and the JsonSerializer over-escapes for defense-in-depth.
If all you need is to leave certain "alphanumeric" characters of a specific non-Latin language un-escaped, I would recommend that you instead create a JavaScriptEncoder using the Create factory method rather than using the UnsafeRelaxedJsonEscaping encoder.
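using System;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;
JsonSerializerOptions options = new JsonSerializerOptions
{
    // Leave Basic Latin and CJK ideographs un-escaped; everything else is still escaped.
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.CjkUnifiedIdeographs)
};
var a = new A { Name = "你好" };
var s = JsonSerializer.Serialize(a, options);
Console.WriteLine(s);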
Doing so keeps certain safeguards; for instance, HTML-sensitive characters will continue to be escaped.
I would caution against using System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping lightly, since it does minimal escaping (which is why it has "unsafe" in the name). If the JSON you are creating is written to a UTF-8 encoded file on disk, or if it's part of a web request that explicitly sets the charset to utf-8 (and is not going to be embedded within an HTML component as is), then it is probably OK to use this.
See the remarks section within the API docs: https://docs.microsoft.com/en-us/dotnet/api/system.text.encodings.web.javascriptencoder.unsaferelaxedjsonescaping?view=netcore-3.0#remarks
You could also consider specifying UnicodeRanges.All if you expect/need all languages to remain un-escaped. This still escapes certain ASCII characters that are prone to security vulnerabilities.
JsonSerializerOptions options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.All)
};
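To see the difference between the options discussed above, here is a minimal sketch that serializes the same value with the default options, with JavaScriptEncoder.Create(UnicodeRanges.All), and with UnsafeRelaxedJsonEscaping. The class A, the EncoderComparison helper and the sample string "你好 <b>" are made up for illustration; only the encoder APIs come from the answers above.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

// Hypothetical sample type; assumed to match the A used in the snippets above.
public class A
{
    public string Name { get; set; }
}

public static class EncoderComparison
{
    // Serializes the same sample object with whatever options are passed in.
    public static string Serialize(JsonSerializerOptions options)
    {
        var a = new A { Name = "你好 <b>" };
        return JsonSerializer.Serialize(a, options);
    }

    public static void Main()
    {
        // Default: non-ASCII characters and HTML-sensitive characters are escaped.
        var defaults = new JsonSerializerOptions();

        // All ranges allowed: CJK stays readable, '<' and '>' are still escaped.
        var allRanges = new JsonSerializerOptions
        {
            Encoder = JavaScriptEncoder.Create(UnicodeRanges.All)
        };

        // Relaxed: CJK stays readable and '<' and '>' are written as-is.
        var relaxed = new JsonSerializerOptions
        {
            Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
        };

        Console.WriteLine(Serialize(defaults));
        Console.WriteLine(Serialize(allRanges));
        Console.WriteLine(Serialize(relaxed));
    }
}
```

If the only goal is readable non-Latin text, the second variant is usually enough; the third is the one to reserve for output that will never be embedded in HTML.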
For more information and code samples, see: https://docs.microsoft.com/en-us/dotnet/standard/serialization/system-text-json-how-to?view=netcore-3.0#customize-character-encoding
See also the Caution note in the documentation.