GETting a URL with a URL-encoded slash
By default, the Uri class will not allow an escaped / character (%2F) in a URI (even though this appears to be legal in my reading of RFC 3986).
Uri uri = new Uri("http://example.com/%2F");
Console.WriteLine(uri.AbsoluteUri); // prints: http://example.com//
(Note: don't use Uri.ToString to print URIs; it returns an unescaped form, which can differ from what is actually sent.)
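For context, an escaped slash usually appears when a single path segment itself contains a /. The sketch below is only an illustration: the class name EscapedSlashDemo and its two helpers are hypothetical, not part of the framework. BuildUrl percent-encodes one segment with Uri.EscapeDataString, and Describe returns three string views of the resulting Uri so they can be compared. Which of those views keep the %2F depends on the framework version and configuration, so no output is shown.

```csharp
using System;

public static class EscapedSlashDemo
{
    // Hypothetical helper: percent-encodes one path segment so that an
    // embedded '/' becomes %2F, then appends it to the base URL.
    public static string BuildUrl(string baseUrl, string segment)
    {
        return baseUrl.TrimEnd('/') + "/" + Uri.EscapeDataString(segment);
    }

    // Hypothetical helper: returns three string views of the same Uri,
    // one per line, so the differences (if any) are easy to spot.
    public static string Describe(string url)
    {
        var uri = new Uri(url);
        return "OriginalString: " + uri.OriginalString + Environment.NewLine
             + "AbsoluteUri:    " + uri.AbsoluteUri + Environment.NewLine
             + "ToString():     " + uri.ToString();
    }
}
```

For example, Console.WriteLine(EscapedSlashDemo.Describe(EscapedSlashDemo.BuildUrl("http://example.com/", "a/b"))) prints the three views for http://example.com/a%2Fb.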
According to the bug report for this issue on Microsoft Connect, this behaviour is by design, but you can work around it by adding the following to your app.config or web.config file:
<uri>
    <schemeSettings>
        <add name="http" genericUriParserOptions="DontUnescapePathDotsAndSlashes" />
    </schemeSettings>
</uri>
(Reposted from https://stackoverflow.com/a/10415482 because this is the "official" way to avoid this bug without using reflection to modify private fields.)
Edit: The Connect bug report is no longer visible, but the documentation for <schemeSettings> recommends this approach to allow escaped / characters in URIs. Note (as per that article) that there may be security implications for components that don't handle escaped slashes correctly.
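One way to confirm that the setting took effect is to check at startup whether the Uri class leaves %2F alone. This is a minimal sketch, assuming a hypothetical helper named EscapedSlashProbe; it only inspects AbsoluteUri and does not send a request.

```csharp
using System;

public static class EscapedSlashProbe
{
    // Hypothetical helper: returns true when the parsed Uri still contains
    // an escaped slash, i.e. the Uri class did not turn %2F into '/'.
    public static bool IsPreserved(string url)
    {
        var uri = new Uri(url);
        return uri.AbsoluteUri.IndexOf("%2F", StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

Calling EscapedSlashProbe.IsPreserved("http://example.com/a%2Fb") once during startup and logging the result shows which behaviour the running process actually has.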
This is a terrible hack, bound to be incompatible with future versions of the framework and so on.
But it works!
(on my machine...)
// Requires: using System; using System.Net; using System.Reflection;
Uri uri = new Uri("http://example.com/%2F");
ForceCanonicalPathAndQuery(uri);
using (WebClient webClient = new WebClient())
{
    webClient.DownloadData(uri);
}

// Clears the private "not canonical" flags so the path and query are sent as written.
void ForceCanonicalPathAndQuery(Uri uri)
{
    string paq = uri.PathAndQuery; // need to access PathAndQuery before touching the flags
    FieldInfo flagsFieldInfo = typeof(Uri).GetField("m_Flags", BindingFlags.Instance | BindingFlags.NonPublic);
    ulong flags = (ulong) flagsFieldInfo.GetValue(uri);
    flags &= ~((ulong) 0x30); // Flags.PathNotCanonical|Flags.QueryNotCanonical
    flagsFieldInfo.SetValue(uri, flags);
}
Update on this: It looks like the default behavior of the Uri class was actually changed in .NET 4.5, and you can now use escaped slashes and they will not be touched.
I ran the following code in .NET 3.5, .NET 4.0 and .NET 4.5/4.5.1:
static void Main(string[] args)
{
    var uri = new Uri("http://www.yahooo.com/%2F");
    var client = new WebClient();
    client.DownloadString(uri);
}
In .NET 3.5/4.0, the trace shows that the %2F was in fact unescaped, as expected.
However, in .NET 4.5/4.5.1 the %2F was not unescaped (notice the GET /%2F in the request line).
You can even use ToString() now on the Uri and you'll get the same result.
So in conclusion, it appears that if you are using .NET 4.5 or later, things will behave as they should, in line with the RFC.
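To look at the request line without a network tracing tool, a rough sketch is to point WebClient at a throwaway listener on the loopback interface and read back the first line it receives. Everything here is an assumption for illustration: RequestLineProbe and CaptureRequestLine are hypothetical names, the code needs .NET 4.0 or later because it uses Task, and WebClient is marked obsolete on newer .NET versions.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

public static class RequestLineProbe
{
    // Hypothetical helper: requests the given path from a local one-shot
    // listener and returns the raw HTTP request line that arrived.
    public static string CaptureRequestLine(string pathAndQuery)
    {
        // Port 0 asks the OS for any free port.
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        try
        {
            int port = ((IPEndPoint)listener.LocalEndpoint).Port;

            Task<string> server = Task.Run(() =>
            {
                using (TcpClient peer = listener.AcceptTcpClient())
                using (NetworkStream stream = peer.GetStream())
                {
                    var reader = new StreamReader(stream, Encoding.ASCII);
                    string requestLine = reader.ReadLine();

                    // Skip the request headers up to the blank line.
                    string line;
                    while (!string.IsNullOrEmpty(line = reader.ReadLine())) { }

                    // Send a minimal empty response so the client returns.
                    byte[] response = Encoding.ASCII.GetBytes(
                        "HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n");
                    stream.Write(response, 0, response.Length);
                    return requestLine;
                }
            });

            using (var client = new WebClient())
            {
                client.Proxy = null; // talk to the listener directly, bypassing any proxy
                client.DownloadString(new Uri("http://127.0.0.1:" + port + pathAndQuery));
            }
            return server.Result;
        }
        finally
        {
            listener.Stop();
        }
    }
}
```

Printing RequestLineProbe.CaptureRequestLine("/%2F") on each framework version shows whether the path went out as /%2F or as //.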
I tried to get the same approach working on Mono and posted a question about it here: Getting a Uri with escaped slashes on mono