What's the difference between URI.escape and CGI.escape?
What's the difference between an axe and a sword, and which one should I use? Well, it depends on what you need to do.
URI.escape was supposed to encode a string (URL) into so-called "percent-encoding".
CGI::escape comes from the CGI spec, which describes how data should be encoded/decoded between a web server and an application.
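To make the difference concrete, here is a small sketch that uses only the uri and erb standard libraries. URI.encode_www_form_component stands in for CGI::escape, since both apply the application/x-www-form-urlencoded style (a space becomes +), although they may differ on a few characters such as * and ~. ERB::Util.url_encode shows plain percent-encoding (a space becomes %20).

```ruby
require 'uri'
require 'erb'

value = "My Blog & Your Blog"

# Form encoding (application/x-www-form-urlencoded): spaces become "+".
form_encoded = URI.encode_www_form_component(value)

# Percent-encoding: spaces become "%20".
percent_encoded = ERB::Util.url_encode(value)

puts form_encoded     # => My+Blog+%26+Your+Blog
puts percent_encoded  # => My%20Blog%20%26%20Your%20Blog
```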
Now, let's say that you need to escape a URI in your app. It is a more specific use case.
For that, the Ruby community used URI.escape for years. The problem with URI.escape was that it did not comply with the RFC 3986 spec.
URI.escape 'http://google.com/foo?bar=at#anchor&title=My Blog & Your Blog'
# => "http://google.com/foo?bar=at%23anchor&title=My%20Blog%20&%20Your%20Blog"
URI.escape was marked as obsolete:
Moreover current URI.encode is simple gsub. But I think it should split a URI to components, then escape each components, and finally join them.
So current URI.encode is considered harmful and deprecated. This will be removed or change behavior drastically.
What is the replacement at this time?
As I said above, current URI.encode is wrong on spec level. So we won't provide the exact replacement. The replacement will vary by its use case.
https://bugs.ruby-lang.org/issues/4167
Unfortunately, there is not a single word about it in the docs; the only way to know about it is to check the source, run the script with warnings at the verbose level (-wW2), or use some google-fu.
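The component-wise approach asked for in the quote above (split the URI, escape each component, then join them) can be sketched with the standard library alone. build_http_uri is a hypothetical helper, not an existing API; it relies on URI.encode_www_form for the query and URI::HTTP.build to join the parts. Note that this form-encodes the query, so spaces come out as +.

```ruby
require 'uri'

# Hypothetical helper: escape each component separately, then let URI join them.
def build_http_uri(host:, path:, params:, fragment: nil)
  URI::HTTP.build(
    host: host,
    path: path,
    # encode_www_form escapes "&", "=", "#" and spaces inside keys and values.
    query: URI.encode_www_form(params),
    fragment: fragment
  )
end

uri = build_http_uri(
  host: "google.com",
  path: "/foo",
  params: { bar: "at", title: "My Blog & Your Blog" },
  fragment: "anchor"
)
puts uri
# => http://google.com/foo?bar=at&title=My+Blog+%26+Your+Blog#anchor
```

Because the query is escaped on its own, the & inside "My Blog & Your Blog" no longer collides with the & that separates parameters.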
Some proposed using CGI::escape for query parameters only, since it can't be used to escape an entire URI:
CGI::escape 'http://google.com/foo?bar=at#anchor&title=My Blog & Your Blog'
# => "http%3A%2F%2Fgoogle.com%2Ffoo%3Fbar%3Dat%23anchor%26title%3DMy+Blog+%26+Your+Blog"
CGI::escape should be used for query parameters only, but the results will, again, be against the spec (it encodes spaces as +, which is form encoding rather than percent-encoding). Actually, its most common use case is escaping form data, such as when sending an application/x-www-form-urlencoded POST request.
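As a sketch of that form-data use case, the standard library's Net::HTTP can build (without sending) a form POST. set_form_data form-encodes the parameters; for these particular values the result matches what CGI::escape would produce. The path "/submit" is only a placeholder.

```ruby
require 'net/http'
require 'uri'

# Build, but do not send, a form POST. "/submit" is a placeholder path.
request = Net::HTTP::Post.new("/submit")
request.set_form_data(title: "My Blog & Your Blog", bar: "at#anchor")

puts request.content_type # => application/x-www-form-urlencoded
puts request.body         # => title=My+Blog+%26+Your+Blog&bar=at%23anchor

# Decoding reverses the form encoding, "+" included.
p URI.decode_www_form(request.body)
# => [["title", "My Blog & Your Blog"], ["bar", "at#anchor"]]
```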
The also-mentioned WEBrick::HTTPUtils.escape is not much of an improvement (again, it's just a simple gsub, which is, IMO, an even worse option than URI.escape):
WEBrick::HTTPUtils.escape 'http://google.com/foo?bar=at#anchor&title=My Blog & Your Blog'
# => "http://google.com/foo?bar=at%23anchor&title=My%20Blog%20&%20Your%20Blog"
The closest to the spec seems to be the Addressable gem:
require 'addressable/uri'
Addressable::URI.escape 'http://google.com/foo?bar=at#anchor&title=My Blog & Your Blog'
# => "http://google.com/foo?bar=at#anchor&title=My%20Blog%20&%20Your%20Blog"
Notice that, unlike all previous options, Addressable doesn't escape #, and this is the expected behaviour: an unescaped # is the delimiter that starts the URI fragment, so you want to keep it there, while a literal # inside a query value has to be escaped.
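Why # matters can be seen with Ruby's standard URI parser (used here only for illustration; it is not part of the Addressable example): an unescaped # ends the query and starts the fragment, while %23 keeps the character inside the query value.

```ruby
require 'uri'

# An unescaped "#" ends the query and starts the fragment...
plain = URI.parse("http://google.com/foo?bar=at#anchor")
p plain.query    # => "bar=at"
p plain.fragment # => "anchor"

# ...while "%23" keeps the "#" inside the query value.
escaped = URI.parse("http://google.com/foo?bar=at%23anchor")
p escaped.query    # => "bar=at%23anchor"
p escaped.fragment # => nil
p URI.decode_www_form(escaped.query) # => [["bar", "at#anchor"]]
```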
The only problem left is that we didn't escape our query parameters properly, which brings us to the conclusion: we should not use a single method for the entire URI, because there is no perfect solution (so far).
As you can see, & was not escaped in "My Blog & Your Blog". We need a different form of escaping for query params, where users can enter characters that have a special meaning in URLs. Enter URL encoding. URL encoding should be applied to every "suspicious" query value, similar to what ERB::Util.url_encode does:
require 'erb'
ERB::Util.url_encode "My Blog & Your Blog"
# => "My%20Blog%20%26%20Your%20Blog"
That's cool, but we've already required Addressable, which can build the query for us:
uri = Addressable::URI.parse("http://www.go.com/foo")
# => #<Addressable::URI:0x186feb0 URI:http://www.go.com/foo>
uri.query_values = {title: "My Blog & Your Blog"}
uri.normalize.to_s
# => "http://www.go.com/foo?title=My%20Blog%20%26%20Your%20Blog"
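If pulling in Addressable is not an option, a rough stdlib-only sketch can produce the same string for this case. percent_encoded_query is a hypothetical helper that percent-encodes each key and value with ERB::Util.url_encode before joining them; unlike Addressable, it does not normalize the rest of the URI.

```ruby
require 'uri'
require 'erb'

# Hypothetical helper: percent-encode every key and value (space => "%20"),
# then join the pairs with "=" and "&".
def percent_encoded_query(params)
  params.map do |key, value|
    "#{ERB::Util.url_encode(key.to_s)}=#{ERB::Util.url_encode(value.to_s)}"
  end.join("&")
end

uri = URI.parse("http://www.go.com/foo")
uri.query = percent_encoded_query({ title: "My Blog & Your Blog" })
puts uri
# => http://www.go.com/foo?title=My%20Blog%20%26%20Your%20Blog
```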
Conclusion:
- Do not use URI.escape or similar.
- Use CGI::escape if you only need form escaping.
- If you need to work with URIs, use Addressable; it offers URL encoding and form encoding, and it normalizes URLs.
- If it is a Rails project, check out "How do I URL-escape a string in Rails?"
There were some small differences, but the important point is that URI.escape was deprecated in Ruby 1.9.2 (and removed entirely in Ruby 3.0), so use CGI::escape or ERB::Util.url_encode.
There is a long discussion on ruby-core for those interested, which also mentions WEBrick::HTTPUtils.escape and WEBrick::HTTPUtils.escape_form.
URI.escape takes a second parameter that lets you mark what's unsafe. See APIDock:
http://apidock.com/ruby/CGI/escape/class
http://apidock.com/ruby/URI/Escape/escape
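Since URI.escape is no longer available in current Ruby, here is a hypothetical re-creation (legacy_escape is not a real API) of the "simple gsub" approach, including the second parameter. The default character class is my reconstruction of its unsafe set (everything except RFC 2396 unreserved and reserved characters); treat it as an assumption, not the library's exact definition.

```ruby
# Hypothetical re-creation of the removed URI.escape: one gsub over the whole string.
# Default (assumed): escape everything that is NOT an RFC 2396 unreserved/reserved character.
LEGACY_UNSAFE = /[^\-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]/

def legacy_escape(str, unsafe = LEGACY_UNSAFE)
  # A String argument is treated as a set of characters to escape (assumed to mirror URI.escape).
  unsafe = Regexp.new("[#{Regexp.quote(unsafe)}]") unless unsafe.is_a?(Regexp)
  # Percent-encode every byte of each matched character, so multibyte UTF-8 works too.
  str.gsub(unsafe) { |char| char.bytes.map { |byte| format("%%%02X", byte) }.join }
end

url = 'http://google.com/foo?bar=at#anchor&title=My Blog & Your Blog'
puts legacy_escape(url)
# => http://google.com/foo?bar=at%23anchor&title=My%20Blog%20&%20Your%20Blog

# The second parameter narrows what gets escaped: here only "&" and space.
puts legacy_escape("My Blog & Your Blog", "& ")
# => My%20Blog%20%26%20Your%20Blog
```

The first call reproduces the problem described earlier: the gsub has no idea which & separates parameters and which belongs to a value, so it leaves both alone.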