How to replace all percent-encoded UTF-8 substrings with plain UTF-8 text?
With bash, zsh, GNU echo, or some implementations of ksh on some systems, this can be decoded simply with echo -e after replacing every % with \x.
url_encoded_string="%D1%80%D0%B5%D1%81%D1%83%D1%80%D1%81%D1%8B"
temp_string=${url_encoded_string//%/\\x}
printf '%s\n' "$temp_string"
# output: \xD1\x80\xD0\xB5\xD1\x81\xD1\x83\xD1\x80\xD1\x81\xD1\x8B
echo -e "$temp_string"
# output: ресурсы
(This assumes the string contains no backslash characters and is not itself one of the options supported by your echo command.)
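As @JoshLee also points out, the "echo caveat" can be avoided by running printf directly after the first command (the assignment), in place of the remaining steps. Quote the expansion so it is not subject to word splitting or globbing:
printf "${url_encoded_string//%/\\x}"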
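The printf approach can be wrapped in a small function. The following is a minimal bash sketch, not part of the original answer: the name urldecode is hypothetical, and it assumes the input contains no literal backslashes and that every % starts a two-digit hex escape. It passes the data as an argument to the %b conversion rather than as the format string, and it does not convert + to a space.

```shell
#!/usr/bin/env bash
# urldecode: hypothetical helper name, used here only for illustration.
# Relies on bash's ${var//pattern/replacement} expansion and on bash's
# printf %b handling of \xHH, so this is not portable POSIX sh.
urldecode() {
  local encoded=$1
  # Turn every % into \x, so %D1 becomes \xD1.
  local escaped=${encoded//%/\\x}
  # %b expands backslash escapes in the argument; keeping the data out of
  # the format string means it is never parsed as printf directives.
  printf '%b\n' "$escaped"
}

urldecode '%D1%80%D0%B5%D1%81%D1%83%D1%80%D1%81%D1%8B'
# output: ресурсы
```

A typical call would be decoded=$(urldecode "$url_encoded_string").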
With perl:
perl -pe 's/%([0-9A-F]{2})/pack"H2",$1/gei'
Or with URI::Escape:
perl -MURI::Escape -pe '$_=uri_unescape$_'