Decoding URL encoding (percent encoding)
These Python one-liners do what you want:
Python2
$ alias urldecode='python -c "import sys, urllib as ul; \
print ul.unquote_plus(sys.argv[1])"'
$ alias urlencode='python -c "import sys, urllib as ul; \
print ul.quote_plus(sys.argv[1])"'
Python3
$ alias urldecode='python3 -c "import sys, urllib.parse as ul; \
print(ul.unquote_plus(sys.argv[1]))"'
$ alias urlencode='python3 -c "import sys, urllib.parse as ul; \
print(ul.quote_plus(sys.argv[1]))"'
Example
$ urldecode 'q+werty%3D%2F%3B'
q werty=/;
$ urlencode 'q werty=/;'
q+werty%3D%2F%3B
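The aliases above only decode their first argument. If you also want to use the decoder in a pipeline, a small wrapper function can fall back to standard input when no argument is given. This is a minimal sketch, assuming python3 is on your PATH; the name urldecode_py is hypothetical and chosen here so it does not clash with the aliases:

```shell
# Hypothetical helper (not from the original one-liners):
# decode each argument, or each line of stdin when there are no arguments.
urldecode_py() {
    if [ "$#" -gt 0 ]; then
        printf '%s\n' "$@"      # one argument per line
    else
        cat                     # no arguments: pass stdin through
    fi | python3 -c '
import sys, urllib.parse as ul
for line in sys.stdin:
    sys.stdout.write(ul.unquote_plus(line))
'
}
```

Both `urldecode_py 'q+werty%3D%2F%3B'` and `echo 'q+werty%3D%2F%3B' | urldecode_py` should then print the decoded string, one result per line.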
References
- Urlencode and urldecode from a command line
sed
Try the following command line:
$ sed 's@+@ @g;s@%@\\x@g' file | xargs -0 printf "%b"
or the following alternative using echo -e:
$ sed -e's/%\([0-9A-F][0-9A-F]\)/\\\\\x\1/g' file | xargs echo -e
Note: The above syntax may not convert + to spaces, and can eat all the newlines.
You can define it as an alias and add it to your shell rc files:
$ alias urldecode='sed "s@+@ @g;s@%@\\\\x@g" | xargs -0 printf "%b"'
Then, whenever you need it, simply run:
$ echo "http%3A%2F%2Fwww" | urldecode
http://www
Bash
When scripting, you can use the following syntax:
input="http%3A%2F%2Fwww"
decoded=$(printf '%b' "${input//%/\\x}")
However, the above syntax won't handle pluses (+) correctly, so you have to replace them with spaces via sed or, as suggested by @isaac, use the following syntax:
decoded=$(input=${input//+/ }; printf "${input//%/\\x}")
You can also use the following urlencode() and urldecode() functions:
urlencode() {
    # urlencode <string>
    local length="${#1}"
    local i
    for (( i = 0; i < length; i++ )); do
        local c="${1:i:1}"
        case $c in
            # Unreserved characters are passed through unchanged
            [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
            # Everything else becomes %XX
            *) printf '%%%02X' "'$c" ;;
        esac
    done
}
urldecode() {
    # urldecode <string>
    local url_encoded="${1//+/ }"
    printf '%b' "${url_encoded//%/\\x}"
}
Note that the above urldecode() assumes the data contains no backslash.
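The backslash caveat can be worked around by doubling any literal backslashes before the string reaches printf '%b'. A minimal sketch, assuming Bash; the name urldecode_safe is hypothetical and not part of the functions above:

```shell
# Hypothetical variant of urldecode() that tolerates literal backslashes (Bash only).
urldecode_safe() {
    # urldecode_safe <string>
    local s="${1//+/ }"     # "+" encodes a space in form data
    s="${s//\\/\\\\}"       # double each literal backslash so %b leaves it alone
    printf '%b' "${s//%/\\x}"   # turn %XX into \xXX and let printf expand it
}
```

For example, `urldecode_safe 'C%3A\new+dir'` prints `C:\new dir`, whereas the plain urldecode() would expand the `\n` into a newline.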
Here is Joel's similar version, found at: https://github.com/sixarm/urldecode.sh
bash + xxd
A Bash function using the xxd tool:
urlencode() {
    local length="${#1}"
    for (( i = 0; i < length; i++ )); do
        local c="${1:i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
            # Use '%s' so that "%" and "\" in the input are not treated as a printf format
            *) printf '%s' "$c" | xxd -p -c1 | while read -r x; do printf '%%%s' "$x"; done ;;
        esac
    done
}
Found in cdown's gist file, and also on Stack Overflow.
PHP
Using PHP you can try the following command:
$ echo oil+and+gas | php -r 'echo urldecode(fgets(STDIN));' # Or: php://stdin
oil and gas
or just:
php -r 'echo urldecode("oil+and+gas");'
Use -R for multi-line input.
Perl
In Perl you can use URI::Escape.
decoded_url=$(perl -MURI::Escape -e 'print uri_unescape($ARGV[0])' "$encoded_url")
Or to process a file:
perl -pi -MURI::Escape -e '$_ = uri_unescape($_)' file
awk
Try anon's solution:
awk -niord '{printf RT?$0chr("0x"substr(RT,2)):$0}' RS=%..
Note: The -n parameter is specific to GNU awk.
See: Using awk printf to urldecode text.
decoding file names
If you need to remove URL encoding from file names, use the deurlname tool from renameutils (e.g. deurlname *.*).
See also:
- Can wget decode uri file names when downloading in batch?
- How to remove URI encoding from file names?
Related:
- How to decode URL-encoded string in shell? at SO
- How can I encode and decode percent-encoded strings on the command line? at Ask Ubuntu
There is a built-in function for that in the Python standard library. In Python 2, it's urllib.unquote.
decoded_url=$(python2 -c 'import sys, urllib; print urllib.unquote(sys.argv[1])' "$encoded_url")
Or to process a file:
python2 -c 'import sys, urllib; print urllib.unquote(sys.stdin.read())' <file >file.new &&
mv -f file.new file
In Python 3, it's urllib.parse.unquote.
decoded_url=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))' "$encoded_url")
Or to process a file:
python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.stdin.read()))' <file >file.new &&
mv -f file.new file
In Perl you can use URI::Escape.
decoded_url=$(perl -MURI::Escape -e 'print uri_unescape($ARGV[0])' "$encoded_url")
Or to process a file:
perl -pi -MURI::Escape -e '$_ = uri_unescape($_)' file
If you want to stick to POSIX portable tools, it's awkward, because the only serious candidate is awk, which doesn't parse hexadecimal numbers. See Using awk printf to urldecode text for examples with common awk implementations, including BusyBox.
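One way around the missing hexadecimal support is a lookup table built in a BEGIN block. The following is only a sketch, not taken from the linked question: it uses standard awk functions (match, substr, gsub, sprintf), the name urldecode_awk is hypothetical, and it assumes a byte-oriented locale (hence LC_ALL=C) and input without %00.

```shell
# Hypothetical filter: URL-decode stdin line by line using only standard awk features.
urldecode_awk() {
    LC_ALL=C awk '
    BEGIN {
        # Map every hex digit (both cases) to its numeric value.
        for (i = 0; i < 16; i++) {
            d = substr("0123456789abcdef", i + 1, 1)
            hex[d] = i
            hex[toupper(d)] = i
        }
    }
    {
        gsub(/\+/, " ")   # form-style encoding: "+" means space
        out = ""
        # Consume the line left to right, one %XX escape at a time.
        while (match($0, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
            code = hex[substr($0, RSTART + 1, 1)] * 16 + hex[substr($0, RSTART + 2, 1)]
            out = out substr($0, 1, RSTART - 1) sprintf("%c", code)
            $0 = substr($0, RSTART + 3)
        }
        print out $0
    }'
}
```

Used as a filter, `echo 'q+werty%3D%2F%3B' | urldecode_awk` prints `q werty=/;`. Drop the gsub line if "+" should be left untouched.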