Decoding URL encoding (percent encoding)

Found these Python one-liners that do what you want:

Python2

$ alias urldecode='python -c "import sys, urllib as ul; \
    print ul.unquote_plus(sys.argv[1])"'

$ alias urlencode='python -c "import sys, urllib as ul; \
    print ul.quote_plus(sys.argv[1])"'

Python3

$ alias urldecode='python3 -c "import sys, urllib.parse as ul; \
    print(ul.unquote_plus(sys.argv[1]))"'

$ alias urlencode='python3 -c "import sys, urllib.parse as ul; \
    print(ul.quote_plus(sys.argv[1]))"'

Example

$ urldecode 'q+werty%3D%2F%3B'
q werty=/;

$ urlencode 'q werty=/;'
q+werty%3D%2F%3B
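By default, bash expands aliases only in interactive shells (a script would need shopt -s expand_aliases), so if you want the same thing inside a script, a function is the safer wrapper. Here is a minimal sketch, assuming python3 is on your PATH. The names urldecode_py and urlencode_py are made up for this example:

```shell
#!/usr/bin/env bash
# Function versions of the Python 3 aliases above (hypothetical names).
# Unlike aliases, functions also work in non-interactive scripts.
urldecode_py() {
    python3 -c 'import sys, urllib.parse as ul; print(ul.unquote_plus(sys.argv[1]))' "$1"
}

urlencode_py() {
    python3 -c 'import sys, urllib.parse as ul; print(ul.quote_plus(sys.argv[1]))' "$1"
}

urldecode_py 'q+werty%3D%2F%3B'   # prints: q werty=/;
urlencode_py 'q werty=/;'         # prints: q+werty%3D%2F%3B
```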

References

  • Urlencode and urldecode from a command line

sed

Try the following command line:

$ sed 's@+@ @g;s@%@\\x@g' file | xargs -0 printf "%b"

or the following alternative using echo -e:

$ sed -e's/%\([0-9A-F][0-9A-F]\)/\\\\\x\1/g' file | xargs echo -e

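Note: The echo -e variant does not convert + to spaces and only matches uppercase hex digits. Because xargs without -0 splits its input on whitespace, it also joins all lines into one, eating the newlines.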
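To see what the sed + printf pipeline actually does, here it is split into its two steps as shell functions. The names url_to_escapes and escapes_to_text are illustrative, not standard tools. This is a sketch assuming bash, whose printf understands \xHH inside %b. POSIX does not require \x support, so other printf implementations may not handle it.

```shell
#!/usr/bin/env bash
# Step 1 (hypothetical helper): sed turns "+" into a space and every "%"
# into "\x", so "%3D" becomes the escape sequence "\x3D".
url_to_escapes() {
    sed 's@+@ @g;s@%@\\x@g'
}

# Step 2 (hypothetical helper): bash's printf %b expands the \xHH escapes
# into the bytes they stand for. $(cat) drops the trailing newline.
escapes_to_text() {
    printf '%b' "$(cat)"
}

printf '%s\n' 'q+werty%3D%2F%3B' | url_to_escapes                    # prints: q werty\x3D\x2F\x3B
printf '%s\n' 'q+werty%3D%2F%3B' | url_to_escapes | escapes_to_text  # prints: q werty=/;
```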


You may define it as an alias and add it to your shell rc files:

$ alias urldecode='sed "s@+@ @g;s@%@\\\\x@g" | xargs -0 printf "%b"'

Then, whenever you need it, simply run:

$ echo "http%3A%2F%2Fwww" | urldecode
http://www

Bash

When scripting, you can use the following syntax:

input="http%3A%2F%2Fwww"
decoded=$(printf '%b' "${input//%/\\x}")

However, the above syntax won't handle pluses (+) correctly, so you have to replace them with spaces, either via sed or, as suggested by @isaac, with the following syntax:

decoded=$(input=${input//+/ }; printf "${input//%/\\x}")
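Here is the same idea spelled out one expansion at a time, using the sample value from the Python example. The intermediate variable names step1 and step2 are only for illustration:

```shell
#!/usr/bin/env bash
input='q+werty%3D%2F%3B'

# Replace every "+" with a space:  q werty%3D%2F%3B
step1="${input//+/ }"

# Replace every "%" with "\x":     q werty\x3D\x2F\x3B
step2="${step1//%/\\x}"

# Let printf %b expand the \xHH escapes, storing the result in $decoded
printf -v decoded '%b' "$step2"

printf '%s\n' "$decoded"   # prints: q werty=/;
```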

You can also use the following urlencode() and urldecode() functions:

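urlencode() {
    # urlencode <string>
    # Note: non-ASCII characters are encoded by code point rather than as
    # UTF-8 bytes; the xxd variant below handles them byte by byte.
    local length="${#1}" i c
    for (( i = 0; i < length; i++ )); do
        c="${1:i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
            *) printf '%%%02X' "'$c" ;;
        esac
    done
}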

urldecode() {
    # urldecode <string>

    local url_encoded="${1//+/ }"
    printf '%b' "${url_encoded//%/\\x}"
}

Note that the above urldecode() assumes the data contains no backslashes.
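If your data may contain backslashes, one way around this is to skip printf's %b entirely and decode each %XX sequence individually. Below is a minimal bash sketch (the name urldecode_safe is hypothetical). It treats + as a space and decodes upper- or lowercase hex. Everything else, including backslashes and malformed % sequences, is copied through unchanged.

```shell
#!/usr/bin/env bash
# urldecode_safe <string>: hypothetical name. Decodes "+" and %XX escapes
# without passing the data itself through printf %b, so backslashes survive.
urldecode_safe() {
    local s="$1" i=0 c hex
    while (( i < ${#s} )); do
        c="${s:i:1}"
        if [[ $c == "+" ]]; then
            printf ' '
        elif [[ $c == "%" && ${s:i+1:2} == [[:xdigit:]][[:xdigit:]] ]]; then
            hex="${s:i+1:2}"
            # The format string "\xHH" makes printf emit that single byte
            printf "\\x$hex"
            (( i += 2 ))
        else
            # Literal character (including "\" or a stray "%"): copy as-is
            printf '%s' "$c"
        fi
        (( i += 1 ))
    done
}
```

For example, urldecode_safe 'C:\temp%5Cfile' prints C:\temp\file, whereas the %b-based version would turn \t into a tab. It is slower than the one-line version because it loops over every character, which only matters for large inputs.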

A similar version by Joel can be found at: https://github.com/sixarm/urldecode.sh


bash + xxd

Bash function using the xxd tool:

urlencode() {
  local length="${#1}" i c
  for (( i = 0; i < length; i++ )); do
    c="${1:i:1}"
    case $c in
      [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
      # xxd -p -c1 prints each byte of $c as two hex digits on its own line
      *) printf '%s' "$c" | xxd -p -c1 | while read -r x; do printf '%%%s' "$x"; done ;;
    esac
  done
}

Found in cdown's gist file, and also on Stack Overflow.
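If xxd isn't installed (it is normally distributed with Vim), the POSIX od utility can dump the bytes instead. Here is a sketch assuming bash 4+ for the ${h^^} uppercase expansion. urlencode_od is a made-up name. Like the functions above, it encodes a space as %20.

```shell
#!/usr/bin/env bash
# urlencode_od <string>: hypothetical name. Byte-wise percent-encoding
# using POSIX od instead of xxd, so UTF-8 characters become %XX per byte.
urlencode_od() {
    local h v
    # -An: no offsets, -v: don't collapse repeated lines, -tx1: one hex byte per word
    for h in $(printf '%s' "$1" | od -An -v -tx1); do
        v=$(( 16#$h ))
        # Unreserved characters: 0-9 A-Z a-z - . _ ~
        if (( (v >= 0x30 && v <= 0x39) || (v >= 0x41 && v <= 0x5A) ||
              (v >= 0x61 && v <= 0x7A) || v == 0x2D || v == 0x2E ||
              v == 0x5F || v == 0x7E )); then
            printf "\\x$h"            # emit the byte itself
        else
            printf '%%%s' "${h^^}"    # everything else: %XX in uppercase hex
        fi
    done
}
```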


PHP

Using PHP you can try the following command:

$ echo oil+and+gas | php -r 'echo urldecode(fgets(STDIN));'  # Or read from php://stdin
oil and gas

or just:

php -r 'echo urldecode("oil+and+gas");'

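Use -R for multi-line input; it runs the code once per input line, with the current line available in $argn:

php -R 'echo urldecode($argn), PHP_EOL;' < file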


Perl

In Perl you can use URI::Escape.

decoded_url=$(perl -MURI::Escape -e 'print uri_unescape($ARGV[0])' "$encoded_url")

Or to process a file:

perl -pi -MURI::Escape -e '$_ = uri_unescape($_)' file

awk

Try the following solution (by anon):

awk -niord '{printf RT?$0chr("0x"substr(RT,2)):$0}' RS=%..

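Note: This needs GNU awk. The -n (--non-decimal-data) option, -i ord (which loads the chr() function from gawk's bundled ord.awk library) and the RT variable are all gawk-specific.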

See: Using awk printf to urldecode text.

Decoding file names

If you need to remove URL encoding from file names, use the deurlname tool from renameutils (e.g. deurlname *.*).
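Without renameutils, you can put together a rough bash equivalent from the printf '%b' trick in the Bash section. This is a sketch only: deurl_rename is a hypothetical name, and it is meant for names in the current directory (e.g. deurl_rename *). It decodes %XX but leaves + alone, and it assumes the names contain no backslashes. It refuses to rename when the decoded name contains a slash or already exists.

```shell
#!/usr/bin/env bash
# deurl_rename FILE...: hypothetical helper that renames files whose names
# contain %XX escapes. Assumes names contain no backslashes, since printf %b
# would interpret them; "+" is left as-is.
deurl_rename() {
    local f new
    for f in "$@"; do
        new=$(printf '%b' "${f//%/\\x}")
        [[ $new == "$f" ]] && continue            # nothing to decode
        if [[ $new == */* ]]; then                # %2F would move the file elsewhere
            printf 'skipping %s: decoded name contains "/"\n' "$f" >&2
            continue
        fi
        if [[ -e $new ]]; then                    # never overwrite an existing file
            printf 'skipping %s: %s already exists\n' "$f" "$new" >&2
            continue
        fi
        mv -- "$f" "$new"
    done
}
```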

See also:

  • Can wget decode uri file names when downloading in batch?
  • How to remove URI encoding from file names?

Related:

  • How to decode URL-encoded string in shell? at SO
  • How can I encode and decode percent-encoded strings on the command line? at Ask Ubuntu

There is a built-in function for that in the Python standard library. In Python 2, it's urllib.unquote.

decoded_url=$(python2 -c 'import sys, urllib; print urllib.unquote(sys.argv[1])' "$encoded_url")

Or to process a file:

python2 -c 'import sys, urllib; print urllib.unquote(sys.stdin.read())' <file >file.new &&
mv -f file.new file

In Python 3, it's urllib.parse.unquote.

decoded_url=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))' "$encoded_url")

Or to process a file:

python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.stdin.read()))' <file >file.new &&
mv -f file.new file

If you want to stick to POSIX portable tools, it's awkward, because the only serious candidate is awk, which doesn't parse hexadecimal numbers. See Using awk printf to urldecode text for examples with common awk implementations, including BusyBox.
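For reference, here is a lookup-table sketch in plain POSIX awk that avoids hex parsing altogether by looking up each digit with index(). It has not been tested against every awk implementation. LC_ALL=C is set because, in a UTF-8 locale, gawk's %c otherwise emits a multibyte character for values above 127. Like Python's unquote, it decodes only %XX, not +. The function name urldecode_awk is hypothetical.

```shell
#!/usr/bin/env bash
# urldecode_awk: hypothetical name. Reads lines on stdin and decodes %XX
# escapes using only POSIX awk functions (index, substr, toupper, sprintf).
urldecode_awk() {
    LC_ALL=C awk '
    BEGIN { hexdigits = "0123456789ABCDEF" }
    {
        out = ""; rest = $0
        while ((i = index(rest, "%")) > 0) {
            hex = toupper(substr(rest, i + 1, 2))
            # index() returns 1..16 for a valid hex digit, 0 otherwise
            d1 = index(hexdigits, substr(hex, 1, 1)) - 1
            d2 = index(hexdigits, substr(hex, 2, 1)) - 1
            if (length(hex) == 2 && d1 >= 0 && d2 >= 0) {
                out = out substr(rest, 1, i - 1) sprintf("%c", d1 * 16 + d2)
                rest = substr(rest, i + 3)
            } else {
                # not a valid escape: keep the "%" literally
                out = out substr(rest, 1, i)
                rest = substr(rest, i + 1)
            }
        }
        print out rest
    }'
}
```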