How to allow only certain characters in a bash variable
You were close.
You want to check whether the URL contains at least one of the disallowed characters (and then report it as invalid), not at least one of the allowed character.
You can negate a character set in a bracket expression with !
(^
also works with bash
and a few other shells).
In any case, you were right to explicitly list characters individually, using ranges like a-z
, A-Z
, 0-9
would only work (at matching only the 26+26+10 characters you listed) in the C locale. In other locales they could match thousands of other characters or even collating elements made of more than one character (the ones that sort between A
and Z
like É
for instance for A-Z
).
case $URL in
("" | *[!abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890_/\&?:.=-]*)
echo >&2 "That URL is NOT allowed.";;
(*)
echo "That URL is allowed.";;
esac
Your attempt
Your attempt might not work as expected because it considers all URLs as allowed that contain at least one of the allowed characters. Formally, you compare the URL with
<anything><allowed_character><anything>
If this does not match, you reject the URL.
This might help
If you replace your if ... else ... fi
by
if [[ "${URL}" =~ [^A-Za-z0-9\&./=_:?-] ]]; then
echo "That URL is NOT allowed."
else
echo "That URL is allowed."
fi
it might do what you want.
Here, the binary operator =~
is used to find a match of the regular expression [^A-Za-z0-9\&\./=_-:\?]
within "${URL}"
. This operator does not require that the whole string "${URL}"
matches the regular expression, any matching substring will do. Such a match is found for any character that is not allowed in the URL. The "not" comes from the leading caret (^
) in the definition of the character set. Please note that there is no negating !
in the conditional expression any more.
If "${URL}"
contains a forbidden character, the regular expression matches and the compound command [[...]]
evaluates to true (zero exit status).