Regular expression to remove one parameter from query string
/(?<=&|\?)foo(=[^&]*)?(&|$)/
Uses lookbehind and the last group to "anchor" the match, and allows a missing value. Change the \? to ^ if you've already stripped off the question mark from the query string.
Regex is still not a substitute for a real parser of the query string, however.
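For comparison, here is a minimal sketch of the parser route, assuming Python 3 and its standard urllib.parse module. The helper name remove_param is made up for illustration. Note that re-encoding normalizes the string, so a bare key such as "bar" comes back as "bar=".

```python
from urllib.parse import parse_qsl, urlencode

def remove_param(query, name):
    """Return the query string without any parameter called `name`.

    `query` is expected without the leading question mark.
    """
    # keep_blank_values=True keeps parameters like "foo" and "foo=".
    pairs = parse_qsl(query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key != name]
    # urlencode rebuilds the string and re-escapes keys and values.
    return urlencode(kept)

print(remove_param("abc=789&foo=123&bar=456", "foo"))  # abc=789&bar=456
print(remove_param("oopsfoo=123&bar=456", "foo"))      # oopsfoo=123&bar=456
```

Because the comparison is on the decoded key, names that merely contain "foo" are left alone without any anchoring tricks.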
Update: Test script: (run it at codepad.org)
import re

regex = r"(^|(?<=&))foo(=[^&]*)?(&|$)"

# Maps each input query string to the result expected after removing "foo".
cases = {
    "foo=123": "",
    "foo=123&bar=456": "bar=456",
    "bar=456&foo=123": "bar=456",
    "abc=789&foo=123&bar=456": "abc=789&bar=456",
    "oopsfoo=123": "oopsfoo=123",
    "oopsfoo=123&bar=456": "oopsfoo=123&bar=456",
    "bar=456&oopsfoo=123": "bar=456&oopsfoo=123",
    "abc=789&oopsfoo=123&bar=456": "abc=789&oopsfoo=123&bar=456",
    "foo": "",
    "foo&bar=456": "bar=456",
    "bar=456&foo": "bar=456",
    "abc=789&foo&bar=456": "abc=789&bar=456",
    "foo=": "",
    "foo=&bar=456": "bar=456",
    "bar=456&foo=": "bar=456",
    "abc=789&foo=&bar=456": "abc=789&bar=456",
}

failures = 0
for input, expected in cases.items():
    got = re.sub(regex, "", input)
    if got != expected:
        # Parenthesized so the line runs under both Python 2 and Python 3.
        print("failed: input=%r expected=%r got=%r" % (input, expected, got))
        failures += 1
if not failures:
    print("Success")
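It shows where my approach failed: when foo is the last of several parameters, the ampersand in front of it is left behind (for example, "bar=456&foo=123" becomes "bar=456&"). Mark has the right of it, which should show why you shouldn't do this with regex. :P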
The problem is associating the query parameter with exactly one ampersand. If you must use regex (if you haven't picked up on it :P, I'd use a separate parser, which might use regex inside it, but still actually understand the format), one solution would be to make sure there's exactly one ampersand per parameter: replace the leading ? with a &.
This gives /&foo(=[^&]*)?(?=&|$)/, which is very straightforward and the best you're going to get. Remove the leading & in the final result (or change it back into a ?, etc.). Modifying the test case to do this uses the same cases as above, and changes the loop to:
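regex = r"&foo(=[^&]*)?(?=&|$)"  # the pattern given above
failures = 0
for input, expected in cases.items():
    input = "&" + input  # give every parameter exactly one leading ampersand
    got = re.sub(regex, "", input)
    if got[:1] == "&":
        got = got[1:]  # drop the ampersand that was prepended
    if got != expected:
        print("failed: input=%r expected=%r got=%r" % (input, expected, got))
        failures += 1
if not failures:
    print("Success")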
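The same idea can be wrapped in a small helper that works for any parameter name. This is only a sketch: the name strip_param is hypothetical, and re.escape is used on the assumption that the name may contain characters that are special in a regex.

```python
import re

def strip_param(query, name):
    """Remove `name` from `query` (given without the leading question mark)."""
    # Every parameter gets exactly one leading ampersand, so the pattern
    # only ever has to consume the ampersand in front of the match.
    pattern = "&" + re.escape(name) + r"(=[^&]*)?(?=&|$)"
    result = re.sub(pattern, "", "&" + query)
    # Drop the ampersand that was prepended, if anything is left.
    return result[1:] if result.startswith("&") else result

print(strip_param("bar=456&foo=123", "foo"))  # bar=456
print(strip_param("foobar=1", "foo"))         # foobar=1
```

The lookahead at the end is what keeps a longer name such as foobar from being treated as foo.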
If you want to do this in just one regular expression, you could do this:
/&foo(=[^&]*)?|^foo(=[^&]*)?&?/
This is because you need to match either an ampersand before the foo=..., or one after, or neither, but not both.
To be honest, I think it's better the way you did it: removing the trailing ampersand in a separate step.
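A quick check of the single-expression pattern in Python, as a sketch only. One thing it seems to show is that the pattern as written also matches a longer name that starts with foo. The guarded variant below is my own assumption, not part of the answer: it adds a lookahead to the first branch and requires & or end of string in the second.

```python
import re

# The single-expression pattern quoted above.
single = r"&foo(=[^&]*)?|^foo(=[^&]*)?&?"
# Hypothetical variant: the name must be followed by "=", "&" or the end.
guarded = r"&foo(=[^&]*)?(?=&|$)|^foo(=[^&]*)?(&|$)"

def remove_foo(pattern, query):
    """Apply `pattern` to a query string that has no leading question mark."""
    return re.sub(pattern, "", query)

print(remove_foo(single, "bar=456&foo=123"))  # bar=456
print(remove_foo(single, "foo=123&bar=456"))  # bar=456
print(remove_foo(single, "foobar=1"))         # bar=1 (the prefix is eaten)
print(remove_foo(guarded, "foobar=1"))        # foobar=1
```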