Python Regex to find a string in double quotes within a string
From https://stackoverflow.com/a/69891301/1531728
My solution is:
import re

my_strings = ['SetVariables "a" "b" "c" ', 'd2efw f "first" +&%#$%"second",vwrfhir, d2e u"third" dwedew', '"uno"?>P>MNUIHUH~!@#$%^&*()_+=0trewq"due" "tre"fef fre f', ' "uno""dos" "tres"', '"unu""doua""trei"', ' "um" "dois" "tres" ']
my_substrings = []
for current_test_string in my_strings:
    for values in re.findall(r'\"(.+?)\"', current_test_string):
        my_substrings.append(values)
        #print("values are:",values,"=")
    print(" my_substrings are:",my_substrings,"=")
    my_substrings = []
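Because re.findall already returns a list, the nested loop above can be collapsed into a list comprehension. A minimal sketch using the same pattern without the backslashes (inside a raw string, \" and " mean the same thing to the regex engine); the name per_string is hypothetical:

```python
import re

my_strings = ['SetVariables "a" "b" "c" ',
              'd2efw f "first" +&%#$%"second",vwrfhir, d2e u"third" dwedew',
              '"uno"?>P>MNUIHUH~!@#$%^&*()_+=0trewq"due" "tre"fef fre f',
              ' "uno""dos" "tres"',
              '"unu""doua""trei"',
              ' "um" "dois" "tres" ']

# One list of quoted substrings per input string; "(.+?)" captures the
# shortest non-empty run of characters between two double quotes.
per_string = [re.findall(r'"(.+?)"', s) for s in my_strings]

for substrings in per_string:
    print(" my_substrings are:", substrings, "=")
```

This prints the same six lines as the loop version, without the need to reset an accumulator list.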
Alternate regular expressions to use are:
- re.findall('"(.+?)"', current_test_string) [Avinash2021] [user17405772021]
- re.findall('"(.*?)"', current_test_string) [Shelvington2020]
- re.findall(r'"(.*?)"', current_test_string) [Lundberg2012] [Avinash2021]
- re.findall(r'"(.+?)"', current_test_string) [Lundberg2012] [Avinash2021]
- re.findall(r'"["]', current_test_string) # As written, this matches only two adjacent double quotes (""), not the text between them. [Muthupandi2019]
- re.findall(r'"([^"]*)"', current_test_string) [Pieters2014]
- re.findall(r'"(?:(?:(?!(?<!\\)").)*)"', current_test_string) # Causes double quotes to remain in the strings, but can be removed via other means. [Booboo2020]
- re.findall(r'"(.*?)(?<!\\)"', current_test_string) [Hassan2014]
- re.findall('"[^"]*"', current_test_string) # Causes double quotes to remain in the strings, but can be removed via other means. [Martelli2013]
- re.findall('"([^"]*)"', current_test_string) [jspcal2014]
- re.findall("'(.*?)'", current_test_string) # Extracts text between single quotes instead of double quotes. [akhilmd2016]
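The patterns above differ mainly in whether they contain a capturing group. With one group, re.findall returns only the group's text; with no group, it returns the whole match, quotes included, which is why two of the entries are marked as keeping the double quotes. A short comparison on a made-up sample string, using slicing as one possible "other means" of dropping the quotes:

```python
import re

sample = 'say "hello" and "bye"'

# With a capturing group, findall returns just the text inside the quotes.
inner = re.findall(r'"([^"]*)"', sample)

# Without a group, findall returns the whole match, quotes included.
whole = re.findall(r'"[^"]*"', sample)

# One way to drop the surrounding quotes afterwards: slice off first and last character.
stripped = [m[1:-1] for m in whole]

print(inner)     # ['hello', 'bye']
print(whole)     # ['"hello"', '"bye"']
print(stripped)  # ['hello', 'bye']
```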
The current_test_string.split("\"") approach works only if every substring of interest is embedded within quotation marks. It uses the double quotation mark as a delimiter to tokenize the string, so it also returns the pieces that are not embedded within double quotation marks (including empty strings between adjacent quotes) as if they were valid extractions.
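A sketch of the split approach, assuming the quotes are balanced and never escaped. Splitting on the double quote alternates between text outside and inside quotes, so the quoted pieces sit at the odd indices. Taking every second element from index 1 ([1::2]) is a common refinement, not something the text above describes:

```python
text = 'd2efw f "first" +&%#$%"second",vwrfhir, d2e u"third" dwedew'

# Splitting on the double quote alternates: outside, inside, outside, ...
parts = text.split('"')
print(parts)
# ['d2efw f ', 'first', ' +&%#$%', 'second', ',vwrfhir, d2e u', 'third', ' dwedew']

# Assuming balanced, unescaped quotes, the quoted pieces sit at the odd indices.
quoted = parts[1::2]
print(quoted)  # ['first', 'second', 'third']

# Adjacent quotes produce empty "outside" pieces, which [1::2] skips.
adjacent = '"unu""doua""trei"'.split('"')
print(adjacent)        # ['', 'unu', '', 'doua', '', 'trei', '']
print(adjacent[1::2])  # ['unu', 'doua', 'trei']
```

An escaped quote (\") inside a value breaks the odd/even alternation, so this shortcut is only safe for simple inputs.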
References:
- [Avinash2021] Arvind Kumar Avinash, Answer to "Extract text between quotation using regex python", Stack Exchange Inc., New York, NY, October 12, 2021. Available online at: https://stackoverflow.com/a/69543129/1531728 (last accessed November 8, 2021).
- [user17405772021] user1740577, Answer to "Extract text between quotation using regex python", Stack Exchange Inc., New York, NY, October 12, 2021. Available online at: https://stackoverflow.com/a/69543030/1531728 (last accessed November 8, 2021).
- [Shelvington2020] Iain Shelvington, Answer to "Extracting only words out of a mixed string in Python [duplicate]", Stack Exchange Inc., New York, NY, January 5, 2020. Available online at: https://stackoverflow.com/a/59598630/1531728 (last accessed November 6, 2021).
- [Lundberg2012] Johan Lundberg, Answer to "Python Regex to find a string in double quotes within a string", Stack Exchange Inc., New York, NY, March 1, 2012. Available online at: https://stackoverflow.com/a/9519934/1531728 (last accessed November 6, 2021).
- [Muthupandi2019] Daniel Muthupandi and trotta, Answer to "Python Regex to find a string in double quotes within a string", Stack Exchange Inc., New York, NY, August 3, 2019. Available online at: https://stackoverflow.com/a/57337020/1531728 (last accessed November 6, 2021).
- [Booboo2020] Booboo, Answer to "Python Regex to find a string in double quotes within a string", Stack Exchange Inc., New York, NY, March 29, 2014. Available online at: https://stackoverflow.com/a/63707053/1531728 (last accessed November 6, 2021).
- [Pieters2014] Martijn Pieters, Answer to "Extract a string between double quotes", Stack Exchange Inc., New York, NY, March 29, 2014. Available online at: https://stackoverflow.com/a/22735466/1531728 (last accessed November 6, 2021).
- [Hassan2014] Sabuj Hassan, Answer to "Extract a string between double quotes", Stack Exchange Inc., New York, NY, March 29, 2014. Available online at: https://stackoverflow.com/a/22735480/1531728 (last accessed November 6, 2021).
- [Martelli2013] Alex Martelli and Sumit Singh, Answer to "Extract string from between quotations", Stack Exchange Inc., New York, NY, March 14, 2014. Available online at: https://stackoverflow.com/a/2076357/1531728 (last accessed November 6, 2021).
- [jspcal2014] jspcal, Answer to "Extract string from between quotations", Stack Exchange Inc., New York, NY, March 14, 2014. Available online at: https://stackoverflow.com/a/2076356/1531728 (last accessed November 6, 2021).
- [akhilmd2016] akhilmd, Answer to "Stripping string in python between quotes", Stack Exchange Inc., New York, NY, July 2, 2016. Available online at: https://stackoverflow.com/a/38161072/1531728 (last accessed November 5, 2021).
Try fetching the double-quoted strings from a multiline string:
import re
s = """
"my name is daniel" "mobile 8531111453733"[[[[[[--"i like pandas"
"location chennai"! -asfas"aadhaar du2mmy8969769##69869"
@4343453 "pincode 642002""@mango,@apple,@berry"
"""
print(re.findall(r'"(.*?)"', s))
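One caveat with "(.*?)": by default . does not match a newline, so a quoted value that spans two lines is not found, and the engine may pair the wrong quotes instead. A sketch with a hypothetical input showing two workarounds, the re.DOTALL flag or the negated class [^"], which matches any character except a quote (newlines included):

```python
import re

# Hypothetical input: the first quoted value contains a newline.
s = 'city "new\nyork" zip "642002"'

# Default: "." does not match "\n", so the first opening quote cannot be closed
# and the engine ends up pairing the wrong quotes.
default = re.findall(r'"(.*?)"', s)

# re.DOTALL lets "." match newlines as well.
dotall = re.findall(r'"(.*?)"', s, re.DOTALL)

# A negated class [^"] matches any character except a quote, newlines included.
negated = re.findall(r'"([^"]*)"', s)

print(default)  # [' zip ']
print(dotall)   # ['new\nyork', '642002']
print(negated)  # ['new\nyork', '642002']
```

In the multiline example above every quoted value stays on one line, so the default behaviour is sufficient there.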
Here's all you need to do:
import re

def doit(text):
    matches = re.findall(r'"(.+?)"', text)
    # matches is now ['String 1', 'String 2', 'String3']
    return ",".join(matches)

doit('Regex should return "String 1" or "String 2" or "String3" ')
Result:
'String 1,String 2,String3'
As pointed out by Li-aung Yip:
To elaborate, .+? is the "non-greedy" version of .+. It makes the regular expression match the smallest number of characters it can instead of the most characters it can. The greedy version, .+, will give String 1" or "String 2" or "String3; the non-greedy version, .+?, gives String 1, String 2, String3.
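A quick check of the greedy/non-greedy difference on the same input that doit uses above:

```python
import re

text = 'Regex should return "String 1" or "String 2" or "String3" '

greedy = re.findall(r'"(.+)"', text)   # .+  runs to the LAST quote it can reach
lazy = re.findall(r'"(.+?)"', text)    # .+? stops at the FIRST closing quote

print(greedy)  # ['String 1" or "String 2" or "String3']
print(lazy)    # ['String 1', 'String 2', 'String3']
```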
In addition, if you want to accept empty strings, change .+ to .*. Star * means zero or more, while plus + means at least one.
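Note that with .+? empty quotes are not simply skipped: because .+? must consume at least one character, it can swallow the closing quote of "" and pair the wrong quotes. A sketch on a hypothetical input:

```python
import re

text = 'a "" b "x" c'

plus = re.findall(r'"(.+?)"', text)  # at least one character between the quotes
star = re.findall(r'"(.*?)"', text)  # zero or more characters between the quotes

print(plus)  # ['" b ']   (the second quote of "" was consumed as content)
print(star)  # ['', 'x']
```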
The highly up-voted answer doesn't account for the possibility that the double-quoted string might contain one or more double-quote characters (properly escaped, of course). To handle this situation, the regex needs to accumulate characters one by one with a negative lookahead assertion stating that the current character is not a double-quote character that is not preceded by a backslash (which itself requires a nested negative lookbehind assertion):
"(?:(?:(?!(?<!\\)").)*)"
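import re
import ast

def doit(text):
    # Only non-capturing groups, so each match keeps its surrounding quotes.
    matches = re.findall(r'"(?:(?:(?!(?<!\\)").)*)"', text)
    for match in matches:
        # For these inputs each match is a valid Python string literal,
        # so literal_eval removes the quotes and resolves the escapes.
        print(match, '=>', ast.literal_eval(match))

doit('Regex should return "String 1" or "String 2" or "String3" and "\\"double quoted string\\"" ')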
Prints:
"String 1" => String 1
"String 2" => String 2
"String3" => String3
"\"double quoted string\"" => "double quoted string"
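One limitation of the lookbehind: it only inspects the single character before a quote, so a value that ends in an escaped backslash (\\ right before the closing quote) is misread. A common alternative, not from the answer above, consumes either a character that is neither a quote nor a backslash, or a backslash together with the character it escapes. A sketch comparing both on a hypothetical input:

```python
import re

# Hypothetical input (raw string), i.e. the characters:  "d\\" and "e"
# The first quoted value ends with an escaped backslash.
text = r'"d\\" and "e"'

# Pattern from the answer above: a quote counts as closing only if the
# single character before it is not a backslash.
lookbehind = r'"(?:(?:(?!(?<!\\)").)*)"'

# Alternative: either a non-quote, non-backslash character,
# or a backslash plus whatever character follows it.
escape_aware = r'"((?:[^"\\]|\\.)*)"'

old = re.findall(lookbehind, text)
new = re.findall(escape_aware, text)

# old holds one match running from the first quote to the quote before e.
# new holds the two values, d\\ and e, with their escapes left in place.
print(old)
print(new)
```

Because escape_aware has a capturing group, it returns the contents without the quotes; the escapes inside are still left for a later step such as ast.literal_eval on the quoted form.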