BeautifulSoup: RuntimeError: maximum recursion depth exceeded
I'm unsure why this works (I haven't examined the source), but adding `.text` or `.get_text()` seems to bypass the error for me.
For instance, changing `lambda x: BeautifulSoup(x, 'html.parser')` to `lambda x: BeautifulSoup(x, 'html.parser').get_text()` seems to work without throwing a recursion depth error.
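Here is a minimal sketch of that workaround; the document of unclosed `<div>` tags is a hypothetical input constructed to be deeply nested:

```python
from bs4 import BeautifulSoup

# Hypothetical input: 2000 unclosed <div> tags parse as deeply nested elements
doc = '<div>' * 2000
soup = BeautifulSoup(doc, 'html.parser')

try:
    str(soup)  # rendering the tree recurses once per nesting level
except RecursionError:  # RuntimeError: maximum recursion depth exceeded in Python 2
    print('str(soup) hit the recursion limit')

print(len(soup.get_text()))  # 0 -- succeeds without a recursion error
```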
I had encountered this problem too and browsed a lot of web pages. I can summarize two methods to solve it.

First, though, we should understand why it happens. Python limits recursion depth (the default is 1000); we can check this number with `print(sys.getrecursionlimit())`. I guess that BeautifulSoup uses recursion to find child elements, so when the recursion goes deeper than the limit allows, `RuntimeError: maximum recursion depth exceeded` (`RecursionError` in Python 3) will appear.
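The limit itself is easy to demonstrate without BeautifulSoup:

```python
import sys

print(sys.getrecursionlimit())  # usually 1000 on CPython

def recurse(n):
    return recurse(n + 1)

try:
    recurse(0)
except RecursionError:  # a subclass of RuntimeError since Python 3.5
    print('hit the interpreter recursion limit')
```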
First method: use `sys.setrecursionlimit()` to raise the recursion limit. You can obviously set it to 1000000, but that may cause a segmentation fault.
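A minimal sketch of this approach (10000 is an arbitrary value, not a recommendation):

```python
import sys
from bs4 import BeautifulSoup

sys.setrecursionlimit(10000)  # raise with care: too high a value can crash the interpreter

soup = BeautifulSoup('<div>' * 2000, 'html.parser')
print(len(str(soup)))  # renders without a RecursionError under the raised limit
```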
Second method: use `try-except`. If `maximum recursion depth exceeded` appears, our algorithm might have a problem. Generally speaking, we can use loops instead of recursion. In your question, we could clean up the HTML with `replace()` or a regular expression in advance, as in the sketch below.
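A sketch of that pre-processing idea; the run of repeated `<br>` tags is a hypothetical pathological input, and the exact regex depends on your document:

```python
import re
from bs4 import BeautifulSoup

html = '<br>' * 5000  # hypothetical pathological input

# Collapse runs of repeated <br> tags into a single tag before parsing
cleaned = re.sub(r'(?:<br\s*/?>)+', '<br>', html)

try:
    soup = BeautifulSoup(cleaned, 'html.parser')
    print(soup)
except RecursionError:  # RuntimeError in Python 2
    print('failed')
```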
Finally, here is a complete example.
```python
from bs4 import BeautifulSoup
import sys
# sys.setrecursionlimit(10000)

try:
    # Unclosed <div> tags parse as nested elements; recent bs4 treats <br>
    # as a void tag, so it would not nest here
    doc = ''.join(['<div>' for x in range(1000)])
    soup = BeautifulSoup(doc, 'html.parser')
    a = soup.find('div')
    for i in a:
        print(i)
except RecursionError:  # RuntimeError in Python 2
    print('failed')
```
If you remove the `#` to raise the limit, it prints the nested document instead of `failed`.
Hope this helps.