BeautifulSoup: RuntimeError: maximum recursion depth exceeded

I'm unsure about why this works (I haven't examined the source), but adding .text or .get_text() seems to bypass the error for me.

For instance, changing

    lambda x: BeautifulSoup(x, 'html.parser')

to

    lambda x: BeautifulSoup(x, 'html.parser').get_text()

seems to work without throwing a recursion depth error.

I ran into this problem too and went through a lot of web pages. Here is a summary of two ways to solve it.

However, I think we should first understand why it happens. Python limits the depth of recursion (the default limit is 1000). You can check the current limit with print(sys.getrecursionlimit()). My guess is that BeautifulSoup uses recursion to find child elements. When the recursion goes deeper than the limit, RuntimeError: maximum recursion depth exceeded is raised. (On Python 3.5 and later the exception is RecursionError, which is a subclass of RuntimeError.)
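To see the limit in action without BeautifulSoup, here is a minimal sketch. countdown is a made-up helper for illustration, not anything from bs4, and it recurses on purpose.

```python
import sys

def countdown(n):
    """Recurse n levels deep; each level adds one Python stack frame."""
    return 0 if n == 0 else 1 + countdown(n - 1)

limit = sys.getrecursionlimit()
print(limit)             # usually 1000 unless something changed it

print(countdown(10))     # shallow recursion is fine

try:
    countdown(limit + 100)   # deliberately deeper than the limit allows
    hit_limit = False
except RuntimeError:         # RecursionError is a subclass on Python 3.5+
    hit_limit = True

print(hit_limit)
```

Catching RuntimeError keeps the snippet working on both old and new interpreters.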

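First method: use sys.setrecursionlimit() to raise the recursion limit. You can of course set it to something like 1000000, but a value that high may cause a segmentation fault, because the interpreter's underlying C stack can overflow before the Python limit is reached.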
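If you go this way, one option is to raise the limit only around the code that needs it and put it back afterwards. This is just a sketch: recursion_limit and nested_depth are names I made up, and the extra 2000 is an arbitrary, modest bump rather than a recommended value.

```python
import sys
from contextlib import contextmanager

@contextmanager
def recursion_limit(limit):
    """Temporarily set the recursion limit, restoring the old one on exit."""
    old = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        yield
    finally:
        sys.setrecursionlimit(old)

def nested_depth(n):
    """Stand-in for any deeply recursive call (hypothetical helper)."""
    return 0 if n == 0 else 1 + nested_depth(n - 1)

before = sys.getrecursionlimit()
with recursion_limit(before + 2000):
    # This depth would fail under the old limit but fits under the raised one.
    result = nested_depth(before + 500)
after = sys.getrecursionlimit()

print(result == before + 500, after == before)
```

Keeping the bump small and temporary lowers the chance of the crash mentioned above, though it does not remove it.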

Second method: use try-except. If maximum recursion depth exceeded appears, our algorithm might have a problem. Generally speaking, we can use loops instead of recursion. In your case, we could pre-process the HTML with replace() or a regular expression before parsing it.
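As a sketch of the pre-processing idea, the snippet below collapses long runs of <br> tags into a single one before the text ever reaches the parser. collapse_br_runs and the pattern are my own illustration, and whether it is acceptable to drop repeated line breaks depends on your data.

```python
import re

# Two or more consecutive <br>, <br/> or <br /> tags, with optional
# whitespace between them, in any letter case.
_BR_RUN = re.compile(r'(?:<br\s*/?>\s*){2,}', re.IGNORECASE)

def collapse_br_runs(html):
    """Replace each run of repeated <br> tags with a single <br>."""
    return _BR_RUN.sub('<br>', html)

doc = 'a' + '<br>' * 1000 + 'b'
cleaned = collapse_br_runs(doc)

print(len(doc), len(cleaned))
print(cleaned)
```

The cleaned string can then be passed to BeautifulSoup as usual, and the parse can still be wrapped in try-except as a safety net.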

Finally, here is an example.

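    from bs4 import BeautifulSoup
    import sys

    # Uncomment to raise the limit (the value here is only illustrative):
    # sys.setrecursionlimit(10000)

    try:
        doc = ''.join(['<br>' for x in range(1000)])
        soup = BeautifulSoup(doc, 'html.parser')
        a = soup.find('br')
        for i in a:
            print(i)
    except RuntimeError:  # RecursionError on Python 3.5+
        print('failed')

If you remove the # in front of sys.setrecursionlimit(), it can print doc instead of failed.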

Hope this helps.