What does read() in urlopen('http.....').read() do? [urllib]

Quoting the BeautifulSoup docs:

To parse a document, pass it into the BeautifulSoup constructor. You can pass in a string or an open filehandle:

When you call the .read() method, you are using the "string" interface. When you don't, you are using the "filehandle" interface.

Effectively, both work the same way (although BS4 may read a file-like object lazily). In your case the whole content is first read into a string object, which may consume more memory than necessary.


urllib.request.urlopen returns a file-like object; its read method returns the response body of that URL (as bytes in Python 3).
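
To make that concrete, here is a minimal sketch of what read() gives you. The data: URL is my own stand-in (an assumption, not from the question) so the snippet runs without network access; a real http URL behaves the same way as far as read() is concerned.

```python
from urllib.request import urlopen

# Assumption: a data: URL stands in for a real http URL so this runs offline.
URL = "data:text/html,<html><body><p>Hello</p></body></html>"

with urlopen(URL) as response:
    # The response is a file-like object, not the page content itself.
    is_file_like = hasattr(response, "read")
    body = response.read()      # the whole response body, as bytes
    leftover = response.read()  # the stream is exhausted after the first read

# Decode the bytes to get the HTML as text.
html = body.decode("utf-8")
print(type(body).__name__, len(body))
print(html)
```

Note that a second read() on the same response returns an empty bytes object, because the first call already consumed the stream.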

The BeautifulSoup constructor accepts either a string or an open filehandle, so yes, read() is redundant here.
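
A small sketch comparing the two call styles, assuming beautifulsoup4 is installed and using the built-in "html.parser". The data: URL is again a hypothetical stand-in for a real page so the example runs offline.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Assumption: a data: URL stands in for a real http URL.
URL = "data:text/html,<html><body><p>Hello</p></body></html>"

# "String" interface: read the body first, then hand it to BeautifulSoup.
with urlopen(URL) as response:
    soup_from_string = BeautifulSoup(response.read(), "html.parser")

# "Filehandle" interface: hand over the response object itself.
with urlopen(URL) as response:
    soup_from_handle = BeautifulSoup(response, "html.parser")

# Both soups expose the same parsed document.
print(soup_from_string.p.get_text())
print(soup_from_handle.p.get_text())
```

Either form should produce the same parsed tree; the second simply skips the explicit read() call.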


Without BeautifulSoup Module

.read() is useful when you are not using the BeautifulSoup module, so it is not redundant in that case. Only by calling .read() do you get the HTML content; without it you just have the object returned by .urlopen().

With BeautifulSoup Module

The BeautifulSoup constructor handles both cases: it accepts a string, and it also accepts the object returned by .urlopen(some-site).