What does netloc mean?
From RFC 1808, Section 2.1, every URL should follow a specific format:
<scheme>://<netloc>/<path>;<params>?<query>#<fragment>
Let's break this format down syntactically:
scheme: The protocol name, usually http or https.
netloc: The network location, which includes the domain itself (and subdomain if present), the port number, and optional credentials in the form username:password. Together it may take the form username:password@host:80.
path: Contains information on how the specified resource needs to be accessed.
params: An element which adds fine tuning to the path. (optional)
query: Another element adding fine-grained access to the path in consideration. (optional)
fragment: Contains bits of information about the resource being accessed within the path. (optional)
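To make the pieces of a netloc concrete, here is a minimal sketch using Python's standard urllib.parse. The URL, user name, password, and port are made up purely for illustration; the attributes username, password, hostname, and port are provided by the parse result itself.

```python
from urllib.parse import urlparse

# Hypothetical URL, chosen only so that every part of a netloc is present.
url = "https://alice:secret@cat.example:8443/list"
parts = urlparse(url)

print(parts.netloc)    # alice:secret@cat.example:8443
print(parts.username)  # alice
print(parts.password)  # secret
print(parts.hostname)  # cat.example
print(parts.port)      # 8443 (an int, not a string)
```

In most everyday URLs the credentials and port are absent, so the netloc is just the host name.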
Let's take a very simple example to understand the above clearly:
https://cat.example/list;meow?breed=siberian#pawsize
In the above example:
https is the scheme (first element of a URL)
cat.example is the netloc (sits between the scheme and path)
/list is the path (between the netloc and params)
meow is the param (sits between path and query)
breed=siberian is the query (between the params and fragment)
pawsize is the fragment (last element of a URL)
This can be replicated programmatically using Python's urllib.parse.urlparse:
>>> import urllib.parse
>>> url = 'https://cat.example/list;meow?breed=siberian#pawsize'
>>> urllib.parse.urlparse(url)
ParseResult(scheme='https', netloc='cat.example', path='/list', params='meow', query='breed=siberian', fragment='pawsize')
Now coming to your code, the if statement checks whether next_page exists and whether next_page has a netloc. In that login() function, the check .netloc != '' tests whether the result of url_parse(next_page) carries a network location, that is, whether next_page is an absolute URL rather than a relative one. A relative URL has a path but no hostname (and thus no netloc).
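The relative-versus-absolute distinction can be sketched as follows. This is only an illustration: is_relative is a hypothetical helper, the URLs are made up, and the standard library's urlparse stands in for the url_parse used in your code.

```python
from urllib.parse import urlparse

def is_relative(target):
    """Return True when target has no netloc (hypothetical helper)."""
    return urlparse(target).netloc == ""

print(is_relative("/index"))                    # True: path only, no host
print(is_relative("https://cat.example/list"))  # False: netloc is cat.example
print(is_relative("//cat.example/list"))        # False: a host is present even without a scheme
```

The last case is why the check looks at netloc rather than at the scheme: a URL starting with // still points at another host.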
import urllib.parse
url = "https://example.com/something?a=1&b=1"
o = urllib.parse.urlsplit(url)
print(o.netloc)  # prints: example.com
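Note that this snippet uses urlsplit while the earlier one used urlparse. As a small sketch, reusing the cat.example URL from above, both report the same netloc; they differ only in that urlsplit does not separate params from the path.

```python
from urllib.parse import urlparse, urlsplit

url = "https://cat.example/list;meow?breed=siberian#pawsize"

parsed = urlparse(url)  # 6 components, including params
split = urlsplit(url)   # 5 components, no params field

print(parsed.netloc == split.netloc)  # True
print(parsed.path, parsed.params)     # /list meow
print(split.path)                     # /list;meow
```

If all you need is the netloc, either function will do.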