Comparing two URLs in Python
Try the urltools library: https://github.com/rbaier/urltools
Have a look at my project; I am doing the same thing:
https://github.com/tg123/tao.bb/blob/master/url_normalize.py
Use urlparse and write a comparison function that checks only the fields you need:
>>> from urllib.parse import urlparse
>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
You can then compare on any of the following attributes (the index, where given, is the field's position in the result tuple):
- scheme (index 0): URL scheme specifier
- netloc (index 1): Network location part
- path (index 2): Hierarchical path
- params (index 3): Parameters for last path element
- query (index 4): Query component
- fragment (index 5): Fragment identifier
- username: User name
- password: Password
- hostname: Host name (lower case)
- port: Port number as integer, if present
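As a rough sketch of the "compare only the fields you need" idea, something like the following could work. The helper name same_url and its default field list are assumptions made for illustration, not part of urllib.

```python
from urllib.parse import urlparse


def same_url(url1, url2, fields=("scheme", "hostname", "port", "path", "query")):
    """Hypothetical helper: compare two URLs on the named attributes only."""
    a, b = urlparse(url1), urlparse(url2)
    # Each name must be an attribute of the parse result (see the list above).
    return all(getattr(a, name) == getattr(b, name) for name in fields)


# hostname is lower-cased by urlparse, so the host's case does not matter.
print(same_url("http://Example.com/a?x=1", "http://example.com/a?x=1"))     # True
# fragment is not in the default field list, so it is ignored.
print(same_url("http://example.com/a#top", "http://example.com/a#bottom"))  # True
# scheme is compared, so http and https differ.
print(same_url("http://example.com/a", "https://example.com/a"))            # False
```

Note that this compares the raw attribute values: an explicit default port (":80") and a missing port are still treated as different, and the query strings must match character for character.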
Here is a simple class that enables you to do this:
if Url(url1) == Url(url2):
    pass
It could easily be rewritten as a function, but these objects are hashable, so you can also store them in a set or use them as dictionary keys, for example as a cache:
# Python 2
# from urlparse import urlparse, parse_qsl
# from urllib import unquote_plus
# Python 3
from urllib.parse import urlparse, parse_qsl, unquote_plus


class Url(object):
    '''A URL object that can be compared with other URL objects
    without regard to the vagaries of encoding, escaping, and ordering
    of parameters in query strings.'''

    def __init__(self, url):
        parts = urlparse(url)
        # A frozenset makes the query order-insensitive and hashable.
        _query = frozenset(parse_qsl(parts.query))
        # Decode percent-escapes so that '%7E' and '~' compare equal.
        _path = unquote_plus(parts.path)
        parts = parts._replace(query=_query, path=_path)
        self.parts = parts

    def __eq__(self, other):
        # Let Python handle comparison with non-Url objects.
        if not isinstance(other, Url):
            return NotImplemented
        return self.parts == other.parts

    def __hash__(self):
        return hash(self.parts)
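For the function form mentioned above, a minimal sketch might look like this. The name url_key and the example.com URLs are made up for illustration; the normalization is the same one the class applies.

```python
from urllib.parse import urlparse, parse_qsl, unquote_plus


def url_key(url):
    """Hypothetical helper: return a hashable, normalized form of a URL."""
    parts = urlparse(url)
    return parts._replace(
        query=frozenset(parse_qsl(parts.query)),  # ignore parameter order
        path=unquote_plus(parts.path),            # ignore percent-escaping
    )


# Use the keys in a set to drop URLs that differ only in escaping or order.
seen = set()
for url in [
    "http://example.com/%7Eguido/?a=1&b=2",
    "http://example.com/~guido/?b=2&a=1",
    "http://example.com/other",
]:
    seen.add(url_key(url))

print(len(seen))  # 2
```

One thing to watch: unquote_plus also turns a literal '+' in the path into a space, so two paths that differ only in '+' versus a space will compare equal. If that matters for your URLs, unquote from the same module leaves '+' alone.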