Python: CSS Selector to use inside lxml.cssselect

I believe you cannot get the attribute value through CSS selectors. You should get the elements...

>>> elements = doc.cssselect('div.results dl dt a')

...and then get the attributes from them:

>>> for element in elements:
...     print(element.get('href'))
... 
/link 1
/link 2

Of course, list comprehensions are your friends:

>>> [element.get('href') for element in elements]
['/link 1', '/link 2']

Since CSS cannot change attribute values, I don't think it makes sense to retrieve them through CSS selectors. You can "mention" attributes in a CSS selector, but only to match the elements that carry them. However, this is just speculation and I may be wrong; if I am, please someone correct me :) Well, @Tim Diggs confirms my hypothesis below :)


You need to get the attribute on the result of cssselect (it always returns the element, never an attribute):

First, I'm not sure about doc.cssselect (maybe it is your own function?).

Normally lxml.cssselect is used:

from lxml.cssselect import CSSSelector
sel = CSSSelector('html body div.results dl dt a[href]')

Then, assuming you've already got a doc:

links = []
for a_href in sel(doc):
    links.append(a_href.get('href'))

or the more succinct:

links = [a_href.get('href') for a_href in doc.cssselect('html body div.results dl dt a[href]')]

I have successfully used

#element-id ::attr(value)

to get the "value" attribute of HTML elements.