String comparison technique used by Python

From the docs:

The comparison uses lexicographical ordering: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted.

Also:

Lexicographical ordering for strings uses the Unicode code point number to order individual characters.

or on Python 2:

Lexicographical ordering for strings uses the ASCII ordering for individual characters.

As an example:

>>> 'abc' > 'bac'
False
>>> ord('a'), ord('b')
(97, 98)

The result False is returned as soon as a is found to be less than b. The further items are not compared (as you can see for the second items: b > a is True).

Be aware of lower and uppercase:

>>> [(x, ord(x)) for x in abc]
[('a', 97), ('b', 98), ('c', 99), ('d', 100), ('e', 101), ('f', 102), ('g', 103), ('h', 104), ('i', 105), ('j', 106), ('k', 107), ('l', 108), ('m', 109), ('n', 110), ('o', 111), ('p', 112), ('q', 113), ('r', 114), ('s', 115), ('t', 116), ('u', 117), ('v', 118), ('w', 119), ('x', 120), ('y', 121), ('z', 122)]
>>> [(x, ord(x)) for x in abc.upper()]
[('A', 65), ('B', 66), ('C', 67), ('D', 68), ('E', 69), ('F', 70), ('G', 71), ('H', 72), ('I', 73), ('J', 74), ('K', 75), ('L', 76), ('M', 77), ('N', 78), ('O', 79), ('P', 80), ('Q', 81), ('R', 82), ('S', 83), ('T', 84), ('U', 85), ('V', 86), ('W', 87), ('X', 88), ('Y', 89), ('Z', 90)]

Python string comparison is lexicographic:

From Python Docs: http://docs.python.org/reference/expressions.html

Strings are compared lexicographically using the numeric equivalents (the result of the built-in function ord()) of their characters. Unicode and 8-bit strings are fully interoperable in this behavior.

Hence in your example, 'abc' < 'bac', 'a' comes before (less-than) 'b' numerically (in ASCII and Unicode representations), so the comparison ends right there.

Python and just about every other computer language use the same principles as (I hope) you would use when finding a word in a printed dictionary:

(1) Depending on the human language involved, you have a notion of character ordering: 'a' < 'b' < 'c' etc

(2) First character has more weight than second character: 'az' < 'za' (whether the language is written left-to-right or right-to-left or boustrophedon is quite irrelevant)

(3) If you run out of characters to test, the shorter string is less than the longer string: 'foo' < 'food'

Typically, in a computer language the "notion of character ordering" is rather primitive: each character has a human-language-independent number ord(character) and characters are compared and sorted using that number. Often that ordering is not appropriate to the human language of the user, and then you need to get into "collating", a fun topic.

String comparison technique used by Python

Tags:

Python

String

Comparison

Related

Recent Posts