In Python, when are two objects the same?
Python has some types that it guarantees will only have one instance. Examples of these instances are `None`, `NotImplemented`, and `Ellipsis`. These are (by definition) singletons and so things like `None is None` are guaranteed to return `True` because there is no way to create a new instance of `NoneType`.
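For instance, every expression that produces `None` or `...` hands back the very same object:

def nothing():
    pass  # a function without a return statement returns None

print(nothing() is None)   # True -- there is only ever one None
print(... is Ellipsis)     # True -- the literal `...` and Ellipsis are the same object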
It also supplies a few doubletons¹: `True` and `False`² -- all references to `True` point to the same object. Again, this is because there is no way to create a new instance of `bool`.
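A quick illustration (purely to show identity; as footnote ² notes, in real code you would normally just test truthiness):

print(bool(1) is True)    # True -- bool() can only hand back the existing True or False
print((2 > 1) is True)    # True -- comparisons return those same two objects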
The above things are all guaranteed by the Python language. However, as you have noticed, there are some types (all immutable) that store some instances for reuse. This is allowed by the language, but different implementations may choose to use this allowance or not, depending on their optimization strategies. Some examples that fall into this category are small integers (-5 to 255), the empty tuple, and the empty frozenset.
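Here's a small sketch of that reuse on CPython. Because this is an implementation detail rather than a language guarantee, other interpreters (or other CPython versions) may print something different:

# Build the integers at runtime so the compiler can't fold them into one constant.
a = int('100')
b = int('100')
print(a is b)      # True on CPython: 100 falls in the cached -5..255 range

c = int('1000')
d = int('1000')
print(c is d)      # False on CPython: 1000 is outside the cached range

print(tuple() is tuple())          # True on CPython: the empty tuple is reused
print(frozenset() is frozenset())  # True on CPython: the empty frozenset is reused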
Finally, CPython interns certain immutable objects during parsing... e.g. if you run the following script with CPython, you'll see that it returns `True`:
def foo():
    return (2,)

if __name__ == '__main__':
    print(foo() is foo())
This seems really odd. The trick that CPython is playing is that whenever it constructs the function `foo`, it sees a tuple literal that contains other simple (immutable) literals. Rather than create this tuple (or its equivalent) over and over, Python just creates it once. There's no danger of that object being changed since the whole deal is immutable. This can be a big win for performance where the same tight loop is called over and over. Small strings are interned as well. The real win here is in dictionary lookups. Python can do a (blazingly fast) pointer compare and then fall back on slower string comparisons when checking hash collisions. Since so much of Python is built on dictionary lookups, this can be a big optimization for the language as a whole.
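Here's a small sketch of that string interning on CPython (again, an implementation detail rather than a language guarantee). `sys.intern` is the standard-library way to request interning explicitly:

import sys

a = 'foo'                   # identifier-like literal: interned at compile time by CPython
b = ''.join(['f', 'oo'])    # built at runtime: a distinct, non-interned object
print(a == b)               # True  -- equal values
print(a is b)               # False -- different objects (on CPython)

c = sys.intern(b)           # ask for the canonical interned copy
print(a is c)               # True on CPython -- both names now point at the interned 'foo'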
¹ I might have just made up that word... but hopefully you get the idea.
² Under normal circumstances, you don't need to check if the object is a reference to `True`. Usually you just care if the object is "truthy" -- e.g. whether `if some_instance: ...` will execute the branch. But I put that in here just for completeness.
Note that `is` can be used to compare things that aren't singletons. One common use is to create a sentinel value:
sentinel = object()
item = next(iterable, sentinel)
if item is sentinel:
    # iterable exhausted.
    ...
Or:
_sentinel = object()

def function(a, b, none_is_ok_value_here=_sentinel):
    if none_is_ok_value_here is _sentinel:
        # Treat the function as if `none_is_ok_value_here` was not provided.
        ...
The moral of this story is to always say what you mean. If you want to check whether a value is the same object as another value, then use the `is` operator. If you want to check whether a value is equal to another value (but possibly a distinct object), then use `==` (there's a short example after the links below). For more details on the difference between `is` and `==` (and when to use which), consult one of the following posts:
- Is there a difference between `==` and `is` in Python?
- Python None comparison: should I use "is" or ==?
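As a quick illustration of the distinction:

a = [1, 2, 3]
b = [1, 2, 3]
print(a == b)   # True  -- same value
print(a is b)   # False -- two distinct list objects

b = a
print(a is b)   # True  -- both names refer to the same object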
Addendum
We've talked about these CPython implementation details and we've claimed that they're optimizations. It'd be nice to try to measure just what we get from all this optimizing (other than a little added confusion when working with the `is` operator).
String "interning" and dictionary lookups.
Here's a small script that you can run to see how much faster dictionary lookups are if you use the same string to look up the value instead of a different string. Note, I use the term "interned" in the variable names -- These values aren't necessarily interned (though they could be). I'm just using that to indicate that the "interned" string is the string in the dictionary.
import timeit
interned = 'foo'
not_interned = (interned + ' ').strip()
assert interned is not not_interned
d = {interned: 'bar'}
print('Timings for short strings')
number = 100000000
print(timeit.timeit(
    'd[interned]',
    setup='from __main__ import interned, d',
    number=number))
print(timeit.timeit(
    'd[not_interned]',
    setup='from __main__ import not_interned, d',
    number=number))
####################################################
interned_long = interned * 100
not_interned_long = (interned_long + ' ').strip()
d[interned_long] = 'baz'
assert interned_long is not not_interned_long
print('Timings for long strings')
print(timeit.timeit(
    'd[interned_long]',
    setup='from __main__ import interned_long, d',
    number=number))
print(timeit.timeit(
    'd[not_interned_long]',
    setup='from __main__ import not_interned_long, d',
    number=number))
The exact values here shouldn't matter too much, but on my computer, the lookups with the short strings are about 1 part in 7 faster. The long strings are almost 2x faster (because the string comparison takes longer if the string has more characters to compare). The differences aren't quite as striking on Python 3.x, but they're still definitely there.
Tuple "interning"
Here's a small script you can play around with:
import timeit
def foo_tuple():
    return (2, 3, 4)

def foo_list():
    return [2, 3, 4]
assert foo_tuple() is foo_tuple()
number = 10000000
t_interned_tuple = timeit.timeit('foo_tuple()', setup='from __main__ import foo_tuple', number=number)
t_list = timeit.timeit('foo_list()', setup='from __main__ import foo_list', number=number)
print(t_interned_tuple)
print(t_list)
print(t_interned_tuple / t_list)
print('*' * 80)
def tuple_creation(x):
    return (x,)

def list_creation(x):
    return [x]
t_create_tuple = timeit.timeit('tuple_creation(2)', setup='from __main__ import tuple_creation', number=number)
t_create_list = timeit.timeit('list_creation(2)', setup='from __main__ import list_creation', number=number)
print(t_create_tuple)
print(t_create_list)
print(t_create_tuple / t_create_list)
This one is a bit trickier to time (and I'm happy to take any better ideas on how to time it in the comments). The gist of this is that on average (and on my computer), a tuple takes about 60% as long to create as a list does. However, `foo_tuple()` takes on average about 40% of the time that `foo_list()` takes. That shows that we really do gain a little bit of a speedup from this "interning". The time savings seem to increase as the tuple gets larger (creating a longer list takes longer -- the tuple "creation" takes constant time since it was already created).
Also note that I've called this "interning". It actually isn't (at least not in the same sense the strings are interned). We can see the difference in this simple script:
def foo_tuple():
    return (2,)

def bar_tuple():
    return (2,)

def foo_string():
    return 'foo'

def bar_string():
    return 'foo'
print(foo_tuple() is foo_tuple()) # True
print(foo_tuple() is bar_tuple()) # False
print(foo_string() is bar_string()) # True
We see that the strings really are "interned" -- different invocations using the same literal notation return the same object. The tuple "interning", on the other hand, seems to be specific to a single function: the constant tuple is stored with that function's code object, so it isn't shared between `foo_tuple` and `bar_tuple`.
It varies according to implementation.
CPython caches some immutable objects in memory. This is true of "small" integers like 1 and 2 (-5 to 255, as noted in the comments below). CPython does this for performance reasons; small integers are commonly used in most programs, so it saves memory to only have one copy created (and is safe because integers are immutable).
This is also true of "singleton" objects like `None`; there is only ever one `None` in existence at any given time.
Other objects (such as the empty tuple, `()`) may be implemented as singletons, or they may not be.
In general, you shouldn't necessarily assume that immutable objects will be implemented this way. CPython does so for performance reasons, but other implementations may not, and CPython may even stop doing it at some point in the future. (The only exception might be `None`, as `x is None` is a common Python idiom and is likely to be implemented across different interpreters and versions.)
Usually you want to use `==` instead of `is`. Python's `is` operator isn't used often, except when checking to see if a variable is `None`.
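That idiom looks like this (a hypothetical `greet` function, just to show the shape of the check):

def greet(name=None):
    # Compare to None with `is`, not `==` -- None is a guaranteed singleton.
    if name is None:
        name = 'world'
    print(f'Hello, {name}!')

greet()          # Hello, world!
greet('Python')  # Hello, Python!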