Python - language optimizations


Interning is the practice of reusing objects on-demand. At startup, the Python kernel will preload certain frequently used objects. These include integers in the range -5 - 256. Any time a new int is instantiated that falls within that range, will get the pre-assigned memory address. In other words, the ints from -5 to 256 are singletons. Python does this for speed.

In [4]:
a = 20
b = c = 20
d = 40 - 20
print(hex(id(a)), hex(id(b)), hex(id(c)), hex(id(d))) # should all be same
print(a is b)
print(b is c)
0x7f871df21b90 0x7f871df21b90 0x7f871df21b90 0x7f871df21b90

Numbers beyond 256 are not singletons. Hence, each instance is a new object.

In [7]:
e = 500
f = 1000*5
print(e is f)

String interning

Strings are immutable and are good candidates for being cached. Thus Python will intern strings that look like valid identifiers (strings that match variable name criteria) as the assumption is you might need those strings multiple times.

In [8]:
s1 = "good_morning"  # valid identifier string
s2 = "good_morning"
print(hex(id(s1)), hex(id(s2)))
print(s1 is s2)
0x7f87223545b0 0x7f87223545b0
In [9]:
s3 = "good morning"  # space makes it an invalid identifier string
s4 = "good morning"
print(hex(id(s3)), hex(id(s4)))  # diff addresses
print(s3 is s4)
print(s3 == s4)
0x7f8722354470 0x7f8722354070

Forced interning

You can force the interpreter to intern strings that you think you will need multiple times. Use sys.intern() function. This is helpful in NLP type applications where you need to tokenize common strings like a, the, when etc.

In [10]:
import sys
s5 = sys.intern("good morning")  # will add given str to globally interned strings as its new
s6 = sys.intern("good morning")  # will lookup global table and return previous result
print(hex(id(s5)), hex(id(s6)))  # same addresses
print(s5 is s6)
0x7f87223549f0 0x7f87223549f0

Unless absolutely needed, you don't have to intern your objects. Do this with caution. The example below will time the difference between interned and non-interned operation.

In [15]:
def compare_str_using_equality(reps):
    a = "A long uninterned string is here" * 250
    b = "A long uninterned string is here" * 250
    for i in range(reps):
        if a == b:
In [16]:
def compare_str_using_identity(reps):
    a = sys.intern("A long uninterned string is here" * 250)
    b = sys.intern("A long uninterned string is here" * 250)
    for i in range(reps):
        if a is b:
In [17]:
import time

start = time.perf_counter()
compare_str_using_equality(10000000) # 10 million times
end = time.perf_counter()
print(f'Equality: {end-start} secs')

start = time.perf_counter()
compare_str_using_identity(10000000) # 10 million times
end = time.perf_counter()
print(f'Identity: {end-start} secs')
Equality: 4.601066562001506 secs
Identity: 0.5143052139992506 secs

Python - peephole optimizations

In addition, Python, at compile time, it performs a number of subtler optimizations. These include:

  • evaluating constant expressions (like 5*35 where all operands are constants)
  • evaluating short immutable sequences (like 'abc'*3 or (1,2)*4) that are <20 elements
  • converting mutable collecitons to immutable collections if they contain constants

We can see this in action by accessing func_name.__code__.co_consts property:

In [21]:
def some_func():
    a = 'abc'*4  # str is immutable
    b = (1,2,3) * 5  # immutable
    c = 34*77
    d = 'this is a long string' * 10
    e = [1,2,3] * 4
In [22]:
 (1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
 'this is a long stringthis is a long stringthis is a long stringthis is a long stringthis is a long stringthis is a long stringthis is a long stringthis is a long stringthis is a long stringthis is a long string',
 (1, 2, 3),

Even thought I have not executed the function, just upon compilation, the constants are evaluated. All immutable types have been evaluated, but mutable types are not, even if they are small.

In [ ]: