Python - Data types

A set of reference materials to help you in those occasional bouts of forgetfulness.

Python tutorials and references

The following resources are useful for learning Python:

  1. Article reviewing various tutorials: http://noeticforce.com/best-free-tutorials-to-learn-python-pdfs-ebooks-online-interactive
  2. User-voted list of tutorials on Quora: https://www.quora.com/What-is-the-best-online-resource-to-learn-Python
  3. Google’s Python class https://developers.google.com/edu/python/
  4. https://www.learnpython.org/
  5. Python reference documentation https://docs.python.org/3/
  6. A list of Python libraries for various applications: https://github.com/vinta/awesome-python

Python type system

Before we get into data structures, let us talk about the type system in Python. At a high level, there are four groups of types: “Numbers”, “Collections”, “Callables” and “Singletons”.

Numbers fall into two categories: integral and non-integral.

  • Integral number types:
    • Integers
    • Booleans
  • Non-integral number types:
    • Floats (implemented as C doubles)
    • Complex
    • Decimals
    • Fractions
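
The categories above can be checked from the interpreter. The following is a minimal sketch using only the standard library; the variable names are made up for illustration.

```python
from decimal import Decimal
from fractions import Fraction

# bool is a subclass of int, so booleans count as integral numbers
bool_is_int = issubclass(bool, int)
true_plus_one = True + 1          # True behaves like 1 in arithmetic

# floats are binary doubles, so some decimal values are only approximated
float_sum_exact = (0.1 + 0.2 == 0.3)

# Decimal and Fraction represent these values exactly
decimal_sum_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
fraction_sum = Fraction(1, 3) + Fraction(1, 6)

# complex numbers carry a real and an imaginary part
z = complex(3, 4)
magnitude = abs(z)

print(bool_is_int, true_plus_one)
print(float_sum_exact, decimal_sum_exact)
print(fraction_sum, magnitude)
```

Decimal and Fraction live in the decimal and fractions modules and must be imported, whereas int, bool, float and complex are built in.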

Collections have three subcategories:

  • Sequence types
    • List (mutable)
    • Tuple (immutable)
    • String (immutable)
  • Sets
    • Set - mutable
    • FrozenSet - immutable
  • Mappings
    • Dict
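
The mutable and immutable labels above have practical consequences. The sketch below, with illustrative variable names, shows a list changing in place, a tuple refusing assignment, and a frozenset being usable as a dict key where a plain set is not.

```python
mutable_list = [1, 2]
mutable_list.append(3)            # lists can grow in place

immutable_tuple = (1, 2)
try:
    immutable_tuple[0] = 99       # tuples reject item assignment
    tuple_changed = True
except TypeError:
    tuple_changed = False

# A frozenset is hashable, so it can serve as a dict key
frozen_key = frozenset([1, 2])
lookup = {frozen_key: "ok"}

# A plain (mutable) set is not hashable and fails as a key
try:
    bad_lookup = {set([1, 2]): "fails"}
    set_is_hashable = True
except TypeError:
    set_is_hashable = False

print(mutable_list, tuple_changed, set_is_hashable)
```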

Callables are objects that can be called to execute code:

  • UDF or user defined functions
  • generator functions
  • classes
  • instance methods
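
The built-in callable() reports whether an object can be called. The sketch below applies it to each kind of callable listed above; greet, count_up and Greeter are hypothetical names used only for this example.

```python
def greet(name):
    """A user-defined function."""
    return "hello " + name

def count_up(limit):
    """A generator function: calling it returns a generator object."""
    for i in range(limit):
        yield i

class Greeter:
    def greet(self, name):
        return "hi " + name

g = Greeter()                     # calling a class creates an instance

checks = {
    "function": callable(greet),
    "generator function": callable(count_up),
    "class": callable(Greeter),
    "instance method": callable(g.greet),
    "plain integer": callable(42),
}
numbers_from_generator = list(count_up(3))

print(checks)
print(numbers_from_generator)
```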

Singletons are types that have only 1 instance within the execution space

  • None
  • NotImplemented
  • Ellipsis (written as the literal ...)
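
Because only one instance of each singleton exists, identity comparison with is works reliably for them. A minimal sketch, with illustrative variable names:

```python
a = None
b = None
same_none = a is b                    # every None is the same object

# The name Ellipsis and the literal ... refer to one object
same_ellipsis = Ellipsis is ...

# Calling the type of None returns the existing None, not a new instance
none_again = type(None)()
still_single = none_again is None

print(same_none, same_ellipsis, still_single)
```

This is why checks are usually written as "x is None" rather than "x == None".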

Python naming conventions

Variable / identifier names

The following rules apply when choosing variable names:

  • Can start with _, a-z, A-Z
  • Can be of any length and can also contain 0-9 after the first character
  • Can contain Unicode letters and digits (but not punctuation, spaces or symbols)
  • Cannot be a reserved keyword in the Python language
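
These rules can be tested with str.isidentifier() and the keyword module from the standard library. The helper is_usable_name below is a hypothetical name introduced for this sketch.

```python
import keyword

candidates = ["_private", "total2", "2total", "naïve", "class", "my-var"]

def is_usable_name(name):
    """True when name is a syntactically valid identifier and not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

results = {name: is_usable_name(name) for name in candidates}

for name, ok in results.items():
    print(name, ok)
```

Note that "class" passes isidentifier() on its own, which is why the keyword check is needed as a second step.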

Special names

  • Names that start with _ are meant to be internal and should not be used by the consumer. They are treated as private, but only by convention, as everything is public in Python
  • Further, when you run from module import *, names that begin with _ are not imported by the interpreter (unless the module lists them in __all__)
  • Names of the form __var_name__ are reserved for Python internals. For example, __init__ initializes a new instance of a class, and the __lt__() method implements a custom < operator. Don’t invent your own __var__ names.
  • Names of the form __var_name (two leading underscores, no trailing ones) are slightly different. Inside a class, Python rewrites them to _ClassName__var_name. This feature is called name mangling and helps avoid attribute name clashes in inheritance chains.
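
The special names above can be seen in action in a small class. This is a sketch only; the Account class and its attributes are hypothetical.

```python
class Account:
    def __init__(self, owner, balance):
        self.owner = owner            # public attribute
        self._balance = balance       # internal by convention only
        self.__pin = "0000"           # name-mangled to _Account__pin

    def __lt__(self, other):
        # Implements the < operator for Account objects
        return self._balance < other._balance

small = Account("a", 10)
large = Account("b", 500)

compares_less = small < large
still_reachable = small._balance          # nothing stops this access
has_plain_pin = hasattr(small, "__pin")   # the attribute is stored under the mangled name
mangled_pin = small._Account__pin

print(compares_less, still_reachable, has_plain_pin, mangled_pin)
```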

PEP8 conventions

The items below are conventions, not rules. Following them will improve code readability.

  • Packages : short, all-lowercase names. No underscores
  • Modules: short, all-lowercase names. Can have underscores
  • Classes: CapWords or upper-camel case
  • Functions & Variables: snake_case
  • Constants: UPPER_SNAKE_CASE
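
A short sketch of how these conventions look together in one module. All names (MAX_RETRIES, HttpClient, fetch_page) and the example URL are invented for illustration.

```python
MAX_RETRIES = 3                          # constant: UPPER_SNAKE_CASE

class HttpClient:                        # class: CapWords
    def fetch_page(self, page_url):      # method and argument: snake_case
        retry_count = 0                  # variable: snake_case
        remaining = MAX_RETRIES - retry_count
        return f"{page_url} (retries allowed: {remaining})"

client = HttpClient()
summary = client.fetch_page("https://example.com")
print(summary)
```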

Data structures

Lists

l1 = list()
l2 = [] #both empty lists
l3 = [1,2,3]
In [1]:
l1 = list()
type(l1)
Out[1]:
list
In [2]:
l2 = []
len(l2)
Out[2]:
0
list slicing
In [3]:
l3 = [1,2,3,4,5,6,7,8,9]
l3[:] #prints all
Out[3]:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
In [4]:
l3[0]
Out[4]:
1
In [5]:
l3[:4] #prints first 4. the : is slicing operator
Out[5]:
[1, 2, 3, 4]
In [6]:
l3[4:7] #from index 4 up to, but not including, index 7
Out[6]:
[5, 6, 7]
In [7]:
a = len(l3)
l3[a-1] #last element; the negative index l3[-1] gives the same result
Out[7]:
9
In [8]:
l3[-4:] #to pick the last 4 elements
Out[8]:
[6, 7, 8, 9]
In [9]:
l3.reverse() #happens inplace
In [10]:
l3
Out[10]:
[9, 8, 7, 6, 5, 4, 3, 2, 1]
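
Slices also accept an optional step and tolerate out-of-range bounds. The sketch below uses a fresh list named values (an illustrative name) so it does not depend on the state of l3.

```python
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

every_other = values[::2]       # start to end, step 2
middle = values[2:-2]           # positive and negative bounds can be mixed
reversed_copy = values[::-1]    # new reversed list; values itself is unchanged
last_item = values[-1]          # same element as values[len(values) - 1]
out_of_range = values[5:100]    # slices clip to the list instead of raising IndexError

print(every_other)
print(middle)
print(reversed_copy)
print(last_item, out_of_range)
```

Unlike reverse(), which modifies the list in place, slicing with [::-1] returns a new list.
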
append and extend
In [11]:
l3.append(10) #to add new values
l3
Out[11]:
[9, 8, 7, 6, 5, 4, 3, 2, 1, 10]
In [12]:
a1 = ['a','b','c']
l3.append(a1)
In [13]:
l3[-1]
Out[13]:
['a', 'b', 'c']
In [14]:
a1 = ['a','b','c']
l3.extend(a1) #adds each element of a1 to the end of l3; element types can differ
l3
Out[14]:
[9, 8, 7, 6, 5, 4, 3, 2, 1, 10, ['a', 'b', 'c'], 'a', 'b', 'c']
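
The difference between append and extend is easiest to see side by side. A minimal sketch with illustrative names; the + operator is included as a third option that builds a new list.

```python
base = [1, 2, 3]
extra = ['a', 'b']

appended = base.copy()
appended.append(extra)        # adds the list itself as a single element

extended = base.copy()
extended.extend(extra)        # adds each element of extra

concatenated = base + extra   # builds a new list; base is unchanged

print(appended)
print(extended)
print(concatenated)
```
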
In [15]:
lol = [[1,2,3],[4,5,6]] #lol - list of lists
len(lol)
Out[15]:
2
In [16]:
lol[1].reverse()
lol[1]
Out[16]:
[6, 5, 4]
mutability of lists

Lists are mutable: their elements can be changed in place.

In [17]:
l3
Out[17]:
[9, 8, 7, 6, 5, 4, 3, 2, 1, 10, ['a', 'b', 'c'], 'a', 'b', 'c']
In [18]:
l3[-1] = 'solar fare' #modify the last element
l3
Out[18]:
[9, 8, 7, 6, 5, 4, 3, 2, 1, 10, ['a', 'b', 'c'], 'a', 'b', 'solar fare']
In [19]:
#list.insert(index, object) to insert a new value
print(len(l3)) #before insertion
l3.insert(1,'two')
l3
14
Out[19]:
[9, 'two', 8, 7, 6, 5, 4, 3, 2, 1, 10, ['a', 'b', 'c'], 'a', 'b', 'solar fare']
In [20]:
# l3.pop(index) removes the item at index and returns it
l3.pop(-3) #remove the third item from the end and return it
Out[20]:
'a'
In [21]:
l3
Out[21]:
[9, 'two', 8, 7, 6, 5, 4, 3, 2, 1, 10, ['a', 'b', 'c'], 'b', 'solar fare']
In [22]:
# l3.clear()  to empty a list
lol.clear()
lol
Out[22]:
[]
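
One consequence of mutability is that two names bound to the same list see each other's changes. The sketch below, with illustrative names, contrasts an alias with a copy and also shows remove() and del, two further ways to delete elements.

```python
original = [1, 2, 3]
alias = original              # a second name for the same list object
shallow = original.copy()     # an independent top-level copy

original.append(4)
original.remove(2)            # remove the first occurrence of the value 2
del original[0]               # delete by position

print(original)
print(alias)
print(shallow)
```

Use copy() (or a full slice, original[:]) when the original must stay untouched.
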
In [1]:
l3 = [9, 8, 7, 6, 5, 4, 3, 2, 1, 10, ['a', 'b', 'c'], 'a', 'b', 'c', 10,10,10]
l3
Out[1]:
[9, 8, 7, 6, 5, 4, 3, 2, 1, 10, ['a', 'b', 'c'], 'a', 'b', 'c', 10, 10, 10]
In [2]:
# l3.count(value) counts the number of occurrences of a value
l3.count(10)
Out[2]:
4
Lists and indices
In [3]:
# l3.index(value[, start[, stop]]) returns the index of the first occurrence of value
l3.index(10)
Out[3]:
9

Find all the indices of an element

In [5]:
# indices = [i for i, x in enumerate(my_list) if x == "whatever"]

#find all occurrence of 10
indices_of_10 = [i for i, x in enumerate(l3) if x == 10]
indices_of_10
Out[5]:
[9, 14, 15, 16]
In [7]:
list(enumerate(l3))
Out[7]:
[(0, 9),
 (1, 8),
 (2, 7),
 (3, 6),
 (4, 5),
 (5, 4),
 (6, 3),
 (7, 2),
 (8, 1),
 (9, 10),
 (10, ['a', 'b', 'c']),
 (11, 'a'),
 (12, 'b'),
 (13, 'c'),
 (14, 10),
 (15, 10),
 (16, 10)]
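
The optional start argument of index() lets a search resume after an earlier match, and index() raises ValueError when the value is absent. The sketch below uses a small list named items; safe_index is a hypothetical helper, not a built-in.

```python
items = [9, 8, 10, 'a', 10, 10]

first = items.index(10)                 # first occurrence
second = items.index(10, first + 1)     # search resumes after the first hit

def safe_index(seq, value):
    """Return the first index of value, or -1 when it is absent."""
    try:
        return seq.index(value)
    except ValueError:
        return -1

missing = safe_index(items, 'z')

print(first, second, missing)
```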

Dictionaries

A dict stores key-value pairs.

d1 = dict()
d2 = {'key1':value1,
      'key2':value2}
In [26]:
d1 = dict()
d2 = {}
len(d2)
Out[26]:
0
In [27]:
d3 = {'day':'Thursday',
     'day_of_week':5,
     'start_of_week':'Sunday',
     'day_of_year':123,
     'dod':{'month_of_year':'Feb',
           'year':2017},
     'list1':[8,7,66]}
len(d3)
Out[27]:
6
In [28]:
d3.keys()
Out[28]:
dict_keys(['day', 'day_of_week', 'start_of_week', 'day_of_year', 'dod', 'list1'])
In [29]:
d3['start_of_week']
Out[29]:
'Sunday'
In [30]:
type(d3['dod'])
Out[30]:
dict
In [31]:
# now that dod is a dict, get its keys
d3['dod'].keys()
Out[31]:
dict_keys(['month_of_year', 'year'])
In [32]:
d3['dod']['year']
Out[32]:
2017
mutability of dicts

Dicts, like lists, are mutable.

In [33]:
d3['day_of_year'] = -48
d3
Out[33]:
{'day': 'Thursday',
 'day_of_week': 5,
 'day_of_year': -48,
 'dod': {'month_of_year': 'Feb', 'year': 2017},
 'list1': [8, 7, 66],
 'start_of_week': 'Sunday'}
In [46]:
# insert new values just by adding kvp (key value pair)
d3['workout_of_the_week']='bungee jumpging'
d3
Out[46]:
{'day': 'Thursday',
 'day_of_week': 5,
 'day_of_year': -48,
 'dod': {'month_of_year': 'Feb', 'year': 2017},
 'list1': [8, 7, 66],
 'start_of_week': 'Sunday',
 'workout_of_the_week': 'bungee jumpging'}
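
Besides assignment by key, dicts offer several other in-place operations. A minimal sketch on a small dict named profile (an illustrative name):

```python
profile = {'day': 'Thursday', 'day_of_week': 5}

profile.update({'day_of_week': 4, 'month': 'Feb'})  # overwrite one key, add another
removed = profile.pop('day')                         # remove a key and return its value
del profile['month']                                 # remove a key without returning it
profile.setdefault('year', 2017)                     # insert only when the key is absent
profile.setdefault('year', 1999)                     # no effect: 'year' already exists

print(removed)
print(profile)
```
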
dict exploration

What happens when you look up a key that is not present?

In [47]:
d3['dayyy']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-47-c500fcefcb1b> in <module>()
----> 1 d3['dayyy']

KeyError: 'dayyy'
In [48]:
# safe way to get elements is to use get()
d3.get('day')
Out[48]:
'Thursday'
In [49]:
d3.get('dayyy') #returns None
In [50]:
# use items() to get a view of (key, value) tuples
d3.items()
Out[50]:
dict_items([('day_of_week', 5), ('day', 'Thursday'), ('workout_of_the_week', 'bungee jumpging'), ('dod', {'year': 2017, 'month_of_year': 'Feb'}), ('list1', [8, 7, 66]), ('day_of_year', -48), ('start_of_week', 'Sunday')])
In [51]:
# use values() to get only the values
d3.values()
Out[51]:
dict_values([5, 'Thursday', 'bungee jumpging', {'year': 2017, 'month_of_year': 'Feb'}, [8, 7, 66], -48, 'Sunday'])
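
The lookup methods above combine well for safe access and iteration. The sketch below uses a small dict named schedule (an illustrative name) to show get() with a default, membership tests, nested lookups and looping over items().

```python
schedule = {'day': 'Thursday', 'dod': {'year': 2017}}

fallback = schedule.get('dayyy', 'unknown')   # a default instead of None
has_day = 'day' in schedule                   # membership tests look at keys

# Chain get() with an empty-dict default to read nested values safely
year = schedule.get('dod', {}).get('year')
no_year = schedule.get('missing', {}).get('year')

# items() yields (key, value) pairs that unpack directly in a loop
pairs = [f"{key}={value}" for key, value in schedule.items()]

print(fallback, has_day, year, no_year)
print(pairs)
```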

Tuple

A tuple is an immutable sequence. It behaves like a list that cannot be changed after creation.

In [58]:
t1 = tuple()
t2 = ()
len(t1)
Out[58]:
0
In [59]:
type(t2)
Out[59]:
tuple
In [60]:
t3 = (3,4,5,'t','g','b')
t3[0]
Out[60]:
3
In [61]:
#index it just like a list
t3[-1]
Out[61]:
'b'
mutability of tuples

Tuples cannot be modified after they are created.

In [62]:
t3[0] = 'good evening'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-62-8d3766e24208> in <module>()
----> 1 t3[0] = 'good evening'

TypeError: 'tuple' object does not support item assignment
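
A few tuple behaviors are worth knowing beyond indexing. The sketch below, with illustrative names, covers unpacking, the single-element syntax, tuples as dict keys, and the fact that immutability does not extend to mutable objects stored inside a tuple.

```python
point = (3, 4)
x, y = point                            # unpacking into two names

one_item = (5,)                         # the trailing comma makes a tuple
not_a_tuple = (5)                       # just the integer 5 in parentheses

distances = {(0, 0): 0.0, point: 5.0}   # tuples of hashable items can be dict keys

# Immutability is shallow: a list stored inside a tuple can still change
mixed = (1, [2, 3])
mixed[1].append(4)

print(x, y, type(one_item).__name__, type(not_a_tuple).__name__)
print(distances[(3, 4)], mixed)
```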

Sets

A set is an unordered collection of unique values.

s1 = set(<iterable>)
s2 = {1, 2, 3} # note: {} creates an empty dict; use set() for an empty set
In [63]:
s1 = set([1,1,1,2,2,2,4,4,4,4,4,4,4,5])
s1
Out[63]:
{1, 2, 4, 5}
In [64]:
s2 = {1,2,2,2,2,3} 
s2
Out[64]:
{1, 2, 3}
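
Sets also support the usual mathematical operations. A minimal sketch with illustrative names, which also shows why an empty set has to be created with set():

```python
empty_set = set()             # {} would create an empty dict instead
empty_dict = {}

evens = {2, 4, 6, 8}
small = {1, 2, 3, 4}

both = evens & small          # intersection
either = evens | small        # union
only_evens = evens - small    # difference

small.add(4)                  # adding an existing value changes nothing
unique_letters = sorted(set("banana"))

print(type(empty_set).__name__, type(empty_dict).__name__)
print(both, either, only_evens)
print(small, unique_letters)
```
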
set from dictionary

Works on dicts too. But will return a set of keys only, not values.

In [65]:
# works on dicts too
s3_repeat_values = set({'k1':'v1',
                   'k2':'v1',
                   'k3':'v2'})
s3_repeat_values
Out[65]:
{'k1', 'k2', 'k3'}
In [66]:
type(s3_repeat_values)
Out[66]:
set
In [67]:
# repeating keys
s3_repeat_keys = set({'k1':'v1',
                     'k1':'v2'})
In [68]:
s3_repeat_keys
Out[68]:
{'k1'}

Note: when a dict literal repeats a key, Python keeps only the last value given for that key. Each repeated key is treated as an update to the existing entry.

In [69]:
d80 = {'k1':'v1', 'k2':'v2', 'k1':'v45'} # k1 is repeated
d80
Out[69]:
{'k1': 'v45', 'k2': 'v2'}
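
Since set() applied to a dict looks only at the keys, the values or the pairs have to be requested explicitly. The sketch below uses a dict named settings (an illustrative name) and also shows that dict() built from a list of pairs keeps the last value for a repeated key.

```python
settings = {'k1': 'v1', 'k2': 'v1', 'k3': 'v2'}

unique_keys = set(settings)              # same result as set(settings.keys())
unique_values = set(settings.values())   # duplicates among the values collapse
unique_pairs = set(settings.items())     # a set of (key, value) tuples

# A repeated key in a list of pairs is resolved the same way as in a literal
merged = dict([('k1', 'v1'), ('k2', 'v2'), ('k1', 'v45')])

print(unique_keys)
print(unique_values)
print(merged)
```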