NumPy - Array slicing, dicing, searching

Array operations - slicing, dicing, searching

In [1]:
import numpy as np
In [2]:
arr1 = np.random.randint(10,30, size=8)
arr1
Out[2]:
array([25, 10, 18, 10, 16, 22, 14, 26])
In [3]:
arr2 = np.random.randint(20,200,size=50).reshape(5,10)  #method chaining - 50 random integers in [20, 200), reshaped to 5 rows x 10 columns
arr2
Out[3]:
array([[147, 134,  58,  21,  90, 193, 135, 179, 129, 113],
       [ 85, 161,  31, 123, 191, 166,  52,  25,  94, 184],
       [174, 149, 143, 123, 126, 143,  59, 180, 116, 105],
       [ 78, 198, 161, 152, 167,  84, 104, 128, 173, 140],
       [181,  47, 114, 145, 139, 180, 183, 125,  41,  46]])

Array slicing

Get elements by index, as you would with a Python list

In [4]:
arr1[0]
Out[4]:
25
In [5]:
arr1[3]
Out[5]:
10
In [6]:
arr1[:3] #get the first 3 elements. The lower bound is inclusive, the upper bound exclusive
Out[6]:
array([25, 10, 18])
In [7]:
arr1[2:] #lower bound inclusive
Out[7]:
array([18, 10, 16, 22, 14, 26])
In [8]:
arr1[2:5] #get elements at index 2,3,4
Out[8]:
array([18, 10, 16])
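
The half-open convention is easier to check on a fixed array than on random data. The following is a minimal sketch, assuming a hypothetical array `demo` that simply hard-codes the values shown for arr1 above. The negative index and the step are standard slice features that the cells above do not use.

```python
import numpy as np

# Hypothetical fixed array holding the same values as the arr1 output above
demo = np.array([25, 10, 18, 10, 16, 22, 14, 26])

first_three = demo[:3]    # indices 0, 1, 2
middle = demo[2:5]        # indices 2, 3, 4
last_two = demo[-2:]      # negative indices count from the end
every_other = demo[::2]   # a third slice component sets the step

# A slice start:stop always has stop - start elements
print(len(middle) == 5 - 2)
print(first_three, middle, last_two, every_other)
```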

nD array slicing

In [9]:
arr2
Out[9]:
array([[147, 134,  58,  21,  90, 193, 135, 179, 129, 113],
       [ 85, 161,  31, 123, 191, 166,  52,  25,  94, 184],
       [174, 149, 143, 123, 126, 143,  59, 180, 116, 105],
       [ 78, 198, 161, 152, 167,  84, 104, 128, 173, 140],
       [181,  47, 114, 145, 139, 180, 183, 125,  41,  46]])
In [10]:
arr2[0,0] #style 1 - you pass in a tuple of indices (row, column)
Out[10]:
147
In [11]:
arr2[0][0] #style 2 - index it like a list of lists - less common
Out[11]:
147
In [12]:
arr2[1] # get a full row
Out[12]:
array([ 85, 161,  31, 123, 191, 166,  52,  25,  94, 184])
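
The two indexing styles agree for single elements but diverge once slices are involved, because the chained form indexes the result of the first step. A minimal sketch, assuming a small hypothetical array `grid` built with np.arange so the values are predictable:

```python
import numpy as np

# Hypothetical 3x4 array: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
grid = np.arange(12).reshape(3, 4)

same_a = grid[1, 2]       # one indexing step
same_b = grid[1][2]       # two steps: row 1 first, then element 2 of that row

# With a slice in the first position the two styles no longer match
block_col = grid[0:2, 1]  # column 1 of rows 0 and 1 -> [1, 5]
chained = grid[0:2][1]    # row 1 of the first two rows -> [4, 5, 6, 7]

print(same_a, same_b)
print(block_col, chained)
```

This is one reason the single-bracket style is usually preferred for nD arrays.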

Array dicing

In [13]:
#get the second column
arr2[:,1]
Out[13]:
array([134, 161, 149, 198,  47])

Thus, you specify : for all rows, followed by 1 for the column. The result is a 1D array
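
The result is 1D because an integer index drops that axis. If a 2D column is needed, a one-element slice keeps the axis. A minimal sketch, assuming a hypothetical array `grid`:

```python
import numpy as np

# Hypothetical 3x4 array, small enough to check by eye
grid = np.arange(12).reshape(3, 4)

col_1d = grid[:, 1]      # integer index drops the axis -> shape (3,)
col_2d = grid[:, 1:2]    # slice keeps the axis -> shape (3, 1)

print(col_1d.shape, col_2d.shape)
```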

In [14]:
#get the 3rd row
arr2[2,:] #which is same as arr2[2]
Out[14]:
array([174, 149, 143, 123, 126, 143,  59, 180, 116, 105])
In [15]:
#get the center 3x3 elements - rows 1,2,3 and columns 4,5,6
arr2[1:4, 4:7]
Out[15]:
array([[191, 166,  52],
       [126, 143,  59],
       [167,  84, 104]])

Array broadcasting

NumPy allows bulk assignment of values, just like in MATLAB

In [16]:
arr2
Out[16]:
array([[147, 134,  58,  21,  90, 193, 135, 179, 129, 113],
       [ 85, 161,  31, 123, 191, 166,  52,  25,  94, 184],
       [174, 149, 143, 123, 126, 143,  59, 180, 116, 105],
       [ 78, 198, 161, 152, 167,  84, 104, 128, 173, 140],
       [181,  47, 114, 145, 139, 180, 183, 125,  41,  46]])
In [17]:
arr2_subset = arr2[1:4, 4:7]
arr2_subset
Out[17]:
array([[191, 166,  52],
       [126, 143,  59],
       [167,  84, 104]])
In [18]:
arr2_subset[:,:] = 999 #assign the same value to every element of this array
arr2_subset
Out[18]:
array([[999, 999, 999],
       [999, 999, 999],
       [999, 999, 999]])

Deep copy

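A NumPy slice is a view, not a copy: it shares memory with the source array, so any modification made to the slice also changes the source. Likewise, plain assignment only binds a second name to the same object. Make an independent copy using the copy() method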

In [19]:
arr2 #notice the 999 in the middle
Out[19]:
array([[147, 134,  58,  21,  90, 193, 135, 179, 129, 113],
       [ 85, 161,  31, 123, 999, 999, 999,  25,  94, 184],
       [174, 149, 143, 123, 999, 999, 999, 180, 116, 105],
       [ 78, 198, 161, 152, 999, 999, 999, 128, 173, 140],
       [181,  47, 114, 145, 139, 180, 183, 125,  41,  46]])
In [20]:
arr2_subset_a = arr2_subset
arr2_subset_a is arr2_subset
Out[20]:
True

Notice they are the same object in memory

In [21]:
arr3_subset = arr2_subset.copy()
arr3_subset
Out[21]:
array([[999, 999, 999],
       [999, 999, 999],
       [999, 999, 999]])
In [22]:
arr3_subset is arr2_subset
Out[22]:
False

Notice they are different objects in memory. Thus changing arr3_subset will not affect its source

In [23]:
arr3_subset[:,:] = 0.1
arr2_subset
Out[23]:
array([[999, 999, 999],
       [999, 999, 999],
       [999, 999, 999]])
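
The `is` check only compares Python objects. To see whether two arrays share the underlying data, np.shares_memory can be used instead. A minimal sketch, assuming hypothetical arrays `source`, `view` and `copy`:

```python
import numpy as np

# Hypothetical 4x5 array holding 0..19
source = np.arange(20).reshape(4, 5)

view = source[1:3, 1:4]          # basic slicing returns a view
copy = source[1:3, 1:4].copy()   # copy() allocates independent memory

print(np.shares_memory(source, view))   # the view shares the source buffer
print(np.shares_memory(source, copy))   # the copy does not

view[:, :] = -1    # writes through to source
copy[:, :] = 99    # leaves source untouched
print(source)
```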

Array searching

Use MATLAB-style array searching

In [24]:
arr1
Out[24]:
array([25, 10, 18, 10, 16, 22, 14, 26])
In [28]:
arr1>15  # gives truth vector
Out[28]:
array([ True, False,  True, False,  True,  True, False,  True])

You can use the truth vector as an index to search. For example, get all numbers greater than 15

In [29]:
arr1[arr1 > 15]
Out[29]:
array([25, 18, 16, 22, 26])
In [30]:
arr1[arr1 > 20]
Out[30]:
array([25, 22, 26])

The condition alone returns a boolean array with the same shape as the array being queried

In [31]:
arr1 > 12
Out[31]:
array([ True, False,  True, False,  True,  True,  True,  True])
In [32]:
arr2[arr2 > 50] #loses the original shape, as it's impossible to keep the 2D shape
Out[32]:
array([147, 134,  58,  90, 193, 135, 179, 129, 113,  85, 161, 123, 999,
       999, 999,  94, 184, 174, 149, 143, 123, 999, 999, 999, 180, 116,
       105,  78, 198, 161, 152, 999, 999, 999, 128, 173, 140, 181, 114,
       145, 139, 180, 183, 125])
In [33]:
arr2[arr2 < 30]
Out[33]:
array([21, 25])
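
Boolean indexing always flattens an nD result. When the original shape or the positions of the matches matter, np.where and np.argwhere are two options. A minimal sketch, assuming a hypothetical array `grid` and a fill value of 0 chosen only for illustration:

```python
import numpy as np

# Hypothetical 2x3 array
grid = np.array([[5, 60, 7],
                 [80, 9, 100]])

mask = grid > 50

flat_hits = grid[mask]                 # 1D: [60, 80, 100]
same_shape = np.where(mask, grid, 0)   # 2D: non-matching entries become 0
coords = np.argwhere(mask)             # one (row, column) pair per match

print(flat_hits)
print(same_shape)
print(coords)
```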

Compound searching

For instance, find elements within a range:

In [35]:
arr1[(arr1>16) & (arr1<23)]
Out[35]:
array([18, 22])
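
The parentheses are required because & binds more tightly than the comparison operators, and Python's `and` / `or` keywords raise a ValueError on arrays. The same pattern extends to | (or) and ~ (not). A minimal sketch, assuming a hypothetical array `values` that hard-codes the arr1 output above:

```python
import numpy as np

# Hypothetical fixed array holding the same values as the arr1 output above
values = np.array([25, 10, 18, 10, 16, 22, 14, 26])

in_range = values[(values > 16) & (values < 23)]          # AND
outside = values[(values <= 16) | (values >= 23)]         # OR
not_in_range = values[~((values > 16) & (values < 23))]   # NOT of the AND mask

print(in_range)
print(outside)
print(not_in_range)
```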

Math operations - elementwise

NumPy overloads operators such as +, -, / and * so you can add two arrays as if they were scalars

In [36]:
arr1
Out[36]:
array([25, 10, 18, 10, 16, 22, 14, 26])
In [37]:
arr_sum = arr1 + arr1 # elementwise addition
arr_sum
Out[37]:
array([50, 20, 36, 20, 32, 44, 28, 52])
In [38]:
arr_cubed = arr1 ** 2 # elementwise exponentiation (this squares each element, despite the variable name)
arr_cubed
Out[38]:
array([625, 100, 324, 100, 256, 484, 196, 676])

Similarly, you can add a scalar to an array and NumPy will broadcast that operation on all the elements.

In [39]:
arr_cubed - 100 # elementwise subtraction of a scalar
Out[39]:
array([525,   0, 224,   0, 156, 384,  96, 576])
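
Broadcasting is not limited to scalars: arrays with compatible shapes are stretched along their length-1 or missing axes. A minimal sketch, assuming hypothetical arrays `table`, `row` and `column`:

```python
import numpy as np

# Hypothetical 2x3 array and two smaller operands
table = np.array([[1, 2, 3],
                  [4, 5, 6]])
row = np.array([10, 20, 30])        # shape (3,)
column = np.array([[100], [200]])   # shape (2, 1)

plus_row = table + row          # row is added to every row of table
plus_column = table + column    # column is added to every column of table

print(plus_row)
print(plus_column)
```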

Math operations - matrix math

Use built-in functions for matrix operations

In [41]:
arr1
Out[41]:
array([25, 10, 18, 10, 16, 22, 14, 26])
In [40]:
np.dot(arr1, arr1)
Out[40]:
2761

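For two 1D arrays, np.dot computes the inner product, which is equivalent to multiplying a 1 x n row vector by an n x 1 column vector. No transposition takes place; 1D arrays have no row or column orientation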
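
With 1D inputs np.dot returns a scalar, and transposing a 1D array leaves it unchanged. To see the 1 x n by n x 1 product explicitly, the vector can be reshaped into 2D first. A minimal sketch, assuming a hypothetical array `v` that hard-codes the arr1 output above:

```python
import numpy as np

# Hypothetical fixed array holding the same values as the arr1 output above
v = np.array([25, 10, 18, 10, 16, 22, 14, 26])

inner = np.dot(v, v)      # 1D with 1D -> scalar inner product
same_inner = v @ v        # @ is the matrix multiplication operator

row = v.reshape(1, -1)    # explicit 1 x n
col = v.reshape(-1, 1)    # explicit n x 1
as_matrix = row @ col     # shape (1, 1), holds the same inner product
outer = col @ row         # shape (n, n), the outer product

print(inner, same_inner, as_matrix.shape, outer.shape)
```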

Caveats

NumPy does not raise errors for division by zero or for 0/0. Instead it emits a RuntimeWarning and sets the value to inf or nan.

In [42]:
arr_cubed[0] = 0
arr_cubed
Out[42]:
array([  0, 100, 324, 100, 256, 484, 196, 676])
In [43]:
arr_cubed / 0
/Users/atma6951/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in true_divide
  """Entry point for launching an IPython kernel.
/Users/atma6951/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: RuntimeWarning: invalid value encountered in true_divide
  """Entry point for launching an IPython kernel.
Out[43]:
array([nan, inf, inf, inf, inf, inf, inf, inf])

Thus 0/0 = nan and num/0 = inf
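
This behaviour can be adjusted locally with the np.errstate context manager, either to silence the warnings when inf and nan are expected or to turn them into exceptions. A minimal sketch, assuming a hypothetical array `values`:

```python
import numpy as np

# Hypothetical array with a zero in the first position
values = np.array([0, 100, 324])

# Silence the warnings locally when inf/nan are expected
with np.errstate(divide="ignore", invalid="ignore"):
    quiet = values / 0

# Or turn the warnings into exceptions to catch bad data early
try:
    with np.errstate(divide="raise", invalid="raise"):
        values / 0
    raised = False
except FloatingPointError:
    raised = True

bad = ~np.isfinite(quiet)   # True wherever the result is inf or nan

print(quiet, raised, bad)
```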

Universal functions

NumPy has a collection of universal functions (ufuncs) that operate on arrays element by element, so whole arrays can be used in expressions as if they were scalars.

Before writing a loop, look up the list of universal functions in the NumPy documentation
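
As an illustration of replacing a loop with a ufunc, here is a minimal sketch, assuming a hypothetical array `values` and an arbitrary cap of 300 chosen only for the example:

```python
import math
import numpy as np

# Hypothetical array of perfect squares
values = np.array([625, 100, 324, 100])

# Loop version: one Python-level call per element
looped = np.array([math.sqrt(x) for x in values])

# ufunc version: one call over the whole array
vectorized = np.sqrt(values)

# Binary ufuncs broadcast scalars too: cap every element at 300
capped = np.minimum(values, 300)

print(looped, vectorized, capped)
```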