import pandas as pd
import numpy as np
# Index Levels
outside = ['G1','G1','G1','G2','G2','G2']
inside = [1,2,3,1,2,3]
hier_index = list(zip(outside,inside)) #create a list of tuples
hier_index
[('G1', 1), ('G1', 2), ('G1', 3), ('G2', 1), ('G2', 2), ('G2', 3)]
#create a multiindex
hier_index = pd.MultiIndex.from_tuples(hier_index)
hier_index
MultiIndex(levels=[['G1', 'G2'], [1, 2, 3]], labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]])
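Since every (outer, inner) combination is present here, pd.MultiIndex.from_product builds the same index more compactly:
# equivalent to the from_tuples call above
pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]])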
# Create a dataframe (6,2) with multi level index
df = pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=['A','B'])
df
|    |   | A | B |
|----|---|-----------|-----------|
| G1 | 1 | -0.349997 | 1.372488 |
|    | 2 | 0.090160 | 0.121530 |
|    | 3 | 0.751559 | -0.335218 |
| G2 | 1 | 1.501890 | 0.835727 |
|    | 2 | -1.542555 | -1.878225 |
|    | 3 | 0.721939 | -0.121186 |
Accessing rows and columns
You can chain loc and iloc calls to access elements, working from the outer index down to the inner index.
#access columns as usual
df['A']
G1  1   -0.349997
    2    0.090160
    3    0.751559
G2  1    1.501890
    2   -1.542555
    3    0.721939
Name: A, dtype: float64
#access rows
df.loc['G1']
|   | A | B |
|---|-----------|----------|
1 | -0.349997 | 1.372488 |
2 | 0.090160 | 0.121530 |
3 | 0.751559 | -0.335218 |
# access a single row from the inner index
df.loc['G1'].loc[1]
A   -0.349997
B    1.372488
Name: 1, dtype: float64
#access a single cell
df.loc['G2'].loc[3]['B']
-0.121186
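Instead of chaining, loc also accepts a tuple for the row and a column label in a single call, which avoids the intermediate copies:
# same cell as above, one lookup instead of three
df.loc[('G2', 3), 'B']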
Naming indices
Index levels can be given names, which display much like column names.
df.index.names
FrozenList([None, None])
df.index.names = ['Group', 'Serial']
df
| Group | Serial | A | B |
|-------|--------|-----------|-----------|
| G1 | 1 | -0.349997 | 1.372488 |
|    | 2 | 0.090160 | 0.121530 |
|    | 3 | 0.751559 | -0.335218 |
| G2 | 1 | 1.501890 | 0.835727 |
|    | 2 | -1.542555 | -1.878225 |
|    | 3 | 0.721939 | -0.121186 |
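With named levels you can also pull one level out as a flat index:
# all values of the outer level, one per row
df.index.get_level_values('Group')
Index(['G1', 'G1', 'G1', 'G2', 'G2', 'G2'], dtype='object', name='Group')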
Accessing rows and columns using cross sections
The xs method returns a cross section. Its advantage is that it can reach through a multilevel index in a single step. Now that the levels are named, we can use cross sections effectively.
# Get all rows with Serial 1
df.xs(1, level='Serial')
| Group | A | B |
|-------|-----------|----------|
| G1 | -0.349997 | 1.372488 |
| G2 | 1.501890 | 0.835727 |
# Get the row with Group G1 and Serial 2
df.xs(('G1', 2))
A    0.09016
B    0.12153
Name: (G1, 2), dtype: float64
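By default xs drops the level it selected on from the result. Pass drop_level=False to keep it:
# same cross section, but the Serial level stays in the index
df.xs(1, level='Serial', drop_level=False)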
Handling missing values
pandas marks missing entries as NaN. Let's build a small DataFrame with a few of them to see how dropna and fillna behave.
d = {'a':[1,2,np.nan], 'b':[np.nan, 5, np.nan], 'c':[6,7,8]}
dfna = pd.DataFrame(d)
dfna
|   | a | b | c |
|---|-----|-----|---|
0 | 1.0 | NaN | 6 |
1 | 2.0 | 5.0 | 7 |
2 | NaN | NaN | 8 |
# dropping rows with one or more na values
dfna.dropna()
|   | a | b | c |
|---|-----|-----|---|
1 | 2.0 | 5.0 | 7 |
# dropping cols with one or more na values
dfna.dropna(axis=1)
|   | c |
|---|---|
0 | 6 |
1 | 7 |
2 | 8 |
# Dropping rows that have fewer than 2 non-NaN values (thresh counts the non-NaN cells needed to keep a row)
dfna.dropna(axis=0, thresh=2)
|   | a | b | c |
|---|-----|-----|---|
0 | 1.0 | NaN | 6 |
1 | 2.0 | 5.0 | 7 |
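dropna also takes a subset argument when only certain columns should decide whether a row is dropped; a small sketch:
# drop rows where column 'a' is NaN; NaNs in 'b' are ignored
dfna.dropna(subset=['a'])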
fillna
fillna replaces NaN cells with a value you supply.
dfna.fillna(value=999)
|   | a | b | c |
|---|-------|-------|---|
0 | 1.0 | 999.0 | 6 |
1 | 2.0 | 5.0 | 7 |
2 | 999.0 | 999.0 | 8 |
# fill each column's NaNs with that column's mean
dfna.fillna(value=dfna.mean())
|   | a | b | c |
|---|-----|-----|---|
0 | 1.0 | 5.0 | 6 |
1 | 2.0 | 5.0 | 7 |
2 | 1.5 | 5.0 | 8 |
# fill a single column with its own mean
dfna['a'].fillna(value = dfna['a'].mean())
0    1.0
1    2.0
2    1.5
Name: a, dtype: float64
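Besides constants and means, you can propagate the previous valid value forward with ffill (or the next one backward with bfill):
# forward fill: row 2 of 'a' becomes 2.0; row 0 of 'b' stays NaN since nothing precedes it
dfna.ffill()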
Groupby
groupby splits a DataFrame into groups by the values of a column so you can aggregate each group separately. Let's build a small company/sales table.
comp_data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'],
'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'],
'Sales':[200,120,340,124,243,350]}
comp_df = pd.DataFrame(comp_data)
comp_df
|   | Company | Person | Sales |
|---|---------|--------|-------|
0 | GOOG | Sam | 200 |
1 | GOOG | Charlie | 120 |
2 | MSFT | Amy | 340 |
3 | MSFT | Vanessa | 124 |
4 | FB | Carl | 243 |
5 | FB | Sarah | 350 |
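A groupby object can also hand back the rows of a single group via get_group:
# just the FB rows
comp_df.groupby('Company').get_group('FB')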
mean min max
# mean Sales by Company; numeric_only=True restricts the aggregation to numerical columns (recent pandas requires this when non-numeric columns like Person are present)
comp_df.groupby('Company').mean(numeric_only=True)
| Company | Sales |
|---------|-------|
FB | 296.5 |
GOOG | 160.0 |
MSFT | 232.0 |
# standard deviation in Sales by Company
comp_df.groupby('Company').std(numeric_only=True)
| Company | Sales |
|---------|-------|
FB | 75.660426 |
GOOG | 56.568542 |
MSFT | 152.735065 |
You can run other aggregation functions like min, max, std, and count the same way, or request several at once with agg, as sketched below. Then we look at describe, which computes all the common statistics in one call.
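A quick agg sketch; selecting the Sales column first keeps the output flat:
# several aggregations in one pass
comp_df.groupby('Company')['Sales'].agg(['mean', 'min', 'max'])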
describe
comp_df.groupby('Company').describe()
(all statistics below describe the Sales column)
| Company | count | mean | std | min | 25% | 50% | 75% | max |
|---------|-------|------|-----|-----|-----|-----|-----|-----|
FB | 2.0 | 296.5 | 75.660426 | 243.0 | 269.75 | 296.5 | 323.25 | 350.0 |
GOOG | 2.0 | 160.0 | 56.568542 | 120.0 | 140.00 | 160.0 | 180.00 | 200.0 |
MSFT | 2.0 | 232.0 | 152.735065 | 124.0 | 178.00 | 232.0 | 286.00 | 340.0 |
transpose
Long overdue: you can flip a DataFrame, swapping rows and columns, by calling the transpose() method.
comp_df.groupby('Company').describe().transpose()
|       | Company | FB | GOOG | MSFT |
|-------|---------|------------|------------|------------|
| Sales | count | 2.000000 | 2.000000 | 2.000000 |
|       | mean | 296.500000 | 160.000000 | 232.000000 |
|       | std | 75.660426 | 56.568542 | 152.735065 |
|       | min | 243.000000 | 120.000000 | 124.000000 |
|       | 25% | 269.750000 | 140.000000 | 178.000000 |
|       | 50% | 296.500000 | 160.000000 | 232.000000 |
|       | 75% | 323.250000 | 180.000000 | 286.000000 |
|       | max | 350.000000 | 200.000000 | 340.000000 |
comp_df.groupby('Company').describe().index
Index(['FB', 'GOOG', 'MSFT'], dtype='object', name='Company')
Combining DataFrames
You can concatenate, merge, and join DataFrames.
Let's take a look at two DataFrames.
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],'D': ['D0', 'D1', 'D2', 'D3']}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'], 'B': ['B4', 'B5', 'B6', 'B7'],
'C': ['C4', 'C5', 'C6', 'C7'],'D': ['D4', 'D5', 'D6', 'D7']}, index=[4, 5, 6, 7])
df1
|   | A | B | C | D |
|---|----|----|----|----|
0 | A0 | B0 | C0 | D0 |
1 | A1 | B1 | C1 | D1 |
2 | A2 | B2 | C2 | D2 |
3 | A3 | B3 | C3 | D3 |
df2
|   | A | B | C | D |
|---|----|----|----|----|
4 | A4 | B4 | C4 | D4 |
5 | A5 | B5 | C5 | D5 |
6 | A6 | B6 | C6 | D6 |
7 | A7 | B7 | C7 | D7 |
concat
pd.concat(list_of_dfs, axis=0) stacks DataFrames along rows (axis=0) or columns (axis=1). The frames should share column names when stacking along rows, and share index values when stacking along columns.
# extend along rows
pd.concat([df1, df2]) # flows well because the indexes are sequential and the columns match
|   | A | B | C | D |
|---|----|----|----|----|
0 | A0 | B0 | C0 | D0 |
1 | A1 | B1 | C1 | D1 |
2 | A2 | B2 | C2 | D2 |
3 | A3 | B3 | C3 | D3 |
4 | A4 | B4 | C4 | D4 |
5 | A5 | B5 | C5 | D5 |
6 | A6 | B6 | C6 | D6 |
7 | A7 | B7 | C7 | D7 |
#extend along columns
pd.concat([df1, df2], axis=1) # fills NaN where the indexes don't match
|   | A | B | C | D | A | B | C | D |
|---|-----|-----|-----|-----|-----|-----|-----|-----|
0 | A0 | B0 | C0 | D0 | NaN | NaN | NaN | NaN |
1 | A1 | B1 | C1 | D1 | NaN | NaN | NaN | NaN |
2 | A2 | B2 | C2 | D2 | NaN | NaN | NaN | NaN |
3 | A3 | B3 | C3 | D3 | NaN | NaN | NaN | NaN |
4 | NaN | NaN | NaN | NaN | A4 | B4 | C4 | D4 |
5 | NaN | NaN | NaN | NaN | A5 | B5 | C5 | D5 |
6 | NaN | NaN | NaN | NaN | A6 | B6 | C6 | D6 |
7 | NaN | NaN | NaN | NaN | A7 | B7 | C7 | D7 |
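If the original indexes carry no meaning, ignore_index=True renumbers the result from 0:
# fresh 0..7 index regardless of the inputs' indexes
pd.concat([df1, df2], ignore_index=True)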
merge
merge lets you do a SQL-style join with inner, outer, left, and right modes:
pd.merge(left, right, how='outer', on='key')
where left and right are your two DataFrames (tables) and on names the key column(s) to join on.
left = pd.DataFrame({'key1': ['K0', 'K1', 'K2', 'K3'],'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K2', 'K3'],'B': ['C0', 'C1', 'C2', 'C3'],
'C': ['D0', 'D1', 'D2', 'D3']})
left
|   | A | B | key1 |
|---|----|----|------|
0 | A0 | B0 | K0 |
1 | A1 | B1 | K1 |
2 | A2 | B2 | K2 |
3 | A3 | B3 | K3 |
right
|   | B | C | key1 |
|---|----|----|------|
0 | C0 | D0 | K0 |
1 | C1 | D1 | K1 |
2 | C2 | D2 | K2 |
3 | C3 | D3 | K3 |
inner merge
An inner join keeps only the intersection of the keys.
#merge along key1
pd.merge(left, right, how='inner', on='key1')
|   | A | B_x | key1 | B_y | C |
|---|----|-----|------|-----|----|
0 | A0 | B0 | K0 | C0 | D0 |
1 | A1 | B1 | K1 | C1 | D1 |
2 | A2 | B2 | K2 | C2 | D2 |
3 | A3 | B3 | K3 | C3 | D3 |
When both tables share column names that are not part of on, pandas appends the suffixes _x and _y to differentiate them.
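The suffixes parameter overrides the default _x and _y:
# clashing column B becomes B_left and B_right
pd.merge(left, right, on='key1', suffixes=('_left', '_right'))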
merge on multiple columns
Sometimes your key is composite. You can then merge on multiple columns by passing a list to the on argument.
Let's add a key2 column to both tables.
left['key2'] = ['K0', 'K1', 'K0', 'K1']
left
|   | A | B | key1 | key2 |
|---|----|----|------|------|
0 | A0 | B0 | K0 | K0 |
1 | A1 | B1 | K1 | K1 |
2 | A2 | B2 | K2 | K0 |
3 | A3 | B3 | K3 | K1 |
right['key2'] = ['K0', 'K0', 'K0', 'K0']
right
|   | B | C | key1 | key2 |
|---|----|----|------|------|
0 | C0 | D0 | K0 | K0 |
1 | C1 | D1 | K1 | K0 |
2 | C2 | D2 | K2 | K0 |
3 | C3 | D3 | K3 | K0 |
pd.merge(left, right, how='inner', on=['key1', 'key2'])
|   | A | B_x | key1 | key2 | B_y | C |
|---|----|-----|------|------|-----|----|
0 | A0 | B0 | K0 | K0 | C0 | D0 |
1 | A2 | B2 | K2 | K0 | C2 | D2 |
An inner merge keeps only the intersection of the composite keys, hence only two rows.
outer merge
Use how='outer' to keep the union of both tables. pandas fills NaN where a row has no match.
om = pd.merge(left, right, how='outer', on=['key1', 'key2'])
om
|   | A | B_x | key1 | key2 | B_y | C |
|---|----|-----|------|------|-----|----|
0 | A0 | B0 | K0 | K0 | C0 | D0 |
1 | A1 | B1 | K1 | K1 | NaN | NaN |
2 | A2 | B2 | K2 | K0 | C2 | D2 |
3 | A3 | B3 | K3 | K1 | NaN | NaN |
4 | NaN | NaN | K1 | K0 | C1 | D1 |
5 | NaN | NaN | K3 | K0 | C3 | D3 |
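Passing indicator=True adds a _merge column that records whether each row came from the left table, the right table, or both; handy for auditing an outer merge:
# _merge is one of 'left_only', 'right_only', 'both'
pd.merge(left, right, how='outer', on=['key1', 'key2'], indicator=True)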
Sorting
Use DataFrame.sort_values(by=columns, inplace=False, ascending=True)
to sort the table.
om.sort_values(by=['key1', 'key2']) # the merged table, sorted by the key columns
|   | A | B_x | key1 | key2 | B_y | C |
|---|----|-----|------|------|-----|----|
0 | A0 | B0 | K0 | K0 | C0 | D0 |
4 | NaN | NaN | K1 | K0 | C1 | D1 |
1 | A1 | B1 | K1 | K1 | NaN | NaN |
2 | A2 | B2 | K2 | K0 | C2 | D2 |
5 | NaN | NaN | K3 | K0 | C3 | D3 |
3 | A3 | B3 | K3 | K1 | NaN | NaN |
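ascending can also be a list, one flag per sort column:
# key1 ascending, key2 descending
om.sort_values(by=['key1', 'key2'], ascending=[True, False])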
right merge
how='right' keeps all rows of the right table and drops rows of the left table that don't have matching keys.
pd.merge(left, right, how='right', on=['key1', 'key2']).sort_values(by='key1')
|   | A | B_x | key1 | key2 | B_y | C |
|---|----|-----|------|------|-----|----|
0 | A0 | B0 | K0 | K0 | C0 | D0 |
2 | NaN | NaN | K1 | K0 | C1 | D1 |
1 | A2 | B2 | K2 | K0 | C2 | D2 |
3 | NaN | NaN | K3 | K0 | C3 | D3 |
left merge
how='left' similarly keeps all rows of the left table and only those rows of the right table that have matching keys.
pd.merge(left, right, how='left', on=['key1', 'key2']).sort_values(by='key1')
|   | A | B_x | key1 | key2 | B_y | C |
|---|----|-----|------|------|-----|----|
0 | A0 | B0 | K0 | K0 | C0 | D0 |
1 | A1 | B1 | K1 | K1 | NaN | NaN |
2 | A2 | B2 | K2 | K0 | C2 | D2 |
3 | A3 | B3 | K3 | K1 | NaN | NaN |
join
join works like merge but matches on the index instead of columns. By default it does a left join, keeping all rows of the calling DataFrame; pass how='right', 'inner', or 'outer' for the other modes. See the examples below:
df_a = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']},
index=['K0', 'K1', 'K2'])
df_b = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
'D': ['D0', 'D2', 'D3']},
index=['K0', 'K2', 'K3'])
df_a
|    | A | B |
|----|----|----|
K0 | A0 | B0 |
K1 | A1 | B1 |
K2 | A2 | B2 |
df_b
|    | C | D |
|----|----|----|
K0 | C0 | D0 |
K2 | C2 | D2 |
K3 | C3 | D3 |
# join df_b onto df_a; default = keep all rows of df_a and matching rows of df_b (left join)
df_a.join(df_b)
|    | A | B | C | D |
|----|----|----|-----|-----|
K0 | A0 | B0 | C0 | D0 |
K1 | A1 | B1 | NaN | NaN |
K2 | A2 | B2 | C2 | D2 |
Thus all rows of df_a survive; where df_b has no matching index (K1), its columns are filled with NaN.
# join df_a onto df_b
df_b.join(df_a)
|    | C | D | A | B |
|----|-----|-----|----|----|
K0 | C0 | D0 | A0 | B0 |
K2 | C2 | D2 | A2 | B2 |
K3 | C3 | D3 | NaN | NaN |
# outer join - union of the two indexes
df_b.join(df_a, how='outer')
|    | C | D | A | B |
|----|-----|-----|----|----|
K0 | C0 | D0 | A0 | B0 |
K1 | NaN | NaN | A1 | B1 |
K2 | C2 | D2 | A2 | B2 |
K3 | C3 | D3 | NaN | NaN |
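An inner join keeps only the indexes present in both frames, here K0 and K2:
# intersection of the two indexes
df_a.join(df_b, how='inner')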