Seaborn - matrix and regression plots¶

In [1]:

Copied!





import seaborn as sns
%matplotlib inline
flights = sns.load_dataset('flights')
flights.head()
import seaborn as sns
%matplotlib inline
flights = sns.load_dataset('flights')
flights.head()

Out[1]:

	year	month	passengers
0	1949	January	112
1	1949	February	118
2	1949	March	132
3	1949	April	129
4	1949	May	121

In [2]:

Copied!

flights.shape
flights.shape

Out[2]:

(144, 3)

Let us pivot this flights data such that it becomes a 2D matrix. Lets make the Month as row indices

In [3]:

Copied!

flights_pv = flights.pivot_table(index='month', columns='year', values='passengers')
flights_pv.head()
flights_pv = flights.pivot_table(index='month', columns='year', values='passengers')
flights_pv.head()

Out[3]:

year	1949	1950	1951	1952	1953	1954	1955	1956	1957	1958	1959	1960
month
January	112	115	145	171	196	204	242	284	315	340	360	417
February	118	126	150	180	196	188	233	277	301	318	342	391
March	132	141	178	193	236	235	267	317	356	362	406	419
April	129	135	163	181	235	227	269	313	348	348	396	461
May	121	125	172	183	229	234	270	318	355	363	420	472

Using pivot_tables we have also aggregated the data by month and years.

ToC

Heatmap
Cluster plot
Regression Linear model plot

Heatmap¶

Heatmaps are a great way to represent continually variying data. However, you need to run this on a matrix kind of dataset, one where the row indexes are values themselves instead of serials.

In [4]:

Copied!

sns.heatmap(flights_pv)
sns.heatmap(flights_pv)

Out[4]:

<matplotlib.axes._subplots.AxesSubplot at 0x117e0ed68>

No description has been provided for this image

From the heatmap above, we see there are more passengers in summer (June, July, August) and the number of passengers increases by the year as well.

Heatmap for null data visualization¶

SNS Heatmap is great to view how many nulls are in your data.

In [2]:

Copied!





#from ml chapter, read titanic data
import pandas as pd
titanic = pd.read_csv('../udemy_ml_bootcamp/Machine Learning Sections/Logistic-Regression/titanic_train.csv')
titanic.head()
#from ml chapter, read titanic data
import pandas as pd
titanic = pd.read_csv('../udemy_ml_bootcamp/Machine Learning Sections/Logistic-Regression/titanic_train.csv')
titanic.head()

Out[2]:

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Ticket	Fare	Cabin	Embarked
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	A/5 21171	7.2500	NaN	S
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	PC 17599	71.2833	C85	C
2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	STON/O2. 3101282	7.9250	NaN	S
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	113803	53.1000	C123	S
4	5	0	3	Allen, Mr. William Henry	male	35.0	0	373450	8.0500	NaN	S

In [5]:

Copied!

titanic.shape
titanic.shape

Out[5]:

(891, 12)

In [3]:

Copied!

titanic.isnull().head()
titanic.isnull().head()

Out[3]:

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
0	False	False	False	False	False	False	False	False	False	False	True	False
1	False	False	False	False	False	False	False	False	False	False	False	False
2	False	False	False	False	False	False	False	False	False	False	True	False
3	False	False	False	False	False	False	False	False	False	False	False	False
4	False	False	False	False	False	False	False	False	False	False	True	False

In [4]:

Copied!

sns.heatmap(titanic.isnull(), yticklabels=False, cbar=False, cmap='viridis')
sns.heatmap(titanic.isnull(), yticklabels=False, cbar=False, cmap='viridis')

Out[4]:

<matplotlib.axes._subplots.AxesSubplot at 0x115ae64a8>

You can see Age and Cabin columns have lots of null while others have none or very few.

Cluster plot¶

Cluster plots are useful to auto group datasets. More of this in machine learning section

In [5]:

Copied!

sns.clustermap(flights_pv)
sns.clustermap(flights_pv)

/Users/atma6951/anaconda/envs/pychakras/lib/python3.6/site-packages/matplotlib/cbook.py:136: MatplotlibDeprecationWarning: The axisbg attribute was deprecated in version 2.0. Use facecolor instead.
  warnings.warn(message, mplDeprecation, stacklevel=1)

Out[5]:

<seaborn.matrix.ClusterGrid at 0x11a438470>

Cluster map rearranges the data to show cells of similar values close by.

Regression linear model plot¶

You can do regression plots in two ways. You can decorate a scatter plot to have a fit or you can make a regression plot with scatter on it. We will see the latter.

In [6]:

Copied!

tips = sns.load_dataset('tips')
tips.head()
tips = sns.load_dataset('tips')
tips.head()

Out[6]:

	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4

In [7]:

Copied!

#regressin total bill to the tip
sns.lmplot(x='total_bill', y='tip', data=tips)
#regressin total bill to the tip
sns.lmplot(x='total_bill', y='tip', data=tips)

Out[7]:

<seaborn.axisgrid.FacetGrid at 0x11aa258d0>

You can decorate this by splitting it by sex and assigning a different color for males and females

In [9]:

Copied!

sns.lmplot(x='total_bill', y='tip', data=tips, hue='sex')
sns.lmplot(x='total_bill', y='tip', data=tips, hue='sex')

Out[9]:

<seaborn.axisgrid.FacetGrid at 0x11aa4c630>

You can bring in factors like day of week and create a regression for each day

In [10]:

Copied!

sns.lmplot(x='total_bill', y='tip', data=tips, hue='sex', col='day')
sns.lmplot(x='total_bill', y='tip', data=tips, hue='sex', col='day')

Out[10]:

<seaborn.axisgrid.FacetGrid at 0x11af086a0>