Geopandas - geocoding, interactive plotting¶
More IO, interactive visualization using folium and geocoding
In [1]:
Copied!
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, Polygon, LineString
import seaborn as sns
import pyepsg
%matplotlib inline
plt.style.use('bmh')
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, Polygon, LineString
import seaborn as sns
import pyepsg
%matplotlib inline
plt.style.use('bmh')
Read CSV data¶
In [2]:
Copied!
df = pd.read_csv('data/2015-06-21.csv')
df.head()
df = pd.read_csv('data/2015-06-21.csv')
df.head()
Out[2]:
Start time | End time | Calories (kcal) | Distance (m) | Average heart rate (bpm) | Max heart rate (bpm) | Min heart rate (bpm) | Height (m) | Low latitude (deg) | Low longitude (deg) | ... | High longitude (deg) | Average speed (m/s) | Max speed (m/s) | Min speed (m/s) | Step count | Average weight (kg) | Max weight (kg) | Min weight (kg) | Inactive duration (ms) | Walking duration (ms) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 00:00:00.000-07:00 | 00:15:00.000-07:00 | 16.215914 | NaN | NaN | NaN | NaN | NaN | 34.054939 | -117.242691 | ... | -117.242432 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 900000 | NaN |
1 | 00:15:00.000-07:00 | 00:30:00.000-07:00 | 16.215914 | NaN | NaN | NaN | NaN | NaN | 34.054955 | -117.242424 | ... | -117.242401 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 900000 | NaN |
2 | 00:30:00.000-07:00 | 00:45:00.000-07:00 | 16.215914 | NaN | NaN | NaN | NaN | NaN | 34.054939 | -117.242706 | ... | -117.242416 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 900000 | NaN |
3 | 00:45:00.000-07:00 | 01:00:00.000-07:00 | 16.215914 | NaN | NaN | NaN | NaN | NaN | 34.054970 | -117.242661 | ... | -117.242447 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 900000 | NaN |
4 | 01:00:00.000-07:00 | 01:15:00.000-07:00 | 16.215914 | NaN | NaN | NaN | NaN | NaN | 34.054977 | -117.242668 | ... | -117.242409 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 900000 | NaN |
5 rows × 21 columns
In [3]:
Copied!
df.shape
df.shape
Out[3]:
(96, 21)
In [4]:
Copied!
df.columns
df.columns
Out[4]:
Index(['Start time', 'End time', 'Calories (kcal)', 'Distance (m)', 'Average heart rate (bpm)', 'Max heart rate (bpm)', 'Min heart rate (bpm)', 'Height (m)', 'Low latitude (deg)', 'Low longitude (deg)', 'High latitude (deg)', 'High longitude (deg)', 'Average speed (m/s)', 'Max speed (m/s)', 'Min speed (m/s)', 'Step count', 'Average weight (kg)', 'Max weight (kg)', 'Min weight (kg)', 'Inactive duration (ms)', 'Walking duration (ms)'], dtype='object')
filter out columns without data
In [5]:
Copied!
df.describe()
df.describe()
Out[5]:
Calories (kcal) | Distance (m) | Average heart rate (bpm) | Max heart rate (bpm) | Min heart rate (bpm) | Height (m) | Low latitude (deg) | Low longitude (deg) | High latitude (deg) | High longitude (deg) | Average speed (m/s) | Max speed (m/s) | Min speed (m/s) | Step count | Average weight (kg) | Max weight (kg) | Min weight (kg) | Inactive duration (ms) | Walking duration (ms) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 96.000000 | 2.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 91.000000 | 91.000000 | 91.000000 | 91.000000 | 2.000000 | 2.000000 | 2.000000 | 15.000000 | 0.0 | 0.0 | 0.0 | 96.000000 | 5.000000 |
mean | 16.411472 | 31.000000 | NaN | NaN | NaN | NaN | 34.055033 | -117.242620 | 34.055120 | -117.242453 | 0.799705 | 0.799705 | 0.799705 | 116.600000 | NaN | NaN | NaN | 880833.302083 | 150074.200000 |
std | 2.665210 | 2.828427 | NaN | NaN | NaN | NaN | 0.000113 | 0.000092 | 0.000075 | 0.000097 | 0.333833 | 0.333833 | 0.333833 | 157.019016 | NaN | NaN | NaN | 87443.309515 | 142937.658018 |
min | 3.434296 | 29.000000 | NaN | NaN | NaN | NaN | 34.054150 | -117.242844 | 34.054996 | -117.242722 | 0.563649 | 0.563649 | 0.563649 | 1.000000 | NaN | NaN | NaN | 190607.000000 | 40929.000000 |
25% | 16.215914 | 30.000000 | NaN | NaN | NaN | NaN | 34.054996 | -117.242691 | 34.055063 | -117.242500 | 0.681677 | 0.681677 | 0.681677 | 16.000000 | NaN | NaN | NaN | 900000.000000 | 50464.000000 |
50% | 16.215914 | 31.000000 | NaN | NaN | NaN | NaN | 34.055023 | -117.242653 | 34.055122 | -117.242447 | 0.799705 | 0.799705 | 0.799705 | 64.000000 | NaN | NaN | NaN | 900000.000000 | 62575.000000 |
75% | 16.215914 | 32.000000 | NaN | NaN | NaN | NaN | 34.055098 | -117.242538 | 34.055174 | -117.242405 | 0.917733 | 0.917733 | 0.917733 | 121.000000 | NaN | NaN | NaN | 900000.000000 | 233586.000000 |
max | 34.519853 | 33.000000 | NaN | NaN | NaN | NaN | 34.055172 | -117.242409 | 34.055359 | -117.242027 | 1.035761 | 1.035761 | 1.035761 | 581.000000 | NaN | NaN | NaN | 900000.000000 | 362817.000000 |
In [7]:
Copied!
df2 = df[['Start time', 'End time','Calories (kcal)','High latitude (deg)',
'High longitude (deg)','Average speed (m/s)','Step count']]
df2.head(5)
df2 = df[['Start time', 'End time','Calories (kcal)','High latitude (deg)',
'High longitude (deg)','Average speed (m/s)','Step count']]
df2.head(5)
Out[7]:
Start time | End time | Calories (kcal) | High latitude (deg) | High longitude (deg) | Average speed (m/s) | Step count | |
---|---|---|---|---|---|---|---|
0 | 00:00:00.000-07:00 | 00:15:00.000-07:00 | 16.215914 | 34.054996 | -117.242432 | NaN | NaN |
1 | 00:15:00.000-07:00 | 00:30:00.000-07:00 | 16.215914 | 34.055000 | -117.242401 | NaN | NaN |
2 | 00:30:00.000-07:00 | 00:45:00.000-07:00 | 16.215914 | 34.055065 | -117.242416 | NaN | NaN |
3 | 00:45:00.000-07:00 | 01:00:00.000-07:00 | 16.215914 | 34.055058 | -117.242447 | NaN | NaN |
4 | 01:00:00.000-07:00 | 01:15:00.000-07:00 | 16.215914 | 34.055061 | -117.242409 | NaN | NaN |
Create a geopandas data frame from pandas dataframe¶
In [8]:
Copied!
point_list = [Point(xy) for xy in zip(df2['High longitude (deg)'],df['High latitude (deg)'])]
point_list[:4]
point_list = [Point(xy) for xy in zip(df2['High longitude (deg)'],df['High latitude (deg)'])]
point_list[:4]
Out[8]:
[<shapely.geometry.point.Point at 0x10f8b5630>, <shapely.geometry.point.Point at 0x10f8b5668>, <shapely.geometry.point.Point at 0x10f8b5710>, <shapely.geometry.point.Point at 0x10f8b56a0>]
It is not necessary to construct a geoseries, but Folium needs a geoseries and that to have a coordinate system set. So I am building it here
In [9]:
Copied!
gseries = gpd.GeoSeries(point_list)
gseries = gpd.GeoSeries(point_list)
In [10]:
Copied!
wgs84=pyepsg.get(4326)
wgs84=pyepsg.get(4326)
In [11]:
Copied!
# gseries.crs = pyepsg.get(4326)
gseries.crs ={'init':'epsg:4326'}
# gseries.crs = pyepsg.get(4326)
gseries.crs ={'init':'epsg:4326'}
In [12]:
Copied!
gdf = gpd.GeoDataFrame(data=df2, geometry=gseries)
gdf.head(4)
gdf = gpd.GeoDataFrame(data=df2, geometry=gseries)
gdf.head(4)
Out[12]:
Start time | End time | Calories (kcal) | High latitude (deg) | High longitude (deg) | Average speed (m/s) | Step count | geometry | |
---|---|---|---|---|---|---|---|---|
0 | 00:00:00.000-07:00 | 00:15:00.000-07:00 | 16.215914 | 34.054996 | -117.242432 | NaN | NaN | POINT (-117.242431640625 34.05499649047852) |
1 | 00:15:00.000-07:00 | 00:30:00.000-07:00 | 16.215914 | 34.055000 | -117.242401 | NaN | NaN | POINT (-117.2424011230469 34.05500030517578) |
2 | 00:30:00.000-07:00 | 00:45:00.000-07:00 | 16.215914 | 34.055065 | -117.242416 | NaN | NaN | POINT (-117.2424163818359 34.0550651550293) |
3 | 00:45:00.000-07:00 | 01:00:00.000-07:00 | 16.215914 | 34.055058 | -117.242447 | NaN | NaN | POINT (-117.2424468994141 34.05505752563477) |
In [13]:
Copied!
gdf[['High latitude (deg)','High longitude (deg)']].describe()
gdf[['High latitude (deg)','High longitude (deg)']].describe()
Out[13]:
High latitude (deg) | High longitude (deg) | |
---|---|---|
count | 91.000000 | 91.000000 |
mean | 34.055120 | -117.242453 |
std | 0.000075 | 0.000097 |
min | 34.054996 | -117.242722 |
25% | 34.055063 | -117.242500 |
50% | 34.055122 | -117.242447 |
75% | 34.055174 | -117.242405 |
max | 34.055359 | -117.242027 |
Drop the rows that don't have coordinates
In [14]:
Copied!
gdf.dropna(subset=['High latitude (deg)'], inplace=True)
gdf.shape
gdf.dropna(subset=['High latitude (deg)'], inplace=True)
gdf.shape
Out[14]:
(91, 8)
In [15]:
Copied!
fig,ax= plt.subplots(1, figsize=(5,5))
gdf.plot(ax=ax)
# ax.set_xlim([-117.2,-117.4])
# ax.set_ylim([34.054, 34.055])
fig,ax= plt.subplots(1, figsize=(5,5))
gdf.plot(ax=ax)
# ax.set_xlim([-117.2,-117.4])
# ax.set_ylim([34.054, 34.055])
Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x10f99f0f0>
Viz on Folium map¶
In [16]:
Copied!
import folium
folium.__version__
import folium
folium.__version__
Out[16]:
'0.5.0'
In [17]:
Copied!
# from folium.plugins import HeatMap
# hm = HeatMap(data=[xy for xy in zip(gdf['geometry'].x, gdf['geometry'].y)])
# from folium.plugins import HeatMap
# hm = HeatMap(data=[xy for xy in zip(gdf['geometry'].x, gdf['geometry'].y)])
In [135]:
Copied!
m = folium.Map(location=[34.054,-117.2], zoom_start=15)
folium.GeoJson(data=gdf).add_to(m)
m
m = folium.Map(location=[34.054,-117.2], zoom_start=15)
folium.GeoJson(data=gdf).add_to(m)
m
Out[135]:
Geocoding¶
Geocoding is the process of turning addresses into coordinates by plotting them on a map. It is a combination of search and computation (when you interpolate street addresses for unknown extents) that is typically performed on server side due to the amount of data required on a global scale.
Geopandas uses geopy
to perform geocoding and returns the result of hits as a geodataframe
.
In [18]:
Copied!
from geopandas.tools import geocode, geocoding, reverse_geocode
from geopandas.tools import geocode, geocoding, reverse_geocode
In [19]:
Copied!
type(geocode), type(reverse_geocode), type(geocoding)
type(geocode), type(reverse_geocode), type(geocoding)
Out[19]:
(function, function, module)
Get list of providers from geopy
In [15]:
Copied!
import geopy, inspect
print(geopy.__version__)
# use inspection, but limit to just classes
inspect.getmembers(geopy, predicate=inspect.isclass)
import geopy, inspect
print(geopy.__version__)
# use inspection, but limit to just classes
inspect.getmembers(geopy, predicate=inspect.isclass)
1.14.0
Out[15]:
[('ArcGIS', geopy.geocoders.arcgis.ArcGIS), ('Baidu', geopy.geocoders.baidu.Baidu), ('Bing', geopy.geocoders.bing.Bing), ('DataBC', geopy.geocoders.databc.DataBC), ('GeoNames', geopy.geocoders.geonames.GeoNames), ('GeocodeFarm', geopy.geocoders.geocodefarm.GeocodeFarm), ('GoogleV3', geopy.geocoders.googlev3.GoogleV3), ('IGNFrance', geopy.geocoders.ignfrance.IGNFrance), ('LiveAddress', geopy.geocoders.smartystreets.LiveAddress), ('Location', geopy.location.Location), ('Mapzen', geopy.geocoders.mapzen.Mapzen), ('Nominatim', geopy.geocoders.osm.Nominatim), ('OpenCage', geopy.geocoders.opencage.OpenCage), ('OpenMapQuest', geopy.geocoders.openmapquest.OpenMapQuest), ('Photon', geopy.geocoders.photon.Photon), ('PickPoint', geopy.geocoders.pickpoint.PickPoint), ('Point', geopy.point.Point), ('What3Words', geopy.geocoders.what3words.What3Words), ('Yandex', geopy.geocoders.yandex.Yandex)]
In [20]:
Copied!
geocode(strings='Hollywood sign') # default provider is googlev3
geocode(strings='Hollywood sign') # default provider is googlev3
Out[20]:
address | geometry | |
---|---|---|
0 | Los Angeles, CA 90068, USA | POINT (-118.3215482 34.1341151) |
In [21]:
Copied!
geocode(strings='Hollywood sign', provider='arcgis')
geocode(strings='Hollywood sign', provider='arcgis')
Out[21]:
address | geometry | |
---|---|---|
0 | Hollywood Sign | POINT (-118.3219799999999 34.13438000000008) |
In [26]:
Copied!
address_list = ['hospitals near Redlands, CA',
'groceries near Redlands, CA',
'schools near Redlands, CA']
geocoded_gdf = geocode(strings=address_list, provider='arcgis')
geocoded_gdf
address_list = ['hospitals near Redlands, CA',
'groceries near Redlands, CA',
'schools near Redlands, CA']
geocoded_gdf = geocode(strings=address_list, provider='arcgis')
geocoded_gdf
Out[26]:
address | geometry | |
---|---|---|
0 | Redlands Community Hospital-ER | POINT (-117.2044100362101 34.03633005943772) |
1 | Target | POINT (-117.2072099744647 34.07031003505633) |
2 | Redlands East Valley High School | POINT (-117.12776 34.06201000000004) |
In [30]:
Copied!
m2 = folium.Map(location=[34.054,-117.2], zoom_start=12)
folium.GeoJson(data=geocoded_gdf).add_to(m2)
m2
m2 = folium.Map(location=[34.054,-117.2], zoom_start=12)
folium.GeoJson(data=geocoded_gdf).add_to(m2)
m2
Out[30]:
In [31]:
Copied!
m2.crs
m2.crs
Out[31]:
'EPSG3857'
In [32]:
Copied!
m2.to_json()
m2.to_json()
Out[32]:
'{"name": "Map", "id": "94aa73e804f245ec8f4e34436a4a2412", "children": {"openstreetmap": {"name": "TileLayer", "id": "6e327110ed23459f86840082b427835d", "children": {}}, "geo_json_bd54f09d34e648ffbd9a583dc626582f": {"name": "GeoJson", "id": "bd54f09d34e648ffbd9a583dc626582f", "children": {}}}}'