GeoPandas - intro, plotting¶
In [1]:
Copied!
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('bmh') # better for plotting geometries vs general plots.
from shapely.geometry import Point, Polygon, LineString
import pandas as pd
import geopandas as gpd
from geopandas import GeoSeries, GeoDataFrame
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('bmh') # better for plotting geometries vs general plots.
from shapely.geometry import Point, Polygon, LineString
import pandas as pd
import geopandas as gpd
from geopandas import GeoSeries, GeoDataFrame
Geometry types¶
Geopandas uses shapely.geometry
geometry objects. Geopandas has 6
types of geometry objects
- Point
- Line (LineString)
- Polygon
- Multi-Point
- Multi-Line
- Multi-Polygon
Gotchas¶
- Geopandas is a growing project and its API could change over time
- Geopandas does not restrict or check for consistency in geometry type of its series.
Constructing GeoSeries
from Shapely Point
s¶
In [2]:
Copied!
point_list = [Point(-120,45), Point(-121.2, 46), Point(-122.9, 47.5)]
gs = GeoSeries(point_list)
gs
point_list = [Point(-120,45), Point(-121.2, 46), Point(-122.9, 47.5)]
gs = GeoSeries(point_list)
gs
Out[2]:
0 POINT (-120 45) 1 POINT (-121.2 46) 2 POINT (-122.9 47.5) dtype: object
In [3]:
Copied!
type(gs)
type(gs)
Out[3]:
geopandas.geoseries.GeoSeries
In [4]:
Copied!
gs.geometry
gs.geometry
Out[4]:
0 POINT (-120 45) 1 POINT (-121.2 46) 2 POINT (-122.9 47.5) dtype: object
In [5]:
Copied!
gs.geom_type
gs.geom_type
Out[5]:
0 Point 1 Point 2 Point dtype: object
In [6]:
Copied!
gs.iloc[0]
gs.iloc[0]
Out[6]:
In [8]:
Copied!
gs.crs = {'init':'epsg:4326'}
gs.crs = {'init':'epsg:4326'}
Plot the geometries¶
This internally uses descartes
and matplotlib
In [9]:
Copied!
gs.plot()
gs.plot()
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a0e833080>
Constructing GeoSeries
from a Pandas DataFrame
¶
In [16]:
Copied!
data_dict = {'name':['a','b','c'],
'lat':[45,46,47.5],
'lon':[-120,-121.2,-123]}
df = pd.DataFrame(data_dict)
df.set_index(df['name'], inplace=True)
df
data_dict = {'name':['a','b','c'],
'lat':[45,46,47.5],
'lon':[-120,-121.2,-123]}
df = pd.DataFrame(data_dict)
df.set_index(df['name'], inplace=True)
df
Out[16]:
lat | lon | name | |
---|---|---|---|
name | |||
a | 45.0 | -120.0 | a |
b | 46.0 | -121.2 | b |
c | 47.5 | -123.0 | c |
In [17]:
Copied!
[xy for xy in zip(df['lon'], df['lat'])]
[xy for xy in zip(df['lon'], df['lat'])]
Out[17]:
[(-120.0, 45.0), (-121.2, 46.0), (-123.0, 47.5)]
In [18]:
Copied!
point_list = [Point(xy) for xy in zip(df['lon'], df['lat'])]
point_list
point_list = [Point(xy) for xy in zip(df['lon'], df['lat'])]
point_list
Out[18]:
[<shapely.geometry.point.Point at 0x1a0e8af5c0>, <shapely.geometry.point.Point at 0x1a19123400>, <shapely.geometry.point.Point at 0x1a191237f0>]
In [20]:
Copied!
gdf = GeoDataFrame(df, geometry=point_list)
gdf
gdf = GeoDataFrame(df, geometry=point_list)
gdf
Out[20]:
lat | lon | name | geometry | |
---|---|---|---|---|
name | ||||
a | 45.0 | -120.0 | a | POINT (-120 45) |
b | 46.0 | -121.2 | b | POINT (-121.2 46) |
c | 47.5 | -123.0 | c | POINT (-123 47.5) |
In [21]:
Copied!
gdf.plot()
gdf.plot()
Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a191bd9b0>
Reading a SHP file¶
In [22]:
Copied!
fire_stations = gpd.read_file('./data/FireStations.shp')
type(fire_stations)
fire_stations = gpd.read_file('./data/FireStations.shp')
type(fire_stations)
Out[22]:
geopandas.geodataframe.GeoDataFrame
In [23]:
Copied!
fire_stations.head()
fire_stations.head()
Out[23]:
OBJECTID | post_id | Name | descriptio | cat1 | cat2 | cat3 | addrln1 | addrln2 | city | ... | source | ext_id | use_type | dis_status | latitude | longitude | date_updat | POINT_X | POINT_Y | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 40223266 | 2022.0 | Los Angeles County Fire Department - Battalion... | The Battalion provides fire and rescue service... | Public Safety | Fire Stations | None | 7643 W. Santa Monica Blvd. | None | Los Angeles | ... | 211 | None | publish | None | 34.090940 | -118.356388 | 2013-06-01 | 6.453766e+06 | 1.855666e+06 | POINT (6453766.230470523 1855666.492027521) |
1 | 40223440 | 2236.0 | Los Angeles County Fire Department - Battalion... | The Battalion provides fire and rescue service... | Public Safety | Fire Stations | Battalion HQ | 6031 Rickenbacker Rd. | None | Commerce | ... | 211 | None | publish | None | 33.988587 | -118.154162 | 2013-06-01 | 6.514937e+06 | 1.818263e+06 | POINT (6514937.276763365 1818262.767294437) |
2 | 40223442 | 2237.0 | Los Angeles County Fire Department - Battalion... | The Battalion provides fire and rescue service... | Public Safety | Fire Stations | None | 1059 N. White Ave. | None | Pomona | ... | 211 | None | publish | None | 34.067555 | -117.759366 | 2013-06-01 | 6.634548e+06 | 1.847052e+06 | POINT (6634547.880061358 1847052.198560596) |
3 | 40223459 | 2262.0 | Los Angeles County Fire Department - Battalion... | The Battalion provides fire and rescue service... | Public Safety | Fire Stations | None | 1260 Encinal Canyon Rd | None | Malibu | ... | 211 | None | publish | None | 34.084350 | -118.866037 | 2013-06-01 | 6.299441e+06 | 1.854207e+06 | POINT (6299441.115893021 1854206.637991846) |
4 | 40223463 | 2269.0 | Los Angeles County Fire Department - Battalion... | The Battalion provides fire and rescue service... | Public Safety | Fire Stations | Battalion HQ | 6301 S. Santa Fe Ave. | None | Huntington Park | ... | 211 | None | publish | None | 33.983196 | -118.230626 | 2013-06-01 | 6.491753e+06 | 1.816345e+06 | POINT (6491753.321819022 1816345.240289599) |
5 rows × 30 columns
In [24]:
Copied!
fire_stations.crs
fire_stations.crs
Out[24]:
{'proj': 'lcc', 'lat_1': 34.03333333333333, 'lat_2': 35.46666666666667, 'lat_0': 33.5, 'lon_0': -118, 'x_0': 2000000, 'y_0': 500000.0000000001, 'datum': 'NAD83', 'units': 'us-ft', 'no_defs': True}
Data exploration using pandas¶
In [28]:
Copied!
fire_stations.columns
fire_stations.columns
Out[28]:
Index(['OBJECTID', 'post_id', 'Name', 'descriptio', 'cat1', 'cat2', 'cat3', 'addrln1', 'addrln2', 'city', 'state', 'zip', 'hours', 'phones', 'url', 'email', 'info1', 'info2', 'link', 'org_name', 'source', 'ext_id', 'use_type', 'dis_status', 'latitude', 'longitude', 'date_updat', 'POINT_X', 'POINT_Y', 'geometry'], dtype='object')
In [67]:
Copied!
fire_stations['city'].value_counts().head()
fire_stations['city'].value_counts().head()
Out[67]:
Los Angeles 65 Long Beach 25 Santa Clarita 10 Pasadena 10 Pomona 9 Name: city, dtype: int64
Plotting¶
In [39]:
Copied!
fire_stations.plot(figsize=(10,10))
fire_stations.plot(figsize=(10,10))
Out[39]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a19347048>
Overlay USA map¶
Read the bundled dataset and overlay it on top of this
In [41]:
Copied!
gpd.datasets.available
gpd.datasets.available
Out[41]:
['naturalearth_cities', 'naturalearth_lowres', 'nybb']
In [42]:
Copied!
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world.head()
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world.head()
Out[42]:
pop_est | continent | name | iso_a3 | gdp_md_est | geometry | |
---|---|---|---|---|---|---|
0 | 28400000.0 | Asia | Afghanistan | AFG | 22270.0 | POLYGON ((61.21081709172574 35.65007233330923,... |
1 | 12799293.0 | Africa | Angola | AGO | 110300.0 | (POLYGON ((16.32652835456705 -5.87747039146621... |
2 | 3639453.0 | Europe | Albania | ALB | 21810.0 | POLYGON ((20.59024743010491 41.85540416113361,... |
3 | 4798491.0 | Asia | United Arab Emirates | ARE | 184300.0 | POLYGON ((51.57951867046327 24.24549713795111,... |
4 | 40913584.0 | South America | Argentina | ARG | 573900.0 | (POLYGON ((-65.50000000000003 -55.199999999999... |
In [45]:
Copied!
world.shape
world.shape
Out[45]:
(177, 6)
In [46]:
Copied!
world['name'].unique()
world['name'].unique()
Out[46]:
array(['Afghanistan', 'Angola', 'Albania', 'United Arab Emirates', 'Argentina', 'Armenia', 'Antarctica', 'Fr. S. Antarctic Lands', 'Australia', 'Austria', 'Azerbaijan', 'Burundi', 'Belgium', 'Benin', 'Burkina Faso', 'Bangladesh', 'Bulgaria', 'Bahamas', 'Bosnia and Herz.', 'Belarus', 'Belize', 'Bolivia', 'Brazil', 'Brunei', 'Bhutan', 'Botswana', 'Central African Rep.', 'Canada', 'Switzerland', 'Chile', 'China', "Côte d'Ivoire", 'Cameroon', 'Dem. Rep. Congo', 'Congo', 'Colombia', 'Costa Rica', 'Cuba', 'N. Cyprus', 'Cyprus', 'Czech Rep.', 'Germany', 'Djibouti', 'Denmark', 'Dominican Rep.', 'Algeria', 'Ecuador', 'Egypt', 'Eritrea', 'Spain', 'Estonia', 'Ethiopia', 'Finland', 'Fiji', 'Falkland Is.', 'France', 'Gabon', 'United Kingdom', 'Georgia', 'Ghana', 'Guinea', 'Gambia', 'Guinea-Bissau', 'Eq. Guinea', 'Greece', 'Greenland', 'Guatemala', 'Guyana', 'Honduras', 'Croatia', 'Haiti', 'Hungary', 'Indonesia', 'India', 'Ireland', 'Iran', 'Iraq', 'Iceland', 'Israel', 'Italy', 'Jamaica', 'Jordan', 'Japan', 'Kazakhstan', 'Kenya', 'Kyrgyzstan', 'Cambodia', 'Korea', 'Kosovo', 'Kuwait', 'Lao PDR', 'Lebanon', 'Liberia', 'Libya', 'Sri Lanka', 'Lesotho', 'Lithuania', 'Luxembourg', 'Latvia', 'Morocco', 'Moldova', 'Madagascar', 'Mexico', 'Macedonia', 'Mali', 'Myanmar', 'Montenegro', 'Mongolia', 'Mozambique', 'Mauritania', 'Malawi', 'Malaysia', 'Namibia', 'New Caledonia', 'Niger', 'Nigeria', 'Nicaragua', 'Netherlands', 'Norway', 'Nepal', 'New Zealand', 'Oman', 'Pakistan', 'Panama', 'Peru', 'Philippines', 'Papua New Guinea', 'Poland', 'Puerto Rico', 'Dem. Rep. Korea', 'Portugal', 'Paraguay', 'Palestine', 'Qatar', 'Romania', 'Russia', 'Rwanda', 'W. Sahara', 'Saudi Arabia', 'Sudan', 'S. Sudan', 'Senegal', 'Solomon Is.', 'Sierra Leone', 'El Salvador', 'Somaliland', 'Somalia', 'Serbia', 'Suriname', 'Slovakia', 'Slovenia', 'Sweden', 'Swaziland', 'Syria', 'Chad', 'Togo', 'Thailand', 'Tajikistan', 'Turkmenistan', 'Timor-Leste', 'Trinidad and Tobago', 'Tunisia', 'Turkey', 'Taiwan', 'Tanzania', 'Uganda', 'Ukraine', 'Uruguay', 'United States', 'Uzbekistan', 'Venezuela', 'Vietnam', 'Vanuatu', 'Yemen', 'South Africa', 'Zambia', 'Zimbabwe'], dtype=object)
In [47]:
Copied!
usa = world[world['name']=='United States']
usa
usa = world[world['name']=='United States']
usa
Out[47]:
pop_est | continent | name | iso_a3 | gdp_md_est | geometry | |
---|---|---|---|---|---|---|
168 | 313973000.0 | North America | United States | USA | 15094000.0 | (POLYGON ((-155.54211 19.08348000000001, -155.... |
In [49]:
Copied!
ax = fire_stations.plot(figsize=(10,10))
usa.plot(ax = ax)
ax = fire_stations.plot(figsize=(10,10))
usa.plot(ax = ax)
Out[49]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a1939fc18>
In [50]:
Copied!
usa.plot()
usa.plot()
Out[50]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a193be9e8>
In [51]:
Copied!
usa.crs
usa.crs
Out[51]:
{'init': 'epsg:4326'}
Snap! They are in different projections and geopandas does not handle that. SO lets try to reproject the fire stations to match usa
Reprojection¶
Projecting the fire stations layer to match the world dataset from geopandas
In [52]:
Copied!
fire_stations_gcs = fire_stations.to_crs(epsg=4326)
fire_stations_gcs = fire_stations.to_crs(epsg=4326)
In [53]:
Copied!
fire_stations_gcs.crs
fire_stations_gcs.crs
Out[53]:
{'init': 'epsg:4326', 'no_defs': True}
In [54]:
Copied!
fire_stations_gcs.plot()
fire_stations_gcs.plot()
Out[54]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a193ec0f0>
In [60]:
Copied!
fire_stations_gcs.total_bounds
fire_stations_gcs.total_bounds
Out[60]:
array([-118.88357856, 33.32376846, -117.70773665, 34.75793147])
Overlay plot¶
In [66]:
Copied!
f, ax = plt.subplots(1, figsize=(12, 12))
ax.set_title('Fire stations in Southern California')
usa.plot(ax=ax, facecolor='lightgray', edgecolor='gray')
fire_stations_gcs.plot(ax=ax)
#this plots the whole US. To limit the scale to just SoCal, limit the axes
ax.set_xlim([-120, -118])
ax.set_ylim([33, 35])
f, ax = plt.subplots(1, figsize=(12, 12))
ax.set_title('Fire stations in Southern California')
usa.plot(ax=ax, facecolor='lightgray', edgecolor='gray')
fire_stations_gcs.plot(ax=ax)
#this plots the whole US. To limit the scale to just SoCal, limit the axes
ax.set_xlim([-120, -118])
ax.set_ylim([33, 35])
Out[66]:
(33, 35)
In [2]:
Copied!
damsel_gdf = gpd.read_file('./data/damselfish/DAMSELFISH_distributions.shp')
damsel_gdf.head(3)
damsel_gdf = gpd.read_file('./data/damselfish/DAMSELFISH_distributions.shp')
damsel_gdf.head(3)
Out[2]:
ID_NO | BINOMIAL | ORIGIN | COMPILER | YEAR | CITATION | SOURCE | DIST_COMM | ISLAND | SUBSPECIES | ... | RL_UPDATE | KINGDOM_NA | PHYLUM_NAM | CLASS_NAME | ORDER_NAME | FAMILY_NAM | GENUS_NAME | SPECIES_NA | CATEGORY | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 183963.0 | Stegastes leucorus | 1 | IUCN | 2010 | International Union for Conservation of Nature... | None | None | None | None | ... | 2012.1 | ANIMALIA | CHORDATA | ACTINOPTERYGII | PERCIFORMES | POMACENTRIDAE | Stegastes | leucorus | VU | POLYGON ((-115.6437454219999 29.71392059300007... |
1 | 183963.0 | Stegastes leucorus | 1 | IUCN | 2010 | International Union for Conservation of Nature... | None | None | None | None | ... | 2012.1 | ANIMALIA | CHORDATA | ACTINOPTERYGII | PERCIFORMES | POMACENTRIDAE | Stegastes | leucorus | VU | POLYGON ((-105.589950704 21.89339825500002, -1... |
2 | 183963.0 | Stegastes leucorus | 1 | IUCN | 2010 | International Union for Conservation of Nature... | None | None | None | None | ... | 2012.1 | ANIMALIA | CHORDATA | ACTINOPTERYGII | PERCIFORMES | POMACENTRIDAE | Stegastes | leucorus | VU | POLYGON ((-111.159618439 19.01535626700007, -1... |
3 rows × 24 columns
In [3]:
Copied!
damsel_gdf.columns
damsel_gdf.columns
Out[3]:
Index(['ID_NO', 'BINOMIAL', 'ORIGIN', 'COMPILER', 'YEAR', 'CITATION', 'SOURCE', 'DIST_COMM', 'ISLAND', 'SUBSPECIES', 'SUBPOP', 'LEGEND', 'SEASONAL', 'TAX_COMM', 'RL_UPDATE', 'KINGDOM_NA', 'PHYLUM_NAM', 'CLASS_NAME', 'ORDER_NAME', 'FAMILY_NAM', 'GENUS_NAME', 'SPECIES_NA', 'CATEGORY', 'geometry'], dtype='object')
In [4]:
Copied!
damsel_gdf.shape
damsel_gdf.shape
Out[4]:
(231, 24)
Plot a map by category¶
In [17]:
Copied!
damsel_gdf.plot(column='BINOMIAL', categorical=True,
figsize=(12, 4))
damsel_gdf.plot(column='BINOMIAL', categorical=True,
figsize=(12, 4))
Out[17]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a1d4a9eb8>
Groupby a column¶
The BINOMIAL
column contains the full genus
and species
names of the fishes. We could split the shape file by each fish type
In [5]:
Copied!
damsel_gdf['BINOMIAL'].value_counts()
damsel_gdf['BINOMIAL'].value_counts()