Ex_extract_archives

Extract zip files

An exercise to read only the .zip files from a folder and extract each one into its own folder. These .zip files contain shapefiles. Each output folder takes the same name as its zip file.

This exercise showcases how to use Python's standard library and loops.

Import necessary libraries

We will import the os and zipfile libraries. Whenever possible, import only the necessary components (modules, classes, etc.) from a library.

In [1]:
# import necessary modules
import os
from zipfile import ZipFile

Let us explore how the zipfile library can be put to use

In [33]:
help(ZipFile.extractall)
Help on function extractall in module zipfile:

extractall(self, path=None, members=None, pwd=None)
    Extract all members from the archive to the current working
    directory. `path' specifies a different directory to extract to.
    `members' is optional and must be a subset of the list returned
    by namelist().
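
As a minimal sketch of the call documented above, the cell below builds a small throwaway archive in a temporary folder, lists its members with namelist() and extracts them with extractall(). The file names (demo.zip, a.txt, sub/b.txt) are made up for illustration and are not part of the Boston data.

```python
import os
import tempfile
from zipfile import ZipFile

# Build a throwaway archive so the example does not depend on real data.
work_dir = tempfile.mkdtemp()
zip_path = os.path.join(work_dir, "demo.zip")  # hypothetical file name
with ZipFile(zip_path, "w") as z:
    z.writestr("a.txt", "first file")
    z.writestr("sub/b.txt", "second file")

# Inspect the archive, then extract everything into a chosen folder.
with ZipFile(zip_path) as z:
    members = z.namelist()
    z.extractall(path=os.path.join(work_dir, "demo"))

# Check that a nested member landed where we expect it.
extracted = os.path.exists(os.path.join(work_dir, "demo", "sub", "b.txt"))
print(members, extracted)
```

Passing path sends the files to that folder instead of the current working directory, and any missing folders are created along the way.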

In [18]:
# path to folder containing zip files.
root = r'D:\DS_demo\Boston_data\archive2'

Make a list of the files in the folder (non-zip files are skipped later)

In [21]:
# walk the folder tree and collect the full path of every file
file_list = []
for fld_path, fld, files in os.walk(root):
    for file in files:
        file_path = os.path.join(fld_path, file)
        print(file_path)
        file_list.append(file_path)
D:\DS_demo\Boston_data\archive2\Boston_Neighborhoods.zip
D:\DS_demo\Boston_data\archive2\Boston_Police_Stations.zip
D:\DS_demo\Boston_data\archive2\Boston_Segments.zip
D:\DS_demo\Boston_data\archive2\Budget_Facilities.zip
D:\DS_demo\Boston_data\archive2\Charging_Stations.zip
D:\DS_demo\Boston_data\archive2\City_of_Boston_Projects_FY2017.zip
D:\DS_demo\Boston_data\archive2\Colleges_and_Universities.zip
D:\DS_demo\Boston_data\archive2\Existing_Bike_Network.zip
D:\DS_demo\Boston_data\archive2\Fire_Departments.zip
D:\DS_demo\Boston_data\archive2\Fire_Districts.zip
D:\DS_demo\Boston_data\archive2\Fire_Hydrant.zip
D:\DS_demo\Boston_data\archive2\Hubway_Stations.zip
D:\DS_demo\Boston_data\archive2\Hydrography_Line.zip
D:\DS_demo\Boston_data\archive2\Live_Street_Address_Management_SAM_Addresses.zip
D:\DS_demo\Boston_data\archive2\Municipal_Building_Energy_Reporting_BERDO.csv
D:\DS_demo\Boston_data\archive2\Municipal_Building_Energy_Reporting_BERDO.zip
D:\DS_demo\Boston_data\archive2\Non_Public_Schools.zip
D:\DS_demo\Boston_data\archive2\Open_Space.zip
D:\DS_demo\Boston_data\archive2\Parcels_2016_Data_Full.zip
D:\DS_demo\Boston_data\archive2\Parcels_2017.zip
D:\DS_demo\Boston_data\archive2\Parking_Meters.zip
D:\DS_demo\Boston_data\archive2\Pedestrian_Ramp_Inventory.zip
D:\DS_demo\Boston_data\archive2\Police_Districts.zip
D:\DS_demo\Boston_data\archive2\Polling_Locations.zip
D:\DS_demo\Boston_data\archive2\Public_Schools.zip
D:\DS_demo\Boston_data\archive2\Traffic_Signals.zip
D:\DS_demo\Boston_data\archive2\Trees.zip
D:\DS_demo\Boston_data\archive2\Wards.zip
D:\DS_demo\Boston_data\archive2\ZIP_Codes.zip
In [22]:
len(file_list)
Out[22]:
29
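
The list above holds every file in the folder, including one .csv. One possible alternative, sketched here under assumptions, is to keep only the zip files while walking the tree. The helper name list_zip_files and the demo file names are hypothetical, and the check is a simple case-insensitive test on the file name ending.

```python
import os
import tempfile

def list_zip_files(folder):
    """Return full paths of files under `folder` whose name ends in .zip."""
    zip_paths = []
    for fld_path, _, files in os.walk(folder):
        for name in sorted(files):
            # compare in lower case so .ZIP is accepted as well
            if name.lower().endswith(".zip"):
                zip_paths.append(os.path.join(fld_path, name))
    return zip_paths

# Demo on a temporary folder with two zip-named files and one csv.
demo_root = tempfile.mkdtemp()
for name in ("Wards.zip", "Trees.ZIP", "notes.csv"):
    open(os.path.join(demo_root, name), "w").close()

zip_only = list_zip_files(demo_root)
print([os.path.basename(p) for p in zip_only])
```

This only looks at the name, so a file that is called .zip but is not a real archive would still be listed.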

Extract each zip file

To start with, we need a name for each folder. Let us derive that from the zip file name. To analyze the file paths let us use the pathlib library.

In [30]:
import pathlib
p1 = pathlib.Path(file_list[0])
In [31]:
p1.name
Out[31]:
'Boston_Neighborhoods.zip'
In [32]:
p1.name.split('.')
Out[32]:
['Boston_Neighborhoods', 'zip']
In [39]:
p1.parent
Out[39]:
WindowsPath('D:/DS_demo/Boston_data/archive2')
In [40]:
str(p1.parent)
Out[40]:
'D:\\DS_demo\\Boston_data\\archive2'
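
Splitting the name on '.' works for the files listed here, but it returns more than two parts when a name contains extra dots. As a small sketch, pathlib also exposes stem and suffix, which always give the name without its last extension and the last extension itself. The dotted file name below is hypothetical.

```python
import pathlib

p_simple = pathlib.PurePath("Boston_Neighborhoods.zip")
p_dotted = pathlib.PurePath("Parcels.v2.zip")  # hypothetical name with an extra dot

# stem drops only the last extension, suffix keeps the leading dot
print(p_simple.stem, p_simple.suffix)
print(p_dotted.stem, p_dotted.suffix)

# split('.') gives three parts here, so unpacking into two names would fail
print(len(p_dotted.name.split('.')))
```

Note that suffix includes the dot, so a comparison needs to be against ".zip" rather than "zip".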

Loop through each zip file and extract it

In [52]:
for f in file_list:
    # split the file name into stem and extension; stem and suffix
    # also handle names that contain more than one dot
    p = pathlib.Path(f)
    fld_name, extn = p.stem, p.suffix.lower()
    if extn != ".zip":
        print(f + " is not a zip, skipping")
        continue
    
    # construct full folder path
    output_path = os.path.join(str(p.parent), fld_name)
    
    # extract; the with block closes the archive when done
    print("Extracting " + fld_name, end=" # ")
    with ZipFile(f) as z:
        z.extractall(output_path)
    print("success")
Extracting Boston_Neighborhoods # success
Extracting Boston_Police_Stations # success
Extracting Boston_Segments # success
Extracting Budget_Facilities # success
Extracting Charging_Stations # success
Extracting City_of_Boston_Projects_FY2017 # success
Extracting Colleges_and_Universities # success
Extracting Existing_Bike_Network # success
Extracting Fire_Departments # success
Extracting Fire_Districts # success
Extracting Fire_Hydrant # success
Extracting Hubway_Stations # success
Extracting Hydrography_Line # success
Extracting Live_Street_Address_Management_SAM_Addresses # success
D:\DS_demo\Boston_data\archive2\Municipal_Building_Energy_Reporting_BERDO.csv is not a zip, skipping
Extracting Municipal_Building_Energy_Reporting_BERDO # success
Extracting Non_Public_Schools # success
Extracting Open_Space # success
Extracting Parcels_2016_Data_Full # success
Extracting Parcels_2017 # success
Extracting Parking_Meters # success
Extracting Pedestrian_Ramp_Inventory # success
Extracting Police_Districts # success
Extracting Polling_Locations # success
Extracting Public_Schools # success
Extracting Traffic_Signals # success
Extracting Trees # success
Extracting Wards # success
Extracting ZIP_Codes # success
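
All the archives in this folder extracted cleanly. If a file were named .zip but were corrupt or not an archive at all, ZipFile would raise BadZipFile and stop the loop. The sketch below is one possible way to guard against that; the helper name extract_zip and the demo files are hypothetical.

```python
import os
import tempfile
from zipfile import ZipFile, BadZipFile

def extract_zip(zip_path):
    """Extract zip_path into a sibling folder named after the archive.

    Returns the folder path, or None if the file is not a valid zip.
    """
    folder = os.path.splitext(zip_path)[0]
    try:
        with ZipFile(zip_path) as z:
            z.extractall(folder)
    except BadZipFile:
        return None
    return folder

# Demo: one real archive and one text file that only pretends to be a zip.
demo_dir = tempfile.mkdtemp()
good_zip = os.path.join(demo_dir, "good.zip")
bad_zip = os.path.join(demo_dir, "bad.zip")
with ZipFile(good_zip, "w") as z:
    z.writestr("a.txt", "hello")
with open(bad_zip, "w") as fh:
    fh.write("not really a zip")

good_result = extract_zip(good_zip)
bad_result = extract_zip(bad_zip)
print(good_result is not None, bad_result)
```

Returning None instead of raising lets a calling loop report the failure and carry on with the remaining files.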