lidar_labeler.batch_download_chunks

This script performs batch processing of geospatial data to facilitate the download and merging of raster files. It uses the dem_getter module to download DEM data in batches based on specified grid sizes and polygons. The script includes functions to divide polygons into grids, filter non-intersecting rows, and manage raster file downloads. Additionally, it handles the merging of raster files and ensures the proper management of temporary files. The processed data is saved to specified directories for further use.

"""
This script performs batch processing of geospatial data to facilitate the download and merging of raster files.
It uses the `dem_getter` module to download DEM data in batches based on specified grid sizes and polygons. The script
includes functions to divide polygons into grids, filter non-intersecting rows, and manage raster file downloads.
Additionally, it handles the merging of raster files and ensures the proper management of temporary files.
The processed data is saved to specified directories for further use.
"""

try:
    import dem_getter as dg
except ImportError:
    print("Failed to import the 'dem_getter' module. Please ensure that you have set up your environment correctly.")
    print("For setup instructions, refer to the Setting Up the Dem Getter section in the README document.")

import sys
import os
from pathlib import Path

# Make the sibling lidar_label_builder directory importable.
# sys.path entries must be strings, so the Path is converted explicitly.
sys.path.append(str(Path(__file__).resolve().parent.parent / 'lidar_label_builder'))
import lidar_label_builder as llb

import geopandas as gpd
from shapely.geometry import Polygon
import numpy as np
import json
import shutil
import time
from requests.exceptions import HTTPError

# Load the global variables shared across the project
with (Path(__file__).resolve().parent.parent / 'configs' / 'global_variables.json').open('r') as f:
    params_dict = json.load(f)

BATCH_DOWNLOAD_AREA = params_dict['BATCH_DOWNLOAD_AREA']
BATCH_DOWNLOAD_DATASET_NAME = params_dict['BATCH_DOWNLOAD_DATASET_NAME']
MERGED_RASTER_FILE_LABEL = params_dict['MERGED_RASTER_FILE_LABEL']
MERGED_RASTER_FOLDER = params_dict['MERGED_RASTER_FOLDER']
TEMP_RASTER_DOWNLOAD_FOLDER = params_dict['TEMP_RASTER_DOWNLOAD_FOLDER']
RSTR_COL_PATTERN = params_dict['RSTR_COL_PATTERN']
SMALL_GRID_DF_COL = params_dict['SMALL_GRID_DF_COL']
SMALL_GRID_DF_DIRECTORY = params_dict['SMALL_GRID_DF_DIRECTORY']
DEPLOY_MED_GRID_LABEL = params_dict['DEPLOY_MED_GRID_LABEL']
LABEL_FILE_EXT = params_dict['LABEL_FILE_EXT']
DEPLOY_SM_GRID_LABEL = params_dict['DEPLOY_SM_GRID_LABEL']


def load_polygon_gdb_and_convert_multipolygons(pathToAreaPolygon:str):
    """Loads a GeoDataFrame from the path of a geodatabase containing an area polygon or multipolygon
    and converts any multipolygons to polygons.

    Args:
        pathToAreaPolygon (str): Path to the file containing the area polygon or multipolygon.

    Returns:
        gdf (gpd.GeoDataFrame): A geopandas GeoDataFrame containing polygon geometry.
    """
    # Load the GeoDataFrame from the specified file path
    gdf = gpd.read_file(pathToAreaPolygon)

    # If the GeoDataFrame contains multipolygons, explode them into individual polygons
    return gdf.explode(index_parts=True).reset_index() if 'MultiPolygon' in gdf.geom_type.unique() else gdf

def divide_polygon_into_square_grid(polygonGdf:gpd.GeoDataFrame, gridCellLength:int, i:int = 0):
    """Takes a polygon in a GeoDataFrame and divides its bounding box into a square grid of a desired size.

    Args:
        polygonGdf (gpd.GeoDataFrame): A geopandas GeoDataFrame containing polygon geometry.
        gridCellLength (int): The desired length of grid cells in meters.
        i (int): Index value of the polygon to grid. Defaults to 0.

    Returns:
        squareGridDf (gpd.GeoDataFrame): A geopandas GeoDataFrame of the input polygon divided into a square grid. Each row contains geometry for a single square grid cell.
    """
    # Extract the coordinates of the polygon's exterior
    x, y = polygonGdf['geometry'][i].exterior.coords.xy

    # Determine the bounding box of the polygon
    xmin, xmax, ymin, ymax = min(x), max(x), min(y), max(y)

    # Bottom-left corner coordinates of the grid cells
    xcoords = np.arange(xmin, xmax, gridCellLength)
    ycoords = np.arange(ymin, ymax, gridCellLength)

    gridCells = [] # List to store the grid cells

    # Construct each grid cell as a closed square polygon
    for x0 in xcoords:
        for y0 in ycoords:
            gridCell = Polygon([(x0, y0), (x0, y0 + gridCellLength),
                                (x0 + gridCellLength, y0 + gridCellLength),
                                (x0 + gridCellLength, y0),
                                (x0, y0)])
            gridCells.append(gridCell)

    # Create a GeoDataFrame with the grid cells, reusing the CRS of the input GeoDataFrame
    return gpd.GeoDataFrame(geometry = gridCells, crs = polygonGdf.crs)

def calc_medium_grid_length(smallGridSize:int):
    """Calculates a medium grid size that is compatible with the small grid size, based on the globally defined batch download area.

    Args:
        smallGridSize (int): The width of the small square grid cells.

    Returns:
        mediumGridLength (int): The calculated length of the medium grid cells.
    """
    # Calculate the number of small grids that fit into the batch download area
    nSmallGridsPerMedGrid = int(np.floor(BATCH_DOWNLOAD_AREA/(smallGridSize**2)))

    # Calculate the number of small grids that can fit along one side of the medium grid
    nSmallGridsLength = int(np.floor(np.sqrt(nSmallGridsPerMedGrid)))

    # The medium grid length is the number of small grids per side times the small grid size
    return nSmallGridsLength*smallGridSize

def rm_non_intersecting_rows(targetDf:gpd.GeoDataFrame, intersectsPoly:gpd.GeoSeries):
    """Removes rows from the target dataframe that do not intersect with the specified polygon.

    Args:
        targetDf (gpd.GeoDataFrame): The target GeoDataFrame.
        intersectsPoly (gpd.GeoSeries): The polygon(s) to check intersections against. It is indexed with ['geometry'], and only the result for its first geometry is used.

    Returns:
        outDf (gpd.GeoDataFrame): The filtered GeoDataFrame with non-intersecting rows removed.
    """
    rows_to_drop = []  # Indices of rows that do not intersect

    # Iterate over each row in the target GeoDataFrame
    for i, row in targetDf.iterrows():
        # Test the current row against intersectsPoly; .iloc[0] keeps the result for the first geometry only
        intersects = row['geometry'].intersects(intersectsPoly['geometry']).iloc[0]

        # If there is no intersection, mark the row for removal
        if not intersects:
            rows_to_drop.append(i)

    # Drop the non-intersecting rows and reset the index
    outDf = targetDf.drop(index=rows_to_drop).copy()
    outDf.reset_index(drop=True, inplace=True)

    return outDf

def create_gridded_dfs_for_batch_download(polygonPath:str, smallGridSize:int, outDirectory:str):
    """Creates gridded dataframes for batch downloading by dividing the input polygon area into grids of specified sizes.
    The medium grid is used for batch downloads; the small grids are used later to crop the downloaded
    data into small squares and apply a CNN model to each square.
    Returns the small grids as a list and a geopandas GeoDataFrame in which each row contains the path
    to the corresponding small grid file for that area.

    Args:
        polygonPath (str): The file path to the input polygon.
        smallGridSize (int): The size of the small grid.
        outDirectory (str): The output directory to save the gridded dataframes.

    Returns:
        clippedSmallGridDfs (list): A list of small gridded GeoDataFrames.
        filteredMedGridDf (gpd.GeoDataFrame): The filtered medium grid GeoDataFrame.
    """

    # Load the extent polygon, with any multipolygons split into polygons
    bboxPolygon = load_polygon_gdb_and_convert_multipolygons(polygonPath)
    # Create a small grid across the entire input polygon
    smallGriddedDf = divide_polygon_into_square_grid(bboxPolygon, smallGridSize)
    # Calculate a medium grid cell length that 1) is an appropriate download size and
    # 2) contains a whole number of small grids (no small grid is split between medium grids)
    medGridLength = calc_medium_grid_length(smallGridSize)
    # Create a medium grid across the entire input polygon using that length
    medGridDf = divide_polygon_into_square_grid(bboxPolygon, medGridLength)

    # Remove any small and medium grids that do not at least partially touch the input polygon
    filteredSmallGridDf = rm_non_intersecting_rows(smallGriddedDf, bboxPolygon)
    filteredMedGridDf = rm_non_intersecting_rows(medGridDf, bboxPolygon)

    clippedSmallGridDfPaths = []
    clippedSmallGridDfs = []
    clippedMedGrids = []

    for i, medPolygon in filteredMedGridDf.iterrows():
        # Isolate only the small grids within this medium polygon
        clippedSmallGridDf = gpd.clip(filteredSmallGridDf, medPolygon['geometry'])
        clippedSmallGridDf = clippedSmallGridDf[clippedSmallGridDf['geometry'].geom_type == 'Polygon']
        clippedSmallGridDf.reset_index(drop = True, inplace = True)

        # Rebuild the medium grid as the union of its small grids so the two overlap exactly.
        # This also gives it an irregular shape around the edges of non-rectangular polygons.
        clippedMedGrid = clippedSmallGridDf.unary_union
        clippedMedGrids.append(clippedMedGrid)

        # Save the small grid for this medium polygon
        fname = f'{DEPLOY_SM_GRID_LABEL}{i}.shp'
        if outDirectory:
            outDir = os.path.join(outDirectory, SMALL_GRID_DF_DIRECTORY)
        else:
            outDir = SMALL_GRID_DF_DIRECTORY
        os.makedirs(outDir, exist_ok=True)
        outPath = os.path.join(outDir, fname)
        clippedSmallGridDf.to_file(outPath)
        clippedSmallGridDfPaths.append(outPath)
        clippedSmallGridDfs.append(clippedSmallGridDf)

    filteredMedGridDf['geometry'] = clippedMedGrids
    filteredMedGridDf[SMALL_GRID_DF_COL] = clippedSmallGridDfPaths

    # Handle any multipolygons that were created in the medium gridded dataframe
    if 'MultiPolygon' in filteredMedGridDf['geometry'].geom_type.unique():
        print('Handling MultiPolygons...')
        # Collect the indices of rows that contain multipolygons
        MultiPolyIndices = []
        for i, row in filteredMedGridDf.iterrows():
            if row['geometry'].geom_type == 'MultiPolygon':
                MultiPolyIndices.append(i)

        explodedDf = filteredMedGridDf.explode(index_parts=True).reset_index(drop=True)
        # For each original multipolygon row, find the exploded rows that share its small grid file path
        for index in MultiPolyIndices:
            matchingPathIndices = []
            pathToSmallGridFile = filteredMedGridDf.loc[index, SMALL_GRID_DF_COL]
            origSmallGrid = gpd.read_file(pathToSmallGridFile)
            base, ext = os.path.splitext(pathToSmallGridFile)
            for i, row in explodedDf.iterrows():
                if row[SMALL_GRID_DF_COL] == pathToSmallGridFile:
                    matchingPathIndices.append(i)
            for i in matchingPathIndices:
                # Clip the original small grid to the new geometry and keep polygons only
                reclippedSmallGridDf = gpd.clip(origSmallGrid, explodedDf.loc[i, 'geometry'])
                reclippedSmallGridDf = reclippedSmallGridDf[reclippedSmallGridDf['geometry'].geom_type == 'Polygon']
                reclippedSmallGridDf.reset_index(drop=True, inplace = True)

                savePath = f"{base}_clip{i}_{ext}"
                reclippedSmallGridDf.to_file(savePath)
                explodedDf.at[i, SMALL_GRID_DF_COL] = savePath
        filteredMedGridDf = explodedDf.copy()

    return clippedSmallGridDfs, filteredMedGridDf

def batch_download_and_merge(gdf:gpd.GeoDataFrame, i:int, outDirectory:str,
                             rasterLabel:str = MERGED_RASTER_FILE_LABEL, deleteTempFiles:bool = True, overwriteExisting:bool=False):
    """
    Downloads and merges raster files in batches for the specified grid cell.

    Args:
        gdf (gpd.GeoDataFrame): The GeoDataFrame containing grid cell geometry.
        i (int): The index of the grid cell to process.
        outDirectory (str): The output directory to save the merged raster files.
        rasterLabel (str, optional): The label for the merged raster file. Defaults to MERGED_RASTER_FILE_LABEL.
        deleteTempFiles (bool, optional): Whether to delete temporary files after merging. Defaults to True.
        overwriteExisting (bool, optional): Whether to overwrite existing raster files. Defaults to False.

    Returns:
        outPath (str): The path to the merged raster file.
        newInfo (bool): Whether new information was added to the GeoDataFrame.
    """

    # Marker for whether new information is added to the gdf. When it is, batch_download_chunks saves the file.
    newInfo = True

    # Initialize the raster column if it does not exist yet
    if RSTR_COL_PATTERN not in gdf.columns:
        gdf[RSTR_COL_PATTERN] = ''

    # If this row already holds a path or the 'No Path' string, skip the download for this row.
    # Return False for newInfo to avoid redundant saving.
    if not overwriteExisting:
        existingPath = gdf.at[i, RSTR_COL_PATTERN]
        if existingPath:
            print(f"Skipping batch download and merge for index {i} as the path already exists. {existingPath}")
            newInfo = False
            return existingPath, newInfo

    # Set a unique file path for the merged raster
    fname = f'{rasterLabel}_{i}.tif'
    outPath = os.path.join(outDirectory, MERGED_RASTER_FOLDER, fname)

    # If a merged raster already exists for this index and overwriteExisting is False,
    # return its path and skip the download.
    if os.path.exists(outPath) and not overwriteExisting:
        print(f'Existing path detected for idx {i}. Skipping batch download.')
        return outPath, newInfo

    # Guard against multipolygons (which should have been removed previously),
    # since merge_warp_dems is not compatible with them
    if gdf['geometry'][i].geom_type == 'MultiPolygon':
        print(f'Multipolygon at idx {i}. Batch download skipped to avoid errors.')
        outPath = 'No Path'
        return outPath, newInfo

    # Attempt the batch download
    try:
        # Get the EPSG code
        epsg = llb.get_spatial_ref_from_shapefile(gdf)[1]

        # Request the paths for the min/max x/y extent of the geometry in this row.
        # Up to 10 attempts are made to get past server timeouts that can occur with requests.
        maxAttempts = 10
        for attempt in range(maxAttempts):
            try:
                paths = dg.get_aws_paths_from_geodataframe(BATCH_DOWNLOAD_DATASET_NAME, gdf, rowIdx=i)
                break
            except HTTPError:
                time.sleep(5) # Retry after 5 seconds
        else:
            # Every attempt failed; hand over to the outer handler so this index can be retried later
            raise RuntimeError(f'No response after {maxAttempts} attempts')

        # Set the save directory and create it if it does not exist
        saveDir = os.path.join(outDirectory, MERGED_RASTER_FOLDER)
        os.makedirs(saveDir, exist_ok=True)

        # If paths were returned, download them and merge to the min/max x/y extent of the geometry in this row
        if paths:
            print(f'Batch Downloading for idx: {i}...')
            tempSaveDir = os.path.join(outDirectory, TEMP_RASTER_DOWNLOAD_FOLDER, str(i))
            os.makedirs(tempSaveDir, exist_ok=True)
            filelist = dg.batch_download(paths, tempSaveDir, doForceDownload = True)
            outPath = os.path.join(saveDir, fname)
            x, y = gdf['geometry'][i].exterior.coords.xy
            mergeExtent = ([min(x), max(x)], [min(y), max(y)])

            print(f'Merging DEMS for idx: {i}...')
            dg.merge_warp_dems(filelist, outPath, mergeExtent, epsg)

            # Remove the downloaded products if deleteTempFiles is True
            if deleteTempFiles and os.path.exists(tempSaveDir):
                shutil.rmtree(tempSaveDir)
        # Return the string 'No Path' if the AWS request found no products for this area
        else:
            outPath = 'No Path'

    # If an error occurs during the batch download and persists after maxAttempts, print the error
    # and return a blank outPath so this index can be reattempted by running the function again
    except Exception as e:
        print(f'An error occurred with batch download for idx:{i}: {e}')
        outPath = ''
        newInfo = False
    return outPath, newInfo

def batch_download_chunks(polygonPath:str, smallGridSize:int,
                          label:str = None, outDirectory:str=None, deleteTempFiles:bool = True, overwriteExisting:bool=False):
    """Takes an input polygon area, divides it into medium grids for batch downloads and smaller grids for labeling,
    then downloads the products in batches using the dem getter tool. Outputs a dataframe with the geometry for the
    medium grid, a path to a raster file ('No Path' if no products were available), and a path to the small gridded database for labeling.

    Args:
        polygonPath (str): The file path to the input polygon.
        smallGridSize (int): The size of the small grid.
        label (str, optional): The label for the output files. When None, the name of the input polygon file is used. Defaults to None.
        outDirectory (str, optional): The output directory to save the files. When None, the output files are placed in the same
        directory as the input polygon. Defaults to None.
        deleteTempFiles (bool, optional): Whether to delete temporary files after processing. Defaults to True.
        overwriteExisting (bool, optional): Whether to overwrite existing files if detected. Defaults to False.

    Returns:
        medGridDf (gpd.GeoDataFrame): The GeoDataFrame with medium grid geometry and paths to raster files.
    """
    # Split the polygon path up front so both defaults below can use it
    polygonDir, fname = os.path.split(polygonPath)

    # If no label was specified, take it from the name of the polygon file
    if label is None:
        label = os.path.splitext(fname)[0]

    # If no save directory was specified, use the directory of the input polygon
    if not outDirectory:
        outDirectory = polygonDir

    dfOutPath = os.path.join(outDirectory, f'{label}{DEPLOY_MED_GRID_LABEL}{LABEL_FILE_EXT}')

    # Load a preexisting file if there is one; otherwise make the gridded dfs for batch download
    if os.path.exists(dfOutPath):
        print("Loading Gridded Df...")
        medGridDf = gpd.read_file(dfOutPath)
    else:
        print("Making Gridded Df...")
        medGridDf = create_gridded_dfs_for_batch_download(polygonPath, smallGridSize, outDirectory)[1]

    medGridDf.to_file(dfOutPath, truncation=False)

    # Loop through the df and batch download/merge for each row
    length = len(medGridDf)
    for i, _ in medGridDf.iterrows():
        outPath, newInfo = batch_download_and_merge(medGridDf, i, outDirectory, deleteTempFiles = deleteTempFiles, overwriteExisting=overwriteExisting)

        # Resave the df if batch_download_and_merge produced new information
        if newInfo:
            medGridDf.at[i, RSTR_COL_PATTERN] = outPath
            if len(medGridDf) == length:
                medGridDf.to_file(dfOutPath, truncation=False)
                print(f'df with updated filepaths for index {i} saved to {dfOutPath}')

    # Remove the temporary raster download folder, which should be empty at this point if deleteTempFiles is True
    if deleteTempFiles:
        tempDir = os.path.join(outDirectory, TEMP_RASTER_DOWNLOAD_FOLDER)
        if os.path.exists(tempDir):
            shutil.rmtree(tempDir)

    # Final save
    medGridDf.to_file(dfOutPath, truncation=False)
    print(f'chunked batch download dataframe saved to: {dfOutPath}')

    return medGridDf

if __name__ == '__main__':
    # Load the run parameters from the JSON file given as the first command-line argument
    params = sys.argv[1]
    with open(params, 'r') as f:
        params_dict = json.load(f)

    start = time.time()
    batch_download_chunks(**params_dict)
    end = time.time()

    print(f'Time to batch download: {end-start}')
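The entry point above reads a JSON file whose keys are passed straight to batch_download_chunks as keyword arguments. A minimal sketch of writing such a file follows. The polygon path, label, output directory, and the script file name in the printed command are hypothetical placeholders, not values from the project.

```python
import json
import tempfile
from pathlib import Path

# Keys must match the keyword arguments of batch_download_chunks.
# All values below are illustrative placeholders.
run_params = {
    "polygonPath": "data/study_area.shp",   # hypothetical input polygon
    "smallGridSize": 500,
    "label": "study_area",                  # hypothetical label
    "outDirectory": "outputs",              # hypothetical output directory
    "deleteTempFiles": True,
    "overwriteExisting": False,
}

# Write the parameters to a temporary JSON file.
params_path = Path(tempfile.mkdtemp()) / "batch_params.json"
params_path.write_text(json.dumps(run_params, indent=2))

# The script name is assumed from the module name.
print(f"python batch_download_chunks.py {params_path}")
```

Because the dictionary is unpacked with `**`, a misspelled or extra key would raise a TypeError when the function is called.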
BATCH_DOWNLOAD_AREA = 140000000
BATCH_DOWNLOAD_DATASET_NAME = 'DEM_1m'
MERGED_RASTER_FILE_LABEL = 'deployment_area_raster'
MERGED_RASTER_FOLDER = 'merged_rasters'
TEMP_RASTER_DOWNLOAD_FOLDER = 'temp_raster_files'
RSTR_COL_PATTERN = 'rstr_paths'
SMALL_GRID_DF_COL = 'griddedDf'
SMALL_GRID_DF_DIRECTORY = 'griddedDfs'
DEPLOY_MED_GRID_LABEL = '_mediumGrid'
LABEL_FILE_EXT = '.shp'
DEPLOY_SM_GRID_LABEL = 'smallGrid'
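The constants above are read from configs/global_variables.json when the module is imported. The sketch below shows what the relevant part of that file might look like, using the values listed above. The exact layout of the real file is an assumption, and it may hold additional keys used by other modules.

```python
import json

# Assumed content of configs/global_variables.json, limited to the keys this module reads.
config_text = """
{
  "BATCH_DOWNLOAD_AREA": 140000000,
  "BATCH_DOWNLOAD_DATASET_NAME": "DEM_1m",
  "MERGED_RASTER_FILE_LABEL": "deployment_area_raster",
  "MERGED_RASTER_FOLDER": "merged_rasters",
  "TEMP_RASTER_DOWNLOAD_FOLDER": "temp_raster_files",
  "RSTR_COL_PATTERN": "rstr_paths",
  "SMALL_GRID_DF_COL": "griddedDf",
  "SMALL_GRID_DF_DIRECTORY": "griddedDfs",
  "DEPLOY_MED_GRID_LABEL": "_mediumGrid",
  "LABEL_FILE_EXT": ".shp",
  "DEPLOY_SM_GRID_LABEL": "smallGrid"
}
"""

params_dict = json.loads(config_text)

# Each module-level constant is a plain lookup into the parsed dictionary.
BATCH_DOWNLOAD_AREA = params_dict["BATCH_DOWNLOAD_AREA"]
LABEL_FILE_EXT = params_dict["LABEL_FILE_EXT"]
print(BATCH_DOWNLOAD_AREA, LABEL_FILE_EXT)
```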
def load_polygon_gdb_and_convert_multipolygons(pathToAreaPolygon: str):

Loads a GeoDataFrame from the path of a geodatabase containing an area polygon or multipolygon and converts any multipolygons to polygons.

Arguments:
  • pathToAreaPolygon (str): Path to the file containing the area polygon or multipolygon.
Returns:

gdf (gpd.GeoDataFrame): A geopandas GeoDataFrame containing polygon geometry.
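To illustrate the multipolygon conversion without a file on disk, the sketch below builds a one-row GeoDataFrame in memory and applies the same explode and reset_index calls as the function. The geometry and the `name` column are made up for the example.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, box

# One row holding a MultiPolygon made of two separate squares (illustrative data).
multi = MultiPolygon([box(0, 0, 1, 1), box(2, 2, 3, 3)])
gdf = gpd.GeoDataFrame({"name": ["area"]}, geometry=[multi])

# Same calls as in load_polygon_gdb_and_convert_multipolygons:
# each part of the MultiPolygon becomes its own row.
exploded = gdf.explode(index_parts=True).reset_index()

print(exploded.geom_type.tolist())
```

Note that reset_index() is called without drop=True, so the former index levels are kept as ordinary columns in the result.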

def divide_polygon_into_square_grid( polygonGdf: geopandas.geodataframe.GeoDataFrame, gridCellLength: int, i: int = 0):

Takes a polygon in a geodataframe and divides it into a square grid of a desired size.

Arguments:
  • polygonGdf (gpd.GeoDataFrame): A geopandas GeoDataFrame containing polygon geometry.
  • gridCellLength (int): The desired length of grid cells in meters.
  • i (int): index value of the polygon to grid. Defaults to 0.
Returns:

squareGridDf (gpd.GeoDataFrame): A geopandas GeoDataframe of the input polygon divided into a square grid. Each row contains geometry for a single square grid cell.
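The gridding logic can be sketched with the standard library alone. The helper below is a hypothetical stand-in, not part of the module: it returns each cell as a (xmin, ymin, xmax, ymax) tuple rather than a shapely Polygon, and it assumes the cell count per axis matches what np.arange produces for a positive step.

```python
import math

def grid_cell_bounds(xmin, ymin, xmax, ymax, cell_length):
    """Return (xmin, ymin, xmax, ymax) for every cell covering the bounding box."""
    # np.arange(start, stop, step) yields ceil((stop - start) / step) values.
    nx = math.ceil((xmax - xmin) / cell_length)
    ny = math.ceil((ymax - ymin) / cell_length)
    cells = []
    # x is the outer loop and y the inner loop, as in the module.
    for ix in range(nx):
        for iy in range(ny):
            x0 = xmin + ix * cell_length
            y0 = ymin + iy * cell_length
            cells.append((x0, y0, x0 + cell_length, y0 + cell_length))
    return cells

# A 250 x 100 bounding box with 100-unit cells needs three columns and one row.
cells = grid_cell_bounds(0, 0, 250, 100, 100)
print(len(cells), cells[-1])
```

The last cell extends past the bounding box when the box dimensions are not a multiple of the cell length, so the grid always covers the polygon but can overhang its upper and right edges.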

def calc_medium_grid_length(smallGridSize: int):

Calculates a size of medium grid that is compatible with the small grid size based on the globally defined batch download area.

Arguments:
  • smallGridSize (int): The width of the small square grid cells.
Returns:

mediumGridLength (int): The calculated length of the medium grid cells.
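A worked example of the calculation, using the BATCH_DOWNLOAD_AREA value listed for this module. The helper name is hypothetical, and integer arithmetic is used here in place of the NumPy floor and square root calls, which gives the same result for positive integers.

```python
import math

# Value listed for BATCH_DOWNLOAD_AREA in this module's configuration.
BATCH_DOWNLOAD_AREA = 140_000_000

def medium_grid_length(small_grid_size, batch_area=BATCH_DOWNLOAD_AREA):
    """Integer-arithmetic mirror of calc_medium_grid_length."""
    # Whole number of small cells whose combined area fits in the batch area.
    n_small_per_medium = batch_area // small_grid_size ** 2
    # Whole number of small cells along one side of a square medium cell.
    n_small_per_side = math.isqrt(n_small_per_medium)
    return n_small_per_side * small_grid_size

for size in (250, 500, 1000):
    length = medium_grid_length(size)
    # The medium cell area never exceeds the batch download area.
    print(size, length, length ** 2 <= BATCH_DOWNLOAD_AREA)
```

For a small grid size of 500, 560 small cells fit in the batch area, 23 fit along one side, and the medium grid length is 23 × 500 = 11500.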

def rm_non_intersecting_rows( targetDf: geopandas.geodataframe.GeoDataFrame, intersectsPoly: geopandas.geoseries.GeoSeries):

Removes rows from the target dataframe that do not intersect with the specified polygon.

Arguments:
  • targetDf (gpd.GeoDataFrame): The target GeoDataFrame.
  • intersectsPoly (gpd.GeoSeries): A GeoSeries containing the polygon(s) to check intersections against.
Returns:

outDf (gpd.GeoDataFrame): The filtered GeoDataFrame with non-intersecting rows removed.
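The row-by-row loop could also be written as a boolean mask. The sketch below is an alternative formulation, not the module's implementation: it assumes a single shapely polygon is passed in, and the helper name and sample geometries are made up for the example.

```python
import geopandas as gpd
from shapely.geometry import box

def keep_intersecting_rows(target_df, polygon):
    """Keep only the rows of target_df whose geometry intersects a single polygon."""
    # GeoDataFrame.intersects tests every row against the one geometry at once.
    mask = target_df.intersects(polygon)
    return target_df[mask].reset_index(drop=True)

# Illustrative data: an area of interest and three candidate grid cells.
area = box(0, 0, 150, 150)
cells = gpd.GeoDataFrame(geometry=[
    box(0, 0, 100, 100),      # inside the area
    box(100, 100, 200, 200),  # overlaps the area's corner
    box(300, 300, 400, 400),  # well outside the area
])

kept = keep_intersecting_rows(cells, area)
print(len(kept))
```

With many grid cells, the mask avoids a Python-level loop over iterrows, which is usually slower on large frames.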

def create_gridded_dfs_for_batch_download(polygonPath: str, smallGridSize: int, outDirectory: str):
def create_gridded_dfs_for_batch_download(polygonPath:str, smallGridSize:int, outDirectory:str):
    """Creates gridded dataframes for batch downloading by dividing the input polygon area into grids of specified sizes.
    The medium grid is used to batch download and the smaller grids are used to later crop the downloaded
    data into small squares and apply a CNN model to each square.
    Returns the small grids as a list and a geopandas GeoDataFrame where each row contains the path
    to the corresponding small grid file for that area.

    Args:
        polygonPath (str): The file path to the input polygon.
        smallGridSize (int): The size of the small grid.
        outDirectory (str): The output directory to save the gridded dataframes.

    Returns:
        clippedSmallGridDfs (list): A list of small gridded GeoDataFrames.
        filteredMedGridDf (gpd.GeoDataFrame): The filtered medium grid GeoDataFrame.
    """

    bboxPolygon = load_polygon_gdb_and_convert_multipolygons(polygonPath)[0:] #load extent polygon making sure multipolygons removed
    smallGriddedDf = divide_polygon_into_square_grid(bboxPolygon, smallGridSize) #create a small grid across the entire input polygon
    #Calculate the length of the medium grid cells that will 1) be an appropriate download size and
    # 2) contain a whole number of small grids (i.e. no small grids are partially in any medium grids)
    medGridLength = calc_medium_grid_length(smallGridSize)
    medGridDf = divide_polygon_into_square_grid(bboxPolygon, medGridLength) #create a medium grid across the entire input polygon based on the length returned from the previous function

    #Remove any small and medium grids that are not at least partially touching the input polygon
    filteredSmallGridDf = rm_non_intersecting_rows(smallGriddedDf, bboxPolygon)
    filteredMedGridDf = rm_non_intersecting_rows(medGridDf, bboxPolygon)

    clippedSmallGridDfPaths = []
    clippedSmallGridDfs = []
    clippedMedGrids = []

    for i, medPolygon in filteredMedGridDf.iterrows():
        #Isolate only the small grids within each medium polygon
        clippedSmallGridDf = gpd.clip(filteredSmallGridDf, medPolygon['geometry'])
        clippedSmallGridDf = clippedSmallGridDf[clippedSmallGridDf['geometry'].geom_type == 'Polygon']
        clippedSmallGridDf.reset_index(drop = True, inplace = True)

        #Rebuild each medium grid as the union of the small grids inside it so the two grids overlap exactly.
        # This also gives the medium grids an irregular shape around the edges of non-rectangular polygons.
        clippedMedGrid = clippedSmallGridDf.unary_union
        clippedMedGrids.append(clippedMedGrid)

        #Save the small grid df for this medium grid cell
        fname = f'{DEPLOY_SM_GRID_LABEL}{i}.shp'
        if outDirectory:
            outDir = os.path.join(outDirectory, SMALL_GRID_DF_DIRECTORY)
        else:
            outDir = SMALL_GRID_DF_DIRECTORY
        os.makedirs(outDir, exist_ok=True)
        outPath = os.path.join(outDir, fname)
        clippedSmallGridDf.to_file(outPath)
        clippedSmallGridDfPaths.append(outPath)
        clippedSmallGridDfs.append(clippedSmallGridDf)

    filteredMedGridDf['geometry'] = clippedMedGrids
    filteredMedGridDf[SMALL_GRID_DF_COL] = clippedSmallGridDfPaths

    #If any multipolygons were created in the medium gridded df, split them into single polygons
    if 'MultiPolygon' in filteredMedGridDf['geometry'].geom_type.unique():
        print('Handling MultiPolygons...')
        multiPolyIndices = []
        for i, row in filteredMedGridDf.iterrows():
            if row['geometry'].geom_type == 'MultiPolygon':
                multiPolyIndices.append(i) #indices of the rows that hold multipolygons

        explodedDf = filteredMedGridDf.explode(index_parts=True).reset_index(drop=True)
        #For each original multipolygon, find the exploded rows that share its small grid filepath
        for index in multiPolyIndices:
            matchingPathIndices = []
            pathToSmallGridFile = filteredMedGridDf.loc[index, SMALL_GRID_DF_COL]
            origSmallGrid = gpd.read_file(pathToSmallGridFile)
            base, ext = os.path.splitext(pathToSmallGridFile)
            for explodedIdx, row in explodedDf.iterrows():
                if row[SMALL_GRID_DF_COL] == pathToSmallGridFile:
                    matchingPathIndices.append(explodedIdx)
            for explodedIdx in matchingPathIndices:
                #Clip the original small grid to the new single-polygon geometry
                reclippedSmallGridDf = gpd.clip(origSmallGrid, explodedDf.loc[explodedIdx, 'geometry'])
                reclippedSmallGridDf = reclippedSmallGridDf[reclippedSmallGridDf['geometry'].geom_type == 'Polygon'] #remove any non-polygon geometry
                reclippedSmallGridDf.reset_index(drop=True, inplace = True)

                savePath = f"{base}_clip{explodedIdx}_{ext}"
                reclippedSmallGridDf.to_file(savePath)
                explodedDf.at[explodedIdx, SMALL_GRID_DF_COL] = savePath
        filteredMedGridDf = explodedDf.copy()

    return clippedSmallGridDfs, filteredMedGridDf

Creates gridded dataframes for batch downloading by dividing the input polygon area into grids of specified sizes. The medium grid is used for batch downloads; the small grids are used later to crop the downloaded data into small squares and apply a CNN model to each square. Returns the small grids as a list and a geopandas GeoDataFrame where each row contains the path to the corresponding small grid file for that area.

Arguments:
  • polygonPath (str): The file path to the input polygon.
  • smallGridSize (int): The size of the small grid.
  • outDirectory (str): The output directory to save the gridded dataframes.
Returns:

  • clippedSmallGridDfs (list): A list of small gridded GeoDataFrames.
  • filteredMedGridDf (gpd.GeoDataFrame): The filtered medium grid GeoDataFrame.
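
The source comments state that the medium grid length must contain a whole number of small grids, but calc_medium_grid_length itself is not shown in this section. The sketch below is one hypothetical way to choose such a length, assuming a target download size expressed in the same units as smallGridSize. The names medium_grid_length, small_cells_per_medium_cell, and targetLength are illustrative and are not part of the module.

```python
def medium_grid_length(smallGridSize: int, targetLength: int) -> int:
    """Return the largest multiple of smallGridSize that does not exceed targetLength.

    Hypothetical helper: the real calc_medium_grid_length may use a different rule.
    """
    if smallGridSize <= 0:
        raise ValueError("smallGridSize must be positive")
    # Keep at least one small cell per medium cell, even if the target is smaller
    cellsPerSide = max(1, targetLength // smallGridSize)
    return cellsPerSide * smallGridSize


def small_cells_per_medium_cell(smallGridSize: int, medGridLength: int) -> int:
    """Count the small cells that tile one square medium cell exactly."""
    if medGridLength % smallGridSize != 0:
        raise ValueError("medium length must be a whole multiple of the small grid size")
    return (medGridLength // smallGridSize) ** 2


if __name__ == "__main__":
    length = medium_grid_length(256, 5000)
    print(length, small_cells_per_medium_cell(256, length))
```

Because the medium length is a whole multiple of the small grid size, no small cell straddles two medium cells, which is what lets each medium cell be rebuilt as the union of its small cells.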

def batch_download_and_merge( gdf: geopandas.geodataframe.GeoDataFrame, i: int, outDirectory: str, rasterLabel: str = 'deployment_area_raster', deleteTempFiles: bool = True, overwriteExisting: bool = False):
def batch_download_and_merge(gdf:gpd.GeoDataFrame, i:int, outDirectory:str, 
                             rasterLabel:str = MERGED_RASTER_FILE_LABEL, deleteTempFiles:bool = True, overwriteExisting:bool=False):
    """
    Downloads and merges raster files in batches for the specified grid cell.

    Args:
        gdf (gpd.GeoDataFrame): The GeoDataFrame containing grid cell geometry.
        i (int): The index of the grid cell to process.
        outDirectory (str): The output directory to save the merged raster files.
        rasterLabel (str, optional): The label for the merged raster file. Defaults to MERGED_RASTER_FILE_LABEL.
        deleteTempFiles (bool, optional): Whether to delete temporary files after merging. Defaults to True.
        overwriteExisting (bool, optional): Whether to overwrite existing raster files. Defaults to False.

    Returns:
        outPath (str): The path to the merged raster file, 'No Path' if no products were found for this cell,
        or an empty string if the download failed and should be retried.
        newInfo (bool): Whether new information was added to the GeoDataFrame.
    """

    #Marker for whether new information is added to the gdf. When it is, the batch_download_chunks function saves the file.
    newInfo = True

    #Initialize the raster column if it does not exist yet
    if RSTR_COL_PATTERN not in gdf.columns:
        gdf[RSTR_COL_PATTERN] = ''

    #If there is already a path or 'No Path' string in this row, skip the download for this row.
    # Returns False for newInfo to avoid redundant saving.
    if not overwriteExisting:
        existingPath = gdf.at[i, RSTR_COL_PATTERN]
        if existingPath:
            print(f"Skipping batch download and merge for index {i} as the path already exists. {existingPath}")
            newInfo = False
            return existingPath, newInfo

    #Set unique file path for the merged raster
    fname = f'{rasterLabel}_{i}.tif'
    saveDir = os.path.join(outDirectory, MERGED_RASTER_FOLDER)
    outPath = os.path.join(saveDir, fname)

    #If a merged raster already exists for this index and overwriteExisting is False,
    # return this path as the outPath for this index and skip the download.
    if os.path.exists(outPath) and not overwriteExisting:
        print(f'Existing path detected for idx {i}. Skipping batch download.')
        return outPath, newInfo

    #A check for multipolygons (which should have been removed previously) to avoid an unwanted error,
    # since merge_warp_dems is not compatible with multipolygons
    if gdf['geometry'][i].geom_type == 'MultiPolygon':
        print(f'Multipolygon at idx {i}. Batch download skipped to avoid errors.')
        outPath = 'No Path'
        return outPath, newInfo

    #Attempt to do a batch download
    try:
        #Get epsg code
        epsg = llb.get_spatial_ref_from_shapefile(gdf)[1]

        #Send request to get paths for the min max x y extents of the geometry in this row.
        #Allow up to 10 attempts, which helps bypass server timeouts that can happen with requests.
        maxAttempts = 10
        for attempt in range(maxAttempts):
            try:
                paths = dg.get_aws_paths_from_geodataframe(BATCH_DOWNLOAD_DATASET_NAME, gdf, rowIdx=i)
                break
            except HTTPError:
                if attempt == maxAttempts - 1:
                    raise #out of attempts, let the outer handler report the error
                time.sleep(5) #retry after 5 seconds

        #Create the save directory if it doesn't exist
        os.makedirs(saveDir, exist_ok=True)

        #If paths were returned, download them and merge them to the min max x y extent of the geometry in this row
        if paths:
            print(f'Batch Downloading for idx: {i}...')
            tempSaveDir = os.path.join(outDirectory, TEMP_RASTER_DOWNLOAD_FOLDER, str(i))
            os.makedirs(tempSaveDir, exist_ok=True)
            filelist = dg.batch_download(paths, tempSaveDir, doForceDownload = True)
            x, y = gdf['geometry'][i].exterior.coords.xy
            mergeExtent = ([min(x), max(x)], [min(y), max(y)])

            print(f'Merging DEMS for idx: {i}...')
            dg.merge_warp_dems(filelist, outPath, mergeExtent, epsg)

            #Remove the downloaded products if deleteTempFiles is True
            if deleteTempFiles:
                if os.path.exists(tempSaveDir):
                    shutil.rmtree(tempSaveDir)
        #Return the string 'No Path' if the aws request found no products for this area
        else:
            outPath = 'No Path'

    #If an error occurred during the batch download and persists after maxAttempts, print the error
    # and return a blank outPath so this idx download can be reattempted by running this function again
    except Exception as e:
        print(f'An error occurred with batch download for idx:{i}: {e}')
        outPath = '' #Leave the outPath blank so that the next run retries this request.
        newInfo = False
    return outPath, newInfo

Downloads and merges raster files in batches for the specified grid cell.

Arguments:
  • gdf (gpd.GeoDataFrame): The GeoDataFrame containing grid cell geometry.
  • i (int): The index of the grid cell to process.
  • outDirectory (str): The output directory to save the merged raster files.
  • rasterLabel (str, optional): The label for the merged raster file. Defaults to MERGED_RASTER_FILE_LABEL.
  • deleteTempFiles (bool, optional): Whether to delete temporary files after merging. Defaults to True.
  • overwriteExisting (bool, optional): Whether to overwrite existing raster files. Defaults to False.
Returns:

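  • outPath (str): The path to the merged raster file, 'No Path' if no products were found for this cell, or an empty string if the download failed and should be retried.
  • newInfo (bool): Whether new information was added to the GeoDataFrame.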
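
The path request in this function is retried a fixed number of times with a fixed pause between attempts, and the error propagates once the attempts are used up. The sketch below isolates that pattern using only the standard library. The helper call_with_retries and its parameters are hypothetical names for illustration; the script itself retries on requests' HTTPError, while this sketch assumes the built-in ConnectionError so that it stays self-contained.

```python
import time


def call_with_retries(fetch, maxAttempts=10, delaySeconds=5.0,
                      retryOn=(ConnectionError,), sleep=time.sleep):
    """Call fetch() until it succeeds or maxAttempts is reached.

    Hypothetical helper. The last failure is re-raised so the caller can decide
    how to record it (the script leaves the raster path blank for a later rerun).
    """
    for attempt in range(1, maxAttempts + 1):
        try:
            return fetch()
        except retryOn:
            if attempt == maxAttempts:
                raise  # out of attempts: surface the error to the caller
            sleep(delaySeconds)  # fixed pause before the next attempt


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_request():
        # Simulated request that times out twice before succeeding
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("simulated timeout")
        return ["tile_a.tif", "tile_b.tif"]

    print(call_with_retries(flaky_request, delaySeconds=0.0))
```

Passing the sleep function as a parameter is a design choice for testability: a caller can substitute a no-op and exercise the retry logic without waiting.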

def batch_download_chunks( polygonPath: str, smallGridSize: int, label: str = None, outDirectory: str = None, deleteTempFiles: bool = True, overwriteExisting: bool = False):
def batch_download_chunks(polygonPath:str, smallGridSize:int, 
                          label:str = None, outDirectory:str=None, deleteTempFiles:bool = True, overwriteExisting:bool=False):
    """Takes an input polygon area, divides it into medium grids for batch downloads and smaller grids for labeling,
    then downloads the products in batches using the dem getter tool. Outputs a dataframe with the geometry for the
    medium grid, a path to a raster file (or 'No Path' if no products were available), and a path to the small gridded
    database for labeling.

    Args:
        polygonPath (str): The file path to the input polygon.
        smallGridSize (int): The size of the small grid.
        label (str, optional): The label for the output files. When None, the name of the input polygon file is used.
        Defaults to None.
        outDirectory (str, optional): The output directory to save the files. When None, the output files are placed
        in the same directory as the input polygon. Defaults to None.
        deleteTempFiles (bool, optional): Whether to delete temporary files after processing. Defaults to True.
        overwriteExisting (bool, optional): Whether to overwrite existing files if detected. Defaults to False.

    Returns:
        medGridDf (gpd.GeoDataFrame): The GeoDataFrame with medium grid geometry and paths to raster files.
    """
    #Split the polygon path up front so the directory is available even when a label is supplied
    polygonDir, fname = os.path.split(polygonPath)

    #If no label was specified get the label from the name of the polygon
    if label is None:
        label = os.path.splitext(fname)[0]

    #If no save directory specified, place in the same directory as the input polygon
    if not outDirectory:
        outDirectory = polygonDir

    dfOutPath = os.path.join(outDirectory, f'{label}{DEPLOY_MED_GRID_LABEL}{LABEL_FILE_EXT}')

    #Look for a preexisting file and load it if it exists, otherwise make the gridded dfs for batch download.
    if os.path.exists(dfOutPath):
        print("Loading Gridded Df...")
        medGridDf = gpd.read_file(dfOutPath)
    else:
        print("Making Gridded Df...")
        medGridDf = create_gridded_dfs_for_batch_download(polygonPath, smallGridSize, outDirectory)[1]

    medGridDf.to_file(dfOutPath, truncation=False)

    #Loop through the df and batch download/merge for each row.
    # Multipolygons (which should have been removed already) are checked inside batch_download_and_merge.
    length = len(medGridDf)
    for i, _ in medGridDf.iterrows():
        outPath, newInfo = batch_download_and_merge(medGridDf, i, outDirectory, deleteTempFiles = deleteTempFiles, overwriteExisting=overwriteExisting)

        #Resave the df with updated info if new information was gained from the batch_download_and_merge fn.
        if newInfo:
            medGridDf.at[i,RSTR_COL_PATTERN] = outPath
            if len(medGridDf) == length:
                medGridDf.to_file(dfOutPath, truncation=False)
                print(f'df with updated filepaths for index {i} saved to {dfOutPath}')

    #Remove the temporary download folder, which should be empty at this point if deleteTempFiles is True
    if deleteTempFiles:
        tempDir = os.path.join(outDirectory, TEMP_RASTER_DOWNLOAD_FOLDER)
        if os.path.exists(tempDir):
            shutil.rmtree(tempDir)

    #Final save
    medGridDf.to_file(dfOutPath, truncation=False)
    print(f'chunked batch download dataframe saved to: {dfOutPath}')

    return medGridDf

Takes an input polygon area, divides it into medium grids for batch downloads and smaller grids for labeling, then downloads the products in batches using the dem getter tool. Outputs a dataframe with the geometry for the medium grid, a path to a raster file (or 'No Path' if no products were available), and a path to the small gridded database for labeling.

Arguments:
  • polygonPath (str): The file path to the input polygon.
  • smallGridSize (int): The size of the small grid.
  • label (str, optional): The label for the output files. Defaults to None.
  • outDirectory (str, optional): The output directory to save the files. When None, the output files are placed in the same directory as the input polygon. Defaults to None.
  • deleteTempFiles (bool, optional): Whether to delete temporary files after processing. Defaults to True.
  • overwriteExisting (bool, optional): Whether to overwrite existing files if detected. Defaults to False.
Returns:

medGridDf (gpd.GeoDataFrame): The GeoDataFrame with medium grid geometry and paths to raster files.
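
Because the dataframe is saved after each row, an interrupted run can be resumed: rows that already hold a raster path or the 'No Path' marker are skipped, and rows left blank after an error are retried. The sketch below models that convention on a plain list of strings rather than a GeoDataFrame. The function names and status labels are hypothetical and only illustrate the three states the raster column can take.

```python
NO_PATH = 'No Path'  # marker the script stores when no products cover a cell


def classify_row(rasterPath):
    """Map a raster column value to an illustrative status label."""
    if not rasterPath:
        return 'pending'      # blank: never attempted, or a failed attempt to retry
    if rasterPath == NO_PATH:
        return 'no_data'      # request succeeded but found no products
    return 'downloaded'       # path to a merged raster


def rows_to_process(rasterPaths, overwriteExisting=False):
    """Return the positions that a rerun would download again."""
    return [
        idx for idx, path in enumerate(rasterPaths)
        if overwriteExisting or classify_row(path) == 'pending'
    ]


if __name__ == "__main__":
    column = ['merged/area_0.tif', '', NO_PATH, '']
    print(rows_to_process(column))                          # blank rows only
    print(rows_to_process(column, overwriteExisting=True))  # every row
```

Keeping 'No Path' distinct from a blank value is what prevents cells with no available products from being requested again on every rerun.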