lidar_labeler.batch_download_chunks
This script performs batch processing of geospatial data to facilitate the download and merging of raster files.
It uses the dem_getter module to download DEM data in batches based on specified grid sizes and polygons. The script
includes functions to divide polygons into grids, filter non-intersecting rows, and manage raster file downloads.
Additionally, it handles the merging of raster files and ensures the proper management of temporary files.
The processed data is saved to specified directories for further use.
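When run as a script, the module's `__main__` block reads a JSON file named on the command line and passes its contents to `batch_download_chunks` as keyword arguments. Below is a minimal sketch of writing such a file. Only the key names come from the `batch_download_chunks` signature. The paths, label, grid size and the file name `batch_params.json` are hypothetical placeholders.

```python
import json
from pathlib import Path

# Keys must match the keyword arguments of batch_download_chunks.
# All values below are hypothetical placeholders.
params = {
    "polygonPath": "data/area_of_interest.shp",  # hypothetical input polygon
    "smallGridSize": 256,  # hypothetical small-cell edge length, in CRS units
    "label": "area_of_interest",
    "outDirectory": "outputs",
    "deleteTempFiles": True,
    "overwriteExisting": False,
}

paramsPath = Path("batch_params.json")  # hypothetical file name
paramsPath.write_text(json.dumps(params, indent=2))
```

Assuming the module lives at `lidar_labeler/batch_download_chunks.py`, it could then be run as `python lidar_labeler/batch_download_chunks.py batch_params.json`. If `label` and `outDirectory` are both left out, they default to the polygon's file name and directory.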
"""
This script performs batch processing of geospatial data to facilitate the download and merging of raster files.
It uses the `dem_getter` module to download DEM data in batches based on specified grid sizes and polygons. The script
includes functions to divide polygons into grids, filter non-intersecting rows, and manage raster file downloads.
Additionally, it handles the merging of raster files and ensures the proper management of temporary files.
The processed data is saved to specified directories for further use.
"""

try:
    import dem_getter as dg
except ImportError:
    print("Failed to import the 'dem_getter' module. Please ensure that you have set up your environment correctly.")
    print("For setup instructions, refer to the 'Setting Up the Dem Getter' section of the README.")
    raise

import sys
import os
import json
import shutil
import time
from pathlib import Path

import geopandas as gpd
import numpy as np
from requests.exceptions import HTTPError
from shapely.geometry import Polygon

# Make the sibling lidar_label_builder package importable.
# sys.path entries must be strings; the import system skips Path objects.
sys.path.append(str(Path(__file__).resolve().parent.parent / 'lidar_label_builder'))
import lidar_label_builder as llb

# Load the shared global variables from configs/global_variables.json
with (Path(__file__).resolve().parent.parent / 'configs' / 'global_variables.json').open('r') as f:
    params_dict = json.load(f)

BATCH_DOWNLOAD_AREA = params_dict['BATCH_DOWNLOAD_AREA']
BATCH_DOWNLOAD_DATASET_NAME = params_dict['BATCH_DOWNLOAD_DATASET_NAME']
MERGED_RASTER_FILE_LABEL = params_dict['MERGED_RASTER_FILE_LABEL']
MERGED_RASTER_FOLDER = params_dict['MERGED_RASTER_FOLDER']
TEMP_RASTER_DOWNLOAD_FOLDER = params_dict['TEMP_RASTER_DOWNLOAD_FOLDER']
RSTR_COL_PATTERN = params_dict['RSTR_COL_PATTERN']
SMALL_GRID_DF_COL = params_dict['SMALL_GRID_DF_COL']
SMALL_GRID_DF_DIRECTORY = params_dict['SMALL_GRID_DF_DIRECTORY']
DEPLOY_MED_GRID_LABEL = params_dict['DEPLOY_MED_GRID_LABEL']
LABEL_FILE_EXT = params_dict['LABEL_FILE_EXT']
DEPLOY_SM_GRID_LABEL = params_dict['DEPLOY_SM_GRID_LABEL']


def load_polygon_gdb_and_convert_multipolygons(pathToAreaPolygon:str):
    """Loads a GeoDataFrame from the file path of a geodatabase containing an area polygon or multipolygon,
    and explodes any multipolygons into individual polygons.

    Args:
        pathToAreaPolygon (str): The file path to the geodatabase or vector file containing the area polygon.

    Returns:
        gdf (gpd.GeoDataFrame): A geopandas GeoDataFrame containing polygon geometry.
    """
    # Load the GeoDataFrame from the specified file path
    gdf = gpd.read_file(pathToAreaPolygon)

    # If the GeoDataFrame contains multipolygons, explode them into individual polygons
    return gdf.explode(index_parts=True).reset_index() if 'MultiPolygon' in gdf.geom_type.unique() else gdf

def divide_polygon_into_square_grid(polygonGdf:gpd.GeoDataFrame, gridCellLength:int, i:int = 0):
    """Divides the bounding box of a polygon in a GeoDataFrame into a square grid of the desired cell size.

    Args:
        polygonGdf (gpd.GeoDataFrame): A geopandas GeoDataFrame containing polygon geometry.
        gridCellLength (int): The desired length of the grid cells, in the units of the CRS (e.g. meters).
        i (int): Index of the polygon to grid. Defaults to 0.

    Returns:
        squareGridDf (gpd.GeoDataFrame): A GeoDataFrame covering the polygon's bounding box with square grid cells.
            Each row contains the geometry of a single grid cell.
    """
    # Extract the coordinates of the polygon's exterior
    x, y = polygonGdf['geometry'][i].exterior.coords.xy

    # Determine the bounding box of the polygon
    xmin, xmax, ymin, ymax = min(x), max(x), min(y), max(y)

    # Bottom-left x and y coordinates of the grid cells
    xcoords = np.arange(xmin, xmax, gridCellLength)
    ycoords = np.arange(ymin, ymax, gridCellLength)

    gridCells = []  # List to store the grid cells

    # Construct each grid cell as a square polygon and add it to the list
    for x0 in xcoords:
        for y0 in ycoords:
            gridCell = Polygon([(x0, y0), (x0, y0 + gridCellLength),
                                (x0 + gridCellLength, y0 + gridCellLength),
                                (x0 + gridCellLength, y0),
                                (x0, y0)])
            gridCells.append(gridCell)

    # Create a GeoDataFrame of the grid cells using the CRS of the input GeoDataFrame
    return gpd.GeoDataFrame(geometry = gridCells, crs = polygonGdf.crs)

def calc_medium_grid_length(smallGridSize:int):
    """Calculates a medium grid cell length that is a whole multiple of the small grid size and whose area
    does not exceed the globally defined BATCH_DOWNLOAD_AREA.

    Args:
        smallGridSize (int): The width of the small square grid cells.

    Returns:
        mediumGridLength (int): The calculated length of the medium grid cells.
    """
    # Number of small grid cells that fit into the batch download area
    nSmallGridsPerMedGrid = int(np.floor(BATCH_DOWNLOAD_AREA/(smallGridSize**2)))

    # Number of small grid cells that fit along one side of a square medium grid cell
    nSmallGridsLength = int(np.floor(np.sqrt(nSmallGridsPerMedGrid)))
    if nSmallGridsLength == 0:
        raise ValueError(f'smallGridSize ({smallGridSize}) is too large for BATCH_DOWNLOAD_AREA ({BATCH_DOWNLOAD_AREA}).')

    # Medium cell length = small cells per side * small cell length
    return nSmallGridsLength*smallGridSize

def rm_non_intersecting_rows(targetDf:gpd.GeoDataFrame, intersectsPoly:gpd.GeoDataFrame):
    """Removes rows from the target dataframe that do not intersect with the specified polygon.

    Args:
        targetDf (gpd.GeoDataFrame): The target GeoDataFrame.
        intersectsPoly (gpd.GeoDataFrame): A GeoDataFrame whose first geometry is the polygon to check
            intersections against; any further geometries are ignored.

    Returns:
        outDf (gpd.GeoDataFrame): The filtered GeoDataFrame with non-intersecting rows removed.
    """
    # Only the first geometry is used as the area polygon
    polygon = intersectsPoly.geometry.iloc[0]

    rows_to_drop = []  # Indices of rows that do not intersect the polygon

    for i, row in targetDf.iterrows():
        if not row['geometry'].intersects(polygon):
            rows_to_drop.append(i)

    # Drop the non-intersecting rows and reset the index of the result
    outDf = targetDf.drop(index=rows_to_drop).reset_index(drop=True)

    return outDf

def create_gridded_dfs_for_batch_download(polygonPath:str, smallGridSize:int, outDirectory:str):
    """Creates gridded dataframes for batch downloading by dividing the input polygon area into grids of two sizes.
    The medium grid is used for batch downloads, and the small grid is used later to crop the downloaded
    data into small squares and apply a CNN model to each square.
    Returns the small grids as a list, and a geopandas GeoDataFrame of medium grid cells in which each row
    contains the path to the small grid file for that cell.

    Args:
        polygonPath (str): The file path to the input polygon.
        smallGridSize (int): The edge length of the small grid cells.
        outDirectory (str): The output directory for the small grid files.

    Returns:
        clippedSmallGridDfs (list): A list of small gridded GeoDataFrames.
        filteredMedGridDf (gpd.GeoDataFrame): The filtered medium grid GeoDataFrame.
    """

    # Load the extent polygon, making sure multipolygons are exploded
    bboxPolygon = load_polygon_gdb_and_convert_multipolygons(polygonPath)
    # Create a small grid across the entire input polygon
    smallGriddedDf = divide_polygon_into_square_grid(bboxPolygon, smallGridSize)
    # Calculate a medium grid cell length that 1) is an appropriate download size and 2) contains a whole
    # number of small grid cells (i.e. no small grid cell lies partially in any medium grid cell)
    medGridLength = calc_medium_grid_length(smallGridSize)
    # Create a medium grid across the entire input polygon using that length
    medGridDf = divide_polygon_into_square_grid(bboxPolygon, medGridLength)

    # Remove any small and medium grid cells that do not at least partially touch the input polygon
    filteredSmallGridDf = rm_non_intersecting_rows(smallGriddedDf, bboxPolygon)
    filteredMedGridDf = rm_non_intersecting_rows(medGridDf, bboxPolygon)

    clippedSmallGridDfPaths = []
    clippedSmallGridDfs = []
    clippedMedGrids = []

    # Output directory for the small grid files
    outDir = os.path.join(outDirectory, SMALL_GRID_DF_DIRECTORY) if outDirectory else SMALL_GRID_DF_DIRECTORY
    os.makedirs(outDir, exist_ok=True)

    for i, medPolygon in filteredMedGridDf.iterrows():
        # Keep only the small grid cells within this medium polygon
        clippedSmallGridDf = gpd.clip(filteredSmallGridDf, medPolygon['geometry'])
        clippedSmallGridDf = clippedSmallGridDf[clippedSmallGridDf['geometry'].geom_type == 'Polygon'].reset_index(drop=True)

        # Rebuild each medium cell as the union of the small cells inside it so the two grids overlap exactly.
        # This also gives medium cells an irregular shape along the edges of non-rectangular polygons.
        clippedMedGrids.append(clippedSmallGridDf.unary_union)

        # Save the small grid df for this medium cell
        outPath = os.path.join(outDir, f'{DEPLOY_SM_GRID_LABEL}{i}.shp')
        clippedSmallGridDf.to_file(outPath)
        clippedSmallGridDfPaths.append(outPath)
        clippedSmallGridDfs.append(clippedSmallGridDf)

    filteredMedGridDf['geometry'] = clippedMedGrids
    filteredMedGridDf[SMALL_GRID_DF_COL] = clippedSmallGridDfPaths

    # If any multipolygons were created in the medium grid df, handle them
    if 'MultiPolygon' in filteredMedGridDf['geometry'].geom_type.unique():
        print('Handling MultiPolygons...')
        # Indices of rows whose geometry is a MultiPolygon
        multiPolyIndices = [i for i, row in filteredMedGridDf.iterrows() if row['geometry'].geom_type == 'MultiPolygon']

        explodedDf = filteredMedGridDf.explode(index_parts=True).reset_index(drop=True)
        # For each original multipolygon row, find the exploded rows that share its small grid file path
        for index in multiPolyIndices:
            pathToSmallGridFile = filteredMedGridDf.loc[index, SMALL_GRID_DF_COL]
            origSmallGrid = gpd.read_file(pathToSmallGridFile)
            base, ext = os.path.splitext(pathToSmallGridFile)
            matchingPathIndices = [j for j, row in explodedDf.iterrows() if row[SMALL_GRID_DF_COL] == pathToSmallGridFile]
            for j in matchingPathIndices:
                # Clip the original small grid to the new single-polygon geometry, dropping non-polygon geometry
                reclippedSmallGridDf = gpd.clip(origSmallGrid, explodedDf.loc[j, 'geometry'])
                reclippedSmallGridDf = reclippedSmallGridDf[reclippedSmallGridDf['geometry'].geom_type == 'Polygon'].reset_index(drop=True)

                savePath = f"{base}_clip{j}_{ext}"
                reclippedSmallGridDf.to_file(savePath)
                explodedDf.at[j, SMALL_GRID_DF_COL] = savePath
        filteredMedGridDf = explodedDf.copy()

    return clippedSmallGridDfs, filteredMedGridDf

def batch_download_and_merge(gdf:gpd.GeoDataFrame, i:int, outDirectory:str,
                             rasterLabel:str = MERGED_RASTER_FILE_LABEL, deleteTempFiles:bool = True, overwriteExisting:bool=False):
    """
    Downloads and merges raster files in batches for the specified grid cell.

    Args:
        gdf (gpd.GeoDataFrame): The GeoDataFrame containing grid cell geometry.
        i (int): The index of the grid cell to process.
        outDirectory (str): The output directory to save the merged raster files.
        rasterLabel (str, optional): The label for the merged raster file. Defaults to MERGED_RASTER_FILE_LABEL.
        deleteTempFiles (bool, optional): Whether to delete temporary files after merging. Defaults to True.
        overwriteExisting (bool, optional): Whether to overwrite existing raster files. Defaults to False.

    Returns:
        outPath (str): The path to the merged raster file, 'No Path' if no products were found,
            or '' if the download failed and should be retried.
        newInfo (bool): Whether new information was added to the GeoDataFrame.
    """

    # Marker for whether new information is added to the gdf. When True, batch_download_chunks saves the file.
    newInfo = True

    # Initialize the raster column if it does not exist yet
    if RSTR_COL_PATTERN not in gdf.columns:
        gdf[RSTR_COL_PATTERN] = ''

    # If this row already holds a path or the 'No Path' string, skip the download.
    # Returns False for newInfo to avoid redundant saving.
    if not overwriteExisting:
        existingPath = gdf.at[i, RSTR_COL_PATTERN]
        if existingPath:
            print(f"Skipping batch download and merge for index {i} as the path already exists: {existingPath}")
            newInfo = False
            return existingPath, newInfo

    # Set a unique file path for the merged raster
    saveDir = os.path.join(outDirectory, MERGED_RASTER_FOLDER)
    outPath = os.path.join(saveDir, f'{rasterLabel}_{i}.tif')

    # If a merged raster already exists for this index and overwriteExisting is False,
    # return its path as the outPath for this index and skip the download.
    if os.path.exists(outPath) and not overwriteExisting:
        print(f'Existing path detected for idx {i}. Skipping batch download.')
        return outPath, newInfo

    # Check for multipolygons (which should have been removed previously) to avoid an error,
    # since merge_warp_dems is not compatible with multipolygons
    if gdf['geometry'][i].geom_type == 'MultiPolygon':
        print(f'Multipolygon at idx {i}. Batch download skipped to avoid errors.')
        outPath = 'No Path'
        return outPath, newInfo

    # Attempt a batch download
    try:
        # Get the EPSG code
        epsg = llb.get_spatial_ref_from_shapefile(gdf)[1]

        # Request the paths covering the min/max x/y extents of the geometry in this row.
        # Retry up to maxAttempts times to get past occasional server timeouts.
        maxAttempts = 10
        paths = None
        for attempt in range(maxAttempts):
            try:
                paths = dg.get_aws_paths_from_geodataframe(BATCH_DOWNLOAD_DATASET_NAME, gdf, rowIdx=i)
                break  # Request succeeded (paths may be empty if no products cover this area)
            except HTTPError:
                if attempt == maxAttempts - 1:
                    raise  # Out of attempts; handled by the outer except so this idx can be retried later
                time.sleep(5)  # Wait 5 seconds before retrying

        # Create the save directory if it doesn't exist
        os.makedirs(saveDir, exist_ok=True)

        # If paths were returned, download them and merge them to the min/max x/y extent of this row's geometry
        if paths:
            print(f'Batch Downloading for idx: {i}...')
            tempSaveDir = os.path.join(outDirectory, TEMP_RASTER_DOWNLOAD_FOLDER, str(i))
            os.makedirs(tempSaveDir, exist_ok=True)
            filelist = dg.batch_download(paths, tempSaveDir, doForceDownload = True)
            x, y = gdf['geometry'][i].exterior.coords.xy
            mergeExtent = ([min(x), max(x)], [min(y), max(y)])

            print(f'Merging DEMs for idx: {i}...')
            dg.merge_warp_dems(filelist, outPath, mergeExtent, epsg)

            # Remove the downloaded products if deleteTempFiles is True
            if deleteTempFiles and os.path.exists(tempSaveDir):
                shutil.rmtree(tempSaveDir)
        # Return the string 'No Path' if the AWS request found no products for this area
        else:
            outPath = 'No Path'

    # If an error persists after maxAttempts (or any other error occurs), print it and return a blank
    # outPath so the download for this idx is reattempted the next time this function runs
    except Exception as e:
        print(f'An error occurred with batch download for idx:{i}: {e}')
        outPath = ''
        newInfo = False
    return outPath, newInfo

def batch_download_chunks(polygonPath:str, smallGridSize:int,
                          label:str = None, outDirectory:str=None, deleteTempFiles:bool = True, overwriteExisting:bool=False):
    """Takes an input polygon area, divides it into medium grids for batch downloads and smaller grids for labeling,
    then downloads the products in batches using the dem_getter tool. Outputs a dataframe with the medium grid geometry,
    the path to a raster file for each cell ('No Path' if no products were available), and the path to the small
    gridded dataframe used for labeling.

    Args:
        polygonPath (str): The file path to the input polygon.
        smallGridSize (int): The size of the small grid.
        label (str, optional): The label for the output files. Defaults to None.
        outDirectory (str, optional): The output directory to save the files. When None, places the output files
            in the same directory as the input polygon. Defaults to None.
        deleteTempFiles (bool, optional): Whether to delete temporary files after processing. Defaults to True.
        overwriteExisting (bool, optional): Whether to overwrite existing files if detected. Defaults to False.

    Returns:
        medGridDf (gpd.GeoDataFrame): The GeoDataFrame with medium grid geometry and paths to raster files.
    """
    polygonDir, fname = os.path.split(polygonPath)

    # If no label was specified, derive it from the polygon file name
    if label is None:
        label = os.path.splitext(fname)[0]

    # If no save directory was specified, use the directory of the input polygon
    if not outDirectory:
        outDirectory = polygonDir

    dfOutPath = os.path.join(outDirectory, f'{label}{DEPLOY_MED_GRID_LABEL}{LABEL_FILE_EXT}')

    # Load a preexisting gridded df if one exists; otherwise create the gridded dfs for batch download
    if os.path.exists(dfOutPath):
        print("Loading Gridded Df...")
        medGridDf = gpd.read_file(dfOutPath)
    else:
        print("Making Gridded Df...")
        medGridDf = create_gridded_dfs_for_batch_download(polygonPath, smallGridSize, outDirectory)[1]

    medGridDf.to_file(dfOutPath, truncation=False)

    # Loop through the df and batch download/merge each row.
    # Multipolygons (which should already have been removed) are handled inside batch_download_and_merge.
    for i, _ in medGridDf.iterrows():
        outPath, newInfo = batch_download_and_merge(medGridDf, i, outDirectory, deleteTempFiles = deleteTempFiles, overwriteExisting=overwriteExisting)

        # Resave the df if batch_download_and_merge returned new information
        if newInfo:
            medGridDf.at[i, RSTR_COL_PATTERN] = outPath
            medGridDf.to_file(dfOutPath, truncation=False)
            print(f'df with updated filepaths for index {i} saved to {dfOutPath}')

    # Remove the temporary download folder, which should be empty at this point if deleteTempFiles is True
    if deleteTempFiles:
        tempDir = os.path.join(outDirectory, TEMP_RASTER_DOWNLOAD_FOLDER)
        if os.path.exists(tempDir):
            shutil.rmtree(tempDir)

    # Final save
    medGridDf.to_file(dfOutPath, truncation=False)
    print(f'chunked batch download dataframe saved to: {dfOutPath}')

    return medGridDf

if __name__ == '__main__':
    # Load the keyword arguments for batch_download_chunks from the JSON file given on the command line
    with open(sys.argv[1], 'r') as f:
        run_params = json.load(f)

    start = time.time()
    batch_download_chunks(**run_params)
    end = time.time()

    print(f'Time to batch download: {end-start:.1f} s')
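The download step retries `dg.get_aws_paths_from_geodataframe` up to ten times and sleeps five seconds after each `HTTPError`. That pattern could be factored into a small reusable helper. The sketch below is a generic standard-library version. The name `retry_call` is hypothetical and is not part of `dem_getter`. The helper returns on the first success and re-raises after the last failed attempt, which lets the caller's outer `except` mark the index for a later retry.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_call(
    fn: Callable[[], T],
    max_attempts: int = 10,
    delay_s: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call fn until it succeeds, sleeping delay_s seconds between failed attempts.

    Only exceptions listed in retry_on trigger a retry; the last one is re-raised
    once max_attempts calls have failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what to do
            time.sleep(delay_s)


# Demo with a stand-in function that fails twice before succeeding
calls = {"n": 0}

def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated timeout")
    return ["tile_a.tif", "tile_b.tif"]

print(retry_call(flaky_request, max_attempts=5, delay_s=0, retry_on=(ConnectionError,)))
```

In this script the helper could wrap the request as `paths = retry_call(lambda: dg.get_aws_paths_from_geodataframe(BATCH_DOWNLOAD_DATASET_NAME, gdf, rowIdx=i), max_attempts=10, delay_s=5, retry_on=(HTTPError,))`.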
def load_polygon_gdb_and_convert_multipolygons(pathToAreaPolygon:str):
Loads a GeoDataFrame from the file path of a geodatabase containing an area polygon or multipolygon, and explodes any multipolygons into individual polygons.
Arguments:
- pathToAreaPolygon (str): The file path to the geodatabase or vector file containing the area polygon.
Returns:
gdf (gpd.GeoDataFrame): A geopandas GeoDataFrame containing polygon geometry.
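The multipolygon conversion is easiest to see on toy data. The sketch below stores two hypothetical squares as a single `MultiPolygon` row and applies the same `explode(index_parts=True).reset_index()` expression as the function. `reset_index()` is called without `drop=True`, so the exploded frame gains a `level_0` column (the original row) and a `level_1` column (the part number) alongside the original attributes.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, box

# Toy input: one row holding two disjoint squares as a single MultiPolygon
gdf = gpd.GeoDataFrame(
    {"name": ["aoi"]},
    geometry=[MultiPolygon([box(0, 0, 10, 10), box(20, 0, 30, 10)])],
)

# Same expression as load_polygon_gdb_and_convert_multipolygons
exploded = (
    gdf.explode(index_parts=True).reset_index()
    if "MultiPolygon" in gdf.geom_type.unique()
    else gdf
)

print(exploded.geom_type.tolist())  # ['Polygon', 'Polygon']
print(exploded[["level_0", "level_1", "name"]])
```

Downstream, `divide_polygon_into_square_grid` grids only the polygon at index `i` (0 by default). Each part after the first therefore needs its own call.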
def divide_polygon_into_square_grid(polygonGdf:gpd.GeoDataFrame, gridCellLength:int, i:int = 0):
Divides the bounding box of a polygon in a GeoDataFrame into a square grid of the desired cell size.
Arguments:
- polygonGdf (gpd.GeoDataFrame): A geopandas GeoDataFrame containing polygon geometry.
- gridCellLength (int): The desired length of the grid cells, in the units of the layer's CRS (meters for a metric projected CRS).
- i (int): Index of the polygon to grid. Defaults to 0.
Returns:
squareGridDf (gpd.GeoDataFrame): A geopandas GeoDataFrame covering the polygon's bounding box with square grid cells. Each row contains the geometry of a single grid cell.
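The grid is built from the polygon's bounding box, not its outline. `np.arange` yields the bottom-left corner of every cell, so the last row and column can extend past the bounding box. Cells outside the polygon are removed only later, by `rm_non_intersecting_rows`. Below is a minimal numpy sketch of the corner arithmetic. It uses a hypothetical 250 by 120 bounding box and 100-unit cells. The helper `grid_corners` is illustrative only.

```python
import numpy as np

def grid_corners(xmin, ymin, xmax, ymax, cellLength):
    """Bottom-left (x, y) corners in the same x-outer, y-inner order as divide_polygon_into_square_grid."""
    xcoords = np.arange(xmin, xmax, cellLength)
    ycoords = np.arange(ymin, ymax, cellLength)
    return [(float(x), float(y)) for x in xcoords for y in ycoords]

corners = grid_corners(0, 0, 250, 120, 100)
print(corners)
# [(0.0, 0.0), (0.0, 100.0), (100.0, 0.0), (100.0, 100.0), (200.0, 0.0), (200.0, 100.0)]
```

The six cells cover x from 0 to 300 and y from 0 to 200. That is larger than the 250 by 120 bounding box.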
def calc_medium_grid_length(smallGridSize:int):
Calculates a medium grid cell length that is a whole multiple of the small grid size and whose area does not exceed the globally defined BATCH_DOWNLOAD_AREA.
Arguments:
- smallGridSize (int): The width of the small square grid cells.
Returns:
mediumGridLength (int): The calculated length of the medium grid cells.
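The calculation takes two floors. The first counts how many whole small cells fit in `BATCH_DOWNLOAD_AREA`, and the second takes the largest square arrangement of those cells. The sketch below reproduces this with the area passed in explicitly. The value 25,000,000 square CRS units is hypothetical, because the real value comes from `configs/global_variables.json`. `math.isqrt` stands in for `np.floor(np.sqrt(...))` and gives the same integer floor for non-negative integers. The zero check is an addition. Without it, the original would return 0, which is not a usable cell length.

```python
import math

def medium_grid_length(smallGridSize: int, batchDownloadArea: float) -> int:
    """Largest multiple of smallGridSize whose square fits within batchDownloadArea."""
    # Number of whole small cells that fit in the download area
    nSmallPerMed = math.floor(batchDownloadArea / smallGridSize**2)
    # Number of small cells along one side of a square medium cell
    nSmallPerSide = math.isqrt(nSmallPerMed)
    if nSmallPerSide == 0:
        raise ValueError("smallGridSize is too large for the batch download area")
    return nSmallPerSide * smallGridSize

# Hypothetical values: 256-unit small cells, 25,000,000 square-unit download area
print(medium_grid_length(256, 25_000_000))  # 4864 (19 small cells per side)
```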
def rm_non_intersecting_rows(targetDf:gpd.GeoDataFrame, intersectsPoly:gpd.GeoDataFrame):
Removes rows from the target dataframe that do not intersect with the specified polygon.
Arguments:
- targetDf (gpd.GeoDataFrame): The target GeoDataFrame.
- intersectsPoly (gpd.GeoDataFrame): A GeoDataFrame whose first geometry is the polygon to check intersections against; any further geometries are ignored.
Returns:
outDf (gpd.GeoDataFrame): The filtered GeoDataFrame with non-intersecting rows removed.
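For a single area polygon, the row-by-row loop is equivalent to one vectorized boolean mask. The sketch below shows this with hypothetical square cells. `intersects` is also true for cells that only touch the polygon's boundary, so edge-touching cells are kept. Like the original, the sketch tests only the first geometry of the area polygon.

```python
import geopandas as gpd
from shapely.geometry import box

area = gpd.GeoDataFrame(geometry=[box(0, 0, 10, 10)])  # stand-in area polygon
cells = gpd.GeoDataFrame(
    geometry=[box(0, 0, 5, 5), box(10, 0, 15, 5), box(20, 20, 25, 25)]
)

# Boolean mask against the first area geometry, then re-index as the function does
polygon = area.geometry.iloc[0]
kept = cells[cells.intersects(polygon)].reset_index(drop=True)
print(len(kept))  # 2: the inner cell and the edge-touching cell
```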
def create_gridded_dfs_for_batch_download(polygonPath:str, smallGridSize:int, outDirectory:str):
Creates gridded dataframes for batch downloading by dividing the input polygon area into grids of specified sizes. The medium grid is used to batch download and the smaller grids are used to later crop the downloaded data into small squares and apply a cnn model to each square. Returns the small grids as a list and a geopandas GeoDataFrame where each row contains the paths to the corresponding small grid for that area.
Arguments:
- polygonPath (str): The file path to the input polygon.
- smallGridSize (int): The size of the small grid.
- outDirectory (str): The output directory to save the gridded dataframes.
Returns:
clippedSmallGridDfs (list): A list of small gridded GeoDataFrames.
filteredMedGridDf (gpd.GeoDataFrame): The filtered medium grid GeoDataFrame.
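The two-level tiling described above can be illustrated without geopandas. The following is a hypothetical, stdlib-only sketch (`nested_grids` is not part of this module) that assumes projected coordinates, square cells, and a medium cell exactly `factor` small cells wide. The module's own function works on GeoDataFrames and additionally clips and filters the cells against the input polygon, which this sketch omits.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def nested_grids(bounds: Box, small_size: float, factor: int) -> Dict[Tuple[int, int], List[Box]]:
    """Tile `bounds` with square small cells and group them into medium cells.

    Hypothetical helper: a medium cell is assumed to be `factor` x `factor`
    small cells, so every small cell belongs to exactly one medium cell,
    keyed by the medium cell's (row, col) index.
    """
    minx, miny, maxx, maxy = bounds
    ncols = math.ceil((maxx - minx) / small_size)
    nrows = math.ceil((maxy - miny) / small_size)
    groups: Dict[Tuple[int, int], List[Box]] = defaultdict(list)
    for r in range(nrows):
        for c in range(ncols):
            x0 = minx + c * small_size
            y0 = miny + r * small_size
            cell = (x0, y0, x0 + small_size, y0 + small_size)
            # Integer division picks the enclosing medium cell, avoiding
            # floating-point comparisons against medium-cell boundaries.
            groups[(r // factor, c // factor)].append(cell)
    return dict(groups)
```

For example, `nested_grids((0, 0, 250, 120), 50, 2)` yields 15 small cells spread over 6 medium cells; each medium cell then corresponds to one batch download, and its list of small cells to the crops made afterwards.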
def batch_download_and_merge(gdf:gpd.GeoDataFrame, i:int, outDirectory:str,
                             rasterLabel:str = MERGED_RASTER_FILE_LABEL, deleteTempFiles:bool = True, overwriteExisting:bool=False):
    """
    Downloads and merges raster files in batches for the specified grid cell.

    Args:
        gdf (gpd.GeoDataFrame): The GeoDataFrame containing grid cell geometry.
        i (int): The index of the grid cell to process.
        outDirectory (str): The output directory to save the merged raster files.
        rasterLabel (str, optional): The label for the merged raster file. Defaults to MERGED_RASTER_FILE_LABEL.
        deleteTempFiles (bool, optional): Whether to delete temporary files after merging. Defaults to True.
        overwriteExisting (bool, optional): Whether to overwrite existing raster files. Defaults to False.

    Returns:
        outPath (str): The path to the merged raster file, the string 'No Path' if no raster could be produced for the
            cell (no products found, or a multipolygon geometry), or an empty string if the download failed and should be retried.
        newInfo (bool): Whether new information was added to the GeoDataFrame.
    """

    #Marker for whether new information is added to the gdf. When there is new information, batch_download_chunks saves the file.
    newInfo = True

    #Initialize the raster column if it does not exist yet
    if RSTR_COL_PATTERN not in gdf.columns:
        gdf[RSTR_COL_PATTERN] = ''

    #If there is already a path or 'No Path' string in this row, skip the download for this row.
    # Returns False for newInfo to avoid redundant saving.
    if not overwriteExisting:
        existingPath = gdf.at[i, RSTR_COL_PATTERN]
        if existingPath:
            print(f"Skipping batch download and merge for index {i} as the path already exists. {existingPath}")
            newInfo = False
            return existingPath, newInfo

    #Set a unique file path for the merged raster
    fname = f'{rasterLabel}_{i}.tif'
    outPath = os.path.join(outDirectory, MERGED_RASTER_FOLDER, fname)

    #If a merged raster already exists for this index and overwriteExisting is False,
    # return its path as the outPath for this index and skip the download.
    if os.path.exists(outPath) and not overwriteExisting:
        print(f'Existing path detected for idx {i}. Skipping batch download.')
        return outPath, newInfo

    #Check for multipolygons (which should have been removed previously) to avoid an error,
    # since merge_warp_dems is not compatible with multipolygons
    if gdf['geometry'][i].geom_type == 'MultiPolygon':
        print(f'Multipolygon at idx {i}. Batch download skipped to avoid errors.')
        outPath = 'No Path'
        return outPath, newInfo

    #Attempt a batch download
    try:
        #Get the EPSG code
        epsg = llb.get_spatial_ref_from_shapefile(gdf)[1]

        #Request the paths covering the min/max x/y extent of this row's geometry.
        #Allow up to 10 attempts, which helps bypass occasional server timeouts;
        # if every attempt fails, the last HTTPError is re-raised and handled below.
        maxAttempts = 10
        for attempt in range(maxAttempts):
            try:
                paths = dg.get_aws_paths_from_geodataframe(BATCH_DOWNLOAD_DATASET_NAME, gdf, rowIdx=i)
                break
            except HTTPError:
                if attempt == maxAttempts - 1:
                    raise
                time.sleep(5) #Retry after 5 seconds on an HTTP error

        #Set the save directory and create it if it doesn't exist.
        saveDir = os.path.join(outDirectory, MERGED_RASTER_FOLDER)
        if not os.path.exists(saveDir):
            os.makedirs(saveDir)

        #If paths were returned, download them and merge them to the min/max x/y extent of this row's geometry
        if paths:
            print(f'Batch Downloading for idx: {i}...')
            tempSaveDir = os.path.join(outDirectory, TEMP_RASTER_DOWNLOAD_FOLDER, str(i))
            if not os.path.exists(tempSaveDir):
                os.makedirs(tempSaveDir)
            filelist = dg.batch_download(paths, tempSaveDir, doForceDownload = True)
            x, y = gdf['geometry'][i].exterior.coords.xy
            mergeExtent = ([min(x), max(x)], [min(y), max(y)])

            print(f'Merging DEMs for idx: {i}...')
            dg.merge_warp_dems(filelist, outPath, mergeExtent, epsg)

            #Remove the downloaded products if deleteTempFiles is True
            if deleteTempFiles:
                if os.path.exists(tempSaveDir):
                    shutil.rmtree(tempSaveDir)
        #Return the string 'No Path' if the AWS request found no products for this area
        else:
            outPath = 'No Path'

    #If an error occurred during the batch download (including one that persists after maxAttempts), print the error
    # and return a blank outPath so this idx download can be reattempted by running this function again
    except Exception as e:
        print(f'An error occurred with batch download for idx:{i}: {e}')
        outPath = '' #Leave the outPath blank so that running the function again retries this request.
        newInfo = False
    return outPath, newInfo
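The extent passed to `dg.merge_warp_dems` is the axis-aligned bounding box of the cell's exterior ring, in `([xmin, xmax], [ymin, ymax])` order. A minimal stdlib sketch of that arithmetic follows, assuming the ring is given as plain `(x, y)` tuples instead of a shapely geometry. The second, hypothetical helper shows the same result derived from a `(minx, miny, maxx, maxy)` tuple, the order used by shapely's `bounds` attribute.

```python
from typing import Iterable, List, Tuple


def merge_extent(ring: Iterable[Tuple[float, float]]) -> Tuple[List[float], List[float]]:
    """Return ([xmin, xmax], [ymin, ymax]) for a polygon's exterior ring."""
    xs, ys = zip(*ring)  # split the (x, y) pairs into two coordinate sequences
    return [min(xs), max(xs)], [min(ys), max(ys)]


def bounds_to_merge_extent(bounds: Tuple[float, float, float, float]) -> Tuple[List[float], List[float]]:
    """Reorder (minx, miny, maxx, maxy) into ([xmin, xmax], [ymin, ymax])."""
    minx, miny, maxx, maxy = bounds
    return [minx, maxx], [miny, maxy]
```

Both forms agree for a closed ring such as `[(10.0, 5.0), (30.0, 5.0), (30.0, 25.0), (10.0, 25.0), (10.0, 5.0)]`, giving `([10.0, 30.0], [5.0, 25.0])`.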
Downloads and merges raster files in batches for the specified grid cell.
Arguments:
- gdf (gpd.GeoDataFrame): The GeoDataFrame containing grid cell geometry.
- i (int): The index of the grid cell to process.
- outDirectory (str): The output directory to save the merged raster files.
- rasterLabel (str, optional): The label for the merged raster file. Defaults to MERGED_RASTER_FILE_LABEL.
- deleteTempFiles (bool, optional): Whether to delete temporary files after merging. Defaults to True.
- overwriteExisting (bool, optional): Whether to overwrite existing raster files. Defaults to False.
Returns:
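outPath (str): The path to the merged raster file, the string 'No Path' if no raster could be produced for the cell (no products found, or a multipolygon geometry), or an empty string if the download failed and should be retried.
newInfo (bool): Whether new information was added to the GeoDataFrame.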
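The retry loop around the path request can be factored into a small helper. This is a hypothetical sketch (`call_with_retries` is not part of the module): it keeps a fixed number of attempts with a fixed delay, and re-raises the final error so that the caller's outer `except` can leave the row blank for a later re-run rather than treating a failed request like an empty result.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 10,
    delay_s: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call `fn` until it succeeds, sleeping `delay_s` between failed attempts.

    Exceptions listed in `retry_on` trigger a retry; the last one is re-raised
    once `max_attempts` is exhausted. Other exceptions propagate immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise
            time.sleep(delay_s)
    raise ValueError("max_attempts must be at least 1")
```

Applied to this module, the request would read `paths = call_with_retries(lambda: dg.get_aws_paths_from_geodataframe(BATCH_DOWNLOAD_DATASET_NAME, gdf, rowIdx=i), retry_on=(HTTPError,))`. A fixed delay matches the original behaviour; exponential backoff would be a reasonable alternative if the server throttles repeated requests.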
def batch_download_chunks(polygonPath:str, smallGridSize:int,
                          label:str = None, outDirectory:str=None, deleteTempFiles:bool = True, overwriteExisting:bool=False):
    """Takes an input polygon area, divides it into medium grids for batch downloads and smaller grids for labeling,
    then downloads the products in batches using the dem_getter tool. Outputs a dataframe containing, for each medium
    grid cell, its geometry, the path to its merged raster file (or the string 'No Path' if no products were available),
    and the path to the small gridded dataframe used for labeling.

    Args:
        polygonPath (str): The file path to the input polygon.
        smallGridSize (int): The size of the small grid.
        label (str, optional): The label for the output files. When None, the file name of the input polygon
            (without its extension) is used. Defaults to None.
        outDirectory (str, optional): The output directory to save the files. When None, the output files are placed
            in the same directory as the input polygon. Defaults to None.
        deleteTempFiles (bool, optional): Whether to delete temporary files after processing. Defaults to True.
        overwriteExisting (bool, optional): Whether to overwrite existing files if detected. Defaults to False.

    Returns:
        medGridDf (gpd.GeoDataFrame): The GeoDataFrame with medium grid geometry and paths to raster files.
    """
    polygonDir, fname = os.path.split(polygonPath)

    #If no label was specified, get the label from the name of the polygon file
    if label is None:
        label = os.path.splitext(fname)[0]

    #If no save directory was specified, place the outputs in the same directory as the input polygon
    if not outDirectory:
        outDirectory = polygonDir

    dfOutPath = os.path.join(outDirectory, f'{label}{DEPLOY_MED_GRID_LABEL}{LABEL_FILE_EXT}')

    #Look for a preexisting file and load it if it exists; otherwise make the gridded dfs for batch download.
    if os.path.exists(dfOutPath):
        print("Loading Gridded Df...")
        medGridDf = gpd.read_file(dfOutPath)
    else:
        print("Making Gridded Df...")
        medGridDf = create_gridded_dfs_for_batch_download(polygonPath, smallGridSize, outDirectory)[1]

    medGridDf.to_file(dfOutPath, truncation=False)

    #Loop through the df and batch download/merge for each row
    length = len(medGridDf)
    for i, row in medGridDf.iterrows():
        #Skip multipolygons (which should have been removed already), since they cannot be merged
        if row['geometry'].geom_type == 'MultiPolygon':
            print(f'Multipolygon at idx {i}. Batch download skipped to avoid errors.')
            continue

        outPath, newInfo = batch_download_and_merge(medGridDf, i, outDirectory, deleteTempFiles = deleteTempFiles, overwriteExisting=overwriteExisting)

        #Resave the df with updated info if new information was gained from batch_download_and_merge
        if newInfo:
            medGridDf.at[i, RSTR_COL_PATTERN] = outPath
            if len(medGridDf) == length:
                medGridDf.to_file(dfOutPath, truncation=False)
                print(f'df with updated filepaths for index {i} saved to {dfOutPath}')

    #Remove the temporary download folder, which should be empty at this point if deleteTempFiles is True
    if deleteTempFiles:
        tempDir = os.path.join(outDirectory, TEMP_RASTER_DOWNLOAD_FOLDER)
        if os.path.exists(tempDir):
            shutil.rmtree(tempDir)

    #Final save
    medGridDf.to_file(dfOutPath, truncation=False)
    print(f'chunked batch download dataframe saved to: {dfOutPath}')

    return medGridDf
Takes an input polygon area, divides it into medium grids for batch downloads and smaller grids for labeling, then downloads the products in batches using the dem_getter tool. Outputs a dataframe containing, for each medium grid cell, its geometry, the path to its merged raster file (or the string 'No Path' if no products were available), and the path to the small gridded dataframe used for labeling.
Arguments:
- polygonPath (str): The file path to the input polygon.
- smallGridSize (int): The size of the small grid.
- label (str, optional): The label for the output files. When None, the file name of the input polygon (without its extension) is used. Defaults to None.
- outDirectory (str, optional): The output directory to save the files. When None, the output files are placed in the same directory as the input polygon. Defaults to None.
- deleteTempFiles (bool, optional): Whether to delete temporary files after processing. Defaults to True.
- overwriteExisting (bool, optional): Whether to overwrite existing files if detected. Defaults to False.
Returns:
medGridDf (gpd.GeoDataFrame): The GeoDataFrame with medium grid geometry and paths to raster files.
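Re-running `batch_download_chunks` resumes where it left off because the raster column has three states: a file path or `'No Path'` marks a finished row, an empty string marks a row to retry, and the table is saved after every row that produced new information. The following is a hypothetical stdlib sketch of that checkpointing pattern; it uses a JSON file in place of the GeoDataFrame and a caller-supplied `process` function in place of `batch_download_and_merge`, and `run_resumable` is not part of the module.

```python
import json
import os
from typing import Callable, Dict, Iterable

NO_PATH = "No Path"  # same sentinel string the module stores when no products exist


def run_resumable(keys: Iterable[str], process: Callable[[str], str], checkpoint: str) -> Dict[str, str]:
    """Process each key at most once per success, persisting results after each new one.

    `process` returns a raster path, NO_PATH, or "" (failed: retry on the next run).
    Keys whose stored value is non-empty are skipped.
    """
    state: Dict[str, str] = {}
    if os.path.exists(checkpoint):
        with open(checkpoint) as f:
            state = json.load(f)
    for key in keys:
        if state.get(key):  # a path or NO_PATH: already resolved
            continue
        result = process(key)
        if result:  # only persist when something new was learned
            state[key] = result
            with open(checkpoint, "w") as f:
                json.dump(state, f)
    return state
```

Saving after every resolved row costs extra writes, but it means an interrupted run loses at most the row in progress, which matters when each row is a large download.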