ArcGIS & Python

Monday, October 15, 2012

Circular binary structure for SciPy morphological operations

I wanted to try buffering a mask raster without converting to a shapefile and back again, so I used the binary_dilation function from the SciPy multi-dimensional image processing module (ndimage). I then used the following function to generate a circular binary structure array of the desired radius:

import numpy as np

def circular_structure(radius):
    """Circular binary structuring element of the given radius"""
    size = radius * 2 + 1
    i, j = np.mgrid[0:size, 0:size]
    i -= (size // 2)
    j -= (size // 2)
    return np.sqrt(i ** 2 + j ** 2) <= radius
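For example, buffering a made-up binary mask by a radius of 3 cells looks something like this (the mask array here is just for illustration):

import numpy as np
from scipy import ndimage

# Hypothetical binary mask; in my case this came from a mask raster
mask = np.zeros((20, 20), dtype=bool)
mask[10, 10] = True

# Dilate (buffer) the mask by 3 cells using the circular structuring element
buffered = ndimage.binary_dilation(mask, structure=circular_structure(3))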
Labels:
SciPy
Monday, September 10, 2012
ArcGIS 10.0 Implementation of the Canny Edge Detector
The following is my attempt at an almost pure ArcGIS implementation of the Canny Edge Detector. This would be much easier to apply in Python/NumPy (or in one line using the canny function in scikits-image), but I wanted to be able to apply it to very large rasters that exceed the 2 GB limit of the 32-bit NumPy that is installed with ArcGIS.
I found this site and this site both really helpful for actually implementing the Canny Edge Detector. Definitely refer to both of them for more examples and pictures.
I did use Python/NumPy for generating the Gaussian Filter kernel files that were needed for the ArcGIS Focal Statistics function. The following code will generate a properly formatted kernel file based on the kernel size and sigma value.
import os
import numpy as np

def gaussian_kernel(workspace, size=5, sigma=1.4):
    """Write an ArcGIS Focal Statistics kernel file for a gaussian filter"""
    sigma_str = str(round(sigma, 1)).replace('.', 'p')
    file_name = 'gaussian_{0}_{1}.txt'.format(size, sigma_str)
    size = int(size)
    i, j = np.mgrid[0:size, 0:size]
    i -= (size // 2)
    j -= (size // 2)
    g = np.exp(-(i ** 2 + j ** 2) / (2 * sigma ** 2))
    g /= g.sum()
    f = open(os.path.join(workspace, file_name), 'wb')
    f.write('{0} {0}\n'.format(size))
    for row in g:
        print ' {0}'.format(' '.join(['{0:0.6f}'.format(x) for x in row]))
        f.write('{0}\n'.format(' '.join(['{0:0.6f}'.format(x) for x in row])))
    f.close()

I then had to build kernel files for the Sobel Operators.
sobel_x.txt:
3 3
-1 0 1
-2 0 2
-1 0 1

sobel_y.txt:
3 3
1 2 1
0 0 0
-1 -2 -1

(These are the standard Sobel weights, preceded by the kernel dimensions on the first line.)
Finally, I had to build 8 kernel files (d0.txt through d7.txt) for shifting the raster 1 cell in each of the primary directions for the non-maxima suppression calculation. I used this approach instead of the Data Management Shift function so that I would get Spatial Analyst raster objects, but I'm not sure it is actually better. Each file starts with the line "3 3" followed by a 3x3 kernel of zeros with a single 1 marking the neighboring cell in that direction; d4.txt through d7.txt are the first four kernels rotated 180 degrees. A sketch of code that could write the files is shown below.
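As a sketch of how these files could be written; the mapping of file number to neighbor offset below is my assumption and needs to be checked against the angle reclassification in the code that follows:

import os
import numpy as np

def shift_kernels(workspace):
    # Assumed (row, col) offsets from the center cell for d0-d7;
    # d4-d7 are the opposites of d0-d3
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    for n, (d_row, d_col) in enumerate(offsets):
        kernel = np.zeros((3, 3), dtype=int)
        kernel[1 + d_row, 1 + d_col] = 1
        f = open(os.path.join(workspace, 'd{0}.txt'.format(n)), 'w')
        f.write('3 3\n')
        for row in kernel:
            f.write('{0}\n'.format(' '.join(str(x) for x in row)))
        f.close()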
The following code is my implementation of the Canny Edge Detector in ArcGIS 10.0. There are probably more efficient ways of doing it, but this seemed to work. The intermediate files could be saved by adding a few temp_obj.save(temp_raster) calls. I didn't include all of my input checking, and some names/values are hardwired.

import os
import arcpy
from arcpy import env
import arcpy.sa as sa

def canny_edge(workspace, raster_name, filter_name, t_low, t_high):
    #### Full paths to the input raster and gaussian filter kernel file
    raster_path = os.path.join(workspace, raster_name)
    filter_path = os.path.join(workspace, filter_name)

    #### Set ArcGIS environment parameters
    arcpy.CheckOutExtension("Spatial")
    env.overwriteOutput = True
    env.workspace = workspace
    env.scratchWorkspace = workspace
    env.cellSize = raster_path
    env.extent = raster_path
    env.snapRaster = raster_path

    #### Hardwired file names for kernels
    sobel_x_path = os.path.join(workspace, 'sobel_x.txt')
    sobel_y_path = os.path.join(workspace, 'sobel_y.txt')
    d0_path = os.path.join(workspace, 'd0.txt')
    d1_path = os.path.join(workspace, 'd1.txt')
    d2_path = os.path.join(workspace, 'd2.txt')
    d3_path = os.path.join(workspace, 'd3.txt')
    d4_path = os.path.join(workspace, 'd4.txt')
    d5_path = os.path.join(workspace, 'd5.txt')
    d6_path = os.path.join(workspace, 'd6.txt')
    d7_path = os.path.join(workspace, 'd7.txt')

    #### Canny Edge Detector Workflow
    #### Apply gaussian filter
    filter_obj = sa.FocalStatistics(
        raster_path, sa.NbrWeight(filter_path), "SUM", "NODATA")

    #### Calculate 1st derivative in horizontal and vertical direction
    grad_x_obj = sa.FocalStatistics(
        filter_obj, sa.NbrWeight(sobel_x_path), "SUM", "DATA")
    grad_y_obj = sa.FocalStatistics(
        filter_obj, sa.NbrWeight(sobel_y_path), "SUM", "DATA")
    del filter_obj

    #### Calculate gradient and direction
    grad_obj = sa.SquareRoot(
        sa.Power(grad_x_obj, 2) + sa.Power(grad_y_obj, 2))
    angle_obj = sa.ATan2(grad_y_obj, grad_x_obj)
    del grad_x_obj, grad_y_obj

    #### Reclassify angle into four directions
    #### See http://suraj.lums.edu.pk/~cs436a02/CannyImplementation.htm
    reclass_remap = sa.RemapRange(
        [[-3.141593, -2.748894, 0], [-2.748894, -1.963495, 1],
         [-1.963495, -1.178097, 2], [-1.178097, -0.392699, 3],
         [-0.392699, 0.392699, 0], [0.392699, 1.178097, 1],
         [1.178097, 1.963495, 2], [1.963495, 2.748894, 3],
         [2.748894, 3.141593, 0]])
    zone_obj = sa.Reclassify(
        angle_obj, "Value", reclass_remap, "NODATA")
    del angle_obj, reclass_remap

    #### Non-maxima suppression
    #### Shift raster 1 cell in each of the 8 possible directions
    ####   to get the neighboring cells for non-maxima suppression
    #### Keep cells that are greater than the two neighboring cells
    ####   along the gradient direction
    #### Calculate a separate raster for each primary direction then sum
    d0_obj = sa.FocalStatistics(
        grad_obj, sa.NbrWeight(d0_path), "SUM", "DATA")
    d4_obj = sa.FocalStatistics(
        grad_obj, sa.NbrWeight(d4_path), "SUM", "DATA")
    grad_04_obj = sa.Con(
        (zone_obj == 0) & (grad_obj > d0_obj) & (grad_obj > d4_obj),
        grad_obj, 0)
    del d0_obj, d4_obj
    d1_obj = sa.FocalStatistics(
        grad_obj, sa.NbrWeight(d1_path), "SUM", "DATA")
    d5_obj = sa.FocalStatistics(
        grad_obj, sa.NbrWeight(d5_path), "SUM", "DATA")
    grad_15_obj = sa.Con(
        (zone_obj == 1) & (grad_obj > d1_obj) & (grad_obj > d5_obj),
        grad_obj, 0)
    del d1_obj, d5_obj
    d2_obj = sa.FocalStatistics(
        grad_obj, sa.NbrWeight(d2_path), "SUM", "DATA")
    d6_obj = sa.FocalStatistics(
        grad_obj, sa.NbrWeight(d6_path), "SUM", "DATA")
    grad_26_obj = sa.Con(
        (zone_obj == 2) & (grad_obj > d2_obj) & (grad_obj > d6_obj),
        grad_obj, 0)
    del d2_obj, d6_obj
    d3_obj = sa.FocalStatistics(
        grad_obj, sa.NbrWeight(d3_path), "SUM", "DATA")
    d7_obj = sa.FocalStatistics(
        grad_obj, sa.NbrWeight(d7_path), "SUM", "DATA")
    grad_37_obj = sa.Con(
        (zone_obj == 3) & (grad_obj > d3_obj) & (grad_obj > d7_obj),
        grad_obj, 0)
    del d3_obj, d7_obj
    del grad_obj, zone_obj
    print "Sum Directions"
    nms_obj = (grad_04_obj + grad_15_obj + grad_26_obj + grad_37_obj)
    del grad_04_obj, grad_15_obj, grad_26_obj, grad_37_obj
    nms_obj.save(raster_path.replace('.img', '_nms.img'))

    #### Hysteresis Thresholding
    #### Keep low threshold regions that contain at least one cell
    ####   above the high threshold
    t_low_obj = sa.RegionGroup(
        sa.Con(nms_obj > t_low, 1), "EIGHT", "CROSS", "NO_LINK")
    t_zs_obj = sa.ZonalStatistics(
        t_low_obj, "value", nms_obj, "MAXIMUM", "DATA")
    del t_low_obj
    t_high_obj = sa.Con((t_zs_obj >= t_high), nms_obj)
    del t_zs_obj, nms_obj
    t_high_obj.save(raster_path.replace('.img', '_edges.img'))

There are probably issues at the edges of the rasters because I noticed that the Focal Statistics tool was not properly handling nodata when using the weighting option.
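A call might look something like the following sketch; the workspace, raster name, and threshold values here are made up, and the Sobel and shift kernel files are assumed to already be in the workspace:

workspace = 'D:\\Temp\\canny'

# Write the gaussian filter kernel file first (creates gaussian_5_1p4.txt)
gaussian_kernel(workspace, size=5, sigma=1.4)

# Run the edge detector; the *_nms.img and *_edges.img rasters are
# saved next to the input raster
canny_edge(workspace, 'dem.img', 'gaussian_5_1p4.txt', t_low=10, t_high=40)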
Labels:
ArcGIS 10,
Canny Edge Detector,
Python
Monday, January 23, 2012
Python flood-fill algorithm
Update (2016-09-23): If anyone stumbles upon this, I have made a GitHub repo with working examples of these two functions (http://github.com/cgmorton/flood-fill) and I have updated the code to address the missing variable mentioned in the comments.
I was converting a Matlab script to Python and needed to replicate the Matlab imfill function to fill holes/depressions in floating point rasters.
To determine the algorithm that imfill uses, I followed the reference in the imfill documentation: Soille, P., Morphological Image Analysis: Principles and Applications, Springer-Verlag, 1999, pp. 173-174. The 2nd edition, published in 2004, has a short description of hole filling in section 6.3.7 (pg 208). The key point is that a marker image is built that equals the maximum value of the input image (referred to as the mask image) everywhere except along the border, where it keeps the input values, and this marker is then morphologically eroded. The process is explained in more detail in Soille, P., Vogt, J., and Colombo, R., 2003. Carving and Adaptive Drainage Enforcement of Grid Digital Elevation Models. Water Resources Research, 39(12), 1366. The erosion operation outputs the minimum value in the neighboring cells of a point (defined using a structuring element that is typically a 3x3 square or a cross). The filled image is found by taking the maximum of either the erosion of the marker cell (the minimum value of the neighboring marker cells) or the original mask cell value.
The SciPy ndimage module has morphology tools. There is a binary hole filling function, but no floating point hole fill function. There is a grey_erosion function, and the following Python function mimics imfill using the grey_erosion function:
import numpy as np
from scipy import ndimage

def flood_fill(test_array, four_way=False):
    """Fill holes/depressions in a floating point array (like Matlab's imfill)"""
    input_array = np.copy(test_array)
    # Set h_max to a value larger than the array maximum to ensure
    #   that the while loop will terminate
    h_max = np.max(input_array * 2.0)
    # Build mask of cells with data not on the edge of the image
    # Use 3x3 square structuring element
    # Structuring element could also be built using the SciPy ndimage module
    #   el = ndimage.generate_binary_structure(2, 2).astype(np.int)
    data_mask = np.isfinite(input_array)
    inside_mask = ndimage.binary_erosion(
        data_mask,
        structure=np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]]).astype(np.bool))
    edge_mask = (data_mask & ~inside_mask)
    # Initialize output array as max value test_array except edges
    output_array = np.copy(input_array)
    output_array[inside_mask] = h_max
    # Array for storing the previous iteration
    output_old_array = np.copy(input_array)
    output_old_array.fill(0)
    # Cross structuring element
    if four_way:
        el = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]]).astype(np.bool)
        # el = ndimage.generate_binary_structure(2, 1).astype(np.int)
    else:
        el = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]]).astype(np.bool)
        # el = ndimage.generate_binary_structure(2, 2).astype(np.int)
    # Iterate until the marker array doesn't change
    while not np.array_equal(output_old_array, output_array):
        output_old_array = np.copy(output_array)
        output_array = np.maximum(
            input_array,
            ndimage.grey_erosion(output_array, size=(3, 3), footprint=el))
    return output_array

This function is very slow though, because the calculation starts on the edge of the image and moves in. Only a small number of cells are changing but the entire array is being processed for each iteration. The function was used to fill holes on a 2206x2986 image with numerous large depressions and took 690 seconds to run.
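As a quick check, here is a small made-up array with a one-cell depression and a low outlet of 3.0 on the right edge (the same call also works for the faster version below). After filling, the center cell should be raised to 5.0, the level of its pour point, and everything else left unchanged:

import numpy as np

test_array = np.array([
    [9., 9., 9., 9., 9.],
    [9., 5., 5., 5., 9.],
    [9., 5., 1., 5., 3.],
    [9., 5., 5., 5., 9.],
    [9., 9., 9., 9., 9.]])
filled = flood_fill(test_array)
print(filled[2, 2])  # expect 5.0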
The Soille (2004) text and the Soille et al. (2003) article mentioned that a fast algorithm that uses priority queues is proposed in Soille and Gratin, 1994. An Efficient Algorithm for Drainage Networks Extraction on DEMs. Journal of Visual Communication and Image Representation, 5(2), 181-189. This article was difficult to obtain but it turns out that the article Liu et al., 2009. Another Fast and Simple DEM Depression-Filling Algorithm Based on Priority Queue Structure. Atmospheric and Oceanic Science Letters, 2(4), 214-219 presents an algorithm that is nearly identical (but with no mention or citation of the Soille and Gratin (1994) article!). The article is available with a quick Google search. The basic idea of the method is the same as the previous algorithm, but instead of processing every cell each iteration, the cells with the lowest value are processed first. Conceptually, from the edge the algorithm works upslope until it hits a depression, at which point the depression is filled with the current value (h_crt) until a higher point is reached.
The following Python function implements the priority-queue-based flood-fill algorithm:
import heapq
import numpy as np
from scipy import ndimage

def flood_fill(test_array, four_way=False):
    """Fill holes/depressions in a floating point array using a priority queue"""
    input_array = np.copy(test_array)
    input_rows, input_cols = input_array.shape
    # Set h_max to a value larger than the array maximum to ensure
    #   that the while loop will terminate
    h_max = np.max(input_array * 2.0)
    # Build mask of cells with data not on the edge of the image
    # Use 3x3 square structuring element
    data_mask = np.isfinite(input_array)
    inside_mask = ndimage.binary_erosion(
        data_mask,
        structure=ndimage.generate_binary_structure(2, 2).astype(np.int))
    # Edge cells are data cells that are not inside cells
    edge_mask = (data_mask & ~inside_mask)
    # Initialize output array as max value test_array except edges
    output_array = np.copy(input_array)
    output_array[inside_mask] = h_max
    # Build priority queue and place edge pixels into the priority queue
    # Last value is a flag to indicate if the cell is an edge cell
    put = heapq.heappush
    get = heapq.heappop
    fill_heap = [
        (output_array[t_row, t_col], int(t_row), int(t_col), True)
        for t_row, t_col in np.transpose(np.where(edge_mask))]
    heapq.heapify(fill_heap)

    def neighbors(row, col, four_way=False):
        """Return indices of adjacent cells"""
        if four_way:
            return [
                (row - 1, col), (row, col + 1),
                (row + 1, col), (row, col - 1)]
        else:
            return [
                (row - 1, col), (row - 1, col + 1),
                (row, col + 1), (row + 1, col + 1),
                (row + 1, col), (row + 1, col - 1),
                (row, col - 1), (row - 1, col - 1)]

    # Iterate until the priority queue is empty
    while True:
        try:
            h_crt, t_row, t_col, edge_flag = get(fill_heap)
        except IndexError:
            break
        for n_row, n_col in neighbors(t_row, t_col, four_way):
            # Skip the cell if it is outside the array edges
            if edge_flag:
                try:
                    if not inside_mask[n_row, n_col]:
                        continue
                except IndexError:
                    continue
            if output_array[n_row, n_col] == h_max:
                output_array[n_row, n_col] = max(
                    h_crt, input_array[n_row, n_col])
                put(fill_heap, (output_array[n_row, n_col], n_row, n_col, False))
    return output_array

This function only took 61 seconds to fill the same test image, which is a considerable improvement, but imfill only takes ~3 seconds, so there is still a long way to go. I will probably try implementing the function in C++ next.
A number of things helped speed up the algorithm. The biggest gain was saving 180 seconds by using Python ints instead of NumPy ints in the heap data tuple (cell_value, row, col, edge_flag). Other improvements included using the heapq module instead of Queue.PriorityQueue() (saved 47s), using the Python max function instead of the NumPy max function (saved 25s), creating local copies of the function references heappush and heappop (saved 2s), and only checking whether the neighboring pixels were in the image for edge pixels, using the edge_flag (saved 4s). Last, I set the max value of the marker array to be greater than the max value of the input array so that the while loop would terminate without checking the value each time.
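For reference, the heap-related changes amount to something like the following sketch (not the exact code from the function above):

import heapq
import numpy as np

# Local copies of the heapq functions avoid repeated attribute lookups
put = heapq.heappush
get = heapq.heappop

values = np.array([3.2, 1.5, 2.7])
fill_heap = []
for i, v in enumerate(values):
    # Cast NumPy integer indices to plain Python ints before building the
    # tuple; comparing NumPy scalars inside heapq was the biggest slowdown
    put(fill_heap, (v, int(i), False))
print(get(fill_heap))  # the smallest cell value comes off the heap first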
Labels:
Hole Filling Algorithms,
Python,
SciPy
Friday, September 2, 2011
Simple python issues
This post will list some simple Python issues that I always have to look up.
To build a string showing a number with a percent sign, use:
"%f%%" % some_float
To swap values:
a,b = b,a
The three ways of building a path string:
'D:\\Temp\\temp.txt', r'D:\Temp\temp.txt', 'D:/Temp/temp.txt'
(Note: the first two produce exactly the same string; Python does not convert forward slashes to backslashes, so when splitting paths it is safer to use os.path.split or os.path.normpath than to split on "\\")
To traverse all sub-folders and files of a workspace, use:
for root, subs, files in os.walk(workspace):
To calculate DOY from a datetime object:
from datetime import datetime
dt_obj = datetime.now()
doy = dt_obj.timetuple().tm_yday
Regular expression to "match all":
'.*'
or '.*?'
Regular expression to "match nothing":
'a^'
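A quick check of those last two patterns (just as an illustration):

import re

# '.*' matches any string, including the empty string; 'a^' can never match
print(re.match('.*', 'anything') is not None)   # True
print(re.search('a^', 'anything') is not None)  # False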
Labels:
Python
Wednesday, August 31, 2011
Latitude & Longitude rasters in ArcGIS 10
The ability to calculate latitude and longitude rasters using the GRID commands $$YMap and $$XMap was removed in ArcGIS 10 (along with the other GRID commands such as $$COLMAP, $$ROWMAP, $$NROWS, $$NCOLS, and $$CELLSIZE). To get around this you can still load the ArcGIS 9.3 geoprocessor in ArcGIS 10 and then call the functions using the Single Output Map Algebra tool. Latitude & longitude rasters can also be calculated with NumPy or the FlowAccumulation function, but these aren't great options for really large rasters. Make sure to use a DEM that is in a geographic coordinate system and then project the results.
import os

def lat_long_rasters(ned_raster, lat_raster, lon_raster):
    import arcgisscripting
    gp = arcgisscripting.create(9.3)
    gp.CheckOutExtension("Spatial")
    gp.WorkSpace = os.path.dirname(ned_raster)
    gp.SingleOutputMapAlgebra_sa(
        "Con(IsNull({0}), {1}, $$YMap)".format(ned_raster, ned_raster),
        lat_raster)
    gp.SingleOutputMapAlgebra_sa(
        "Con(IsNull({0}), {1}, $$XMap)".format(ned_raster, ned_raster),
        lon_raster)
    del gp
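For rasters that do fit in memory, the NumPy approach mentioned above might look roughly like the following sketch; the arcpy.Raster properties used here (extent, meanCellWidth, meanCellHeight, height, width) are my choice of a convenient way to get the georeferencing, so check the values against your data:

import arcpy
import numpy as np

def lat_lon_arrays(dem_path):
    """Cell center latitude/longitude arrays for a DEM in a geographic coordinate system"""
    dem = arcpy.Raster(dem_path)
    rows, cols = dem.height, dem.width
    cs_x, cs_y = dem.meanCellWidth, dem.meanCellHeight
    # Cell center coordinates; row 0 of the array is the top (maximum latitude)
    lon = dem.extent.XMin + cs_x * (np.arange(cols) + 0.5)
    lat = dem.extent.YMax - cs_y * (np.arange(rows) + 0.5)
    lon_array = np.tile(lon, (rows, 1))
    lat_array = np.tile(lat.reshape(-1, 1), (1, cols))
    return lat_array, lon_array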
Labels:
arcgisscripting,
latitude,
longitude
Friday, August 19, 2011
Nan in NumPy array to NoData in ArcGIS raster
For smaller rasters, NumPy is a much faster alternative to some of the ArcGIS geoprocessing functions. One issue with doing this is that every raster must have exactly the same dimensions (rows, columns, cellsize, extent, etc.) since NumPy has no spatial component. Later on I will post about some of the techniques we use to make sure they are all identical.
There is one issue with converting NumPy arrays back to rasters that I want to outline. The arcpy function NumPyArrayToRaster does not correctly convert NaN values. The values will display as NoData in the Identify window, but they do not act as NoData in subsequent calculations and the 'Display NoData as' option in the Symbology tab has no impact.
To get around this, convert all NaN values to some unused value, and then set that value as the value_to_nodata parameter. For example:
# test_array is a NumPy float array; pnt is an arcpy.Point for the lower
# left corner and cellsize is the cell size of the original raster
test_array[np.isnan(test_array)] = -9999
test_obj = arcpy.NumPyArrayToRaster(
    test_array, pnt, cellsize, cellsize, -9999)
test_obj.save(test_raster)
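Going the other direction (raster to array), the nodata_to_value parameter of RasterToNumPyArray can be used so that the placeholder value can be converted back to NaN. A minimal sketch:

import arcpy
import numpy as np

# Read the raster back and convert the NoData placeholder back to NaN
test_array = arcpy.RasterToNumPyArray(test_raster, nodata_to_value=-9999)
test_array = test_array.astype(np.float32)
test_array[test_array == -9999] = np.nan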
This is Bug NIM063028 and is not currently reported as fixed in any version.
Labels:
ArcGIS 10,
NumPy,
NumPyArrayToRaster
Pyramids documentation
There is a small issue with the documentation for the pyramids environment parameter. The explanation section for the pyramid_option has a few errors. The option is "PYRAMIDS", not "PYRAMID", and the default is to build full pyramids (-1 level), not NONE. This is Bug NIM062605 and is reported as fixed in 10.1. The example code at the bottom of the help page shows the correct usage. To force no pyramids I prefer to use: env.pyramid = "PYRAMIDS 0".
Also, the CreateRandomRaster function doesn't honor the pyramids setting. This is Bug NIM062606 and is not currently reported as fixed.
Labels:
ArcGIS 10,
CreateRandomRaster,
environment parameter,
pyramids