Exploding overlapping to new non-overlapping polygons?
This is a graph coloring problem.
Recall that a graph coloring is an assignment of a color to the vertices of a graph in such a way that no two vertices which share an edge will also have the same color. Specifically, the (abstract) vertices of the graph are the polygons. Two vertices are connected with an (undirected) edge whenever they intersect (as polygons). If we take any solution to the problem--which is a sequence of (say k) disjoint collections of the polygons--and assign a unique color to each collection in the sequence, then we will have obtained a k-coloring of the graph. It is desirable to find a small k.
This problem is pretty hard and remains unsolved for arbitrary graphs. Consider an approximate solution that's simple to code. A sequential algorithm ought to do. The Welsh-Powell algorithm is a greedy solution based on a descending ordering of the vertices by degree. Translated to the language of the original polygons, first sort the polygons in descending order of the number of other polygons they overlap. Working in order, give the first polygon an initial color. In each successive step, try to color the next polygon with an existing color: that is, choose a color that is not already used by any of that polygon's neighbors. (There are many ways to choose among the available colors; try either the one that has been least used or else choose one randomly.) If the next polygon cannot be colored with an existing color, create a new color and color it with that.
Once you have achieved a coloring with a small number of colors, perform zonalstats color by color: by construction, you're guaranteed that no two polygons of a given color overlap.
Here's sample code in R
. (Python code wouldn't be much different.) First, we describe overlaps among the seven polygons shown.
edges <- matrix(c(1,2, 2,3, 3,4, 4,5, 5,1, 2,6, 4,6, 4,7, 5,7, 1,7), ncol=2, byrow=TRUE)
That is, polygons 1 and 2 overlap, and so do polygons 2 and 3, 3 and 4, ..., 1 and 7.
Sort the vertices by descending degree:
vertices <- unique(as.vector(edges))
neighbors <- function(i) union(edges[edges[, 1]==i,2], edges[edges[, 2]==i,1])
nbrhoods <- sapply(vertices, neighbors)
degrees <- sapply(nbrhoods, length)
v <- vertices[rev(order(degrees))]
A (crude) sequential coloring algorithm uses the earliest available color not already used by any overlapping polygon:
color <- function(i) {
n <- neighbors(i)
candidate <- min(setdiff(1:color.next, colors[n]))
if (candidate==color.next) color.next <<- color.next+1
colors[i] <<- candidate
}
Initialize the data structures (colors
and color.next
) and apply the algorithm:
colors <- rep(0, length(vertices))
color.next <- 1
temp <- sapply(v, color)
Split the polygons into groups according to color:
split(vertices, colors)
The output in this example uses four colors:
$`1`
[1] 2 4
$`2`
[1] 3 6 7
$`3`
[1] 5
$`4`
[1] 1
It has partitioned the polygons into four non-overlapping groups. In this case the solution is not optimal ({{3,6,5}, {2,4}, {1,7}} is a three-coloring for this graph). In general the solution it gets shouldn't be too bad, though.
The methodology recommended by #whuber inspired me to take a new direction, and here is my arcpy solution, in two functions. The first, called countOverlaps, make two fields, "overlaps" and "ovlpCount" to record for each poly which polys overlapped with it, and how many overlaps occurred. The second function, explodeOverlaps, creates a third field, "expl", which gives a unique integer to each group of non-overlapping polys. The user can then export new fc's based on this field. The process is broken into two functions because I think the countOverlaps tool can prove useful by itself. Please excuse the sloppiness of the code (and the careless naming convention), as it's pretty preliminary, but it works. Also make sure that the "idName" field represents a field with unique IDs (only tested with integer IDs). Thank you whuber for providing me with the framework necessary to approach this problem!
def countOverlaps(fc,idName):
intersect = arcpy.Intersect_analysis(fc,'intersect')
findID = arcpy.FindIdentical_management(intersect,"explFindID","Shape")
arcpy.MakeFeatureLayer_management(intersect,"intlyr")
arcpy.AddJoin_management("intlyr",arcpy.Describe("intlyr").OIDfieldName,findID,"IN_FID","KEEP_ALL")
segIDs = {}
featseqName = "explFindID.FEAT_SEQ"
idNewName = "intersect."+idName
for row in arcpy.SearchCursor("intlyr"):
idVal = row.getValue(idNewName)
featseqVal = row.getValue(featseqName)
segIDs[featseqVal] = []
for row in arcpy.SearchCursor("intlyr"):
idVal = row.getValue(idNewName)
featseqVal = row.getValue(featseqName)
segIDs[featseqVal].append(idVal)
segIDs2 = {}
for row in arcpy.SearchCursor("intlyr"):
idVal = row.getValue(idNewName)
segIDs2[idVal] = []
for x,y in segIDs.iteritems():
for segID in y:
segIDs2[segID].extend([k for k in y if k != segID])
for x,y in segIDs2.iteritems():
segIDs2[x] = list(set(y))
arcpy.RemoveJoin_management("intlyr",arcpy.Describe(findID).name)
if 'overlaps' not in [k.name for k in arcpy.ListFields(fc)]:
arcpy.AddField_management(fc,'overlaps',"TEXT")
if 'ovlpCount' not in [k.name for k in arcpy.ListFields(fc)]:
arcpy.AddField_management(fc,'ovlpCount',"SHORT")
urows = arcpy.UpdateCursor(fc)
for urow in urows:
idVal = urow.getValue(idName)
if segIDs2.get(idVal):
urow.overlaps = str(segIDs2[idVal]).strip('[]')
urow.ovlpCount = len(segIDs2[idVal])
urows.updateRow(urow)
def explodeOverlaps(fc,idName):
countOverlaps(fc,idName)
arcpy.AddField_management(fc,'expl',"SHORT")
urows = arcpy.UpdateCursor(fc,'"overlaps" IS NULL')
for urow in urows:
urow.expl = 1
urows.updateRow(urow)
i=1
lyr = arcpy.MakeFeatureLayer_management(fc)
while int(arcpy.GetCount_management(arcpy.SelectLayerByAttribute_management(lyr,"NEW_SELECTION",'"expl" IS NULL')).getOutput(0)) > 0:
ovList=[]
urows = arcpy.UpdateCursor(fc,'"expl" IS NULL','','','ovlpCount D')
for urow in urows:
ovVal = urow.overlaps
idVal = urow.getValue(idName)
intList = ovVal.replace(' ','').split(',')
for x in intList:
intList[intList.index(x)] = int(x)
if idVal not in ovList:
urow.expl = i
urows.updateRow(urow)
ovList.extend(intList)
i+=1
It's been awhile, but I used this code for my own application and it's been working great — thank you. I re-wrote part of it to update it, apply it to lines (with a tolerance) and speed it up significantly (below — I'm running it on 50 million intersecting buffers and it only takes a couple of hours).
def ExplodeOverlappingLines(fc, tolerance, keep=True):
print('Buffering lines...')
idName = "ORIG_FID"
fcbuf = arcpy.Buffer_analysis(fc, fc+'buf', tolerance, line_side='FULL', line_end_type='FLAT')
print('Intersecting buffers...')
intersect = arcpy.Intersect_analysis(fcbuf,'intersect')
print('Creating dictionary of overlaps...')
#Find identical shapes and put them together in a dictionary, unique shapes will only have one value
segIDs = defaultdict(list)
with arcpy.da.SearchCursor(intersect, ['Shape@WKT', idName]) as cursor:
x=0
for row in cursor:
if x%100000 == 0:
print('Processed {} records for duplicate shapes...'.format(x))
segIDs[row[0]].append(row[1])
x+=1
#Build dictionary of all buffers overlapping each buffer
segIDs2 = defaultdict(list)
for v in segIDs.values():
for segID in v:
segIDs2[segID].extend([k for k in v if k != segID and k not in segIDs2[segID]])
print('Assigning lines to non-overlapping sets...')
grpdict = {}
# Mark all non-overlapping one to group 1
for row in arcpy.da.SearchCursor(fcbuf, [idName]):
if row[0] in segIDs2:
grpdict[row[0]] = None
else:
grpdict[row[0]] = 1
segIDs2sort = sorted(segIDs2.items(), key=lambda x: (len(x[1]), x[0])) #Sort dictionary by number of overlapping features then by keys
i = 2
while None in grpdict.values(): #As long as there remain features not assigned to a group
print(i)
ovset = set() # list of all features overlapping features within current group
s_update = ovset.update
for rec in segIDs2sort:
if grpdict[rec[0]] is None: #If feature has not been assigned a group
if rec[0] not in ovset: #If does not overlap with a feature in that group
grpdict[rec[0]] = i # Assign current group to feature
s_update(rec[1]) # Add all overlapping feature to ovList
i += 1 #Iterate to the next group
print('Writing out results to "expl" field in...'.format(fc))
arcpy.AddField_management(fc, 'expl', "SHORT")
with arcpy.da.UpdateCursor(fc,
[arcpy.Describe(fc).OIDfieldName, 'expl']) as cursor:
for row in cursor:
if row[0] in grpdict:
row[1] = grpdict[row[0]]
cursor.updateRow(row)
if keep == False:
print('Deleting intermediate outputs...')
for fc in ['intersect', "explFindID"]:
arcpy.Delete_management(fc)