How to randomly subset X% of selected points?
Here's a python function that will select random features in a layer based on percent, ignoring current selection:
def SelectRandomByPercent (layer, percent):
#layer variable is the layer name in TOC
#percent is percent as whole number (0-100)
if percent > 100:
print "percent is greater than 100"
return
if percent < 0:
print "percent is less than zero"
return
import random
fc = arcpy.Describe (layer).catalogPath
featureCount = float (arcpy.GetCount_management (fc).getOutput (0))
count = int (featureCount * float (percent) / float (100))
if not count:
arcpy.SelectLayerByAttribute_management (layer, "CLEAR_SELECTION")
return
oids = [oid for oid, in arcpy.da.SearchCursor (fc, "OID@")]
oidFldName = arcpy.Describe (layer).OIDFieldName
path = arcpy.Describe (layer).path
delimOidFld = arcpy.AddFieldDelimiters (path, oidFldName)
randOids = random.sample (oids, count)
oidsStr = ", ".join (map (str, randOids))
sql = "{0} IN ({1})".format (delimOidFld, oidsStr)
arcpy.SelectLayerByAttribute_management (layer, "", sql)
Copy/paste this into the python shell in ArcMap.
Then in the shell type SelectRandomByPercent ("layer", num)
, where layer
is the name of your layer, and num
is a whole number of your percent.
A variation to find a subset selection as asked:
def SelectRandomByPercent (layer, percent):
#layer variable is the layer name in TOC
#percent is percent as whole number (0-100)
if percent > 100:
print "percent is greater than 100"
return
if percent < 0:
print "percent is less than zero"
return
import random
featureCount = float (arcpy.GetCount_management (layer).getOutput (0))
count = int (featureCount * float (percent) / float (100))
if not count:
arcpy.SelectLayerByAttribute_management (layer, "CLEAR_SELECTION")
return
oids = [oid for oid, in arcpy.da.SearchCursor (layer, "OID@")]
oidFldName = arcpy.Describe (layer).OIDFieldName
path = arcpy.Describe (layer).path
delimOidFld = arcpy.AddFieldDelimiters (path, oidFldName)
randOids = random.sample (oids, count)
oidsStr = ", ".join (map (str, randOids))
sql = "{0} IN ({1})".format (delimOidFld, oidsStr)
arcpy.SelectLayerByAttribute_management (layer, "", sql)
Finally, one more variation to select a layer by a count, instead of a percent:
def SelectRandomByCount (layer, count):
import random
layerCount = int (arcpy.GetCount_management (layer).getOutput (0))
if layerCount < count:
print "input count is greater than layer count"
return
oids = [oid for oid, in arcpy.da.SearchCursor (layer, "OID@")]
oidFldName = arcpy.Describe (layer).OIDFieldName
path = arcpy.Describe (layer).path
delimOidFld = arcpy.AddFieldDelimiters (path, oidFldName)
randOids = random.sample (oids, count)
oidsStr = ", ".join (map (str, randOids))
sql = "{0} IN ({1})".format (delimOidFld, oidsStr)
arcpy.SelectLayerByAttribute_management (layer, "", sql)
Generally, I also recommend using the spatial ecology tools as discussed by blah238.
However, another method you could try would be to add an attribute called Random to store a random number:
Then, using the field calculator on that attribute, with the Python Parser, use the following codeblock:
import random
def rand():
return random.random()
See image below:
This will create random values between 0 and 1. Then, if you want to select 20% of the features, you could select features where the Random value is less than 0.2. Of course, this will work better with many features. I created a feature class with only 7 features as a test and there were no values less than 0.2. However, it looks like you have plenty of features, so that shouldn't matter.
There is also an earlier Select features at random script from @StephenLead available for ArcGIS Desktop. Although written, I think, for ArcGIS 9.x, and last modified in 2008, I used it in about 2010 at 10.0, and it still worked well.