Scaling DA UpdateCursor to large datasets?
Does creating an index seem to have any impact on run time?
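If you haven't tried it, an attribute index on the field used in your where clause is cheap to test. A minimal sketch, assuming a file geodatabase feature class and a field named x (the path, field, and index name are placeholders for your own):

import arcpy

fc = "C:/data/mydata.gdb/large"  # hypothetical path -- point this at your feature class

# Index the field referenced by the cursor's where clause; "x" and "x_idx"
# are placeholders for your field name and a new index name.
arcpy.AddIndex_management(fc, "x", "x_idx", "NON_UNIQUE", "ASCENDING")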
Also, you can use a SQL statement to restrict the values you are accessing. If you really do need to access a very large amount of data, you can nest the SQL statement and cursor within a while loop.
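Since your question is about UpdateCursor, a chunked update pass might look something like the sketch below; the path, the field name x, the placeholder edit, and the chunk size are all assumptions to adapt to your data:

import arcpy

fc = "C:/data/mydata.gdb/large"  # hypothetical path
field = "x"                      # hypothetical field to update
chunk = 1000000                  # rows per pass; worth tuning (see below)

# Assumes OBJECTIDs run from 1 up to the row count without large gaps,
# as in the benchmark feature class further down.
entries = int(arcpy.GetCount_management(fc).getOutput(0))
start = 0
while start < entries:
    sql = '"OBJECTID" BETWEEN {0} AND {1}'.format(start + 1, start + chunk)
    # Each pass opens a fresh cursor restricted to one OBJECTID range.
    with arcpy.da.UpdateCursor(fc, [field], sql) as cursor:
        for row in cursor:
            row[0] = row[0] * 2  # placeholder edit; substitute your real calculation
            cursor.updateRow(row)
    start += chunk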
Edit: I've run some benchmarks on my machine (Windows 7 x64, 2.4 GHz i3-370M (lol), 8 GB RAM, ArcMap 10.1 SP1). I created a feature class of 25,000,000** rows with a field, x, containing sequential integers. I'm assuming that update cursors are slower, so I tested a search cursor as a best-case read benchmark.
import arcpy, time

shp = "C:/images/junk/massive.gdb/large"
entries = int(arcpy.GetCount_management(shp).getOutput(0))

test1 = []
startval = 0
breakval = 1000000

# Chunked read: one SQL-restricted cursor per block of 1,000,000 OBJECTIDs.
c = time.clock()
while startval < entries:
    sql = '"OBJECTID" BETWEEN {0} AND {1}'.format(startval + 1, startval + breakval)
    test1.extend([row[0] for row in arcpy.da.SearchCursor(shp, "x", sql)])
    startval += breakval
print time.clock() - c

# Single cursor over the whole feature class.
c = time.clock()
test2 = [row[0] for row in arcpy.da.SearchCursor(shp, "x")]
print time.clock() - c

# Confirm both approaches read identical values.
print test1 == test2
The results were as follows:
614.128610407
601.801415697
True
This would place my read time at ~41,000 records/s, or ~24.4 µs per record.
**I ran test1 with 50,000,000 features and it took 1217 seconds. I couldn't run test2 because I received a memory overflow error. Regardless, I doubled the features and the run time roughly doubled, which is encouraging.
I'm not sure at what point the access time skyrockets for your data, but if this is something you will be running often, it's worth the time to optimize breakval. To do that, increase or decrease breakval and time a single SQL-restricted pass over a subset of your FC (reading the entire FC each time clearly takes too long). If total run time * entries / breakval decreases between runs, keep tweaking breakval in that direction.
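One way to do that tuning without rereading the whole feature class is to time one SQL-restricted read for several candidate chunk sizes and compare the projected full-run cost, elapsed * entries / breakval. A rough sketch, again assuming a field named x and a hypothetical path:

import arcpy, time

fc = "C:/data/mydata.gdb/large"  # hypothetical path
entries = int(arcpy.GetCount_management(fc).getOutput(0))

# Read one chunk of each candidate size and project the full-run time;
# the breakval with the lowest projection is the one to keep tweaking around.
for breakval in (250000, 500000, 1000000, 2000000):
    sql = '"OBJECTID" BETWEEN 1 AND {0}'.format(breakval)
    c = time.clock()
    vals = [row[0] for row in arcpy.da.SearchCursor(fc, "x", sql)]
    elapsed = time.clock() - c
    print breakval, elapsed * entries / breakval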