How to efficiently access the features returned by QgsSpatialIndex?

    # assume a list of feature ids returned from index and a QgsVectorLayer 'lyr'
    fids = [1, 2, 4]
    request = QgsFeatureRequest()
    request.setFilterFids(fids)

    features = lyr.getFeatures(request)
    # can now iterate and do fun stuff:
    for feature in features:
        print feature.id(), feature

    1 <qgis._core.QgsFeature object at 0x000000000E987510>
    2 <qgis._core.QgsFeature object at 0x000000000E987400>
    4 <qgis._core.QgsFeature object at 0x000000000E987510>

In a blog post on the subject, Nathan Woodrow provides the following code:

layer = qgis.utils.iface.activeLayer()

# Select all features along with their attributes
allAttrs = layer.pendingAllAttributesList()
layer.select(allAttrs)
# Get all the features to start
allfeatures = {feature.id(): feature for (feature) in layer}

def noindex():
    for feature in allfeatures.values():
        for f in allfeatures.values():
            touches = f.geometry().touches(feature.geometry())
            # It doesn't matter if we don't return anything it's just an example

def withindex():
    # Build the spatial index for faster lookup.
    index = QgsSpatialIndex()
    map(index.insertFeature, allfeatures.values())

    # Loop each feature in the layer again and get only the features that are going to touch.
    for feature in allfeatures.values():
        ids = index.intersects(feature.geometry().boundingBox())
        for id in ids:
            f = allfeatures[id]
            touches = f.geometry().touches(feature.geometry())
            # It doesn't matter if we don't return anything it's just an example

import timeit
print "With Index: %s seconds " % timeit.timeit(withindex,number=1)
print "Without Index: %s seconds " % timeit.timeit(noindex,number=1)

This creates a dictionary allowing you to quickly look up a QgsFeature using it's FID.

I've found that for very large layers this isn't especially practical as it requires a lot of memory. However, the alternative (random access of the desired feature) using layer.getFeatures(QgsFeatureRequest().setFilterFid(fid)) seems to be comparatively very slow. I'm not sure why this is, as the equivalent call using the SWIG OGR bindings layer.GetFeature(fid) seems much faster than this.

For comparisons, look at More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc. The solution presented use the Python modules Fiona, Shapely and rtree (Spatial Index).

With PyQGIS and the same example two layers, point and polygon:

enter image description here

1) Without a spatial index:

polygons = [feature for feature in polygon.getFeatures()]
points = [feature for feature in point.getFeatures()]
for pt in points: 
    point = pt.geometry()
    for pl  in polygons:
        poly = pl.geometry()
        if poly.contains(point):
            print point.asPoint(), poly.asPolygon()
(184127,122472) [[(183372,123361), (184078,123130), (184516,122631),   (184516,122265), (183676,122144), (183067,122570), (183128,123105), (183372,123361)]]
(183457,122850) [[(183372,123361), (184078,123130), (184516,122631), (184516,122265), (183676,122144), (183067,122570), (183128,123105), (183372,123361)]]
(184723,124043) [[(184200,124737), (185368,124372), (185466,124055), (185515,123714), (184955,123580), (184675,123471), (184139,123787), (184200,124737)]]
(182179,124067) [[(182520,125175), (183348,124286), (182605,123714), (182252,123544), (181753,123799), (181740,124627), (182520,125175)]]

2) With the R-Tree PyQGIS spatial index:

# build the spatial index with all the polygons and not only a bounding box
index = QgsSpatialIndex()
for poly in polygons:
     index.insertFeature(poly)

# intersections with the index 
# indices of the index for the intersections
for pt in points:
    point = pt.geometry()
    for id in index.intersects(point.boundingBox()):
    print id
0
0
1
2

What does these indices mean ?

for i, pt in enumerate(points):
     point = pt.geometry()
     for id in index.intersects(point.boundingBox()):
        print "Point ", i, points[i].geometry().asPoint(), "is in Polygon ", id, polygons[id].geometry().asPolygon()
Point  1 (184127,122472) is in Polygon  0 [[(182520,125175), (183348,124286), (182605,123714), (182252,123544), (181753,123799), (181740,124627), (182520,125175)]]
Point  2 (183457,122850) is in Polygon  0 [[(182520,125175), (183348,124286), (182605,123714), (182252,123544), (181753,123799), (181740,124627), (182520,125175)]]
Point  4 (184723,124043) is in Polygon  1 [[(182520,125175), (183348,124286), (182605,123714), (182252,123544), (181753,123799), (181740,124627), (182520,125175)]]
Point  6 (182179,124067) is in Polygon  2 [[(182520,125175), (183348,124286), (182605,123714), (182252,123544), (181753,123799), (181740,124627), (182520,125175)]]

Same conclusions as in More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc:

Without and index, you must iterate through all the geometries (polygons and points).
With a bounding spatial index (QgsSpatialIndex()), you iterate only through the geometries which have a chance to intersect with your current geometry ('filter' which can save a considerable amount of calculations and time...).
You can also use other spatial index Python modules (rtree, Pyrtree or Quadtree) with PyQGIS as in Using a QGIS spatial index to speed up your code (with QgsSpatialIndex() and rtree)
but a Spatial Index is not a magic wand. When a very large part of the dataset has to be retrieved, a Spatial Index cannot give any speed benefit.

Other example in GIS se: How to find the nearest line to a point in QGIS? [duplicate]

How to efficiently access the features returned by QgsSpatialIndex?

Tags:

Spatial Index

Qgis

Pyqgis

Related

Recent Posts