Selecting a good SQL Server 2008 spatial index with large polygons

Splitting Data

If the query is for displaying data then you could split up your large polygons using a grid. These would be then very quick to retrieve with an index. You could remove the outlines so the features would still look contiguous.

Most commercial GIS packages will have tools to split one polygon dataset by another. Search for tools that do intersections.

If you are using OpenSource then have a look at QGIS and http://www.ftools.ca which "perform geoprocessing operations including intersections, differencing, unions, dissolves, and clipping." I've not used the latter myself.

Have a look at: http://postgis.refractions.net/docs/ch04.html#id2790790 for why large features are bad.

Filter and Intersects

There is more on the Filter clause here - Link

Spatial Indexes

Something else to check is that the spatial index is actually being used in the query plan. You may have to force the query to use the index with the WITH clause:

Link

More details on indexes below:

Link

Also try running sp_help_spatial_geometry_index for your data to see what settings to use for your spatial index

http://msdn.microsoft.com/en-us/library/cc627426.aspx

Running this SP with some test geometry produces all sorts of statistics to try and tailor your index to your data. A full list of properties is at http://msdn.microsoft.com/en-us/library/cc627425.aspx

These include values such as:

  • CellArea_To_BoundingBoxArea_Percentage_In_Level1
  • Number_Of_Rows_Selected_By_Primary_Filter

Messed Up Geometry

From the results of sp_help_spatial_geometry_index it looks like you may have issues with the geometry itself rather than the spatial index.

The Base_Table_Rows count looks to be a bug - http://connect.microsoft.com/SQLServer/feedback/details/475838/number-of-rows-in-base-table-incorrect-in-sp-help-spatial-geography-index-xml It may be worth recreating table / database and trying the index from scratch.

Total_Number_Of_ObjectCells_In_Level0_In_Index 60956 is a lot of features to return at level 0. It is likely they are either outside the spatial index extent or nulls. It then runs the Intersect (Number_Of_Times_Secondary_Filter_Is_Called 60956) on all these features which would explain why it is slow. Even though the docs claim no performance hit for null features - I believe it still has to look up the records, even if no intersect is performed.

NULL and empty instances are counted at level 0 but will not impact performance. Level 0 will have as many cells as NULL and empty instances at the base table.

The Primary_Filter_Efficiency of 0.003281055 I believe indicates 0.03% efficiency!

A few things to try:

  1. Anything strange from SELECT * FROM sys.spatial_indexes?

  2. The MakeValid statement:

    UPDATE MyTable SET GeomFieldName = GeomFieldName.MakeValid()

  3. Reset / double check SRID:

    UPDATE MyTable SET GeomFieldName.STSrid = 4326

  4. Add in some fields to show the extents of your features. This may highlight issues / NULL geometries.

    ALTER TABLE MyTable ADD MinX AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((1)).STX,0)) PERSISTED ALTER TABLE MyTable ADD MinY AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((1)).STY,0)) PERSISTED ALTER TABLE MyTable ADD MaxX AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((3)).STX,0)) PERSISTED ALTER TABLE MyTable ADD MaxY AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((3)).STY,0)) PERSISTED


In your index query you use:

CREATE SPATIAL INDEX [contasplit_sidx] ON [dbo].[ContASplit] 
(
    [geom]
)USING  GEOMETRY_GRID 
WITH (
BOUNDING_BOX =(-90, -180, 90, 180),
...

The BOUNDING_BOX therefore maps to:

xmin = -90
ymin = -180
xmax = 90
ymax = 180
  • Longtitude (-180 to 180 - designating East / West of the Meridian) should map to X
  • Latitude (-90 to 90 - designating how far North or South of the Equator) should map to Y

So to create the BOUNDING_BOX for the world you should use:

CREATE SPATIAL INDEX [contasplit_sidx] ON [dbo].[ContASplit] 
(
    [geom]
)USING  GEOMETRY_GRID 
WITH (
BOUNDING_BOX =(-180, -90, 180, 90),
...

This should create an index that fits your data and means all your features are covered by the index.