Calculating average width of polygon?

Part of the problem is to find a suitable definition of "average width." Several are natural but will differ, at least slightly. For simplicity, consider definitions based on properties that are easy to compute (which is going to rule out those based on the medial axis transform or sequences of buffers, for instance).

As an example, consider that the archetypal intuition of a polygon with a definite "width" is a small buffer (say of radius r with squared ends) around a long, fairly straight polyline (say of length L). We think of 2r = w as its width. Thus:

  • Its perimeter P is approximately equal to 2L + 2w;

  • Its area A is approximately equal to w L.

The width w and length L can then be recovered as roots of the quadratic x^2 - (P/2)x + A; in particular, we can estimate

  • w = (P - Sqrt(P^2 - 16A))/4.

When you're sure the polygon really is long and skinny, as a further approximation you can take 2L + 2w to equal 2L, whence

  • w (crudely) = 2A / P.

The relative error in this approximation is proportional to w/L: the skinnier the polygon, the closer w/L is to zero, and the better the approximation gets.

Not only is this approach extremely simple (just divide the area by the perimeter and multiply by 2), with either formula it doesn't matter how the polygon is oriented or where it is situated (because such Euclidean motions change neither the area nor the perimeter).

You might consider using either of these formulas to estimate average width for any polygons that represent street segments. The error you make in the original estimate of w (with the quadratic formula) comes about because the area A also includes tiny wedges at each bend of the original polyline. If the sum of bend angles is t radians (this is the total absolute curvature of the polyline), then really

  • P = 2L + 2w + 2 Pi t w and

  • A = L w + Pi t w^2.

Plug these into the previous (quadratic formula) solution and simplify. When the smoke clears, the contribution from the curvature term t has disappeared! What originally looked like an approximation is perfectly accurate for non-self-intersecting polyline buffers (with squared ends). For variable-width polygons, this is therefore a reasonable definition of average width.


Here I show little optimization about @whuber solution, and I'm putting in terms of "buffer width", because it is useful for integrate the solution of a more general problem: Is there a st_buffer inverse function, that returns a width estimation?

CREATE FUNCTION buffer_width(
        -- rectangular strip mean width estimator
    p_len float,   -- len of the central line of g
    p_geom geometry, -- g
    p_btype varchar DEFAULT 'endcap=flat' -- st_buffer() parameter
) RETURNS float AS $f$
  DECLARE
    w_half float;
    w float;    
  BEGIN
         w_half := 0.25*ST_Area(p_geom)/p_len;
         w      := 0.50*ST_Area( ST_Buffer(p_geom,-w_half,p_btype) )/(p_len-2.0*w_half);
     RETURN w_half+w;
  END
$f$ LANGUAGE plpgsql IMMUTABLE;

For this problem, the @celenius question about street width, sw, the solution is

 sw = buffer_width(ST_Length(g1), g2)

where sw is the "average width", g1 the central line of g2, and the street g2 is a POLYGON. I used only the OGC standard library, tested with PostGIS, and solved other serious practical applications with the same buffer_width function.

DEMONSTRATION

A2 is the area of g2, L1 the length of the central line (g1) of g2.

Supposing that we can generate g2 by g2=ST_Buffer(g1,w), and that g1 is a straight, so g2 is a rectangle with lenght L1 and width 2*w, and

    A2 = L1*(2*w)   -->  w = 0.5*A2/L1

It is not the same formula of @whuber, because here w is a half of the rectangle (g2) width. It is a good estimator, but as we can see by tests (below), is not exact, and the function use it as a clue, to reduce the g2 area, and as a final estimator.

Here we do not evaluate buffers with "endcap=square" or "endcap=round", that need a sum to A2 of an area of a point buffer with the same w.

REFERENCES: in a similar forum of 2005, W. Huber explains suchlike and other solutions.

TESTS AND REASONS

For straight lines the results, as expected, are exact. But for other geometries the results can be disappointing. The main reason is, perhaps, all the model is for exact rectangles, or for geometries that can be approximated to a "strip rectangle". Here a "test kit" for check the limits of this approximation (see wfactor in the results above).

 SELECT *, round(100.0*(w_estim-w)/w,1) as estim_perc_error
    FROM (
        SELECT btype, round(len,1) AS len, w, round(w/len,3) AS wfactor,
               round(  buffer_width(len, gbase, btype)  ,2) as w_estim ,
               round(  0.5*ST_Area(gbase)/len       ,2) as w_near
        FROM (
         SELECT
            *, st_length(g) AS len, ST_Buffer(g, w, btype) AS gbase
         FROM (
               -- SELECT ST_GeomFromText('LINESTRING(50 50,150 150)') AS g, -- straight
               SELECT ST_GeomFromText('LINESTRING(50 50,150 150,150 50,250 250)') AS g,
            unnest(array[1.0,10.0,20.0,50.0]) AS w
              ) AS t, 
             (SELECT unnest(array['endcap=flat','endcap=flat join=bevel']) AS btype
             ) AS t2
        ) as t3
    ) as t4;
    

RESULTS:

WITH RECTANGLES (central line is a STRAIGHT LINE):

         btype          |  len  |  w   | wfactor | w_estim | w_near | estim_perc_error 
------------------------+-------+------+---------+---------+--------+------------------
 endcap=flat            | 141.4 |  1.0 |   0.007 |       1 |      1 |                0
 endcap=flat join=bevel | 141.4 |  1.0 |   0.007 |       1 |      1 |                0
 endcap=flat            | 141.4 | 10.0 |   0.071 |      10 |     10 |                0
 endcap=flat join=bevel | 141.4 | 10.0 |   0.071 |      10 |     10 |                0
 endcap=flat            | 141.4 | 20.0 |   0.141 |      20 |     20 |                0
 endcap=flat join=bevel | 141.4 | 20.0 |   0.141 |      20 |     20 |                0
 endcap=flat            | 141.4 | 50.0 |   0.354 |      50 |     50 |                0
 endcap=flat join=bevel | 141.4 | 50.0 |   0.354 |      50 |     50 |                0

WITH OTHER GEOMETRIES (centerline folded):

         btype          | len |  w   | wfactor | w_estim | w_near | estim_perc_error 
 -----------------------+-----+------+---------+---------+--------+------------------
 endcap=flat            | 465 |  1.0 |   0.002 |       1 |      1 |                0
 endcap=flat join=bevel | 465 |  1.0 |   0.002 |       1 |   0.99 |                0
 endcap=flat            | 465 | 10.0 |   0.022 |    9.98 |   9.55 |             -0.2
 endcap=flat join=bevel | 465 | 10.0 |   0.022 |    9.88 |   9.35 |             -1.2
 endcap=flat            | 465 | 20.0 |   0.043 |   19.83 |  18.22 |             -0.9
 endcap=flat join=bevel | 465 | 20.0 |   0.043 |   19.33 |  17.39 |             -3.4
 endcap=flat            | 465 | 50.0 |   0.108 |   46.29 |  40.47 |             -7.4
 endcap=flat join=bevel | 465 | 50.0 |   0.108 |   41.76 |  36.65 |            -16.5

 wfactor= w/len
 w_near = 0.5*area/len
 w_estim is the proposed estimator, the buffer_width function.

About btype see ST_Buffer guide, with good ilustratins and the LINESTRINGs used here.

CONCLUSIONS:

  • the estimator of w_estim is always better than w_near;
  • for "near to rectangular" g2 geometries, its ok, any wfactor
  • for another geometries (near to "rectangular strips"), use the limit wfactor=~0.01 for 1% of error on w_estim. Up to this wfactor, use another estimator.

Caution and prevention

Why the estimation error occurs? When you use ST_Buffer(g,w), you expect, by the "rectangular strip model", that the new area added by the buffer of width w is about w*ST_Length(g) or w*ST_Perimeter(g)... When not, usually by overlays (see folded lines) or by "styling", is when the estimation of average w fault. This is the main message of the tests.

To detect this problem at any king of buffer, check the behavior of the buffer generation:

SELECT btype, w, round(100.0*(a1-len1*2.0*w)/a1)::varchar||'%' AS straight_error,  
                 round(100.0*(a2-len2*2.0*w)/a2)::varchar||'%' AS curve2_error,
                 round(100.0*(a3-len3*2.0*w)/a3)::varchar||'%' AS curve3_error
FROM (
 SELECT
    *, st_length(g1) AS len1, ST_Area(ST_Buffer(g1, w, btype)) AS a1,
    st_length(g2) AS len2, ST_Area(ST_Buffer(g2, w, btype)) AS a2,
    st_length(g3) AS len3, ST_Area(ST_Buffer(g3, w, btype)) AS a3
 FROM (
       SELECT ST_GeomFromText('LINESTRING(50 50,150 150)') AS g1, -- straight
              ST_GeomFromText('LINESTRING(50 50,150 150,150 50)') AS g2,
              ST_GeomFromText('LINESTRING(50 50,150 150,150 50,250 250)') AS g3,
              unnest(array[1.0,20.0,50.0]) AS w
      ) AS t, 
     (SELECT unnest(array['endcap=flat','endcap=flat join=bevel']) AS btype
     ) AS t2
) as t3;

RESULTS:

         btype          |  w   | straight_error | curve2_error | curve3_error 
------------------------+------+----------------+--------------+--------------
 endcap=flat            |  1.0 | 0%             | -0%          | -0%
 endcap=flat join=bevel |  1.0 | 0%             | -0%          | -1%
 endcap=flat            | 20.0 | 0%             | -5%          | -10%
 endcap=flat join=bevel | 20.0 | 0%             | -9%          | -15%
 endcap=flat            | 50.0 | 0%             | -14%         | -24%
 endcap=flat join=bevel | 50.0 | 0%             | -26%         | -36%

        alert


If you can join your polygon data to your centerline data (by either spatial or tabular means), then just sum the polygon areas for each centerline alignment and divide by the length of the centerline.