Alternative to MakeValid() for spatial data in SQL Server 2016
Let me caveat that I am playing with spatial data in SQL server for the first time (so you probably already know this first part), but it took me a while to figure out that SQL Server isn't treating (x y z) coordinates as true 3D values, it is treating them as (latitude longitude) with an optional "elevation" value, Z, which is ignored by validation and other functions.
Evidence:
select geography::STGeomFromText('LINESTRING (0 0 1, 0 1 2, 0 -1 3)', 4326)
.IsValidDetailed()
24413: Not valid because of two overlapping edges in curve (1).
Your first example seemed weird to me because (0 0 1), (0 1 2), and (0 -1 3) are not collinear in 3D space (I'm a mathematician, so I was thinking in those terms). IsValidDetailed
(and MakeValid
) is treating these as (0 0), (0 1), and (0, -1), which does make an overlapping line.
To prove it, just swap the X and Z, and it validates:
select geography::STGeomFromText('LINESTRING (1 0 0, 2 1 0, 3 -1 0)', 4326)
.IsValidDetailed()
24400: Valid
This actually makes sense if we think of these as regions or paths traced on the surface of our globe, instead of points in mathematical 3D space.
The second part of your issue is that Z (and M) point values are not preserved by SQL through functions:
Z-coordinates are not used in any calculations made by the library and are not carried through any library calculations.
This is unfortunately by design. This was reported to Microsoft in 2010, the request was closed as "Won't Fix". You might find that discussion relevant, their reasoning is:
Assigning Z and M is ambiguous, because MakeValid splits and merges spatial elements. Points often gets created, removed or moved during this process. Therefore MakeValid (and other constructions) drops Z and M values.
For example:
DECLARE @a geometry = geometry::Parse('POINT(0 0 2 2)');
DECLARE @b geometry = geometry::Parse('POINT(0 0 1 1)');
SELECT @a.STUnion(@b).AsTextZM()
Values Z and M are ambiguous for point (0 0). We decided to drop Z and M completely instead of returning half-correct result.
You can assign them later on if you know exactly how. Alternatively you can change the way you generate your objects to be valid on input, or keep two versions of your objects, one that is valid and another one that preserves all your features. If you explain your scenario better and what you do with the objects maybe we could be able to give you additional workarounds.
In addition, as you've already seen, MakeValid
can also do other unexpected things, like change the order of points, return a MULTILINESTRING, or even return a POINT object.
One idea I came across was to store them as a MULTIPOINT object instead:
The problem is when your linestring actually retraces a continuous section of line between two points that was previously traced by the line. By definition, if you're retracing existing points, then the linestring is no longer the simplest geometry that can represent this pointset, and MakeValid() will give you a multilinestring instead (and lose your Z/M values).
Unfortunately, if you're working with GPS data or similar then it's quite likely that you might have retraced your path at some point in the route, so linestrings are not always that useful in these scenarios :( Arguably, such data should be stored as a multipoint anyway since your data represents the discrete location of an object sampled at regular points in time.
In your case it validates just fine:
select geometry::STGeomFromText('MULTIPOINT (0 0 1, 0 1 2, 0 -1 3)',4326)
.IsValidDetailed()
24400: Valid
If you absolutely need to maintain these as LINESTRINGS, then you will have to write your own version of MakeValid
that slightly adjusts some of the source X or Y points by some tiny value, while still preserving Z (and doesn't do other crazy things like convert it into other object types).
I'm still working on some code, but give a look at some of the starting ideas here:
- Converting linestring to points
- Rob Farley Blog: Converting points to a path
- Create geography polyline from points in T-SQL
EDIT Ok, a few things I found while testing:
- If the geometry object is invalid, you just can't do much with it. You can't read the
STGeometryType
, you can't get theSTNumPoints
or useSTPointN
to iterate through them. If you can't useMakeValid
, you're basically stuck with operating on the text representation of the geographic object. - Using
STAsText()
will return the text representation of even an invalid object, but doesn't return Z or M values. Instead, we wantAsTextZM()
orToString()
. - You can't create a function that calls
RAND()
(functions need to be deterministic), so I just made it nudge by successively larger and larger values. I really have no idea what the precision of your data is, or how tolerant it is of small changes, so use or modify this function at your own discretion.
I have no idea if there are possible inputs that will cause this loop to go on forever. You have been warned.
CREATE FUNCTION dbo.FixBadLineString (@input geography) RETURNS geography
AS BEGIN
DECLARE @output geography
IF @input.STIsValid() = 1 --send valid objects back as-is
SET @output = @input;
ELSE IF LEFT(@input.IsValidDetailed(),6) = '24413:'
--"Not valid because of two overlapping edges in curve"
BEGIN
--make a new MultiPoint object from the LineString text
DECLARE @mp geography = geography::STGeomFromText(
REPLACE(@input.AsTextZM(), 'LINESTRING', 'MULTIPOINT'), 4326);
DECLARE @newText nvarchar(max); --to build output
DECLARE @point int
DECLARE @tinynum float = 0;
SET @output = @input;
--keep going until it validates
WHILE @output.STIsValid() = 0
BEGIN
SET @newText = 'LINESTRING (';
SET @point = 1
SET @tinynum = @tinynum + 0.00000001
--Loop through the points, add a bit and append to the new string
WHILE @point <= @mp.STNumPoints()
BEGIN
SET @newText = @newText + convert(varchar(50),
@mp.STPointN(@point).Long + @tinynum) + ' ';
SET @newText = @newText + convert(varchar(50),
@mp.STPointN(@point).Lat - @tinynum) + ' ';
SET @newText = @newText + convert(varchar(50),
@mp.STPointN(@point).Z) + ', ';
SET @tinynum = @tinynum * -2
SET @point = @point + 1
END
--close the parens and make the new LineString object
SET @newText = LEFT(@newText, LEN(@newText) - 1) + ')'
SET @output = geography::STGeomFromText(@newText, 4326);
END; --this will loop if it is still invalid
RETURN @output;
END;
--Any other unhandled error, just send back NULL
ELSE SET @output = NULL;
RETURN @output;
END
Instead of parsing the string, I chose to create a new MultiPoint
object using the same set of points, so I could iterate through them and nudge them, then reassemble a new LineString. Here's some code to test it, 3 of these values (including your sample) start invalid but are fixed:
declare @geostuff table (baddata geography)
INSERT INTO @geostuff (baddata)
SELECT geography::STGeomFromText('LINESTRING (0 0 1, 0 1 2, 0 -1 3)',4326)
UNION ALL SELECT geography::STGeomFromText('LINESTRING (0 2 0, 0 1 0.5, 0 -1 -14)',4326)
UNION ALL SELECT geography::STGeomFromText('LINESTRING (0 0 4, 1 1 40, -1 -1 23)',4326)
UNION ALL SELECT geography::STGeomFromText('LINESTRING (1 1 9, 0 1 -.5, 0 -1 3)',4326)
UNION ALL SELECT geography::STGeomFromText('LINESTRING (6 6 26.5, 4 4 42, 12 12 86)',4326)
UNION ALL SELECT geography::STGeomFromText('LINESTRING (0 0 2, -4 4 -2, 4 -4 0)',4326)
SELECT baddata.AsTextZM() as before, baddata.IsValidDetailed() as pretest,
dbo.FixBadLineString(baddata).AsTextZM() as after,
dbo.FixBadLineString(baddata).IsValidDetailed() as posttest
FROM @geostuff
This is BradC's FixBadLineString
function tweaked to use a random number between 0 and 0.000000003, thereby allowing it to scale for LINESTRINGs
with a large number of points, and also minimizing the change to the coordinates:
CREATE FUNCTION dbo.FixBadLineString (@input geography) RETURNS geography
AS BEGIN
DECLARE @output geography
IF @input.STIsValid() = 1 --send valid objects back as-is
SET @output = @input;
ELSE IF LEFT(@input.IsValidDetailed(),6) = '24413:'
--"Not valid because of two overlapping edges in curve"
BEGIN
--make a new MultiPoint object from the LineString text
DECLARE @mp geography = geography::STGeomFromText(
REPLACE(@input.AsTextZM(), 'LINESTRING', 'MULTIPOINT'), 4326);
DECLARE @newText nvarchar(max); --to build output
DECLARE @point int
SET @output = @input;
--keep going until it validates
WHILE @output.STIsValid() = 0
BEGIN
SET @newText = 'LINESTRING (';
SET @point = 1
--Loop through the points, add/subtract a random value between 0 and 3E-9 and append to the new string
WHILE @point <= @mp.STNumPoints()
BEGIN
SET @newText = @newText + convert(varchar(50),
CAST(@mp.STPointN(@point).Long AS NUMERIC(18,9)) +
CAST(ABS(CHECKSUM(PWDENCRYPT(N''))) / 644245094100000000 AS NUMERIC(18,9))) + ' ';
SET @newText = @newText + convert(varchar(50),
CAST(@mp.STPointN(@point).Lat AS NUMERIC(18,9)) -
CAST(ABS(CHECKSUM(PWDENCRYPT(N''))) / 644245094100000000 AS NUMERIC(18,9))) + ' ';
SET @newText = @newText + convert(varchar(50),
@mp.STPointN(@point).Z) + ', ';
SET @point = @point + 1
END
--close the parens and make the new LineString object
SET @newText = LEFT(@newText, LEN(@newText) - 1) + ')'
SET @output = geography::STGeomFromText(@newText, 4326);
END; --this will loop if it is still invalid
RETURN @output;
END;
--Any other unhandled error, just send back NULL
ELSE SET @output = NULL;
RETURN @output;
END