Self-split a GeoPandas LineString GeoDataFrame in a fast way without losing the attributes
I found a solution.
Using my example:
a) The original shapefile
import geopandas as gpd
df = gpd.read_file("stac-graphe.shp")
df
id test geometry
1 test1 LINESTRING (10.244 -273.317, 784.201 -222.924)
2 test2 LINESTRING (210.484 -553.461, 324.991 -4.534)
3 test3 LINESTRING (169.970 -134.276, 126.511 -218.533...
4 test4 LINESTRING (100.000 -433.317, 724.390 -112.341...
5 test5 LINESTRING (232.683 -113.317, 694.146 -445.024...
6 test6 LINESTRING (563.415 -552.341, 559.512 -22.585)
b) Buffer the original geometry to avoid float arithmetic problems (in intersects or within)
df2 = df.copy()
df2.geometry = df2.geometry.buffer(0.01)
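To see why the tolerance helps, here is a minimal Shapely sketch with two made-up lines (the coordinates and the names a, b and pt are illustrative, not taken from the shapefile). The crossing point is computed in floating point, so it is not guaranteed to sit exactly on either line, whereas a 0.01 buffer absorbs that rounding.

```python
from shapely.geometry import LineString

# Two crossing lines with made-up coordinates (assumption, not the real data)
a = LineString([(0, 0), (10, 3)])
b = LineString([(3, -5), (4, 7)])

# The crossing point is the result of floating-point arithmetic
pt = a.intersection(b)

# These distances may be tiny non-zero values rather than exactly 0.0
print(a.distance(pt), b.distance(pt))

# Exact predicates on the raw lines can therefore be fragile
print(pt.within(a), pt.within(b))

# Predicates against the buffered lines tolerate the rounding
print(pt.within(a.buffer(0.01)), pt.within(b.buffer(0.01)))
```

The buffer width is a judgment call: it should be larger than the rounding error but much smaller than the shortest piece you expect after splitting.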
c) Use unary_union to split all the self-intersecting LineStrings
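un = df.geometry.unary_union  # on GeoPandas >= 1.0, df.geometry.union_all() replaces this
geom = list(un.geoms)  # Shapely 2.x no longer allows iterating a multi-geometry directly
ids = list(range(len(geom)))  # renamed from id to avoid shadowing the built-in
unary = gpd.GeoDataFrame({"id": ids, "geometry": geom})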
unary.head()
id geometry
0 LINESTRING (10.244 -273.317, 192.920 -261.423)
1 LINESTRING (192.920 -261.423, 272.484 -256.242)
2 LINESTRING (272.484 -256.242, 418.308 -246.748)
3 LINESTRING (418.308 -246.748, 469.403 -243.421)
4 LINESTRING (469.403 -243.421, 561.095 -237.451)
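As a small illustration of what the union does, the sketch below uses shapely.ops.unary_union on two made-up crossing lines (the coordinates and variable names are assumptions for the example). The lines are cut at their crossing point, and the result is a bare MultiLineString: the pieces carry no attributes, which is why the join in the next step is needed.

```python
from shapely.geometry import LineString
from shapely.ops import unary_union

# Two crossing lines with made-up coordinates (assumption, not the real data)
a = LineString([(0, 0), (10, 3)])
b = LineString([(3, -5), (4, 7)])

# The union nodes the lines, cutting each one at the crossing point
merged = unary_union([a, b])
pieces = list(merged.geoms)

print(merged.geom_type, len(pieces))

# The total length is unchanged; only the segmentation differs
print(sum(p.length for p in pieces), a.length + b.length)
```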
d) Use a spatial join (with within or intersects) to join the two dataframes and retrieve the original attributes
from geopandas.tools import sjoin
# GeoPandas >= 0.10 names this argument predicate; older releases used op
result = sjoin(unary, df2, how="inner", predicate="within")
result.head()
id_left geometry index_right id_right test
0 LINESTRING (10.244 -273.317, 192.920 -261.423) 0 1 test1
1 LINESTRING (192.920 -261.423, 272.484 -256.242) 0 1 test1
2 LINESTRING (272.484 -256.242, 418.308 -246.748) 0 1 test1
3 LINESTRING (418.308 -246.748, 469.403 -243.421) 0 1 test1
4 LINESTRING (469.403 -243.421, 561.095 -237.451) 0 1 test1
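For anyone without the shapefile, steps a) to d) can be sketched end to end on a tiny made-up GeoDataFrame. The coordinates, the piece_id column and the variable names are assumptions for the example, and it assumes GeoPandas 0.10 or later for the predicate argument. It calls shapely.ops.unary_union instead of the unary_union property so that it behaves the same across GeoPandas versions.

```python
import geopandas as gpd
from shapely.geometry import LineString
from shapely.ops import unary_union

# a) Made-up stand-in for the shapefile: two crossing lines with attributes
df = gpd.GeoDataFrame(
    {"id": [1, 2], "test": ["test1", "test2"]},
    geometry=[LineString([(0, 0), (10, 3)]), LineString([(3, -5), (4, 7)])],
)

# b) Buffered copy used only as the join target
df2 = df.copy()
df2.geometry = df2.geometry.buffer(0.01)

# c) Split every line at its crossings with the others
merged = unary_union(list(df.geometry))
parts = list(merged.geoms)
pieces = gpd.GeoDataFrame(
    {"piece_id": list(range(len(parts)))}, geometry=parts
)

# d) Recover the attributes: each piece lies within its parent's buffer
result = gpd.sjoin(pieces, df2, how="inner", predicate="within")

print(result[["piece_id", "id", "test"]])
```

The result keeps the split line geometry from the left frame, so the buffered polygons never appear in the output.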