How can I use Mathematica to sort the US Congressional Districts by Perimeter÷Area
The US congressional districts are available as geographic entities in the form
Entity["USCongressionalDistrict", "Illinois:District7"]
As such, in principle you can enumerate them using
GeoEntities[Entity["Country", "UnitedStates"], "USCongressionalDistrict"]
but this is liable to timing out because you're asking for a lot of data, so instead you can break down the query into per-state lookups, giving you something like
Length[
districts = Flatten[GeoEntities[#, "USCongressionalDistrict"] & /@
GeoEntities[Entity["Country", "UnitedStates"], "USState"]]
]
(*449*)
(Or, as Michael pointed out, using the built-in EntityList["USCongressionalDistrict"]
is way better.)
To get the data you want, you can use GeoArea
. To get the perimeter you can apply GeoDistance
to the data but you need to fish it out first. You can get the shape of the district out using EntityValue[#, "Polygon"]
and this will return things of the form
Polygon[ GeoPosition[ {{lat1, lon1}, {lat2, lon2}, …, } ] ]
and you can't run GeoDistance
on the Polygon
- only on the GeoPosition
object inside it or the lists of latitudes and longitudes inside that. Thus you can use
perimeter[polygon_] := GeoDistance[polygon[[1]]]
and to match it up you can define
area[district_] := EntityValue[district, "Area"]
To do the calculation, then, you can run
polygons = EntityValue[districts, "Polygon"]
which is relatively fast.
There is a slight complication with the data, though, and it's that not all districts appear as a single polygon. In fact, if you define districts = EntityList["USCongressionalDistrict"]
and polygons
as above, then there are
Select[polygons, Length[#] > 1 &] // Length
= 15 districts with multiple polygons. This is a bit surprising, since US congressional districts are meant to be territorially contiguous (and hence the gerrymandering stuff), but 11 of these have islands, which is fairly reasonable. Nevertheless, there are districts like the Tennessee 2nd district which are not, in fact, contiguous, even though their exclaves are separated only by land. To visualize these 15 districts, use
multiPolygonDistricts = Pick[districts, Length[#] > 1 & /@ polygons]
GeoGraphics[EntityValue[#, "Polygon"], PlotLabel -> #,
ImageSize -> {{250}, {250}}] & /@ multiPolygonDistricts
In addition to this, there are some districts which don't have polygon data available, such as
EntityValue[Entity["USCongressionalDistrict","Pennsylvania:District19"],"Polygon"]
(*Missing["NotAvailable"]*)
There are 12 such districts and you can find them using
Pick[districts, MissingQ /@ polygons]
These turn out to be historical districts (like the aforementioned PA 19th) that are no longer in use, and as of November 2016 and MM v11 they are no longer present in the data.
There are also some irregularities in the formatting of the polygon data, so for example
EntityValue[Entity["USCongressionalDistrict", "Tennessee:District9"], "Polygon"]
has an output of the form {Polygon[_]}
, but
EntityValue[Entity["USCongressionalDistrict", "Texas:District1"], "Polygon"]
has an output of the form {{Polygon[_]}}
with a deeper list, whereas
EntityValue[Entity["USCongressionalDistrict", "Alaska:District0"], "Polygon"]
returns a straight Polygon[_]
, though its insides are all messed up (i.e. it contains multiple polygons inside it, instead of a list of polygons, which will definitely mess with the perimeter
calculator above); either way, if you're studying gerrymandering you presumably want to leave out this district as it's the entirety of Alaska. Again, this is buggy data which I'll leave to you to report.
EDIT: some issues with the data seem to have been resolved, but it's not necessarily all good yet. It's also not clear to me whether the fix is version-dependent (or fixed upstream at the server), either. So: watch out for this if you want to implement this.
To get clean data, then, you can do
cleanPolygons = Flatten /@ DeleteCases[DeleteMissing[polygons], Polygon[_]];
cleanDistricts = DeleteCases[Pick[districts, Not@*MissingQ /@ polygons],
Entity["USCongressionalDistrict", "Alaska:District0"]];
You can now calculate the perimeters using
perimeters = Total /@ Map[perimeter, cleanPolygons, {2}]
which is now quite fast as well (the previous implementation was slowing the whole thing down). Calculating the areas as
areas = EntityValue[cleanDistricts, "Area"]
you finally have enough to get your rankings: doing
TableForm[
(data = SortBy[{EntityValue[cleanDistricts, "Name"], perimeters^2/areas
}\[Transpose], Minus@*Last])[[1 ;; 5]]
]
gets you for the top five
District p^2/A
"North Carolina 12th Congressional District" 365.611
"Florida 5th Congressional District" 244.518
"North Carolina 1st Congressional District" 234.952
"North Carolina 4th Congressional District" 228.47
"Maryland 3rd Congressional District" 226.788
And, indeed, you get some pretty heavy gerrymandering going on:
As an interesting trivia, note that the fourth place on this list, Maryland's 3rd district, really stretches the definition for "contiguous" - if you zoom in on the map into the Baltimore harbour, there's not a lot of district land connecting Federal Hill and Little Italy.
It's also important to note that it matters what year the data is from: Mathematica seems to be using the 2014 electoral districts, which have since changed in some states (North Carolina in particular, but also Florida and Virginia) precisely because they were deemed to be too gerrymandered. For a good illustration of the changes see this NPR animation for the NC redistricting.
First (update: I guess the missing ones should be deleted),
alldistricts = DeleteMissing@ EntityList["USCongressionalDistrict"];
allpolygons = #["Polygon"] & /@ alldistricts (* // DeleteMissing *);
Then find the GeoArea[poly]
and perimeter* of them. Note that you might want to compute a dimensionless quantity such as perimeter^2/area.
Update:
The perimeter was a little tricker than my original blithe remark led one to believe. Here is an improvement over GeoDistance
, at least in terms of reliability:
perimeters = Total[allpolygons /. Polygon :> GeoLength@*GeoPath, {2}]
areas = Total[GeoArea /@ allpolygons, {2}] (* why not put in the areas, too *)