Creating a subset of XY data by most recent date
Maybe you could use the dissolve geoprocessing tool with Point as the dissolve field and specify Date as the statistics field with the MAX statistic type.
http://desktop.arcgis.com/en/arcmap/latest/tools/data-management-toolbox/dissolve.htm
It has many statistics type available too which has the range, first, last types too.
I suggest you to use GeoPandas python package. You can use it within ArcMap Python Console if you install it.
Sample data:
import geopandas as gp
from datetime import datetime
# open data file
df = gp.read_file("C:/PATH/TO/YOUR/FILE.shp")
# convert "SAMP_DATE" to datetime and add to new DATE column
# as far as I know, GeoPandas gets date type as string
df["DATE"] = df["SAMP_DATE"].apply(lambda x:datetime.strptime(x, '%m/%d/%Y'))
Note: If date format in your data is different, change '%m/%d/%Y'
pattern. It seems the pattern is '%m/%d/%Y'
in your data. Mine is '%Y-%m-%d'
.
# sort, group, get first (most recent)
df_recent = df.sort_values(['X', 'Y', 'DATE', 'SAMP_TIME'], ascending=False) \
.groupby(['X', 'Y']) \ # group by points having same coordinates
.first() \ # get first row in group
.reset_index()
Result of .sort_values()
and .groupby()
:
Note: groupby
preserves the order of rows within each group. SAMP_TIME
is string and '15:50'
string is bigger than '10:30'
string. So, you don't need to convert that string to datetime etc.
Result of .first()
: (most recent data you want)
# remove DATE column
df_recent.drop("DATE", axis=1, inplace=True)
# convert DataFrame to GeoDataFrame
df_recent= gp.GeoDataFrame(df_recent, crs=df.crs)
# Save the most recent data
df_recent.to_file("C:/PATH/TO/NEW/FILE.shp")
All code without comments:
import geopandas as gp
from datetime import datetime
df = gp.read_file("C:/PATH/TO/YOUR/FILE.shp")
df["DATE"] = df["SAMP_DATE"].apply(lambda x:datetime.strptime(x, '%m/%d/%Y'))
df_recent = df.sort_values(['X', 'Y', 'DATE', 'SAMP_TIME'], ascending=False).groupby(['X', 'Y']).first().reset_index()
df_recent.drop("DATE", axis=1, inplace=True)
df_recent= gp.GeoDataFrame(df_recent, crs=df.crs)
df_recent.to_file("C:/PATH/TO/NEW/FILE.shp")
Here are steps I followed and got correct results. Do let me know me know how it works for you as it was tested with my own sample data
- Create new field as
StringTime
- calculate
StringTime = datetime.datetime.strptime( !Date! ,"%d-%m-%Y").strftime('%Y%m%d')
- Dissolve layer using field
X_Point
andY_Point
and in statistics useMax(StringTime)
andMax(Time)