Querying Esri File Geodatabase?
Personal geodatabases are great because they can be accessed outside of ArcGIS. However, they are slow and, in practice, performance starts degrading once they grow beyond 400-500 MB. If you can, do migrate to file geodatabases.
There are many ways you can query your data using SQL-like syntax:
- The Select By Attributes geoprocessing tool (or the equivalent menu command) will let you run a SQL where clause.
- You can use the Python window in ArcMap for running Python code. For instance, arcpy.da.SearchCursor provides a rich interface for running your SQL queries while limiting the recordset. Look for the sql_clause argument in the SearchCursor help topic: the SQL prefix supports None, DISTINCT, and TOP; the SQL postfix supports None, ORDER BY, and GROUP BY.
Some examples of SQL queries:
- Select distinct sub regions for all states table
The code:
import arcpy
states = r'C:\Program Files (x86)\ArcGIS\Desktop10.4\TemplateData\TemplateData.gdb\USA\states'
sql = ('DISTINCT', None)
with arcpy.da.SearchCursor(states, "SUB_REGION", sql_clause=sql) as cur:
    for row in cur:
        print row[0]
#E N Cen
#S Atl
#Mid Atl
#Mtn
#E S Cen
#N Eng
#W S Cen
#W N Cen
#Pacific
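If your data source does not support the DISTINCT prefix in sql_clause (older formats such as shapefiles do not), the same result can be collected client-side with a Python set. A minimal sketch, using plain tuples to stand in for the rows a SearchCursor would yield:

```python
# Collect distinct values client-side when the DISTINCT prefix is
# not supported by the data source. These tuples stand in for the
# rows an arcpy.da.SearchCursor over the states table would yield.
rows = [
    ('Pacific',), ('Mtn',), ('Pacific',), ('N Eng',), ('Mtn',),
]

# A set comprehension keeps only unique SUB_REGION values.
distinct = sorted({row[0] for row in rows})
print(distinct)  # ['Mtn', 'N Eng', 'Pacific']
```

This trades memory for compatibility: every row is read, but it works against any cursor-capable source.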
- Group counties by state and fips code
The code:
import arcpy
counties = r'C:\Program Files (x86)\ArcGIS\Desktop10.4\TemplateData\TemplateData.gdb\USA\counties'
sql = (None, 'GROUP BY STATE_NAME, STATE_FIPS')
with arcpy.da.SearchCursor(counties, ["STATE_NAME", "STATE_FIPS"], sql_clause=sql) as cur:
    for row in cur:
        print row[0], row[1]
#Minnesota 27
#Washington 53
#Idaho 16
#Montana 30
#North Dakota 38
#Maine 23
#Wisconsin 55
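Note that the GROUP BY postfix only collapses rows; it gives you no aggregate functions such as COUNT. If you actually need counts per group (e.g. counties per state), one option is to aggregate client-side. A sketch with collections.Counter, again using plain tuples in place of cursor rows:

```python
from collections import Counter

# (STATE_NAME, STATE_FIPS) per county, standing in for rows read
# from a SearchCursor over the counties feature class.
rows = [
    ('Minnesota', '27'), ('Minnesota', '27'),
    ('Washington', '53'),
    ('Idaho', '16'), ('Idaho', '16'), ('Idaho', '16'),
]

# Count rows per state, i.e. the SQL "GROUP BY state / COUNT(*)".
counts = Counter(row[0] for row in rows)
for state, n in counts.most_common():
    print(state, n)
```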
The pandas Python package provides a rich interface for data analysis. It comes pre-installed with ArcGIS 10.4 and can be installed easily for pre-10.4 versions of ArcGIS. There are great samples available that translate SQL queries into pandas syntax.
- Create a data frame from a feature class
The code:
import arcpy
import pandas as pd

states = r'C:\Program Files (x86)\ArcGIS\Desktop10.4\TemplateData\TemplateData.gdb\USA\states'
fields = ['AREA', 'STATE_NAME', 'SUB_REGION']
df = pd.DataFrame.from_records([f for f in arcpy.da.SearchCursor(states, fields, "SUB_REGION = 'Pacific'")])
df.columns = fields
print df
AREA STATE_NAME SUB_REGION
0 6381 Hawaii Pacific
1 67290 Washington Pacific
2 97074 Oregon Pacific
3 157776 California Pacific
4 576594 Alaska Pacific
Running SQL ORDER BY:
df.sort_values('AREA',ascending=False)
AREA STATE_NAME SUB_REGION
4 576594.104 Alaska Pacific
3 157776.310 California Pacific
2 97073.594 Oregon Pacific
1 67290.061 Washington Pacific
0 6380.614 Hawaii Pacific
Running SQL TOP:
df.nlargest(3,'AREA')
AREA STATE_NAME SUB_REGION
4 576594.104 Alaska Pacific
3 157776.310 California Pacific
2 97073.594 Oregon Pacific
As a last resort, you might consider keeping a copy of your data in another free DBMS (such as SQL Server Express, PostgreSQL with PostGIS, or SQLite) to be able to take advantage of full SQL querying. You can execute SQL queries against the DBMS from arcpy using any decent Python package such as pymssql, SQLAlchemy, or psycopg2. Python also has a built-in sqlite3 module for working with SQLite databases (you can copy your data into the file-based database and run your queries there).
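The sqlite3 route is particularly convenient because the module ships with Python, so nothing extra needs installing. A minimal sketch, assuming a made-up states table; in practice you would feed executemany() with rows read from a SearchCursor:

```python
import sqlite3

# In-memory database; pass a file path instead to persist it.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE states (state_name TEXT, sub_region TEXT, area REAL)')

# In practice these rows would come from an arcpy.da.SearchCursor.
rows = [
    ('Alaska', 'Pacific', 576594.104),
    ('California', 'Pacific', 157776.310),
    ('Oregon', 'Pacific', 97073.594),
    ('Hawaii', 'Pacific', 6380.614),
]
conn.executemany('INSERT INTO states VALUES (?, ?, ?)', rows)

# Full SQL is now available: ORDER BY, LIMIT (SQLite's TOP), aggregates, joins.
for name, area in conn.execute(
        'SELECT state_name, area FROM states '
        'WHERE sub_region = ? ORDER BY area DESC LIMIT 3', ('Pacific',)):
    print(name, area)
```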
I would recommend staying with the .gdb since it sounds like only ArcGIS Desktop users will be querying the data. Here are some general concepts to help enhance querying:
- To query data, the first recommendation is to use arcpy.da cursors (they read only the fields you list, which makes querying tables with many fields more efficient), combined with if conditional logic to evaluate the values you are looking for. You may also use the Select Layer By Attribute arcpy method.
- Apply attribute and spatial indexes to improve query performance.
- Use a filter expression (where clause) on cursors to query subsets within subsets of data.
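The last point can be sketched without arcpy: push the coarse filter into the cursor's where clause, then refine the resulting subset in Python. The tuples below stand in for rows a SearchCursor with where_clause="SUB_REGION = 'Pacific'" would already have narrowed down, and the area threshold is an arbitrary example:

```python
# (STATE_NAME, SUB_REGION, AREA) rows, already restricted to the
# Pacific sub-region by the cursor's where clause (first filter).
pacific = [
    ('Alaska', 'Pacific', 576594.104),
    ('California', 'Pacific', 157776.310),
    ('Oregon', 'Pacific', 97073.594),
    ('Hawaii', 'Pacific', 6380.614),
]

# Second-stage filter applied in Python over the smaller subset.
big = [row for row in pacific if row[2] > 100000]
for name, _, area in big:
    print(name, area)
```

Doing the coarse cut in SQL keeps the rows crossing the cursor to a minimum, while the Python stage can express conditions the limited file-geodatabase SQL dialect cannot.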