Copying ArcSDE geodatabase to file geodatabase using ArcPy?
I know this post is a little old, but I thought I would share my answer since I was faced with the same issue. The following script SHOULD copy all tables, feature classes, and relationship classes that are not in a dataset, and it will also copy over all datasets, including the feature classes, topology, etc. within them. It will skip over any errors during the copy and keep going. It produces a log file that records data such as the source database item count and the destination item count, so you can compare the copy, and it logs any errors it encounters as well.
import arcpy, os, shutil, time
import logging as log
from datetime import datetime
def formatTime(x):
minutes, seconds_rem = divmod(x, 60)
if minutes >= 60:
hours, minutes_rem = divmod(minutes, 60)
return "%02d:%02d:%02d" % (hours, minutes_rem, seconds_rem)
    else:
        return "00:%02d:%02d" % (minutes, seconds_rem)
def getDatabaseItemCount(workspace):
arcpy.env.workspace = workspace
feature_classes = []
for dirpath, dirnames, filenames in arcpy.da.Walk(workspace,datatype="Any",type="Any"):
for filename in filenames:
feature_classes.append(os.path.join(dirpath, filename))
return feature_classes, len(feature_classes)
def replicateDatabase(dbConnection, targetGDB):
startTime = time.time()
featSDE,cntSDE = getDatabaseItemCount(dbConnection)
featGDB,cntGDB = getDatabaseItemCount(targetGDB)
now = datetime.now()
logName = now.strftime("SDE_REPLICATE_SCRIPT_%Y-%m-%d_%H-%M-%S.log")
log.basicConfig(datefmt='%m/%d/%Y %I:%M:%S %p', format='%(asctime)s %(message)s',\
filename=logName,level=log.INFO)
print "Old Target Geodatabase: %s -- Feature Count: %s" %(targetGDB, cntGDB)
log.info("Old Target Geodatabase: %s -- Feature Count: %s" %(targetGDB, cntGDB))
print "Geodatabase being copied: %s -- Feature Count: %s" %(dbConnection, cntSDE)
log.info("Geodatabase being copied: %s -- Feature Count: %s" %(dbConnection, cntSDE))
arcpy.env.workspace = dbConnection
#deletes old targetGDB
try:
shutil.rmtree(targetGDB)
print "Deleted Old %s" %(os.path.split(targetGDB)[-1])
log.info("Deleted Old %s" %(os.path.split(targetGDB)[-1]))
except Exception as e:
print e
log.info(e)
#creates a new targetGDB
GDB_Path, GDB_Name = os.path.split(targetGDB)
print "Now Creating New %s" %(GDB_Name)
log.info("Now Creating New %s" %(GDB_Name))
arcpy.CreateFileGDB_management(GDB_Path, GDB_Name)
datasetList = [arcpy.Describe(a).name for a in arcpy.ListDatasets()]
featureClasses = [arcpy.Describe(a).name for a in arcpy.ListFeatureClasses()]
tables = [arcpy.Describe(a).name for a in arcpy.ListTables()]
#Compiles a list of the previous three lists to iterate over
allDbData = datasetList + featureClasses + tables
for sourcePath in allDbData:
targetName = sourcePath.split('.')[-1]
targetPath = os.path.join(targetGDB, targetName)
        if not arcpy.Exists(targetPath):
            try:
                print "Attempting to Copy %s to %s" %(targetName, targetPath)
                log.info("Attempting to Copy %s to %s" %(targetName, targetPath))
arcpy.Copy_management(sourcePath, targetPath)
print "Finished copying %s to %s" %(targetName, targetPath)
log.info("Finished copying %s to %s" %(targetName, targetPath))
except Exception as e:
print "Unable to copy %s to %s" %(targetName, targetPath)
print e
log.info("Unable to copy %s to %s" %(targetName, targetPath))
log.info(e)
else:
print "%s already exists....skipping....." %(targetName)
log.info("%s already exists....skipping....." %(targetName))
featGDB,cntGDB = getDatabaseItemCount(targetGDB)
print "Completed replication of %s -- Feature Count: %s" %(dbConnection, cntGDB)
log.info("Completed replication of %s -- Feature Count: %s" %(dbConnection, cntGDB))
totalTime = (time.time() - startTime)
totalTime = formatTime(totalTime)
log.info("Script Run Time: %s" %(totalTime))
if __name__== "__main__":
databaseConnection = r"YOUR_SDE_CONNECTION"
targetGDB = "DESTINATION_PATH\\SDE_Replicated.gdb"
replicateDatabase(databaseConnection, targetGDB)
I had really good luck with this. I was replicating an SDE database to a file geodatabase. I haven't done extensive testing of this script, though, since it fulfilled all my needs. I tested it using ArcGIS 10.3. Also, one thing to note: I was in talks with someone who has used this script, and they ran into errors copying certain datasets due to improper permissions and empty tables.
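If you hit the empty-table problem, one workaround is a small pre-flight check before calling Copy_management. This is only a sketch; the helper name is mine and is not part of the script above:
import arcpy
def isEmpty(item_path):
    """Returns True when a table or feature class has no rows."""
    # GetCount returns a Result object; getOutput(0) is the row count as a string
    return int(arcpy.GetCount_management(item_path).getOutput(0)) == 0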
Lemur - why not create your relationships based on a global ID instead of the object ID? That way your relationships would be preserved. If you haven't created global IDs, I would highly recommend it.
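If you want to try that, a minimal sketch using the AddGlobalIDs_management geoprocessing tool would be something like the following (the connection path is a placeholder):
import arcpy
# Placeholder connection; point this at your own SDE workspace
arcpy.env.workspace = r"YOUR_SDE_CONNECTION"
# AddGlobalIDs accepts a list of feature classes, tables, or datasets
items = arcpy.ListFeatureClasses() + arcpy.ListTables()
if items:
    arcpy.AddGlobalIDs_management(items)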
Update: I added a little more logic to the code to handle bad database connection paths, along with better logging and error handling:
import time, os, datetime, sys, logging, logging.handlers, shutil
import arcpy
########################## user defined functions ##############################
def getDatabaseItemCount(workspace):
    """returns the item count in provided database"""
    log = logging.getLogger("script_log")
arcpy.env.workspace = workspace
feature_classes = []
log.info("Compiling a list of items in {0} and getting count.".format(workspace))
for dirpath, dirnames, filenames in arcpy.da.Walk(workspace,datatype="Any",type="Any"):
for filename in filenames:
feature_classes.append(os.path.join(dirpath, filename))
log.info("There are a total of {0} items in the database".format(len(feature_classes)))
return feature_classes, len(feature_classes)
def replicateDatabase(dbConnection, targetGDB):
log = logging.getLogger("script_log")
startTime = time.time()
if arcpy.Exists(dbConnection):
featSDE,cntSDE = getDatabaseItemCount(dbConnection)
log.info("Geodatabase being copied: %s -- Feature Count: %s" %(dbConnection, cntSDE))
if arcpy.Exists(targetGDB):
featGDB,cntGDB = getDatabaseItemCount(targetGDB)
log.info("Old Target Geodatabase: %s -- Feature Count: %s" %(targetGDB, cntGDB))
try:
shutil.rmtree(targetGDB)
log.info("Deleted Old %s" %(os.path.split(targetGDB)[-1]))
except Exception as e:
log.info(e)
GDB_Path, GDB_Name = os.path.split(targetGDB)
log.info("Now Creating New %s" %(GDB_Name))
arcpy.CreateFileGDB_management(GDB_Path, GDB_Name)
arcpy.env.workspace = dbConnection
try:
datasetList = [arcpy.Describe(a).name for a in arcpy.ListDatasets()]
        except Exception as e:
datasetList = []
log.info(e)
try:
featureClasses = [arcpy.Describe(a).name for a in arcpy.ListFeatureClasses()]
        except Exception as e:
featureClasses = []
log.info(e)
try:
tables = [arcpy.Describe(a).name for a in arcpy.ListTables()]
        except Exception as e:
tables = []
log.info(e)
#Compiles a list of the previous three lists to iterate over
allDbData = datasetList + featureClasses + tables
for sourcePath in allDbData:
targetName = sourcePath.split('.')[-1]
targetPath = os.path.join(targetGDB, targetName)
if not arcpy.Exists(targetPath):
try:
log.info("Atempting to Copy %s to %s" %(targetName, targetPath))
arcpy.Copy_management(sourcePath, targetPath)
log.info("Finished copying %s to %s" %(targetName, targetPath))
except Exception as e:
log.info("Unable to copy %s to %s" %(targetName, targetPath))
log.info(e)
else:
log.info("%s already exists....skipping....." %(targetName))
featGDB,cntGDB = getDatabaseItemCount(targetGDB)
log.info("Completed replication of %s -- Feature Count: %s" %(dbConnection, cntGDB))
else:
log.info("{0} does not exist or is not supported! \
Please check the database path and try again.".format(dbConnection))
#####################################################################################
def formatTime(x):
minutes, seconds_rem = divmod(x, 60)
if minutes >= 60:
hours, minutes_rem = divmod(minutes, 60)
return "%02d:%02d:%02d" % (hours, minutes_rem, seconds_rem)
    else:
        return "00:%02d:%02d" % (minutes, seconds_rem)
if __name__ == "__main__":
startTime = time.time()
now = datetime.datetime.now()
############################### user variables #################################
'''change these variables to the location of the database being copied, the target
database location and where you want the log to be stored'''
logPath = ""
databaseConnection = "path_to_sde_or_gdb_database"
targetGDB = "apth_to_replicated_gdb\\Replicated.gdb"
############################### logging items ###################################
# Make a global logging object.
logName = os.path.join(logPath,(now.strftime("%Y-%m-%d_%H-%M.log")))
log = logging.getLogger("script_log")
log.setLevel(logging.INFO)
h1 = logging.FileHandler(logName)
h2 = logging.StreamHandler()
f = logging.Formatter("[%(levelname)s] [%(asctime)s] [%(lineno)d] - %(message)s",'%m/%d/%Y %I:%M:%S %p')
h1.setFormatter(f)
h2.setFormatter(f)
h1.setLevel(logging.INFO)
h2.setLevel(logging.INFO)
log.addHandler(h1)
log.addHandler(h2)
log.info('Script: {0}'.format(os.path.basename(sys.argv[0])))
try:
########################## function calls ######################################
replicateDatabase(databaseConnection, targetGDB)
################################################################################
    except Exception as e:
log.exception(e)
totalTime = formatTime((time.time() - startTime))
log.info('--------------------------------------------------')
log.info("Script Completed After: {0}".format(totalTime))
log.info('--------------------------------------------------')
The only way you can get a true copy of the data (domains, datasets, relationships, etc.) is to use the manual copy-and-paste method inside Catalog. ESRI has not yet given us the ability to transfer this data any other way with a single operation that can be scripted easily.
I have a nightly process that copies my two primary SDE Databases to file geodatabases for Continuity of Operations. This is so that in the event of an emergency my staff has some data to work with until my IT shop can rebuild my SDE from backup. After much trial and error I have decided we can live with the limitations of using FeatureClassToFeatureClass_conversion and TableToTable_conversion to transfer our data over every night.
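The core of that nightly job is nothing more than a loop over the two conversion tools. A simplified sketch (the paths are placeholders, and I am leaving out our scheduling and logging):
import arcpy
# Placeholder paths -- substitute your SDE connection and nightly target GDB
arcpy.env.workspace = r"YOUR_SDE_CONNECTION"
target_gdb = r"C:\COOP\Nightly_Copy.gdb"
# Each item is rebuilt in the target, which is why the ObjectIDs get reset
for fc in arcpy.ListFeatureClasses():
    arcpy.FeatureClassToFeatureClass_conversion(fc, target_gdb, fc.split('.')[-1])
for tbl in arcpy.ListTables():
    arcpy.TableToTable_conversion(tbl, target_gdb, tbl.split('.')[-1])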
Yes, we lose some of the functionality of the geodatabase, but the process now runs unattended at night and the data is ready to go as soon as I get it. In my case the only functionality that we are truly missing (assuming we are operating in an emergency mode) is that my relationship classes are broken, because the conversion resets the ObjectIDs that link the two tables.
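If you take the Global ID advice from earlier, you can rebuild the link against the copied data afterwards. A hedged sketch with made-up table and field names:
import arcpy
# All names below are hypothetical; substitute your own tables and key fields
arcpy.env.workspace = r"C:\COOP\Nightly_Copy.gdb"
arcpy.CreateRelationshipClass_management(
    "Poles",              # origin feature class
    "Inspections",        # destination table
    "Poles_Inspections",  # output relationship class
    "SIMPLE",
    "Inspections",        # forward path label
    "Poles",              # backward path label
    "NONE",               # message direction
    "ONE_TO_MANY",
    "NONE",               # not attributed
    "GlobalID",           # origin primary key
    "PoleGlobalID")       # origin foreign key stored in Inspections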
Until ESRI gives us more options, you will have to ask what you are willing to sacrifice at the moment: time and effort, or functionality?
I have used a script similar to Peter's above and had good luck, although his is better. One thing to point out that might trip someone up: if you are using 64-bit Python geoprocessing and you have ArcFM loaded on top of ESRI, it will fail on all features that have been set to use ArcFM or Designer, with ERROR 000260. This is because you have to use 32-bit Python, or the ArcFM components will not license properly.
For a more detailed description of using 32-bit ArcPy, see the first two comments on this thread in Exchange:
https://infrastructurecommunity.schneider-electric.com/docs/DOC-2563
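If you are not sure which interpreter a scheduled task is actually running, this quick standard-library check reports the bitness of the current Python:
import struct
# Prints 32 for 32-bit Python (what ArcFM licensing needs) or 64 for 64-bit
print(struct.calcsize("P") * 8)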