Seeking options for Spatial ETL (Extract, Transform, Load)?
For a recent project working with several GBs of spatial data, I started the data loading / reprojections with FME. It worked well, but there is a learning curve.
By the end of the project I was using Python scripts to automate the remaining processes. FME can be scripted, but if you have the Python basics, why complicate things further? Python gives you complete flexibility, and with each import script you write, your Python skills improve.
I found the following Python packages invaluable when working with data transformations:
- PyProj
- GeoPy
- Shapely
- xlrd for importing data from Excel spreadsheets
- pyodbc to connect to databases
- SQLAlchemy to run SQL statements and work with databases
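To make the extract / transform / load shape of such an import script concrete, here is a minimal sketch. It is not the script from the project: to keep it runnable without installing anything, the standard library stands in for the packages above (`csv` instead of xlrd, a hand-written spherical Web Mercator formula instead of PyProj, `sqlite3` instead of pyodbc / SQLAlchemy). The table name `places`, the column names and the sample rows are all made up for illustration.

```python
import csv
import io
import math
import sqlite3

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by Web Mercator


def lonlat_to_web_mercator(lon, lat):
    """Project WGS 84 degrees to spherical Web Mercator (EPSG:3857) metres.

    Stand-in for a PyProj transformation; only valid away from the poles.
    """
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def run_etl(csv_file, conn):
    """Extract rows from a CSV file object, reproject them, load into SQLite."""
    # Hypothetical target table, created on first use.
    conn.execute("CREATE TABLE IF NOT EXISTS places (name TEXT, x REAL, y REAL)")
    rows = []
    for record in csv.DictReader(csv_file):  # extract
        x, y = lonlat_to_web_mercator(  # transform
            float(record["lon"]), float(record["lat"])
        )
        rows.append((record["name"], x, y))
    conn.executemany("INSERT INTO places VALUES (?, ?, ?)", rows)  # load
    conn.commit()
    return len(rows)


# Hypothetical sample data standing in for a real source file.
SAMPLE_CSV = "name,lon,lat\nOrigin,0.0,0.0\nLondon,-0.1276,51.5072\n"

conn = sqlite3.connect(":memory:")
loaded = run_etl(io.StringIO(SAMPLE_CSV), conn)
```

In a real script you would probably swap the projection function for PyProj, which handles arbitrary coordinate systems and datum shifts, and point the load step at your actual database.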
If you have a developer / programming background, I'd recommend using Python. If you prefer working with a GUI (which can also generate nice images for documentation), I'd recommend FME.
I'll talk only about what I've seen in a professional context. A student of mine worked with a company tasked with receiving, validating and integrating huge quantities of spatial data from a well-known source (TeleAtlas) into their GIS. She used several FME workflows that performed very complicated verifications and transformations on the fly, from one format to another, such as feature selection, topology verification and duplicate removal. Afterwards, the workflow was able to process incoming datasets automatically.
I was on the jury for an internship report defense (my translation of the French "soutenance de rapport de stage"), where the student described another FME workflow like this, this time to validate the regional datasets sent to the national level for integration into the national risks database. The main difference is that in this last example the datasets came in very diverse file formats, raster and vector, scales, and styles.
Lastly, I tested Spatial Data Integrator, the open source ETL based on Talend Open Studio. It had numerous features, though fewer than FME, but I think the main differences were in the documentation and the user-friendliness of workflow creation. I was often forced to modify the Java source code of the workflow components. That was an early version of SDI, though, and the shortcomings I describe here are fairly usual for open source projects in their early days; we cannot compare well-honed proprietary software and young free open source contenders on the same level.
I love open source, but as best I can tell FME easily wins out against the open source ETLs. It's actually quite cheap for maintenance and support too (at least compared to most other corporate solutions we have for things).
If you're looking for translations between formats then OGR may do it (with some piping into GDAL for transformations). Of course, that's command line.
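If the command line is the sticking point, one option is to drive OGR from a small Python wrapper. The sketch below assumes the `ogr2ogr` utility that ships with GDAL is installed and on your PATH; it uses only the `-f` (output driver) and `-t_srs` (target coordinate system) options, and the file names are hypothetical.

```python
import shutil
import subprocess


def build_ogr2ogr_command(src, dst, driver, target_srs=None):
    """Build the ogr2ogr argument list; note the destination comes before the source."""
    cmd = ["ogr2ogr", "-f", driver]
    if target_srs:
        cmd += ["-t_srs", target_srs]  # reproject while translating
    cmd += [dst, src]
    return cmd


def convert(src, dst, driver, target_srs=None):
    """Run the translation, failing early if GDAL's ogr2ogr is not available."""
    if shutil.which("ogr2ogr") is None:
        raise RuntimeError("ogr2ogr not found on PATH; install GDAL first")
    subprocess.run(build_ogr2ogr_command(src, dst, driver, target_srs), check=True)


# Hypothetical file names: translate a shapefile to GeoPackage in WGS 84.
command = build_ogr2ogr_command("roads.shp", "roads.gpkg", "GPKG", "EPSG:4326")
```

Calling `convert("roads.shp", "roads.gpkg", "GPKG", "EPSG:4326")` would then run the translation; building the command separately makes it easy to log or inspect before anything touches your data.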
For visual modelling beyond those listed in the "possible duplicate" comment, they're working on a QGIS/SEXTANTE model builder; proof of concept video: https://www.youtube.com/watch?v=LTUu-I2ouqU
(No, I don't work for Safe, I'm just a relatively happy customer).