Organizing GIS projects?
Note: This rant will be updated as I go
I'm no computer or ArcGIS pro by any means, but here's what I do:
Base Files/DBS
- These are files that are "raw" in nature and constitute the base of all my analysis
- These files, databases, and data live outside of my projects folder, and are hosted on my internet server, local computer, and Dropbox. I always have access to them, and they are very organized, both disaggregated and aggregated. You'll spend a lot of time organizing these.
- I put them all in databases, whether in Arc or PostGIS.
- To each table, I add 3 fields in the table itself or the metadata: DATE_OBTAINED, DATA_DATE, SOURCE_NOTES
- Also, base files could be queries of multiple other tables. For example, a table could aggregate all the traffic counts I obtain into one large query/table.
- I also put here all other data that I find scouring the internet.
- I NEVER DO ANY DIRECT ANALYSIS ON ANY OF THE FILES IN THE BASE FILES
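As a rough illustration of the three provenance fields mentioned above, here is a minimal Python sketch. It assumes SQLite (from the standard library) as a stand-in for an Arc geodatabase or PostGIS; the table traffic_counts, its columns, and the helper add_provenance_fields are hypothetical names made up for the example.

```python
import sqlite3

# The three provenance fields named in the list above.
PROVENANCE_FIELDS = ("DATE_OBTAINED", "DATA_DATE", "SOURCE_NOTES")


def add_provenance_fields(conn, table):
    """Add the three provenance columns to `table` if they are missing."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    for field in PROVENANCE_FIELDS:
        if field not in existing:
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN {field} TEXT')
    conn.commit()


# Hypothetical base table, held in memory only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic_counts (station_id INTEGER, vehicles INTEGER)")
add_provenance_fields(conn, "traffic_counts")

# New columns are appended after the existing ones, in the order they were added.
conn.execute(
    "INSERT INTO traffic_counts VALUES (?, ?, ?, ?, ?)",
    (1, 1200, "2011-12-22", "2006-07-01", "hypothetical source note"),
)
conn.commit()
```

The helper can be run repeatedly on the same table, since it only adds the columns that are not there yet.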
Project Files
- All my project files go in a my_projects folder. It contains everything related to that project, as in, if I copy and paste that folder somewhere else, it will contain everything.
- Usually I have the following structure:
- my_project/
  - admin/
  - communication/
  - raw_data/
  - analyzed_data/
  - output_data/
  - from_client/
  - FINAL/
  - code/
  - some_document_date_time.doc
  - README
- Slowly I've been moving to a local Git repository (you can even host it locally or on your own server). The reason I do not put it on GitHub is that GitHub has a 1.2 GB limit, which is useless for GIS analysis.
- For my projects, I usually replicate all the GIS tables that I need for my analysis into a new db: project_whatever.
- 9 times out of 10, I work only in shp files, and I save all my GIS files (images, Excel, coordinates, etc.) to my projects/my_project/raw_data, projects/my_projects/analyzed_data, and projects/my_projects/output_data.
- When a project is complete, I put the final submitted copy in my_projects/FINAL/date_submitted
- For my MXD, I usually save to a new MXD every 2 or 3 hours, my_proj_dec_22_11__13_20.mxd for example
- MS Word documents, illustrations, and mostly editing documents go in the my_projects folder, with names such as RFP_TENDER_Dec_22_11__11_15.doc and draft_ver5_Dec_31_11__12_30.doc. Again, all my final deliverables go in the FINAL folder
- For R, Python code and some C#, it gets a bit tricky, as I host it outside of the project but with a working copy in the my_projects/code folder. I do this as most of the Python code is reusable. If you put all your Python code besides the projects, you'll forget about them. Also, all my Python code goes on GitHub.
- To me, project files include any file type, including time tracking and communications: I save all my emails as .msg files, I log all our verbal communications in a Word file, and I put all those files in my_project/communication
- With ArcGIS, use Models, LYR files, and "save selection as a new shp layer". These tools make it easy to store files in smaller formats and reuse them, and models let you reuse a piece of work in another place.
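The folder structure and the timestamped file names described above could be scripted so that every project starts out the same way. This is only a sketch in Python: the function names create_project and timestamped_name are made up for the example, and the subfolder list is copied from the structure listed earlier.

```python
from datetime import datetime
from pathlib import Path

# Subfolders taken from the project structure listed above.
SUBFOLDERS = (
    "admin", "communication", "raw_data", "analyzed_data",
    "output_data", "from_client", "FINAL", "code",
)


def create_project(root, name):
    """Create the project folder, its subfolders, and an empty README."""
    project = Path(root) / name
    for sub in SUBFOLDERS:
        (project / sub).mkdir(parents=True, exist_ok=True)
    (project / "README").touch(exist_ok=True)
    return project


def timestamped_name(stem, suffix, when=None):
    """Build a name in the style of my_proj_dec_22_11__13_20.mxd."""
    when = when or datetime.now()
    # %b is the abbreviated month name; it depends on the locale (English assumed here).
    stamp = when.strftime("%b_%d_%y__%H_%M").lower()
    return f"{stem}_{stamp}{suffix}"
```

For instance, timestamped_name("my_proj", ".mxd") called at 13:20 on 22 December 2011 would give my_proj_dec_22_11__13_20.mxd, assuming an English locale. Calling create_project twice on the same folder does no harm, because existing folders and the README are left as they are.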
Final Output
- Each project, when finalized, gets zipped and put on my external hard drive.
- All final products are converted: tables are exported to files, and all other formats to PDF.
- Every project I do gets printed for a hard-copy backup
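Zipping a finished project is easy to script. Below is a minimal Python sketch using shutil.make_archive from the standard library; the function name archive_project and the idea of passing the external drive as backup_dir are assumptions for illustration.

```python
import shutil
from pathlib import Path


def archive_project(project_dir, backup_dir):
    """Zip the whole project folder into backup_dir and return the zip path."""
    project_dir = Path(project_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    archive = shutil.make_archive(
        str(backup_dir / project_dir.name),  # archive path without the .zip extension
        "zip",
        root_dir=str(project_dir.parent),    # paths inside the zip are relative to this
        base_dir=project_dir.name,           # so the zip contains my_project/...
    )
    return Path(archive)
```

Keeping the project folder name as the top level inside the zip means the archive unpacks into a single folder rather than spilling files into the current directory.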
The Bottom Line
- Each person uses multiple and different software and tools. A lot of people I know get organized using Basecamp, Harvest, or any of a multitude of other tools. People also have different working habits and OCD tendencies; I'm fairly obsessed with getting stuff organized, maybe a bit more than others. So develop the system that causes you the least stress while guaranteeing you'll be consistent in applying and updating it.
- Backup and replicate everything
- Don't work directly on your raw / base data
- For your projects, always use a replica file, as data changes over time, and you don't want to be scrambling to find the base_layer_2006.shp.
- Each my_projects folder must have a README text file that you edit while you're doing the project, to give some basic information that you know you'll forget when you revisit the project 2 years down the road
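Making the replica can also be scripted, so that the base data is only ever read. Here is a hedged Python sketch, assuming the base layer is a shapefile (which is really several files sharing one name, such as .shp, .shx, .dbf and .prj); the function name replicate_shapefile is hypothetical.

```python
import shutil
from pathlib import Path


def replicate_shapefile(base_shp, project_raw_dir):
    """Copy a shapefile and its sidecar files into a project's raw_data folder."""
    base_shp = Path(base_shp)
    target = Path(project_raw_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    # Every file that shares the stem belongs to the same layer.
    for part in sorted(base_shp.parent.glob(base_shp.stem + ".*")):
        if part.is_file():
            # copy2 also preserves timestamps, which helps when tracing data later.
            copied.append(Path(shutil.copy2(part, target / part.name)))
    return copied
```

The base files are only read, never modified, which keeps with the rule of not working directly on raw data.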
You did not state that you only work with Desktop GIS software, so I'll share some of my experiences from the programming-oriented mindset. Let me first start by saying that I agree with most of the things @dassouki says. I think the most important thing is not how you organize, but that you do it.
But to go on to my workflow. What I like about using a programming language (R in my case) is that the script I write documents all the steps I take. This is in contrast to using ArcGIS, where I think it is harder to see how a user went from the raw input data to what you can see in an mxd file. Of course you can keep a log of all the steps you take in the GUI, but I think a programming language lends itself much better to saving the exact workflow you took. This can be particularly important when a client/supervisor asks how you did something, or what exactly you did to produce a certain product.
So in practice I have several folders on my drive that are important (note that I am a scientist):
- Experiments, here I store all experiments I perform, e.g. trying a certain analysis on a certain body of data. Each experiment has its own directory. I also store resulting tables and such here. All my R scripts are in this directory.
- Datasets, all my raw datasets are stored separate from the experiments.
- tools, I have a separate directory where I store code that I have generalized for reuse in another project.
- Documents, my work revolves around writing scientific papers. For each paper I have a separate dir where I store my LaTeX files. These files read the illustrations and tables from the experiments directory. A paper can contain several illustrations.
- software, in a separate dir I store software, mainly R packages I wrote and some Fortran code I compile to run models.
Some main ideas I use:
- Separate (relatively) static from dynamic stuff: for example, save generalized scripts somewhere different from where you save short-term projects. Or separate your raw data from your analyses on them.
- Use version management software wherever you can. I like Mercurial and Git.
- AUTOMATE YOUR BACKUPS!!!! You never think about them when you do them manually, and then your hard drive crashes. Under Linux this kind of automation is easy. I'm not sure how this is under Windows/Mac.
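One way to automate backups on any platform is a small script that a scheduler (cron under Linux, for example) runs once a day. The Python sketch below is only an illustration of that idea, not the setup described above: the function name backup_folder and the dated folder naming are assumptions, and scheduling is left to whatever tool your system provides.

```python
import shutil
from datetime import date
from pathlib import Path


def backup_folder(source, backup_root, today=None):
    """Copy `source` to backup_root/<name>_<YYYY-MM-DD>, one snapshot per day."""
    source = Path(source)
    today = today or date.today()
    target = Path(backup_root) / f"{source.name}_{today.isoformat()}"
    if target.exists():
        return target  # today's snapshot already exists, nothing to do
    # copytree creates the target folder (and missing parents) itself.
    shutil.copytree(source, target)
    return target
```

Full daily copies use a lot of disk space for large GIS datasets, so in practice an incremental tool may be a better fit; the point here is only that the backup runs without anyone having to remember it.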
In general I like using a programming language because in one script you can go from the raw data to the resulting pictures/tables. R is quite a good candidate because it can read and write GIS data easily and has a ton of analyses on board, both GIS and statistics.
I would just like to add to the above answer - 2 things.
I like to have folders in the import raw data directory, one for each time I receive a dataset, e.g. from_clientname-2011dec23. This way I can trace back when I received each piece of data used in the project.
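A dated folder like this can be created with a couple of lines of Python. This is a small sketch; the function name receipt_folder is made up, and the naming pattern simply copies the from_clientname-2011dec23 example.

```python
from datetime import date
from pathlib import Path


def receipt_folder(raw_data_dir, client, received=None):
    """Create and return a folder named like from_clientname-2011dec23."""
    received = received or date.today()
    # %b is the abbreviated month name; it depends on the locale (English assumed here).
    stamp = received.strftime("%Y%b%d").lower()
    folder = Path(raw_data_dir) / f"from_{client}-{stamp}"
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```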
I also like having a project doc folder on the go. I can then create either a Word doc or a simple TXT file in there where I write down what I did on the project, the date, and who requested it. That way I can go back and cover myself if someone questions why I did something. This may sound tedious for small requests, but it can save you in the end.
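If the log is a plain TXT file, appending to it can be made quick enough that even small requests get recorded. Here is a minimal Python sketch; the function name log_entry and the line layout (date, requester, note) are assumptions, not a fixed format.

```python
from datetime import date
from pathlib import Path


def log_entry(log_file, requested_by, note, when=None):
    """Append one line with the date, who asked, and what was done."""
    when = when or date.today()
    line = f"{when.isoformat()} | requested by {requested_by} | {note}\n"
    # Mode "a" creates the file if needed and never overwrites earlier entries.
    with Path(log_file).open("a", encoding="utf-8") as fh:
        fh.write(line)
    return line
```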