How to reproject 500 CSV files efficiently and easily using QGIS?
If you're looking to reproject csv files from the Python Console in QGIS then you could use the following script. All you would need to change are the three paths which are mentioned in the comments.
Essentially, the script imports your csv files into QGIS as shapefiles (assuming your geometric fields are named X
and Y
). It then uses the qgis:reprojectlayer
and qgis:fieldcalculator
algorithms from the Processing Toolbox to reproject and update the X
and Y
fields with the new coordinates. It then saves these in a folder and converts them to csv files in a path you specify. So in the end, you have updated shapefiles and csv files in separate folders.
import glob, os, processing
path_to_csv = "C:/Users/You/Desktop/Testing//" # Change path to the directory of your csv files
shape_result = "C:/Users/You/Desktop/Testing/Shapefile results//" # Change path to where you want the shapefiles saved
os.chdir(path_to_csv) # Sets current directory to path of csv files
for fname in glob.glob("*.csv"): # Finds each .csv file and applies following actions
uri = "file:///" + path_to_csv + fname + "?delimiter=%s&crs=epsg:4326&xField=%s&yField=%s" % (",", "x", "y")
name = fname.replace('.csv', '')
lyr = QgsVectorLayer(uri, name, 'delimitedtext')
QgsMapLayerRegistry.instance().addMapLayer(lyr) # Imports csv files to QGIS canvas (assuming 'X' and 'Y' fields exist)
crs = 'EPSG:32633' # Set crs
shapefiles = QgsMapLayerRegistry.instance().mapLayers().values() # Identifies loaded layers before transforming and updating 'X' and 'Y' fields
for shapes in shapefiles:
outputs_0 = processing.runalg("qgis:reprojectlayer", shapes, crs, None)
outputs_1 = processing.runalg("qgis:fieldcalculator", outputs_0['OUTPUT'], 'X', 0, 10, 10, False, '$x', None)
outputs_2 = processing.runalg("qgis:fieldcalculator", outputs_1['OUTPUT_LAYER'], 'Y', 0, 10, 10, False, '$y', shape_result + shapes.name())
os.chdir(shape_result) # Sets current directory to path of new shapefiles
for layer in glob.glob("*.shp"): # Finds each .shp file and applies following actions
new_layer = QgsVectorLayer(layer, os.path.basename(layer), "ogr")
new_name = layer.replace('.shp', '')
csvpath = "C:/Users/You/Desktop/Testing/CSV results/" + new_name + ".csv" # Change path to where you want the csv(s) saved
QgsVectorFileWriter.writeAsVectorFormat(new_layer, csvpath, 'utf-8', None, "CSV")
Hope this helps!
A quick solution for transforming a space separated file containing "lon lat" in WGS84 to UTM33N but you don't get any other data:
#!/bin/bash
#
for i in $( ls *.csv ); do
gdaltransform -s_srs EPSG:4326 -t_srs EPSG:32633 < ${i} > utm${i}
done
That works and it preserves the order of the data so maybe another loop using e.g. awk to combine the descriptive data with the coordinates?
Edit. Due to the messy comments I made below I'll edit the answer here instead.
The following script should do the job of reading multiple csv files, adding new coordinate columns to each file.
#!/bin/bash
#
for i in $( ls *.csv ); do
paste -d',' ${i} <(awk -v OFS="," -F " " 'NR>1 {print $1 " " $2}' ${i} | gdaltransform -s_srs EPSG:4326 -t_srs EPSG:32633 | awk '{gsub(" ",",",$0); print $0}' | /usr/local/bin/sed "1i\X,Y,Z") > utm${i}
#
#paste -d',' ${i} <(awk -v OFS="," -F " " 'NR>1 {print $1 " " $2}' ${i} | gdaltransform -s_srs EPSG:4326 -t_srs EPSG:32633 | awk '{gsub(" ",",",$0); print $0}' |sed "1i\X,Y,Z") > utm${i}
#
done
On OSX you will need to install the latest (2009) version of sed and use the first, uncommented line in the loop. For Linux comment out the first and use the second. Adjust the -F " "
according to the format of the separator in your csv files e.g. -F ","
for comma separated.
Also note that the elevation transformation is to the ellipsoid,not the geoid, so be sure to transform the heights accordingly.
Using qgis or even OGR is overkill for this.
Use pyproj
(https://pypi.python.org/pypi/pyproj) combined with the python csv writer and a few standard library tricks. You do not need to install anything other than pyproj
for this!
import csv
import pyproj
from functools import partial
from os import listdir, path
#Define some constants at the top
#Obviously this could be rewritten as a class with these as parameters
lon = 'lon' #name of longitude field in original files
lat = 'lat' #name of latitude field in original files
f_x = 'x' #name of new x value field in new projected files
f_y = 'y' #name of new y value field in new projected files
in_path = u'D:\\Scripts\\csvtest\\input' #input directory
out_path = u'D:\\Scripts\\csvtest\\output' #output directory
input_projection = 'epsg:4326' #WGS84
output_projecton = 'epsg:32633' #UTM33N
#Get CSVs to reproject from input path
files= [f for f in listdir(in_path) if f.endswith('.csv')]
#Define partial function for use later when reprojecting
project = partial(
pyproj.transform,
pyproj.Proj(init=input_projection),
pyproj.Proj(init=output_projecton))
for csvfile in files:
#open a writer, appending '_project' onto the base name
with open(path.join(out_path, csvfile.replace('.csv','_project.csv')), 'wb') as w:
#open the reader
with open(path.join( in_path, csvfile), 'rb') as r:
reader = csv.DictReader(r)
#Create new fieldnames list from reader
# replacing lon and lat fields with x and y fields
fn = [x for x in reader.fieldnames]
fn[fn.index(lon)] = f_x
fn[fn.index(lat)] = f_y
writer = csv.DictWriter(w, fieldnames=fn)
#Write the output
writer.writeheader()
for row in reader:
x,y = (float(row[lon]), float(row[lat]))
try:
#Add x,y keys and remove lon, lat keys
row[f_x], row[f_y] = project(x, y)
row.pop(lon, None)
row.pop(lat, None)
writer.writerow(row)
except Exception as e:
#If coordinates are out of bounds, skip row and print the error
print e