The main idea for this post was to define way how to process datasets that are stored as a files and organized in multiple folders. To be more specific I will describe some particular case.
Task: Create GeoTiff files from delivered ASCII files (x,y,z). ASCII files are organised in folders: the root folder is “grid_data” and there are several subfolders grouping the data according to resolution (5m grid, 2m grid). There is no coordinate system definition attached to files. Coordinate system definition is delivered in separated pdf document as well as information about grid resolution.
Before processing you always have to make some preparation especially if you want to make you processing autonomous and not dependent to operator. In this case I’ve decided to create xml file (info.xml) that will hold some basic information about delivered data as well as some other information about desired parameters for processing. Here is sample file.
<?xml version="1.0" encoding="UTF-8"?> <geoprocessing> <input_data> <!-- dataset resolution in m --> <res>5</res> <!-- name of ESRI prj file --> <srs>C:\sp\python\grid_data\srs\ED_1950_UTM_Zone_31N.prj</srs> </input_data> <results> <!-- Root folder for processed files. Leave empty if you want to use same folder as input data --> <root_folder></root_folder> <!-- name of ESRI prj file. Leave empty if you want to use same srs as input data --> <srs></srs> </results> </geoprocessing>I have decided to use xml for this purpose because it is more flexible when you need to make some changes (if you want to use similar approach to solution of some other task). To parse this xml file I have created following python class:
''' Created on 23. nov. 2012 @author: Stipica Pavicic ''' from xml.dom.minidom import parseString ''' configuration sample file m:\python\info.xml ''' class ConfigObj(object): ''' Parsing of xml file that holds configuration parameters for porcessing. Parameters are separated in two sections <input_data> and <results>. ''' _dom='' _result_obj='' _input_obj='' class InputObj(object): ''' parsing of <input_data> segment of info.xml file ''' _res = 'not_set' _srs = 'not_set' def __init__(self, dom_obj): node = dom_obj.getElementsByTagName('input_data')[0] dom_elem = node.getElementsByTagName('srs')[0].firstChild if dom_elem: self._srs = dom_elem.data dom_elem = node.getElementsByTagName('res')[0].firstChild if dom_elem: self._res = dom_elem.data def getSrs(self): return self._srs def getRes(self): return self._res class ResultObj(object): ''' parsing of <results> segment of info.xml file ''' _root_folder='not_set' _srs='not_set' def __init__(self, dom_obj): node = dom_obj.getElementsByTagName('results')[0] dom_elem = node.getElementsByTagName('srs')[0].firstChild if dom_elem: self._srs = dom_elem.data else: # in the case if result srs is not defined in xml file use the input data srs nodei = dom_obj.getElementsByTagName('input_data')[0] dom_elemi = nodei.getElementsByTagName('srs')[0].firstChild self._srs = dom_elemi.data dom_elem = node.getElementsByTagName('root_folder')[0].firstChild if dom_elem: self._root_folder = dom_elem.data def getSrs(self): return self._srs def getRootFolder(self): return self._root_folder def __init__(self, info_xml_path): try: myfile = open(info_xml_path,'r') #convert to string: mydata = myfile.read() #close file because we dont need it anymore: myfile.close() #parse the xml you got from the file self._dom = parseString(mydata) self._result_obj = self.ResultObj(self._dom) self._input_obj = self.InputObj(self._dom) except IOError: print("IO Error") except: print("General Error!") def getDom(self): return self._dom def getGetResultParam(self): ''' accessTo use this class you need to create xml file with configuration parameters (eg. c:\test\info.xml). Here is the sample code how to access parameters:parameters ''' return self._result_obj def getGetInputParam(self): ''' access parameters ''' return self._input_obj
from my.xml.configurator import ConfigObj conf = ConfigObj(r'c:\test\info.xml') print conf.getGetInputParam().getSrs() print conf.getGetInputParam().getRes() print conf.getGetResultParam().getSrs() print conf.getGetResultParam().getRootFolder()As you can see it is very easy to adjust python code to some different structured xml file. Maybe you need to add additional parameters inside <input> section, or maybe you want to add additional section for some other group of parameters (something like <global>). All this depends on the issue you want to solve using this approach. For this particular case I’ll stick with the parameters that already exists in presented xml file ..I’m calling it info.xml but it can be any other name :-) So... for now you know how to create configuration file, to create some parameters and values, and how to access those parameters using python code (class ConfigObj). Next step is to create script that process all files in all subfolders starting from some arbitrary root folder. Script will search for configuration files (info.xml) and processing will be performed only for those folders. Here is code for that. The processing is straightforward but I will make some comments just to explain idea. Some comments are also included in source code.
''' Created on 23. nov. 2012 @author: Stipica Pavicic ''' from my.xml.configurator import ConfigObj from my.process_timer import Timer import os, fnmatch import arcpy import random import shutil from arcpy.conversion import PointToRaster arcpy.CheckOutExtension("3D") def createTmpGDB(folder): '''creates temporary file geodatabase''' #temp geodatabase name tmp_gdb_name = 'tempXXX.gdb' #redefine name if already exists while os.path.exists(folder + os.sep + tmp_gdb_name): tmp_gdb_name = tmp_gdb_name.replace('XXX',str(random.randint(1000000, 9000000))) arcpy.CreateFileGDB_management(folder, tmp_gdb_name ) return tmp_gdb_name def deleteTmpGDB(tmp_gdb): '''removes temporary file geodatabase''' shutil.rmtree(tmp_gdb) return 0 def proces_grid_file(in_file_folder, out_file_folder, resolution, in_srs, out_srs, filename, temporery_gdb): try: fbasename, extension = os.path.splitext(filename) in_csv = in_file_folder + os.sep + filename #point_data = out_file_folder + os.sep + temporery_gdb + os.sep + "tp" + fbasename #basename can contain ilegal characters for arcpy that is why I use random name (for this particular case the name of intermediate results is not important point_data = out_file_folder + os.sep + temporery_gdb + os.sep + "tp" + str(random.randint(1000000, 9000000)) while os.path.exists(point_data): #redefine point_data file name if already exists point_data = out_file_folder + os.sep + temporery_gdb + os.sep + "tp" + str(random.randint(1000000, 9000000)) out_tif = out_file_folder + os.sep + fbasename + ".tif" arcpy.ASCII3DToFeatureClass_3d(in_csv, "XYZ", point_data, "POINT", "", out_srs, "", "", "DECIMAL_POINT") arcpy.PointToRaster_conversion(point_data , "Shape.Z", out_tif, "MOST_FREQUENT", "NONE", str(resolution)) #process finished without exception return 1 except Exception as e: print e #process finished with exception return 0 for (root, dirs, files) in os.walk(r'C:\sp\python\grid_data'): measure = Timer() config_file_name='info.xml' filter='*.csv' # get parameters from xml file (conf) and define configuration for processing the files in folder (root) if config_file_name in files: print '*****************************************************************************' print '>>>> Root : ', root print 'Nr. of files : ' , len(fnmatch.filter(files,filter)) # nuber of files in folder that match filter expression print 'Config file : ' , root + os.sep + config_file_name conf = ConfigObj(root + os.sep + config_file_name) #open prj files inputSrsFile = open(conf.getGetInputParam().getSrs(),'r') resultSrsFile = open(conf.getGetResultParam().getSrs(),'r') #read files inputSrs = inputSrsFile.read() resultSrs = resultSrsFile.read() #close files inputSrsFile.close() resultSrsFile.close() print 'spatial reference system definition' print 'input data srs :' , inputSrs print 'results srs :' , resultSrs # create temporary file geodatabase tmp_gdb = createTmpGDB(root) # loop though list of files that match filter criteria (filter) & process files cnt = 0 for filename in fnmatch.filter(files,filter): print '-----------------------------------------------------------------------------' cnt = cnt + 1 print cnt, ' -> processing file: ', filename fbasename, extension = os.path.splitext(filename) #print root, '||', filename, '||', fbasename, '||', extension r=proces_grid_file(root, root, conf.getGetInputParam().getRes(), inputSrs, resultSrs, filename, tmp_gdb) if r: # r=0 process is finished without exception print filename, 'processing finished!' else: # r=1 process is finished with exception print filename, ': file is not processed!' print filename, measure.getTimeDifference() #delete temporary file geodatabase deleteTmpGDB(root + os.sep + tmp_gdb) # total time for processing print measure.getElapsedlTime()Script starts at the line 59 but before that I have included some helper functions:
- to create temporary file geodatabase (line 17)
- to remove temporary file geodatabase (line 30)
- file processing function (line 36)
... how to use this code?
Firstly you need to look how the files are organised. If they are stored in folder by grouping according to some joint property there is a good chance that this code can be useful to you, no matter if you need to perform the same processing (as in this post) or not. For this particular case files were stored in folders according to the resolution and the spatial reference system, ie. files with the same resolution and the same spatial reference system should be in the same folder. You can split files with the same properties in multiple folders but you cannot put the files with the different properties in the same folder.
Secondly, for all folders that you want to process, create one xml file with the properties. Use the same name for all xml files (see the code line 62; info.xml). Folders without configuration file won’t be processed.
Thirdly, change the filter you want to use for your case (line 63). Files I had to process were delivered as csv files.
And finally, run the script. To be able to run it you will have to establish working enviroment (all import should be accessible by script). To measure processing time I have also used class described in one of my previous posts. If you don’t want to use it just comment lines 7, 61, 110 and 116 (all lines related to Timer class).
Feel free to use this code and to modify it according to your needs. I hope that I will make a new post to show you how to modify existing code and how to use it by following this concept: walking through the folders and processing files based on the xml configuration file.
.....
... enjoy!
Mmorpg oyunları
ReplyDeleteInstagram takipci satin al
Tiktok Jeton Hilesi
TİKTOK JETON HİLESİ
antalya saç ekimi
İnstagram Takipçi Satın Al
Instagram Takipci Satın Al
MT2 PVP SERVERLER
instagram takipçi satın al