Which language is easiest and fastest to work with XML content?
XSLT
I suggest using XSLT templates to transform the XML into INSERT statements (or whatever you need), as required.
You should be able to invoke XSLT from any of the languages you mention.
This will result in a lot less code than doing it the long way round.
I'll toss in a suggestion for Hpricot, a popular Ruby XML parser (although there are many similar options).
Example:
Given the following XML:
<Export>
<Product>
<SKU>403276</SKU>
<ItemName>Trivet</ItemName>
<CollectionNo>0</CollectionNo>
<Pages>0</Pages>
</Product>
</Export>
You parse simply by:
FIELDS = %w[SKU ItemName CollectionNo Pages]
doc = Hpricot.parse(File.read("my.xml"))
(doc/:product).each do |xml_product|
product = Product.new
for field in FIELDS
product[field] = (xml_product/field.intern).first.innerHTML
end
product.save
end
It sounds like your application would be very fit for a Rails application, You could quickly prototype what you need, you've got direct interaction with your database of choice and you can output the data however you need to.
Here's another great resource page for parsing XML with Hpricot that might help as well as the documentation.
In .NET, C# 3.0 and VB9 provide excellent support for working with XML using LINQ to XML:
LINQ to XML Overview
A dynamic language rules for this. Why? The mappings are easy to code and change. You don't have to recompile and rebuild.
Indeed, with a little cleverness, you can have your "XML XPATH to a Tag -> DB table-field" mappings as disjoint blocks of Python code that your main application imports.
The block of Python code is your configuration file. It's not an .ini
file or a .properties
file that describes a configuration. It is the configuration.
We use Python, xml.etree and the SQLAlchemy (to separate the SQL out of your programs) for this because we're up and running with very little effort and a great deal of flexibility.
source.py
"""A particular XML parser. Formats change, so sometimes this changes, too."""
import xml.etree.ElementTree as xml
class SSXML_Source( object ):
ns0= "urn:schemas-microsoft-com:office:spreadsheet"
ns1= "urn:schemas-microsoft-com:office:excel"
def __init__( self, aFileName, *sheets ):
"""Initialize a XML source.
XXX - Create better sheet filtering here, in the constructor.
@param aFileName: the file name.
"""
super( SSXML_Source, self ).__init__( aFileName )
self.log= logging.getLogger( "source.PCIX_XLS" )
self.dom= etree.parse( aFileName ).getroot()
def sheets( self ):
for wb in self.dom.getiterator("{%s}Workbook" % ( self.ns0, ) ):
for ws in wb.getiterator( "{%s}Worksheet" % ( self.ns0, ) ):
yield ws
def rows( self ):
for s in self.sheets():
print s.attrib["{%s}Name" % ( self.ns0, ) ]
for t in s.getiterator( "{%s}Table" % ( self.ns0, ) ):
for r in t.getiterator( "{%s}Row" % ( self.ns0, ) ):
# The XML may not be really useful.
# In some cases, you may have to convert to something useful
yield r
model.py
"""This is your target object.
It's part of the problem domain; it rarely changes.
"""
class MyTargetObject( object ):
def __init__( self ):
self.someAttr= ""
self.anotherAttr= ""
self.this= 0
self.that= 3.14159
def aMethod( self ):
"""etc."""
pass
builder_today.py One of many mapping configurations
"""One of many builders. This changes all the time to fit
specific needs and situations. The goal is to keep this
short and to-the-point so that it has the mapping and nothing
but the mapping.
"""
import model
class MyTargetBuilder( object ):
def makeFromXML( self, element ):
result= model.MyTargetObject()
result.someAttr= element.findtext( "Some" )
result.anotherAttr= element.findtext( "Another" )
result.this= int( element.findtext( "This" ) )
result.that= float( element.findtext( "that" ) )
return result
loader.py
"""An application that maps from XML to the domain object
using a configurable "builder".
"""
import model
import source
import builder_1
import builder_2
import builder_today
# Configure this: pick a builder is appropriate for the data:
b= builder_today.MyTargetBuilder()
s= source.SSXML_Source( sys.argv[1] )
for r in s.rows():
data= b.makeFromXML( r )
# ... persist data with a DB save or file write
To make changes, you can correct a builder or create a new builder. You adjust the loader source to identify which builder will be used. You can, without too much trouble, make the selection of builder a command-line parameter. Dynamic imports in dynamic languages seem like overkill to me, but they are handy.