Dive Into Python: Python for experienced programmers
Creating separate handlers by node type
The third useful XML processing tip involves separating your code into logical functions, based on node types and element names. Parsed XML documents are made up of various types of nodes, each represented by a Python object. The root level of the document itself is represented by a Document object. The Document then contains one or more Element objects (for actual XML tags), each of which may contain other Element objects, Text objects (for bits of text), or Comment objects (for embedded comments). Python makes it easy to write a dispatcher to separate the logic for each node type.
    >>> from xml.dom import minidom
    >>> xmldoc = minidom.parse('kant.xml')
    >>> xmldoc
    <xml.dom.minidom.Document instance at 0x01359DE8>
    >>> xmldoc.__class__
    <class xml.dom.minidom.Document at 0x01105D40>
    >>> xmldoc.__class__.__name__
    'Document'
Assume for a moment that kant.xml is in the current directory.
As we saw in Packages, the object returned by parsing an XML document is a Document object, as defined in minidom.py in the xml.dom package. As we saw in Instantiating classes, __class__ is a built-in attribute of every Python object.
Furthermore, __name__ is a built-in attribute of every Python class, and it is a string. This string is not mysterious; it's the same as the class name you type when you define a class yourself. (See Defining classes.)
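The same lookup works on every kind of node, not just the Document. The following is a minimal sketch, written for Python 3 and using a small made-up inline document instead of kant.xml (the `<ref>` and `<p>` tags are purely illustrative):

```python
from xml.dom import minidom

# A tiny hypothetical document, used here in place of kant.xml
xmldoc = minidom.parseString("<ref><!-- a comment --><p>some text</p></ref>")

root = xmldoc.documentElement      # the <ref> element
comment = root.childNodes[0]       # the embedded comment
paragraph = root.childNodes[1]     # the <p> element
text = paragraph.firstChild        # the text inside <p>

# Collect the class name of each node, exactly as in the session above
node_type_names = [node.__class__.__name__
                   for node in (xmldoc, root, comment, paragraph, text)]
print(node_type_names)
# ['Document', 'Element', 'Comment', 'Element', 'Text']
```

These are the names the dispatcher will use to pick a method.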
Fine, so now we can get the class name of any particular XML node (since each XML node is represented as a Python object). How can we use this to our advantage to separate the logic of parsing each node type? The answer is getattr, which we first saw in Getting object references with getattr.
    def parse(self, node):
        parseMethod = getattr(self, "parse_%s" % node.__class__.__name__)
        parseMethod(node)
    def parse_Document(self, node):
        self.parse(node.documentElement)

    def parse_Text(self, node):
        text = node.data
        if self.capitalizeNextWord:
            self.pieces.append(text[0].upper())
            self.pieces.append(text[1:])
            self.capitalizeNextWord = 0
        else:
            self.pieces.append(text)

    def parse_Comment(self, node):
        pass

    def parse_Element(self, node):
        handlerMethod = getattr(self, "do_%s" % node.tagName)
        handlerMethod(node)
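To see the two levels of dispatch run end to end, here is a self-contained sketch for Python 3. The class name `SketchParser`, the `do_greeting` and `do_b` handlers, and the sample document are all assumptions made up for illustration; the real handlers depend on the tags in your own documents.

```python
from xml.dom import minidom


class SketchParser:
    """Hypothetical stand-in for the class the methods above belong to."""

    def __init__(self):
        self.pieces = []

    def parse(self, node):
        # First dispatch: find parse_Document, parse_Element, ... by class name
        parse_method = getattr(self, "parse_%s" % node.__class__.__name__)
        parse_method(node)

    def parse_Document(self, node):
        self.parse(node.documentElement)

    def parse_Element(self, node):
        # Second dispatch: find a handler based on the tag name
        handler_method = getattr(self, "do_%s" % node.tagName)
        handler_method(node)

    def parse_Text(self, node):
        self.pieces.append(node.data)

    def parse_Comment(self, node):
        pass  # comments are ignored

    # Hypothetical tag handlers, invented for this sketch
    def do_greeting(self, node):
        for child in node.childNodes:
            self.parse(child)

    def do_b(self, node):
        self.pieces.append("*")
        for child in node.childNodes:
            self.parse(child)
        self.pieces.append("*")


parser = SketchParser()
parser.parse(minidom.parseString(
    "<greeting>hello <!-- note --><b>world</b></greeting>"))
result = "".join(parser.pieces)
print(result)
# hello *world*
```

In this sketch, a tag with no matching do_ method raises AttributeError. If you would rather fall back to a generic handler, getattr accepts a default value as an optional third argument.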
In this example, the dispatch functions parse and parse_Element simply find other methods in the same class. If your processing is very complex (or you have many different tag names), you could break up your code into separate modules, and use dynamic importing to import each module and call whatever functions you need. Dynamic importing will be discussed in Data-Centric Programming.
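As a rough preview of the module-based variant, the sketch below (Python 3) dispatches on the tag name to a module instead of a method, using the standard importlib.import_module function. The module names `handler_title` and `handler_para`, and the `handle` function each one exposes, are hypothetical. To keep the example self-contained, they are built in memory and registered in sys.modules; in practice they would be ordinary .py files.

```python
import importlib
import sys
import types
from xml.dom import minidom

# Stand-ins for real files such as handler_title.py and handler_para.py
# (hypothetical names). Registering them in sys.modules lets import_module
# find them without any files on disk.
for tag, template in (("title", "# %s"), ("para", "%s")):
    module = types.ModuleType("handler_%s" % tag)
    # Each hypothetical module exposes one function: handle(node) -> str
    module.handle = (lambda node, template=template:
                     template % node.firstChild.data)
    sys.modules[module.__name__] = module


def dispatch(node):
    # Import the module named after the tag, then call its handle() function
    handler_module = importlib.import_module("handler_%s" % node.tagName)
    return handler_module.handle(node)


xmldoc = minidom.parseString(
    "<doc><title>Kant</title><para>Some text.</para></doc>")
lines = [dispatch(child) for child in xmldoc.documentElement.childNodes]
print(lines)
# ['# Kant', 'Some text.']
```

The shape is the same as the getattr version: build a name from the node, look it up, and call what you find. Only the lookup mechanism changes.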