Python xml get key value
I am trying to pull account details from XML files supplied by vendors.
I have one vendor that supplied XML files like:
And I can parse this fairly easily using python:
Which outputs like this:
However another vendor supplies the XML files in something more like name/value pairs, and I am unsure how to easily access that data. It doesn't work the same way as above:
So far I've got this, but would like to be able to access the values separately and easily:
Which outputs something like:
So my question is this: how can I access the values and assign them to variables (based on the

Looping using

My main concern, though, is that your code is very fragile. It assumes that you will always receive the event
then it would also behave in an unintuitive way. Since XML is a verbose file format that explicitly describes its own structure, you should actually look at the tags instead of blindly assuming a certain structure. Here's how I would write it:
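The second vendor's file is not reproduced above, so what follows is only a minimal sketch of a tag-driven approach, not the original answer's code. It assumes a hypothetical layout in which each `item` element holds one `name` and one `value` child; the element names, the sample data, and the helper `pairs_to_dict` are all illustrative assumptions.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical name/value layout (an assumption; the real vendor file is not shown).
SAMPLE = """\
<account>
  <item><name>id</name><value>1234</value></item>
  <item><name>owner</name><value>Alice</value></item>
  <item><name>balance</name><value>10.50</value></item>
</account>
"""


def pairs_to_dict(source, item_tag="item", key_tag="name", value_tag="value"):
    """Collect key/value pairs, storing each pair when its item element closes."""
    result = {}
    key = value = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == key_tag:
            key = elem.text
        elif elem.tag == value_tag:
            value = elem.text
        elif elem.tag == item_tag:
            # Only store pairs that actually had a name; then reset the state.
            if key is not None:
                result[key] = value
            key = value = None
    return result


account = pairs_to_dict(io.BytesIO(SAMPLE.encode("utf-8")))
print(account["owner"])
```

Because the function looks at tag names rather than at the position of events, extra whitespace or unrelated elements between the pairs should not change the result.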
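This works by temporarily keeping track of any keys and values that it sees, then actually adding the key:value pair to the dictionary when a

Source code: Lib/xml/etree/ElementTree.py

The xml.etree.ElementTree module implements a simple and efficient API for parsing and creating XML data.

Changed in version 3.3: This module will use a fast implementation whenever available.

Deprecated since version 3.3: The xml.etree.cElementTree module is deprecated.

Tutorial¶
This is a short tutorial for using xml.etree.ElementTree (ET in short).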
XML tree and elements¶XML is an inherently hierarchical data format, and the most natural way to represent it is with a tree. Parsing XML¶We’ll be using the following XML document as the sample data for this section:
We can import this data by reading from a file:

    import xml.etree.ElementTree as ET
    tree = ET.parse('country_data.xml')
    root = tree.getroot()

Or directly from a string:

    root = ET.fromstring(country_data_as_string)
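The sample document itself is not reproduced in this section. As a stand-in, here is a sketch that builds a document consistent with the outputs shown in this tutorial (three countries, each with rank, year and neighbor children). Values that do not appear in those outputs, such as the years for Singapore and Panama, are assumptions.

```python
import xml.etree.ElementTree as ET

# Stand-in for country_data.xml. Anything not visible in the tutorial's
# outputs (for example the 2011 years) is an assumption.
country_data_as_string = """\
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
"""

root = ET.fromstring(country_data_as_string)
names = [country.get("name") for country in root]
print(root.tag, names, root[0][1].text)
```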
As an Element, root has a tag and a dictionary of attributes:

    >>> root.tag
    'data'
    >>> root.attrib
    {}

It also has children nodes over which we can iterate:

    >>> for child in root:
    ...     print(child.tag, child.attrib)
    ...
    country {'name': 'Liechtenstein'}
    country {'name': 'Singapore'}
    country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index:

    >>> root[0][1].text
    '2008'

Note: Not all elements of the XML input will end up as elements of the parsed tree. Currently, this module skips over any XML comments, processing instructions, and document type declarations in the input. Nevertheless, trees built using this module’s API rather than parsing from XML text can have comments and processing instructions in them; they will be included when generating XML output. A document type declaration may be accessed by passing a custom TreeBuilder instance to the XMLParser constructor.
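Pull API for non-blocking parsing¶
Most parsing functions provided by this module require the whole document to be read at once before returning any result. It is possible to use an XMLParser and feed data into it incrementally, but it is a push API that calls methods on a callback target, which is too low-level and inconvenient for most needs. Sometimes what the user really wants is to be able to parse XML incrementally, without blocking operations, while enjoying the convenience of fully constructed Element objects.

The most powerful tool for doing this is XMLPullParser. It does not require a blocking read to obtain the XML data, and is instead fed with data incrementally with XMLPullParser.feed() calls. To get the parsed XML elements, call XMLPullParser.read_events(). Here is an example:

    >>> parser = ET.XMLPullParser(['start', 'end'])
    >>> parser.feed('<mytag>sometext')
    >>> list(parser.read_events())
    [('start', <Element 'mytag' at 0x7fa66db2be58>)]
    >>> parser.feed(' more text</mytag>')
    >>> for event, elem in parser.read_events():
    ...     print(event)
    ...     print(elem.tag, 'text=', elem.text)
    ...
    end
    mytag text= sometext more text

The obvious use case is applications that operate in a non-blocking fashion where the XML data is being received from a socket or read incrementally from some storage device. In such cases, blocking reads are unacceptable.

Because it’s so flexible, XMLPullParser can be inconvenient to use for simpler use-cases. If you don’t mind your application blocking on reading XML data but would still like to have incremental parsing capabilities, take a look at iterparse(). It can be useful when you’re reading a large XML document and don’t want to hold it wholly in memory.

Finding interesting elements¶
Element has some useful methods that help iterate recursively over all the sub-tree below it (its children, their children, and so on). For example, Element.iter():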
    >>> for neighbor in root.iter('neighbor'):
    ...     print(neighbor.attrib)
    ...
    {'name': 'Austria', 'direction': 'E'}
    {'name': 'Switzerland', 'direction': 'W'}
    {'name': 'Malaysia', 'direction': 'N'}
    {'name': 'Costa Rica', 'direction': 'W'}
    {'name': 'Colombia', 'direction': 'E'}

Element.findall() finds only elements with a tag which are direct children of the current element. Element.find() finds the first child with a particular tag, and Element.text accesses the element’s text content. Element.get() accesses the element’s attributes:

    >>> for country in root.findall('country'):
    ...     rank = country.find('rank').text
    ...     name = country.get('name')
    ...     print(name, rank)
    ...
    Liechtenstein 1
    Singapore 4
    Panama 68

More sophisticated specification of which elements to look for is possible by using XPath.

Modifying an XML File¶
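Once created, an Element object may be manipulated by directly changing its fields (such as Element.text), adding and modifying attributes (Element.set() method), as well as adding new children (for example with Element.append()).

Let’s say we want to add one to each country’s rank, and add an updated attribute to the rank element:

    >>> for rank in root.iter('rank'):
    ...     new_rank = int(rank.text) + 1
    ...     rank.text = str(new_rank)
    ...     rank.set('updated', 'yes')
    ...
    >>> tree.write('output.xml')

Our XML now looks like this:

We can remove elements using Element.remove(). Let’s say we want to remove all countries with a rank higher than 50:

    >>> for country in root.findall('country'):
    ...     # using root.findall() to avoid removal during traversal
    ...     rank = int(country.find('rank').text)
    ...     if rank > 50:
    ...         root.remove(country)
    ...
    >>> tree.write('output.xml')

Note that concurrent modification while iterating can lead to problems, just like when iterating and modifying Python lists or dicts. Therefore, the example first collects all matching elements with root.findall(), and only then iterates over the list of matches.

Our XML now looks like this: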
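Building XML documents¶
The SubElement() function also provides a convenient way to create new sub-elements for a given element:

    >>> a = ET.Element('a')
    >>> b = ET.SubElement(a, 'b')
    >>> c = ET.SubElement(a, 'c')
    >>> d = ET.SubElement(c, 'd')
    >>> ET.dump(a)
    <a><b /><c><d /></c></a>

Parsing XML with Namespaces¶
If the XML input has namespaces, tags and attributes with prefixes in the form prefix:sometag get expanded to {uri}sometag where the prefix is replaced by the full URI. Also, if there is a default namespace, that full URI gets prepended to all of the non-prefixed tags.

Here is an XML example that incorporates two namespaces, one with the prefix “fictional” and the other serving as the default namespace: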
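The namespaced document is missing at this point. A sketch consistent with the output listed later in this section might look like the following; the root element name `actors` and the exact layout are assumptions. It also shows how parsed tags carry their namespace URI.

```python
import xml.etree.ElementTree as ET

# Assumed sample: a default namespace for people and a "fictional"
# prefix for characters, matching the names printed later in this section.
xml_text = """\
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
        <fictional:character>Archie Leach</fictional:character>
    </actor>
    <actor>
        <name>Eric Idle</name>
        <fictional:character>Sir Robin</fictional:character>
        <fictional:character>Gunther</fictional:character>
        <fictional:character>Commander Clement</fictional:character>
    </actor>
</actors>
"""

root = ET.fromstring(xml_text)

# After parsing, prefixes are gone and each tag is in {uri}local form.
first_actor = root[0]
print(root.tag)
print(first_actor[0].tag)
print(first_actor[1].tag)
```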
One way to search and explore this XML example is to manually add the URI to every tag or attribute in the xpath of a find() or findall():

    root = fromstring(xml_text)
    for actor in root.findall('{http://people.example.com}actor'):
        name = actor.find('{http://people.example.com}name')
        print(name.text)
        for char in actor.findall('{http://characters.example.com}character'):
            print(' |-->', char.text)

A better way to search the namespaced XML example is to create a dictionary with your own prefixes and use those in the search functions:

    ns = {'real_person': 'http://people.example.com',
          'role': 'http://characters.example.com'}

    for actor in root.findall('real_person:actor', ns):
        name = actor.find('real_person:name', ns)
        print(name.text)
        for char in actor.findall('role:character', ns):
            print(' |-->', char.text)

These two approaches both output:

    John Cleese
     |--> Lancelot
     |--> Archie Leach
    Eric Idle
     |--> Sir Robin
     |--> Gunther
     |--> Commander Clement

XPath support¶
This module provides limited support for XPath expressions for locating elements in a tree. The goal is to support a small subset of the abbreviated syntax; a full XPath engine is outside the scope of the module.

Example¶
Here’s an example that demonstrates some of the XPath capabilities of the module. We’ll be using the countrydata XML document from the Parsing XML section:

    import xml.etree.ElementTree as ET

    root = ET.fromstring(countrydata)

    # Top-level elements
    root.findall(".")

    # All 'neighbor' grand-children of 'country' children of the top-level
    # elements
    root.findall("./country/neighbor")

    # Nodes with name='Singapore' that have a 'year' child
    root.findall(".//year/..[@name='Singapore']")

    # 'year' nodes that are children of nodes with name='Singapore'
    root.findall(".//*[@name='Singapore']/year")

    # All 'neighbor' nodes that are the second child of their parent
    root.findall(".//neighbor[2]")

For XML with namespaces, use the usual qualified {namespace}tag notation:

    # All dublin-core "title" tags in the document
    root.findall(".//{http://purl.org/dc/elements/1.1/}title")

Supported XPath syntax¶
Predicates (expressions within square brackets) must be preceded by a tag name, an asterisk, or another predicate. Reference¶Functions¶xml.etree.ElementTree. canonicalize (xml_data=None, *, out=None, from_file=None,
**options)¶C14N 2.0 transformation function. Canonicalization is a way to normalise XML output in a way that allows byte-by-byte comparisons and digital signatures. It reduced the freedom that XML serializers have and instead generates a more constrained XML representation. The main restrictions regard the placement of namespace declarations, the ordering of attributes, and ignorable whitespace. This function takes an XML data string (xml_data) or a file path or file-like object (from_file) as input, converts it to the canonical form, and writes it out using the out file(-like) object, if provided, or returns it as a text string if not. The output file receives text,
not bytes. It should therefore be opened in text mode with Typical uses: xml_data = " The configuration options are as follows:
In the option list above, “a set” refers to any collection or iterable of strings, no ordering is expected. New in version 3.8. Comment element factory. This factory function creates a special element that will be serialized as an XML comment by the standard serializer. The comment string can be either a bytestring or a Unicode string. text is a string containing the comment string. Returns an element instance representing a comment. Note that xml.etree.ElementTree. dump (elem)¶Writes an element tree or element structure to sys.stdout. This function should be used for debugging only. The exact output format is implementation dependent. In this version, it’s written as an ordinary XML file. elem is an element tree or an individual element. Changed in version 3.8: The xml.etree.ElementTree. fromstring (text,
parser=None)¶Parses an XML section from a string constant. Same as xml.etree.ElementTree. fromstringlist (sequence, parser=None)¶Parses an XML document from a sequence of string fragments. sequence is a list or other sequence containing XML data fragments.
parser is an optional parser instance. If not given, the standard New in version 3.2. xml.etree.ElementTree. indent (tree, space=' ', level=0)¶Appends whitespace to the subtree to indent the tree visually. This can be used to generate pretty-printed XML output. tree can be an Element or ElementTree. space is the whitespace string that will be inserted for each indentation level, two space characters by default. For indenting partial subtrees inside of an already indented tree, pass the initial indentation level as level. New in version 3.9. xml.etree.ElementTree. iselement (element)¶Check if an object appears to be a valid element object. element is an element instance. Return xml.etree.ElementTree. iterparse (source, events=None, parser=None)¶Parses an XML section into an element tree incrementally, and reports what’s going on to the
user. source is a filename or file object containing XML data. events is a sequence of events to report back. The supported events are the strings Note that while
Note
If you need a fully populated element, look for “end” events instead. Deprecated since version 3.4: The parser argument. Changed in version 3.8: The xml.etree.ElementTree. parse (source, parser=None)¶Parses an XML section into an element tree. source is a filename or file object containing XML data. parser is an optional parser instance. If
not given, the standard xml.etree.ElementTree. ProcessingInstruction (target, text=None)¶PI element factory. This factory function creates a special element that will be serialized as an XML processing instruction. target is a string containing the PI target. text is a string containing the PI contents, if given. Returns an element instance, representing a processing instruction. Note that xml.etree.ElementTree. register_namespace (prefix, uri)¶Registers a namespace prefix. The registry is global, and any existing mapping for either the given prefix or the namespace URI will be removed. prefix is a namespace prefix. uri is a namespace uri. Tags and attributes in this namespace will be serialized with the given prefix, if at all possible. New in version 3.2. xml.etree.ElementTree. SubElement (parent, tag, attrib={},
**extra)¶Subelement factory. This function creates an element instance, and appends it to an existing element. The element name, attribute names, and attribute values can be either bytestrings or Unicode strings. parent is the parent element. tag is the subelement name. attrib is an optional dictionary, containing element attributes. extra contains additional attributes, given as keyword arguments. Returns an element instance. xml.etree.ElementTree. tostring (element, encoding='us-ascii', method='xml', *,
xml_declaration=None, default_namespace=None, short_empty_elements=True)¶Generates a string representation of an XML element, including all subelements. element is
an New in version 3.4: The short_empty_elements parameter. New in version 3.8: The xml_declaration and default_namespace parameters. Changed in version 3.8: The xml.etree.ElementTree. tostringlist (element,
encoding='us-ascii', method='xml', *, xml_declaration=None, default_namespace=None,
short_empty_elements=True)¶Generates a string representation of an XML element, including all subelements. element is an
New in version 3.2. New in version 3.4: The short_empty_elements parameter. New in version 3.8: The xml_declaration and default_namespace parameters. Changed in version 3.8: The xml.etree.ElementTree. XML (text, parser=None)¶Parses an XML section from a string constant. This function can be used to embed “XML literals” in Python code. text is a string containing XML data.
parser is an optional parser instance. If not given, the standard xml.etree.ElementTree. XMLID (text, parser=None)¶Parses an XML section from a string constant, and also returns a dictionary which maps from element id:s to elements. text is a string containing XML data.
parser is an optional parser instance. If not given, the standard XInclude support¶This module provides limited support for XInclude directives, via the Example¶Here’s an example that demonstrates use of the XInclude module. To include an XML document in the current document, use the
By default, the href attribute is treated as a file name. You can use custom loaders to override this behaviour. Also note that the standard helper does not support XPointer syntax. To process this file, load it as usual, and pass the root element to the from xml.etree import ElementTree, ElementInclude tree = ElementTree.parse("document.xml") root = tree.getroot() ElementInclude.include(root) The ElementInclude module replaces the
If the parse attribute is omitted, it defaults to “xml”. The href attribute is required. To include a text document, use the
The result might look something like:
Reference¶Functions¶xml.etree.ElementInclude. default_loader (href, parse, encoding=None)¶Default loader. This default loader reads an included resource from disk. href is a URL.
parse is for parse mode either “xml” or “text”. encoding is an optional text encoding. If not given, encoding is xml.etree.ElementInclude. include (elem,
loader=None, base_url=None, max_depth=6)¶This function expands XInclude directives. elem is the root element. loader is an optional resource loader. If
omitted, it defaults to Returns the expanded resource. If the parse mode is New in version 3.9: The base_url and max_depth parameters. Element Objects¶classxml.etree.ElementTree. Element (tag, attrib={},
**extra)¶Element class. This class defines the Element interface, and provides a reference implementation of this interface. The element name, attribute names, and attribute values can be either bytestrings or Unicode strings. tag is the element name. attrib is an optional dictionary, containing element attributes. extra contains additional attributes, given as keyword arguments. tag ¶A string identifying what kind of data this element represents (the element type, in other words). text ¶ tail ¶These attributes can be used to hold additional data
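associated with the element. Their values are usually strings but may be any application-specific object. If the element is created from an XML file, the text attribute holds either the text between the element’s start tag and its first child or end tag, or None, and the tail attribute holds either the text between the element’s end tag and the next tag, or None. For the XML data

    <a><b>1<c>2<d/>3</c></b>4</a>

the a element has None for both text and tail attributes, the b element has text "1" and tail "4", the c element has text "2" and tail None, and the d element has text None and tail "3".

To collect the inner text of an element, see itertext(), for example "".join(element.itertext()).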
Applications may store arbitrary objects in these attributes. attrib ¶A dictionary containing the element’s attributes. Note that while the attrib value is always a real mutable Python dictionary, an ElementTree implementation may choose to use another internal representation, and create the dictionary only if someone asks for it. To take advantage of such implementations, use the dictionary methods below whenever possible. The following dictionary-like methods work on the element attributes. clear ()¶Resets an element. This function removes all subelements, clears all
attributes, and sets the text and tail attributes to get (key, default=None)¶Gets the element attribute named key. Returns the attribute value, or default if the attribute was not found. items ()¶Returns the element attributes as a sequence of (name, value) pairs. The attributes are returned in an arbitrary order. keys ()¶Returns the elements attribute names as a list. The names are returned in an arbitrary order. set (key,
value)¶Set the attribute key on the element to value. The following methods work on the element’s children (subelements). append (subelement)¶Adds the element subelement to the end of this element’s internal list of subelements. Raises extend (subelements)¶Appends
subelements from a sequence object with zero or more elements. Raises New in version 3.2. find (match, namespaces=None)¶Finds the first subelement matching match. match may be a tag name or a
path. Returns an element instance or findall (match,
namespaces=None)¶Finds all matching subelements, by tag name or path. Returns a list containing all matching elements in document order.
namespaces is an optional mapping from namespace prefix to full name. Pass findtext (match, default=None,
namespaces=None)¶Finds text for the first subelement matching match. match may be a tag name or a path. Returns the text
content of the first matching element, or default if no element was found. Note that if the matching element has no text content an empty string is returned. namespaces is an optional mapping from namespace prefix to full name. Pass insert (index,
subelement)¶Inserts subelement at the given position in this element. Raises iter (tag=None)¶Creates
a tree iterator with the current element as the root. The iterator iterates over this element and all elements below it, in document (depth first) order. If tag is not New in version 3.2. iterfind (match, namespaces=None)¶Finds all matching subelements, by tag name or path. Returns an iterable yielding all matching elements in document order. namespaces is an optional mapping from namespace prefix to full name. New in version 3.2. itertext ()¶Creates a text iterator. The iterator loops over this element and all subelements, in document order, and returns all inner text. New in version 3.2. makeelement (tag, attrib)¶Creates a new element object of the same type as this element. Do not call this method, use the
remove (subelement)¶
Removes subelement from the element. Unlike the find* methods this method compares elements based on the instance identity, not on tag value or contents.
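To make the identity comparison concrete, here is a small sketch. The tag name `item` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><item>a</item><item>a</item></root>")
first, second = list(root)

# A look-alike element built separately is a different object,
# so remove() does not find it among root's children.
lookalike = ET.Element("item")
lookalike.text = "a"
try:
    root.remove(lookalike)
    removed_lookalike = True
except ValueError:
    removed_lookalike = False

# Removing by the actual instance works and leaves the other child in place.
root.remove(second)
print(removed_lookalike, len(root), root[0] is first)
```

In practice this means the element passed to remove() should come from the tree itself, for example from find() or findall() on the parent.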
Caution: Elements with no subelements will test as False. Use a specific len(elem) or elem is None test instead.

    element = root.find('foo')

    if not element:  # careful!
        print("element not found, or element has no subelements")

    if element is None:
        print("element not found")

Prior to Python 3.8, the serialisation order of the XML attributes of elements was artificially made predictable by sorting the attributes by their name. Based on the now guaranteed ordering of dicts, this arbitrary reordering was removed in Python 3.8 to preserve the order in which attributes were originally parsed or created by user code.

In general, user code should try not to depend on a specific ordering of attributes, given that the XML Information Set explicitly excludes the attribute order from conveying information. Code should be prepared to deal with any ordering on input. In cases where deterministic XML output is required, e.g. for cryptographic signing or test data sets, canonical serialisation is available with the canonicalize() function.

In cases where canonical output is not applicable but a specific attribute order is still desirable on output, code should aim for creating the attributes directly in the desired order, to avoid perceptual mismatches for readers of the code. In cases where this is difficult to achieve, a recipe like the following can be applied prior to serialisation to enforce an order independently from the Element creation:

    def reorder_attributes(root):
        for el in root.iter():
            attrib = el.attrib
            if len(attrib) > 1:
                # adjust attribute order, e.g. by sorting
                attribs = sorted(attrib.items())
                attrib.clear()
                attrib.update(attribs)

ElementTree Objects¶
class xml.etree.ElementTree.ElementTree(element=None, file=None)¶
ElementTree wrapper class. This class represents an entire element hierarchy, and adds some extra support for serialization to and from standard XML. element is the root element. The tree is initialized with the contents of the XML file if given.

_setroot(element)¶
Replaces the root element for this tree. This discards the current contents of the tree, and replaces it with the given element. Use with care. element is an element instance.

find(match,
namespaces=None)¶Same as findall (match, namespaces=None)¶Same as
findtext (match, default=None,
namespaces=None)¶Same as getroot ()¶Returns the root element for this tree. iter (tag=None)¶Creates and returns a tree iterator for the root element. The iterator loops over all elements in this tree, in section order. tag is the tag to look for (default is to return all elements). iterfind (match, namespaces=None)¶Same as New in version 3.2. parse (source, parser=None)¶
Loads an external XML section into this element tree. source is a file name or file object. parser is an optional parser instance. If not given, the standard write (file, encoding='us-ascii', xml_declaration=None, default_namespace=None, method='xml', *,
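short_empty_elements=True)¶
Writes the element tree to a file, as XML. file is a file name, or a file object opened for writing. encoding [1] is the output encoding (default is US-ASCII). xml_declaration controls if an XML declaration should be added to the file. Use False for never, True for always, None for only if not US-ASCII or UTF-8 or Unicode (default is None). The output is either a string (str) or binary (bytes). This is controlled by the encoding argument: if encoding is "unicode", the output is a string; otherwise, it is binary.

New in version 3.4: The short_empty_elements parameter.

Changed in version 3.8: The write() method now preserves the attribute order specified by the user.

This is the XML file that is going to be manipulated:

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute “target” of every link in first paragraph:

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    ...
    >>> tree.write("output.xhtml")

QName Objects¶
class xml.etree.ElementTree.QName(text_or_uri,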
tag=None)¶QName wrapper. This can be used to wrap a QName attribute value, in order to get proper namespace handling on output. text_or_uri is a string containing the QName value, in the form {uri}local, or, if the tag argument is given, the URI part of a QName.
If tag is given, the first argument is interpreted as a URI, and this argument is interpreted as a local name. TreeBuilder Objects¶classxml.etree.ElementTree. TreeBuilder (element_factory=None, *, comment_factory=None, pi_factory=None,
insert_comments=False, insert_pis=False)¶Generic element structure builder. This builder converts a sequence of start, data, end, comment and pi method calls to a well-formed element structure. You can use this class to build an element structure using a custom XML parser, or a parser for some other XML-like format. element_factory, when given, must be a callable accepting two positional arguments: a tag and a dict of attributes. It is expected to return a new element instance. The comment_factory and pi_factory functions, when given, should behave like the
close ()¶Flushes the builder buffers, and returns the toplevel
document element. Returns an data (data)¶Adds text to the current element. data is a string. This should be either a bytestring, or a Unicode string. end (tag)¶Closes the current element. tag is the element name. Returns the closed element. start (tag,
attrs)¶Opens a new element. tag is the element name. attrs is a dictionary containing element attributes. Returns the opened element. Creates a comment with the given text. If New in version 3.8. pi (target, text)¶Creates a comment with the given target name and text. If New in version 3.8. In addition, a custom doctype (name, pubid,
system)¶Handles a doctype declaration. name is the doctype name. pubid is the public identifier. system is the system identifier. This method does not exist on the default
New in version 3.2. start_ns (prefix,
uri)¶Is called whenever the parser encounters a new namespace declaration, before the New in version 3.8. end_ns (prefix)¶Is called after the New in version 3.8. classxml.etree.ElementTree. C14NWriterTarget (write, *, with_comments=False, strip_text=False, rewrite_prefixes=False,
qname_aware_tags=None, qname_aware_attrs=None, exclude_attrs=None, exclude_tags=None)¶A
C14N 2.0 writer. Arguments are the same as for the New in version 3.8. XMLParser Objects¶classxml.etree.ElementTree. XMLParser (*, target=None,
encoding=None)¶This class is the low-level building block of the module. It uses
Changed in version 3.8: Parameters are now keyword-only. The html argument no longer supported. close ()¶Finishes feeding data to the parser. Returns the result of calling the feed (data)¶Feeds data to the parser. data is encoded data.
    >>> from xml.etree.ElementTree import XMLParser
    >>> class MaxDepth:                     # The target object of the parser
    ...     maxDepth = 0
    ...     depth = 0
    ...     def start(self, tag, attrib):   # Called for each opening tag.
    ...         self.depth += 1
    ...         if self.depth > self.maxDepth:
    ...             self.maxDepth = self.depth
    ...     def end(self, tag):             # Called for each closing tag.
    ...         self.depth -= 1
    ...     def data(self, data):
    ...         pass            # We do not need to do anything with data.
    ...     def close(self):    # Called when all data has been parsed.
    ...         return self.maxDepth
    ...
    >>> target = MaxDepth()
    >>> parser = XMLParser(target=target)
    >>> exampleXml = """
    ... <a>
    ...   <b>
    ...   </b>
    ...   <b>
    ...     <c>
    ...       <d>
    ...       </d>
    ...     </c>
    ...   </b>
    ... </a>"""
    >>> parser.feed(exampleXml)
    >>> parser.close()
    4

XMLPullParser Objects¶
class xml.etree.ElementTree.XMLPullParser(events=None)¶
A pull parser suitable for non-blocking applications. Its input-side API is similar to that of feed (data)¶Feed the given bytes data to the parser. close ()¶Signal the parser that the data stream is terminated. Unlike read_events ()¶Return an iterator over the events which have been encountered in the data fed to the parser. The iterator yields
Events provided in a previous call to Note
If you need a fully populated element, look for “end” events instead.

New in version 3.4.

Changed in version 3.8: The comment and pi events were added.

Exceptions¶
class xml.etree.ElementTree.ParseError¶
XML parse error, raised by the various parsing methods in this module when parsing fails. The string representation of an instance of this exception will contain a user-friendly error message. In addition, it will have the following attributes available:

code¶
A numeric error code from the expat parser. See the documentation of xml.parsers.expat for the list of error codes and their meanings.

position¶
A tuple of line, column numbers, specifying where the error occurred.

Footnotes
[1] The encoding string included in XML output should conform to the appropriate standards. For example, “UTF-8” is valid, but “UTF8” is not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl and https://www.iana.org/assignments/character-sets/character-sets.xhtml.

How do I find a specific tag in XML in Python?
Use the xml.etree.ElementTree module:

    tree = ElementTree.parse("sample.xml")
    root = tree.getroot()
    dogs = root.findall("dog")
    for dog in dogs:
        print(dog.text)

What is Etree in Python?
The xml.etree.ElementTree module implements a simple and efficient API for parsing and creating XML data. Changed in version 3.3: This module will use a fast implementation whenever available.
How do you add subelements in XML in Python?
Add them using the SubElement() function and set the new element's text attribute.

    child = xml.Element("employee")
    nm = xml.SubElement(child, "name")
    nm.text = student. ...

    import xml.etree.ElementTree as et
    tree = et.ElementTree(file='employees.xml')
    root = tree. ...

Can pandas read XML?
The pandas data analysis library provides functions to read and write data for most file types. For example, it includes read_csv() and to_csv() for interacting with CSV files. Since version 1.3.0, pandas also provides read_xml() and DataFrame.to_xml() for XML files; earlier versions did not include any methods to read and write XML files.
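For the sub-element question above, a minimal self-contained sketch using only the standard library; the tag names, attribute, and values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Build <employee> with two children; SubElement() creates the child
# and appends it to the parent in one call.
employee = ET.Element("employee")
name = ET.SubElement(employee, "name")
name.text = "Ada"
dept = ET.SubElement(employee, "dept", {"id": "42"})
dept.text = "Research"

# encoding="unicode" makes tostring() return str instead of bytes.
xml_string = ET.tostring(employee, encoding="unicode")
print(xml_string)
```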