python3 xml;python ElementTree;cElementTree;
ElementTree
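Note: when parsing, ET loads the whole XML document into memory as a tree (an in-memory tree) and only then processes it, so memory consumption can become a problem for large files. To address this, ET provides a SAX-like tool, iterparse. For usage, see section 5, "Parsing an XML stream with iterparse" (利用iterparse解析XML流), of the cnblogs (博客园) post "python3解析XML".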
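As a rough illustration of the iterparse approach mentioned above, here is a minimal sketch. The sample data and the names `xml_bytes` and `names` are made up for this example; a real use case would pass the path of a large file instead of an in-memory buffer.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for a large XML file.
xml_bytes = b"<doc>" + b"".join(
    b'<branch name="b%d">text</branch>' % i for i in range(5)
) + b"</doc>"

names = []
# iterparse yields (event, element) pairs while reading the input;
# the "end" event fires once an element has been read completely.
for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
    if elem.tag == "branch":
        names.append(elem.get("name"))
        # Drop the element's text, attributes and children once it is processed.
        elem.clear()

print(names)  # ['b0', 'b1', 'b2', 'b3', 'b4']
```

Calling `elem.clear()` is what keeps memory use down; the emptied elements still stay attached to the root, so for very large inputs it may also be worth removing them from their parent.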
1. Official documentation and related tutorials
https://docs.python.org/3/library/xml.etree.elementtree.html
https://www.runoob.com/python/python-xml.html
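2. Import
ElementTree is part of the Python standard library. cElementTree is the same API implemented in C, which is faster and uses less memory. Since Python 3.3, xml.etree.ElementTree picks up the C implementation automatically whenever it is available; cElementTree was deprecated in 3.3 and removed in Python 3.9, so importing it explicitly only makes sense on older versions.

# Python 3.3+: importing ElementTree automatically prefers the C implementation.
import xml.etree.ElementTree as ET

# Before Python 3.3, ElementTree can be imported like this:
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET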
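3. Usage (to be expanded)
When I have time I will put together an example of my own; combined with the material copied below, that should be enough.
The XML document below is used to walk through creating, deleting, modifying, and querying with ElementTree, mainly following the cnblogs (博客园) post "python3解析XML". Note that XML is sometimes stored on a single line without formatting; an online formatter such as "XML在线格式化 | 菜鸟工具" can pretty-print it so it is easier to read.

<?xml version="1.0"?>
<doc>
    <branch name="codingpy.com" hash="1cdf045c">
        source
    </branch>
    <branch name="release01" hash="f200013e">
        <sub-branch name="subrelease01">
            xml,sgml
        </sub-branch>
    </branch>
    <branch name="invalid">
    </branch>
</doc>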
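Single-line XML can also be formatted locally instead of with an online tool. The following is a minimal sketch using the standard library's xml.dom.minidom; the variable names `one_line` and `pretty` and the shortened sample string are assumptions made for this example.

```python
import xml.dom.minidom

# A shortened, single-line version of the sample document (illustrative only).
one_line = (
    '<?xml version="1.0"?><doc>'
    '<branch name="codingpy.com" hash="1cdf045c">source</branch>'
    '<branch name="invalid"></branch></doc>'
)

# parseString builds a DOM; toprettyxml serializes it with one element per line.
pretty = xml.dom.minidom.parseString(one_line).toprettyxml(indent="  ")
print(pretty)
```

This works best on compact input; if the input is already indented, toprettyxml tends to add extra blank lines because it keeps the existing whitespace text nodes.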
Reading and searching
Reading a document

import xml.etree.ElementTree as ET

# Load the local `test.xml` document
tree = ET.ElementTree(file='test.xml')
# Get the root element
root = tree.getroot()
print(root)
# <Element 'doc' at 0x11eb780>
Traversal

# Iterate over all elements (depth-first)
for elem in tree.iter():
    print(elem.tag, elem.attrib)
'''
doc {}
branch {'hash': '1cdf045c', 'name': 'codingpy.com'}
branch {'hash': 'f200013e', 'name': 'release01'}
sub-branch {'name': 'subrelease01'}
branch {'name': 'invalid'}
'''

# Iterate over elements with a given tag
for elem in tree.iter(tag='branch'):
    print(elem.tag, elem.attrib)
'''
branch {'hash': '1cdf045c', 'name': 'codingpy.com'}
branch {'hash': 'f200013e', 'name': 'release01'}
branch {'name': 'invalid'}
'''

# Iterate over the direct children of a node
for child in root:
    print(child.tag, child.attrib)
'''
branch {'hash': '1cdf045c', 'name': 'codingpy.com'}
branch {'hash': 'f200013e', 'name': 'release01'}
branch {'name': 'invalid'}
'''
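Indexing and searching
Both index access and XPath expressions are supported (ElementTree implements only a limited subset of XPath).

# Access a specific child element by index
print(root[0].tag, root[0].text)
# Prints the tag followed by the element text, including the whitespace
# around it in the file: 'branch', '\n source\n '

# Find elements with XPath
# Find all elements tagged sub-branch under a branch element
for elem in tree.iterfind('branch/sub-branch'):
    print(elem.tag, elem.attrib)
'''
sub-branch {'name': 'subrelease01'}
'''

# Use XPath to find all branch elements whose name attribute has a given value
for elem in tree.iterfind('branch[@name="release01"]'):
    print(elem.tag, elem.attrib)
'''
branch {'hash': 'f200013e', 'name': 'release01'}
'''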
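Besides iterfind, Element also offers find, findall, and findtext, which accept the same path syntax. The sketch below parses a compact copy of the sample document from a string so that it runs without a local test.xml; the variable names are chosen for this example only.

```python
import xml.etree.ElementTree as ET

# Compact copy of the sample document, parsed from a string.
root = ET.fromstring(
    '<doc>'
    '<branch name="codingpy.com" hash="1cdf045c">source</branch>'
    '<branch name="release01" hash="f200013e">'
    '<sub-branch name="subrelease01">xml,sgml</sub-branch>'
    '</branch>'
    '<branch name="invalid"></branch>'
    '</doc>'
)

first = root.find('branch')                    # first matching direct child, or None
with_hash = root.findall('branch[@hash]')      # direct children that have a hash attribute
sub_text = root.findtext('branch/sub-branch')  # text of the first match
nested = root.findall('.//sub-branch')         # matches at any depth below root

print(first.get('name'))                    # codingpy.com
print([b.get('name') for b in with_hash])   # ['codingpy.com', 'release01']
print(sub_text)                             # xml,sgml
print(len(nested))                          # 1
```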
Creating, deleting, and modifying
Creating an XML document

a = ET.Element('elem')
c = ET.SubElement(a, 'child1')
c.text = "some text"
d = ET.SubElement(a, 'child2')
b = ET.Element('elem_b')
root = ET.Element('root')
root.extend((a, b))  # equivalent to root.append(a); root.append(b)
tree = ET.ElementTree(root)
ET.dump(tree)
'''
<root><elem><child1>some text</child1><child2 /></elem><elem_b /></root>
'''
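Deleting nodes
Modifying nodes

# `tree` here is the tree loaded from test.xml earlier
root = tree.getroot()
# Delete the third child (the "invalid" branch)
del root[2]
# Add an attribute to the first child
root[0].set('foo', 'bar')
for subelem in root:
    print(subelem.tag, subelem.attrib)
'''
branch {'foo': 'bar', 'hash': '1cdf045c', 'name': 'codingpy.com'}
branch {'hash': 'f200013e', 'name': 'release01'}
'''

ET.dump(root)
# Note that the attribute order in this output differs from the original document.
# In the Python version that produced it, ET kept attributes in a plain (unordered) dict
# and wrote them out sorted by name; since Python 3.8 the order in which attributes
# were specified is preserved. XML itself does not care about attribute order.
'''
<doc>
<branch foo="bar" hash="1cdf045c" name="codingpy.com">source</branch>
<branch hash="f200013e" name="release01">
<sub-branch name="subrelease01">xml,sgml</sub-branch>
</branch>
</doc>
'''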
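Deleting by index works when the position is known. When nodes should be removed based on a condition, Element.remove can be combined with findall. This is a minimal sketch on a shortened copy of the sample document; the condition (branches without a hash attribute) is just an assumption for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<doc>'
    '<branch name="codingpy.com" hash="1cdf045c">source</branch>'
    '<branch name="release01" hash="f200013e"/>'
    '<branch name="invalid"/>'
    '</doc>'
)

# findall returns a list, so removing children inside the loop is safe;
# iterating over `root` directly while removing can skip elements.
for branch in root.findall('branch'):
    if 'hash' not in branch.attrib:
        root.remove(branch)

remaining = [b.get('name') for b in root]
print(remaining)  # ['codingpy.com', 'release01']
```

Note that remove only works on direct children of the element it is called on.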
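Deleting, modifying, and querying

import xml.etree.ElementTree as ET

"""
ElementTree.write()           write the built XML document to a file (or update the file)
Element.set(key, value)       add or modify an attribute
Element.text = ''             change the text content directly
Element.remove(Element)       remove a child Element
Element.append(Element)       add a child to the current Element
ET.SubElement(Element, tag)   create a child node
"""

# Add automatic indentation and line breaks
def indent(elem, level=0):
    i = "\n" + level * "  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for child in elem:
            indent(child, level + 1)
        # The last child's tail puts the closing tag back at the parent's level
        if not child.tail or not child.tail.strip():
            child.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i

# ------------ Create a new XML document ------------
# Create the root node
a = ET.Element("student")
# Create a child node and add an attribute
b = ET.SubElement(a, "name")
b.attrib = {"NO.": "001"}
# Add data
b.text = "张三"
# Create the ElementTree object and write the file
indent(a, 0)
tree = ET.ElementTree(a)
tree.write("writeXml.xml", encoding="utf-8")

# ------------ Edit the XML ------------
# Read the file to be modified
updateTree = ET.parse("writeXml.xml")
root = updateTree.getroot()

# -- Add --
# Create a new node and append it as a child of root
newnode = ET.Element("name")
newnode.attrib = {"NO.": "003"}
newnode.text = "张三水"
root.append(newnode)

# -- Modify --
# root now has two "name" children; index 1 is the node just appended
# (the original index 2 would raise IndexError)
sub1 = root.findall("name")[1]
# Modify the node's attribute
sub1.set("NO.", "100")
# Modify the node's text
sub1.text = "陈真"

# -- Delete --
# Delete the text inside the tag
sub1.text = ""
# Delete the tag's attribute
del sub1.attrib["NO."]
# Delete a node
root.remove(sub1)

# Write back to the original file
indent(root, 0)
updateTree.write("writeXml.xml", encoding="utf-8", xml_declaration=True)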
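On Python 3.9 and later, the standard library ships ET.indent, which can stand in for a hand-written indent helper like the one above. The following is a minimal sketch under that version assumption; the variable names are chosen for this example, and on older versions the element is simply left unindented.

```python
import sys
import xml.etree.ElementTree as ET

student = ET.Element("student")
name = ET.SubElement(student, "name", {"NO.": "001"})
name.text = "张三"

# ET.indent was added in Python 3.9; skip it on older interpreters.
if sys.version_info >= (3, 9):
    ET.indent(student, space="  ")

# encoding="unicode" makes tostring return a str instead of bytes.
xml_text = ET.tostring(student, encoding="unicode")
print(xml_text)
```

ET.indent modifies the element in place, just like the helper, so it should be called right before writing the tree out.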
