Main content
| Object | python-docx class | oxml element class |
| --- | --- | --- |
| Paragraph | docx.text.paragraph.Paragraph | docx.oxml.text.paragraph.CT_P |
| Run | docx.text.run.Run | docx.oxml.text.run.CT_R |
- Document
- Paragraph
- Run and Font
- Text
- Element
- related_parts
Parsing
Paragraph
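- doc.paragraphs does not include tables, so text inside table cells is not returned.
- paragraph.text only collects the text of w:r elements that are direct children of w:p, so text in specially formatted content (runs nested inside another element) is not returned.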
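The difference between direct-child runs and nested runs can be seen on a bare XML fragment. The sketch below uses only the standard library's ElementTree, not python-docx, and the w:p fragment is hand-written for illustration; w:hyperlink is just one example of an element that can wrap a run.

```python
import xml.etree.ElementTree as ET

W = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
NS = {'w': W}

# Hand-written w:p fragment (illustrative): one direct run and one run
# nested inside a w:hyperlink element.
fragment = (
    f'<w:p xmlns:w="{W}">'
    '<w:r><w:t>See </w:t></w:r>'
    '<w:hyperlink><w:r><w:t>the docs</w:t></w:r></w:hyperlink>'
    '</w:p>'
)
p = ET.fromstring(fragment)

# Text from w:r elements that are direct children of w:p only.
direct_text = ''.join(t.text or '' for t in p.findall('w:r/w:t', NS))
# Text from every w:t descendant, however deeply nested.
all_text = ''.join(t.text or '' for t in p.findall('.//w:t', NS))

print(repr(direct_text))
print(repr(all_text))
```

The first query misses the hyperlink text, and the descendant query picks it up.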
paragraph._element is a docx.oxml.text.paragraph.CT_P instance, so xml, xpath, getchildren, and similar methods are available on it.
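    import docx

    doc = docx.Document('test.docx')
    for paragraph in doc.paragraphs:
        # Paragraph style ID, e.g. '1' for a level-1 heading, 'a' for normal text
        print(paragraph.style.style_id)
        # Raw XML of the underlying w:p element
        print(paragraph._element.xml)
        print(paragraph.text)
        # Joining every w:r descendant picks up all of the text
        print(''.join(r.text for r in paragraph._p.xpath('.//w:r')))
        # Roughly the same, using the w:t elements (their text may be None)
        print(''.join(t.text or '' for t in paragraph._p.xpath('.//w:t')))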
Run and Font
    for run in paragraph.runs:
        print(run.text)
        print(run._element.xml)  # raw XML of the underlying w:r element
        print(run.font.bold)     # True, False, or None when inherited
Images
Hyperlinks
Styles