Main content
| Object | Proxy class | XML element class |
| --- | --- | --- |
| Paragraph | docx.text.paragraph.Paragraph | docx.oxml.text.paragraph.CT_P |
| Run | docx.text.run.Run | docx.oxml.text.run.CT_R |
- Document
- Paragraph
- Run and Font
- Text
- Element
- related_parts
Parsing
Paragraph
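- paragraphs does not include tables.
- paragraph.text only collects text from w:r elements that are direct children of w:p, so text held in specially formatted content (runs nested inside other elements) is not returned.
paragraph._element is a docx.oxml.text.paragraph.CT_P instance, so its xml property and methods such as xpath and getchildren can be used.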
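import docx

doc = docx.Document('test.docx')
for paragraph in doc.paragraphs:
    # Paragraph style ID, e.g. '1' for a level-1 heading, 'a' for normal text
    print(paragraph.style.style_id)
    print(paragraph._element.xml)  # raw XML of the w:p element
    print(paragraph.text)
    # Collect text from every w:r descendant, including nested runs
    print(''.join(r.text for r in paragraph._p.xpath('.//w:r')))
    # Equivalent for plain text: read the w:t nodes directly
    print(''.join(t.text or '' for t in paragraph._p.xpath('.//w:t')))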
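Why paragraph.text can miss text is easier to see on the raw XML. The sketch below uses only the standard library's xml.etree.ElementTree (not python-docx) on a hand-written w:p fragment, so the sample XML and the variable names are assumptions for illustration. It compares text from runs that are direct children of w:p with text from all w:t descendants, where one run is nested inside a w:hyperlink.

```python
import xml.etree.ElementTree as ET

W = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
NS = {'w': W}

# Hand-written sample paragraph: one plain run, one run nested in a hyperlink
SAMPLE = f'''
<w:p xmlns:w="{W}">
  <w:r><w:t>See </w:t></w:r>
  <w:hyperlink>
    <w:r><w:t>the docs</w:t></w:r>
  </w:hyperlink>
</w:p>
'''.strip()

p = ET.fromstring(SAMPLE)

# Only runs that are direct children of w:p
direct_text = ''.join(t.text or '' for t in p.findall('w:r/w:t', NS))

# Every w:t anywhere below w:p, including the nested hyperlink run
all_text = ''.join(t.text or '' for t in p.findall('.//w:t', NS))

print(repr(direct_text))  # 'See '
print(repr(all_text))     # 'See the docs'
```

The descendant search here plays the same role as the './/w:t' xpath in the python-docx loop.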
Run and Font
for run in paragraph.runs:
    print(run.text)
    print(run._element.xml)  # raw XML of the w:r element
    print(run.font.bold)  # True, False, or None (inherited)
Images
Hyperlinks
Styles