How can Python automatically pick out repeated characters, examples, figures and tables, and red or bold text in a Word document?

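### Processing Word documents with Python

All three tasks can be handled with the `python-docx` library. Each one is covered below.

#### 1. Detect and mark consecutive repeated characters

After reading the document, a regular expression can match two identical adjacent Chinese characters (for example "的的"). A note is then inserted right after each match[^1].

```python
import re

from docx import Document

# Two identical CJK characters in a row, e.g. "的的"
REPEAT_PATTERN = re.compile(r'([\u4e00-\u9fff])\1')
NOTE = '(Note: repeated character)'


def mark_repeated_characters(doc_path, output_path):
    document = Document(doc_path)
    for paragraph in document.paragraphs:
        text = paragraph.text
        # Append the note after every match, scanning left to right
        new_text = REPEAT_PATTERN.sub(lambda m: m.group(0) + NOTE, text)
        if new_text != text:
            # Assigning paragraph.text replaces all runs, so run-level
            # formatting (bold, colour, ...) in this paragraph is lost
            paragraph.text = new_text
    document.save(output_path)
```

`document.paragraphs` only covers body paragraphs. Text inside tables is reached through `document.tables` and each cell's `paragraphs`.

#### 2. Extract lists of examples, figures and tables

Tables are available directly as `document.tables`. Embedded images can be found among the image relationships of the main document part. Examples have no dedicated object type, so they have to be filtered by a formatting convention or keyword. The code below assumes each example paragraph starts with "Example:". For a Chinese document, adjust the pattern to the prefix actually used, such as "例题".

```python
import re

from docx.opc.constants import RELATIONSHIP_TYPE as RT

# Assumption: example paragraphs start with "Example:" (case-insensitive)
EXAMPLE_PREFIX = re.compile(r'^\s*example\s*:\s*', re.IGNORECASE)


def extract_elements(document):
    """document is a docx.Document that has already been opened."""
    examples = []
    for para in document.paragraphs:
        if EXAMPLE_PREFIX.match(para.text):
            examples.append(EXAMPLE_PREFIX.sub('', para.text, count=1).strip())

    # Keep only embedded images from the main part's relationships,
    # e.g. "image1.png"; an image reused several times is listed once
    images = [rel.target_ref.split('/')[-1]
              for rel in document.part.rels.values()
              if rel.reltype == RT.IMAGE and not rel.is_external]

    tables_list = []
    for table in document.tables:
        rows = [[cell.text for cell in row.cells] for row in table.rows]
        # Assumption: the first cell of the first row serves as the title
        title = rows[0][0].strip() if rows and rows[0] else ''
        tables_list.append({'title': title, 'content': rows})

    return {'examples': examples, 'images': images, 'tables': tables_list}
```

#### 3. Find and highlight all red and bold text

To locate text with special formatting, iterate over the runs of each paragraph. Check whether the run's font colour is red (`python-docx` reports it as `RGBColor(0xFF, 0x00, 0x00)`, whose string form is `FF0000` with no `#`) or whether its `bold` property is `True`. Each matching run is then highlighted in yellow in place.

```python
from docx import Document
from docx.enum.text import WD_COLOR_INDEX
from docx.shared import RGBColor

RED = RGBColor(0xFF, 0x00, 0x00)


def highlight_special_texts(doc_path, output_path):
    document = Document(doc_path)
    for paragraph in document.paragraphs:
        for run in paragraph.runs:
            # rgb is None when the run has no explicit colour
            is_red = run.font.color.rgb == RED
            # run.bold is None when bold comes from a style, not the run itself
            is_bold = bool(run.bold)
            if is_red or is_bold:
                # Highlight the existing run instead of appending a copy
                run.font.highlight_color = WD_COLOR_INDEX.YELLOW
    document.save(output_path)
```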

Disclosure: parts of this article were generated with AI assistance (AIGC) and are for reference only.
