# Hunyuan-OCR-WEBUI Automation: Batch Processing with Python Scripts

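## 1. Background and Value

Have you ever had a large pile of images to extract text from? You upload them one at a time to an OCR website and copy and paste each result by hand, which is tedious and slow. With hundreds or even thousands of images, just thinking about it is a headache.

Tencent's HunyuanOCR is a powerful text recognition tool, but processing images one by one through a web page is clearly not efficient. That is why we automate: a Python script processes the whole batch and lets the computer handle the repetitive work.

Picture this: a folder holds 1,000 product-label images, and all of their text has to be extracted and entered into a system. By hand that could take a full day. With an automation script it may take about as long as a cup of coffee.

## 2. Environment Setup and Quick Deployment

### 2.1 Basic Requirements

Before you start, make sure your environment meets these requirements:

- Operating system: Linux (Ubuntu 20.04+ recommended) or Windows
- Python: 3.8 or later
- GPU: NVIDIA GPU (8 GB or more of VRAM recommended)
- Network: the machine running the scripts can reach the deployed Hunyuan-OCR service

### 2.2 Quick Deployment of Hunyuan-OCR

If you have not deployed Hunyuan-OCR yet, you can set it up as follows:

```bash
# Clone the project repository
git clone https://github.com/Tencent/HunyuanOCR.git
cd HunyuanOCR

# Install dependencies (a conda environment is recommended)
conda create -n hunyuan-ocr python=3.9
conda activate hunyuan-ocr
pip install -r requirements.txt

# Start the WebUI service (on port 7860)
python app.py --port 7860
```

Check the repository's README for the exact entry point and flags, because they may differ between releases.

You can also use the official launch scripts:

```bash
# Web UI inference mode
bash 1-界面推理-pt.sh

# Or API mode
bash 2-API接口-pt.sh
```

Once the service is running, open `http://localhost:7860` in a browser to use the web interface. The batch scripts in this article do not use that page. They call the HTTP API started in API mode, and they assume it is reachable at `http://localhost:8000/ocr` and accepts the request format shown in section 3.2. Those details are illustrative, so check the port, path, and request/response fields against the API your deployment actually exposes.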
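Before you start a long batch run, it is worth confirming that the service is actually listening. Below is a minimal pre-flight check that uses only the standard library. `service_reachable` is a hypothetical helper, not part of HunyuanOCR. It opens a TCP connection to the host and port in the URL and nothing more, so a `True` result does not prove that the OCR endpoint or its request format is correct.

```python
import socket
from urllib.parse import urlparse


def service_reachable(url, timeout=3.0):
    """Return True if a TCP connection to the host and port in `url` can be opened.

    This only shows that something is listening on that port; it does not
    check the OCR endpoint path or its request format.
    """
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    # Fall back to the scheme's default port when the URL has none
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, DNS failure, ...
        return False
```

For example, call `service_reachable("http://localhost:8000/ocr")` with the endpoint the scripts below assume, and stop early if it returns `False`. That is cheaper than letting a thousand requests each fail after a timeout.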
## 3. Developing the Python Automation Scripts

### 3.1 Installing the Required Python Libraries

First, install the libraries we need:

```bash
pip install requests pillow opencv-python numpy
```

What each library is for:

- `requests`: sends HTTP requests to the OCR API
- `pillow` (PIL): handles image files
- `opencv-python`: image preprocessing
- `numpy`: numerical computing support (OpenCV represents images as NumPy arrays)

### 3.2 A Basic OCR Call

Let's start with a basic function that calls the Hunyuan-OCR API:

```python
import base64

import requests


def ocr_single_image(image_path, api_url="http://localhost:8000/ocr"):
    """
    Run OCR on a single image.

    Args:
        image_path: path to the image file
        api_url: URL of the OCR API

    Returns:
        The parsed JSON response, or None on failure.
    """
    try:
        # Read the image and encode it as base64
        with open(image_path, "rb") as image_file:
            encoded_image = base64.b64encode(image_file.read()).decode("utf-8")

        # Build the request body (format assumed by this article; adapt to your API)
        payload = {
            "image": encoded_image,
            "lang": "ch",  # Chinese; change to "en" for English or another supported language
        }

        # Send the POST request
        response = requests.post(api_url, json=payload, timeout=30)

        if response.status_code == 200:
            return response.json()
        print(f"Request failed with status code {response.status_code}")
        return None

    except (OSError, requests.RequestException, ValueError) as e:
        # File errors, network errors, or a response body that is not valid JSON
        print(f"Error while processing image {image_path}: {e}")
        return None
```

### 3.3 The Batch Processing Script

Now for the complete batch processing script:

```python
import glob
import json
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

# Assumes ocr_single_image() from section 3.2 is defined in the same module


class BatchOCRProcessor:
    def __init__(self, api_url="http://localhost:8000/ocr", output_dir="ocr_results"):
        self.api_url = api_url
        self.output_dir = output_dir
        os.makedirs(output_dir, exist_ok=True)

    def process_single_image(self, image_path):
        """Process one image, save its results, and return (success, image_path)."""
        result = ocr_single_image(image_path, self.api_url)

        if result and "text" in result:
            # File name without extension (note: a.jpg and a.png would share outputs)
            base_name = os.path.splitext(os.path.basename(image_path))[0]

            # Save the recognized text
            output_file = os.path.join(self.output_dir, f"{base_name}.txt")
            with open(output_file, "w", encoding="utf-8") as f:
                f.write(result["text"])

            # Also save the full JSON response (optional)
            json_file = os.path.join(self.output_dir, f"{base_name}.json")
            with open(json_file, "w", encoding="utf-8") as f:
                json.dump(result, f, ensure_ascii=False, indent=2)

            return True, image_path
        return False, image_path

    def process_batch(self, image_folder,
                      file_patterns=("*.jpg", "*.png", "*.jpeg"), max_workers=4):
        """Process every matching image in a folder."""
        # Collect image files; the set drops duplicates if patterns overlap
        # (e.g. "*.jpg" and "*.JPG" on a case-insensitive file system)
        image_files = set()
        for pattern in file_patterns:
            image_files.update(glob.glob(os.path.join(image_folder, pattern)))
        image_files = sorted(image_files)

        print(f"Found {len(image_files)} images to process")

        # Process the images in parallel with a thread pool
        success_count = 0
        failed_count = 0
        failed_files = []

        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            # Submit all tasks
            future_to_file = {
                executor.submit(self.process_single_image, file): file
                for file in image_files
            }

            # Handle tasks as they complete
            for future in as_completed(future_to_file):
                file = future_to_file[future]
                try:
                    success, processed_file = future.result()
                    if success:
                        success_count += 1
                        print(f"✓ Done: {processed_file}")
                    else:
                        failed_count += 1
                        failed_files.append(processed_file)
                        print(f"✗ Failed: {processed_file}")
                except Exception as e:
                    failed_count += 1
                    failed_files.append(file)
                    print(f"✗ Exception: {file}, error: {e}")

        # Print a summary
        print(f"\nFinished! Succeeded: {success_count}, failed: {failed_count}")
        if failed_files:
            print("Failed files:")
            for file in failed_files:
                print(f"  - {file}")

        return success_count, failed_count, failed_files


# Usage example
if __name__ == "__main__":
    processor = BatchOCRProcessor()
    processor.process_batch("path/to/your/images", max_workers=4)
```
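To try the batch script without a GPU or a running HunyuanOCR deployment, you can point it at a local stand-in. The sketch below is a hypothetical stub built on the standard library's `http.server`. It accepts the request shape this article assumes (`POST /ocr` with `{"image": <base64>, "lang": ...}`) and replies with a placeholder `{"text": ...}`. It performs no OCR and does not reproduce the real HunyuanOCR API. It only exercises the client-side file handling and concurrency.

```python
import base64
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class StubOCRHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the OCR API. Accepts POST /ocr with
    {"image": <base64>, "lang": ...} and returns {"text": ...}; runs no OCR."""

    def do_POST(self):
        if self.path != "/ocr":
            self.send_error(404)
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            image_bytes = base64.b64decode(payload["image"], validate=True)
        except (KeyError, TypeError, ValueError):
            self.send_error(400, "expected JSON with a base64 'image' field")
            return

        body = json.dumps({
            "text": f"stub result for {len(image_bytes)} bytes",
            "lang": payload.get("lang", "ch"),
        }).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_stub_server(port=8000):
    """Serve StubOCRHandler on 127.0.0.1 in a background thread; port=0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), StubOCRHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, run `server = start_stub_server(8000)` and then `BatchOCRProcessor().process_batch(...)` in the same process. Every image should come back with placeholder text. Once that works, switch `api_url` to the real service and call `server.shutdown()`. `ThreadingHTTPServer` handles each request on its own thread, so the stub can absorb the concurrent requests that `max_workers` produces.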
## 4. Advanced Features and Optimization

### 4.1 Image Preprocessing to Improve Recognition

When the source images are of poor quality, preprocessing them first can help. Compare results with and without preprocessing on a sample of your own images, because binarization does not improve every image.

```python
import os
import tempfile

import cv2


def preprocess_image(image_path, output_path=None):
    """
    Preprocess an image for OCR: grayscale, adaptive threshold, denoise.
    """
    # Read the image (cv2.imread returns None instead of raising on failure)
    img = cv2.imread(image_path)
    if img is None:
        raise ValueError(f"Cannot read image: {image_path}")

    # Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Apply adaptive thresholding
    processed = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY, 11, 2
    )

    # Optional: denoise
    processed = cv2.medianBlur(processed, 3)

    if output_path:
        cv2.imwrite(output_path, processed)

    return processed


# OCR function with an optional preprocessing step
def ocr_with_preprocess(image_path, api_url, preprocess=True):
    if not preprocess:
        return ocr_single_image(image_path, api_url)

    # A unique temporary file keeps parallel threads from overwriting each other;
    # PNG avoids JPEG compression artifacts on the binarized image
    fd, temp_path = tempfile.mkstemp(suffix=".png")
    os.close(fd)
    try:
        preprocess_image(image_path, temp_path)
        return ocr_single_image(temp_path, api_url)
    finally:
        os.remove(temp_path)  # always clean up the temporary file
```

### 4.2 Multiple Output Formats

Besides plain text files, we can also generate more structured output:

```python
import csv
import glob
import html
import os
from datetime import datetime


def generate_csv_report(results_dir, output_csv="ocr_report.csv"):
    """Generate an OCR report in CSV format."""
    csv_file = os.path.join(results_dir, output_csv)

    # utf-8-sig writes a BOM so Excel recognizes the file as UTF-8 (Chinese text stays readable)
    with open(csv_file, "w", newline="", encoding="utf-8-sig") as file:
        writer = csv.writer(file)
        writer.writerow(["File name", "Recognized text", "Processed at", "Character count"])

        # Iterate over all .txt result files
        for txt_file in sorted(glob.glob(os.path.join(results_dir, "*.txt"))):
            with open(txt_file, "r", encoding="utf-8") as f:
                text = f.read()

            base_name = os.path.splitext(os.path.basename(txt_file))[0]
            # Use the result file's modification time as the processing time
            process_time = datetime.fromtimestamp(
                os.path.getmtime(txt_file)).strftime("%Y-%m-%d %H:%M:%S")
            writer.writerow([base_name, text, process_time, len(text)])

    print(f"CSV report written to: {csv_file}")


def generate_html_report(results_dir, output_html="ocr_report.html"):
    """Generate an OCR report in HTML format."""
    html_file = os.path.join(results_dir, output_html)
    generated_at = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    with open(html_file, "w", encoding="utf-8") as f:
        f.write("""<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>OCR Report</title>
    <style>
        body { font-family: Arial, sans-serif; margin: 40px; }
        .item { margin-bottom: 20px; border-bottom: 1px solid #eee; padding-bottom: 20px; }
        .filename { font-weight: bold; color: #333; }
        .text { margin-top: 10px; padding: 10px; background: #f5f5f5;
                border-radius: 5px; white-space: pre-wrap; }
    </style>
</head>
<body>
    <h1>OCR Batch Processing Report</h1>
    <p>Generated at: """ + generated_at + """</p>
""")

        # One block per result file; escape the text so "<" or "&" in OCR output
        # cannot break the page
        for txt_file in sorted(glob.glob(os.path.join(results_dir, "*.txt"))):
            with open(txt_file, "r", encoding="utf-8") as tf:
                text = tf.read()
            base_name = os.path.splitext(os.path.basename(txt_file))[0]
            f.write(f"""    <div class="item">
        <div class="filename">{html.escape(base_name)}</div>
        <div class="text">{html.escape(text)}</div>
    </div>
""")

        f.write("""</body>
</html>""")

    print(f"HTML report written to: {html_file}")
```

## 5. Practical Use Cases

### 5.1 Case 1: Batch-Processing Scanned Documents

Suppose you have a batch of scanned PDF documents that have already been converted to images, and you need to extract all of their text:

```python
import os


def process_scanned_documents(pdf_image_folder, output_base_dir="scanned_docs_ocr"):
    """
    Process scanned documents, one input subfolder per document type.
    """
    # Input subfolder names (keys) and the output folders they map to (values)
    doc_types = {
        "contract": "contracts",
        "report": "reports",
        "invoice": "invoices_receipts",
    }

    for doc_type, doc_name in doc_types.items():
        type_folder = os.path.join(pdf_image_folder, doc_type)
        if os.path.isdir(type_folder):
            output_dir = os.path.join(output_base_dir, doc_name)
            processor = BatchOCRProcessor(output_dir=output_dir)
            processor.process_batch(type_folder)

            # Generate a report for this document type
            generate_csv_report(output_dir, f"{doc_name}_ocr_report.csv")


# Usage example
process_scanned_documents("scanned_documents")
```

### 5.2 Case 2: Extracting Text from Social Media Images

Social media images may contain several languages and unusual layouts:

```python
def process_social_media_images(image_folder, output_dir="social_media_ocr"):
    """Process a folder of social media images."""
    # The recognition language is set by the "lang" field in ocr_single_image();
    # change it there if your service supports the languages you need
    processor = BatchOCRProcessor(output_dir=output_dir)

    success_count, failed_count, failed_files = processor.process_batch(
        image_folder,
        # Keep *.gif only if your OCR service accepts GIF input
        file_patterns=("*.jpg", "*.png", "*.jpeg", "*.gif"),
        max_workers=3,  # social media images can be large, so send fewer concurrent requests
    )

    # Generate a detailed report
    generate_html_report(output_dir, "social_media_report.html")

    return success_count, failed_count
```
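Back to the scanned-document case in 5.1: when each PDF page becomes its own image, `BatchOCRProcessor` writes one `.txt` file per page, and you may want them rejoined in page order. A plain `sorted()` puts `page_10` before `page_2`. The sketch below merges the pages using a natural-sort key. It assumes a hypothetical naming convention in which page numbers appear as digits in the file names (for example `page_1.png`, `page_2.png`, ...). `natural_key` and `merge_page_texts` are illustrative helpers, not part of any library.

```python
import glob
import os
import re


def natural_key(path):
    """Sort key that compares digit runs numerically, so 'page_2' sorts before 'page_10'."""
    parts = re.split(r"(\d+)", os.path.basename(path))
    # re.split with a capturing group puts the digit runs at odd indices
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


def merge_page_texts(results_dir, output_file, separator="\n\n"):
    """Join the per-page .txt results in results_dir into one file, in page order.

    Returns the list of page files in the order they were merged.
    """
    pages = sorted(glob.glob(os.path.join(results_dir, "*.txt")), key=natural_key)
    texts = []
    for page in pages:
        with open(page, "r", encoding="utf-8") as f:
            texts.append(f.read().strip())
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(separator.join(texts))
    return pages
```

Write the merged file outside `results_dir`, or give it a non-`.txt` extension. Otherwise a rerun will pick it up as one more "page".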
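## 6. Common Issues and Solutions

### 6.1 Performance Tips

When processing a large number of images, consider the following:

1. **Tune the concurrency**: adjust the `max_workers` parameter to match your hardware.
   - CPU-bound work: set it to roughly the number of CPU cores
   - I/O-bound work: it can be 2-3 times the number of CPU cores

   The batch script itself is I/O-bound because its threads mostly wait for HTTP responses. In practice, the limit is how many concurrent requests the OCR service and its GPU can handle. Raise `max_workers` gradually while watching response times and GPU memory.

2. **Control the batch size**: for very large numbers of files, process them in batches:

```python
from concurrent.futures import ThreadPoolExecutor


def process_in_batches(processor, image_files, batch_size=100, max_workers=4):
    """Process a large list of images in fixed-size batches; returns the success count."""
    success_count = 0
    for i in range(0, len(image_files), batch_size):
        batch = image_files[i:i + batch_size]
        print(f"Processing batch {i // batch_size + 1}: {len(batch)} images")
        # Only the current batch's tasks are pending at any time
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            results = list(executor.map(processor.process_single_image, batch))
        success_count += sum(1 for ok, _ in results if ok)
    return success_count
```

3. **Memory management**: write each result to disk as soon as it is ready, as `process_single_image` does, instead of keeping all results in memory. Release references to large objects you no longer need.

### 6.2 Error Handling and Retries

Network requests can fail, and a retry mechanism makes the script more robust:

```python
import time


def ocr_with_retry(image_path, api_url, max_retries=3):
    """OCR with retries and exponential backoff."""
    for attempt in range(max_retries):
        try:
            result = ocr_single_image(image_path, api_url)
            if result:
                return result
            print(f"Attempt {attempt + 1} returned no result")
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")

        # ocr_single_image() returns None on errors, so back off on both paths
        if attempt < max_retries - 1:
            wait_time = 2 ** attempt  # exponential backoff: 1 s, 2 s, 4 s, ...
            print(f"Retrying in {wait_time} s...")
            time.sleep(wait_time)

    return None
```

## 7. Summary

The Python automation approach in this article combines Hunyuan-OCR-WEBUI's recognition capabilities with the efficiency of batch processing. Whether you need to process hundreds or thousands of scanned documents or extract text from social media images, these scripts can save you a great deal of time and effort.

**Key advantages**:

- **Efficient batch processing**: handles an entire folder of images without manual intervention
- **Flexible output formats**: plain text, JSON, CSV, and HTML
- **Error handling**: retries with exponential backoff, plus a list of failed files at the end of each run
- **Easy to extend**: modular functions that you can build on as needed

**Next steps**:

1. Adjust the script parameters to your own needs
2. Try additional image preprocessing methods to improve recognition accuracy
3. Integrate the OCR results directly into your business systems
4. Consider adding a progress bar and more detailed statistics

Automation saves time, reduces human error, and improves efficiency. Start freeing your hands with Python scripts today!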

Creation statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.
