DeepSeek releases an open-source OCR model: redefining document recognition efficiency

👉 Project website: https://www.python-office.com/ 👈

Hi everyone, I'm 程序员晚枫, and I'm currently going all in on hands-on AI programming.

1. DeepSeek-OCR released

On October 20, the DeepSeek AI team officially released DeepSeek-OCR, a new multimodal model.

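With only 3 billion parameters, the model uses contextual optical compression to represent text compactly as vision tokens. It retains about 97% recognition accuracy while compressing text into roughly one-tenth as many tokens, which is where the compute savings come from. A single A100-40G GPU can process more than 200,000 pages per day, pushing well past the performance limits of traditional OCR tools.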
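To put these figures in perspective, here is a small arithmetic sketch. It converts the quoted 200,000 pages per day into pages per second and shows how a compression ratio would be computed from token counts. The token counts in the example are made-up illustrative numbers, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def pages_per_second(pages_per_day: int) -> float:
    """Average sustained throughput implied by a daily page count."""
    return pages_per_day / SECONDS_PER_DAY


def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """How many text tokens each vision token stands in for."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


if __name__ == "__main__":
    # 200,000 pages per day works out to a little over 2 pages per second
    print(f"{pages_per_second(200_000):.2f} pages/s")
    # Hypothetical page: 1000 text tokens represented by 100 vision tokens
    print(compression_ratio(1000, 100))
```

In other words, the daily figure corresponds to a bit more than two pages every second, sustained around the clock on one card.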

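The model comes in five size configurations: Tiny, Small, Base, Large, and Gundam. The Gundam configuration targets very high-resolution documents. It mixes a 1024-pixel base size with 640-pixel crops (base_size=1024, image_size=640, crop_mode=True in the code example), which suits multi-column layouts and pages that mix text and figures.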
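One way to keep these configurations straight is a small lookup table that turns a mode name into keyword arguments for model.infer. This is only a sketch: the Gundam values match the code example in this article, but the Tiny, Small, Base, and Large values are my assumption of the presets and should be checked against the project README. ModeConfig, MODES, and infer_kwargs are names invented here.

```python
from typing import NamedTuple


class ModeConfig(NamedTuple):
    base_size: int
    image_size: int
    crop_mode: bool


# Assumed preset values; verify against the project README before relying on them.
MODES = {
    "tiny": ModeConfig(512, 512, False),
    "small": ModeConfig(640, 640, False),
    "base": ModeConfig(1024, 1024, False),
    "large": ModeConfig(1280, 1280, False),
    "gundam": ModeConfig(1024, 640, True),  # matches this article's code example
}


def infer_kwargs(mode: str) -> dict:
    """Return keyword arguments for model.infer for a named mode."""
    try:
        cfg = MODES[mode.lower()]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    return dict(cfg._asdict())
```

With this in place, a call could look like model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, **infer_kwargs("gundam")).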

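All output is natively in Markdown. Combined with the built-in bounding-box detection, the model can pinpoint where text blocks, tables, and figures sit in the original image, which addresses a long-standing weakness of traditional OCR: it recognizes text but not layout.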
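To make the bounding-box idea concrete, here is a minimal sketch that pulls labeled regions out of grounded output and converts them to pixel coordinates. It assumes the output marks each region as <|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|> with coordinates on a 0 to 999 normalized grid. Both the tag layout and the grid size are assumptions to verify against real model output, and parse_regions is a hypothetical helper, not part of the model's API.

```python
import ast
import re

# Assumed tag layout; verify against real model output.
GROUNDING = re.compile(
    r"<\|ref\|>(.*?)<\|/ref\|><\|det\|>(.*?)<\|/det\|>", re.DOTALL
)


def parse_regions(text, width, height, grid=999):
    """Extract (label, pixel_box) pairs from grounded output text."""
    regions = []
    for label, raw_boxes in GROUNDING.findall(text):
        # raw_boxes is a list literal such as "[[100, 200, 999, 999]]"
        for x1, y1, x2, y2 in ast.literal_eval(raw_boxes):
            box = (
                round(x1 / grid * width),
                round(y1 / grid * height),
                round(x2 / grid * width),
                round(y2 / grid * height),
            )
            regions.append((label, box))
    return regions


sample = "<|ref|>table<|/ref|><|det|>[[100, 200, 999, 999]]<|/det|>"
print(parse_regions(sample, width=2000, height=1000))
```

The pixel boxes can then be used to crop tables or figures out of the original page image.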

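The model is fully open-sourced on GitHub and HuggingFace under the MIT license, which permits free commercial use. Developers can load it directly through the transformers library, and the project also provides helper tools such as PDF-to-image conversion and batch-processing scripts, so even users without a deep technical background can deploy it quickly.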
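Batch processing can be sketched in a few lines. The helper below is not the official batch script. It is a hypothetical wrapper that finds the images in a folder and hands each one to a callable you supply, giving every image its own output directory. The names find_images, run_batch, and ocr_one are invented for this illustration.

```python
from pathlib import Path
from typing import Callable

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_images(folder: str) -> list[Path]:
    """Return image files in `folder`, sorted by name for a stable order."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def run_batch(
    folder: str, output_root: str, ocr_one: Callable[[str, str], None]
) -> list[str]:
    """Call `ocr_one(image_file, output_dir)` once per image.

    Each image gets its own output directory so results do not
    overwrite each other.
    """
    done = []
    for image in find_images(folder):
        out_dir = Path(output_root) / image.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        ocr_one(str(image), str(out_dir))
        done.append(image.name)
    return done
```

Here ocr_one could be a small function that wraps the model.infer call from this article's code example, passing the two arguments as image_file and output_path.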

Code example

import os

import torch
from transformers import AutoModel, AutoTokenizer

# Make only the first GPU visible to this process
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
model_name = "deepseek-ai/DeepSeek-OCR"

# trust_remote_code=True runs the model code shipped in the model repository
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name, _attn_implementation="flash_attention_2",  # needs the flash-attn package
    trust_remote_code=True, use_safetensors=True,
)
model = model.eval().cuda().to(torch.bfloat16)

# Plain OCR without layout information:
# prompt = "<image>\nFree OCR. "
# Markdown output with grounding (bounding boxes):
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = "your_image.jpg"
output_path = "your/output/dir"

# base_size=1024 with image_size=640 and crop_mode=True is the Gundam-style setup
res = model.infer(
    tokenizer, prompt=prompt, image_file=image_file, output_path=output_path,
    base_size=1024, image_size=640, crop_mode=True,
    save_results=True, test_compress=True,
)

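2. poocr from scratch: batch invoice recognition in 3 lines of code

A professional-grade model like DeepSeek-OCR can be intimidating for users without a technical background.

With poocr, a tool that wraps the Tencent Cloud OCR API, anyone can batch-recognize invoices in just 3 lines of Python, and the free quota of 1,000 calls per month is plenty for personal office use.

Preparation: set up the environment in 3 minutes

First, install the poocr library from the Alibaba Cloud mirror: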

pip install -i https://mirrors.aliyun.com/pypi/simple/ poocr -U

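Register a Tencent Cloud account and enable the OCR service, then create a key on the API key management page to obtain your SecretId and SecretKey. Keep the key safe: a leaked key is a security risk.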
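One common way to reduce the risk of leaking a key is to keep it out of the source file and read it from environment variables instead. The sketch below shows the idea. The variable names TENCENTCLOUD_SECRET_ID and TENCENTCLOUD_SECRET_KEY are chosen for this illustration, and load_credentials is a hypothetical helper, not part of poocr.

```python
import os


def load_credentials(env=None):
    """Read the Tencent Cloud key pair from environment variables.

    The variable names are chosen for this sketch; use whatever
    names fit your own setup.
    """
    env = os.environ if env is None else env
    secret_id = env.get("TENCENTCLOUD_SECRET_ID")
    secret_key = env.get("TENCENTCLOUD_SECRET_KEY")
    if not secret_id or not secret_key:
        raise RuntimeError(
            "Set TENCENTCLOUD_SECRET_ID and TENCENTCLOUD_SECRET_KEY first"
        )
    return secret_id, secret_key
```

The returned pair can then be passed as the id and key arguments of the poocr call, so the script itself never contains the secret.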

Free sign-up: https://cloud.tencent.com/product/ocr

Core code: one call handles the whole batch

Create a Python file and enter the following code:

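import poocr

# Replace with your Tencent Cloud key
r_id = 'your SecretId'
r_key = 'your SecretKey'

# Recognize every invoice in the folder and export the results to Excel
poocr.ocr2excel.VatInvoiceOCR2Excel(
    input_path=r'C:\invoice_images',   # folder containing the invoice images
    output_path=r'C:\ocr_results',     # where the exported Excel file is saved
    id=r_id,
    key=r_key
)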
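Before running a large batch, it can help to check how many images are in the folder against the free quota of 1,000 calls per month mentioned above. The following is a rough sketch. It assumes each image costs one API call and that invoices are stored as .jpg, .jpeg, or .png files, and the function names are invented for this example.

```python
from pathlib import Path

FREE_CALLS_PER_MONTH = 1000  # free quota quoted in this article
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed invoice image types


def count_invoice_images(folder):
    """Count image files directly inside `folder` (subfolders are ignored)."""
    return sum(
        1 for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def calls_over_quota(n_images, used_this_month=0, quota=FREE_CALLS_PER_MONTH):
    """Calls that would exceed the free quota, assuming one call per image."""
    return max(0, used_this_month + n_images - quota)
```

For example, with 800 calls already used this month, a folder of 300 invoices would go 100 calls over the free quota.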