Extracting Text from Patent Figures with DeepSeek-OCR
This article explores 12 approaches to extracting text and reference numbers from patent figure sheets using the DeepSeek-OCR model. It highlights the challenges of dealing with rotated text, dense data screens, and scattered reference numbers.
Why it matters
Extracting text and data from patent figures is an important task for researchers and engineers, and this article provides insights into the challenges and potential solutions for improving OCR performance in this domain.
Key Points
- DeepSeek-OCR, a 3.3B-parameter vision model, was tested on 8 patent figure sheets from US11423567B2
- Upright flowcharts were detected perfectly, but other figures had issues with rotated text, small labels, and hallucinated detections on grid marks
- Binarization and Tesseract OSD for rotation detection did not improve results significantly
- Manually rotating the sheets to the correct 90-degree orientation greatly improved the number of accurate detections
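The binarization step mentioned above can be sketched with Pillow. This is a minimal illustration only: the article does not say which binarization method was tried, and the fixed threshold here is an assumption.

```python
from PIL import Image

def binarize(sheet: Image.Image, threshold: int = 180) -> Image.Image:
    """Threshold a patent sheet to pure black and white before OCR.

    The threshold value is a placeholder assumption; real pipelines
    often use an adaptive method (e.g. Otsu) instead of a constant.
    """
    gray = sheet.convert("L")  # grayscale first
    # map every pixel to 0 or 255, then to 1-bit mode
    return gray.point(lambda p: 255 if p > threshold else 0).convert("1")
```

As the article notes, this kind of preprocessing alone did not significantly improve DeepSeek-OCR's results on the test sheets.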
Details
The article discusses the challenges of using OCR on patent figure sheets, which often have text at multiple orientations, tiny reference numbers scattered among drawings, dense data screens with white text on dark backgrounds, and structural elements that can be mistaken for text.

The DeepSeek-OCR model, which has a grounding mode that returns bounding boxes alongside text, was tested on 8 sheets from a facial recognition depth mapping system patent. While the model performed well on clean, upright flowcharts, it struggled with other figure types, either missing small labels or hallucinating detections on grid marks. Binarization and Tesseract OSD for rotation detection did not significantly improve the results.

The key finding was that manually rotating the sheets to the correct 90-degree orientation greatly improved the number of accurate detections, highlighting the need for robust orientation handling in OCR systems for patent figures.
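The manual-rotation finding suggests an automated variant: try all four 90-degree orientations and keep the one that yields the most detections. The sketch below assumes a generic `ocr` callable returning a list of (box, text) detections; this interface is hypothetical and stands in for a DeepSeek-OCR grounding call, which the article does not show.

```python
from typing import Callable, List, Tuple
from PIL import Image

Detection = Tuple[tuple, str]  # (bounding box, text) -- assumed shape

def best_orientation(sheet: Image.Image,
                     ocr: Callable[[Image.Image], List[Detection]]) -> int:
    """Return the counter-clockwise rotation (0/90/180/270) that
    produces the most OCR detections on the sheet."""
    best_angle, best_count = 0, -1
    for angle in (0, 90, 180, 270):
        # expand=True grows the canvas so rotated content is not cropped
        rotated = sheet.rotate(angle, expand=True)
        count = len(ocr(rotated))
        if count > best_count:  # strict '>' keeps the first best angle
            best_angle, best_count = angle, count
    return best_angle
```

Detection count is a crude proxy for correctness (the article notes the model can hallucinate detections on grid marks), so a production version would likely weight detections by confidence or plausibility instead.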