Extracting Text from Patent Figures with DeepSeek-OCR
This article explores 12 approaches to extracting text and reference numbers from patent figure sheets using the DeepSeek-OCR model. It highlights the challenges of dealing with rotated text, dense data screens, and scattered reference numbers.
Why it matters
Extracting text and data from patent figures is an important task for researchers and engineers, and this article provides insights into the challenges and potential solutions for improving OCR performance in this domain.
Key Points
- DeepSeek-OCR, a 3.3B-parameter vision model, was tested on 8 patent figure sheets from US11423567B2
- Upright flowcharts were detected perfectly, but other figures had issues with rotated text, small labels, and hallucinated detections on grid marks
- Binarization and Tesseract OSD for rotation detection did not improve results significantly
- Manually rotating the sheets to the correct 90-degree orientation greatly improved the number of accurate detections
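The binarization step mentioned above can be sketched with Pillow. This is a minimal illustration only: the article does not say which binarization method was tried, and the fixed threshold here is an assumption.

```python
from PIL import Image

def binarize(sheet: Image.Image, threshold: int = 180) -> Image.Image:
    """Threshold a patent sheet to pure black and white before OCR.

    The threshold value is a placeholder assumption; real pipelines
    often use an adaptive method (e.g. Otsu) instead of a constant.
    """
    gray = sheet.convert("L")  # grayscale first
    # map every pixel to 0 or 255, then to 1-bit mode
    return gray.point(lambda p: 255 if p > threshold else 0).convert("1")
```

As the article notes, this kind of preprocessing alone did not significantly improve DeepSeek-OCR's results on the test sheets.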
Details
The article discusses the challenges of using OCR on patent figure sheets, which often have text at multiple orientations, tiny reference numbers scattered among drawings, dense data screens with white text on dark backgrounds, and structural elements that can be mistaken for text.

The DeepSeek-OCR model, which has a grounding mode that returns bounding boxes alongside text, was tested on 8 sheets from a facial recognition depth mapping system patent. While the model performed well on clean, upright flowcharts, it struggled with other figure types, either missing small labels or hallucinating detections on grid marks. Binarization and Tesseract OSD for rotation detection did not significantly improve the results.

The key finding was that manually rotating the sheets to the correct 90-degree orientation greatly improved the number of accurate detections, highlighting the need for robust orientation handling in OCR systems for patent figures.
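The manual-rotation finding suggests an automated variant: try all four 90-degree orientations and keep the one that yields the most detections. The sketch below assumes a generic `ocr` callable returning a list of (box, text) detections; this interface is hypothetical and stands in for a DeepSeek-OCR grounding call, which the article does not show.

```python
from typing import Callable, List, Tuple
from PIL import Image

Detection = Tuple[tuple, str]  # (bounding box, text) -- assumed shape

def best_orientation(sheet: Image.Image,
                     ocr: Callable[[Image.Image], List[Detection]]) -> int:
    """Return the counter-clockwise rotation (0/90/180/270) that
    produces the most OCR detections on the sheet."""
    best_angle, best_count = 0, -1
    for angle in (0, 90, 180, 270):
        # expand=True grows the canvas so rotated content is not cropped
        rotated = sheet.rotate(angle, expand=True)
        count = len(ocr(rotated))
        if count > best_count:  # strict '>' keeps the first best angle
            best_angle, best_count = angle, count
    return best_angle
```

Detection count is a crude proxy for correctness (the article notes the model can hallucinate detections on grid marks), so a production version would likely weight detections by confidence or plausibility instead.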