Dev.to Machine Learning | 3h ago | Research & Papers | Products & Services

Failing to Train DeBERTa to Detect Patent Antecedent Basis Errors

The article discusses the challenge of training an AI model to detect antecedent basis errors in patent claims, which are common issues that lead to patent rejections. The author fine-tuned DeBERTa-v3 on synthetic data but found it performed poorly on real USPTO examiner rejections.

Why it matters

Detecting antecedent basis errors is an important task for improving patent quality and reducing costly rejections. The failure of the AI model highlights the difficulty of bridging the gap between synthetic and real-world data in this domain.

Key Points

  • Antecedent basis errors are a common issue in patent claims, where a term is referenced without a proper introduction
  • The author used DeBERTa-v3 for a token classification task to detect these errors
  • Synthetic training data was generated by injecting errors into clean patent claims, but the model failed to generalize to real USPTO rejections
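
To make the synthetic-data idea concrete, here is a minimal sketch of injecting one error into a clean claim and producing token-level labels. It is an illustration only, not the author's pipeline: the function name `inject_error`, the whitespace tokenization, and the choice to corrupt the first introduction after the preamble are all assumptions.

```python
def inject_error(tokens):
    """Swap one introducing article ("a"/"an") for "the".

    Hypothetical helper, not the article's code. Returns the corrupted
    tokens plus 0/1 labels marking the article and the word after it.
    """
    tokens = list(tokens)
    labels = [0] * len(tokens)
    # Start at index 1 so the preamble article ("A device ...") is left alone.
    for i in range(1, len(tokens) - 1):
        if tokens[i].lower() in ("a", "an"):
            tokens[i] = "the"      # "a sensor" becomes "the sensor"
            labels[i] = 1          # label the article
            labels[i + 1] = 1      # label the noun that now lacks an introduction
            break
    return tokens, labels


clean = "A device comprising a sensor and a housing".split()
corrupted, labels = inject_error(clean)
print(corrupted)
print(labels)
```

Errors produced this way follow a single mechanical pattern, which may be one reason a model trained on them struggles with the more varied errors that examiners actually flag.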

Details

Patent claims have a simple rule: introduce 'a thing' before referring to 'the thing'. The author fine-tuned DeBERTa-v3 on synthetic antecedent basis errors and achieved 90% F1 on the test set. However, when evaluated on real USPTO examiner rejections from the PEDANTIC dataset, the model's performance collapsed to 14.5% F1 and 8% recall. The article covers the author's approach, the challenges with the training data, and what the failure reveals about the gap between synthetic and real patent data.
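
The "a thing" before "the thing" rule can be illustrated with a naive checker. This is a sketch under strong assumptions (single-word terms, whitespace tokens, only "a"/"an" as introducers), not the method from the article, which used a fine-tuned model. The function name `find_missing_antecedents` is hypothetical.

```python
def find_missing_antecedents(claim):
    """Return words that follow "the" without an earlier "a"/"an" introduction.

    Naive illustration: handles single-word terms only and ignores
    plurals, multi-word noun phrases, and dependent-claim references.
    """
    tokens = claim.lower().replace(",", " ").replace(".", " ").split()
    introduced = set()
    flagged = []
    for prev, word in zip(tokens, tokens[1:]):
        if prev in ("a", "an"):
            introduced.add(word)       # "a sensor" introduces "sensor"
        elif prev == "the" and word not in introduced:
            flagged.append(word)       # "the housing" was never introduced
    return flagged


claim = "A device comprising a sensor, wherein the housing encloses the sensor."
print(find_missing_antecedents(claim))
```

Real claims break these assumptions constantly (multi-word terms, implicit introductions, references to parent claims), which is part of why the task is usually framed as a learned classification problem rather than a string match.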
