Dev.to Machine Learning | Research & Papers · Products & Services

Robust DPO with Stochastic Negatives Improves Multimodal Sequential Recommendations

New research introduces RoDPO, a method that improves recommendation ranking by using stochastic sampling from a dynamic candidate pool for negative selection during Direct Preference Optimization training, addressing the false negative problem in implicit feedback.

💡

Why it matters

This research has direct implications for luxury and retail companies building next-generation recommendation systems, enabling more robust personalization at scale and deeper multimodal understanding.

Key Points

  • RoDPO uses a dynamic pool of likely candidates (top-K items) for negative sampling instead of treating all unobserved items as negatives
  • Stochastic sampling from this pool introduces controlled randomness that smooths optimization while retaining informative hard signals
  • An optional sparse Mixture-of-Experts encoder enables efficient multimodal feature handling without exploding inference costs
  • RoDPO achieves up to 5.25% NDCG@5 gains on Amazon benchmarks compared to baseline DPO approaches
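
The first two points can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the helper names (`update_candidate_pool`, `sample_negative`), the pool size, and the toy scores are all assumptions for the example.

```python
import random

def update_candidate_pool(scores, k=100):
    # Keep the top-K highest-scoring unobserved items as the dynamic
    # negative-candidate pool (hypothetical helper; the paper refreshes
    # this pool during training).
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

def sample_negative(pool, rng=random):
    # Stochastically draw one negative from the pool instead of always
    # taking the single hardest (top-1) item.
    return rng.choice(pool)

# Toy usage: map each item to a model score, build the pool, sample.
scores = {f"item_{i}": i * 0.01 for i in range(1000)}
pool = update_candidate_pool(scores, k=100)
neg = sample_negative(pool)
```

Sampling uniformly from the pool (rather than always picking the top-ranked item) is what introduces the "controlled randomness" the summary describes.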

Details

The research paper addresses a critical challenge in applying Direct Preference Optimization (DPO) to recommender systems that rely on implicit feedback. In implicit feedback scenarios, items a user hasn't interacted with aren't necessarily negatives: they might be items the user would like but simply hasn't encountered yet. Treating all unobserved items as hard negatives during DPO training introduces 'erroneous suppressive gradients' that degrade model performance.

The researchers' central finding is that replacing deterministic hard negatives with stochastic sampling from a dynamic top-K candidate pool consistently improves ranking metrics. RoDPO maintains a dynamic pool of likely candidates that is updated during training; at each step, negatives are randomly sampled from this pool, introducing controlled stochasticity that smooths optimization while retaining informative hard signals. An optional sparse Mixture-of-Experts encoder enables efficient multimodal feature handling.

Evaluated on Amazon benchmarks, the method achieved up to a 5.25% improvement in NDCG@5 over baseline DPO approaches, with nearly unchanged inference latency.
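
To make the training step concrete, here is a sketch of the standard DPO pairwise loss with the negative drawn stochastically from a candidate pool. The function names (`dpo_pair_loss`, `rodpo_step`) and the use of plain item logits in place of full model log-probabilities are assumptions for illustration; the paper's exact objective may differ.

```python
import math
import random

def dpo_pair_loss(pos_logit, neg_logit, ref_pos, ref_neg, beta=1.0):
    # Standard DPO pairwise loss on one (positive, negative) pair:
    # -log sigmoid(beta * ((pos - ref_pos) - (neg - ref_neg))),
    # where ref_* are the frozen reference model's logits.
    margin = beta * ((pos_logit - ref_pos) - (neg_logit - ref_neg))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def rodpo_step(pos_logit, pool_logits, ref_pool_logits, ref_pos, rng=random):
    # One RoDPO-style step (sketch): instead of a fixed hard negative,
    # sample the negative's index from the dynamic candidate pool.
    idx = rng.randrange(len(pool_logits))
    return dpo_pair_loss(pos_logit, pool_logits[idx],
                         ref_pos, ref_pool_logits[idx])
```

With a zero margin the loss is log 2; pushing the positive's logit above the sampled negative's drives it toward zero, which is the suppressive gradient the pool-based sampling keeps from landing on false negatives.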
