EGGROLL: Trained a Model Without Backprop, Found Better Generalization

The article describes EGGROLL (Evolution Strategies at the Hyperscale), an approach that trains a model without backpropagation; the resulting model generalized better than a contrastive learning baseline.

💡 Why it matters

EGGROLL shows that a gradient-free training approach can outperform a standard contrastive learning baseline, highlighting the potential of alternative optimization methods in machine learning.

Key Points

  • EGGROLL optimizes NDCG (Normalized Discounted Cumulative Gain) directly instead of using a contrastive loss for retrieval tasks
  • NDCG involves sorting, which cannot be backpropagated through, so EGGROLL uses evolution strategies instead (a minimal sketch of the sorting issue follows this list)
  • The EGGROLL-trained model achieved a 22% better validation score than the contrastive learning baseline, despite worse training performance
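
To make the sorting issue concrete, here is a minimal NumPy sketch of per-query NDCG (an illustrative assumption, not code from the article): the argsort over predicted scores is the discrete step that blocks backpropagation, since a small change in scores either leaves the ranking unchanged (zero gradient) or flips it discontinuously.

```python
import numpy as np

def dcg(relevances: np.ndarray) -> float:
    """Discounted cumulative gain of a relevance list already in ranked order."""
    positions = np.arange(1, len(relevances) + 1)
    return float(np.sum((2.0 ** relevances - 1.0) / np.log2(positions + 1)))

def ndcg(scores: np.ndarray, relevances: np.ndarray) -> float:
    """NDCG for one query: rank documents by predicted score, compare to the ideal ranking."""
    order = np.argsort(-scores)        # non-differentiable: ranking by predicted score
    ideal = np.sort(relevances)[::-1]  # best possible ordering of the true relevances
    ideal_dcg = dcg(ideal)
    return dcg(relevances[order]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Example: 4 documents, one relevant, and the model scores it highest
print(ndcg(np.array([0.2, 0.9, 0.1, 0.4]), np.array([0.0, 1.0, 0.0, 0.0])))  # -> 1.0
```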

Details

The article presents an unconventional approach called EGGROLL (Evolution Strategies at the Hyperscale) for training machine learning models. Retrieval models are typically trained with a contrastive loss and then evaluated with NDCG (Normalized Discounted Cumulative Gain). The author asked what would happen if the model optimized NDCG directly instead of a contrastive loss. The obstacle is that NDCG involves sorting, which is not differentiable and therefore cannot be backpropagated through. EGGROLL's answer is to use evolution strategies: add noise to the parameters, evaluate how the metric changes, and update the model in the direction that helped. The author calls this 'caveman optimization'. The resulting model achieved a 22% better validation score than the contrastive learning baseline, even though the baseline reached a perfect training score, which highlights how severe overfitting can be with contrastive learning.
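
Below is a minimal sketch of the "add noise, evaluate, update in that direction" loop described above, using a plain antithetic evolution-strategies estimator in NumPy. It is an illustration under stated assumptions; the article does not specify EGGROLL's actual sampling scheme, hyperparameters, or scaling tricks.

```python
import numpy as np

rng = np.random.default_rng(0)

def es_step(params, eval_metric, n_samples=64, sigma=0.02, lr=0.05):
    """One evolution-strategies update: perturb, evaluate, move toward what helped.

    eval_metric(params) -> scalar to maximize (e.g. mean NDCG over a batch of queries).
    Antithetic sampling (+eps / -eps) is a common variance-reduction trick; the
    specific hyperparameters here are illustrative, not from the article.
    """
    noise = rng.standard_normal((n_samples, params.size))
    rewards = np.empty(2 * n_samples)
    for i, eps in enumerate(noise):
        rewards[2 * i]     = eval_metric(params + sigma * eps)
        rewards[2 * i + 1] = eval_metric(params - sigma * eps)
    # Normalize rewards so the update magnitude does not depend on the metric's scale
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Gradient estimate: noise directions weighted by (reward_plus - reward_minus)
    grad = (rewards[0::2] - rewards[1::2]) @ noise / (2 * n_samples * sigma)
    return params + lr * grad

# Toy usage: maximize a black-box objective (a simple quadratic stand-in here)
params = rng.standard_normal(10)
for _ in range(100):
    params = es_step(params, lambda p: -np.sum(p ** 2))
```

In the retrieval setting, eval_metric would wrap the model and a batch of queries and return mean NDCG, so the full objective, sorting included, is treated as a black box.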
