EGGROLL: trained a model without backprop and found it generalized better

The article discusses a novel approach to training a model using evolution strategies instead of backpropagation, which led to better generalization performance on a retrieval task.


Why it matters

This work shows that alternative optimization techniques like evolution strategies can match or even beat traditional backpropagation-based training, particularly when the evaluation metric, such as NDCG, is not differentiable.

Key Points

  • Trained a model using evolution strategies instead of backpropagation
  • Optimized the model directly for the NDCG (Normalized Discounted Cumulative Gain) metric instead of a contrastive loss (a sketch of the metric follows this list)
  • The evolution-strategies model achieved a 22% better validation score than the contrastive-learning baseline, despite worse training performance
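
For reference, NDCG@k ranks items by the model's predicted scores, sums their true relevance with a position discount, and normalizes by the best possible ranking. The sketch below is a generic PyTorch implementation of that standard formula, not code from the article; the function name and signature are illustrative. The argsort over predicted scores is the step that blocks backpropagation.

```python
import torch

def ndcg_at_k(scores: torch.Tensor, relevance: torch.Tensor, k: int = 10) -> torch.Tensor:
    """NDCG@k: rank items by predicted score, sum position-discounted true relevance,
    and normalize by the ideal ranking. The argsort has zero gradient almost
    everywhere, which is why NDCG cannot be optimized directly with backprop."""
    order = torch.argsort(scores, descending=True)[:k]        # non-differentiable ranking step
    gains = 2.0 ** relevance[order] - 1.0
    discounts = 1.0 / torch.log2(torch.arange(2, order.numel() + 2, dtype=torch.float32))
    dcg = (gains * discounts).sum()
    ideal = torch.argsort(relevance, descending=True)[:k]     # best achievable ordering
    idcg = ((2.0 ** relevance[ideal] - 1.0) * discounts).sum()
    return dcg / idcg.clamp(min=1e-9)
```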

Details

The article describes an experiment in which the author trained a model without backpropagation. The task was retrieval, where the standard recipe is to train with a contrastive loss and then evaluate with the NDCG metric. The author instead optimized the model directly for NDCG, which is hard to do with gradient-based training because NDCG involves sorting, and sorting is not differentiable. To get around this, the author used evolution strategies: add noise to the model parameters, evaluate the resulting score, and update the parameters in the direction that improves it. This 'caveman optimization' approach produced a model that generalized better than the contrastive-learning baseline despite a worse training score. The author released the code for the experiment, which was implemented in PyTorch.
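
The released code is not reproduced here, but the update described follows the standard evolution-strategies recipe: sample noise around the current parameters, score each perturbed copy on the non-differentiable metric, and move in the noise-weighted direction of higher scores. The sketch below is a minimal PyTorch version under those assumptions; es_step, score_fn, and the hyperparameters are illustrative, not the author's.

```python
import torch

def es_step(params: torch.Tensor, score_fn, pop_size: int = 64,
            sigma: float = 0.02, lr: float = 0.01) -> torch.Tensor:
    """One evolution-strategies update on a flat parameter vector.

    score_fn is any black-box fitness, e.g. NDCG over a batch of queries;
    no gradients of score_fn are ever taken."""
    noise = torch.randn(pop_size, params.numel())                        # Gaussian perturbations
    scores = torch.tensor([float(score_fn(params + sigma * eps)) for eps in noise])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)            # standardize fitness
    grad_est = (scores.unsqueeze(1) * noise).mean(dim=0) / sigma         # ES gradient estimate
    return params + lr * grad_est                                        # ascend the estimate
```

In practice, variance-reduction tricks such as antithetic sampling (evaluating +eps and -eps pairs) and larger populations are commonly used on top of this basic loop.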
