EGGROLL: trained a model without backprop and found it generalized better
The article discusses a novel approach to training a model using evolution strategies instead of backpropagation, which led to better generalization performance on a retrieval task.
Why it matters
This experiment suggests that alternative optimization techniques such as evolution strategies can rival or beat backpropagation-based training, particularly when the evaluation metric, like NDCG, is non-differentiable and cannot be targeted directly with gradient descent.
Key Points
- Trained a model using evolution strategies instead of backpropagation
- Optimized the model directly for the NDCG (Normalized Discounted Cumulative Gain) metric instead of a contrastive loss (a minimal NDCG sketch follows this list)
- The evolution strategies model achieved a 22% better validation score than the contrastive learning baseline, despite a worse training score
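For context, here is a minimal sketch of NDCG@k; the function names and example values are illustrative, not taken from the article's code. The sorting step (`argsort`) is what makes the metric non-differentiable, since small changes to the scores usually leave the ranking, and hence the metric, unchanged.

```python
import numpy as np

def dcg_at_k(relevance_sorted, k):
    """Discounted cumulative gain of the top-k items in ranked order."""
    rel = np.asarray(relevance_sorted, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))
    return float(np.sum((2 ** rel - 1) / discounts))

def ndcg_at_k(scores, relevance, k=10):
    """NDCG@k: rank items by predicted score, compare against the ideal ranking."""
    order = np.argsort(scores)[::-1]          # sorting step: not differentiable
    dcg = dcg_at_k(np.asarray(relevance)[order], k)
    ideal = dcg_at_k(np.sort(relevance)[::-1], k)
    return dcg / ideal if ideal > 0 else 0.0

# Example: three candidate documents for one query
print(ndcg_at_k(scores=[0.9, 0.2, 0.5], relevance=[1, 0, 1], k=3))
```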
Details
The article describes an experiment in which the author trained a model without backpropagation. The task was retrieval, where the standard recipe is to train with a contrastive loss and then evaluate with NDCG. The author instead optimized the model directly for NDCG, which is normally impractical because NDCG involves a sorting step and is therefore not differentiable. To get around this, the author used evolution strategies: noise is added to the model parameters, each perturbed copy is evaluated, and the parameters are updated in the direction that improves the score. This 'caveman optimization', as the author calls it, produced a model that generalized better than the contrastive learning baseline despite a worse training score. The code for the experiment, implemented in PyTorch, has been released.
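As a rough illustration of the evolution-strategies loop described above, here is a minimal sketch in PyTorch. The model, the `eval_ndcg` evaluator, and the hyperparameters (population size, noise scale, learning rate) are assumptions for illustration, not the author's released EGGROLL code.

```python
import torch

def es_step(model, eval_ndcg, population=32, sigma=0.02, lr=0.01):
    """One ES update: perturb the parameters, score each perturbed copy with the
    non-differentiable NDCG evaluator, and move the parameters toward the
    perturbations that scored best. All hyperparameters here are illustrative."""
    base = torch.nn.utils.parameters_to_vector(model.parameters()).detach()
    noises, scores = [], []
    for _ in range(population):
        eps = torch.randn_like(base)
        torch.nn.utils.vector_to_parameters(base + sigma * eps, model.parameters())
        with torch.no_grad():
            scores.append(float(eval_ndcg(model)))  # higher NDCG is better
        noises.append(eps)
    scores_t = torch.tensor(scores)
    # Weight each noise vector by how much better (or worse) than average
    # its perturbation scored, then step along the weighted average.
    weights = (scores_t - scores_t.mean()) / (scores_t.std() + 1e-8)
    update = sum(w * eps for w, eps in zip(weights, noises)) / (population * sigma)
    torch.nn.utils.vector_to_parameters(base + lr * update, model.parameters())
    return max(scores)
```

Because the update is just a weighted average of random perturbations, no gradient of the metric is ever required, which is what makes directly optimizing a sorting-based score like NDCG possible.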