EGGROLL: trained a model without backprop and found it generalized better
The article discusses a novel approach to training a model using evolution strategies instead of backpropagation, which led to better generalization performance on a retrieval task.
Why it matters
This experiment suggests that alternative optimization techniques such as evolution strategies can rival or beat backpropagation-based training, particularly when the evaluation metric, like NDCG, is non-differentiable and cannot be targeted directly with gradient descent.
Key Points
- Trained a model using evolution strategies instead of backpropagation
- Optimized the model directly for the NDCG (Normalized Discounted Cumulative Gain) metric instead of a contrastive loss (a minimal NDCG sketch follows this list)
- The evolution strategies model achieved a 22% better validation score than the contrastive learning baseline, despite a worse training score
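For context, here is a minimal sketch of NDCG@k; the function names and example values are illustrative, not taken from the article's code. The sorting step (`argsort`) is what makes the metric non-differentiable, since small changes to the scores usually leave the ranking, and hence the metric, unchanged.

```python
import numpy as np

def dcg_at_k(relevance_sorted, k):
    """Discounted cumulative gain of the top-k items in ranked order."""
    rel = np.asarray(relevance_sorted, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))
    return float(np.sum((2 ** rel - 1) / discounts))

def ndcg_at_k(scores, relevance, k=10):
    """NDCG@k: rank items by predicted score, compare against the ideal ranking."""
    order = np.argsort(scores)[::-1]          # sorting step: not differentiable
    dcg = dcg_at_k(np.asarray(relevance)[order], k)
    ideal = dcg_at_k(np.sort(relevance)[::-1], k)
    return dcg / ideal if ideal > 0 else 0.0

# Example: three candidate documents for one query
print(ndcg_at_k(scores=[0.9, 0.2, 0.5], relevance=[1, 0, 1], k=3))
```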
Details
The article describes an experiment in which the author trained a model without backpropagation. The task was retrieval, where the standard recipe is to train with a contrastive loss and then evaluate with NDCG. The author instead optimized the model directly for NDCG, which is normally impractical because NDCG involves a sorting step and is therefore not differentiable. To get around this, the author used evolution strategies: noise is added to the model parameters, each perturbed copy is evaluated, and the parameters are updated in the direction that improves the score. This 'caveman optimization', as the author calls it, produced a model that generalized better than the contrastive learning baseline despite a worse training score. The code for the experiment, implemented in PyTorch, has been released.
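As a rough illustration of the evolution-strategies loop described above, here is a minimal sketch in PyTorch. The model, the `eval_ndcg` evaluator, and the hyperparameters (population size, noise scale, learning rate) are assumptions for illustration, not the author's released EGGROLL code.

```python
import torch

def es_step(model, eval_ndcg, population=32, sigma=0.02, lr=0.01):
    """One ES update: perturb the parameters, score each perturbed copy with the
    non-differentiable NDCG evaluator, and move the parameters toward the
    perturbations that scored best. All hyperparameters here are illustrative."""
    base = torch.nn.utils.parameters_to_vector(model.parameters()).detach()
    noises, scores = [], []
    for _ in range(population):
        eps = torch.randn_like(base)
        torch.nn.utils.vector_to_parameters(base + sigma * eps, model.parameters())
        with torch.no_grad():
            scores.append(float(eval_ndcg(model)))  # higher NDCG is better
        noises.append(eps)
    scores_t = torch.tensor(scores)
    # Weight each noise vector by how much better (or worse) than average
    # its perturbation scored, then step along the weighted average.
    weights = (scores_t - scores_t.mean()) / (scores_t.std() + 1e-8)
    update = sum(w * eps for w, eps in zip(weights, noises)) / (population * sigma)
    torch.nn.utils.vector_to_parameters(base + lr * update, model.parameters())
    return max(scores)
```

Because the update is just a weighted average of random perturbations, no gradient of the metric is ever required, which is what makes directly optimizing a sorting-based score like NDCG possible.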