EGGROLL: Trained a Model Without Backprop, Found Better Generalization

The article describes EGGROLL (Evolution Strategies at the Hyperscale), an approach that trains a model without backpropagation; the resulting model generalized better than a contrastive learning baseline.

💡 Why it matters

EGGROLL shows that a gradient-free training approach can outperform a standard contrastive learning baseline, highlighting the potential of alternative optimization methods in machine learning.

Key Points

  • EGGROLL optimizes NDCG (Normalized Discounted Cumulative Gain) directly instead of using a contrastive loss for retrieval tasks
  • NDCG involves sorting, which cannot be backpropagated through, so EGGROLL uses evolution strategies instead (a minimal sketch of the sorting issue follows this list)
  • The EGGROLL-trained model achieved a 22% better validation score than the contrastive learning baseline, despite worse training performance
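
To make the sorting issue concrete, here is a minimal NumPy sketch of per-query NDCG (an illustrative assumption, not code from the article): the argsort over predicted scores is the discrete step that blocks backpropagation, since a small change in scores either leaves the ranking unchanged (zero gradient) or flips it discontinuously.

```python
import numpy as np

def dcg(relevances: np.ndarray) -> float:
    """Discounted cumulative gain of a relevance list already in ranked order."""
    positions = np.arange(1, len(relevances) + 1)
    return float(np.sum((2.0 ** relevances - 1.0) / np.log2(positions + 1)))

def ndcg(scores: np.ndarray, relevances: np.ndarray) -> float:
    """NDCG for one query: rank documents by predicted score, compare to the ideal ranking."""
    order = np.argsort(-scores)        # non-differentiable: ranking by predicted score
    ideal = np.sort(relevances)[::-1]  # best possible ordering of the true relevances
    ideal_dcg = dcg(ideal)
    return dcg(relevances[order]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Example: 4 documents, one relevant, and the model scores it highest
print(ndcg(np.array([0.2, 0.9, 0.1, 0.4]), np.array([0.0, 1.0, 0.0, 0.0])))  # -> 1.0
```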

Details

The article presents an unconventional approach called EGGROLL (Evolution Strategies at the Hyperscale) for training machine learning models. Retrieval models are typically trained with a contrastive loss and then evaluated with NDCG (Normalized Discounted Cumulative Gain). The author asked what would happen if the model optimized NDCG directly instead of a contrastive loss. The obstacle is that NDCG involves sorting, which is not differentiable and therefore cannot be backpropagated through. EGGROLL's answer is to use evolution strategies: add noise to the parameters, evaluate how the metric changes, and update the model in the direction that helped. The author calls this 'caveman optimization'. The resulting model achieved a 22% better validation score than the contrastive learning baseline, even though the baseline reached a perfect training score, which highlights how severe overfitting can be with contrastive learning.
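
Below is a minimal sketch of the "add noise, evaluate, update in that direction" loop described above, using a plain antithetic evolution-strategies estimator in NumPy. It is an illustration under stated assumptions; the article does not specify EGGROLL's actual sampling scheme, hyperparameters, or scaling tricks.

```python
import numpy as np

rng = np.random.default_rng(0)

def es_step(params, eval_metric, n_samples=64, sigma=0.02, lr=0.05):
    """One evolution-strategies update: perturb, evaluate, move toward what helped.

    eval_metric(params) -> scalar to maximize (e.g. mean NDCG over a batch of queries).
    Antithetic sampling (+eps / -eps) is a common variance-reduction trick; the
    specific hyperparameters here are illustrative, not from the article.
    """
    noise = rng.standard_normal((n_samples, params.size))
    rewards = np.empty(2 * n_samples)
    for i, eps in enumerate(noise):
        rewards[2 * i]     = eval_metric(params + sigma * eps)
        rewards[2 * i + 1] = eval_metric(params - sigma * eps)
    # Normalize rewards so the update magnitude does not depend on the metric's scale
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Gradient estimate: noise directions weighted by (reward_plus - reward_minus)
    grad = (rewards[0::2] - rewards[1::2]) @ noise / (2 * n_samples * sigma)
    return params + lr * grad

# Toy usage: maximize a black-box objective (a simple quadratic stand-in here)
params = rng.standard_normal(10)
for _ in range(100):
    params = es_step(params, lambda p: -np.sum(p ** 2))
```

In the retrieval setting, eval_metric would wrap the model and a batch of queries and return mean NDCG, so the full objective, sorting included, is treated as a black box.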
