LocalLLaMA Reddit · 21h ago | Research & Papers · Products & Services

Request: Training a Pretrained, MoE Version of Mistral Nemo

A student has converted the Mistral Nemo language model into a 16-expert Mixture-of-Experts (MoE) model but, due to budget constraints, cannot afford further fine-tuning. The student hopes someone will take an interest in the model and provide a trained version.

💡

Why it matters

This request highlights the challenges faced by students and researchers with limited resources in developing and improving large language models.

Key Points

  • Mistral Nemo has been converted from a dense model to a 16-expert MoE model
  • The student has budget constraints and cannot afford full-parameter or extended fine-tuning
  • The model currently has coherence issues and often ignores instructions
  • If someone releases a trained version, the student plans to expand the expert pool and release a version with expanded parameter capacity

Details

The student converted the Mistral Nemo language model, previously a dense model, into a 16-expert Mixture-of-Experts (MoE) model in an effort to improve its capabilities. Because the work was done on a rented GPU under a tight budget, the student could not afford full-parameter or extended fine-tuning, and the model currently has coherence problems and often ignores instructions. The student hopes someone will take an interest in the model and release a trained version; with that in hand, the student could expand the expert pool and release a version with expanded parameter capacity, effectively restoring the capabilities of the original Mistral Nemo model.
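The post does not say how the dense-to-MoE conversion was done. A common recipe is "sparse upcycling": copy the dense feed-forward block into each of the 16 experts and attach a new, randomly initialized router. Right after conversion the layer reproduces the dense model exactly; the quality problems appear once the untrained router and experts are updated without enough fine-tuning, which is consistent with the coherence issues described above. The sketch below is a minimal, hypothetical PyTorch illustration of that recipe, not the student's actual code; the names (MoEFeedForward, dense_ffn, top_k) are illustrative.

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """One dense FFN 'upcycled' into a top-k MoE layer (hypothetical sketch)."""

    def __init__(self, dense_ffn: nn.Module, hidden_size: int,
                 num_experts: int = 16, top_k: int = 2):
        super().__init__()
        # Every expert starts as an exact copy of the dense FFN, so the
        # freshly converted model initially matches the dense original.
        self.experts = nn.ModuleList(
            copy.deepcopy(dense_ffn) for _ in range(num_experts)
        )
        # The router is new and untrained; until it is fine-tuned, it is a
        # plausible source of the incoherence the post describes.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_size)
        probs = F.softmax(self.router(x), dim=-1)              # (tokens, experts)
        weights, idx = torch.topk(probs, self.top_k, dim=-1)   # top-k per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize to 1
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e  # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# Example: wrap a toy dense FFN and run a batch of token states.
dense = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
moe = MoEFeedForward(dense, hidden_size=64)
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64])

Because the top-k routing weights are renormalized to sum to one and all experts start out identical, the converted layer initially returns exactly what the dense FFN would; any behavioral drift comes from how (or whether) the router and experts are trained afterwards, which is precisely the fine-tuning run the student is asking someone to donate.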
