Request: Training a Pretrained MoE Version of Mistral Nemo
A student has converted the Mistral Nemo language model into a 16-expert Mixture-of-Experts (MoE) model, but due to budget constraints, cannot afford further fine-tuning. The student hopes someone will take interest in the model and provide a trained version.
Why it matters
This request highlights the challenges faced by students and researchers with limited resources in developing and improving large language models.
Key Points
- Mistral Nemo has been converted from a dense model into a 16-expert MoE model
- The student has budget constraints and cannot afford full-parameter or extended fine-tuning
- The model currently has issues with coherence and ignoring instructions
- If someone releases a trained version, the student can expand the expert pool and release a version with expanded parameter capacity
Details
The student converted Mistral Nemo, previously a dense model, into a 16-expert Mixture-of-Experts (MoE) model in an effort to improve its capabilities. However, because of budget constraints and reliance on a rented GPU, the student could not afford full-parameter or extended fine-tuning. As a result, the model currently has coherence problems and often ignores instructions. The student hopes someone will take an interest in the model and release a trained version, which would let the student expand the expert pool and publish a version with expanded parameter capacity, effectively restoring the capabilities of the original Mistral Nemo model.
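The post does not say how the conversion was performed, but a common dense-to-MoE recipe is "upcycling": copy the dense feed-forward block into each expert and add a freshly initialized router. The PyTorch sketch below illustrates this under stated assumptions; the class names (DenseMLP, MoELayer), the SwiGLU-style FFN, and the top-2 routing are illustrative assumptions, not the student's actual code.

```python
# Minimal sketch of dense-to-MoE "upcycling", assuming a PyTorch-style
# transformer whose FFN is a gated (SwiGLU-style) MLP, as in Mistral models.
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseMLP(nn.Module):
    """Stand-in for one dense FFN block (hypothetical, for illustration)."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_ff, bias=False)
        self.up_proj = nn.Linear(d_model, d_ff, bias=False)
        self.down_proj = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


class MoELayer(nn.Module):
    """Replaces one dense FFN with num_experts copies plus a learned router."""

    def __init__(self, dense_mlp: DenseMLP, d_model: int,
                 num_experts: int = 16, top_k: int = 2):
        super().__init__()
        # Upcycling: every expert starts as an exact copy of the dense FFN,
        # so the converted model initially computes a similar function.
        self.experts = nn.ModuleList(
            copy.deepcopy(dense_mlp) for _ in range(num_experts)
        )
        # The router is new and randomly initialized. Untrained routing is a
        # plausible source of the coherence issues the student describes.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        probs = self.router(x).softmax(dim=-1)
        weights, idx = probs.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize
        out = torch.zeros_like(x)
        # Dispatch each token to its top-k experts and mix the outputs.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out
```

On the "expand the expert pool" idea: one plausible reading is that, given a trained 16-expert checkpoint, the trained experts could be duplicated and the router's output dimension widened to add capacity, though the post does not specify a method.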