Fine-Tuning Gemma 4 on Day Zero: 3 Bugs We Solved in 30 Minutes
The article discusses the challenges faced when fine-tuning the newly released Gemma 4 model from Google, and how the authors were able to resolve three bugs within 30 minutes.
Why it matters
This article provides valuable insights into the challenges faced when working with the latest AI models and the workarounds required to overcome them.
Key Points
1. The Transformers library did not recognize the Gemma 4 architecture
2. The PEFT library could not handle the custom Gemma4ClippableLinear layer
3. Monkey-patching the custom layer to inherit from nn.Linear resolved the issue
Details
The article describes how the authors encountered three bugs when trying to fine-tune the Gemma 4 model, a 31B dense model released by Google under the Apache 2.0 license.

The first issue was that the stable Transformers library (v5.4.0) did not recognize the Gemma 4 architecture, which was only available in the development branch. The authors resolved this by installing Transformers from the GitHub source.

The second issue was that the PEFT library, used for applying LoRA, could not handle the custom Gemma4ClippableLinear layer introduced in Gemma 4. This layer inherits from nn.Module rather than nn.Linear, which PEFT expects. The authors fixed this by monkey-patching Gemma4ClippableLinear to inherit from nn.Linear before loading the model.
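The inheritance mismatch behind the second and third bugs can be sketched as follows. This is a hypothetical illustration, not the article's actual code: the constructor signature, the `clip_value` parameter, and the clipping behavior of `Gemma4ClippableLinear` are all assumptions, since the article does not show the layer's implementation.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the custom layer the article describes:
# it wraps a linear projection but inherits from nn.Module, so a
# PEFT-style isinstance(module, nn.Linear) check fails and LoRA
# cannot target it.
class Gemma4ClippableLinear(nn.Module):
    def __init__(self, in_features, out_features, bias=True, clip_value=6.0):
        super().__init__()
        self.proj = nn.Linear(in_features, out_features, bias=bias)
        self.clip_value = clip_value

    def forward(self, x):
        # Clamp activations to the clipping range (assumed behavior).
        return self.proj(x).clamp(-self.clip_value, self.clip_value)


# The fix along the lines the article sketches: a drop-in version that
# inherits from nn.Linear directly, keeping the clipping in forward().
class PatchedClippableLinear(nn.Linear):
    def __init__(self, in_features, out_features, bias=True, clip_value=6.0):
        super().__init__(in_features, out_features, bias=bias)
        self.clip_value = clip_value

    def forward(self, x):
        return super().forward(x).clamp(-self.clip_value, self.clip_value)


# The original class fails the isinstance check; the patched one passes.
assert not isinstance(Gemma4ClippableLinear(8, 8), nn.Linear)
assert isinstance(PatchedClippableLinear(8, 8), nn.Linear)
```

For the patch to take effect, the replacement class would have to be swapped into the layer's defining module before the model is instantiated (the exact module path inside Transformers is not given in the article, so any such path is an assumption).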