PentaNet: Pushing beyond BitNet with Native Pentanary Quantization
The author introduces PentaNet, a custom neural network architecture that uses pentanary quantization ({-2, -1, 0, 1, 2}) instead of the ternary quantization ({-1, 0, 1}) used in BitNet. This allows for more model capacity while preserving the 'zero-multiplier' inference benefit.
Why it matters
PentaNet demonstrates how expanding quantization beyond ternary can improve model performance with negligible compute overhead, a promising direction for efficient large language models.
Key Points
- PentaNet expands the weight states to pentanary: {-2, -1, 0, +1, +2}
- Multiplying by 2 doesn't require a hardware multiplier, just a left bit-shift
- PentaNet achieves a ~6.4% perplexity improvement over BitNet on WikiText-103
- The weight distribution stabilizes with ~11% in the ±2 buckets
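The second point is the crux of the design: a sketch of what a 'zero-multiplier' dot product over pentanary weights could look like, with the ±2 cases handled by a one-bit left shift instead of a hardware multiply. This is an illustrative toy over integer activations, not the author's actual kernel.

```python
# Hypothetical sketch (not PentaNet's actual code): a dot product over
# pentanary weights {-2, -1, 0, +1, +2} using only adds and shifts,
# assuming integer activations.
def penta_dot(weights, activations):
    acc = 0
    for w, x in zip(weights, activations):
        if w == 0:
            continue       # zero weights contribute nothing
        elif w == 1:
            acc += x
        elif w == -1:
            acc -= x
        elif w == 2:
            acc += x << 1  # multiply by 2 = left shift by 1 bit
        else:              # w == -2
            acc -= x << 1
    return acc

print(penta_dot([2, -1, 0, 1, -2], [3, 5, 7, 2, 4]))  # 2*3 - 5 + 0 + 2 - 2*4 = -5
```

Like ternary BitNet, the inner loop contains no multiplications; the only new operation relative to {-1, 0, 1} is the shift.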
Details
The author has been experimenting with extreme quantization of large language models, following the BitNet b1.58 paper. While ternary quantization {-1, 0, 1} is efficient because it replaces costly matrix multiplications with simple additions, the author wondered whether expanding the weight states could unlock more model capacity. PentaNet was built and trained from scratch using a pentanary weight representation: {-2, -1, 0, +1, +2}. The key property of the ±2 weights is that multiplying by 2 reduces to a single left bit-shift, so the 'zero-multiplier' inference advantage of BitNet is preserved. In head-to-head benchmarks on WikiText-103 with 124M-parameter models, PentaNet achieved a ~6.4% perplexity improvement over BitNet, and the weight distribution stabilized well during training, settling with roughly 11% of weights in the ±2 buckets. The author has open-sourced the code and model; the next step is custom low-level kernels that exploit the bit-shift operations for real-world speedups.
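The post does not spell out the quantization function, but a natural extension of BitNet b1.58's absmean quantizer to five levels would scale each weight tensor by its mean absolute value, then round and clamp to {-2, -1, 0, +1, +2}. The sketch below is an assumption about how PentaNet might quantize, not a confirmed detail; the function name `penta_quantize` is invented for illustration.

```python
import numpy as np

# Hypothetical sketch: BitNet b1.58's absmean quantizer, extended from
# ternary to pentanary. PentaNet's actual scheme may differ.
def penta_quantize(w: np.ndarray) -> np.ndarray:
    gamma = np.abs(w).mean() + 1e-8              # per-tensor absmean scale
    return np.clip(np.round(w / gamma), -2, 2).astype(np.int8)

w = np.array([0.03, -0.7, 1.9, -2.5, 0.4])
print(penta_quantize(w))
```

Compared with the ternary case (clamping to ±1), the wider ±2 clamp lets large-magnitude weights keep more of their relative scale, which is one plausible source of the reported perplexity gain.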