PentaNet: Pushing beyond BitNet with Native Pentanary Quantization
The author introduces PentaNet, a custom neural network architecture that uses pentanary quantization ({-2, -1, 0, 1, 2}) instead of the ternary quantization ({-1, 0, 1}) used in BitNet. This allows for more model capacity while preserving the 'zero-multiplier' inference benefit.
Why it matters
PentaNet demonstrates how expanding quantization beyond ternary can improve model performance with negligible compute overhead, a promising direction for efficient large language models.
Key Points
- PentaNet expands the weight states to pentanary: {-2, -1, 0, +1, +2}
- Multiplying by 2 doesn't require a hardware multiplier, just a left bit-shift
- PentaNet achieves a ~6.4% perplexity improvement over BitNet on WikiText-103
- The weight distribution stabilizes with ~11% in the ±2 buckets
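The second point is the crux of the design: a sketch of what a 'zero-multiplier' dot product over pentanary weights could look like, with the ±2 cases handled by a one-bit left shift instead of a hardware multiply. This is an illustrative toy over integer activations, not the author's actual kernel.

```python
# Hypothetical sketch (not PentaNet's actual code): a dot product over
# pentanary weights {-2, -1, 0, +1, +2} using only adds and shifts,
# assuming integer activations.
def penta_dot(weights, activations):
    acc = 0
    for w, x in zip(weights, activations):
        if w == 0:
            continue       # zero weights contribute nothing
        elif w == 1:
            acc += x
        elif w == -1:
            acc -= x
        elif w == 2:
            acc += x << 1  # multiply by 2 = left shift by 1 bit
        else:              # w == -2
            acc -= x << 1
    return acc

print(penta_dot([2, -1, 0, 1, -2], [3, 5, 7, 2, 4]))  # 2*3 - 5 + 0 + 2 - 2*4 = -5
```

Like ternary BitNet, the inner loop contains no multiplications; the only new operation relative to {-1, 0, 1} is the shift.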
Details
The author has been experimenting with extreme quantization of large language models, following the BitNet b1.58 paper. While ternary quantization {-1, 0, 1} is efficient because it replaces costly matrix multiplications with simple additions, the author wondered whether expanding the weight states could unlock more model capacity. PentaNet was built and trained from scratch using a pentanary weight representation: {-2, -1, 0, +1, +2}. The key property of the ±2 weights is that multiplying by 2 reduces to a single left bit-shift, so the 'zero-multiplier' inference advantage of BitNet is preserved. In head-to-head benchmarks on WikiText-103 with 124M-parameter models, PentaNet achieved a ~6.4% perplexity improvement over BitNet, and the weight distribution stabilized well during training, settling with roughly 11% of weights in the ±2 buckets. The author has open-sourced the code and model; the next step is custom low-level kernels that exploit the bit-shift operations for real-world speedups.
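The post does not spell out the quantization function, but a natural extension of BitNet b1.58's absmean quantizer to five levels would scale each weight tensor by its mean absolute value, then round and clamp to {-2, -1, 0, +1, +2}. The sketch below is an assumption about how PentaNet might quantize, not a confirmed detail; the function name `penta_quantize` is invented for illustration.

```python
import numpy as np

# Hypothetical sketch: BitNet b1.58's absmean quantizer, extended from
# ternary to pentanary. PentaNet's actual scheme may differ.
def penta_quantize(w: np.ndarray) -> np.ndarray:
    gamma = np.abs(w).mean() + 1e-8              # per-tensor absmean scale
    return np.clip(np.round(w / gamma), -2, 2).astype(np.int8)

w = np.array([0.03, -0.7, 1.9, -2.5, 0.4])
print(penta_quantize(w))
```

Compared with the ternary case (clamping to ±1), the wider ±2 clamp lets large-magnitude weights keep more of their relative scale, which is one plausible source of the reported perplexity gain.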