Heretic Abliteration Tool Adds Universal Support for New HF Architectures
The Heretic tool, which automatically removes censorship and refusals in local language models, has been updated to support dynamic auto-registration for new Hugging Face model architectures.
Why it matters
The update makes it easier to run and 'delobotomize' the latest language models with Heretic, since newly released architectures no longer need manual patching before their censorship and refusals can be reduced.
Key Points
- Heretic now automatically parses config.json, imports the necessary classes, and registers new model architectures on the fly
- This eliminates the need for manual patching when new models like GLM are released
- Heretic was successfully tested on the GLM-4.6V-Flash multimodal 10B model, reducing the refusal rate from 100/100 to 63/100
Details
Heretic is a script that automatically removes censorship and refusals from local language models while keeping the Kullback-Leibler (KL) divergence from the original model very low. The latest update adds dynamic auto-registration, which lets Heretic handle new or otherwise unsupported Hugging Face model architectures. When Transformers raises an 'unrecognized config' error, Heretic now parses config.json, imports the necessary config, auto, and model classes, and registers them on the fly so that the model loads successfully. This removes the need for manual patching every time a new model architecture such as GLM is released.
The update was tested on the GLM-4.6V-Flash multimodal 10B model. The model loaded fine on a single Nvidia 4090 GPU, had a KL divergence of 0.0000 (essentially identical to the original), and saw its refusal rate on 'spicy' prompts drop from 100/100 to 63/100, a significant improvement over previous Heretic versions.
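The auto-registration flow described above can be sketched roughly as follows. This is a minimal illustration, not Heretic's actual code: the helper names (`parse_arch_info`, `candidate_module`, `register_from_config`) are hypothetical, the sketch assumes the architecture's classes are importable from a `transformers.models.<model_type>` module, and it assumes `AutoModelForCausalLM` is the relevant auto class, which may not hold for a multimodal model. The only Transformers calls used are `AutoConfig.register` and the auto-model `register` method.

```python
import importlib
import json


def parse_arch_info(config_text):
    """Extract (model_type, architectures) from the text of a config.json."""
    cfg = json.loads(config_text)
    return cfg.get("model_type"), list(cfg.get("architectures") or [])


def candidate_module(model_type):
    """Guess the module holding the classes (assumed naming convention)."""
    # Assumption: dashes in model_type map to underscores in the module name.
    return "transformers.models." + model_type.replace("-", "_")


def register_from_config(config_path):
    """Hypothetical helper: register the classes named in a config.json."""
    # Imported lazily so the parsing helpers above work without Transformers.
    from transformers import AutoConfig, AutoModelForCausalLM

    with open(config_path, encoding="utf-8") as f:
        model_type, architectures = parse_arch_info(f.read())
    if not model_type:
        raise ValueError("config.json has no model_type field")

    # Assumption: the installed Transformers (or another package on the
    # path) provides this module; otherwise the import fails here.
    module = importlib.import_module(candidate_module(model_type))
    for arch in architectures:
        model_cls = getattr(module, arch)    # class named under "architectures"
        config_cls = model_cls.config_class  # config class paired with the model
        AutoConfig.register(model_type, config_cls, exist_ok=True)
        AutoModelForCausalLM.register(config_cls, model_cls, exist_ok=True)
    return model_type, architectures
```

In a loader, a helper like this would be called only after the normal `from_pretrained` attempt fails with the 'unrecognized config' error, and the load would then be retried. How Heretic itself locates the classes and which auto classes it registers is not stated in the source.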