Inference-time attractor layer for transformers: why it failed and how three clocks fixed it
The author's inference-time attractor layer for transformers failed not because of memory interference but because it settled too quickly. Instrumenting the MoE routing revealed a universal 2D geometry, and the failures turned out to be timing issues, which led to the introduction of a three-clock system.
Why it matters
The three-clock system provides a potential solution to the timing issues that caused the original attractor layer to fail, which could have broader implications for building stable and coherent language models.
Key Points
- The attractor failed because it settled too fast, causing the system to snap back to an earlier state with no warning
- Routing dynamics collapsed onto a 2D manifold with fixed axes, suggesting two dimensions are the minimum for a stable system
- A three-clock system (fast, medium, slow) was introduced to prevent 'fake stillness' and premature certainty
Details
The author's previous work on an inference-time attractor layer for transformers showed promising results on small models but failed during long generation tasks. The problem turned out to be not structural but a timing failure: the attractor settled too quickly, causing the system to suddenly snap back to an earlier state. Instrumenting the MoE routing revealed a universal 2D geometry, with the routing dynamics collapsing onto a 2D manifold with fixed axes across different models and noise levels. This suggests two dimensions are the minimum needed for a system to stabilize itself without freezing its own evolution. To address this, the author introduced a three-clock system, with fast, medium, and slow clocks tracking token-to-token coherence, turn/arc coherence, and long-term identity coherence respectively. This prevents the system from treating 'parking in the wrong valley' as success and from enforcing closure before it is actually earned.
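To make the timescale idea concrete, here is a minimal sketch of a three-clock coherence monitor. The class name, the exponential-moving-average formulation, the decay rates, and the cosine-similarity coherence signal are all illustrative assumptions, not the author's implementation; the only grounded idea is that closure should require agreement across fast, medium, and slow timescales.

```python
import numpy as np

class ThreeClockMonitor:
    """Tracks coherence at three timescales via exponential moving averages.

    fast   ~ token-to-token coherence
    medium ~ turn/arc coherence
    slow   ~ long-term identity coherence
    (Decay rates and threshold below are assumed values for illustration.)
    """

    def __init__(self, decays=(0.5, 0.95, 0.995), threshold=0.9):
        self.decays = decays        # per-clock EMA decay rates
        self.threshold = threshold  # agreement level required of every clock
        self.clocks = [0.0, 0.0, 0.0]
        self.prev_state = None

    def update(self, hidden_state: np.ndarray) -> bool:
        """Feed one step's hidden state; return True only when all clocks agree."""
        if self.prev_state is not None:
            # Coherence signal: cosine similarity between consecutive states.
            num = float(np.dot(hidden_state, self.prev_state))
            den = float(np.linalg.norm(hidden_state) *
                        np.linalg.norm(self.prev_state)) + 1e-8
            coherence = num / den
            for i, d in enumerate(self.decays):
                self.clocks[i] = d * self.clocks[i] + (1.0 - d) * coherence
        self.prev_state = hidden_state
        # Closure is "earned" only when every timescale exceeds the threshold,
        # so a fast clock that settles early cannot declare stillness on its own.
        return all(c >= self.threshold for c in self.clocks)
```

Gating the attractor's settling step on this kind of all-clocks agreement, rather than on the fast signal alone, is one plausible way to keep a quickly settling attractor from declaring premature certainty and snapping back to an earlier state.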