LocalLLaMA Reddit · 1d ago | Research & Papers · Products & Services

MiMo-V2-Flash - SGLang - mtp triton attention

The post reports test results for MiMo-V2-Flash served with SGLang (MTP, Triton attention) on a setup of 4x 6000 Blackwell workstation cards, listing context, prompt, output, end-to-end speed, and accuracy length for a range of input sizes.

Why it matters

The numbers show how throughput on a multi-GPU workstation changes as the input grows from 4K to 100K tokens, which is useful when judging what long-context workloads such a setup can handle locally.

Key Points

1. Testing results for a 4x 6000 Blackwell workstation card setup
2. Metrics include context, prompt, output, end-to-end speed, and accuracy length
3. Tested input sizes range from 4K to 100K tokens

Details

The post gives performance metrics for a system running on 4x 6000 Blackwell workstation cards, tested across input sizes from 4K to 100K tokens. The reported metrics are context, prompt, output, end-to-end speed, and accuracy length. End-to-end speed falls as the input grows, from 100.2 tokens/second at 4K inputs to 24.5 tokens/second at 100K inputs, roughly a 4x slowdown for a 25x larger input. The accuracy length stays relatively stable, between 2.24 and 2.50. The summary does not define this metric; given the MTP setup, it presumably refers to the speculative-decoding accept length, the average number of tokens accepted per decoding step.
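To put the two reported end-to-end speeds side by side, the arithmetic can be sketched as below. This is only an illustration: the two throughput values come from the summary above, while the function names and the reading of "end-to-end speed" as a plain tokens-per-second rate are assumptions, not something the post confirms.

```python
# Reported end-to-end speeds (tokens/second), keyed by input size in tokens.
# Only these two data points appear in the summary; intermediate sizes are not given.
E2E_TOKENS_PER_SECOND = {4_000: 100.2, 100_000: 24.5}


def slowdown(fast_tps: float, slow_tps: float) -> float:
    """Return how many times slower the second rate is than the first."""
    return fast_tps / slow_tps


def ms_per_token(tps: float) -> float:
    """Convert a tokens/second rate into average milliseconds per token.

    Assumes the rate can be treated as a steady per-token rate,
    which is an illustrative simplification.
    """
    return 1000.0 / tps


smallest, largest = min(E2E_TOKENS_PER_SECOND), max(E2E_TOKENS_PER_SECOND)
input_growth = largest / smallest
speed_ratio = slowdown(E2E_TOKENS_PER_SECOND[smallest], E2E_TOKENS_PER_SECOND[largest])

print(f"input grows {input_growth:.0f}x, speed drops {speed_ratio:.2f}x")
for size, tps in sorted(E2E_TOKENS_PER_SECOND.items()):
    print(f"{size:>7} tokens in: {tps:6.1f} tok/s, {ms_per_token(tps):5.1f} ms/token")
```

On these figures the input grows 25x while the speed drops by about 4.09x, so throughput degrades much more slowly than the input length increases.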
