Building a Roguelike RPG with On-Device AI

The author optimized an on-device AI model to generate content for their roguelike RPG, achieving a 79x performance improvement over the initial approach.

💡 Why it matters

Optimizing on-device AI inference is critical for building responsive, offline-capable applications like this roguelike RPG.

Key Points

  • Tested different quantization levels on the Adreno OpenCL backend, finding Q8_0 to be the optimal setting
  • Swapped to the smaller 1.7B-parameter Qwen3-1.7B model, which doubled generation speed to 16.6 tokens/second
  • The smaller model also followed prompts more faithfully, generating appropriate mob names instead of placeholders
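
To give a sense of why Q8_0 is a reasonable sweet spot on mobile hardware, here is a rough file-size estimate for common llama.cpp quantization levels. The block layouts are llama.cpp's documented formats (Q8_0 packs 32 int8 weights plus one fp16 scale per block, about 8.5 bits per weight); the totals are approximations that ignore non-quantized tensors and file metadata, and the function name is our own.

```python
# Approximate GGUF model size at different llama.cpp quantization levels.
# Effective bits-per-weight values follow llama.cpp's block formats:
#   Q8_0: 34 bytes per 32-weight block -> 8.5 bits/weight
#   Q4_0: 18 bytes per 32-weight block -> 4.5 bits/weight
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q4_0": 4.5,
}

def approx_size_gb(n_params: float, quant: str) -> float:
    """Estimated weight storage in GB, ignoring metadata and unquantized tensors."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9

# For a 1.7B-parameter model like Qwen3-1.7B:
for quant in BITS_PER_WEIGHT:
    print(f"Qwen3-1.7B @ {quant}: ~{approx_size_gb(1.7e9, quant):.1f} GB")
```

At Q8_0 a 1.7B model fits in roughly 1.8 GB, which is comfortable in phone RAM while keeping near-full-precision quality; Q4_0 halves that again at some quality cost.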

Details

The author started with ONNX Runtime on the CPU, which managed only 0.21 tokens/second. Switching to ONNX Runtime with Qualcomm's QNN HTP backend improved this to 0.31 tokens/second. Moving to llama.cpp with its Adreno OpenCL backend boosted throughput to 9.0 tokens/second with the larger Phi-4-mini model. Finally, swapping in the smaller 1.7B-parameter Qwen3-1.7B model reached 16.6 tokens/second, a 79x improvement over the initial approach. This on-device stack can generate content for the game's dungeon loading screens in just 9 seconds.
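The progression above can be checked with simple arithmetic. The throughput figures are taken from the text; the stage labels are paraphrases, not exact tool invocations.

```python
# Speedup arithmetic for the four inference stacks described in the article.
# Tokens/second figures come from the source text.
stages = [
    ("ONNX Runtime, CPU", 0.21),
    ("ONNX Runtime, Qualcomm QNN HTP", 0.31),
    ("llama.cpp, Adreno OpenCL (Phi-4-mini)", 9.0),
    ("llama.cpp, Adreno OpenCL (Qwen3-1.7B)", 16.6),
]

baseline = stages[0][1]
for name, tps in stages:
    print(f"{name}: {tps:.2f} tok/s ({tps / baseline:.0f}x vs. CPU baseline)")

# At 16.6 tok/s, a 9-second loading screen yields roughly:
tokens_per_screen = round(16.6 * 9)
print(f"~{tokens_per_screen} tokens per loading screen")
```

The final stage works out to 16.6 / 0.21 ≈ 79x, matching the headline figure, and a 9-second generation window yields on the order of 150 tokens, enough for a short piece of flavor text.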
