Building a Roguelike RPG with On-Device AI
The author optimized an on-device AI model to generate content for their roguelike RPG, achieving a 79x performance improvement over the initial approach.
Why it matters
Optimizing on-device AI inference is critical for building responsive, offline-capable applications like this roguelike RPG.
Key Points
- Tested different quantization levels on the Adreno OpenCL backend, finding Q8_0 to be the optimal setting
- Swapped to a smaller 1.7B-parameter model (Qwen3-1.7B), which doubled generation speed to 16.6 tokens/second
- The smaller model also followed prompts more faithfully, generating appropriate mob names instead of placeholders
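The placeholder problem mentioned in the last point can be caught mechanically before generated names reach the game. Here is a minimal sketch of such a filter; the regex patterns and function names are illustrative assumptions, not the author's actual validation logic:

```python
import re

# Patterns suggesting the model emitted an unfilled template rather than a
# real mob name, e.g. "[NAME]", "<creature>", or "mob_1". These patterns
# are guesses for illustration, not the author's actual rules.
_PLACEHOLDER_PATTERNS = [
    re.compile(r"^\[.*\]$"),         # bracketed template slot: "[NAME]"
    re.compile(r"^<.*>$"),           # angle-bracket slot: "<creature>"
    re.compile(r"(?i)placeholder"),  # literal "placeholder"
    re.compile(r"^\w+_\d+$"),        # generic ids like "mob_1"
]

def is_placeholder(name: str) -> bool:
    """Return True if a generated mob name looks like an unfilled template."""
    name = name.strip()
    return not name or any(p.search(name) for p in _PLACEHOLDER_PATTERNS)

def filter_mob_names(names: list[str]) -> list[str]:
    """Keep only usable names from a batch of model generations."""
    return [n.strip() for n in names if not is_placeholder(n)]
```

A filter like this lets the game fall back to a retry or a hand-written name instead of showing a template string to the player.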
Details
The author started by using ONNX Runtime on the CPU, which was very slow at 0.21 tokens/second. They then tried ONNX Runtime with Qualcomm's QNN HTP, which improved it to 0.31 tokens/second. Next, they switched to the llama.cpp library with Adreno OpenCL, which boosted performance to 9.0 tokens/second using the larger Phi-4-mini model. Finally, by swapping to the smaller 1.7B parameter Qwen3-1.7B model, they achieved 16.6 tokens/second, a 79x improvement over the initial approach. This on-device AI stack can generate content for the game's dungeon loading screens in just 9 seconds.
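The headline 79x figure follows directly from the throughput numbers above (0.21 tokens/second at the start versus 16.6 at the end). A quick check of the arithmetic, using only the figures reported in the write-up:

```python
# Throughput at each stage of the optimization, in tokens/second,
# as reported in the write-up.
stages = {
    "ONNX Runtime (CPU)": 0.21,
    "ONNX Runtime (QNN HTP)": 0.31,
    "llama.cpp + Adreno OpenCL (Phi-4-mini)": 9.0,
    "llama.cpp + Adreno OpenCL (Qwen3-1.7B)": 16.6,
}

baseline = stages["ONNX Runtime (CPU)"]
for name, tps in stages.items():
    print(f"{name}: {tps} tok/s ({tps / baseline:.0f}x over CPU baseline)")
```

The final stage works out to 16.6 / 0.21 ≈ 79x, matching the claimed improvement.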