AMD Radeon AI PRO R9700 benchmarks with ROCm and Vulkan and llama.cpp
The article presents benchmarks for the AMD Radeon AI PRO R9700 GPU on Arch Linux with ROCm 7.1.1, comparing the ROCm and Vulkan backends in llama.cpp for language models such as gpt-oss 20B and Mistral Small.
Why it matters
These benchmarks provide insights into the performance of AMD's Radeon AI PRO R9700 GPU for large language models, which is relevant for AI researchers and developers working on GPU-accelerated AI applications.
Key Points
- Benchmarks for a novel-summarization task using the gpt-oss 20B and Mistral Small models
- Detailed performance metrics for prompt processing (PP), token generation (TG), and total time (T) under different batch sizes
- Comparison of the ROCm and Vulkan backends, with ROCm showing slightly faster prompt processing and less performance impact from long context
Details
For the novel-summarization task, the gpt-oss 20B model with a batch size of 32 completed in 113 seconds, generating 18,000 output words, while Mistral Small with a batch of 3 took 479 seconds to generate 14,000 words. In the detailed benchmarks, ROCm usually shows slightly faster prompt processing and suffers less of a performance hit from long context, while Vulkan has slightly faster token generation. The author notes that the benchmark scripts were themselves generated by a language model, so some reported values may be hallucinated.
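A ROCm-versus-Vulkan comparison like the one described can be reproduced with llama.cpp's bundled llama-bench tool by building the project once per backend and sweeping prompt and batch sizes. The sketch below is an assumption about the author's setup, not the article's actual scripts; the model path and the specific `-p`/`-n`/`-b` values are placeholders.

```shell
# Build llama.cpp twice, once per backend (flags per llama.cpp's build docs).
cmake -B build-rocm -DGGML_HIP=ON && cmake --build build-rocm -j
cmake -B build-vulkan -DGGML_VULKAN=ON && cmake --build build-vulkan -j

# Run the same benchmark against each build. model.gguf is a placeholder.
# -ngl 99: offload all layers to the GPU
# -p 512,4096: prompt sizes, short vs long context (PP)
# -n 128: tokens to generate (TG)
# -b 512,2048: batch sizes to sweep
for dir in build-rocm build-vulkan; do
  "./$dir/bin/llama-bench" -m model.gguf -ngl 99 -p 512,4096 -n 128 -b 512,2048
done
```

llama-bench reports tokens/second for each pp/tg configuration, which maps directly onto the PP and TG columns the article tabulates.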