Serving Qwen3.6-35B-A3B With vLLM and Building a Coding Agent With Tool Calling
The article discusses serving Alibaba's Qwen3.6-35B-A3B model locally using vLLM, calling it from Python with the OpenAI SDK, and wiring up tool calling to enable the model to act as a coding agent.
Why it matters
The guide shows how to run one of the strongest open-weight coding models on local hardware and wire it into an agent with tool calling, without depending on a hosted API.
Key Points
- Qwen3.6-35B-A3B is a sparse mixture-of-experts model with 35 billion total parameters and about 3 billion active per token
- The model can be served locally using vLLM 0.19.0 or later, which adds support for the Qwen3.6 MoE architecture
- The article provides instructions for starting the vLLM server in basic inference mode and with tool calling enabled
- The model can be called from Python using the OpenAI SDK, with the local vLLM server acting as the API endpoint
- The 'reasoning-parser' and 'tool-call-parser' flags let the model generate internal reasoning steps and call external tools as a coding agent
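The summary does not reproduce the exact launch command, but based on vLLM's documented `serve` CLI and the flags named above, a tool-calling setup would look roughly like this (the parser names and the model id `Qwen/Qwen3.6-35B-A3B` are assumptions):

```shell
# Sketch of a vLLM launch with reasoning and tool calling enabled.
# Parser names and model id are assumptions, not confirmed by the article.
vllm serve Qwen/Qwen3.6-35B-A3B \
  --reasoning-parser qwen3 \
  --tool-call-parser hermes \
  --enable-auto-tool-choice \
  --port 8000
```

For the basic inference-only setup, drop the parser flags and `--enable-auto-tool-choice`; the server then behaves as a plain OpenAI-compatible chat endpoint.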
Details
Alibaba's Qwen team released Qwen3.6-35B-A3B on April 16, 2026 under the Apache 2.0 license. It is a sparse mixture-of-experts (MoE) model with 35 billion total parameters but only about 3 billion active per token. The model scores 73.4% on SWE-bench Verified and 37.0 on MCPMark, making it one of the strongest open-weight models for agentic coding tasks.

The article walks through serving the model locally with vLLM. Version 0.19.0 or later is required; older releases do not support the Qwen3.6 MoE architecture. It covers a basic inference-only setup as well as enabling tool calling, which lets the model act as a coding agent that calls external tools and services.

Finally, the article shows how to call the model from Python using the OpenAI SDK, with the local vLLM server acting as the API endpoint. The 'reasoning-parser' and 'tool-call-parser' flags enable the model to generate internal reasoning steps and emit tool calls as part of its response, improving its usefulness for coding tasks.
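The article's client-side wiring can be sketched as follows. The tool schema and tool-call payload shapes follow the standard OpenAI function-calling format that vLLM's OpenAI-compatible server speaks; the `list_files` tool, the stubbed file list, and the model id are hypothetical, and the actual network call is shown as a comment since it needs a running server:

```python
import json

# Tool definition in the OpenAI function-calling schema, which vLLM's
# OpenAI-compatible server accepts. `list_files` is a hypothetical tool.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "list_files",
        "description": "List files in a directory of the project.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

def run_tool(name, arguments_json):
    """Dispatch a tool call returned by the model."""
    args = json.loads(arguments_json)
    if name == "list_files":
        # Stubbed here; a real agent would return os.listdir(args["path"]).
        return json.dumps(["main.py", "README.md"])
    raise ValueError(f"unknown tool: {name}")

# With the server running, the request would look like
# (requires `pip install openai`; model id is an assumption):
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
#   resp = client.chat.completions.create(
#       model="Qwen/Qwen3.6-35B-A3B",
#       messages=[{"role": "user", "content": "What files are in src/?"}],
#       tools=TOOLS,
#   )
#   call = resp.choices[0].message.tool_calls[0]

# Simulated tool call in the shape the API returns:
call = {"id": "call_0",
        "function": {"name": "list_files", "arguments": '{"path": "src/"}'}}
result = run_tool(call["function"]["name"], call["function"]["arguments"])
print(result)  # -> ["main.py", "README.md"]
```

In a full agent loop, `result` would be appended to the conversation as a `{"role": "tool", "tool_call_id": call["id"], "content": result}` message and the model called again, repeating until it answers without requesting a tool.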