Serving Qwen3.6-35B-A3B With vLLM and Building a Coding Agent With Tool Calling

The article discusses serving Alibaba's Qwen3.6-35B-A3B model locally using vLLM, calling it from Python with the OpenAI SDK, and wiring up tool calling to enable the model to act as a coding agent.

đź’ˇ

Why it matters

This article shows how to run a strong open-weight model entirely on local hardware and turn it into a coding agent with tool calling, without relying on a hosted API.

Key Points

  1. Qwen3.6-35B-A3B is a sparse mixture-of-experts model with 35 billion total parameters and about 3 billion active per token
  2. The model can be served locally with vLLM 0.19.0 or later, which adds support for the Qwen3.6 MoE architecture
  3. The article covers starting the vLLM server in basic inference-only mode and with tool calling enabled
  4. The model can be called from Python through the OpenAI SDK, with the local vLLM server acting as the API endpoint
  5. The `--reasoning-parser` and `--tool-call-parser` flags let the model generate internal reasoning steps and call external tools as a coding agent
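The two server modes in the list above might look like the following. The flag names follow vLLM's OpenAI-compatible server conventions, but the model ID and the parser values shown here are assumptions, not confirmed details from the article:

```shell
# Basic inference only (assumed Hugging Face model ID):
vllm serve Qwen/Qwen3.6-35B-A3B

# With reasoning and tool calling enabled. The parser names below
# ("qwen3") are assumptions -- check `vllm serve --help` for the
# values that ship with your vLLM version:
vllm serve Qwen/Qwen3.6-35B-A3B \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3 \
  --reasoning-parser qwen3
```

The server exposes an OpenAI-compatible API on port 8000 by default, which is what lets the OpenAI SDK talk to it unchanged.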

Details

Alibaba's Qwen team released Qwen3.6-35B-A3B on April 16, 2026 under the Apache 2.0 license. It is a sparse mixture-of-experts (MoE) model with 35 billion total parameters, of which only about 3 billion are active per token. It scores 73.4% on SWE-bench Verified and 37.0 on MCPMark, making it one of the strongest open-weight models for agentic coding tasks.

The article walks through serving the model locally with vLLM. Version 0.19.0 or later is required, since older releases do not support the Qwen3.6 MoE architecture. It covers a basic inference-only setup as well as enabling tool calling, which lets the model act as a coding agent that can call external tools and services.

Finally, the article shows how to call the model from Python using the OpenAI SDK, with the local vLLM server acting as the API endpoint. The `--reasoning-parser` and `--tool-call-parser` flags let the model generate internal reasoning steps and emit tool calls as part of its response, which is what enables the agentic coding workflow.
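The client side of the tool-calling loop can be sketched without a live server. The tool name `read_file`, its schema, and the helper functions below are illustrative assumptions, not code from the article; only the message shapes follow the OpenAI function-calling format:

```python
import json

# Hypothetical tool schema in the OpenAI function-calling format.
# You would pass this as tools=TOOLS in chat.completions.create().
TOOLS = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a UTF-8 text file and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "File to read"}},
            "required": ["path"],
        },
    },
}]

def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Execute one tool call requested by the model and return the result
    as a string, ready to go back in a role="tool" message."""
    args = json.loads(arguments_json)
    if name == "read_file":
        with open(args["path"], encoding="utf-8") as f:
            return f.read()
    return f"error: unknown tool {name!r}"

def run_agent_turn(tool_calls, messages):
    """Append one tool-result message per requested call, mirroring the
    shape the OpenAI SDK expects when finish_reason is 'tool_calls'."""
    for call in tool_calls:
        result = dispatch_tool_call(call["name"], call["arguments"])
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": result,
        })
    return messages
```

With the real server running, you would point the SDK at it, e.g. `OpenAI(base_url="http://localhost:8000/v1", api_key="unused")`, pass `tools=TOOLS`, and feed each model-requested call through `dispatch_tool_call` until the model returns a plain assistant message.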

AI Curator - Daily AI News Curation
