Serving Qwen3.6-35B-A3B With vLLM and Building a Coding Agent With Tool Calling
The article discusses serving Alibaba's Qwen3.6-35B-A3B model locally using vLLM, calling it from Python with the OpenAI SDK, and wiring up tool calling to enable the model to act as a coding agent.
Why it matters
The guide shows how to run one of the strongest open-weight coding models on local hardware and wire it into an agent with tool calling, without depending on a hosted API.
Key Points
- Qwen3.6-35B-A3B is a sparse mixture-of-experts model with 35 billion total parameters and about 3 billion active per token
- The model can be served locally using vLLM 0.19.0 or later, which adds support for the Qwen3.6 MoE architecture
- The article provides instructions for starting the vLLM server in basic inference mode and with tool calling enabled
- The model can be called from Python using the OpenAI SDK, with the local vLLM server acting as the API endpoint
- The 'reasoning-parser' and 'tool-call-parser' flags let the model generate internal reasoning steps and call external tools as a coding agent
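The summary does not reproduce the exact launch command, but based on vLLM's documented `serve` CLI and the flags named above, a tool-calling setup would look roughly like this (the parser names and the model id `Qwen/Qwen3.6-35B-A3B` are assumptions):

```shell
# Sketch of a vLLM launch with reasoning and tool calling enabled.
# Parser names and model id are assumptions, not confirmed by the article.
vllm serve Qwen/Qwen3.6-35B-A3B \
  --reasoning-parser qwen3 \
  --tool-call-parser hermes \
  --enable-auto-tool-choice \
  --port 8000
```

For the basic inference-only setup, drop the parser flags and `--enable-auto-tool-choice`; the server then behaves as a plain OpenAI-compatible chat endpoint.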
Details
Alibaba's Qwen team released Qwen3.6-35B-A3B on April 16, 2026 under the Apache 2.0 license. It is a sparse mixture-of-experts (MoE) model with 35 billion total parameters but only about 3 billion active per token. The model scores 73.4% on SWE-bench Verified and 37.0 on MCPMark, making it one of the strongest open-weight models for agentic coding tasks.

The article walks through serving the model locally with vLLM. Version 0.19.0 or later is required; older releases do not support the Qwen3.6 MoE architecture. It covers a basic inference-only setup as well as enabling tool calling, which lets the model act as a coding agent that calls external tools and services.

Finally, the article shows how to call the model from Python using the OpenAI SDK, with the local vLLM server acting as the API endpoint. The 'reasoning-parser' and 'tool-call-parser' flags enable the model to generate internal reasoning steps and emit tool calls as part of its response, improving its usefulness for coding tasks.
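The article's client-side wiring can be sketched as follows. The tool schema and tool-call payload shapes follow the standard OpenAI function-calling format that vLLM's OpenAI-compatible server speaks; the `list_files` tool, the stubbed file list, and the model id are hypothetical, and the actual network call is shown as a comment since it needs a running server:

```python
import json

# Tool definition in the OpenAI function-calling schema, which vLLM's
# OpenAI-compatible server accepts. `list_files` is a hypothetical tool.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "list_files",
        "description": "List files in a directory of the project.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

def run_tool(name, arguments_json):
    """Dispatch a tool call returned by the model."""
    args = json.loads(arguments_json)
    if name == "list_files":
        # Stubbed here; a real agent would return os.listdir(args["path"]).
        return json.dumps(["main.py", "README.md"])
    raise ValueError(f"unknown tool: {name}")

# With the server running, the request would look like
# (requires `pip install openai`; model id is an assumption):
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
#   resp = client.chat.completions.create(
#       model="Qwen/Qwen3.6-35B-A3B",
#       messages=[{"role": "user", "content": "What files are in src/?"}],
#       tools=TOOLS,
#   )
#   call = resp.choices[0].message.tool_calls[0]

# Simulated tool call in the shape the API returns:
call = {"id": "call_0",
        "function": {"name": "list_files", "arguments": '{"path": "src/"}'}}
result = run_tool(call["function"]["name"], call["function"]["arguments"])
print(result)  # -> ["main.py", "README.md"]
```

In a full agent loop, `result` would be appended to the conversation as a `{"role": "tool", "tool_call_id": call["id"], "content": result}` message and the model called again, repeating until it answers without requesting a tool.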