Develop Native Multimodal Agents with Qwen3.5 VLM Using NVIDIA GPU-Accelerated Endpoints

NVIDIA Technical Blog·Anu Srivastava·about 1 month ago

Alibaba has introduced the new open source Qwen3.5 series, built for native multimodal agents. The first model in this series is a ~400B parameter native vision-language model (VLM) with reasoning, built on a hybrid architecture of mixture of experts (MoE) and Gated Delta Networks. Qwen3.5 can understand and navigate user interfaces, an improvement over the previous generation of VLMs.

Qwen3.5 is ideal for a variety of use cases, including:

- Coding, including web development
- Visual reasoning, including mobile and web interfaces
- Chat applications
- Complex search

| Qwen3.5                              |                               |
|--------------------------------------|-------------------------------|
| Modalities                           | Vision, language              |
| Total parameters                     | 397B                          |
| Active parameters                    | 17B                           |
| Activation rate                      | 4.28%                         |
| Input context length                 | 256K, extensible to 1M tokens |
| Languages supported                  | 200+                          |
| Additional configuration information |                               |
| Experts                              | 512                           |
| Shared experts                       | 1                             |
| Experts per token                    | 11 (10 routed + 1 shared)     |
| Layers                               | 60                            |
| Words (vocabulary)                   | 248,320                       |

Table 1.…
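The activation rate and experts-per-token figures in the table follow from simple arithmetic on the other rows. The sketch below reproduces them in Python; the `MoEConfig` class and its field names are hypothetical and exist only for illustration (they are not part of any Qwen or NVIDIA API), and the numbers are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEConfig:
    """Hypothetical container for the table's figures (illustration only)."""
    total_params_b: float      # total parameters, in billions
    active_params_b: float     # parameters active per token, in billions
    routed_experts_per_token: int
    shared_experts: int

    @property
    def activation_rate(self) -> float:
        # Fraction of all parameters that are active for a single token.
        return self.active_params_b / self.total_params_b

    @property
    def experts_per_token(self) -> int:
        # Routed experts chosen per token plus the always-on shared expert(s).
        return self.routed_experts_per_token + self.shared_experts


# Values taken from the table: 397B total, 17B active, 10 routed + 1 shared.
qwen35 = MoEConfig(
    total_params_b=397,
    active_params_b=17,
    routed_experts_per_token=10,
    shared_experts=1,
)

print(f"Activation rate: {qwen35.activation_rate:.2%}")   # Activation rate: 4.28%
print(f"Experts per token: {qwen35.experts_per_token}")   # Experts per token: 11
```

Only about 17B of the 397B parameters take part in each token's forward pass, which is where the 4.28% figure comes from.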
