Alibaba's Qwen3.7-Plus combines visual AI with autonomous agent capabilities for coding and app navigation
By Jonathan Kemper
Summary
Alibaba's Qwen team has released Qwen3.7-Plus, a proprietary multimodal AI model that combines visual perception with agent capabilities such as coding, tool use, and GUI navigation. Built on the text-only Qwen3.7, it is billed as a "multimodal interactive hybrid agent" that can recognize real-world scenes, read screens, operate interfaces, write code from visual templates, and navigate mobile apps. In a demo, an agent built on the model autonomously developed a vocabulary learning app over eleven hours, producing over 10,000 lines of code across 1,000 agent calls. The model leads in screen understanding in Qwen's benchmarks but shows mixed performance overall. It is priced well below Western frontier models and does not have open weights.
Key quotes
Billed as a 'multimodal interactive hybrid agent,' the model is designed to recognize real-world scenes, read screen content, operate graphical interfaces, write code from visual templates, and navigate mobile apps end to end.
Using Qwen3.7-Plus, the team had a hybrid agent system build...
Qwen3.7-Plus is a proprietary offering with no open weights, priced well below Western frontier models.
