
Benchmark Comparison: Qwen3.6-35B-A3B Outperforms Claude Opus 4.7 in Pelican Image Generation Test

By simonw · 1mo ago · 3 min read · en · Review

Summary

The article compares two AI language models released the same morning: Qwen3.6-35B-A3B from Alibaba and Claude Opus 4.7 from Anthropic. The author uses their "pelican riding a bicycle" benchmark to compare the models' image generation. Qwen3.6 ran locally on the author's MacBook Pro M5 via LM Studio, using a 20.9GB quantized build from Unsloth (Qwen3.6-35B-A3B-UD-Q4_K_S.gguf). The author awards the round to Qwen 3.6, writing that Opus "managed to mess up." The piece is a quick side-by-side look at the two releases.

Key quotes

For anyone who has been taking my pelican riding a bicycle benchmark seriously as a robust way to test models, here are pelicans from this morning's two big model releases—Qwen3.6-35B-A3B from Alibaba and Claude Opus 4.7 from Anthropic.
Here's the Qwen 3.6 pelican, generated using this 20.9GB Qwen3.6-35B-A3B-UD-Q4_K_S.gguf quantized model by Unsloth, running on my MacBook Pro M5 via LM Studio (and the llm-lmstudio plugin)
And here's one I got from Anthropic's brand new Claude Opus 4.7
I'm giving this one to Qwen 3.6. Opus managed to mess up
