Distilling DeepSeek V4 Pro’s Thinking Style Into Qwen3.6-35B-A3B
Source: Twitter / X (modelscope.cn)