I’ve been analyzing the internal trajectory dynamics of several open-source models (Qwen, Llama, Gemma, etc.) using a new metric, ct_t, which measures local trajectory instability from hidden-state curvature.
The results show a clear structural difference in how these models process information internally.
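For readers who want a concrete feel for "hidden-state curvature", here is a minimal sketch of one common discrete proxy: the turning angle between successive displacement vectors of a trajectory. This is an assumption for illustration only. The exact definition of ct_t is in the report, not in this post, and the function name `turning_angles` and the toy trajectories are hypothetical.

```python
import numpy as np

def turning_angles(traj, eps=1e-12):
    """Angle in radians between successive displacement vectors.

    traj: array of shape (T, d), one hidden-state vector per step.
    Returns an array of length T - 2. This is an illustrative curvature
    proxy, not necessarily the ct_t metric from the report.
    """
    traj = np.asarray(traj, dtype=float)
    steps = np.diff(traj, axis=0)                  # displacements, shape (T-1, d)
    norms = np.linalg.norm(steps, axis=1)          # step lengths, shape (T-1,)
    dots = np.einsum("ij,ij->i", steps[:-1], steps[1:])
    cos = dots / np.maximum(norms[:-1] * norms[1:], eps)
    return np.arccos(np.clip(cos, -1.0, 1.0))      # clip guards against rounding

# Toy trajectories (hypothetical data, not model activations).
line = np.outer(np.arange(5), np.ones(3))          # straight path in 3-D
zigzag = np.array([[0, 0], [1, 0], [1, 1], [2, 1], [2, 2]], dtype=float)

line_angles = turning_angles(line)      # no turning along a straight line
zigzag_angles = turning_angles(zigzag)  # a right-angle turn at every step
```

A straight path gives angles near zero and a staircase gives a quarter turn at each step, so spikes in this quantity mark places where the trajectory bends sharply.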
Key Finding:
While models like Gemma-2B often drift into a “Chaotic” regime (high instability spikes) and Llama-3.2 into an “Underactive” one (rigid, low variance), Qwen models consistently maintain an “Adaptive” regime.
This suggests that Qwen’s architecture strikes a balance between stability and flexibility that, in our panel, did not depend on model size: a 1.5B Qwen model was dynamically more stable than its larger counterparts.
The 4 Regimes Identified:
Underactive: Rigid, low adaptability.
Adaptive: Balanced flux/stability (Qwen’s zone).
Transition: Boundary zone.
Chaotic: High instability, prone to divergence.
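To make the four labels concrete, here is a purely illustrative sketch that maps a scalar instability score to a regime name. Everything numeric here is an assumption: the cut points in `CUTS` are hypothetical placeholders, the ordering of the regimes along a single axis is assumed from the list above, and the actual decision rule used in the report may well differ.

```python
from bisect import bisect_right

REGIMES = ("Underactive", "Adaptive", "Transition", "Chaotic")

# Hypothetical cut points in arbitrary units; NOT taken from the report.
CUTS = (0.2, 0.6, 0.8)

def regime(score, cuts=CUTS):
    """Return the regime whose interval contains `score`.

    A score equal to a cut point falls into the higher regime.
    """
    return REGIMES[bisect_right(cuts, score)]

labels = [regime(s) for s in (0.1, 0.4, 0.7, 0.9)]
```

With real data, the cut points would have to come from the phase map in the report rather than from these placeholders.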
Full Technical Report:
I’ve published the full working paper with data from 158 runs on Zenodo:
Four Dynamical Regimes in Large Language Models: An Empirical Phase Map
I’d love to hear your thoughts: Does this “Adaptive” dynamic correlate with your experience of Qwen’s reasoning capabilities?
Best,
Jean-Denis Bosange Batuli
Back when DeepSeek was making headlines in the general news, I often heard people on Discord say that Qwen was the easiest to work with as a student model for LLMs. As for teacher models, back then it was DeepSeek or various other commercial LLMs…
I think Qwen was at version 2.5 at the time, but I don’t think even the current version of Qwen has lost that characteristic. The Qwen series tends to retain its original capabilities even after fine-tuning… though I don’t use it that heavily myself, so this is just my personal impression.
There might actually be a structural basis for its excellence as a student model…
That’s a fascinating observation, and it is consistent with our geometric findings!
In our analysis, we found that Qwen models consistently occupy an ‘Adaptive’ dynamical regime (balanced flux and stability), whereas other models of similar size often drift into ‘Underactive’ (rigid) or ‘Chaotic’ (unstable) states.
From a dynamics perspective, this ‘Adaptive’ state might be exactly what makes Qwen such a robust ‘student’ model:
It’s not too rigid: It has enough internal flexibility (flux) to adapt to new instructions during fine-tuning without breaking its core structure.
It’s not too chaotic: It maintains enough stability to retain its original capabilities and avoid catastrophic forgetting.
Essentially, its hidden-state trajectory is ‘resilient’ rather than ‘fragile.’ It would be interesting to see if this dynamic signature persists in the newer Qwen-3B or 7B versions after heavy fine-tuning. Have you noticed any specific prompts where Qwen feels too stable or too flexible?