qwq:32b-preview-fp16 - Ollama framework

QwQ is the reasoning model of the Qwen series. Unlike conventional instruction-tuned models, QwQ can think and reason, which gives it significantly better performance on downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, and its performance is competitive with state-of-the-art reasoning models such as DeepSeek-R1 and o1-mini.

![](/assets/library/qwq/e3d71b1c-9c62-413a-a63a-1ca604189a17)

### Future Work

This marks Qwen’s initial step in scaling Reinforcement Learning (RL) to enhance reasoning capabilities. Through this journey, we have not only witnessed the immense potential of scaled RL but also recognized the untapped possibilities within pretrained language models. As we work towards developing the next generation of Qwen, we are confident that combining stronger foundation models with RL powered by scaled computational resources will propel us closer to achieving Artificial General Intelligence (AGI). Additionally, we are actively exploring the integration of agents with RL to enable long-horizon reasoning, aiming to unlock greater intelligence with inference-time scaling.

### Reference
- [Blog](https://qwenlm.github.io/blog/qwq-32b/)
