deepseek-v2.5:236b - Ollama 框架

指标	DeepSeek-V2-0628	DeepSeek-Coder-V2-0724	DeepSeek-V2.5
AlpacaEval 2.0	46.6	44.5	50.5
ArenaHard	68.3	66.3	76.2
AlignBench	7.88	7.91	8.04
MT-Bench	8.85	8.91	9.02
HumanEval python	84.5	87.2	89
HumanEval Multi	73.8	74.8	73.8
LiveCodeBench(01-09)	36.6	39.7	41.8
Aider	69.9	72.9	72.2
SWE-verified	N/A	19	16.8
DS-FIM-Eval	N/A	73.2	78.3
DS-Arena-Code	N/A	49.5	63.1

参考

Hugging Face

DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions.

DeepSeek-V2.5 better aligns with human preferences and has been optimized in various aspects, including writing and instruction following:

| Metric                 | DeepSeek-V2-0628 | DeepSeek-Coder-V2-0724 | DeepSeek-V2.5 |
|:-----------------------|:-----------------|:-----------------------|:--------------|
| AlpacaEval 2.0          | 46.6             | 44.5                   | 50.5          |
| ArenaHard              | 68.3             | 66.3                   | 76.2          |
| AlignBench             | 7.88             | 7.91                   | 8.04          |
| MT-Bench               | 8.85             | 8.91                   | 9.02          |
| HumanEval python       | 84.5             | 87.2                   | 89            |
| HumanEval Multi        | 73.8             | 74.8                   | 73.8          |
| LiveCodeBench(01-09)   | 36.6             | 39.7                   | 41.8          |
| Aider                  | 69.9             | 72.9                   | 72.2          |
| SWE-verified           | N/A              | 19                     | 16.8          |
| DS-FIM-Eval            | N/A              | 73.2                   | 78.3          |
| DS-Arena-Code          | N/A              | 49.5                   | 63.1          |

## Reference

[Hugging Face](https://hugging-face.cn/deepseek-ai/DeepSeek-V2.5)

粘贴、拖放或点击以上传图片（.png、.jpeg、.jpg、.svg、.gif）