taozhiyuai / qwen2-57b-a14b-instruct

Qwen2 MOE 57B

376 拉取 2 个月前更新

2 个月前更新

2 个月前

5137995288ad · 47GB

README

Qwen2-57B-A14B-Instruct

简介

Qwen2是Qwen大型语言模型的新系列。对于Qwen2，我们发布了从0.5到72亿的多个基语言模型和指令微调语言模型，包括混合专家模型。此存储库包含指令微调的57B-A14B混合专家Qwen2模型。

与最先进的开源语言模型相比，包括之前发布的Qwen1.5，Qwen2在大多数开源模型中普遍超越，并在一系列针对语言理解、语言生成、多语言能力、编码、数学、推理等方面的基准测试中与专有模型竞争。

Qwen2-57B-A14B-Instruct支持最多65536个标记的上下文长度，能够处理大量输入。请参阅本节以获取有关如何部署Qwen2处理长文本的详细信息。

更详细的信息，请参阅我们的博客和GitHub。

模型详情

Qwen2是一个包括不同大小解码器语言模型的系列。对于每个大小，我们发布基语言模型和对齐聊天模型。它基于具有SwiGLU激励函数、QKV偏差、分组查询注意力等的Transformer架构。此外，我们还有一个适用于多种自然语言和代码的改进分词器。

评估

我们简要比较了Qwen2-57B-A14B-Instruct与相同大小的指令微调大型语言模型，包括Qwen1.5-32B-Chat。结果如下

从https://hf-mirror.com/Qwen/Qwen2-57B-A14B-Instruct导入

微信号：TAOZHIYUAI

Qwen2-57B-A14B-Instruct
![截屏2024-06-09 07.10.44.png](https://ollama.ac.cn/assets/taozhiyuai/qwen2-57b-a14b-instruct/8e81a52e-ba99-4fa8-85a1-dbf057f52674)

Introduction

![截屏2024-06-09 07.11.54.png](https://ollama.ac.cn/assets/taozhiyuai/qwen2-57b-a14b-instruct/22ed46fd-d057-4b36-928e-1191929e4a4b)

Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 57B-A14B Mixture-of-Experts Qwen2 model.

Compared with the state-of-the-art opensource language models, including the previous released Qwen1.5, Qwen2 has generally surpassed most opensource models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting for language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc.

Qwen2-57B-A14B-Instruct supports a context length of up to 65,536 tokens, enabling the processing of extensive inputs. Please refer to this section for detailed instructions on how to deploy Qwen2 for handling long texts.

For more details, please refer to our blog and GitHub.

Model Details

Qwen2 is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and codes.

Evaluation

We briefly compare Qwen2-57B-A14B-Instruct with similar-sized instruction-tuned LLMs, including Qwen1.5-32B-Chat. The results are shown as follows:

![截屏2024-06-09 06.59.57.png](https://ollama.ac.cn/assets/taozhiyuai/qwen2-57b-a14b-instruct/805372c3-0415-45c6-8bae-44d6eae6acae)

import from https://hf-mirror.com/Qwen/Qwen2-57B-A14B-Instruct

# WeChat ID : TAOZHIYUAI

Paste, drop or click to upload images (.png, .jpeg, .jpg, .svg, .gif)