taozhiyuai / qwen2-57b-a14b-instruct

Qwen2 MOE 57B

376 拉取更新于2个月前

更新于2个月前

2个月前

363e858044b1 · 61GB

README

Qwen2-57B-A14B-Instruct

简介

Qwen2是Qwen大型语言模型的新系列。对于Qwen2，我们发布了从0.5到720亿参数的多个基础语言模型和指令微调语言模型，包括一个混合专家模型。本仓库包含指令微调的57B-A14B混合专家Qwen2模型。

与最先进的开源语言模型（包括之前发布的Qwen1.5）相比，Qwen2通常超过了大多数开源模型，并在一系列针对语言理解、语言生成、多语言能力、编码、数学、推理等方面的基准测试中展现了与专有模型的竞争力。

Qwen2-57B-A14B-Instruct支持高达65,536个标记的上下文长度，能够处理大量的输入。请参阅本节以获取如何部署Qwen2处理长文本的详细说明。

更多详情请参阅我们的博客和GitHub。

模型详情

Qwen2是一个包括不同模型大小的解码语言模型的系列。对于每个大小，我们发布了基础语言模型和匹配聊天模型。它是基于具有SwiGLU激活、注意力QKV偏差、分组查询注意力等的Transformer架构。此外，我们还有一个改进的适用于多种自然语言和代码的分词器。

评估

我们简要比较了Qwen2-57B-A14B-Instruct与其他类似规模的指令微调LLMs，包括Qwen1.5-32B-Chat。结果如下所示：

从https://hf-mirror.com/Qwen/Qwen2-57B-A14B-Instruct导入

微信号：TAOZHIYUAI

Qwen2-57B-A14B-Instruct
![截屏2024-06-09 07.10.44.png](https://ollama.ac.cn/assets/taozhiyuai/qwen2-57b-a14b-instruct/8e81a52e-ba99-4fa8-85a1-dbf057f52674)

Introduction

![截屏2024-06-09 07.11.54.png](https://ollama.ac.cn/assets/taozhiyuai/qwen2-57b-a14b-instruct/22ed46fd-d057-4b36-928e-1191929e4a4b)

Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 57B-A14B Mixture-of-Experts Qwen2 model.

Compared with the state-of-the-art opensource language models, including the previous released Qwen1.5, Qwen2 has generally surpassed most opensource models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting for language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc.

Qwen2-57B-A14B-Instruct supports a context length of up to 65,536 tokens, enabling the processing of extensive inputs. Please refer to this section for detailed instructions on how to deploy Qwen2 for handling long texts.

For more details, please refer to our blog and GitHub.

Model Details

Qwen2 is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and codes.

Evaluation

We briefly compare Qwen2-57B-A14B-Instruct with similar-sized instruction-tuned LLMs, including Qwen1.5-32B-Chat. The results are shown as follows:

![截屏2024-06-09 06.59.57.png](https://ollama.ac.cn/assets/taozhiyuai/qwen2-57b-a14b-instruct/805372c3-0415-45c6-8bae-44d6eae6acae)

import from https://hf-mirror.com/Qwen/Qwen2-57B-A14B-Instruct

# WeChat ID : TAOZHIYUAI

Paste, drop or click to upload images (.png, .jpeg, .jpg, .svg, .gif)