hhao / openbmb-minicpm-llama3-v-2_5

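MiniCPM-V surpasses proprietary models such as GPT-4V, Gemini Pro, Qwen-VL and Claude 3 in overall performance, and supports multimodal conversation in more than 30 languages.

vision

24.8K pulls · Updated 2 months ago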

aca28a5e05a4 · 9.6GB

{{ if .System }}<|start_header_id|>system<|end_header_id|> {{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|> {{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|> {{ .Response }}<|eot_id|>

254B

Parameters

{"num_ctx":2048,"num_keep":4,"stop":["<|start_header_id|>","<|end_header_id|>","<|eot_id|>"]}

Readme


![image.png](https://github.com/OpenBMB/ollama/assets/4156702/e0e68673-5be1-4159-9065-b864c4079f63)

# **Note:** You first need a rebuilt `./ollama` binary. There are three ways to get one.

## 1. Download the binary file
Go to the [release page](https://github.com/hhao/ollama/releases) and download the binary for your platform.
> 🔥 In particular, the `./ollama-linux-arm64` binary was built on Debian and can run in the [Termux](https://termux.dev/) app on an Android phone.

Start the server:
```
./ollama-linux-x86_64 serve
```

Run this model:
```
ollama run hhao/openbmb-minicpm-llama3-v-2_5
```
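
Because this is a vision model, a request will usually carry an image. The following is a minimal sketch of how a request body might be assembled. It assumes the server started above listens on Ollama's default `http://localhost:11434` and exposes the standard `/api/generate` endpoint. The helper names `build_payload` and `describe_image` are hypothetical, and the prompt is assumed to contain no characters that need JSON escaping.

```shell
# build_payload IMAGE_FILE PROMPT: print a JSON body for POST /api/generate.
# Assumption: PROMPT has no double quotes or backslashes (no JSON escaping is done).
build_payload() {
    # Base64-encode the image and strip the line wrapping some base64 tools add.
    img=$(base64 < "$1" | tr -d '\n')
    printf '{"model":"%s","prompt":"%s","images":["%s"],"stream":false}\n' \
        "hhao/openbmb-minicpm-llama3-v-2_5" "$2" "$img"
}

# describe_image IMAGE_FILE PROMPT: send the body to a local server (hypothetical helper).
describe_image() {
    build_payload "$1" "$2" |
        curl -s http://localhost:11434/api/generate -d @-
}
```

Calling `describe_image photo.jpg "What is in this picture?"` (where `photo.jpg` is a placeholder file name) should print the server's JSON reply once the model has been pulled.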
## 2. Run in Docker (CPU or GPU)
- 🆕 Supports x86_64 and arm64 hosts.
- Supports CUDA (NVIDIA) and ROCm (AMD). [More details >>](https://github.com/ollama/ollama/blob/main/docs/docker.md)
```bash
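# Pull the image for your architecture (x86_64 shown here).
docker pull hihao/ollama-amd64

# On arm64, pull this image instead and use it in the run command below.
# docker pull hihao/ollama-arm64

# Start the server in the background; models are stored in ./models on the host.
docker run -d -v ./models:/root/.ollama -p 11434:11434 --name ollama hihao/ollama-amd64

# Open a shell inside the running container.
docker exec -it ollama bash

# Inside the container, run the model.
ollama run hhao/openbmb-minicpm-llama3-v-2_5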
```
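
The choice of image and GPU flag above can be sketched as a small helper that prints the `docker run` command for review instead of executing it. This is an illustration under assumptions, not part of the project: `image_for_arch` and `run_cmd` are hypothetical names, and `--gpus=all` is Docker's general flag for exposing NVIDIA GPUs (it needs the NVIDIA Container Toolkit on the host). Whether these particular images need extra setup for GPU use is not confirmed here; ROCm is left out because it requires different device flags, described in the linked Docker documentation.

```shell
# image_for_arch MACHINE: map `uname -m` output to an image named in this README.
image_for_arch() {
    case $1 in
        x86_64|amd64)  echo hihao/ollama-amd64 ;;
        aarch64|arm64) echo hihao/ollama-arm64 ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# run_cmd MACHINE [nvidia]: print (do not execute) the docker run command.
run_cmd() {
    image=$(image_for_arch "$1") || return 1
    gpu=
    # Assumption: the NVIDIA Container Toolkit is installed when "nvidia" is requested.
    if [ "${2:-}" = nvidia ]; then
        gpu=' --gpus=all'
    fi
    echo "docker run -d$gpu -v ./models:/root/.ollama -p 11434:11434 --name ollama $image"
}
```

For example, `run_cmd "$(uname -m)"` prints the CPU-only command for the current host, and adding `nvidia` as a second argument inserts the GPU flag.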

## 3. Build the ./ollama binary from source
### Install Requirements

- cmake version 3.24 or higher
- go version 1.22 or higher
- gcc version 11.4.0 or higher
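
The minimum versions listed above can be checked before starting a build. The following is a rough sketch under assumptions: the helper names `version_ge`, `first_version` and `check_tool` are hypothetical, and versions are assumed to be plain dotted numbers.

```shell
# version_ge A B: succeed when dotted version A >= B (numeric fields only).
version_ge() {
    a=$1 b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Take the leading field of each version; a missing field counts as 0.
        fa=${a%%.*}; fb=${b%%.*}
        [ -n "$fa" ] || fa=0
        [ -n "$fb" ] || fb=0
        if [ "$fa" -gt "$fb" ]; then return 0; fi
        if [ "$fa" -lt "$fb" ]; then return 1; fi
        # Drop the field that was just compared.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

# first_version: read text on stdin and print the first dotted number found.
first_version() {
    sed -n 's/[^0-9]*\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# check_tool NAME MINIMUM "VERSION OUTPUT": report whether the tool is new enough.
check_tool() {
    found=$(printf '%s\n' "$3" | first_version)
    if [ -n "$found" ] && version_ge "$found" "$2"; then
        echo "ok: $1 $found (>= $2)"
    else
        echo "too old or missing: $1 ${found:-none} (need >= $2)"
        return 1
    fi
}
```

A typical call would be `check_tool cmake 3.24 "$(cmake --version)"`, and similarly with `go version` for Go and `gcc -dumpfullversion` for GCC. The exact output format of each tool may vary by platform, so treat the parsing as approximate.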

### Setup the Code

Prepare both our [llama.cpp](https://github.com/OpenBMB/llama.cpp.git) fork and this Ollama fork.

```bash
git clone -b minicpm-v2.5 https://github.com/OpenBMB/ollama.git
cd ollama/llm
git clone -b minicpm-v2.5 https://github.com/OpenBMB/llama.cpp.git
cd ../
```

### macOS Build

The following is a macOS example. See the [developer guide](https://github.com/ollama/ollama/blob/main/docs/development.md) for **other platforms**.

```bash
brew install go cmake gcc
```

Optionally enable debugging and more verbose logging:

```bash
## At build time
export CGO_CFLAGS="-g"

## At runtime
export OLLAMA_DEBUG=1
```

Get the required libraries and build the native LLM code:

```bash
go generate ./...
```

Build ollama:

```bash
go build .
```

Start the server:

```
./ollama serve
```

Run this model:

```
ollama run hhao/openbmb-minicpm-llama3-v-2_5
```

### Windows Build

Note: The Windows build of Ollama is still under development.

Install required tools:

- MSVC toolchain: C/C++ and CMake at a minimum
- Go version 1.22 or higher
- MinGW (pick one variant) with GCC.
  - [MinGW-w64](https://www.mingw-w64.org/)
  - [MSYS2](https://www.msys2.org/)

```powershell
$env:CGO_ENABLED="1"
go generate ./...
go build .
```

Start the server:

```
./ollama serve
```

Run this model:

```
ollama run hhao/openbmb-minicpm-llama3-v-2_5
```

#### Windows CUDA (NVIDIA) Build

In addition to the common Windows development tools described above, install CUDA after installing MSVC.

- [NVIDIA CUDA](https://docs.nvda.net.cn/cuda/cuda-installation-guide-microsoft-windows/index.html)

#### Windows ROCm (AMD Radeon) Build

In addition to the common Windows development tools described above, install AMD's HIP package after installing MSVC.

- [AMD HIP](https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html)
- [Strawberry Perl](https://strawberryperl.com/)

Lastly, add `ninja.exe` included with MSVC to the system path (e.g. `C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\CommonExtensions\Microsoft\CMake\Ninja`).

### Linux Build

See the [developer guide](https://github.com/ollama/ollama/blob/main/docs/development.md) for Linux.

---

## MiniCPM-V: A GPT-4V Level Multimodal LLM on Your Phone

- **MiniCPM-Llama3-V 2.5**: 🔥🔥🔥 The latest and most capable model in the MiniCPM-V series. With a total of 8B parameters, the model **surpasses proprietary models such as GPT-4V-1106, Gemini Pro, Qwen-VL-Max and Claude 3** in overall performance. Equipped with enhanced OCR and instruction-following capabilities, the model also supports multimodal conversation in **over 30 languages**, including English, Chinese, French, Spanish and German. With the help of quantization, compilation optimizations, and several efficient inference techniques on CPUs and NPUs, MiniCPM-Llama3-V 2.5 can be **efficiently deployed on end-side devices**.

## News

#### 📌 Pinned

* [2024.05.28] 🚀🚀🚀 MiniCPM-Llama3-V 2.5 is now fully supported in [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) and [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5)! Please pull the latest code for llama.cpp and ollama. We have also released GGUF files in various sizes [here](https://hugging-face.cn/openbmb/MiniCPM-Llama3-V-2_5-gguf/tree/main). An FAQ list for ollama usage is coming within a day. Please stay tuned!
* [2024.05.28] 💫 We now support LoRA fine-tuning for MiniCPM-Llama3-V 2.5, using only 2 V100 GPUs! See more statistics [here](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#model-fine-tuning-memory-usage-statistics).
* [2024.05.23] 🔍 We've released a comprehensive comparison between Phi-3-vision-128k-instruct and MiniCPM-Llama3-V 2.5, including benchmark evaluations, multilingual capabilities, and inference efficiency 🌟📊🌍🚀. Click [here](./docs/compare_with_phi-3_vision.md) to view more details.
* [2024.05.23] 🔥🔥🔥 MiniCPM-V tops GitHub Trending and Hugging Face Trending! Our demo, recommended by Hugging Face Gradio’s official account, is available [here](https://hugging-face.cn/spaces/openbmb/MiniCPM-Llama3-V-2_5). Come and try it out!

<br>

* [2024.05.25] MiniCPM-Llama3-V 2.5 now supports streaming outputs and customized system prompts. Try it [here](https://hugging-face.cn/openbmb/MiniCPM-Llama3-V-2_5#usage)!
* [2024.05.24] We released the MiniCPM-Llama3-V 2.5 [gguf](https://hugging-face.cn/openbmb/MiniCPM-Llama3-V-2_5-gguf), which supports [llama.cpp](#inference-with-llamacpp) inference and delivers smooth decoding at 6~8 tokens/s on mobile phones. Try it now!
* [2024.05.20] We open-sourced MiniCPM-Llama3-V 2.5, which has improved OCR capability and supports 30+ languages, representing the first end-side MLLM to achieve GPT-4V level performance! We provide [efficient inference](#deployment-on-mobile-phone) and [simple fine-tuning](./finetune/readme.md). Try it now!
* [2024.04.23] MiniCPM-V-2.0 now supports vLLM! Click [here](#vllm) to view more details.
* [2024.04.18] We created a Hugging Face Space to host the MiniCPM-V 2.0 demo [here](https://hugging-face.cn/spaces/openbmb/MiniCPM-V-2)!
* [2024.04.17] MiniCPM-V-2.0 now supports deploying a [WebUI Demo](#webui-demo)!
* [2024.04.15] MiniCPM-V-2.0 now also supports [fine-tuning](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/minicpm-v-2最佳实践.md) with the SWIFT framework!
* [2024.04.12] We open-sourced MiniCPM-V 2.0, which achieves performance comparable to Gemini Pro in understanding scene text and outperforms the strong Qwen-VL-Chat 9.6B and Yi-VL 34B on <a href="https://rank.opencompass.org.cn/leaderboard-multimodal">OpenCompass</a>, a comprehensive evaluation over 11 popular benchmarks. Click <a href="https://openbmb.vercel.app/minicpm-v-2">here</a> to view the MiniCPM-V 2.0 technical blog.
* [2024.03.14] MiniCPM-V now supports [fine-tuning](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/minicpm-v最佳实践.md) with the SWIFT framework. Thanks to [Jintao](https://github.com/Jintao-Huang) for the contribution!
* [2024.03.01] MiniCPM-V can now be deployed on Mac!
