marco-o1:7b-q4_K_M - Ollama framework

* **Fine-Tuning with CoT Data:** We develop <ins>Marco-o1-CoT</ins> by performing full-parameter fine-tuning on the base model using open-source CoT datasets combined with our self-developed synthetic data.
* **Solution Space Expansion via MCTS:** We integrate LLMs with MCTS (<ins>Marco-o1-MCTS</ins>), using the model's output confidence to guide the search and expand the solution space.
* **Reasoning Action Strategy:** We implement novel reasoning action strategies and a reflection mechanism (<ins>Marco-o1-MCTS mini-step</ins>). This includes exploring different action granularities within the MCTS framework and prompting the model to self-reflect, which significantly enhances the model's ability to solve complex problems.
* **Application in Translation Tasks:** We are the first to apply Large Reasoning Models (LRMs) to <ins>Machine Translation tasks</ins>, exploring inference-time scaling laws in the multilingual and translation domain.
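
The list above says the MCTS search is guided by the model's output confidence, but it does not say how that confidence is computed. As a rough sketch under stated assumptions (not necessarily the authors' formula), one simple option is to normalize each chosen token's log-probability against a small set of alternative tokens and average the result over a rollout. The function names `token_confidence` and `rollout_score` are hypothetical, introduced here only for illustration.

```python
import math


def token_confidence(chosen_logprob, alternative_logprobs):
    """Softmax of the chosen token's log-prob over a set of alternatives.

    Assumption: `alternative_logprobs` contains the chosen token's own
    log-prob plus those of a few competing tokens (for example the top-k).
    """
    denom = sum(math.exp(lp) for lp in alternative_logprobs)
    return math.exp(chosen_logprob) / denom


def rollout_score(steps):
    """Average per-token confidence over one rollout.

    `steps` is a list of (chosen_logprob, alternative_logprobs) pairs.
    A search procedure could use this value as the reward of a node.
    """
    if not steps:
        raise ValueError("rollout has no tokens")
    total = sum(token_confidence(c, alts) for c, alts in steps)
    return total / len(steps)
```

A higher score means the model preferred its chosen tokens more strongly over the alternatives, which is the kind of signal a tree search can use to rank candidate reasoning paths.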

## Usage

```
ollama run marco-o1 "How many Rs are in strawberry?"
```

Parse the resulting string between `<Output>` and `</Output>`:

```
...
<Output>
There are 3 Rs in strawberry.
</Output>
```
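
A minimal sketch of that parsing step in Python, assuming the model's full response is already available as a string. The helper name `extract_output` is hypothetical, and returning `None` when the tags are missing is a design choice made here, not behavior of Ollama or the model.

```python
import re

# Non-greedy match so only the text inside the first <Output> pair is captured;
# DOTALL lets the answer span multiple lines.
_OUTPUT_RE = re.compile(r"<Output>(.*?)</Output>", re.DOTALL)


def extract_output(text):
    """Return the text between <Output> and </Output>, or None if absent."""
    match = _OUTPUT_RE.search(text)
    return match.group(1).strip() if match else None


raw = "...\n<Output>\nThere are 3 Rs in strawberry.\n</Output>\n"
print(extract_output(raw))  # There are 3 Rs in strawberry.
```

Checking for `None` before using the result guards against responses that were truncated before the closing tag.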

## References

[GitHub](https://github.com/AIDC-AI/Marco-o1?tab=readme-ov-file)

[HuggingFace](https://hugging-face.cn/AIDC-AI/Marco-o1)
