
Dolphin 2.9.4 Llama 3.1 8b 🐬

Curated and trained by Eric Hartford and Cognitive Computations

Discord: https://discord.gg/h3K4XGj2RH

Our appreciation for the sponsors of Dolphin 2.9.4:
- Crusoe Cloud - provided an excellent on-demand 8xL40S node

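This model is based on Meta Llama 3.1 8b and is governed by the Llama 3.1 license.

The base model has a 128K context window; our fine-tuning used a sequence length of 8192.

Dolphin 2.9.4 uses the ChatML prompt template format.

Example: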

<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
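
As a minimal sketch of how the template above could be filled in programmatically, the helper below renders a list of messages as ChatML and leaves the assistant turn open for the model to complete. The function name `build_chatml_prompt` and the sample user message are illustrative assumptions, not part of the model's tooling.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml_prompt(messages: List[Dict[str, str]]) -> str:
    """Render messages as ChatML and leave the assistant turn open."""
    parts = []
    for message in messages:
        # Each turn is: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    # The trailing assistant header (with no <|im_end|>) is where generation starts.
    parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Why is the sky blue?"},  # hypothetical user turn
])
print(prompt)
```

When generating from such a prompt, `<|im_end|>` would typically be used as the stop token, which matches the `eos_token` set in the training config.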

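Dolphin-2.9.4 has a variety of instruction-following, conversational, and coding skills. It also has agentic abilities and supports function calling.
It is specially trained to obey the system prompt and to follow instructions in many languages.

Dolphin is uncensored. We have filtered the dataset to remove alignment and bias. This makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service. It will be highly compliant with any request, even unethical ones. Please read my blog post about uncensored models: https://erichartford.com/uncensored-models. You are responsible for any content you create using this model. Enjoy responsibly.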

Evaluation

```
hf (pretrained=/workspace/axolotl/dolphin-2.9.4-llama3.1-8b-hf,dtype=bfloat16), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (4)
|                           Tasks                           |Version|Filter|n-shot|        Metric         |   |Value |   |Stderr|
|-----------------------------------------------------------|-------|------|-----:|-----------------------|---|-----:|---|------|
|leaderboard                                                |N/A    |none  |     0|acc                    |↑  |0.2926|±  |0.0041|
|                                                           |       |none  |     0|acc_norm               |↑  |0.4513|±  |0.0053|
|                                                           |       |none  |     0|exact_match            |↑  |0.0982|±  |0.0079|
|                                                           |       |none  |     0|inst_level_loose_acc   |↑  |0.3825|±  |N/A   |
|                                                           |       |none  |     0|inst_level_strict_acc  |↑  |0.3597|±  |N/A   |
|                                                           |       |none  |     0|prompt_level_loose_acc |↑  |0.2421|±  |0.0184|
|                                                           |       |none  |     0|prompt_level_strict_acc|↑  |0.2181|±  |0.0178|
| - leaderboard_bbh                                         |N/A    |none  |     3|acc_norm               |↑  |0.4931|±  |0.0061|
| - leaderboard_bbh_boolean_expressions                     |      0|none  |     3|acc_norm               |↑  |0.8000|±  |0.0253|
| - leaderboard_bbh_causal_judgement                        |      0|none  |     3|acc_norm               |↑  |0.5615|±  |0.0364|
| - leaderboard_bbh_date_understanding                      |      0|none  |     3|acc_norm               |↑  |0.4520|±  |0.0315|
| - leaderboard_bbh_disambiguation_qa                       |      0|none  |     3|acc_norm               |↑  |0.6640|±  |0.0299|
| - leaderboard_bbh_formal_fallacies                        |      0|none  |     3|acc_norm               |↑  |0.5600|±  |0.0315|
| - leaderboard_bbh_geometric_shapes                        |      0|none  |     3|acc_norm               |↑  |0.3640|±  |0.0305|
| - leaderboard_bbh_hyperbaton                              |      0|none  |     3|acc_norm               |↑  |0.6320|±  |0.0306|
| - leaderboard_bbh_logical_deduction_five_objects          |      0|none  |     3|acc_norm               |↑  |0.4600|±  |0.0316|
| - leaderboard_bbh_logical_deduction_seven_objects         |      0|none  |     3|acc_norm               |↑  |0.4360|±  |0.0314|
| - leaderboard_bbh_logical_deduction_three_objects         |      0|none  |     3|acc_norm               |↑  |0.6160|±  |0.0308|
| - leaderboard_bbh_movie_recommendation                    |      0|none  |     3|acc_norm               |↑  |0.7880|±  |0.0259|
| - leaderboard_bbh_navigate                                |      0|none  |     3|acc_norm               |↑  |0.5200|±  |0.0317|
| - leaderboard_bbh_object_counting                         |      0|none  |     3|acc_norm               |↑  |0.4520|±  |0.0315|
| - leaderboard_bbh_penguins_in_a_table                     |      0|none  |     3|acc_norm               |↑  |0.5205|±  |0.0415|
| - leaderboard_bbh_reasoning_about_colored_objects         |      0|none  |     3|acc_norm               |↑  |0.5120|±  |0.0317|
| - leaderboard_bbh_ruin_names                              |      0|none  |     3|acc_norm               |↑  |0.6320|±  |0.0306|
| - leaderboard_bbh_salient_translation_error_detection     |      0|none  |     3|acc_norm               |↑  |0.4320|±  |0.0314|
| - leaderboard_bbh_snarks                                  |      0|none  |     3|acc_norm               |↑  |0.5843|±  |0.0370|
| - leaderboard_bbh_sports_understanding                    |      0|none  |     3|acc_norm               |↑  |0.7040|±  |0.0289|
| - leaderboard_bbh_temporal_sequences                      |      0|none  |     3|acc_norm               |↑  |0.1440|±  |0.0222|
| - leaderboard_bbh_tracking_shuffled_objects_five_objects  |      0|none  |     3|acc_norm               |↑  |0.1560|±  |0.0230|
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects |      0|none  |     3|acc_norm               |↑  |0.1320|±  |0.0215|
| - leaderboard_bbh_tracking_shuffled_objects_three_objects |      0|none  |     3|acc_norm               |↑  |0.2840|±  |0.0286|
| - leaderboard_bbh_web_of_lies                             |      0|none  |     3|acc_norm               |↑  |0.4840|±  |0.0317|
| - leaderboard_gpqa                                        |N/A    |none  |     0|acc_norm               |↑  |0.2903|±  |0.0132|
| - leaderboard_gpqa_diamond                                |      1|none  |     0|acc_norm               |↑  |0.2980|±  |0.0326|
| - leaderboard_gpqa_extended                               |      1|none  |     0|acc_norm               |↑  |0.2839|±  |0.0193|
| - leaderboard_gpqa_main                                   |      1|none  |     0|acc_norm               |↑  |0.2946|±  |0.0216|
| - leaderboard_ifeval                                      |      2|none  |     0|inst_level_loose_acc   |↑  |0.3825|±  |N/A   |
|                                                           |       |none  |     0|inst_level_strict_acc  |↑  |0.3597|±  |N/A   |
|                                                           |       |none  |     0|prompt_level_loose_acc |↑  |0.2421|±  |0.0184|
|                                                           |       |none  |     0|prompt_level_strict_acc|↑  |0.2181|±  |0.0178|
| - leaderboard_math_algebra_hard                           |      1|none  |     4|exact_match            |↑  |0.1596|±  |0.0209|
| - leaderboard_math_counting_and_prob_hard                 |      1|none  |     4|exact_match            |↑  |0.0488|±  |0.0195|
| - leaderboard_math_geometry_hard                          |      1|none  |     4|exact_match            |↑  |0.0530|±  |0.0196|
| - leaderboard_math_hard                                   |N/A    |none  |     4|exact_match            |↑  |0.0982|±  |0.0079|
| - leaderboard_math_intermediate_algebra_hard              |      1|none  |     4|exact_match            |↑  |0.0143|±  |0.0071|
| - leaderboard_math_num_theory_hard                        |      1|none  |     4|exact_match            |↑  |0.0455|±  |0.0168|
| - leaderboard_math_prealgebra_hard                        |      1|none  |     4|exact_match            |↑  |0.2591|±  |0.0316|
| - leaderboard_math_precalculus_hard                       |      1|none  |     4|exact_match            |↑  |0.0519|±  |0.0192|
| - leaderboard_mmlu_pro                                    |    0.1|none  |     5|acc                    |↑  |0.2926|±  |0.0041|
| - leaderboard_musr                                        |N/A    |none  |     0|acc_norm               |↑  |0.3862|±  |0.0173|
| - leaderboard_musr_murder_mysteries                       |      1|none  |     0|acc_norm               |↑  |0.5280|±  |0.0316|
| - leaderboard_musr_object_placements                      |      1|none  |     0|acc_norm               |↑  |0.3594|±  |0.0300|
| - leaderboard_musr_team_allocation                        |      1|none  |     0|acc_norm               |↑  |0.2720|±  |0.0282|

|         Groups         |Version|Filter|n-shot|        Metric         |   |Value |   |Stderr|
|------------------------|-------|------|-----:|-----------------------|---|-----:|---|------|
|leaderboard             |N/A    |none  |     0|acc                    |↑  |0.2926|±  |0.0041|
|                        |       |none  |     0|acc_norm               |↑  |0.4513|±  |0.0053|
|                        |       |none  |     0|exact_match            |↑  |0.0982|±  |0.0079|
|                        |       |none  |     0|inst_level_loose_acc   |↑  |0.3825|±  |N/A   |
|                        |       |none  |     0|inst_level_strict_acc  |↑  |0.3597|±  |N/A   |
|                        |       |none  |     0|prompt_level_loose_acc |↑  |0.2421|±  |0.0184|
|                        |       |none  |     0|prompt_level_strict_acc|↑  |0.2181|±  |0.0178|
| - leaderboard_bbh      |N/A    |none  |     3|acc_norm               |↑  |0.4931|±  |0.0061|
| - leaderboard_gpqa     |N/A    |none  |     0|acc_norm               |↑  |0.2903|±  |0.0132|
| - leaderboard_math_hard|N/A    |none  |     4|exact_match            |↑  |0.0982|±  |0.0079|
| - leaderboard_musr     |N/A    |none  |     0|acc_norm               |↑  |0.3862|±  |0.0173|
```

Built with Axolotl

See axolotl config

axolotl version: 0.4.1

base_model: meta-llama/Meta-Llama-3.1-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
# load_in_4bit: true
strict: false

datasets:
  - path: /workspace/datasets/dolphin-2.9.4/dolphin201-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml

chat_template: chatml
# adapter: qlora
# lora_r: 128
# lora_alpha: 16
# lora_modules_to_save: [embed_tokens, lm_head]
# lora_dropout: 0.05
# lora_target_linear: true

unfrozen_parameters:
- input_layernorm
- model.norm
- post_attention_layernorm
- self_attn.rotary_emb
- ^lm_head.weight$
- ^model.embed_tokens.weight$
# mlp.down_proj layers
- model.layers.1.mlp.down_proj
- model.layers.0.mlp.down_proj
- model.layers.30.mlp.down_proj
- model.layers.2.mlp.down_proj
- model.layers.21.mlp.down_proj
- model.layers.22.mlp.down_proj
- model.layers.29.mlp.down_proj
- model.layers.5.mlp.down_proj
- model.layers.4.mlp.down_proj
- model.layers.20.mlp.down_proj
- model.layers.23.mlp.down_proj
- model.layers.19.mlp.down_proj
- model.layers.3.mlp.down_proj
- model.layers.17.mlp.down_proj
- model.layers.6.mlp.down_proj
- model.layers.31.mlp.down_proj
# mlp.up_proj layers
- model.layers.4.mlp.up_proj
- model.layers.3.mlp.up_proj
- model.layers.0.mlp.up_proj
- model.layers.5.mlp.up_proj
- model.layers.7.mlp.up_proj
- model.layers.6.mlp.up_proj
- model.layers.2.mlp.up_proj
- model.layers.1.mlp.up_proj
- model.layers.8.mlp.up_proj
- model.layers.12.mlp.up_proj
- model.layers.14.mlp.up_proj
- model.layers.9.mlp.up_proj
- model.layers.15.mlp.up_proj
- model.layers.17.mlp.up_proj
- model.layers.13.mlp.up_proj
- model.layers.19.mlp.up_proj
# self_attn.k_proj layers
- model.layers.29.self_attn.k_proj
- model.layers.25.self_attn.k_proj
- model.layers.23.self_attn.k_proj
- model.layers.28.self_attn.k_proj
- model.layers.21.self_attn.k_proj
- model.layers.19.self_attn.k_proj
- model.layers.22.self_attn.k_proj
- model.layers.20.self_attn.k_proj
- model.layers.24.self_attn.k_proj
- model.layers.31.self_attn.k_proj
- model.layers.27.self_attn.k_proj
- model.layers.26.self_attn.k_proj
- model.layers.17.self_attn.k_proj
- model.layers.11.self_attn.k_proj
- model.layers.18.self_attn.k_proj
- model.layers.14.self_attn.k_proj
# self_attn.o_proj layers
- model.layers.14.self_attn.o_proj
- model.layers.7.self_attn.o_proj
- model.layers.5.self_attn.o_proj
- model.layers.11.self_attn.o_proj
- model.layers.6.self_attn.o_proj
- model.layers.24.self_attn.o_proj
- model.layers.9.self_attn.o_proj
- model.layers.13.self_attn.o_proj
- model.layers.10.self_attn.o_proj
- model.layers.12.self_attn.o_proj
- model.layers.8.self_attn.o_proj
- model.layers.25.self_attn.o_proj
- model.layers.21.self_attn.o_proj
- model.layers.23.self_attn.o_proj
- model.layers.15.self_attn.o_proj
- model.layers.16.self_attn.o_proj
# self_attn.q_proj layers
- model.layers.8.self_attn.q_proj
- model.layers.13.self_attn.q_proj
- model.layers.9.self_attn.q_proj
- model.layers.14.self_attn.q_proj
- model.layers.10.self_attn.q_proj
- model.layers.11.self_attn.q_proj
- model.layers.0.self_attn.q_proj
- model.layers.15.self_attn.q_proj
- model.layers.1.self_attn.q_proj
- model.layers.6.self_attn.q_proj
- model.layers.5.self_attn.q_proj
- model.layers.7.self_attn.q_proj
- model.layers.12.self_attn.q_proj
- model.layers.16.self_attn.q_proj
- model.layers.17.self_attn.q_proj
- model.layers.26.self_attn.q_proj
# self_attn.v_proj layers
- model.layers.26.self_attn.v_proj
- model.layers.17.self_attn.v_proj
- model.layers.3.self_attn.v_proj
- model.layers.28.self_attn.v_proj
- model.layers.29.self_attn.v_proj
- model.layers.21.self_attn.v_proj
- model.layers.15.self_attn.v_proj
- model.layers.16.self_attn.v_proj
- model.layers.20.self_attn.v_proj
- model.layers.25.self_attn.v_proj
- model.layers.6.self_attn.v_proj
- model.layers.23.self_attn.v_proj
- model.layers.4.self_attn.v_proj
- model.layers.1.self_attn.v_proj
- model.layers.22.self_attn.v_proj
- model.layers.14.self_attn.v_proj
# mlp.gate_proj layers
- model.layers.1.mlp.gate_proj
- model.layers.2.mlp.gate_proj
- model.layers.3.mlp.gate_proj
- model.layers.4.mlp.gate_proj
- model.layers.0.mlp.gate_proj
- model.layers.25.mlp.gate_proj
- model.layers.26.mlp.gate_proj
- model.layers.5.mlp.gate_proj
- model.layers.24.mlp.gate_proj
- model.layers.28.mlp.gate_proj
- model.layers.23.mlp.gate_proj
- model.layers.27.mlp.gate_proj
- model.layers.21.mlp.gate_proj
- model.layers.22.mlp.gate_proj
- model.layers.29.mlp.gate_proj
- model.layers.20.mlp.gate_proj




dataset_prepared_path:  /workspace/axolotl/dolph-2.9.4-nemo-prepared
val_set_size: 0.01
output_dir: /workspace/axolotl/dolphin-2.9.4-llama3.1-8b

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project: dolphin-2.9.4-llama3.1-8b
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 5e-6
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32:

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
# evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
save_total_limit: 2
save_steps:
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.1
special_tokens:
  eos_token: "<|im_end|>"
  bos_token: "<|begin_of_text|>"
  pad_token: "<|finetune_right_pad_id|>"
tokens:
  - "<|im_start|>"


# fsdp:
#   - full_shard
#   - auto_wrap
# fsdp_config:
#   fsdp_limit_all_gathers: true
#   fsdp_sync_module_states: true
#   fsdp_offload_params: true
#   fsdp_use_orig_params: false
#   fsdp_cpu_ram_efficient_loading: true
#   fsdp_transformer_layer_cls_to_wrap: MixtralSparseMoeBlock
#   fsdp_state_dict_type: FULL_STATE_DICT
#   fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
#   fsdp_sharding_strategy: FULL_SHARD
#   fsdp_forward_prefetch: false
#   fsdp_backward_prefetch: BACKWARD_PRE
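
The `unfrozen_parameters` list in the config above selects which weights stay trainable; each projection type lists 16 of the model's layers, and everything not listed stays frozen. As a rough illustration only, the sketch below treats each entry as a regular expression searched for anywhere in a parameter name. That matching rule, the helper `is_unfrozen`, and the sample parameter names are assumptions made for this example; axolotl's own resolution of the list may differ.

```python
import re
from typing import Iterable

# A small subset of the patterns from the config, copied verbatim.
UNFROZEN_PATTERNS = [
    "input_layernorm",
    "^lm_head.weight$",
    "model.layers.30.mlp.down_proj",
]


def is_unfrozen(param_name: str, patterns: Iterable[str]) -> bool:
    """Return True if any pattern is found in the name (assumed matching rule)."""
    return any(re.search(pattern, param_name) for pattern in patterns)


# Hypothetical parameter names in the style of a Llama checkpoint.
names = [
    "model.layers.30.mlp.down_proj.weight",
    "model.layers.30.mlp.up_proj.weight",
    "model.layers.7.input_layernorm.weight",
    "lm_head.weight",
]

trainable = [name for name in names if is_unfrozen(name, UNFROZEN_PATTERNS)]
print(trainable)
```

In a real training loop the same predicate would decide, per parameter, whether `requires_grad` is left on or switched off.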


workspace/axolotl/dolphin-2.9.4-llama3.1-8b

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B on the None dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5655

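Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training: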
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 256
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 3

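Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.5837        | 1.0180 | 1161 | 0.5814          |
| 0.5525        | 2.0179 | 2322 | 0.5671          |
| 0.5514        | 2.9624 | 3420 | 0.5655          |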
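
To get a feel for the learning rate at each logged checkpoint, the sketch below evaluates a linear-warmup, cosine-decay schedule with the peak rate (5e-06) and warmup (100 steps) from the hyperparameters. It assumes the run ended at step 3420, the last step in the results above, and that the decay reaches zero at that point; the schedule actually applied by the trainer may differ in detail.

```python
import math

PEAK_LR = 5e-6        # learning_rate
WARMUP_STEPS = 100    # lr_scheduler_warmup_steps
TOTAL_STEPS = 3420    # assumption: last step reported in the training results


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Warmup midpoint, warmup end, and the three logged evaluation steps.
for step in (50, 100, 1161, 2322, 3420):
    print(step, f"{lr_at(step):.3e}")
```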

Framework versions