Llama Fine-Tuning and Ollama Deployment

news/2024/9/30 12:27:45 Tags: llama

1 Llama Fine-Tuning

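Fine-tuning starts from a base model and trains it further on a task-specific dataset, so that the model gains a particular capability on top of what it can already do.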

For the base model I chose Mistral-7B-v0.3-Chinese-Chat-uncensored.
The model files can be downloaded from HuggingFace Models.

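A high-quality dataset is the key to getting good results from fine-tuning. You can use an open-source dataset from the web or build your own. This post uses the Chinese dataset Ruozhiba (弱智吧) as an example, which contains about 1,500 dialogue samples; it can be downloaded from HuggingFace Datasets.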
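Before training, each dialogue sample usually has to be flattened into a single prompt/answer string. The following is a minimal sketch of that step, not the exact preprocessing used for this post. It assumes the dataset is stored as JSON Lines and that the fields are named `instruction` and `output` (both names are assumptions, so check the real column names of the dataset you download). The `[INST] ... [/INST]` wrapper mirrors the TEMPLATE used in the Ollama Modelfile later in this post; special tokens such as BOS/EOS are left to the tokenizer or training framework.

```python
import json

# Prompt wrapper mirroring the TEMPLATE used later in the Ollama Modelfile.
PROMPT_TEMPLATE = "[INST] {prompt} [/INST]"


def format_example(prompt, answer):
    """Join one prompt/answer pair into a single training string."""
    return PROMPT_TEMPLATE.format(prompt=prompt.strip()) + " " + answer.strip()


def load_examples(jsonl_text, prompt_key="instruction", answer_key="output"):
    """Parse JSON Lines text into training strings.

    prompt_key and answer_key are assumed field names; records with an
    empty prompt or an empty answer are skipped.
    """
    examples = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines between records
        record = json.loads(line)
        prompt = str(record.get(prompt_key, "")).strip()
        answer = str(record.get(answer_key, "")).strip()
        if prompt and answer:
            examples.append(format_example(prompt, answer))
    return examples
```

Filtering out empty records at this stage is a cheap way to keep obviously broken samples out of a dataset that is only about 1,500 entries long.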

Create a new file named merge.py that merges the base model and the LoRA model into a single new model.

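Run merge.py with the following arguments (replace the values with your own paths):
--base_model    path to the base model
--lora_model    path to the fine-tuned LoRA model
--output_dir    output path for the merged model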
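The post does not show the contents of merge.py, so the following is only a sketch of what such a script might look like, assuming the LoRA adapter was trained with the peft library and that torch, transformers and peft are installed. The three argument names come from the list above; the float16 dtype and the function names are assumptions.

```python
import argparse
import sys


def build_parser():
    """Command-line interface with the three arguments listed above."""
    parser = argparse.ArgumentParser(description="Merge a LoRA adapter into its base model.")
    parser.add_argument("--base_model", required=True, help="path to the base model")
    parser.add_argument("--lora_model", required=True, help="path to the fine-tuned LoRA model")
    parser.add_argument("--output_dir", required=True, help="output path for the merged model")
    return parser


def merge(base_model, lora_model, output_dir):
    # Heavy imports stay local so the usage text prints without loading them.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: float16 weights are acceptable for the merged model.
    base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)
    model = PeftModel.from_pretrained(base, lora_model)
    merged = model.merge_and_unload()  # fold the LoRA weights into the base weights
    merged.save_pretrained(output_dir, safe_serialization=True)  # writes safetensors
    # Copy the tokenizer so the output directory is self-contained.
    AutoTokenizer.from_pretrained(base_model).save_pretrained(output_dir)


def main(argv=None):
    args = build_parser().parse_args(argv)
    merge(args.base_model, args.lora_model, args.output_dir)


if __name__ == "__main__":
    if len(sys.argv) == 1:
        build_parser().print_help()  # no arguments given: show usage
    else:
        main()
```

Saving with safetensors matches the later conversion step, which expects a safetensors model as input.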

Next, use llama.cpp to quantize the model.

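1. Install CMake and download the llama.cpp source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
2. Install the dependencies and build llama.cpp
pip install -r requirements/requirements-convert-hf-to-gguf.txt
cmake -B build
cmake --build build --config Release
3. Run the conversion script to convert the safetensors model into a GGUF file, which is the format the quantizer works on
python convert-hf-to-gguf.py <merged model path> --outtype f16 --outfile <converted model location>\my_llama3.gguf
4. Quantize the converted file
<llama.cpp location>\llama.cpp\build\bin\Release\quantize.exe <converted model path> <quantized model location>\quantized_model.gguf q4_0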
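Steps 3 and 4 above can also be chained from a small script so that the intermediate file name stays consistent between the two commands. This is a rough sketch under a few assumptions: the script and executable names are the ones used in this post (they depend on the llama.cpp version you checked out), `python` is on the PATH, and the helper names `build_commands` and `run_pipeline` are made up for illustration.

```python
import subprocess


def build_commands(merged_dir, f16_gguf, quantized_gguf,
                   convert_script="convert-hf-to-gguf.py",
                   quantize_exe="quantize.exe",
                   quant_type="q4_0"):
    """Return the argument lists for the convert and quantize steps."""
    convert_cmd = ["python", convert_script, merged_dir,
                   "--outtype", "f16", "--outfile", f16_gguf]
    # The quantizer reads the f16 GGUF file produced by the convert step.
    quantize_cmd = [quantize_exe, f16_gguf, quantized_gguf, quant_type]
    return [convert_cmd, quantize_cmd]


def run_pipeline(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

Calling `run_pipeline(build_commands("merged", "my_llama3.gguf", "quantized_model.gguf"))` only prints the two commands; pass `dry_run=False` to actually execute them.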

At this point the fine-tuned Llama model has been merged, converted and quantized, and it is ready to use.

Ollama download page

Open Ollama, pick an existing model from its model library, and run it with ollama run llama3.2.

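To deploy the quantized model, create a Modelfile (test.Modelfile in this example) with the following content:
FROM <path to the quantized model>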
TEMPLATE "[INST] {{ .Prompt }} [/INST]"

# ollama create <model name> -f <path to the Modelfile>
ollama create panda -f test.Modelfile
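If you quantize several variants of the model, the Modelfile can be generated rather than edited by hand. The following is a small sketch that writes the same two directives shown above; `render_modelfile` and `write_modelfile` are hypothetical helper names, and the default template is simply the one used in this post.

```python
from pathlib import Path

# Same prompt template as in the Modelfile shown in this post.
DEFAULT_TEMPLATE = "[INST] {{ .Prompt }} [/INST]"


def render_modelfile(gguf_path, template=DEFAULT_TEMPLATE):
    """Return Modelfile text with a FROM line and a TEMPLATE line."""
    return f'FROM {gguf_path}\nTEMPLATE "{template}"\n'


def write_modelfile(modelfile_path, gguf_path, template=DEFAULT_TEMPLATE):
    """Write the rendered Modelfile to disk as UTF-8 text."""
    Path(modelfile_path).write_text(render_modelfile(gguf_path, template), encoding="utf-8")
```

After `write_modelfile("test.Modelfile", "quantized_model.gguf")`, the `ollama create panda -f test.Modelfile` command from this post can be run unchanged.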