How to fine-tune your own GPT-3 model in a Windows environment

1 Environment Setup

1.1 Install Anaconda3

1.2 Create a virtual environment

conda create -n GPT python=3.7 -y
conda activate GPT

2 Installation

Install the openai package to get the OpenAI command-line interface. The fine_tunes commands used below belong to the legacy CLI that ships with openai versions before 1.0, so pin the version. We also install pandas, which is used later to convert the data format.

pip3 install "openai<1.0" -i https://pypi.doubanio.com/simple/
pip install pandas

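Set the OPENAI_API_KEY environment variable before running the fine-tuning commands. In a Windows Command Prompt (cmd), run the following in the same window you will use for the later commands. Do not put quotes around the key, because cmd stores them as part of the value:

set OPENAI_API_KEY=<OPENAI_API_KEY>

In PowerShell, use $env:OPENAI_API_KEY = "<OPENAI_API_KEY>" instead. On Linux or macOS, add export OPENAI_API_KEY="<OPENAI_API_KEY>" to your shell initialization script (e.g. .bashrc or .zshrc).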
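Because a missing or mistyped key only surfaces later as an authentication error, it can help to check the variable first. The following is a minimal sketch, not part of the openai package: check_api_key is a hypothetical helper that reads OPENAI_API_KEY from the current environment and rejects an empty value or one wrapped in quotes, which is what cmd produces for set OPENAI_API_KEY="...". Save it under any name (for example check_key.py, a hypothetical file name) and run it in the same window where you ran set.

```python
import os


def check_api_key(value):
    """Return the stripped API key, or raise ValueError if it looks unusable."""
    if value is None:
        raise ValueError("OPENAI_API_KEY is not set in this shell")
    key = value.strip()
    if not key:
        raise ValueError("OPENAI_API_KEY is empty")
    # cmd keeps quotes typed in `set NAME="value"` as part of the value
    if key[0] in "\"'" or key[-1] in "\"'":
        raise ValueError("OPENAI_API_KEY contains quotes; run `set` again without them")
    return key


if __name__ == "__main__":
    try:
        key = check_api_key(os.environ.get("OPENAI_API_KEY"))
        # Show only a short prefix so the full secret is not printed
        print(f"OPENAI_API_KEY is set (starts with {key[:3]}...)")
    except ValueError as err:
        print(f"Problem: {err}")
```

The check only looks at the shape of the value; whether the key is actually valid is decided by the OpenAI API when the first command runs.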

Note: the figure below shows how to obtain the OPENAI_API_KEY.
[Figure: obtaining the OPENAI_API_KEY]

3 Prepare training data

Here we use a Kaggle dataset as an example. Download link: https://www.kaggle.com/datasets/egorovm/patient-disease?resource=download
After downloading and extracting the archive, we use disease_clean_symptoms.csv as the example file.

[Figure: the extracted dataset files]

Opening disease_clean_symptoms.csv in Excel shows its contents, as in the figure below.

[Figure: disease_clean_symptoms.csv opened in Excel]

Next, save the following script as process.py and run it to convert the data:

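import pandas as pd

# Read the raw file without a header row (header=None), keep only the first
# 500 rows, and name the two columns "prompt" and "completion"
df = pd.read_csv('disease_clean_symptoms.csv', header=None, index_col=False,
                 nrows=500, names=['prompt', 'completion'])

# Save with a prompt,completion header row and without the DataFrame index
df.to_csv("disease_clean_symptoms_new.csv", index=False)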

After the script runs, open the generated file disease_clean_symptoms_new.csv. It now has a header row with the prompt and completion columns, as shown in the figure below.

[Figure: disease_clean_symptoms_new.csv with prompt and completion columns]
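Before handing the file to OpenAI's tool, you can sanity-check it yourself. Below is a minimal sketch using only Python's csv module. check_prompt_completion_csv is a hypothetical helper, and the rules it enforces are assumptions about what a usable file looks like, not OpenAI's own validation rules: the header must be exactly prompt,completion, no field may be blank, and there may be at most 500 data rows, matching nrows=500 in process.py.

```python
import csv
import os


def check_prompt_completion_csv(lines, max_rows=500):
    """Validate CSV text (an open file or any iterable of lines).

    Returns the number of data rows; raises ValueError on the first problem.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header != ["prompt", "completion"]:
        raise ValueError(f"unexpected header: {header}")
    count = 0
    for count, row in enumerate(reader, start=1):
        # Each data row must have exactly two non-blank fields
        if len(row) != 2 or not row[0].strip() or not row[1].strip():
            raise ValueError(f"data row {count} is incomplete: {row}")
    if count > max_rows:
        raise ValueError(f"{count} data rows, expected at most {max_rows}")
    return count


if __name__ == "__main__":
    path = "disease_clean_symptoms_new.csv"  # output file of process.py
    if not os.path.exists(path):
        print(f"{path} not found; run process.py first")
    else:
        # newline="" is what the csv module expects for files it reads
        with open(path, newline="", encoding="utf-8") as f:
            print(f"{check_prompt_completion_csv(f)} data rows look OK")
```

Because csv.reader accepts any iterable of lines, the same function can be tried on a small list of strings before pointing it at the real file.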

4 CLI data preparation tool

OpenAI provides a tool that validates your data, suggests improvements, and reformats it:

openai tools fine_tunes.prepare_data -f disease_clean_symptoms_new.csv

The tool accepts several formats; the only requirement is that the data contains prompt and completion columns/keys. You can pass a CSV, TSV, XLSX, JSON, or JSONL file. After walking you through the suggested changes, it saves the result to a JSONL file ready for fine-tuning.

Answer Y to each question the tool asks while it runs. It then writes the JSONL file (here disease_clean_symptoms_new_prepared.jsonl), as shown in the figure below.
[Figure: the generated JSONL file]
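To make the output format concrete, here is a minimal standard-library sketch of the kind of rewriting the tool applies to prompt/completion pairs. It is not OpenAI's implementation: to_prepared_jsonl is a hypothetical helper, and it assumes the tool's typical suggestions for this kind of data, namely a fixed separator at the end of each prompt, a leading space at the start of each completion, and a fixed stop sequence at the end of each completion. The strings " ->" and "\n" are assumed values; use whatever the tool actually proposes when you run it.

```python
import json


def to_prepared_jsonl(pairs, separator=" ->", stop="\n"):
    """Convert (prompt, completion) pairs to JSONL text, one object per line.

    separator and stop are assumed values; use the ones prepare_data suggests.
    """
    out = []
    for prompt, completion in pairs:
        record = {
            # Fixed separator marks where the prompt ends
            "prompt": prompt.strip() + separator,
            # Leading space plus a fixed stop sequence on every completion
            "completion": " " + completion.strip() + stop,
        }
        out.append(json.dumps(record, ensure_ascii=False) + "\n")
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical placeholder row, not taken from the Kaggle dataset
    print(to_prepared_jsonl([("symptom list", "disease name")]), end="")
```

Each output line is a single JSON object with prompt and completion keys, which is the format fine_tunes.create reads. For real data, rely on the file the tool writes rather than this sketch.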

5 Create a fine-tuned model

openai api fine_tunes.create -t "disease_clean_symptoms_new_prepared.jsonl" --batch_size 64 --model ada

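If the command completes without errors, the fine-tuning job has been created successfully.
Note: this step must reach the OpenAI API. In regions where it is blocked (e.g. mainland China), a proxy or VPN is required.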
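If you would rather launch step 5 from Python, for example to try several batch sizes in a loop, the following minimal sketch builds the same command line and runs it with subprocess. build_create_command is a hypothetical helper; it uses exactly the flags of the fine_tunes.create command above and assumes nothing else about the CLI. By default the script only prints the command (a dry run); pass --run to execute it.

```python
import shutil
import subprocess
import sys


def build_create_command(train_file, model="ada", batch_size=64):
    """Return the argument list for the fine_tunes.create command from step 5."""
    return ["openai", "api", "fine_tunes.create",
            "-t", train_file,
            "--batch_size", str(batch_size),
            "--model", model]


if __name__ == "__main__":
    cmd = build_create_command("disease_clean_symptoms_new_prepared.jsonl")
    if "--run" not in sys.argv[1:]:
        # Dry run (the default): only show what would be executed
        print(" ".join(cmd))
    else:
        exe = shutil.which("openai")  # locate the openai executable on PATH
        if exe is None:
            print("openai CLI not found on PATH; activate the GPT environment first")
        else:
            # check=True raises CalledProcessError if the CLI reports a failure
            subprocess.run([exe] + cmd[1:], check=True)
```

Running the script with --run inside the activated GPT environment is equivalent to typing the command from step 5 yourself; the dry-run default makes it safe to check the arguments first.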

Reference: https://www.bilibili.com/video/BV1DU4y1c77Y/?spm_id_from=333.1007.top_right_bar_window_history.content.click&vd_source=0f8024a4585deeca68e0b223bb06f4c6


Original post: blog.csdn.net/weixin_44606139/article/details/130605686