1 Environment setup
1.1 Install Anaconda3
1.2 Create a virtual environment
conda create -n GPT python=3.7 -y
conda activate GPT
2 Installation
Install the openai package, which provides the OpenAI command-line interface (CLI).
pandas is also installed here, because a later step uses it to convert the data format.
pip3 install openai -i https://pypi.doubanio.com/simple/
pip install pandas
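Before moving on, it can help to confirm that both packages are visible from the activated environment. The following is a minimal sketch using only the standard library; the helper name missing_packages is our own and is not part of openai or pandas.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    # find_spec only locates the package; it does not import it
    return [name for name in names if importlib.util.find_spec(name) is None]


# The two packages installed in this section; an empty list means both were found
print(missing_packages(["openai", "pandas"]))
```

If the printed list is not empty, check that the GPT environment is active and rerun the pip commands above.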
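Set the OPENAI_API_KEY environment variable by adding the following line to your shell initialization script (e.g. .bashrc, .zshrc) or by running it on the command line before the fine-tuning command:
export OPENAI_API_KEY="<OPENAI_API_KEY>"
On the Windows command prompt, use set instead of export and leave out the quotes, since cmd keeps them as part of the value:
set OPENAI_API_KEY=<OPENAI_API_KEY>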
Note: How to obtain OPENAI_API_KEY
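To check that the variable is actually visible to programs started from the same shell, a small Python check can be used. This is only a sketch: the function name api_key_is_set is hypothetical, and treating a value that still starts with "<" as an unreplaced placeholder is our own assumption.

```python
import os


def api_key_is_set(env=None):
    """Return True if OPENAI_API_KEY holds a non-empty, non-placeholder value."""
    env = os.environ if env is None else env
    # Strip whitespace and stray quotes (cmd keeps quotes as part of the value)
    key = env.get("OPENAI_API_KEY", "").strip().strip('"')
    # Assumption: a value starting with "<" is the unreplaced placeholder
    return bool(key) and not key.startswith("<")


print(api_key_is_set())
```

If this prints False, open a new terminal (so the initialization script is reloaded) or set the variable again before running the fine-tuning command.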
3 Prepare training data
Here we use a dataset from Kaggle as an example. Download link: https://www.kaggle.com/datasets/egorovm/patient-disease?resource=download
After downloading and unzipping it, we use disease_clean_symptoms.csv as the example file.
Opened in Excel, disease_clean_symptoms.csv looks as shown in the figure below.
Then we run process.py to process this data.
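import pandas as pd

# Read the first 500 rows, treating every line as data (header=None),
# and name the two columns prompt and completion
df = pd.read_csv('disease_clean_symptoms.csv', header=None, index_col=False, nrows=500, names=['prompt', 'completion'])
# Write the result to a new CSV without the pandas index column
df.to_csv("disease_clean_symptoms_new.csv", index=False)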
After running, open the generated file disease_clean_symptoms_new.csv, as shown in the figure below.
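If Excel is not at hand, the generated file can also be checked from Python. The sketch below uses the standard csv module; the function name check_prompt_completion_csv and the rule that a row counts as incomplete when either field is blank are our own assumptions, not something the OpenAI tool requires.

```python
import csv


def check_prompt_completion_csv(path):
    """Return (header, row_count, incomplete_lines) for a two-column CSV."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = list(reader)
    # File line numbers (header is line 1) of rows with a blank prompt or completion
    incomplete = [
        line_no
        for line_no, row in enumerate(rows, start=2)
        if len(row) < 2 or not row[0].strip() or not row[1].strip()
    ]
    return header, len(rows), incomplete
```

Calling check_prompt_completion_csv("disease_clean_symptoms_new.csv") should return a header of ['prompt', 'completion'], and any line numbers in the third value point at rows worth inspecting before the next step.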
4 CLI data preparation tools
OpenAI provides a tool that validates your data, suggests changes, and reformats it:
openai tools fine_tunes.prepare_data -f disease_clean_symptoms_new.csv
This tool accepts different formats; the only requirement is that they contain prompt and completion columns/keys. You can pass a CSV, TSV, XLSX, JSON, or JSONL file, and after guiding you through its suggested changes it saves the output as a JSONL file ready for fine-tuning.
Enter Y at each prompt while the tool runs; the result is a JSONL file, as shown in the figure below.
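A JSONL file holds one JSON object per line, and each object here should carry a prompt and a completion key. As a quick sanity check, the file can be loaded with the standard json module. This is a minimal sketch; load_jsonl is a hypothetical helper name, and the file name in the usage note assumes the tool wrote its output next to the input with a _prepared suffix.

```python
import json


def load_jsonl(path):
    """Load a JSONL file and verify every record has prompt and completion keys."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            missing = {"prompt", "completion"} - set(record)
            if missing:
                raise ValueError(f"line {line_number} is missing {sorted(missing)}")
            records.append(record)
    return records
```

For example, len(load_jsonl("disease_clean_symptoms_new_prepared.jsonl")) gives the number of training examples that survived the preparation step.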
5 Create a fine-tuned model
openai api fine_tunes.create -t "disease_clean_symptoms_new_prepared.jsonl" --batch_size 64 --model ada
success!
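To get a feel for the --batch_size value, we can estimate how many batches one pass over the data takes. This is a rough sketch with two assumptions: that all 500 rows read in step 3 are still present after data preparation, and that a final partial batch is counted as a full step. The function name steps_per_epoch is our own.

```python
import math


def steps_per_epoch(n_examples, batch_size):
    """Number of batches needed to see every example once (last batch may be partial)."""
    return math.ceil(n_examples / batch_size)


# 500 rows from process.py, batch size 64 from the command above
print(steps_per_epoch(500, 64))
```

With these numbers each epoch takes 8 batches, so a smaller batch size would mean more, smaller update steps over the same data.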
Note: In regions where the OpenAI API is not directly reachable, a proxy or VPN is required for this step.
Reference: https://www.bilibili.com/video/BV1DU4y1c77Y/?spm_id_from=333.1007.top_right_bar_window_history.content.click&vd_source=0f8024a4585deeca68e0b223bb06f4c6