[AI医学] 领域指令微调样本生成框架OpenGPT与医疗健康大模型NHS-LLM

关键词:AI医疗,医学大模型,样本数据生成,OpenGPT, chatGPT, NHS, NHS-LLM

OpenGPT:基于领域知识生成指令微调样本框架

用于创建基于领域数据的指令微调样本,并在底座大模型上进行指令微调的一个开源框架。使用该框架,作者对医疗健康数据NHS生成指令微调样本,并在LLaMA的基础上进行微调训练,得到医疗健康大模型NHS-LLM。

指令微调样本的生成思路有别于standford-alpaca,相比于直接使用chatGPT的生成结果,引入领域数据作为chatGPT生成数据的参考,在一定程度能够约束生成结果的有效性,避免有偏或不恰当的生成结果。使用该框架还可用于在其他领域的非结构化数据上生成指令样本,这种样本数据的生成方式值得借鉴。

项目地址https://github.com/CogStack/opengpt

1. 样本数据描述及格式

  • NHS UK Q/A:24665对,基于NHS网站数据在OpenGPT框架下使用ChatGPT生成
text,raw_data_id
"<|user|> What is high blood pressure? <|eos|> <|ai|> High blood pressure is a condition where the force at which your heart pumps blood around your body is high. It is recorded with 2 numbers, the systolic pressure and the diastolic pressure, both measured in millimetres of mercury (mmHg).
References:
- https://www.nhs.uk/conditions/Blood-pressure-(high)/Pages/Introduction.aspx <|eos|> <|eod|>",0
  • NHS UK对话:2354条会话,基于NHS网站数据在OpenGPT框架下使用ChatGPT生成
text,raw_data_id
"<|user|> Hi, I am a bit worried about my blood pressure. How can I tell if I have high blood pressure? <|eos|> <|ai|> The only way to determine if you have high blood pressure is to have your blood pressure checked. All adults over 40 are recommended to have their blood pressure checked at least every 5 years. You can get your blood pressure checked at your GP surgery, some pharmacies, as part of your NHS Health Check or in some workplaces. You can also check your blood pressure yourself with a home blood pressure monitor.
References:
- https://www.nhs.uk/conditions/Blood-pressure-(high)/Pages/Introduction.aspx <|eos|> <|user|> What is considered high blood pressure? <|eos|> <|ai|> Blood pressure is measured with two numbers, the systolic pressure (higher number) is the force at which your heart pumps blood around your body, and the diastolic pressure (lower number) is the resistance to the blood flow in the blood vessels. A reading of 140/90mmHg or higher is considered high blood pressure. However, it's essential to note that what's considered high for you may be normal for someone else. It is always best to speak with your doctor about your individual health needs.
References:
- https://www.nhs.uk/conditions/Blood-pressure-(high)/Pages/Introduction.aspx <|eos|> <|user|> What are the risks of high blood pressure? <|eos|> <|ai|> Persistent high blood pressure can increase your risk of developing serious and potentially life-threatening health conditions, such as heart disease, heart attacks, strokes, heart failure, peripheral arterial disease, aortic aneurysms, and kidney disease. If you have high blood pressure, reducing it, even a small amount, can help lower your risk of these health conditions.
References:
- https://www.nhs.uk/conditions/Blood-pressure-(high)/Pages/Introduction.aspx <|eos|> <|user|> What can increase my risk of high blood pressure? <|eos|> <|ai|> It's not always clear what causes high blood pressure, but there are things that can increase your risk of developing it. These include being overweight, eating too much salt and not enough fruit and vegetables, not exercising enough, drinking too much alcohol or coffee, smoking, not getting much sleep, being over 65, having a relative with high blood pressure, and being of black African or Caribbean descent or living in a deprived area. Making healthy lifestyle changes can sometimes help reduce your chances of getting high blood pressure and help lower it if it's already high.
References:
- https://www.nhs.uk/conditions/Blood-pressure-(high)/Pages/Introduction.aspx <|eos|> <|user|> Can medication be used to treat high blood pressure? <|eos|> <|ai|> Yes, medication can be used to treat high blood pressure. If you're diagnosed with high blood pressure, your doctor may recommend taking one or more medicines to keep it under control. The medicine recommended for you will depend on things like how high your blood pressure is, your age, and your ethnicity. Lifestyle changes such as losing weight, reducing salt intake, exercising regularly, and quitting smoking can also help lower high blood pressure. It is essential to speak with your healthcare provider to determine the best course of action for your individual health needs.
References:
- https://www.nhs.uk/conditions/Blood-pressure-(high)/Pages/Introduction.aspx <|eos|> <|eod|>",0
  • 医疗任务/解决方案:4688对,在OpenGPT框架下使用GPT-4模型生成
text,raw_data_id
"<|user|> Analyse the patient's symptoms and provide a brief possible diagnosis.
A 35-year-old male patient reports experiencing persistent fatigue, joint pain, and unexplained weight loss over the past few months. He also mentions having a low-grade fever and occasional night sweats. <|eos|> <|ai|> The patient's symptoms could indicate a number of possible conditions, such as an autoimmune disorder (e.g., rheumatoid arthritis), infection (e.g., tuberculosis, HIV), or malignancy (e.g., lymphoma). Further history, examination, and investigations are required to confirm the diagnosis. <|eos|> <|eod|>",0
"<|user|> List the steps to correctly administer an intramuscular injection. <|eos|> <|ai|> 1. Gather all necessary equipment: needle, syringe, medication, alcohol wipes, and a cotton swab or gauze pad.
2. Select the appropriate injection site based on patient age, muscle size, and level of development: common sites include the deltoid, vastus lateralis, and ventrogluteal muscles.
3. Prepare the patient: explain the procedure and obtain consent.
4. Wash your hands and put on gloves.
5. Prepare the medication: draw up the appropriate dose using aseptic technique.
6. Clean the injection site with an alcohol wipe, moving in a circular motion from the center outward.
7. Hold the syringe like a dart and insert the needle quickly at a 90-degree angle to the skin.
8. Aspirate by pulling back the plunger slightly to check for blood, which would indicate inadvertent needle placement in a blood vessel.
9. If no blood is present, inject the medication slowly and steadily.
10. Withdraw the needle swiftly and apply a cotton swab or gauze pad with slight pressure to the site.
11. Dispose of the needle and syringe properly using a sharps container.
12. Document the injection, including the date, time, medication, dose, and site. <|eos|> <|eod|>",0

样本生成方式

本项目的重点在于提出了一种基于领域数据的指令样本生成方式,以NHS数据为例,从NHS Conditions上爬取数据,每个词条内容做为一条领域知识,通过下面的prompt生成样本{context}表示词条内容:

{context} - <e.g all the text from the Acromegaly page>
----------------
You are asked to come up with a set of 5 diverse task instructions in the field of medicine and healthcare.
All the task instructions have to be based on the context provided above, and cannot require external knowledge or information.
These tasks will be used to quiz an AI Assistant on the knowledge of the topic above.

Here are the requirements:
1. Try not to repeat the verb for each instruction to maximize diversity.
2. The language used for the instruction also should be diverse. For example, you should combine questions with imperative instructions.
3. The type of instructions should be diverse. Create instructions that include diverse kinds of tasks like step-by-step reasoning, multiple-choice-questions, open-ended generation, classification, editing, complex medical questions, etc.
4. Every instruction has to be self-contained, all the information necessary to solve the task has to be part of the instruction. For example, the instructions should never say "From the provided context" or "given the context" or "based on the context" or "based on the information".

List of 5 task instruction (every task has the following fields: Task Number:, Instruction:, Solution:):

生成结果如下

Task Number: 1
Instruction: What is considered a high blood pressure reading? 
Solution: Blood pressure readings are recorded with 2 numbers. The systolic pressure (higher number) is considered high when it is above 140, while the diastolic pressure (lower number) is high when it is above 90. However, readings between 120/80mmHg and 140/90mmHg could mean you're at risk of developing high blood pressure if you do not take steps to keep your blood pressure under control. The ideal blood pressure is usually between 90/60mmHg and 120/80mmHg.
...

疑问点: context为词条知识内容,如果过长怎么办(比如大于chatGPT的窗口4096)?

2. 模型训练

使用OpenGPT框架,借助chatGPT能力,以NHS数据为领域知识,生成训练样本,基础模型使用LLaMA-13B,训练数据同时使用了NICE GuidelineMIMIC-IIIOpen Assistant,对模型进行有监督指令微调,最终得到医疗健康领域模型NHS-LLM。在一些case上对比chatGPT测试,NHS-LLM的结果更加准确。

通过OpenGPT框架生成领域样本数据,在LLaMA上进行微调训练得到NHS-LLM模型:
在这里插入图片描述

参考:
https://aiforhealthcare.substack.com/p/a-large-language-model-for-healthcare

同步更新到AI加油站

----------END----------

猜你喜欢

转载自blog.csdn.net/iling5/article/details/130638104