[LLM] Building an instruction fine-tuning dataset with Self-Instruct

1. The Self-Instruct process

  • Four steps: instruction generation, classification task identification, instance generation, and filtering with post-processing.
  • To evaluate SELF-INSTRUCT empirically, the framework was run on GPT3 (Brown et al., 2020). The iterative SELF-INSTRUCT process on this model produced about 52K instructions, paired with about 82K instance inputs and outputs. The resulting data covers a diverse range of creative tasks: more than 50% of them have less than 0.3 ROUGE-L overlap with the seed instructions (§4.2). The generated instruction data can then be used to fine-tune other large models.

2. The process in detail

1. Instruction generation

  • The task pool starts with 175 seed tasks (each with 1 instruction + 1 instance). At every step, 8 instructions are sampled at random from the pool as in-context examples: 6 from the 175 human-written tasks and 2 from instructions the model generated in previous steps.
  • The prompt template is as follows:
Come up with a series of tasks:
Task 1:  {instruction for existing task 1}
Task 2:  {instruction for existing task 2}
Task 3:  {instruction for existing task 3}
Task 4:  {instruction for existing task 4}
Task 5:  {instruction for existing task 5}
Task 6:  {instruction for existing task 6}
Task 7:  {instruction for existing task 7}
Task 8:  {instruction for existing task 8}
Task 9:
  • Alpaca also follows the Self-Instruct approach, using text-davinci-003 to generate the data for fine-tuning LLaMA. Its prompt is:
You are asked to come up with a set of 30 diverse task instructions. These task instructions will be given to a GPT model and we will evaluate the GPT model for completing the instructions.

Here are the requirements:
1. Try not to repeat the verb for each instruction to maximize diversity.
2. The language used for the instruction also should be diverse. For example, you should combine questions with imperative instrucitons.
3. The type of instructions should be diverse. The list should include diverse types of tasks like open-ended generation, classification, editing, etc.
4. A GPT language model should be able to complete the instruction. For example, do not ask the assistant to create any visual or audio output. For another example, do not ask the assistant to wake you up at 5pm or set a reminder because it cannot perform any action.
5. The instructions should be in English.
6. The instructions should be 1 to 2 sentences long. Either an imperative sentence or a question is permitted.
7. You should generate an appropriate input to the instruction. The input field should contain a specific example provided for the instruction. It should involve realistic data and should not contain simple placeholders. The input should provide substantial content to make the instruction challenging but should ideally not exceed 100 words.
8. Not all instructions require input. For example, when a instruction asks about some general information, "what is the highest peak in the world", it is not necssary to provide a specific context. In this case, we simply put "<noinput>" in the input field.
9. The output should be an appropriate response to the instruction and the input. Make sure the output is less than 100 words.
10. Make sure the output is gramatically correct with punctuation if needed.
List of 30 tasks:
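
The sampling rule and the first ("Come up with a series of tasks") template in this section can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `sample_in_context` and `build_instruction_prompt` are hypothetical, and falling back to seed tasks when fewer than 2 generated instructions exist is an assumption.

```python
import random


def sample_in_context(seed_instructions, generated_instructions,
                      n_total=8, n_generated=2, rng=random):
    """Pick 8 in-context instructions: 2 model-generated, the rest human-written.

    Assumption: if fewer than 2 generated instructions exist yet
    (e.g. in the first round), the gap is filled with seed instructions.
    """
    n_gen = min(n_generated, len(generated_instructions))
    picked = rng.sample(generated_instructions, n_gen)
    picked += rng.sample(seed_instructions, n_total - n_gen)
    rng.shuffle(picked)
    return picked


def build_instruction_prompt(in_context):
    """Fill the template and leave the next task slot open for the model."""
    lines = ["Come up with a series of tasks:"]
    for i, instruction in enumerate(in_context, start=1):
        lines.append(f"Task {i}:  {instruction}")
    lines.append(f"Task {len(in_context) + 1}:")
    return "\n".join(lines)
```

The returned string ends with an open `Task 9:` slot, so the model's completion is read as a new instruction.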

2. Classification task identification

  • Decide whether a generated instruction describes a classification task.
  • The model makes this decision in a few-shot setting. Template:
Can the following task be regarded as a classification task with finite output labels?

Task: Given my personality and the job, tell me if I would be suitable.
Is it classification? Yes

Task: Give me an example of a time when you had to use your sense of humor.
Is it classification? No

Task: Replace the placeholders in the given text with appropriate named entities.
Is it classification? No

Task: Fact checking - tell me if the statement is true, false, or unknown, based on your knowledge and common sense.
Is it classification? Yes

Task: Return the SSN number for the person.
Is it classification? No

Task: Detect if the Reddit thread contains hate speech.
Is it classification? Yes

Task: Analyze the sentences below to identify biases.
Is it classification? No

Task: Select the longest sentence in terms of the number of words in the paragraph, output the sentence index.
Is it classification? Yes

Task: Find out the toxic word or phrase in the sentence.
Is it classification? No

Task: Rank these countries by their population.
Is it classification? No

Task: You are provided with a news article, and you need to identify all the categories that this article belongs to. Possible categories include: Music, Sports, Politics, Tech, Finance, Basketball, Soccer, Tennis, Entertainment, Digital Game, World News. Output its categories one by one, seperated by comma.
Is it classification? Yes

Task: Given the name of an exercise, explain how to do it.
Is it classification? No

Task: Select the oldest person from the list.
Is it classification? Yes

Task: Find the four smallest perfect numbers.
Is it classification? No

Task: Does the information in the document supports the claim? You can answer "Support" or "Unsupport".
Is it classification? Yes

Task: Create a detailed budget for the given hypothetical trip.
Is it classification? No

Task: Given a sentence, detect if there is any potential stereotype in it. If so, you should explain the stereotype. Else, output no.
Is it classification? No

Task: Explain the following idiom to me, and try to give me some examples.
Is it classification? No

Task: Is there anything I can eat for a breakfast that doesn't include eggs, yet includes protein, and has roughly 700-1000 calories?
Is it classification? No

Task: Answer the following multiple choice question. Select A, B, C, or D for the final answer.
Is it classification? Yes

Task: Decide whether the syllogism is logically sound.
Is it classification? Yes

Task: How can individuals and organizations reduce unconscious bias?
Is it classification? No

Task: What are some things you can do to de-stress?
Is it classification? No

Task: Find out the largest one from a set of numbers. Output the number directly.
Is it classification? Yes

Task: Replace the <mask> token in the text with proper words that are consistent with the context. You can use multiple words for each <mask> token.
Is it classification? No

Task: Write a cover letter based on the given facts.
Is it classification? No

Task: Identify the pos tag of the word in the given sentence.
Is it classification? Yes

Task: Write a program to compute the sum of integers from k to n.
Is it classification? No

Task: In this task, you need to compare the meaning of the two sentences and tell if they are the same. Output yes or no.
Is it classification? Yes

Task: To make the pairs have the same analogy, write the fourth word.
Is it classification? No

Task: Given a set of numbers, find all possible subsets that sum to a given number.
Is it classification? No

Task: {instruction for the target task}
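
A minimal sketch of how this few-shot template might be assembled and how the model's answer might be read. The helper names are hypothetical, and the trailing `Is it classification?` cue after the target task is an assumption (the template above stops at the `Task:` line).

```python
HEADER = ("Can the following task be regarded as a classification task "
          "with finite output labels?")


def build_classification_prompt(few_shot, instruction):
    """few_shot is a list of (task_text, is_classification) pairs."""
    parts = [HEADER, ""]
    for task, is_clf in few_shot:
        parts.append(f"Task: {task}")
        parts.append(f"Is it classification? {'Yes' if is_clf else 'No'}")
        parts.append("")
    parts.append(f"Task: {instruction}")
    # Assumption: end with the question so the model answers Yes or No.
    parts.append("Is it classification?")
    return "\n".join(parts)


def parse_is_classification(completion):
    """Treat any completion that starts with 'yes' as a classification task."""
    return completion.strip().lower().startswith("yes")
```

The boolean result decides which instance-generation template is used in the next step.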

3. Instance generation

  • Given the instructions from step 1 and the task types from step 2, generate instances for each instruction.
  • Non-classification tasks use the input-first approach: the model generates the input first and then the matching output. Template:
Come up with examples for the following tasks. Try to generate multiple examples when possible. If the task doesn't require additional input, you can generate the output directly.

Task: Which exercises are best for reducing belly fat at home?
Output:
- Lying Leg Raises
- Leg In And Out
- Plank
- Side Plank
- Sit-ups

Task: Extract all the country names in the paragraph, list them separated by commas.
Example 1
Paragraph: Dr. No is the sixth novel by the English author Ian Fleming to feature his British Secret Service agent James Bond. Written at Fleming's Goldeneye estate in Jamaica, it was first published in the United Kingdom by Jonathan Cape in 1958. In the novel Bond looks into the disappearance in Jamaica of two fellow MI6 operatives who had been investigating Doctor No. Bond travels to No's Caribbean island and meets Honeychile Rider, who is there to collect shells. They are captured and taken to a luxurious facility carved into a mountain. The character of Doctor No, the son of a German missionary and a Chinese woman, was influenced by Sax Rohmer's Fu Manchu stories. Dr. No was the first of Fleming's novels to face widespread negative reviews in Britain, but it was received more favourably in the United States.
Output: English, British, Jamaica, the United Kingdom, German, Chinese, Britain, the United States.

Task: Converting 85 F to Celsius.
Output: 85°F = 29.44°C

Task: Sort the given list ascendingly. 
Example 1
List: [10, 92, 2, 5, -4, 92, 5, 101]
Output: [-4, 2, 5, 5, 10, 92, 92, 101]
Example 2
Input 2 - List: [9.99, 10, -5, -1000, 5e6, 999]
Output: [-1000, -5, 9.99, 10, 999, 5e6]

Task: Suggest a better and more professional rephrasing of the following sentence.
Example 1
Sentence: This house is surprisingly not constructed very well, and you probably need more money to fix it after you buy it. If you ask me, I would suggest you to consider other candidates.
Output: This house does not seem to be constructed well, so you may need to spend more money to fix it after you purchase it. I would suggest that you look at other properties.
Example 2
Sentence: Just so you know, we did an experiment last week and found really surprising results - language model can improve itself!
Output: Our experiments last week demonstrated surprising results, proving that the language model can improve itself.

Task: Read the following paragraph and answer a math question about the paragraph. You need to write out the calculation for getting the final answer.
Example 1
Paragraph: Gun violence in the United States results in tens of thousands of deaths and injuries annually, and was the leading cause of death for children 19 and younger in 2020. In 2018, the most recent year for which data are available as of 2021, the Centers for Disease Control and Prevention's (CDC) National Center for Health Statistics reports 38,390 deaths by firearm, of which 24,432 were by suicide. The rate of firearm deaths per 100,000 people rose from 10.3 per 100,000 in 1999 to 12 per 100,000 in 2017, with 109 people dying per day or about 14,542 homicides in total, being 11.9 per 100,000 in 2018. In 2010, there were 19,392 firearm-related suicides, and 11,078 firearm-related homicides in the U.S. In 2010, 358 murders were reported involving a rifle while 6,009 were reported involving a handgun; another 1,939 were reported with an unspecified type of firearm. In 2011, a total of 478,400 fatal and nonfatal violent crimes were committed with a firearm.
Question: How many more firearm-related deaths were there in 2018 compared to 2010?
Output:
38390 - (19392 + 11078) = 38390 - 30470 = 7920. 
So, in 2018, there were 7920 more deaths by firearm than in 2010.

Task: Write Python code to solve this leetcode problem.
Example 1
Problem: You are given two non-empty linked lists representing two non-negative integers. The digits are stored in reverse order, and each of their nodes contains a single digit. Add the two numbers and return the sum as a linked list. You may assume the two numbers do not contain any leading zero, except the number 0 itself.
Output:
class Solution(object):
    def addTwoNumbers(self, l1, l2):
        carry = 0
        root = n = ListNode(0)
        while l1 or l2 or carry:
            v1 = v2 = 0
            if l1:
                v1 = l1.val
                l1 = l1.next
            if l2:
                v2 = l2.val
                l2 = l2.next
            carry, val = divmod(v1+v2+carry, 10)
            n.next = ListNode(val)
            n = n.next
        return root.next

Task: Solve the equation and find the value of X. Show your steps.
Example 1
Equation: 10X + 5 = 10
Output: 10X = 5,  X = 0.5
Example 2
Equation: X + Y + 120 = 100
Output: X + Y = -20, X = -20 - Y

Task: Write a program to compute the sum of integers from k to n.
Output:
def sum(k, n):
    sum = 0
    for i in range(k, n+1):
        sum += i
    return sum

Task: Select the oldest person from the given list.
Example 1
List: George Washington, Confucius, Michael Jordan, Michelangelo
Output: Confucious
Example 2
List: Alan Turing, Geoffrey Hinton, Yann LeCun, Yoshua Bengio
Output: Alan Turing

Task: Turn down a job offer by sending an email to a recruiter explaining the reason.
Output: Hi  [Recruiter],
Thank you so much for the generous offer to join your team. As we discussed, I’ve admired the company for a number of years, and am a proud endorser of its products. However, after further consideration of where I currently am in my career, I’ve decided to accept an offer at another company.
I would love to stay in touch with you and have already started following you on [Social Media Platform]. Again, thank you so much for your time and consideration.
Thanks again,
[Your Name]

Task: {Instruction for the target task}
  • Classification tasks use the output-first approach. The input-first approach does not suit classification tasks, because the generated inputs tend to be biased toward one particular label. Template:
Given the classification task definition and the class labels, generate an input that corresponds to each of the class labels. If the task doesn't require input, just generate possible class labels.

Task: Classify the sentiment of the sentence into positive, negative, or mixed.
Class label: mixed
Sentence: I enjoy the flavor of the restaurant but their service is too slow.
Class label: Positive
Sentence: I had a great day today. The weather was beautiful and I spent time with friends and family.
Class label: Negative
Sentence: I was really disappointed by the latest superhero movie. I would not recommend it to anyone.

Task: Given a dialogue, classify whether the user is satisfied with the service. You should respond with "Satisfied" or "Unsatisfied".
Class label: Satisfied
Dialogue:
- Agent: Thank you for your feedback. We will work to improve our service in the future.
- Customer: I am happy with the service you provided. Thank you for your help.
Class label: Unsatisfied
Dialogue:
- Agent: I am sorry we will cancel that order for you, and you will get a refund within 7 business days.
- Customer: oh that takes too long. I want you to take quicker action on this.

Task: Given some political opinions, classify whether the person belongs to Democrats or Republicans.
Class label: Democrats
Opinion: I believe that everyone should have access to quality healthcare regardless of their income level.
Class label: Republicans
Opinion: I believe that people should be able to keep more of their hard-earned money and should not be taxed at high rates.

Task: Tell me if the following email is a promotion email or not.
Class label: Promotion
Email: Check out our amazing new sale! We've got discounts on all of your favorite products.
Class label: Not Promotion
Email: We hope you are doing well. Let us know if you need any help.

Task: Detect if the Reddit thread contains hate speech.
Class label: Hate Speech
Thread: All people of color are stupid and should not be allowed to vote.
Class label: Not Hate Speech
Thread: The best way to cook a steak on the grill.

Task:  Does the information in the document supports the claim? You can answer "Support" or "Unsupport".
Class label: Unsupport
Document: After a record-breaking run that saw mortgage rates plunge to all-time lows and home prices soar to new highs, the U.S. housing market finally is slowing. While demand and price gains are cooling, any correction is likely to be a modest one, housing economists and analysts say. No one expects price drops on the scale of the declines experienced during the Great Recession.
Claim: The US housing market is going to crash soon.
Class label: Support
Document: The U.S. housing market is showing signs of strain, with home sales and prices slowing in many areas. Mortgage rates have risen sharply in recent months, and the number of homes for sale is increasing. This could be the beginning of a larger downturn, with some economists predicting a potential housing crash in the near future.
Claim: The US housing market is going to crash soon.

Task: Answer the following multiple-choice question. Select A, B, C, or D for the final answer.
Class label: C
Question: What is the capital of Germany?
A. London
B. Paris
C. Berlin
D. Rome
Class label: D
Question: What is the largest planet in our solar system?
A) Earth
B) Saturn
C) Mars
D) Jupiter
Class label: A
Question: What is the process by which plants make their own food through photosynthesis?
A) Respiration
B) Fermentation
C) Digestion
D) Metabolism
Class label: B
Question: Who wrote the novel "The Great Gatsby"?
A) Ernest Hemingway
B) F. Scott Fitzgerald
C) J.D. Salinger
D) Mark Twain

Task: You need to read a code and detect if there is a syntax error or not. Output true if there is an error, output false if there is not.
Class label: true
Code:
def quick_sort(arr):
    if len(arr) < 2
        return arr
Class label: False
Code:
def calculate_average(numbers):
    total = 0
    for number in numbers:
        total += number
    return total / len(numbers)

Task: You are provided with a news article, and you need to identify all the categories that this article belongs to. Possible categories include Sports and Politics. Output its categories one by one, separated by a comma.
Class label: Sports
Article: The Golden State Warriors have won the NBA championship for the second year in a row.
Class label: Politics
Article: The United States has withdrawn from the Paris Climate Agreement.
Class label: Politics, Sports
Article: The government has proposed cutting funding for youth sports programs.

Task: Given a credit card statement, the cardholder's spending habits, and the account balance, classify whether the cardholder is at risk of defaulting on their payments or not.
Class label: At risk
Credit card statement: Purchases at high-end clothing stores and luxury hotels.
Cardholder's spending habits: Frequent purchases at luxury brands and high-end establishments.
Account balance: Over the credit limit and multiple missed payments.
Class label: Not at risk
Credit card statement: Purchases at grocery stores and gas stations.
Cardholder's spending habits: Regular purchases for necessary expenses and occasional dining out.
Account balance: Slightly below the credit limit and no missed payments.

Task: Given a social media post, the hashtags used, and a topic. classify whether the post is relevant to the topic or not.
Class label: Relevant
Post: I can't believe the government is still not taking action on climate change. It's time for us to take matters into our own hands.
Hashtags: #climatechange #actnow
Topic: Climate change
Class label: Not relevant 
Post: I just bought the new iPhone and it is amazing!
Hashtags: #apple #technology
Topic: Travel

Task: The answer will be 'yes' if the provided sentence contains an explicit mention that answers the given question. Otherwise, answer 'no'. 
Class label: Yes
Sentence: Jack played basketball for an hour after school.
Question: How long did Jack play basketball?
Class label: No
Sentence: The leaders of the Department of Homeland Security now appear before 88 committees and subcommittees of Congress.
Question: How often are they required to appear?

Task: Tell me what's the second largest city by population in Canada.
Class label: Montreal

Task: Classifying different types of mathematical equations, such as linear, and quadratic equations, based on the coefficients and terms in the equation.
Class label: Linear equation
Equation: y = 2x + 5
Class label: Quadratic equation
Equation: y = x^2 - 4x + 3

Task: Tell me the first number of the given list.
Class label: 1
List: 1, 2, 3
Class label: 2
List: 2, 9, 10

Task: Which of the following is not an input type? (a) number (b) date (c) phone number (d) email address (e) all of these are valid inputs.
Class label: (e)

Task: {Instruction for the target task}
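
A completion produced with the output-first template can be split back into (label, input) pairs. The sketch below assumes every instance starts with a `Class label:` line, as in the examples above; the function name `parse_output_first` is hypothetical.

```python
LABEL_PREFIX = "Class label:"


def parse_output_first(completion):
    """Split an output-first completion into (label, input_text) pairs.

    Lines after a 'Class label:' line, up to the next such line,
    are kept together as the input for that label.
    """
    instances = []
    label = None
    input_lines = []
    for line in completion.splitlines():
        if line.startswith(LABEL_PREFIX):
            if label is not None:
                instances.append((label, "\n".join(input_lines).strip()))
            label = line[len(LABEL_PREFIX):].strip()
            input_lines = []
        elif label is not None:
            input_lines.append(line)
    if label is not None:
        instances.append((label, "\n".join(input_lines).strip()))
    return instances
```

A task that needs no input, such as the "second largest city" example, yields a pair with an empty input string.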

4. Filtering and post-processing

  • A new instruction is added to the task pool only if its ROUGE-L against every instruction already in the pool is below 0.7, which means it is "new" enough.
  • Deduplication: remove, for example, instances with identical input and output, or instances whose output merely repeats the input.
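
The two filters above can be sketched in plain Python. This is an illustration under stated assumptions: ROUGE-L is taken as the F-measure with β = 1 over lowercased whitespace tokens, and the function names are hypothetical. A real pipeline would more likely use an existing ROUGE implementation.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L F-measure (beta = 1, an assumption) on whitespace tokens."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall, precision = lcs / len(ref), lcs / len(cand)
    return 2 * recall * precision / (recall + precision)


def is_novel(candidate, pool, threshold=0.7):
    """Keep an instruction only if it is below the threshold against the whole pool."""
    return all(rouge_l(existing, candidate) < threshold for existing in pool)


def dedupe_instances(instances):
    """Drop repeated (input, output) pairs and outputs that just copy the input."""
    seen, kept = set(), []
    for inp, out in instances:
        if (inp, out) in seen:
            continue
        if inp.strip() and inp.strip() == out.strip():
            continue
        seen.add((inp, out))
        kept.append((inp, out))
    return kept
```

Comparing against the whole pool is quadratic in the pool size, which is acceptable for tens of thousands of short instructions but worth noting for larger runs.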

3. Other parts

1. Data quality check

  • 200 instructions are randomly sampled from the generated data, one instance is picked for each, and expert annotators check and label whether each instance is reasonable.
  • The review shows that most instructions and inputs are valid and correct, while the rest still needs improvement.

[Figure: data quality review results; image not available]

2. Part-of-speech analysis of the data generated by GPT3 + SELF-INSTRUCT

  • The 20 most common verbs in the generated instructions are shown in the inner circle of the chart, and the nouns in the outer circle are also well diversified.
    [Figure: verb-noun distribution of the generated instructions; image not available]

3. The ROUGE-L metric

The L stands for LCS (longest common subsequence), because ROUGE-L is based on the longest common subsequence. ROUGE-L is computed as follows:
$$R_{lcs} = \frac{\mathrm{LCS}(X, Y)}{m}$$
$$P_{lcs} = \frac{\mathrm{LCS}(X, Y)}{n}$$
$$F_{lcs} = \frac{\left(1 + \beta^2\right) R_{lcs} P_{lcs}}{R_{lcs} + \beta^2 P_{lcs}}$$
Here $\mathrm{LCS}(X, Y)$ is the length of the longest common subsequence of X and Y, with word order taken into account. $m$ and $n$ are the lengths (usually the number of words) of the reference summary (written by a human) and the automatic summary (generated by a machine), respectively.
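
As a quick check of the three formulas, here is a small hand-computed example. The two sentences are made up for illustration, tokens are whitespace-separated words, and β = 1 is an assumption, since the text does not fix a value for β.

```latex
% Reference X = "the cat sat on the mat"  (m = 6 words)
% Candidate Y = "the cat sat"             (n = 3 words)
% Longest common subsequence: "the cat sat", so LCS(X, Y) = 3
\begin{aligned}
R_{lcs} &= \frac{\mathrm{LCS}(X, Y)}{m} = \frac{3}{6} = 0.5 \\
P_{lcs} &= \frac{\mathrm{LCS}(X, Y)}{n} = \frac{3}{3} = 1 \\
F_{lcs} &= \frac{(1 + \beta^2) R_{lcs} P_{lcs}}{R_{lcs} + \beta^2 P_{lcs}}
         = \frac{2 \cdot 0.5 \cdot 1}{0.5 + 1} \approx 0.667 \quad (\beta = 1)
\end{aligned}
```

The candidate is fully contained in the reference, so precision is 1, while recall is only 0.5 because half of the reference is missing.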

4. Instance generation in Alpaca

  • Alpaca simplifies the process: it does not distinguish between classification and non-classification tasks.
  • A Chinese example prompt is as follows:
你被要求提供10个多样化的任务指令。这些任务指令将被提供给GPT模型,我们将评估GPT模型完成指令的能力。
以下是你提供指令需要满足的要求:
1.尽量不要在每个指令中重复动词,要最大化指令的多样性。
2.使用指令的语气也应该多样化。例如,将问题与祈使句结合起来。
3.指令类型应该是多样化的,包括各种类型的任务,类别种类例如:brainstorming,open QA,closed QA,rewrite,extract,generation,classification,chat,summarization。
4.GPT语言模型应该能够完成这些指令。例如,不要要求助手创建任何视觉或音频输出。例如,不要要求助手在下午5点叫醒你或设置提醒,因为它无法执行任何操作。例如,指令不应该和音频、视频、图片、链接相关,因为GPT模型无法执行这个操作。
5.指令用中文书写,指令应该是1到2个句子,允许使用祈使句或问句。
6.你应该给指令生成适当的输入,输入字段应包含为指令提供的具体示例,它应该涉及现实数据,不应包含简单的占位符。输入应提供充实的内容,使指令具有挑战性。
7.并非所有指令都需要输入。例如,当指令询问一些常识信息,比如“世界上最高的山峰是什么”,不需要提供具体的上下文。在这种情况下,我们只需在输入字段中放置“<无输入>”。当输入需要提供一些文本素材(例如文章,文章链接)时,就在输入部分直接提供一些样例。当输入需要提供音频、图片、视频或者链接时,则不是满足要求的指令。
8.输出应该是针对指令和输入的恰当回答。 
下面是10个任务指令的列表:
###
1. 指令: 在面试中如何回答这个问题?
1. 输入:当你在车里独处时,你会想些什么?
1. 输出:如果是在晚上,我通常会考虑我今天所取得的进步,如果是在早上,我会思考如何做到最好。我也会尝试练习感恩和活在当下的状态,以避免分心驾驶。
###
2. 指令: 按人口对这些国家进行排名。
2. 输入:巴西,中国,美国,日本,加拿大,澳大利亚
2. 输出:中国,美国,巴西,日本,加拿大,澳大利亚

For example, a response generated by ChatGPT (GPT-3.5):

3. 指令: 给出一个关于全球变暖的简短概述。
3. 输入: <无输入>
3. 输出: 全球变暖是指地球上气候系统的长期变化,包括气温、气压、降水和风等因素。虽然自然因素对气候的影响很大,但人类活动也在加剧全球变暖,导致气候的不稳定和极端天气事件的增多。

4. 指令: 给出一个有关健康的小贴士。
4. 输入: 我最近感到很疲倦,有什么建议可以提高我的能量?
4. 输出: 确保充足的睡眠是提高能量的重要因素。此外,保持健康的饮食和适当的锻炼也可以帮助提高能量水平。要尽可能避免糖和咖啡因,因为它们可能会导致疲劳。

5. 指令: 给出一个有关历史的事件描述。
5. 输入:请描述马丁·路德·金在美国民权运动中的作用。
5. 输出:马丁·路德·金是美国非裔民权运动的领袖之一,他通过非暴力抗议、演讲和写作等手段,推动了民权运动的发展,并取得了历史性的胜利,如《民权法案》和《选举权法案》的通过。

6. 指令: 给出一个有关文学的问题。
6. 输入:请问《麦田里的守望者》是谁写的?
6. 输出:《麦田里的守望者》是美国作家J·D·塞林格所写的小说,其讲述了一个年轻人的成长历程,被誉为20世纪最重要的小说之一。

7. 指令: 给出一个关于科技的趋势。
7. 输入:技术未来发展的趋势是什么?
7. 输出:未来,人工智能、物联网、区块链和量子计算等技术将继续发展,这些技术将改变我们的生活方式、工作方式和商业模式。此外,随着数据的不断增长,数据分析和数据科学也将成为越来越重要的技能。

8. 指令: 给出一个关于金融的问题。
8. 输入:请解释一下什么是债券。
8. 输出:债券是一种固定收益证券,它代表了借款人对债权人的债务。债券通常由政府、公司或其他实体发行,债券持有人可以获得固定的利息收益,并在债券到期时获得本金回收。

9. 指令: 给出一个地理问题。
9. 输入:请问世界上最高的山峰是什么?
9. 输出:珠穆朗玛峰是地球上海拔最高的山峰,位于中国和尼泊尔之间的喜马拉雅山脉中,海拔约为8,848米。

10. 指令: 给出一个关于医学的问题。
10. 输入:请解释一下心血管疾病是什么。
10. 输出:心血管疾病是一类疾病,包括冠心病、高血压、心脏衰竭、心律失常和动脉硬化等。这些疾病影响心血管系统的功能,可能导致心脏病发作、中风和其他严重的健康问题。预防心血管疾病的关键在于采取健康的生活方式,如均衡的饮食、适当的运动和戒烟等。
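
A response in this numbered 指令/输入/输出 format can be turned back into records with a small parser. This is a sketch: the function name `parse_tasks` and the output keys are hypothetical, it accepts both the ASCII and the full-width colon, and it assumes each field fits on one line (continuation lines are ignored).

```python
import re

# One field per line, e.g. "3. 指令: ..." or "3. 输入:..."
FIELD_RE = re.compile(r"^\s*\d+\.\s*(指令|输入|输出)\s*[::]\s*(.*)$")
KEYS = {"指令": "instruction", "输入": "input", "输出": "output"}


def parse_tasks(text):
    """Return a list of {'instruction', 'input', 'output'} dicts."""
    tasks, current = [], {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue  # separators such as "###" and blank lines
        key = KEYS[match.group(1)]
        value = match.group(2).strip()
        if key == "instruction" and current:
            tasks.append(current)
            current = {}
        # The prompt asks for "<无输入>" when no input is needed.
        current[key] = "" if value == "<无输入>" else value
    if current:
        tasks.append(current)
    # Keep only complete records.
    return [t for t in tasks if {"instruction", "input", "output"} <= t.keys()]
```

Records with an empty `input` correspond to tasks that need no extra context.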

References

[1] Self-Instruct: Aligning Language Models with Self-Generated Instructions
[2] Automatic instruction generation for fine-tuning large models: an introduction to the SELF-INSTRUCT framework
[3] Self-Instruct: Aligning Language Model with Self Generated Instructions
[4] Evaluation methods for automatic summarization: Rouge-L, Rouge-N
[5] https://github.com/yizhongw/self-instruct

Source: blog.csdn.net/qq_35812205/article/details/131746477