Popular science and a preliminary understanding of large models

Table of contents

1. Simple understanding of large models

(1) Official definition

(2) Focus on large language models

(3) Application examples of large models

2. How to get a large model

(1) Overall general steps

Train your own model

Use pretrained models

Choose the appropriate model

Use cloud computing platform

Understand model licensing and usage restrictions

Master the use and fine-tuning of models

(2) Pre-training analysis

Supplement: What can the base model do?

(3) Alignment concept analysis

1. Basic definitions and explanations

2. Instruction fine-tuning (SFT, Supervised Fine-Tuning)

3. How to complete multiple rounds of dialogue tasks?

Multiple rounds of dialogue turned into continuation tasks

Conversation history convention format

Historical dialogue input

Model continuation dialogue

User interacts with model

Extension: Example use case description

4. Examples of common alignment methods

3. How to control large models

(1) Prompt engineering

(2) Secondary training of the model

1. Overall description

2. An example task: sentiment analysis

3. Extension: Align the model with our usage expectations

(3) Rule-based pre-processing and post-processing

Rule-Based Preprocessing

Rule-Based Post-processing

(4) Boundaries and limitations of large models

4. Getting hands-on with a large model

Mac environment setup

conda environment

python environment

Model download

Model loading

Tokenizer loading

Overview of learning sites and tutorials


1. Simple understanding of large models

(1) Official definition

There is no official, unified definition of "large model" because it is a relative concept whose threshold shifts with time, technology, and field. Large models usually refer to deep-learning neural network models with very large numbers of parameters and heavy computational resource requirements, and the size threshold may differ from one context to another.

For example, in the field of natural language processing (NLP), large models may refer to models containing billions to hundreds of billions of parameters, such as GPT-3 and GPT-4. In computer vision, a large model may be a deep convolutional neural network with tens of millions of parameters, such as ResNet-152.

In the field of deep learning, with the advancement of technology, the scale of large models continues to expand to improve the performance of the model. Therefore, the official definition may be difficult to pin down, but whether a model is called a "large model" can usually be judged based on the model's number of parameters, computational resource requirements, and task performance.

Importantly, large models often require extensive computing resources and large-scale data to train, so the balance of resources and performance needs to be carefully considered when using them.

(2) Focus on large language models

A Large Language Model (LLM) is a neural network model with a huge number of parameters, used mainly for natural language processing tasks. Its core task is text continuation: given a span of input text, it generates a follow-on sequence of text that reads like a natural continuation. The output is produced token by token and continues until a specific termination symbol is generated. This termination symbol lets the model choose an appropriate place to end its output rather than producing the entire text at once.

  • Big in "big language model" refers to the large scale of the model, which usually needs to contain billions or even hundreds of billions or trillions of parameters . A model of this size requires a large amount of hard drive space for storage . For example, a model containing 7 billion parameters may require more than 13GB of hard drive space.
  • The large language model of multi-turn dialogue can not only be used for a single text continuation task, but also for multi-turn dialogue, that is, to generate continuous reply text in the conversation, making it look like a natural conversation flow. Such models can be used to build applications such as artificial intelligence assistants and chatbots.

Taken together, a large language model is a powerful natural language processing tool with an enormous number of parameters. It can generate natural language text, handle both single continuation tasks and multi-turn dialogue, and support a wide range of text generation and language understanding tasks.
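To make the token-by-token continuation concrete, here is a minimal sketch (an illustration, not production code) of how a causal language model appends one token at a time until it emits the termination symbol (EOS). It uses the small public GPT-2 checkpoint as a stand-in and greedy decoding only.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # small public model as a stand-in
model = AutoModelForCausalLM.from_pretrained("gpt2")

def continue_text(prompt, max_new_tokens=40):
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(max_new_tokens):
            logits = model(input_ids).logits                          # scores for the next token
            next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)   # greedy choice
            input_ids = torch.cat([input_ids, next_id], dim=-1)
            if next_id.item() == tokenizer.eos_token_id:              # stop at the termination symbol
                break
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)

print(continue_text("A large language model is"))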

(3) Application examples of large models

"Large models" usually refer to large neural network models in the field of deep learning, which have a large number of parameters and complex architectures and are used to solve various artificial intelligence tasks. These large-scale models have achieved remarkable results in fields such as natural language processing, computer vision, and speech recognition. Here are some common examples of large models:

  1. GPT-3 (Generative Pretrained Transformer 3): A natural language processing model developed by OpenAI with 175 billion parameters. It can generate high-quality text and perform a variety of text-related tasks.

  2. BERT (Bidirectional Encoder Representations from Transformers): A natural language processing model developed by Google with 110 million to 340 million parameters, used for understanding context and processing natural language text.

  3. ResNet (Residual Network): A deep convolutional neural network widely used in the field of computer vision. It contains tens of millions of parameters and is used for image classification and recognition.

  4. VGGNet (Visual Geometry Group Network): Another large convolutional neural network for image classification, with well over one hundred million parameters.

  5. Inception (GoogLeNet): Another large convolutional neural network for image classification and object recognition with a large number of parameters.

Large models are widely used because they perform well on complex tasks, but they also require significant computing resources to train and run. These models are typically pre-trained on large-scale datasets and then fine-tuned to fit specific tasks. Large-scale models have achieved impressive performance in fields such as natural language understanding, computer vision, and speech processing, and have broad potential across many applications.

2. How to get a large model

(1) Overall general steps

Obtaining large neural network models (such as large language models, large deep learning models, etc.) usually involves the following steps:

Train your own model

  • If you have enough computing resources and data, you can try training a large model yourself. This typically requires massive computing resources (such as GPUs or TPUs) and large-scale datasets. You need to choose an appropriate deep learning framework (such as TensorFlow or PyTorch) and write the model training code (a minimal training-step sketch follows this list).
  • If you want to train a large language model, you also need to consider aspects such as text data preprocessing, tokenization, and tuning of the training process.
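As a rough illustration of what "writing model training code" means, here is a minimal sketch of a single training step for a toy-sized causal language model. The tiny configuration and the example sentence are placeholders; real pre-training iterates over a massive corpus for many steps on GPU/TPU clusters.

import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, GPT2Config, GPT2LMHeadModel

# Toy-sized configuration so the sketch can run on a laptop.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel(GPT2Config(n_layer=2, n_head=2, n_embd=64))
optimizer = AdamW(model.parameters(), lr=1e-4)

text = "The capital of China is Beijing."
inputs = tokenizer(text, return_tensors="pt")

model.train()
outputs = model(input_ids=inputs["input_ids"], labels=inputs["input_ids"])  # next-token prediction loss
outputs.loss.backward()   # backpropagation
optimizer.step()          # one parameter update
optimizer.zero_grad()
print("training loss:", float(outputs.loss))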

Use pretrained models

  • A more common approach is to use large models that have been pre-trained by large organizations (such as OpenAI, Google, Facebook). These models are pre-trained on large-scale datasets and can be used for a variety of natural language processing and computer vision tasks.
  • These pre-trained models are usually available through open-source or commercial channels. You can download or access them and fine-tune them to suit the specific tasks in your own projects (a minimal loading sketch follows this list).
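For instance, a pre-trained checkpoint hosted on the Hugging Face Hub can be loaded with a few lines. The small public "gpt2" checkpoint is used here only as an example; the same pattern applies to larger models if you have the hardware.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")         # any public checkpoint name works here
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Large language models are", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)  # built-in continuation
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))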

Choose the appropriate model

  • When selecting a large model, consider the complexity of the task, available computing resources, and the amount of data available. Smaller models may require fewer resources but may be limited in performance.
  • If you only need to perform a specific task, you can choose a model that has been fine-tuned for that task.

Use cloud computing platform

  • Large neural network models require large amounts of computing resources, including high-performance GPUs or TPUs. If these resources are not available, you can consider using cloud computing platforms such as AWS, Google Cloud, Azure, etc. These platforms provide powerful deep learning computing resources for rent.

Understand model licensing and usage restrictions

  • Before using a large model, make sure you understand its license and usage restrictions. Some large models may be subject to specific usage conditions, such as commercial use fees, etc.

Master the use and fine-tuning of models

  • Once you have a large model, you need to learn how to use it and how to fine-tune it for specific tasks. Most pretrained models have documentation and sample code to help get started.

In summary, obtaining large neural network models requires careful consideration of computing resources, data, and task requirements. If you don't have enough resources or expertise, consider using an already pre-trained model and fine-tuning it as needed to meet the requirements of your specific task.

(2) Pre-training analysis

"Pre-training" is an important concept in the fields of deep learning and natural language processing. It refers to the process of initial training of a model on large-scale data before the model is formally applied to a specific task. This process is usually divided into the following steps:

  1. Data collection : A large amount of text data is collected, which often contains text from the Internet, including articles, news, social media posts, etc.

  2. Preprocessing : The data undergoes preprocessing steps such as cleaning, word segmentation, and tokenization to facilitate model training.

  3. Model pre-training : Using this data, the model is initially pre-trained. At this stage, the model learns the structure, grammar, semantics, and other knowledge of the language, as well as the statistical characteristics of the text data. This pre-trained model is often called the "base model" or "pre-trained model"; for example, LLaMA, GPT-3, and GLM-130B are all base models.

  4. Fine-tuning : Once the base model is pre-trained, it can be fine-tuned on a specific task. This often involves using task-specific datasets, such as sentiment analysis or text generation, to further tune the parameters of the model so that it adapts to the specific task.

After completing pre-training, the base model has extensive language understanding and generation capabilities. It can perform a variety of text-related tasks, including continuation, translation, question answering, text classification, etc. The reason why pre-trained models are powerful is that they have learned the general knowledge and rules of language through large-scale data and can be used for the initialization of various natural language processing tasks.

Pre-trained models have achieved remarkable results in the field of natural language processing and are widely used in various applications, including smart assistants, automatic translation, smart search, etc. These models perform well on many tasks because they possess extensive text understanding and generation capabilities and can handle complex natural language data.

Supplement: What can the base model do?

Base models, such as the GPT series (e.g. GPT-3, GPT-4) and other large-scale language models, have broad language understanding and generation capabilities and are not limited to continuation tasks. Here is what a base model can do:

  1. Text generation and continuation : The base model can accept text fragments and generate coherent, sensible text. This is useful in tasks such as automatic text generation, automatic summarization, article creation, etc. (This requires the model to hold knowledge, e.g. "The capital of China is Beijing", and to be able to calculate, e.g. "111 + 222 = 333".)

  2. Natural language understanding : It can understand and interpret text content, including answers to questions, text classification, sentiment analysis, etc. This is very helpful for tasks such as question answering systems, sentiment analysis, text classification, etc.

  3. Translation : The base model can translate text from one language to another and therefore can be used for machine translation.

  4. Conversation generation : It can be used to generate conversations, so it can be used in chatbots, virtual assistants, intelligent customer service systems, etc.

  5. Information retrieval and question answering : Base models can be used for information retrieval together with search engines, and can also be used to answer specific questions.

  6. Knowledge Base Population : Base models can extract knowledge from large-scale text, which can be used to build a knowledge base or help answer specific questions.

  7. Intelligent recommendation : It can generate personalized recommendations based on the user's historical behavior and preferences, such as movies, music, news, etc.

  8. Automatic summary : It can automatically generate a summary of the text to help users quickly understand the core content of long texts.

  9. Text editing and repair : It can be used for text editing, helping to correct grammatical errors, provide suggestions, and more.

The reason why base models are so useful is that they have learned the general knowledge and rules of language through large-scale text data, and therefore can be used for the initialization of various text-related tasks. Additionally, by fine-tuning the base model, you can adapt it to a specific domain or task, further improving performance. Therefore, the base model has broad application prospects in the fields of natural language processing and artificial intelligence, and can be used to solve various complex text-related problems.

(3) Alignment concept analysis

1. Basic definitions and explanations

"Alignment" in this context refers to adjusting the output of a large language model so that it matches human expectations and specific needs. Alignment is done to make large models more practical and safe.

Below is a detailed explanation of both aspects:

More practical to use

  • Match user expectations : When users pose a question or task to a large language model, they often expect the model's answers or generated text to be relevant to the context of the question or task. The goal of alignment is to ensure that the model's output is consistent with the user's expectations. For example, when a user asks about the capital of China, the expected answer is "Beijing" rather than other irrelevant information (the model may output "What is the capital of the United States? What is the capital of Germany?...", or it may output "This is a question that everyone knows." From the perspective of continuation, the model's answers may be correct, but they are not in line with our expectations).

  • Context-sensitive : For some tasks, such as search engine queries or domain-specific information retrieval, users expect the model to generate results that are contextually relevant to the input. Alignment ensures that the model understands the context and generates an appropriate response, rather than simply performing a continuation task. (For example, when a user asks the assistant about "specialty home-style dishes of Baotou, Inner Mongolia", we hope the model outputs a call to a search engine rather than directly doing a continuation task.)

Safer

  • Avoid harmful content : Alignment can also be used to limit the model from generating content that may be harmful or inappropriate. For example, the model should be designed not to generate illegal or immoral content involving pornography, gambling, drugs, violence, terrorism, etc. One of the tasks of alignment is to ensure that models do not generate this type of content, thus improving the security of the platform.

  • Compliance and Ethics : Alignment is also important when it comes to compliance and ethics. The output of the model should comply with applicable laws and ethical norms, follow the privacy policy, and not cause harm to users or society.

In summary, alignment is a critical task that aims to ensure that the output of large language models meets user needs, expectations, and regulations while improving the utility and safety of the model. Through effective alignment methods, the generation behavior of the model can be better controlled and guided, making it more suitable for various application scenarios.

2. Instruction fine-tuning (SFT, Supervised Fine-Tuning)

"Supervised Fine-Tune, SFT" (Supervised Fine-Tune, SFT) is a method of fine-tuning deep learning models, often used to adjust large language models so that they can understand and follow specific instructions.

  1. Constructing special data : During the instruction fine-tuning phase, some special data samples need to be prepared, which contain instructions and artificially constructed standard answers or expected outputs. These instructions can be various forms of tasks or requirements, such as repeating reading, answering questions, generating text, etc.

  2. Use pre-trained models : During the fine-tuning process, large-scale base models that have been pre-trained, such as the GPT series models, are usually used as the initial model.

  3. Input instructions : Use the instructions ("Where is the capital of China?", "What is the address of Zhang Yanfeng's CSDN blog?") as the input of the model, and then observe the output. Initially, the output may not be as expected, because it is generated only from pre-trained knowledge. The input instruction serves as the preceding context, and the model's output is the continuation.

  4. Comparison and improvement : Compare the model's output with the expected standard answer and calculate the difference between them, usually measured using a loss function. Then, through the back propagation algorithm, the parameters of the model are adjusted to reduce the loss function, so that the output of the model gradually approaches the standard answer.

  5. Iterative fine-tuning : This process is repeated multiple times, each time using different instructions and standard answer samples, to adjust the parameters of the model so that it can better follow the instructions and generate the expected output.

Through instruction fine-tuning, the model gradually learns to understand various instructions and perform corresponding tasks. This approach helps to better align the model with users' expectations and needs, thereby increasing the model's usefulness for a specific task. Instruction fine-tuning also helps the model follow specific rules and constraints, thereby improving the safety and controllability of the model. Ultimately, this process results in a model that can follow instructions and get the job done, often used in applications such as single-turn conversations.
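A minimal sketch of one SFT training sample is shown below. The instruction/answer text is illustrative, and masking the instruction tokens with -100 so that the loss is computed only on the answer is one common convention rather than the only one; real projects batch many such samples and run many optimizer steps.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")           # small stand-in for a base model
model = AutoModelForCausalLM.from_pretrained("gpt2")

instruction = "Question: Where is the capital of China?\nAnswer: "
answer = "Beijing."

prompt_ids = tokenizer(instruction, return_tensors="pt").input_ids
answer_ids = tokenizer(answer, return_tensors="pt").input_ids
input_ids = torch.cat([prompt_ids, answer_ids], dim=-1)     # instruction followed by the standard answer

labels = input_ids.clone()
labels[:, : prompt_ids.shape[-1]] = -100                    # ignore the loss on the instruction tokens

loss = model(input_ids=input_ids, labels=labels).loss       # gap between model output and standard answer
loss.backward()                                             # backpropagation adjusts the parameters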

3. How to complete multiple rounds of dialogue tasks?

The model can only continue writing, but how can it complete multiple rounds of dialogue tasks? This is achieved by converting multiple rounds of dialogue into a continuation task, through which the model generates the illusion of dialogue.

Multiple rounds of dialogue turned into continuation tasks

The core idea of converting multiple rounds of dialogue into a continuation task is to present the entire conversation history to the model in a formatted manner, allowing the model to treat it as a single text fragment for continuation. This format typically includes the user's question, the model's response, and possibly dialogue markers to distinguish which text is user input and which text is the model's response.

Conversation history convention format

In the SFT (Supervised Fine-Tuning) stage, the model undergoes specific training and learns the organizational format and tags of multi-turn dialogue. It can understand the role and relationship of each sentence in this format, including which ones are user input and which ones are model answers. This training enables the model to correctly understand the context and flow of the conversation.

Historical dialogue input

In practical applications, historical conversations are input to the model in an agreed format. This historical conversation includes the user's questions and the model's responses, as well as possible conversation markers. After the model receives this formatted input, it treats it as a single text fragment.

Model continuation dialogue

After the model receives the formatted historical dialogue, it will generate the corresponding continuation output, that is, the answer or response to the next sentence. This output will be added to the historical dialogue to form a new historical dialogue, and then input to the model again. The model will continue to write and complete the process of multiple rounds of dialogue.

User interacts with model

In this way, the user and the model can have multiple rounds of dialogue, although the model itself does not remember the dialogue history. The model simply generates coherent responses based on the format and context of historical conversations, making the user feel as if they are communicating with an intelligent entity that can understand and respond to multiple rounds of conversations.

In summary, by formatting multi-turn conversations as continuation tasks, the model is able to understand and respond to the context of multi-turn conversations, thereby completing the multi-turn conversation task. This approach takes full advantage of the generation capabilities of large language models and provides users with an experience of interacting with the model. The model does not have real memory capabilities, but makes the dialogue process coherent and natural by understanding and continuing the formatted historical dialogue.

Extension: Example use case description

When converting multiple rounds of dialogue into a continuation task, the model can understand and generate continuous dialogue text. Here is an example:

Suppose there are the following multiple rounds of dialogue:

User: Hello, Zhang Yanfeng wrote a blog called "Popular Science Preliminary Understanding of Large Models". Can you provide me with the address?

Model: Hello, according to your requirements, the corresponding blog address is: https://blog.csdn.net/xiaofeng10330111/article/details/132718410.

User: OK, thanks!

Now, to turn this multi-turn conversation into a continuation task, you first need to format the conversation into a text fragment, usually using special markers or delimiters to represent each conversation turn. Formatted conversation text might look like this:

"[U1: Hello, Zhang Yanfeng wrote a blog called "Popular Science Preliminary Understanding of Large Models". Can you provide me with the address? M1: Hello, according to your request, the corresponding blog address is: https://blog .csdn.net/xiaofeng10330111/article/details/132718410. U2: OK, thank you!]"

In this formatted conversation text, U1 represents the user's first round of input, M1 represents the model's first round of answers, and U2 represents the user's second round of input. Now, the model can treat the entire conversation history as a text fragment and use the continuation task to generate the next round of answers.

For example, if you want to continue the conversation:

User: Please help me sort out the corresponding summary of ideas.

Then, add this user input to the formatted conversation text, and the model will continue to generate the following response:

"[U1: Hello, Zhang Yanfeng wrote a blog called "Popular Science Preliminary Understanding of Large Models". Can you provide me with the address? M1: Hello, according to your request, the corresponding blog address is: https://blog .csdn.net/xiaofeng10330111/article/details/132718410. U2: OK, thank you!. U3: Please help me sort out the corresponding summary of the ideas.]"

By continuously iterating this process, the model can complete multiple rounds of dialogue and generate coherent responses, making the user feel like they are communicating with an intelligent entity that can understand and respond to multiple rounds of dialogue. Although the model itself does not have real memory, it achieves the effect of multiple rounds of dialogue by understanding and continuing the dialogue history. This method can be applied to multi-turn conversation tasks such as chatbots, virtual assistants, and intelligent customer service.
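The bookkeeping described above can be captured in a small helper that rebuilds the continuation prompt each turn. The "[U1: ... M1: ...]" markers follow the example above and are purely illustrative; real chat models each define their own conversation template.

def build_prompt(history, new_user_message):
    # history: list of (user_text, model_text) pairs for completed turns
    parts = []
    for i, (user, model) in enumerate(history, start=1):
        parts.append(f"U{i}: {user} M{i}: {model}")
    next_turn = len(history) + 1
    parts.append(f"U{next_turn}: {new_user_message} M{next_turn}:")   # the model continues after this marker
    return "[" + " ".join(parts) + "]"

history = [("Hello, can you provide me with the blog address?",
            "Hello, the corresponding blog address is: https://blog.csdn.net/xiaofeng10330111/article/details/132718410.")]
print(build_prompt(history, "Please help me sort out the corresponding summary of the ideas."))
# The model's continuation after "M2:" is shown to the user as the next reply,
# then appended to `history` before the next turn.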

4. Examples of common alignment methods

There are many methods for aligning large language models that aim to ensure that the model's generated behavior conforms to specific expectations, requirements, and rules. Here are some common alignment methods:

  • Supervised Fine-Tune (SFT) : As mentioned earlier, this method enables the model to perform specific tasks or obey specific rules by training the model to understand and follow specific instructions.

  • Policy Network : This approach involves using an additional neural network to generate a policy that determines the behavior the model should adopt when generating text. This method can be used to guide the model to generate specific types of text, such as compliance text, emotional text, etc.

  • Template matching : Template matching is a method of matching a model's generated text to a predefined text template. These templates can contain specific structure and syntax to ensure that the generated text meets requirements.

  • Reinforcement Learning : In reinforcement learning methods, the model learns how to generate text by interacting with the environment. Through reward and punishment signals, the model can gradually adjust the generated behavior to make it more consistent with expectations.

  • Rule engine : The rule engine is a rule- and logic-based approach for controlling the generation behavior of a model. By defining a set of rules, the model can be directed to generate specific types of text.

  • Detection and filtering : This approach involves using natural language processing techniques and machine learning models to detect and filter non-compliant or harmful text. Once non-compliant text is detected, appropriate action can be taken, such as deletion, modification, or flagging.

  • Human review and editing : In some cases, alignment can be achieved through human review and editing. Human reviewers can review the text generated by the model and make necessary edits and fixes to ensure that the text meets requirements.

These alignment methods can be used individually or in combination, depending on the application scenario and requirements. Alignment is a critical step in ensuring that large language models behave appropriately, are practical, and are safe in a variety of applications. Different methods can be used for different tasks and domains to meet user needs and expectations.

3. How to control large models

In the past, we defined interfaces for a system and designed the capabilities the system could provide; in the future, the core system will have only one interface, natural language, and its capabilities are learned by the model itself. Our engineering systems must be built around this AI core, and engineers need to know how to communicate with the AI better than others (with the help of prompt engineering).

(1) Prompt engineering

Introduction:

  • The core goal of prompt engineering is to ensure that large language models generate text as users expect and follow specific rules and conventions to meet task requirements.

  • A prompt is the text input the user provides to the model, usually including a question, instruction, or context, which guides the model to generate a corresponding text response.

  • Prompt engineering has a wide range of applications, including generating articles, answering questions, translating text, creating stories, writing code, and more.

Instructions for use:

General steps of prompt engineering:

  • Identify the task or goal : First, identify the task or type of text you want the model to perform or generate. This can be a question, task description, or other requirement.

  • Design a clear prompt : Create a clear, unambiguous prompt to guide the model through the task. Prompts should include enough information for the model to understand the requirements of the task. Avoid ambiguous or vague prompts.

  • Consider the model's tendencies : Taking into account the model's possible tendencies and preferences, design prompts in a way that is easy for the model to understand and follow. If the model tends to generate long text, you can explicitly ask to generate short answers.

  • Test and tune prompts : Use designed prompts to make requests to the model and observe the resulting text. If the results are not as expected, try a different prompt or adjust the prompt to improve the results. This may take several tries and experiments.

  • Instruction compliance and monitoring : Ensure that the model follows instructions and produces text as expected. Monitor the generated text, and if you notice behavior that doesn't meet your expectations, adjust the prompt to gain better control over the model.

  • Handle edge cases : Consider edge cases that may arise, such as dealing with negative instructions, handling complex tasks, or imposing length limits on text.

  • Feedback loop : Continuously collect feedback on the model's outputs, and adjust and improve the prompts based on that feedback so that the model generates better results.

  • Testing and Validation : The generated texts are tested and validated to ensure that they meet the requirements and quality standards of the task.

  • Automation and Integration : Integrate prompt engineering into automated processes as needed to generate text at scale or handle large numbers of requests.
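A small, illustrative prompt template for a summarization task is sketched below. The wording, the length constraint, and the ask_model call are assumptions made for the sketch rather than a specific product API.

SUMMARY_PROMPT = (
    "You are a helpful assistant.\n"
    "Task: Summarize the following text in no more than 3 sentences.\n"
    "Do not add information that is not in the text.\n\n"
    "Text:\n{document}\n\nSummary:"
)

def build_summary_prompt(document: str) -> str:
    return SUMMARY_PROMPT.format(document=document)

# prompt = build_summary_prompt(article_text)
# reply = ask_model(prompt)   # hypothetical call to whichever model or API you use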

Note that each task and application may require a different approach to prompt engineering. These steps can be customized and adapted to the specific situation. When using prompt engineering methods, practice and experience are often key factors in improving the quality of the results. Keep trying and improving the prompts so that the model generates the desired text or performs the task.

(2) Secondary training of the model

1. Overall description

Secondary training of the model refers to further training the model after it has been pre-trained to make it more suitable for the needs of a specific task or domain. This approach can help us gain more freedom and flexibility in controlling the model.

  • Task customization: Secondary training of the model allows it to be customized for a specific task or domain . This means you can apply your models to a wide range of application scenarios, including natural language processing, computer vision, audio processing, and more.
  • Data annotation: Before secondary training, it is usually necessary to prepare annotation data related to the target task. This data is used by the model to perform supervised learning on a specific task. For example, in a text classification task, you need to prepare text data with labels.
  • Fine-Tuning: Secondary training usually uses a method called Fine-Tuning . In Fine-Tuning, the model uses pre-trained parameters as initial weights and then performs additional training on task-specific data. This allows the model to retain the general knowledge of the pre-trained model and adapt to the requirements of a specific task.
  • Controlling model behavior: By designing appropriate training data and loss functions during the Fine-Tuning process, the behavior of the model can be controlled to adapt it to the specific needs of the task. For example, in the task of generating text, different loss functions can be used to control the quality and style of the generated text.
  • Multi-task learning: Secondary training also supports multi-task learning . This means that you can perform Fine-Tuning on multiple tasks at the same time on a model, enabling the model to perform multiple related tasks at the same time, thereby increasing the versatility of the model.
  • Domain adaptation: Secondary training of the model is also very useful for tasks that need to be adapted to different domains or specific domains . A model can be tuned using data from a specific domain so that it performs better on tasks in that domain.
  • Control output: By adjusting the training data and loss function, you can control the text or the form of output generated by the model to ensure that it meets the requirements of the task or application. This approach helps avoid inappropriate or offensive content .

In summary, secondary training of models is a powerful technique that allows large language models to be customized for a variety of tasks and applications. Through carefully designed data and training strategies, the goal of controlling model behavior more freely and flexibly can be achieved. This approach is widely used in many applications in the fields of natural language processing and artificial intelligence.

2. An example task: sentiment analysis

Suppose you wish to build a sentiment analysis model that can analyze the sentiment of a text review, such as positive, negative, or neutral. You already have a large dataset of text reviews, which includes review text and sentiment labels (e.g. "positive", "negative", "neutral"). Example of steps:

  • Data preparation: First, you need to prepare the data set required for the sentiment analysis task . This includes review text and sentiment tags. You may also need to preprocess the text, such as tokenization, removal of stop words, etc.

  • Model selection: You can choose a pre-trained large language model, such as GPT-3 or BERT, as the base model for secondary training . This basic model already has rich language understanding capabilities, but needs to be further adapted to sentiment analysis tasks.

  • Fine-Tuning: In the Fine-Tuning stage, the model will be trained using the prepared sentiment analysis data set . The goal is for the model to learn to understand review text and predict corresponding sentiment labels. A loss function can be designed to make the model's predictions as consistent as possible with the labels.

  • Model control: During the Fine-Tuning process, a control mechanism can be introduced to ensure that the sentiment analysis results generated by the model meet your expectations. For example, you can add a control tag or directive that tells the model to generate text relevant to sentiment analysis. This helps ensure that the model produces results that you expect.

  • Evaluation and Tuning: After completing Fine-Tuning, you need to evaluate the model's performance on sentiment analysis tasks. The validation set can be used for evaluation and the model can be tuned based on performance metrics such as accuracy, F1 score, etc. If the model performs poorly, you can continue Fine-Tuning or try different architectures and parameters.

  • Deployment and Application: Once a model performs well on a sentiment analysis task, it can be deployed into a production environment for analyzing user-provided comments or text. The model generates sentiment analysis results based on the input text to meet your application needs.

With this example, you can see how secondary training of a model allows you to tailor it to the needs of a specific task, with control mechanisms to ensure that the text generated by the model matches your intended use. This method can be applied to various natural language processing tasks and application scenarios.
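A minimal sketch of the Fine-Tuning step with Hugging Face Transformers is shown below. The tiny in-memory dataset, the label names, and the choice of bert-base-uncased are placeholders; a real project would use a proper labeled corpus, a validation split, and tuned hyperparameters.

from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

texts = ["The product is great, I love it!",
         "Terrible service, very disappointed.",
         "It arrived on time."]
labels = [0, 1, 2]                                    # 0 = positive, 1 = negative, 2 = neutral

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

dataset = Dataset.from_dict({"text": texts, "label": labels})
dataset = dataset.map(lambda x: tokenizer(x["text"], truncation=True,
                                          padding="max_length", max_length=64))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sentiment-demo", num_train_epochs=1,
                           per_device_train_batch_size=2, logging_steps=1),
    train_dataset=dataset,                            # evaluation on a validation set is omitted here
)
trainer.train()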

3. Extension: Align the model with our usage expectations

Secondary training of the model can help ensure that the model is aligned with our usage expectations, that is, the text generated by the model or the tasks performed by the model meet our intentions and expectations.

1. Instruction fine-tuning (SFT, Supervised Fine-Tuning): In secondary training, the instruction fine-tuning method can be used to guide the behavior of the model by providing clear instructions. These instructions tell the model how to handle specific types of requests, ensuring that the model's output is consistent with our usage expectations.

Example: If you want the model to correctly answer geography questions, you can fine-tune the instructions by providing instructions such as "Answer the following geography question:" or "Please provide answers to the following questions:" and then provide a series of geography questions. This will make the model understand that it needs to answer these questions rather than generate other types of text.

2. Control the generation style and quality: By designing an appropriate loss function in secondary training, the style, quality, and consistency of the generated text can be controlled to ensure that the text meets our usage expectations.

Example: If you want your model to generate formal tech news reports, you can use a loss function in secondary training that requires the generated text to have a formal language style and ensure that the text does not contain offensive content.

3. Domain adaptation: In secondary training, you can use data from a specific domain to adapt the model to the tasks and needs of the specific domain to ensure that the expected use of the model in that domain is met.

Example: If you want the model to provide professional medical advice in the medical field, you can use medical literature and medical data for Fine-Tuning to make the model understand medical terminology and related knowledge to better meet usage expectations in the medical field.

4. Monitor and adjust: While using the model, continuously monitor the text generated or the tasks performed and intervene or adjust as needed. This helps ensure that the model behaves as expected for our use.

Example: If the model generates inappropriate text in certain situations, you can develop monitoring mechanisms to detect and filter this text to ensure that the model's output aligns with our expectations.

Through the above method, the behavior of the model can be guided in the secondary training of the model to better meet our usage expectations and ensure that the generated text or performed tasks are consistent with our intentions. This alignment is achieved through careful design and control at different stages of model training and use.

(3) Rule-based pre-processing and post-processing

Rule-based pre-processing and post-processing are key components in the secondary training of the model, which can be used to control the behavior of the model to ensure that the generated text or performed tasks meet our expectations.

Rule-Based Preprocessing

Preprocessing is the process of processing or transforming input data before it is passed to the model. It can be used to prepare the input data, guide the behavior of the model, or ensure that the input conforms to a specific format or requirement.

Example 1: Generative tasks

Assuming you are building an automatic text summary generation model, you can perform the following tasks in the pre-processing stage:

  • Remove irrelevant text or markup, such as ads, noise text, etc.
  • Extract key sentences or paragraphs for use in summary generation.
  • Tokenize and segment the input text so that the model can better understand and process it.

Example 2: Dialogue system

If you are developing a dialogue system, you can do the following in preprocessing:

  • Detect and correct spelling errors in user input.
  • Identify and extract key information from user questions.
  • Convert user input into a specific format to guide the model to generate relevant answers.

Rule-Based Post-processing

Post-processing is the process of processing or modifying the generated results after the model generates text. It is used to ensure that the text generated by the model meets specific standards or requirements.

Example 1: Text generation task

Assuming you are doing a text generation task, you can perform the following tasks in the post-processing phase:

  • Remove inappropriate content or sensitive information from generated text to ensure text security.
  • Adjust the length of generated text to meet length constraints or requirements.
  • The generated text is checked for syntax and semantics to ensure its quality and accuracy.

Example 2: Dialogue system

If your dialogue system generates a series of dialogue replies, you can do the following in post-processing:

  • Sort the generated responses and select the most appropriate ones.
  • Adjust generated responses based on user feedback or contextual information.
  • Detect and filter out redundant information or duplicate content in generated replies.

Through rule-based pre- and post-processing, you can guide the behavior of the model, ensure that the generated text or performed tasks are as expected, and improve the controllability and reliability of the model. These rules can be designed and optimized based on task requirements and model behavior characteristics.
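The following is an illustrative sketch of rule-based pre- and post-processing wrapped around a model call. The banned-word list, the length limit, and the generate_reply function are assumptions made for the sketch.

import re

BANNED_WORDS = ["gambling", "violence"]                 # placeholder sensitive-word list
MAX_INPUT_CHARS = 500

def preprocess(user_input: str) -> str:
    text = re.sub(r"\s+", " ", user_input).strip()      # normalize whitespace / noise
    return text[:MAX_INPUT_CHARS]                       # enforce an input length limit

def postprocess(model_output: str) -> str:
    for word in BANNED_WORDS:
        model_output = model_output.replace(word, "***")   # mask sensitive content
    return model_output.strip()

# reply = postprocess(generate_reply(preprocess(raw_user_input)))   # hypothetical model call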

(4) Boundaries and limitations of large models

  1. Hallucinations: Large models may hallucinate when generating text, making things up or failing to provide accurate, trustworthy information. This is because the model may have learned inaccurate or false information during the pre-training stage, which lowers the quality of the generated content. For example, a model may fabricate false historical events or facts.

  2. Instability: The model's output may exhibit instability from run to run. The same input may lead to different outputs, making model reproducibility challenging. This instability can lead to inconsistent results, especially in applications that require consistency, such as automated decision-making or automated generation tasks.

  3. Does not have real-time knowledge: The data that large models come into contact with during the pre-training stage is usually static and cannot reflect current events or information in a timely manner. Therefore, the model cannot provide real-time knowledge or information updates. For applications that require timely information, large models may not be adequate.

  4. Restricted input and output lengths: Large models often have limitations on the length of input text and generated text. If the length of input or generated text exceeds the model's limits, the model may truncate or discard portions of the content, resulting in incomplete or unclear information. This can be a challenge for applications that need to process long texts or long conversations.

  5. Limited responsiveness: Inference on large models can be slow, especially without massively parallel computing resources. This affects the performance of real-time applications, such as real-time conversation systems or real-time data analysis.

  6. Limited ability to follow instructions: Large models may have certain limitations in following instructions. Although the model can understand the instructions and perform the task, in some complex or unclear situations, the model may not understand or follow the instructions correctly. This can cause the model to produce unexpected results.

Taken together, large models have limitations and deficiencies in certain aspects, including the quality, stability, real-time nature of generated content, input and output limitations, response speed, and the ability to follow instructions. When applying large models, these limitations need to be considered and, if necessary, steps taken to address or mitigate these issues to ensure that the model can run effectively in specific tasks and scenarios. At the same time, models of different sizes may perform differently in different applications, and model size and task requirements need to be considered comprehensively.

4. Getting hands-on with a large model

To experience an existing model, download PharMolix/OpenBioMed from GitHub. This repo open-sources BioMedGPT-10B, described as the world's first commercially usable multi-modal biomedical large model with tens of billions of parameters, released by Zhang Tielei's team; its text generation in the biomedical domain is claimed to be comparable to human experts, and it reaches SOTA on cross-modal question-answering tasks spanning natural language, molecules, and proteins. The same repo also open-sources BioMedGPT-LM-7B, described as the world's first commercially usable Llama 2-based large language model for biomedicine (the following introductory tutorial uses this 7B model as the example).

Mac environment setup

conda environment

Anaconda is a powerful tool for managing Python environments. It can create and manage multiple independent, isolated Python environments and install and manage Python dependencies within them. You can use Miniconda, its free, minimal version. You can find the download link and installation instructions in the Miniconda section of the conda documentation.

mkdir -p ~/miniconda3
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh


# After installing, initialize your newly-installed Miniconda. The following commands initialize for bash and zsh shells:
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh

python environment

After installing Miniconda, you can create a Python environment. Here, a Python environment named biomedgpt is created and then activated with conda activate.

cd /Users/zyf/miniconda3/bin

./conda create -n biomedgpt python=3.10

./conda activate biomedgpt

In order to run BioMedGPT-LM-7B, we need to install pytorch and transformers.

./conda install pytorch torchvision torchaudio -c pytorch -c conda-forge

pip install transformers

Model download

Since the model file is large, Git Large File Storage also needs to be installed.

brew install git-lfs

Then you can download it through git clone.

cd /Users/zyf/zyfcodes/jpt
git clone https://huggingface.co/PharMolix/BioMedGPT-LM-7B

Model loading

BioMedGPT-LM-7B is based on meta-llama/Llama-2-7b and is incrementally trained on a biomedical corpus. It is loaded in the same way as the Llama-2-7B model, so we can load it directly through transformers. You can also print(model) to inspect the model structure; for model details, see the Llama 2 official page and technical report.

The from_pretrained function of transformers.AutoModelForCausalLM can download the model directly from the Hugging Face Hub or load a locally downloaded copy; model_path is the path where you store the model and tokenizer files. Running the loading code loads the BioMedGPT-LM-7B model on the Mac notebook, and printing the model shows its detailed structure. If it reports insufficient memory, shut down other unnecessary processes first.
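The original post shows this step as a screenshot; the lines below are a reconstruction sketch based on the description above, with model_path set to the clone location used in the download step.

from transformers import AutoModelForCausalLM

model_path = "/Users/zyf/zyfcodes/jpt/BioMedGPT-LM-7B"   # adjust to your local clone path
model = AutoModelForCausalLM.from_pretrained(model_path)
print(model)                                             # prints the LLaMA-2-style model structure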

Tokenizer loading

As illustrated in the Hugging Face documentation, text goes through three steps from model input to final result: the Tokenizer splits the input text into tokens and converts the tokens into ids/vectors; the Model extracts semantic information from that input and outputs logits; and Post Processing turns the model's output into the result of a specific NLP task, such as sentiment analysis or text classification. Next, we load the tokenizer corresponding to the model. The tokenizer files sit alongside the model files in the BioMedGPT-LM-7B/ folder: tokenizer.json stores the tokenizer's vocabulary, tokenizer_config.json contains the tokenizer's parameters, and tokenizer.model stores the underlying tokenizer model.

Use the tokenizer to process the input text to obtain the token ids the model requires.

input_ids are the vocabulary ids corresponding to each token. They are fed to the model to obtain output token ids, which are then decoded back into text through the tokenizer's decode method.
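Continuing from the model loaded above, the tokenize-generate-decode round trip looks roughly like this (another reconstruction sketch; the prompt and generation settings are illustrative).

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_path)            # model_path from the loading step above

prompt = "Aspirin is a drug that"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids     # vocabulary ids for each token
output_ids = model.generate(input_ids, max_new_tokens=50)        # very slow on CPU for a 7B model
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))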

At this point, we have loaded the 7B model on the notebook and used it to generate a piece of text. Note that since this tutorial runs on a laptop CPU and the 7B model has 7 billion parameters, the model.generate inference step is very time-consuming, and you need to wait patiently for the output (it may take around 2 hours).

This part has not been run to completion yet; a new test will be done later.

Overview of learning sites and tutorials

Websites and tutorials:

  • OpenAI official website : OpenAI’s official website usually provides information about their latest research and development, as well as related tutorials and documentation. There you can find the latest advances and technologies regarding large models.

  • Deep Learning Specialization : The deep learning specialization course on Coursera hosted by Professor Andrew Ng includes content related to large neural networks. This course provides the basics of deep learning, including natural language processing and generative models.

  • Courses at Stanford University : Stanford University offers some courses related to deep learning and natural language processing. Their course materials are usually available for free online, including lecture notes, videos, and assignments.

  • GitHub : There are many open source projects and code repositories on GitHub, which can give you insights into the implementation and application of large models. For example, you can find various pre-trained models and tools for natural language processing tasks.

Books :

  • "Deep Learning" by Ian Goodfellow, Yoshua Bengio, and Aaron Courville: A classic in the field of deep learning, covering the basic concepts and techniques of deep learning, and is very helpful for understanding large neural networks and language models.

  • "Natural Language Processing in Action" by Lane, Howard, and Hapke: Introduces the core concepts and techniques of natural language processing, including the application of large language models. It provides practical examples and code.

  • "BERT (Bidirectional Encoder Representations from Transformers) Explained" by Ben Trevett: An online tutorial that explains in detail the working principle and application of the BERT model. It's a great starting point for understanding pretrained models.

  • "GPT-3 and Beyond: Generative Models" by Benjamin Obi Tayo Ph.D.: This book introduces generative models (including GPT-3, etc.), which is very helpful for in-depth understanding of the working principles and applications of large-scale language models.

Origin blog.csdn.net/xiaofeng10330111/article/details/132718410