How can data labeled with bidding fields be added to a large Chinese model to build a Transformer model for parsing bidding fields?

ChatGPT: Scheme 1

Adding data labeled with bidding fields to a large Chinese model and building a Transformer model for parsing bidding fields can be achieved through the following steps:

  1. Collect and label bidding-related data. Existing datasets can be used, or relevant data can be gathered with crawlers and then labeled manually. Labeling requires marking the field names (such as subject matter, bidder, etc.) and the values of the corresponding fields.

  2. Convert the labeled data into a format suitable for Transformer training. The data can be converted into a format like a common sequence-labeling (BIO) dataset, that is, one token per line, where each line contains the token and its corresponding label, for example:

标的物 O
: O
XXX B-标的物
, I-标的物
YYY I-标的物
。 O
投标人 O
: O
ZZZ B-投标人
, I-投标人
WWW I-投标人
。 O
……
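
For reference, a minimal Python sketch for reading such a BIO file into parallel token and label sequences might look like the following; the file name and the convention that "。" ends a sentence are assumptions for illustration.

```python
# Read a two-column BIO file ("token tag" per line) into sentences.
def read_bio_file(path):
    sentences, labels = [], []
    tokens, tags = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            token, tag = line.rsplit(" ", 1)
            tokens.append(token)
            tags.append(tag)
            if token == "。":  # assumed sentence boundary
                sentences.append(tokens)
                labels.append(tags)
                tokens, tags = [], []
    if tokens:  # flush a trailing sentence without a closing "。"
        sentences.append(tokens)
        labels.append(tags)
    return sentences, labels

sentences, labels = read_bio_file("bidding_bio.txt")  # hypothetical file
```
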
  3. Use an open-source Transformer model library, such as Hugging Face's transformers library, to build the Transformer model for bidding field parsing. A pre-trained Chinese BERT or RoBERTa model can be used, or a model can be trained from scratch.

  4. When training the model, the dataset must go through Chinese tokenization and input encoding (for example, with BertTokenizer) before being passed to the Transformer model. A cross-entropy loss, as in standard NER tasks, and the Adam optimizer can be used for training.

  5. After training is complete, the trained model can parse new bidding text and identify the field names and their corresponding values. As in sequence labeling, the bidding text is tokenized and each token is labeled to determine which field it belongs to; finally, the values are aggregated by field name. A combined sketch of steps 3 to 5 follows.
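
As a rough illustration of steps 3 to 5, the following sketch fine-tunes a token-classification model with Hugging Face transformers; the "bert-base-chinese" checkpoint, the label set, the hyperparameters, and the one-example-at-a-time loop (no batching or padding) are all simplifying assumptions.

```python
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

# Illustrative label set matching the BIO example above.
label_list = ["O", "B-标的物", "I-标的物", "B-投标人", "I-投标人"]
label2id = {label: i for i, label in enumerate(label_list)}

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertForTokenClassification.from_pretrained(
    "bert-base-chinese", num_labels=len(label_list))

def encode(tokens, tags):
    # Tokenize pre-split tokens and align BIO tags with sub-tokens;
    # special tokens and continuation pieces get the ignore index -100.
    enc = tokenizer(tokens, is_split_into_words=True,
                    truncation=True, return_tensors="pt")
    aligned, prev = [], None
    for wid in enc.word_ids():
        aligned.append(-100 if wid is None or wid == prev
                       else label2id[tags[wid]])
        prev = wid
    enc["labels"] = torch.tensor([aligned])
    return enc

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()
for tokens, tags in zip(sentences, labels):  # from the reader sketch above
    batch = encode(tokens, tags)
    loss = model(**batch).loss  # token-level cross-entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Step 5: tag a new bidding text and read off the predicted labels.
model.eval()
with torch.no_grad():
    enc = tokenizer("标的物:XXX,YYY。", return_tensors="pt")
    pred = model(**enc).logits.argmax(-1)[0].tolist()
print([label_list[i] for i in pred])
```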

Bing: Scheme 2

Hello, this is Bing. Based on the information I found on the web, here are my suggestions:

  • You can use a pre-trained large Chinese model, such as BERT, as the base model, and then fine-tune it on the data labeled with bidding fields to suit your task.
  • You can treat bidding field labeling as a sequence labeling problem: given a bidding text, assign a label to each word or character to indicate which field it belongs to, such as project name, budget amount, bid opening time, etc.
  • You can use the Transformer's encoder-decoder structure to build your model, where the encoder encodes the input text and the decoder generates the corresponding label sequence. You can also use a Transformer variant or derivative such as BART.
  • You can include positional encodings in the model's input to provide information about the position of each word or character, helping the model capture sequential relationships in language; a minimal sketch follows this list.
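
As an illustration of that last point, here is a minimal PyTorch sketch of the sinusoidal positional encoding from the original Transformer paper; the sequence length and model dimension are illustrative values.

```python
import math
import torch

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    # Even dimensions use sine, odd dimensions use cosine, with
    # wavelengths forming a geometric progression up to 10000 * 2π.
    pe = torch.zeros(max_len, d_model)
    position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float)
                         * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe  # shape (max_len, d_model); added to the token embeddings

encodings = sinusoidal_positional_encoding(max_len=512, d_model=768)
```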

Hope this information helps you. If you have additional questions, please feel free to ask.

Claude+: Scheme 3

To add bidding field annotation data to a Chinese pre-trained model and build a field parsing model, the steps are as follows:

Choose a Chinese Transformer model, such as BERT, RoBERTa, or BERT-wwm. Here BERT-wwm-ext is used as the example.

Prepare the bidding field labeling dataset, including text sequences and BIO tag sequences.

Load the BERT-wwm-ext model, freeze its pre-trained layers, and update only the final, newly added layers.

Add two new layers after the last layer of BERT-wwm-ext:

A fully connected (linear) layer, whose input is the output of BERT's last layer and whose output dimension is the number of labels; its outputs serve as emission scores for the CRF, so an explicit softmax is not needed here.
A CRF layer for decoding the BIO tag sequences.
Feed the bidding field annotation dataset into the BERT-wwm-ext input; the fully connected layer and the CRF layer then produce the output label sequence. A sketch of this architecture follows.
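
The following is a minimal sketch of this architecture, assuming the pytorch-crf package (imported as torchcrf) and the "hfl/chinese-bert-wwm-ext" checkpoint on the Hugging Face hub.

```python
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF  # pip install pytorch-crf

class BertCrfTagger(nn.Module):
    def __init__(self, num_labels: int):
        super().__init__()
        self.bert = BertModel.from_pretrained("hfl/chinese-bert-wwm-ext")
        for p in self.bert.parameters():  # freeze the pre-trained layers
            p.requires_grad = False
        self.emission = nn.Linear(self.bert.config.hidden_size, num_labels)
        self.crf = CRF(num_labels, batch_first=True)

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.bert(input_ids,
                           attention_mask=attention_mask).last_hidden_state
        emissions = self.emission(hidden)  # raw scores; no softmax needed
        mask = attention_mask.bool()
        if labels is not None:
            # Training: the CRF's negative log-likelihood is the loss.
            return -self.crf(emissions, labels, mask=mask, reduction="mean")
        # Inference: Viterbi-decode the best BIO tag path per sequence.
        return self.crf.decode(emissions, mask=mask)
```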

Compile the model with the sequence-labeling loss over the BIO tags (the CRF's negative log-likelihood, or token-level cross-entropy if the CRF is omitted), and select Adam as the optimizer.

Train only the newly added fully connected and CRF layers, keeping the other layers of BERT-wwm-ext frozen.

After a number of training epochs, evaluate the F1-score and accuracy on the validation set and keep the best model; a short evaluation sketch follows.
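
A short evaluation sketch, assuming the seqeval package, which computes entity-level metrics for BIO-tagged sequences; y_true and y_pred are assumed lists of label-string sequences for the validation set.

```python
from seqeval.metrics import accuracy_score, f1_score

# e.g. y_true = [["O", "B-标的物", "I-标的物", "O"], ...]
#      y_pred = [["O", "B-标的物", "O", "O"], ...]
print("F1:", f1_score(y_true, y_pred))
print("Accuracy:", accuracy_score(y_true, y_pred))
```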

Finally, evaluate the model on the test set to obtain the extraction performance for each bidding field.

You can continue to optimize on this basis, for example by adding regularization methods such as dropout, or by using a larger BERT model.

In this way, a BERT-wwm-ext model and bidding field annotation data can be used to construct a Transformer model for bidding field parsing. Leveraging the semantic representation power of a large-scale Chinese pre-trained model, and fine-tuning on top of it, can achieve good results.

Claude-instant: Scheme 4

A bidding field parsing model can be built on a large Chinese model such as BERT with the following steps:

Prepare the bidding announcement data, where each announcement is annotated with its main content and the corresponding bidding fields. These correspondences between announcement content and bidding fields form the training set.

Choose a large Chinese model, such as BERT or RoBERTa, as the base model. The first few layers of the base model may need fine-tuning to suit the downstream task.

Add an output layer for the bidding task after the last layer of the base model. The output layer covers the labels of all bidding fields (such as bidder, qualification review, on-site bid evaluation, etc.), and for each label the model predicts a probability value.

Process the training data with a DataLoader and train the model batch by batch, using cross-entropy as the loss; a minimal sketch follows.
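
A minimal sketch of this step, assuming `encoded_dataset` is a list of dicts of equal-length input_ids / attention_mask / labels tensors and `model` is a token-classification model as in the earlier schemes.

```python
import torch
from torch.utils.data import DataLoader

loader = DataLoader(encoded_dataset, batch_size=16, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
loss_fn = torch.nn.CrossEntropyLoss()  # ignores index -100 by default

model.train()
for epoch in range(3):
    for batch in loader:
        logits = model(batch["input_ids"],
                       attention_mask=batch["attention_mask"]).logits
        # Flatten (batch, seq_len, num_labels) for token-level cross-entropy.
        loss = loss_fn(logits.view(-1, logits.size(-1)),
                       batch["labels"].view(-1))
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```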

Run the trained model on the test data; for the main content of each announcement, the model outputs the predicted probability values for all bidding field labels at once.

From the predicted results and the true labels, compute the accuracy for each bidding field label to judge the model's performance; a small sketch follows.
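
A small sketch of per-field accuracy, assuming `predictions` and `gold` are parallel lists of per-announcement {field_name: value} dicts (hypothetical names for illustration).

```python
from collections import defaultdict

correct, total = defaultdict(int), defaultdict(int)
for pred, true in zip(predictions, gold):
    for field, value in true.items():
        total[field] += 1
        if pred.get(field) == value:
            correct[field] += 1

for field in sorted(total):
    print(f"{field}: {correct[field] / total[field]:.2%}")
```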

The model can be optimized and adjusted according to actual needs to improve accuracy. Adding other features to the model (such as announcement attributes or location features) may further improve the results.

Combined with concrete data and models, the above process can build a Transformer model for parsing the fields of bidding announcements. The model's accuracy is determined by the sufficiency of the training data, the choice of base model, and the optimization and tuning of the trained parameters.

Wenxin Yiyan: Scheme 5

To add data labeled with bidding fields to a large Chinese model and build a Transformer model for parsing bidding fields, you can follow the steps below:

Data preprocessing: First, preprocess the bidding data, including data cleaning, format conversion, and feature extraction. Techniques such as Chinese word segmentation, part-of-speech tagging, and named entity recognition can be used to extract key features from the data (see the sketch after this list).
Feature engineering: Normalize the extracted features and handle outliers and missing values to improve the model's performance and robustness.
Model training: Train with a pre-trained Transformer model. Existing pre-trained models such as BERT or XLNet can be used, or a pre-trained model can be built from scratch.
Model prediction: Use the trained model to predict on new bidding data, and use the predictions for further analysis and decision-making.
Note that in practical applications, issues such as data security, privacy protection, model interpretability, and scalability need to be considered. In addition, the model needs tuning and optimization to improve its performance and robustness.
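
A minimal preprocessing sketch for the first step, assuming the jieba package for Chinese word segmentation and part-of-speech tagging; the sample text is illustrative.

```python
import jieba
import jieba.posseg as pseg  # pip install jieba

text = "本项目预算金额为100万元,开标时间为2023年5月10日。"
tokens = jieba.lcut(text)  # word segmentation
tagged = [(word, flag) for word, flag in pseg.cut(text)]  # POS tagging
print(tokens)
print(tagged)
```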

Bard: Scheme 6

To add data labeled with bid fields to a large Chinese model and build a Transformer model for bid field parsing, the following steps can be followed:

Collect datasets of text and bid fields. The dataset should be large enough to train the model effectively.

Preprocess the data. This includes tokenizing the text, labeling the bid fields, and removing stop words.

Fine-tune a large Chinese model on the dataset. This can be achieved through a technique called transfer learning.

Evaluate the model on a held-out test set. This will help you determine how the model will perform on unseen data.

Here are some additional details about each step:

  1. Collect datasets for text and bid fields

Datasets for text and bid fields can be collected from various sources such as:

Online auction sites

Government procurement websites

Private tendering platforms

When collecting data, it is important to ensure that the text and bid fields are accurately labeled. This can be done by labeling the data manually or by using a machine learning model to label it automatically.

  2. Data preprocessing

Once the dataset has been collected, it is important to preprocess the data before training the model. This includes tokenizing the text, labeling the bid fields, and removing stop words.

Tokenization is the process of breaking down text into individual words or tokens. This is necessary for the model to understand the meaning of the text.

Labeling bid fields is the process of identifying the specific bid fields within the text. This is what the model must learn in order to parse them.

Stop word removal is the process of removing common filler words from the text. This is done to improve the performance of the model; a small sketch of this preprocessing follows.
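
A small sketch of the segmentation and stop-word removal just described, assuming the jieba package; the stop-word set here is a tiny illustrative stand-in for a published Chinese stop-word list loaded from a file.

```python
import jieba

stop_words = {"的", "了", "和", "是"}  # illustrative subset
text = "本次招标的预算金额是100万元"
tokens = [w for w in jieba.lcut(text) if w not in stop_words]
print(tokens)  # segmented tokens with stop words removed
```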

  3. Fine-tune the large Chinese model on the dataset

After the data has been preprocessed, it is time to fine-tune the large Chinese model on the dataset. This can be achieved through a technique called transfer learning.

Transfer learning is a technique that allows you to use a pre-trained model to train a new model for a different task. In this example, you will use a large Chinese model to train a model for bid field parsing.

The process of fine-tuning the model is relatively straightforward: feed the model the training data, and it will learn to parse the bid fields. A condensed sketch follows.
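
A condensed fine-tuning sketch using the Hugging Face Trainer API; the checkpoint name, num_labels, and the pre-tokenized train_dataset / eval_dataset are assumptions for illustration.

```python
from transformers import (AutoModelForTokenClassification, Trainer,
                          TrainingArguments)

model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-chinese", num_labels=num_labels)

args = TrainingArguments(
    output_dir="bid-field-parser",
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=train_dataset,  # assumed pre-tokenized
                  eval_dataset=eval_dataset)
trainer.train()
```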

  4. Evaluate the model on a held-out test set

Once the model has been fine-tuned, it is important to evaluate it on a held-out test set. This will help you determine how your model will perform on unseen data.

The held-out test set should be a set of text and bid fields that was not used to train the model. The model should be able to parse the bid fields in the held-out test set with high accuracy.

If the model performs poorly on the held-out test set, you may need to collect more data or fine-tune the model further.

I hope this helps you!

Origin blog.csdn.net/weixin_45934622/article/details/130308303