ChatGenTitle: A paper title generation model fine-tuned on the LLaMA model using information from millions of arXiv papers


  • Related Information
  • 1. The training dataset is available at Cornell-University/arxiv and can be used directly;
  • 2. We have officially released the LoRA model weights of the LLaMa-Lora-7B-3 and LLaMa-Lora-7B-3-new versions, which allow local deployment;
  • 3. We completed fine-tuning of LLaMa-Lora-7B-3 and LLaMa-Lora-13B-3 based on alpaca-lora;
  • 4. We started a long-term task of regularly crawling cs.AI, cs.CV, and cs.LG papers on arXiv to support research in CS-related directions (a crawling sketch is given after this list);
  • 5. We organized the meta information of 2.2 million+ arXiv papers; this meta information includes title and abstract, plus id, submitter, authors, comments, journal-ref, doi, categories, and versions.
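The regular crawling mentioned in item 4 can be reproduced in several ways. Below is a minimal sketch using the arxiv Python package (this package and its query parameters are an illustrative assumption, not necessarily the project's actual crawler) that fetches recent cs.AI / cs.CV / cs.LG submissions and keeps their titles and abstracts:

# Minimal crawling sketch (assumption: the `arxiv` pip package, not necessarily the project's crawler)
import arxiv

def crawl_recent(categories=("cs.AI", "cs.CV", "cs.LG"), max_results=100):
    """Fetch recent submissions in the given categories and return (title, abstract) pairs."""
    query = " OR ".join(f"cat:{c}" for c in categories)
    search = arxiv.Search(
        query=query,
        max_results=max_results,
        sort_by=arxiv.SortCriterion.SubmittedDate,
    )
    client = arxiv.Client()
    return [(r.title, r.summary) for r in client.results(search)]

if __name__ == "__main__":
    for title, _abstract in crawl_recent(max_results=5):
        print(title)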

1. Project Background

In research paper writing, generating an attractive and accurate title requires weighing many factors at once, which is a real challenge for authors. The main difficulties in generating a paper title are:

  1. Concise but accurate: A good paper title should be brief, yet accurately reflect the focus and core of the research, which is a considerable challenge for the author.
  2. Unique but understandable: The title should be distinctive and capture the reader's interest, but also be understandable, avoiding vocabulary that is overly general or overly esoteric.
  3. Reflect the contribution of the research: A good title should clearly reflect the contribution of the work and highlight its innovations, making the research contribution obvious to the reader.
  4. Avoid using catchphrases: Some common words and phrases may be overused, which can make a paper title seem stale, uninnovative, or even meaningless.

Recently, large language models (LLMs) represented by ChatGPT and GPT-4 have set off a new wave of research in natural language processing, demonstrating capabilities approaching artificial general intelligence (AGI) and attracting extensive attention from industry. Beyond these works, many researchers have begun to focus on low-cost ways to build a personal "ChatGPT", such as stanford_alpaca[1] and alpaca-lora[2]. These projects focus on fine-tuning large models, whereas we want to explore how large models land in downstream tasks.

To this end, we focus on the task of paper title generation. arXiv (full name: The arXiv.org e-Print archive) is a free and open academic preprint community created and maintained by Cornell University, launched in 1991. It hosts electronic preprints and conference papers in mathematics, physics, computer science, and other disciplines from around the world; it contains a large number of high-quality papers and research reports, and its coverage grows every day. arXiv also exposes rich paper meta information. Using the openly available paper information on arXiv, we constructed a database containing the meta information of 2.2 million papers, and through data cleaning and related processing we turned these records into data pairs that can be used for fine-tuning large models.

Introducing this paper meta information into large-model fine-tuning can directly address the difficulties of paper title generation, and it helps in the following ways:

  1. Provide a more accurate and broader language model: Large models are usually trained on very large amounts of data, so they interpret natural language more accurately, cope with more language scenarios, and improve the expressive quality of paper titles.
  2. Provide more accurate semantic understanding: Large models use deep learning to build high-dimensional vector representations of language, thereby providing more accurate semantic understanding and helping to generate more precise paper titles.
  3. Enhance creativity and innovation: Trained on a large amount of data, large models can extract patterns from it and offer more vocabulary and sentence combinations, enhancing the creativity and novelty of generated titles.
  4. Improve efficiency: Compared with writing titles manually, using large models to generate paper titles greatly improves efficiency; it reduces the time needed to write a title, is less prone to obvious errors, and improves the quality of the output.

In short, introducing large models can provide real help in solving the difficulties of paper title generation, and is expected to improve the ability to analyze, abstract, and innovate.

2. Introduction to the arXiv dataset

The meta information of papers we collect includes all subject classifications, such as:

  1. Computer Science
  2. Mathematics
  3. Physics
  4. Statistics
  5. Electrical Engineering and Systems Science
  6. Economics
  7. Quantum Physics
  8. Materials Science
  9. Biology
  10. Quantitative Finance
  11. Information Science
  12. Interdisciplinary.

There are many specific subcategories under each category. For example, under Computer Science there are subcategories such as computer vision, machine learning, artificial intelligence, and computer networks. If you want to find papers in a specific field, you can filter by these categories.

Each paper contains meta information in the following fields:

{
    "id": "0704.0001",
    "submitter": "Pavel Nadolsky",
    "authors": "C. Bal\'azs, E. L. Berger, P. M. Nadolsky, C.-P. Yuan",
    "title": "Calculation of prompt diphoton production cross sections at Tevatron and LHC energies",
    "comments": "37 pages, 15 figures; published version",
    "journal-ref": "Phys.Rev.D76:013009,2007",
    "doi": "10.1103/PhysRevD.76.013009",
    "report-no": "ANL-HEP-PR-07-12",
    "categories": "hep-ph",
    "license": null,
    "abstract": "A fully differential calculation in perturbative quantum chromodynamics is presented for the production of massive photon pairs at hadron colliders. All next-to-leading order perturbative contributions from quark-antiquark, gluon-(anti)quark, and gluon-gluon subprocesses are included, as well as all-orders resummation of initial-state gluon radiation valid at next-to-next-to-leading logarithmic accuracy. The region of phase space is specified in which the calculation is most reliable. Good agreement is demonstrated with data from the Fermilab Tevatron, and predictions are made for more detailed tests with CDF and DO data. Predictions are shown for distributions of diphoton pairs produced at the energy of the Large Hadron Collider (LHC). Distributions of the diphoton pairs from the decay of a Higgs boson are contrasted with those produced from QCD processes at the LHC, showing that enhanced sensitivity to the signal can be obtained with judicious selection of events.",
    "versions": [ ... ]
}
  • id: ArXiv ID (can be used to access the paper, see below)
  • submitter: Who submitted the paper
  • authors: Authors of the paper
  • title: Title of the paper
  • comments: Additional info, such as number of pages and figures
  • journal-ref: Information about the journal the paper was published in
  • doi: Digital Object Identifier (https://www.doi.org)
  • abstract: The abstract of the paper
  • categories: Categories / tags in the ArXiv system
  • versions: A version history
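
To turn this meta information into fine-tuning data, each record is essentially mapped to an (abstract → title) pair in an instruction format. The sketch below shows such a conversion; the snapshot file name (arxiv-metadata-oai-snapshot.json, as distributed on Kaggle) and the alpaca-style instruction/input/output layout are assumptions on our part, and the project's actual cleaning steps may differ:

# Minimal sketch: build alpaca-style (abstract -> title) pairs from the arXiv metadata snapshot.
# Assumptions: the JSON-lines snapshot file name and the category filter are illustrative.
import json

INSTRUCTION = ("If you are an expert in writing papers, please generate a good paper title "
               "for this paper based on other authors' descriptions of their abstracts.")

def build_pairs(snapshot_path="arxiv-metadata-oai-snapshot.json",
                out_path="train.json",
                keep_categories=("cs.AI", "cs.CV", "cs.LG")):
    pairs = []
    with open(snapshot_path, "r", encoding="utf-8") as f:
        for line in f:                                      # one JSON object per line
            paper = json.loads(line)
            cats = paper.get("categories", "").split()
            if keep_categories and not any(c in keep_categories for c in cats):
                continue
            title = " ".join(paper["title"].split())        # collapse line breaks / extra spaces
            abstract = " ".join(paper["abstract"].split())
            pairs.append({"instruction": INSTRUCTION, "input": abstract, "output": title})
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(pairs, f, ensure_ascii=False, indent=2)
    return len(pairs)

if __name__ == "__main__":
    print(build_pairs())

The resulting train.json is what the --data_path argument of the fine-tuning commands in the next section points to.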

3. LLMs fine-tuning

ChatGenTitle is fine-tuned from Meta's LLaMA model. The two mainstream fine-tuning approaches are instruct fine-tuning and LoRA fine-tuning.

Instruct fine-tuning and LoRA fine-tuning are two different techniques. Instruct fine-tuning takes a pre-trained model as the base and continues training it on a new, instruction-formatted dataset; it updates all of the model's parameters so that, after fine-tuning, the model is suitable for multiple downstream applications. LoRA (Low-Rank Adaptation) fine-tuning instead freezes the pre-trained weights and injects small trainable low-rank layers into each Transformer block; since gradients do not need to be computed for most of the model weights, this greatly reduces the number of trainable parameters and lowers GPU memory requirements. Studies have found that LoRA fine-tuning reaches quality comparable to full-model fine-tuning while being faster and requiring less computation. Therefore, if you have low-latency and low-memory requirements, LoRA fine-tuning is recommended.

So we choose to use LoRA fine-tuning to build the whole ChatGenTitle.
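
Conceptually, the LoRA injection that alpaca-lora's training script performs can be sketched with the Hugging Face peft library roughly as follows (the hyperparameters and model path here are illustrative, not the project's exact settings):

# Rough sketch of LoRA injection with peft (illustrative hyperparameters, not the project's exact config)
from transformers import LlamaForCausalLM
from peft import LoraConfig, get_peft_model

base = LlamaForCausalLM.from_pretrained("../model/7B-hf")   # HF-format LLaMA weights (see conversion step below)

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()        # only a small fraction of the weights are trainable

In practice, we run the alpaca-lora scripts directly: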

# Download the project
git clone https://github.com/tloen/alpaca-lora.git

# Install dependencies
pip install -r requirements.txt

# Convert the model to Hugging Face format
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir ../model/ \
    --model_size 7B \
    --output_dir ../model/7B-hf

# Train the model on a single machine with a single GPU
python finetune.py \
    --base_model '../model/7B-hf' \
    --data_path '../train.json' \
    --output_dir '../alpaca-lora-output'

# Train the model on a single machine with multiple GPUs (4*A100)
WORLD_SIZE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=3192 finetune.py \
    --base_model '../model/7B-hf' \
    --data_path '../train.json' \
    --output_dir '../alpaca-lora-output' \
    --batch_size 1024 \
    --micro_batch_size 128 \
    --num_epochs 3
  • Online access

Before starting to deploy and use the model, we need to understand the two sets of weights involved. The whole project uses two models: LLaMA and LoRA. The LoRA model is the weight file we fine-tuned and saved, while the LLaMA weights are the pre-trained large-model weights open sourced by Meta. You can think of the generated LoRA weights as a patch on top of the original LLaMA model, so the two sets of weights need to be loaded together. The LoRA models we have released so far are:

| Model name | Fine-tune data | Base model | Model size | Fine-tune duration |
| --- | --- | --- | --- | --- |
| LLaMa-Lora-7B-3 | arXiv-50-all | LLaMa-7B | 148.1MB | 9 hours |
| LLaMa-Lora-7B-3-new | arXiv-50-all | LLaMa-7B | 586MB | 12.5 hours |
| LLaMa-Lora-13B-3 | arXiv-100-all | LLaMa-13B | 230.05MB | 26 hours |

More models will be released soon!

Prepare the two weights you need, and you can start using them:

# Inference
python generate.py \
    --load_8bit \
    --base_model '../model/7B-hf' \
    --lora_weights '../alpaca-lora-output'

After the model is running, just open 127.0.0.1:7860 in your browser.

Then type the following into the Instruction box:

If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.

And type the following into the Input box:

<your paper's abstract>: Waste pollution is one of the most important environmental problems in the modern world. With the continuous improvement of the living standard of the population and the increasing richness of the consumption structure, the amount of domestic waste generated has increased dramatically and there is an urgent need for further waste treatment of waste. The rapid development of artificial intelligence provides an effective solution for automated waste classification. However, the large computational power and high complexity of algorithms make convolutional neural networks (CNNs) unsuitable for real-time embedded applications. In this paper, we propose a lightweight network architecture, Focus-RCNet, designed with reference to the sandglass structure of MobileNetV2, which uses deeply separable convolution to extract features from images. The Focus module is introduced into the field of recyclable waste image classification to reduce the dimensionality of features while retaining relevant information. In order to make the model focus more on waste image features while keeping the amount of parameters computationally small, we introduce the SimAM attention mechanism. Additionally, knowledge distillation is used to further compress the number of parameters in the model. By training and testing on the TrashNet dataset, the Focus-RCNet model not only achieves an accuracy of 92%, but also has high mobility of deployment.

Just click Submit and wait!

The Output box then shows the paper title that ChatGenTitle generates for you.
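
If you prefer to call the two sets of weights programmatically instead of going through the Gradio page, loading the base model plus the LoRA patch looks roughly like the sketch below (paths, dtype, and generation settings are illustrative; the prompt follows the standard alpaca-style template):

# Rough sketch of loading LLaMA + LoRA weights for inference (illustrative paths and settings)
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_model_path = "../model/7B-hf"            # converted LLaMA weights
lora_weights_path = "../alpaca-lora-output"   # fine-tuned LoRA weights

tokenizer = LlamaTokenizer.from_pretrained(base_model_path)
model = LlamaForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, lora_weights_path)   # apply the LoRA "patch"
model.eval()

prompt = (
    "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nIf you are an expert in writing papers, please generate a good paper title for this "
    "paper based on other authors' descriptions of their abstracts.\n\n"
    "### Input:\n<your paper's abstract>\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

This is roughly what generate.py does internally before wrapping the model in the web interface.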

4. Model effect display

Note: The LLaMA model released by Meta is prohibited for commercial use, so here we open source the LoRA model. The LoRA model must be used with the corresponding version of the LLaMA model.

| Model name | Fine-tune data | Base model | Model size | Fine-tune duration | Fine-tune effect |
| --- | --- | --- | --- | --- | --- |
| LLaMa-Lora-7B-3 | arXiv-50-all | LLaMa-7B | -MB | 9 hours | |
| ✅ LLaMa-Lora-7B-3-new | arXiv-50-all | LLaMa-7B | -MB | 12.5 hours | |
| ✅ LLaMa-Lora-7B-cs-3-new | arXiv-cs | LLaMa-7B | -MB | 20.5 hours | |
| ✅ LLaMa-Lora-7B-cs-6-new | arXiv-cs | LLaMa-7B | -MB | 34 hours | |
| ✅ LLaMa-Lora-13B-3 | arXiv-100-all | LLaMa-13B | -MB | 26 hours | |

  • Training settings: Experiments are performed on A100 (4 cards, 80GB)

5. LLM effect comparison

| Prompt | Abstract | Original paper title | ChatGenTitle | ChatGPT (GPT-3.5) | GPT-4 | ChatGLM (130B) |
| --- | --- | --- | --- | --- | --- | --- |
| Prompt ① | Abstract ① | Focus-RCNet: A lightweight recyclable waste classification algorithm based on Focus and knowledge distillation | Focus-RCNet: A Lightweight Convolutional Neural Network for Recyclable Waste Image Classification | Focus-RCNet: A lightweight deep learning model for automated waste classification with enhanced recyclable waste image feature recognition | Efficient Waste Classification with Focus-RCNet: A Lightweight Deep Learning Architecture Employing Sandglass Structure, SimAM Attention Mechanism, and Knowledge Distillation for Real-Time Embedded Applications | Exceeds the token length |
| Prompt ② | Abstract ② | ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices | ShuffleNet: Efficient Convolutional Neural Networks for Mobile Devices | ShuffleNet: A Computation-Efficient CNN Architecture for Mobile Devices with Superior Performance in Object Detection and ImageNet Classification while Maintaining Accuracy | ShuffleNet: A Computationally Efficient CNN Architecture for Mobile Devices with Enhanced Performance in ImageNet Classification and MS COCO Object Detection | ShuffleNet: An Extremely Computation-Efficient CNN Architecture for Mobile Devices |
| Prompt ③ | Abstract ③ | Segment Anything | Segment Anything | Segment Anything: Introducing a New Task, Model, and Dataset for Promptable Image Segmentation with Superior Zero-Shot Performance | Exploring the Segment Anything Project: A Promptable Image Segmentation Model and Extensive Dataset with Impressive Zero-Shot Performance | Segment Anything (SA) Project: A New Task, Model, and Dataset for Image Segmentation |

5.1 Prompt ① and Abstract ①

  • Prompt ①: If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.
  • Abstract ①: Waste pollution is one of the most important environmental problems in the modern world. With the continuous improvement of the living standard of the population and the increasing richness of the consumption structure, the amount of domestic waste generated has increased dramatically and there is an urgent need for further waste treatment of waste. The rapid development of artificial intelligence provides an effective solution for automated waste classification. However, the large computational power and high complexity of algorithms make convolutional neural networks (CNNs) unsuitable for real-time embedded applications. In this paper, we propose a lightweight network architecture, Focus-RCNet, designed with reference to the sandglass structure of MobileNetV2, which uses deeply separable convolution to extract features from images. The Focus module is introduced into the field of recyclable waste image classification to reduce the dimensionality of features while retaining relevant information. In order to make the model focus more on waste image features while keeping the amount of parameters computationally small, we introduce the SimAM attention mechanism. Additionally, knowledge distillation is used to further compress the number of parameters in the model. By training and testing on the TrashNet dataset, the Focus-RCNet model not only achieves an accuracy of 92%, but also has high mobility of deployment.

5.2 Prompt ② and Abstract ②

  • Prompt ②: If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.
  • Abstract ②: We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The new architecture utilizes two new operations, pointwise group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy. Experiments on ImageNet classification and MS COCO object detection demonstrate the superior performance of ShuffleNet over other structures, e.g. lower top-1 error (absolute 7.8%) than recent MobileNet on ImageNet classification task, under the computation budget of 40 MFLOPs. On an ARM-based mobile device, ShuffleNet achieves ~13x actual speedup over AlexNet while maintaining comparable accuracy.

5.3 Prompt ③ and Abstract ③

  • Prompt ③: If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.
  • Abstract ③: We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive – often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images.

6. QA

  1. About instruct fine-tuning and LoRA fine-tuning

Instruct fine-tuning and LoRA fine-tuning are two different techniques.
Instruct fine-tuning takes a pre-trained model as the base and continues training it on a new, instruction-formatted dataset. It updates all parameters of the pre-trained model, making the model suitable for multiple downstream applications after fine-tuning.
LoRA (Low-Rank Adaptation) fine-tuning freezes the pre-trained weights and injects small trainable low-rank layers into each Transformer block. Since gradients do not need to be computed for most of the model weights, it greatly reduces the number of trainable parameters and lowers GPU memory requirements.
Studies have found that LoRA fine-tuning reaches quality comparable to full-model fine-tuning while being faster and requiring less computation. Therefore, if you have low-latency and low-memory requirements, LoRA fine-tuning is recommended.

  2. Why are there two models, LLaMA and LoRA?

As mentioned in Question 1, there are many ways to fine-tune a model. LoRA-based fine-tuning generates and saves new weights; we can think of the generated LoRA weights as a patch on top of the original LLaMA model, while the LLaMA weights themselves are the pre-trained large-model weights open sourced by Meta. A merging sketch is given below.
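
For deployment it can also be convenient to fold the patch into the base model and save a single set of weights. With peft this can be sketched as follows (paths are illustrative):

# Rough sketch: merge the LoRA "patch" into the base LLaMA weights (illustrative paths)
from transformers import LlamaForCausalLM
from peft import PeftModel

base = LlamaForCausalLM.from_pretrained("../model/7B-hf")
patched = PeftModel.from_pretrained(base, "../alpaca-lora-output")
merged = patched.merge_and_unload()          # fold the LoRA matrices into the base weights
merged.save_pretrained("../llama-7b-chatgentitle-merged")

Note that the merged weights contain the original LLaMA parameters, so they are subject to the same license restriction mentioned in Section 4 and should only be used locally.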

  3. About vocabulary expansion

Expanding the vocabulary is destructive to a certain extent: it breaks the original tokenization, and it introduces untrained weights. So if you cannot train sufficiently afterwards, it may cause significant problems. Personally, I think that unless the domain is highly specialized (for example, biomedicine and other fields involving a lot of professional vocabulary), there is not much need to expand the English vocabulary. See Chinese-LLaMA-Alpaca/issues/16.

References

  • [1] stanford_alpaca: https://github.com/tatsu-lab/stanford_alpaca
  • [2] alpaca-lora: https://github.com/tloen/alpaca-lora

Project code source download: https://download.csdn.net/download/sinat_39620217/88010022
