Downloading a Hugging Face pre-trained model locally and loading it

Preface

In an era when large models are everywhere, life is hard for researchers whose servers cannot reach the external network. Every time they try a large model such as CLIP or BLIP, they receive a "reward" along these lines:

requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/bert-base-uncased/tree/main?recursive=True&expand=False (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f4326e15e50>: Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: 390a0157-95dd-416d-80c5-79f4fdd4b6d1)')

The root cause is that the server cannot reach the external network, so the pre-trained weights cannot be downloaded from Hugging Face. A simple workaround is to download the weight files on a machine that does have internet access and then upload them to the server. However, the tutorials I found online were not particularly easy to follow, so I am recording the procedure here for future reference when similar problems come up.

The following takes BLIP calling BERT as an example.

The statement that raises the error is:

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

Solution

1. Enter the huggingface official website: https://huggingface.co/
2. Enter the name of the model you want to download in the search box, such as bert-base-uncased
3. In the results, find the model you need, click it, and then open the "Files and versions" tab.
4. Download the required files. Taking PyTorch as an example, you need four files: config.json, pytorch_model.bin, tokenizer.json, and vocab.txt.
5. Create a new folder locally (I call it BERT here) and place the four files above in it. Note: the file names and extensions must not be changed.
6. Upload the folder to the directory on the server where your project lives, and change the argument of from_pretrained()

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

so that it points to the folder path on the server, for example:

tokenizer = BertTokenizer.from_pretrained('/data/timer/BLIP/BERT')
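The steps above can be sketched end-to-end in a short script. The helper below is hypothetical (it is not part of transformers), and /data/timer/BLIP/BERT is just the example path from this post:

```python
# Hypothetical helper: verify that the local folder contains the four
# files BertTokenizer needs before attempting to load from it.
from pathlib import Path

REQUIRED_FILES = ["config.json", "pytorch_model.bin", "tokenizer.json", "vocab.txt"]

def missing_files(folder, required=tuple(REQUIRED_FILES)):
    """Return the required file names that are absent from `folder`."""
    root = Path(folder)
    return [name for name in required if not (root / name).is_file()]

# Usage on the server (example path from this post):
# from transformers import BertTokenizer
# gaps = missing_files("/data/timer/BLIP/BERT")
# if gaps:
#     raise FileNotFoundError(f"incomplete snapshot, missing: {gaps}")
# tokenizer = BertTokenizer.from_pretrained("/data/timer/BLIP/BERT")
```

Checking the folder first gives a clear "missing file" message instead of a confusing network error when the snapshot is incomplete.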

This works because BertTokenizer.from_pretrained() accepts several kinds of arguments: a short-cut name (such as bert-base-uncased), a namespaced identifier (such as microsoft/DialoGPT-small), a local folder, or a file path.
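To make sure the library never even attempts to contact huggingface.co, you can additionally force offline mode. This is a sketch: the two environment variables are honored by recent versions of huggingface_hub and transformers, and the path below is again the example from this post.

```python
import os

# Tell the Hugging Face libraries to work fully offline.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

# from transformers import BertTokenizer
# tokenizer = BertTokenizer.from_pretrained(
#     "/data/timer/BLIP/BERT",
#     local_files_only=True,  # fail fast instead of trying to download
# )
```

With local_files_only=True, a missing file raises an immediate error instead of hanging on a connection attempt.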

Related tutorials do exist online, but they either give no example or are not comprehensive enough; in particular, none explain in detail where to place the downloaded files. I hope this post helps.
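If you do have access to a machine with internet, the manual clicking in steps 1-5 can also be done programmatically with the huggingface_hub package (pip install huggingface_hub). This is a sketch, not a required part of the procedure; the BERT folder name matches the one used above.

```python
# Sketch: fetch only the four files this post needs into a local folder,
# then upload that folder to the server as in step 6.
WANTED = ["config.json", "pytorch_model.bin", "tokenizer.json", "vocab.txt"]

def download_bert_snapshot(target_dir="BERT"):
    # Requires `pip install huggingface_hub` and internet access.
    from huggingface_hub import snapshot_download
    return snapshot_download(
        repo_id="bert-base-uncased",
        local_dir=target_dir,
        allow_patterns=WANTED,  # skip TensorFlow/Flax weights etc.
    )

if __name__ == "__main__":
    print(download_bert_snapshot())
```

allow_patterns keeps the download small by skipping the TensorFlow and Flax weight files that the repository also hosts.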



Origin: blog.csdn.net/fovever_/article/details/134422603