Hugging Face: solving "Couldn't reach xxx on the Hub (ConnectionError)"

Problems encountered

Downloading a Hugging Face dataset on a server fails with ConnectionError: Couldn't reach 'Salesforce/dialogstudio' on the Hub (ConnectionError).

The specific code is as follows:

from datasets import load_dataset

dataset = load_dataset("Salesforce/dialogstudio", "TweetSumm")

The specific error information is as follows:

   1451         raise ConnectionError(f"Couldn't reach '{path}' on the Hub ({type(e).__name__})")
   1452     elif "404" in str(e):
   1453         msg = f"Dataset '{path}' doesn't exist on the Hub"

ConnectionError: Couldn't reach 'Salesforce/dialogstudio' on the Hub (ConnectionError)

Solution

This happens because the server cannot connect to Hugging Face: it has no access to the external network, so it cannot download the dataset. So what is the solution?
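To confirm that the network really is the cause, a quick connectivity check can be run on the server. The following is a minimal sketch, not part of the original fix: the helper name can_reach is hypothetical, and it assumes the Hub is reached directly at huggingface.co on port 443 (it does not go through an HTTP proxy, so a server that only reaches the internet via a proxy would also report False here).

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it only tests raw TCP
    connectivity and ignores any proxy settings.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


# Example (run on the server):
# print(can_reach("huggingface.co"))
```

If this prints False on the server but True on your local computer, the workaround described here applies.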

Download the dataset on a local computer that does have network access, then manually upload it to the server.

On the local computer, run:

from datasets import Dataset, load_dataset, load_from_disk
dataset = load_dataset("Salesforce/dialogstudio", "TweetSumm")
dataset.save_to_disk("dataset/Salesforce/dialogstudio")  # save to this directory
dataset

Use save_to_disk to save the dataset to the local disk, then upload the dataset folder to the server, placing it at the same path relative to the server code so that the path passed to load_from_disk matches.
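The saved dataset is a folder containing several files, so it can be convenient to pack it into a single archive before uploading. The following is a rough sketch using only the Python standard library; the function names pack_dataset and unpack_dataset and the archive name are assumptions made for illustration, and the upload itself (scp, SFTP, or whatever your server offers) is left out.

```python
import shutil
from pathlib import Path


def pack_dataset(dataset_dir: str, archive_base: str) -> str:
    """Pack the saved dataset folder into <archive_base>.tar.gz.

    Hypothetical helper: returns the path of the archive that was created.
    """
    src = Path(dataset_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"{dataset_dir} is not a directory")
    # Archive the folder itself (not just its contents), so unpacking
    # recreates a folder with the same name.
    return shutil.make_archive(
        archive_base, "gztar", root_dir=str(src.parent), base_dir=src.name
    )


def unpack_dataset(archive_path: str, target_parent: str) -> None:
    """Unpack the archive inside target_parent on the server."""
    shutil.unpack_archive(archive_path, target_parent)


# Local computer:
#   pack_dataset("dataset/Salesforce/dialogstudio", "dialogstudio")
# Server, after uploading dialogstudio.tar.gz:
#   unpack_dataset("dialogstudio.tar.gz", "dataset/Salesforce")
```

Unpacking into dataset/Salesforce recreates dataset/Salesforce/dialogstudio, which is the same relative path that was used with save_to_disk.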

An example of the layout on my server: [image not available]

On the server, run:

from datasets import Dataset, load_dataset, load_from_disk
# dataset = load_dataset("Salesforce/dialogstudio", "TweetSumm")
dataset = load_from_disk("dataset/Salesforce/dialogstudio")
dataset

Use load_from_disk to import the dataset from disk.
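If the same script has to run both on a machine with network access and on the offline server, the two loading paths can be combined. This is only a sketch under assumptions: the helper names has_local_copy and load_dialogstudio are made up for illustration, and "has a local copy" is approximated as "the folder exists and is not empty", which does not verify that the folder was actually written by save_to_disk.

```python
from pathlib import Path


def has_local_copy(local_dir: str) -> bool:
    """Return True if local_dir exists and contains at least one entry."""
    p = Path(local_dir)
    return p.is_dir() and any(p.iterdir())


def load_dialogstudio(local_dir: str = "dataset/Salesforce/dialogstudio"):
    """Prefer the copy on disk; fall back to the Hub if it is missing."""
    # Imported here so that has_local_copy works without datasets installed.
    from datasets import load_dataset, load_from_disk

    if has_local_copy(local_dir):
        return load_from_disk(local_dir)
    # This branch needs network access to the Hub.
    return load_dataset("Salesforce/dialogstudio", "TweetSumm")


# dataset = load_dialogstudio()
```

On the server the first branch is taken as soon as the uploaded folder is in place, so the Hub is never contacted.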

Reference

[1] https://blog.csdn.net/weixin_44942303/article/details/129859895


Origin blog.csdn.net/shizheng_Li/article/details/132919987