Problems encountered
Downloading a Hugging Face dataset on the server fails with ConnectionError: Couldn't reach 'Salesforce/dialogstudio' on the Hub (ConnectionError)
The code that triggers it is:
from datasets import load_dataset
dataset = load_dataset("Salesforce/dialogstudio", "TweetSumm")
The relevant part of the error output is:
1451 raise ConnectionError(f"Couldn't reach '{path}' on the Hub ({type(e).__name__})")
1452 elif "404" in str(e):
1453 msg = f"Dataset '{path}' doesn't exist on the Hub"
ConnectionError: Couldn't reach 'Salesforce/dialogstudio' on the Hub (ConnectionError)
Solution
This happens because the server cannot connect to Hugging Face: it has no access to the external network, so the dataset cannot be downloaded there.
The workaround is to download the dataset on a local computer that does have access, then manually upload it to the server.
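Before going the manual route, it can help to confirm that the server really cannot reach the Hub. The following is a minimal sketch using only the standard library; the helper name can_reach_hub, the host huggingface.co and port 443 are assumptions for illustration, and a successful TCP connection does not by itself guarantee that a download would work (for example behind a proxy).

```python
import socket


def can_reach_hub(host="huggingface.co", port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: host and port are assumed defaults, not taken from
    the `datasets` library.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


# Example: run `print(can_reach_hub())` on the server; False suggests
# that the server has no route to the Hub.
```

If this returns False on the server but True on the local computer, the download-and-upload approach described here is a reasonable fit.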
On the local computer, run:
from datasets import load_dataset
dataset = load_dataset("Salesforce/dialogstudio", "TweetSumm")
dataset.save_to_disk("dataset/Salesforce/dialogstudio") # save to this directory
print(dataset) # show the splits and columns that were downloaded
Use save_to_disk to save the dataset to the local disk, then upload the dataset folder to the server, placing it at the same relative path that the server code expects.
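The saved dataset is a directory containing several files, so it may be easier to transfer as a single archive. The following is a minimal sketch using the standard library; the helper names pack_dataset_dir and unpack_dataset_archive are hypothetical, and the upload itself (for example with scp or an SFTP client) is left out.

```python
import shutil
from pathlib import Path


def pack_dataset_dir(src_dir, archive_base):
    """Pack `src_dir` into `<archive_base>.tar.gz` and return the archive path."""
    src = Path(src_dir).resolve()
    base = Path(archive_base).resolve()
    # root_dir/base_dir make the archive contain a single top-level folder
    # named after the last component of src_dir.
    return shutil.make_archive(
        str(base), "gztar", root_dir=str(src.parent), base_dir=src.name
    )


def unpack_dataset_archive(archive_path, dest_dir):
    """Unpack the archive into `dest_dir` (created if it does not exist)."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive_path), str(dest_dir))


# Example (local computer):
#   pack_dataset_dir("dataset/Salesforce/dialogstudio", "dialogstudio")
# Example (server, after uploading dialogstudio.tar.gz):
#   unpack_dataset_archive("dialogstudio.tar.gz", "dataset/Salesforce")
```

Unpacking into dataset/Salesforce on the server recreates dataset/Salesforce/dialogstudio, which matches the path used in the code here.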
On the server, run:
from datasets import load_from_disk
# dataset = load_dataset("Salesforce/dialogstudio", "TweetSumm") # fails without network access
dataset = load_from_disk("dataset/Salesforce/dialogstudio")
print(dataset) # should show the same splits as on the local computer
Use load_from_disk to load the dataset from disk instead of downloading it from the Hub.
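To let the same script run on both machines, the two loading paths can be combined. The following is a minimal sketch, not part of the datasets library: pick_source and load_tweetsumm are hypothetical helper names, and the default directory is the one used in this article.

```python
import os


def pick_source(local_dir):
    """Return "disk" if a saved copy exists at `local_dir`, otherwise "hub"."""
    return "disk" if os.path.isdir(local_dir) else "hub"


def load_tweetsumm(local_dir="dataset/Salesforce/dialogstudio"):
    """Load the TweetSumm config from disk when available, else from the Hub."""
    # Imported here so pick_source can be used without `datasets` installed.
    from datasets import load_dataset, load_from_disk

    if pick_source(local_dir) == "disk":
        # Offline path: reads the folder written by save_to_disk.
        return load_from_disk(local_dir)
    # Online path: requires network access to the Hub.
    return load_dataset("Salesforce/dialogstudio", "TweetSumm")
```

On the server, where the uploaded folder exists, load_tweetsumm() takes the disk path and never contacts the Hub.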