When training Stable Diffusion on a custom dataset, the dataset folder was created according to the Datasets documentation, but running the following test code raised an error:
from datasets import load_dataset
ds = load_dataset('imagefolder', data_files='/xxxxx')
ds["train"][0]
#>>>FileNotFoundError: Unable to find 'xxxxx' at /
The directory layout was checked and is consistent with the demo:
folder/train/metadata.jsonl
folder/train/0001.png
folder/train/0002.png
folder/train/0003.png
After checking the source code, we found that it uses fs.glob to traverse data_files, and a plain path only matches that one level of the folder. If you need multi-level folder traversal, you need to pass in a glob expression, that is, append ** to the path. The relevant line in the source is:
glob_iter = [PurePath(filepath) for filepath in fs.glob(pattern) if fs.isfile(filepath)]
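To illustrate why the bare path matched no files, here is a minimal sketch that uses Python's standard glob module as a stand-in for fs.glob. This is an assumption for illustration only: the real call goes through the filesystem object used by datasets, whose glob semantics may differ in detail. The helper names matched_files and build_demo_folder and the temporary folder layout are hypothetical.

```python
import glob
import os
import tempfile


def matched_files(pattern):
    # Same idea as the source line: expand the pattern, keep only regular files.
    # Standard-library glob is used here as a stand-in for fs.glob (assumption).
    return sorted(p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p))


def build_demo_folder(root):
    # Hypothetical layout modelled on folder/train/... from the demo.
    train = os.path.join(root, "train")
    os.makedirs(train)
    for name in ("metadata.jsonl", "0001.png", "0002.png"):
        with open(os.path.join(train, name), "w") as f:
            f.write("")
    return root


with tempfile.TemporaryDirectory() as tmp:
    root = build_demo_folder(os.path.join(tmp, "folder"))
    # A bare directory path matches only the directory itself, which is not a file.
    plain = matched_files(root)
    # Appending ** lets the pattern descend into subfolders.
    recursive = matched_files(os.path.join(root, "**"))
    print(len(plain), len(recursive))
```

In this sketch the bare path yields an empty list after the isfile filter, while the pattern ending in ** reaches the files under train/, which mirrors the behaviour seen with data_files.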
Change the code to:
from datasets import load_dataset
ds = load_dataset('imagefolder', data_files='/xxxxx/**')
ds["train"][0]
With this change, the dataset loads and the problem is solved.