Summary of Commonly Used Deep Learning Datasets
CV
- ghcn
- climate_sphere
- ModelNet40
- Shrec17 (data + labels)
- cosmo: Spherical convergence maps dataset (Zenodo)
Classification
- Fashion-MNIST
- ImageNet
- CIFAR-10 + CIFAR-100
- CelebA Dataset
- MS-Celeb-1M
- SVHN: The Street View House Numbers Dataset
- Open Images Dataset
NLP
Sentiment Analysis
Text Classification
Dialogue Generation
- Reddit-Thread Dataset
- SimpleQuestions (v2)
- Web data: Amazon reviews
- The WikiText Long Term Dependency Language Modeling Dataset
Other
Audio
Multi-Modal
Classification
- Multi-Modal Sarcasm Detection in Twitter with Hierarchical Fusion Model (2019)
- MUStARD: Multimodal Sarcasm Detection Dataset (ACL, 2019)
- CMU-Multimodal SDK
- UR-FUNNY
- CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotations of Modality (2020)
- IEMOCAP: Interactive Emotional Dyadic Motion Capture Database (2008)
- MM-IMDB
Search & Matching
Image Captioning
VisualQA
Tri-Modal
Other
- SVLD: The Social Vision and Language Dataset
- https://dubbel.eecs.berkeley.edu/minio/login
- AI-NLP-ML GROUP
- https://dumps.wikimedia.org/backup-index-bydb.html
- Chinese Language Corpus
- Chinese NLP dataset search (named entity recognition, text classification, text summarization)
References