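These are datasets I have come across in the past, collected in one place for convenience. The list may not be complete and will be updated over time.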
1. Image Datasets
2. Speech Datasets
Dataset | Link |
---|---|
Google Audioset | https://research.google.com/audioset/dataset/index.html |
TIMIT | http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1 |
VoxForge | http://www.voxforge.org/ |
2000 HUB5 English | https://catalog.ldc.upenn.edu/LDC2002T43 |
LibriSpeech | http://www.openslr.org/12/ |
VoxCeleb | http://www.robots.ox.ac.uk/~vgg/data/voxceleb/ |
OpenSLR | https://www.openslr.org/51 |
CALLHOME American English Speech | https://catalog.ldc.upenn.edu/LDC97S42 |
3. Text Datasets
Dataset | Link |
---|---|
English Broadcast News | https://catalog.ldc.upenn.edu/LDC97S44 |
SQuAD | https://rajpurkar.github.io/SQuAD-explorer/ |
Billion Word Dataset | http://www.statmt.org/lm-benchmark/ |
20 Newsgroups | http://qwone.com/~jason/20Newsgroups/ |
Google Books Ngrams | https://aws.amazon.com/datasets/google-books-ngrams/ |
UCI Spambase | https://archive.ics.uci.edu/ml/datasets/Spambase |
Common Crawl | http://commoncrawl.org/the-data/ |
Yelp Open Dataset | https://www.yelp.com/dataset |
4. Natural Language Datasets
Dataset | Link |
---|---|
Web 1T 5-gram | https://catalog.ldc.upenn.edu/LDC2006T13 |
Blizzard Challenge 2018 | https://www.synsig.org/index.php/Blizzard_Challenge_2018 |
Flickr personal taxonomies | https://www.isi.edu/~lerman/downloads/flickr/flickr_taxonomies.html |
Multi-Domain Sentiment Dataset | http://www.cs.jhu.edu/~mdredze/datasets/sentiment/ |
Enron Email Dataset | https://www.cs.cmu.edu/~./enron/ |
Blogger Corpus | http://u.cs.biu.ac.il/~koppel/BlogCorpus.htm |
Wikipedia Links Data | https://code.google.com/archive/p/wiki-links/downloads |
Gutenberg eBooks List | http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs |
SMS Spam Collection | http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/ |
UCI Spambase | https://archive.ics.uci.edu/ml/datasets/Spambase |
5. Geospatial Datasets
Dataset | Link |
---|---|
OpenStreetMap | https://www.openstreetmap.org |
Landsat 8 | https://landsat.gsfc.nasa.gov/landsat-8/ |
NEXRAD | https://www.ncdc.noaa.gov/data-access/radar-data/nexrad |
Esri Open Data | https://hub.arcgis.com/pages/open-data |
USGS EarthExplorer | https://earthexplorer.usgs.gov/ |
OpenTopography | https://opentopography.org/ |
NASA SEDAC | https://sedac.ciesin.columbia.edu/ |
NASA Earth Observations | https://neo.sci.gsfc.nasa.gov/ |
Terra Populus | https://terra.ipums.org/ |
6. Recommender Systems Datasets
Dataset | Link |
---|---|
MovieLens | https://grouplens.org/datasets/movielens/ |
Million Song Dataset | https://www.kaggle.com/c/msdchallenge |
Last.fm | https://grouplens.org/datasets/hetrec-2011/ |
Book-Crossing Dataset | http://www2.informatik.uni-freiburg.de/~cziegler/BX/ |
Jester | https://goldberg.berkeley.edu/jester-data/ |
Netflix Prize | https://www.netflixprize.com/ |
Pinterest Fashion Compatibility | http://cseweb.ucsd.edu/~jmcauley/datasets.html#pinterest |
Amazon Question and Answer Data | http://cseweb.ucsd.edu/~jmcauley/datasets.html#amazon_qa |
Social Circles Data | http://cseweb.ucsd.edu/~jmcauley/datasets.html#socialcircles |
7. Economics and Finance Datasets
Dataset | Link |
---|---|
Quandl | https://www.quandl.com/ |
World Bank Open Data | https://data.worldbank.org/ |
IMF Data | https://www.imf.org/en/Data |
Financial Times Market Data | https://markets.ft.com/data/ |
Google Trends | https://trends.google.com/trends/?q=google&ctab=0&geo=all&date=all&sort=0 |
American Economic Association | https://www.aeaweb.org/resources/data/us-macro-regional |
US Stock Data | https://github.com/eliangcs/pystock-data |
World Factbook | https://www.cia.gov/library/publications/download/ |
Dow Jones Index Data Set | http://archive.ics.uci.edu/ml/datasets/Dow+Jones+Index |
8. Autonomous Vehicles Datasets
Dataset | Link |
---|---|
BDD100k | https://bdd-data.berkeley.edu/ |
Baidu ApolloScape | http://apolloscape.auto/ |
Comma.ai | https://archive.org/details/comma-dataset |
Oxford RobotCar Dataset | https://robotcar-dataset.robots.ox.ac.uk/ |
Cityscapes Dataset | https://www.cityscapes-dataset.com/ |
CCSAD Dataset | http://aplicaciones.cimat.mx/Personal/jbhayet/ccsad-dataset |
KUL Belgium Traffic Sign Dataset | http://www.vision.ee.ethz.ch/~timofter/traffic_signs/ |
LISA | http://cvrr.ucsd.edu/LISA/datasets.html |
Bosch Small Traffic Light | https://hci.iwr.uni-heidelberg.de/node/6132 |
LaRa Traffic Light Recognition | http://www.lara.prd.fr/benchmarks/trafficlightsrecognition |
WPI Datasets | http://computing.wpi.edu/dataset.html |
Reference:
"A review of deep learning with special emphasis on architectures, applications and recent trends"