Wax Torch Education: How AI programmers can get plenty of open source data for practice

Many beginners in big data, machine learning, and artificial intelligence need a lot of data to practice on. Because they have never worked in depth in these fields, they often struggle to find suitable training data. Today the teachers at Wax Torch Education recommend a few sites that collect open source datasets.

First, relatively simple dataset sites
Data.gov, the US government's open data website, contains more than 190,000 datasets from climate, education, energy, finance and other areas.
data.WorldBank.org, the World Bank's Open Data site, provides datasets in broad categories such as world development indicators and education indicators.

Second, large dataset sites
Amazon Web Services (AWS) datasets, Amazon offers datasets such as the complete Enron e-mail corpus, Google Books n-grams, NASA NEX, and the Million Song dataset. You can use them on the Amazon platform or on a local computer.
Google datasets
Google provides developers with a number of datasets as part of its BigQuery tool, including GitHub public repositories and all the stories and comments from Hacker News.

Third, predictive modeling and machine learning datasets

UCI Machine Learning Repository
The UCI Machine Learning Repository is a currently popular repository that includes a wide variety of datasets, such as air quality data and large GPS trajectory datasets.

Kaggle
Kaggle has launched a dataset platform where people can contribute data on their own initiative. It now hosts more than 350 datasets in total, of which over 200 are featured datasets.

Fourth, image classification datasets
The MNIST Database
One of the most popular image recognition databases at home and abroad, consisting mainly of handwritten digits. It includes a training set of 60,000 examples and a test set of 10,000 examples.
Chars74K
This dataset covers character recognition in natural images and includes 74,000 images.
Frontal Face Images
This dataset consists mainly of frontal face images collected by CMU & MIT.
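
For practice with the MNIST database mentioned above, it helps to know that its image and label files are distributed in the simple IDX binary format: a big-endian header followed by raw unsigned bytes. The following is a minimal sketch of a parser using only the Python standard library. The function names are made up for illustration, and the tiny sample bytes are synthetic stand-ins, not real MNIST data.

```python
import struct

IMAGE_MAGIC = 0x00000803  # IDX magic number: unsigned bytes, 3 dimensions
LABEL_MAGIC = 0x00000801  # IDX magic number: unsigned bytes, 1 dimension


def parse_idx_images(data):
    """Parse IDX image bytes into a list of images (each a list of pixel rows)."""
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != IMAGE_MAGIC:
        raise ValueError(f"unexpected image magic number: {magic:#x}")
    pixels = data[16:]
    if len(pixels) != count * rows * cols:
        raise ValueError("pixel data length does not match the header")
    images = []
    for i in range(count):
        start = i * rows * cols
        image = [
            list(pixels[start + r * cols : start + (r + 1) * cols])
            for r in range(rows)
        ]
        images.append(image)
    return images


def parse_idx_labels(data):
    """Parse IDX label bytes into a list of integer labels."""
    magic, count = struct.unpack(">II", data[:8])
    if magic != LABEL_MAGIC:
        raise ValueError(f"unexpected label magic number: {magic:#x}")
    labels = list(data[8:])
    if len(labels) != count:
        raise ValueError("label count does not match the header")
    return labels


# Synthetic example: two 2x2 "images" and two labels (not real MNIST data).
sample_images = struct.pack(">IIII", IMAGE_MAGIC, 2, 2, 2) + bytes(
    [0, 255, 128, 64, 1, 2, 3, 4]
)
sample_labels = struct.pack(">II", LABEL_MAGIC, 2) + bytes([7, 3])

images = parse_idx_images(sample_images)
labels = parse_idx_labels(sample_labels)
```

The downloadable MNIST files are usually gzip-compressed, so in practice the bytes would likely come from something like `gzip.open(path, "rb").read()` before being passed to these functions.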

Fifth, text classification datasets
Movie Review Data
This dataset provides collections of movie review documents labeled with their overall sentiment polarity (positive or negative) or subjective rating, as well as sentences labeled with their subjectivity status (subjective or objective) or polarity.
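
As a rough illustration of how polarity-labeled review documents like these might be loaded for a text classification exercise, here is a minimal sketch. It assumes the reviews have been extracted into a directory with `pos` and `neg` subfolders of `.txt` files; that layout and the function name `load_polarity_reviews` are assumptions for illustration, so adjust them to match the archive you actually download.

```python
from pathlib import Path


def load_polarity_reviews(root):
    """Return a list of (text, label) pairs: label 1 = positive, 0 = negative.

    Assumes (hypothetically) that `root` contains `pos/` and `neg/`
    subdirectories, each holding one review per .txt file.
    """
    samples = []
    for label, folder in ((1, "pos"), (0, "neg")):
        # Sort file names so the result order is reproducible across runs.
        for path in sorted(Path(root, folder).glob("*.txt")):
            text = path.read_text(encoding="utf-8", errors="replace")
            samples.append((text, label))
    return samples
```

The returned pairs can then be shuffled and split into training and test sets for whichever classifier you want to practice with.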

A Wax Torch Education instructor said that with the dataset sites listed above, even a beginner can easily find the data needed for practice.

Origin blog.51cto.com/14355900/2402736