Open-source datasets and languages that beginners in the AI industry need

Mastering a new technology is not difficult, but it requires a systematic understanding of what you learn and a planned learning path.

First, you need some knowledge of Java, Python, and Linux, which are very popular and in high demand at the moment. If you have never written code in any of the three, it does not matter: as long as you do development work and know other languages, you can pick up the basics of these quickly. Python is widely regarded as the best language for AI development, and it is often used to build telephone robots, intelligent applications, and CRM management systems.

Second, you need to understand big data architecture and be able to build one for enterprise business scenarios. That means being skilled at programming against the most commonly used basic components, such as Hadoop, Spark, and Flume, and being able to assemble those components into a cluster architecture that runs flexibly.

Third, you must be familiar with machine learning algorithms and proficient at using them, choosing the algorithm according to the business problem to be solved. For example, to judge whether a telephone robot is really easy to use, or how long it is used, you need to feed data and results back in and keep adjusting and optimizing. For an information feed, you need to consider two business scenarios, recommendation and deduplication, select suitable algorithms for each, and keep optimizing them against the data and the results.
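As a rough illustration of the deduplication scenario, here is a minimal Python sketch that drops feed items whose normalized text has already been seen. The function names `normalize` and `deduplicate` and the sample feed are hypothetical, and exact-match hashing is only one simple approach; a real information feed would probably also need near-duplicate detection.

```python
import hashlib


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def deduplicate(items):
    """Return items in original order, skipping any whose normalized text was already seen."""
    seen = set()
    unique = []
    for item in items:
        # Hash the normalized text so the set stores fixed-size keys.
        digest = hashlib.sha256(normalize(item).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(item)
    return unique


if __name__ == "__main__":
    # Hypothetical feed: the first two items differ only in case and spacing.
    feed = ["AI news today", "ai  news TODAY", "Big data weekly"]
    print(deduplicate(feed))
```

The first occurrence of each item is kept, so the order of the feed is preserved.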

Many beginners in big data, machine learning, and artificial intelligence need a lot of data to practice on. Because they have never worked in depth in these fields, it is hard for them to find suitable training data, so here are a few recommended open-source dataset sites.

First, relatively simple dataset sites

Data.gov: the US government's open data website, which contains more than 190,000 datasets from areas such as climate, education, energy, and finance.

data.WorldBank.org: the World Bank's open data site, which provides broad categories of datasets such as world development indicators and education indicators.
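Once a file has been downloaded from a site like these, a first look at it can be taken with Python's standard library alone. The sketch below assumes the dataset is a CSV file with a header row; the column names `country`, `year`, and `value`, the sample rows, and the helper names are made up for illustration and do not come from either site.

```python
import csv
import io


def load_rows(csv_file):
    """Read a CSV file object with a header row into a list of dicts."""
    return list(csv.DictReader(csv_file))


def column_mean(rows, column):
    """Average a numeric column, skipping rows where the value is empty."""
    values = [float(row[column]) for row in rows if row[column] != ""]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    # Hypothetical sample standing in for a downloaded file.
    sample = io.StringIO("country,year,value\nA,2019,1.5\nB,2019,\nC,2019,2.5\n")
    rows = load_rows(sample)
    print(len(rows), column_mean(rows, "value"))
```

For a real download, the same functions can be called on a file opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the file was saved.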

Second, large dataset sites

Amazon Web Services (AWS) datasets: Amazon hosts datasets including the complete Enron email corpus, Google Books n-grams, NASA NEX, and the Million Song Dataset. They can be used on the Amazon platform or on a local computer.

Google datasets

Google provides developers with a number of datasets as part of its BigQuery tool, including public GitHub repositories and all the stories and comments from Hacker News.

Third, predictive modeling and machine learning datasets

UCI Machine Learning Repository

The UCI Machine Learning Repository is a currently popular database that holds a wide variety of datasets, including large ones such as air quality and GPS trajectory data.

Kaggle

Kaggle has launched a dataset platform to which people can contribute data on their own initiative. It currently hosts more than 350 datasets in total, of which over 200 are featured datasets.

Fourth, image classification datasets

The MNIST Database

Currently the most popular image recognition database in China and abroad, consisting mainly of handwritten digits. It includes a training set of 60,000 examples and a test set of 10,000 examples.
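The MNIST files are distributed in the IDX binary format, which starts with big-endian 32-bit integers: a magic number (2051 for image files, 2049 for label files), then the item count and, for images, the row and column counts. The sketch below parses such a buffer using only the standard library. The function names are hypothetical, and the tiny in-memory buffers stand in for real, decompressed MNIST files.

```python
import struct


def parse_idx_images(data):
    """Parse IDX image bytes into a list of images, each a list of pixel rows."""
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != 2051:
        raise ValueError("not an IDX image file (magic=%d)" % magic)
    size = rows * cols
    images = []
    for i in range(count):
        start = 16 + i * size
        pixels = data[start:start + size]
        if len(pixels) != size:
            raise ValueError("image data is truncated")
        # Split the flat pixel bytes into rows of integer values (0-255).
        images.append([list(pixels[r * cols:(r + 1) * cols]) for r in range(rows)])
    return images


def parse_idx_labels(data):
    """Parse IDX label bytes into a list of integer labels."""
    magic, count = struct.unpack(">II", data[:8])
    if magic != 2049:
        raise ValueError("not an IDX label file (magic=%d)" % magic)
    labels = list(data[8:8 + count])
    if len(labels) != count:
        raise ValueError("label data is truncated")
    return labels


if __name__ == "__main__":
    # Synthetic stand-in: two 2x2 "images" and two labels.
    image_bytes = struct.pack(">IIII", 2051, 2, 2, 2) + bytes(range(8))
    label_bytes = struct.pack(">II", 2049, 2) + bytes([7, 3])
    print(parse_idx_images(image_bytes))
    print(parse_idx_labels(label_bytes))
```

Run against the real training files, the image and label lists should both have 60,000 entries, matching the counts given above.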

Chars74K

This dataset is for character recognition in natural images and includes 74,000 images.

Frontal Face Images

This dataset consists mainly of frontal face images collected by CMU & MIT.

Fifth, text classification datasets

Movie Review Data

This dataset provides collections of movie review documents labeled with their overall sentiment polarity (positive or negative) or subjective rating, along with sentences labeled with their subjectivity status (subjective or objective) or polarity.

PS: This article is reprinted from Europe intelligently; please credit the source when reposting.


Origin blog.51cto.com/14387331/2412697