How can a big data novice get started with machine learning?

Machine learning is a computational technique commonly used to analyze data, so understanding data depends on understanding machine learning. For years, however, machine learning seemed remote and elusive to most developers.

It is probably one of the highest-paying and most popular technologies today. There is no doubt that machine learning gives developers a stage on which to show their mettle.

Machine learning is a logical extension of simple data retrieval and storage. It builds components that make computers learn and behave more intelligently.

Machine learning makes it possible to mine historical data and predict future trends. You may not realize it, but you already use machine learning and benefit from it a great deal. Examples are everywhere: search engine results, online recommendations, advertising, fraud detection, and spam filtering.
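Spam filtering, one of the examples above, is a classic use of the Naive Bayes classifier. The following is a minimal sketch in plain Python, assuming a tiny hypothetical training set; the functions `train` and `predict` and the sample messages are illustrative only and do not come from any library.

```python
import math
from collections import Counter

def train(docs):
    """Count words per label; docs is a list of (text, label) pairs."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return the label with the highest log posterior (multinomial Naive Bayes)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(label)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace (add-one) smoothing so an unseen word does not zero out the score
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data
TRAINING = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting at noon", "ham"),
    ("project meeting notes", "ham"),
]
model = train(TRAINING)
print(predict(model, "cheap money"))    # → spam
print(predict(model, "meeting notes"))  # → ham
```

Working in log space avoids numeric underflow when multiplying many small word probabilities; a real filter would also need tokenization, far more data, and evaluation on held-out messages.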

Machine learning relies on data to make decisions. Intuition matters, but it is hard to beat empirical data.

Key aspects of machine learning

Once you start exploring machine learning in depth, you will encounter the following topics:

1. Supervised and unsupervised learning
2. Classification
3. Markov models, Bayesian networks, etc.
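As a small illustration of the third topic, a first-order Markov model can be estimated from an observed sequence of states simply by counting transitions. This sketch assumes a hypothetical weather sequence; `estimate_transitions` and `most_likely_next` are illustrative names, and the other topics in the list are not covered here.

```python
from collections import Counter, defaultdict

def estimate_transitions(sequence):
    """Maximum-likelihood estimate of P(next | current) for a first-order Markov chain."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    # Normalize each row of counts into probabilities
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }

def most_likely_next(transitions, state):
    """Pick the successor state with the highest estimated probability."""
    return max(transitions[state], key=transitions[state].get)

# Hypothetical sequence of daily observations
observed = ["sunny", "sunny", "rainy", "sunny", "rainy", "rainy", "sunny"]
P = estimate_transitions(observed)
print(P)
print(most_likely_next(P, "sunny"))  # → rainy
```

The key modeling assumption is that tomorrow depends only on today; hidden Markov models and Bayesian networks relax or generalize this in different ways.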

Mahout and Hadoop

The Apache Mahout project aims to build scalable machine learning libraries.

Big data analysis and Hadoop overlap to a certain degree.


Mahout implements clustering, classification, and collaborative filtering algorithms, including:

1. Matrix factorization for recommender systems
2. K-means and fuzzy k-means clustering
3. Latent Dirichlet Allocation
4. Singular value decomposition
5. Logistic regression classifier
6. (Complementary) Naive Bayes classifier
7. Random forest classifier
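To give a feel for the second algorithm in the list, here is a minimal sketch of k-means (Lloyd's algorithm) on 2-D points in plain Python. It is not Mahout's API and does not scale out the way Mahout does; the sample points are hypothetical, and seeding the centroids with the first k points is a simplification (practical implementations typically use random or k-means++ initialization).

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm for 2-D points; initial centroids are simply the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty)
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                xs, ys = zip(*cluster)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[i])
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print([(round(x, 2), round(y, 2)) for x, y in centroids])
print(clusters)
```

Fuzzy k-means, also in the list, differs mainly in the assignment step: each point gets a degree of membership in every cluster instead of belonging to exactly one.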

Machine learning once required complex software, high-end computers, and data scientists. Today, for machine learning and predictive analytics, all you need is a fully managed cloud service.

Using drag-and-drop and a few data flow graphs, you can run experiments with common algorithms without writing the code from scratch.

Data scientists write code in R

R is a very popular open-source project for statistics and data mining. The good news is that R integrates easily into ML Studio. Many of my friends use functional languages such as F# for machine learning, but R clearly still dominates this field.

Text mining and survey data show that R has grown steadily more popular in recent years. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland in New Zealand and is now developed by the R Core Team (R Development Core Team), whose members include John Chambers. The name R comes partly from the first letter of the first names of its two creators. R is a GNU project written mainly in C and Fortran.

How to analyze data

The best way to understand machine learning analysis is to break it down into three questions:

1. What happened?

a) Look at it from a historical perspective

2. What will happen?

a) predict the future

3. What should we do next?

a) Prescribe actions and provide guidance
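The three questions correspond to descriptive, predictive, and prescriptive analysis. The sketch below walks through all three under stated assumptions: the monthly sales figures, the stock level, and the reorder rule are hypothetical, and a least-squares straight line stands in for a real forecasting model.

```python
from statistics import mean

# Hypothetical monthly sales figures, oldest first
sales = [100, 110, 120, 130]

# 1. What happened? Describe the history.
average = mean(sales)

# 2. What will happen? Fit y = intercept + slope * t by least squares, then extrapolate.
def linear_trend(values):
    """Ordinary least-squares line through (t, value) for t = 0, 1, ...; needs at least 2 values."""
    ts = list(range(len(values)))
    t_bar, y_bar = mean(ts), mean(values)
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, values))
             / sum((t - t_bar) ** 2 for t in ts))
    return y_bar - slope * t_bar, slope

intercept, slope = linear_trend(sales)
forecast = intercept + slope * len(sales)  # prediction for the next month

# 3. What should we do next? A hypothetical rule: order enough stock to cover the forecast.
stock_on_hand = 120
order_quantity = max(0, forecast - stock_on_hand)

print(f"average {average}, forecast {forecast}, order {order_quantity}")
```

Each step builds on the previous one: the description feeds the model, and the model's forecast feeds the decision.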

The roles people play in analysis

1. Information workers

a) Typically use self-service tools such as Power BI. Power BI for Office 365 is a self-service business intelligence solution that lets information workers analyze and visualize data and uncover business insights through Excel and Office 365.

2. IT experts

a) Work on data transformation, data warehousing, building data cubes for analysis, and data modeling

3. Data scientists

a) Deep technical skills, including coding, mathematics, statistics, and probability
b) Use a range of techniques to predict probabilities (for example, a 42% probability of a rise within the next 18 hours)
c) Use methods such as Monte Carlo simulation and parametric models
d) Qualities of a data scientist:

i. Domain knowledge
ii. A clear understanding of the scientific method: objectives, hypotheses, verification, transparency
iii. Strength in mathematics and statistics
iv. Curiosity and strong thinking skills
v. Graphical presentation and communication skills
vi. Advanced computing and data management skills
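Item c) mentions Monte Carlo simulation. The sketch below estimates a probability like the one in item b), the chance that a price ends higher after 18 hours, by simulating many random paths. The model (independent, normally distributed hourly changes with a fixed drift and volatility), its parameter values, and the function name `probability_of_rise` are hypothetical assumptions for illustration; because this simple parametric model also has a closed-form answer, the code compares the two as a sanity check.

```python
import random
from statistics import NormalDist

def probability_of_rise(hours, drift, volatility, trials=50_000, seed=42):
    """Monte Carlo estimate of P(total change > 0) after `hours` steps,
    assuming (hypothetically) independent normal hourly changes."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    rises = 0
    for _ in range(trials):
        change = sum(rng.gauss(drift, volatility) for _ in range(hours))
        if change > 0:
            rises += 1
    return rises / trials

estimate = probability_of_rise(hours=18, drift=-0.01, volatility=0.1)

# Under this model the total change is Normal(18 * drift, volatility * sqrt(18)),
# so the exact probability is known and can check the simulation.
exact = 1 - NormalDist(mu=18 * -0.01, sigma=0.1 * 18 ** 0.5).cdf(0)
print(f"simulated {estimate:.3f}, exact {exact:.3f}")
```

Monte Carlo earns its keep when no closed form exists, for example with path-dependent rules or non-normal shocks; the standard error shrinks roughly as one over the square root of the number of trials.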

Academic background

If you want to go to school to become a data scientist, you can choose from the following majors:

1. Applied Mathematics
2. Computer Science
3. Economics
4. Statistics
5. Engineering

Industries that benefit from data science include:

Financial services
Telecommunications
Manufacturing
Utilities
Public health
Marketing





Origin blog.csdn.net/chengxvsyu/article/details/92181282