What Mathematical Foundations Does Big Data Technology Involve?

Because of my work, I am surrounded by two kinds of people: university students still in school, and R&D engineers at IT companies. They represent two extremes in how mathematics is learned and applied. University students, especially freshmen and sophomores, take mathematics courses every semester, such as mathematical analysis, linear algebra, and number theory. They may hear the story of the Leibniz-Newton dispute or Descartes's love story in class, but they are often confused because they do not know what the mathematics they learn is actually for. IT R&D staff, before moving into big data roles, always think they should study mathematics first, but in the vast world of mathematics, where exactly are the parts that matter for big data technology?

When big data technology comes up, many people think of mathematics, probably because numbers hold such a solid place in the mathematical system that the connection is taken for granted. This article looks into the question of the mathematical foundations of big data technology.

Mathematics has three major branches: algebra, geometry, and analysis, and each has spawned many smaller branches as research developed. Within this system, the areas most closely related to the mathematical foundations of big data technology are the following. Note that because many mathematical methods are involved, their specific applications are covered in the chapters on models, algorithms, and privacy protection in my book "Big Data Processing and Internet Applications". What follows is only a general overview to give you a broad understanding.

(1) Probability and Mathematical Statistics

This area is very closely tied to the development of big data technology. Basic concepts such as conditional probability and independence, random variables and their distributions, multidimensional random variables and their distributions, analysis of variance and regression analysis, stochastic processes (especially Markov processes), parameter estimation, and Bayesian theory are all important in big data modeling and mining. Big data is naturally high-dimensional, so designing and analyzing data models in a high-dimensional space requires a foundation in multidimensional random variables and their distributions. Bayes' theorem is one of the foundations for constructing classifiers. Beyond these basics, conditional random fields (CRF), hidden Markov models, and n-gram models can be used for vocabulary and text analysis in big data, and for building predictive and classification models.
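
To make the role of Bayes' theorem in classification concrete, here is a minimal sketch. The function name `posterior` and the spam/ham numbers are made up for illustration; they do not come from any real dataset or from the book.

```python
def posterior(priors, likelihoods):
    """Return P(class | evidence) for each class via Bayes' theorem.

    priors[c]      = P(c)
    likelihoods[c] = P(evidence | c)
    """
    # Numerator of Bayes' theorem: P(evidence | c) * P(c)
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    # Denominator: P(evidence), by the law of total probability
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


# Hypothetical numbers: 20% of mail is spam; a given word appears
# in 60% of spam messages and 5% of legitimate ("ham") messages.
post = posterior({"spam": 0.2, "ham": 0.8}, {"spam": 0.6, "ham": 0.05})

# A Bayes classifier simply picks the class with the largest posterior.
predicted = max(post, key=post.get)
```

With these assumed numbers the posterior probability of spam is roughly 0.75, so the message is classified as spam even though spam is the rarer class a priori.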

Of course, information theory, which builds on probability theory, also plays a role in big data analysis: feature selection methods such as information gain and mutual information come from information theory.
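
As a rough illustration of information gain, the sketch below computes the entropy of a label column and how much a discrete feature reduces it. The function names and the tiny toy data are assumptions for this example only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for a discrete feature."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # Conditional entropy: weighted entropy of labels within each feature value
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data: one feature that separates the classes perfectly, one that does not.
labels = ["pos", "pos", "neg", "neg"]
perfect = [1, 1, 0, 0]
useless = [1, 0, 1, 0]

gain_perfect = information_gain(perfect, labels)
gain_useless = information_gain(useless, labels)
```

For discrete variables, this quantity is the same as the mutual information between the feature and the label, which is why the two are often mentioned together in feature selection.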

(2) Linear Algebra

This area of mathematics is also closely tied to big data technology. Matrices, transposes, block matrices, rank, vectors, orthogonal matrices, vector spaces, eigenvalues, and eigenvectors are all commonly used in big data modeling and analysis.

In Internet big data, many of the objects analyzed in application scenarios can be abstracted as matrices: large numbers of Web pages and the links among them, microblog users and their relationships, the relationship between texts and vocabulary in a text collection, and so on. For example, when Web pages and their relationships are expressed as a matrix, each element represents the relationship between a page a and another page b. This relationship can be directed: 1 means there is a hyperlink from a to b, and 0 means there is no hyperlink from a to b. The famous PageRank algorithm quantifies the importance of pages based on this matrix, and its convergence can be proved.
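
The link matrix described above can be turned into page scores with a simple power iteration. This is a minimal sketch, not the production algorithm: the damping factor 0.85, the handling of pages without outgoing links, and the three-page graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank.

    links[i][j] == 1 means page i has a hyperlink to page j, 0 means it does not.
    """
    n = len(links)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Every page receives a small baseline share (the "random jump").
        new = [(1.0 - damping) / n] * n
        for i, row in enumerate(links):
            out = sum(row)
            if out == 0:
                # Page with no outgoing links: spread its rank over all pages.
                for j in range(n):
                    new[j] += damping * rank[i] / n
            else:
                # Otherwise split its rank evenly among the pages it links to.
                for j, linked in enumerate(row):
                    if linked:
                        new[j] += damping * rank[i] / out
        # Stop once the scores have stopped changing.
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    return rank


# Hypothetical three-page Web: a -> b, a -> c, b -> c, c -> a
links = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
ranks = pagerank(links)
```

In this toy graph, page c is linked to by both other pages and ends up with the highest score, followed by a and then b.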

Matrix-based operations such as matrix decomposition are one way to extract features of the objects being analyzed. Because a matrix represents a transformation or mapping, the matrices obtained by decomposition represent new features of the analyzed objects in a new space. As a result, singular value decomposition (SVD), PCA, NMF, MF, and similar methods are widely used in big data analysis.
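
To give a feel for what a decomposition extracts, the sketch below approximates only the largest singular value and its right singular vector using power iteration on A^T A. It is a simplified stand-in for a full SVD, written in plain Python; the function name and the small matrix are hypothetical, and it assumes a nonzero matrix whose top direction is not orthogonal to the all-ones starting vector.

```python
from math import sqrt


def top_singular(matrix, iters=200):
    """Approximate the largest singular value and right singular vector."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0] * cols  # starting guess
    for _ in range(iters):
        # One power-iteration step on A^T A: u = A v, then w = A^T u
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The singular value is the length of A v for the unit vector v.
    u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = sqrt(sum(x * x for x in u))
    return sigma, v


# Hypothetical 2x2 matrix whose dominant direction is the first column.
a = [[3.0, 0.0],
     [0.0, 1.0]]
sigma, v = top_singular(a)
```

The returned vector is the single "new feature" direction that captures the most of the matrix; a full SVD or PCA repeats this idea for further directions.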

(3) Optimization Methods

Training a model is how the parameters of many analysis and mining models are obtained, and the underlying problem is this: given a function f: A → R, find an element a0 ∈ A such that f(a0) ≤ f(a) for all a in A (minimization), or such that f(a0) ≥ f(a) for all a in A (maximization). Optimization methods depend on the form of the function. At present they are usually based on differentiation and derivatives, for example gradient descent, hill climbing, least squares, and the conjugate gradient method.
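
The minimization problem above can be sketched with plain gradient descent on a one-dimensional function. The function name, step size, iteration count, and the example f(a) = (a - 3)^2 are assumptions picked for illustration.

```python
def gradient_descent(grad, a0, step=0.1, iters=1000):
    """Repeatedly move against the gradient, starting from a0."""
    a = a0
    for _ in range(iters):
        a = a - step * grad(a)
    return a


# Example objective: f(a) = (a - 3)^2, whose derivative is f'(a) = 2 * (a - 3).
# Its minimum is at a = 3.
a_min = gradient_descent(lambda a: 2.0 * (a - 3.0), a0=0.0)
```

For this convex example any small enough step size converges to the minimizer; for the non-convex functions common in model training, gradient descent may only reach a local minimum.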

(4) Discrete Mathematics

The importance of discrete mathematics is self-evident. It is the foundation of every branch of computer science, and so naturally an important foundation of big data technology as well. I will not go into it here.

Finally, many people think they can do big data development and applications even if their math is poor. That is not quite right: you need to be clear about what role you play in big data development and applications. Looking at the layers where big data research and applications can start, the mathematics above matters mainly at the data mining and model layer, where this knowledge and these methods must be mastered.

Of course, at other layers it is also very worthwhile to use these mathematical methods to improve algorithms. At the data acquisition layer, for example, a probability model can be used to estimate the value of the pages a crawler collects, so that it can make better decisions. At the big data computing and storage layer, block matrix computation can be used for parallel computing.
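
The block matrix idea can be sketched as follows. This version runs sequentially in plain Python; the point is only that the result is assembled tile by tile, and each output tile (i0, j0) could in principle be handed to a different worker. The function names and the block size are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product, used here as a reference."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def block_matmul(a, b, bs):
    """Matrix product computed tile by tile with block size bs."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i0 in range(0, n, bs):          # row band of the output
        for j0 in range(0, p, bs):      # column band of the output
            # Output tile (i0, j0) only needs one row band of a and one
            # column band of b, so tiles are independent of each other.
            for k0 in range(0, m, bs):
                for i in range(i0, min(i0 + bs, n)):
                    for j in range(j0, min(j0 + bs, p)):
                        c[i][j] += sum(a[i][k] * b[k][j]
                                       for k in range(k0, min(k0 + bs, m)))
    return c


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
blocked = block_matmul(a, b, bs=2)
```

The blocked result matches the plain product; in a real system the tiles would be distributed across machines or cores rather than looped over in one process.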

If you work on big data technology at other layers, you do not need many mathematical methods, as long as you can write the code.

Origin blog.csdn.net/HAOXUAN168/article/details/104101986