40 trillion "new infrastructure" Come! The new programmers the opportunity to stand up finally came!


Since March, "new infrastructure" has become a hot word, the country's 31 provinces announced investment in the future "new infrastructure" in total investment has more than 40 trillion!

This means that the seven areas covered by "new infrastructure" (5G infrastructure, ultra-high-voltage power transmission, intercity high-speed rail and rail transit, charging stations for new energy vehicles, big data centers, artificial intelligence, and the industrial internet) will usher in a new round of strategic growth.

Among them, artificial intelligence will keep broadening its business scenarios and industry applications as the technology improves. According to forecasts from the Qianzhan Industry Research Institute, the artificial intelligence market is expected to grow by 45% in 2020, far exceeding the growth rate of the global market.

With the central government setting the tone for "new infrastructure," the internet giants continue to bet heavily on artificial intelligence. The AI industry, however, faces a huge talent gap. In this environment, if you intend to apply for a related technical position, hard skills are what count, and you should hone them as early as possible, for example by working seriously through interview questions so you can come out on top in a future interview.

Below we share two classic algorithm-position interview questions, selected from the book Hundred-Face Machine Learning: An Algorithm Engineer Takes You to Interviews.

(Click the image to purchase at 50% off ....)

The differences and connections between LDA (linear discriminant analysis) and PCA

First, we extend LDA to the multi-class, high-dimensional case so that it parallels the PCA solution in Question 1. Suppose there are N classes and the features ultimately need to be reduced to d dimensions. We therefore need to find a d-dimensional projection hyperplane W = (w1, w2, ..., wd) such that the projected sample points satisfy LDA's objective: maximize the between-class distance and minimize the within-class distance.

Recall the two scatter matrices from two-class LDA: the within-class scatter matrix Sw still satisfies its definition when the number of classes increases to N, but the between-class scatter matrix of the earlier two-class problem can no longer follow its original definition once more classes are added. Figure 4.6 shows the distribution of samples from three classes, where brown, yellow, and green mark the centers of the three classes of samples, μ denotes the mean of the three centers (i.e., the center of all samples), and Swi denotes the within-class scatter of class i. We can define a new matrix St to represent the overall scatter of the data, called the global scatter matrix:

St = Σ_{i=1}^{n} (xi − μ)(xi − μ)^T

If the global scatter is defined as the sum of the within-class scatter and the between-class scatter, i.e., St = Sb + Sw, then the between-class scatter matrix can be expressed as

Sb = St − Sw = Σ_{j=1}^{N} mj (μj − μ)(μj − μ)^T     (4.29)

where mj is the number of samples in the j-th class and N is the total number of classes. From Equation (4.29) we can see that the between-class scatter is expressed as the weighted distance from each class center to the global center. What maximizing the between-class scatter actually optimizes is that, after projection, the center of each class is far enough from the projected global center.

According to the principle of LDA, the maximization objective can be defined as

J(W) = tr(W^T Sb W) / tr(W^T Sw W)

where W is the projection hyperplane to be solved for, with W^T W = I. Based on some of the conclusions in Questions 2 and 3, we can deduce that maximizing J(W) corresponds to solving the following generalized eigenvalue problem:

Sb w = λ Sw w

Solving for the optimal projection hyperplane amounts to finding the matrix formed by the eigenvectors corresponding to the d largest eigenvalues of Sw^{-1} Sb, which projects the original feature space into the new d-dimensional space. At this point we arrive at a solution procedure similar to PCA's, but the LDA method handles high-dimensional data with multiple class labels:


(1) Compute the mean vector μj of the samples in each class of the data set, as well as the overall mean vector μ.

(2) Compute the within-class scatter matrix Sw and the global scatter matrix St, and from them the between-class scatter matrix Sb = St − Sw.

(3) Perform an eigenvalue decomposition of the matrix Sw^{-1} Sb and sort the eigenvalues in descending order.

(4) Take the eigenvectors w1, w2, ..., wd corresponding to the d largest eigenvalues; the n-dimensional samples are then mapped into d dimensions by the projection x' = (w1^T x, w2^T x, ..., wd^T x)^T.
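To make these four steps concrete, here is a minimal NumPy/SciPy sketch of multi-class LDA following the procedure above. It is an illustrative implementation under my own assumptions (the function name lda_fit, the toy data, and solving the generalized eigenproblem with scipy.linalg.eig are my choices), not code from the book.

```python
import numpy as np
from scipy.linalg import eig  # generalized eigenvalue solver

def lda_fit(X, y, d):
    """Reduce n-dimensional samples X (shape [m, n]) with labels y to d dimensions.
    Illustrative sketch of the four steps above, not the book's reference code."""
    classes = np.unique(y)
    mu = X.mean(axis=0)                       # (1) overall mean vector
    n = X.shape[1]

    Sw = np.zeros((n, n))                     # (2) within-class scatter
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)                # (1) per-class mean vector
        Sw += (Xc - mu_c).T @ (Xc - mu_c)
    St = (X - mu).T @ (X - mu)                # (2) global scatter
    Sb = St - Sw                              # (2) between-class scatter: Sb = St - Sw

    # (3) eigendecomposition of Sw^{-1} Sb, solved as the generalized problem Sb w = lambda Sw w
    eigvals, eigvecs = eig(Sb, Sw)
    order = np.argsort(eigvals.real)[::-1]    # eigenvalues in descending order

    # (4) keep the eigenvectors of the d largest eigenvalues and project
    W = eigvecs[:, order[:d]].real            # n x d projection matrix
    return X @ W, W

# Tiny usage example with random 3-class, 5-dimensional data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(20, 5)) for c in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 20)
X_low, W = lda_fit(X, y, d=2)
print(X_low.shape)  # (60, 2)
```

In practice Sw can be singular when there are fewer samples than dimensions, in which case a regularized or pseudo-inverse variant would be needed; the sketch ignores that case for brevity.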

Judging from the solution procedures of these two dimensionality reduction methods, PCA and LDA indeed have many similarities, but the principles behind them are somewhat different.

First, in terms of objectives: PCA selects the projection directions in which the data variance is largest. Since it is unsupervised, PCA assumes that the larger the variance, the more information there is; the principal components represent the original data in fewer dimensions with redundancy removed, thereby achieving dimensionality reduction. LDA, in contrast, selects projection directions with small within-class variance and large between-class variance. It uses the class label information to find the discriminative dimensions of the data, so that after projection onto these directions the different classes of the original data are separated as much as possible.

As a simple example, in speech recognition, if we want to extract the human voice from an audio signal, we can use PCA for dimensionality reduction to filter out background noise at some fixed frequencies (which has small variance). But if our goal is to determine which person the voice in the audio belongs to, we should use LDA to reduce the dimensionality of the data, so that each person's speech signal remains distinguishable.
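To illustrate this difference in objectives, the following hedged sketch projects the same labeled toy data with PCA and with LDA using scikit-learn. The data set (two classes that differ along a low-variance axis) and all parameters are arbitrary choices for demonstration, not an example from the book.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy labeled data: two classes that differ mainly along a low-variance direction
rng = np.random.default_rng(42)
cls0 = rng.normal(loc=[0.0, 0.0], scale=[5.0, 0.3], size=(200, 2))
cls1 = rng.normal(loc=[0.0, 1.5], scale=[5.0, 0.3], size=(200, 2))
X = np.vstack([cls0, cls1])
y = np.array([0] * 200 + [1] * 200)

# PCA ignores the labels and keeps the direction of largest variance (the x-axis here)
x_pca = PCA(n_components=1).fit_transform(X)

# LDA uses the labels and keeps the direction that best separates the classes (the y-axis here)
x_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)

# Compare how well each 1-D projection separates the class means relative to the spread
for name, z in [("PCA", x_pca), ("LDA", x_lda)]:
    gap = abs(z[y == 0].mean() - z[y == 1].mean()) / z.std()
    print(f"{name}: normalized class-mean gap = {gap:.2f}")
```

On data like this, PCA keeps the high-variance direction that carries no class information, while LDA keeps the low-variance direction that actually separates the two classes.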

In addition, PCA and LDA are both frequently used in the field of face recognition. PCA-based face recognition is also called the eigenface method: each face image is unrolled into a high-dimensional row vector, and an eigenvalue decomposition is performed on the covariance matrix of many such face vectors; the eigenvectors corresponding to the larger eigenvalues look somewhat like human faces, hence the name eigenfaces. The paper "Eigenfaces for Recognition" represents a face with 7 eigenface features (see Figure 4.7), so the original 65536-dimensional image features can be reduced to 7 dimensions at a stroke, and face recognition is carried out in the reduced space. However, because it uses PCA for dimensionality reduction, it generally keeps the features that best describe the data (the principal components) rather than the features that best discriminate between classes. If we want better recognition performance, we should reduce the dimensionality of the data set with LDA instead, so that the projections of different faces retain discriminative features.
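As a rough sketch of the eigenface idea described above, the snippet below unrolls images into row vectors and runs PCA on them. A random array stands in for a real face data set so that the sketch runs on its own; the image size and the choice of 7 components simply mirror the numbers mentioned in the text.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for a face data set: 100 grayscale images of 256x256 = 65536 pixels,
# each unrolled into one high-dimensional row vector (random data is used here
# only so the sketch runs without a real face data set).
rng = np.random.default_rng(0)
faces = rng.random((100, 256 * 256))

# PCA keeps the eigenvectors of the covariance matrix with the largest eigenvalues;
# reshaped back to 256x256, these components are the "eigenfaces".
pca = PCA(n_components=7)
codes = pca.fit_transform(faces)          # each face is now described by 7 numbers
eigenfaces = pca.components_.reshape(7, 256, 256)

print(codes.shape)       # (100, 7): 65536-dimensional faces reduced to 7 dimensions
print(eigenfaces.shape)  # (7, 256, 256)
```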

From an application standpoint, we can follow a basic rule of thumb: use PCA for dimensionality reduction in unsupervised tasks, and LDA in supervised ones.


Proof of the convergence of the K-means algorithm

First, we need to recognize that the iterative procedure used in K-means clustering is in fact a kind of Expectation-Maximization algorithm, or EM algorithm for short. The EM algorithm solves the problem of estimating the parameters of a probabilistic model that contains latent variables which cannot be observed. Suppose there are m observed samples x(1), x(2), ..., x(m) and the model parameter is θ; the log-likelihood function to be maximized can be written as:

θ = argmax_θ Σ_{i=1}^{m} log P(x(i); θ)

When the probabilistic model also contains an unobservable latent variable z, the maximum likelihood estimate of the parameters becomes:

θ = argmax_θ Σ_{i=1}^{m} log Σ_{z(i)} P(x(i), z(i); θ)

Since z(i) is unknown, the parameters cannot be obtained directly by maximum likelihood estimation, so we need the EM algorithm to solve the problem. Suppose the distribution corresponding to z(i) is Qi(z(i)), satisfying Σ_{z(i)} Qi(z(i)) = 1 and Qi(z(i)) ≥ 0. Using Jensen's inequality, we get:

Σ_{i=1}^{m} log Σ_{z(i)} P(x(i), z(i); θ) = Σ_{i=1}^{m} log Σ_{z(i)} Qi(z(i)) · P(x(i), z(i); θ) / Qi(z(i)) ≥ Σ_{i=1}^{m} Σ_{z(i)} Qi(z(i)) log ( P(x(i), z(i); θ) / Qi(z(i)) )

For the equality in the expression above to hold, we need P(x(i), z(i); θ) / Qi(z(i)) = c, where c is a constant, together with Σ_{z(i)} Qi(z(i)) = 1; therefore

Qi(z(i)) = P(x(i), z(i); θ) / Σ_{z(i)} P(x(i), z(i); θ) = P(z(i) | x(i); θ).

Denote the function on the right-hand side of the inequality by r(x | θ). When the equality holds, we have in effect found a lower bound that approximates the function to be optimized; by maximizing this lower bound we can then make the function to be optimized better.

Figure 5.5 shows a one-dimensional example in θ, where the brown curve represents the function to be optimized, denoted f(θ); the optimization process is to find the θ that maximizes f(θ). At the current value of θ (the green position in the figure), we can compute Qi(z(i)) = P(z(i) | x(i); θ), and the function on the right-hand side of the inequality (denoted r(x | θ)) then gives a lower bound on the function being optimized, shown as the blue curve in the figure; the two curves take the same value at θ. Next, we find the parameter θ' that maximizes r(x | θ), i.e., the red position in the figure, and at that point f(θ') is improved over f(θ) (at the green position). It can be shown that f(θ') ≥ r(x | θ') ≥ r(x | θ) = f(θ), so the function values are monotonically non-decreasing; moreover, the function is bounded. Since a monotonic and bounded sequence must converge, the convergence of the EM algorithm is proved. However, the EM algorithm only guarantees convergence to a local optimum. When the function is non-convex, taking Figure 5.5 as an example, if the initialization falls in the region on the left, the algorithm cannot find the high point on the right.

From the derivation above, the EM algorithm can be summarized in the following framework, alternating between two steps until convergence:

(1) E-step: compute the expectation with respect to the latent variables

Qi(z(i)) = P(z(i) | x(i); θ)

(2) M-step: maximize the lower bound

θ = argmax_θ Σ_{i=1}^{m} Σ_{z(i)} Qi(z(i)) log ( P(x(i), z(i); θ) / Qi(z(i)) )
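To ground this two-step framework, here is a hedged sketch of EM for a simple model of my own choosing, a 1-D mixture of two unit-variance Gaussians; it only illustrates alternating the E-step and M-step until the estimates settle, and is not an example taken from the book.

```python
import numpy as np

def em_two_gaussians(x, n_iter=50):
    """EM for a 1-D mixture of two unit-variance Gaussians (illustrative only).
    Parameters theta = (pi, mu0, mu1); z(i) in {0, 1} is the latent component."""
    pi, mu = 0.5, np.array([x.min(), x.max()])      # crude initialization
    for _ in range(n_iter):
        # E-step: Q_i(z(i)) = P(z(i) | x(i); theta), the posterior responsibility
        p1 = pi * np.exp(-0.5 * (x - mu[1]) ** 2)
        p0 = (1 - pi) * np.exp(-0.5 * (x - mu[0]) ** 2)
        q1 = p1 / (p0 + p1)
        # M-step: maximize the lower bound; for this model it has a closed form
        pi = q1.mean()
        mu = np.array([np.sum((1 - q1) * x) / np.sum(1 - q1),
                       np.sum(q1 * x) / np.sum(q1)])
    return pi, mu

# Usage: recover the two means of a synthetic mixture
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(2.0, 1.0, 700)])
print(em_two_gaussians(x))   # roughly (0.7, array([-2., 2.]))
```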

What remains is to explain the relationship between the K-means algorithm and the EM algorithm. The K-means algorithm is equivalent to using the EM algorithm to solve the following maximum likelihood problem with hidden variables:

P(x, z = k | μ1, μ2, ..., μK) ∝ exp(−‖x − μk‖²) if ‖x − μk‖ = min_j ‖x − μj‖, and equals 0 otherwise,

which is a model with a hidden variable z. Intuitively, when sample x is closest to the center μk of the k-th cluster, the probability is proportional to exp(−‖x − μk‖²), and it is 0 otherwise.

In the E-step, we compute

Qi(z(i) = k) = 1 if ‖x(i) − μk‖ = min_j ‖x(i) − μj‖, and 0 otherwise.

This is exactly equivalent to the step in the K-means algorithm that finds, for each point x(i), the cluster z(i) whose current center is nearest.

In the M-step, we find the optimal parameters μ1, μ2, ..., μK that maximize the likelihood function:

argmax_{μ1,...,μK} Σ_{i=1}^{m} Σ_{z(i)} Qi(z(i)) log ( P(x(i), z(i); μ1, ..., μK) / Qi(z(i)) )

After some derivation, this reduces to

argmin_{μ1,...,μK} Σ_{i=1}^{m} ‖x(i) − μ_{z(i)}‖²

Therefore, this step is equivalent to finding the optimal center points μ1, μ2, ..., μK that minimize the loss function. Since each sample x(i)'s corresponding cluster z(i) has already been determined at this point, the optimal center μk of each cluster k can be obtained as the mean of all points in that cluster, which is exactly equivalent to the step in the K-means algorithm that updates the cluster centers according to the current assignment of points to clusters.
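The correspondence between the two algorithms can be seen directly in a bare-bones K-means loop: the E-step is the hard assignment of each point to its nearest center, and the M-step recomputes each center as the mean of its assigned points. The sketch below is an illustrative NumPy implementation under those assumptions (the function name, initialization, and stopping rule are my own choices), not the book's code.

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Minimal K-means written as alternating E and M steps (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]  # initial centers mu_k
    for _ in range(n_iter):
        # E-step: assign each x(i) to its nearest current center, i.e. z(i) = argmin_k ||x(i) - mu_k||^2
        dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        z = dist2.argmin(axis=1)
        # M-step: with z fixed, the loss sum_i ||x(i) - mu_{z(i)}||^2 is minimized by the cluster means
        new_centers = np.array([X[z == k].mean(axis=0) if np.any(z == k) else centers[k]
                                for k in range(K)])
        if np.allclose(new_centers, centers):   # converged: centers stop changing
            break
        centers = new_centers
    return centers, z

# Usage on toy 2-D data drawn from three blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in ([0, 0], [3, 0], [0, 3])])
centers, z = kmeans(X, K=3)
print(centers)
```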

...... These are "a hundred faces Machine Learning: Algorithms Engineer take you to the interview," in part of the essence.

15 front-line algorithm engineers

from Hulu, a world-leading streaming media company,

interviewed hundreds of candidates,

wrote 124 original interview questions based on real interview scenarios,

and spent six months editing them collectively.

On its first day on sale, the book topped JD.com's bestseller list for new computer books.

Original price: 89 yuan.

With the 50%-off back-to-school promotion, it's only 44.5 yuan~

The e-book edition has been added to the e-book VIP membership card: buy the VIP card and you can read hundreds of e-books for free. Besides free reading, this VIP card comes with more perks waiting for you to claim. Scroll down ↓

Photos of the authors

"Hundred face machine learning" learning context diagram

Harry Shum, former Executive Vice President of Microsoft and member of the US National Academy of Engineering, highly praises the book: "This book is dedicated to popularizing artificial intelligence and machine learning, helping every software engineer become a confident AI practitioner and every data scientist become an outstanding AI researcher."

"The wave of the summit," "mathematical beauty" Wu Jun also very reputation of the book:. "This book is teaching how we build bridges between theory and computer algorithms and specific application which allows practitioners to computer recognition of the theory of a leap, but also allows non-computer professional computer science engineers understand this powerful tool. "


Scan the code to buy at 50% off.
To seize the AI opportunity and win the spring recruitment season,

read Hundred-Face Machine Learning: An Algorithm Engineer Takes You to Interviews!


If you want more e-books or videos,

you can join our chat group~

Add our assistant on WeChat and include the note "codebook".

Tap "Read the original article" and let's charge up together!



Source: blog.csdn.net/csdnnews/article/details/105021195