A hidden thinking trap in principal component regression

I recently ran into something odd while running principal component regression on some economic data. When \(Y\) is regressed on \(X_1\), \(X_2\), and \(X_3\), the regression coefficients are significant. But when I extract the first principal component \(P_1\) of \(X_1\), \(X_2\), and \(X_3\) and use \(P_1\) as the explanatory variable for \(Y\), it is not significant.

After thinking it over, I realized that this comes from a thinking trap hidden in the way principal component regression is usually applied.

The conventional principal component regression workflow

  1. Using business knowledge or regression analysis, find a set of explanatory variables \(X_1, X_2, \dots\) for the dependent variable \(Y\)
  2. Extract the few top-ranked principal components \(P_1, P_2, \dots\) of the explanatory variables
  3. Regress \(Y\) on \(P_1, P_2, \dots\) as the explanatory variables
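Steps 2 and 3 above can be sketched in a few lines of NumPy. This is only a minimal illustration, not a reference implementation: the function names `pca_scores`, `ols`, and `pcr` are made up for this sketch, PCA is done through an eigendecomposition of the sample covariance matrix, and the variables are centered but not standardized.

```python
import numpy as np

def pca_scores(X, k):
    """Scores of the top-k principal components of X (rows are observations)."""
    Xc = X - X.mean(axis=0)                    # center each column
    cov = np.cov(Xc, rowvar=False)             # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues come back ascending
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
    return Xc @ eigvecs[:, order]

def ols(X, y):
    """Least-squares fit of y on X with an intercept; returns (coefficients, R^2)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2

def pcr(X, y, k):
    """Step 2 (extract the top-k components) followed by step 3 (regress y on them)."""
    return ols(pca_scores(X, k), y)
```

Note that `k` is chosen purely from the variance ranking of the components; nothing in the procedure looks at \(Y\) when deciding which components to keep. With `k` equal to the number of columns of a full-rank `X`, `pcr` gives the same fit as `ols` on the original variables.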

These three steps are the conventional principal component regression workflow, but they hide a thinking trap: the assumption that \(Y\) must have a regression relationship with the few top-ranked principal components. This is a preconceived misconception.

In fact, \(Y\) may have a regression relationship only with the lower-ranked principal components.

A constructed example

Let \(P_1\), \(P_2\), and \(P_3\) be three independent random variables whose variances are in descending order. Let \(X_1\), \(X_2\), and \(X_3\) be linear combinations of \(P_1\), \(P_2\), and \(P_3\):

\[ \begin{bmatrix} X_1\\ X_2\\ X_3 \end{bmatrix} = A \times \begin{bmatrix} P_1\\ P_2\\ P_3 \end{bmatrix} \]

where \(A\) is an invertible matrix.

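If we run principal component analysis on \(X_1\), \(X_2\), and \(X_3\), the principal components obtained are \(P_1\), \(P_2\), and \(P_3\). (Strictly speaking, this requires \(A\) to be orthogonal; for a general invertible \(A\), the principal components are other linear combinations of the \(P_i\).)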

Now suppose \(Y = P_3 + \varepsilon\), where \(\varepsilon\) is independent of \(P_1\), \(P_2\), and \(P_3\). Clearly, a regression relationship can be established between \(Y\) and \(X_1\), \(X_2\), \(X_3\), but there is no relationship at all between \(Y\) and the first principal component \(P_1\).
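The construction can be checked with a small simulation. Everything specific here is an assumption made for illustration: Gaussian components with variances 9, 4, and 1, a noise standard deviation of 0.5, 2000 observations, a fixed seed, and an orthogonal \(A\) taken from a QR decomposition so that the principal components of \(X\) coincide with the \(P_i\) up to sign. The helper `t_stats` is a hand-rolled OLS t-statistic, and \(|t| > 2\) is used only as a rough rule of thumb for significance.

```python
import numpy as np

rng = np.random.default_rng(0)    # fixed seed, arbitrary choice
n = 2000                          # sample size (assumption)

# Independent components with variances 9, 4, 1 (descending order).
P = rng.normal(size=(n, 3)) * np.array([3.0, 2.0, 1.0])

# Orthogonal A from a QR decomposition, so PCA of X recovers P up to sign.
A, _ = np.linalg.qr(rng.normal(size=(3, 3)))
X = P @ A.T                       # row-wise form of X = A P

y = P[:, 2] + 0.5 * rng.normal(size=n)    # Y = P_3 + epsilon

def t_stats(X, y):
    """OLS with an intercept; returns the t-statistics of the slope coefficients."""
    D = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    sigma2 = resid @ resid / (len(y) - D.shape[1])       # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(D.T @ D)))
    return (beta / se)[1:]                               # drop the intercept

# PCA via eigendecomposition of the sample covariance matrix.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]          # largest variance first
scores = Xc @ eigvecs[:, order]            # columns estimate P_1, P_2, P_3

t_x = t_stats(X, y)                 # Y on X_1, X_2, X_3
t_pc = t_stats(scores, y)           # Y on all three components
t_pc1 = t_stats(scores[:, :1], y)   # Y on the first component only

print("t-statistics, Y on X_1..X_3:     ", t_x)
print("t-statistics, Y on P_1..P_3:     ", t_pc)
print("t-statistic,  Y on P_1 only:     ", t_pc1)
```

Under these assumptions, the regression on the original variables and the coefficient on the third component should show very large t-statistics, while the t-statistic for the first component alone should look like pure noise. Keeping only the top-ranked component therefore discards exactly the direction that explains \(Y\).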

Origin www.cnblogs.com/xuruilong100/p/12171028.html