Machine Learning (3) Linear Discriminant Functions -- Least-squares Classification

Copyright notice: https://blog.csdn.net/qq_26386707/article/details/79404122



Chenjing Ding
2018/02/28


| notation | meaning |
| --- | --- |
| $M$ | the number of mixture components |
| $x_n$ | the n-th input vector |
| $N$ | the number of training input vectors |
| $K$ | the number of classes |
| $w_k$ | the k-th column of the weight matrix |
| $W$ | the weight matrix |
| $X$ | the input matrix |

To be clear, all vectors in this passage are column vectors, so their transposes are row vectors; capital letters denote matrices, while lowercase letters denote vectors.

1. General Classification Problem

1.1 Single-sample input case

Let's consider K linear discriminant models:

$$(1.1.1)\qquad y_k(x) = w_k^T x + w_{k0}, \quad k = 1, \dots, K$$
Both $w_k$ and $x$ are vectors. To absorb the bias $w_{k0}$ into $w_k$, each input is augmented with a leading 1, i.e. $x = (1, x_1, \dots, x_D)^T$. If $W$ is the matrix

$$(1.1.2)\qquad W = [w_1, w_2, \dots, w_K] = \begin{bmatrix} w_{10} & w_{20} & \dots & w_{K0} \\ w_{11} & w_{21} & \dots & w_{K1} \\ \vdots & \vdots & \ddots & \vdots \\ w_{1D} & w_{2D} & \dots & w_{KD} \end{bmatrix}$$

then we obtain $Y(x)$, which is a column vector:

$$(1.1.3)\qquad Y(x) = W^T x = [y_1(x)\ \ y_2(x)\ \ \dots\ \ y_K(x)]^T$$
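As a quick numerical sketch of (1.1.1)-(1.1.3) (the dimensions $D = 2$, $K = 3$ and the random weights are made up for illustration; note the leading 1 appended to $x$ so that $W$ can carry the bias row):

```python
import numpy as np

D, K = 2, 3                       # input dimension and number of classes (illustrative)
rng = np.random.default_rng(0)

W = rng.normal(size=(D + 1, K))   # column k is (w_k0, w_k1, ..., w_kD)^T; bias in the first row
x = np.concatenate(([1.0], rng.normal(size=D)))   # augment x with a leading 1 for the bias

Y = W.T @ x                       # (1.1.3): Y(x) = W^T x, a length-K vector
print(Y)                          # Y[k] = y_k(x) = w_k^T x + w_k0
```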

1.2 Input as a matrix

For the entire data set, $X$ is a matrix whose rows are the augmented input vectors:

$$X = [x_1 \ \ x_2 \ \ \dots \ \ x_N]^T$$

$$(1.2.1)\qquad T = [t_1 \ \ t_2 \ \ \dots \ \ t_N]^T, \qquad \hat{Y}(X) = XW = [Y(x_1) \ \ Y(x_2) \ \ \dots \ \ Y(x_N)]^T$$

where $t_1, t_2, \dots$ are column vectors (the binary target vector of each sample), and $T$ and $\hat{Y}(X)$ are $N \times K$ matrices; $T$ is the target matrix.
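In code, evaluating the whole data set is a single matrix product; a minimal sketch with hypothetical data, again using a column of ones for the bias:

```python
import numpy as np

N, D, K = 5, 2, 3                 # sample count, input dimension, classes (illustrative)
rng = np.random.default_rng(1)

X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, D))])   # row n is the augmented x_n^T
labels = rng.integers(0, K, size=N)
T = np.eye(K)[labels]             # target matrix: row n is the one-hot vector t_n^T

W = rng.normal(size=(D + 1, K))
Y_hat = X @ W                     # Y_hat(X) = XW; row n equals Y(x_n)^T
print(Y_hat.shape)                # (5, 3), i.e. N x K
```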

2. Closed-form solution

We now look for a closed-form solution for $W$ by directly minimizing the sum-of-squares error:

$$E(W) = \frac{1}{2}\sum_{n=1}^{N}\sum_{k=1}^{K}\left(y_k(x_n) - t_{nk}\right)^2 = \frac{1}{2}\sum_{n=1}^{N}\sum_{k=1}^{K}\left(w_k^T x_n - t_{nk}\right)^2$$
Let's formulate the sum-of-squares error in matrix notation, using the two identities

$$(2.1)\qquad \sum_i \sum_j a_{ij}^2 = \mathrm{Tr}\left(A^T A\right), \qquad \frac{\partial\,\mathrm{Tr}(A)}{\partial A} = I$$
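The first identity in (2.1) is easy to verify numerically; a quick sketch:

```python
import numpy as np

A = np.random.default_rng(2).normal(size=(4, 3))
lhs = (A ** 2).sum()              # sum_i sum_j a_ij^2
rhs = np.trace(A.T @ A)           # Tr(A^T A)
assert np.isclose(lhs, rhs)       # the two sides agree to floating-point precision
```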

$$(2.2)\qquad E(W) = \frac{1}{2}\,\mathrm{Tr}\left((XW - T)^T (XW - T)\right)$$

$$(2.3)\qquad \frac{\partial E(W)}{\partial W} = \frac{1}{2}\,\frac{\partial E(W)}{\partial \left((XW-T)^T(XW-T)\right)}\;\frac{\partial \left((XW-T)^T(XW-T)\right)}{\partial W}$$

$$(2.4)\qquad\qquad\qquad\ = X^T (XW - T) \qquad \text{(using (2.1))}$$

Setting the derivative to zero, and assuming $X^T X$ is invertible (which holds when $X$ has full column rank):

$$\frac{\partial E(W)}{\partial W} = 0 \;\Rightarrow\; W = (X^T X)^{-1} X^T T$$

Thus the closed-form solution for $Y(x_n)$ is:

$$Y(x_n) = W^T x_n = \left( (X^T X)^{-1} X^T T \right)^T x_n$$
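Putting the derivation together, here is a sketch of the closed-form fit on synthetic data; `np.linalg.lstsq` computes the same solution as the explicit $(X^T X)^{-1} X^T T$ but is numerically safer:

```python
import numpy as np

N, D, K = 100, 2, 3
rng = np.random.default_rng(3)
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, D))])   # augmented inputs
T = np.eye(K)[rng.integers(0, K, size=N)]                   # one-hot targets

W = np.linalg.inv(X.T @ X) @ X.T @ T              # closed form; assumes X^T X is invertible
W_lstsq, *_ = np.linalg.lstsq(X, T, rcond=None)   # same solution, numerically safer
assert np.allclose(W, W_lstsq)

x_new = np.array([1.0, 0.5, -0.3])                # an augmented test point
Y = W.T @ x_new                                   # Y(x_n) = W^T x_n
pred = Y.argmax()                                 # assign the class with the largest discriminant
```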

3. Problems

  1. Least-squares is very sensitive to outliers (see the sketch after this list)!
  2. Least-squares corresponds to maximum likelihood under the assumption of a Gaussian conditional distribution. However, our binary target vectors clearly have a non-Gaussian distribution (a 0-1 Bernoulli distribution when K = 2)!

    These problems will be discussed in a later post.
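A small synthetic sketch of problem 1: a handful of extra points that are correctly labelled but far from the rest still shift the least-squares weights (and hence the decision boundary) noticeably:

```python
import numpy as np

rng = np.random.default_rng(4)
X0 = rng.normal(loc=[-2.0, 0.0], size=(50, 2))   # class 0
X1 = rng.normal(loc=[+2.0, 0.0], size=(50, 2))   # class 1

def fit(X_raw, y, K=2):
    """Least-squares fit with one-hot targets; returns the weight matrix W."""
    X = np.hstack([np.ones((len(X_raw), 1)), X_raw])
    T = np.eye(K)[y]
    W, *_ = np.linalg.lstsq(X, T, rcond=None)
    return W

y = np.r_[np.zeros(50, int), np.ones(50, int)]
W_clean = fit(np.vstack([X0, X1]), y)

# a few extra class-1 points, correctly labelled but far from the rest
outliers = rng.normal(loc=[10.0, 8.0], size=(5, 2))
W_out = fit(np.vstack([X0, X1, outliers]), np.r_[y, np.ones(5, int)])

print(W_clean - W_out)   # the weights shift even though no label changed
```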
