Chapter 9: The EM algorithm

Starting with this ninth chapter, we turn to something different: Chapters 2 through 8 covered classification, which belongs to supervised learning, while Chapter 9's EM algorithm is used for unsupervised learning. This post summarizes the kinds of problems the EM algorithm addresses, how it is applied, and the derivation of its underlying principle.

EM algorithm

The EM algorithm (Expectation Maximization algorithm) is an iterative algorithm. It applies when a probability model contains both observed variables and hidden (latent) variables. If all of the model's variables were observed, the parameters could be estimated directly from the given data by maximum likelihood estimation or Bayesian estimation; when the model contains hidden variables, however, such direct estimation is no longer possible. For this situation, Dempster et al. proposed the EM algorithm in 1977. In brief: the E step computes an expectation; the M step computes a maximum.
\[
\text{Input: observed variable data } Y \text{, hidden variable data } Z \text{, joint distribution } P(Y, Z|\theta) \text{, conditional distribution } P(Z|Y, \theta). \\
\text{Output: model parameter } \theta. \\
(1) \text{ Choose an initial parameter value } \theta^{(0)} \text{ and begin iterating.} \\
(2) \text{ E step: with } \theta^{(i)} \text{ denoting the estimate of } \theta \text{ at the } i\text{-th iteration, compute at iteration } i+1 \\
\begin{aligned} Q(\theta, \theta^{(i)}) &= E_Z \big[\ln P(Y, Z|\theta) \mid Y, \theta^{(i)}\big] \\ &= \sum_Z \ln P(Y, Z|\theta) \, P(Z|Y, \theta^{(i)}) \end{aligned} \\
\text{where } P(Z|Y, \theta^{(i)}) \text{ is the conditional distribution of the hidden data } Z \text{ given the observed data } Y \text{ and the current estimate } \theta^{(i)}. \\
(3) \text{ M step: find the } \theta \text{ that maximizes } Q(\theta, \theta^{(i)}) \text{, giving the parameter estimate at iteration } i+1: \\
\theta^{(i+1)} = \mathop{\arg\max}\limits_{\theta} Q(\theta, \theta^{(i)}) \\
(4) \text{ Repeat steps (2) and (3) until convergence (convergence criterion: } \theta^{(i+1)} \text{ is close to } \theta^{(i)} \text{, or } Q(\theta^{(i+1)}, \theta^{(i)}) \text{ is close to } Q(\theta^{(i)}, \theta^{(i-1)})). \\
\text{The function } Q(\theta, \theta^{(i)}) \text{ is the core of the EM algorithm and is called the Q function.}
\]
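The iterative structure above can be illustrated on the classic three-coin model, a standard textbook example that is not derived in this post: coin A is flipped first (heads probability pi); on heads, coin B (heads probability p) is flipped, otherwise coin C (heads probability q); only the final head/tail outcome is observed, and which of B or C produced it is hidden. A minimal sketch, with the data and starting values chosen purely for illustration:

```python
def em_three_coin(y, pi, p, q, n_iter=20):
    """EM for the three-coin model: flip coin A (heads prob pi); on heads
    flip coin B (heads prob p), else coin C (heads prob q). Only the final
    outcome y_j in {0, 1} is observed; the coin that produced it is hidden."""
    for _ in range(n_iter):
        # E step: responsibility mu_j = P(coin B was used | y_j, current params)
        mu = [pi * p**yj * (1 - p)**(1 - yj)
              / (pi * p**yj * (1 - p)**(1 - yj)
                 + (1 - pi) * q**yj * (1 - q)**(1 - yj))
              for yj in y]
        # M step: closed-form maximizers of Q for this model
        pi = sum(mu) / len(y)
        p = sum(m * yj for m, yj in zip(mu, y)) / sum(mu)
        q = sum((1 - m) * yj for m, yj in zip(mu, y)) / sum(1 - m for m in mu)
    return pi, p, q

y = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
print(em_three_coin(y, 0.5, 0.5, 0.5))  # symmetric start stays symmetric: (0.5, 0.6, 0.6)
```

With a symmetric starting point the responsibilities are all 0.5, so p and q both land on the sample mean of y; a different starting point gives a different fixed point, showing that EM only guarantees a local maximum of the likelihood.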

The derivation process

The EM algorithm was stated above, but why can it approximately maximize the likelihood of the observed data? Below, the EM algorithm is derived by approximately maximizing the log-likelihood function of the observed data, which helps in understanding the algorithm.

Formula used in the derivation:

\[
\text{Jensen's inequality: } f\Big(\sum_i \alpha_i x_i\Big) \geqslant \sum_i \alpha_i f(x_i) \text{ when } f \text{ is concave (and the logarithm is a concave function),} \\
\text{where } \displaystyle \sum_i \alpha_i = 1 \text{ and the } \alpha_i \text{ are weights with } 0 \leqslant \alpha_i \leqslant 1.
\]
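A quick numeric sanity check of the inequality for the logarithm, with arbitrarily chosen weights and points:

```python
import math

# Jensen's inequality for the concave logarithm:
# ln(sum_i a_i * x_i) >= sum_i a_i * ln(x_i),
# where the a_i are nonnegative weights summing to 1.
alpha = [0.2, 0.5, 0.3]
x = [1.0, 4.0, 9.0]

lhs = math.log(sum(a * xi for a, xi in zip(alpha, x)))   # ln of the weighted average
rhs = sum(a * math.log(xi) for a, xi in zip(alpha, x))   # weighted average of the lns

print(lhs, rhs, lhs >= rhs)
```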

\[
\text{We seek the parameter } \theta \text{, given observed data } Y = (y_1, y_2, \cdots, y_N) \text{ and hidden variables } Z = (z_1, z_2, \cdots, z_N). \\
\text{The log-likelihood of the observed data is} \\
\begin{aligned} L(\theta) &= \ln P(Y|\theta) \\ &= \ln \sum_Z P(Y, Z|\theta) \\ &= \ln \Big( \sum_Z P(Z|\theta) P(Y|Z, \theta) \Big) \end{aligned} \\
\text{Suppose the estimate of } \theta \text{ after } i \text{ iterations is } \theta^{(i)}. \text{ We want the new estimate } \theta \text{ to increase } L(\theta), \\
\text{i.e. } L(\theta) > L(\theta^{(i)}); \text{ consider the difference between the two:} \\
L(\theta) - L(\theta^{(i)}) = \ln \Big( \sum_Z P(Z|\theta) P(Y|Z, \theta) \Big) - \ln P(Y|\theta^{(i)}) \\
\text{In general } \ln P_1 P_2 \cdots P_N \text{ is easy to handle, but } \ln \sum P_1 P_2 \text{ is hard; to move the summation out of the} \\
\text{logarithm, we bound it using Jensen's inequality. For the sum over } Z \text{ above, where do the weights } \alpha_i \text{ of Jensen's} \\
\text{inequality come from? Since the values of a probability distribution over } Z \text{ sum to } 1, \text{ we construct one:} \\
\begin{aligned} L(\theta) - L(\theta^{(i)}) &= \ln \Big( \sum_Z P(Z|Y, \theta^{(i)}) \frac{P(Z|\theta) P(Y|Z, \theta)}{P(Z|Y, \theta^{(i)})} \Big) - \ln P(Y|\theta^{(i)}) \\ &\geqslant \sum_Z P(Z|Y, \theta^{(i)}) \ln \frac{P(Z|\theta) P(Y|Z, \theta)}{P(Z|Y, \theta^{(i)})} - \ln P(Y|\theta^{(i)}) \\ &= \sum_Z P(Z|Y, \theta^{(i)}) \ln \frac{P(Z|\theta) P(Y|Z, \theta)}{P(Z|Y, \theta^{(i)}) P(Y|\theta^{(i)})} \end{aligned} \\
\text{Let } B(\theta, \theta^{(i)}) = L(\theta^{(i)}) + \sum_Z P(Z|Y, \theta^{(i)}) \ln \frac{P(Z|\theta) P(Y|Z, \theta)}{P(Z|Y, \theta^{(i)}) P(Y|\theta^{(i)})} \\
\therefore L(\theta) \geqslant B(\theta, \theta^{(i)}); \text{ in other words } B(\theta, \theta^{(i)}) \text{ is a lower bound of } L(\theta), \text{ and to increase } L(\theta) \\
\text{we maximize } B(\theta, \theta^{(i)}). \\
\begin{aligned} \therefore \theta^{(i+1)} &= \mathop{\arg\max}\limits_{\theta} B(\theta, \theta^{(i)}) \\ &= \mathop{\arg\max}\limits_{\theta} \Big( \sum_Z P(Z|Y, \theta^{(i)}) \ln P(Z|\theta) P(Y|Z, \theta) \Big) \quad (\text{dropping terms that do not depend on } \theta) \\ &= \mathop{\arg\max}\limits_{\theta} \Big( \sum_Z P(Z|Y, \theta^{(i)}) \ln P(Y, Z|\theta) \Big) \end{aligned} \\
\displaystyle \because Q(\theta, \theta^{(i)}) = \sum_Z \ln P(Y, Z|\theta) P(Z|Y, \theta^{(i)}) \\
\displaystyle \therefore \theta^{(i+1)} = \mathop{\arg\max}\limits_{\theta} Q(\theta, \theta^{(i)}) \\
\text{This is exactly the M step of the EM algorithm, and the E step corresponds to computing } \displaystyle \sum_Z P(Z|Y, \theta^{(i)}) \ln P(Y, Z|\theta). \\
\text{This yields the EM algorithm: the maximum of the log-likelihood function is approached by repeatedly maximizing a lower bound.}
\]

Application of EM algorithm in Gaussian mixture model learning

Gaussian mixture model

\[
\text{A Gaussian mixture model is a probability distribution model of the form} \\
P(y|\theta) = \sum_{k=1}^K \alpha_k \phi(y|\theta_k) \\
\text{where } \alpha_k \text{ is a mixing coefficient, } \displaystyle \alpha_k \geqslant 0, \ \sum_{k=1}^K \alpha_k = 1, \text{ and } \phi(y|\theta_k) \text{ is a Gaussian density with} \\
\theta_k = (\mu_k, \sigma_k^2), \quad \phi(y|\theta_k) = \frac{1}{\sqrt{2\pi}\sigma_k} \exp\left(-\frac{(y-\mu_k)^2}{2\sigma_k^2}\right), \\
\text{called the } k\text{-th component model.} \\
\text{To motivate the Gaussian mixture model, consider a simple one-dimensional random variable } y. \text{ If } y \text{ follows a single} \\
\text{Gaussian distribution, } y \sim N(\mu, \sigma^2), \text{ then given observed values of } y \text{ it is easy to estimate } \mu \text{ and } \sigma^2. \\
\text{But suppose } y \text{ does not come from a single Gaussian; instead, with certain probabilities it comes from one of two different} \\
\text{Gaussians } N(\mu_1, \sigma_1^2) \text{ and } N(\mu_2, \sigma_2^2) \text{, i.e. a mixture of the two. We do not know which Gaussian each } y \\
\text{came from: this is where the hidden variable enters. To estimate parameters in the presence of hidden variables, proceed as} \\
\text{follows. Represent the hidden variable } z \text{ by a vector } \gamma: \text{ if } z = 1 \text{ then } \gamma = (1, 0, 0, \cdots, 0); \text{ if } z = 2 \\
\text{then } \gamma = (0, 1, 0, \cdots, 0). \text{ This is one-hot encoding: if } y \text{ comes from the } i\text{-th Gaussian, the } i\text{-th component} \\
\text{of } \gamma \text{ is } 1 \text{ and all other components are } 0.
\]
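The generative process just described (pick a hidden component with probabilities alpha_k, then draw y from the chosen Gaussian) can be sketched as follows; the mixture parameters here are arbitrary illustrative values:

```python
import random

def sample_gmm(alpha, mu, sigma, n):
    """Draw n samples from a 1-D Gaussian mixture: first pick the hidden
    component z with probabilities alpha, then draw y ~ N(mu[z], sigma[z]^2).
    Returns both y and z; with real data only y would be observed."""
    ys, zs = [], []
    for _ in range(n):
        z = random.choices(range(len(alpha)), weights=alpha)[0]  # hidden one-hot index
        ys.append(random.gauss(mu[z], sigma[z]))                 # observed value
        zs.append(z)
    return ys, zs

# Two components: 30% from N(-2, 1), 70% from N(3, 0.25)
ys, zs = sample_gmm([0.3, 0.7], [-2.0, 3.0], [1.0, 0.5], 1000)
```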

The derivation process

Identify the latent variables and write the complete-data log-likelihood

\[
\text{Following the EM algorithm, the hidden variable is } \gamma; \ \gamma_j \text{ indicates which Gaussian the } j\text{-th observation came from, e.g.} \\
\gamma_1 = (\gamma_{11}, \gamma_{12}, \cdots, \gamma_{1K}), \text{ where, following the book's definition of } \gamma_{jk}: \\
\gamma_{jk} = \left\{ \begin{aligned} 1, &\ \text{ observation } j \text{ comes from the } k\text{-th component model} \\ 0, &\ \text{ otherwise} \end{aligned} \right. \\
j = 1, 2, \cdots, N; \quad k = 1, 2, \cdots, K \\
\gamma_j \text{ is a discrete random variable that takes its first value with probability } \alpha_1, \text{ its second with probability } \alpha_2, \ \cdots, \\
\text{and its } K\text{-th with probability } \alpha_K; \text{ once the value of } \gamma_1 \text{ is known, we know which Gaussian } y_1 \text{ was drawn from.} \\
\begin{aligned} p(\gamma_1, y_1|\theta) &= p(\gamma_1|\theta) \cdot p(y_1|\gamma_1, \theta) \\ &= \alpha_1^{\gamma_{11}} \alpha_2^{\gamma_{12}} \cdots \alpha_K^{\gamma_{1K}} \, \phi(y_1|\theta_1)^{\gamma_{11}} \phi(y_1|\theta_2)^{\gamma_{12}} \cdots \phi(y_1|\theta_K)^{\gamma_{1K}} \\ &= \prod_{k=1}^K \big[\alpha_k \phi(y_1|\theta_k)\big]^{\gamma_{1k}} \end{aligned} \\
\text{This is the complete-data density of a single sample point. Maximum likelihood estimation maximizes the likelihood over all} \\
\text{sample points; since the samples are independent, the complete-data likelihood is} \\
P(y, \gamma|\theta) = \prod_{j=1}^N \prod_{k=1}^K \big[\alpha_k \phi(y_j|\theta_k)\big]^{\gamma_{jk}} = \prod_{k=1}^K \alpha_k^{n_k} \prod_{j=1}^N \big[\phi(y_j|\theta_k)\big]^{\gamma_{jk}}, \quad n_k = \sum_{j=1}^N \gamma_{jk} \\
\text{and the complete-data log-likelihood is} \\
\ln P(y, \gamma|\theta) = \sum_{k=1}^K \left\{ n_k \ln \alpha_k + \sum_{j=1}^N \gamma_{jk} \Big[ \ln \frac{1}{\sqrt{2\pi}} - \ln \sigma_k - \frac{1}{2\sigma_k^2}(y_j - \mu_k)^2 \Big] \right\}
\]

E step of the EM algorithm: determining the Q function

\[
\text{Replace the hidden variables by their expectations; the hidden variables are } \gamma_{jk} \text{ and } n_k. \\
\displaystyle \because E(n_k) = E\Big(\sum_j \gamma_{jk}\Big) = \sum_j E(\gamma_{jk}), \quad E(\gamma_{jk}|y, \theta^{(i)}) = P(\gamma_{jk} = 1|y, \theta^{(i)}) \\
\text{To compute this expectation given the previous estimate } \theta^{(i)} \text{ and all observed data } y_j, \text{ we need the distribution } P(\gamma_{jk} = 1|y, \theta^{(i)}): \\
\begin{aligned} \because P(\gamma_{jk} = 1|y, \theta^{(i)}) &= \frac{P(\gamma_{jk} = 1, y_j|\theta^{(i)})}{P(y_j|\theta^{(i)})} \\ &= \frac{P(\gamma_{jk} = 1, y_j|\theta^{(i)})}{\displaystyle \sum_{k=1}^K P(\gamma_{jk} = 1, y_j|\theta^{(i)})} \\ &= \frac{P(\gamma_{jk} = 1|\theta^{(i)}) P(y_j|\gamma_{jk} = 1, \theta^{(i)})}{\displaystyle \sum_{k=1}^K P(\gamma_{jk} = 1|\theta^{(i)}) P(y_j|\gamma_{jk} = 1, \theta^{(i)})} \end{aligned} \\
\because \alpha_k = P(\gamma_{jk} = 1|\theta), \quad \phi(y_j|\theta_k) = P(y_j|\gamma_{jk} = 1, \theta) \\
\displaystyle \therefore \hat{\gamma}_{jk} = E(\gamma_{jk}|y, \theta^{(i)}) = P(\gamma_{jk} = 1|y, \theta^{(i)}) = \frac{\alpha_k^{(i)} \phi(y_j|\theta_k^{(i)})}{\displaystyle \sum_{k=1}^K \alpha_k^{(i)} \phi(y_j|\theta_k^{(i)})}, \text{ where } \theta^{(i)} = (\alpha_k^{(i)}, \theta_k^{(i)}) \\
\text{Write } \displaystyle \hat{n}_k = \sum_{j=1}^N \hat{\gamma}_{jk}. \text{ Substituting these expectations into the complete-data log-likelihood gives} \\
\displaystyle \therefore Q(\theta, \theta^{(i)}) = E_Z\big[\ln P(y, \gamma|\theta) \mid y, \theta^{(i)}\big] = \sum_{k=1}^K \left\{ \hat{n}_k \ln \alpha_k + \sum_{j=1}^N \hat{\gamma}_{jk} \Big[ \ln \frac{1}{\sqrt{2\pi}} - \ln \sigma_k - \frac{1}{2\sigma_k^2}(y_j - \mu_k)^2 \Big] \right\}
\]

M step of the EM algorithm

\[
\text{The parameters to estimate are } \alpha_k, \sigma_k^2, \mu_k; \text{ set the partial derivatives equal to } 0: \\
\begin{array}{l} \displaystyle \frac{\partial Q(\theta, \theta^{(i)})}{\partial \mu_k} = 0 \\ \displaystyle \frac{\partial Q(\theta, \theta^{(i)})}{\partial \sigma_k^2} = 0 \\ \left\{ \begin{array}{l} \displaystyle \frac{\partial Q(\theta, \theta^{(i)})}{\partial \alpha_k} = 0 \\ \displaystyle \sum_k \alpha_k = 1 \end{array} \right. \end{array} \\
\text{Solving the equations above yields:} \\
\begin{array}{l} \mu_k^{(i+1)} = \frac{\displaystyle \sum_{j=1}^N \hat{\gamma}_{jk} y_j}{\displaystyle \sum_{j=1}^N \hat{\gamma}_{jk}} \\ (\sigma_k^2)^{(i+1)} = \frac{\displaystyle \sum_{j=1}^N \hat{\gamma}_{jk} (y_j - \mu_k^{(i+1)})^2}{\displaystyle \sum_{j=1}^N \hat{\gamma}_{jk}} \\ \displaystyle \alpha_k^{(i+1)} = \frac{\hat{n}_k}{N} = \frac{\displaystyle \sum_{j=1}^N \hat{\gamma}_{jk}}{N} \end{array} \\
\text{where } \displaystyle \hat{\gamma}_{jk} = E(\gamma_{jk}|y, \theta^{(i)}), \quad \hat{n}_k = \sum_{j=1}^N \hat{\gamma}_{jk}, \quad k = 1, 2, \cdots, K
\]
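Putting the E step (the responsibilities above) together with the closed-form M-step updates gives a complete EM loop for a one-dimensional Gaussian mixture. A minimal sketch in plain Python, with no safeguards against degenerate components (vanishing variances or empty clusters):

```python
import math

def gmm_em(y, alpha, mu, sigma2, n_iter=50):
    """EM for a 1-D Gaussian mixture using the update formulas derived above.
    alpha, mu, sigma2 are lists of length K: mixing weights, means, variances."""
    N, K = len(y), len(alpha)
    for _ in range(n_iter):
        # E step: responsibilities gamma[j][k] = E[gamma_jk | y, theta^(i)]
        gamma = []
        for yj in y:
            dens = [alpha[k] / math.sqrt(2 * math.pi * sigma2[k])
                    * math.exp(-(yj - mu[k]) ** 2 / (2 * sigma2[k]))
                    for k in range(K)]
            s = sum(dens)
            gamma.append([d / s for d in dens])
        # M step: closed-form updates for mu_k, sigma_k^2, alpha_k
        for k in range(K):
            nk = sum(gamma[j][k] for j in range(N))
            mu[k] = sum(gamma[j][k] * y[j] for j in range(N)) / nk
            sigma2[k] = sum(gamma[j][k] * (y[j] - mu[k]) ** 2 for j in range(N)) / nk
            alpha[k] = nk / N
    return alpha, mu, sigma2

# Two well-separated clusters around -2 and 3; starting values are rough guesses.
data = [-2.2, -2.0, -1.8, 2.8, 3.0, 3.2]
print(gmm_em(data, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]))
```

With well-separated clusters the responsibilities harden toward 0/1 within a few iterations and the means converge to the per-cluster averages; with overlapping clusters, convergence is slower and depends on the starting values.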

Generalizations of the EM algorithm

GEM algorithm

\[
\text{Input: observed data, the } Q \text{ function.} \quad \text{Output: model parameters.} \\
(1) \text{ Initialize the parameter } \theta^{(0)} = (\theta_1^{(0)}, \theta_2^{(0)}, \cdots, \theta_d^{(0)}) \text{ and begin iterating;} \\
(2) \text{ At iteration } i+1, \text{ step 1: with } \theta^{(i)} = (\theta_1^{(i)}, \theta_2^{(i)}, \cdots, \theta_d^{(i)}) \text{ the estimate of } \theta = (\theta_1, \theta_2, \cdots, \theta_d), \text{ compute} \\
\begin{aligned} Q(\theta, \theta^{(i)}) &= E_Z \big[\log P(Y, Z|\theta) \mid Y, \theta^{(i)}\big] \\ &= \sum_Z P(Z|Y, \theta^{(i)}) \log P(Y, Z|\theta) \end{aligned} \\
(3) \text{ Step 2: perform } d \text{ conditional maximizations. First, holding } \theta_2^{(i)}, \theta_3^{(i)}, \cdots, \theta_d^{(i)} \text{ fixed, find the } \theta_1^{(i+1)} \\
\text{that maximizes } Q(\theta, \theta^{(i)}); \text{ then, under the conditions } \theta_1 = \theta_1^{(i+1)}, \ \theta_j = \theta_j^{(i)}, \ j = 3, 4, \cdots, d, \text{ find the} \\
\theta_2^{(i+1)} \text{ that maximizes } Q(\theta, \theta^{(i)}); \text{ continuing in this way through } d \text{ conditional maximizations gives} \\
\theta^{(i+1)} = (\theta_1^{(i+1)}, \theta_2^{(i+1)}, \cdots, \theta_d^{(i+1)}) \text{ such that } Q(\theta^{(i+1)}, \theta^{(i)}) > Q(\theta^{(i)}, \theta^{(i)}) \\
(4) \text{ Repeat (2) and (3) until convergence.}
\]
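To see why step (3) works, here is a toy illustration of the d conditional maximizations with d = 2, using a simple concave quadratic as an invented stand-in for a real Q function: each coordinate-wise maximization can only increase Q, and the sweeps converge to the joint maximizer.

```python
def gem_coordinate_ascent(theta1, theta2, n_iter=50):
    """Illustration of GEM's step 2 on a toy concave quadratic standing in for
    Q(theta1, theta2) = -(theta1 - 1)^2 - (theta2 - 2)^2 - theta1 * theta2.
    Each sweep maximizes Q in theta1 with theta2 held fixed, then in theta2
    with the new theta1 held fixed; Q never decreases along the way."""
    def Q(t1, t2):
        return -(t1 - 1) ** 2 - (t2 - 2) ** 2 - t1 * t2

    for _ in range(n_iter):
        theta1 = 1 - theta2 / 2   # argmax over theta1: dQ/dtheta1 = -2(t1 - 1) - t2 = 0
        theta2 = 2 - theta1 / 2   # argmax over theta2: dQ/dtheta2 = -2(t2 - 2) - t1 = 0
    return theta1, theta2, Q(theta1, theta2)

print(gem_coordinate_ascent(0.0, 0.0))  # converges to the joint maximizer (0, 2)
```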


Source: www.cnblogs.com/cecilia-2019/p/11537294.html