Prediction function: ${h}_{\theta}(x)=g({\theta}^{T}x)=\frac{1}{1+{e}^{-{\theta}^{T}x}}$
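The sigmoid $g(z)=\frac{1}{1+e^{-z}}$ squashes any real input into $(0,1)$, which is what lets $h_\theta(x)$ be read as a probability. A quick sketch (function name is mine):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z}) maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# g(0) = 0.5; large positive z approaches 1, large negative z approaches 0
print(sigmoid(0.0))   # 0.5
```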

The value of ${h}_{\theta}(x)$ is the probability that $y=1$, and $1-{h}_{\theta}(x)$ is the probability that $y=0$.

So $y \sim B(1, {h}_{\theta}(x))$, a Bernoulli (two-point) distribution.

The probability mass function of $y$ can be written as $p(y \mid x; \theta)={h}_{\theta}(x)^{y}\,(1-{h}_{\theta}(x))^{1-y}$

Likelihood function: $L(\theta )=\prod _{i=1}^{m}p(y^{(i)} \mid x^{(i)}; \theta)$ (maximizing it means maximizing the probability of the data that has already been observed)

The next step is to take the log of the likelihood and maximize it by working through the partial derivatives with respect to $\theta$.
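Taking the log turns the product into a sum, $\ell(\theta)=\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$, whose partial derivatives are $\frac{\partial \ell(\theta)}{\partial \theta_j}=\sum_{i=1}^{m}\left(y^{(i)}-h_\theta(x^{(i)})\right)x_j^{(i)}$. A minimal NumPy sketch of maximizing $\ell$ by gradient ascent (all names and the learning-rate/step defaults are my assumptions, not from the original):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    # l(theta) = sum of y*log(h) + (1 - y)*log(1 - h)
    h = sigmoid(X @ theta)
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient_ascent(X, y, lr=0.1, steps=1000):
    # Maximize the log-likelihood; its gradient is X^T (y - h)
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        h = sigmoid(X @ theta)
        theta += lr * X.T @ (y - h)
    return theta
```

Here `X` includes a leading column of ones so that $\theta_0$ acts as the intercept; note the update adds the gradient (ascent) because we are maximizing $\ell(\theta)$ rather than minimizing a loss.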