Prediction function:
The value of is the probability of y=1, 1- is the probability that y=0.
So y~B(1, ), a two-point distribution.
The distribution of y is listed as
Likelihood function (meaning to maximize the probability of what has already happened)
The next step is to add log and find the process of partial derivative.