Machine Learning Quiz Week 3-1: Logistic Regression

1 Suppose that you have trained a logistic regression classifier, and it outputs on a new example $x$ a prediction $h_\theta(x) = 0.4$. This means (check all that apply):

Our estimate for $P(y=1|x;\theta)$ is 0.4.

Our estimate for $P(y=1|x;\theta)$ is 0.6.

Our estimate for $P(y=0|x;\theta)$ is 0.4.

Our estimate for $P(y=0|x;\theta)$ is 0.6.

$h_\theta(x)$ is the estimated probability that $y=1$, so $P(y=1|x;\theta) = 0.4$ and $P(y=0|x;\theta) = 1 - 0.4 = 0.6$. The correct answers are A and D.
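As a minimal sketch of this interpretation (NumPy; `theta` and `x` below are made-up values, not from the quiz), the hypothesis is the sigmoid of $\theta^T x$ and the two class probabilities sum to 1:

```python
import numpy as np

def sigmoid(z):
    # Logistic function g(z) = 1 / (1 + e^{-z}); its output always lies in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and example, for illustration only
theta = np.array([-0.4, 0.1, 0.2])
x = np.array([1.0, 0.5, -1.0])   # x[0] = 1 is the intercept term

p_y1 = sigmoid(theta @ x)   # estimate of P(y=1 | x; theta)
p_y0 = 1.0 - p_y1           # estimate of P(y=0 | x; theta)
print(p_y1, p_y0)           # the two estimates always sum to 1
```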

2 Suppose you have the following training set, and fit a logistic regression classifier $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$.

Which of the following are true? Check all that apply.

Adding polynomial features (e.g., instead using $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1 x_2 + \theta_5 x_2^2)$) could increase how well we can fit the training data.

True: adding polynomial features gives the hypothesis more flexibility, so it can fit the training set at least as well as before.

At the optimal value of $\theta$ (e.g., found by fminunc), we will have $J(\theta) \geq 0$.

True: every term of the cross-entropy cost is non-negative, so $J(\theta) \geq 0$ at the optimal $\theta$ (and everywhere else).

Adding polynomial features (e.g., instead using $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1 x_2 + \theta_5 x_2^2)$) would increase $J(\theta)$ because we are now summing over more terms.

False: $J(\theta)$ sums over the $m$ training examples, not over the features. Setting the new parameters $\theta_3 = \theta_4 = \theta_5 = 0$ reproduces the original hypothesis exactly, so the optimal cost with polynomial features can only stay the same or decrease.

If we train gradient descent for enough iterations, for some examples $x^{(i)}$ in the training set it is possible to obtain $h_\theta(x^{(i)}) > 1$.

False: $h_\theta(x) = g(\theta^T x)$ is a sigmoid output, so it always lies strictly between 0 and 1 and can never exceed 1.

The correct answers are A and B.
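A small sketch of why A and B hold and C and D fail (NumPy; the random data and parameters are placeholders): the cross-entropy cost is an average of non-negative terms, the sigmoid keeps every prediction inside (0, 1), and zeroing the new polynomial weights leaves the cost unchanged:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # Cross-entropy cost: every term -y*log(h) - (1-y)*log(1-h) is non-negative,
    # so J(theta) >= 0 for any theta (statement B).
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

rng = np.random.default_rng(0)
X = np.c_[np.ones(20), rng.normal(size=(20, 2))]   # intercept column + 2 features
y = rng.integers(0, 2, size=20).astype(float)
theta = rng.normal(size=3)

h = sigmoid(X @ theta)
assert np.all((h > 0) & (h < 1))   # h_theta(x) never exceeds 1 (statement D is false)
assert cost(theta, X, y) >= 0      # J(theta) >= 0 (statement B)

# Adding polynomial features with their new weights set to 0 leaves J unchanged,
# so the optimum of the richer model is never worse (statement A true, C false).
X_poly = np.c_[X, X[:, 1] ** 2, X[:, 1] * X[:, 2], X[:, 2] ** 2]
theta_poly = np.r_[theta, np.zeros(3)]
assert np.isclose(cost(theta_poly, X_poly, y), cost(theta, X, y))
```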

3 For logistic regression, the gradient is given by $\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$. Which of these is a correct gradient descent update for logistic regression with a learning rate of $\alpha$? Check all that apply.

$\theta := \theta - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( \theta^T x - y^{(i)} \right) x^{(i)}$

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( \frac{1}{1 + e^{-\theta^T x^{(i)}}} - y^{(i)} \right) x_j^{(i)}$ (simultaneously update for all $j$).

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$ (simultaneously update for all $j$).

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$ (simultaneously update for all $j$).

The correct answers are B and D: B just writes out $h_\theta(x^{(i)}) = \frac{1}{1 + e^{-\theta^T x^{(i)}}}$ explicitly. A lacks the sigmoid, and C is missing the subscript $j$ on $x^{(i)}$.
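A minimal vectorized sketch of the update in B/D (NumPy; the demo data is synthetic): the gradient is computed once per iteration and every component of $\theta$ is updated simultaneously:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=1000):
    # X: (m, n) design matrix whose first column is all ones; y: (m,) labels in {0, 1}
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)        # h_theta(x^(i)) for every example at once
        grad = X.T @ (h - y) / m      # (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)
        theta = theta - alpha * grad  # every theta_j is updated simultaneously
    return theta

# Synthetic demo data (placeholder values)
rng = np.random.default_rng(1)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]
y = (X[:, 1] + X[:, 2] > 0).astype(float)
print(gradient_descent(X, y))
```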

4 Which of the following statements are true? Check all that apply.

The one-vs-all technique allows you to use logistic regression for problems in which each $y^{(i)}$ comes from a fixed, discrete set of values.

For logistic regression, sometimes gradient descent will converge to a local minimum (and fail to find the global minimum). This is the reason we prefer more advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc). False: the logistic regression cost function is convex, so gradient descent cannot get trapped in a local minimum; the advanced optimizers are preferred for speed and convenience, not correctness.

Since we train one classifier when there are two classes, we train two classifiers when there are three classes (and we do one-vs-all classification). False: one-vs-all trains one classifier per class, so three classes require three classifiers; two classes are the special case where a single classifier suffices.

The cost function $J(\theta)$ for logistic regression trained with $m \geq 1$ examples is always greater than or equal to zero.

The correct answers are A and D.
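A sketch of the one-vs-all idea behind statement A (plain NumPy; the helper names and the synthetic 3-class data are illustrative, not from the quiz): train one binary classifier per class and predict the class whose classifier outputs the highest probability. Three classes mean three classifiers, which is why the "two classifiers" statement is false:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y, alpha=0.1, iters=2000):
    # Plain gradient descent on the logistic regression cost (see question 3)
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= alpha * X.T @ (sigmoid(X @ theta) - y) / len(y)
    return theta

def one_vs_all(X, y, num_classes):
    # One binary classifier per class: class c is the positive label, the rest are 0
    return [train_binary(X, (y == c).astype(float)) for c in range(num_classes)]

def predict(thetas, X):
    # Pick the class whose classifier reports the highest probability
    probs = np.column_stack([sigmoid(X @ t) for t in thetas])
    return np.argmax(probs, axis=1)

# Synthetic 3-class demo: note that 3 classes require 3 classifiers, not 2
rng = np.random.default_rng(2)
X = np.c_[np.ones(150), rng.normal(size=(150, 2)) + np.repeat(np.arange(3), 50)[:, None]]
y = np.repeat(np.arange(3), 50)
thetas = one_vs_all(X, y, num_classes=3)
print((predict(thetas, X) == y).mean())   # training accuracy of the 3 classifiers
```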


Reposted from blog.csdn.net/wo8vqj68/article/details/80240186