Machine Learning Quiz Week 3-1: Logistic Regression

1 Suppose that you have trained a logistic regression classifier, and it outputs on a new example $x$ a prediction $h_\theta(x) = 0.4$. This means (check all that apply):

Our estimate for $P(y=1|x;\theta)$ is 0.4.

Our estimate for $P(y=1|x;\theta)$ is 0.6.

Our estimate for $P(y=0|x;\theta)$ is 0.4.

Our estimate for $P(y=0|x;\theta)$ is 0.6.

$h_\theta(x)$ is the estimated probability that $y=1$, so $P(y=1|x;\theta) = 0.4$ and $P(y=0|x;\theta) = 1 - 0.4 = 0.6$. The correct answers are A and D.
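As a minimal sketch of this interpretation (NumPy; `theta` and `x` below are made-up values, not from the quiz), the hypothesis is the sigmoid of $\theta^T x$ and the two class probabilities sum to 1:

```python
import numpy as np

def sigmoid(z):
    # Logistic function g(z) = 1 / (1 + e^{-z}); its output always lies in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and example, for illustration only
theta = np.array([-0.4, 0.1, 0.2])
x = np.array([1.0, 0.5, -1.0])   # x[0] = 1 is the intercept term

p_y1 = sigmoid(theta @ x)   # estimate of P(y=1 | x; theta)
p_y0 = 1.0 - p_y1           # estimate of P(y=0 | x; theta)
print(p_y1, p_y0)           # the two estimates always sum to 1
```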

2 Suppose you have the following training set, and fit a logistic regression classifier $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$.

Which of the following are true? Check all that apply.

Adding polynomial features (e.g., instead using $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1 x_2 + \theta_5 x_2^2)$) could increase how well we can fit the training data.

True: adding polynomial features gives the hypothesis more flexibility, so it can fit the training set at least as well as before.

At the optimal value of $\theta$ (e.g., found by fminunc), we will have $J(\theta) \geq 0$.

True: every term of the cross-entropy cost is non-negative, so $J(\theta) \geq 0$ at the optimal $\theta$ (and everywhere else).

Adding polynomial features (e.g., instead using $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1 x_2 + \theta_5 x_2^2)$) would increase $J(\theta)$ because we are now summing over more terms.

False: $J(\theta)$ sums over the $m$ training examples, not over the features. Setting the new parameters $\theta_3 = \theta_4 = \theta_5 = 0$ reproduces the original hypothesis exactly, so the optimal cost with polynomial features can only stay the same or decrease.

If we train gradient descent for enough iterations, for some examples $x^{(i)}$ in the training set it is possible to obtain $h_\theta(x^{(i)}) > 1$.

False: $h_\theta(x) = g(\theta^T x)$ is a sigmoid output, so it always lies strictly between 0 and 1 and can never exceed 1.

The correct answers are A and B.
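A small sketch of why A and B hold and C and D fail (NumPy; the random data and parameters are placeholders): the cross-entropy cost is an average of non-negative terms, the sigmoid keeps every prediction inside (0, 1), and zeroing the new polynomial weights leaves the cost unchanged:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # Cross-entropy cost: every term -y*log(h) - (1-y)*log(1-h) is non-negative,
    # so J(theta) >= 0 for any theta (statement B).
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

rng = np.random.default_rng(0)
X = np.c_[np.ones(20), rng.normal(size=(20, 2))]   # intercept column + 2 features
y = rng.integers(0, 2, size=20).astype(float)
theta = rng.normal(size=3)

h = sigmoid(X @ theta)
assert np.all((h > 0) & (h < 1))   # h_theta(x) never exceeds 1 (statement D is false)
assert cost(theta, X, y) >= 0      # J(theta) >= 0 (statement B)

# Adding polynomial features with their new weights set to 0 leaves J unchanged,
# so the optimum of the richer model is never worse (statement A true, C false).
X_poly = np.c_[X, X[:, 1] ** 2, X[:, 1] * X[:, 2], X[:, 2] ** 2]
theta_poly = np.r_[theta, np.zeros(3)]
assert np.isclose(cost(theta_poly, X_poly, y), cost(theta, X, y))
```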

3 For logistic regression, the gradient is given by $\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$. Which of these is a correct gradient descent update for logistic regression with a learning rate of $\alpha$? Check all that apply.

$\theta := \theta - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( \theta^T x - y^{(i)} \right) x^{(i)}$

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( \frac{1}{1 + e^{-\theta^T x^{(i)}}} - y^{(i)} \right) x_j^{(i)}$ (simultaneously update for all $j$).

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$ (simultaneously update for all $j$).

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$ (simultaneously update for all $j$).

The correct answers are B and D: B just writes out $h_\theta(x^{(i)}) = \frac{1}{1 + e^{-\theta^T x^{(i)}}}$ explicitly. A lacks the sigmoid, and C is missing the subscript $j$ on $x^{(i)}$.
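A minimal vectorized sketch of the update in B/D (NumPy; the demo data is synthetic): the gradient is computed once per iteration and every component of $\theta$ is updated simultaneously:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=1000):
    # X: (m, n) design matrix whose first column is all ones; y: (m,) labels in {0, 1}
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)        # h_theta(x^(i)) for every example at once
        grad = X.T @ (h - y) / m      # (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)
        theta = theta - alpha * grad  # every theta_j is updated simultaneously
    return theta

# Synthetic demo data (placeholder values)
rng = np.random.default_rng(1)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]
y = (X[:, 1] + X[:, 2] > 0).astype(float)
print(gradient_descent(X, y))
```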

4 Which of the following statements are true? Check all that apply.

The one-vs-all technique allows you to use logistic regression for problems in which each $y^{(i)}$ comes from a fixed, discrete set of values.

For logistic regression, sometimes gradient descent will converge to a local minimum (and fail to find the global minimum). This is the reason we prefer more advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc). False: the logistic regression cost function is convex, so gradient descent cannot get trapped in a local minimum; the advanced optimizers are preferred for speed and convenience, not correctness.

Since we train one classifier when there are two classes, we train two classifiers when there are three classes (and we do one-vs-all classification). False: one-vs-all trains one classifier per class, so three classes require three classifiers; two classes are the special case where a single classifier suffices.

The cost function $J(\theta)$ for logistic regression trained with $m \geq 1$ examples is always greater than or equal to zero.

The correct answers are A and D.
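A sketch of the one-vs-all idea behind statement A (plain NumPy; the helper names and the synthetic 3-class data are illustrative, not from the quiz): train one binary classifier per class and predict the class whose classifier outputs the highest probability. Three classes mean three classifiers, which is why the "two classifiers" statement is false:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y, alpha=0.1, iters=2000):
    # Plain gradient descent on the logistic regression cost (see question 3)
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= alpha * X.T @ (sigmoid(X @ theta) - y) / len(y)
    return theta

def one_vs_all(X, y, num_classes):
    # One binary classifier per class: class c is the positive label, the rest are 0
    return [train_binary(X, (y == c).astype(float)) for c in range(num_classes)]

def predict(thetas, X):
    # Pick the class whose classifier reports the highest probability
    probs = np.column_stack([sigmoid(X @ t) for t in thetas])
    return np.argmax(probs, axis=1)

# Synthetic 3-class demo: note that 3 classes require 3 classifiers, not 2
rng = np.random.default_rng(2)
X = np.c_[np.ones(150), rng.normal(size=(150, 2)) + np.repeat(np.arange(3), 50)[:, None]]
y = np.repeat(np.arange(3), 50)
thetas = one_vs_all(X, y, num_classes=3)
print((predict(thetas, X) == y).mean())   # training accuracy of the 3 classifiers
```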


Reposted from blog.csdn.net/wo8vqj68/article/details/80240186