1. Question 1
Which of the following neural networks are examples of a feed-forward neural network?
A feed-forward network does not have cycles.
2. Question 2
Consider a neural network with only one training case with input $\mathbf{x} = (x_1, x_2, \ldots, x_n)^\top$ and correct output $t$. There is only one output neuron, which is linear, i.e. $y = \mathbf{w}^\top\mathbf{x}$ (notice that there are no biases). The loss function is squared error. The network has no hidden units, so the inputs are directly connected to the output neuron with weights $\mathbf{w} = (w_1, w_2, \ldots, w_n)^\top$. We're in the process of training the neural network with the backpropagation algorithm. What will the algorithm add to $w_i$ for the next iteration if we use a step size (also known as a learning rate) of $\epsilon$?
$\epsilon(\mathbf{w}^\top\mathbf{x} - t)x_i$
$x_i$ if $\mathbf{w}^\top\mathbf{x} > t$; $-x_i$ if $\mathbf{w}^\top\mathbf{x} \leq t$
$x_i$
$-\epsilon(\mathbf{w}^\top\mathbf{x} - t)x_i$
There are multiple components to this, all multiplied together: the learning rate, the derivative of the loss function w.r.t. the state of the output unit, and the derivative of the input to the output unit w.r.t. $w_i$.
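To make that chain of factors concrete, here is a minimal NumPy sketch (an illustration, not part of the quiz) that assumes the conventional squared error $E = \frac{1}{2}(\mathbf{w}^\top\mathbf{x} - t)^2$ and checks the analytic update $-\epsilon(\mathbf{w}^\top\mathbf{x} - t)x_i$ against a finite-difference gradient:

```python
import numpy as np

# Assumed setup: one training case, linear output, E = 0.5 * (w.x - t)**2,
# so the amount backprop adds to w_i is -eps * (w.x - t) * x_i.
rng = np.random.default_rng(0)
n = 5
x = rng.normal(size=n)   # the single training case
t = 1.3                  # its correct output
w = rng.normal(size=n)   # current weights (no biases)
eps = 0.1                # step size / learning rate

def loss(w):
    y = w @ x            # linear output neuron: y = w^T x
    return 0.5 * (y - t) ** 2

analytic_update = -eps * (w @ x - t) * x   # what backprop adds to w

# Central finite-difference gradient of the loss, one weight at a time.
h = 1e-6
numeric_grad = np.array([
    (loss(w + h * np.eye(n)[i]) - loss(w - h * np.eye(n)[i])) / (2 * h)
    for i in range(n)
])
assert np.allclose(analytic_update, -eps * numeric_grad)
```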
3. Question 3
Suppose we have a set of examples and Brian comes in and duplicates every example, then randomly reorders the examples. We now have twice as many examples, but no more information about the problem than we had before. If we do not remove the duplicate entries, which one of the following methods will not be affected by this change, in terms of the computer time (time in seconds, for example) it takes to come close to convergence?
Full-batch learning.
Mini-batch learning, where for every iteration we randomly pick 100 training cases.
After Brian's intervention, most mini-batches will contain duplicates and will therefore provide less information.
Online learning, where for every iteration we randomly pick a training case.
Full-batch learning needs to look at every example before taking a step, so each step will be twice as expensive. Online learning only looks at one example at a time, so each step has the same computational cost as before. In expectation, online learning makes the same progress after looking at half of the duplicated dataset as it would have made on the whole dataset had Brian not intervened.
Although this example is a bit contrived, it serves to illustrate how online learning can be advantageous when there is a lot of redundancy in the data.
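The effect is easy to check numerically. In the sketch below (a made-up linear-regression setup, illustration only), duplicating the data leaves the full-batch gradient unchanged while doubling the number of examples each full-batch step must touch, whereas an online step still touches exactly one example:

```python
import numpy as np

# Hypothetical setup: linear model, squared error.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))    # 50 examples, 3 features
t = rng.normal(size=50)
w = rng.normal(size=3)

# Brian duplicates every example and reorders; order is irrelevant below.
X2 = np.vstack([X, X])
t2 = np.concatenate([t, t])

def full_batch_grad(X, t, w):
    # Touches every example, so on (X2, t2) it does twice the work per step.
    return X.T @ (X @ w - t) / len(t)

# The averaged gradient is identical, so full-batch learning takes the same
# steps, but each step now costs twice as much computer time.
assert np.allclose(full_batch_grad(X, t, w), full_batch_grad(X2, t2, w))

# An online step looks at one randomly picked example: same cost as before,
# and its expected direction over the random pick is also unchanged.
i = rng.integers(len(t2))
online_grad = (X2[i] @ w - t2[i]) * X2[i]
```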
4. Question 4
Consider a linear output unit versus a logistic output unit for a feed-forward network with no hidden layer. The network has a set of inputs $x$ and an output neuron $y$ connected to the inputs by weights $w$ and a bias $b$.
We're using the squared error cost function even though the task that we care about, in the end, is binary classification. At training time, the target output values are $1$ (for one class) and $0$ (for the other class). At test time we will use the classifier to make decisions in the standard way: the class of an input $x$ according to our model after training is as follows:
$\text{class of } x = \begin{cases} 1 & \text{if } w^\top x + b \geq 0 \\ 0 & \text{otherwise} \end{cases}$
Note that we will be training the network using $y$, but that the decision rule shown above will be the same at test time, regardless of the type of output neuron we use for training.
Which of the following statements is true?
Unlike a linear unit, a logistic unit will not penalize us for getting things right too confidently.
If the target is 1 and the raw score $w^\top x + b$ is 100, the logistic unit will squash this down to a number very close to 1, so we will not incur a high cost. With a linear unit, the prediction itself is 100, the difference between prediction and target is very large, and we will incur a high cost as a result, despite the fact that we get the classification decision correct.
The error function (the error as a function of the weights) for both types of units will form a quadratic bowl.
At the solution that minimizes the error, the learned weights are always the same for both types of units; they only differ in how they get to this solution.
For a logistic unit, the derivatives of the error function with respect to the weights can have unbounded magnitude, while for a linear unit they will have bounded magnitude.
This cannot be true. The derivative of the squared error with respect to the weights when using a linear unit depends on the distance between the prediction and the target. In other words: the further the prediction from the target, the larger the magnitude of the gradient. The prediction can be arbitrarily bad.
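A quick sketch (illustration only, using the conventional $\frac{1}{2}$ factor on the squared error) of the point about confident correct answers: with target 1 and a very confident raw score $z = 100$, the logistic unit's cost is essentially zero while the linear unit's is enormous, even though both classify correctly:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

t = 1.0      # target class
z = 100.0    # very confident raw score w.x + b

linear_cost = 0.5 * (z - t) ** 2             # ~4900.5: heavily penalized
logistic_cost = 0.5 * (sigmoid(z) - t) ** 2  # ~0: squashed next to the target
print(linear_cost, logistic_cost)
```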
5. Question 5
Consider a neural network with one layer of logistic hidden units (intended to be fully connected to the input units) and a linear output unit. Suppose there are $n$ input units and $m$ hidden units. Which of the following statements are true? Check all that apply.
As long as $m \geq 1$, this network can learn to compute any function that can be learned by a network without any hidden layers (with the same inputs).
If the weights into the hidden layer are very small, and the weights out of it are large (to compensate), then each hidden unit operates in the nearly linear region of the logistic and behaves like a linear unit, so the network can emulate anything the no-hidden-layer network computes.
Any function that can be learned by such a network can also be learned by a network without any hidden layers (with the same inputs).
If $m > n$, this network can learn more functions than if $m$ is less than $n$ (with $n$ being the same).
A network with $m > n$ has more learnable parameters than a network without any hidden layers (with the same inputs).
The bulk of the learnable parameters is in the connections from the input units to the hidden units; there are $m \cdot n$ learnable parameters there.
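A small counting sketch (ignoring biases, as the rest of this quiz does) comparing the two architectures:

```python
# n input units, m logistic hidden units, one linear output unit.
def params_with_hidden(n, m):
    return n * m + m   # input->hidden weights plus hidden->output weights

def params_no_hidden(n):
    return n           # direct input->output weights

n, m = 10, 20
print(params_with_hidden(n, m), params_no_hidden(n))  # 220 vs 10
# params_with_hidden(n, m) > params_no_hidden(n) for any m >= 1,
# and the n*m term dominates as m grows.
```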
6. Question 6
Brian wants to make his feed-forward network (with no hidden units) using a linear output neuron more powerful. He decides to combine the predictions of two networks by averaging them. The first network has weights $w_1$ and the second network has weights $w_2$. The predictions of this combination for an example $x$ are therefore:
$y = \frac{1}{2}w_1^\top x + \frac{1}{2}w_2^\top x$
Can we get the exact same predictions as this combination of networks by using a single feed-forward network (again with no hidden units) using a linear output neuron and weights $w_3 = \frac{1}{2}(w_1 + w_2)$?
Yes
No
Question 6 (variant)
Brian wants to make his feed-forward network (with no hidden units) using a logistic output neuron more powerful. He decides to combine the predictions of two networks by averaging them. The first network has weights $w_1$ and the second network has weights $w_2$. The predictions of this combination for an example $x$ are therefore:
$y = \frac{1}{2}\frac{1}{1+e^{-z_1}} + \frac{1}{2}\frac{1}{1+e^{-z_2}}$ with $z_1 = w_1^\top x$ and $z_2 = w_2^\top x$.
Can we get the exact same predictions as this combination of networks by using a single feed-forward network (again with no hidden units) using a logistic output neuron and weights $w_3 = \frac{1}{2}(w_1 + w_2)$?
Yes
No
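Both variants can be checked numerically. In the sketch below (illustration only), averaging two linear networks equals a single linear network with averaged weights for every input, while for logistic networks the average of the two sigmoids generally differs from the sigmoid of the averaged score:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
x = rng.normal(size=4)
w1, w2 = rng.normal(size=4), rng.normal(size=4)
w3 = 0.5 * (w1 + w2)

# Linear case: exact equality for every x, by linearity of the dot product.
assert np.isclose(0.5 * (w1 @ x) + 0.5 * (w2 @ x), w3 @ x)

# Logistic case: the sigmoid is nonlinear, so averaging the two outputs is
# not the same as squashing the averaged score.
print(0.5 * sigmoid(w1 @ x) + 0.5 * sigmoid(w2 @ x))
print(sigmoid(w3 @ x))   # generally a different number
```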