Lecture 4 - exercises in the tutorials

For the 24 people involved, the local (one-hot) encoding is a sparse 24-dimensional vector with all components zero except one. E.g.

Colin \equiv (1,0,0,0,0,\ldots,0), Charlotte \equiv (0,0,1,0,0,\ldots,0), Victoria \equiv (0,0,0,0,1,\ldots,0)

and so on.

Why don't we use a more succinct encoding, like the one computers use for representing numbers in binary?

Colin \equiv (0,0,0,0,1), Charlotte \equiv (0,0,0,1,1), Victoria \equiv (0,0,1,0,1)

etc., even though this encoding uses only 5-dimensional vectors as opposed to 24-dimensional ones (both encodings are sketched in code after this question).

Check all that apply.

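A minimal sketch of the two encodings, assuming the 1-based person numbers implied by the examples above (Colin = 1, Charlotte = 3, Victoria = 5); the helper names are my own, not from the course:

```python
import numpy as np

# Illustrative person numbers (1-based), matching the examples above.
people = {"Colin": 1, "Charlotte": 3, "Victoria": 5}
NUM_PEOPLE = 24

def local_encoding(person_number, size=NUM_PEOPLE):
    """Local (one-hot) encoding: a sparse vector with a single 1."""
    v = np.zeros(size)
    v[person_number - 1] = 1.0
    return v

def binary_encoding(person_number, bits=5):
    """Dense binary encoding: 5 bits are enough for 24 people (2**5 = 32)."""
    return np.array([(person_number >> b) & 1 for b in reversed(range(bits))],
                    dtype=float)

print(local_encoding(people["Colin"]))       # (1, 0, 0, ..., 0)
print(binary_encoding(people["Charlotte"]))  # (0, 0, 0, 1, 1)
```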


In what ways is the task of predicting 'B' given 'A R' (i.e., person B given person A and relationship R) different from predicting a class label given the pixels of an image? Check all that apply.

For the squared-error cost E = \frac{1}{2}(y - t)^2, where y = \sigma(z) = \frac{1}{1 + \exp(-z)}, the derivatives tend to "plateau out" when y is close to 0 or 1 (a numerical illustration follows this question).

Which of the following statements are true?
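To see the plateau concretely, here is a small numerical sketch (my own illustration, not part of the quiz) that evaluates dE/dz = (y - t)\,y\,(1 - y) for a few inputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dE_dz(z, t):
    """Gradient of E = 0.5*(y - t)^2 with y = sigmoid(z):
    dE/dz = (y - t) * y * (1 - y)."""
    y = sigmoid(z)
    return (y - t) * y * (1.0 - y)

# Even when the prediction is badly wrong (target t = 1 but y close to 0),
# the gradient is tiny, because the y*(1 - y) factor vanishes as y -> 0 or 1.
for z in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"z = {z:6.1f}  y = {sigmoid(z):.5f}  dE/dz = {dE_dz(z, t=1.0):.6f}")
```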

If \mathbf{z} = (z_1, z_2, \ldots, z_k) is the input to a k-way softmax unit, the output distribution is \mathbf{y} = (y_1, y_2, \ldots, y_k), where

y_i = \dfrac{\exp(z_i)}{\sum_j \exp(z_j)}

Which of the following statements are true?
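As a sanity check on the softmax definition above, a minimal sketch (the max-subtraction is a standard numerical-stability trick and does not change the output):

```python
import numpy as np

def softmax(z):
    """k-way softmax: y_i = exp(z_i) / sum_j exp(z_j).
    Subtracting max(z) avoids overflow without changing the result."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])
y = softmax(z)
print(y)        # ~[0.090, 0.245, 0.665]
print(y.sum())  # 1.0 -- the outputs form a probability distribution
```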

Consider the following two networks with no bias weights. The network on the left takes 3 n-length word vectors corresponding to the previous 3 words, computes 3 d-length individual word-feature embeddings and then a k-length joint hidden layer, which it uses to predict the 4th word. The network on the right is comparatively simpler: it takes the previous 3 words and uses them directly to predict the 4th word.

If n = 100,000, d = 1,000 and k = 10,000, which network has more parameters?

The network on the right. Counting weights, the left network needs about 3nd + 3dk + kn = 3\times10^8 + 3\times10^7 + 10^9 \approx 1.3\times10^9, while the right network connects 3 n-length word vectors directly to an n-way output, i.e. 3n^2 = 3\times10^{10} weights (see the sketch below).
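A minimal parameter count, assuming non-shared embedding matrices and an n-way output layer for both networks (these architectural details are my reading of the question, not stated explicitly):

```python
# Weight counts for the two language-model architectures described above
# (no bias weights, as stated in the question).
n = 100_000  # vocabulary size (length of each word vector)
d = 1_000    # word-feature embedding size
k = 10_000   # joint hidden layer size

# Left network: 3 separate n->d embeddings, a 3d->k hidden layer, a k->n output.
left = 3 * n * d + 3 * d * k + k * n

# Right network: 3 n-length word vectors connected directly to an n-way output.
right = 3 * n * n

print(f"left  = {left:,}")   # 1,330,000,000
print(f"right = {right:,}")  # 30,000,000,000
```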







Reposted from blog.csdn.net/sophiecxt/article/details/80670496