TensorFlow negative sampling (sampled softmax loss): a tour of the pitfalls

Disclaimer: This article is original, All Rights Reserved https://blog.csdn.net/weixin_41864878/article/details/89032038

Google's 2016 paper "Deep Neural Networks for YouTube Recommendations" proposed using negative sampling to treat recommendation as an extreme multiclass classification task.
My TensorFlow implementation is uploaded to CSDN resources: https://download.csdn.net/download/weixin_41864878/11107472
TensorFlow offers two negative-sampling losses, NCE loss and sampled softmax loss. The biggest difference between them is the task they are meant for; in the code, the two differ only in the final loss function, while the sampling and the way the logits are computed are the same.
NCE loss

  sampled_losses = sigmoid_cross_entropy_with_logits(
      labels=labels, logits=logits, name="sampled_losses")

Sampled softmax loss

  labels = array_ops.stop_gradient(labels, name="labels_stop_gradient")
  sampled_losses = nn_ops.softmax_cross_entropy_with_logits_v2(
      labels=labels, logits=logits)

Clearly, NCE loss can handle multi-label targets and is used more in NLP tasks, while the latter is for single-label classification tasks.
In my experiments I used the latter.
The sampled softmax paper: "On Using Very Large Target Vocabulary for Neural Machine Translation".
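For reference, a minimal sketch with toy shapes of my own (not taken from the official code), showing that the two losses share the same call signature and only differ in the final loss applied to the sampled logits:

  import math
  import tensorflow as tf

  num_classes, embedding_size, num_sampled = 10000, 128, 64
  weights = tf.Variable(tf.truncated_normal([num_classes, embedding_size],
                        stddev=1.0 / math.sqrt(embedding_size)))
  biases = tf.Variable(tf.zeros([num_classes]))
  inputs = tf.placeholder(tf.float32, [None, embedding_size])  # embedding before the softmax output
  labels = tf.placeholder(tf.int64, [None, 1])                 # shape [batch_size, num_true]

  nce = tf.reduce_mean(tf.nn.nce_loss(
      weights, biases, labels, inputs, num_sampled, num_classes))
  sampled = tf.reduce_mean(tf.nn.sampled_softmax_loss(
      weights, biases, labels, inputs, num_sampled, num_classes))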
One thing to emphasize here: TF provides four candidate sampling methods:

The samplers and when to use them:
1. log_uniform_candidate_sampler: can only be used when label frequency is inversely proportional to the label's rank (a Zipfian distribution), so the data needs to be cleaned and the labels re-mapped accordingly
2. learned_unigram_candidate_sampler: applicable when nothing is known about the label distribution
3. uniform_candidate_sampler: uniform sampling
4. fixed_unigram_candidate_sampler: lets the user specify the sampling probabilities

The code defaults to the first one; if you want to change it you have to wire it up yourself, for example along the lines of the sketch below.
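As a hedged sketch of what that could look like (reusing the toy tensors from the snippet above and the TF 1.x candidate sampler API; adapt the names to your own code), a candidate sampler is called explicitly and its output is passed in through sampled_values, e.g. for uniform sampling:

  sampled_values = tf.nn.uniform_candidate_sampler(
      true_classes=labels,      # int64, shape [batch_size, num_true]
      num_true=1,
      num_sampled=num_sampled,
      unique=True,
      range_max=num_classes)

  loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(
      weights, biases, labels, inputs, num_sampled, num_classes,
      sampled_values=sampled_values))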
Following the official guide's example code, I put together the following (for the specific parameters, please see the official code ~; the names below are placeholders standing in for your own network):

def net_factory():
    inputs = ...                # network inputs (placeholder)
    net = build_net(inputs)     # network body; note: `net` is the embedding before the softmax output
    output = tf.nn.softmax(...) # the usual softmax output, used at inference time
    ##
    weights = tf.Variable(tf.truncated_normal([num_class, embedding_size],
                          stddev=1.0 / math.sqrt(embedding_size)))
    biases = tf.Variable(tf.zeros([num_class]))
    loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(weights, biases, train_labels, net,
                          num_sampled, num_class))

The first pitfall: a device placement error, because this op cannot be placed on the GPU and has to be computed on the CPU (at least it would not run on the GPU for me; please correct me if I have misunderstood).
So either write it into the session config:

config = tf.ConfigProto(allow_soft_placement=True)
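The config is then handed to the session when it is created, e.g.:

  sess = tf.Session(config=config)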

Or, in the network definition, wrap the loss calculation in a with tf.device('/cpu:0') block.
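A minimal sketch of that variant (the variable names are the ones from the snippet above):

  with tf.device('/cpu:0'):
      loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(weights, biases, train_labels, net,
                            num_sampled, num_class))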
The second pitfall: the gradient could not be computed with tf.train.GradientDescentOptimizer(learning_rate).compute_gradients(loss); the returned gradient was None, even though tf.train.GradientDescentOptimizer(learning_rate).minimize(loss) worked fine. Since I also have a regularization loss to compute, I had to fix this, so after looking around I modified the code as follows:
(the self. prefixes are copied straight from my own code; too lazy to strip them..)

    self.weights = tf.get_variable('soft_weight', [self.item_classes, self.embedding_size],
                                   initializer=tf.variance_scaling_initializer())
    biases = tf.get_variable('soft_biases', initializer=tf.zeros([self.item_classes]),
                             trainable=False)
    loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(self.weights, biases, train_labels, net,
                          num_sampled, num_class))

With this, the loss can happily be added to the other losses and have its gradients computed; a sketch follows below. The open-source projects I have seen on GitHub basically all use the first style, which is fine if your program only has a single loss in the usual format.
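A sketch of what that looks like, under the assumption that reg_loss is a regularization term defined elsewhere in your graph:

  total_loss = loss + reg_loss                                # sampled softmax loss + regularization
  optimizer = tf.train.GradientDescentOptimizer(learning_rate)
  grads_and_vars = optimizer.compute_gradients(total_loss)    # gradients are no longer None
  train_op = optimizer.apply_gradients(grads_and_vars)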

Effect of the number of negative samples: so far it has essentially no influence on how the loss is computed, but the loss values change noticeably between setting it to 100 and to 10; with 10 samples the loss drops quickly.

Reference:
TensorFlow source: https://github.com/tensorflow/tensorflow/blob/r1.10/tensorflow/python/ops/nn_impl.py
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/candidate_sampling_ops.py
Official guide demo: https://docs.pythontab.com/tensorflow/tutorials/word2vec/#_3
http://www.algorithmdog.com/tf-candidate-sampling
https://blog.csdn.net/u010223750/article/details/69948463
https://blog.csdn.net/wuzqChom/article/details/77073246
