On a Certain Ad That Claims to "Reproduce XLNet"

At a certain training school (the "Greedy" one), the most common ad is the so-called reproduce-XLNet pitch. The premise: if you can't hand-write XLNet within a one-hour interview, you lack even basic competence. In other words, no company would take you even if you paid them.

This ad has done real damage. And never mind that the course it funnels you toward is of no help whatsoever for this task; it is really just elementary optimization theory.

Let's start with why reproducing XLNet in an hour is impossible. Look at the XLNet source code, beginning with the main pretraining script (train.py, from which the flags below are quoted). See for yourself how long it is; as I recall, it runs to forty pages when printed. I don't believe even the instructor could type forty pages in an hour, never mind reproduce them from scratch.

An even more fundamental problem: most top-conference papers never publish all the details. Take this block of code, for example:

flags.DEFINE_string("master", default=None,
      help="master")
flags.DEFINE_string("tpu", default=None,
      help="The Cloud TPU to use for training. This should be either the name "
      "used when creating the Cloud TPU, or a grpc://ip.address.of.tpu:8470 url.")
flags.DEFINE_string("gcp_project", default=None,
      help="Project name for the Cloud TPU-enabled project. If not specified, "
      "we will attempt to automatically detect the GCE project from metadata.")
flags.DEFINE_string("tpu_zone",default=None,
      help="GCE zone where the Cloud TPU is located in. If not specified, we "
      "will attempt to automatically detect the GCE project from metadata.")
flags.DEFINE_bool("use_tpu", default=True,
      help="Use TPUs rather than plain CPUs.")
flags.DEFINE_integer("num_hosts", default=1,
      help="number of TPU hosts")
flags.DEFINE_integer("num_core_per_host", default=8,
      help="number of cores per host")
flags.DEFINE_bool("track_mean", default=False,
      help="Whether to track mean loss.")

# Experiment (data/checkpoint/directory) config
flags.DEFINE_integer("num_passes", default=1,
      help="Number of passed used for training.")
flags.DEFINE_string("record_info_dir", default=None,
      help="Path to local directory containing `record_info-lm.json`.")
flags.DEFINE_string("model_dir", default=None,
      help="Estimator model_dir.")
flags.DEFINE_string("init_checkpoint", default=None,
      help="Checkpoint path for initializing the model.")

# Optimization config
flags.DEFINE_float("learning_rate", default=1e-4,
      help="Maximum learning rate.")
flags.DEFINE_float("clip", default=1.0,
      help="Gradient clipping value.")
# lr decay
flags.DEFINE_float("min_lr_ratio", default=0.001,
      help="Minimum ratio learning rate.")
flags.DEFINE_integer("warmup_steps", default=0,
      help="Number of steps for linear lr warmup.")
flags.DEFINE_float("adam_epsilon", default=1e-8,
      help="Adam epsilon.")
flags.DEFINE_string("decay_method", default="poly",
      help="Poly or cos.")
flags.DEFINE_float("weight_decay", default=0.0,
      help="Weight decay rate.")

# Training config
flags.DEFINE_integer("train_batch_size", default=16,
      help="Size of the train batch across all hosts.")
flags.DEFINE_integer("train_steps", default=100000,
      help="Total number of training steps.")
flags.DEFINE_integer("iterations", default=1000,
      help="Number of iterations per repeat loop.")
flags.DEFINE_integer("save_steps", default=None,
      help="Number of steps for model checkpointing. "
      "None for not saving checkpoints")
flags.DEFINE_integer("max_save", default=100000,
      help="Maximum number of checkpoints to save.")

# Data config
flags.DEFINE_integer("seq_len", default=0,
      help="Sequence length for pretraining.")
flags.DEFINE_integer("reuse_len", default=0,
      help="How many tokens to be reused in the next batch. "
      "Could be half of `seq_len`.")
flags.DEFINE_bool("uncased", False,
      help="Use uncased inputs or not.")
flags.DEFINE_integer("perm_size", 0,
      help="Window size of permutation.")
flags.DEFINE_bool("bi_data", default=True,
      help="Use bidirectional data streams, i.e., forward & backward.")
flags.DEFINE_integer("mask_alpha", default=6,
      help="How many tokens to form a group.")
flags.DEFINE_integer("mask_beta", default=1,
      help="How many tokens to mask within each group.")
flags.DEFINE_integer("num_predict", default=None,
      help="Number of tokens to predict in partial prediction.")
flags.DEFINE_integer("n_token", 32000, help="Vocab size")

# Model config
flags.DEFINE_integer("mem_len", default=0,
      help="Number of steps to cache")
flags.DEFINE_bool("same_length", default=False,
      help="Same length attention")
flags.DEFINE_integer("clamp_len", default=-1,
      help="Clamp length")

flags.DEFINE_integer("n_layer", default=6,
      help="Number of layers.")
flags.DEFINE_integer("d_model", default=32,
      help="Dimension of the model.")
flags.DEFINE_integer("d_embed", default=32,
      help="Dimension of the embeddings.")
flags.DEFINE_integer("n_head", default=4,
      help="Number of attention heads.")
flags.DEFINE_integer("d_head", default=8,
      help="Dimension of each attention head.")
flags.DEFINE_integer("d_inner", default=32,
      help="Dimension of inner hidden size in positionwise feed-forward.")
flags.DEFINE_float("dropout", default=0.0,
      help="Dropout rate.")
flags.DEFINE_float("dropatt", default=0.0,
      help="Attention dropout rate.")
flags.DEFINE_bool("untie_r", default=False,
      help="Untie r_w_bias and r_r_bias")
flags.DEFINE_string("summary_type", default="last",
      help="Method used to summarize a sequence into a compact vector.")
flags.DEFINE_string("ff_activation", default="relu",
      help="Activation type used in position-wise feed-forward.")
flags.DEFINE_bool("use_bfloat16", False,
      help="Whether to use bfloat16.")

These flags are the core of pretraining. Some are easy to understand; others are barely mentioned in the paper. I have used them myself when training XLNet (yes, I have trained it on a TPU pod), and these parameters are critically important. If you don't know how to tune them, whatever you train will be useless.
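To make that concrete, here is a minimal sketch of the kind of sanity check you end up writing for yourself. It is not from the XLNet repo: the function name, the constraints, and the example values are my own illustration, pieced together from the help strings above and the configuration released with the paper.

# Hypothetical sanity check for the pretraining flags above -- NOT from the
# official repo. The constraints encode what the help strings and the released
# configs imply, not anything the paper states explicitly.
def check_pretrain_flags(seq_len, reuse_len, perm_size,
                         d_model, n_head, d_head,
                         mask_alpha, mask_beta, num_predict):
    # The help string says reuse_len "could be half of `seq_len`";
    # the released config does exactly that (512 -> 256).
    assert reuse_len == seq_len // 2, "reuse_len is typically seq_len / 2"
    # The permutation window has to fit inside the reused segment.
    assert perm_size <= reuse_len, "perm_size cannot exceed reuse_len"
    # Conventionally the attention heads tile the model dimension exactly
    # (true of both the debug defaults above, 4 * 8 = 32, and XLNet-Large).
    assert d_model == n_head * d_head, "expect d_model == n_head * d_head"
    # mask_alpha tokens form a group and mask_beta of them are masked, so the
    # number of prediction targets is about seq_len * mask_beta / mask_alpha.
    expected = seq_len * mask_beta // mask_alpha
    assert abs(num_predict - expected) <= 1, (
        "num_predict=%d, but the masking ratio implies ~%d"
        % (num_predict, expected))

# XLNet-Large-style values: 512 // 2 = 256, 16 * 64 = 1024,
# and 512 * 1 // 6 = 85 prediction targets per sequence.
check_pretrain_flags(seq_len=512, reuse_len=256, perm_size=256,
                     d_model=1024, n_head=16, d_head=64,
                     mask_alpha=6, mask_beta=1, num_predict=85)

None of these relationships are spelled out in the paper; you learn them from the source code and from burning TPU hours. That is exactly the kind of detail no one can "reproduce" at a whiteboard in an hour.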

One last question: how many companies in China could even afford to train XLNet? Raise your hands.

I am calling this out because I don't want this kind of anxiety-manufacturing to go on. I have never taken 贪心学院 (Greedy AI)'s courses, so I won't comment on them. But this ad copy is repulsive and cheap. Please stop running it.

Reposted from blog.csdn.net/weixin_42812353/article/details/112058367