Translation | Improving Distributional Similarity with Lessons Learned from Word Embeddings

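Teacher Ye Na (叶娜) says: "The best way to understand a paper is to translate it." I think this is good research training, and it is especially suited to exploring an unfamiliar field. In my experience, when I cannot follow a paper, it is almost always because I do not know the field. In a field I know well, reading goes much more smoothly.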

Original Text

Abstract

[1] Recent trends suggest that neural-network-inspired word embedding models outperform traditional count-based distributional models on word similarity and analogy detection tasks.

[2] We reveal that much of the performance gains of word embeddings are due to certain system design choices and hyper-parameter optimizations, rather than the embedding algorithms themselves.

[3] Furthermore, we show that these modifications can be transferred to traditional distributional models, yielding similar gains.

[4] In contrast to prior reports, we observe mostly local or insignificant performance differences between the methods, with no global advantage to any single approach over the others.

Conclusion

[1] Recent embedding methods introduce a plethora of design choices beyond network architecture and optimization algorithms.

[2] We reveal that these seemingly minor variations can have a large impact on the success of word representation methods.

[3] By showing how to adapt and tune these hyper-parameters in traditional methods, we allow a proper comparison between representations, and challenge various claims of superiority from the word embedding literature.

(The second paragraph of the conclusion begins here.)

[4] This study also exposes the need for more controlled-variable experiments, and extending the concept of “variable” from the obvious task, data, and method to the often ignored preprocessing steps and hyper-parameter settings.

[5] We also stress the need for transparent and reproducible experiments, and commend authors such as Mikolov, Pennington, and others for making their code publicly available.

[6] In this spirit, we make our code available as well.

Translation (Chinese)

Abstract

[1] 最近的趋势表明，受神经网络启发的词嵌入模型在词语相似度和类比检测任务上优于传统的基于计数的分布模型。

[2] 我们发现，词嵌入的性能提升在很大程度上源于某些系统设计选择和超参数优化，而非词嵌入算法本身。

[3] 此外，我们还表明，这些改动可以迁移到传统的分布模型中，并带来类似的提升。

[4] 与先前的报告不同，我们观察到各方法之间的性能差异大多是局部的或不显著的，没有任何一种方法相对于其他方法具有全局优势。

Conclusion

[1] 最近的嵌入方法在网络结构和优化算法之外，引入了大量的设计选择。

[2] 我们发现，这些看似微小的变化可能会对词表示方法的效果产生很大的影响。

[3] 通过展示如何在传统方法中适配并调整这些超参数，我们使不同表示方法之间的恰当比较成为可能，并对词嵌入文献中各种关于优越性的主张提出了质疑。

[4] 这项研究还揭示了开展更多控制变量实验的必要性，并需要将“变量”的概念从显而易见的任务、数据和方法，扩展到经常被忽略的预处理步骤和超参数设置。

[5] 我们还强调需要透明且可复现的实验，并赞扬 Mikolov、Pennington 等作者公开其代码。

[6] 本着这种精神，我们也公开了自己的代码。

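Reflections

This paper is a comparative study. It argues that the gains attributed to neural-network-based word representation methods come largely from system design choices and hyper-parameter settings, rather than from the embedding algorithms themselves.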
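To make "transferring hyper-parameters to a traditional model" concrete for myself, here is a minimal sketch of a count-based PPMI matrix with two knobs borrowed from the embedding side, as I understand the paper: context distribution smoothing (raising context counts to a power such as 0.75) and a shift by log k (mirroring the number of negative samples). The function names `ppmi_matrix` and `cosine`, the parameter names `cds` and `shift_k`, and the toy counts are my own assumptions for illustration; this is not the authors' released code.

```python
import numpy as np


def ppmi_matrix(counts, cds=0.75, shift_k=1):
    """Build a (shifted) positive PMI matrix from co-occurrence counts.

    counts  : 2-D array, counts[w, c] = times word w appears with context c
    cds     : context distribution smoothing exponent (1.0 disables it)
    shift_k : subtract log(shift_k) from every PMI value (1 disables it)
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()

    p_wc = counts / total                              # joint probability
    p_w = counts.sum(axis=1, keepdims=True) / total    # word marginal
    ctx = counts.sum(axis=0, keepdims=True) ** cds     # smoothed context counts
    p_c = ctx / ctx.sum()                              # smoothed context marginal

    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi -= np.log(shift_k)

    # Zero counts give -inf (or nan for empty rows); treat them as 0.
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi, 0.0)                        # keep positive part only


def cosine(u, v):
    """Cosine similarity between two word vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy counts: 3 words (rows) x 3 contexts (columns).
    toy_counts = [[4, 0, 1],
                  [3, 0, 2],
                  [0, 5, 1]]
    vectors = ppmi_matrix(toy_counts, cds=0.75, shift_k=1)
    print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Changing `cds` or `shift_k` changes the similarity scores without touching the "algorithm" at all, which is exactly the kind of variable the paper asks us to control when comparing methods.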
