Notes on "Document Classification by Inversion of Distributed Language Representations"



范涛

Published 2017-04-07

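I shared some notes on word2vec earlier, and here I want to bring up this paper, which was published at ACL 2015. While using word2vec I kept wondering: how should word vectors be used in a classification model? One option is to represent a document as a weighted average of its word vectors, or to build a document vector directly with paragraph2vec, and then feed that vector to a classifier such as LR or GBDT. But this leaves some problems: (1) word2vec ignores the order of words in the document; (2) when there are few labeled samples, vectors learned directly by paragraph2vec on those samples may not be reliable. So when labeled samples are scarce, how can we make use of a word vector model that has already been trained on a large corpus?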
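To make the averaging baseline concrete, here is a minimal sketch in plain Python. The embeddings in `TOY_VECTORS` are made-up numbers, and `doc_vector` is a hypothetical helper name, not something from the paper or from any library. It also shows problem (1): two documents with the same words in a different order get the same vector.

```python
from typing import Dict, List, Optional

# Toy 3-dimensional embeddings; the values are invented for illustration only.
TOY_VECTORS: Dict[str, List[float]] = {
    "market": [0.2, 0.1, 0.7],
    "rises": [0.9, 0.3, 0.1],
    "falls": [0.8, 0.4, 0.1],
}


def doc_vector(
    tokens: List[str],
    vectors: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted average of the word vectors of the known tokens.

    `weights` could hold e.g. TF-IDF values; missing weights default to 1.0.
    Unknown tokens are skipped. Returns a zero vector if nothing is known.
    """
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in vectors:
            continue
        w = 1.0 if weights is None else weights.get(tok, 1.0)
        for i, x in enumerate(vectors[tok]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


# Same words, different order: the averaged vectors are identical.
a = doc_vector(["market", "rises"], TOY_VECTORS)
b = doc_vector(["rises", "market"], TOY_VECTORS)
print(a == b)
```

The resulting vector would then be passed as features to LR, GBDT, or another classifier; that step is left out here.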

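The idea of this paper is to start from a general-purpose word vector model and incrementally update it with the samples of each class separately, so that every class ends up with its own word vector model. Bayes' rule is then used to obtain the final classification. The paper mentions applying this method to sentiment analysis. Speaking of sentiment analysis, I want to say a bit more. Word vector models trained on general corpora are hard to apply directly to sentiment analysis, for example sentiment analysis of investment market text. The words "上涨" (rise) and "下跌" (fall) may have very similar vectors in a general corpus, yet the two clearly carry opposite sentiment. So how do we solve this? Given that the two words are close in a general-corpus model, would they still be so similar in separate word vector models trained on corpora of different sentiment classes?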
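The Bayes step can be sketched as follows. This is only an illustration under stated assumptions: it assumes each class-specific model has already produced a log-likelihood for the document, and the numbers in `scores`, the priors, and the function name `class_posteriors` are all hypothetical. How the per-class log-likelihood is computed from the word vector models is not shown.

```python
import math
from typing import Dict


def class_posteriors(
    log_likelihoods: Dict[str, float], priors: Dict[str, float]
) -> Dict[str, float]:
    """Apply Bayes' rule: p(c | d) is proportional to p(d | c) * p(c).

    Works in log space and uses log-sum-exp so that very negative
    log-likelihoods do not underflow to zero.
    """
    joint = {c: log_likelihoods[c] + math.log(priors[c]) for c in log_likelihoods}
    m = max(joint.values())
    log_norm = m + math.log(sum(math.exp(v - m) for v in joint.values()))
    return {c: math.exp(v - log_norm) for c, v in joint.items()}


# Hypothetical log-likelihoods of one document under two class-specific models.
scores = {"positive": -42.0, "negative": -45.0}
priors = {"positive": 0.5, "negative": 0.5}

post = class_posteriors(scores, priors)
predicted = max(post, key=post.get)
print(predicted, post)
```

The document is assigned to the class with the highest posterior. With equal priors this reduces to picking the class whose model gives the document the highest likelihood.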

Reposted from blog.csdn.net/hero_fantao/article/details/69661377
