[Paper Reading] "Delta TFIDF: An Improved Feature Space for Sentiment Analysis" (Paper and Experiments)

Category: Other · Posted: 2018-12-28 01:34:47

Copyright notice: This is an original article by the author and may not be reproduced without the author's permission. https://blog.csdn.net/u014568072/article/details/80174724

Delta TFIDF

The paper proposes a method for weighting words before text classification and uses an SVM to perform sentiment analysis on three datasets.

Method

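In the bag-of-words model, each word or n-gram is associated with a value, typically its count in the document. Sometimes these values are further weighted by statistical properties of the corresponding word in the document. Instead, this method weights each value by how differently the word is distributed across the positive and negative corpora.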

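The method assigns feature values to a document by taking the difference between a word's TFIDF scores in the positive and negative training corpora.
Given: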

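$C_{t,d}$ is the number of times term $t$ occurs in document $d$.
$P_t$ is the number of documents in the positive training corpus that contain term $t$.
$|P|$ is the total number of documents in the positive training corpus.
$N_t$ is the number of documents in the negative training corpus that contain term $t$.
$|N|$ is the total number of documents in the negative training corpus.
$V_{t,d}$ is the feature value of term $t$ in document $d$.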

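The feature value of each term in the training set can then be written as:

$V_{t,d} = C_{t,d}\cdot\log_2\left(\frac{|P|}{P_t}\right) - C_{t,d}\cdot\log_2\left(\frac{|N|}{N_t}\right) \\ = C_{t,d}\cdot\log_2\left(\frac{|P|}{P_t}\cdot\frac{N_t}{|N|}\right) \\ = C_{t,d}\cdot\log_2\left(\frac{N_t}{P_t}\right)$

The last step holds only when the training set is balanced, that is, when $|P| = |N|$, so that the two corpus sizes cancel.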
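The formula can be sketched in a few lines of Python. This is only an illustration: the function names, the toy corpora, and the `smoothing` parameter are assumptions of this sketch and do not come from the paper. It uses the unsimplified form with $|P|$ and $|N|$, so it does not require a balanced training set.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set(): each document counts a term once
    return df


def delta_tfidf(doc_tokens, pos_docs, neg_docs, smoothing=1.0):
    """Return {term: V_{t,d}} for one tokenized document.

    `smoothing` is an assumption of this sketch (not part of the formula):
    it is added to P_t and N_t so that a term missing from one corpus
    does not cause a division by zero.
    """
    pos_df = document_frequencies(pos_docs)
    neg_df = document_frequencies(neg_docs)
    n_pos, n_neg = len(pos_docs), len(neg_docs)

    features = {}
    for term, count in Counter(doc_tokens).items():
        p_t = pos_df[term] + smoothing
        n_t = neg_df[term] + smoothing
        # C_{t,d} * log2(|P| / P_t) - C_{t,d} * log2(|N| / N_t)
        features[term] = count * (math.log2(n_pos / p_t) - math.log2(n_neg / n_t))
    return features


# Toy corpora, made up for illustration only.
pos_docs = [["good", "movie"], ["good", "plot"]]
neg_docs = [["bad", "movie"], ["bad", "plot"]]
features = delta_tfidf(["good", "good", "movie"], pos_docs, neg_docs)
print(features)
```

In this toy run, "movie" appears equally often in both corpora and gets a value of zero, while "good" appears only in positive documents and gets a negative value.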

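This transformation of term frequency boosts the importance of words that are unevenly distributed between the positive and negative samples and discounts words that are evenly distributed, which better captures how much each word matters for sentiment.
An evenly distributed word has a feature value of 0, and the more unevenly a word is distributed, the larger the magnitude of its value. Note the sign that follows from the formula when the corpora are balanced: a word that appears in more negative documents than positive ones ($N_t > P_t$) gets a positive score, and a word that appears in more positive documents ($P_t > N_t$) gets a negative score.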
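As a quick check of the sign convention, here are three cases under the simplified formula $V_{t,d} = C_{t,d}\cdot\log_2(N_t/P_t)$. The counts are made-up numbers chosen for easy arithmetic, not values from the paper or from the experiment below:

$P_t = 400,\ N_t = 100,\ C_{t,d} = 2:\quad V_{t,d} = 2\cdot\log_2(100/400) = 2\cdot(-2) = -4$
$P_t = 250,\ N_t = 250,\ C_{t,d} = 3:\quad V_{t,d} = 3\cdot\log_2(250/250) = 3\cdot 0 = 0$
$P_t = 50,\ N_t = 200,\ C_{t,d} = 1:\quad V_{t,d} = 1\cdot\log_2(200/50) = 1\cdot 2 = 2$

The term concentrated in positive documents gets a negative value, the evenly distributed term gets zero, and the term concentrated in negative documents gets a positive value.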

Experiments

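Following the method proposed in the paper, I ran an experiment on the IMDB dataset (a large movie review dataset containing 50k full-length reviews (Maas et al., 2011)). I used word2vec to obtain word vectors, weighted each word vector by its Delta TFIDF value, and summed the weighted vectors to form a feature vector for each document. A neural network was then used to classify the sentiment of each text.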
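The post gives no implementation details for the weighting step, so the following is only one plausible reading of it, written as a minimal sketch. The function name `document_vector`, the tiny hand-written `embeddings` table (standing in for trained word2vec vectors), and the `term_weights` values (standing in for $\log_2(N_t/P_t)$) are all hypothetical.

```python
def document_vector(tokens, embeddings, term_weights, dim):
    """Sum the word vectors of a document, each scaled by its term weight.

    Every occurrence of a token adds weight * vector, so the term count
    C_{t,d} enters through repeated tokens. Tokens without an embedding
    or without a weight are skipped.
    """
    vec = [0.0] * dim
    for tok in tokens:
        if tok in embeddings and tok in term_weights:
            weight = term_weights[tok]
            for i, x in enumerate(embeddings[tok]):
                vec[i] += weight * x
    return vec


# Hypothetical 2-dimensional embeddings and per-term weights.
embeddings = {
    "good": [1.0, 0.0],
    "bad": [0.0, 1.0],
    "movie": [0.5, 0.5],
}
term_weights = {"good": -2.0, "bad": 2.0, "movie": 0.0}

doc_vec = document_vector(["good", "good", "movie"], embeddings, term_weights, dim=2)
print(doc_vec)
```

Because "movie" has weight zero in this toy setup, it contributes nothing to the document vector; only the sentiment-bearing term "good" does.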

method	precision	recall	f1-score
tfidf	0.784	0.784	0.784
idf	0.825	0.825	0.825
Delta-tfidf	0.877	0.877	0.877

Delta-tfidf outperforms both tfidf and idf in this experiment: its F1 score is 0.877, compared with 0.784 for tfidf and 0.825 for idf.

[References]
Martineau J, Finin T. Delta TFIDF: An Improved Feature Space for Sentiment Analysis[C]// International Conference on Weblogs and Social Media, ICWSM 2009, San Jose, California, USA, May 2009.
