ICTIR 2016 Analysis of the Paragraph Vector Model for Information Retrieval

Posted 2018-10-31 10:32:08

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/yangliuy/article/details/52970190

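Summary: This paper extends the earlier SIGIR '16 work. It analyzes in more depth three problems that arise when the PV model is applied to IR tasks, and proposes a corresponding improvement for each of them.
Venue: ICTIR '16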

Abstract: Previous studies have shown that semantically meaningful representations of words and text can be acquired through neural embedding models. In particular, paragraph vector (PV) models have shown impressive performance in some natural language processing tasks by estimating a document (topic) level language model. Integrating the PV models with traditional language model approaches to retrieval, however, produces unstable performance and limited improvements. In this paper, we formally discuss three intrinsic problems of the original PV model that restrict its performance in retrieval tasks. We also describe modifications to the model that make it more suitable for the IR task, and show their impact through experiments and case studies. The three issues we address are (1) the unregulated training process of PV is vulnerable to short document overfitting that produces length bias in the final retrieval model; (2) the corpus-based negative sampling of PV leads to a weighting scheme for words that overly suppresses the importance of frequent words; and (3) the lack of word-context information makes PV unable to capture word substitution relationships.

Download link: https://ciir-publications.cs.umass.edu/pub/web/getpdf.php?id=1242
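To make issue (2) in the abstract a little more concrete, here is a minimal sketch of how the choice of noise distribution for negative sampling changes which words get pushed down. It is only an illustration under stated assumptions: the function name `noise_distributions`, the toy corpus, and the 0.75 smoothing exponent (the usual word2vec convention) are not taken from this post, and the document-frequency variant is shown purely as a point of comparison, not as a statement of the paper's exact method.

```python
from collections import Counter


def noise_distributions(docs, power=0.75):
    """Return two negative-sampling noise distributions over the vocabulary.

    cf-based: proportional to (corpus term frequency) ** power.
    df-based: proportional to (document frequency) ** power.
    Both the exponent and the df-based variant are illustrative assumptions.
    """
    # Corpus frequency: total number of occurrences of each word.
    cf = Counter(w for doc in docs for w in doc)
    # Document frequency: number of documents containing each word.
    df = Counter(w for doc in docs for w in set(doc))

    def normalize(counts):
        weights = {w: c ** power for w, c in counts.items()}
        total = sum(weights.values())
        return {w: v / total for w, v in weights.items()}

    return normalize(cf), normalize(df)


# Toy corpus (hypothetical): "retrieval" is frequent but concentrated in one
# document, while "model" is spread across all documents.
docs = [
    ["retrieval", "retrieval", "retrieval", "model"],
    ["model", "vector"],
    ["model", "paragraph"],
]

cf_noise, df_noise = noise_distributions(docs)
for word in sorted(cf_noise):
    print(f"{word:10s} cf-noise={cf_noise[word]:.3f} df-noise={df_noise[word]:.3f}")
```

In this toy corpus the corpus-frequency distribution samples "retrieval" as a negative exactly as often as "model", even though "retrieval" is concentrated in a single document. Counting documents instead of occurrences lowers its sampling probability, which is one way to avoid over-penalizing a word that is frequent but topically specific.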
