Information Retrieval notes

"A survey of research advances and practice in web information retrieval based on temporal semantics" (2018)
Temporal semantic retrieval rests on two key technologies:
1. Automatic extraction of temporal information
2. Construction of retrieval models based on temporal information

Tagging and labeling the extracted temporal information consists of three main tasks:
1. Extracting (recognizing) temporal expressions
2. Normalizing temporal expressions that appear in different surface forms
3. Annotating the temporal expressions with the normalized values
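
The three tasks above can be sketched roughly as follows. This is only a minimal illustration, assuming explicit dates in three English surface forms; the pattern names and the `tag_times` helper are made up for these notes and are not taken from the survey.

```python
import re
from datetime import date

MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}
_MONTH_ALT = "|".join(MONTHS)

# Task 1 (recognition): three illustrative surface forms of explicit dates.
ISO_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")                      # 2018-08-02
DMY_RE = re.compile(r"\b(\d{1,2}) (" + _MONTH_ALT + r") (\d{4})\b", re.IGNORECASE)   # 30 July 2018
MDY_RE = re.compile(r"\b(" + _MONTH_ALT + r") (\d{1,2}), (\d{4})\b", re.IGNORECASE)  # August 5, 2018


def _iso(year, month, day):
    """Task 2 (normalization): map to ISO 8601, or None if the date is invalid."""
    try:
        return date(year, month, day).isoformat()
    except ValueError:
        return None


def tag_times(text):
    """Task 3 (annotation): return sorted (offset, surface form, normalized value)."""
    tags = []
    for m in ISO_RE.finditer(text):
        tags.append((m.start(), m.group(0),
                     _iso(int(m.group(1)), int(m.group(2)), int(m.group(3)))))
    for m in DMY_RE.finditer(text):
        tags.append((m.start(), m.group(0),
                     _iso(int(m.group(3)), MONTHS[m.group(2).lower()], int(m.group(1)))))
    for m in MDY_RE.finditer(text):
        tags.append((m.start(), m.group(0),
                     _iso(int(m.group(3)), MONTHS[m.group(1).lower()], int(m.group(2)))))
    return sorted(t for t in tags if t[2] is not None)
```

A real tagger would cover far more forms; the point is only that recognition, normalization and annotation are separate steps.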

Computing Similarity(q, Di): common retrieval models include:
1. The vector space model
2. The probabilistic model
3. Language-model-based ranking
4. Neural network models

They share a common idea: the more terms a document and the query have in common, the higher their similarity.
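
As an illustration of that idea, here is a small vector space model sketch with TF-IDF weights and cosine similarity. The smoothed IDF formula and the function names are choices made for this example, not something the notes prescribe.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns (list of term->weight dicts, idf dict)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption of this sketch) so that weights stay positive.
    idf = {t: math.log((n + 1) / (df_t + 1)) + 1 for t, df_t in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(u, v):
    """Cosine similarity between two sparse term->weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query, docs):
    """Return (document indices sorted by Similarity(q, Di), raw scores)."""
    vecs, idf = tfidf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    scores = [cosine(q, v) for v in vecs]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores
```

A document that shares no term with the query gets a score of 0, and among documents sharing the same query terms, the shorter one scores higher because of the length normalization in the cosine.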

Temporal expressions come in three forms:
1. Explicit temporal expressions
2. Implicit temporal expressions
3. Relative temporal expressions
Implicit temporal expressions are mostly identified with rule-based and statistical methods.
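
A rule-based resolver for the non-explicit forms might look like the sketch below. It assumes that relative expressions are resolved against a reference date (for example the document's publication date) and that implicit expressions are looked up in a small table; `FIXED_DAYS` and `resolve` are hypothetical names used only here.

```python
import re
from datetime import date, timedelta

DAY_OFFSETS = {"today": 0, "yesterday": -1, "tomorrow": 1}
YEAR_OFFSETS = {"this year": 0, "last year": -1, "next year": 1}
# Hypothetical lookup for implicit expressions: named days with a fixed month and day.
FIXED_DAYS = {"new year's day": (1, 1), "christmas": (12, 25)}


def resolve(expr, reference):
    """Resolve a relative or implicit expression to an ISO date or a year string.

    reference: the date the text was written. Returns None if no rule applies.
    """
    key = expr.lower().strip()
    if key in DAY_OFFSETS:                      # relative, day granularity
        return (reference + timedelta(days=DAY_OFFSETS[key])).isoformat()
    if key in YEAR_OFFSETS:                     # relative, year granularity
        return str(reference.year + YEAR_OFFSETS[key])
    # Implicit: a named day, optionally followed by a four-digit year.
    m = re.fullmatch(r"(.+?)(?: (\d{4}))?", key)
    name, year = m.group(1), m.group(2)
    if name in FIXED_DAYS:
        month, day = FIXED_DAYS[name]
        return date(int(year) if year else reference.year, month, day).isoformat()
    return None
```

Statistical methods would replace the hand-written tables with evidence learned from a corpus; that part is not shown here.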


The mathematical models underlying temporal semantic retrieval models fall mainly into three families:
1. The PageRank family
2. The language model family
3. The TF-IDF and BM25 family
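
For reference, a compact BM25 scorer is sketched below. It uses the common Okapi BM25 form with the usual `k1` and `b` parameters and a non-negative IDF variant; the default values are conventional choices, not values taken from these notes.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document (a list of tokens) against the query with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n          # average document length
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if t not in tf:
                continue
            # IDF variant that never goes negative.
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            # Term-frequency saturation with document length normalization.
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Time-aware variants in this family typically add a temporal factor on top of such a score rather than changing the term weighting itself.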

Traditional text retrieval models treat temporal information only as one of several factors for ranking search results. For example, the document timestamp is incorporated into a statistical language model, where time is used to adjust the document prior or the semantic similarity between the query and the document.
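
One way to picture the timestamp-as-prior approach is the sketch below: a Dirichlet-smoothed query likelihood plus a log prior that decays exponentially with document age. The exponential form, the parameters `mu` and `rate`, and the function name are assumptions made for illustration; the notes only say that time adjusts the prior.

```python
import math
from collections import Counter


def time_aware_scores(query, docs, ages_days, mu=10.0, rate=0.01):
    """Return log P(q|d) + log P(d) for each document.

    docs: list of token lists; ages_days: age of each document in days.
    P(d) is assumed to be rate * exp(-rate * age), so recent documents are favored.
    """
    coll = Counter(t for d in docs for t in d)   # collection term counts
    coll_len = sum(coll.values())
    scores = []
    for d, age in zip(docs, ages_days):
        tf = Counter(d)
        log_likelihood = 0.0
        for t in query:
            if coll[t] == 0:
                continue                          # ignore terms unseen in the collection
            p_coll = coll[t] / coll_len
            p = (tf[t] + mu * p_coll) / (len(d) + mu)   # Dirichlet smoothing
            log_likelihood += math.log(p)
        log_prior = math.log(rate) - rate * age
        scores.append(log_likelihood + log_prior)
    return scores
```

With identical text, the only difference between two documents' scores is `rate` times their age difference, which shows that time acts here purely as a ranking factor and says nothing about what period the content is about.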

Later, with the development of full-text search and NLP techniques, work on incorporating temporal information into retrieval models has been shifting toward studying the temporal relevance between document content and the query. The question is how to annotate the entities in a text through NLP processing, associate them with the relevant times, and feed the resulting entity-time features into the retrieval model.

Many methods have been used to determine the timestamp of a text, but few publications address methods for determining the time a text focuses on. The existing approaches are rule-based or statistics-based.


"From temporal expressions to temporal information: semantic tagging of news messages"
Temporal information is divided into three categories: 1. explicit temporal expressions 2. implicit temporal expressions 3. relative temporal expressions


"A document ranking method for implicit temporal queries"
d = d_word + d_time
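
The line above appears to split a document d into a word part and a time part. Under that reading, one plausible scoring scheme is a linear mixture of a textual score and a temporal score, sketched below. The mixture weight `alpha`, the use of Jaccard overlap for both parts, and the representation of time as a set of years are all assumptions of this sketch, not details confirmed by the paper.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_score(q_word, q_time, d_word, d_time, alpha=0.5):
    """Mix textual similarity (word parts) with temporal similarity (time parts).

    alpha = 0 ignores time entirely; alpha = 1 ranks by time overlap only.
    """
    return (1 - alpha) * jaccard(q_word, d_word) + alpha * jaccard(q_time, d_time)
```

For an implicit temporal query, `q_time` would first have to be inferred, since the query text itself contains no date.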

 


"Research on methods for improving the diversity of web search results"
Most web retrieval systems return results based only on how well each document matches the query, and order them according to the probability ranking principle (PRP). Results ranked this way have two problems:
1. Their content tends to be homogeneous, which easily leads to redundancy and fails to satisfy the diverse information needs behind user queries.
2. Because query terms are ambiguous, homogeneous results can leave users unable to find the information they need, so they abandon the query.

Search result diversification therefore aims to produce an ordered result set whose top N documents cover as many sub-intents of the user's query as possible while keeping redundancy low.
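
Coverage of sub-intents in the top N can be measured, for example, with subtopic recall. The sketch below assumes each document has been labeled with the set of sub-intents it covers; the labels and the function name are illustrative.

```python
def subtopic_recall(ranking, doc_subtopics, n):
    """Fraction of all known sub-intents covered by the top-n documents.

    ranking: list of document ids in ranked order.
    doc_subtopics: dict mapping document id -> set of sub-intent labels.
    """
    all_topics = set().union(*doc_subtopics.values())
    covered = set()
    for doc in ranking[:n]:
        covered |= doc_subtopics.get(doc, set())
    return len(covered) / len(all_topics) if all_topics else 0.0
```

For an ambiguous query such as "jaguar", a ranking whose first two results both cover the car sense has lower subtopic recall at 2 than one that mixes the car and animal senses.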

"A comparative study of ranking algorithms that support search result diversity"
In theory, finding the optimal diversified ranking is computationally hard: it is an NP-complete problem (Carterette B. An Analysis of NP-Completeness in Novelty and Diversity Ranking [J]. Information Retrieval, 2011, 14(1): 89-106).

The vast majority of proposed diversification methods use a two-stage process:
1. The first stage is the same as in earlier information retrieval systems: an initial document ranking is obtained by considering relevance only.
2. The second stage adjusts the document order to improve diversity, i.e. re-ranking.

Re-ranking methods can be divided into two categories:
1. Explicit: external resources are used to learn more about the query, such as the subtopics related to the query topic, their number and their importance, so that re-ranking takes into account how well the results cover the various subtopics.
Representatives: xQuAD, PM, IA_SELECT, RxQuAD2, HistDiv, etc.

2. Implicit: no additional information from external resources is used; only the information contained in the retrieved documents themselves is considered. Examples are a greedy algorithm that, at each position, picks the document that differs most from all documents ranked before it, or document clustering to infer implicit subtopics.
Representatives: MMR
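
As an illustration of implicit re-ranking, here is a minimal MMR (maximal marginal relevance) sketch for the second stage. It assumes the first stage has already produced relevance scores, and it uses Jaccard overlap of term sets as the document-to-document similarity; that similarity and the default `lam` are choices made for this example.

```python
def jaccard(a, b):
    """Jaccard overlap between two term sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_rerank(candidates, relevance, doc_terms, lam=0.5, k=None):
    """Greedy MMR re-ranking of first-stage results.

    candidates: document ids from the relevance-only first stage.
    relevance: dict id -> first-stage relevance score.
    doc_terms: dict id -> set of terms, used only for inter-document similarity.
    lam: trade-off; 1.0 keeps the relevance order, lower values favor novelty.
    """
    k = len(candidates) if k is None else k
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:
        def mmr(d):
            # Penalize similarity to the most similar already-selected document.
            max_sim = max((jaccard(doc_terms[d], doc_terms[s]) for s in selected),
                          default=0.0)
            return lam * relevance[d] - (1 - lam) * max_sim
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```

No subtopic list is needed: novelty is estimated purely from the retrieved documents, which is what distinguishes this from explicit methods such as xQuAD.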


Extending existing search techniques to the case of newspaper archives


<< Search Engines notes -Diversity >>

https://www.cnblogs.com/ycsfwhh/archive/2010/12/20/1911232.html

 

Origin www.cnblogs.com/yyagrt/p/11257890.html