Solr Deep Paging

Please credit the source when reprinting: http://eksliang.iteye.com/blog/2148370

Author: eksliang (ickes)  Blog: http://eksliang.iteye.com/

Overview

Deep paging has long been a problem for us: if you jump directly to a page far into the result set, the query gets noticeably slower, because Solr has to collect the results from the very beginning up to that page. There was no good solution until Solr 4.7 introduced cursors.

The problem

The problem with deep paging is easy to see. To answer a query, Solr builds a sorted list of the top results and returns only a slice of it. That is cheap when the slice comes from the front of the list. But to return page 10,000 with 20 records per page, Solr has to build a list of 200,000 entries (10000*20) and then throw away all but the last 20, which costs both time and memory. For example, the historical data in our production system has reached 600 million records; jumping straight to the last page would exhaust memory.
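To make that cost concrete, here is a toy in-memory model of start/rows paging in Java. It is a sketch for illustration, not Solr's or Lucene's actual code, and the class and method names are made up. To return rows documents starting at start, a top-N collector has to keep the best start + rows candidates in a priority queue, because any of them could still end up on the requested page; everything before start is discarded afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Toy model of start/rows paging; illustrative only, not Solr or Lucene source.
class OffsetPagingModel {
    // The requested page, plus how many entries the queue had to hold to produce it.
    record Result(List<Integer> page, int queueSize) {}

    static Result page(int[] scores, int start, int rows) {
        int k = start + rows; // every hit up to the end of the requested page
        if (k == 0) return new Result(List.of(), 0);
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap: head = weakest kept score
        for (int s : scores) {
            if (heap.size() < k) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll(); // evict the weakest candidate
                heap.add(s);
            }
        }
        List<Integer> top = new ArrayList<>(heap);
        top.sort(Collections.reverseOrder()); // best first
        // Only the last `rows` entries are returned; the first `start` are thrown away.
        List<Integer> slice = top.subList(Math.min(start, top.size()), top.size());
        return new Result(List.copyOf(slice), heap.size());
    }

    public static void main(String[] args) {
        int[] scores = new int[1_000];
        for (int i = 0; i < scores.length; i++) scores[i] = i;
        Result r = page(scores, 200, 20);
        System.out.println("page starts at " + r.page().get(0) + ", queue held " + r.queueSize());
    }
}
```

With start=200 and rows=20 the queue grows to 220 entries just to deliver 20 results; for page 10,000 at 20 rows per page it would hold 200,000.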

 

How does Solr 4.7 solve this problem?

A: Solr 4.7 changed this by introducing cursors. A cursor is stateless: nothing has to be kept on the server between requests. Rather than an offset, the cursor encodes the sort values of the last document returned. Solr therefore no longer walks the results from the beginning until it reaches the records we want; it only collects the next batch of documents that sort after that point. This can greatly improve the performance of deep paging.
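The same kind of toy model can show what a cursor changes. Again this is a hypothetical sketch, not Solr's implementation. The cursor remembers the sort values of the last document returned; the next request skips every document that sorts at or before it and keeps only rows candidates, so the queue stays the size of one page however deep we go. The order mirrors the article's sort=price desc,id asc, where the unique id breaks ties so no document is skipped or repeated.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy model of cursor ("search after") paging; illustrative only.
class CursorPagingModel {
    // Hypothetical document holding just the two sort fields used in this article.
    record Doc(float price, String id) {}

    // sort=price desc,id asc : id is the unique tie-breaker.
    static final Comparator<Doc> ORDER =
        Comparator.comparing(Doc::price, Comparator.<Float>reverseOrder())
                  .thenComparing(Doc::id);

    // Returns up to `rows` docs that sort strictly after `after`
    // (null plays the role of cursorMark=*, i.e. start from the beginning).
    static List<Doc> nextPage(List<Doc> docs, Doc after, int rows) {
        // Reversed order: the head is the latest-sorting (worst) candidate kept so far.
        PriorityQueue<Doc> heap = new PriorityQueue<>(ORDER.reversed());
        for (Doc d : docs) {
            if (after != null && ORDER.compare(d, after) <= 0) continue; // already returned
            heap.add(d);
            if (heap.size() > rows) heap.poll(); // never hold more than one page
        }
        List<Doc> page = new ArrayList<>(heap);
        page.sort(ORDER);
        return page;
    }
}
```

The last Doc of each page acts as the cursor for the next call; in Solr that role is played by the opaque nextCursorMark string.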

 

Usage

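Using cursors is simple. In the first query, pass one extra parameter, cursorMark=*, which tells Solr to return a cursor. Along with the search results, the response then contains a nextCursorMark value. Note that the sort must include the uniqueKey field (id here) as a tie-breaker, and start must be 0 or omitted; Solr rejects cursor requests that break either rule. Take a look at the example below.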

http://192.168.238.133:8080/solr/collection1/select?q=*:*&rows=3&sort=price desc,id asc&cursorMark=*
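The URL above is written with raw spaces for readability. A real client has to percent-encode the parameters, and that applies to cursorMark as well: cursor values are base64 strings and may contain characters such as + or / that must not be sent raw. Below is a small JDK-only helper; the class and method names are hypothetical, while q, rows, sort and cursorMark are the Solr parameters used in this article.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds one cursor-paging request URL.
class CursorUrl {
    static String selectUrl(String coreUrl, String q, int rows, String sort, String cursorMark) {
        return coreUrl + "/select"
            + "?q=" + enc(q)
            + "&rows=" + rows
            + "&sort=" + enc(sort)
            + "&cursorMark=" + enc(cursorMark);
    }

    // Form-encodes a value: space becomes '+', ',' becomes %2C, '+' becomes %2B, etc.
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

For the first page, selectUrl("http://192.168.238.133:8080/solr/collection1", "*:*", 3, "price desc,id asc", "*") produces the request shown above with its spaces and comma encoded.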

The response is as follows:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">186</int>
<lst name="params">
<str name="sort">price desc,id asc</str>
<str name="q">*:*</str>
<str name="cursorMark">*</str>
<str name="rows">3</str>
</lst>
</lst>
<result name="response" numFound="4160002" start="0">
<doc>
<str name="id">a004180000</str>
<str name="name">ickes_4180000</str>
<float name="price">5180000.0</float>
<str name="price_c">5180000.0,USD</str>
<str name="url">www.eksliang.iteye4180000</str>
<long name="_version_">1483095619858857993</long>
</doc>
<doc>
<str name="id">a004179999</str>
<str name="name">ickes_4179999</str>
<float name="price">5179999.0</float>
<str name="price_c">5179999.0,USD</str>
<str name="url">www.eksliang.iteye4179999</str>
<long name="_version_">1483095619858857992</long>
</doc>
<doc>
<str name="id">a004179998</str>
<str name="name">ickes_4179998</str>
<float name="price">5179998.0</float>
<str name="price_c">5179998.0,USD</str>
<str name="url">www.eksliang.iteye4179998</str>
<long name="_version_">1483095619858857991</long>
</doc>
</result>
<str name="nextCursorMark">AoIISp4UvCphMDA0MTc5OTk4</str>
</response>

Besides the usual results, the response contains one extra value, nextCursorMark, which is the parameter we use to fetch the next page.
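Decoding this particular value supports the point that a cursor carries sort values rather than an offset. Base64-decoding AoIISp4UvCphMDA0MTc5OTk4 yields 18 bytes whose layout looks like Solr's binary (JavaBin) serialization of the last document's sort values: a 4-byte float equal to the price 5179998.0, followed by the id a004179998. The byte offsets below were worked out by hand for the cursor marks in this article only; the encoding is internal to Solr, so real code should treat cursorMark as an opaque token.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustration only: offsets match this article's cursor marks
// (sort=price desc,id asc, 10-character ids). Not a documented format.
class PeekCursorMark {
    static byte[] decode(String cursorMark) {
        return Base64.getDecoder().decode(cursorMark);
    }

    // Bytes 3..6 hold the float sort value (big-endian IEEE 754).
    static float lastPrice(String cursorMark) {
        return ByteBuffer.wrap(decode(cursorMark), 3, 4).getFloat();
    }

    // Bytes 8..end hold the id of the last document returned.
    static String lastId(String cursorMark) {
        byte[] raw = decode(cursorMark);
        return new String(raw, 8, raw.length - 8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String mark = "AoIISp4UvCphMDA0MTc5OTk4";
        System.out.println(lastPrice(mark) + " / " + lastId(mark));
    }
}
```

Applied to the second page's nextCursorMark, AoIISp4UtiphMDA0MTc5OTk1, the same offsets give 5179995.0 and a004179995, the last document on that page.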

To get the next page, send the same query again (same q and sort) with cursorMark set to the nextCursorMark returned by the previous response.
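Repeating that step in a loop walks the whole result set. The Solr reference guide documents the stop condition: once there are no more results, nextCursorMark comes back equal to the cursorMark that was sent. The sketch below is plain Java; fetch is a hypothetical stand-in for one HTTP or SolrJ request, and Page is an illustrative type, not part of any Solr API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical driver for cursor paging; `fetch` performs one request.
class CursorLoop {
    // One page of ids plus the nextCursorMark returned with it.
    record Page(List<String> ids, String nextCursorMark) {}

    static List<String> fetchAll(Function<String, Page> fetch) {
        List<String> all = new ArrayList<>();
        String cursorMark = "*"; // first request
        while (true) {
            Page page = fetch.apply(cursorMark);
            all.addAll(page.ids());
            // End of results: Solr hands back the same cursorMark we sent.
            if (page.nextCursorMark().equals(cursorMark)) break;
            cursorMark = page.nextCursorMark();
        }
        return all;
    }
}
```

With SolrJ, fetch would set CursorMarkParams.CURSOR_MARK_PARAM to the given mark and return rsp.getNextCursorMark() as the new mark. Comparing marks, instead of checking for a short page, also handles a result count that is an exact multiple of rows.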

For example, the request for the second page looks like this:

http://192.168.238.133:8080/solr/collection1/select?q=*:*&rows=3&sort=price desc,id asc&cursorMark=AoIISp4UvCphMDA0MTc5OTk4

This returns the next page of data:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">234</int>
<lst name="params">
<str name="sort">price desc,id asc</str>
<str name="q">*:*</str>
<str name="cursorMark">AoIISp4UvCphMDA0MTc5OTk4</str>
<str name="rows">3</str>
</lst>
</lst>
<result name="response" numFound="4160002" start="0">
<doc>
<str name="id">a004179997</str>
<str name="name">ickes_4179997</str>
<float name="price">5179997.0</float>
<str name="price_c">5179997.0,USD</str>
<str name="url">www.eksliang.iteye4179997</str>
<long name="_version_">1483095619858857990</long>
</doc>
<doc>
<str name="id">a004179996</str>
<str name="name">ickes_4179996</str>
<float name="price">5179996.0</float>
<str name="price_c">5179996.0,USD</str>
<str name="url">www.eksliang.iteye4179996</str>
<long name="_version_">1483095619858857989</long>
</doc>
<doc>
<str name="id">a004179995</str>
<str name="name">ickes_4179995</str>
<float name="price">5179995.0</float>
<str name="price_c">5179995.0,USD</str>
<str name="url">www.eksliang.iteye4179995</str>
<long name="_version_">1483095619858857988</long>
</doc>
</result>
<str name="nextCursorMark">AoIISp4UtiphMDA0MTc5OTk1</str>
</response>

From here on, every further page works the same way. For example, the third page is:

http://192.168.238.133:8080/solr/collection1/select?q=*:*&rows=3&sort=price desc,id asc&cursorMark=AoIISp4UtiphMDA0MTc5OTk1

 

SolrJ support for Solr deep paging

Here is the code:

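import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.ORDER;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

// CursorMark is the author's own bean (id, name, price, url mapped with @Field),
// not Solr's internal org.apache.solr.search.CursorMark class.
static void deepPaging() throws SolrServerException {
    // HttpSolrServer is the SolrJ 4.x client (renamed HttpSolrClient in Solr 5.0).
    HttpSolrServer server = new HttpSolrServer("http://192.168.238.133:8080/solr/collection1");
    server.setSoTimeout(10000);         // socket read timeout, ms
    server.setConnectionTimeout(10000); // connect timeout, ms
    server.setDefaultMaxConnectionsPerHost(12);
    server.setAllowCompression(true);
    try {
        SolrQuery query = new SolrQuery();
        query.setQuery("*:*");
        query.setRows(4);
        // The sort must end with the uniqueKey field (id) as a tie-breaker.
        query.addSort("price", ORDER.desc).addSort("id", ORDER.desc);
        query.set(CursorMarkParams.CURSOR_MARK_PARAM, "*"); // "*" = start of the results
        QueryResponse rsp = server.query(query);
        List<CursorMark> beans = rsp.getBeans(CursorMark.class);
        System.out.println(rsp.getNextCursorMark()); // cursor for the next page
        for (CursorMark cursorMark : beans) {
            System.out.println(cursorMark);
        }
    } finally {
        server.shutdown(); // release the underlying HTTP connections
    }
}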

The output is as follows:

AoIISp4UuiphMDA0MTc5OTk3
CursorMark [id=a004180000, name=ickes_4180000, price=5180000.0, url=www.eksliang.iteye4180000]
CursorMark [id=a004179999, name=ickes_4179999, price=5179999.0, url=www.eksliang.iteye4179999]
CursorMark [id=a004179998, name=ickes_4179998, price=5179998.0, url=www.eksliang.iteye4179998]
CursorMark [id=a004179997, name=ickes_4179997, price=5179997.0, url=www.eksliang.iteye4179997]

Reference: http://solr.pl/en/2014/03/10/solr-4-7-efficient-deep-paging/

 

 
