Solr's Support for Spatial Search

Spatial Search

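Solr supports location data for use in spatial/geospatial searches. Using spatial search, you can:

  • Index points or other shapes
  • Filter search results by a bounding box, a circle, or other shapes
  • Sort or boost scoring by the distance between points, or by the relative area between rectangles
  • Generate a 2D grid of facet counts for heatmap generation or point plotting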

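There are four main field types available for spatial search:

  • LatLonPointSpatialField
  • LatLonType (deprecated)
  • SpatialRecursivePrefixTreeFieldType (RPT for short), including RptWithGeometrySpatialField
  • BBoxField

LatLonPointSpatialField is the field type best suited to the most common use cases for lat/lon point data. It replaces LatLonType, which still exists for backwards compatibility. RPT offers more advanced features, such as polygons and heatmaps.
RptWithGeometrySpatialField is for indexing and searching non-point data, though it can handle points too. It cannot do sorting or boosting.
BBoxField is for indexing bounding boxes, querying by a box, specifying a search predicate (Intersects, Within, Contains, Disjoint, Equals), and applying a relevancy sort or boost such as overlapRatio or simply the area.
Some esoteric details not covered here can be found at http://wiki.apache.org/solr/SpatialSearch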

LatLonPointSpatialField


Here is how LatLonPointSpatialField is configured in the schema:

<fieldType name="location" class="solr.LatLonPointSpatialField" docValues="true"/>

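LLPSF (LatLonPointSpatialField) supports toggling indexed, stored, docValues, and multiValued. Internally, LLPSF uses Lucene "Points" for indexing. When "docValues" is enabled, each latitude/longitude pair is bit-interleaved into 64 bits and stored in Lucene DocValues. The docValues data is accurate to about a centimeter.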
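To make the "64 bits, about a centimeter" claim concrete, here is a minimal Python sketch. It assumes each coordinate is quantized to 32 bits over its full range and that the two integers are then bit-interleaved. The function names are hypothetical, and Lucene's actual encoding may differ in its details; this only illustrates the idea and the resulting precision.

```python
# Illustrative only: NOT Lucene's actual encoder. It shows how two 32-bit
# quantized coordinates fit into one 64-bit value and what precision results.

def quantize(value, lo, hi, bits=32):
    """Map a value in [lo, hi] to an unsigned integer cell index (floor)."""
    cells = 1 << bits
    q = int((value - lo) / (hi - lo) * cells)
    return min(q, cells - 1)  # clamp value == hi into the last cell

def dequantize(q, lo, hi, bits=32):
    """Return the lower edge of cell q."""
    return lo + q * (hi - lo) / (1 << bits)

def interleave(a, b, bits=32):
    """Interleave the bits of a (even positions) and b (odd positions)."""
    code = 0
    for i in range(bits):
        code |= ((a >> i) & 1) << (2 * i)
        code |= ((b >> i) & 1) << (2 * i + 1)
    return code

def deinterleave(code, bits=32):
    """Inverse of interleave(): recover (a, b) from the combined code."""
    a = b = 0
    for i in range(bits):
        a |= ((code >> (2 * i)) & 1) << i
        b |= ((code >> (2 * i + 1)) & 1) << i
    return a, b

lat, lon = 45.15, -93.85
lat_q = quantize(lat, -90.0, 90.0)
lon_q = quantize(lon, -180.0, 180.0)
code = interleave(lon_q, lat_q)           # fits in 64 bits
lon_back, lat_back = deinterleave(code)   # lossless at the integer level

# Size of one latitude cell, in degrees and (approximately) in meters.
lat_step_deg = 180.0 / (1 << 32)
METERS_PER_DEGREE_LAT = 111_195.0         # approximate value for a spherical Earth
lat_step_m = lat_step_deg * METERS_PER_DEGREE_LAT
```

Under these assumptions one latitude cell is a few millimeters tall and one longitude cell is under a centimeter wide at the equator, which is consistent with the centimeter-level accuracy stated above.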

Indexing Points


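To index geodetic points (latitude and longitude), supply them in "lat,lon" order (comma-separated).
For non-geodetic points, the format depends on the field type: RPT uses "x y" (space-separated), whereas PointType uses "x,y" (comma-separated).
If you would rather use a standard industry format, Solr supports WKT and GeoJSON, although both are bulky for such simple data. (They are not supported by the deprecated LatLonType or by PointType.)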
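The formats above can be sketched as a pair of small Python helpers. The helper names, the document id, and the use of a field called "store" (borrowed from the query examples later in this post) are assumptions for illustration. Note that WKT writes points in x y order, that is, longitude first.

```python
import json

def format_latlon(lat, lon):
    """Format a geodetic point in the "lat,lon" order described above."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return f"{lat},{lon}"

def format_wkt_point(lat, lon):
    """Format the same point as WKT, which uses x y (lon lat) order."""
    return f"POINT({lon} {lat})"

# A hypothetical document ready to be sent to Solr's JSON update handler.
doc = {"id": "store-1", "store": format_latlon(45.15, -93.85)}
payload = json.dumps([doc])
```

For a single point the WKT form carries no extra information, which is why the plain "lat,lon" string is usually the more convenient choice.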

Searching with Query Parsers


Solr has two spatial query parsers for geospatial search: geofilt and bbox. They take the following parameters:

Parameter Description
sfield The spatial indexed field.
score (Advanced option; not supported by LatLonType (deprecated) or PointType.) If the query is used in a scoring context (e.g. as the main query in q), this local parameter determines what scores are produced. Valid values are: none, kilometers, miles, degrees, distance, recipDistance. Don't use this for indexed non-point shapes (e.g. polygons); the results will be erroneous. With RPT, it is only recommended for multi-valued point data, as the implementation doesn't scale very well; for single-valued fields, use a separate non-RPT field purely for distance sorting instead. When used with BBoxField, additional options are supported: overlapRatio, area, and area2D.
pt The center point, in "lat,lon" format for geodetic data. Otherwise, "x,y" for PointType or "x y" for RPT field types.
filter (Advanced option; not supported by LatLonType (deprecated) or PointType.) If you only want the query to score (with the score local parameter above), not filter, set this local parameter to false.
d The radial distance, usually in kilometers. (RPT and BBoxField can use other units via the distanceUnits setting.)

geofilt

The geofilt filter allows you to retrieve results based on the geospatial distance (AKA the "great circle distance") from a given point. Another way of looking at it is that it creates a circular shape filter. For example, to find all documents within five kilometers of a given lat/lon point, you could enter &q=*:*&fq={!geofilt sfield=store}&pt=45.15,-93.85&d=5. This filter returns all results within a circle of the given radius around the initial point.
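As a rough illustration of what a great-circle radius filter does, the following Python sketch computes the haversine distance and keeps only the points within d kilometers. It is not Solr's implementation: the Earth radius constant and the sample points are assumptions, and Solr evaluates the filter against its index rather than by scanning points.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; Solr's constant may differ

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(points, center, d_km):
    """Return the names of points no farther than d_km from center."""
    lat0, lon0 = center
    return [name for name, (lat, lon) in points.items()
            if haversine_km(lat0, lon0, lat, lon) <= d_km]

# Hypothetical sample data: one point a few km away, one well outside.
points = {
    "near": (45.17, -93.87),
    "far": (45.30, -93.85),
}
hits = within_radius(points, center=(45.15, -93.85), d_km=5)
```

With these sample points only "near" falls inside the 5 km circle; "far" is more than 16 km north of the center.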

bbox

The bbox filter is very similar to geofilt except it uses the bounding box of the calculated circle. It takes the same parameters as geofilt. Here's a sample query: &q=*:*&fq={!bbox sfield=store}&pt=45.15,-93.85&d=5. The rectangular shape is faster to compute, so it's sometimes used as an alternative to geofilt when it's acceptable to return points outside of the radius. However, if the ideal goal is a circle but you want it to run faster, consider using the RPT field instead and try a large "distErrPct" value like 0.1 (10% radius). This will return results outside the radius, but it will do so somewhat uniformly around the shape.
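The difference between the circle and its bounding box can be sketched in Python as follows. The box computation is a simple approximation that ignores the poles and the dateline, and the function names are hypothetical; it is not how Solr computes the box.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def bounding_box(lat, lon, d_km):
    """Approximate box around a circle: (min_lat, min_lon, max_lat, max_lon)."""
    dlat = math.degrees(d_km / EARTH_RADIUS_KM)
    # Longitude degrees shrink with latitude, so widen the span accordingly.
    dlon = math.degrees(d_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat))))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)

def in_box(box, lat, lon):
    min_lat, min_lon, max_lat, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

center = (45.15, -93.85)
box = bounding_box(*center, 5)

# A point near the box's upper-right corner: inside the box, outside the circle.
corner = (center[0] + 0.9 * (box[2] - center[0]),
          center[1] + 0.9 * (box[3] - center[1]))
corner_in_box = in_box(box, *corner)
corner_distance = haversine_km(*center, *corner)
```

The corner point passes the box test even though it lies more than 5 km from the center, which is exactly the kind of extra result bbox may return compared with geofilt.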

Filtering by an arbitrary rectangle

Sometimes the spatial search requirement calls for finding everything in a rectangular area, such as the area covered by a map the user is looking at. For this case, geofilt and bbox won't cut it. This is somewhat of a trick, but you can use Solr's range query syntax for this by supplying the lower-left corner as the start of the range and the upper-right corner as the end of the range. Here's an example: &q=*:*&fq=store:[45,-94 TO 46,-93]. LatLonType (deprecated) does not support rectangles that cross the dateline. For RPT and BBoxField, if you are using non-geospatial coordinates (geo="false") then you must quote the points due to the space, e.g. "x y".
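A small Python helper can assemble this range filter from two corners. The function name is hypothetical, and it only covers the geodetic "lat,lon" form shown in the example above.

```python
def rect_filter(field, lower_left, upper_right):
    """Build a range-query filter from (lat, lon) corner tuples."""
    (min_lat, min_lon), (max_lat, max_lon) = lower_left, upper_right
    if min_lat > max_lat:
        raise ValueError("lower-left latitude must not exceed upper-right latitude")
    # Longitudes are not compared: a rectangle crossing the dateline
    # legitimately has min_lon > max_lon (where the field type supports it).
    return f"{field}:[{min_lat},{min_lon} TO {max_lat},{max_lon}]"

fq = rect_filter("store", (45, -94), (46, -93))
```

Here fq reproduces the filter from the example, store:[45,-94 TO 46,-93], and can be passed as the value of an fq parameter.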

Optimizing: Cache or Not

It's most common to put a spatial query into an "fq" parameter, that is, a filter query. By default, Solr will cache the query in the filter cache. If you know the filter query (be it spatial or not) is fairly unique and not likely to get a cache hit, then specify cache="false" as a local-param, as seen in the following example. The only spatial types which stand to benefit from this technique are LatLonPointSpatialField and LatLonType (deprecated). Enable docValues on the field (if it isn't already). LatLonType (deprecated) additionally requires a cost="100" (or more) local-param.
&q=…mykeywords…&fq=…someotherfilters…&fq={!geofilt cache=false}&sfield=store&pt=45.15,-93.85&d=5
LLPSF does not support Solr's "PostFilter".

Distance Sorting or Boosting (Function Queries)


There are four distance function queries: geodist (see below), which is usually the most appropriate; dist, to calculate the p-norm distance between multi-dimensional vectors; hsin, to calculate the distance between two points on a sphere; and sqedist, to calculate the squared Euclidean distance between two points. For more information about these function queries, see the section on Function Queries.

geodist

geodist is a distance function that takes three optional parameters: (sfield,latitude,longitude). You can use the geodist function to sort results by distance or to return the distance as the document score.
For example, to sort your results by ascending distance, enter …&q=*:*&fq={!geofilt}&sfield=store&pt=45.15,-93.85&d=50&sort=geodist() asc.
To return the distance as the document score, enter …&q={!func}geodist()&sfield=store&pt=45.15,-93.85&sort=score+asc.
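The two examples write the sort clause once with a literal space and once with "+", which is simply the URL-encoded form of a space. A short Python sketch using the standard library shows how the parameters can be encoded consistently; the parameter values are taken from the examples above, and the variable names are hypothetical.

```python
from urllib.parse import urlencode, parse_qs

# Sort by ascending distance from the point given in pt.
sort_params = {
    "q": "*:*",
    "fq": "{!geofilt}",
    "sfield": "store",
    "pt": "45.15,-93.85",
    "d": "50",
    "sort": "geodist() asc",
}

# Use the distance itself as the document score.
score_params = {
    "q": "{!func}geodist()",
    "sfield": "store",
    "pt": "45.15,-93.85",
    "sort": "score asc",
}

sort_qs = urlencode(sort_params)    # spaces become "+", parentheses are escaped
score_qs = urlencode(score_params)
```

Decoding either string with parse_qs gives back the original values, so the server sees "geodist() asc" and "score asc" regardless of how the space was written in the URL.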

More Examples


Here are a few more useful examples of what you can do with spatial search in Solr.

Use as a Sub-Query to Expand Search Results

Here we will query for results in Jacksonville, Florida, or within 50 kilometers of 45.15,-93.85 (near Buffalo, Minnesota):
&q=*:*&fq=(state:"FL" AND city:"Jacksonville") OR {!geofilt}&sfield=store&pt=45.15,-93.85&d=50&sort=geodist()+asc

Facet by Distance

To facet by distance, you can use the frange query parser:
&q=*:*&sfield=store&pt=45.15,-93.85&facet.query={!frange l=0 u=5}geodist()&facet.query={!frange l=5.001 u=3000}geodist()
There are other ways to do it too, like using a {!geofilt} in each facet.query.
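When there are more than a couple of distance buckets, the facet.query parameters can be generated rather than typed by hand. A minimal Python sketch follows; the function name is hypothetical and the bucket bounds are those of the example above.

```python
def distance_facet_queries(ranges):
    """Build one frange facet.query per (lower, upper) distance bucket."""
    return [("facet.query", f"{{!frange l={lo} u={hi}}}geodist()")
            for lo, hi in ranges]

# Base parameters from the example, followed by one facet.query per bucket.
params = [
    ("q", "*:*"),
    ("sfield", "store"),
    ("pt", "45.15,-93.85"),
] + distance_facet_queries([(0, 5), (5.001, 3000)])
```

A list of pairs is used instead of a dict because facet.query appears more than once; urllib.parse.urlencode accepts such a list directly.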

Boost Nearest Results

Using the DisMax or Extended DisMax query parser, you can combine spatial search with the boost function to boost the nearest results:
&q.alt=*:*&fq={!geofilt}&sfield=store&pt=45.15,-93.85&d=50&bf=recip(geodist(),2,200,20)&sort=score desc
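To see how the boost falls off with distance, recall that Solr's recip(x,m,a,b) function computes a / (m*x + b). The Python sketch below evaluates it for the constants used in the example; the sample distances are arbitrary.

```python
def recip(x, m, a, b):
    """Reciprocal function as defined for Solr function queries: a / (m*x + b)."""
    return a / (m * x + b)

# Boost contributed by bf=recip(geodist(),2,200,20) at a few distances (km).
boosts = {d: recip(d, 2, 200, 20) for d in (0, 10, 50)}
```

With these constants a document at the query point receives a boost of 10.0, one 10 km away receives 5.0, and one at the 50 km filter edge receives about 1.67, so nearer documents rank higher when sorting by score descending.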

Reposted from blog.csdn.net/sn_gis/article/details/72238840