Haystack full-text search framework

Haystack

1. What is Haystack

Haystack is an open-source Django framework for full-text search. (Full-text search differs from a fuzzy query against one particular field and is more efficient.) The framework supports four search engines: Solr, Elasticsearch, Whoosh, and Xapian. The backend is pluggable (much like Django's database layer), so almost all the code you write can be switched easily between search engines.

  • Full-text search: more efficient than a fuzzy query on a specific field, and it can perform word segmentation for Chinese text
  • haystack: a Django package that makes it easy to index and search the content of your models. It is a full-text search framework that supports four search engine backends: Whoosh, Solr, Xapian, and Elasticsearch
  • whoosh: a full-text search engine written in pure Python. Its performance does not match Sphinx, Xapian, or Elasticsearch, but it has no binary dependencies, so the program will not crash inexplicably. For small sites, Whoosh is sufficient
  • jieba: a free Chinese word segmentation package. If its results are not good enough, paid products are an alternative

2. Install

pip install django-haystack
pip install whoosh
pip install jieba

3. Configuration

Add Haystack to INSTALLED_APPS

As with most Django applications, add Haystack to INSTALLED_APPS in your settings file (usually settings.py). Example:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',

    # Add Haystack
    'haystack',

    # Your app
    'blog',
]

Modify settings.py

In settings.py, add a setting that tells Haystack which backend the site uses, along with any other backend settings. The HAYSTACK_CONNECTIONS setting is required and should be at least one of the following:

Solr example

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://127.0.0.1:8983/solr'
        # ...or for multicore...
        # 'URL': 'http://127.0.0.1:8983/solr/mysite',
    },
}

Elasticsearch example

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}

Whoosh example

# PATH must point to the filesystem location of your Whoosh index
import os
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
        'PATH': os.path.join(os.path.dirname(__file__), 'whoosh_index'),
    },
}

# Update the index automatically whenever a model is saved or deleted
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'

Xapian example

# First install the Xapian backend (http://github.com/notanumber/xapian-haystack/tree/master)
# PATH must point to the filesystem location of your Xapian index.
import os
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'xapian_backend.XapianEngine',
        'PATH': os.path.join(os.path.dirname(__file__), 'xapian_index'),
    },
}
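
Because the backend is pluggable, switching engines only means changing this one setting. A minimal sketch of choosing the backend from an environment variable follows. The variable name SEARCH_BACKEND and the helper build_connections are assumptions made for illustration; they are not part of Haystack.

```python
import os

# Engine paths copied from the examples above.
ENGINES = {
    'solr': 'haystack.backends.solr_backend.SolrEngine',
    'elasticsearch': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
    'whoosh': 'haystack.backends.whoosh_backend.WhooshEngine',
}


def build_connections(backend, base_dir):
    """Return a HAYSTACK_CONNECTIONS dict for the named backend (hypothetical helper)."""
    conn = {'ENGINE': ENGINES[backend]}
    if backend == 'solr':
        conn['URL'] = 'http://127.0.0.1:8983/solr'
    elif backend == 'elasticsearch':
        conn['URL'] = 'http://127.0.0.1:9200/'
        conn['INDEX_NAME'] = 'haystack'
    else:
        # File-based backends need a filesystem path instead of a URL.
        conn['PATH'] = os.path.join(base_dir, 'whoosh_index')
    return {'default': conn}


# SEARCH_BACKEND is an assumed environment variable; Whoosh is the fallback.
HAYSTACK_CONNECTIONS = build_connections(
    os.environ.get('SEARCH_BACKEND', 'whoosh'),
    os.getcwd(),  # in settings.py, use os.path.dirname(__file__) as in the examples above
)
```

The rest of the project (indexes, views, templates) stays the same whichever entry is selected.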

4. Data processing

Creating an index

To add full-text search to an app (blog, for example), create a search_indexes.py file in that app's directory. The file name must not be changed.

from haystack import indexes
from blog.models import Article  # import the model from your own app


class ArticleIndex(indexes.SearchIndex, indexes.Indexable):
    # By convention the class is named <ModelName>Index; the model here is Article
    # The single document=True field, built from a data template
    text = indexes.CharField(document=True, use_template=True)
    # Other fields
    desc = indexes.CharField(model_attr='desc')
    content = indexes.CharField(model_attr='content')

    def get_model(self):
        # Required: tells Haystack which model this index covers
        return Article

    def index_queryset(self, using=None):
        # The objects to index when the whole index is rebuilt
        return self.get_model().objects.all()

Why create an index? An index is like a book's table of contents: it lets the reader navigate and find things faster. The same reasoning applies here. When the amount of data is very large, scanning all of it for the records that match the search criteria is impractical and puts a heavy load on the server. So we add an index (a table of contents) for the specified data; here we create one for Article. We do not need to care how the index is implemented. Which fields to index, and how to specify them, is explained below.
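
The table-of-contents idea can be sketched with a toy inverted index in plain Python. This only illustrates the principle; it is not how Whoosh or any other backend is implemented, and the sample documents and the function names build_index and search are made up for the example.

```python
from collections import defaultdict

# Sample data standing in for Article rows (id -> text).
DOCS = {
    1: 'django full text search',
    2: 'whoosh is a pure python search engine',
    3: 'django database layer',
}


def build_index(docs):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every word of the query."""
    words = query.split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


INDEX = build_index(DOCS)
print(sorted(search(INDEX, 'django search')))  # [1]
print(sorted(search(INDEX, 'search')))         # [1, 2]
```

A lookup touches only the entries for the query words instead of scanning every row, which is why an index copes better with large data sets than a field-by-field fuzzy query.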

Every index must have one and only one field with document=True. This tells Haystack and the search engine to use the content of that field as the primary field to search. The other fields are only auxiliary attributes that are convenient to access; they are not the basis for retrieval.

Note: when a field is set with document=True, the convention is to name it text. This is the consistent naming used in the ArticleIndex class, and it avoids confusion in the backend. You can change the name, but doing so is not recommended.

In addition, we pass use_template=True on the text field. This lets us use a data template (rather than error-prone concatenation) to build the document that the search engine indexes. Create a new template at search/indexes/blog/article_text.txt inside your template directory, with the following content.

File to create: templates/search/indexes/<app name>/<model class name>_text.txt
{{ object.title }}
{{ object.desc }}
{{ object.content }}

This data template indexes the three fields title, desc, and content of each Article. At search time, the full-text match runs against these three fields.
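
Roughly speaking, rendering the template concatenates the chosen fields into one block of text, and that text becomes the content of the text field. The sketch below imitates this in plain Python, without Django's template engine. The Article dataclass and render_document are stand-ins invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Stand-in for the Django model, with only the indexed fields."""
    title: str
    desc: str
    content: str


def render_document(obj):
    """Imitate article_text.txt: one indexed field per line."""
    return '\n'.join([obj.title, obj.desc, obj.content])


article = Article(title='Haystack', desc='Full-text search', content='Index your models.')
document = render_document(article)
print(document)
```

A query that matches a word from any of the three fields therefore matches the document as a whole.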

5. Set up the view

Add SearchView to your URLconf

In your URLconf, add the following line:

url(r'^search/', include('haystack.urls')),

This pulls in Haystack's default URLconf, which consists of a single URLconf entry pointing to a SearchView instance. You can change the behavior of this class by passing it several keyword arguments or by overriding it entirely.

Search Templates

Your search template (search/search.html by default) will probably be very simple. The following is enough to get a search running (your template and block names will likely differ):

<!DOCTYPE html>
<html>
<head>
   <title></title>
   <style>
       span.highlighted {
           color: red;
       }
   </style>
</head>
<body>
{% load highlight %}
{% if query %}
   <h3>Search results:</h3>
   {% for result in page.object_list %}
{#        <a href="/{{ result.object.id }}/">{{ result.object.title }}</a><br/>#}
       <a href="/{{ result.object.id }}/">{%   highlight result.object.title with query max_length 2%}</a><br/>
       <p>{{ result.object.content|safe }}</p>
       <p>{% highlight result.content with query %}</p>
   {% empty %}
       <p>Nothing found</p>
   {% endfor %}

   {% if page.has_previous or page.has_next %}
       <div>
           {% if page.has_previous %}
               <a href="?q={{ query }}&amp;page={{ page.previous_page_number }}">{% endif %}&laquo; Previous
           {% if page.has_previous %}</a>{% endif %}
           |
           {% if page.has_next %}<a href="?q={{ query }}&amp;page={{ page.next_page_number }}">{% endif %}Next &raquo;
           {% if page.has_next %}</a>{% endif %}
       </div>
   {% endif %}
{% endif %}
</body>
</html>

Note that page.object_list is actually a list of SearchResult objects. These objects carry all the indexed data. The underlying model instance is available through {{ result.object }}, so {{ result.object.title }} uses the actual Article object from the database to access its title field.

Rebuilding indexes

Now that everything is configured, it's time to index the data in your database. Haystack ships with a management command that makes this easy.

Simply run ./manage.py rebuild_index. It reports statistics on how many models were processed and placed in the index.

6. Use jieba for word segmentation

# Create the file ChineseAnalyzer.py
# Save it in the backends folder of the Haystack installation, e.g. "D:\python3\Lib\site-packages\haystack\backends"

import jieba
from whoosh.analysis import Tokenizer, Token

class ChineseTokenizer(Tokenizer):
    def __call__(self, value, positions=False, chars=False,
                 keeporiginal=False, removestops=True,
                 start_pos=0, start_char=0, mode='', **kwargs):
        t = Token(positions, chars, removestops=removestops, mode=mode,
                  **kwargs)
        seglist = jieba.cut(value, cut_all=True)
        for w in seglist:
            t.original = t.text = w
            t.boost = 1.0
            if positions:
                t.pos = start_pos + value.find(w)
            if chars:
                t.startchar = start_char + value.find(w)
                t.endchar = start_char + value.find(w) + len(w)
            yield t


def ChineseAnalyzer():
    return ChineseTokenizer()
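
# Copy whoosh_backend.py and rename the copy to whoosh_cn_backend.py
# Note: the copied file name may end with a stray space; remember to delete it
# In whoosh_cn_backend.py, add this import:
from .ChineseAnalyzer import ChineseAnalyzer
# Then find
analyzer=StemmingAnalyzer()
# and change it to
analyzer=ChineseAnalyzer()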
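
For the modified backend to take effect, the ENGINE setting has to point at the new module. A minimal sketch, assuming the copy was saved as whoosh_cn_backend.py inside haystack/backends as described above:

```python
import os

# In a real settings.py, derive this from __file__ as in the Whoosh example;
# os.getcwd() is used here only to keep the snippet standalone.
BASE_DIR = os.getcwd()

HAYSTACK_CONNECTIONS = {
    'default': {
        # Module name matches the copied file whoosh_cn_backend.py
        'ENGINE': 'haystack.backends.whoosh_cn_backend.WhooshEngine',
        'PATH': os.path.join(BASE_DIR, 'whoosh_index'),
    },
}
```

After changing the analyzer, run ./manage.py rebuild_index again so that existing data is re-indexed with the new segmentation.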

7. Create a search bar in the template

<form method='get' action="/search/" target="_blank">
    <input type="text" name="q">
    <input type="submit" value="Search">
</form>
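
Submitting this form sends a GET request to /search/ with the search text in the q parameter, and the pagination links in the results template add a page parameter. A small sketch of the resulting URLs, using only the standard library (search_url is a helper name made up here):

```python
from urllib.parse import urlencode


def search_url(query, page=None):
    """Build the URL the form (and the pagination links) would request."""
    params = {'q': query}
    if page is not None:
        params['page'] = page
    return '/search/?' + urlencode(params)


print(search_url('django haystack'))     # /search/?q=django+haystack
print(search_url('django haystack', 2))  # /search/?q=django+haystack&page=2
```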

8. Other configurations

Add more variables


from haystack.views import SearchView
from .models import Topic


class MySearchView(SearchView):
    def extra_context(self):
        # Override extra_context to add extra variables to the template context
        context = super().extra_context()
        side_list = Topic.objects.filter(kind='major').order_by('add_date')[:8]
        context['side_list'] = side_list
        return context


# Routing change in urls.py (search_views is the module that defines MySearchView)
url(r'^search/', search_views.MySearchView(), name='haystack_search'),

Highlight

{% highlight result.summary with query %}
# max_length limits the length of the highlighted {{ result.summary }} output
{% highlight result.summary with query max_length 40 %}

# In the HTML
    <style>
        span.highlighted {
            color: red;
        }
    </style>
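
What the highlight tag produces can be approximated in a few lines of Python: each query word found in the text is wrapped in a span with the highlighted class, which is what the CSS rule above styles. This is a simplified stand-in, not Haystack's own highlighter; the function name and the truncate-from-the-start behaviour of max_length are assumptions made for illustration.

```python
import html
import re


def highlight(text, query, max_length=None):
    """Wrap each query word in <span class="highlighted">, escaping everything else."""
    if max_length is not None:
        # Simplification: keep only the first max_length characters.
        text = text[:max_length]
    words = [re.escape(w) for w in query.split() if w]
    if not words:
        return html.escape(text)
    pattern = re.compile('(' + '|'.join(words) + ')', re.IGNORECASE)
    # With one capturing group, re.split puts the matches at odd indices.
    parts = pattern.split(text)
    out = []
    for i, part in enumerate(parts):
        piece = html.escape(part)
        if i % 2:
            piece = '<span class="highlighted">' + piece + '</span>'
        out.append(piece)
    return ''.join(out)


print(highlight('Django full-text search with Haystack', 'haystack'))
```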

Origin www.cnblogs.com/fuwei8086/p/11309611.html