Haystack
1. What is Haystack
Haystack is an open-source Django framework for full-text search (full-text search is more efficient than a fuzzy query on a particular field). It supports the Solr, Elasticsearch, Whoosh and Xapian search engines through a pluggable backend (much like Django's database layer), so almost all the code you write can easily switch between different search engines.
- Full-text search is more efficient than a fuzzy query on a specific field, and it can perform word segmentation for Chinese text
- haystack: a Django package that makes it easy to index and search the content of your models. It is designed to support four full-text search engine backends (Whoosh, Solr, Xapian and Elasticsearch) and is a framework for full-text search
- whoosh: a full-text search engine written in pure Python. Its performance is not as good as Sphinx, Xapian, Elasticsearch, etc., but it has no binary package, so the program will not crash inexplicably. For small sites, Whoosh is good enough
- jieba: a free Chinese word segmentation package; if it does not work well enough, some paid products are available
2. Install
pip install django-haystack
pip install whoosh
pip install jieba
3. Configuration
Add Haystack to INSTALLED_APPS
Like most Django applications, Haystack must be added to INSTALLED_APPS in your settings file (usually settings.py).
Example:
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    # add Haystack
    'haystack',
    # your app
    'blog',
]
Modify settings.py
In your settings.py you need to add a setting that tells Haystack which backend the site uses, along with any other backend settings. The HAYSTACK_CONNECTIONS setting is required and must contain at least one of the following:
Solr example
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://127.0.0.1:8983/solr',
        # ...or for multicore...
        # 'URL': 'http://127.0.0.1:8983/solr/mysite',
    },
}
Elasticsearch example
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
Whoosh example
# PATH must point to the filesystem location of your Whoosh index
import os
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
        'PATH': os.path.join(os.path.dirname(__file__), 'whoosh_index'),
    },
}
# Update the index automatically whenever a model is saved or deleted
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
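The RealtimeSignalProcessor setting keeps the index in sync by hooking into model save and delete signals. The following is a minimal pure-Python sketch of that idea, not Haystack's code: Signal, post_save, post_delete, handle_save, handle_delete and search_index are hypothetical stand-ins, and the "index" is just a dict.

```python
from typing import Callable, Dict, List


# Hypothetical stand-in for a Django signal such as post_save / post_delete.
class Signal:
    def __init__(self) -> None:
        self._receivers: List[Callable[[dict], None]] = []

    def connect(self, receiver: Callable[[dict], None]) -> None:
        self._receivers.append(receiver)

    def send(self, instance: dict) -> None:
        # Call every connected receiver with the affected object.
        for receiver in self._receivers:
            receiver(instance)


post_save = Signal()
post_delete = Signal()

# Hypothetical in-memory index: primary key -> indexed text.
search_index: Dict[int, str] = {}


def handle_save(instance: dict) -> None:
    # Re-index the object as soon as it is saved.
    search_index[instance["pk"]] = instance["text"]


def handle_delete(instance: dict) -> None:
    # Remove the object from the index when it is deleted.
    search_index.pop(instance["pk"], None)


post_save.connect(handle_save)
post_delete.connect(handle_delete)

post_save.send({"pk": 1, "text": "Django full-text search"})
post_save.send({"pk": 1, "text": "Django full-text search with Whoosh"})  # update
post_delete.send({"pk": 1})
print(search_index)  # → {}
```

The trade-off of realtime updates is that every save now also does indexing work.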
Xapian example
# First install the Xapian backend (http://github.com/notanumber/xapian-haystack/tree/master)
# PATH must point to the filesystem location of your Xapian index
import os
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'xapian_backend.XapianEngine',
        'PATH': os.path.join(os.path.dirname(__file__), 'xapian_index'),
    },
}
4. Data processing
Creating an index
If you want full-text search on an app, for example blog, you must create a search_indexes.py file in that app's directory. The file name cannot be changed.
from haystack import indexes
from blog.models import Article
class ArticleIndex(indexes.SearchIndex, indexes.Indexable):
    # By convention the class is named <ModelName>Index; we index Article, so ArticleIndex
    text = indexes.CharField(document=True, use_template=True)  # the main document field
    # Other fields
    desc = indexes.CharField(model_attr='desc')
    content = indexes.CharField(model_attr='content')
    def get_model(self):  # get_model must be overridden; it is required
        return Article
    def index_queryset(self, using=None):
        # The objects to put in the index (here: every Article)
        return self.get_model().objects.all()
Why create an index? An index works like the table of contents of a book: it lets the reader navigate and find things faster. The same applies here. When the amount of data is very large, finding every record that matches the search criteria by scanning all of it is practically impossible and would put a heavy load on the server. So we add an index for the data we want to search; here we create one for Article. We do not need to care how the index is implemented internally; which fields to index and how to specify them is explained below.
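To make the table-of-contents analogy concrete: full-text engines typically build an inverted index that maps each word to the documents containing it, so a query looks up a few entries instead of scanning every row. Below is a toy sketch in pure Python; build_inverted_index and search are hypothetical helpers that split on whitespace only, and this is not how Whoosh or the other backends are actually implemented.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lower-cased word to the set of document ids containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return ids of documents containing every word of the query (AND search)."""
    words = query.lower().split()
    if not words:
        return []
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


docs = {
    1: "Django full text search with Haystack",
    2: "Whoosh is a pure Python search engine",
    3: "Notes about Django models",
}
index = build_inverted_index(docs)
print(search(index, "django search"))  # → [1]
```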
Each index must have one and only one field with document=True. This tells Haystack and the search engine to use the content of this field as the primary field to search. The other fields are only additional attributes that are convenient to access; they are not the main data being searched.
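Note: the field with document=True is conventionally named text. This name is used consistently across SearchIndex classes to avoid confusion in the backend. You can give it another name, but this is not recommended.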
In addition, we set use_template=True on the text field. This lets us use a data template (instead of error-prone concatenation) to build the document that the search engine indexes. Create a new template at search/indexes/blog/article_text.txt in your templates directory and put the following content in it:
# Create "<model name in lowercase>_text.txt" in "templates/search/indexes/<app name>/"
{{ object.title }}
{{ object.desc }}
{{ object.content }}
This data template indexes the three fields Article.title, Article.desc and Article.content; a full-text search is matched against these three fields.
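Conceptually, use_template=True makes the text field hold whatever the template renders for each object. The sketch below imitates article_text.txt with plain string joining on a hypothetical Article dataclass; Haystack itself renders the real Django template, so whitespace in the indexed text may differ.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical stand-in for the blog's Article model.
    title: str
    desc: str
    content: str


def render_document_text(obj: Article) -> str:
    """Mimic search/indexes/blog/article_text.txt: one field per line."""
    return "\n".join([obj.title, obj.desc, obj.content])


article = Article(title="Haystack intro", desc="Short summary", content="Body text")
print(render_document_text(article))
```

A search for any word from the title, description or body then matches this single text field.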
5. Set up the view
Add SearchView to your URLconf
In your URLconf, add the following line:
url(r'^search/', include('haystack.urls')),
This pulls in Haystack's default URLconf, which consists of a single URL pattern pointing at a SearchView instance. You can change the behavior of this class by passing it several key arguments or by overriding it completely.
Search Templates
Your search template (search/search.html by default) can be very simple. The following is enough to get search running (your template and block structure will probably differ):
<!DOCTYPE html>
<html>
<head>
    <title></title>
    <style>
        span.highlighted {
            color: red;
        }
    </style>
</head>
<body>
{% load highlight %}
{% if query %}
    <h3>Search results:</h3>
    {% for result in page.object_list %}
        {# <a href="/{{ result.object.id }}/">{{ result.object.title }}</a><br/>#}
        <a href="/{{ result.object.id }}/">{% highlight result.object.title with query max_length 2%}</a><br/>
        <p>{{ result.object.content|safe }}</p>
        <p>{% highlight result.content with query %}</p>
    {% empty %}
        <p>Nothing found</p>
    {% endfor %}
    {% if page.has_previous or page.has_next %}
        <div>
            {% if page.has_previous %}
            <a href="?q={{ query }}&page={{ page.previous_page_number }}">{% endif %}« Previous
            {% if page.has_previous %}</a>{% endif %}
            |
            {% if page.has_next %}<a href="?q={{ query }}&page={{ page.next_page_number }}">{% endif %}Next »
            {% if page.has_next %}</a>{% endif %}
        </div>
    {% endif %}
{% endif %}
</body>
</html>
Note that page.object_list is actually a list of SearchResult objects. These objects carry all the indexed data, and the original model instance is available through {{ result.object }}. Therefore {{ result.object.title }} actually uses the Article object from the database to access its title field.
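Because result.object goes back to the database, each result whose object you touch in the template can cost a database query. The sketch below shows a lazy, cached lookup of that kind; LazyResult, articles and load_article are hypothetical names, and this is a simplification rather than Haystack's SearchResult class.

```python
from typing import Any, Callable, Dict, List, Optional


class LazyResult:
    """Simplified stand-in for a search hit: it stores the primary key and
    loads the full object only when .object is first accessed."""

    def __init__(self, pk: int, loader: Callable[[int], Any]) -> None:
        self.pk = pk
        self._loader = loader
        self._object: Optional[Any] = None

    @property
    def object(self) -> Any:
        if self._object is None:
            self._object = self._loader(self.pk)  # one "database" lookup
        return self._object


# Hypothetical "database table" plus a log of the lookups performed.
articles: Dict[int, Dict[str, str]] = {7: {"title": "Haystack intro"}}
lookups: List[int] = []


def load_article(pk: int) -> Dict[str, str]:
    lookups.append(pk)
    return articles[pk]


result = LazyResult(7, load_article)
title = result.object["title"]        # first access: one lookup
title_again = result.object["title"]  # cached: no new lookup
print(title, len(lookups))  # → Haystack intro 1
```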
Rebuilding indexes
Now that everything is configured, it is time to index the data in your database. Haystack ships with a management command that makes this easy.
Simply run ./manage.py rebuild_index. It reports how many models were processed and put into the index.
6. Use jieba for Chinese word segmentation
# Create a ChineseAnalyzer.py file
# and save it in Haystack's installation folder, e.g. "D:\python3\Lib\site-packages\haystack\backends"
import jieba
from whoosh.analysis import Tokenizer, Token
class ChineseTokenizer(Tokenizer):
    def __call__(self, value, positions=False, chars=False,
                 keeporiginal=False, removestops=True,
                 start_pos=0, start_char=0, mode='', **kwargs):
        t = Token(positions, chars, removestops=removestops, mode=mode,
                  **kwargs)
        # Full mode (cut_all=True): yield every word jieba can find, overlaps included
        seglist = jieba.cut(value, cut_all=True)
        for w in seglist:
            t.original = t.text = w
            t.boost = 1.0
            # value.find(w) returns the first occurrence of w in the text
            if positions:
                t.pos = start_pos + value.find(w)
            if chars:
                t.startchar = start_char + value.find(w)
                t.endchar = start_char + value.find(w) + len(w)
            yield t
def ChineseAnalyzer():
    return ChineseTokenizer()
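# Copy whoosh_backend.py and rename the copy to whoosh_cn_backend.py
# Note: the copied file name may end with a trailing space; remember to delete it
# In whoosh_cn_backend.py, add the import:
from .ChineseAnalyzer import ChineseAnalyzer
Find
analyzer=StemmingAnalyzer()
and change it to
analyzer=ChineseAnalyzer()
Then point ENGINE in HAYSTACK_CONNECTIONS at the new backend and rebuild the index:
'ENGINE': 'haystack.backends.whoosh_cn_backend.WhooshEngine',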
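One caveat in ChineseTokenizer: value.find(w) always returns the first occurrence, so a word that appears twice gets the same offset both times (and with cut_all=True the segments overlap, so offsets are approximate anyway). Below is a sketch that tracks a running offset instead, assuming a non-overlapping segmentation such as the default mode of jieba.cut; token_offsets is a hypothetical helper, and the segmentation is hard-coded so the example runs without jieba. jieba also provides jieba.tokenize(), which yields (word, start, end) tuples directly.

```python
from typing import Iterable, List, Tuple


def token_offsets(text: str, words: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (word, startchar, endchar) for each word of a non-overlapping
    segmentation of text."""
    spans: List[Tuple[str, int, int]] = []
    cursor = 0
    for w in words:
        # Search from the end of the previous word, so a repeated word gets
        # its own offset rather than that of its first occurrence.
        start = text.find(w, cursor)
        if start == -1:
            raise ValueError(f"segment {w!r} not found after offset {cursor}")
        end = start + len(w)
        spans.append((w, start, end))
        cursor = end
    return spans


# Hard-coded segmentation of "搜索搜索引擎" (assumed, not produced by jieba here).
print(token_offsets("搜索搜索引擎", ["搜索", "搜索", "引擎"]))
# → [('搜索', 0, 2), ('搜索', 2, 4), ('引擎', 4, 6)]
```

With value.find(w), the second 搜索 would have been given offset 0 as well.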
7. Create a search bar in the template
<form method='get' action="/search/" target="_blank">
<input type="text" name="q">
<input type="submit" value="Search">
</form>
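The form submits its input as the q GET parameter, and the pagination links in the search template append page. The sketch below builds the same URLs with the standard library; search_url is a hypothetical helper. It also shows why a raw {{ query }} inside a link can break for queries containing characters such as &: urlencode escapes them, and Django's urlencode template filter could do the same inside the template.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlencode


def search_url(query: str, page: Optional[int] = None, base: str = "/search/") -> str:
    """Build the URL requested by the GET search form, or by a pagination link."""
    params: Dict[str, Union[str, int]] = {"q": query}
    if page is not None:
        params["page"] = page
    # urlencode percent-encodes the values, so spaces, & and non-ASCII text are safe
    return base + "?" + urlencode(params)


print(search_url("django haystack"))  # → /search/?q=django+haystack
print(search_url("a&b", page=2))      # → /search/?q=a%26b&page=2
```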
8. Other configurations
Add extra context variables
from haystack.views import SearchView
from .models import Topic
class MySearchView(SearchView):
    def extra_context(self):  # override extra_context to add extra data to the template context
        context = super(MySearchView, self).extra_context()
        side_list = Topic.objects.filter(kind='major').order_by('add_date')[:8]
        context['side_list'] = side_list
        return context
# URL route change (search_views is the module that defines MySearchView)
url(r'^search/', search_views.MySearchView(), name='haystack_search'),
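The override works because the view merges whatever extra_context returns into the template context. The following sketch shows that pattern with a hypothetical BaseSearchView standing in for haystack.views.SearchView; only the extra_context hook is modelled, and the sidebar data is a placeholder for the real Topic query.

```python
from typing import Any, Dict


class BaseSearchView:
    """Hypothetical stand-in for haystack.views.SearchView: only the
    extra_context hook and the context merge are modelled here."""

    def extra_context(self) -> Dict[str, Any]:
        return {}

    def get_context(self, query: str) -> Dict[str, Any]:
        context: Dict[str, Any] = {"query": query}
        context.update(self.extra_context())  # subclass additions are merged in
        return context


class MySearchView(BaseSearchView):
    def extra_context(self) -> Dict[str, Any]:
        context = super().extra_context()
        # Placeholder sidebar data; the real view queries Topic objects.
        context["side_list"] = ["topic-a", "topic-b"]
        return context


print(MySearchView().get_context("django"))
# → {'query': 'django', 'side_list': ['topic-a', 'topic-b']}
```

Calling super() first keeps anything the parent class adds, so the subclass only appends its own keys.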
Highlight
{% highlight result.summary with query %}
# max_length limits the length of {{ result.summary }} after highlighting
{% highlight result.summary with query max_length 40 %}
# in the HTML:
<style>
span.highlighted {
color: red;
}
</style>
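The highlight tag wraps each query term in <span class="highlighted">, which is why the CSS rule targets span.highlighted, and max_length trims the text around the first match. The following is a simplified pure-Python sketch of that behaviour, not Haystack's Highlighter class; the highlight function here is hypothetical, and the ... markers and exact window placement are assumptions.

```python
import html
import re


def highlight(text: str, query: str, max_length: int = 200) -> str:
    """Simplified highlighter: start the window at the first query match,
    cut it to max_length characters, escape it, and wrap every match in
    <span class="highlighted">...</span>."""
    words = [w for w in query.split() if w]
    if not words:
        return html.escape(text[:max_length])
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    first = pattern.search(text)
    start = first.start() if first else 0
    end = start + max_length
    window = text[start:end]
    parts = []
    last = 0
    for m in pattern.finditer(window):
        parts.append(html.escape(window[last:m.start()]))
        parts.append('<span class="highlighted">' + html.escape(m.group()) + "</span>")
        last = m.end()
    parts.append(html.escape(window[last:]))
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + "".join(parts) + suffix


print(highlight("Django & Whoosh", "whoosh django"))
# → <span class="highlighted">Django</span> &amp; <span class="highlighted">Whoosh</span>
```

Escaping the text outside the matches keeps user content from injecting HTML while still emitting the span tags.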