kibana-5.1.2-windows-x86
elasticsearch-rtf
elasticsearch-head
The elasticsearch-rtf version should be close to the kibana version; the detailed setup steps can be found on GitHub.
If npm is needed, download Node.js first.
Create a models folder in the project, similar to Django's models.
from datetime import datetime
from elasticsearch_dsl import DocType, Date, Nested, Boolean, \
    analyzer, InnerDoc, Completion, Keyword, Text, Integer
from elasticsearch_dsl.connections import connections

connections.create_connection(hosts=["localhost"])


class jobboleItemsType(DocType):
    title = Text(analyzer="ik_max_word")
    date_time = Date()
    style = Text(analyzer="ik_max_word")
    content = Text(analyzer="ik_max_word")
    cherish = Integer()
    image_url = Keyword()
    img_path = Keyword()

    class Meta:
        index = 'job_bole'
        doc_type = 'article'


if __name__ == '__main__':
    jobboleItemsType.init()
As above, define a field for each corresponding field of the item; this is similar to creating a table in a database, and running init() creates the index mapping.
In the corresponding Item class:
def save_to_es(self):
    article = jobboleItemsType()
    article.title = self['title']
    article.content = self['content']
    article.date_time = self['date_time']
    article.cherish = self['cherish']
    article.image_url = self['image_url']
    # article.img_path = self['img_path']
    article.meta.id = self['id']
    article.save()
    return
Call this method in the corresponding pipeline to save the data into Elasticsearch.
After writing the pipeline, remember to register it in settings.
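The pipeline described above can be sketched as follows. Only the save_to_es() method comes from these notes; the class name ElasticsearchPipeline and the module path in the registration comment are assumptions to adapt to the actual project layout.

```python
# Hypothetical pipeline: delegates persistence to the item's save_to_es().
class ElasticsearchPipeline(object):
    def process_item(self, item, spider):
        item.save_to_es()  # the method defined on the item class above
        return item

# Registration in settings.py (the number is the pipeline's priority;
# 'myproject' is a placeholder for the real project package):
# ITEM_PIPELINES = {
#     'myproject.pipelines.ElasticsearchPipeline': 300,
# }
```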
Distributed crawling -------------------------------------
Download and install scrapy-redis.
Copy the folder C:\Users\chase\Desktop\scrapy-redis-master\src\scrapy_redis into the project, then follow the setup instructions given on GitHub.
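The GitHub setup mentioned above boils down to a few settings entries; a minimal sketch of a scrapy-redis configuration, where the Redis URL is an assumption for a local single-machine setup:

```python
# settings.py -- minimal scrapy-redis configuration
# Schedule requests through Redis instead of Scrapy's in-memory queue:
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
# Deduplicate requests in Redis, shared across all worker processes:
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
# Keep the queue and dedup set between runs (allows pause/resume):
SCHEDULER_PERSIST = True
# Assumed local Redis instance; change for a remote server:
REDIS_URL = 'redis://localhost:6379'
```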
Then run the following two commands, replacing the names with your own:
run the spider:
scrapy runspider myspider.py
push urls to redis:
redis-cli lpush myspider:start_urls http://google.com
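The myspider.py referenced above needs to read its start URLs from Redis rather than a hard-coded start_urls list. A minimal sketch, assuming scrapy-redis is installed; the class name, redis_key value, and parse logic are placeholders:

```python
from scrapy_redis.spiders import RedisSpider


class MySpider(RedisSpider):
    """Spider that pulls start URLs from a Redis list."""
    name = 'myspider'
    # Must match the key used in the redis-cli lpush command above:
    redis_key = 'myspider:start_urls'

    def parse(self, response):
        # Placeholder parse logic: yield the page URL and title.
        yield {
            'url': response.url,
            'title': response.css('title::text').get(),
        }
```

With this in place, the spider idles until URLs are pushed to the myspider:start_urls list, so multiple workers can share one crawl queue.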