Integrating Elasticsearch and the ik Analyzer into a Laravel Personal Blog

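In an earlier post I wrote about a personal blog built with Laravel 5.5 and Vue. The GitHub repository is https://github.com/Johnson19900110/phpJourney. I recently had some free time, so I decided to integrate Elasticsearch into it.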

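Because I am fairly lazy and write my posts on cnblogs, my personal blog was never kept in sync. So I used the PHP scraping library fabpot/goutte to crawl my cnblogs articles into my own blog.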

The code is as follows:

<?php
namespace App\Libraries;

use App\Post;
use Goutte\Client;
use Symfony\Component\DomCrawler\Crawler;

class CnblogsPostSpider {

    protected $client;

    protected $crawler;

    protected $urls = [];

    public function __construct(Client $client, $url)
    {
        $this->client = $client;
        $this->crawler = $client->request('GET', $url);
    }

    public function getUrls()
    {
        $urls = $this->crawler->filter('.postTitle > a')->each(function ($node) {
            return $node->attr('href');
        });

        foreach ($urls as $url) {
            $crawler = $this->client->request('GET', $url);

            $cnBlogId = $this->getCnBlogId($url);

            $post = new Post();
            if($post->where('cnblogs_id', $cnBlogId)->count()) {
                // This post was crawled before; only update the view and comment counts
                $post->where('cnblogs_id', $cnBlogId)->update([
                    'views'         => $this->getViews($crawler),
                    'comments'      => $this->getComments($crawler),
                ]);
            }else {
                $post->insert([
                    'title'         => $this->getTitle($crawler),
                    'category_id'   => 1,
                    'content'       => $this->getContent($crawler),
                    'user_id'       => 1,
                    'views'         => $this->getViews($crawler),
                    'comments'      => $this->getComments($crawler),
                    'cnblogs_id'    => $cnBlogId,
                    'cnblogs_url'   => $url,
                    'created_at'    => $this->getCreatedAt($crawler),
                ]);
            }
        }
    }

    public function getCnBlogId($url)
    {
        $url_arr = explode('/', $url);
        $last = array_pop($url_arr);
        $path_arr = explode('.', $last);
        return intval(array_shift($path_arr));
    }

    protected function getTitle(Crawler $crawler)
    {
        return trim($crawler->filter('.postTitle > a')->text());
    }

    protected function getContent(Crawler $crawler)
    {
        return trim($crawler->filter('#cnblogs_post_body')->text());
    }

    protected function getViews(Crawler $crawler)
    {
        return intval(trim($crawler->filter('#post_view_count')->text()));
    }

    protected function getComments(Crawler $crawler)
    {
        return intval($crawler->filter('#post_comment_count')->text());
    }

    protected function getCreatedAt(Crawler $crawler)
    {
        return trim($crawler->filter('#post-date')->text());
    }
}
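
To make the ID extraction in getCnBlogId easier to follow, here is a minimal standalone sketch of the same logic. The function name cnblogsIdFromUrl is made up for illustration; the steps mirror the method above.

```php
<?php
// Standalone copy of the getCnBlogId logic, for illustration only.
function cnblogsIdFromUrl(string $url): int
{
    $segments = explode('/', $url);      // last element is e.g. '9185363.html'
    $last = array_pop($segments);
    $parts = explode('.', $last);        // e.g. ['9185363', 'html']
    return intval(array_shift($parts));  // numeric part before the first dot
}

echo cnblogsIdFromUrl('https://www.cnblogs.com/johnson108178/p/9185363.html'), PHP_EOL;
// prints 9185363
```

Note that a URL ending in a slash has an empty last segment, so this logic returns 0 for it. The spider only feeds it post links, which end in `<id>.html`.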

Next, integrate ES through Laravel Scout.

First, install the ES package:

 composer require tamayo/laravel-scout-elastic 

This package depends on Laravel Scout, so Scout is installed along with it.

Then publish the config and register the service providers.
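
As a rough sketch of that step, the fragments below show what the registration and config might look like. They follow the package README as I remember it, so treat the provider class names and default values as assumptions and check them against the version you install. The `hosts` and `index` keys are the ones the code later in this post reads via `config('scout.elasticsearch.hosts')` and `config('scout.elasticsearch.index')`.

```php
<?php
// Publish Scout's config first (run in a shell):
//   php artisan vendor:publish --provider="Laravel\Scout\ScoutServiceProvider"

// config/app.php (fragment): register both providers.
'providers' => [
    // ...
    Laravel\Scout\ScoutServiceProvider::class,
    ScoutEngines\Elasticsearch\ElasticsearchProvider::class,
],

// config/scout.php (fragment): select the driver and describe the ES server.
'driver' => env('SCOUT_DRIVER', 'elasticsearch'),

'elasticsearch' => [
    'index' => env('ELASTICSEARCH_INDEX', 'laravel'),
    'hosts' => [
        // No port here: the es:init command in this post appends ':9200' itself.
        env('ELASTICSEARCH_HOST', 'http://localhost'),
    ],
],
```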

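Now we can install ES. We want the ik plugin for Chinese word segmentation, and working out how to install it by hand can cost a lot of effort.

I am also new to ES, so we simply use a ready-made project: https://github.com/medcl/elasticsearch-rtf

The project currently ships Elasticsearch 5.1.1, with the ik plugin already installed.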

$ curl http://localhost:9200

{
  "name" : "Rkx3vzo",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "Ww9KIfqSRA-9qnmj1TcnHQ",
  "version" : {
    "number" : "5.1.1",
    "build_hash" : "5395e21",
    "build_date" : "2016-12-06T12:36:15.409Z",
    "build_snapshot" : false,
    "lucene_version" : "6.3.0"
  },
  "tagline" : "You Know, for Search"
}

If you see this response, ES is installed and running.
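
If you would rather do this check from PHP than from curl, a small sketch could look like the following. The function name esVersionFromBody is hypothetical, and the JSON string is a shortened copy of the response above. In a real app the body would come from an HTTP client, for example Guzzle's `$client->get($url)->getBody()`.

```php
<?php
// Extract the version number from the JSON body returned by GET / on the ES host.
// Returns null when the body is not the expected JSON.
function esVersionFromBody(string $body): ?string
{
    $data = json_decode($body, true);
    return $data['version']['number'] ?? null;
}

// Shortened copy of the sample response shown above.
$body = '{"name":"Rkx3vzo","version":{"number":"5.1.1","lucene_version":"6.3.0"},"tagline":"You Know, for Search"}';

$version = esVersionFromBody($body);
echo $version, PHP_EOL; // prints 5.1.1
```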

Now we can create an artisan command that creates the ES index and template.

<?php

namespace App\Console\Commands;

use GuzzleHttp\Client;
use Illuminate\Console\Command;

class InitEs extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string
     */
    protected $signature = 'es:init';

    /**
     * The console command description.
     *
     * @var string
     */
    protected $description = 'Init es to create index';

    /**
     * Create a new command instance.
     *
     * @return void
     */
    public function __construct()
    {
        parent::__construct();
    }

    /**
     * Execute the console command.
     *
     * @return mixed
     */
    public function handle()
    {
        //
        $client = new Client();
        $this->createTemplate($client);
        $this->createIndex($client);
    }

    public function createTemplate(Client $client)
    {
        $url = config('scout.elasticsearch.hosts')[0] . ':9200/' . '_template/rtf';
        $client->put($url, [
            'json' => [
                'template' => '*',
                'settings' => [
                    'number_of_shards' => 1
                ],
                'mappings' => [
                    '_default_' => [
                        '_all' => [
                            'enabled' => true
                        ],
                        'dynamic_templates' => [
                            [
                                'strings' => [
                                    'match_mapping_type' => 'string',
                                    'mapping' => [
                                        'type' => 'text',
                                        'analyzer' => 'ik_smart',
                                        'fields' => [
                                            'keyword' => [
                                                'type' => 'keyword',
                                                // ignore_above is a keyword-field option, not a text-field one
                                                'ignore_above' => 256
                                            ]
                                        ]
                                    ]
                                ]
                            ]
                        ]
                    ]
                ]
            ]
        ]);

    }

    public function createIndex(Client $client)
    {
        $url = config('scout.elasticsearch.hosts')[0] . ':9200/' . config('scout.elasticsearch.index');
        $client->put($url, [
            'json' => [
                'settings' => [
                    'refresh_interval' => '5s',
                    'number_of_shards' => 1,
                    'number_of_replicas' => 0,
                ],
                'mappings' => [
                    '_default_' => [
                        '_all' => [
                            'enabled' => false
                        ]
                    ]
                ]
            ]
        ]);
    }
}
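
One thing to watch in the command above: both URLs are built by appending ':9200/' to the configured host, which only works if the host entry has no port of its own. A hedged sketch of a helper that avoids this is shown below. The name esBaseUrl is made up, and it assumes the host is given with a scheme, such as 'http://localhost'.

```php
<?php
// Hypothetical helper: append a default port only when the host has none.
// Assumes $host includes a scheme, e.g. 'http://localhost'.
function esBaseUrl(string $host, int $defaultPort = 9200): string
{
    $host = rtrim($host, '/');
    $port = parse_url($host, PHP_URL_PORT); // null when no port is present
    return $port === null ? "{$host}:{$defaultPort}" : $host;
}

echo esBaseUrl('http://localhost') . '/_template/rtf', PHP_EOL;
// prints http://localhost:9200/_template/rtf
```

With such a helper, createTemplate and createIndex could build their URLs as `esBaseUrl($host) . '/_template/rtf'` and `esBaseUrl($host) . '/' . $index`.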

Because tamayo/laravel-scout-elastic has no highlight support, we need to modify it slightly. Create an EsEngine class that extends ElasticsearchEngine and override a few methods.

<?php
/**
 * Created by PhpStorm.
 * User: johnson
 * Date: 2018/6/14
 * Time: 3:10 PM
 */

namespace App\Libraries;


use Laravel\Scout\Builder;
use ScoutEngines\Elasticsearch\ElasticsearchEngine;
use Illuminate\Database\Eloquent\Collection;

class EsEngine extends ElasticsearchEngine
{
    public function search(Builder $builder)
    {
        return $this->performSearch($builder, array_filter([
            'numericFilters' => $this->filters($builder),
            'size' => $builder->limit,
        ]));
    }

    protected function performSearch(Builder $builder, array $options = [])
    {
        $params = [
            'index' => $this->index,
            'type' => $builder->model->searchableAs(),
            'body' => [
                'query' => [
                    'bool' => [
                        'must' => [
                            [
                                'query_string' => [
                                    'query' => "*{$builder->query}*",
                                ]
                            ]
                        ]
                    ]
                ],
            ]
        ];
        /**
         * Apply the highlight settings declared on the model
         */
        if ($builder->model->searchSettings
            && isset($builder->model->searchSettings['attributesToHighlight'])
        ) {
            $attributes = $builder->model->searchSettings['attributesToHighlight'];
            foreach ($attributes as $attribute) {
                $params['body']['highlight']['fields'][$attribute] = new \stdClass();
            }
        }

        if ($sort = $this->sort($builder)) {
            $params['body']['sort'] = $sort;
        }

        if (isset($options['from'])) {
            $params['body']['from'] = $options['from'];
        }

        if (isset($options['size'])) {
            $params['body']['size'] = $options['size'];
        }

        if (isset($options['numericFilters']) && count($options['numericFilters'])) {
            $params['body']['query']['bool']['must'] = array_merge($params['body']['query']['bool']['must'],
                $options['numericFilters']);
        }

        return $this->elastic->search($params);
    }

    public function map($results, $model)
    {
        if ($results['hits']['total'] === 0) {
            return Collection::make();
        }

        $keys = collect($results['hits']['hits'])
            ->pluck('_id')->values()->all();

        $models = $model->whereIn(
            $model->getKeyName(), $keys
        )->get()->keyBy($model->getKeyName());

        return collect($results['hits']['hits'])->map(function ($hit) use ($model, $models) {

            $one = $models[$hit['_id']];
            /**
             * If the hit contains highlight data, attach it to the returned model
             */
            if (isset($hit['highlight'])) {
                $one->highlight = $hit['highlight'];
            }
            return $one;
        });
    }
}
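
A detail in performSearch that is easy to miss is the use of `new \stdClass()` for each highlight field. The standalone sketch below shows why: an empty PHP array is encoded as a JSON array, while Elasticsearch expects a JSON object for each highlighted field. The field names here are only examples.

```php
<?php
$fields = ['title', 'content']; // example attribute names, not taken from the engine

$withObjects = [];
$withArrays = [];
foreach ($fields as $field) {
    $withObjects[$field] = new \stdClass(); // encodes as {}
    $withArrays[$field] = [];               // encodes as []
}

echo json_encode(['fields' => $withObjects]), PHP_EOL;
// prints {"fields":{"title":{},"content":{}}}
echo json_encode(['fields' => $withArrays]), PHP_EOL;
// prints {"fields":{"title":[],"content":[]}}
```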

We want to search blog posts, so add the following to the Post model:

    use Searchable;

    public $searchSettings = ['attributesToHighlight' => ['*']];

    public $highlight = [];
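
This post does not show how Scout is told to use EsEngine instead of the package's own engine. One possible way, sketched here under assumptions, is to register it as a custom driver in a service provider. The provider name EsServiceProvider and the driver name 'es' are hypothetical; the constructor arguments follow the `$this->elastic` and `$this->index` properties that EsEngine relies on.

```php
<?php

namespace App\Providers;

use App\Libraries\EsEngine;
use Elasticsearch\ClientBuilder;
use Illuminate\Support\ServiceProvider;
use Laravel\Scout\EngineManager;

// Hypothetical provider; it must itself be listed in config/app.php.
class EsServiceProvider extends ServiceProvider
{
    public function boot()
    {
        // 'es' is an assumed driver name; select it with SCOUT_DRIVER=es.
        resolve(EngineManager::class)->extend('es', function () {
            return new EsEngine(
                ClientBuilder::create()
                    ->setHosts(config('scout.elasticsearch.hosts'))
                    ->build(),
                config('scout.elasticsearch.index')
            );
        });
    }
}
```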

Then use Scout's search method when querying the data.

public function search(Request $request)
    {
        $q = $request->get('q', false);

        $posts = [];
        if($q !== false) {
            $posts = Post::search($q)->paginate();
        }

        return view('index', compact('posts', 'q'));
    }

The returned models carry a highlight attribute, so the template can use it like this:

 
 
@if(isset($post->highlight['content']))
@foreach($post->highlight['content'] as $item)
...{!! $item !!}...
@endforeach
@else
{{ empty($post->content) ? '...' : mb_substr($post->content, 0, 300) . '...' }}
@endif
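
Because `{!! $item !!}` prints the fragment without escaping, any markup-like text in a post (for example a code sample containing a script tag) would reach the page as raw HTML. A minimal sketch of one way to soften this is below. It assumes Elasticsearch's default `<em>` highlight tags, and the helper name safeHighlight is made up.

```php
<?php
// Escape a highlight fragment, then restore only the <em> markers
// that Elasticsearch wraps around matches by default.
function safeHighlight(string $fragment): string
{
    $escaped = htmlspecialchars($fragment, ENT_QUOTES, 'UTF-8');
    return str_replace(['&lt;em&gt;', '&lt;/em&gt;'], ['<em>', '</em>'], $escaped);
}

echo safeHighlight('<b>x</b> <em>Laravel</em>'), PHP_EOL;
// prints &lt;b&gt;x&lt;/b&gt; <em>Laravel</em>
```

In the template this could be used as `{!! safeHighlight($item) !!}`, provided the function is loaded as a global helper.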

The final result looks like this:


Reposted from www.cnblogs.com/johnson108178/p/9185363.html