Elasticsearch for Beginners Ⅰ: An Overview of ES Core Concepts

C01. What is Elasticsearch

1. What is search

Vertical search (site search)

  • Internet search: e-commerce websites, job sites, various apps

  • Search inside IT systems: OA (office automation) software, conference management, staff management, back-office management systems

2. What happens if search is done with a database

Using a database for search (fuzzy matching) is inefficient and does not perform well.

3. What is full-text search?

Database search drawbacks: if the database holds 1 million rows, a fuzzy match has to scan all 1 million of them, and each scan must compare the whole text character by character; the search string cannot be broken into terms and looked up.

Full-text search: the 1 million rows are split into terms and an inverted index is built over them. A search may then find the matching data on the first lookup, or perhaps after 100 or 1,000 lookups, instead of 1 million scans. This process is called full-text search.
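
To make the contrast with a row-by-row scan concrete, here is a minimal sketch of an inverted index in Python. It only illustrates the idea and is not how Lucene implements it: the function names are hypothetical, and terms are produced by naive lowercase whitespace splitting rather than a real analyzer.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, term):
    """Look the term up directly instead of scanning every document."""
    return sorted(index.get(term.lower(), set()))


docs = {
    1: "gaolujie yagao",
    2: "jiajieshi yagao",
    3: "zhonghua yagao",
}
index = build_inverted_index(docs)
print(search(index, "yagao"))     # [1, 2, 3]
print(search(index, "zhonghua"))  # [3]
```

A lookup touches only the entry for the searched term, which is why it avoids comparing the search string against every row.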

Lucene: a jar package containing the code for building inverted indexes and for searching, including the various algorithms involved. In Java development, we import the Lucene jar and develop against its API. With Lucene we can index existing data, and Lucene organizes the index data structures on the local disk for us.

Lucene drawbacks: when the data volume is too large to fit on one machine, multiple machines are needed, and distribution, availability, and maintainability then have to be handled by the application itself.

4. What is Elasticsearch

  • Automatically maintains the distribution of data across multiple nodes for indexing, and executes search requests across multiple nodes

  • Automatically maintains redundant copies of data, so that no data is lost when some machines go down

  • Wraps more advanced features to give us higher-level support, so we can quickly develop more complex applications: sophisticated search features and aggregation analysis.

 

 

C02. Elasticsearch features and application scenarios

1. Elasticsearch functions

  • Distributed search engine and data analysis engine

    Search: Baidu-style web search, on-site search within a website, retrieval in IT systems

    Data analysis: on an e-commerce site, which toothpastes ranked highest in sales over the last seven days; on a news site, which ten articles were visited most over the last month

  • Full-text search, structured search, data analysis

    Structured search: find the products in the daily chemical products category: select * from prod where cate_id = 'daily chemical products'

    Full-text search: find the products whose name contains toothpaste: select * from prod where prod_name like '%toothpaste%'

    Data analysis: count how many products each category has: select cate_id, count(*) from prod group by cate_id

  • Near real-time processing of massive data

    Distributed: ES can automatically distribute huge amounts of data across multiple servers for storage and retrieval

    Massive data processing: once distributed, a large number of servers can be used to store and retrieve data.

    Near real time: search and analysis over the data complete within seconds.

    In contrast: Lucene is a standalone library, so one application on a single server can process at most the data that fits on that server.

     

2. ES application scenarios

  • Wikipedia: full-text search, highlighting, search suggestions

  • E-commerce sites: product search

  • News sites: user behavior logs (likes, comments) and data analysis

  • Forum websites: full-text search over questions and answers

  • GitHub: searching code and projects

  • Log data analysis: Logstash collects logs and ES performs complex data analysis (the ELK stack: Elasticsearch + Logstash + Kibana)

  • Price monitoring sites: a user sets a price threshold for a product, and a notification is sent to the user when the price drops below it.

  • BI (business intelligence) systems: ES performs data analysis and mining

  • Site search (e-commerce, recruitment, portals), search inside IT systems (OA, CRM, ERP), etc.

     

3. ES features

  • Can run as a large distributed cluster handling PB-scale data for large companies, and can also run on a single machine

  • ES is not a new technology as such; it mainly combines full-text search, data analysis, and distributed technology

  • For users it works out of the box and is simple to use

  • Database functionality falls short in many areas, such as full-text search, synonym handling, relevance ranking, complex data analysis, and near real-time processing of massive data. ES works well as a complement to a traditional database, providing the functionality the database lacks.

 

 

 

 

C03.Elasticsearch core concepts

1. Lucene and ES

Lucene is the most advanced and most powerful search library, but developing directly on Lucene is very complex: the API is complicated and requires an in-depth understanding of the underlying principles.

ES is based on Lucene and hides its complexity, providing an easy-to-use RESTful API and Java API.

  • Distributed document storage engine

  • Distributed search engine and analytics engine

  • Distributed, supporting PB-scale data

 

2. ES core concepts

  • 1. Near Realtime (NRT): near real time. There is a delay of about 1 second between writing data and that data becoming searchable, and ES-based search and analysis complete within seconds

  • 2. Cluster: a cluster contains multiple nodes. Which cluster a node belongs to is determined by configuration (the cluster name, which defaults to elasticsearch). For small and medium-sized applications, starting with a one-node cluster is perfectly normal.

  • 3. Node: a node in the cluster. Each node also has a name (random by default), and the node name matters a great deal for operations and maintenance work. By default a node joins the cluster named "elasticsearch", so if you start a bunch of nodes they automatically form one elasticsearch cluster. A single node can also form an ES cluster on its own.

  • 4. Document: the smallest data unit in ES. A document can be one customer record, one product record, or one order record, and is usually represented as a JSON data structure. Each type in an index can store multiple documents; a document has multiple fields, and each field is one data field.

  • 5. Index: a collection of documents with a similar data structure, such as a product index or an order index. An index has a name and contains many documents; for example, a product index may store the data of all products.

  • 6. Type: each index can have one or more types. A type is a logical classification of the data in an index, and the documents under one type have the same fields (with some exceptions). For example, a blog system with one index could define a user data type and a blog data type.

  • 7. Shard: a single machine cannot store a large amount of data, so ES can split the data of an index into multiple shards and distribute them across multiple servers. With shards you can scale out to store more data and spread search and analysis operations across multiple servers, improving throughput and performance.

    PS: an index is split into multiple shards, and each shard stores part of the index's data. The shards are distributed across multiple servers and processed in parallel, which improves throughput and performance.

  • 8. Replica: any server can fail or go down at any time, and the shards on it may then be lost, so you can create multiple replicas for each shard. A replica can take over when its shard fails, so that data is not lost, and more replicas also improve search throughput and performance. The number of primary shards is set when the index is created and cannot be modified afterwards (default 5 in the ES version used here); the number of replica shards can be modified at any time (default 1 per primary). By default each index therefore has 10 shards: 5 primary shards and 5 replica shards. The minimum high-availability configuration is two servers.

    PS: a shard is more precisely called a primary shard, and a replica is called a replica shard; a replica is in fact also a shard.

    Replica benefits:

    1) High availability: when a shard goes down, data is not lost and service continues

    2) Better throughput and performance for search requests
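
Why the primary shard count is fixed can be sketched in a few lines. In simplified form, ES chooses the target shard from a hash of the document's routing value (by default its id) modulo the number of primary shards. The sketch below is an illustration under assumptions: `zlib.crc32` is a stand-in and is not the hash function ES uses, and the function names are hypothetical.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Choose a primary shard for a document (stand-in hash, not the one ES uses)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def total_shards(num_primary: int, replicas_per_primary: int) -> int:
    """Each primary shard has its own copy plus its replicas."""
    return num_primary * (1 + replicas_per_primary)


print(total_shards(5, 1))  # 10

# If the number of primary shards changed, many ids would map to a different
# shard, so documents indexed earlier could no longer be found where expected.
doc_ids = [str(i) for i in range(1, 21)]
moved = [d for d in doc_ids if pick_shard(d, 5) != pick_shard(d, 6)]
print(f"{len(moved)} of {len(doc_ids)} ids would land on a different shard")
```

The replica count has no such constraint, because replicas are copies of existing shards and do not take part in the routing calculation.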

 

 

 

C04. Using ES on Windows

  • Install a JDK, Java 1.8 or later

  • Download and extract the ES installation package

  • Start ES (from a command-line terminal, run elasticsearch from the bin directory)

  • Check whether ES started successfully (the default address is localhost:9200)

  • Change the cluster name in elasticsearch.yml

  • Download and extract the Kibana installation package. Its developer console is used to operate ES and is the main interface for learning ES

  • Start Kibana (default localhost:5601)

 

 

 

 

 

C05. E-commerce website product management (1): basic CRUD operations

1. Document data format

  • The data structures of an application are object-oriented and complex

  • To store object data in a relational database, the object has to be taken apart and flattened into multiple tables, and every query has to reassemble the original object, which is cumbersome

  • ES is document-oriented: the data structure stored in a document is the same as the object-oriented data structure. On top of this document structure, ES provides sophisticated indexing, full-text search, aggregation analysis, and other functions

  • ES documents are expressed in JSON format

2. E-commerce website product management: case background

There is an e-commerce website, and we need to build a back-end system for it on top of ES that provides the following functions:

  • CRUD (create, read, update, delete) operations on product information

  • Simple structured queries

  • Simple full-text search, as well as more complex phrase search

  • Highlighting of full-text search results

  • Simple aggregation analysis of the data

3. Simple cluster management

  • 1. Quickly check the health of the cluster

    ES provides a set of APIs called the cat API, which can be used to view a variety of data in ES

    GET /_cat/health?v

     

    How do we quickly read the health status of the cluster? green, yellow, or red?

    green: every index's primary shards and replica shards are all active

    yellow: every index's primary shards are active, but some replica shards are not active and are unavailable

    red: not every index's primary shards are active; some indexes have lost data

    Why is the cluster yellow now?

    We are on a single laptop and have started one ES process, which amounts to a single node. ES currently holds one index, the one Kibana creates for itself. By default each index is assigned 5 primary shards and 5 replica shards, and a primary shard and its replica shard cannot be on the same machine (for fault tolerance). The index Kibana creates for itself has 1 primary shard and 1 replica shard. With only one node, the primary shard is allocated and active, but there is no second machine on which to start the replica shard.

    A small experiment: start a second ES process so that the cluster has two nodes. The replica shard is then allocated automatically to the new node, and the cluster status becomes green.
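
The three colors can be expressed as a small decision function. This is a minimal sketch of the rules described above, not ES code; the tuple layout and the function name are assumptions made for illustration.

```python
def cluster_health(shards):
    """Derive the health color from (is_primary, is_active) pairs, one per shard."""
    if any(is_primary and not is_active for is_primary, is_active in shards):
        return "red"      # at least one primary shard is not active
    if any(not is_active for _, is_active in shards):
        return "yellow"   # all primaries active, some replica is not
    return "green"        # every primary and replica shard is active


# One node: the Kibana index has 1 primary (active) and 1 replica (unassigned).
single_node = [(True, True), (False, False)]
# Two nodes: the replica can be allocated to the second node.
two_nodes = [(True, True), (False, True)]

print(cluster_health(single_node))  # yellow
print(cluster_health(two_nodes))    # green
```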

  • 2. Quickly see which indexes the cluster has

    GET /_cat/indices?v

  • 3. Simple index operations

    Create an index: PUT /test_index?pretty

    Delete an index: DELETE /test_index?pretty

4. Product CRUD operations

1. Add a product: create a document and index it

Syntax: PUT /index/type/id
 
{
   "JSON Data" 
}

 

Example:

PUT /ecommerce/product/1
{
    "name" : "gaolujie yagao",
    "desc" :  "gaoxiao meibai",
    "price" :  30,
    "producer" :      "gaolujie producer",
    "tags": [ "meibai", "fangzhu" ]
}
 

 

Return:

{
  "_index": "ecommerce",  
  "_type": "product",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}
 

PS: ES creates the index and type automatically, so there is no need to create them in advance. By default ES also builds an inverted index for every field of the document so that it can be searched.
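
Outside the Kibana console, the same request is an ordinary HTTP PUT with a JSON body. The sketch below builds it with Python's standard library without sending it; the helper name is hypothetical, and the host `http://localhost:9200` assumes the default local setup from C04.

```python
import json
import urllib.request


def build_index_request(index, doc_type, doc_id, document,
                        host="http://localhost:9200"):
    """Build (but do not send) the PUT /index/type/id request for a document."""
    url = f"{host}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


product = {
    "name": "gaolujie yagao",
    "desc": "gaoxiao meibai",
    "price": 30,
    "producer": "gaolujie producer",
    "tags": ["meibai", "fangzhu"],
}
req = build_index_request("ecommerce", "product", 1, product)
print(req.get_method(), req.full_url)

# To send it to a running node, pass the request to urllib.request.urlopen(req).
```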

2. Query a product: retrieve a document

Syntax: GET /index/type/id

GET /ecommerce/product/1

 

 

3. Modify a product: replace the document

PUT /ecommerce/product/1
{
    "name" : "jiaqiangban gaolujie yagao",
    "desc" :  "gaoxiao meibai",
    "price" :  30,
    "producer" :      "gaolujie producer",
    "tags": [ "meibai", "fangzhu" ]
}

 

PS: The replace approach has one drawback: you must send every field of the document, even when you only want to modify one of them

4. Modify a product: update the document

POST /ecommerce/product/1/_update
{
  "doc": {
    "name": "jiaqiangban gaolujie yagao"
  }
}
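
The difference between the two ways of modifying a product can be sketched with plain dictionaries. This is a simplified model, not ES code: the function names are hypothetical, and the merge shown is shallow.

```python
def replace_document(stored, new_doc):
    """PUT semantics: the stored document is replaced wholesale by the new body."""
    return dict(new_doc)


def partial_update(stored, partial_doc):
    """_update semantics: fields under "doc" are merged into the stored document."""
    merged = dict(stored)
    merged.update(partial_doc)
    return merged


stored = {
    "name": "gaolujie yagao",
    "desc": "gaoxiao meibai",
    "price": 30,
}
change = {"name": "jiaqiangban gaolujie yagao"}

print(replace_document(stored, change))  # only "name" survives
print(partial_update(stored, change))    # "desc" and "price" are kept
```

With a replace, any field left out of the request body is gone afterwards, which is why the partial update is the more convenient choice for changing a single field.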

 

5. Delete a product: delete the document

DELETE /ecommerce/product/1

 

 

 

 

C06. E-commerce website product management (2): various search methods

Search all products: GET /ecommerce/product/_search

Fields in the response:

  • took: how many milliseconds the search took

  • timed_out: whether the search timed out (here it did not)

  • _shards: the data is split into 5 shards, so the search request hits every primary shard (or one of its replica shards)

  • hits.total: the number of results; here 3 documents

  • hits.max_score: the relevance score of the best-matching document; the more relevant a document is to the search, the higher its score

  • hits.hits: the detailed data of the documents that matched the search

 

1. Query string search

The name "query string search" comes from the fact that the search parameters are all carried in the query string of the HTTP request.

Search for products whose name contains yagao, sorted by price in descending order: GET /ecommerce/product/_search?q=name:yagao&sort=price:desc

PS: suitable for ad hoc use with command-line tools such as curl, to quickly issue a request and retrieve the information you want. Complex query requests are hard to build this way, so query string search is rarely used in production.
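
When such a URL is assembled in code rather than typed by hand, the parameters should be URL-encoded. A minimal sketch using Python's standard library follows; the helper name and the `http://localhost:9200` host are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def query_string_search_url(index, doc_type, params,
                            host="http://localhost:9200"):
    """Build the _search URL with the search parameters in the query string."""
    return f"{host}/{index}/{doc_type}/_search?{urlencode(params)}"


url = query_string_search_url(
    "ecommerce", "product", {"q": "name:yagao", "sort": "price:desc"}
)
print(url)

# Decoding the query string gives back the original parameters.
print(parse_qs(urlsplit(url).query))
```

Each additional condition becomes another hand-built parameter, which illustrates why this style becomes hard to manage for complex queries.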

2. Query DSL

DSL: Domain Specific Language. The query is built as JSON in the HTTP request body, which is more convenient and can express all kinds of complex syntax; it is far more powerful than query string search.

Query all products

GET /ecommerce/product/_search
{
  "query": { "match_all": {} }
}
 

 

Query products whose name contains yagao, sorted by price in descending order

GET /ecommerce/product/_search
{
    "query" : {
        "match" : {
            "name" : "yagao"
        }
    },
    "sort": [
        { "price": "desc" }
    ]
}
 

 

Paginated product query: there are 3 products in total; assuming each page shows 1 product, to display page 2 we retrieve the 2nd product

GET /ecommerce/product/_search
{
  "query": { "match_all": {} },
  "from": 1,
  "size": 1
}
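
The mapping from a page number to the from and size parameters is simple arithmetic, since from is a zero-based offset. A minimal sketch, with a hypothetical helper name and 1-based page numbers assumed:

```python
def page_to_from_size(page, page_size):
    """Translate a 1-based page number into the "from"/"size" search parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {"from": (page - 1) * page_size, "size": page_size}


print(page_to_from_size(2, 1))   # {'from': 1, 'size': 1}
print(page_to_from_size(3, 10))  # {'from': 20, 'size': 10}
```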

 

Return only the name and price of each product

GET /ecommerce/product/_search
{
  "query": { "match_all": {} },
  "_source": ["name", "price"]
}

 

PS: better suited to production use, since complex queries can be built

3. Query filter

Search for products whose name contains yagao and whose price is greater than 25 yuan

GET /ecommerce/product/_search
{
    "query" : {
        "bool" : {
            "must" : {
                "match" : {
                    "name" : "yagao" 
                }
            },
            "filter" : {
                "range" : {
                    "price" : { "gt" : 25 } 
                }
            }
        }
    }
}
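
What this query selects can be checked by hand against the four sample products that appear in the search results of this section. The sketch below is an in-memory imitation, not ES code: the function name is hypothetical, matching is naive whitespace tokenization, and relevance scoring is left out entirely.

```python
PRODUCTS = [
    {"id": 1, "name": "gaolujie yagao", "price": 30},
    {"id": 2, "name": "jiajieshi yagao", "price": 25},
    {"id": 3, "name": "zhonghua yagao", "price": 40},
    {"id": 4, "name": "special yagao", "price": 50},
]


def bool_search(products, name_term, price_gt):
    """Return ids that satisfy both the must clause and the filter clause."""
    hits = []
    for product in products:
        must_ok = name_term in product["name"].lower().split()  # "must": match on name
        filter_ok = product["price"] > price_gt                 # "filter": range, gt
        if must_ok and filter_ok:
            hits.append(product["id"])
    return hits


print(bool_search(PRODUCTS, "yagao", 25))  # [1, 3, 4]
```

Product 2 is excluded because gt is a strict comparison and its price is exactly 25.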

  

4. Full-text search

Full-text search splits the search string into terms and matches each of them against the inverted index; a document is returned as a result as long as it matches any one of the terms.

GET /ecommerce/product/_search
{
    "query" : {
        "match" : {
            "producer" : "yagao producer"
        }
    }
}

  

Result:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0.70293105, // highest score, i.e. the best-matching document
    "hits": [
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "4",
        "_score": 0.70293105,
        "_source": {
          "name": "special yagao",
          "desc": "special meibai",
          "price": 50,
          "producer": "special yagao producer",
          "tags": [
            "meibai"
          ]
        }
      },
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "1",
        "_score": 0.25811607,
        "_source": {
          "name": "gaolujie yagao",
          "desc": "gaoxiao meibai",
          "price": 30,
          "producer": "gaolujie producer",
          "tags": [
            "meibai",
            "fangzhu"
          ]
        }
      },
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "3",
        "_score": 0.25811607,
        "_source": {
          "name": "zhonghua yagao",
          "desc": "caoben zhiwu",
          "price": 40,
          "producer": "zhonghua producer",
          "tags": [
            "qingxin"
          ]
        }
      },
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "2",
        "_score": 0.1805489,
        "_source": {
          "name": "jiajieshi yagao",
          "desc": "youxiao fangzhu",
          "price": 25,
          "producer": "jiajieshi producer",
          "tags": [
            "fangzhu"
          ]
        }
      }
    ]
  }
}

  

5. Phrase search

In contrast to full-text search, phrase search requires the specified field's text to contain the search string exactly as entered, in full; only then does it count as a match and get returned as a result.

GET /ecommerce/product/_search
{
    "query" : {
        "match_phrase" : {
            "producer" : "yagao producer"
        }
    }
}

 

Return:

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.70293105,
    "hits": [
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "4",
        "_score": 0.70293105,
        "_source": {
          "name": "special yagao",
          "desc": "special meibai",
          "price": 50,
          "producer": "special yagao producer",
          "tags": [
            "meibai"
          ]
        }
      }
    ]
  }
}
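
The two result sets above (4 hits for match, 1 hit for match_phrase) can be reproduced with a small in-memory sketch. It is a simplification under assumptions: the function names are hypothetical, terms come from lowercase whitespace splitting, and no scores are computed.

```python
PRODUCERS = {
    1: "gaolujie producer",
    2: "jiajieshi producer",
    3: "zhonghua producer",
    4: "special yagao producer",
}


def tokens(text):
    return text.lower().split()


def match(field_text, query):
    """Full-text match: any query term present in the field is enough."""
    field_terms = set(tokens(field_text))
    return any(term in field_terms for term in tokens(query))


def match_phrase(field_text, query):
    """Phrase match: the query terms must appear consecutively, in order."""
    field_terms, query_terms = tokens(field_text), tokens(query)
    n = len(query_terms)
    return any(
        field_terms[i:i + n] == query_terms
        for i in range(len(field_terms) - n + 1)
    )


query = "yagao producer"
print([i for i, p in PRODUCERS.items() if match(p, query)])         # [1, 2, 3, 4]
print([i for i, p in PRODUCERS.items() if match_phrase(p, query)])  # [4]
```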

 

 

6. Highlight search (highlighted search results)

GET /ecommerce/product/_search
{
    "query" : {
        "match" : {
            "producer" : "producer"
        }
    },
    "highlight": {
        "fields" : {
            "producer" : {}
        }
    }
}
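
In the response, each hit gains a highlight section in which the matched terms of the field are wrapped in tags, `<em>` and `</em>` by default. The idea can be sketched as follows; this is an illustration with a hypothetical function name and naive whitespace tokenization, not the ES highlighter.

```python
def highlight(field_text, query, pre="<em>", post="</em>"):
    """Wrap every word of the field that matches a query term in pre/post tags."""
    query_terms = set(query.lower().split())
    return " ".join(
        f"{pre}{word}{post}" if word.lower() in query_terms else word
        for word in field_text.split()
    )


print(highlight("special yagao producer", "producer"))
# special yagao <em>producer</em>
```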

 

 

 

 

 

C07. E-commerce website product management (3): aggregation analysis

1. Count the number of products under each tag:

GET /ecommerce/product/_search
{
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags" } 
    }
  }
}

 

Aggregating on a text field requires fielddata, so first set the fielddata property of the tags text field to true:

PUT /ecommerce/_mapping/product
{
  "properties": {
    "tags": {
      "type": "text",
      "fielddata": true
    }
  }
}

GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags" }
    }
  }
}

Result:

{
  "took": 20,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "fangzhu",
          "doc_count": 2
        },
        {
          "key": "meibai",
          "doc_count": 2
        },
        {
          "key": "qingxin",
          "doc_count": 1
        }
      ]
    }
  }
}
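
The buckets in this result can be verified by counting tags over the four sample products. The sketch below mimics the terms aggregation in memory; the function name is hypothetical, and ordering buckets by count and then by key is an assumption chosen to match the output shown above.

```python
from collections import Counter

PRODUCTS = [
    {"id": 1, "tags": ["meibai", "fangzhu"]},
    {"id": 2, "tags": ["fangzhu"]},
    {"id": 3, "tags": ["qingxin"]},
    {"id": 4, "tags": ["meibai"]},
]


def terms_aggregation(products, field):
    """Count how many documents carry each distinct value of the field."""
    counts = Counter(value for p in products for value in p[field])
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


for bucket in terms_aggregation(PRODUCTS, "tags"):
    print(bucket)
```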

 

 

2. For products whose name contains yagao, count the number of products under each tag

GET /ecommerce/product/_search
{
  "size": 0,
  "query": {
    "match": {
      "name": "yagao"
    }
  },
  "aggs": {
    "all_tags": {
      "terms": {
        "field": "tags"
      }
    }
  }
}

 

 

3. Group by tag first, then calculate the average price of the products in each group

GET /ecommerce/product/_search
{
    "size": 0,
    "aggs" : {
        "group_by_tags" : {
            "terms" : { "field" : "tags" },
            "aggs" : {
                "avg_price" : {
                    "avg" : { "field" : "price" }
                }
            }
        }
    }
}

 

Result:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "fangzhu",
          "doc_count": 2,
          "avg_price": {
            "value": 27.5
          }
        },
        {
          "key": "meibai",
          "doc_count": 2,
          "avg_price": {
            "value": 40
          }
        },
        {
          "key": "qingxin",
          "doc_count": 1,
          "avg_price": {
            "value": 40
          }
        }
      ]
    }
  }
}
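
The averages in this result follow directly from the sample data: a product with several tags contributes its price to each of its tags. A minimal in-memory sketch, with a hypothetical function name:

```python
from collections import defaultdict

PRODUCTS = [
    {"id": 1, "price": 30, "tags": ["meibai", "fangzhu"]},
    {"id": 2, "price": 25, "tags": ["fangzhu"]},
    {"id": 3, "price": 40, "tags": ["qingxin"]},
    {"id": 4, "price": 50, "tags": ["meibai"]},
]


def avg_price_by_tag(products):
    """Bucket prices by tag, then average the prices inside each bucket."""
    prices = defaultdict(list)
    for product in products:
        for tag in product["tags"]:
            prices[tag].append(product["price"])
    return {tag: sum(values) / len(values) for tag, values in prices.items()}


print(avg_price_by_tag(PRODUCTS))
# {'meibai': 40.0, 'fangzhu': 27.5, 'qingxin': 40.0}
```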

 

 

 

4. Calculate the average price of the products under each tag, sorted by average price in descending order

GET /ecommerce/product/_search
{
    "size": 0,
    "aggs" : {
        "all_tags" : {
            "terms" : { "field" : "tags", "order": { "avg_price": "desc" } },
            "aggs" : {
                "avg_price" : {
                    "avg" : { "field" : "price" }
                }
            }
        }
    }
}

 

 

5. Group by the specified price ranges, then group by tag within each range, and finally calculate the average price of each group

GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_price": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 0,
            "to": 20
          },
          {
            "from": 20,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_tags": {
          "terms": {
            "field": "tags"
          },
          "aggs": {
            "average_price": {
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}
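
The nesting of the three aggregations can be traced by hand. In a range aggregation each bucket includes its from value and excludes its to value, so with the ranges above a product priced at exactly 50 falls into no bucket. The sketch below applies that rule to the four sample products; the function name and the tuple keys are assumptions made for illustration.

```python
from collections import defaultdict

PRODUCTS = [
    {"id": 1, "price": 30, "tags": ["meibai", "fangzhu"]},
    {"id": 2, "price": 25, "tags": ["fangzhu"]},
    {"id": 3, "price": 40, "tags": ["qingxin"]},
    {"id": 4, "price": 50, "tags": ["meibai"]},
]


def avg_price_by_tag_per_range(products, ranges):
    """For each [low, high) price range, average the prices under each tag."""
    result = {}
    for low, high in ranges:
        prices = defaultdict(list)
        for product in products:
            if low <= product["price"] < high:   # from inclusive, to exclusive
                for tag in product["tags"]:
                    prices[tag].append(product["price"])
        result[(low, high)] = {t: sum(v) / len(v) for t, v in prices.items()}
    return result


buckets = avg_price_by_tag_per_range(PRODUCTS, [(0, 20), (20, 40), (40, 50)])
for price_range, averages in buckets.items():
    print(price_range, averages)
```

To include the product priced at 50, the upper bound of the last range would have to be raised above 50.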

 


Origin www.cnblogs.com/wangxiayun/p/11525117.html