Solr interview questions

  1. What are the built-in query parameters of Solr?
  • q: the query string; required. This is the main query in Solr.
  • fq (filter query): restricts the results of the q query to documents that also match the fq query.
  • sort: the sort order. Format: sort=<field name>+<desc|asc>[,<field name>+<desc|asc>]... Example: (inStock desc, price asc) sorts by "inStock" in descending order first, then by "price" in ascending order. The default is descending order of relevance.
  • start: the offset of the first returned record within the complete search result, starting at 0; generally used for paging.
  • rows: the maximum number of records to return; used together with start to implement paging.
  • wt (writer type): the output format, such as xml, json, php, or phps. The php and phps formats were added after Solr 1.3 and are not enabled by default, so they must be configured before use.
  • fl: a comma-separated list of the fields that should be returned in the document results. The default is "*", which means all fields. Adding "score" returns the relevance score as well.
  • df: the default query field, used when the query does not name a field; it is generally set in the configuration.
  • qt (query type): the request handler that processes the query. It generally does not need to be specified; the default is standard.
  • hl: enables highlighting; hl.fl sets the fields to be highlighted (there are further built-in hl.* parameters).
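
As an illustration of how these parameters fit together, here is a minimal sketch in plain Java that assembles a select URL from a parameter map. The base URL, the core name collection1, and the field names are assumptions made for the example, and the class SolrSelectUrl is hypothetical (it is not part of SolrJ). It uses URLEncoder.encode(String, Charset), which requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not a SolrJ class.
final class SolrSelectUrl {

    // Joins the parameters into a query string, URL-encoding each key and value.
    static String build(String coreUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return coreUrl + "/select?" + query;
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "name:phone");          // main query (assumed field "name")
        params.put("fq", "inStock:true");       // filter query
        params.put("sort", "inStock desc,price asc");
        params.put("start", "0");               // first page
        params.put("rows", "10");               // page size
        params.put("fl", "id,name,score");      // returned fields plus score
        params.put("wt", "json");               // response format
        // Assumed core URL; adjust host, port and core name to your setup.
        System.out.println(build("http://localhost:8080/solr/collection1", params));
    }
}
```

To fetch the next page, only start changes (start=10 with rows=10); q, fq, and sort stay the same.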
  1. What are the logical operators in Solr?

Boolean operators AND, &&

Boolean operators OR, ||

Boolean operators NOT, !

  1. What do you know about solr's competitors?
    1. https://blog.csdn.net/lj6052317/article/details/70241212
  1. Elasticsearch: a Lucene-based search server with a RESTful web interface. It is developed in Java, is open source, and is widely used in cloud computing. It offers near-real-time search and is stable, reliable, fast, and easy to install and use.
  2. Solandra: a real-time search engine that combines Solr and Cassandra (Cassandra is an open source distributed NoSQL database system). It supports most of Solr's default features (search, faceting, highlighting). Data replication, sharding, caching, and compression are all handled by Cassandra. It is multi-master (any node can be read from and written to) and has good real-time performance.
  3. Describe the position of Solr in the overall application architecture?

1. Solr runs as an independent full-text search server. Internally it uses Lucene, which is written in Java, for full-text indexing and querying, and it exposes an HTTP API that most programming languages can call. Its flexible external configuration means the work can be done without writing any Java code, and its plug-in architecture supports more advanced customization.

2. Solr runs separately from the other server applications and provides its services on its own. Take an equipment management system as an example. It offers several user interfaces: one to add a device, one to view devices, and one to claim a device for use. A device administrator may also need to correct wrong device records. Whether the function is adding, claiming, or viewing, everything revolves around the device. The device information exists both in the platform's database and in Solr, but because the two systems store it for different purposes, its format and completeness differ between them.

    1. Solr queries are HTTP-based: a query is essentially a simple HTTP request URL and a structured response document. The response document can be in XML, JSON, CSV, and other formats. This means a wide range of clients can use Solr, such as web applications, rich client applications, and mobile devices. Any platform that supports the HTTP protocol can interact with Solr.
  1. What problem does Solr solve?

Strictly speaking, Lucene is responsible for storing the index data, while Solr is the engine on top of it that provides search and insertion, much like the query layer of a database. What is the benefit? Suppose a database field holds 1,000 words and you want to search for one of them. An ordinary database only lets you query with LIKE, which scans every value and does a fuzzy match; this is very inefficient, and some matches cannot be found at all. (Databases with built-in word segmentation, such as PostgreSQL, are an exception.) Lucene segments the text into words and then checks whether the word you are searching for is among them. Under the hood it does a lot of complex work to improve retrieval efficiency and save memory. Put simply, Solr exists because the full-text search of ordinary databases is not good enough.
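
To make the contrast with a LIKE scan concrete, here is a minimal sketch of an inverted index in plain Java: each word maps to the set of document ids that contain it, so a lookup does not have to scan every document. This is a simplified illustration, not Lucene's actual data structure; the class TinyInvertedIndex is hypothetical, and its tokenizer just splits on non-word characters, which only suits space-separated languages such as English.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index for illustration; Lucene's real index is far more elaborate.
final class TinyInvertedIndex {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Segments the text into lowercase words and records the document id under each word.
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Looks the term up directly instead of scanning every document.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Solr is a search server");
        index.add(2, "Lucene is a search library");
        System.out.println(index.search("search"));
        System.out.println(index.search("solr"));
    }
}
```

The cost of a lookup depends on the number of distinct terms and matching documents, not on the total amount of text, which is the main reason a search engine outperforms a LIKE query on large text fields.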

  1. Explain what is solrcore?
    1. A SolrCore is an instance in Solr, that is, an index library. One Solr server can host multiple SolrCores, and they do not interfere with each other.
  2. Explain what is solrhome?

   Solrhome is the root directory of Solr and the location where the index libraries are stored. One solrhome can contain multiple SolrCores.

  1. Explain what is collection?
    1. In stand-alone Solr, a collection is a SolrCore, that is, one Solr instance (index library).
    2. In SolrCloud, a collection is a logical grouping of SolrCores.
    3. It is a complete index in the logical sense within a SolrCloud cluster. It is usually divided into one or more shards, which all use the same config set. If there is more than one shard, the index is distributed. SolrCloud lets you refer to the collection by name, so you do not need to care about the shards parameter otherwise required for distributed search.
  2. How to start a solr instance?

 

  1. How to start a solr cluster?

 

  1. How to start solr instance on specified port?
  2. Does the solr cluster have to rely on zookeeper?
    1. Not necessarily, but using ZooKeeper is the most convenient option.
  3. What is highlight processing?

schema.xml (configure the field type with the IK tokenizer)

    1. Highlighting adds an extra highlighting section to the response that contains the highlighted fields; the content of the originally returned fields is not changed.
    2. If you want to highlight a field, you must set stored=true for that field.
    3. Solr has three highlight processing methods. The Standard Highlighter is the most commonly used: based on the docIdSet of the query it fetches the documents, reads the value of the field to be highlighted in each document, and runs a matching algorithm on the query terms against the field value.
  1. What are the important configuration files in Solr?
    1. solr.xml (used when configuring a cluster; holds the cluster information, for example the SolrCloud IP and port, connection timeouts, and so on)
    2. solrconfig.xml (configures the <lib> tags, that is, the JARs required by the Solr instance, and how requests are processed)
    3. schema.xml (configures field, fieldType, and analyzer)
  2. Which tag is used to define a data type of Solr?

<fieldType>

  1. Which tag is used to define a field in Solr?

<field>

  1. What are the classifications of solr's fields?

Common:

StringField: not tokenized, indexed, stored; customarily used for id and other unique columns

LongField: tokenized, indexed, storage configurable; often used for numeric columns

StoredField: not tokenized, not indexed, stored; often used for columns that only need to be displayed, such as pictures and audio

TextField: tokenized, indexed, storage configurable; applicable to any type (a field type that configures a tokenizer in Solr must be of this type)

  1. What is the role of fq?

It sets the filter criteria; see the first question.

  1. What is the role of fl?

See the first question.

  1. How to set the default query field?

See the first question.

  1. How to set the highlight?

See the first question.

  1. Where is the time basis for incremental indexing obtained? (Full index)

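It comes from the conf/dataimport.properties file under the collection directory, which is rewritten each time the index is rebuilt.

The last_index_time value in that file is the time basis for the incremental index.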
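
The following sketch in plain Java shows how such a timestamp could drive an incremental decision. It assumes the property is stored as last_index_time=yyyy-MM-dd HH:mm:ss with the colons escaped in the usual Java properties style; the sample file content, the class LastIndexTime, and the needsReindex helper are hypothetical and only illustrate the idea of comparing a row's update time with the last index time.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Properties;

// Hypothetical helper for illustration; not part of Solr.
final class LastIndexTime {
    // Assumed timestamp layout of the last_index_time property.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    // Parses last_index_time out of the text of a dataimport.properties file.
    static LocalDateTime parse(String propertiesText) {
        Properties props = new Properties();
        try {
            // Properties.load also removes the backslashes that escape the colons.
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return LocalDateTime.parse(props.getProperty("last_index_time"), FORMAT);
    }

    // A row belongs in the incremental index only if it changed after the last import.
    static boolean needsReindex(LocalDateTime rowUpdatedAt, LocalDateTime lastIndexTime) {
        return rowUpdatedAt.isAfter(lastIndexTime);
    }

    public static void main(String[] args) {
        // Sample content; the date is made up for the example.
        String sample = "last_index_time=2019-07-25 10\\:30\\:00\n";
        LocalDateTime last = parse(sample);
        System.out.println(last);
        System.out.println(needsReindex(LocalDateTime.of(2019, 7, 26, 8, 0), last));
    }
}
```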

  1. What is word segmentation?

When Solr builds the index, it segments each document according to the tokenizer type defined in the Solr configuration file (schema.xml), dividing the document into individual terms.

  1. How to set stop words?

They are set in the stopword.dic file under WEB-INF/classes of the Solr web application.

  1. The role of stop words?

Stop words are dropped during word segmentation and are not indexed, which saves storage space and improves query efficiency.

  1. How to do Chinese word segmentation?

The IK tokenizer is commonly used for Chinese word segmentation. The specific process:

Segment into words first, then filter.

Filters:

  1. Remove punctuation
  2. Convert uppercase to lowercase
  3. Remove stop words
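
The segment-then-filter order can be sketched in plain Java as follows. This is only an illustration of the pipeline shape: the class SimpleAnalyzer is hypothetical, and it splits on punctuation and whitespace, whereas the IK tokenizer segments Chinese text using a dictionary. The stop word list in the example is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical analyzer for illustration; it is not the IK tokenizer.
final class SimpleAnalyzer {
    private final Set<String> stopWords;

    SimpleAnalyzer(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Segment: split on runs of characters that are neither letters nor digits,
        // which also removes the punctuation.
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            // Filter: convert uppercase to lowercase.
            String lower = token.toLowerCase(Locale.ROOT);
            // Filter: drop stop words so they never reach the index.
            if (!stopWords.contains(lower)) {
                terms.add(lower);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "is"));
        SimpleAnalyzer analyzer = new SimpleAnalyzer(stopWords);
        System.out.println(analyzer.analyze("The Quick, brown fox is here!"));
    }
}
```

The same analysis should be applied to the query text at search time, otherwise a query for "Quick" would not match the indexed term "quick".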

 

  1. What is solrCloud?

SolrCloud (Solr cloud) is the distributed search solution provided by Solr. Use SolrCloud when you need large-scale, fault-tolerant, distributed indexing and retrieval. It is not needed when a system has only a small amount of index data. When the index is large and the concurrency of search requests is high, SolrCloud is what meets these requirements.

 SolrCloud is a distributed search solution based on Solr and Zookeeper. Its main idea is to use Zookeeper as the cluster configuration information center.

It has several features:

1) Centralized configuration information

2) Automatic fault tolerance

3) Near real-time search

4) Automatic load balancing during query

 

  1. How to create solrCloud?

Step 1: Build a ZooKeeper cluster

Step 2: Build a Tomcat cluster

Step 3: Configure solrhome for each Solr instance

Step 4: Upload the Solr configuration files to ZooKeeper for unified management

Step 5: Configure the port and IP address of each Solr service

Step 6: Associate each Solr instance with ZooKeeper

Step 7: Complete the configuration and start the services

 

  1. Where is the configuration file saved in solrCloud mode?

They are saved in ZooKeeper; you can view them by connecting with a ZooKeeper client.

  1. Describe the logical structure of solrCloud?

The index collection includes two shards (shard1 and shard2). Each of shard1 and shard2 is composed of three cores, one of which is the leader and two of which are replicas. The leader is elected through ZooKeeper, and ZooKeeper keeps the index data of the three cores in each shard consistent. This solves the problem of high availability.

A search request from a user is served from shard1 and shard2 together, which solves the problem of high concurrency.

  1. Describe the physical structure of solrCloud?

Three Solr instances (each instance includes two Cores) form a SolrCloud.

  1. What is a copy of a shard?

A shard is composed of multiple replicas. The contents of the replicas in a shard should be logically identical. The data of a shard is the data of any one replica, not the combination of all its replicas.

  1. How many copies can a shard have?

It depends on actual business needs; as a rule of thumb, no more than about 10.

  1. Is there a copy of a shard that must be the leader?

Not necessarily. A shard does not always have a leader at a given moment, but if a request is routed to a shard that currently has no leader, the request by default goes to a shard at the same level to find a leader.

  1. What is the role of the leader?

The leader is the replica that actually handles the request transactions. When a leader goes down, the other replicas elect a new leader.

  1. Which solrj class is used to manage the stand-alone solr connection?

HttpSolrServer (renamed HttpSolrClient in SolrJ 5 and later)

SolrServer server = new HttpSolrServer("http://localhost:8080/solr");

  1. Which solrj class is used to manage the solr connection of the cluster version?

CloudSolrServer (renamed CloudSolrClient in SolrJ 5 and later)

CloudSolrServer server = new CloudSolrServer(zkHost);

  1. When using solrj to connect to solrCloud, we only need to know the zookeeper address, right?

No, you also need the name of the collection (it can be set with setDefaultCollection).

  1. What is automatic commit? How to configure it?

A hard commit is about persistence; a soft commit is about visibility.

There are two automatic commit methods: 1. automatic hard commit, 2. automatic soft commit.

Hard commit:

A regular commit is also called a hard commit. It immediately persists the documents to disk and, because it opens a new searcher, makes them searchable right away. Its drawbacks are obvious: it consumes a lot of resources and blocks until the commit has completed, so it is a very expensive operation.
You can request it by adding commit=true to the URL used to submit the documents.

Soft commit:

A soft commit does not write the data to disk immediately, but it does make the documents searchable right away. This is the so-called near real-time (NRT) search support, and the operation is not expensive.

The configuration in solrconfig.xml is as follows:

<!-- Automatic hard commit -->
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:30000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Automatic soft commit -->
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
</autoSoftCommit>

  1. What is soft commit? How to configure it?

See the previous question on automatic commit.

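  1. Is SolrClient.commit() a soft commit?

No, it is a hard commit. A soft commit has to be requested explicitly, for example with the commit(waitFlush, waitSearcher, softCommit) overload and softCommit set to true.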

  1. What are the benefits of soft commit?

A soft commit does not write the data to disk immediately, but it makes the documents searchable right away, which improves read and write efficiency.

  1. Automatic commit may cause index loss. How does Solr solve this problem?

  Configure the following properties in solrconfig.xml:

   maxDocs: when the number of documents indexed in memory reaches the specified value, flush the in-memory index to disk and notify the searcher to load the new index.

    maxTime: at every specified interval, automatically commit the index data held in memory and notify the searcher to load the new index.

  1. Talk about the relationship between lucene and solr?

Lucene is a full-text search engine toolkit, and cannot provide search and indexing services independently.

Solr is a full-text search server, which can provide full-text search services independently, and is scalable and configurable. It provides more query statements than Lucene, and optimizes the performance of Lucene.

Solr, like Lucene, does not provide view rendering.

  1. Describe how you used solr in the project?

First, I set up Solr on a Linux server and configured the IK tokenizer. Then I defined the fields (and the type of each field) in schema.xml. Fields that need to be highlighted must have the stored attribute set to true in schema.xml, for example the device name and the device description (anything that can be used as a query condition can be highlighted). You can also configure the highlighting attributes.


Origin blog.csdn.net/qq_30764991/article/details/97301472