Sorting in Solr by the first letters of Chinese pinyin

This time I want to share how I recently implemented a business requirement at work.
Requirement
-------------------------
    The requirement is simple: sort the title field by the first letter of its Chinese pinyin.
Implementation
-------------------------
    My first thought was to use Solr's built-in sort directly (title desc), but the results were not what I expected:
"response": {
    "numFound": 7,
    "start": 0,
    "docs": [
      {
        "title": "二维火"
      },
      {
        "title": "a City West Intime"
      },
      {
        "title": "?Orange"
      },
      {
        "title": "1三益里"
      },
      {
        "title": "/b古翠路"
      },
      {
        "title": "Cuiyuan Four Districts"
      },
      {}
    ]
  }

    To fix this, I created a new title_fc field that stores the pinyin initial of the first character of title, enabled docValues="true" on it, and sorted on that field instead.
    This approach is also simple to implement. First, declare the title_fc field in schema.xml:
   
<field name="title_fc" type="first_character" indexed="false" stored="true" docValues="true"/>
<fieldType name="first_character" class="com.dfire.fieldtype.FirstCharacter" precisionStep="0" positionIncrementGap="0"/>
<copyField source="title" dest="title_fc"/>

    These three lines register the new field and its field type, and copy the content of title into the new field.
    FirstCharacter is a custom field type that extends TrieIntField. It converts the incoming title value to pinyin with pinyin4j and stores the ASCII code of the first letter as the value of the new field. Sorting on this field in descending order gives the following result:
"response": {
    "numFound": 7,
    "start": 0,
    "docs": [
      {
        "title": "/b古翠路",
        "title_fc": 126
      },
      {
        "title": "?Orange",
        "title_fc": 126
      },
      {
        "title": "二维火",
        "title_fc": 101
      },
      {
        "title": "Cuiyuan Four Districts",
        "title_fc": 99
      },
      {
        "title": "a City West Intime",
        "title_fc": 97
      },
      {
        "title": "1三益里",
        "title_fc": 49
      },
      {}
    ]
  }
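
    The mapping from a title to its title_fc value can be sketched roughly as below. This is only an illustration, not the author's FirstCharacter class: the class and method names are made up, and the small PINYIN table is a hypothetical stand-in for pinyin4j (in real code, PinyinHelper.toHanyuPinyinStringArray(char) would supply the pinyin). The rules that digits keep their own ASCII code and that other symbols fall back to 126 are inferred from the output above. Returning 126 for an empty title is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

class FirstCharacterSketch {
    // Hypothetical stand-in for pinyin4j: maps a Chinese character to its pinyin.
    // Only the characters needed for the sample titles are listed.
    static final Map<Character, String> PINYIN = new HashMap<>();
    static {
        PINYIN.put('\u4e8c', "er");   // first character of the title stored as 101
        PINYIN.put('\u4e09', "san");
        PINYIN.put('\u53e4', "gu");
        PINYIN.put('\u7fe0', "cui");
    }

    // Assumed fallback for symbols, inferred from the sample output (126 is '~').
    static final int FALLBACK = 126;

    /** Returns the ASCII code used as the sort key for the first character of a title. */
    static int firstCharCode(String title) {
        if (title == null || title.isEmpty()) {
            return FALLBACK;                       // assumption: no title sorts with symbols
        }
        char first = Character.toLowerCase(title.charAt(0));
        boolean isLetter = first >= 'a' && first <= 'z';
        boolean isDigit = first >= '0' && first <= '9';
        if (isLetter || isDigit) {
            return first;                          // 'a' -> 97, '1' -> 49
        }
        String pinyin = PINYIN.get(first);
        if (pinyin != null) {
            return pinyin.charAt(0);               // "er" -> 'e' -> 101
        }
        return FALLBACK;                           // '/', '?' and unknown characters
    }
}
```

    With this sketch, a title starting with a letter or a Chinese character gets the code of its (pinyin) initial, so a descending sort on the stored int puts symbols first, then letters from z to a, then digits.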

    After talking with the business side, it turned out that the requirement covers more than the pinyin initial of the first character: the pinyin initials of all characters need to be recorded, so that when the first letters are equal, the subsequent letters are compared.
    At first I did not know where to start. A single ASCII code can no longer hold the value, and adding the codes together does not work either, because a sum loses the order of the letters. After discussing it with Baisui, I decided to pack several values into the 32 bits of an int.
    The principle is as follows. Take the title "二维火", whose pinyin initials are e, w, h. Each letter is recorded as its position in the alphabet (its ASCII code minus that of 'a', plus 1), giving 5, 23, 8, or 00101, 10111, 01000 in binary. Concatenated into one int, these become 001011011101000, that is, 5864. Converting every title this way makes the values comparable letter by letter. The result is as follows:
"response": {
    "numFound": 7,
    "start": 0,
    "docs": [
      {
        "title": "1三益里",
        "testfield": 474845952
      },
      {
        "title": "?Orange",
        "testfield": 371827456
      },
      {
        "title": "/b古翠路",
        "testfield": 369651916
      },
      {
        "title": "二维火",
        "testfield": 89948160
      },
      {
        "title": "Cuiyuan Four Districts",
        "testfield": 56964160
      },
      {
        "title": "a City West Intime",
        "testfield": 17663572
      },
      {}
    ]
  }
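
    The packing idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: the class name, the 6-letter limit, and the packAligned variant are all hypothetical. Each initial is stored in 5 bits with a = 1 through z = 26, so e, w, h become 00101, 10111, 01000 and pack to 5864. The sketch is not meant to reproduce the testfield values shown above, since the post does not give the exact encoding behind them.

```java
class InitialsPacker {
    static final int BITS_PER_LETTER = 5;  // values 1..26 fit in 5 bits
    static final int MAX_LETTERS = 6;      // assumption: 6 * 5 = 30 bits, sign bit stays clear

    /** Packs lowercase initials 5 bits at a time, first letter in the highest bits used. */
    static int pack(String initials) {
        int packed = 0;
        int n = Math.min(initials.length(), MAX_LETTERS);
        for (int i = 0; i < n; i++) {
            int code = initials.charAt(i) - 'a' + 1;   // a=1 ... z=26
            if (code < 1 || code > 26) {
                throw new IllegalArgumentException("expected a-z: " + initials);
            }
            packed = (packed << BITS_PER_LETTER) | code;
        }
        return packed;
    }

    /**
     * Same packing, but shifted left so every value occupies all MAX_LETTERS slots.
     * Without this, a short string such as "b" (2) would sort before "ab" (34).
     */
    static int packAligned(String initials) {
        int n = Math.min(initials.length(), MAX_LETTERS);
        return pack(initials) << (BITS_PER_LETTER * (MAX_LETTERS - n));
    }
}
```

    For example, InitialsPacker.pack("ewh") gives 5864. For sorting, the left-aligned variant is probably what is wanted: packAligned("ab") is smaller than packAligned("b"), and packAligned("e") is smaller than packAligned("ea"), which matches ordinary dictionary order.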


