Query q = NumericRangeQuery.newLongRange("idField", 1L, 10L, true, true);
When a numeric field is indexed, its value is converted into several lexicographically sortable strings, which are then indexed into a trie structure.
For example, suppose num1 is split into a, ab, abc, and num2 is split into a, ab, abd.
[Figure 1]:
Both num1 and num2 can be found by searching for the prefix ab. When a range query covers every value that shares a prefix, a single lookup of that prefix returns all matching documents at once, which reduces the number of lookups.
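The prefix idea above can be illustrated with a toy inverted index. This is only a sketch: the class name PrefixIndexDemo and its methods are hypothetical, and a plain TreeMap stands in for Lucene's real term dictionary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PrefixIndexDemo {
    // term -> ids of the documents that contain that term
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // Index one document under every prefix term of its value.
    void add(int docId, String... terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
        }
    }

    // A single term lookup returns every document sharing that prefix.
    List<Integer> lookup(String term) {
        return postings.getOrDefault(term, Collections.emptyList());
    }

    public static void main(String[] args) {
        PrefixIndexDemo index = new PrefixIndexDemo();
        index.add(1, "a", "ab", "abc"); // document 1 holds num1
        index.add(2, "a", "ab", "abd"); // document 2 holds num2
        System.out.println(index.lookup("ab"));  // both documents
        System.out.println(index.lookup("abc")); // only document 1
    }
}
```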
The following sections explain how numeric values are indexed and how range queries over them work.
1: Binary representation of the value
Take long as an example: it has 1 sign bit plus 63 value bits. The sign bit is 0 for non-negative numbers and 1 for negative numbers.
For positive numbers, the larger the low 63 bits, the larger the number. The same holds for negative numbers: the value grows as the low 63 bits grow.
If the sign bit is inverted, the range Long.MIN_VALUE to Long.MAX_VALUE can be expressed as 0x0000,0000,0000,0000 to 0xFFFF,FFFF,FFFF,FFFF.
After this conversion, aren't the values already sorted from smallest to largest at the bit (and therefore character) level?
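A minimal sketch of this sign-bit flip in plain Java follows. The class name SortableBitsDemo is made up for illustration. It checks that signed order of the original values agrees with unsigned order of the converted bit patterns.

```java
final class SortableBitsDemo {
    // Flip the sign bit so that signed order matches unsigned (bitwise) order.
    static long toSortableBits(long val) {
        return val ^ 0x8000000000000000L;
    }

    public static void main(String[] args) {
        long[] values = {Long.MIN_VALUE, -2L, -1L, 0L, 1L, Long.MAX_VALUE};
        for (long v : values) {
            System.out.printf("%20d -> 0x%016X%n", v, toSortableBits(v));
        }
        // Compare neighbours: signed order of the originals versus
        // unsigned order of the converted bit patterns.
        for (int i = 0; i + 1 < values.length; i++) {
            int signed = Integer.signum(Long.compare(values[i], values[i + 1]));
            int unsigned = Integer.signum(Long.compareUnsigned(
                toSortableBits(values[i]), toSortableBits(values[i + 1])));
            System.out.println(signed == unsigned);
        }
    }
}
```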
2: How to split a value into prefixes
Take 0x0000,0000,0000,F234 as an example, shifting right by 4 bits each time.
1: 0x0000,0000,0000,F23 is the common prefix of all values in the range 0x0000,0000,0000,F230 to 0x0000,0000,0000,F23F
2: 0x0000,0000,0000,F2 is the common prefix of all values in the range 0x0000,0000,0000,F200 to 0x0000,0000,0000,F2FF
3: 0x0000,0000,0000,F is the common prefix of all values in the range 0x0000,0000,0000,F000 to 0x0000,0000,0000,FFFF
....
0x0
If a value shifted right by some number of bits is used as a key, that key represents the corresponding range of values. The key can be understood as the prefix of those values.
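The relation between a shifted key and the range it stands for can be sketched as follows. The class and method names are illustrative only. For shifts 4, 8 and 12 it reproduces the three ranges listed above for 0xF234.

```java
final class PrefixRangeDemo {
    // Smallest value that shares the prefix (val >>> shift).
    static long rangeStart(long val, int shift) {
        return (val >>> shift) << shift;
    }

    // Largest value that shares the prefix: all shifted-away bits set to 1.
    static long rangeEnd(long val, int shift) {
        return rangeStart(val, shift) | ((1L << shift) - 1L);
    }

    public static void main(String[] args) {
        long val = 0xF234L;
        for (int shift = 4; shift <= 16; shift += 4) {
            System.out.printf("shift=%2d prefix=0x%X covers 0x%04X..0x%04X%n",
                shift, val >>> shift, rangeStart(val, shift), rangeEnd(val, shift));
        }
    }
}
```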
3: Splitting a large range into small ranges
At query time, Lucene splits a large range into several smaller ranges and looks up each smaller range by its prefix, which reduces the number of lookups.
4: How numeric values are indexed
First a precisionStep is chosen (the default was 4 in the Lucene version described here). For the n-th term, the value is shifted right by (n-1) * precisionStep bits.
After each shift, the remaining bits are split into groups of 7, and each group is stored in one byte, forming a byte[].
A special byte is stored at index 0 of the array to identify the shift.
Each byte[] can be converted into a lexicographically sortable string.
The lexicographic order of these strings is the same as the numeric order of the shifted values. This is the key to range lookup in NumericRangeQuery!
A long has 64 bits in total, so with precisionStep = 4 there are 16 lexicographically sortable strings.
These 16 strings are the prefixes of one long value. Once Lucene adds them to its inverted index, the resulting structure is similar to the one in Figure 1.
Key code for the split:
The longToPrefixCodedBytes() method of the org.apache.lucene.util.NumericUtils class:
public static void longToPrefixCodedBytes(final long val, final int shift, final BytesRefBuilder bytes) {
  if ((shift & ~0x3f) != 0) // ensure shift is 0..63
    throw new IllegalArgumentException("Illegal shift value, must be 0..63");
  // compute the size of the byte[]: every 7 bits are stored in one byte
  int nChars = (((63-shift)*37)>>8) + 1; // i/7 is the same as (i*37)>>8 for i in 0..63
  // index 0 additionally stores the shift, hence the +1
  bytes.setLength(nChars+1); // one extra for the byte that contains the shift info
  bytes.grow(BUF_SIZE_LONG);
  // record the shift
  bytes.setByteAt(0, (byte)(SHIFT_START_LONG + shift));
  // invert the sign bit
  long sortableBits = val ^ 0x8000000000000000L;
  // shift right by shift bits; shift is 0 for the first term, then grows by precisionStep
  sortableBits >>>= shift;
  while (nChars > 0) {
    // Store 7 bits per byte for compatibility
    // with UTF-8 encoding of terms
    // (the high bit of each byte stays 0, so every byte is an ASCII character in UTF-8)
    bytes.setByteAt(nChars--, (byte)(sortableBits & 0x7f));
    sortableBits >>>= 7;
  }
}
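To try the encoding without a Lucene dependency, here is a sketch that applies the same bit layout to a plain byte[]. The class PrefixCodedDemo and its helpers terms() and compare() are hypothetical, and the value 0x20 for SHIFT_START_LONG is assumed to match the constant in NumericUtils.

```java
import java.util.ArrayList;
import java.util.List;

final class PrefixCodedDemo {
    // Assumption: mirrors the value of NumericUtils.SHIFT_START_LONG.
    static final int SHIFT_START_LONG = 0x20;

    // Same bit layout as longToPrefixCodedBytes, returning a plain byte[].
    static byte[] encode(long val, int shift) {
        if ((shift & ~0x3f) != 0) {
            throw new IllegalArgumentException("Illegal shift value, must be 0..63");
        }
        int nChars = (((63 - shift) * 37) >> 8) + 1; // number of 7-bit groups
        byte[] bytes = new byte[nChars + 1];          // +1 for the shift byte
        bytes[0] = (byte) (SHIFT_START_LONG + shift);
        long sortableBits = (val ^ 0x8000000000000000L) >>> shift;
        while (nChars > 0) {
            bytes[nChars--] = (byte) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return bytes;
    }

    // All terms for one value: shift = 0, step, 2*step, ... while shift < 64.
    static List<byte[]> terms(long val, int precisionStep) {
        List<byte[]> out = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            out.add(encode(val, shift));
        }
        return out;
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) return c;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Number of terms produced for one long with precisionStep = 4.
        System.out.println(terms(1234L, 4).size());
        // Byte order follows numeric order, including across the sign boundary.
        System.out.println(compare(encode(-5L, 0), encode(3L, 0)) < 0);
        System.out.println(compare(encode(3L, 0), encode(10L, 0)) < 0);
    }
}
```

With precisionStep = 4 the terms() helper yields the 16 strings mentioned in section 4, one per shift from 0 to 60.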
5: Range queries
The general idea is to split the range starting from both ends. First the pieces at the lower and upper ends that do not fill a whole prefix are split off at the current shift. Then the bounds move inward, the shift grows by one precisionStep, and the same is done for the remaining inner range.
Finally, the values of each sub-range are shifted by the corresponding number of bits and converted, in the same way as at indexing time, into lexicographically sortable strings, which are then looked up.
Code:
The splitRange() method of the org.apache.lucene.util.NumericUtils class:
private static void splitRange(
  final Object builder, final int valSize,
  final int precisionStep, long minBound, long maxBound
) {
  if (precisionStep < 1)
    throw new IllegalArgumentException("precisionStep must be >=1");
  if (minBound > maxBound) return;
  for (int shift=0; ; shift += precisionStep) {
    // calculate new bounds for inner precision
    final long diff = 1L << (shift+precisionStep),
      mask = ((1L<<precisionStep) - 1L) << shift;
    final boolean
      hasLower = (minBound & mask) != 0L,
      hasUpper = (maxBound & mask) != mask;
    final long
      nextMinBound = (hasLower ? (minBound + diff) : minBound) & ~mask,
      nextMaxBound = (hasUpper ? (maxBound - diff) : maxBound) & ~mask;
    final boolean
      lowerWrapped = nextMinBound < minBound,
      upperWrapped = nextMaxBound > maxBound;

    if (shift+precisionStep>=valSize || nextMinBound>nextMaxBound || lowerWrapped || upperWrapped) {
      // We are in the lowest precision or the next precision is not available.
      addRange(builder, valSize, minBound, maxBound, shift);
      // exit the split recursion loop
      break;
    }

    if (hasLower)
      addRange(builder, valSize, minBound, minBound | mask, shift);
    if (hasUpper)
      addRange(builder, valSize, maxBound & ~mask, maxBound, shift);

    // recurse to next precision
    minBound = nextMinBound;
    maxBound = nextMaxBound;
  }
}
For example, with precisionStep = 4 the range 1001,0001 to 1111,0010 is split as follows:
1: 1001,0001 to 1001,1111 (0x91 to 0x9F: 15 terms with shift 0)
and 1111,0000 to 1111,0010 (0xF0 to 0xF2: 3 terms with shift 0)
2: 1010,0000 to 1110,1111, after shifting right once by 4 bits (0xA to 0xE: 5 terms)
Looking up these 23 lexicographically sortable strings covers the entire range.
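The worked example can be reproduced with a standalone sketch that follows the same control flow as splitRange but collects the sub-ranges in a list instead of calling addRange. The class SplitRangeDemo, its Range holder and termCount() are hypothetical names, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitRangeDemo {
    // One sub-range; it is matched by looking up one term per distinct
    // prefix (value >> shift) between min and max.
    static final class Range {
        final long min, max;
        final int shift;

        Range(long min, long max, int shift) {
            this.min = min;
            this.max = max;
            this.shift = shift;
        }

        long termCount() {
            return (max >> shift) - (min >> shift) + 1;
        }
    }

    // Same control flow as splitRange, collecting the sub-ranges in a list.
    static List<Range> split(int valSize, int precisionStep, long minBound, long maxBound) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >=1");
        }
        List<Range> out = new ArrayList<>();
        if (minBound > maxBound) return out;
        for (int shift = 0; ; shift += precisionStep) {
            long diff = 1L << (shift + precisionStep);
            long mask = ((1L << precisionStep) - 1L) << shift;
            boolean hasLower = (minBound & mask) != 0L;
            boolean hasUpper = (maxBound & mask) != mask;
            long nextMinBound = (hasLower ? (minBound + diff) : minBound) & ~mask;
            long nextMaxBound = (hasUpper ? (maxBound - diff) : maxBound) & ~mask;
            boolean lowerWrapped = nextMinBound < minBound;
            boolean upperWrapped = nextMaxBound > maxBound;
            if (shift + precisionStep >= valSize || nextMinBound > nextMaxBound
                    || lowerWrapped || upperWrapped) {
                // Fill the shifted-away low bits so the printed range reads naturally.
                out.add(new Range(minBound, maxBound | ((1L << shift) - 1L), shift));
                break;
            }
            if (hasLower) out.add(new Range(minBound, minBound | mask, shift));
            if (hasUpper) out.add(new Range(maxBound & ~mask, maxBound, shift));
            minBound = nextMinBound;
            maxBound = nextMaxBound;
        }
        return out;
    }

    public static void main(String[] args) {
        long total = 0;
        for (Range r : split(64, 4, 0x91L, 0xF2L)) {
            System.out.printf("shift=%d 0x%X..0x%X terms=%d%n",
                r.shift, r.min, r.max, r.termCount());
            total += r.termCount();
        }
        System.out.println("total terms: " + total);
    }
}
```

For 0x91 to 0xF2 this yields three sub-ranges with 15, 3 and 5 terms, 23 in total.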