Interpretation of the docValues implementation source code in Lucene (6) - writing SortedDocValues

 In earlier posts I covered NumericDocValues (which stores numbers) and BinaryDocValues (which stores byte[]). Today let's look at the docValues type that stores sorted byte[] values: SortedDocValues. It is not called SortedBinaryDocValues; the "Binary" is simply omitted, but SortedDocValues specifically means sorted byte[]. One thing I forgot to mention when writing about NumericDocValues and BinaryDocValues: both require that each doc store at most one value, i.e. one number or one byte[], and SortedDocValues is the same, one byte[] per doc. The "sorted" here refers to the ordering of each doc's byte[] among the byte[] values of all docs. SortedDocValues preserves this ordering, so from the index we can immediately get the sorted position (the ord) of a doc's value; for example, the doc with id 5 has ord 1, the doc with id 6 has ord 3, and so on, which makes SortedDocValues especially suitable for sorting. Another benefit is that SortedDocValues provides a method to look up the byte[] at a given ord, so querying "the value at sorted position n" is also fast. Yet another benefit is that even without an inverted index, you can check whether a given byte[] exists. We will come back to these benefits; first, let's see how it is stored.
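Before diving into the write path, here is a minimal reader-side sketch of these capabilities against the Lucene 4.x API; the field name "myField", the docID, and the probe term are made up for illustration:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiDocValues;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.util.BytesRef;

class SortedDocValuesDemo {
	static void demo(IndexReader reader) throws IOException {
		SortedDocValues dv = MultiDocValues.getSortedValues(reader, "myField"); // "myField" is an assumed field name
		int ord = dv.getOrd(5);            // ord (sorted position) of doc 5's byte[]; -1 if the doc has no value
		BytesRef value = new BytesRef();
		dv.lookupOrd(ord, value);          // look up the byte[] at a given ord
		int pos = dv.lookupTerm(new BytesRef("abc")); // >= 0 if "abc" is stored, else -(insertionPoint) - 1
	}
}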

As before, the entry point for writing is still the indexDocValue method of DefaultIndexingChain, which creates a SortedDocValuesWriter to write the docValue:

// Above are some of the class's fields.
/** Records every added byte[]. Each added byte[] is assigned an id, which we call a termID here. If the byte[] already exists, add() returns a negative value; otherwise it returns the new termID. This is in fact the same class used to hold all terms when building the inverted index. */
final BytesRefHash hash;
/** Records the termID of each doc. Each doc has one termID and they are stored in docID order, so the value of the doc whose id is 5 can be read directly as the fifth entry. */
private PackedLongValues.Builder pending;

public SortedDocValuesWriter(FieldInfo fieldInfo, Counter iwBytesUsed) {
	this.fieldInfo = fieldInfo;
	this.iwBytesUsed = iwBytesUsed;
	hash = new BytesRefHash(new ByteBlockPool(new ByteBlockPool.DirectTrackingAllocator(iwBytesUsed)),
			BytesRefHash.DEFAULT_CAPACITY, new DirectBytesStartArray(BytesRefHash.DEFAULT_CAPACITY, iwBytesUsed));
	pending = PackedLongValues.deltaPackedBuilder(PackedInts.COMPACT);
	bytesUsed = pending.ramBytesUsed();
	iwBytesUsed.addAndGet(bytesUsed);
}

 Look at its addValue method; at this point the values are still buffered in memory:

public void addValue(int docID, BytesRef value) {
	if (value.length > (BYTE_BLOCK_SIZE - 2)) { // values that are too large are rejected
		throw new IllegalArgumentException("DocValuesField \"" + fieldInfo.name + "\" is too large, must be <= " + (BYTE_BLOCK_SIZE - 2));
	}
	// Fill in any holes: some docs have no value for this field, so pad with EMPTY_ORD;
	// that way a doc's entry can later be read directly by its docID, which is faster
	while (pending.size() < docID) {
		pending.add(EMPTY_ORD);
	}
	addOneValue(value);
}

private void addOneValue(BytesRef value) {
	int termID = hash.add(value); // add to the in-memory hash and get back its termID
	if (termID < 0) { // already exists: recover the existing termID
		termID = -termID - 1;
	} else { // first occurrence: account for the extra memory used
		iwBytesUsed.addAndGet(2 * RamUsageEstimator.NUM_BYTES_INT);
	}
	pending.add(termID); // record this termID; each doc has one, so a doc's termID can later be looked up directly by its docID
	updateBytesUsed();
}
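As a quick illustration of the BytesRefHash.add contract that addOneValue relies on (a standalone sketch, not code from the writer):

BytesRefHash hash = new BytesRefHash();
int a = hash.add(new BytesRef("apple"));  // 0: a first occurrence gets a fresh termID
int b = hash.add(new BytesRef("grape"));  // 1: another new value
int c = hash.add(new BytesRef("apple"));  // -1, i.e. -(existing termID) - 1: a duplicate
int termID = c < 0 ? -c - 1 : c;          // recovers 0, exactly as addOneValue does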

After reading this code, you can see that the in-memory representation boils down to two fields: the hash object, which stores each distinct byte[] and assigns it a termID, and the pending object, which records the termID of each doc.
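A small worked trace of these two structures; the docs and values are made up:

// addValue(0, "b"): hash.add("b") = 0              -> pending = [0]
// doc 1 has no value for this field
// addValue(2, "a"): holes filled with EMPTY_ORD (-1), then hash.add("a") = 1
//                                                   -> pending = [0, -1, 1]
// addValue(3, "b"): hash.add("b") = -1 -> termID 0  -> pending = [0, -1, 1, 0]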

Let's see what happens when the index is flushed:

public void flush(SegmentWriteState state, DocValuesConsumer dvConsumer) throws IOException {
	final int maxDoc = state.segmentInfo.getDocCount();

	assert pending.size() == maxDoc;
	final int valueCount = hash.size(); // the number of distinct terms
	final PackedLongValues ords = pending.build(); // through this object we can find the termID of each doc's byte[]
	// all termIDs, sorted by their byte[] values from smallest to largest
	final int[] sortedValues = hash.sort(BytesRef.getUTF8SortedAsUnicodeComparator());
	final int[] ordMap = new int[valueCount]; // the sorted position (ord) of each term: the array index is the termID, the value is its ord
	for (int ord = 0; ord < valueCount; ord++) {
		ordMap[sortedValues[ord]] = ord;
	}
	dvConsumer.addSortedField(fieldInfo,
			// ord -> value: returns all byte[] in sorted order, which is what makes prefix compression possible.
			// Given an ord, sortedValues yields the termID (index = ord, value = termID), and the hash then
			// yields the byte[] for that termID.
			new Iterable<BytesRef>() {
				public Iterator<BytesRef> iterator() {
					return new ValuesIterator(sortedValues, valueCount, hash); // args: the termIDs in sorted order, the number of distinct byte[], and the byte[] store itself
				}
			},
			// doc -> ord: for each doc, ords yields its termID, and ordMap then converts that termID into its sorted position
			new Iterable<Number>() {
				public Iterator<Number> iterator() {
					return new OrdsIterator(ordMap, maxDoc, ords);
				}
			});
}
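Continuing the small trace from above through flush (values still made up):

// hash: "b" -> termID 0, "a" -> termID 1; valueCount = 2; ords = [0, -1, 1, 0]
// hash.sort() orders termIDs by their bytes ("a" < "b"), so sortedValues = [1, 0]
// ordMap[sortedValues[0]] = 0 -> ordMap[1] = 0; ordMap[sortedValues[1]] = 1 -> ordMap[0] = 1
// ValuesIterator yields "a", "b" (the sorted values); OrdsIterator yields
// ordMap[0] = 1, -1, ordMap[1] = 0, ordMap[0] = 1 for docs 0..3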

Through these two iterators you can quickly get the ord of each doc's byte[] (via ordMap), find the termID for any ord (via sortedValues), and then fetch the corresponding byte[] by termID (via the hash). Let's look at the concrete implementation of the two iterators:

private static class ValuesIterator implements Iterator<BytesRef> {
	final int sortedValues[]; // maps ord (sorted position) -> termID
	final BytesRefHash hash; // maps termID -> byte[]
	final BytesRef scratch = new BytesRef();
	final int valueCount; // the number of distinct byte[]
	int ordUpto; // the current position

	ValuesIterator(int sortedValues[], int valueCount, BytesRefHash hash) {
		this.sortedValues = sortedValues;
		this.valueCount = valueCount;
		this.hash = hash;
	}

	@Override
	public boolean hasNext() {
		return ordUpto < valueCount;
	}

	@Override
	public BytesRef next() {
		if (!hasNext()) {
			throw new NoSuchElementException();
		}
		hash.get(sortedValues[ordUpto], scratch); // sortedValues[ordUpto] gives the termID; hash.get then copies that termID's byte[] into scratch
		ordUpto++;
		return scratch;
	}
}
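One detail worth noting: next() always returns the same reused scratch instance, so a consumer that wants to keep a value across iterations must copy it; a one-line sketch (not from the source):

BytesRef held = BytesRef.deepCopyOf(valuesIterator.next()); // safe to retain; the iterator overwrites scratch on the next call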

 It can be concluded that this ValuesIterator is very simple: it just returns the byte[] values one by one in their sorted order. Now look at the next iterator:

private static class OrdsIterator implements Iterator<Number> {
	final PackedLongValues.Iterator iter; // iterates the recorded termID of each doc
	final int ordMap[]; // maps each termID to its sorted position (ord)
	final int maxDoc; // the total number of docs
	int docUpto;

	OrdsIterator(int ordMap[], int maxDoc, PackedLongValues ords) {
		this.ordMap = ordMap;
		this.maxDoc = maxDoc;
		assert ords.size() == maxDoc;
		this.iter = ords.iterator();
	}

	@Override
	public boolean hasNext() {
		return docUpto < maxDoc;
	}

	@Override
	public Number next() {
		if (!hasNext()) {
			throw new NoSuchElementException();
		}
		int ord = (int) iter.next();
		docUpto++;
		return ord == -1 ? ord : ordMap[ord]; // return the ord, i.e. the sorted position of this doc's byte[]; -1 means the doc has no value
	}
}

 As can be seen from the code, this iterator returns, for each doc, the ord of its byte[].

Now let's look at how these two iterables are used. The code is in Lucene410DocValuesConsumer.addSortedField(FieldInfo, Iterable<BytesRef>, Iterable<Number>):

public void addSortedField(FieldInfo field, Iterable<BytesRef> values, Iterable<Number> docToOrd)
		throws IOException {
	meta.writeVInt(field.number);
	meta.writeByte(Lucene410DocValuesFormat.SORTED); // record the format
	addTermsDict(field, values); // write the byte[] values, prefix-compressed
	addNumericField(field, docToOrd, false); // write the numbers, i.e. the ord of each doc
}

As the names suggest, the data is written in two pieces: the first step stores the byte[] values, and the second stores the ord of each doc. Let's look at the first piece, which stores all the byte[] values:

private void addTermsDict(FieldInfo field, final Iterable<BytesRef> values) throws IOException {
	// first check if its a "fixed-length" terms dict
	int minLength = Integer.MAX_VALUE;
	int maxLength = Integer.MIN_VALUE;
	long numValues = 0;
	for (BytesRef v : values) {
		minLength = Math.min(minLength, v.length);
		maxLength = Math.max(maxLength, v.length);
		numValues++;
	}
	if (minLength == maxLength) {
		addBinaryField(field, values); // if all byte[] have the same length, they can be stored as a plain BinaryDocValues field; in practice this condition is rarely met
	} else if (numValues < REVERSE_INTERVAL_COUNT) {
		addBinaryField(field, values); // same as above
	} else {
		// header
		meta.writeVInt(field.number);
		meta.writeByte(Lucene410DocValuesFormat.BINARY);
		meta.writeVInt(BINARY_PREFIX_COMPRESSED); // record that the storage format is prefix compression
		meta.writeLong(-1L);
		final long startFP = data.getFilePointer();
// This may be hard to follow at first, so here is how the storage works. All byte[] values are written
// with prefix compression, much like the entries of a dictionary.
// Every 16 byte[] form one block, and a block has two parts. The first part records the block's first
// byte[] in full plus, for each of the following 15, the length of its suffix (the bytes after the
// shared prefix). This part is called the headerBuffer.
// The second part records, for each of the other 15 byte[], the length of its shared prefix followed by
// the suffix bytes themselves. This part is called the bytesBuffer.
		RAMOutputStream addressBuffer = new RAMOutputStream();
		MonotonicBlockPackedWriter termAddresses = new MonotonicBlockPackedWriter(addressBuffer, BLOCK_SIZE); // records each block's start position in data, i.e. an index of the blocks
		RAMOutputStream bytesBuffer = new RAMOutputStream(); // holds the second part
		// buffers up block header
		RAMOutputStream headerBuffer = new RAMOutputStream(); // holds the first part, flushed once every 16 values
		BytesRefBuilder lastTerm = new BytesRefBuilder(); // the byte[] most recently written to the header
		lastTerm.grow(maxLength);
		long count = 0;
		int suffixDeltas[] = new int[INTERVAL_COUNT]; // for each byte[] in the second part, the length of its suffix (its length beyond the shared prefix), minus 1
		for (BytesRef v : values) {
			int termPosition = (int) (count & INTERVAL_MASK); // position within the current block of INTERVAL_MASK + 1 = 16 values
			if (termPosition == 0) { // every 16th value goes into the first part
				termAddresses.add(data.getFilePointer() - startFP); // record this block's start position, i.e. its index entry
				headerBuffer.writeVInt(v.length); // record the current byte[]'s length in the first part
				headerBuffer.writeBytes(v.bytes, v.offset, v.length); // record the current byte[]'s content
				lastTerm.copyBytes(v); // keep the current byte[] for prefix comparison
			} else {
				// prefix compression; the shared prefix can be at most 255 bytes
				int sharedPrefix = Math.min(255, StringHelper.bytesDifference(lastTerm.get(), v)); // length of the prefix shared with the block's first byte[], capped at 255
				bytesBuffer.writeByte((byte) sharedPrefix); // write the prefix length into the second part
				bytesBuffer.writeBytes(v.bytes, v.offset + sharedPrefix, v.length - sharedPrefix); // write the suffix (everything after the prefix) into the second part
				suffixDeltas[termPosition] = v.length - sharedPrefix - 1; // record the suffix length minus 1 (the suffix is always at least 1 byte, since terms are distinct and sorted); this is later written into the first part. For example, if the block's first byte[] is "a" and the current one is "ab", the suffix is "b" and the stored value is 0.
			}
			}
			count++;
			if ((count & INTERVAL_MASK) == 0) { // every 16 values, flush the block to disk
				flushTermsDictBlock(headerBuffer, bytesBuffer, suffixDeltas);
			}
		}
		// flush trailing crap
		int leftover = (int) (count & INTERVAL_MASK);
		if (leftover > 0) { // flush the final partial block, since the loop above only flushes full blocks of 16
			Arrays.fill(suffixDeltas, leftover, suffixDeltas.length, 0);
			flushTermsDictBlock(headerBuffer, bytesBuffer, suffixDeltas);
		}
		final long indexStartFP = data.getFilePointer(); // start position of the index part, i.e. where the addressBuffer below begins in data
		// write addresses of indexed terms: next we write each block's start position into data
		termAddresses.finish();
		addressBuffer.writeTo(data);
		addressBuffer = null;
		termAddresses = null;
		meta.writeVInt(minLength);
		meta.writeVInt(maxLength);
		meta.writeVLong(count);
		meta.writeLong(startFP);//Record the start position of byte[] data,
		meta.writeLong(indexStartFP);//The start position of the record index
		meta.writeVInt(PackedInts.VERSION_CURRENT);
		meta.writeVInt(BLOCK_SIZE);
		addReverseTermIndex(field, values, maxLength);
	}
}
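Before moving on, here is a simplified picture of one block's layout under this scheme; the example terms are made up, and a real block holds 16 terms rather than 3:

// sorted terms in this block: "abc", "abd", "abe"
// header part: 3, 'a','b','c'   <- the first term: full length + full bytes
//              0, 0             <- suffix lengths minus 1 for "abd" and "abe" (appended at flush time)
// bytes part : 2, 'd'           <- "abd": shared-prefix length 2, then the suffix "d"
//              2, 'e'           <- "abe": shared-prefix length 2, then the suffix "e"
// termAddresses separately records where each such block starts inside data.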

 The other method here is flushTermsDictBlock, which writes the first and second parts to disk once every 16 byte[]. Let's see how it works:

/** For the meaning of the parameters, see the method above. */
private void flushTermsDictBlock(RAMOutputStream headerBuffer, RAMOutputStream bytesBuffer, int suffixDeltas[]) throws IOException {
	boolean twoByte = false; // whether any suffix length in this block (a byte[]'s length beyond the shared prefix) exceeds 254; if so the lengths are written as shorts, otherwise as single bytes
	for (int i = 1; i < suffixDeltas.length; i++) { // slot 0 is the block's first term, whose full length is already in the header
		if (suffixDeltas[i] > 254) {
			twoByte = true;
		}
	}
	if (twoByte) { // the two branches mirror each other; note that the suffix lengths go into the header, i.e. the first part
		headerBuffer.writeByte((byte) 255); // marker byte telling the reader that shorts follow
		for (int i = 1; i < suffixDeltas.length; i++) {
			headerBuffer.writeShort((short) suffixDeltas[i]);
		}
	} else {
		for (int i = 1; i < suffixDeltas.length; i++) {
			headerBuffer.writeByte((byte) suffixDeltas[i]);
		}
	}
	headerBuffer.writeTo(data); // write the header (first part) to data
	headerBuffer.reset();
	bytesBuffer.writeTo(data); // write the bytes (second part) to data
	bytesBuffer.reset();
}
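To make the header encoding concrete, an illustrative example (the delta values are invented):

// suffix deltas for slots 1..15 of one block:
//   all <= 254, e.g. [_, 3, 7, 2, ...]   -> header gets: 3, 7, 2, ...          (one byte each)
//   any  > 254, e.g. [_, 3, 300, 2, ...] -> header gets: 255, 3, 300, 2, ...   (marker byte, then shorts)
// At read time a leading 255 tells the reader to decode shorts instead of bytes.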

 Having read these two methods, we can summarize how the byte[] values are laid out on disk. After all byte[] are sorted, they are stored from smallest to largest, 16 per block. Each block has two parts: the first, called the header, stores the block's first byte[] in full plus the suffix lengths of the remaining 15 (their lengths beyond the shared prefix); the second stores, for each of those 15, the length of its shared prefix followed by its own suffix bytes. The termAddresses index is also stored, recording each block's start position in data, so that to find the byte[] with a given ord we can compute which block it belongs to, jump straight to that block's start via termAddresses, and then scan at most 16 entries. We also skipped one method above, addReverseTermIndex; take a look:

private void addReverseTermIndex(FieldInfo field, final Iterable<BytesRef> values, int maxLength) throws IOException {
	long count = 0;
	BytesRefBuilder priorTerm = new BytesRefBuilder();
	priorTerm.grow(maxLength);
	BytesRef indexTerm = new BytesRef();
	long startFP = data.getFilePointer();
	PagedBytes pagedBytes = new PagedBytes(15);
	MonotonicBlockPackedWriter addresses = new MonotonicBlockPackedWriter(data, BLOCK_SIZE);
	for (BytesRef b : values) {
		int termPosition = (int) (count & REVERSE_INTERVAL_MASK);
		if (termPosition == 0) { // every 1024 terms (the 0th, the 1024th, ...) record one index entry
			int len = StringHelper.sortKeyLength(priorTerm.get(), b); // the minimum prefix length needed to distinguish the two byte[] by order
			indexTerm.bytes = b.bytes;
			indexTerm.offset = b.offset;
			indexTerm.length = len; // only that leading prefix is recorded
			addresses.add(pagedBytes.copyUsingLengthPrefix(indexTerm));
		} else if (termPosition == REVERSE_INTERVAL_MASK) { // the 1023rd entry, i.e. the last one before the next index entry
			priorTerm.copyBytes(b);
		}
		count++;
	}
	addresses.finish();
	long numBytes = pagedBytes.getPointer(); // the number of bytes used
	pagedBytes.freeze(true);
	PagedBytesDataInput in = pagedBytes.getDataInput();
	meta.writeLong(startFP);
	data.writeVLong(numBytes); // record the length used by pagedBytes
	data.copyBytes(in, numBytes); // write the pagedBytes content into data
}

 This method is simple: over all the sorted byte[] values, it records once every 1024 terms the shortest prefix that still preserves the ordering, so all byte[] can be distinguished by size through these prefixes. Its purpose is to speed up certain queries. For example, to check whether the byte[] "abc" exists (SortedDocValues offers such a method), without this structure we would not know which block it might be in and could only try all the blocks; with it, we can quickly narrow down its approximate range and then search there, which is much faster. This feature was added in a later version: I do not see it in 4.9, but it appears in 4.10.
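A quick illustration of StringHelper.sortKeyLength, which decides how short each recorded prefix can be (the inputs are made up):

// sortKeyLength returns the length of the shortest prefix of the second term that
// still sorts strictly after the first term; only that prefix is kept in the reverse index
int len1 = StringHelper.sortKeyLength(new BytesRef("abc"), new BytesRef("abd")); // 3: they differ at the 3rd byte
int len2 = StringHelper.sortKeyLength(new BytesRef("ab"), new BytesRef("ac"));   // 2: the 2-byte prefix "ac" already sorts after "ab"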

Let's summarize the storage of the sorted byte[] data again. It is divided into three pieces. The first records all byte[] values, in small blocks of 16, each block split into a header part and a bytes part (described in detail above, not repeated here); the second records each block's start position in data, making it quick to locate a block; the third records, for every 1024 terms, a byte[] prefix as short as possible, so that to check whether some byte[] is stored we can binary-search these prefixes to its approximate position and then scan within the small blocks. This third piece can also be regarded as an index over the byte[] values, though the lookup it gives is approximate and its efficiency is not especially high.


The other method called in addSortedField is addNumericField. It is very simple: its purpose is to record the ord of each doc's byte[], so that once a doc's ord is known the concrete byte[] can be looked up. It is the same method described earlier when writing NumericDocValues, so it is not repeated here.

With the addNumericField call above, we can divide the whole SortedDocValues storage into two parts: the binary part and the numeric part. The former stores the byte[] values (itself split into the three pieces described above); the latter stores the ord of each doc's byte[]. Knowing this format, we can guess how SortedDocValues implements its operations. For example, to find the byte[] whose ord is n, we compute its block as n / 16 in the binary part, use the block index to jump to that block's start position, and then scan within the block's 16 entries; to find the ord of the doc whose docID is n, we read the numeric part directly, and if the byte[] itself is wanted, we then look it up by that ord as above.
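A sketch of the lookup arithmetic implied here, with 16 terms per block (the variable names are mine):

int ord = 37;            // the sorted position we want to resolve to a byte[]
int block = ord >>> 4;   // 37 / 16 = 2: the block holding this ord; termAddresses gives its start offset in data
int inBlock = ord & 15;  // 37 % 16 = 5: decode this many prefix-compressed entries after the block's first term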

