Reading the Lucene docValues source code (3): how NumericDocValues are read

Lucene reads docValues in Lucene49DocValuesProducer. Let's start with its constructor; I have also copied the relevant fields of the class.
class Lucene49DocValuesProducer extends DocValuesProducer implements Closeable {
	/** For numeric docValues: the key is the field number, the value holds that field's attributes from the meta file. The meta file is read when the producer is opened. */
	private final Map<Integer, NumericEntry> numerics;
	private final AtomicLong ramBytesUsed;
	private final IndexInput data;
	private final int maxDoc;
	private final int version;

	// memory-resident structures
	private final Map<Integer, MonotonicBlockPackedReader> addressInstances = new HashMap<>();
	private final Map<Integer, MonotonicBlockPackedReader> ordIndexInstances = new HashMap<>();

	/** expert: instantiates a new reader */
	Lucene49DocValuesProducer(SegmentReadState state, String dataCodec, String dataExtension, String metaCodec,
			String metaExtension) throws IOException {
		String metaName = IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, metaExtension);
		// read in the entries from the metadata file.
		ChecksumIndexInput in = state.directory.openChecksumInput(metaName, state.context);
		this.maxDoc = state.segmentInfo.getDocCount();
		boolean success = false;
		try {
			version = CodecUtil.checkHeader(in, metaCodec, Lucene49DocValuesFormat.VERSION_START,
					Lucene49DocValuesFormat.VERSION_CURRENT);
			numerics = new HashMap<>();
			// ... (irrelevant code omitted)
			readFields(in, state.fieldInfos); // Reads the entries for every docValues type from the meta file (the index file, .dvm)

			CodecUtil.checkFooter(in);
			success = true;
		} finally {
			if (success) {
				IOUtils.close(in);
			} else {
				IOUtils.closeWhileHandlingException(in);
			}
		}

		String dataName = IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, dataExtension);
		this.data = state.directory.openInput(dataName, state.context); // Open the data file that stores the actual values.
		success = false;
		// ... (omitted)

		ramBytesUsed = new AtomicLong(RamUsageEstimator.shallowSizeOfInstance(getClass()));
	}

When the producer is constructed, it reads the index file, that is, the meta file, wraps each field's entry in an XxxEntry object, and keeps it in memory. The attribute private final Map<Integer, NumericEntry> numerics of this class holds the meta description of each numeric field, keyed by field number. The meta file is read in the readFields method, which handles every type of docValues; here we only look at the numeric type:

static NumericEntry readNumericEntry(IndexInput meta) throws IOException {
		
	// Everything about the current docValues field is read from the meta file.
	NumericEntry entry = new NumericEntry();
	entry.format = meta.readVInt(); // Storage format: GCD-compressed, delta-compressed, or table-compressed
	entry.missingOffset = meta.readLong(); // File pointer (offset) in the data file of the missing-docs bitset
	entry.offset = meta.readLong(); // Offset in the data file where the values themselves start (after the missing-docs bitset)
	entry.count = meta.readVLong(); // Number of docs in total
	switch (entry.format) {
	case GCD_COMPRESSED: // Based on the greatest common divisor
		entry.minValue = meta.readLong(); // Minimum value
		entry.gcd = meta.readLong(); // Greatest common divisor
		entry.bitsPerValue = meta.readVInt(); // Number of bits per stored value, needed for decoding
		break;
	case TABLE_COMPRESSED: // Lookup table, used when there are few distinct values
		final int uniqueValues = meta.readVInt(); // Number of distinct values
		if (uniqueValues > 256) {
			throw new CorruptIndexException(
					"TABLE_COMPRESSED cannot have more than 256 distinct values, input=" + meta);
		}
		entry.table = new long[uniqueValues];
		for (int i = 0; i < uniqueValues; ++i) { // Read every distinct value; the writer stored them in the meta file.
			entry.table[i] = meta.readLong();
		}
		entry.bitsPerValue = meta.readVInt(); // Number of bits per stored ordinal, needed for decoding
		break;
	case DELTA_COMPRESSED:
		entry.minValue = meta.readLong(); // Minimum value
		entry.bitsPerValue = meta.readVInt(); // Number of bits used to store one value in the data file, needed for decoding
		break;
	case MONOTONIC_COMPRESSED: // Not used here: only the three formats above appear when Lucene49DocValuesConsumer writes numeric docValues
		entry.packedIntsVersion = meta.readVInt();
		entry.blockSize = meta.readVInt();
		break;
	default:
		throw new CorruptIndexException("Unknown format: " + entry.format + ", input=" + meta);
	}
	entry.endOffset = meta.readLong(); // End offset of this field's values in the data file
	return entry;
}
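
To make the per-format fields concrete, here is a minimal sketch in plain Java (not Lucene code) of how minValue, gcd and bitsPerValue could be derived for a small set of values. The class and method names are hypothetical, the sample values are made up, and overflow of value - min is ignored. The real writer may also round bitsPerValue up to a width that its packed writer supports.

```java
/** Hypothetical helper, not Lucene code: derives the kind of fields that readNumericEntry loads. */
final class NumericParamsSketch {

    /** Greatest common divisor of two non-negative longs; gcd(0, x) is x. */
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    /** Bits needed to store an unsigned value up to maxValue; never less than 1. */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    static long min(long[] values) {
        long m = values[0];
        for (long v : values) m = Math.min(m, v);
        return m;
    }

    static long max(long[] values) {
        long m = values[0];
        for (long v : values) m = Math.max(m, v);
        return m;
    }

    /** GCD of all (value - min) differences; 0 when every value is identical. */
    static long gcdOfDeltas(long[] values) {
        long min = min(values);
        long g = 0;
        for (long v : values) g = gcd(g, v - min);
        return g;
    }

    public static void main(String[] args) {
        long[] values = {1000, 1300, 1900, 1000, 2500};
        long min = min(values);
        long range = max(values) - min;
        long g = gcdOfDeltas(values);
        // Delta encoding stores (value - min), so it needs enough bits for the full range.
        System.out.println("DELTA: minValue=" + min + " bitsPerValue=" + bitsRequired(range));
        // GCD encoding stores (value - min) / gcd, which needs fewer bits when gcd > 1.
        System.out.println("GCD:   minValue=" + min + " gcd=" + g
                + " bitsPerValue=" + bitsRequired(range / g));
    }
}
```

With these sample values the GCD variant needs fewer bits per value than the delta variant, which is the reason a writer would prefer it when a common divisor exists.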

As the code shows, readNumericEntry reads all the useful attributes from the meta file and keeps them in memory, but the numeric docValues themselves have not been read yet. The actual read happens in this method:

@Override
public NumericDocValues getNumeric(FieldInfo field) throws IOException {
	NumericEntry entry = numerics.get(field.number);
	return getNumeric(entry);
}

numerics holds the meta entries of all fields read earlier, looked up by field number, so the key method is getNumeric(NumericEntry):

// Although several numeric types can be indexed, they are all stored as long, and they are read back as long here.
LongValues getNumeric(NumericEntry entry) throws IOException {
	
	RandomAccessInput slice = this.data.randomAccessSlice(entry.offset, entry.endOffset - entry.offset); // The key call: it exposes the given region of the data file. Whether that region is read into memory at this point, I do not know.
	switch (entry.format) {
	
	case DELTA_COMPRESSED: // Delta encoding
		final long delta = entry.minValue;
		final LongValues values = DirectReader.getInstance(slice, entry.bitsPerValue);
		return new LongValues() {
			@Override
			public long get(long id) {
				return delta + values.get(id); // Stored difference plus the minimum value
			}
		};
		
	case GCD_COMPRESSED: // GCD encoding, similar to delta: the stored quotient is multiplied by the gcd and added to the minimum
		final long min = entry.minValue;
		final long mult = entry.gcd;
		final LongValues quotientReader = DirectReader.getInstance(slice, entry.bitsPerValue);
		return new LongValues() {
			@Override
			public long get(long id) {
				return min + mult * quotientReader.get(id);
			}
		};
		
	case TABLE_COMPRESSED: // Table encoding: the data file stores each doc's ordinal, which indexes into the table
		final long[] table = entry.table;
		final LongValues ords = DirectReader.getInstance(slice, entry.bitsPerValue);
		return new LongValues() {
			@Override
			public long get(long id) {
				return table[(int) ords.get(id)];
			}
		};
	default:
		throw new AssertionError();
	}
}
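
The three branches can be tried out without Lucene. The sketch below is a simplified stand-in under stated assumptions: a plain long[] plays the role of the DirectReader over the slice, the Values interface is a hypothetical analogue of LongValues, and the sample numbers are made up.

```java
/** Hypothetical stand-in for the three decoders; a long[] replaces DirectReader. */
final class NumericDecodeSketch {

    /** Minimal analogue of LongValues: one long per document id. */
    interface Values {
        long get(int docId);
    }

    /** DELTA_COMPRESSED: stored value is (original - minValue). */
    static Values delta(long minValue, long[] stored) {
        return docId -> minValue + stored[docId];
    }

    /** GCD_COMPRESSED: stored value is (original - minValue) / divisor. */
    static Values gcd(long minValue, long divisor, long[] stored) {
        return docId -> minValue + divisor * stored[docId];
    }

    /** TABLE_COMPRESSED: stored value is the ordinal of the original in the table. */
    static Values table(long[] table, long[] ords) {
        return docId -> table[(int) ords[docId]];
    }

    public static void main(String[] args) {
        // All three encodings represent the same originals: 1000, 1300, 1900, 1000, 2500.
        Values d = delta(1000, new long[] {0, 300, 900, 0, 1500});
        Values g = gcd(1000, 300, new long[] {0, 1, 3, 0, 5});
        Values t = table(new long[] {1000, 1300, 1900, 2500}, new long[] {0, 1, 2, 0, 3});
        for (int doc = 0; doc < 5; doc++) {
            System.out.println(doc + ": " + d.get(doc) + " " + g.get(doc) + " " + t.get(doc));
        }
    }
}
```

Each line of output shows the same number three times, once per encoding, which is the point: the formats differ only in what is stored per document, not in what is returned.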

 

Once the write path is understood, the read path is much simpler. One important part remains unclear to me, though: the randomAccessSlice call in getNumeric above. If an expert reads this, I would appreciate an answer: does it read the slice into memory itself, is the data brought into memory by the operating system, or is nothing held in memory at all?
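
This does not settle the question, but it may help to see what a slice that reads nothing up front could look like. The sketch below is plain java.nio, not Lucene code: LazySlice is a hypothetical class that only remembers an offset and a length, and fetches bytes from the file when a value is requested. Whether a given Lucene Directory implementation behaves like this is an assumption to check against its source.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration only; LazySlice is not a Lucene class. */
final class LazySliceSketch {

    /** A view of [offset, offset + length) of a file that reads bytes only on demand. */
    static final class LazySlice {
        private final FileChannel channel;
        private final long offset;
        private final long length;

        LazySlice(FileChannel channel, long offset, long length) {
            this.channel = channel;
            this.offset = offset;
            this.length = length;
        }

        /** Reads the 8-byte big-endian long at slice-relative position pos. */
        long readLong(long pos) throws IOException {
            if (pos < 0 || pos + Long.BYTES > length) {
                throw new IndexOutOfBoundsException("pos=" + pos);
            }
            ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
            long filePos = offset + pos;
            while (buf.hasRemaining()) {
                // Positional read: touches only these bytes and leaves the channel position alone.
                int n = channel.read(buf, filePos + buf.position());
                if (n < 0) {
                    throw new IOException("unexpected end of file");
                }
            }
            buf.flip();
            return buf.getLong();
        }
    }

    /** Writes four longs to a temp file, then reads one of them through a slice. */
    static long demo() {
        try {
            Path file = Files.createTempFile("slice-demo", ".bin");
            try {
                ByteBuffer out = ByteBuffer.allocate(4 * Long.BYTES);
                out.putLong(10).putLong(20).putLong(30).putLong(40);
                Files.write(file, out.array());
                try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                    // The slice skips the first long; creating it reads nothing.
                    LazySlice slice = new LazySlice(ch, Long.BYTES, 3L * Long.BYTES);
                    return slice.readLong(Long.BYTES); // second long inside the slice
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this design the only heap memory used per read is the 8-byte buffer; anything else that ends up in memory is the operating system's file cache.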

Another point to note: the read path above does not consult the bitset, that is, the bitset that records which doc ids actually have a value. So if you read 0, you still need to check whether the value really exists, that is, whether the doc is set in that bitset.
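
A minimal sketch of that check, assuming a doc without a value decodes as 0: java.util.BitSet stands in for the bitset stored in the data file, and the class name, method name and sample data are hypothetical.

```java
import java.util.BitSet;
import java.util.OptionalLong;

/** Hypothetical sketch; java.util.BitSet stands in for the missing-docs bitset. */
final class MissingValueSketch {

    /** Returns the value for docId, or empty if the doc never had a value. */
    static OptionalLong valueOf(long[] decoded, BitSet docsWithValue, int docId) {
        long v = decoded[docId];
        // A decoded 0 is ambiguous: it may be a real 0 or a doc without a value.
        if (v == 0 && !docsWithValue.get(docId)) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(v);
    }

    public static void main(String[] args) {
        long[] decoded = {7, 0, 0};      // doc 1 really holds 0, doc 2 has no value
        BitSet docsWithValue = new BitSet();
        docsWithValue.set(0);
        docsWithValue.set(1);
        for (int doc = 0; doc < decoded.length; doc++) {
            System.out.println(doc + ": " + valueOf(decoded, docsWithValue, doc));
        }
    }
}
```

Checking the bitset only when the decoded value is 0 keeps the common case to a single array read.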

