Solr's use of docValues

I have written more than a dozen blog posts introducing Lucene's docValues. Lucene has five docValues types, but Solr exposes only one setting: adding docValues="true" to a <field> in schema.xml. So which docValues type does Solr use when it builds the index? The source code has the answer.

When adding documents, Solr uses the DirectUpdateHandler2 class, specifically its

public int addDoc(AddUpdateCommand cmd)

method. Debugging step by step from this method shows how Solr uses docValues. When Solr adds a document, it converts the SolrInputDocument sent by the client into a Lucene Document, turning each Solr field into a Lucene field; this is where docValues come in. The cmd parameter of addDoc provides the method that builds the Lucene document, AddUpdateCommand.getLuceneDocument():

public Document getLuceneDocument() {
	return DocumentBuilder.toDocument(getSolrInputDocument(), req.getSchema());
}

As its name suggests, this method returns the Lucene document. Inside toDocument, the following method converts a Solr SchemaField into Lucene fields:

private static void addField(Document doc, SchemaField field, Object val, float boost) {
	if (val instanceof IndexableField) {
		// set boost to the calculated compound boost
		((Field) val).setBoost(boost);
		doc.add((Field) val);
		return;
	}
	for (IndexableField f : field.getType().createFields(field, val, boost)) {
		if (f != null)
			doc.add((Field) f); // null fields are not added
	}
}

The key call above is field.getType().createFields(...), which builds the Lucene fields according to the FieldType of the SchemaField, so it is enough to look at a few commonly used FieldTypes. Three are used most often in practice: the numeric types, the string type (StrField), and the text type (also a string, but tokenized).

 

1. Numeric types

        For int, float, long, double, and tint, tfloat, tlong, tdouble, the parent class is TrieField (full name: org.apache.solr.schema.TrieField). Here is its createFields(SchemaField, Object, float) method:

public List<IndexableField> createFields(SchemaField sf, Object value, float boost) {
	if (sf.hasDocValues()) { // the schema field declares docValues
		List<IndexableField> fields = new ArrayList<>();
		// regular field (IntField, LongField, ...) chosen by the configured type; it carries no docValues (see createField)
		final IndexableField field = createField(sf, value, boost);
		fields.add(field);
		if (sf.multiValued()) { // multi-valued field
			BytesRef bytes = new BytesRef();
			readableToIndexed(value.toString(), bytes);
			fields.add(new SortedSetDocValuesField(sf.getName(), bytes)); // additional SortedSetDocValuesField
		} else { // single-valued field
			final long bits;
			if (field.numericValue() instanceof Integer || field.numericValue() instanceof Long) {
				bits = field.numericValue().longValue();
			} else if (field.numericValue() instanceof Float) {
				bits = Float.floatToIntBits(field.numericValue().floatValue());
			} else {
				assert field.numericValue() instanceof Double;
				bits = Double.doubleToLongBits(field.numericValue().doubleValue());
			}
			fields.add(new NumericDocValuesField(sf.getName(), bits)); // additional NumericDocValuesField
		}
		return fields;
	} else { // no docValues
		return Collections.singletonList(createField(sf, value, boost));
	}
}

As shown above, if the field has docValues (that is, docValues="true" is set on the field in the schema), an additional docValues field is created, so the returned list contains two fields. For a multi-valued field the extra field is a SortedSetDocValuesField; for a single-valued field it is a NumericDocValuesField.
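The single-valued branch packs every numeric value into one long before handing it to NumericDocValuesField. The following is a minimal sketch of that conversion and its inverse using only the JDK. The class DocValuesBits and its method names are hypothetical and exist only for illustration; they are not part of Solr or Lucene.

```java
// Illustrative sketch only: DocValuesBits is a hypothetical helper, not a Solr class.
// It mirrors the bit conversion in the single-valued branch of createFields above.
final class DocValuesBits {

    // Convert a boxed number into the long that would be stored as the doc value.
    static long toBits(Number value) {
        if (value instanceof Integer || value instanceof Long) {
            // integral values are stored as-is, widened to long
            return value.longValue();
        } else if (value instanceof Float) {
            // 32-bit IEEE 754 bit pattern, widened to long
            return Float.floatToIntBits(value.floatValue());
        } else if (value instanceof Double) {
            // 64-bit IEEE 754 bit pattern
            return Double.doubleToLongBits(value.doubleValue());
        }
        throw new IllegalArgumentException("Unsupported numeric type: " + value.getClass());
    }

    // Inverse of toBits for a value that was originally a float.
    static float floatFromBits(long bits) {
        return Float.intBitsToFloat((int) bits);
    }

    // Inverse of toBits for a value that was originally a double.
    static double doubleFromBits(long bits) {
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        System.out.println(toBits(42));                       // integral value is unchanged
        System.out.println(floatFromBits(toBits(1.5f)));      // round trip through the bit pattern
        System.out.println(doubleFromBits(toBits(2.25d)));    // round trip through the bit pattern
    }
}
```

The stored long carries no type information, so a reader of the doc value has to know from the schema whether to interpret it as an integral value, a float bit pattern, or a double bit pattern.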

The following method is called to create the regular field, without docValues:

public IndexableField createField(SchemaField field, Object value, float boost) {
	boolean indexed = field.indexed();
	boolean stored = field.stored();
	boolean docValues = field.hasDocValues();
	if (!indexed && !stored && !docValues) {
		// Not indexed, not stored and no docValues: nothing is created.
		// To be meaningful, a field must be indexed, stored, or have docValues.
		if (log.isTraceEnabled()) log.trace("Ignoring unindexed/unstored field: " + field);
		return null;
	}
	// Lucene FieldType for numeric fields: only five properties are set
	// (stored, tokenized, indexed, omitNorms, index options); no term vectors.
	FieldType ft = new FieldType();
	ft.setStored(stored);
	ft.setTokenized(true);
	ft.setIndexed(indexed);
	ft.setOmitNorms(field.omitNorms());
	ft.setIndexOptions(getIndexOptions(field, value.toString()));
	switch (type) {
	case INTEGER:
		ft.setNumericType(NumericType.INT);
		break;
	case FLOAT:
		ft.setNumericType(NumericType.FLOAT);
		break;
	case LONG:
		ft.setNumericType(NumericType.LONG);
		break;
	case DOUBLE:
		ft.setNumericType(NumericType.DOUBLE);
		break;
	case DATE:
		ft.setNumericType(NumericType.LONG);
		break;
	default:
		throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Unknown type for trie field: " + type);
	}
	ft.setNumericPrecisionStep(precisionStep);
	final org.apache.lucene.document.Field f;
	switch (type) {
	case INTEGER:
		int i = (value instanceof Number) ? ((Number) value).intValue() : Integer.parseInt(value.toString());
		f = new org.apache.lucene.document.IntField(field.getName(), i, ft); // create the concrete field, passing ft as its type
		break;
	case FLOAT:
		float fl = (value instanceof Number) ? ((Number) value).floatValue() : Float.parseFloat(value.toString());
		f = new org.apache.lucene.document.FloatField(field.getName(), fl, ft);
		break;
	case LONG:
		long l = (value instanceof Number) ? ((Number) value).longValue() : Long.parseLong(value.toString());
		f = new org.apache.lucene.document.LongField(field.getName(), l, ft);
		break;
	case DOUBLE:
		double d = (value instanceof Number) ? ((Number) value).doubleValue()
				: Double.parseDouble(value.toString());
		f = new org.apache.lucene.document.DoubleField(field.getName(), d, ft);
		break;
	case DATE:
		Date date = (value instanceof Date) ? ((Date) value) : dateField.parseMath(null, value.toString());
		f = new org.apache.lucene.document.LongField(field.getName(), date.getTime(), ft);
		break;
	default:
		throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Unknown type for trie field: " + type);
	}
	f.setBoost(boost);
	return f;
}

To sum up numeric SchemaFields: without docValues, only a regular Lucene field is created according to the type. With docValues, two fields are created, a regular field and a docValues field; both have the same name and content, and both are used at index time. A multi-valued field gets a SortedSetDocValuesField and a single-valued field gets a NumericDocValuesField. The regular field is created with only five properties set and no term vectors.

 

2. String type

       The string type is not tokenized, and it also supports docValues. The corresponding FieldType is org.apache.solr.schema.StrField. Here is its createFields method:

public List<IndexableField> createFields(SchemaField field, Object value, float boost) {
	if (field.hasDocValues()) { // the field declares docValues
		List<IndexableField> fields = new ArrayList<>();
		fields.add(createField(field, value, boost)); // regular field
		final BytesRef bytes = new BytesRef(value.toString());
		if (field.multiValued()) { // multi-valued field: SortedSetDocValuesField
			fields.add(new SortedSetDocValuesField(field.getName(), bytes));
		} else { // single-valued field: SortedDocValuesField
			fields.add(new SortedDocValuesField(field.getName(), bytes));
		}
		return fields;
	} else {
		return Collections.singletonList(createField(field, value, boost));
	}
}

In addition, StrField does not override createField, so FieldType's own createField method is used:

public IndexableField createField(SchemaField field, Object value, float boost) {
	if (!field.indexed() && !field.stored()) {
		if (log.isTraceEnabled())
			log.trace("Ignoring unindexed/unstored field: " + field);
		return null;
	}
	String val;
	try {
		val = toInternal(value.toString());
	} catch (RuntimeException e) {
		throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,"Error while creating field '" + field + "' from value '" + value + "'", e);
	}
	if (val == null)
		return null;

	org.apache.lucene.document.FieldType newType = new org.apache.lucene.document.FieldType();
	newType.setIndexed(field.indexed()); // whether to index
	newType.setTokenized(field.isTokenized()); // whether to tokenize
	newType.setStored(field.stored()); // whether to store
	newType.setOmitNorms(field.omitNorms()); // whether to omit norms
	newType.setIndexOptions(getIndexOptions(field, val)); // what the postings lists record
	newType.setStoreTermVectors(field.storeTermVector()); // whether to store term vectors
	newType.setStoreTermVectorOffsets(field.storeTermOffsets()); // whether to store term vector offsets
	newType.setStoreTermVectorPositions(field.storeTermPositions()); // whether to store term vector positions; so string fields can record term vectors

	return createField(field.getName(), val, newType, boost); // straightforward, omitted here
}

To summarize the string type: with docValues there are again two fields, a regular field without docValues and a docValues field. The docValues field is a SortedSetDocValuesField or a SortedDocValuesField depending on whether the field is multi-valued. In addition, unlike numeric fields, string fields can have term vectors.
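To make the single-valued string case more concrete, here is a small conceptual model of what sorted doc values hold: a sorted dictionary of the unique values plus one ordinal per document pointing into it. This is a sketch under simplifying assumptions, not Lucene's implementation. The class SortedDocValuesModel is hypothetical, it keeps everything in memory, and it orders terms by Java's String ordering where Lucene compares the bytes of each BytesRef.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Conceptual model only: SortedDocValuesModel is hypothetical, not a Lucene class.
final class SortedDocValuesModel {
    final List<String> dictionary; // unique values in sorted order
    final int[] ordinals;          // ordinals[docId] = position of that doc's value in the dictionary

    SortedDocValuesModel(String[] valuePerDoc) {
        // deduplicate and sort the values (String order here; Lucene uses byte order)
        dictionary = new ArrayList<>(new TreeSet<>(Arrays.asList(valuePerDoc)));
        ordinals = new int[valuePerDoc.length];
        for (int docId = 0; docId < valuePerDoc.length; docId++) {
            ordinals[docId] = Collections.binarySearch(dictionary, valuePerDoc[docId]);
        }
    }

    // Look up the original value of a document through its ordinal.
    String valueOf(int docId) {
        return dictionary.get(ordinals[docId]);
    }

    public static void main(String[] args) {
        SortedDocValuesModel dv =
                new SortedDocValuesModel(new String[] {"pear", "apple", "pear", "fig"});
        System.out.println(dv.dictionary);                // [apple, fig, pear]
        System.out.println(Arrays.toString(dv.ordinals)); // [2, 0, 2, 1]
    }
}
```

Because the ordinals follow the sort order of the values, sorting or grouping documents by such a field can compare small integers instead of the strings themselves.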

 

3. Text type

This type is a field that can be tokenized. Intuitively it should not use docValues, because faceting or sorting on a tokenized field is meaningless, but let's check the code.

The createFields method of FieldType is not overridden in the TextField class, so let's take a look at the createFields method of FieldType:

public List<IndexableField> createFields(SchemaField field, Object value, float boost) {
	IndexableField f = createField(field, value, boost);
	if (field.hasDocValues() && f.fieldType().docValueType() == null) {
		throw new UnsupportedOperationException("This field type does not support doc values: " + this);
	}
	return f == null ? Collections.<IndexableField> emptyList() : Collections.singletonList(f);
}

createFields also calls createField (note: no trailing s), and TextField does not override createField either. As the FieldType.createField method shown in the string section makes clear, the default createField never produces a field that carries docValues, so the returned field's docValueType() is null. If docValues were declared on such a field, the check above would throw UnsupportedOperationException rather than create a docValues field. This matches intuition.
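The three cases walked through in this post can be condensed into one small decision function. This is only a summary sketch of the code paths quoted above: DocValuesChooser, Kind, and DvType are hypothetical names, and the real logic is spread across TrieField, StrField, and FieldType.

```java
// Summary sketch only: DocValuesChooser and its enums are hypothetical names.
final class DocValuesChooser {
    enum Kind { NUMERIC, STRING, TEXT }
    enum DvType { NONE, NUMERIC, SORTED, SORTED_SET }

    // Which docValues type the quoted createFields methods end up using.
    static DvType choose(Kind kind, boolean docValues, boolean multiValued) {
        if (!docValues) {
            return DvType.NONE; // only the regular field is created
        }
        switch (kind) {
            case NUMERIC: // TrieField.createFields
                return multiValued ? DvType.SORTED_SET : DvType.NUMERIC;
            case STRING:  // StrField.createFields
                return multiValued ? DvType.SORTED_SET : DvType.SORTED;
            default:      // TextField falls back to FieldType.createFields, which rejects docValues
                throw new UnsupportedOperationException(
                        "This field type does not support doc values: " + kind);
        }
    }

    public static void main(String[] args) {
        System.out.println(choose(Kind.NUMERIC, true, false)); // NUMERIC
        System.out.println(choose(Kind.NUMERIC, true, true));  // SORTED_SET
        System.out.println(choose(Kind.STRING, true, false));  // SORTED
        System.out.println(choose(Kind.STRING, true, true));   // SORTED_SET
        System.out.println(choose(Kind.TEXT, false, false));   // NONE
    }
}
```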

 

That is all there is to it: this is how Solr uses docValues when building an index.

