Configuring the Solr Schema Properly to Prevent OOM

Background:

I received an alert at night saying that a production Solr collection had gone down. I quickly opened a remote session and checked the state of the servers. Sure enough, every query from the business side was timing out, and incremental updates had stopped as well. The exception messages showed that there were no available nodes left in the cluster. Faced with a problem like this, the first thing that comes to mind is to restart the server. The tragedy was that after the restart, the service stayed healthy for only about 15 seconds before everything went down again.

     

Judging that the JVM heap had overflowed, I looked at the JVM startup parameters: -Xmx4400m -Xms4400m (the server has 8 GB of memory). The temporary workaround was to increase the heap, setting it to -Xmx6400m -Xms6400m. After making the change and quickly restarting the server, I observed for a while: the core nodes no longer hung up, but watching GC activity with jstat -gcutil showed that Full GCs were still quite frequent, and the server was only barely holding on.

     

A colleague told me that their team had just launched a new query. That made me wonder whether some condition set in that query was the trigger, so I went through the logs carefully. Sure enough, I found two queries in the query log carrying a sort parameter, and the fields being sorted on only had indexed=true set on their schema field nodes, without docValues=true. I instantly understood why the OOM was happening.

 

Cause Analysis:

The fields being sorted on in the client queries had indexed=true in their schema field definitions, but docValues=true was not enabled. As a result, when Solr sorts the hit result set it first has to load the per-docid values of those fields into heap memory. If a core holds 20 million documents, a single field of type long needs 20,000,000 × 8 bytes, roughly 152 MB of heap, and this memory block has to be rebuilt whenever the document content is updated and the searcher is refreshed. With frequent refreshes, frequent OOMs are easy to imagine.
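As an illustration, here is a minimal sketch of a problematic field definition and its fix (the field name and type are hypothetical, not taken from the actual schema):

<!-- problematic: sortable only through the FieldCache, which uninverts the indexed terms into heap memory -->
<field name="create_time" type="tlong" indexed="true" stored="true" />

<!-- fixed: column-oriented docValues are read outside the heap, so sorting no longer builds a large in-memory array -->
<field name="create_time" type="tlong" indexed="true" stored="true" docValues="true" />

Note that adding docValues to an existing field requires reindexing; documents indexed before the change will not carry docValues for it.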

The docValues mechanism was introduced into the framework back in the Lucene/Solr 4.x releases. According to my current understanding, this storage format has the following three characteristics:

  1. The content is always stored column by column, so retrieving a single column's value by docid is many times faster than digging it out of the row-oriented stored document.
  2. The stored content is physically laid out in docid order. Thanks to this, when sorting documents you only need the docid to compute an offset into the corresponding storage and compare two values read at those offsets, which saves additional IO overhead (this is exactly what org.apache.solr.response.SortingResponseWriter exploits). On top of this, the Solr framework derives some very nice features, such as the streaming export of result sets through "/export" and the streaming expressions built on top of "/export" (a configuration sketch follows this list).
  3. The stored content does not have to be loaded wholesale onto the heap, which is fundamentally different from the old fieldCache mechanism.
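For reference, in early 5.x releases the "/export" handler was not registered implicitly and had to be declared in solrconfig.xml. The sketch below follows my recollection of the Solr 5 reference guide, so verify the invariants against your version; also note that every field used in its sort and fl parameters must have docValues enabled:

<requestHandler name="/export" class="solr.SearchHandler">
  <lst name="invariants">
    <!-- {!xport} plus the xsort writer stream the full result set sorted via docValues -->
    <str name="rq">{!xport}</str>
    <str name="wt">xsort</str>
    <str name="distrib">false</str>
  </lst>
  <arr name="components">
    <str>query</str>
  </arr>
</requestHandler>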

Looking at the execution stack Solr goes through when a query uses a sort field, the call path eventually reaches the getNumerics method of FieldCacheImpl, as follows:

@Override
  public NumericDocValues getNumerics(LeafReader reader, String field, Parser parser, boolean setDocsWithField) throws IOException {
    if (parser == null) {
      throw new NullPointerException();
    }
    // non-null only when docValues=true is enabled for this field in the schema
    final NumericDocValues valuesIn = reader.getNumericDocValues(field);
    if (valuesIn != null) {
      // Not cached here by FieldCacheImpl (cached instead
      // per-thread by SegmentReader):
      return valuesIn;
    } else {
      final FieldInfo info = reader.getFieldInfos().fieldInfo(field);
      if (info == null) {
        return DocValues.emptyNumeric();
      } else if (info.getDocValuesType() != DocValuesType.NONE) {
        throw new IllegalStateException("Type mismatch: " + field + " was indexed as " + info.getDocValuesType());
      } else if (info.getIndexOptions() == IndexOptions.NONE) {
        return DocValues.emptyNumeric();
      }
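      // fall back to the uninverting FieldCache: the indexed terms for this field are
      // re-read and materialized as an on-heap array per segment -- this is what eats memory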
      return (NumericDocValues) caches.get(Long.TYPE).get(reader, new CacheKey(field, parser), setDocsWithField);
    }
  }
 

 

From the code we can see that LeafReader's getNumericDocValues method is called first; whether it returns null depends on whether the field is defined with docValues=true in the schema. When it does return null, the call falls through to the caches of FieldCacheImpl, which has five term-based (uninverting) field-loading strategies prepared:

 

  private Map<Class<?>,Cache> caches;

  FieldCacheImpl() {
    init();
  }

  private synchronized void init() {
    caches = new HashMap<>(6);
    caches.put(Long.TYPE, new LongCache(this));
    caches.put(BinaryDocValues.class, new BinaryDocValuesCache(this));
    caches.put(SortedDocValues.class, new SortedDocValuesCache(this));
    caches.put(DocTermOrds.class, new DocTermOrdsCache(this));
    caches.put(DocsWithFieldCache.class, new DocsWithFieldCache(this));
  }
I was wondering: since the docValues mechanism has been available since Solr 5.0, why keep these fieldCache strategies, which preload term values into memory, in the framework at all? Once a user needs sorting but forgets to set docValues to true in the schema, a large document count is very likely to cause an OOM. Presumably the Solr developers kept them for backward compatibility.

 

Solution:

How do you solve OOMs caused by users misconfiguring the index structure? One option is to write on the wiki how to configure the schema carefully to prevent this kind of problem. That is like painting a double yellow line on an urban road to explicitly tell drivers not to cross it and drive against traffic; the fact is that during rush hour, as long as there is no surveillance, there will always be drivers bold enough to cross the double yellow line. The alternative is what highways do: build a physical barrier in the middle of the road to forcibly prevent crossing into the opposite lane. The cost is indeed a bit high, but it is very effective. We should draw on this experience when building platform products: build a track into the product so that users operate on the established track, and if a user tries to jump off that track, tell them through a friendly feedback message that they have gone off course and need to correct it in time. This approach is likely more effective and friendlier than writing development guidelines on a wiki.

 

So, when the Solr container starts, I perform an operation that clears the caches the Solr framework prepares in advance. If anything then tries an operation such as sort on a field without going through the docValues mechanism, an error is reported immediately, so unreasonable schema settings are caught during development instead of blowing up in production. The code is as follows:

   RemoveFieldCacheListener:

 

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

public class RemoveFieldCacheListener implements ServletContextListener {

	@Override
	public void contextInitialized(ServletContextEvent sce) {
		// swap out the framework's FieldCache strategies before any request is served
		RemoveFieldCacheStrategy.removeFieldCache();
	}

	@Override
	public void contextDestroyed(ServletContextEvent sce) {
	}
}
 RemoveFieldCacheStrategy:

 

import java.io.IOException;
import java.lang.reflect.Field;
import java.util.Map;

import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.uninverting.DocTermOrds;
import org.apache.lucene.uninverting.FieldCache;
import org.apache.lucene.uninverting.FieldCacheImpl;
import org.apache.lucene.uninverting.FieldCacheImpl.Cache;
import org.apache.lucene.uninverting.FieldCacheImpl.CacheKey;
import org.apache.lucene.uninverting.FieldCacheImpl.DocsWithFieldCache;
import org.apache.lucene.util.Accountable;

public class RemoveFieldCacheStrategy {

	@SuppressWarnings("all")
	public static void removeFieldCache() {
		try {
			FieldCacheImpl fieldCacheManager = (FieldCacheImpl) FieldCache.DEFAULT;
			Field cacheField = FieldCacheImpl.class.getDeclaredField("caches");
			cacheField.setAccessible(true);
			// replace the built-in uninverting caches so that a field with indexed=true
			// but without docValues=true can no longer have its term values preloaded
			// onto the heap; an ill-formed query from the business side then fails fast
			// instead of driving the server to OOM
			Map<Class<?>, Cache> caches = (Map<Class<?>, Cache>) cacheField.get(fieldCacheManager);

			FieldCacheImpl.Cache disable = new FieldCacheImpl.Cache(null) {
				@Override
				public Object get(LeafReader reader, CacheKey key, boolean setDocsWithField)
						throws IOException {
					throw new IllegalStateException(
							"you are trying to use sorting, faceting, grouping or another statistics feature; please set docValues='true' on field: ["
									+ key.field + "]");
				}

				@Override
				protected Accountable createValue(LeafReader reader, CacheKey key,
						boolean setDocsWithField) throws IOException {
					return null;
				}
			};
			caches.clear();
			caches.put(Long.TYPE, disable);
			caches.put(BinaryDocValues.class, disable);
			caches.put(SortedDocValues.class, disable);
			caches.put(DocTermOrds.class, disable);
			caches.put(DocsWithFieldCache.class, disable);

		} catch (Exception e) {
			throw new RuntimeException(e);
		}
	}
}
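For the listener to run, it has to be registered in the web.xml of the Solr web application and the compiled classes placed on Solr's classpath. A minimal sketch, assuming a Solr 5.x layout where the descriptor sits at server/solr-webapp/webapp/WEB-INF/web.xml (adjust the listener-class to whatever package the class actually lives in):

<listener>
  <listener-class>RemoveFieldCacheListener</listener-class>
</listener>

With this in place, a sort, facet or group request on a field without docValues fails fast with the IllegalStateException above instead of slowly filling the heap.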

Done!

 

   

      

 

 
