Avoiding the pitfalls of four types of cache

Background

Distribution, caching, asynchrony, and multithreading are often called the four magic weapons of Internet development. In this post I will summarize the pitfalls I have run into with four kinds of cache that come up frequently in real projects.

JVM in-heap cache

The JVM in-heap cache is still widely used in projects because it avoids the network communication failures that centralized caches such as memcache and redis are exposed to.

With in-heap caching you need to pay attention to GC. If the design periodically pulls data from a remote source to refresh the local cache, keep two points in mind: first, do not pull and overwrite the full data set; second, when replacing a large object, do not re-create all of its contents as new objects.

First, full pull-and-overwrite. A full pull incurs large network overhead and causes traffic spikes. Some people say that's fine, we have plenty of bandwidth and intranet traffic is nothing to fear. But one habit worth cultivating for stability is peak shaving and valley filling: let the system run in a steady state. Otherwise, what happens when a large full-cache refresh bursts on top of a traffic spike? By Murphy's Law, anything that can go wrong eventually will. Program defensively.
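One way to avoid full pulls is an incremental (delta) refresh keyed on a version number. The sketch below is only an illustration of the idea; `DeltaService`, `getChangesSince`, and `latestVersion` are hypothetical names, not a real API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: pull only the entries changed since the last sync, instead of the
// full data set. DeltaService is a hypothetical stand-in for your data source.
class DeltaCache {
    interface DeltaService {
        Map<String, String> getChangesSince(long version); // changed key -> value
        long latestVersion();
    }

    private final Map<String, String> cache = new HashMap<>();
    private long lastVersion = 0;

    void refresh(DeltaService service) {
        long latest = service.latestVersion();
        if (latest == lastVersion) {
            return; // nothing changed: no network-heavy full pull at all
        }
        // Apply only the delta; untouched entries keep their existing objects.
        cache.putAll(service.getChangesSince(lastVersion));
        lastVersion = latest;
    }

    String get(String key) {
        return cache.get(key);
    }
}
```

Spreading refreshes out with a small random jitter per instance also helps shave the peak when many machines refresh on the same schedule.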

Now for the wholesale replacement of large objects, which causes GC problems. The pseudocode looks like this:


List<POJO> oldList = initList();

public void refresh() {
    List<POJO> newList = dataFromNetworkService.getAll();
    oldList = new ArrayList<>();
    for (POJO pojo : newList) {
        oldList.add(pojo);
    }
}

If the objects pulled from the network and the objects stored in the cache have the same type, the conversion overhead is small. The list may be very large, but it holds only references; the POJO instances themselves are unchanged. The pseudocode above does create a new list and clumsily copies references one by one where a direct oldList = newList would do, but the traversal never creates new POJO objects. So the only garbage generated here is the old list object itself (ignoring the allocations from the network pull).

But if the code is written like this:


List<POJO2> oldList = initList();

public void refresh() {
    List<POJO1> newList = dataFromNetworkService.getAll();
    oldList = new ArrayList<>();
    for (POJO1 pojo : newList) {
        POJO2 copy = new POJO2();
        BeanUtils.copyProperties(copy, pojo);
        oldList.add(copy);
    }
}

Now the traversal creates a brand-new POJO2 for every original POJO1. These objects are first allocated in the young generation of the heap and, after surviving several young GCs, are promoted to the old generation. This causes frequent GC.
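The reference-swap alternative mentioned earlier (the direct oldList = newList) can be sketched as below. Marking the field volatile is my assumption about the threading model here: it makes the swap immediately visible to concurrent readers.

```java
import java.util.Collections;
import java.util.List;

// Sketch: replace the whole cache by swapping a single reference instead of
// copying element by element. The only new garbage is the old list object
// itself; the POJO instances inside are reused, not re-created.
class RefSwapCache<T> {
    // volatile so reader threads see the new list right after refresh()
    private volatile List<T> current = Collections.emptyList();

    void refresh(List<T> fresh) {
        // Publish an unmodifiable view so readers cannot mutate it mid-iteration.
        current = Collections.unmodifiableList(fresh);
    }

    List<T> snapshot() {
        return current;
    }
}
```

Readers call snapshot() once and iterate over that reference, so a concurrent refresh never changes the list they are walking.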

In the projects I have worked on, one or two full GCs per day is generally considered reasonable. With that baseline, if you know in advance that a big promotion will hit at a certain time, you can trigger a GC ahead of it and avoid a full GC erupting during the peak. For young GC, one collection every five minutes or even less often is considered normal. Kept under this kind of control, scenarios such as flash sales can be handled safely.
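The GC frequencies discussed above can be monitored from inside the JVM with the standard management API; a minimal sketch:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: read cumulative GC counts to track how often collections fire.
// Sampling this periodically and diffing gives collections-per-interval.
// Collector bean names vary by GC algorithm (e.g. "G1 Young Generation").
class GcStats {
    static long totalGcCount() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            if (c >= 0) { // -1 means the count is unavailable for this collector
                count += c;
            }
        }
        return count;
    }
}
```

In practice you would export these counters to your monitoring system and alert when the per-interval rate exceeds the baselines above.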

JVM off-heap cache

Off-heap caches reclaim memory using Java's phantom references (see "Java's strong references, soft references, weak references, and phantom references"). This design avoids JVM GC problems, but if handled badly it can have worse consequences: the whole machine runs out of memory and may go down. Losing one production machine is usually survivable in most companies, since there are redundant machines for disaster recovery. The more common and nastier case is the machine thrashing in swap: alive, but responding so slowly it is effectively half-dead.
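To make the idea concrete, here is a minimal sketch of keeping payload bytes off the GC-managed heap with direct ByteBuffers. Real off-heap caches (OHC, Chronicle Map, etc.) do far more, including strict memory accounting to prevent exactly the runaway growth described above; this is only an illustration of where the bytes live.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Sketch: store values in direct (off-heap) buffers so the payload bytes are
// outside the GC-managed heap; only the small buffer wrappers stay on-heap.
// NOTE: nothing here caps total off-heap usage -- a real cache must account
// for and bound it (e.g. via -XX:MaxDirectMemorySize plus eviction).
class OffHeapCache {
    private final Map<String, ByteBuffer> index = new HashMap<>();

    void put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocateDirect(bytes.length); // off-heap
        buf.put(bytes);
        buf.flip(); // make the written bytes readable
        index.put(key, buf);
    }

    String get(String key) {
        ByteBuffer buf = index.get(key);
        if (buf == null) return null;
        byte[] out = new byte[buf.remaining()];
        buf.duplicate().get(out); // duplicate() leaves the stored position untouched
        return new String(out, StandardCharsets.UTF_8);
    }
}
```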

I haven't hit this problem in production myself, but a colleague ran into it back when he worked at a big tech company.

Some readers will say: just calculate the memory budget carefully and monitor it well. But that relies on humans for emergency response, and humans are the least reliable link in any stability story. Problems rarely occur when someone is alert with nothing else on their plate; the human factor usually makes things worse. For example, during a big promotion traffic grows and thread counts grow; every thread claims thread-stack space, and the system handles more IO, so the OS claims more buffers/cached memory on top of everything else.

Linux buffers/cached

Run top or free on a Linux system and you will see figures for buffers and cached. Note that the free-memory percentage we usually see in monitoring is not free/total but (free + buffers + cached) / total.
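As a rough illustration, that percentage can be computed from /proc/meminfo. This sketch is Linux-only and returns -1 elsewhere or on any read error; note that modern kernels also report MemAvailable, which is a better single estimate.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Sketch: compute (MemFree + Buffers + Cached) / MemTotal from /proc/meminfo.
class MemInfo {
    static double freePercent() {
        Path p = Paths.get("/proc/meminfo");
        try {
            if (!Files.exists(p)) return -1; // not Linux
            Map<String, Long> kb = new HashMap<>();
            for (String line : Files.readAllLines(p)) {
                // lines look like: "MemTotal:       16384256 kB"
                String[] parts = line.trim().split("\\s+");
                if (parts.length >= 2) {
                    kb.put(parts[0].replace(":", ""), Long.parseLong(parts[1]));
                }
            }
            long total = kb.getOrDefault("MemTotal", 0L);
            if (total == 0) return -1;
            long usable = kb.getOrDefault("MemFree", 0L)
                    + kb.getOrDefault("Buffers", 0L)
                    + kb.getOrDefault("Cached", 0L);
            return 100.0 * usable / total;
        } catch (Exception e) {
            return -1;
        }
    }
}
```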

In Linux, buffers are usually the IO cache for block devices; block storage can be loosely understood as writing data directly to the raw disk. Cached is generally the IO cache for the file system, for example the page cache used for memory paging.

It doesn't matter if the distinction stays fuzzy, because in practice the two are often used together, for example when exchanging data with disk or doing network IO. Buffers and cached are used by operating-system processes, but they can be released quickly when a user process needs the memory, which is why they are usually counted as available memory.

Still, watch out: in an IO-intensive system, if buffers/cached are heavily squeezed, IO slows down and system throughput drops. A request may even take several seconds to reach the application, causing timeouts.

Centralized cache

Redis deployments can also run a local proxy that caches hot data on the local machine, so data found locally needs no network round trip. But redis is essentially a key-value store: if you need to fetch a full data set by wildcard and the network fails partway through, data integrity may suffer.

What worries me most about redis, though, is non-standard usage, such as storing a huge value. I won't detail the trouble this causes for the network and for storage; just picture a clogged toilet.
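One common mitigation for big values is splitting them into fixed-size chunks under derived keys. The sketch below uses a plain Map standing in for the redis client, and the "key:index" naming scheme is my assumption, not a redis convention.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch: split a large value into fixed-size chunks stored under derived
// keys ("key:0", "key:1", ...) so no single value becomes a blocking giant.
// A HashMap stands in for the redis client here.
class ChunkedStore {
    static final int CHUNK = 4; // tiny for illustration; real chunks might be ~64 KB

    static int putChunked(Map<String, byte[]> redis, String key, byte[] value) {
        int chunks = (value.length + CHUNK - 1) / CHUNK; // ceiling division
        for (int i = 0; i < chunks; i++) {
            int from = i * CHUNK;
            int to = Math.min(from + CHUNK, value.length);
            redis.put(key + ":" + i, Arrays.copyOfRange(value, from, to));
        }
        // record the chunk count so readers know how many pieces to fetch
        redis.put(key + ":chunks", Integer.toString(chunks).getBytes());
        return chunks;
    }
}
```

Reads then fetch the chunk count and reassemble, keeping each individual redis operation small and non-blocking.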

Summary

Tom Cargill, an object-oriented programming expert at Bell Labs, said:

The first 90% of the code accounts for the first 90% of the development time. The remaining 10% of the code accounts for the other 90% of the development time.

My reading is that the remaining 10% eats up 90% of the time because it exceeds the existing knowledge reserve, forcing last-minute cramming, or even wandering around with a hammer looking for a nail. An alternative:

Keep investing 5% of each week in learning and 10% in thinking, and then use 100% of the time to complete 100% of the development.

 


Origin blog.csdn.net/Jernnifer_mao/article/details/131358516