Nine Mistakes When Using Caches (Repost)

If you want to optimize a frequently visited site or application, you could say that caching is the quickest way to get the most obvious results. In general, we cache data that is used often, or that takes a lot of resources or time to produce, so that later uses are faster.

The benefits of caching could be elaborated at great length, but in practice the results are often unsatisfying. In other words, we assume that using the cache will lift performance to 100 (the number is just a symbol, meant to give a sense of "magnitude"), yet many times the improvement is only 80, 70, or less, and sometimes caching even causes serious performance degradation. This phenomenon is especially prominent with distributed caches.

In this article we will walk through nine crucial problem areas and give corresponding solutions. The demo code is written in .NET, but the material should be just as valuable to friends on other technology platforms; simply swap in the corresponding code!

To make the later discussion easier to follow and the article more complete, let's first look at two forms of caching: the local in-memory cache and the distributed cache.

With the first form, the local in-memory cache, the data is cached in the memory of the local machine, as shown in Figure 1:

From the figure we can clearly see:

• The application caches data in the memory of its own machine and, when the data is needed, fetches it directly from that memory.
• For a .NET application, when fetching data from the cache, the data object is located in memory through an object reference. In other words, once we obtain the data object through its reference, if we modify the object directly, we are actually modifying the real object sitting in the cache.

With a distributed cache, the cached data lives on a cache server; in other words, the application has to cross process boundaries to access the data, as shown in Figure 2:

No matter where the cache server sits, because access crosses processes, and possibly even machines, the cached data must first be serialized before being sent to the cache server; and when the cached data is used, the application server receives the serialized bytes and deserializes them. Serialization and deserialization are very CPU-intensive operations, and many problems arise right there.

In addition, if we modify the data after fetching it, the original data on the cache server is not modified unless we save the data back to the cache server. Note: this is different from the local in-memory cache described above.

For ease of description, we will call each piece of data in the cache a "cache entry."

With these two concepts covered, let's get to today's topic: nine common mistakes when using caches:

1. Relying too much on the default .NET serialization mechanism
2. Caching large objects
3. Using the caching mechanism to share data between threads
4. Assuming data is cached immediately after the cache API is called
5. Caching an entire large data set but reading only part of it
6. Caching object graphs and wasting a lot of memory
7. Caching the application's configuration information
8. Using many different keys that point to the same cached data
9. Not updating or deleting expired or invalid cached data in time

Now let's look at each point in detail!

Relying too much on the default .NET serialization mechanism

When our application uses a cross-process cache, such as the distributed caches memcached or Microsoft AppFabric, the data is cached outside the application's process. Every time we want to cache some data, the cache API first serializes the data into bytes and then sends those bytes to the cache server to store. Similarly, when we want to use the cached data again, the cache server sends the cached bytes back to the application, and the cache client library deserializes them into the data objects we need.

In addition, there are two things to note:

• Serialization and deserialization happen on the application server; the cache server is only responsible for storing the bytes.
• The default serialization mechanism in .NET is not optimal, because it relies on reflection, and reflection is very CPU-intensive, especially when the cached data objects are complex.

To address this, we should choose a good serialization method and reduce CPU usage as much as possible. A commonly used approach is to have our objects implement the ISerializable interface.

First, let's look at what the default serialization mechanism is like. Figure 3 shows it:
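
Since the figure is not reproduced here, the following is a minimal sketch of what it depicts: a class marked [Serializable] pushed through BinaryFormatter, which discovers every field via reflection at run time (Person is a hypothetical example class):

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical example class: the default mechanism only needs the
// [Serializable] attribute; the formatter finds the fields by reflection.
[Serializable]
public class Person
{
    public string Name;
    public int Age;
}

class DefaultSerializationDemo
{
    static void Main()
    {
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            // Every field of Person is located and written via reflection.
            formatter.Serialize(stream, new Person { Name = "Tom", Age = 20 });
        }
    }
}
```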

Then we implement the ISerializable interface ourselves, as shown in Figure 4:
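
Again, as the figure is not reproduced, here is a minimal sketch of the idea using the same hypothetical Person class: by implementing ISerializable, we decide in code exactly what gets written and read back, instead of letting the formatter reflect over the fields:

```csharp
using System;
using System.Runtime.Serialization;

[Serializable]
public class Person : ISerializable
{
    public string Name;
    public int Age;

    public Person() { }

    // Deserialization constructor: read back exactly what we wrote.
    protected Person(SerializationInfo info, StreamingContext context)
    {
        Name = info.GetString("Name");
        Age = info.GetInt32("Age");
    }

    // Serialization: we name each value ourselves, no field reflection.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("Name", Name);
        info.AddValue("Age", Age);
    }
}
```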

The biggest difference between our own implementation and the default .NET serialization mechanism is that ours does not use reflection to discover the data. Implemented this way, serialization can be up to a hundred times faster than the default mechanism.

Some people may think this is nothing: it's just a bit of serialization, is it really worth the fuss?

When developing a high-performance application (for example, a website), everything from architecture to code to deployment needs optimizing. A small problem, such as this serialization issue, may not look like a problem at first glance; but if our application serves millions of requests, or even more, and all of them need to fetch some shared cached data, then this so-called small problem is no longer small!

Next, let's look at the second mistake.

Caching large objects

Sometimes we want to cache large objects because they are expensive to produce: we want to generate one once and use it as many times as possible, thereby improving response times.

Since large objects have come up, it is worth introducing them in a little more depth. In .NET, a so-called large object is an object that occupies more than 85K of memory. The following comparison makes this clear.

Suppose we have a collection of Person objects, defined as List<Person>, and each Person object occupies 1K of memory. If the collection contains 100 Person instances, is the collection a large object?

  The answer is: No!

That is because the collection merely contains references to the Person instances; that is, on the .NET managed heap, the memory allocated to the List<Person> itself is only the size of 100 references.

The following object, on the other hand, is a large object: byte[] data = new byte[87040] (85 * 1024 = 87040).

While we're here, let's talk about why large objects are expensive to produce.

In .NET, large objects are allocated on the large object heap (we will call it the "large heap"; correspondingly, there is a "small heap"), and the allocation mechanism there differs from the small heap's: when allocating on the large heap, the runtime always has to search for a suitably sized block of free memory, which results in memory fragmentation and can even cause out-of-memory errors! Figure 5 illustrates this:

Figure 5 makes the following points clear:

• The large heap is not compacted after garbage collection (the small heap is compacted).
• When allocating an object, the runtime needs to traverse the large heap to find a suitable block of space, and that traversal has a cost.
• If a block of free space is smaller than the allocation needs, it cannot be used; it is simply wasted, which also leads to memory fragmentation.

With all that covered, let's get down to business and look at caching large objects.

As mentioned earlier, objects are serialized when cached and deserialized when read back. The larger the cached object (say, 1M or so), the more CPU the whole process consumes.

For such large objects, consider how frequently each is used and whether it is a shared data object or must be produced per user. Once we cache data (especially in a distributed cache), it costs memory on the cache server and CPU on the application server. If the object is not used frequently, I advise generating it each time instead! If the data is shared, I recommend testing thoroughly: compare the cost of producing the large object with the memory and CPU consumed by caching it, and choose whichever is cheaper! If it must be generated per user, see whether it can be broken down into smaller pieces; if it cannot, then cache it, but release it promptly!

Using the caching mechanism to share data between threads

Once data sits in the cache, multiple threads of our program can access it as a shared area. When multiple threads access the cached data, there will be some competition, the kind of problem that frequently shows up with multithreading.

Below we introduce the competition problems that caching brings from two angles: the local in-memory cache and the distributed cache.

  Look at the following piece of code:
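
The original listing is not reproduced here; the following is a minimal sketch of the kind of code involved, assuming a local in-memory cache (MemoryCache) and a hypothetical cache key named item: three threads each read the value, increment it, and write it back with no locking.

```csharp
using System;
using System.Runtime.Caching;
using System.Threading;

class CacheRaceDemo
{
    static readonly ObjectCache Cache = MemoryCache.Default;

    static void Main()
    {
        Cache.Set("item", 0, DateTimeOffset.Now.AddMinutes(10));

        for (int i = 1; i <= 3; i++)
        {
            new Thread(() =>
            {
                // Unsynchronized read-modify-write: a race condition.
                int value = (int)Cache.Get("item");
                Cache.Set("item", value + 1, DateTimeOffset.Now.AddMinutes(10));
            }).Start();
        }
    }
}
```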

With the local in-memory cache, after the three threads in the code above run, thread 1 may see the value of item as 1, thread 2 may see 2, and thread 3 may see 3. Of course, this is not guaranteed! It is only the most likely outcome.

With a distributed cache, it is even harder to say! The modification does not happen immediately in the local machine's memory; it has to complete a cross-process round trip first.

Some cache products have implemented a locking mechanism to solve this problem, for example AppFabric. Pay special attention to this point when modifying cached data.
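
As a hedged sketch of what AppFabric's pessimistic locking looks like (this assumes the Microsoft.ApplicationServer.Caching client library and an already-configured DataCache instance):

```csharp
using System;
using Microsoft.ApplicationServer.Caching;

class LockingDemo
{
    static void IncrementItem(DataCache cache)
    {
        DataCacheLockHandle lockHandle;
        // GetAndLock makes other lock-aware readers wait until we release.
        int value = (int)cache.GetAndLock("item", TimeSpan.FromSeconds(5), out lockHandle);
        // Write the new value and release the lock in one call.
        cache.PutAndUnlock("item", value + 1, lockHandle);
    }
}
```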

Assuming data is cached immediately after the cache API is called

Sometimes, having just called the cache API, we think: the data is in place now, so we can read it straight from the cache. Although it works out that way much of the time, it is not absolute! Many problems are created exactly this way!

  Let's explain with an example.

For example, in an ASP.NET application, suppose we call the cache API in a button's Click event and then read the cache while the page renders, as follows:
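
The original listing is not shown; a minimal sketch of the scenario might look like this (Web Forms code-behind, with a hypothetical helper LoadProducts, a hypothetical type Product, and a hypothetical control grid):

```csharp
protected void btnSearch_Click(object sender, EventArgs e)
{
    List<Product> products = LoadProducts(); // hypothetical helper
    // Cache for one minute with an absolute expiration.
    Cache.Insert("products", products, null,
                 DateTime.Now.AddMinutes(1),
                 System.Web.Caching.Cache.NoSlidingExpiration);
}

protected void Page_PreRender(object sender, EventArgs e)
{
    // Risky: assumes the entry inserted in the Click handler still exists.
    List<Product> products = (List<Product>)Cache["products"];
    grid.DataSource = products;
    grid.DataBind();
}
```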

The code above looks reasonable: the button is clicked, the page posts back, the page renders and displays the data, and the process seems fine. But it fails to consider one question: what if the server's memory is tight and the memory is reclaimed? Then, quite possibly, the cached data is simply no longer there!

At this point a friend might ask: is garbage collection really that fast?

That mainly depends on our configuration and processing.

In general, a caching mechanism lets us set an absolute expiration time or a relative (sliding) expiration time; the difference between the two should already be very clear to us, so I won't dwell on it here. For the code above, if we set an absolute expiration of, say, one minute, and the page is processed very slowly, taking longer than a minute, then by the time it renders, the data may no longer be in the cache!

Sometimes, even if we cache data on the first line of code, by the third line, when we read the cache, the data may already be gone. This can happen because the server is under heavy memory pressure and the caching mechanism directly evicted the least recently accessed data. Or the server's CPU is busy, or the network is poor, so that even after serialization the data has not yet been saved to the cache server.

In addition, with ASP.NET and a local in-memory cache, there are also IIS configuration issues to be aware of (there are limits on the cache's memory); we will find a dedicated opportunity to share that knowledge with everyone.

So, every time we use cached data, we must check whether it still exists; otherwise there will be many "object not found" errors, producing what we might call "strange yet perfectly reasonable" phenomena.
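
A minimal sketch of the defensive pattern, continuing the hypothetical example above: always check for null and rebuild the data when the entry has disappeared.

```csharp
List<Product> products = Cache["products"] as List<Product>;
if (products == null)
{
    // The entry expired or was evicted: rebuild and re-cache it.
    products = LoadProducts(); // hypothetical helper
    Cache.Insert("products", products, null,
                 DateTime.Now.AddMinutes(1),
                 System.Web.Caching.Cache.NoSlidingExpiration);
}
```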

Caching an entire large data set but reading only part of it

In many cases we tend to cache a whole set of objects, yet each time we read it, we only read a part. Let's illustrate the problem with an example (the example may not be perfect, but it suffices).

On a shopping site, a common operation is querying product information. If a user enters "25-inch TV" and searches for related products, the back end may query the database and find hundreds of matching rows, and then we cache those hundreds of rows together as one cache entry, with code like the following:
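
A minimal sketch under assumed names (a hypothetical helper QueryProducts and a key derived from the search text):

```csharp
List<Product> products = QueryProducts("25-inch TV"); // hypothetical helper
// Hundreds of rows cached as a single entry.
Cache.Insert("products-25-inch-TV", products);
```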

Meanwhile, the page displays the products found 10 at a time. In effect, for every page view we fetch the data from the cache by this key and then pick out the next 10 items to display.

If you are using a local in-memory cache, this may not be a problem; but with a distributed cache, the problem appears. The following figure clearly illustrates the process:

I believe that after reading the figure, combined with the earlier explanation, the problem is very clear: every page view fetches all the data stored under the cache key, the application server deserializes all of it, yet only 10 items are used.

The fix is to split the data set into smaller cache entries, for example 25-0-10-products, 25-11-20-products, and so on, as shown below:
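
A minimal sketch of the split, reusing the hypothetical names above and following the article's key format:

```csharp
List<Product> products = QueryProducts("25-inch TV"); // hypothetical helper
const int pageSize = 10;
for (int start = 0; start < products.Count; start += pageSize)
{
    int count = Math.Min(pageSize, products.Count - start);
    List<Product> page = products.GetRange(start, count);
    // Key format follows the article's example, e.g. "25-0-10-products".
    string key = string.Format("25-{0}-{1}-products", start, start + count);
    Cache.Insert(key, page);
}
```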

Of course, there are many ways to query and to cache, and many ways to split; here I have only covered some common cases!

Caching object graphs and wasting a lot of memory

To better illustrate this problem, let's first look at the following class structure diagram:

If we need to cache some Customer data, two problems can arise:

1. Because the default .NET serialization mechanism is used, or the appropriate attributes have not been added, the cache ends up holding data that never needed to be cached.

2. When caching the Customer, we simultaneously cache the Customer's Order information in another cache entry for faster access, so the same data ends up cached twice.

  Here, we take a look at these two issues separately.

First, the first issue. If we use a distributed cache to cache some Customer information and we do not implement the Customer's serialization ourselves but rely on the default mechanism, then when the Customer is serialized, every object it references is serialized too, and the objects referenced by those objects are serialized in turn. The final result: the Customer is serialized, the Customer's Order information is serialized, the OrderItem referenced by the Order is serialized, and finally even the Product referenced by the OrderItem is serialized.

The entire object graph gets serialized. If that is what we want, there is no problem; if not, we are wasting a lot of resources. There are two solutions: first, implement serialization ourselves and take full control of which objects get serialized, as already discussed; second, if we use the default serialization mechanism, mark the fields that do not need to be serialized with [NonSerialized].
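
A minimal sketch of the second option, with hypothetical classes mirroring the diagram (note that [NonSerialized] applies to fields):

```csharp
using System;
using System.Collections.Generic;

[Serializable]
public class Customer
{
    public int Id;
    public string Name;

    // Excluded from serialization: the orders no longer drag the
    // whole Order/OrderItem/Product graph into the cache entry.
    [NonSerialized]
    private List<Order> orders;
}

[Serializable]
public class Order
{
    public int OrderId;
}
```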

Now for the second issue, which mostly grows out of the first: when the Customer was originally cached, its associated information, such as Order and Product, was already cached along with it. But many developers are not aware of this, so they cache the Customer's Order information again in other cache entries for convenient access, for example fetching a Customer's Order information by ID, as the following code shows:
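
A minimal sketch of the duplication, with hypothetical keys and helpers:

```csharp
Customer customer = GetCustomer(customerId); // hypothetical helper
// The serialized Customer graph already contains the orders...
Cache.Insert("customer-" + customerId, customer);
// ...yet the same orders are cached a second time under their own key.
Cache.Insert("customer-orders-" + customerId, customer.Orders);
```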

The solution to this problem is also fairly obvious: apply the solution to the first issue!

Caching the application's configuration information

Because a cache has an invalidation detection cycle (as said before, either a fixed absolute expiration time or a relative sliding one), many technicians like to store dynamic information in the cache to take advantage of this feature of the caching mechanism. Caching the program's configuration information is one example.

Because some application configuration can change, the simplest example being the database connection string, code like the following gets written:
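
A minimal sketch under assumed names (hypothetical cache key and connection-string name):

```csharp
string connString = System.Configuration.ConfigurationManager
    .ConnectionStrings["Main"].ConnectionString;
// Re-read from the config file whenever the entry expires.
Cache.Insert("connString", connString, null,
             DateTime.Now.AddMinutes(10),
             System.Web.Caching.Cache.NoSlidingExpiration);
```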

Once this is set up, each time the cached entry expires, the configuration file is re-read; the configuration may now differ from before, and every other place that reads the cache picks up the update. This is especially handy when the same site is deployed on several servers: sometimes we cannot update the configuration file on every server in time, but if the configuration information is kept in a distributed cache, then updating the configuration file on just one site is enough, and all the other sites see the change; the technical staff are happy. OK, this really does look like a good approach (use it if you need it), but not all configuration information should be handled this way. Consider this situation: if the cache server has a problem and goes down, all of the sites using this configuration information may fail together.

For such configuration files, I recommend a monitoring mechanism instead, such as file monitoring: every time the file changes, reload the configuration information.
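
A minimal sketch of file monitoring with FileSystemWatcher (hypothetical path and reload helper):

```csharp
using System.IO;

var watcher = new FileSystemWatcher(@"C:\MySite", "Web.config"); // hypothetical path
watcher.Changed += (sender, e) => ReloadConfiguration();          // hypothetical helper
watcher.EnableRaisingEvents = true;
```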

Using many different keys that point to the same cached data

We sometimes run into this situation: we cache an object using one piece of data as the cache key, and then cache it again using another piece of data, such as an index, as the key, as the following code shows:
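
A minimal sketch of the situation, with hypothetical data and helpers: the same product gets one entry keyed by name and another keyed by index.

```csharp
Product product = GetProduct(index); // hypothetical helper
Cache.Insert(product.Name, product);     // keyed by product name
Cache.Insert(index.ToString(), product); // keyed by index
```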

The reason we write code like this is mainly that we read data from the cache in several different ways. For example, while looping we need to fetch data by an index (index++, and so on), while in other situations we may need a different way, such as fetching a product's information by its name.

If you run into this situation, it is recommended to combine these keys into one, in a form like the following:
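
A minimal sketch of a combined key, continuing the hypothetical example above (the key format is an assumption):

```csharp
// One entry instead of two; the key carries both lookup handles.
string key = string.Format("{0}-{1}", index, product.Name); // e.g. "100-25-inch TV"
Cache.Insert(key, product);
```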

Another common problem: the same data gets cached in different cache entries. For example, if a user searches for 36-inch color TVs, product No. 100 may appear in the results, and we cache that result set. Then another user searches for televisions made by TCL; if product No. 100 appears in those results as well, we cache that result set in another cache entry. At this point, clearly, memory is being wasted.

For such cases, the method I have used before is to create a cache index list, as shown:
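
Since the figure is not reproduced, here is a hedged sketch of the idea as I read it: each product is cached once under its own number, and each query result is cached only as a list of product numbers that point at those entries (all names hypothetical).

```csharp
List<Product> results = QueryProducts("36-inch color TV"); // hypothetical helper
var productIds = new List<int>();
foreach (Product p in results)
{
    Cache.Insert("product-" + p.Id, p); // each product stored only once
    productIds.Add(p.Id);
}
// The query result itself is just an index of ids.
Cache.Insert("query-36-inch-color-TV", productIds);
```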

Of course, there are many details and problems still to be solved, too many to cover one by one; it depends on each application and situation! Better approaches are also very welcome.

Not updating or deleting expired or invalid cached data in time

This should be the most common problem in using caches. For example, suppose we fetch all of a Customer's unprocessed orders and then cache them, with code similar to the following:
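
A minimal sketch under assumed names (hypothetical helper and key):

```csharp
List<Order> pendingOrders = GetPendingOrders(customerId); // hypothetical helper
// Nothing below ever updates this entry when an order is processed.
Cache.Insert("pending-orders-" + customerId, pendingOrders);
```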

Later, one of the user's orders is processed, but the cache is not updated; at that point, the data in the cache is wrong! Of course, I have only listed the simplest scenario here; you can think of other cases in your own products and applications where the cache very likely disagrees with the actual data in the database.

Much of the time, we have to tolerate such brief inconsistency. In fact, there is no perfect solution for this case. It can be done if you insist, for example by walking all the related data in the cache on every change or deletion and updating it, but that is often not worth the candle. Another approach is a compromise: estimate how often the data changes, and set the cache expiration time a little shorter accordingly.

Origin: www.cnblogs.com/q149072205/p/12530719.html