Cache Middleware: Implementing a Cache Architecture (Part 2)

Foreword

Put plainly, a cache architecture is the use of various caching techniques to reduce the pressure on the servers, and ultimately on the database.

Here, again, is the classification of caching techniques proposed earlier in this series:

  • Browser cache
    • Cookie
    • LocalStorage
    • SessionStorage
  • CDN cache
  • Load-balancing layer cache
    • Nginx cache module
    • Squid caching server
    • Lua extensions
  • Application-layer cache
    • ETag
    • ThreadLocal
    • Guava
  • External cache
    • Redis
  • Database cache
    • MySQL cache

The previous post, "Cache Middleware: Implementing a Cache Architecture (Part 1)", briefly described the browser cache, the CDN cache, and the load-balancing layer cache. This post elaborates on the application-layer cache, the external cache, and the database cache.

Application-layer cache

With application-layer caching, a user's request does reach the application server but does not reach the database. This layer concerns the development of the application server itself.

Etag

ETag counts as application-layer caching because the user's request must reach the application layer before the ETag can be checked.

The idea of ETag is this: if two requests have the same content, then their two responses should be the same as well. The response to the first request can therefore serve as the response to the second request.

Of course, in real business two identical requests can have different responses (e.g., two identical bank-balance inquiries returning different balances because a salary payment arrived between them). This touches on the data-consistency problems of caching, which will be mentioned later; I will not go into them here.

How, then, does the application server decide that two requests are the same? It can hash both requests and compare the results. On the HTTP side this involves the 304 status code, the If-None-Match request header, and the ETag response header.

Request Process

Assume the server has already done the corresponding development and configuration (such as registering Spring's ShallowEtagHeaderFilter).

The first request
  1. The client sends request RequestA.
  2. The server receives RequestA and processes it as follows:
    1. Within the application, it computes the MD5 value corresponding to RequestA's response.
    2. It sets the ETag field of the response (ResponseA) header to the MD5 value computed above.
    3. It returns the corresponding page.
  3. The client receives ResponseA and displays it in the browser, and the browser caches ResponseA.
The second request
  1. The client sends request RequestB, whose content is the same as RequestA's (e.g., both request the same page).
  2. The server receives RequestB and processes it as follows:
    1. It computes a fresh ETag for the request and checks whether it matches the value of RequestB's If-None-Match header (that is, the ETag value from ResponseA).
      1. If they match, it sets the response status to 304 and returns the corresponding ResponseB to the client.
  3. The client receives ResponseB, sees that the status is 304, and directly uses the previously cached ResponseA as the response to RequestB.

The above is the functional logic. Expressed as code logic, it actually goes like this:

Client
  1. The client prepares to send a request.
  2. The browser checks whether it holds an ETag value for the page.
  3. If it does, it puts that value into the request header.
  4. Once ready, the browser sends the request to the server.
Server
  1. It checks whether the request header carries a Last-Modified / If-None-Match field.
  2. If the field is present, it makes the following determination:
    1. It computes a fresh ETag for the request and checks whether it matches the value of the request's If-None-Match header (that is, the ETag value from the earlier ResponseA).
      1. If they match, it sets the response status to 304 and returns the corresponding response to the client.
  3. If either of the two conditions above is not met, it follows this logic instead:
    1. Within the application, it computes the MD5 value corresponding to the request's response and stores it in the application.
    2. It returns the corresponding page.
    3. It sets the ETag field of the response header to the MD5 value computed above.
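The server-side decision above can be sketched as follows. This is a minimal, self-contained sketch, not Spring's actual ShallowEtagHeaderFilter: the EtagDemo, etagFor, and handle names are mine, and a real filter would hash the buffered response body inside the servlet filter chain.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class EtagDemo {
    // Compute a quoted MD5 ETag for a response body, the way
    // ShallowEtagHeaderFilter-style filters do.
    static String etagFor(byte[] body) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        StringBuilder hex = new StringBuilder("\"");
        for (byte b : md5.digest(body)) {
            hex.append(String.format("%02x", b));
        }
        return hex.append('"').toString();
    }

    // Server-side decision: 304 if the client's If-None-Match matches
    // the freshly computed ETag, 200 (full response) otherwise.
    static int handle(String ifNoneMatch, byte[] responseBody) throws Exception {
        return etagFor(responseBody).equals(ifNoneMatch) ? 304 : 200;
    }

    public static void main(String[] args) throws Exception {
        byte[] page = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        String etag = etagFor(page);             // first request: 200 plus ETag header
        System.out.println(handle(null, page));  // prints 200
        System.out.println(handle(etag, page));  // prints 304: browser reuses its cache
    }
}
```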

Strictly speaking, this is the caching solution provided by the HTTP protocol itself, and it involves more than just ETag. Of HTTP's five conditional request headers, two are associated with the ETag: If-None-Match and If-Match. The other three are If-Modified-Since, If-Unmodified-Since, and If-Range. If I get the chance, I will write a dedicated blog post on the HTTP protocol. Impatient readers can consult Chapter 7 (especially section 7.8) of the book "HTTP: The Definitive Guide".

Advantages

  • Reduces database load. When the ETag matches, the process returns a 304 status code directly, with no database operation.
  • Reduces application-server load. When the ETag matches, the server returns 304 directly, skipping business operations such as logging.
  • Reduces bandwidth. In the typical request-response model, statistics show the response packet is much larger than the request packet. Returning an empty body with only the headers, a 304 status code, and so on greatly reduces the bandwidth pressure on the system.

Shortcomings

  • A learning investment. To use it well, you need to be familiar with the HTTP caching design (its philosophy, architecture, steps, and so on).
  • Existing business systems need some adjustment.
  • Data-refresh handling: the data's "freshness" must be ensured.
  • CPU usage on the application. Some have suggested that the MD5 computation behind ETag raises CPU utilization on the application system. Two points on that:
    • It depends on whether handling the specific request itself costs more CPU than the MD5 computation does.
    • A reasonably designed cache architecture generally avoids the problem (low-CPU requests such as static resources are simply handled earlier, at the browser, CDN, or load-balancing layers).

Practical application

There are two points worth mentioning for practical use.

  • Because If-None-Match alone has some shortcomings, it is best paired with its natural partner, Last-Modified / If-Modified-Since.
  • In actual development, Spring provides ShallowEtagHeaderFilter; you can also extend it yourself.

PS: some think Last-Modified / If-Modified-Since alone is enough, but used alone it has the following problems:

  • It cannot handle changes within the same second (Last-Modified records time with a minimum unit of one second).
  • Part of the data may have changed even though the part we actually need has not (e.g., a periodic rewrite of identical values).
  • System clocks can conflict across parts of the application system (that is, second-level differences in absolute system time between application-server instances in the cluster; as for synchronizing time across a cluster, I may write a dedicated post in the future, though I realize I keep planting flags).

ThreadLocal

I will not explain here what ThreadLocal is. Readers unfamiliar with it can think of it this way: ThreadLocal behaves like a static Map whose key is the executing thread (the thread calling the instance) and whose value is whatever value that thread has set.
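As a minimal runnable sketch of that mental model (the class and method names here are mine, for illustration): a value set by A is visible to C further down the call chain, without B ever passing it as a parameter.

```java
// Passing per-request context through a call chain A -> B -> C
// without adding parameters to B's interface.
public class ThreadLocalDemo {
    static final ThreadLocal<String> USER = new ThreadLocal<>();

    static String a(String user) {
        USER.set(user);            // A stores the user info for this thread
        try {
            return b();
        } finally {
            USER.remove();         // always clean up, or thread pools leak state
        }
    }

    static String b() { return c(); }   // B needs no user parameter at all

    static String c() { return "hello, " + USER.get(); }  // C reads it back

    public static void main(String[] args) {
        System.out.println(a("alice"));   // prints: hello, alice
    }
}
```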

Advantages

  • (Core) Avoids polluting interface definitions. Suppose the application has a call chain A -> B -> C (within the same JVM), but only A and C use a specific parameter (e.g., user information). For C to be callable, B would have to take that specific parameter (e.g., a user parameter) and pass it through, even though B never uses it. That pollutes B's interface definition (see thread-level caches such as ThreadLocalCache).
  • Data caching. Since ThreadLocal achieves thread safety through stack confinement, it has its uses in certain scenarios.

Shortcomings

  • A ThreadLocal cache means changes to the original system, plus design and learning costs.
  • (Core) A call chain may involve multiple threads and multiple call nodes, so design and troubleshooting become harder.

Practical application

In an IoT project I worked on earlier, the terminal system read sensor data and sensor configuration to assemble the raw data (the raw monitored values plus the corresponding entries in the configuration table, such as hardware identifiers and alarm thresholds). The collected raw data then went through several operations: data cleaning, alarm evaluation, data storage, and so on. The data-cleaning step, however, did not involve the hardware identifiers, alarm thresholds, and the like. So I used ThreadLocal to hold that data (the hardware configuration) and avoid polluting the method interfaces in between. Later, since not all of the steps had ordering requirements, I added event listeners for asynchronous decoupling, reducing system complexity.

GuavaCache

Guava Cache represents the application-level cache, or more precisely, the cache of a single JVM instance. In a stand-alone system we tend not to reach for a distributed cache like Redis (unless we want its data-processing features, such as GEO or set operations), but instead use Guava Cache or a custom cache (custom cache design will get a dedicated blog post later).
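As a taste of what such a single-JVM cache does, here is a minimal size-bounded LRU cache built only on the JDK's LinkedHashMap. It is a sketch of the eviction behavior that libraries like Guava Cache provide, not Guava's API; real Guava Cache adds expiry, statistics, loaders, and more.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal in-JVM LRU cache based on LinkedHashMap's access ordering.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true);      // accessOrder = true -> LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;  // evict the least-recently-used entry
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");              // touch "a" so "b" becomes the eldest
        cache.put("c", 3);           // capacity exceeded: evicts "b"
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```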

Advantages

  • Small footprint. It is, after all, only a caching tool within a stand-alone system.
  • Simple to implement; a cache-management tool like this meets the caching needs of most stand-alone systems.

Shortcomings

  • Its features are less complete than those of distributed caching middleware (especially for custom cache tools).
  • Using a third-party caching tool like Guava carries a certain learning cost.
  • A custom implementation (leaner and more tailored) tends to demand a certain technical level to get good performance (e.g., using SoftReference and the like).
  • Changes to the original application.

External cache

The important representatives of the external cache are distributed caching middleware such as Redis and Memcached. Of course, carving a file system into an external cache is not impossible either, as long as it meets the definition of a cache.

Redis serves as the example here.

Redis

Redis is currently the most popular distributed caching middleware, and its application can fairly be called extensive; it is also my favorite distributed caching middleware. It is an open-source, C-language, memory-based key-value store, accessed over the network, that supports log-based persistence.

Advantages

  • Simple to use. Using Redis by itself could hardly be simpler: even a newcomer can get started, and use it in real application development, in a very short time (and if the project already has the relevant configuration in place and provides the related Utils, it is even more convenient).
  • Strong performance. Even a single Redis instance on an ordinary server can deliver read/write throughput on the order of 100,000 operations per second (subject, of course, to many influencing factors; see Redis's own benchmark).
  • Powerful features. Redis provides GEO operations (computing the distance between two points, and the like), set operations (intersection, union, etc.), and stream operations (similar to a message queue).
  • Many application scenarios, such as a session server (an excellent solution for distributed sessions), counters (INCR), and distributed locks.

Shortcomings

  • A Redis server needs to be deployed, and to ensure availability it usually needs to be deployed as a cluster.
  • It is hard to truly master.
    • Features. Redis is powerful, and there is a great deal inside it, including its persistence mechanisms and memory management.
    • Theory. Redis memory management involves the LRU and LFU algorithms, as well as its own simplified custom variants; its Sentinel mechanism involves distributed election algorithms such as Raft.
    • Deployment. Stand-alone deployment plus the various cluster deployments (for production-grade deployment, you can read my earlier blog post on installing Redis, stand-alone and the various clusters, on Alibaba Cloud).

Practical application

An integrated system I took on earlier (covering social networking, online education, live streaming, etc.) backed its session server with Redis. Sessions were stored in Redis as <SessionId, Session> pairs, and the SessionId was saved in the user's Cookie. (For readers worried about cookies being disabled: that touches on some Cookie background, and in that case the SessionId can be carried in the URL instead.)

Another example (Redis has more application scenarios than I can list): in the IoT project I was responsible for, the alarm module of the control system had this requirement: the same sensor on the same terminal should raise an alarm only once within 30 minutes, to avoid alarm flooding. The control system already used Redis (and could be deployed as a cluster to ensure availability and avoid performance bottlenecks), so I used Redis's set-if-absent and expire features to build the corresponding cache design. I will write a dedicated post later to elaborate.
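The idea can be sketched without a live Redis. In Redis it amounts to one atomic command per alarm attempt, along the lines of SET alarm:<terminal>:<sensor> 1 NX EX 1800, firing the alarm only when the SET succeeds. Below, an in-memory map of expiry timestamps stands in for Redis so the logic is runnable on its own; all names are mine, for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of 30-minute alarm deduplication, simulating Redis "SET ... NX EX".
public class AlarmDedup {
    private final Map<String, Long> expiresAt = new HashMap<>();
    private final long windowMillis;

    public AlarmDedup(long windowMillis) { this.windowMillis = windowMillis; }

    // Returns true if the alarm should fire (i.e., the "SET NX" succeeded).
    public synchronized boolean shouldAlarm(String terminal, String sensor, long nowMillis) {
        String key = "alarm:" + terminal + ":" + sensor;
        Long until = expiresAt.get(key);
        if (until != null && until > nowMillis) {
            return false;                                 // key still alive: suppress
        }
        expiresAt.put(key, nowMillis + windowMillis);     // "EX": start a new window
        return true;
    }

    public static void main(String[] args) {
        AlarmDedup dedup = new AlarmDedup(30 * 60 * 1000L);
        System.out.println(dedup.shouldAlarm("t1", "s1", 0));            // true: first alarm
        System.out.println(dedup.shouldAlarm("t1", "s1", 60_000));       // false: within 30 min
        System.out.println(dedup.shouldAlarm("t1", "s1", 31 * 60_000L)); // true: window passed
    }
}
```

With real Redis, the map and the synchronization disappear: the single SET command is atomic, so clustered control-system instances dedup correctly against the shared Redis.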

Database Cache

The database here means databases such as MySQL and Oracle, not the likes of Redis.

MySQL is the example here, since it should be the one everyone is most familiar with.

MySQL

MySQL's caching mechanism, the query cache, caches SQL text together with the corresponding result; the MySQL server keeps them in memory in KV form. When the MySQL server later encounters the identical SQL statement, it returns the result directly from the cache, with no further SQL parsing, optimization, or execution. (Note that the query cache was deprecated in MySQL 5.7 and removed in MySQL 8.0.)

Some may worry: if the data changes while the request statement is still select * from xxx, won't it keep fetching the old data? Rest assured, MySQL handles this: when data in a table is modified, every cache entry that uses that table is invalidated. For frequently changing tables, therefore, the query cache has little value.
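For reference, a minimal sketch of enabling and inspecting the query cache on a pre-8.0 MySQL server. The variable and status names are MySQL's own; the sizes and the sensor_data table in the last statement are illustrative assumptions of mine.

```sql
-- my.cnf ([mysqld] section), MySQL <= 5.7; the query cache is removed in 8.0:
--   query_cache_type = 1        # cache eligible SELECT results
--   query_cache_size = 64M      # memory reserved for the cache

-- At runtime, inspect configuration and effectiveness:
SHOW VARIABLES LIKE 'query_cache%';
SHOW STATUS LIKE 'Qcache%';   -- Qcache_hits, Qcache_inserts, Qcache_free_memory, ...

-- A statement can also opt out of the cache explicitly:
SELECT SQL_NO_CACHE * FROM sensor_data WHERE terminal_id = 42;
```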

Advantages

  • Better performance. The same statement might take 1 s on first execution, while subsequent executions often take only a few milliseconds.
  • Avoids index lookups. The result for the corresponding SQL request is fetched directly from the cache, with no index query.
  • Fewer database disk operations. The request does reach the database, but if no disk operations (seeking, reading data, etc.) are needed, the operation consumes far less of the database's resources (the most time-consuming operations in a database are indexing and disk access).
  • Lower database resource consumption and faster queries overall. It avoids all the work that normally follows the database receiving the SQL, replacing it with fetching the data from the cache (a KV read whose resource cost is almost negligible).

Shortcomings

  • Applying and configuring the MySQL cache requires sufficient expertise (ordinary back-end developers are usually not deep at this level; it often needs a dedicated DBA).
  • The MySQL cache's matching rules are not smart enough, which raises the threshold for query-cache hits and lowers its efficiency.
  • Checking and cleaning up the MySQL cache consumes a certain amount of resources.
  • The MySQL cache's memory management is imperfect and produces some memory fragmentation. (It seems MySQL does not use memory directly, much like the JVM; if you have a different view, you can message or @ me. After all, databases are not my strong suit, even though I have just taken over database-middleware development at work. Embarrassing.)

Practical application

In the IoT project I took on earlier, both the terminal system and the control system often ran large data queries: a single query frequently involved tens or hundreds of thousands of rows, and the queries could be frequent (the data pages were refreshed repeatedly).

On the one hand, I batched the writes (reducing how often database connections were used), lowering the modification frequency of the relevant tables (from once every few seconds to once per minute). On the other hand, I configured the database cache so that within that minute the database needed no index or disk operations and returned results directly from memory. This effectively improved how the data was presented on the front-end pages.

Of course, later on I adjusted the business slightly for this particular scenario and its needs, which greatly improved the query results and significantly reduced the application's resource consumption (I will write a dedicated post, perhaps even open a dedicated series, to describe designing for business traffic of this particular shape and size).

Bloom filter

Someone once messaged me to say that the Bloom filter should be classified as part of the cache architecture.

At first I thought there was some truth to that, since a Bloom filter does involve cached data: it must record past data in order to work. But on further thought, the Bloom filter should not be classed as a cache, because it is built on top of a cache; it is an application of caching. You can say that Redis is part of the cache architecture, but you cannot call an application server that uses a cache a cache itself. So in the end I did not make the Bloom filter part of the cache architecture. It is, however, very interesting as a kind of filter, a rate-limiting device of sorts, for example as a security measure.

Still, as an extension, a brief word here about the Bloom filter. Put plainly, it uses the mapping property of hashing to filter data. Suppose I keep an array Array of fixed length 100,000 in the application, with all values set to 0. For each user I compute a hash value, take that hash modulo 100,000 to get an index (say 1000), and set the value at that index of Array to 1. Once this is in production, if a request's computed index position in Array holds 0, that user does not exist in the system. (If it holds 1, that still does not prove the requester is a user of the system; hash collisions do occur, though with low probability.) By this means, invalid requests and the like are effectively blocked.
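Here is a runnable sketch of that scheme (the class and method names are mine). It extends the single array above with the usual refinement found in real Bloom filters: several hash functions per key, which lowers the false-positive rate; two seeded hashes stand in for them here.

```java
import java.util.BitSet;

// A fixed-size bit array plus hashing: the filter described above.
public class BloomSketch {
    private final BitSet bits;
    private final int size;

    public BloomSketch(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    private int index(String key, int seed) {
        int h = seed;
        for (int i = 0; i < key.length(); i++) {
            h = h * 31 + key.charAt(i);           // simple polynomial hash
        }
        return Math.floorMod(h, size);            // map into the array
    }

    public void add(String key) {
        bits.set(index(key, 17));
        bits.set(index(key, 101));
    }

    // false => definitely absent; true => probably present (collisions possible)
    public boolean mightContain(String key) {
        return bits.get(index(key, 17)) && bits.get(index(key, 101));
    }

    public static void main(String[] args) {
        BloomSketch users = new BloomSketch(100_000);
        users.add("alice");
        System.out.println(users.mightContain("alice"));   // true
        System.out.println(users.mightContain("mallory")); // false: request can be rejected
    }
}
```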

I may write a dedicated blog post later on the Bloom filter.

Summary

The above covers the relevant knowledge of cache architecture. Of course, this knowledge is fairly coarse-grained; although I gave some practical examples, you still need to adjust and apply it to your specific scenarios. Moreover, this is all relatively common knowledge; specific business scenarios may call for schemes not listed here. Finally, there is no best technology, only the most suitable technology. Many of the techniques here only offer good value once the business reaches a certain scale (data volume, request count, concurrency, etc.), which needs careful consideration.

If you have any questions or ideas, you can message or @ me.

May we all make progress together.

Origin www.cnblogs.com/Tiancheng-Duan/p/12185507.html