Redis advanced topics

This chapter mainly covers: 1. some of Redis's more advanced features, 2. some of Redis's underlying principles, and 3. Redis's persistence mechanisms.

1. Redis advanced feature usage

        Message queue   We all know that a Redis list can be used as a first-in-first-out (FIFO) queue. Is there a problem with this? For example, a consumer keeps consuming (lpop) in a loop, which means it has to poll the server continuously. To reduce this overhead, the client can sleep between requests, but problems remain (and a list does not support the produce-once, consume-many scenario):

       1. If the client sleeps, the length of the sleep is hard to tune, so messages lose their real-time quality.

       2. If the producer produces messages faster than the consumer consumes them, then over time unconsumed messages pile up and the list occupies more and more memory.

       Anyone who has used messaging middleware knows the publish-subscribe model, which supports exactly this produce-once, consume-many scenario. Redis supports it natively as well. In this mode, consumers no longer need to keep asking whether the producer has produced a message; producer and consumer are decoupled (they have no direct relationship).

       Subscribing to a channel: There can be many channels; a channel can be understood as a queue, and a subscriber can subscribe to one or more of them. A publisher sends a message to a specified channel, and once the message arrives in the channel, every subscriber (consumer) of that channel receives it. Pay special attention: a message is not persisted; once it has been sent out, it is removed from the queue. So if consumers want to receive a channel's messages, they must subscribe before the channel sends them. A late subscriber receives nothing (none of the channel's earlier messages can be received, because they have already been removed from the queue).

    Normal usage:

      > subscribe channel-1 channel-2   (a subscriber can subscribe to one or more channels at a time)

     > publish channel-1 2673   (a publisher can send a message to only one channel at a time; multiple channels are not supported)

    > unsubscribe channel-2   (unsubscribe)

  Subscribing to channels by pattern. Usage: the ? and * placeholders are supported; ? matches exactly one character, * matches zero or more characters.

  Consumers:

    cli1 > psubscribe *sport
    cli2 > psubscribe stars*
    cli3 > psubscribe stars-NBA

  Producers:

    cli4 > publish NBA-sport KeBi
    cli4 > publish stars-NBA YaoMing
    cli4 > publish New-weather rain
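Redis glob-style channel patterns can be approximated with Python's fnmatch. A minimal sketch (the `receivers` helper is made up for illustration) of which subscriber patterns above would receive each published message:

```python
from fnmatch import fnmatchcase

# The subscriber patterns from the psubscribe examples above
patterns = ["*sport", "stars*", "stars-NBA"]

def receivers(channel):
    """Return the patterns that would receive a message on this channel."""
    return [p for p in patterns if fnmatchcase(channel, p)]

print(receivers("NBA-sport"))    # only "*sport" matches
print(receivers("stars-NBA"))    # both "stars*" and "stars-NBA" match
print(receivers("New-weather"))  # no pattern matches -> nobody gets it
```

Note that cli3's exact channel name also works as a pattern, so a message to stars-NBA is delivered to both cli2 and cli3, while New-weather reaches no one.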

What is a transaction? We all know atomicity: either everything succeeds or everything fails. A single command can in fact guarantee atomicity on its own.

      But in practice we rarely use just one command, and one command often cannot achieve the result we want. So we may need to execute multiple commands in a certain order as one logical unit, which is where the concept of a transaction comes in (the transaction controls the atomicity of that unit of logic).

 Redis transaction control has its own characteristics: 1. commands are queued and executed in order; 2. the queue is executed without being interleaved with other clients' requests.

Commands: multi (open transaction), exec (execute transaction), discard (cancel transaction), watch (monitor)

Usage (with a scenario)

Transfer business: A has 200 yuan and B has 50 yuan. A transfers 100 yuan to B, so A ends with 100 yuan and B ends with 150 yuan.

Note here: multi opens the transaction; nesting multi several layers deep is useless, the effect is the same as calling it once.

The transaction is executed with the exec command. If exec is never executed, none of the queued commands run.

Abandoning midway: you can call discard to clear the transaction queue and give up execution.

watch command: watch can monitor one or more keys. If at least one monitored key is modified between starting the transaction and executing exec, the entire transaction is cancelled (a key merely expiring does not count). Monitoring can be cancelled with unwatch.

         

Note: in a transaction, if an error occurs before exec (for example an invalid command is queued), none of the commands in the transaction queue are executed. But if an error occurs after exec (a command fails at run time), the transaction still completes: the other commands are unaffected, and only the failing command is not executed.

Lua scripts: a Lua script is a bit like a database stored procedure.

Features: 1. multiple commands can be sent at once, reducing network overhead; 2. a Lua script can contain multiple commands that execute as a whole without being interrupted, ensuring atomicity; 3. a script can be saved as a file, enabling reuse of a command set.

Syntax format: eval lua-script key-num [key1 key2 key3 ...] [value1 value2 value3 ...]

 eval means execute a Lua script.
 lua-script is the body of the Lua script.
 key-num indicates how many keys are in the parameter list. Note that the KEYS array in a Redis Lua script is indexed from 1; if there are no key parameters, write 0.
 [key1 key2 key3 ...] are the keys passed to the Lua script as parameters; they can be omitted, but their count must match key-num.
 [value1 value2 value3 ...] are additional arguments passed to the Lua script; they are optional.

Inside the script, use redis.call(command, key [, param1, param2 ...]) to operate on Redis, and use return to hand the result back; otherwise the client gets no result.

Caching Lua scripts. Why? If the script is long, uploading it to the server on every execution costs a relatively large amount of network overhead, so the EVALSHA command is provided.

How is it cached? When the script load command is executed, the server computes the script's SHA1 digest and records it in the script cache. When EVALSHA is executed, Redis looks up the script content in the script cache by the provided digest and executes it if found; otherwise it returns the error: "NOSCRIPT No matching script. Please use EVAL." (Remember that EVALSHA still needs the parameters afterwards, even if key-num is 0.)
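The SCRIPT LOAD / EVALSHA flow can be sketched as a SHA1-keyed cache. A minimal sketch (function names mirror the commands but are illustrative, not real server code):

```python
import hashlib

script_cache = {}  # sha1 digest -> script body, like the server-side cache

def script_load(script):
    """SCRIPT LOAD: store the script and return its SHA1 digest."""
    sha = hashlib.sha1(script.encode()).hexdigest()
    script_cache[sha] = script
    return sha

def evalsha(sha):
    """EVALSHA: look the script up by digest instead of resending its body."""
    if sha not in script_cache:
        raise KeyError("NOSCRIPT No matching script. Please use EVAL.")
    return script_cache[sha]  # a real server would now execute the script

sha = script_load("return redis.call('get', KEYS[1])")
print(sha)           # a 40-character hex digest
print(evalsha(sha))  # found in the cache -> the script body
```

The client then sends only the 40-character digest on every call, which is the whole point: the (possibly long) script body crosses the network once.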

2. Redis underlying principle analysis

   First of all, Redis feels very fast to us, but why is it fast?

How fast depends on the machine, but according to official figures Redis's QPS can reach about 100,000 requests per second.

So what makes Redis fast?

1. Pure in-memory storage. Because it is a KV structure, lookups are O(1).

2. Single-threaded operation. Advantages: a. no overhead of creating threads; b. no CPU context-switch overhead; c. no competition between threads, avoiding the lock problems that competition brings.

      So why is Redis single threaded ?

 Reason: according to the official statement, a single thread is enough, so there is no need for multiple threads (no need to waste CPU resources). CPU is not Redis's bottleneck; memory and network bandwidth are, so Redis uses a single thread.

    So where, concretely, does the speed come from?

Reason: we have to start with the memory where the data lives. We all know that operating on memory is obviously much faster than operating on disk. That is one reason for the speed, and it is also related to how memory itself is organized.

Memory: in the early days, programs addressed data directly by physical memory address.

 Direct physical addressing has defects: 1. in a multi-user, multi-tasking operating system memory is shared, and if each process occupied its own fixed range of physical addresses, main memory would sooner or later be exhausted; ideally, different processes should be able to use the same physical memory at different times. 2. With sharing, it is very likely that one process modifies another process's memory, leading to data corruption, even damage to critical regions, and exceptions in other programs.

To solve this, virtual memory was created: a virtual address layer is inserted between the CPU and physical memory. When a process is created, it is given a virtual address space, and a mapping from virtual addresses to physical addresses is used to locate the real memory, so processes never touch physical addresses directly. Most systems today, Windows, Linux and so on, use virtual memory. On a 32-bit operating system, the virtual address space is 2^32 = 4 GB. On a 64-bit system, is it 2^64? In practice no: such a large virtual address space is impossible to back, would never be used, and would increase the cost of addressing and other system overhead. Linux generally uses no more than 48 bits, i.e. 2^48. In reality, the physical address space is usually much smaller than the virtual address space.

Advantages of virtual memory: 1. it provides a larger, contiguous address space, which makes linking simpler; 2. it isolates the actual physical addresses, so different processes' operations do not affect each other; 3. shared regions can be mapped into different processes' virtual address spaces, enabling memory sharing.

After analyzing memory, the next point is that not just any user program may directly manipulate memory. To guarantee the security of the kernel, the system divides virtual memory into user space and kernel space.

User space: holds the code and data of user programs. Kernel space: holds the kernel's code and data; it is the core of the operating system, independent of ordinary applications, and can access protected memory as well as the underlying hardware devices.

When a thread enters kernel space it runs in kernel mode (it can execute arbitrary instructions and call on all of the system's resources); in user space it runs in user mode (it can perform only simple operations and cannot use system resources directly, but it can issue instructions to the kernel through system interfaces, also known as system calls).

So when a thread comes in and needs to fetch data, it first runs in user space, the virtual address is mapped to a physical one, and then the kernel copies the data over from kernel space. During this period, the thread is blocked: it occupies no CPU and waits to be woken up before continuing.

  So, to solve the thread-blocking problem: 1. use multithreading or a thread pool, but under high concurrency this also consumes resources; 2. poll, requesting at fixed intervals, but this introduces latency. Is there a mechanism that can serve multiple client requests without making each client wait, and notify the thread automatically when its data is ready? There is: I/O multiplexing.

3. Multiplexing (non-blocking I/O multiplexing solves the concurrent-connection problem)

    What is I/O multiplexing?

         I/O refers to network I/O. "Multiple" refers to multiple TCP connections (sockets or channels); "multiplexing" means reusing one or a few threads for all of them. The basic principle is that the application no longer monitors the connections itself; instead, the kernel monitors the file descriptors on the application's behalf.

 There are several implementations of multiplexing. Take select as an example: when the user process calls the multiplexer, the process blocks while the kernel monitors all the sockets the multiplexer is responsible for. As soon as the data for any one socket is ready, the multiplexer returns, and the user process then calls read to copy the data from the kernel buffer into user space. This reduces the total blocking time.
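A minimal Python sketch of the select idea, using a connected socket pair in place of real client connections (this illustrates the pattern, not Redis's actual C event loop):

```python
import selectors
import socket

sel = selectors.DefaultSelector()
client, server_side = socket.socketpair()  # stand-in for one TCP connection
server_side.setblocking(False)
sel.register(server_side, selectors.EVENT_READ)

client.send(b"PING")                 # data arrives on the monitored socket

# One select call watches every registered socket at once; it returns
# only the ones that are ready, so the thread never polls each socket.
events = sel.select(timeout=1)
for key, _mask in events:
    data = key.fileobj.recv(1024)    # copy from kernel buffer to user space
    print(data)                      # b'PING'

sel.close()
client.close()
server_side.close()
```

With many registered sockets, the single thread sleeps in one select call and only wakes to handle connections that actually have data, which is exactly the property the paragraph above describes.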

Memory reclamation mechanism

We all know Redis has an expiration mechanism, so what eviction policies are there? What does data reclamation cover?

My understanding groups them as: 1. policies over keys with an expiration time set; 2. policies over all keys, with no expiration time required; 3. noeviction (do not delete data).

With an expiration time set (chosen among keys that have a TTL): volatile-lru (least recently used), volatile-lfu (least frequently used), volatile-ttl (closest to expiring), volatile-random (random).

No expiration time required (chosen among all keys): allkeys-lru, allkeys-lfu, allkeys-random.

Redis persistence 

   Finally, let's talk about data persistence. Redis keeps its data in memory, and however popular it is now, there will inevitably be moments of downtime, so persistence is a must.

Redis currently provides 2 persistence mechanisms: 1. RDB snapshots (Redis DataBase); 2. AOF (Append Only File).

   RDB trigger mechanism

 1) The trigger conditions configured in the configuration file (besides automatic triggers, RDB can also be triggered manually, below)

2) Shutdown triggers a snapshot, ensuring the server shuts down with its data saved.
3) Flushall also triggers a write, but the resulting RDB file is empty and meaningless (delete dump.rdb to demonstrate).
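For reference, the classic save rules shipped in redis.conf look roughly like this; each line means "snapshot if at least N changes happened within M seconds" (the exact defaults vary by Redis version):

```conf
# BGSAVE if >= 1 change in 900 s, >= 10 in 300 s, or >= 10000 in 60 s
save 900 1
save 300 10
save 60 10000
```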

Manual trigger

 1) save: because Redis is single-threaded, calling save directly on a large dataset blocks Redis; it is not recommended in production.

2) bgsave makes up for save's shortcoming: when the command executes, a child process is forked to write the snapshot while the main thread keeps running. Data written after the fork is not recorded in that snapshot, and the fork itself has a brief impact on the main thread (a very short time).

  Pros and cons of RDB

    Advantages: 1. the file is compact, suitable for backup and disaster recovery; 2. after bgsave forks the child process, the main thread does not have to perform the snapshot I/O itself; 3. RDB restores large datasets faster than AOF.

   Disadvantages: 1. the snapshot granularity is coarse, with no real-time or even second-level persistence, and every bgsave must fork a child process, which is costly if done frequently; 2. durability is weaker, so more data may be lost than with AOF.

    AOF 

    AOF is not enabled by default in Redis. AOF records every write operation in the form of a log, appending it to a file. Once enabled, whenever a command that changes Redis data executes, the command is written into the AOF file. On restart, all recorded commands are executed once to rebuild the data.

    AOF trigger mechanism

1) Configuration file

appendonly: Redis enables only RDB persistence by default; this must be changed to yes to enable AOF
appendfilename "appendonly.aof": the file name; its directory is configured through the dir parameter (config get dir)

AOF is not always persisted to disk in real time. Because of the operating system's caching, an AOF write first lands in an OS buffer; the question is when the buffer's contents get flushed to the AOF file on disk.

AOF persistence strategy (flushing the OS cache to disk); the default is everysec:
 no means never call fsync and let the operating system decide when to sync data to disk: fastest, but not very safe;
 always means fsync on every write, guaranteeing the data reaches disk: very inefficient;
 everysec means fsync once per second, which may lose up to 1 second of data. everysec is the usual
choice, balancing safety and efficiency.
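Putting the settings above together, the relevant redis.conf lines would look roughly like this (illustrative; check your version's shipped config for exact names and defaults):

```conf
appendonly yes                   # turn AOF on (default is no)
appendfilename "appendonly.aof"  # written under the directory from `dir`
appendfsync everysec             # no | always | everysec (default)
```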

    Now a question: if commands keep being appended, the file is bound to grow, and a command may be executed repeatedly, so over a long time many redundant commands accumulate. How is this solved?

  AOF has an internal rewrite mechanism. When a threshold is reached, a rewrite is triggered: all commands are rewritten once, redundant commands are merged, and a new file is generated to replace the original.
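A minimal sketch of what rewriting buys: replay the log to get the final state, then emit one command per live key instead of the whole history (a SET-only toy model; the log contents and helper names are made up for illustration):

```python
# Toy AOF: a history of SET commands, with one key written repeatedly
aof_log = [
    ("SET", "counter", "1"),
    ("SET", "counter", "2"),
    ("SET", "name", "redis"),
    ("SET", "counter", "3"),
]

def replay(log):
    """Rebuild the dataset by applying every logged command in order."""
    state = {}
    for cmd, key, value in log:
        if cmd == "SET":
            state[key] = value
    return state

def rewrite(log):
    """AOF rewrite: one command per live key reproduces the same state."""
    state = replay(log)
    return [("SET", k, v) for k, v in state.items()]

print(len(aof_log))                                 # 4 commands originally
print(len(rewrite(aof_log)))                        # 2 commands after rewrite
print(replay(rewrite(aof_log)) == replay(aof_log))  # same final state: True
```

The rewritten log is shorter but replays to an identical dataset, which is the correctness property the rewrite must preserve.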

        

    What if data is modified during the AOF rewrite?

While the AOF rewrite runs, the main thread: 1. handles normal command requests; 2. appends write commands to the current AOF file; 3. appends write commands to the AOF rewrite buffer, so the new file can catch up.

AOF advantages and disadvantages

 Advantages: 1. AOF offers multiple synchronization frequencies; even with the default of syncing once per second, Redis loses at most 1 second of data.

  2. The granularity of data storage is finer

Disadvantages: 1. sometimes AOF's advantage is also its disadvantage: under high concurrency, performance may suffer; 2. for the same data, the AOF file is usually larger.

 As for the two persistence methods, each has its own pros and cons. If you can tolerate losing a short window of data, RDB is undoubtedly better; but if real-time durability matters, AOF is better. In general the two mechanisms are used together rather than alone; the choice depends on the actual business scenario.

Let's stop here today. The content is still lacking in places, so please give me your advice. Thank you!!!

 


Origin blog.csdn.net/u010200793/article/details/105004205