Interview must-asks [distributed middleware]: Redis, message queues, ES, high availability, database and table sharding

The CAP principle for distributed systems

Consistency (C)
Availability (A)
Partition tolerance (P)
ZooKeeper satisfies CP, but the three properties cannot all be satisfied at the same time. Therefore, to stay highly available, a distributed system gives up strong consistency and only needs to guarantee eventual consistency of the data, as long as the time it takes to converge is within a range users can accept.

Implementation of distributed locks

1. ZooKeeper ephemeral sequential nodes can implement distributed locks, but ZooKeeper is comparatively slow for this, and creating and deleting nodes carries a relatively large overhead.
2. Redis setnx plus an expiration time

Advanced distributed locks with Redisson

Redisson implements distributed locks on top of Redis.
Reentrant lock: RLock lock = redisson.getLock("anyLock");
Fair lock: RLock fairLock = redisson.getFairLock("anyLock");
Semaphore: RSemaphore semaphore = redisson.getSemaphore("semaphore");
Latch: RCountDownLatch latch = redisson.getCountDownLatch("anyCountDownLatch");

Seven transaction propagation behaviors
(1) REQUIRED (default): if a transaction currently exists, join it; if not, create a new transaction.
(2) SUPPORTS: if a transaction currently exists, join it; if not, run non-transactionally.
(3) MANDATORY: if a transaction currently exists, join it; if not, throw an exception.
(4) REQUIRES_NEW: always create a new transaction; if a transaction currently exists, suspend it.
(5) NOT_SUPPORTED: run non-transactionally; if a transaction currently exists, suspend it.
(6) NEVER: run non-transactionally; if a transaction currently exists, throw an exception.
(7) NESTED: if no transaction currently exists, create a new one; if one exists, run as a nested transaction inside it.

Thread pool workflow
When a task is submitted and the number of threads alive in the pool is less than corePoolSize, the pool creates a core thread to handle the submitted task.
If the core threads are full, i.e. the number of threads equals corePoolSize, newly submitted tasks are put into the task queue workQueue to wait for execution.
When the number of threads equals corePoolSize and workQueue is also full, the pool checks whether the number of threads has reached maximumPoolSize; if not, it creates a non-core thread to execute the submitted task.
If the number of threads has reached maximumPoolSize and new tasks keep arriving, the rejection strategy is applied directly.
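The four steps above can be observed with a deliberately tiny pool. This is a minimal sketch, assuming corePoolSize = 1, maximumPoolSize = 2 and a queue capacity of 1; the class name ThreadPoolWorkflowDemo and the helper snapshot are made up for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ThreadPoolWorkflowDemo {

    // Describes the pool as "threads=<n>,queued=<m>".
    private static String snapshot(ThreadPoolExecutor pool) {
        return "threads=" + pool.getPoolSize() + ",queued=" + pool.getQueue().size();
    }

    // Submits four blocking tasks and records how the pool reacts to each one.
    static String run() {
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await();                  // keep the worker busy until released
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        // Assumed sizes: corePoolSize = 1, maximumPoolSize = 2, workQueue capacity = 1
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.AbortPolicy());
        StringBuilder trace = new StringBuilder();
        pool.execute(blocker);                    // below corePoolSize: new core thread
        trace.append(snapshot(pool)).append(" | ");
        pool.execute(blocker);                    // core thread busy: task is queued
        trace.append(snapshot(pool)).append(" | ");
        pool.execute(blocker);                    // queue full: new non-core thread
        trace.append(snapshot(pool)).append(" | ");
        try {
            pool.execute(blocker);                // at maximumPoolSize: rejected
            trace.append("accepted");
        } catch (RejectedExecutionException e) {
            trace.append("rejected");
        }
        release.countDown();                      // let every task finish
        pool.shutdown();
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

With these sizes the trace should read threads=1,queued=0 | threads=1,queued=1 | threads=2,queued=1 | rejected, which mirrors the order core thread, queue, non-core thread, rejection.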
The difference between sleep and wait
sleep belongs to the Thread class, wait belongs to the Object class; sleep does not release the lock, whereas wait does.
Rejection strategies
AbortPolicy: throws an exception (RejectedExecutionException) and refuses the task;
CallerRunsPolicy: if the thread pool has not been shut down, the rejected task is run by the thread that submitted it;
DiscardOldestPolicy: removes the oldest task in the queue, then tries to submit the current task again;
DiscardPolicy: discards the current task without any handling.

In actual use, what is an appropriate thread pool size? What should you watch for when configuring thread pool parameters?
For CPU-intensive work, setting the number of threads = number of CPUs + 1 usually achieves the best utilization.
For I/O-intensive work, a common rule of thumb on the Internet is number of threads = number of CPUs * 2.
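As a rough illustration of the two rules of thumb, the sketch below derives both sizes from the CPU count the JVM reports. The class and method names (PoolSizing, cpuBound, ioBound) are hypothetical, and the formulas are only starting points to tune under real load.

```java
class PoolSizing {

    // CPU-intensive rule of thumb from the text: number of CPUs + 1.
    static int cpuBound(int cpus) {
        return cpus + 1;
    }

    // I/O-intensive rule of thumb from the text: number of CPUs * 2.
    static int ioBound(int cpus) {
        return cpus * 2;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("CPU-bound pool size: " + cpuBound(cpus));
        System.out.println("I/O-bound pool size: " + ioBound(cpus));
    }
}
```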

Which objects can serve as GC roots in reachability analysis?
1. Objects referenced by variables in the local variable table of virtual machine stack frames
2. Objects referenced by static variables and constants in the method area
3. Objects referenced by native methods in the native method stack

In a project that creates many new objects, the young generation should be set larger; in one with many long-lived large objects, the old generation should be set larger. To reduce Full GC, we want commonly used objects to stay in the old generation as much as possible, so the old generation can be set larger.
1) A larger young generation inevitably means a smaller old generation. A large young generation lengthens the interval between ordinary GCs but increases the time of each one; a small old generation leads to more frequent Full GC.
2) A smaller young generation inevitably means a larger old generation. A small young generation makes ordinary GC very frequent, but each one is shorter; a large old generation reduces the frequency of Full GC.

Locating OOM problems
(1): jmap -heap 10765 (10765 is the process ID) shows the allocated size and usage of the young and old generations of the heap;
(2): jstat shows the GC collection status;
(3): jmap -dump:live,format=b,file= dumps the heap to a local file;
(4): open the dump with the MAT tool for analysis.

Zookeeper implements a high-concurrency distributed lock scheme

create /test laogong // create a persistent node
create -e /test laogong // create an ephemeral node
create -s /test // create a sequential node
create -e -s /test // create an ephemeral sequential node
In logic that threads execute concurrently, rely on ZooKeeper's rule that a node cannot be created twice, and have each thread create an ephemeral sequential node before it runs.
1. Because the nodes are sequential, each thread only watches whether the node just before its own has been released, which reduces the herd effect.
2. The nodes are ephemeral. If the service running a thread crashes, its node is deleted quickly and other threads can proceed.
Disadvantages:
1. ZooKeeper's performance may not be as high as a cache service's.
2. If the network jitters and the session connection breaks, ZooKeeper assumes the client is down and deletes its ephemeral node, so another client can acquire the distributed lock.

Redis consistent hashing
Compute the hash of the data and use that value to locate the machine directly, for example hash(a.png) % 4 = 2, without traversing all machines. With consistent hashing, each server's IP or machine ID is hashed to a position, and the positions form a ring.
1. Removing a server only affects the data between that server and the previous server on the ring.
2. The same holds for adding a server: only the data between the new server and the first server encountered when walking counterclockwise is affected.
To solve hash skew in consistent hashing, add virtual nodes, i.e. give each machine multiple hash positions (multiple points) on the ring.
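A hash ring with virtual nodes can be sketched with a sorted map. This is an illustrative sketch only, not how any particular Redis client does it: the class ConsistentHashRing, the use of CRC32 as the hash, the "#i" naming of virtual nodes and the choice to walk toward larger hash values are all assumptions (the walking direction is just a convention).

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // CRC32 is used only because it ships with the JDK; any stable hash would do.
    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Places several points per server on the ring to even out the distribution.
    void addServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.putIfAbsent(hash(server + "#" + i), server);
        }
    }

    // Removes only the points that belong to this server.
    void removeServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(server + "#" + i), server);
        }
    }

    // Finds the first point at or after the key's hash, wrapping around the ring.
    String locate(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers on the ring");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Counts keys that changed owner after removing server C although C never owned them.
    static int unexpectedMoves(int keyCount) {
        ConsistentHashRing ring = new ConsistentHashRing(100);
        ring.addServer("A");
        ring.addServer("B");
        ring.addServer("C");
        String[] before = new String[keyCount];
        for (int i = 0; i < keyCount; i++) {
            before[i] = ring.locate("key-" + i);
        }
        ring.removeServer("C");
        int moved = 0;
        for (int i = 0; i < keyCount; i++) {
            String after = ring.locate("key-" + i);
            if (!before[i].equals("C") && !before[i].equals(after)) {
                moved++;
            }
        }
        return moved;
    }
}
```

Calling ConsistentHashRing.unexpectedMoves(1000) should return 0: keys owned by the surviving servers stay where they were, and only the removed server's keys are reassigned.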

Database transaction isolation levels
4 isolation levels
1. Read uncommitted
Transaction B can read transaction A's uncommitted data, which breaks isolation; once transaction A rolls back, transaction B has read dirty data.
2. Read committed
A transaction only sees data committed by other transactions, but because it sees newly committed data, the result can change when it reads the same data repeatedly. This is a non-repeatable read, which breaks consistency (update and delete).
Transaction A updates or deletes data and commits; the data returned by transaction B's first and second queries then differs, i.e. a non-repeatable read.
3. Repeatable read
Transaction A reads several times and sees data that did not exist before: absent on the first read, present on the second. This is a phantom read (insert).
For example, transaction A's query returns 7 rows; transaction B inserts a row, making 8, and commits. Because row locks rather than a table lock are taken, when transaction A then updates a field for every row in the table it finds that 8 rows were updated: transaction A has hit a phantom read.
4. Serializable

Why is single-threaded Redis so fast?
(1): Pure in-memory operations: it avoids large numbers of database accesses and direct disk reads, so it is not bound by disk latency
(2): Single-threaded operation avoids unnecessary context switches and race conditions; with no multithreading there are no deadlocks
(3): It uses a non-blocking I/O multiplexing mechanism

String
Hash: an array + linked list underneath. Redis's Hash uses separate chaining (the chain address method) to resolve collisions.
List: for example Twitter's following list and follower list. List is implemented as a doubly linked list, which supports reverse lookup and traversal.
Set: internally a HashMap whose values are null; it deduplicates quickly by computing hashes.
Zset: internally uses a HashMap and a skip list (SkipList) to guarantee both storage and ordering of the data.

Redis transactions
(1): MULTI opens a transaction
(2): EXEC executes the commands in the transaction block
(3): DISCARD cancels the transaction
(4): WATCH monitors one or more keys; if a watched key is changed before the transaction executes, the transaction is aborted

Data loss during redis update

Redis Sentinel: Sentinel needs at least 3 instances to be robust.
Cluster monitoring: monitors whether the Redis master and slave processes are working properly.
Message notification: if a Redis instance fails, Sentinel sends an alert notification to the administrator.
Failover: if the master node goes down, service is automatically switched over to a slave node.
Configuration center: if a failover occurs, clients are notified of the new master address.

The Redis expiration strategy is: periodic deletion + a memory eviction mechanism
noeviction: when memory is not enough to hold newly written data, new writes return an error. Hardly anyone uses this; it is really unpleasant.
allkeys-lru: when memory is not enough to hold newly written data, remove the least recently used key from the whole key space (this is the most commonly used).
allkeys-random: when memory is not enough to hold newly written data, remove a random key from the whole key space. Hardly anyone uses this either. Why pick at random? Surely the least recently used key should go.
volatile-lru: when memory is not enough to hold newly written data, remove the least recently used key among the keys that have an expiration time set (generally not a good fit).
volatile-random: when memory is not enough to hold newly written data, remove a random key among the keys that have an expiration time set.
volatile-ttl: when memory is not enough to hold newly written data, among the keys that have an expiration time set, remove those with the earliest expiration time first.
Cache eviction strategies
(1): First In, First Out (FIFO)
(2): Least Frequently Used (LFU)
(3): Least Recently Used (LRU): evicts whatever has gone unused the longest. When there is hot data, LRU is very efficient, but occasional or periodic batch operations make the LRU hit rate drop sharply, and cache pollution becomes fairly serious
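The LRU policy itself is easy to sketch in Java with LinkedHashMap in access order. The class name LruCache is hypothetical, and this only illustrates the policy: Redis approximates LRU by sampling keys rather than keeping an exact ordering like this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true);   // accessOrder = true: reads move an entry to the end
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insert; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");           // "a" is now the most recently used entry
        cache.put("c", 3);        // evicts "b", the least recently used entry
        System.out.println(cache.keySet());
    }
}
```

A one-off batch scan through such a cache would push every hot entry out, which is the cache pollution problem described above.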

Redis expired key deletion strategies
(1): lazy deletion: CPU-friendly, but wastes memory
(2): timed deletion (not commonly used)
(3): periodic deletion: CPU-friendly and saves space

zuul
Zuul gateway: load balancing, request routing, authentication, traffic monitoring; @EnableEurekaClient

LVS + nginx for high-availability load balancing
LVS works at layer 4, with dual-machine hot standby. It can load-balance almost any application and withstands heavy load well; it only distributes request traffic.
nginx is an HTTP and reverse proxy server that works at layer 7; it uses little memory and handles high concurrency. The company uses one unified egress IP that maps to the internal IPs.

Forward proxy
A proxy server is configured on the client (browser), and the client accesses the Internet through it.
Reverse proxy
The client sends its request to the reverse proxy server, which selects a target server, fetches the data, and returns it to the client.
In this case the reverse proxy server and the target server appear to the outside as a single server: the proxy server's address is exposed and the real server's IP address is hidden.

Redis and database cache inconsistency problem

Load balancing is made up of a front-end virtual load balancer and a back-end group of real servers; after a request is sent to the virtual server, it is forwarded to a real server according to the packet forwarding strategy and the load balancing scheduling algorithm.
So-called layer-4 load balancing (LVS, F5) is based on IP + port; layer-7 load balancing (nginx) is based on application-layer information such as the URL.

Cache avalanche: a large number of cache entries expire at the same time.
Handling:
(1): for hot keys, set the key to never expire;
(2): set different cache expiration times;
(3): a two-layer caching strategy, where C1 is short-lived and C2 is long-lived;
(4): cache in both Redis and memcache: request -> redis -> memcache -> db;
(5): refresh the cache on a schedule.
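Setting different expiration times usually comes down to adding random jitter to a base TTL. A minimal sketch, assuming TTLs are expressed in seconds; the class TtlJitter and its method are hypothetical helpers, not part of any Redis client.

```java
import java.util.concurrent.ThreadLocalRandom;

class TtlJitter {

    // Returns baseSeconds plus a random extra between 0 and maxJitterSeconds (inclusive),
    // so keys written in the same batch do not all expire at the same instant.
    static long jitteredTtl(long baseSeconds, long maxJitterSeconds) {
        return baseSeconds + ThreadLocalRandom.current().nextLong(maxJitterSeconds + 1);
    }

    public static void main(String[] args) {
        // Example: a 10-minute base TTL with up to 60 seconds of jitter.
        System.out.println(jitteredTtl(600, 60));
    }
}
```

The returned value would then be passed as the expiration when the key is written.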

Cache penetration: frequent queries for data that does not exist.
Handling:
(1): when a query returns null, cache the null result anyway, with an expiration time of no more than 5 minutes, so that missing data does not hit the database on every request
(2): Bloom filter: map all possibly existing data into a sufficiently large bitmap. Google's Guava library provides a Bloom filter.
A Bloom filter only supports adding data, not deleting it. If deletion is needed, use a cuckoo filter.
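To make the bitmap idea concrete, here is a toy Bloom filter built on java.util.BitSet. It is a sketch under simplifying assumptions: the class SimpleBloomFilter is hypothetical, and the bit positions are derived from String.hashCode with double hashing, which is weaker than what a production library such as Guava uses.

```java
import java.util.BitSet;

class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derives the i-th bit position from two base hashes (double hashing).
    private int index(String item, int i) {
        int h1 = item.hashCode();
        int h2 = (h1 >>> 16) | (h1 << 16);
        return Math.floorMod(h1 + i * h2, size);
    }

    // Sets every bit position that belongs to the item.
    void add(String item) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(item, i));
        }
    }

    // false means "definitely absent"; true means "possibly present".
    boolean mightContain(String item) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A request whose key returns false from mightContain can be rejected before it reaches the cache or the database. There is no remove method because clearing a bit could erase other items that share it, which is why deletion needs a different structure.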

Cache breakdown: a hot key is being accessed concurrently and its cache entry expires at some moment, so a large number of concurrent requests hit the database.
1. Use a Redis distributed lock. It must have an expiration time; otherwise, if the process holding the lock goes offline, the lock can never be released. Setting the lock and its expiration must be a single atomic operation (for example SET with the NX and EX options).
2. With master-slave Redis, locking on the master is enough. A request that can be served by a slave has already been diverted.
3. If it must be locked strictly, use Redlock, which requires acquiring the lock on more than half of the Redis nodes.

A synchronized lock cannot be used: a Java lock only locks an object inside a single process.

Hot keys skew traffic across the cluster. Solutions:
(1): Use a local cache
(2): Use the sharding algorithm to scatter the key: add a prefix or suffix to the hot key so that one hot key becomes N * M keys, where N is the number of Redis instances and M is a multiplier; access to one Redis key then becomes access to N * M Redis keys
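Scattering a hot key with a suffix might look like the sketch below. The class HotKeySplitter, the ":" separator and the numeric suffix scheme are assumptions for illustration; copies stands for N * M.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

class HotKeySplitter {

    // Reads pick one copy at random, so traffic spreads over several keys
    // (and, through the sharding algorithm, over several Redis instances).
    static String keyForRead(String hotKey, int copies) {
        return hotKey + ":" + ThreadLocalRandom.current().nextInt(copies);
    }

    // Writes must refresh every copy, otherwise readers could see stale values.
    static List<String> keysForWrite(String hotKey, int copies) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            keys.add(hotKey + ":" + i);
        }
        return keys;
    }
}
```

The trade-off is write amplification: every update to the hot value has to be applied to all copies.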

What happens if the machine loses power suddenly?
1. It depends on how the sync attribute of the AOF log is configured. If performance is not a concern, sync to disk on every write command and no data is lost.
2. If performance matters, fsync once per second, so the disk is written once per second.

What data structure does Redis have?

String, Hash (dictionary), List, Set, SortedSet (ordered set).
HyperLogLog, Geo, Pub/Sub.

If you want extra credit, mention that you have also played with Redis Modules such as BloomFilter, RedisSearch, and Redis-ML.

BloomFilter, a weapon for avoiding cache penetration

If there are a large number of keys that need to be set to expire at the same time, what should be paid attention to?

1. Spread out the expiration times.
2. Control traffic with a token bucket, a message queue, or load balancing.
3. Use a two-level cache.

E-commerce homepages often use timed tasks to refresh the cache, and a large amount of data may have a very concentrated invalidation time. If the invalidation time is the same, and a large number of users flood in at the time of the invalidation, it may cause a cache avalanche.

Redis distributed lock: what is it?

Use setnx to compete for the lock, then use expire to add an expiration time so it gets released.

What happens if the process crashes unexpectedly, or is restarted for maintenance, after executing setnx but before expire?

The set command takes quite a few parameters, and with them setnx and expire can be combined into a single command!

If there are 100 million keys in Redis, and 10w of them start with a fixed known prefix, how to find them all?

Use the keys command to scan out the list of keys matching the specified pattern.

The other party then asked:

If this redis is providing services to online businesses, what is the problem with using the keys command?

Redis's single thread and the scan command

At this point you have to bring up a key feature of Redis: it is single-threaded. The keys command blocks the thread for a while and the online service stalls until the command finishes. Instead you can use the scan command, which extracts the list of keys matching a pattern without blocking, though with some probability of duplicates; deduplicating once on the client is enough. The overall time taken will, however, be longer than using the keys command directly.

Have you used Redis as an asynchronous queue? How do you use it?

Generally the list structure is used as the queue: rpush produces messages and lpop consumes them. When lpop returns no message, sleep for a while and retry.

If the other party asks whether you can avoid the sleep?

List also has a command called blpop, which blocks when there is no message until one arrives.

If the other party goes on to ask whether a message can be produced once and consumed multiple times?

With the pub/sub topic subscription model, a 1:N message queue can be implemented.

If the other party goes on to ask what the disadvantages of pub/sub are?

When a consumer goes offline, the messages produced in the meantime are lost, so a professional message queue such as RocketMQ has to be used.

How does Redis implement delayed queues?

Use a sortedset, with the timestamp as the score and the message content as the key, and call zadd to produce messages. Consumers poll with the zrangebyscore command to fetch the data that became due N seconds ago and process it.
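The polling logic can be illustrated without a Redis server. The sketch below is an in-memory stand-in, not Redis client code: the class InMemoryDelayQueue is hypothetical, offer plays the role of zadd, and pollDue plays the role of zrangebyscore followed by removing the fetched members.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InMemoryDelayQueue {
    // member -> score, like a sorted set: each message appears once with its due time
    private final Map<String, Long> scores = new HashMap<>();

    // Stand-in for zadd: the score is the time at which the message becomes due.
    void offer(String message, long dueAtMillis) {
        scores.put(message, dueAtMillis);
    }

    // Stand-in for zrangebyscore with an upper bound of "now", then removal of what was fetched.
    List<String> pollDue(long nowMillis) {
        List<String> due = new ArrayList<>();
        for (Map.Entry<String, Long> entry : scores.entrySet()) {
            if (entry.getValue() <= nowMillis) {
                due.add(entry.getKey());
            }
        }
        // Hand messages out in due-time order, earliest first.
        due.sort((a, b) -> Long.compare(scores.get(a), scores.get(b)));
        for (String message : due) {
            scores.remove(message);
        }
        return due;
    }
}
```

With several consumers against a real Redis, fetching and removing would need to be atomic so that a message is not processed twice; this single-threaded sketch does not cover that.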

How does Redis persist? How does the service master-slave data interact?

RDB does full snapshot persistence, and AOF does incremental persistence. Because RDB takes a long time and is not real-time enough, a crash can lose a large amount of data, so AOF is needed alongside it. When a Redis instance restarts, the RDB file is used to rebuild memory, and then the AOF is used to replay recent operations, fully restoring the state before the restart.

This is easy to understand. Think of RDB as the full data of a whole table and AOF as the log of every operation. When the server restarts, you first load all the data in the table, which may be incomplete, and then replay the log to make the data complete. However, Redis's own mechanism is: when AOF persistence is enabled and an AOF file exists, the AOF file is loaded first; when AOF is disabled or no AOF file exists, the RDB file is loaded; once the AOF/RDB file has loaded successfully, Redis starts successfully; if the AOF/RDB file contains an error, Redis fails to start and prints an error message.

The other party asked what would happen if the machine suddenly loses power?

It depends on how the sync attribute of the AOF log is configured. If performance is not a concern, sync to disk on every write command and no data is lost. Under high-performance requirements, however, syncing on every write is unrealistic, so a timed sync is generally used, for example once per second. In that case at most 1s of data is lost.

The other party asked what is the principle of RDB?

You can answer with two words: fork and cow. Fork means Redis performs RDB operations by creating a child process; cow means copy-on-write. After the child process is created, parent and child share the data segment; the parent continues to serve reads and writes, and the pages that get dirtied are gradually separated from the child process's copy.

What are the benefits of pipelines, why use pipelines?

The time for multiple IO round trips can be reduced to one, provided there is no causal dependency between the commands the pipeline executes. When stress testing with redis-benchmark, you can see that an important factor in Redis's peak QPS is the number of commands batched per pipeline.

Do you understand the synchronization mechanism of Redis?

Redis supports master-slave synchronization and slave-slave synchronization. On the first synchronization, the master node runs a bgsave and at the same time records subsequent modifications in a memory buffer. When that completes, the full RDB file is sent to the replica node. Once the replica has finished receiving it, it loads the RDB image into memory. After loading, it notifies the master to send the modifications recorded in the meantime, which the replica replays, and the synchronization process is complete. Subsequent incremental data can be synchronized through the AOF log, somewhat like a database binlog.

Have you ever used a Redis cluster? How to ensure the high availability of the cluster? What is the principle of the cluster?

Redis Sentinel focuses on high availability. When the master goes down, it automatically promotes a slave to master and continues to provide service.

Redis Cluster focuses on scalability. When a single redis memory is insufficient, Cluster is used for shard storage.

Redis implements distributed locks; adding distributed locks with Redisson
ZooKeeper implements distributed locks
setnx

Database and table sharding

How Spring Boot auto-configuration loads
The components of Spring Cloud microservices and how to use them
SQL optimization: how to view the execution plan, and whether an index needs to be added
Indexes: B+ tree and hash; if there is no primary key, how the index is generated
Primary key auto-increment sequences