Grilled by Baidu for 3 hours and walking away with a big-tech offer — this candidate is tough! (Baidu interview questions)

Foreword:

In the reader community (50+) of 40-year-old architect Nien, members often need to interview at Baidu, Toutiao, Meituan, Ali, JD.com and other major companies.

What follows comes from a community member who successfully passed Baidu's three rounds of technical interviews. After more than three hours of technical grilling, he finally got the offer.

Judging from these questions, Baidu's interviews focus on fundamentals and underlying principles. Let's take a look.

The real interview questions and reference answers have now been put into our collection. Let's see what you need to learn to receive a Baidu offer.

Of course, for intermediate and senior developers, these questions are also worth studying.

The questions and reference answers here are included in V72 of the "Nien Java Interview Collection", for the reference of those who come after, to improve everyone's architecture, design, and development skills for "3-high" (high-concurrency, high-availability, high-performance) systems.

Note: this article is continuously updated in PDF form. For the latest PDF of Nien's architecture notes and interview questions, see the official account [Technical Freedom Circle] at the end of the article.


First interview round (69 min)

1. Talk about cookies and sessions

Cookie and Session are two commonly used techniques in web development; both can be used to pass data between the client and the server.

However, they differ greatly in scope, implementation, and usage scenarios.

Cookies and usage scenarios

A cookie is a browser-side storage technology; note that the emphasis here is on the client side.

Cookies store key-value data for the user on the client (mainly the browser), so that the data can be sent back to the server the next time the user visits the same website.

A cookie can have an expiration time, and the HttpOnly attribute can be set to help protect the user's privacy.

Scenarios for using cookies:

  • Website login: authenticate the user by storing a unique identifier on the client, enabling features such as single sign-on.
  • Website tracking: store statistics on the client to understand users' behavior and habits, so the site can be optimized.
  • Website settings: store preferences on the client to enable personalized recommendations and similar features.

A cookie is a way for a server or script to maintain state on the client under the HTTP protocol.

Cookies are small pieces of text (the contents are often encrypted) that the web server saves in the browser's memory or on the user's local disk (the client), and they can contain information about the user.

Whenever the user connects to the server, the website can read the cookie's information.
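To make this concrete, here is a minimal sketch of issuing a login cookie with the standard Servlet API; the cookie name and value are made up for illustration:

import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServletResponse;

public class CookieDemo {
    // Issue an illustrative login cookie to the client
    public static void issueLoginCookie(HttpServletResponse response) {
        Cookie token = new Cookie("login_token", "abc123"); // hypothetical name/value
        token.setMaxAge(7 * 24 * 3600); // expiration: 7 days, in seconds
        token.setHttpOnly(true);        // hidden from JavaScript, protects privacy
        token.setPath("/");
        response.addCookie(token);      // sent to the browser via the Set-Cookie header
    }
}

On every later request to the same site, the browser sends the cookie back automatically in the Cookie header.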

Session and usage scenarios

Session is a server-side storage technology; note that the emphasis here is on the server side.

A session gives each user an independent, server-side store, so data can be shared across that user's different requests.
A session stores key-value data and can also have an expiration time.

Session usage scenarios:

  • Shopping cart: when the user adds items to the cart, store the item information in the session, so the cart is still there the next time the user opens the cart page.
  • Membership system: when a user registers as a member, store the user's information in the session, so subsequent visits can identify the user and serve them accordingly.
  • Forum system: when a user publishes or replies to a post, store the user's information in the session, so their posting and reply history can be shown on later visits.
  • Any other scenario where data must be shared across requests from the same user.

The difference between Cookie and Session

  • Scope: a cookie is browser-side storage and holds data only on the client; a session is server-side storage, with an independent session per user on the server.
  • Implementation: cookies are data the browser automatically sends to the server with each request; a session is an object created on the server and managed programmatically.
  • Security: a cookie can set the HttpOnly attribute to protect the user's privacy; sensitive session data can additionally be encrypted on the server.
  • Size limit: a single cookie is limited to about 4KB; a session has no fixed limit, but is bounded by server memory and disk space.
  • Expiration control: a cookie sets an expiration time to control how long its data is valid; a session sets an expiration time to control how long the session is valid.
  • Usage scenarios: cookies suit website login, tracking, and site settings; sessions suit shopping carts, membership systems, forums, and the like.

How does the session mechanism let multiple requests from the same user share data?

When the server receives a request, it first checks whether the request already carries a session identifier (sessionID). If it does, a session was created for this client earlier; the server looks that session up by its sessionID and uses it (if it cannot be found, a new one may be created).

If the request carries no sessionID, the server creates a session for the client and generates a sessionID associated with it. The sessionID must be unique and hard to guess or forge (the server generates it automatically), and it is returned to the client in the response so the client can store it.
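This flow maps directly onto the Servlet API; a minimal sketch (the attribute name and value are illustrative):

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;

public class SessionDemo {
    public static void handle(HttpServletRequest request) {
        // getSession(true): look the session up by the request's sessionID,
        // or create a new session (its ID is returned to the client) if none exists
        HttpSession session = request.getSession(true);
        System.out.println("sessionID = " + session.getId());

        // share data across requests from the same user
        session.setAttribute("userId", 42);
        Object userId = session.getAttribute("userId");
    }
}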

2. What data structures does Redis support? And what are Redis's persistence methods?

Redis supports a variety of data structures, including strings, hash tables, lists, sets, and sorted sets.

In detail:

  1. String: the most basic Redis type; it can hold any string or binary value (plain text, serialized objects, numbers, and so on).
  2. Hash: a hash table mapping fields to values under one key; each field is unique within the hash.
  3. List: an ordered sequence of values, implemented as a linked list.
  4. Set: an unordered collection of unique values.
  5. Sorted Set: an ordered collection whose members each carry a score and are kept ordered by that score.

String is the most basic data type and can store any type of data, including binary data.

A hash can be seen as a mapping from string fields to string values,

a list is an ordered list of strings,

a set is an unordered collection of unique strings, while a sorted set associates a score with each member and keeps its members ordered by that score.
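A quick sketch exercising the five structures, assuming the Jedis client (any Redis client exposes the same commands; the key names are made up):

import redis.clients.jedis.Jedis;

public class RedisTypesDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.set("user:1:name", "Alice");            // string: any value, including binary
            jedis.hset("user:1", "age", "30");            // hash: field -> value under one key
            jedis.lpush("user:1:logs", "login", "view");  // list: ordered, duplicates allowed
            jedis.sadd("user:1:tags", "vip", "beta");     // set: unordered, unique members
            jedis.zadd("rank", 99.5, "Alice");            // sorted set: member ordered by score
        }
    }
}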

Redis supports two persistence methods: RDB and AOF.

RDB persistence:

RDB is a snapshot-based persistence method: it saves the Redis dataset to disk so the data can be restored when Redis restarts.

The advantage of RDB is that the snapshot file is compact; the disadvantages are that snapshots are taken periodically, so writes after the last snapshot can be lost on a crash, and a synchronous SAVE blocks the server (which is why the fork-based BGSAVE is normally used).

RDB persistence configuration

Redis dumps a snapshot of the dataset into the dump.rdb file.

We can also change how often the Redis server dumps snapshots through the configuration file. Open the configuration file (6379.conf here) and search for save; you will see the following entries:

save 900 1       # dump a memory snapshot if at least 1 key changed within 900s (15 minutes)

save 300 10      # dump a memory snapshot if at least 10 keys changed within 300s (5 minutes)

save 60 10000    # dump a memory snapshot if at least 10000 keys changed within 60s (1 minute)

AOF persistence:

AOF is an append-only log persistence method: every write operation against Redis is appended to a log file, and on restart Redis replays those writes to restore the data.

Each write to the database is recorded as a log entry, and the dataset can be rebuilt from the log file when the server restarts.

The advantage of AOF is near-real-time durability without periodic backup jobs; the disadvantages are larger files and lower write throughput.

AOF persistence configuration

The Redis configuration file offers three synchronization (fsync) policies:

appendfsync always   # write to the AOF file on every data modification -- safest, slowest

appendfsync everysec # sync once per second; this is the default AOF policy

appendfsync no       # never sync explicitly; efficient, but durability is left to the OS

Each persistence method has its own advantages and disadvantages; choose (or combine) them according to the application scenario.

3. Do you understand Linux? Talk about commonly used shell commands

Linux is a free and open-source Unix-like operating system, widely used on servers, desktop computers, mobile devices, and more. The core idea of Linux is "everything is a file": all hardware devices, file systems, and applications can be accessed and managed as files.

Key features of Linux include:

  1. Open source: the Linux source code is freely available and modifiable, so users can customize it to their own needs.
  2. Free to use: Linux can be used and distributed without paying license fees.
  3. Stable and reliable: long production use and testing have proven Linux a very stable and reliable operating system, suitable for many environments.
  4. Secure: Linux has strict permission management and security mechanisms that help protect the system from malicious attacks and unauthorized access.
  5. Broad applicability: Linux runs on many hardware platforms, from servers to desktops, with a large ecosystem of open-source applications and tools.

Linux has a wide range of applications, including but not limited to:

  1. Server operating system: Linux is one of the most popular server operating systems, widely used for Internet services and applications.
  2. Desktop operating system: distributions such as Ubuntu (with desktop environments such as GNOME or KDE) make Linux usable as a desktop OS.
  3. Embedded systems: Linux runs in all kinds of embedded devices, such as IoT devices and robots.
  4. Workstations and personal computers: Linux also serves as the operating system and toolchain on PCs and workstations.

Commonly used shell commands:

mkdir   # create a directory
cd      # change the working directory
ls      # list files in a directory
pwd     # print the current working directory
ps -ef  # list all running processes
jps     # list running JVM processes (JDK tool)
There are too many to list; see Nien's "Linux Command Encyclopedia: 20K+ Words, Achieving Linux Freedom in One Go".

4. Tell me about the role of message middleware?

Message middleware is software used to deliver messages between components of a distributed system.

Its role is to pass messages from one application to another while providing asynchronous communication, decoupling, reliability, message persistence, message distribution, and similar capabilities.

With message middleware, applications are decoupled: the sender and receiver do not need to know of each other's existence, which improves the system's maintainability and scalability.

The role of message middleware:

  1. Decoupling:
    message middleware decouples communication between applications so they can be scaled and deployed independently; a failure in one application does not take down the others.
  2. Asynchronous calls:
    the sender and receiver can continue executing their own code without waiting for each other's response, improving system throughput and performance.
  3. Reliability:
    mechanisms such as message persistence, retries, and acknowledgments ensure messages are not lost or mishandled.
  4. Flexibility:
    message middleware integrates with different programming languages and operating systems, and offers advanced features such as topics, queues, and routing for different scenarios.

Focus on asynchronous calls

Before looking at middleware, let's first clarify what a synchronous call is.

Suppose two systems need to call each other's interfaces. Without middleware, how is that implemented?

The user sends a request to system A; A calls system B directly after receiving it, and only after B returns can A return the result to the user. This is a synchronous call.

In a synchronous call, the systems depend on each other: when one system sends a request, the others process it in order, and a request is complete for the user only after every system has finished. If any system fails, the user gets an error.

So after introducing middleware, how does the call become asynchronous?

The user sends a request to system A. A publishes a message to MQ, then immediately returns a result to the user, leaving system B alone. System B fetches the message from MQ at its own pace — perhaps a minute or even an hour later — and then performs the operation the message describes.

Now think about it: do systems A and B communicate directly? Is this call synchronous?

Once A has sent the message to the middleware, its own work is done; it does not care when B finishes. And when B pulls the message and executes its part, it does not need to report the result back to A. The whole exchange is therefore asynchronous.
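A minimal sketch of the pattern, with a made-up MqClient interface standing in for a real producer API such as RocketMQ's or Kafka's:

// Hypothetical MQ producer interface -- a stand-in, not a real library API
interface MqClient {
    void send(String topic, String message);
}

class SystemA {
    private final MqClient mq;

    SystemA(MqClient mq) { this.mq = mq; }

    // Asynchronous style: hand the work to MQ and return immediately;
    // system B consumes the message whenever it is ready
    String handleAsync(String request) {
        mq.send("order-topic", request); // "order-topic" is illustrative
        return "accepted";               // the user gets an answer without waiting for B
    }
}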

With that in place, we can summarize: what exactly is message middleware?

Message middleware is an independently deployed system that enables asynchronous calls between other systems. Its role goes well beyond that: it solves a number of technical pain points, introduced next.

In summary, message middleware serves three purposes: asynchronous calls to improve performance, decoupling, and traffic peak shaving.

Asynchronization improves performance

Let's talk about the performance gain first. The middleware introduction above explained how calls become asynchronous, but not how much performance actually improves. Consider the timings below.

Without middleware: the user sends a request to system A, which takes 20ms; system A then calls system B, which takes 200ms. The user experiences a total of 220ms per operation.

What if middleware is introduced?

The user sends a request to system A (20ms), and A spends 5ms sending a message to MQ, returning the result after 25ms in total. The user experiences just 25ms per operation, regardless of when system B picks up the message and performs its part — so by comparison, performance improves dramatically.

Next, let me introduce in detail how message middleware reduces coupling.

Consider the decoupling scenario.

Without middleware, when system A calls system B and B has broken down, the call fails: A receives an exception, must handle it, and can only tell the user to try again later. You then have to wait for system B's engineers to fix the problem; only after everything is resolved can the user retry.

With such an architecture, the two systems are coupled together and the user experience is terrible.

So what does the scenario look like after we introduce middleware? The flow becomes:

For system A, the result is returned directly after sending the message, regardless of how system B operates later.

After system B recovers from a fault, it pulls messages from MQ again and re-executes the unfinished operations. In this process the systems no longer affect each other, and decoupling is achieved.

Finally, let me introduce in detail what traffic peak shaving is.

Suppose our system A is a cluster that does not touch the database, and the cluster itself can withstand 10,000 QPS.

System B operates on a database that can only withstand 6,000 QPS. No matter how much system B scales out, it can still only handle 6,000 QPS — its bottleneck is the database.

If the system's load suddenly reaches 10,000 QPS, the database would simply be crushed. How does introducing MQ solve this?

With MQ in place, system A is unaffected: it can push its full 10,000 QPS of messages into MQ.

System B can then control the speed at which it fetches messages, keeping it below 6,000 QPS and operating at a pace the database can bear. This ensures the database is never overwhelmed.

Of course, in this case a large backlog of messages may build up in MQ.

But a backlog is fine for MQ. Once system A's peak passes and traffic falls back to 1,000 QPS, system B keeps pulling at 6,000 QPS, and the backlog in MQ slowly drains.

That is the process of traffic peak shaving.

Such an architecture suits traffic-spike scenarios like e-commerce flash sales and ticket grabbing.
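A sketch of the consumer side of peak shaving: system B pulls from MQ but caps its database writes at the 6,000 QPS the database can bear. The MqConsumer interface is made up for illustration; the limiter is Guava's RateLimiter:

import com.google.common.util.concurrent.RateLimiter;

// Hypothetical pull-style consumer interface -- a stand-in for a real MQ client
interface MqConsumer {
    String pull(); // next message, or null when the queue is drained
}

class SystemB {
    // cap downstream writes at the database's limit from the example
    private final RateLimiter dbLimiter = RateLimiter.create(6000.0);

    void drain(MqConsumer consumer) {
        String msg;
        while ((msg = consumer.pull()) != null) {
            dbLimiter.acquire();   // blocks until a permit is free, keeping QPS <= 6000
            writeToDatabase(msg);
        }
    }

    private void writeToDatabase(String msg) { /* database write elided */ }
}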

Nien reminds you: message middleware is very important in an architecture.

That's why Nien has also prepared the RocketMQ tetralogy, taking you from the underlying principles to the core mechanics until you are proficient in the RocketMQ message middleware.

5. How to design a high-concurrency and high-availability solution?

Designing a high-concurrency and high-availability solution requires consideration of multiple aspects, including hardware infrastructure, network architecture, application software architecture, and middleware high-availability and high-concurrency architecture.

Hardware infrastructure

1. Server selection

For high availability we need to use multiple servers.

First, we need to choose the configuration of the server. The configuration of the server should be powerful enough to handle high concurrent requests.

We recommend using a server with at least 8-core CPU and 16GB memory, and each server should have at least 2 network cards.

In addition, we recommend using a solid-state drive (SSD) instead of a mechanical hard drive, because SSDs can provide faster read and write speeds.

2. Load Balancer

In order to distribute requests and increase the availability of the system, we need to use a load balancer.

A load balancer can distribute requests to multiple servers to achieve load balancing. We recommend using a hardware load balancer as it provides better performance and reliability. If you use a cloud service provider's load balancer, you need to consider its performance and reliability.

3. Virtualization technology

For rapid deployment and scaling, we recommend the use of virtualization technology.

Virtualization technology can divide a physical server into multiple virtual servers, and each virtual server can run independent operating systems and applications. This allows us to quickly deploy and scale servers while improving resource utilization.

Network Architecture

1. CDN

In order to speed up the transfer of static resources and reduce the load on the server, we recommend using a CDN (Content Delivery Network). CDN can cache static resources on nodes closer to users to improve the transmission speed of resources. We recommend using a global CDN service provider, such as Alibaba Cloud CDN, Tencent Cloud CDN, etc.

2. Firewall and DDoS protection system

To keep the network safe, we need a firewall and a DDoS protection system. A firewall blocks unauthorized access and protects the system from attacks; a DDoS protection system identifies and blocks DDoS attacks to preserve availability. We recommend professional offerings such as Alibaba Cloud Security Group, Tencent Cloud Security Group, etc.

Application high-availability architecture

In order to achieve high availability of applications, they are generally deployed in clusters.

Of course, "cluster deployment" in the early days, with few users, may just mean two application servers with a load-balancing device (such as LVS) in front of them, distributing user requests evenly across the two servers.

If an application server fails at this time, there is another application server that can be used, thus avoiding the single point of failure problem.


High-concurrency application architecture with microservices

Assume the site is estimated at 10 million users. By the rule of 28, 20% of users visit the site each day, i.e. 2 million users per day.

Assume that on average each user clicks 30 times per visit, giving a total of 60 million clicks (PV) per day.

Of the 24 hours in a day, by the rule of 28, most users' activity is concentrated within 24 hours × 0.2 ≈ 5 hours, and "most" of the traffic means 60 million clicks × 0.8 ≈ 50 million clicks.

In other words, about 50 million clicks arrive within 5 hours.

That works out to roughly 3,000 requests per second during the 5-hour active window — within which there may also be a shorter peak period of concentrated visits.

For example, a flood of users within half an hour forms a visit peak.

By common experience, peak traffic is 2 to 3 times the active average. Taking 3×, there may be a short peak of about 10,000 requests per second within those 5 hours.
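Putting the rule-of-28 arithmetic together:

daily active users ≈ 10,000,000 × 20%      = 2,000,000
daily clicks (PV)  ≈ 2,000,000 × 30        = 60,000,000
active window      ≈ 24h × 20% ≈ 5h        = 18,000s
clicks in window   ≈ 60,000,000 × 80%      ≈ 50,000,000
average QPS        ≈ 50,000,000 / 18,000   ≈ 3,000
peak QPS           ≈ 3,000 × 3             ≈ 10,000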

Knowing that the peak is roughly 10,000 requests per second, let's estimate the pressure on each server tier in the system.

Generally, an application server deployed on a virtual machine with one Tomcat can sustain at most a few hundred requests per second.

Taking 500 requests per second per server, supporting 10,000 requests per second at peak requires about 20 application servers.

Moreover, database traffic is several times higher, because each request the application server handles may involve 3 to 5 database accesses on average.

At 3 database accesses per request, that is 30,000 database requests per second.

Given that one database server can sustain at most about 5,000 requests per second, about six database servers are needed for 30,000 requests per second.
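And the resulting server estimates:

app servers:  10,000 peak QPS / 500 QPS per Tomcat       = 20 servers
DB requests:  10,000 QPS × 3 DB accesses per request     = 30,000 QPS
DB servers:   30,000 QPS / 5,000 QPS per database server = 6 servers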

The image servers will also be under heavy pressure, because many pages load large numbers of images. This is hard to estimate precisely, but at least a few thousand requests per second are likely, so multiple image servers are needed to carry the image traffic.

Business vertical split

Generally speaking, the first thing to do at this stage is a vertical split of the business,

because if all business code is mixed and deployed together, it becomes hard to maintain once multiple people collaborate on it.

By the time a website reaches tens of millions of users, the R&D team usually has dozens or even hundreds of people.

Continuing to develop inside one monolithic system at that point is very painful. What's needed is a vertical business split: break the monolith into multiple business systems, each maintained by a dedicated small team of about 10 people.


High-availability and high-concurrency architecture for middleware

Distributed cache high availability and high concurrency architecture

At this point the application-server tier generally poses no major problem: adding machines is enough to absorb higher concurrency.

With about 10,000 requests per second estimated, deploying twenty or thirty machines is fine.

The most stressed part of the architecture above is actually the database tier, with an estimated 30,000 concurrent read/write requests at peak.

This is where a distributed cache must be introduced to absorb read pressure from the database — that is, introduce a Redis cluster.

Database read/write traffic roughly follows the rule of 28, so of the 30,000 requests per second, about 24,000 are reads.

Roughly 90% of those reads can be absorbed by the distributed cache cluster, i.e. about 20,000 read requests per second can be handled by Redis.

We put hot, frequently used data into the Redis cluster, which then serves as a cache.

On a read, check the cache first; only on a miss, read from the database. This way about 20,000 reads per second land on Redis, and about 10,000 reads and writes per second still reach the database.
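This read path is the classic cache-aside pattern. A sketch assuming the Jedis client; loadFromDb is a hypothetical stand-in for your data-access layer:

import redis.clients.jedis.Jedis;

public class CacheAsideReader {
    private final Jedis jedis = new Jedis("localhost", 6379);

    public String get(String key) {
        String value = jedis.get(key);        // 1. try the cache first
        if (value == null) {
            value = loadFromDb(key);          // 2. cache miss: fall back to the database
            if (value != null) {
                jedis.setex(key, 300, value); // 3. populate the cache with a TTL
            }
        }
        return value;
    }

    private String loadFromDb(String key) {
        return null; // hypothetical DAO call elided
    }
}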

Generally a single Redis server can handle tens of thousands of requests per second, so a Redis cluster of 3 machines can comfortably absorb 20,000 read requests per second.

Database master-slave architecture for read-write separation

For the database servers, a master-slave architecture is generally adopted at this stage: deploy a slave library that replicates data from the master.

This way, if the master has a problem, the slave can quickly take over and keep providing database service, avoiding a total system outage caused by a database failure.

The database tier still faces 10,000 requests per second, which is too much pressure for a single server.

But since databases generally support a master-slave architecture, with a slave continuously replicating from the master, read-write separation can be built on top of it.

That is, roughly 6,000 write requests per second go to the master, while roughly 4,000 read requests per second are served from the slave, spreading the 10,000 read/write requests across two servers.

After this split, the master takes at most 6,000 writes per second and the slave at most 4,000 reads per second, which they can just about bear.

Summary

Large-scale website architecture involves far more than this: MQ, CDN, static resources, database and table sharding, NoSQL, search, distributed file systems, reverse proxies, and many other topics that one article cannot cover one by one.

Going forward, 40-year-old architect Nien will keep publishing architecture articles on the official account "Technical Freedom Circle"; stay tuned.

6. Talk about rate-limiting algorithms: counter, leaky bucket, and token bucket

A tip from 40-year-old architect Nien:

Rate limiting is a very common interview question, especially at big tech companies and in senior-level interviews.

Many community members have run into it; one recently reported meeting it again in his third-round interview at Alibaba.

Why rate limit?

Simply put:

Rate limiting caps concurrency and request volume in many scenarios — flash sales, for example — and protects your own system and downstream systems from being overwhelmed by huge traffic.

Take Weibo: when a celebrity announces a relationship, visits jump from the usual 500,000 to 5 million, but the system was provisioned to support at most 2 million. Rate-limiting rules must then be applied to keep the system usable, so the servers don't crash and leave every request unserved.

The idea behind rate limiting

Admit as many users as possible while staying available, and have the rest wait in line or receive a friendly notice — ensuring the users inside the system can work normally and preventing a system avalanche.

Where does daily life need rate limiting?

For example, there is a national scenic spot near me. On ordinary days few people visit, but around May 1st or Spring Festival it is packed, and the managers apply a series of policies to limit the flow of visitors. Why limit the flow?

If the park can hold 10,000 people and 30,000 enter, the crowding becomes dangerous: accidents become likely and everyone's experience is poor. A serious accident could even force the park to close, leaving it unavailable to everyone — and then everyone's experience is terrible.

Rate-Limiting Algorithms

There are many rate-limiting algorithms; three are common: the counter, the leaky bucket, and the token bucket, explained one by one below.

Note the difference between hard limiting (excess requests are rejected) and traffic shaping (all requests are eventually processed); which one you want depends on the business scenario.

(1) Counter:

within a time window, at most a fixed number of requests are processed; the excess is not processed.

(2) Leaky bucket:

the bucket size and the processing (drain) rate are fixed, while the request inflow rate is not; when a burst brings in too many requests, the excess is discarded.

(3) Token bucket:

the bucket size and the token-generation rate are fixed, while the token-consumption (request) rate is not, which lets it absorb bursts at certain moments; each request takes a token from the bucket, and if no token is available the request is discarded.

counter algorithm

Counter rate-limit definition:

within a time window, at most a fixed number of requests are processed; the excess is not processed.

Simple and crude: a fixed thread-pool size, a fixed database connection-pool size, nginx's connection limit and so on are all counter-style limits.

The counter algorithm is the simplest of the rate-limiting algorithms.

For example, suppose we require that interface A be called at most 100 times within one minute.

Then we can do this:

  • At the start, set a counter; each incoming request increments it by 1. If the counter exceeds 100 while the current request is still within 1 minute of the first one, there are too many requests and access is denied.
  • If the current request arrives more than 1 minute after the first one, reset the counter and start a new window. As simple and crude as that.

Counter rate limiter implementation

package com.crazymaker.springcloud.ratelimit;

import lombok.extern.slf4j.Slf4j;
import org.junit.Test;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Counter-based rate limiter
@Slf4j
public class CounterLimiter
{
    // start of the current time window
    private static long startTime = System.currentTimeMillis();
    // window length in ms
    private static long interval = 1000;
    // max permits per window (per second here)
    private static long maxCount = 2;
    // accumulator counting requests in the current window
    private static AtomicLong accumulator = new AtomicLong();

    // count the request and check whether the limit is exceeded
    private static long tryAcquire(long taskId, int turn)
    {
        long nowTime = System.currentTimeMillis();
        if (nowTime < startTime + interval)
        {
            // still inside the current window
            long count = accumulator.incrementAndGet();
            if (count <= maxCount)
            {
                return count;
            } else
            {
                return -count;
            }
        } else
        {
            // outside the window: reset it
            synchronized (CounterLimiter.class)
            {
                log.info("new time window, taskId {}, turn {}..", taskId, turn);
                // check again to avoid a duplicate reset
                if (nowTime > startTime + interval)
                {
                    accumulator.set(0);
                    startTime = nowTime;
                }
            }
            return 0;
        }
    }

    // thread pool for the multi-threaded simulation test
    private ExecutorService pool = Executors.newFixedThreadPool(10);

    @Test
    public void testLimit()
    {
        // number of rejected acquisitions
        AtomicInteger limited = new AtomicInteger(0);
        // number of threads
        final int threads = 2;
        // rounds per thread
        final int turns = 20;
        // synchronizer
        CountDownLatch countDownLatch = new CountDownLatch(threads);
        long start = System.currentTimeMillis();
        for (int i = 0; i < threads; i++)
        {
            pool.submit(() ->
            {
                try
                {
                    for (int j = 0; j < turns; j++)
                    {
                        long taskId = Thread.currentThread().getId();
                        long index = tryAcquire(taskId, j);
                        if (index <= 0)
                        {
                            // accumulate the rejected count
                            limited.getAndIncrement();
                        }
                        Thread.sleep(200);
                    }
                } catch (Exception e)
                {
                    e.printStackTrace();
                }
                // signal that this thread is done
                countDownLatch.countDown();
            });
        }
        try
        {
            countDownLatch.await();
        } catch (InterruptedException e)
        {
            e.printStackTrace();
        }
        float time = (System.currentTimeMillis() - start) / 1000F;
        // print the statistics
        log.info("rejected: " + limited.get() +
                ", passed: " + (threads * turns - limited.get()));
        log.info("rejection ratio: " + (float) limited.get() / (float) (threads * turns));
        log.info("elapsed seconds: " + time);
    }
}

A serious problem with counter-based limiting

Although this algorithm is simple, it has a fatal flaw: the window-boundary problem.

Suppose a malicious user fires 100 requests in the instant before 0:59 and another 100 in the instant after 1:00. Within about one second, they have pushed 200 requests through.

We stipulated at most 100 requests per minute (planned throughput), i.e. about 1.7 requests per second on average — yet by bursting at the reset boundary of the time window, a user can instantly exceed our limit.

A user exploiting this loophole could overwhelm the application in an instant.

Leaky Bucket Algorithm

The basic principle of leaky-bucket rate limiting: water (requests) enters the bucket through the inlet, and the bucket drains (releases requests) at a fixed rate. When the inflow is too fast and the total water would exceed the bucket's capacity, the excess overflows directly, i.e. those requests are rejected.

The general leaky-bucket rules are:

(1) Water (client requests) flows into the bucket at an arbitrary rate.
(2) The bucket's capacity is fixed, and its drain (release) rate is fixed.
(3) The bucket's capacity never changes; if processing is too slow and the water level would exceed the capacity, newly arriving drops overflow, meaning those requests are rejected.

Leaky Bucket Algorithm Principle

The idea of the leaky bucket algorithm is very simple:

Water (requests) first enters the bucket, and the bucket drains at a fixed rate; when water flows in too fast, the excess overflows directly.

The leaky bucket algorithm can thus forcibly limit the data transmission rate.

It can be viewed as a process of pouring and leaking water: water flows into the bucket at any rate and flows out at a constant rate; whatever exceeds the bucket's capacity is discarded, and since the capacity never changes, the overall pace is guaranteed.

  • Constant outflow: water flows out at a fixed rate, because compute capacity is fixed.
  • Peak shaving: when a flood of traffic arrives, the excess overflows, so limiting keeps the service available.
  • Buffering: requests do not hit the server directly, cushioning the pressure.

Leaky Bucket Algorithm Implementation

package com.crazymaker.springcloud.ratelimit;

import lombok.extern.slf4j.Slf4j;
import org.junit.Test;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Leaky-bucket rate limiter
@Slf4j
public class LeakBucketLimiter {

    // time of the last leak calculation
    private static long lastOutTime = System.currentTimeMillis();
    // leak (release) rate: 2 per second
    private static int leakRate = 2;
    // bucket capacity
    private static int capacity = 2;
    // current amount of water in the bucket
    private static AtomicInteger water = new AtomicInteger(0);

    // return value:
    // false -> not limited
    // true  -> limited
    public static synchronized boolean isLimit(long taskId, int turn) {
        // if the bucket is empty, restart leaking from now
        if (water.get() == 0) {
            lastOutTime = System.currentTimeMillis();
            water.addAndGet(1);
            return false;
        }
        // leak water for the elapsed time
        int waterLeaked = ((int) ((System.currentTimeMillis() - lastOutTime) / 1000)) * leakRate;
        // remaining water
        int waterLeft = water.get() - waterLeaked;
        water.set(Math.max(0, waterLeft));
        // update the leak timestamp
        lastOutTime = System.currentTimeMillis();
        // try to add water; if the bucket is not yet full, let the request pass
        if ((water.get()) < capacity) {
            water.addAndGet(1);
            return false;
        } else {
            // bucket full: refuse to add water, i.e. limit the request
            return true;
        }
    }

    // thread pool for the multi-threaded simulation test
    private ExecutorService pool = Executors.newFixedThreadPool(10);

    @Test
    public void testLimit() {
        // number of rejected acquisitions
        AtomicInteger limited = new AtomicInteger(0);
        // number of threads
        final int threads = 2;
        // rounds per thread
        final int turns = 20;
        // synchronizer
        CountDownLatch countDownLatch = new CountDownLatch(threads);
        long start = System.currentTimeMillis();
        for (int i = 0; i < threads; i++) {
            pool.submit(() ->
            {
                try {
                    for (int j = 0; j < turns; j++) {
                        long taskId = Thread.currentThread().getId();
                        boolean intercepted = isLimit(taskId, j);
                        if (intercepted) {
                            // accumulate the rejected count
                            limited.getAndIncrement();
                        }
                        Thread.sleep(200);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
                // signal that this thread is done
                countDownLatch.countDown();
            });
        }
        try {
            countDownLatch.await();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        float time = (System.currentTimeMillis() - start) / 1000F;
        // print the statistics
        log.info("rejected: " + limited.get() +
                ", passed: " + (threads * turns - limited.get()));
        log.info("rejection ratio: " + (float) limited.get() / (float) (threads * turns));
        log.info("elapsed seconds: " + time);
    }
}

Problems with the leaky bucket

The bucket's outflow rate is fixed — that is, requests are released at a fixed rate.

As the saying copied around the Internet goes:

the leaky bucket cannot absorb bursts of traffic, but it does smooth them out (traffic shaping).

The practical problem:

Because the drain rate is fixed, the leaky bucket cannot flexibly adapt when back-end capacity improves.

For example, if dynamic scaling raises back-end capacity from 1,000 QPS to 10,000 QPS, the leaky bucket has no way to take advantage of it.

Token Bucket Rate Limiting

The token bucket algorithm generates tokens at a set rate and puts them into the token bucket. Each user request must take a token; if no token is available, the request is rejected.
When a new request arrives, it takes one token from the bucket, and is refused service if the bucket is empty. The number of tokens is also capped: it grows with elapsed time and the issue rate, and if tokens are issued faster than they are consumed, the bucket gradually fills until tokens occupy its entire capacity.

The general token-bucket rules are:

(1) The inlet puts tokens into the bucket at a fixed rate.
(2) The bucket's capacity is fixed, but the release rate is not: as long as tokens remain in the bucket, an arriving request immediately obtains one and is released.
(3) If tokens are issued more slowly than requests arrive, the bucket runs dry and requests are rejected.

In a word, the token issue rate is configurable, so bursts of egress traffic can be handled effectively.

Token bucket algorithm

The token bucket resembles the leaky bucket, except that the bucket holds tokens, and a request is served only after it obtains one. An analogy: queueing at a cafeteria window works like the leaky bucket. A crowd gathers in front of the window and is served at a fixed rate; if too many people arrive and the cafeteria cannot hold them, some end up standing outside and miss the service — that is overflow. The overflow can keep requesting, i.e. keep queueing. So what's wrong with that?

Now suppose an emergency: some volunteers are in a hurry, or it is college-entrance-exam season. With a leaky bucket they would still have to queue slowly, which does not meet the need. Many application scenarios require not only capping the average transmission rate but also allowing a certain degree of burst. There the leaky bucket may not fit, and the token bucket suits better: the system puts tokens into the bucket at a constant rate, a request must first obtain a token from the bucket to be processed, and when the bucket contains no tokens, service is refused.

Token Bucket Algorithm Implementation

package com.crazymaker.springcloud.ratelimit;

import lombok.extern.slf4j.Slf4j;
import org.junit.Test;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Token-bucket rate limiter
@Slf4j
public class TokenBucketLimiter {

    // time tokens were last issued
    public long lastTime = System.currentTimeMillis();
    // bucket capacity
    public int capacity = 2;
    // token generation rate per second
    public int rate = 2;
    // current number of tokens
    public AtomicInteger tokens = new AtomicInteger(0);

    // return value:
    // false -> not limited
    // true  -> limited
    public synchronized boolean isLimited(long taskId, int applyCount) {
        long now = System.currentTimeMillis();
        // elapsed time in ms
        long gap = now - lastTime;

        // tokens generated during the elapsed time
        int reverse_permits = (int) (gap * rate / 1000);
        int all_permits = tokens.get() + reverse_permits;
        // current tokens, capped at the bucket capacity
        tokens.set(Math.min(capacity, all_permits));
        log.info("tokens {} capacity {} gap {} ", tokens, capacity, gap);

        if (tokens.get() < applyCount) {
            // not enough tokens: reject
            // log.info("limited.." + taskId + ", applyCount: " + applyCount);
            return true;
        } else {
            // tokens available: take them
            tokens.getAndAdd(-applyCount);
            lastTime = now;
            // log.info("tokens left.." + tokens);
            return false;
        }
    }

    // thread pool for the multi-threaded simulation test
    private ExecutorService pool = Executors.newFixedThreadPool(10);

    @Test
    public void testLimit() {
        // number of rejected acquisitions
        AtomicInteger limited = new AtomicInteger(0);
        // number of threads
        final int threads = 2;
        // rounds per thread
        final int turns = 20;

        // synchronizer
        CountDownLatch countDownLatch = new CountDownLatch(threads);
        long start = System.currentTimeMillis();
        for (int i = 0; i < threads; i++) {
            pool.submit(() ->
            {
                try {
                    for (int j = 0; j < turns; j++) {
                        long taskId = Thread.currentThread().getId();
                        boolean intercepted = isLimited(taskId, 1);
                        if (intercepted) {
                            // accumulate the rejected count
                            limited.getAndIncrement();
                        }
                        Thread.sleep(200);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
                // signal that this thread is done
                countDownLatch.countDown();
            });
        }
        try {
            countDownLatch.await();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        float time = (System.currentTimeMillis() - start) / 1000F;
        // print the statistics
        log.info("rejected: " + limited.get() +
                ", passed: " + (threads * turns - limited.get()));
        log.info("rejection ratio: " + (float) limited.get() / (float) (threads * turns));
        log.info("elapsed seconds: " + time);
    }
}

Benefits of the Token Bucket

One benefit of the token bucket is that it easily handles bursts of egress traffic (that is, improvements in back-end capacity).

For example, the token issue rate can be raised; the algorithm then adds tokens to the bucket at the new rate, so the extra egress burst traffic gets processed.

Guava RateLimiter

Guava is an excellent open-source project in the Java world: it packages the core libraries Google uses in its own Java projects, with many useful utilities for collections, caching, concurrency, common annotations, string handling, and I/O. Guava's RateLimiter provides two token-bucket implementations: SmoothBursty and SmoothWarmingUp.
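Basic usage: RateLimiter.create builds a SmoothBursty limiter (the overload taking a warm-up period builds SmoothWarmingUp), and acquire/tryAcquire take permits:

import com.google.common.util.concurrent.RateLimiter;

public class GuavaRateLimiterDemo {
    public static void main(String[] args) {
        // SmoothBursty: a steady 2 permits per second
        RateLimiter limiter = RateLimiter.create(2.0);

        for (int i = 0; i < 5; i++) {
            double waited = limiter.acquire(); // blocks until a permit is free
            System.out.println("request " + i + " waited " + waited + "s");
        }

        // tryAcquire() fails fast instead of waiting
        boolean allowed = limiter.tryAcquire();
        System.out.println("fail-fast request allowed? " + allowed);
    }
}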

Nginx leaky-bucket rate limiting

A simple demonstration of Nginx rate limiting

Each key is allowed one request every ten seconds (rate=6r/m), configured as follows:

limit_req_zone  $arg_sku_id  zone=skuzone:10m      rate=6r/m;
limit_req_zone  $http_user_id  zone=userzone:10m      rate=6r/m;
limit_req_zone  $binary_remote_addr  zone=perip:10m      rate=6r/m;
limit_req_zone  $server_name        zone=perserver:1m   rate=6r/m;

These directives take the rate-limiting key from request parameters or headers: $arg_sku_id reads the sku_id query parameter, $http_user_id reads the user_id request header, $binary_remote_addr keys on the client IP, and $server_name keys on the virtual server.

The key is what requests are counted against when limiting.

The rate-limiting shared-memory zones are defined in the http block:

limit_req_zone  $arg_sku_id  zone=skuzone:10m      rate=6r/m;
limit_req_zone  $http_user_id  zone=userzone:10m      rate=6r/m;
limit_req_zone  $binary_remote_addr  zone=perip:10m      rate=6r/m;
limit_req_zone  $server_name        zone=perserver:1m   rate=10r/s;

Use the rate-limiting zone in a location block, as follows:

#  ratelimit by sku id
location  = /ratelimit/sku {
  limit_req  zone=skuzone;
  echo "正常的响应";
}

Test (in the output below, 正常的响应 = "normal response" and 限流后的降级内容 = "degraded content after rate limiting"):

[root@cdh1 ~]# /vagrant/LuaDemoProject/sh/linux/openresty-restart.sh
shell dir is: /vagrant/LuaDemoProject/sh/linux
Shutting down openrestry/nginx:  pid is 13479 13485
Shutting down  succeeded!
OPENRESTRY_PATH:/usr/local/openresty
PROJECT_PATH:/vagrant/LuaDemoProject/src
nginx: [alert] lua_code_cache is off; this will hurt performance in /vagrant/LuaDemoProject/src/conf/nginx-seckill.conf:90
openrestry/nginx starting succeeded!
pid is 14197


[root@cdh1 ~]# curl  http://cdh1/ratelimit/sku?sku_id=1
正常的响应
root@cdh1 ~]#  curl  http://cdh1/ratelimit/sku?sku_id=1
正常的响应
[root@cdh1 ~]#  curl  http://cdh1/ratelimit/sku?sku_id=1
限流后的降级内容
[root@cdh1 ~]#  curl  http://cdh1/ratelimit/sku?sku_id=1
限流后的降级内容
[root@cdh1 ~]#  curl  http://cdh1/ratelimit/sku?sku_id=1
限流后的降级内容
[root@cdh1 ~]#  curl  http://cdh1/ratelimit/sku?sku_id=1
限流后的降级内容
[root@cdh1 ~]#  curl  http://cdh1/ratelimit/sku?sku_id=1
限流后的降级内容
[root@cdh1 ~]#  curl  http://cdh1/ratelimit/sku?sku_id=1
正常的响应

Taking the key from a header

1. Nginx can read non-standard, user-defined request headers, but underscore support in header names must be enabled under http or server:

underscores_in_headers on;

2. For example, for a custom header X-Real-IP, nginx exposes it as:

$http_x_real_ip (all lowercase, with http_ prefixed)

underscores_in_headers on;

limit_req_zone  $http_user_id  zone=userzone:10m      rate=6r/m;
server {
  listen       80 default;
  server_name  nginx.server *.nginx.server;
  default_type 'text/html';
  charset utf-8;

  #  ratelimit by user id
  location  = /ratelimit/demo {
    limit_req  zone=userzone;
    echo "正常的响应";
  }

  location = /50x.html{
    echo "限流后的降级内容";
  }

  error_page 502 503 =200 /50x.html;
}

Test:

[root@cdh1 ~]# curl -H "USER-ID:1" http://cdh1/ratelimit/demo
正常的响应
[root@cdh1 ~]# curl -H "USER-ID:1" http://cdh1/ratelimit/demo
限流后的降级内容
[root@cdh1 ~]# curl -H "USER-ID:1" http://cdh1/ratelimit/demo
限流后的降级内容
[root@cdh1 ~]# curl -H "USER-ID:1" http://cdh1/ratelimit/demo
限流后的降级内容
[root@cdh1 ~]# curl -H "USER-ID:1" http://cdh1/ratelimit/demo
限流后的降级内容
[root@cdh1 ~]# curl -H "USER-ID:1" http://cdh1/ratelimit/demo
限流后的降级内容
[root@cdh1 ~]# curl -H "USER-ID:1" http://cdh1/ratelimit/demo
限流后的降级内容
[root@cdh1 ~]# curl -H "USER_ID:2" http://cdh1/ratelimit/demo
正常的响应
[root@cdh1 ~]# curl -H "USER_ID:2" http://cdh1/ratelimit/demo
限流后的降级内容
[root@cdh1 ~]#
[root@cdh1 ~]# curl -H "USER_ID:2" http://cdh1/ratelimit/demo
限流后的降级内容
[root@cdh1 ~]# curl -H "USER-ID:3" http://cdh1/ratelimit/demo
正常的响应
[root@cdh1 ~]# curl -H "USER-ID:3" http://cdh1/ratelimit/demo
限流后的降级内容

Three variants of Nginx leaky-bucket limiting: the burst and nodelay parameters in detail

Only one request per key is processed every six seconds (rate=10r/m), as follows:

limit_req_zone  $arg_user_id  zone=limti_req_zone:10m      rate=10r/m;

Leaky-bucket limiting without a buffer queue

limit_req zone=limti_req_zone;

  • Requests are processed strictly at the rate configured in limti_req_zone.
  • Requests beyond that processing capacity are dropped immediately.
  • Incoming requests see no added delay.

Suppose 10 requests are submitted within 1 second: of the 10, 9 fail and return 503 directly.

Checking /var/log/nginx/access.log confirms that only one request succeeded; all the others got 503, i.e. the server rejected them.

Leaky-bucket limiting with a buffer queue

limit_req zone=limti_req_zone burst=5;

  • Requests are processed at the rate configured in limti_req_zone.
  • A buffer queue of size 5 is added; buffered requests wait to be processed slowly.
  • Requests beyond the burst queue length plus the rate's processing capacity are dropped immediately.
  • Incoming requests appear to be received with a delay.

Suppose 10 requests are submitted within 1 second: the server processes 1 immediately, puts 5 into the burst buffer queue, and drops the requests beyond (burst + 1), i.e. 4 requests are discarded outright. The 5 buffered requests are then processed one every 6s.

Then check the /var/log/nginx/access.log log.

Leaky-bucket limiting with instantaneous burst processing

limit_req zone=req_zone burst=5 nodelay;

With nodelay set, the server gains the ability to handle (burst + 1) requests instantaneously; requests beyond that immediately get 503, and requests within the peak allowance do not have to wait.

Suppose 10 requests are submitted within 1 second: the server processes 6 of them at once (burst 5 + the 1 request allowed by the rate) and returns 503 directly for the remaining 4. If another 10 requests are sent in the next second, the server rejects all 10 with 503, because no quota has accrued yet.

Then check the /var/log/nginx/access.log log.

It shows that within 1s the server processed 6 requests (peak capacity: burst + the normal in-rate request) and returned 503 directly for the remaining 4.

The total quota still tracks the configured rate over time: once the quota is used up, new requests are accepted only after quota accrues again. Handling the 5 burst requests at once consumes 5 × 6s = 30s of quota (one request is allowed every 6s), so the next request cannot be processed until 30s later; 10 requests sent to the server at that point would return nine 503s and one 200.

Distributed rate-limiting components

Why distributed rate limiting?

Nginx's rate-limiting directives are only effective within a single node's shared-memory zone, while the external gateways of a production flash-sale (seckill) system are usually deployed on multiple nodes. This is where a distributed rate-limiting component is needed.

A high-performance distributed rate limiter can be built with Redis+Lua; JD.com's flash sales use Redis+Lua for rate limiting. Whether the gateway is an external Nginx gateway or an internal Zuul gateway, the Redis+Lua rate-limiting component can be used.

In theory, rate limiting at the access layer has multiple dimensions:

(1) User-dimension rate limiting: a user is only allowed to submit one request within a certain period. For example, the client IP or user ID can be used as the rate-limiting key.

(2) Product-dimension rate limiting: for the same flash-sale product, only a certain number of requests are allowed within a certain period; the flash-sale product ID can be used as the rate-limiting key.

When to use Nginx rate limiting:

User-dimension rate limiting is best done in Nginx, because storing user IDs in Nginx's rate-limiting shared memory is more efficient than storing one Redis key per user.

When to use Redis+Lua distributed rate limiting:

Product-dimension rate limiting is best done in Redis: it only needs a small number of keys (one per product) to count accesses, and it can cap the total number of flash-sale requests across all access-layer nodes.

Redis+Lua distributed rate-limiting component

--- Execution environment of this script: inside Redis, not inside Nginx

--- Method: acquire tokens
--- -1 failed
--- 1 success
--- @param key    the rate-limiting key
--- @param apply  number of tokens requested
local function acquire(key, apply)
    local times = redis.call('TIME');
    -- times[1] seconds   -- times[2] microseconds
    local curr_mill_second = times[1] * 1000000 + times[2];
    curr_mill_second = curr_mill_second / 1000;

    local cacheInfo = redis.pcall("HMGET", key, "last_mill_second", "curr_permits", "max_permits", "rate")
    --- local variable: time of the previous acquisition
    local last_mill_second = cacheInfo[1];
    --- local variable: tokens left from before
    local curr_permits = tonumber(cacheInfo[2]);
    --- local variable: bucket capacity
    local max_permits = tonumber(cacheInfo[3]);
    --- local variable: token refill rate
    local rate = cacheInfo[4];
    --- local variable: tokens available this time
    local local_curr_permits = 0;

    if (type(last_mill_second) ~= 'boolean' and last_mill_second ~= nil) then
        -- tokens accumulated during the elapsed interval
        local reverse_permits = math.floor(((curr_mill_second - last_mill_second) / 1000) * rate);
        -- total tokens
        local expect_curr_permits = reverse_permits + curr_permits;
        -- total tokens available to acquire, capped at the bucket capacity
        local_curr_permits = math.min(expect_curr_permits, max_permits);
    else
        -- first acquisition
        redis.pcall("HSET", key, "last_mill_second", curr_mill_second)
        local_curr_permits = max_permits;
    end

    local result = -1;
    -- enough tokens to satisfy the request
    if (local_curr_permits - apply >= 0) then
        -- save the remaining tokens
        redis.pcall("HSET", key, "curr_permits", local_curr_permits - apply);
        -- save the timestamp for the next acquisition
        redis.pcall("HSET", key, "last_mill_second", curr_mill_second)
        -- token acquisition succeeded
        result = 1;
    else
        -- token acquisition failed
        result = -1;
    end
    return result
end
-- e.g.
-- /usr/local/redis/bin/redis-cli  -a 123456  --eval   /vagrant/LuaDemoProject/src/luaScript/redis/rate_limiter.lua key , acquire 1  1

-- commands to load the script and get its SHA1 handle
-- /usr/local/redis/bin/redis-cli  -a 123456  script load "$(cat  /vagrant/LuaDemoProject/src/luaScript/redis/rate_limiter.lua)"
-- /usr/local/redis/bin/redis-cli  -a 123456  script exists  "cf43613f172388c34a1130a760fc699a5ee6f2a9"

-- /usr/local/redis/bin/redis-cli -a 123456  evalsha   "cf43613f172388c34a1130a760fc699a5ee6f2a9" 1 "rate_limiter:seckill:1"  init 1  1
-- /usr/local/redis/bin/redis-cli -a 123456  evalsha   "cf43613f172388c34a1130a760fc699a5ee6f2a9" 1 "rate_limiter:seckill:1"  acquire 1

--local rateLimiterSha = "e4e49e4c7b23f0bf7a2bfee73e8a01629e33324b";

--- Method: initialize the rate-limiting key
--- 1 success
--- @param key          key
--- @param max_permits  bucket capacity
--- @param rate         token refill rate
local function init(key, max_permits, rate)
    local rate_limit_info = redis.pcall("HMGET", key, "last_mill_second", "curr_permits", "max_permits", "rate")
    local org_max_permits = tonumber(rate_limit_info[3])
    local org_rate = rate_limit_info[4]

    if (org_max_permits == nil) or (rate ~= org_rate or max_permits ~= org_max_permits) then
        redis.pcall("HMSET", key, "max_permits", max_permits, "rate", rate, "curr_permits", max_permits)
    end
    return 1;
end
-- e.g.
-- /usr/local/redis/bin/redis-cli -a 123456 --eval   /vagrant/LuaDemoProject/src/luaScript/redis/rate_limiter.lua key , init 1  1
-- /usr/local/redis/bin/redis-cli -a 123456 --eval   /vagrant/LuaDemoProject/src/luaScript/redis/rate_limiter.lua  "rate_limiter:seckill:1"  , init 1  1


--- Method: delete the rate-limiting key
local function delete(key)
    redis.pcall("DEL", key)
    return 1;
end
-- e.g.
-- /usr/local/redis/bin/redis-cli  --eval   /vagrant/LuaDemoProject/src/luaScript/redis/rate_limiter.lua key , delete


local key = KEYS[1]
local method = ARGV[1]
if method == 'acquire' then
    return acquire(key, ARGV[2])
elseif method == 'init' then
    return init(key, ARGV[2], ARGV[3])
elseif method == 'delete' then
    return delete(key)
else
    --ignore unknown methods
end

In Redis, to avoid wasting network bandwidth by repeatedly sending the script body, you can use the SCRIPT LOAD command to cache the script; it returns a SHA1 hash that acts as a call handle for the script.

Each subsequent invocation then only needs to send this hash (via EVALSHA).

Distributed token-bucket rate limiting in practice

Redis+Lua can be used here; let's walk through a simple case:

Tokens are put into the bucket at a rate of 1 per second, and the bucket holds at most 2 tokens, so the system sustains 1 request per second on average.

Alternatively, once the bucket has filled up with 2 tokens (every 2 seconds), it can absorb a burst of 2 requests at once, which keeps the system stable.
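To make the calculation concrete, here is a minimal in-process token-bucket sketch in Java (a hypothetical TokenBucket class, for illustration only): it mirrors the refill-then-acquire logic of the rate_limiter.lua script above, with tokens refilled at `rate` per second up to the bucket capacity.

public class TokenBucket {
    private final long maxPermits;   // bucket capacity (e.g. 2)
    private final double rate;       // tokens added per second (e.g. 1)
    private double currPermits;      // tokens currently available
    private long lastMillis;         // timestamp of the last refill

    public TokenBucket(long maxPermits, double rate) {
        this.maxPermits = maxPermits;
        this.rate = rate;
        this.currPermits = maxPermits;
        this.lastMillis = System.currentTimeMillis();
    }

    // Try to take `apply` tokens; returns true on success, false if rate-limited.
    public synchronized boolean acquire(int apply) {
        long now = System.currentTimeMillis();
        // refill tokens for the elapsed interval, capped at the bucket capacity
        currPermits = Math.min(maxPermits, currPermits + (now - lastMillis) / 1000.0 * rate);
        lastMillis = now;
        if (currPermits >= apply) {
            currPermits -= apply;
            return true;
        }
        return false;
    }
}

With new TokenBucket(2, 1), a burst of 2 requests is absorbed at once, after which further requests pass at 1 per second, matching the description above.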

Product-dimension rate limiting

With product-dimension rate limiting, when a flash-sale product's traffic far exceeds the allowed quota, the excess requests are effectively discarded at random.

Nginx's token-bucket script getToken_access_limit.lua runs in the access phase of the request. It does not implement the core rate-limiting logic itself; it only invokes the rate_limiter.lua script cached inside Redis to do the limiting.

The relationship between the getToken_access_limit.lua script and the rate_limiter.lua script is shown in Figure 10-17.

Figure 10-17 Relationship between getToken_access_limit.lua script and rate_limiter.lua script

When is the rate_limiter.lua script loaded into Redis?

Like the flash-sale script, it is loaded and cached in Redis when the Java program starts the flash sale.

Another very important point: after loading the script, the Java program stores its SHA1 hash in Redis under a custom key (specifically "lua:sha1:rate_limiter"), so that Nginx's getToken_access_limit.lua script can read it and pass it to the evalsha call.

Note: a Redis cluster is used, so each node needs to cache its own copy of the script.

/**
 * With a Redis cluster, every node must cache its own copy of the script
 * @param slotKey the key used to locate the corresponding slot
 */
public void storeScript(String slotKey){
    if (StringUtils.isEmpty(unlockSha1) || !jedisCluster.scriptExists(unlockSha1, slotKey)){
        // Redis caches the script and returns a SHA1 hash that can be reused for later calls
        unlockSha1 = jedisCluster.scriptLoad(DISTRIBUTE_LOCK_SCRIPT_UNLOCK_VAL, slotKey);
    }
}

Common rate-limiting components

Redisson's distributed rate limiter adopts the token-bucket idea with a fixed time window: the trySetRate method sets the bucket size, and the Redis key-expiration mechanism implements the time window, controlling how many requests may pass within it.
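A minimal usage sketch with Redisson follows (the Redis address redis://127.0.0.1:6379 and the limiter name "seckill:limiter" are assumptions for illustration):

import org.redisson.Redisson;
import org.redisson.api.RRateLimiter;
import org.redisson.api.RateIntervalUnit;
import org.redisson.api.RateType;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedissonRateLimitDemo {
    public static void main(String[] args) {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379"); // assumed address
        RedissonClient redisson = Redisson.create(config);

        RRateLimiter limiter = redisson.getRateLimiter("seckill:limiter");
        // Allow at most 10 permits per 1 second across the whole application (RateType.OVERALL)
        limiter.trySetRate(RateType.OVERALL, 10, 1, RateIntervalUnit.SECONDS);

        if (limiter.tryAcquire(1)) {
            System.out.println("request allowed");
        } else {
            System.out.println("request rate-limited");
        }
        redisson.shutdown();
    }
}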

Spring Cloud Gateway also integrates Redis-based rate limiting, but that operates at the gateway layer.

reference link

System architecture knowledge map (a system architecture knowledge map worth 10w)

https://www.processon.com/view/link/60fb9421637689719d246739

The architecture of seckill system

https://www.processon.com/view/link/61148c2b1e08536191d8f92f

First-round summary

Not many questions, only 6, but all of them hard-core.

The candidate's answers were also on point, and the round took a full 69 minutes.

In the end, the first round was passed.

Second round (1 hour)

1. Tell me, what are the advantages of object-oriented?

Object Orientation is a programming methodology that makes code easier to understand, extend, and maintain by encapsulating data and behavior in objects.

In the object-oriented method, the program is regarded as a collection of objects, and the objects communicate and cooperate with each other to realize the functions of the system.

The object-oriented approach differs significantly from the traditional structured approach: it advocates using the way humans naturally think, constructing software systems out of things that exist in the real world.

The object-oriented method is based on the concept of "object", takes the object as the center, and uses classes and inheritance as the construction mechanism to design and construct software systems.

The advantages of object orientation are:

1. Good reusability

Inheritance and encapsulation in the object-oriented method support software reuse. An object class can be reused in two ways: create an instance of the class and use it directly, or derive a new class from it that meets the new needs. A subclass reuses the data structures and code of its parent class and can easily modify and extend them, while changes in the subclass do not affect users of the parent class.

2. Good scalability

Object-oriented programs can be extended as needs increase, because objects can add new properties and methods to meet new needs.

3. Consistent with human habitual way of thinking

The traditional structured development method is process-oriented and algorithm-centric: data and processing are kept as independent parts, ignoring the inherent relationship between data and the operations on it, so the problem space and the solution space are not consistent.

The object-oriented method is centered on objects, stays as close as possible to the abstract way humans think, and describes the problem space and the solution space as consistently as possible, so that problems can be solved naturally.

4. The stability of the system is good

The object-oriented method simulates the entities in the problem domain with objects, and describes the relationship between entities with the relationship between objects. When the functional requirements of the system change, it will not cause the overall change of the software structure, only some local modifications are required.

Since entities in the real world are relatively stable, the software system constructed centered on objects will also be relatively stable.

5. Easier to develop large software products

When developing large-scale software with the object-oriented method, the large product is treated as a series of independent small products, and the iterative development model of RUP (the Rational Unified Process) can be used, reducing both the technical difficulty of development and the difficulty of managing the development work.

6. Good maintainability

Because object-oriented software is more stable, easy to modify, easy to understand, easy to test and debug, so the maintainability of the software will be better.

2. Talk about data structures: which data structures do you remember?

Data structures are methods used in computer science to organize, store and manipulate data. In programming, choosing the right data structure can greatly improve the efficiency and performance of your code. Common data structures include:

  1. Array (Array): A linear structure that accesses elements through subscripts.
  2. Linked List: A linear structure that connects elements through pointers.
  3. Stack (Stack): Linear structure, first in last out (LIFO) data structure.
  4. Queue (Queue): Linear structure, first-in-first-out (FIFO) data structure.
  5. Tree: A non-linear structure consisting of nodes and edges.
  6. Graph: A non-linear structure consisting of nodes and edges.
  7. Heap: A non-linear structure, a data structure that can quickly find the maximum or minimum value.
  8. Hash Table: A data structure accessed according to the Key Value.
  9. Hash Map: a key-value mapping structure accessed by key; note that a plain hash map is unordered (ordered traversal requires a tree-based map such as Java's TreeMap).
  10. Trie Tree: A data structure for string search and matching.

3. Talk about the characteristics and usage scenarios of arrays and linked lists

Arrays and linked lists are common data structures that are widely used in many programming languages. Their characteristics and application scenarios are briefly described below.

array

An array is a linear data structure that consists of a set of elements arranged in order.

Each element in an array has a unique index, which can be used to access and modify that element.

Arrays are usually used in numerical calculations, image processing, and other scenarios that require efficient access and manipulation of elements.

An array is a linear data structure that is a collection of elements of the same data type.

The size of the array is determined when it is defined, once the size is determined, it cannot be changed.

Key features of arrays include:

  • Fixed size: The size of the array is determined when it is defined, and once the size is determined, it cannot be changed.
  • Random access: The elements in the array can be accessed through a unique index value, so random access is possible.
  • Simple and efficient: access by index takes O(1) time; insertion and deletion in the middle, however, require shifting elements and take O(n) in the general case.

Here is a simple array implementation:

public class Array {
    private int size;
    private int[] data;

    public Array(int size) {
        this.size = size;
        this.data = new int[size];
    }

    public int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index out of range");
        }
        return data[index];
    }

    public void set(int index, int value) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index out of range");
        }
        data[index] = value;
    }

    public int length() {
        return size;
    }
}

Advantages and disadvantages of arrays:

  1. Inserting data: at a known position at the end, O(1). Since the memory addresses are contiguous, the address of, say, index 8 can be computed directly from the base address; but inserting in the middle requires shifting all subsequent elements backward, which is time-consuming (O(n)).
  2. Accessing the n-th element: time complexity O(1); as with insertion, the address can be computed directly from the base address.
  3. Searching for a given value: time complexity O(n); the array must be traversed from start to end, comparing every element.

linked list

A linked list is also a linear data structure.

A linked list is a dynamic data structure that consists of a series of nodes, each node containing a data element and a pointer to the next node.
A linked list can insert and delete nodes at any position, so it is very flexible. Linked list is usually used to implement the data structure of the program, network programming and other scenarios.

Key features of linked lists include:

  • Dynamic size: The size of the linked list is determined at runtime, and nodes can be added or removed at any time.
  • Convenient insertion and deletion: linked-list operations are more flexible than array operations; inserting or deleting at a known node takes O(1), though finding a node by value still takes O(n).
  • Relatively large memory footprint: Since a linked list requires an extra pointer to point to the next node, its memory footprint is relatively large.

In short, arrays and linked lists have their own advantages and disadvantages. In practical applications, it is necessary to choose the appropriate data structure according to the specific situation.

Here is a simple linked list implementation:

public class LinkedList {
    // Node definition (assumed here, since the original omitted it)
    private static class Node {
        int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    private Node head;

    public LinkedList() {
        this.head = null;
    }

    // Append a value at the tail: O(n), must walk to the end
    public void append(int value) {
        Node newNode = new Node(value);
        if (head == null) {
            head = newNode;
            return;
        }
        Node current = head;
        while (current.next != null) {
            current = current.next;
        }
        current.next = newNode;
    }

    // Prepend a value at the head: O(1)
    public void prepend(int value) {
        Node newNode = new Node(value);
        newNode.next = head;
        head = newNode;
    }

    // Insert a value after a given node: O(1) once the node is known
    public void insertAfterNode(Node prevNode, int value) {
        if (prevNode == null) {
            System.out.println("Previous node is not in the list");
            return;
        }
        Node newNode = new Node(value);
        newNode.next = prevNode.next;
        prevNode.next = newNode;
    }

    // Delete the first node holding the given value: O(n) to find it, O(1) to unlink
    public void deleteNode(int value) {
        if (head == null) {
            return;
        }
        if (head.value == value) {
            head = head.next;
            return;
        }
        Node prevNode = head;
        while (prevNode.next != null && prevNode.next.value != value) {
            prevNode = prevNode.next;
        }
        if (prevNode.next != null) {
            prevNode.next = prevNode.next.next; // unlink the matching node
        }
    }
}

Advantages and disadvantages of linked list

  1. Inserting data: time complexity O(1). For example, to insert after a given node, point the previous node's pointer at the new node and the new node's pointer at the next node; no data needs to be moved.
  2. Accessing the n-th element: time complexity O(n); the list must be traversed because the memory addresses are not contiguous.
  3. Searching for a given value: time complexity O(n); the list must be traversed.

In general, the difference between the two is:

  • Arrays are suitable for scenarios that require efficient access and manipulation of elements, such as numerical calculations, image processing, etc.;
  • The linked list is suitable for scenarios that require frequent insertion and deletion of elements, such as network programming.
  • When actually writing code, you need to choose an appropriate data structure according to specific scenarios and needs.

4. Talk about hashMap and the red-black tree in Hashmap

There is too much content in this question, please refer to Nien's Java Interview Collection Topic 31 PDF for details

"Topic 31: Hash Serial Cannon Interview Questions (Exclusively for the king of papers + the most complete in history + necessary for interviews in 2023)" PDF

There are too many contents of red-black tree, please refer to Nien Java Interview Collection Topic 33PDF for details

"Special Topic 33: BST, AVL, RBT Red-Black Tree, Three Core Data Structures (Exclusively for Juan Wang + Most Complete in History + Must-Have for 2023 Interviews)" PDF

5. Is HashMap thread-safe? Which map implementations are thread-safe?

There is too much content in this question, please refer to Nien's Java Interview Collection Topic 31 PDF for details

"Topic 31: Hash Serial Cannon Interview Questions (Exclusively for the king of papers + the most complete in history + necessary for interviews in 2023)" PDF

6. Talk about the lock

This covers:

  • hardware-level locks
  • operating-system-level locks
  • JVM built-in locks
  • JUC explicit locks
  • distributed locks

There is too much content to cover in full; each kind of lock alone can fill 10 minutes.

For the first four (hardware-level locks, operating-system locks, JVM built-in locks, and JUC explicit locks):

Please refer to Nien's "Java High Concurrency Core Programming Volume 2 Enhanced Edition" published by Tsinghua University Press

The following distributed locks:

Please refer to Nien's Nien Java Interview Collection topic:

"Topic 15: Distributed Lock Interview Questions (Exclusively for Juanwang + The most complete in history + 2023 interview must)" PDF

7. Talk about the thread pool

A thread pool (Thread Pool) is a multi-threading technique that improves a program's concurrency and efficiency. The pool maintains a set of pre-created threads; when a task arrives, an idle thread is taken from the pool to execute it. If no idle thread is available, the task is placed in a task queue and waits to be executed.

In Java, the thread pool can be implemented through the Executor framework. The specific steps are as follows:

  1. Create an ExecutorService object; the static factory methods of the Executors class can produce an ExecutorService with a specified number of threads.
  2. Submit tasks to the ExecutorService: submit() accepts a Runnable or Callable and returns a Future through which the task's result can be obtained.
  3. Alternatively, use execute() to run a Runnable task; unlike submit(), execute() returns nothing.
  4. When the ExecutorService is no longer needed, call shutdown() to close the thread pool.

Commonly used thread pool implementation classes in Java include ThreadPoolExecutor and ScheduledThreadPoolExecutor.

ThreadPoolExecutor is a thread pool-based executor that can store and reuse threads, and can set thread priority, queue size, rejection policy, etc.

ScheduledThreadPoolExecutor is a thread pool that supports delayed or periodic execution of tasks. It can set the delay time, execution cycle and execution times of the task, etc.

By using these thread pool implementation classes, you can easily create and manage thread pools and execute tasks. The following is a sample code to implement a simple calculator using ThreadPoolExecutor:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Calculator {
    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(5);
        for (int i = 0; i < 5; i++) {
            executor.execute(new CalculatorTask());
        }
        executor.shutdown();
    }
}

class CalculatorTask implements Runnable {
    public void run() {
        int num1 = 5;
        int num2 = 10;
        int sum = num1 + num2;
        System.out.println("The sum of " + num1 + " and " + num2 + " is " + sum);
    }
}

This example creates a thread pool with 5 threads and uses execute() to submit 5 CalculatorTask objects to the pool. After all tasks are submitted, main explicitly calls shutdown(), which lets the already-submitted tasks finish and then closes the pool. In this way a simple calculator is implemented.
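In interviews it also helps to show the pool's parameters explicitly. Below is a sketch using the full ThreadPoolExecutor constructor instead of the Executors factory (the parameter values are illustrative, not prescriptive):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ExplicitPool {
    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2,                                          // corePoolSize: threads kept alive
                4,                                          // maximumPoolSize: upper bound under load
                60, TimeUnit.SECONDS,                       // keep-alive for threads above the core size
                new ArrayBlockingQueue<>(100),              // bounded task queue
                new ThreadPoolExecutor.CallerRunsPolicy()); // rejection policy when saturated
        pool.execute(() -> System.out.println("task running in " + Thread.currentThread().getName()));
        pool.shutdown();
    }
}

Constructing the pool this way makes the queue size and rejection policy visible, which the Executors factory methods hide.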

For more knowledge about the thread pool, please refer to Nien's "Java High Concurrency Core Programming Volume 2 Enhanced Edition" published by Tsinghua University Press

8. Talk about mysql index

1. Index Introduction

1. What is an index?

MySQL's official definition of index: Index (Index) is a data structure that helps MySQL obtain data efficiently, and these data structures reference (point to) data in some way. The essence of indexing is: data structure. It can be simply understood as "sorted fast lookup data structure"

A very apt metaphor is the relationship between a book's table of contents and its text: to make it easy to find content in the book, the content is indexed to form a catalog. So the first thing to understand is that an index is also a file, and it occupies physical space.

For example, for the MyISAM storage engine:

.frm    stores the table structure.
.myd    stores the table data.
.myi    stores the index.

For the InnoDB storage engine:

.frm    stores the table structure.
.ibd    stores both the index and the data (requires the innodb_file_per_table option to be enabled)

Therefore, when you create an index on a table, the index file grows; and whenever the table's data changes through inserts, updates, and deletes, the index file changes too. MySQL maintains the index automatically, with no intervention needed, which is exactly why inappropriate indexes can hurt MySQL's performance.

Here is an example of one possible indexing method:

On the left is the data table, and on the far left are the physical addresses of the data records.

To speed up lookups on Col2, the binary search tree shown on the right can be maintained: each node holds an index key and a pointer to the physical address of the corresponding record. Binary search can then find the record within logarithmic complexity, quickly retrieving the qualifying rows.

2. Advantages and disadvantages of indexing

Advantage:

  1. Indexes can greatly speed up data retrieval and reduce the number of rows that must be scanned.
  2. Indexes on join columns can speed up table joins.

Disadvantages:

  1. Indexes exist on disk and take up physical space.
  2. While query speed improves, the speed of writes to the table (UPDATE, INSERT, and DELETE) decreases: whenever an indexed column changes, MySQL must update not only the data but also the index file.

3. Index names in different tables may repeat

In MySQL, indexes on the same field in different tables may share the same name, because each table has its own index storage.

Naming conventions:

Ordinary index: idx_<column_name>

Unique index: ux_<column_name>

To summarize:

  1. An index arranges the data of the table in the data structure of a specific search algorithm inside the index file, for fast lookup;
  2. Indexes live on disk and occupy physical space.

The indexes we usually talk about, unless otherwise specified, are organized as B-trees (multi-way search trees, not necessarily binary). Clustered indexes, covering indexes, composite indexes, prefix indexes, and unique indexes all use the B+ tree index by default, and are collectively referred to simply as indexes. Of course, besides B+ tree indexes there are also hash indexes and others.

2. Index classification

1. Ordinary index

This is the most basic index, with no restrictions; it is the default B-tree index type in MyISAM. It has no uniqueness constraint and allows NULL values.

# create the index directly
CREATE INDEX index_name ON table(column(length))

# add the index by altering the table structure
ALTER TABLE table_name ADD INDEX index_name (column(length))

# create the index together with the table
CREATE TABLE table_name (..., INDEX index_name (title(length)))

# drop the index
DROP INDEX index_name ON table

2. Unique index

Same as an ordinary index, except that the index values must be unique; NULL values are allowed.

# create the index directly
CREATE UNIQUE INDEX index_name ON table(column(length))

# add the index by altering the table structure
ALTER TABLE table_name ADD UNIQUE INDEX index_name (column(length))

# create the index together with the table
CREATE TABLE table_name (..., UNIQUE index_name (title(length)))

3. Full text index

FULLTEXT indexes were long supported only by the MyISAM engine. They can be created on CHAR, VARCHAR, or TEXT columns as part of a CREATE TABLE statement, or added later with ALTER TABLE or CREATE INDEX. Keep in mind that for large tables, generating a full-text index is very time-consuming and very demanding on disk space.

# create the index directly
CREATE FULLTEXT INDEX index_content ON article(content)

# add the index by altering the table structure
ALTER TABLE article ADD FULLTEXT index_content(content)

# create the index together with the table
CREATE TABLE table_name (..., FULLTEXT (content))

4. Composite index (leftmost prefix)

Real-world SQL queries usually carry several restrictive conditions, so to squeeze more efficiency out of MySQL it is worth considering a composite index.

ALTER TABLE article ADD INDEX index_title_time (title(50),time(10))
ALTER TABLE `table_name` ADD INDEX index_name ( `column1`, `column2`, `column3` )

Building such a composite index is effectively equivalent to building two indexes, (title, time) and (title): by the leftmost-prefix rule, queries filtering on title alone can still use the index, while queries on time alone cannot.

5. Primary key index

The indexed column must not contain duplicates or NULLs, and a table can have only one primary key.

ALTER TABLE `table_name` ADD PRIMARY KEY ( `column` )

3. Inspecting indexes

show index from table_name
show keys from table_name
desc table_name

4. Index types

As mentioned above, index files are stored with different data structures, and different data structures yield different index types. Common index types include:

B-Tree indexes, hash indexes, spatial indexes (R-Tree), and full-text indexes.

1. B-tree index

The B-Tree index is the most commonly used; if no specific type is specified, it is almost certainly a B-Tree index. In fact, many engines use its variant, the B+Tree, which is an optimization of the B-Tree.

Most storage engines, such as MyISAM and InnoDB, support this kind of index, so it is the most widely used and most common indexing method. Different storage engines implement it slightly differently, though: for example, MyISAM compresses its indexes with prefix compression, while InnoDB does not. The B-tree family can be understood as a "sorted fast-lookup structure".

The following figure shows how the B-Tree index stores the indexed data:

Explanation:

The image on the left is a data table with three columns; the image on the right shows how the data is indexed.

The B-Tree stores the index columns in sorted order, and each leaf node points to the indexed data; this ordering is why B-Tree indexes support range lookups.

2. Hash index

Compared with the B-Tree index, the hash index is relatively simple to implement. It is based on a hash table: for each indexed column value, the storage engine computes a hash code, stores the hash code in the hash table as the key, and stores a pointer to the data row as the value.

The following figure is a simple principle display:

Explanation:

The purple image on the left represents a two-column data table.

The middle shows the hash function applied to the fname column to compute hash values.

The green figure on the right shows the resulting hash values stored in the hash table.

When we execute the following query:

select * from testTable where fname = "mary";

MySQL first computes the hash value of the query condition mary, then looks up that hash value in the hash table; if it is found, MySQL follows the corresponding pointer to the data row being sought.
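As a rough analogy (not MySQL's actual implementation), a Java HashMap mapping the indexed value to a "row pointer" behaves the same way: equality lookups are O(1), but the buckets are unordered, so range conditions cannot use the structure. The row offsets below are made up for illustration.

import java.util.HashMap;
import java.util.Map;

public class HashIndexSketch {
    public static void main(String[] args) {
        // key -> "pointer" (here, a fictitious row offset) kept in a hash table
        Map<String, Long> fnameIndex = new HashMap<>();
        fnameIndex.put("mary", 1024L); // hash("mary") picks the bucket; the value points at the row
        fnameIndex.put("bob", 2048L);

        // Equality lookup: hash the condition value, probe the table, follow the pointer
        Long rowPointer = fnameIndex.get("mary");
        System.out.println("row for fname='mary' is at offset " + rowPointer);
        // A range condition such as fname > 'm' cannot use this structure: buckets are unordered.
    }
}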

Advantages and limitations of hash tables:

  • Advantages

In a Memory table the default index is a hash index. Only hash values need to be compared, so lookups are very fast: the theoretical query time complexity is O(1), a clear performance advantage.

  • Limitations
  1. No range queries are supported (e.g. WHERE price > 150), because the index is based on hash computation and supports only equality comparison.
  2. Hash tables store entries unordered, so the index cannot be used for sorting.
  3. The mainstream storage engines, MyISAM and InnoDB, do not support this index type; hash indexes are supported only by the Memory and NDB engines.
  4. Lookups must go back to the row: the index yields only the row's location, so a second access to the table is needed to fetch the data.

So although hash indexes are fast, their use is quite limited; they suit only certain special scenarios.

3. Spatial data index (R-Tree)

Spatial index can be used for geographic data storage, and it needs the support of GIS-related functions. Since MySQL's GIS support is not perfect, this index method is rarely used in MySQL.

4. Index summary

  1. B-Tree index is the most widely used and supported by mainstream engines.
  2. Hash index has high performance and is suitable for special occasions.
  3. R-Tree is not commonly used.
  4. The full-text index (FULLTEXT) suits fuzzy keyword search over massive data where LIKE is infeasible; its Chinese-language support is weak, so dedicated search engines such as Sphinx or Solr are often used instead.

5. The relationship between index and storage engine

In MySQL, indexes are implemented by the storage engine, and not every storage engine supports every index type; for example, neither MyISAM nor InnoDB supports declaring hash indexes. Likewise, even for the same index type, different storage engines may implement it differently; for example, MyISAM and InnoDB implement B-Tree indexes differently.

Summarize:

  1. Different storage engines may support different index types;
  2. Different storage engines may have different implementations for the same index type.

9. Talk about the underlying principle of the index (hash index and B+ tree)

The underlying principle of the index refers to the process of how the database implements the index. In MySQL, common index types include B-Tree index and hash index.

A B-Tree index is a balanced tree structure that stores index information by dividing data into blocks. Each node contains a key value and pointers to child nodes. The B-Tree index can ensure that the target data row can be found by traversing down the tree structure during query, so it has better query performance.

A hash index is a data structure based on a hash table, which stores index information by mapping a key value to a location in the hash table. Hash indexes are suitable for equivalent queries, that is, the query conditions only involve the value of the key and not the order of the key. Since the lookup time complexity of the hash table is O(1), the hash index has very fast query performance. However, the hash index does not support range query and sorting operations, so it needs to be selected according to specific needs in practical applications.

Whether B-Tree or hash, an index speeds up retrieval through its index structure. At query time the database first checks whether the needed index pages are already in the in-memory cache; if so, it uses them directly, and if not, it reads the pages from disk into the cache before performing the lookup.

It is important to note that indexes have an impact on the performance of insert, update, and delete operations. Therefore, when creating an index, trade-offs and choices need to be made according to specific business needs and data characteristics. At the same time, it is also necessary to maintain indexes regularly, delete unnecessary indexes, and rebuild indexes that are no longer used.

10. Talk about the time complexity of B+ tree and red-black tree

Both B+ trees and red-black trees are commonly used data structures for implementing indexes. They are all balanced and can guarantee that the complexity of inserting, deleting, searching and other operations is O(logn).

The following introduces the time complexity of B+ tree and red-black tree respectively:

B+ tree

A B+ tree is a multi-way search tree that stores index information in blocks; each node contains key values and pointers to child nodes. Query, insertion, and deletion on a B+ tree are all O(log n). A query descends O(log n) levels to a leaf, and because the leaf nodes are linked together by pointers, they can also be traversed sequentially for range scans. Insertions and deletions are likewise O(log n): nodes keep their keys sorted, and the tree's structure is adjusted on modification to stay balanced.

red black tree

The red-black tree is a self-balancing binary search tree that maintains balance through rotations and color changes. Its insertion, deletion, and search operations are all O(log n): nodes are kept ordered by key, and insertions or deletions trigger structural adjustments to preserve balance.

Note that the concrete implementations of B+ trees and red-black trees affect their performance. Each B+ tree node can store many keys and pointers, while each red-black tree node stores just one key. B+ trees are generally suited to range queries and sorted scans, while red-black trees suit single-point queries and in-memory ordered collections. In practice, choose the structure that fits the specific need.
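In Java the two structures are easy to contrast concretely: java.util.TreeMap is documented to be a red-black tree, giving O(log n) ordered operations, while database engines favor B+ trees whose linked leaves make sequential range scans cheap. A small sketch:

import java.util.TreeMap;

public class RedBlackTreeDemo {
    public static void main(String[] args) {
        // TreeMap is backed by a red-black tree: put/get/remove are O(log n), keys stay sorted
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(30, "c");
        map.put(10, "a");
        map.put(20, "b");

        System.out.println(map.firstKey());     // 10: smallest key
        System.out.println(map.ceilingKey(15)); // 20: single-point ordered lookup
        System.out.println(map.subMap(10, 25)); // {10=a, 20=b}: a range view is possible,
        // but unlike B+ tree leaves, nodes are not linked for cheap sequential disk scans
    }
}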

11. Do you know about the mysql storage engine? Tell me about it

For ease of management, the functions that do not touch real data storage (connection management, the query cache, parsing, query optimization) are assigned to MySQL Server, while the functions that actually access data are assigned to the storage engine. After MySQL Server finishes query optimization, it simply calls the API exposed by the underlying storage engine according to the generated execution plan, obtains the data, and returns it to the client.

MySQL introduces the concept of a storage engine; in short, the storage engine is the type of the table. Storage engines used to be called "table processors" before being renamed. Their job is to receive instructions from the layer above and then read or write data in the table.

1. Engines supported by MySQL

We can view the storage engines supported by the current database server with the following command:

show engines;

| Engine | Support | Comment | Transactions | XA | Savepoints |
| --- | --- | --- | --- | --- | --- |
| InnoDB | DEFAULT | Supports transactions, row-level locking, and foreign keys | YES | YES | YES |
| MRG_MYISAM | YES | Collection of identical MyISAM tables | NO | NO | NO |
| MEMORY | YES | Hash based, stored in memory, useful for temporary tables | NO | NO | NO |
| BLACKHOLE | YES | /dev/null storage engine (anything you write to it disappears) | NO | NO | NO |
| MyISAM | YES | MyISAM storage engine | NO | NO | NO |
| CSV | YES | CSV storage engine | NO | NO | NO |
| ARCHIVE | YES | Archive storage engine | NO | NO | NO |
| PERFORMANCE_SCHEMA | YES | Performance Schema | NO | NO | NO |
| FEDERATED | NO | Federated MySQL storage engine | | | |

XA is the interface specification (i.e. the set of interface functions) between transaction middleware and the database, defined by X/Open DTP. Transaction middleware uses it to notify the database of transaction start, end, commit, rollback, and so on. The XA interface functions are provided by the database vendor; the XA column can be read as indicating whether distributed transactions are supported.

View the storage engine of the current server

show variables like  '%storage_engine%'


If the CREATE TABLE statement does not specify a storage engine, InnoDB is used as the table's storage engine by default. To change the default storage engine for the session, run:

set DEFAULT_STORAGE_ENGINE=InnoDB 

Or modify the my.cnf file:

default-storage-engine=InnoDB

1) InnoDB engine

A transactional storage engine with foreign-key support. MySQL has included the InnoDB storage engine since 3.23.34a, and since 5.5 InnoDB has been the default engine.

The InnoDB engine provides support for ACID database transactions, together with row-level locks and foreign-key constraints. As MySQL's default transactional engine, it is designed to handle a large number of short-lived transactions, and it guarantees complete commit (Commit) and rollback (Rollback) of a transaction.

Data file structure:

  • table_name.frm stores the table structure (merged into table_name.ibd in MySQL 8)
  • table_name.ibd stores the data and the indexes

InnoDB is designed for maximum performance when handling huge data volumes. In earlier releases, dictionary data was stored in metadata files, non-transactional tables, and so on; these metadata files (.frm, .par, .trn, .isl, db.opt, etc.) no longer exist in MySQL 8.

Compared with the MyISAM storage engine, InnoDB writes are less efficient, and it takes up more disk space to keep both data and indexes.

MyISAM caches only indexes, not real data. InnoDB caches both indexes and real data, so it demands a lot of memory, and memory size has a decisive impact on its performance.

Summary: when a workload needs updates and deletes in addition to inserts and queries, the InnoDB storage engine should be the first choice.

2) MyISAM engine

The main non-transactional storage engine; it was the default storage engine before 5.5.

MyISAM provides a large number of features, including full-text indexing, compression, and spatial functions (GIS), but it does not support transactions, row-level locks, or foreign keys. One undeniable defect is that it cannot recover safely after a crash.

Its advantage is fast access. For applications with no requirement for transactional integrity, or ones based mainly on SELECT and INSERT (read-only or read-mostly services), it is a reasonable choice.

MyISAM also stores the table's row count separately as a constant, so queries such as count(*) without a WHERE clause are extremely fast.

Data file structure:

  • table_name.frm stores the table structure (table_name.sdi in MySQL 8)
  • table_name.MYD stores the data
  • table_name.MYI stores the index

3) Archive engine

Used for data archiving ("archive" means exactly that); it supports only insert and query (rows cannot be modified once inserted). Index support was added after MySQL 5.5.

It has a good compression mechanism, using the zlib compression library, and compresses records in real time as they are written, so it is often used as a warehouse. When an ARCHIVE table is created, the storage engine creates a file whose name starts with the table name; the data file's extension is .ARZ.

According to published test results, for the same amount of data an Archive table is roughly 75% smaller than a MyISAM table and roughly 83% smaller than a transaction-capable InnoDB table.

The Archive storage engine uses row-level locks. The engine supports the AUTO_INCREMENT column attribute, and an AUTO_INCREMENT column may have a unique or non-unique index; trying to create an index on any other column results in an error.

Archive tables suit logging and data-acquisition (archiving) applications, and storing large amounts of independent data as historical records: insertion is fast, but query support is poor.

| Feature | Support |
| --- | --- |
| B-tree index | No |
| Backup / point-in-time recovery (implemented in the server, not in the storage engine) | Yes |
| Cluster database support | No |
| Clustered index | No |
| Compressed data | Yes |
| Data cache | No |
| Encrypted data (encryption is implemented in the server) | Yes |
| Foreign keys | No |
| Full-text index | No |
| Geospatial data types | Yes |
| Geospatial index | No |
| Hash index | No |
| Index cache | No |
| Locking granularity | Row lock |
| MVCC | No |
| Storage limit | None |
| Transactions | No |
| Update statistics for the data dictionary | Yes |

4) Blackhole engine

Write operations are discarded, and read operations return empty content. The Blackhole engine implements no storage mechanism at all: it throws away every inserted row without any guarantee. The server does, however, log writes to Blackhole tables, so the engine can be used to replicate data to standby databases or simply to record a log. That said, this approach runs into many problems and is not recommended.

5) CSV engine

Stores data with the items separated by commas. The CSV engine treats an ordinary CSV file as a MySQL table, but it does not support indexes. It can serve as a data-exchange mechanism: the stored data can be read directly in the operating system with a text editor or Excel, which is a clear advantage for fast import and export of data.

When a CSV table is created, the server creates a plain-text data file whose name begins with the table name and has a .CSV extension. When data is stored into the table, the storage engine saves it to the data file in comma-separated-values format.

6) Memory engine

A table kept in memory. The logical medium the Memory engine uses is RAM, so responses are fast, but when the mysqld daemon crashes the data is lost. In addition, the data must be stored in fixed-length formats; variable-length types such as BLOB/TEXT are not available.

Memory supports both hash (HASH) index and B+ tree index

  • Hash indexes are fast for equivalent queries, but slow for range queries;
  • Hash index is used by default

Memory tables are at least an order of magnitude faster than MyISAM tables.

The size of a Memory table is limited, mainly by two parameters: max_rows and max_heap_table_size. max_rows can be specified when the table is created; max_heap_table_size defaults to 16MB and can be raised as needed.

Memory data files and index files are stored separately.

  • Each table based on the Memory storage engine actually corresponds to one disk file, whose file name is the same as the table name and whose type is .frm. Only the table's structure is stored in that file; its data lives in memory.
  • The advantage is that it is conducive to the rapid processing of data and improves the processing efficiency of the entire table;
  • The disadvantage is that the data is easy to lose and the life cycle is short.

The Federated engine accesses remote tables. It is a proxy for accessing other MySQL servers; while it seems to offer a nice degree of cross-server flexibility, it often causes problems and is therefore disabled by default.

The Merge engine manages a collection of tables composed of multiple MyISAM tables.

The NDB engine is the dedicated storage engine for MySQL clusters. Also known as the NDB Cluster storage engine, it is mainly used in the MySQL Cluster distributed environment, similar to Oracle's RAC cluster.

2. Differences between MyISAM and InnoDB

The default storage engine before MySQL 5.5 was MyISAM, and it was changed to InnoDB after 5.5.

First, the InnoDB storage engine provides good support for transaction management, crash recovery, and concurrency control. Because InnoDB supports transactions, it is the necessary choice wherever transactional integrity is required, for example where data operations include many updates and deletes besides inserts and queries, such as financial systems. Its drawbacks are lower read/write efficiency and relatively larger space usage.

Second, the MyISAM storage engine suits small applications where the system is dominated by reads and inserts, with only a few updates and deletes, and where the requirements on transactions are not strict. MyISAM's strengths are its small footprint and fast processing; its weaknesses are that it supports neither transactional integrity nor fine-grained concurrency.

| | MyISAM | InnoDB |
| --- | --- | --- |
| Caching | Caches only the index, not the real data | Caches both the index and the real data; needs a lot of memory, and memory size has a decisive impact on performance |
| Foreign keys | Not supported | Supported |
| Transactions | Not supported | Supported |
| Locking | Table-level locking | Row-level and table-level locking; finer lock granularity, higher concurrency |
| Index implementation | B+ tree index; a MyISAM table is a heap table | B+ tree index; an InnoDB table is an index-organized table |
| Hash index | Not supported | Supported (adaptive hash index) |
| Full-text index | Supported | Not supported (in older versions) |
| Row storage order | Rows stored in insertion order | Rows stored in primary-key order |
| Storage | Tables can be compressed, small footprint | Tables need more memory and storage; InnoDB builds a dedicated buffer pool in main memory to cache data and indexes |
| Portability, backup & recovery | Data is stored as files, so cross-platform transfer is easy, and a single table can be backed up and restored independently | Options are copying the data files, backing up the binlog, or using mysqldump; painful once the data reaches tens of gigabytes |

1) Differences in indexing

  • InnoDB indexes are clustered; MyISAM indexes are non-clustered.
  • The leaf nodes of InnoDB's primary-key index store the row data itself, so the primary-key index is very efficient.
  • The leaf nodes of a MyISAM index store the address of the row data, so fetching the row requires one more addressing step.
  • The leaf nodes of InnoDB's secondary (non-primary-key) indexes store the primary key along with the indexed column values, so covering-index queries are very efficient.
  • InnoDB's data file is itself the primary-key index file; such an index is called a "clustered index".

In InnoDB, the primary index (primary-key or clustered index) is stored together with the row data, while secondary (auxiliary) indexes are stored separately with a pointer to the primary key. The primary key matters mainly when a scan of the index must also return the row data.

2) Table/row lock differences

MyISAM supports only table-level locks. When a MyISAM table is operated on, SELECT, UPDATE, DELETE, and INSERT statements lock the table automatically; if the locked table still satisfies the conditions for concurrent inserts, new data may be appended at the tail of the table.

InnoDB supports transactions and row-level locks, which is its biggest distinguishing feature. Row locks dramatically improve performance under multi-user concurrency. However, InnoDB's row locks take effect on indexes: if no index is hit, the whole table is locked.

MyISAM's lock granularity is the table, while InnoDB supports row-level locking. Simply put, InnoDB can lock individual data rows, whereas MyISAM can only lock the entire table.

On a MyISAM table, read locks and write locks are mutually exclusive. Under concurrent reads and writes, if the waiting queue contains both read and write requests, write requests get priority by default, even if a read request arrived first. MyISAM is therefore unsuitable for workloads with a large mix of queries and modifications, since query processes would block for a long time; and because MyISAM locks the whole table, one slow read operation can starve the write processes.

3) Table primary keys

MyISAM: tables without any index or primary key are allowed; all indexes store the addresses of rows.

InnoDB: if no primary key or non-null unique index is defined, a 6-byte primary key is generated automatically (invisible to users). The row data forms part of the primary-key index, while secondary indexes are stored separately with a pointer to the primary key. InnoDB's primary-key range is larger, up to twice MyISAM's.

4) Row counting with count()

count(*) without a WHERE clause is much faster on MyISAM than on InnoDB, because MyISAM keeps a built-in row counter and simply reads it, whereas InnoDB must scan the whole table.

So on InnoDB, count() should generally come with a WHERE clause, and the WHERE should reference an indexed column other than the primary key. Why emphasize "other than the primary key"?

Because in InnoDB the primary-key index is stored together with the row data, while secondary indexes are stored separately with a pointer to the primary key. A bare count() is therefore faster scanning a secondary index; the primary key matters mainly when the index scan must also return the row data.

3. InnoDB's advantages

In practice the InnoDB storage engine offers many advantages: convenient operation, better database performance, and low maintenance cost. If the server crashes because of a hardware or software fault, no extra steps are needed after restarting: InnoDB's crash recovery automatically finalizes the changes that were committed before the crash, undoes the work of uncommitted transactions, and then resumes from the crash point.

InnoDB maintains a buffer pool in main memory, so frequently used data is served directly from memory. This caching applies to many kinds of information and greatly speeds up processing.

On a dedicated server, up to 80% of physical memory is often given to the buffer pool. Inserts, updates, and deletes are optimized by the change-buffering mechanism. InnoDB not only supports concurrent reads and writes, it also buffers changed data so that it can be streamed to the data files on disk.

If data must be inserted into related tables, foreign keys can be set up to strengthen data integrity: on update or delete, the associated data is automatically updated or deleted, and an attempt to insert into a child table without matching data in the parent table is automatically rejected.

If data on disk or in memory becomes corrupted, the checksum mechanism warns before the bad data is ever used. When each table's primary key is set sensibly, operations involving those columns are automatically optimized.

InnoDB's performance advantages are not limited to large tables with long-running queries. When the same columns are queried repeatedly, the adaptive hash index speeds up the lookups. With InnoDB you can compress tables and their indexes, and create or drop indexes without hurting performance or availability.

For large text and BLOB data, the dynamic row format gives a more efficient storage layout. Even on operating systems that limit file size to 2GB, InnoDB can still cope. When handling large data volumes, InnoDB makes full use of the CPU to reach maximum performance.

InnoDB tables can be mixed with tables of other storage engines within the same statement.

4. InnoDB and the ACID model

The ACID model is a set of database design rules emphasizing reliability, which is critical for business data and mission-critical applications. MySQL includes components such as the InnoDB storage engine that adhere closely to the ACID model, so that data is not corrupted and results are not distorted when something unexpected happens.

If you rely on the ACID model, you do not need your own consistency checks and crash-recovery mechanisms. Conversely, if you have extra software safeguards, extremely reliable hardware, or an application that can tolerate a small amount of data loss or inconsistency, you can tune MySQL to rely on only part of the ACID guarantees in exchange for higher performance. The following are the four aspects in which the InnoDB storage engine maps onto the ACID model.

1) Atomicity

The atomicity aspect of ACID mainly concerns InnoDB transactions. Related MySQL features include:

  • the autocommit setting
  • the COMMIT statement
  • the ROLLBACK statement
  • table data in the INFORMATION_SCHEMA database

2) Consistency

The consistency aspect of the ACID model mainly concerns the internal InnoDB processing that protects data from corruption. Related MySQL features include:

  • the InnoDB doublewrite buffer
  • InnoDB crash recovery

3) Isolation

Isolation is applied at the transaction level. Related MySQL features include:

  • the autocommit setting
  • the SET ISOLATION LEVEL statement
  • the low-level details of InnoDB locking

4) Durability

The durability aspect of the ACID model concerns MySQL software features interacting with the hardware configuration. Because hardware varies so much, there is no single rule; related MySQL features include:

  • the InnoDB doublewrite buffer, configured via innodb_doublewrite
  • the innodb_flush_log_at_trx_commit option
  • the sync_binlog option
  • the innodb_file_per_table option
  • the write cache of the storage device
  • the battery-backed cache of the storage device
  • the operating system running MySQL
  • continuous power supply
  • the backup strategy
  • for distributed or hosted applications, above all the location of the hardware and the network conditions

5. InnoDB architecture

1) Buffer pool

The buffer pool is a region of main memory used to cache table and index data that has been accessed. It keeps frequently used data available directly in memory, speeding up processing.

2) Change buffer

The change buffer is a special data structure that caches changes to secondary index pages when the affected pages are not in the buffer pool. When those index pages are later loaded into the buffer pool by other read operations, the buffered changes are merged in. Unlike the clustered index, secondary indexes are not necessarily unique.

When the system is mostly idle, a purge operation runs periodically, flushing the updated index pages to disk. While change-buffer merging is in progress, query performance may drop considerably. In memory, the change buffer occupies part of the InnoDB buffer pool; on disk, it is part of the system tablespace. The kinds of changes that are buffered are governed by the innodb_change_buffering option.

3) Adaptive hash index

The adaptive hash index combines the workload with sufficient memory to let InnoDB behave like an in-memory database, without sacrificing transactional performance or reliability. It is controlled by the innodb_adaptive_hash_index option, or can be turned off at server startup with --skip-innodb_adaptive_hash_index.

4) Redo log buffer

The redo log buffer holds data destined for the redo log. Its size is set by the innodb_log_buffer_size option. The buffer is flushed to the log files on disk periodically. A large redo log buffer lets large transactions run without writing to disk mid-flight.

5) System tablespace

The system tablespace contains the InnoDB data dictionary, the doublewrite buffer, the change buffer, and undo logs, and may also contain table and index data. Because it is shared across tables, it is regarded as a shared tablespace.

6) Doublewrite buffer

The doublewrite buffer resides in the system tablespace and receives data pages flushed from the buffer pool. Only after a page has been flushed and written to the doublewrite buffer does InnoDB write it to its proper location.

7) Undo log

The undo log is a collection of undo records associated with a transaction, describing how to undo that transaction's latest changes.

If another transaction needs to see the original data, the unchanged data can be reconstructed from the undo log records. Undo logs live in undo log segments, which are contained in rollback segments.

8) File-per-table tablespace

That is, one tablespace per table: each individual table's tablespace is created in its own data file rather than in the system tablespace. The feature is enabled with the innodb_file_per_table option. Each such tablespace is represented by a single .ibd data file, created by default in the database directory.

9) General tablespaces

Shared InnoDB tablespaces created with the CREATE TABLESPACE syntax. A general tablespace can be created outside the MySQL data directory, can hold multiple tables, and supports tables of all row formats.

10) Undo tablespaces

An undo tablespace consists of one or more files containing undo logs; the number of undo tablespaces is set by the innodb_undo_tablespaces option.

11) Temporary tablespace

User-created temporary tables and disk-based internal temporary tables are created in the temporary tablespace. The innodb_temp_data_file_path option defines its path, name, size, and attributes; if it is empty, an auto-extending data file is created by default in the directory given by the innodb_data_home_dir variable.

12) Redo log

The redo log is a disk-based data structure used during crash recovery to repair data. During normal operation, the redo log encodes requests that change InnoDB table data. After an unexpected crash, changes that did not finish reaching the data files are automatically replayed during initialization.

12、说说mysql 集群原理

一、什么是MySQL集群

MySQL集群是一个无共享的(shared-nothing)、分布式节点架构的存储方案,其目的是提供容错性和高性能。

数据更新使用读已提交隔离级别(read-committedisolation)来保证所有节点数据的一致性,使用两阶段提交机制(two-phasedcommit)保证所有节点都有相同的数据(如果任何一个写操作失败,则更新失败)。

无共享的对等节点使得某台服务器上的更新操作在其他服务器上立即可见。传播更新使用一种复杂的通信机制,这一机制专用来提供跨网络的高吞吐量。

通过多个MySQL服务器分配负载,从而最大程序地达到高性能,通过在不同位置存储数据保证高可用性和冗余。

二、架构图

三、如何存储数据

1.Mysqlcluster数据节点组内主从同步采用的是同步复制,来保证组内节点数据的一致性。一般通过两阶段提交 协议来实现,一般工作过程如下:

1)Master执行提交语句时,事务被发送到slave,slave开始准备事务的提交。

2)每个slave都要准备事务,然后向master发送OK(或ABORT)消息,表明事务已经准备好(或者无法准备该事务)。

3)Master等待所有Slave发送OK或ABORT消息

如果Master收到所有 Slave的OK消息,它就会向所有Slave发送提交消息,告诉Slave提交该事务;

如果Master收到来自任何一个Slave的ABORT消息,它就向所有 Slave发送ABORT消息,告诉Slave去中止事务。

4)每个Slave等待来自Master的OK或ABORT消息。

如果Slave收到提交请求,它们就会提交事务,并向Master发送事务已提交 的确认;

如果Slave收到取消请求,它们就会撤销所有改变并释放所占有的资源,从而中止事务,然后向Masterv送事务已中止的确认。

5) 当Master收到来自所有Slave的确认后,就会报告该事务被提交(或中止),然后继续进行下一个事务处理。

由于同步复制一共需要4次消息传递,故mysql cluster的数据更新速度比单机mysql要慢。所以mysql cluster要求运行在千兆以上的局域网内,节点可以采用双网卡,节点组之间采用直连方式。

Question: when the cluster is scaled out by adding data node groups, does the data update speed drop?

Answer: no, updates actually become faster, because the data is processed separately: each node group stores a different slice of the data, which also reduces locking.

2. MySQL Cluster keeps all indexed columns in main memory; other, non-indexed columns can be stored in memory or stored on disk through tablespaces. When data changes (insert, update, delete, and so on), the cluster writes the changed records to the redo log, and checkpoints periodically write the data to disk. Because the redo log is committed asynchronously, a small number of transactions may be lost during a failure. To reduce such loss, MySQL Cluster delays the write (two seconds by default, configurable) so that a checkpoint can complete when a failure occurs, without losing the last checkpoint. A single data node failure generally causes no data loss at all, because the cluster replicates data synchronously internally.

IV. Horizontal Scaling of MySQL Cluster

1. Add data node groups to scale writes and increase the cluster's storage capacity. Online scaling is supported: first join the new nodes to the cluster, start them, and then run

ALTER ONLINE TABLE table_name REORGANIZE PARTITION

to migrate the data and spread it evenly across the data nodes.

2. Adding slaves scales only reads; it does not scale writes horizontally.

The average load of the whole system can be described as:

AverageLoad = (Σ read load + Σ write load) / Σ capacity

Suppose each server can handle 10,000 transactions per second, and the master carries a write load of 4,000 transactions per second and a read load of 6,000 per second. Then:

AverageLoad = (6000 + 4000) / 10000 = 100%

Now add 3 slaves, raising the total capacity to 40,000 transactions per second. Because writes are replicated, each write is executed 4 times, so each slave also carries a write load of 4,000 transactions per second. The average load becomes:

AverageLoad = (6000 + 4 × 4000) / (4 × 10000) = 55%

V. Advantages and Disadvantages of MySQL Cluster

Advantages:

1) 99.999% availability

2) Fast automatic failover

3) A flexible distributed architecture with no single point of failure

4) High throughput and low latency

5) Strong scalability, with support for online scaling

Disadvantages:

1) Many limitations; for example, foreign keys are not supported

2) Deployment, administration, and configuration are complex

3) Large disk and memory footprint

4) Backup and recovery are inconvenient

5) On restart, it takes a long time for the data nodes to load the data back into memory

13. On operating systems: explain processes and threads

Processes and threads are both basic units of execution in an operating system, but there are some important differences between them.

A process is a running instance of a program. Each process has its own independent memory space, system resources, open files, and so on. Processes are isolated from one another, so one process crashing does not affect the others. Processes can communicate and cooperate through system calls.

A thread is a unit of execution inside a process; it shares the process's memory space and system resources with the other threads of that process. Multiple threads can run concurrently, cooperating with one another and sharing data and resources. Because threads share the process's memory, communication and synchronization between threads are much easier than between processes.

Differences:

  1. Memory space: a process has its own independent memory space, while threads share the memory space of their process.
  2. Resource usage: a process needs its own system resources to run, such as file handles and network connections, while threads can share these resources.
  3. Communication: processes communicate through system calls, while threads can communicate through shared variables or message passing.
  4. Creation and teardown cost: creating and destroying a process is far more expensive than creating and destroying a thread.

Applications:

Processes suit applications that must run independently and in isolation, such as compilers and database management systems. On a multi-core processor, multiple processes can fully exploit the cores.

Threads suit applications that switch frequently and share resources, such as GUI applications and network applications. Even on a single core, multiple threads improve an application's concurrency and responsiveness. The sketch below illustrates the difference.
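
As a minimal, hedged Java sketch of the distinction: threads started inside one JVM share its heap and see the same counter, while a child process launched with ProcessBuilder runs in its own address space (here we simply launch java -version, assuming a JDK on the PATH):

import java.util.concurrent.atomic.AtomicInteger;

public class ThreadVsProcess {
    // Shared state: visible to every thread in this process
    static final AtomicInteger counter = new AtomicInteger();

    public static void main(String[] args) throws Exception {
        // Two threads share the same heap, so they increment the same counter
        Thread t1 = new Thread(counter::incrementAndGet);
        Thread t2 = new Thread(counter::incrementAndGet);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("counter seen by threads: " + counter.get()); // prints 2

        // A child process gets its own address space; it cannot see `counter`
        Process child = new ProcessBuilder("java", "-version").inheritIO().start();
        System.out.println("child exit code: " + child.waitFor());
    }
}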

14. Describe the ways processes can communicate

Inter-process communication (IPC) is a mechanism provided by the operating system that lets multiple processes exchange information and data. Common IPC methods include (a small pipe sketch follows the list):

  1. Shared memory: one of the most common IPC methods. It lets two or more processes map the same block of memory so they can access the same data; once the mapping is established, no kernel copying is needed, which makes it the fastest method.
  2. Message passing: lets processes exchange messages carrying any kind of data: numbers, strings, structs, and so on. Message queues are commonly used to decouple and synchronize communicating processes.
  3. Shared files: two or more processes read and write the same file, often used for file transfer and for synchronizing state.
  4. Pipes: a kernel-managed channel into which one process writes data while another reads it. Pipes are what the shell's | operator uses to feed one command's output into the next command's input.
  5. Semaphores: kernel-maintained counters that processes use to coordinate access to shared resources, typically to implement mutual exclusion and synchronization.

Each of these methods has its own strengths and weaknesses and suits different scenarios.
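
A hedged Java sketch of pipe-based IPC, assuming a Unix-like system with the sort utility on the PATH: the parent writes lines into the child's stdin and reads the sorted result back from its stdout, and both streams are operating-system pipes:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class PipeDemo {
    public static void main(String[] args) throws Exception {
        // Launch `sort` as a child process; its stdin/stdout are OS pipes
        Process child = new ProcessBuilder("sort").start();

        // Parent -> child: write unsorted lines into the pipe
        try (BufferedWriter in = new BufferedWriter(
                new OutputStreamWriter(child.getOutputStream()))) {
            in.write("banana\napple\ncherry\n");
        }

        // Child -> parent: read the sorted lines back
        try (BufferedReader out = new BufferedReader(
                new InputStreamReader(child.getInputStream()))) {
            out.lines().forEach(System.out::println); // apple, banana, cherry
        }
        child.waitFor();
    }
}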

15. On networking: talk about HTTP and HTTPS

HTTP (Hypertext Transfer Protocol) and HTTPS (HTTP Secure) are both protocols that communicate over TCP/IP. HTTP is the plain hypertext protocol; HTTPS is its secure variant.

Characteristics:

  • HTTP is a plaintext protocol: all packets travel unencrypted and can be intercepted by an attacker.
  • HTTPS is an encrypted protocol: it uses public-key cryptography (TLS) to secure the transfer, so even if the data is intercepted it cannot be read.
  • HTTPS additionally provides server authentication and data integrity, protecting users' privacy and data.
  • HTTPS is therefore widely used by websites that transfer sensitive data.

Applications:

  • HTTP is used for public, non-sensitive traffic; data is transferred directly, which is fast and suitable for small transfers.
  • HTTPS is used by sites that handle sensitive data, such as online shopping and bank payments, where users' privacy and data security must be guaranteed.

16. Walk through a complete HTTP request

A complete HTTP request involves the following steps:

  1. The client sends an HTTP request message to the server.
  2. The server parses the request message and decides how to handle it, based on the method in the message (GET, POST, ...) and the requested resource (a file, data, ...).
  3. The server sends a response message back to the client, containing the returned data and status information.
  4. The client parses the data and status information in the response and determines whether the resource was fetched successfully or an error occurred.

Note that HTTP requests and responses travel over TCP, so a three-way handshake is needed to establish the connection, and segmentation and acknowledgements ensure the data is delivered reliably.

Breaking the exchange down further, the process looks like this:

  1. Establish a TCP connection: the client initiates a connection and the server accepts it.
  2. Send the request headers: over the established connection the client sends the request line and request headers, containing the method, request URL, protocol version, and so on.
  3. Send the request body: if the request carries data, the client sends a request body after the headers.
  4. The server returns the response headers: containing the status code, protocol version, and so on.
  5. The server sends the response body: containing the data it returns.
  6. Close the connection: after receiving the response, the client can close the TCP connection (or keep it alive for reuse).

Here is a simple example of the process:

Suppose a web server runs at IP address 192.168.1.100 on port 80, and the requested URL is http://www.example.com/index.html.

The client's browser sends an HTTP GET request whose headers contain the following:

GET /index.html HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.8
Connection: keep-alive
Cookie: userid=12345; sessionid=67890

After receiving the request, the server processes it according to the information in the headers and then returns an HTTP response to the client. In this example the server might return something like:

HTTP/1.1 200 OK
Date: Fri, 22 Oct 2021 10:30:00 GMT
Server: Apache/2.4.41 (Unix) OpenSSL/1.1.1f
Last-Modified: Tue, 21 Oct 2021 14:25:37 GMT
ETag: "e6d7cbe-f7b8-4a9f-b7c8-e7a9fef5f5e5"
Accept-Ranges: bytes
Content-Length: 32768
Connection: keep-alive
Content-Type: text/html; charset=UTF-8
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: -1
Set-Cookie: userid=12345; sessionid=67890; path=/; domain=www.example.com; expires=Sat, 23 Oct 2021 10:30:00 GMT; secure; HttpOnly
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Powered-By: PHP/7.4.15

<!DOCTYPE html>
<html>
  <head>
    <title>Hello World</title>
  </head>
  <body>
    <h1>Hello World!</h1>
  </body>
</html>

17. Describe the HTTP request line, request headers, and request body

An HTTP request consists of three parts: the request line, the request headers, and the request body.

I. Request line:

It contains the following information:

  • the HTTP method (GET, POST, ...)
  • the target URL
  • the HTTP version

For example, the request line of a GET request is the first line below (the lines that follow it are request headers):

GET /index.html HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.8
Connection: keep-alive
Cookie: userid=12345; sessionid=67890

II. Request headers:

The request headers carry metadata describing the request the client is sending. Common fields include:

  • Host: the hostname or IP address of the target server.
  • User-Agent: identifies the client, usually including the browser type and version.
  • Accept: the content types the client can accept.
  • Accept-Encoding: the content encodings (compression algorithms) the client can accept.
  • Accept-Language: the languages the client can accept.
  • Connection: how the connection should be handled; common values are keep-alive and close.
  • Cookie: the cookies the client sends back to the server.
  • Referer: the URL of the page from which the current request originated.
  • If-Modified-Since: asks the server to return the resource only if it changed after the given time; otherwise the client uses its cached copy.
  • If-None-Match: asks the server to return the resource only if its ETag no longer matches, that is, if the cached copy is stale.
  • Content-Type: the type of the data the client is sending; common values include application/json and application/x-www-form-urlencoded.
  • Content-Length: the length of the data the client is sending.

For example, the headers (and body) of a POST request might look like this:

Content-Type: application/json
Content-Length: 27

{"name": "John", "age": 30}

Here Content-Type says the data being sent is JSON, and Content-Length says the body is 27 bytes long. Note the blank line separating the headers from the body.

III. Request body:

The request body is the actual data the client sends. A GET request normally has no body at all; its parameters travel in the URL's query string. A POST request carries the data it submits in the body.

Within an HTTP message, the headers and the body are separated by a blank line (two consecutive CRLF sequences). A response message has the same overall shape, for example:

HTTP/1.1 200 OK
Date: Sat, 25 Feb 2023 10:00:00 GMT
Content-Type: application/json
Content-Length: 123
Connection: keep-alive

Here the HTTP version and status code indicate whether the request succeeded, Date is when the response was sent, Content-Type and Content-Length describe the type and length of the body, and Connection describes how the connection is handled. A 123-byte JSON body would follow the blank line.

18. Explain the difference between POST and PUT

The HTTP POST and PUT methods are both used to send data to the server, but their semantics differ. POST submits data to the target resource for processing; it typically creates a subordinate resource or triggers server-side processing, and it is not idempotent: sending the same POST twice may create two resources. PUT stores the request body at the request URI, replacing the current representation of that resource; it is idempotent: sending the same PUT twice leaves the resource in the same state as sending it once. Here is a simple Java example showing how each request is issued:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// POST: submit data for the server to process
public void postData(String url, String data) {
    try {
        URL obj = new URL(url);
        HttpURLConnection con = (HttpURLConnection) obj.openConnection();
        con.setRequestMethod("POST");
        con.setRequestProperty("Content-Type", "application/json");
        con.setDoOutput(true);

        OutputStream os = con.getOutputStream();
        os.write(data.getBytes());
        os.flush();
        os.close();
        con.getResponseCode(); // execute the POST request
    } catch (Exception e) {
        e.printStackTrace();
    }
}

// PUT: replace the resource at the given URL
public void putData(String url, String data) {
    try {
        URL obj = new URL(url);
        HttpURLConnection con = (HttpURLConnection) obj.openConnection();
        con.setRequestMethod("PUT");
        con.setRequestProperty("Content-Type", "application/json");
        con.setDoOutput(true);

        OutputStream os = con.getOutputStream();
        os.write(data.getBytes());
        os.flush();
        os.close();
        con.getResponseCode(); // execute the PUT request
    } catch (Exception e) {
        e.printStackTrace();
    }
}

In the example above, postData issues a POST, submitting data for the server to process, while putData issues a PUT, replacing the resource at the target URL with the data supplied.

19. Explain how DNS resolution works

A DNS (Domain Name System) server maps domain names to IP addresses. When a user types a URL into the browser, the browser asks the local DNS resolver for the IP address of that host. If the local resolver has no cached answer, it works its way through the DNS hierarchy, starting from a root server, until it reaches the server that is authoritative for the domain.

The resolution proceeds roughly as follows (a small lookup sketch in Java follows the list):

  1. The user enters a URL in the browser, such as www.example.com.
  2. The browser (after checking its own cache and the operating system's cache) sends a query to the local DNS resolver, specifying the target name (www.example.com) and the query type (an A record, MX record, and so on).
  3. If the local resolver has no cached answer for the name, it sends a query to a root DNS server.
  4. The root server usually does not know the final answer; instead it refers the resolver to the TLD name servers responsible for the top-level domain, .com in this case.
  5. The resolver then queries a .com TLD server, which refers it to the authoritative DNS server for example.com.
  6. The authoritative server looks up www.example.com in its zone and returns the corresponding IP address.
  7. The local resolver caches the answer (for the record's TTL) and returns the IP address to the browser.
  8. Finally, the browser uses that IP address to open a connection to the target site.
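
A minimal Java sketch for triggering this resolution from application code; the JVM delegates to the platform resolver, which performs the caching and the iterative queries described above:

import java.net.InetAddress;

public class DnsLookup {
    public static void main(String[] args) throws Exception {
        // Resolve a hostname to all of its addresses via the system resolver
        for (InetAddress addr : InetAddress.getAllByName("www.example.com")) {
            System.out.println(addr.getHostAddress());
        }
    }
}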

20. Do you know about ARP spoofing?

ARP (Address Resolution Protocol) maps an IP address to a MAC address. ARP spoofing is a network attack in which the attacker sends forged ARP replies to trick other devices on the network into believing the attacker's MAC address belongs to a legitimate IP address.

ARP spoofing comes in two broad forms:

  1. Static ARP spoofing: the attacker plants a forged ARP reply in advance so that other devices on the network accept it as genuine. When those devices later try to communicate with the spoofed address, their traffic is steered to the attacker, who can use it for further attacks.
  2. Dynamic ARP spoofing: the attacker keeps sending forged ARP replies so that other devices continually refresh their ARP caches with the attacker's MAC address; each new ARP request is answered by another forged reply, sustaining the deception.

ARP spoofing enables attacks such as man-in-the-middle interception and network disruption. To defend against it, administrators can pin static ARP entries, refresh and monitor ARP caches, and restrict ARP broadcasts; users should also avoid links from strangers or suspicious sites and prefer trusted, secure network connections.

21. An HTTP request's response time is too long: how do you analyze and fix it?

A long HTTP response time can have many causes. Possible analyses and fixes include (a small timing sketch follows the list):

  1. Network latency: check it with ping or traceroute. If latency is high, consider a different network provider or a CDN to speed things up.
  2. Overloaded server: monitor CPU, memory, and disk metrics to see whether the server is overloaded. If so, optimize the code, add servers, or introduce load balancing to spread the load.
  3. Slow database queries: inspect the database's slow query log to find the cause. Optimize indexes and query statements, or add caching to improve query efficiency.
  4. Slow static resources: use the browser's developer tools to measure how long static resources take to load. Use a CDN, or compress and bundle the assets to speed up loading.
  5. Overly complex program logic: convoluted logic makes responses slow; simplify the logic and reduce the number of database round-trips.

In short, analyze the whole request path, find the root cause, and apply the matching fix.
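
A sensible first step is simply measuring from the client side. Here is a minimal sketch using the JDK's HttpURLConnection (the URL is a placeholder); it times DNS resolution, connection setup, and the transfer together:

import java.net.HttpURLConnection;
import java.net.URL;

public class RequestTimer {
    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        HttpURLConnection con =
                (HttpURLConnection) new URL("http://www.example.com/").openConnection();
        int status = con.getResponseCode();  // sends the request, reads the status line
        con.getInputStream().readAllBytes(); // drain the response body
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("status=" + status + ", elapsed=" + elapsedMs + "ms");
    }
}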

22. Do you still remember compiler theory? Talk about it

Compiler theory is an important branch of computer science that studies how high-level languages are translated into machine language. A compiler is the tool that performs this translation: it turns high-level source code into target code the machine can understand and execute.

A compiler's main work includes lexical analysis, syntax analysis, semantic analysis, code optimization, and code generation. The lexer turns the source code into a stream of tokens; the parser assembles those tokens into a syntax tree; the semantic analyzer checks the tree against the language's semantic rules. The optimizer improves the code's efficiency and performance, and finally the code generator emits target code the machine can run.

Studying compiler theory matters for understanding how computers work, for improving program efficiency, and for building high-quality compilers and interpreters.

Compilation involves the following main phases (a toy lexer sketch follows the list):

  1. Lexical analysis: break the source code into meaningful symbols, such as keywords, identifiers, and operators.
  2. Parsing: turn the token sequence into a syntax tree, a tree structure representing the structure of the source code.
  3. Semantic analysis: statically check the syntax tree to verify the program obeys the language's syntactic and semantic rules.
  4. Intermediate code generation: translate the syntax tree into intermediate code, a low-level representation close to assembly language.
  5. Optimization: improve the intermediate code to raise the performance of the generated target code.
  6. Target code generation: translate the intermediate code into target code, the machine language the computer can execute.

In practice these phases are implemented with a variety of techniques, such as regular expressions, recursive-descent parsers, parser generators, semantic analyzers, intermediate code generators, and optimizers, and every programming language has its own compilation pipeline.
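
To make the first phase concrete, here is a hedged, toy lexical analyzer in Java for arithmetic expressions; real lexers are usually generated from regular-expression specifications, but the idea is the same:

import java.util.ArrayList;
import java.util.List;

public class ToyLexer {
    // Split an arithmetic expression into NUMBER, IDENT, and OP tokens
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int j = i;
                while (j < src.length() && Character.isDigit(src.charAt(j))) j++;
                tokens.add("NUMBER(" + src.substring(i, j) + ")");
                i = j;
            } else if (Character.isLetter(c)) {
                int j = i;
                while (j < src.length() && Character.isLetterOrDigit(src.charAt(j))) j++;
                tokens.add("IDENT(" + src.substring(i, j) + ")");
                i = j;
            } else {
                tokens.add("OP(" + c + ")"); // single-character operator
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // [IDENT(x), OP(=), NUMBER(12), OP(+), IDENT(y), OP(*), NUMBER(3)]
        System.out.println(tokenize("x = 12 + y * 3"));
    }
}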

23. Talk about semantic and syntax analysis

Semantic and syntax analysis is the process of decomposing code into an abstract syntax tree (AST). By breaking the code into smaller, more basic units, it makes the code easier to understand and optimize.

The analyzer works through the following steps:

  1. Lexical analysis: convert the source code into small pieces called tokens, each representing one element of the source.
  2. Syntax analysis: convert the tokens into an abstract syntax tree. The AST represents the structure of the source; each node stands for one syntactic unit, such as a function, an expression, or a declaration.
  3. Semantic analysis: analyze the AST and annotate it with semantic information. Each node carries a type, saying what kind of construct it is, and a value.
  4. Optimization: with the AST in hand, the code can be optimized, for example by removing redundant nodes or restructuring the tree.

Semantic and syntax analysis is useful in both compilers and interpreters. A compiler turns the source into an AST and then generates optimized code from it; an interpreter walks the AST (or code derived from it) and executes it directly.

24. Describe the JVM memory model

The Java Virtual Machine (JVM) organizes memory around three ideas:

  1. Stack memory: local variables and method-call frames live on a stack; parameters and return values are pushed and popped as methods are called. Each thread has its own stack, so the local variables on it are private to that thread.
  2. Heap memory: objects live on the heap, which is shared by all threads. When a thread allocates an object, space is reserved on the heap for it; once the object is no longer needed, that space can be reclaimed or reassigned to other objects.
  3. Garbage-collected memory management: the JVM uses a garbage collector to manage the heap. It tracks which objects are still in use and which can be reclaimed; when an object is no longer reachable, the JVM collects it to make room for new objects.

The JVM's garbage collection mechanism handles allocation and reclamation automatically, which lets multithreaded programs run concurrently without manual memory management. Garbage collectors are built from algorithms such as mark-sweep, copying, and mark-compact, each suited to different scenarios.

25. Describe the JVM garbage collection algorithms

JVM garbage collection algorithms mainly fall into the following kinds:

  1. Mark-sweep: mark all objects that need to stay, then sweep away the memory occupied by the unmarked objects. Mark-sweep is the most basic algorithm, but it leaves a lot of memory fragmentation, and fragmented memory degrades later allocation.
  2. Copying: divide memory into two halves and use only one at a time. At collection time, live objects are copied into the other half and the old half is cleared wholesale. This avoids fragmentation, but copying the live objects costs extra work.
  3. Mark-compact: mark all reachable objects, slide the survivors to one end of the region, then reclaim everything beyond them. Mark-compact avoids fragmentation and the copying algorithm's space overhead, but the compaction pass takes longer.
  4. G1 (Garbage-First collector): an incremental, region-based collector. G1 divides the heap into regions, tracks how much garbage each region holds, and collects the most garbage-rich regions first while aiming at a configurable pause-time target, rather than waiting to collect the whole heap at once. It offers good, predictable performance and is one of the most widely used collectors today.
  5. Other collectors also exist, such as CMS and the low-pause ZGC and Shenandoah collectors, each with its own applicable scenarios.

26. On Redis: explain how sorted sets work under the hood

A Redis sorted set is an ordered collection implemented with two data structures: a skip list and a hash table. A skip list is a linked-list-like structure in which each node carries several forward pointers, so lookups can skip over many nodes at once, which makes search fast. A hash table stores key-value pairs and supports fast insertion, lookup, and deletion.

In a sorted set, every element has a score, and the elements are ordered by score. The skip list makes score comparison and ordering fast, while the hash table makes member lookup and deletion fast. For small sorted sets, Redis also uses the compact ziplist encoding to store the elements and save memory.

In more detail, sorted sets behave as follows:

  1. Dual structure: each sorted set is backed by a hash table that maps every member to its score, and a skip list that keeps the members ordered by score. The two structures share the member strings, so the memory overhead stays modest.
  2. Scores: every member carries a floating-point score supplied by the client (for example via ZADD). The score determines the ordering; ties are broken by comparing the members lexicographically.
  3. Adding and removing members: on insert or update, Redis updates the member's score in the hash table and repositions the member in the skip list. Looking up a member's score is O(1) via the hash table; ordered operations are O(log N) via the skip list.
  4. Range queries: queries by score range (such as ZRANGEBYSCORE) walk the skip list, so finding all members within a score interval is efficient.
  5. Iteration: commands such as ZRANGE and ZSCAN iterate over the members in order and can return each member together with its score.

In short, sorted sets are built on a hash table plus a skip list, and they support insertion, deletion, range queries, and iteration. Because range queries and ordering are fast, they are widely used in practice, for example for leaderboards.
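
A hedged usage sketch with the Jedis client; the leaderboard key, the members, and the scores are made up for illustration:

import redis.clients.jedis.Jedis;

public class SortedSetDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Each member carries a score; the skip list keeps them ordered
            jedis.zadd("leaderboard", 120, "alice");
            jedis.zadd("leaderboard", 95, "bob");
            jedis.zadd("leaderboard", 150, "carol");

            // Range query by rank: top players, highest score first
            System.out.println(jedis.zrevrange("leaderboard", 0, 2));

            // Range query by score: everyone scoring between 100 and 200
            System.out.println(jedis.zrangeByScore("leaderboard", 100, 200));
        }
    }
}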

27. Talk about Redis persistence

Redis persistence means saving the data held in Redis to disk so it can be recovered after a server restart or crash. Redis supports two persistence mechanisms: RDB (Redis Database) snapshots and the AOF (Append-Only File) log.

  1. RDB persistence: RDB writes a point-in-time snapshot of the in-memory dataset to disk; the snapshot interval and compression options are configurable. Its advantages are compact snapshot files that occupy relatively little disk space and fast data recovery. Its drawback is that a crash between snapshots can lose the data written since the last snapshot.
  2. AOF persistence: AOF records every write operation Redis executes to a log file; on restart, the server replays the operations in the log to rebuild the data. Its advantage is better completeness and reliability, since every write is recorded. Its drawbacks are that it needs more disk space than RDB and that recovery can be slower.

Beyond these two mechanisms, Redis also supports master-slave replication, copying data to other nodes for backup and load balancing.

In short, Redis persistence saves the data in Redis to disk so it survives restarts and crashes; Redis supports both RDB and AOF, and you can configure whichever suits your actual needs.

28. A TB-scale log file stores words; find the ten that appear most frequently

For a TB-scale log file, a MapReduce framework is a natural way to find the ten most frequent words.

The implementation breaks down into the following steps:

  1. Split the log file into smaller files, each no larger than the HDFS block size.
  2. Using MapReduce, count the words in each split and emit the counts as key-value pairs, where the key is the word and the value is how many times it appeared.
  3. Merge the per-split counts, summing the values of identical keys, to get each word's total count.
  4. Sort the words by count and take the ten most frequent.

Here is an example implementation in Java:

import java.io.IOException;
import java.util.StringTokenizer;
import java.util.TreeMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TopTenWords {

    // Map phase: emit (word, 1) for every whitespace-separated token
    public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reduce phase: sum the counts per word and keep a running top-10
    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        // Sorted by count; note that words with identical counts overwrite
        // each other here -- use TreeMap<Integer, List<String>> to keep ties.
        private TreeMap<Integer, String> topTenWords = new TreeMap<Integer, String>();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            topTenWords.put(sum, key.toString());
            if (topTenWords.size() > 10) {
                topTenWords.remove(topTenWords.firstKey()); // drop the smallest count
            }
        }

        // Emit the surviving top-10 once every key has been reduced
        protected void cleanup(Context context) throws IOException, InterruptedException {
            for (Integer count : topTenWords.descendingKeySet()) {
                context.write(new Text(topTenWords.get(count)), new IntWritable(count));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "top ten words");
        job.setJarByClass(TopTenWords.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // A single reducer so the top-10 is global rather than per-reducer
        job.setNumReduceTasks(1);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

In this example, the WordCountMapper and WordCountReducer classes implement the Map and Reduce phases. WordCountMapper splits each input line on whitespace and emits a count of 1 for every word. WordCountReducer sums the counts of identical words, keeps the running results in a TreeMap, and finally outputs the ten most frequent words.

In the main function we build a Job object, set the input and output paths, and launch the MapReduce job.

Note that in practice you would add further optimizations, such as a Combiner to pre-aggregate counts on the map side, compression, and tokenization that strips punctuation.

29. Talk about consistent hashing

Due to article length limits, another 20k words do not fit. For the answer to this question, get the PDF "Nien Java Interview Collection", Topic 41: Real Interview Questions from Major Companies, from the official account [Technical Freedom Circle] at the end of this article.

30. Talk about multi-pattern matching algorithms

A multi-pattern matching algorithm (multi-mode matching algorithm) looks for matches of several patterns at once. The basic idea is to treat each pattern as its own search space and perform matching within those spaces.

Multi-pattern matching typically involves the following steps:

  1. Initialization: mark the characters of each pattern by the positions where they occur, and store those positions in a two-dimensional array.
  2. First matching pass: scan the characters of the first pattern from left to right, looking up matching characters in the second pattern, and record the positions of any matches.
  3. Second matching pass: scan the first pattern from top to bottom, again recording the positions of matches found in the second pattern.
  4. Third matching pass: scan the first pattern from right to left, recording the positions of any matches.
  5. Fourth matching pass: scan the first pattern from bottom to top, recording the positions of any matches.
  6. Final matching: using the recorded positions, locate all matches across the patterns and output them.

The strength of multi-pattern matching is that it handles matching across several patterns and allows flexible selection and combination among them. Its weaknesses are a relatively high time complexity, since every pattern is scanned and matched repeatedly, and a large memory footprint, since the position information of every pattern's characters must be stored. (In practice, classic multi-pattern string matching is usually done with automaton-based algorithms such as Aho-Corasick.)

31. Do you know web containers? JBoss, Tomcat

A web container is software for deploying and running web applications. It packages an application's code and resources into an executable, extensible, and secure unit.

JBoss is a popular Java EE application server that runs on different operating systems, such as Linux, Windows, and Solaris. It provides many features, including:

  • application deployment and management
  • load balancing
  • authentication and authorization
  • caching and reverse proxying
  • security
  • logging and monitoring

JBoss is typically used to deploy full Java EE applications, including web applications and EJB applications.

Tomcat is another popular web container that also runs on different operating systems, such as Linux, Windows, and Solaris. It provides:

  • application deployment and management
  • authentication and authorization
  • security
  • logging and monitoring

Tomcat is typically used to deploy Java web applications (servlets and JSP). Unlike JBoss, it is a servlet container rather than a full Java EE application server, so it does not host EJB applications by itself. It is open source and free to use and modify.

Second interview wrap-up

Not that many questions, just 30, but all of them hard-core.

The candidate didn't answer all of them perfectly either; it took about an hour.

Luckily, the second interview was passed too.

Third interview (52min)

1. Talk about the pros and cons of Spring 3

Spring is an open-source Java enterprise application framework that provides a rich set of features and components to help developers build high-quality enterprise applications quickly. Spring 3 is a major version of the framework; it inherits the strengths of Spring 2 and adds new features and improvements.

Advantages of Spring 3:

  1. More flexible configuration: Spring 3 introduced annotation-based configuration, making configuration more concise and maintainable, while still supporting XML-based configuration for different needs.
  2. More modular design: the framework is divided into modules, each with its own responsibility and functionality. This modular design makes Spring 3 more flexible and extensible, letting you integrate only the modules you need.
  3. More complete AOP support: Spring 3's AOP support is more flexible and allows custom aspects and advice.
  4. Stronger data access: Spring 3 provides more powerful and flexible data access technologies, including JDBC, ORM frameworks, and NoSQL support, helping developers access data efficiently.
  5. More secure authentication: Spring 3 (with Spring Security) brings mechanisms such as role-based access control and form-based authentication, helping developers better protect their applications.

Disadvantages of Spring 3:

  1. A steep learning curve: Spring 3 introduces many new features and concepts, so mastering its usage and principles takes considerable time.
  2. Higher memory and CPU consumption: because of its functionality and complexity, Spring 3 needs more memory and CPU to run, which may affect application performance.

2. Explain how Struts2 and Spring MVC work, and their differences

Struts2 and Spring MVC are both common Java web frameworks, each providing rich features and components for building high-quality web applications quickly.

How Struts2 works

Struts2 is a web framework based on the MVC (Model-View-Controller) pattern, dividing request handling among three parts: the model holds the data sent by the client, the view is the user interface, and the controller processes the request and hands the data to the appropriate view for rendering. Struts2 also provides an interceptor mechanism that can intercept and process a request before it reaches the controller, allowing flexible extension.

How Spring MVC works

Spring MVC is also an MVC-based web framework; its request handling involves five parts: the controller, the model, the view, the handler, and the view resolver. The controller processes the request and passes data to the model; the model carries the business data; the view is the user interface; the handler processes the model and view; and the view resolver turns the handler's result into the final view. Spring MVC likewise provides interceptors that can process a request before it reaches the controller.

Differences:

  1. Design philosophy: Struts2 is built around MVC, while Spring MVC builds on MVC plus the AOP and IoC (Inversion of Control) facilities of the Spring container.
  2. Workflow: in Struts2 a request first passes through the interceptor stack and is then handled by an Action; in Spring MVC the DispatcherServlet receives the request first and the HandlerMapping finds the Controller to handle it.
  3. Configuration: Struts2 is usually configured with .xml or .properties files; Spring MVC is usually configured with .xml files or Java annotations.
  4. Extension: in Struts2 you extend functionality by writing Interceptors; in Spring MVC you write HandlerInterceptors or ControllerAdvice classes.

3. Talk about Memcached, Redis, and MongoDB

Memcached, Redis, and MongoDB are all widely used NoSQL data stores, but their characteristics and use cases differ.

1. Memcached

Memcached is a high-performance distributed in-memory object cache, used mainly to take load off the database. Its characteristics:

  • Simple: Memcached supports only key-value storage, with no rich data structures.
  • Fast: read and write performance is very high, easily handling highly concurrent requests.
  • Distributed: Memcached can be deployed across nodes, scaling out by adding servers.
  • No persistence: data lives only in memory and is lost on restart.

Use cases: scenarios that need a fast cache, such as session management and page caching in web applications.

2. Redis

Redis is a high-performance key-value store supporting several data structures, including strings, hashes, lists, sets, and sorted sets. Its characteristics:

  • Multiple data structures: Redis's data types cover many different application needs.
  • Persistence: Redis can asynchronously write its in-memory data to disk, protecting against data loss.
  • Transactions: multiple commands can be wrapped into one transaction and executed together, keeping the data consistent.
  • Distributed: Redis can be deployed across nodes, scaling out through replication and clustering.

Use cases: scenarios that need both a fast cache and data storage, such as session management, page caching, and message queues in web applications.

3. MongoDB

MongoDB is a document-oriented NoSQL database that stores data in a JSON-like document format. Its characteristics:

  • Document-oriented: each document can contain different fields, which is very flexible.
  • Rich queries: MongoDB supports complex queries and aggregation, convenient for data analysis.
  • Distributed: MongoDB scales out by adding nodes, through sharding and replication.
  • Limited transactions: historically MongoDB did not guarantee multi-document consistency (multi-document transactions arrived in version 4.0).

Use cases: storing large volumes of unstructured data, such as logs and user-behavior data in web applications.

4. Compare Memcached and Redis

Memcached and Redis are both popular high-performance, highly available in-memory caching systems. A comparison:

  1. Data types: Memcached supports only key-value storage, while Redis supports richer types, including strings, hashes, lists, sets, and more.
  2. Performance: both are very fast. Memcached is multi-threaded and can exploit multiple cores; Redis processes commands on a single thread (before Redis 6.0) yet remains extremely fast thanks to its in-memory design and event-driven I/O.
  3. Reliability: Memcached keeps data only in memory and has no persistence, so data is lost on restart. Redis supports RDB/AOF persistence and replication, so data can be recovered after a restart.
  4. Scalability: Redis scales well through cluster deployment, growing the node count to raise performance and fault tolerance. Memcached has no built-in clustering; scaling relies on client-side sharding across additional servers.
  5. Features: Redis offers much more, such as publish/subscribe and Lua scripting, while Memcached stays deliberately simple.

In short, Memcached suits simple caching scenarios, while Redis suits more complex ones, such as real-time messaging and leaderboards. Which to choose depends on your requirements and scenario.

5. Talk about Memcached's default expiration time

Memcached is an in-memory caching system that can cache all kinds of data: strings, objects, images, and so on. Every cache entry in Memcached has an expiration time; once it is reached, the entry is automatically removed from the cache.

Memcached's default expiration time is 0, meaning never expire. If you do not set an expiration time explicitly, cached entries stay until you delete them manually or Memcached evicts them when its memory fills up.

You can of course control an entry's lifetime by setting an expiration time, passed as the third argument of the set command, for example:

set('key', 'value', 3600)  # cache the key-value pair for 1 hour

In the example above, the entry's expiration time is 3600 seconds, that is, one hour; after one hour the entry is removed from the cache automatically.

Note that Memcached's expiration time is approximate rather than exact: on a very busy cache system, expiry may lag. When designing a caching layer, set expiration times sensibly to avoid cache-inconsistency problems caused by delayed expiry.
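
For instance, with the spymemcached Java client; the host, port, and keys are assumptions for illustration:

import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class MemcachedExpiryDemo {
    public static void main(String[] args) throws Exception {
        MemcachedClient client =
                new MemcachedClient(new InetSocketAddress("localhost", 11211));
        client.set("session:42", 3600, "user-data");  // expires after 1 hour
        client.set("config:app", 0, "never-expires"); // 0 = no expiration
        System.out.println(client.get("session:42"));
        client.shutdown();
    }
}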

6. Talk about Redis data structures

Redis supports several data structures, including:

  1. String: the most basic Redis type, holding a string, integer, or floating-point value.
  2. List: a doubly linked list; elements can be added or removed at either end, with operations such as range queries, insertion, and deletion.
  3. Set: an unordered collection of unique strings, supporting intersection, union, and difference operations.
  4. Sorted set (ZSet): a set whose members each carry a score used for ordering, supporting operations such as range queries, insertion, and deletion.
  5. Hash: a collection of field-value pairs stored under one key, supporting operations such as add, delete, and lookup.

Beyond these basic structures, Redis also supports some higher-level ones:

  1. Bloom filter: a probabilistic data structure for quickly testing whether an element might belong to a set, useful for checking whether a URL has already been crawled or an email address already registered.
  2. HyperLogLog: a cardinality estimator for quickly counting the number of distinct elements in a set, such as a site's unique visitors.
  3. Geo: stores geographic coordinates and supports operations such as computing the distance between two positions or finding other positions near a given one.

In short, Redis provides a rich set of data structures covering many different application scenarios; the sketch below shows a few of them in use.
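
A hedged Java sketch with the Jedis client; the keys and values are made up for illustration:

import redis.clients.jedis.Jedis;

public class RedisStructuresDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.set("greeting", "hello");              // string
            jedis.lpush("tasks", "t1", "t2");            // list: push at the head
            jedis.sadd("tags", "java", "redis", "java"); // set: duplicates ignored
            jedis.hset("user:1", "name", "Alice");       // hash: field -> value

            System.out.println(jedis.get("greeting"));
            System.out.println(jedis.lrange("tasks", 0, -1));
            System.out.println(jedis.smembers("tags"));
            System.out.println(jedis.hgetAll("user:1"));
        }
    }
}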

7. Talk about full replication and incremental replication

Full replication and incremental replication are two important concepts in data backup and recovery.

Full replication copies all data from one database instance to another, including every row, the schema, and the configuration. It suits cases that need an exactly identical backup and recovery, such as database migration or rebuilding. Its drawbacks are the large data volume and long copy time, and reconciling differences between the source and target databases is difficult.

Incremental replication copies only the data that has changed on the source to the target. It suits cases that need only partial backup and recovery, such as ongoing database updates or data synchronization. It is fast, efficient, and cheap, and differences between source and target are easier to handle.

In short, each approach has its own advantages and disadvantages; which to use depends on the application scenario and requirements.

8. Talk about MongoDB

MongoDB is a NoSQL database that stores data as documents and supports dynamic queries and indexing. Here are its features, pros and cons, and applications.

Features:

  1. Dynamic queries and indexes: MongoDB stores data in BSON (Binary JSON) format and supports dynamic queries and indexing, enabling fast lookup and analysis.
  2. Replication and failover: replica sets keep the data highly available and reliable.
  3. Sharding: automatic sharding lets MongoDB scale to very large datasets.
  4. MapReduce: supported for complex data analysis and aggregation.
  5. Full-text search: supported for efficient text search and analysis.

Advantages:

  1. High performance: techniques such as memory-mapped files and space preallocation give fast reads and writes.
  2. Easy to scale: automatic sharding and replication make scaling straightforward.
  3. Flexibility: the document model comfortably stores data whose structure is not fixed.
  4. Easy to use: MongoDB is simple to work with and supports many programming languages and platforms.

Disadvantages:

  1. Limited transactions: historically only single-document transactions were supported (multi-document transactions were added in version 4.0).
  2. Larger storage footprint: BSON storage takes relatively more space.
  3. Limited complex queries: operations such as multi-table joins are not supported the way they are in relational databases.

Applications:

  1. Web applications: data storage and querying.
  2. Big-data analysis: via MapReduce and aggregation.
  3. Log processing and analysis.
  4. Mobile applications: data storage and querying.

Overall, MongoDB is a high-performance, easily scalable, and flexible NoSQL database suitable for many application scenarios.

9. Compare MongoDB with Redis and Memcached, and with MySQL

Due to article length limits, another 20k words do not fit. For the answers to this and the following questions, get the PDF "Nien Java Interview Collection", Topic 41: Real Interview Questions from Major Companies, from the official account [Technical Freedom Circle] at the end of this article.

10. Talk about MyISAM and InnoDB

11. Talk about the basic properties of transactions

12. Talk about MongoDB indexes

13. Does MongoDB have transactions?

14. Talk about MongoDB persistence

15. Talk about distributed transactions

16. Talk about operating-system memory management

A final word:

In Nien's (50+) reader community, many, many friends need to get into big tech companies and earn a high salary.

The Nien team will keep working through real interview questions from major companies, mapping out a learning path and what everyone needs to study.

Earlier articles covered real questions from ByteDance and Didi:

Ele.me was brutal: interviewing a senior Java engineer with this many hard-core questions

ByteDance grilled for an hour, the candidate got the offer: brutal!

Landing a Didi offer: what the candidate's three interview rounds say you need to learn

These real questions are all collected in the continuously updated PDF e-book "Nien Java Interview Collection".

This article is included in "Nien Java Interview Collection" V72; get it from the official account [Technical Freedom Circle] at the end of this article.

Basically, once you have digested Nien's "Nien Java Interview Collection", offers from major companies are easy to land. If you have requests for the next batch of interview write-ups, message Nien.

The path to technical freedom, in PDF:

Achieve your architecture freedom:

Master 8 diagrams and 1 template: anyone can do architecture

A 100K-QPS comment platform: how do you architect it? This is how Bilibili does it!!!

Alibaba second interview: tens of millions to billions of rows, how do you optimize performance? A textbook answer

Peak 210K QPS and hundreds of millions of DAU: how the mini-game "Sheep a Sheep" is architected

Scheduling 10 billion orders: a first-class solution from a major company

Two big companies' architectures for 10-billion-scale high-traffic red envelopes

... more architecture articles are being added

Achieve your reactive freedom:

The reactive bible: 100K words to master Spring reactive programming

The older edition: "Flux, Mono, Reactor in practice (the most complete ever)"

Achieve your Spring Cloud freedom:

The Spring Cloud Alibaba study bible

Sharding-JDBC for database and table sharding: underlying principles and core practice (the most complete ever)

One article to sort out the tangled relationships among SpringBoot, SLF4j, Log4j, Logback, and Netty (the most complete ever)

Achieve your Linux freedom:

The big book of Linux commands: 20K+ words to achieve Linux freedom in one pass

Achieve your networking freedom:

TCP explained in detail (the most complete ever)

The three network tables: the ARP table, the MAC table, and the routing table, for your networking freedom!!

Achieve your distributed-lock freedom:

Redis distributed locks (illustrated, instantly clear, the most complete ever)

ZooKeeper distributed locks, illustrated and instantly clear

Achieve your king-component freedom:

The king of queues: Disruptor principles, architecture, and source code in one article

The king of caches: Caffeine source code, architecture, and principles (the most complete ever, a 100K-word long read)

The king of caches: using Caffeine (the most complete ever)

Java Agent probes and bytecode instrumentation with ByteBuddy (the most complete ever)

Achieve your interview-question freedom:

The 4000-page "Nien Java Interview Collection", 40 topics

For updates to the above architecture notes and interview-question PDFs, ▼ get them from the official account [Technical Freedom Circle] below ▼
