Ele.me is brutal: just how hard a senior Java interview can be (real Ele.me interview questions)

Foreword:

In the reader community (50+ groups) of 40-year-old architect Nien, there are often members preparing to interview at big companies such as Ele.me, Toutiao, Meituan, Alibaba, and JD.com. Many of them have turned their careers around and landed high-end offers.

Recently, a reader with 6 years of experience landed an annual salary of 60W (600K RMB), which is very impressive.

The following comes from a reader who successfully got a senior Java offer from Ele.me. His overall impression of the interview boils down to two words:

  • deep: very deep
  • wide: very wide

Next, let's look at the topics he was asked about, and what you need to learn to land an offer from Ele.me.

The following interview questions are also very useful for interviewing other senior Java positions.

The questions and reference answers are also included in V71 of our "Nien Java Interview Collection", for later readers' reference, to help everyone improve their "three highs" (high concurrency, high performance, high availability) architecture, design, and development skills.

Note: this article is continuously updated as a PDF. For the PDF files of the latest Nien architecture notes and interview questions, please get them from the official account [Technical Freedom Circle] at the end of the article.


Ele.me interview topic:

1. Tell me about the isolation level of database transactions?

The isolation level of database transactions refers to the different degrees of isolation between transactions when accessing the database concurrently. There are four common isolation levels:

  1. Read Uncommitted : the lowest isolation level. A transaction can read data that another transaction has not yet committed, so dirty reads, non-repeatable reads, and phantom reads can all occur.
    It offers the best concurrency and suits read-heavy scenarios, but because uncommitted data is visible to other transactions, it must be used with caution.
  2. Read Committed : a higher isolation level. A transaction can only read data that other transactions have already committed, which avoids dirty reads, but non-repeatable reads and phantom reads can still occur.
    It suits read-heavy scenarios and gives better consistency, at some cost to concurrency.
  3. Repeatable Read : a still higher isolation level. Within one transaction, reading the same data multiple times always returns the same result, which avoids dirty reads and non-repeatable reads, but phantom reads can still occur.
    It suits scenarios that require strong consistency, such as bank transfers and order processing, but the extra locking during the transaction can reduce concurrency.
  4. Serializable : the highest isolation level. All transactions execute as if serially, which avoids dirty reads, non-repeatable reads, and phantom reads, but has the greatest impact on performance.
    It suits scenarios with extremely high consistency requirements, such as financial transactions, at the cost of concurrency.

In actual development, according to specific business needs and performance requirements, different isolation levels can be selected to balance data consistency and concurrency performance.

isolation level   | read data consistency                                                | dirty read | non-repeatable read | phantom read
read uncommitted  | lowest level; only guarantees physically corrupted data is not read | yes        | yes                 | yes
read committed    | statement level                                                      | no         | yes                 | yes
repeatable read   | transaction level                                                    | no         | no                  | yes
serializable      | highest level; transaction level                                     | no         | no                  | no

The table lists the four common transaction isolation levels and how they handle dirty reads, non-repeatable reads, and phantom reads. A dirty read means one transaction reads data that another transaction has not yet committed; a non-repeatable read means a transaction reads the same row multiple times but gets different results because another transaction modified it in between; a phantom read means a transaction reads the same range of rows multiple times but gets different results because another transaction inserted or deleted rows in that range.
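
To make this concrete, here is a minimal JDBC sketch showing how an isolation level is chosen per connection; the JDBC URL, credentials, and account table are hypothetical placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class IsolationLevelDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; replace with your own data source.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/demo", "user", "password")) {
            conn.setAutoCommit(false);
            // Pick one of the four standard isolation levels.
            conn.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);

            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT balance FROM account WHERE id = 1")) {
                // Under REPEATABLE READ, re-running this query inside the same
                // transaction sees the same snapshot even if other transactions commit changes.
                while (rs.next()) {
                    System.out.println("balance = " + rs.getBigDecimal("balance"));
                }
            }
            conn.commit();
        }
    }
}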

2. Talk about the major characteristics of transactions and how they are implemented

A transaction is a sequence of operations performed as a single logical unit of work: either all of them take effect or none of them do.

ACID

Transactions have four key properties, known as ACID:

  1. Atomicity : all operations in a transaction either all succeed or all fail and are rolled back; executing only part of them is not allowed.
  2. Consistency : before and after the transaction executes, the database must remain in a consistent state, that is, all constraints are satisfied.
  3. Isolation : transactions are isolated from each other; the execution of one transaction should not affect another. Each transaction behaves as if it were the only one running and is unaware of the others.
  4. Durability : once a transaction is committed, its modifications to the database are permanent and survive even a system crash.

Implementation principles

Transactions are implemented with support from the database management system, typically through logging and locking.

Logging : during transaction execution, the DBMS records all operations in logs (for example, undo and redo logs). If the transaction fails, the logs are used to roll it back, and committed changes can be replayed after a crash, which guarantees atomicity and durability.

Locking : to guarantee isolation between transactions, the DBMS uses locks. When a transaction modifies a piece of data it locks it, and other transactions must wait for the lock to be released before modifying the same data. (MVCC is another common mechanism for providing isolation with less blocking.)
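
As a minimal illustration of atomicity and durability in application code, here is a hedged JDBC sketch of a transfer that either commits both updates or rolls both back; the connection details and account table are hypothetical.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TransferDemo {
    public static void transfer(long fromId, long toId, long amount) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/demo", "user", "password")) {
            conn.setAutoCommit(false);              // start a transaction
            try (PreparedStatement debit = conn.prepareStatement(
                         "UPDATE account SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = conn.prepareStatement(
                         "UPDATE account SET balance = balance + ? WHERE id = ?")) {
                debit.setLong(1, amount);
                debit.setLong(2, fromId);
                debit.executeUpdate();
                credit.setLong(1, amount);
                credit.setLong(2, toId);
                credit.executeUpdate();
                conn.commit();                      // both updates become durable together
            } catch (SQLException e) {
                conn.rollback();                    // atomicity: neither update takes effect
                throw e;
            }
        }
    }
}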

3. How to use redis to implement message publishing and subscription?

Redis implements message publishing and subscription through its publish/subscribe (Pub/Sub) mechanism.

Principle

Redis is implemented in C; you can understand the underlying implementation of the publish/subscribe mechanism by reading the pubsub.c file in the Redis source code.

Redis exposes the publish/subscribe feature through commands such as PUBLISH, SUBSCRIBE, and PSUBSCRIBE.

redis-server maintains a dictionary whose keys are channel names and whose values are linked lists of all the clients subscribed to each channel. What the SUBSCRIBE command essentially does is append the client to the subscriber list of the given channel.

When a message is sent with the PUBLISH command, redis-server uses the given channel as the key, looks up the list of subscribed clients in that dictionary, and pushes the message to every subscriber.

Pub and Sub stand for Publish and Subscribe. In Redis you publish and subscribe on named channels: when a message is published to a channel, every client subscribed to that channel receives it. The most obvious use of this feature is real-time messaging, such as instant chat and group chat.

Subscribe/Publish Message Graph

Specific steps

1. Subscribe to a channel

A subscriber uses Redis's SUBSCRIBE command to subscribe to one or more channels. A channel is simply a string name; it does not need to be created in advance, and the subscription lasts only as long as the client connection. For example:

SUBSCRIBE topic1

2. Publish a message

A publisher uses Redis's PUBLISH command to send a message to the specified channel. The message body is an arbitrary string, so JSON is a convenient way to represent structured content. PUBLISH returns the number of subscribers that received the message. For example:

PUBLISH topic1 "Hello World"

3. Subscribe by pattern

With the PSUBSCRIBE command a client can subscribe to every channel matching a glob-style pattern (multiple patterns are separated by spaces), for example:

PSUBSCRIBE topic.* news.*

4. Receive and process messages

When a message is published to a channel, Redis pushes it over the connection of every client subscribed to that channel or to a matching pattern; subscribers do not poll for it. Note that Pub/Sub is fire-and-forget: messages are not persisted, so a client that is offline at publish time never sees them. If durability is required, a common alternative is to push messages into a list with LPUSH and consume them with BRPOP, or to use Redis Streams. A Java sketch of the basic publish/subscribe flow follows.
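
Here is a minimal, illustrative Java sketch using the Jedis client (an assumed dependency; the channel name and host are hypothetical):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

public class PubSubDemo {
    public static void main(String[] args) throws InterruptedException {
        // The subscriber runs in its own thread because subscribe() blocks the connection.
        Thread subscriber = new Thread(() -> {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                jedis.subscribe(new JedisPubSub() {
                    @Override
                    public void onMessage(String channel, String message) {
                        System.out.println("received on " + channel + ": " + message);
                    }
                }, "topic1");
            }
        });
        subscriber.start();

        Thread.sleep(500); // crude wait for the subscription to be established

        try (Jedis jedis = new Jedis("localhost", 6379)) {
            long receivers = jedis.publish("topic1", "{\"message\": \"Hello World\"}");
            System.out.println("delivered to " + receivers + " subscriber(s)");
        }
    }
}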

For more details, please refer to Nien's "Java High Concurrency Core Programming, Volume 1 (Enhanced Edition): NIO, Netty, Redis, ZooKeeper", which covers this topic in depth.

4. Why does the JVM define its own program counter in its memory structure instead of using the kernel's?

The program counter register in the JVM is a small memory area that stores the address (offset) of the bytecode instruction the current thread is executing. The JVM defines its own program counter rather than relying on the CPU/kernel program counter mainly for the following reasons:

  1. The hardware program counter tracks native machine instructions and is managed by the CPU and the operating system; it knows nothing about JVM bytecode. The JVM needs a counter expressed in bytecode offsets that it can read and update itself, independently of whether the code is being interpreted or JIT-compiled.
  2. A per-thread program counter is what makes thread switching work at the JVM level. When the scheduler switches threads, each Java thread must be able to resume exactly at the bytecode instruction where it left off, so every thread keeps its own counter as part of its context.
  3. Because the counter is thread-private, no synchronization is needed to update it, which keeps bytecode dispatch cheap even when many threads run concurrently.

In addition, the program counter in Java has the following advantages:

  1. Cross-platform : one of Java's design goals is portability, that is, the same program runs on different operating systems and hardware. To achieve this, the JVM must implement the program counter itself rather than depend on whatever counter the underlying platform provides.
  2. Thread privacy : the program counter in the JVM is thread-private; each thread has its own counter. When threads are switched, the JVM saves the current thread's counter and restores the next thread's counter. This per-thread bookkeeping could not be done with the CPU's program counter.
  3. Fast access : the program counter is a core part of the JVM execution engine, indicating the address of the bytecode instruction the current thread is executing. Reading an OS-level counter would require system calls and user/kernel mode switches, which hurts performance; the JVM's program counter is just a small field in thread-local memory and is much faster to access.

To sum up, the JVM implements its own program counter to support concurrent execution of multiple threads and manages it as part of its runtime memory area, which improves the stability and reliability of the program and makes cross-platform behavior, thread privacy, and fast access possible.

5. Talk about the process of distributed transaction 2PC?

A distributed transaction spans multiple databases or resources in a distributed system, and all of its operations must either succeed together or fail together. 2PC (Two-Phase Commit) is a protocol for coordinating the commit and rollback of such transactions. The process has two phases:

Prepare Phase

In this phase, the coordinator sends a "prepare" request to all participants, asking whether they can commit the transaction. Each participant executes the transaction locally, writes its result to its log without committing, and votes yes or no back to the coordinator.

Commit Phase

If every participant voted yes, the coordinator sends a "commit" request and all participants commit their local changes; if any participant voted no (or failed to respond), the coordinator sends a "rollback" request and all participants undo their changes. Each participant then acknowledges, and the transaction is complete. A sketch of this coordinator logic follows.
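
The following is a minimal, illustrative sketch of the coordinator's two phases; the Participant interface is hypothetical, and a real implementation must also persist the decision log and handle timeouts and recovery.

import java.util.List;

// Hypothetical participant API: each resource manager can vote, commit, or roll back.
interface Participant {
    boolean prepare(String txId);   // phase 1: execute locally, log, and vote yes/no
    void commit(String txId);       // phase 2: make the changes permanent
    void rollback(String txId);     // phase 2: undo the changes
}

class TwoPhaseCommitCoordinator {
    private final List<Participant> participants;

    TwoPhaseCommitCoordinator(List<Participant> participants) {
        this.participants = participants;
    }

    boolean execute(String txId) {
        // Phase 1: send "prepare" to everyone and collect the votes.
        boolean allPrepared = true;
        for (Participant p : participants) {
            try {
                if (!p.prepare(txId)) {
                    allPrepared = false;
                    break;
                }
            } catch (Exception e) {     // an unreachable participant counts as a "no" vote
                allPrepared = false;
                break;
            }
        }
        // Phase 2: commit only if every participant voted yes, otherwise roll back.
        for (Participant p : participants) {
            if (allPrepared) {
                p.commit(txId);
            } else {
                p.rollback(txId);
            }
        }
        return allPrepared;
    }
}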

Advantages

In 2PC, the coordinator is the single decision point: it collects every participant's vote and makes one global commit-or-rollback decision, so all participants end up in the same state.

The advantage of the 2PC protocol is that it guarantees the atomicity and consistency of the transaction: either everything commits or everything rolls back.

Disadvantages

It also has some disadvantages, such as:

  1. Performance issues : 2PC requires multiple network communications and waiting, which will affect performance.
  2. Single point of failure problem : The coordinator is the key to the 2PC protocol. If the coordinator fails, the entire system will not work properly.
  3. Synchronous blocking problem : In the preparation phase, all participants need to wait for the response of the coordinator. If the response time of the coordinator is too long, the participants will be blocked.

Therefore, in practical applications, it is necessary to select an appropriate distributed transaction solution, such as TCC, Saga, etc., according to specific business scenarios.

6. Redis is single-threaded, why is it so fast?

The reason why Redis can process requests efficiently is mainly because it adopts the following optimization measures:

  1. Memory-based : Redis keeps all data in memory, which avoids disk I/O on the request path and makes reads and writes very fast.
  2. Single-threaded command execution : Redis executes commands in a single thread, which avoids lock contention between threads and the overhead of context switching, and keeps every command atomic. Although command execution is single-threaded, Redis uses an event-driven model with I/O multiplexing (such as epoll/kqueue), so one thread can serve many thousands of client connections at the same time.
  3. Non-blocking I/O : Redis handles client sockets in a non-blocking way. The event loop only works on sockets that are ready to be read or written, so a slow client does not block the others.
  4. Optimized data structures : Redis has a variety of purpose-built data structures, such as hash tables and skip-list-based sorted sets, optimized for fast storage and retrieval.
  5. Compact encodings and a simple protocol : small collections are stored with compact internal encodings (for example intset and ziplist/listpack), and the RESP protocol is very cheap to parse, which reduces both memory usage and the amount of data moved over the network.

Combining the above optimization measures, Redis can handle a large number of requests in a single-threaded environment and maintain efficient performance.

7. Talk about the implementation of NIO, and how is Netty designed?

Implementation of NIO

NIO (Non-blocking I/O) is a new I/O model provided by Java that supports non-blocking, event-driven I/O operations. Compared with the traditional blocking I/O model, NIO can better handle highly concurrent network requests and improve system throughput and response speed.

The implementation of NIO mainly rests on three core abstractions: Channel, Buffer, and Selector.

  • Channel represents an open connection to an I/O entity such as a file or a socket; data is read from and written to a Channel.
  • Buffer is the container the data travels through: reads fill a Buffer, writes drain it, and it provides methods for manipulating the data.

NIO achieves its event-driven style through the Selector. A Selector monitors the state of the Channels registered with it and tells the application which of them are ready to be read or written. The application polls the Selector in a loop; whenever a Channel becomes readable or writable, the corresponding event is handled with Channel and Buffer operations. Because one thread can service many Channels this way, the contention and locking overhead of a thread-per-connection design is avoided. A minimal sketch follows.
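
For illustration, here is a minimal single-threaded echo server built directly on java.nio; the port number is arbitrary and error handling is kept to a minimum.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

public class NioEchoServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9090));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();                          // block until at least one channel is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {               // new connection
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {          // data available: echo it back
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    int n = client.read(buf);
                    if (n == -1) {
                        client.close();
                        continue;
                    }
                    buf.flip();
                    client.write(buf);
                }
            }
        }
    }
}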

How Netty is designed

Netty is an NIO-based client/server framework that provides a highly customizable network programming API and helps developers quickly build high-performance, highly reliable network applications. Its design is based on the Reactor pattern; it uses thread pools, buffer pools, memory pools, and other techniques to optimize network communication, and it ships with rich codec and protocol support so that developers can easily implement data exchange over many protocols.

The main design ideas of Netty include:

  1. Extensibility : Netty's componentized design makes it very easy to extend and customize. Users can choose appropriate components according to their needs, and realize complex functions through combination.
  2. High performance : Netty adopts some optimization strategies, such as event-driven model, zero-copy technology, memory pool, etc., thereby improving the throughput and responsiveness of the system.
  3. Portability : Netty supports multiple operating systems and platforms, such as Windows, Linux, Unix, and macOS, and can be used from JVM languages such as Java, Scala, and Kotlin.
  4. Maintainability : Netty's code structure is clear and easy to understand, and it also provides rich documentation and sample code, allowing developers to easily maintain and modify the code.

The core components of Netty include Channel, EventLoop, ChannelFuture, ChannelHandler, and so on.

  • Channel is Netty's core abstraction: it represents a network connection on which data can be read and written;
  • EventLoop is Netty's event-loop component, responsible for handling all I/O events and dispatching them to the corresponding Channel;
  • ChannelFuture wraps the result of Netty's asynchronous operations and is used to obtain their outcome;
  • ChannelHandler is Netty's data processor, responsible for encoding, decoding, processing, and forwarding the data flowing through a Channel.

In short, both NIO and Netty are based on an event-driven, asynchronous, non-blocking model, which handles highly concurrent network requests well and improves system throughput and response time. A minimal Netty server sketch follows.
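
For illustration, here is a minimal Netty 4 echo server sketch; it assumes the Netty dependency is on the classpath, and the port is arbitrary.

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.*;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class NettyEchoServer {
    public static void main(String[] args) throws InterruptedException {
        EventLoopGroup boss = new NioEventLoopGroup(1);   // accepts new connections
        EventLoopGroup worker = new NioEventLoopGroup();  // handles I/O for accepted channels
        try {
            ServerBootstrap b = new ServerBootstrap();
            b.group(boss, worker)
             .channel(NioServerSocketChannel.class)
             .childHandler(new ChannelInitializer<SocketChannel>() {
                 @Override
                 protected void initChannel(SocketChannel ch) {
                     ch.pipeline().addLast(new ChannelInboundHandlerAdapter() {
                         @Override
                         public void channelRead(ChannelHandlerContext ctx, Object msg) {
                             ctx.writeAndFlush(msg);       // echo the received bytes back
                         }
                     });
                 }
             });
            ChannelFuture f = b.bind(8080).sync();
            f.channel().closeFuture().sync();
        } finally {
            boss.shutdownGracefully();
            worker.shutdownGracefully();
        }
    }
}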

For more details, please refer to Nien's "Java High Concurrency Core Programming, Volume 1 (Enhanced Edition): NIO, Netty, Redis, ZooKeeper", which covers this topic in depth.

8. When should microservices be split, and when should they be merged?

The splitting and merging of the microservice architecture needs to consider multiple factors, such as business complexity, team size, technology stack, maintainability, performance, etc.

When to Split Microservices

  1. High business complexity : When the business logic is very complex, you can consider splitting it into multiple microservices, and each microservice focuses on the business logic of a certain subfield.
  2. Large team size : When the team size is large, the team can be split into multiple small teams, and each small team is responsible for maintaining a microservice to improve development efficiency and quality.
  3. Different technology stacks : When different microservices use different technology stacks, they can be split into multiple microservices so that the team can focus on the technology stack they are good at.
  4. Poor maintainability : When the code of a microservice is difficult to maintain, it can be split into multiple microservices so that the team can better maintain and manage the code.

When to Merge Microservices

  1. Simple business logic : When the business logic is relatively simple, multiple microservices can be combined into one to reduce system complexity and maintenance costs.
  2. Performance issues : When calls between multiple microservices are frequent, they can be merged into one microservice to reduce network latency and improve performance.
  3. Data sharing : When multiple microservices need to share the same data, they can be merged into one microservice to facilitate data management and maintenance.

It should be noted that the splitting and merging of microservices requires careful consideration, and decisions should be made on a case-by-case basis.

9. When should messages be used, and when is it suitable for interface calls?

In the microservice architecture, we can use message queues or interface calls to achieve communication between different microservices.

When to use message queues

  1. Asynchronous communication : Message queues can be used when asynchronous communication is required between two microservices. For example, when a microservice needs to notify other microservices of an event, a message queue can be used to achieve asynchronous communication.
  2. Decoupling : When decoupling is required between two microservices, message queues can be used. For example, when a microservice needs to hand over a task to other microservices, message queues can be used to decouple tasks.
  3. Flow Control : Message queues can be used when the flow between two microservices needs to be controlled. For example, when one microservice needs to transfer large amounts of data to other microservices, message queues can be used to control the flow.

When to use the interface call

  1. Synchronous communication : When synchronous communication is required between two microservices, interface calls can be used. For example, when a microservice needs to obtain data from other microservices, it can use interface calls to achieve synchronous communication.
  2. High performance : When the communication between two microservices requires high performance, interface calls can be used. For example, when a microservice needs to call other microservices frequently, interface calls can be used to improve performance.
  3. Data security : When the communication between two microservices needs to ensure data security, interface calls can be used. For example, when a microservice needs to transmit sensitive data, interface calls can be used to ensure data security.

It should be noted that message queues and interface calls have their own advantages and disadvantages, and the appropriate communication method should be selected according to the specific situation. At the same time, in practical applications, we can also combine message queues and interface calls to achieve more flexible and efficient communication methods.

10. If you were asked to design a global ID for sharded databases and tables, how would you do it? Do you know about Baidu's optimization of the Snowflake algorithm?

snowflake algorithm

With sharded databases and tables, the same ID must not appear in different shards, so a globally unique ID is needed. A common solution is to generate it with the Snowflake algorithm.

The Snowflake algorithm is a distributed ID generation algorithm open sourced by Twitter, which can guarantee the generation of unique IDs in a distributed environment. The ID generated by the Snowflake algorithm is a 64-bit integer, of which 1 bit is a sign bit, 41 bits are a timestamp, 10 bits are a working machine ID, and 12 bits are a serial number.

The ID generation rules of the Snowflake algorithm are as follows:

  1. The first bit is the sign bit , always 0, so the generated value is a positive integer.
  2. The next 41 bits are the timestamp , with millisecond precision; the current time minus a fixed epoch gives a relative timestamp.
  3. The next 10 bits are the machine identifier , which can be derived as needed, for example from the data center ID and worker ID (commonly 5 bits each), or from the IP or MAC address.
  4. The last 12 bits are the sequence number , implemented with a counter that increments for every ID generated within the same millisecond; when it reaches its maximum, generation waits for the next millisecond.

IDs generated using the Snowflake algorithm have the following advantages:

  1. Globally unique : unique IDs can be generated across a distributed system.
  2. Roughly time-ordered : IDs can be sorted by their embedded timestamp, which is convenient for database queries and analysis.
  3. High performance : ID generation is very fast and supports high-concurrency scenarios.
  4. Easy to implement : the Snowflake algorithm is relatively simple and can be implemented in languages such as Java.

It should be noted that in the scenario of sub-database and sub-table, if the Snowflake algorithm is used to generate IDs, it is necessary to ensure that the machine identifiers of each sub-database and sub-table are different, otherwise duplicate IDs may be generated. You can consider using the data center ID and machine ID to generate machine identifiers to ensure that the machine identifiers of each sub-database and sub-table are different.

The following is sample Java code implementing the Snowflake algorithm to generate globally unique IDs:

public class SnowflakeIdGenerator {

    // Custom epoch (milliseconds)
    private final static long START_TIMESTAMP = 1480166465631L;

    // Number of bits occupied by each part
    // (5-bit data center id + 5-bit machine id = the 10-bit worker section of the standard layout)
    private final static long SEQUENCE_BIT = 12;  // bits for the sequence number
    private final static long MACHINE_BIT = 5;    // bits for the machine id
    private final static long DATACENTER_BIT = 5; // bits for the data center id

    // Maximum value of each part
    private final static long MAX_DATACENTER_NUM = -1L ^ (-1L << DATACENTER_BIT);
    private final static long MAX_MACHINE_NUM = -1L ^ (-1L << MACHINE_BIT);
    private final static long MAX_SEQUENCE = -1L ^ (-1L << SEQUENCE_BIT);

    // Left shift of each part
    private final static long MACHINE_LEFT = SEQUENCE_BIT;
    private final static long DATACENTER_LEFT = SEQUENCE_BIT + MACHINE_BIT;
    private final static long TIMESTAMP_LEFT = DATACENTER_LEFT + DATACENTER_BIT;

    private long datacenterId;        // data center id
    private long machineId;           // machine id
    private long sequence = 0L;       // sequence number within the current millisecond
    private long lastTimestamp = -1L; // timestamp of the last generated id

    public SnowflakeIdGenerator(long datacenterId, long machineId) {
        if (datacenterId > MAX_DATACENTER_NUM || datacenterId < 0) {
            throw new IllegalArgumentException("datacenterId can't be greater than MAX_DATACENTER_NUM or less than 0");
        }
        if (machineId > MAX_MACHINE_NUM || machineId < 0) {
            throw new IllegalArgumentException("machineId can't be greater than MAX_MACHINE_NUM or less than 0");
        }
        this.datacenterId = datacenterId;
        this.machineId = machineId;
    }

    public synchronized long nextId() {
        long timestamp = timeGen();

        if (timestamp < lastTimestamp) {
            throw new RuntimeException("Clock moved backwards. Refusing to generate id");
        }

        if (timestamp == lastTimestamp) {
            // Same millisecond: increment the sequence; if it overflows, wait for the next millisecond
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0L) {
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            sequence = 0L;
        }

        lastTimestamp = timestamp;

        return ((timestamp - START_TIMESTAMP) << TIMESTAMP_LEFT) |
                (datacenterId << DATACENTER_LEFT) |
                (machineId << MACHINE_LEFT) |
                sequence;
    }

    // Spin until the next millisecond
    private long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    private long timeGen() {
        return System.currentTimeMillis();
    }
}

Example usage:

SnowflakeIdGenerator idGenerator = new SnowflakeIdGenerator(1, 1);
long id = idGenerator.nextId();
System.out.println(id);

The datacenterId and machineId here can be set according to the actual situation. For example, Zookeeper can be used to manage the allocation of datacenterId and machineId.

Baidu's Optimization of Snowflake Algorithm

The Snowflake algorithm is a commonly used distributed ID generation algorithm, but it has some practical issues, such as its dependence on the system clock (clock rollback can lead to duplicate or refused IDs) and the per-node throughput limit of the sequence number. To address such problems, Baidu made a number of optimizations on top of the Snowflake algorithm (open-sourced as the UidGenerator project), making the generated IDs more stable and unique.

Baidu’s optimization of the Snowflake algorithm mainly includes the following points:

1. Increase the number of digits in the data center ID and machine ID

In the original Snowflake algorithm, the data center ID and machine ID take 5 bits each, 10 bits in total. Baidu increased the number of bits for the data center ID and machine ID to 8 each, 16 bits in total. This allows more data centers and machines to be supported and also reduces the chance of ID duplication.

2. Use Zookeeper to manage data center ID and machine ID

In the original Snowflake algorithm, the data center ID and machine ID are statically configured and need to be configured in each application. This will bring some problems, such as the need to modify the configuration file when expanding or shrinking, which is error-prone and not flexible enough. To solve this problem, Baidu uses Zookeeper to manage data center IDs and machine IDs. When each application starts, it will register its own ID with Zookeeper, and Zookeeper will assign a unique ID to the application. In this way, the problem of manual configuration can be avoided, and dynamic expansion and contraction can also be supported.

3. Improved hash function

Baidu uses the MurmurHash3 hash function to store snowflake sequences. The MurmurHash3 hash function is an efficient hash function that can quickly map a set of numbers to a fixed array position.

Use a thread-safe hash table: when generating globally unique identifiers, the hash table that stores snowflake sequences is accessed by multiple threads at the same time, so a thread-safe (concurrent) hash table implementation is used to guarantee correctness.

Increase the size of the hash table: In order to improve the efficiency of the hash table, Baidu increased the size of the hash table in practical applications. When the size of the hash table reaches a certain level, it will be automatically expanded to ensure the performance and stability of the hash table.

4. Timestamp accuracy

In the snowflake algorithm, the precision of the timestamp is at the millisecond level. In order to further improve the accuracy of the timestamp, Baidu optimized the Snowflake algorithm to increase the accuracy of the timestamp to the microsecond level. This can better support time synchronization and timing control in distributed systems.

5. Serial number range :

In the snowflake algorithm, the sequence number ranges from 0 to 4095. In order to support greater concurrency and higher performance, Baidu optimized the Snowflake algorithm and extended the range of serial numbers to 1 to 4096. This can better support data writing and query operations in high concurrency scenarios.

6. Machine identification code :

In the snowflake algorithm, the machine identification code is used to represent the unique identifier of the current machine. In order to avoid machine identification code conflicts, Baidu optimized the Snowflake algorithm and expanded the range of machine identification codes from 0 to 32 bits to 128 bits. This better supports the problem of unique identifier collisions between multiple machines.

7. Concurrency control :

In the snowflake algorithm, in order to ensure the correctness of concurrent writing, Baidu has optimized the snowflake algorithm and introduced mechanisms such as write locks and read locks. This can better support write operations in high-concurrency scenarios, and avoid write conflicts and data loss problems.

Through the above optimizations, Baidu has implemented a more stable and reliable distributed ID generation algorithm, which can generate unique IDs in high-concurrency scenarios and ensure the correctness and consistency of data.

11. How does Redis gather statistics on hotspot data on a single node?

Redis can collect statistics on hotspot data on a single instance in the following ways:

  1. Use the INFO command to view overall metrics of the Redis instance, such as memory usage, number of connections, and number of commands executed. INFO is a built-in command and can be run from any Redis client.
    1) Use INFO to get the server-level statistics.
    2) Use the commandstats and keyspace sections to see which command types and databases are the busiest.
    3) Measure individual keys with MEMORY USAGE <key>, or scan for large keys with redis-cli --bigkeys.
    4) Sort the keys by size or access frequency and take the top N as the hotspot data.
  2. Use the MONITOR command to stream every command the server processes in real time; tallying the keys that appear most often reveals the hot keys. MONITOR is expensive, so it should only be run briefly rather than continuously in production. (If the maxmemory policy is an LFU policy, redis-cli --hotkeys can report hot keys directly.)
  3. In a Redis Cluster, use CLUSTER NODES to list the nodes and then run INFO against each node to compare their load.
  4. Integrate Redis monitoring tools such as New Relic or Datadog into the application. These tools monitor the performance metrics of the Redis instance in real time and provide detailed reporting and alerting.

12. After a new node is added to a Redis cluster, how is data allocated to the new node?

In a Redis Cluster, when a new node is added, the data in the cluster needs to be resharded so that the load is balanced across the nodes. The specific steps are as follows:

  1. Determine the slot range for the new node . In a Redis Cluster the key space is divided into 16384 hash slots, numbered 0 to 16383. The new node should take over a range of slots, which can be worked out from the number of nodes and slots in the current cluster.
  2. Join the new node to the cluster . A new node can be added with the CLUSTER MEET command, for example:
CLUSTER MEET <new_node_ip> <new_node_port>
  3. Assign slots to the new node . In a cluster that is still empty, a range of slots can be assigned directly with the CLUSTER ADDSLOTS command, for example:
CLUSTER ADDSLOTS 0 1 2 3 4 ... 100

where 0 1 2 3 4 ... 100 are the slot numbers to be assigned.

  4. Reshard existing slots to the new node . In a cluster that already holds data, slots and their keys are moved with the resharding procedure, typically redis-cli --cluster reshard (or redis-cli --cluster rebalance), which migrates the chosen slots and their data to the new node. Migration takes some time; use CLUSTER INFO to check the cluster state until it is ok.
  5. Repeat the above steps until all new nodes have joined the cluster and have slots assigned.

Note that slot migration does not happen by itself: the administrator, or tooling such as redis-cli --cluster rebalance, decides which slots move to the new node. During the migration the cluster keeps serving requests and redirects clients with MOVED/ASK responses, so nodes can be added or removed without downtime.

13. How do you find the largest 100 numbers in a file containing 10 billion integers?

The answer: divide and conquer, heap-based selection, the quickselect algorithm, or a BitMap approach can each be used.

The following is Java code for each of these approaches.

1. Use divide and conquer

The idea of divide and conquer is to break the big problem into small ones, solve each small problem separately, and finally combine the partial solutions into the overall solution. To find the largest 100 numbers among 10 billion integers, split the data set into many small chunks, find the largest 100 numbers in each chunk, and then merge those partial top-100 lists and take the largest 100 from the merged result.

The Java code is implemented as follows:

import java.io.*;
import java.util.*;

public class Top100NumbersByDivideAndConquer {

    private static final int MAX_NUMBERS = 1000000000; // process up to 1 billion numbers
    private static final int MAX_NUMBERS_PER_FILE = 10000000; // at most 10 million numbers per file
    private static final int MAX_NUMBERS_PER_GROUP = 1000000; // at most 1 million numbers per small data set
    private static final int MAX_GROUPS = MAX_NUMBERS / MAX_NUMBERS_PER_GROUP; // number of small data sets
    private static final int MAX_TOP_NUMBERS = 100; // find the largest 100 numbers

    public static void main(String[] args) throws Exception {
        // Generate a file of random numbers
        generateRandomNumbersFile("numbers.txt", MAX_NUMBERS);

        // Split the random number file into several smaller files
        List<String> files = splitNumbersFile("numbers.txt", MAX_NUMBERS_PER_FILE);

        // Find the largest 100 numbers in each small file
        List<List<Integer>> topNumbersPerFile = new ArrayList<>();
        for (String file : files) {
            List<Integer> numbers = readNumbersFromFile(file);
            List<Integer> topNumbers = findTopNumbersByHeapSort(numbers, MAX_TOP_NUMBERS);
            topNumbersPerFile.add(topNumbers);
        }

        // Merge the largest 100 numbers of every small file
        List<Integer> topNumbers = mergeTopNumbers(topNumbersPerFile, MAX_TOP_NUMBERS);

        // Print the largest 100 numbers
        System.out.println("Top " + MAX_TOP_NUMBERS + " numbers:");
        for (int i = 0; i < MAX_TOP_NUMBERS; i++) {
            System.out.println(topNumbers.get(i));
        }
    }

    // Generate a file of random numbers
    private static void generateRandomNumbersFile(String fileName, int count) throws Exception {
        Random random = new Random();
        BufferedWriter writer = new BufferedWriter(new FileWriter(fileName));
        for (int i = 0; i < count; i++) {
            writer.write(String.valueOf(random.nextInt()));
            writer.newLine();
        }
        writer.close();
    }

    // Split the random number file into several smaller files
    private static List<String> splitNumbersFile(String fileName, int maxNumbersPerFile) throws Exception {
        List<String> files = new ArrayList<>();
        BufferedReader reader = new BufferedReader(new FileReader(fileName));
        String line;
        int count = 0;
        int fileIndex = 0;
        BufferedWriter writer = new BufferedWriter(new FileWriter("numbers_" + fileIndex + ".txt"));
        while ((line = reader.readLine()) != null) {
            writer.write(line);
            writer.newLine();
            count++;
            if (count >= maxNumbersPerFile) {
                writer.close();
                files.add("numbers_" + fileIndex + ".txt");
                fileIndex++;
                writer = new BufferedWriter(new FileWriter("numbers_" + fileIndex + ".txt"));
                count = 0;
            }
        }
        writer.close();
        files.add("numbers_" + fileIndex + ".txt");
        reader.close();
        return files;
    }

    // Read numbers from a file
    private static List<Integer> readNumbersFromFile(String fileName) throws Exception {
        List<Integer> numbers = new ArrayList<>();
        BufferedReader reader = new BufferedReader(new FileReader(fileName));
        String line;
        while ((line = reader.readLine()) != null) {
            numbers.add(Integer.parseInt(line));
        }
        reader.close();
        return numbers;
    }

    // Use a min-heap to find the largest k numbers
    private static List<Integer> findTopNumbersByHeapSort(List<Integer> numbers, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(k);
        for (int number : numbers) {
            if (heap.size() < k) {
                heap.offer(number);
            } else if (number > heap.peek()) {
                heap.poll();
                heap.offer(number);
            }
        }
        List<Integer> topNumbers = new ArrayList<>(heap);
        Collections.sort(topNumbers, Collections.reverseOrder());
        return topNumbers;
    }

    // Merge the largest k numbers of every small file
    private static List<Integer> mergeTopNumbers(List<List<Integer>> topNumbersPerFile, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(k);
        for (List<Integer> topNumbers : topNumbersPerFile) {
            for (int number : topNumbers) {
                if (heap.size() < k) {
                    heap.offer(number);
                } else if (number > heap.peek()) {
                    heap.poll();
                    heap.offer(number);
                }
            }
        }
        List<Integer> topNumbers = new ArrayList<>(heap);
        Collections.sort(topNumbers, Collections.reverseOrder());
        return topNumbers;
    }
}

2. Use the heap sort algorithm

The idea of the heap-based approach is to keep a min-heap holding the largest k numbers seen so far, then traverse the remaining numbers: whenever a number is larger than the heap's top (its smallest element), replace the top with that number and re-heapify.

The Java code is implemented as follows:

import java.io.*;
import java.util.*;

public class Top100NumbersByHeapSort {

    private static final int MAX_NUMBERS = 1000000000; // process up to 1 billion numbers
    private static final int MAX_TOP_NUMBERS = 100; // find the largest 100 numbers

    public static void main(String[] args) throws Exception {
        // Generate a file of random numbers
        generateRandomNumbersFile("numbers.txt", MAX_NUMBERS);

        // Find the largest 100 numbers
        List<Integer> numbers = readNumbersFromFile("numbers.txt");
        List<Integer> topNumbers = findTopNumbersByHeapSort(numbers, MAX_TOP_NUMBERS);

        // Print the largest 100 numbers
        System.out.println("Top " + MAX_TOP_NUMBERS + " numbers:");
        for (int i = 0; i < MAX_TOP_NUMBERS; i++) {
            System.out.println(topNumbers.get(i));
        }
    }

    // Generate a file of random numbers
    private static void generateRandomNumbersFile(String fileName, int count) throws Exception {
        Random random = new Random();
        BufferedWriter writer = new BufferedWriter(new FileWriter(fileName));
        for (int i = 0; i < count; i++) {
            writer.write(String.valueOf(random.nextInt()));
            writer.newLine();
        }
        writer.close();
    }

    // Read numbers from a file
    private static List<Integer> readNumbersFromFile(String fileName) throws Exception {
        List<Integer> numbers = new ArrayList<>();
        BufferedReader reader = new BufferedReader(new FileReader(fileName));
        String line;
        while ((line = reader.readLine()) != null) {
            numbers.add(Integer.parseInt(line));
        }
        reader.close();
        return numbers;
    }

    // Use a min-heap to find the largest k numbers
    private static List<Integer> findTopNumbersByHeapSort(List<Integer> numbers, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(k);
        for (int number : numbers) {
            if (heap.size() < k) {
                heap.offer(number);
            } else if (number > heap.peek()) {
                heap.poll();
                heap.offer(number);
            }
        }
        List<Integer> topNumbers = new ArrayList<>(heap);
        Collections.sort(topNumbers, Collections.reverseOrder());
        return topNumbers;
    }
}

3. Use the quick selection algorithm

The idea of the quickselect algorithm is to use quicksort-style partitioning to split the data set into two parts and then recurse only into the part that contains the k largest numbers, until the k largest numbers are found.

The Java code is implemented as follows:

import java.io.*;
import java.util.*;

public class Top100NumbersByQuickSelect {

    private static final int MAX_NUMBERS = 1000000000; // process up to 1 billion numbers
    private static final int MAX_TOP_NUMBERS = 100; // find the largest 100 numbers

    public static void main(String[] args) throws Exception {
        // Generate a file of random numbers
        generateRandomNumbersFile("numbers.txt", MAX_NUMBERS);

        // Find the largest 100 numbers
        List<Integer> numbers = readNumbersFromFile("numbers.txt");
        List<Integer> topNumbers = findTopNumbersByQuickSelect(numbers, MAX_TOP_NUMBERS);

        // Print the largest 100 numbers
        System.out.println("Top " + MAX_TOP_NUMBERS + " numbers:");
        for (int i = 0; i < MAX_TOP_NUMBERS; i++) {
            System.out.println(topNumbers.get(i));
        }
    }

    // Generate a file of random numbers
    private static void generateRandomNumbersFile(String fileName, int count) throws Exception {
        Random random = new Random();
        BufferedWriter writer = new BufferedWriter(new FileWriter(fileName));
        for (int i = 0; i < count; i++) {
            writer.write(String.valueOf(random.nextInt()));
            writer.newLine();
        }
        writer.close();
    }

    // Read numbers from a file
    private static List<Integer> readNumbersFromFile(String fileName) throws Exception {
        List<Integer> numbers = new ArrayList<>();
        BufferedReader reader = new BufferedReader(new FileReader(fileName));
        String line;
        while ((line = reader.readLine()) != null) {
            numbers.add(Integer.parseInt(line));
        }
        reader.close();
        return numbers;
    }

    // Use quickselect to move the largest k numbers to the front of the list
    private static List<Integer> findTopNumbersByQuickSelect(List<Integer> numbers, int k) {
        int left = 0;
        int right = numbers.size() - 1;
        while (left <= right) {
            // Partition in descending order: larger elements end up on the left
            int pivotIndex = partition(numbers, left, right);
            if (pivotIndex == k) {
                break;
            } else if (pivotIndex < k) {
                left = pivotIndex + 1;
            } else {
                right = pivotIndex - 1;
            }
        }
        List<Integer> topNumbers = new ArrayList<>(numbers.subList(0, k));
        Collections.sort(topNumbers, Collections.reverseOrder());
        return topNumbers;
    }

    private static int partition(List<Integer> numbers, int left, int right) {
        int pivotIndex = left;
        int pivotValue = numbers.get(pivotIndex);
        swap(numbers, pivotIndex, right);
        int storeIndex = left;
        for (int i = left; i < right; i++) {
            if (numbers.get(i) > pivotValue) {
                swap(numbers, i, storeIndex);
                storeIndex++;
            }
        }
        swap(numbers, storeIndex, right);
        return storeIndex;
    }

    private static void swap(List<Integer> numbers, int i, int j) {
        int temp = numbers.get(i);
        numbers.set(i, numbers.get(j));
        numbers.set(j, temp);
    }
}

4. Use the BitMap algorithm

The idea of the BitMap approach is to use a bitmap to record which values appear in the file, and then scan the bitmap from the largest value downwards to collect the k largest values. Note that a plain bitmap only records presence, so this variant finds the 100 largest distinct values.

The Java code is implemented as follows:

import java.io.*;
import java.util.*;

public class Top100NumbersByBitMap {

    private static final int MAX_NUMBERS = 1000000000; // process up to 1 billion numbers
    private static final int MAX_TOP_NUMBERS = 100; // find the largest 100 numbers

    public static void main(String[] args) throws Exception {
        // Generate a file of random numbers
        generateRandomNumbersFile("numbers.txt", MAX_NUMBERS);

        // Find the largest 100 numbers
        List<Integer> numbers = readNumbersFromFile("numbers.txt");
        List<Integer> topNumbers = findTopNumbersByBitMap(numbers, MAX_TOP_NUMBERS);

        // Print the largest 100 (distinct) numbers
        System.out.println("Top " + MAX_TOP_NUMBERS + " numbers:");
        for (int number : topNumbers) {
            System.out.println(number);
        }
    }

    // Generate a file of random numbers
    private static void generateRandomNumbersFile(String fileName, int count) throws Exception {
        Random random = new Random();
        BufferedWriter writer = new BufferedWriter(new FileWriter(fileName));
        for (int i = 0; i < count; i++) {
            writer.write(String.valueOf(random.nextInt()));
            writer.newLine();
        }
        writer.close();
    }

    // Read numbers from a file
    private static List<Integer> readNumbersFromFile(String fileName) throws Exception {
        List<Integer> numbers = new ArrayList<>();
        BufferedReader reader = new BufferedReader(new FileReader(fileName));
        String line;
        while ((line = reader.readLine()) != null) {
            numbers.add(Integer.parseInt(line));
        }
        reader.close();
        return numbers;
    }

    // Use a bitmap to find the largest k distinct values.
    // Each possible int value gets one bit; offsetting by Integer.MIN_VALUE maps the
    // full int range [-2^31, 2^31-1] onto unsigned positions [0, 2^32-1].
    // The bitmap needs 2^32 bits = 512 MB of heap.
    private static List<Integer> findTopNumbersByBitMap(List<Integer> numbers, int k) {
        long[] bitMap = new long[1 << 26]; // 2^26 longs * 64 bits = 2^32 bits
        for (int number : numbers) {
            long pos = (long) number - Integer.MIN_VALUE; // 0 .. 2^32-1
            bitMap[(int) (pos >>> 6)] |= 1L << (pos & 63);
        }
        // Scan from the largest value downwards and collect the first k set bits
        List<Integer> topNumbers = new ArrayList<>(k);
        for (long pos = (1L << 32) - 1; pos >= 0 && topNumbers.size() < k; pos--) {
            if ((bitMap[(int) (pos >>> 6)] & (1L << (pos & 63))) != 0) {
                topNumbers.add((int) (pos + Integer.MIN_VALUE));
            }
        }
        return topNumbers; // already in descending order
    }
}

The above four algorithms can all be used to solve the problem of finding the largest 100 numbers from a file of 10 billion integers.

Among them, the divide and conquer method and BitMap algorithm are suitable for data processing in a distributed environment .

The heap sort algorithm and the quick selection algorithm are suitable for data processing in a stand-alone environment .

A few words at the end:

In Nien's reader community (50+ groups), many, many members want to get into a big company and earn a high salary.

The Nien team will keep using real interview questions from major companies to map out a learning path for you and show what you need to learn.

The previous two articles covered the key knowledge points behind real big-company interview questions:

" Byte Maniac asked for 1 hour, the guy got the offer, it's too ruthless!" "

" Accept a Didi Offer: From the three experiences of the guy, what do you need to learn? "

These real interview questions will be included in the most complete and continuously upgraded PDF e-book " Nin's Java Interview Collection ".

The questions and reference answers in this article are included in V71 of our "Nien Java Interview Collection"; you can get it from Nien, password: get e-book.

Basically, if you thoroughly understand Nien's "Nien Java Interview Collection", it is easy to get offers from big companies.

In addition, if there is a big-company interview you would like covered in the next issue, you can send a message to Nien.

The realization path of technical freedom PDF:

Realize your architectural freedom:

" Have a thorough understanding of the 8-figure-1 template, everyone can do the architecture "

" 10Wqps review platform, how to structure it? This is what station B does! ! ! "

" Alibaba Two Sides: How to optimize the performance of tens of millions and billions of data?" Textbook-level answers are coming "

" Peak 21WQps, 100 million DAU, how is the small game "Sheep a Sheep" structured? "

" How to Scheduling 10 Billion-Level Orders, Come to a Big Factory's Superb Solution "

" Two Big Factory 10 Billion-Level Red Envelope Architecture Scheme "

… more architecture articles, being added

Realize your responsive freedom:

" Responsive Bible: 10W Words, Realize Spring Responsive Programming Freedom "

This is the old version of " Flux, Mono, Reactor Combat (the most complete in history) "

Realize your spring cloud freedom:

" Spring cloud Alibaba Study Bible "

" Sharding-JDBC underlying principle and core practice (the most complete in history) "

" Get it done in one article: the chaotic relationship between SpringBoot, SLF4j, Log4j, Logback, and Netty (the most complete in history) "

Realize your linux freedom:

" Linux Commands Encyclopedia: 2W More Words, One Time to Realize Linux Freedom "

Realize your online freedom:

" Detailed explanation of TCP protocol (the most complete in history) "

" Three Network Tables: ARP Table, MAC Table, Routing Table, Realize Your Network Freedom!" ! "

Realize your distributed lock freedom:

" Redis Distributed Lock (Illustration - Second Understanding - The Most Complete in History) "

" Zookeeper Distributed Lock - Diagram - Second Understanding "

Realize your king component freedom:

" King of the Queue: Disruptor Principles, Architecture, and Source Code Penetration "

" The King of Cache: Caffeine Source Code, Architecture, and Principles (the most complete in history, 10W super long text) "

" The King of Cache: The Use of Caffeine (The Most Complete in History) "

" Java Agent probe, bytecode enhanced ByteBuddy (the most complete in history) "

Realize your interview questions freely:

4000 pages of "Nin's Java Interview Collection" 40 topics

For PDF updates of the above Nien architecture notes and interview questions, ▼please get them from the [Technical Freedom Circle] official account below▼


Origin blog.csdn.net/crazymakercircle/article/details/130989697