A very rewarding Tencent interview experience (spring recruitment Java internship)

Today I'm sharing a reader's interview experience from Tencent's spring recruitment internship. The Java backend interview mainly covered three areas: MySQL, Java, and computer networks.

He felt the interview was very rewarding. Although the questions were all fundamentals, digging deep showed how the roots, stems, and leaves are all connected. In the question-asking session at the end, the interviewer gave him a lot of advice on interviews, strategy, fundamentals, algorithms, and more. It was a valuable learning experience.

MySQL

Introduce the indexing mechanism of MySQL

Indexes help us find data quickly. The InnoDB storage engine uses a B+ tree index: leaf nodes store the index keys plus the row data, while non-leaf nodes store only index keys.

Indexes can be classified according to four perspectives.

  • Classified by "data structure": B+tree index, Hash index, Full-text index.
  • Classified by "physical storage": clustered index (primary key index), secondary index (auxiliary index).
  • Classified by "field characteristics": primary key index, unique index, common index, prefix index.
  • Classified by "number of fields": single-column index, combined index.

What is a joint index?

An index built on multiple combined fields is called a joint (composite) index.

For example, to combine the product_no and name fields of the product table into a joint index (product_no, name), create it like this:

CREATE INDEX index_product_no_name ON product(product_no, name);

In the B+Tree of the joint index (product_no, name), the leaf nodes are linked together by a doubly linked list.

The non-leaf nodes of the joint index use the values of both fields as the B+Tree keys. When querying on the joint index, records are compared by the product_no field first, and by the name field only when the product_no values are equal.

In other words, the B+Tree of the joint index is sorted first by product_no, and then by name for entries with the same product_no.

Therefore, a joint index follows the leftmost matching principle: index matching is performed in leftmost-first order. If a query does not follow the leftmost matching principle, the joint index cannot be used and its fast-lookup benefit is lost, as sketched below.
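
A minimal SQL sketch of leftmost matching on the joint index created above (the literal values 'P001' and 'phone' are made up for illustration; whether the index is actually chosen can be checked with EXPLAIN):

-- Can use the joint index (product_no, name): the leftmost column is present
SELECT * FROM product WHERE product_no = 'P001';
SELECT * FROM product WHERE product_no = 'P001' AND name = 'phone';

-- Cannot use the joint index: the leftmost column product_no is missing
SELECT * FROM product WHERE name = 'phone';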

What is a clustered index?

The leaf nodes of a clustered index's B+Tree store the actual data: all complete user records are stored in the leaf nodes of the primary key index's B+Tree.

What is a covering index?

When a query uses a secondary index and all the requested columns can be found in the secondary index itself, there is no need to go back to the primary key index; this is a covering index. If some requested columns are not in the secondary index, MySQL first searches the secondary index, finds the leaf node, obtains the primary key value, and then searches the primary key index to fetch the full row; this second lookup is called going back to the table.
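
For example, a hedged SQL sketch using the product table and the joint index from above (the price column is hypothetical, added only to illustrate the non-covered case):

-- Covered by the joint index (product_no, name): no need to go back to the table
SELECT product_no, name FROM product WHERE product_no = 'P001';

-- Not covered: price is not in the secondary index, so MySQL must go back to the primary key index
SELECT product_no, name, price FROM product WHERE product_no = 'P001';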

What is the process of the entire index query?

Each node of an InnoDB B+ tree is a data page. Taking a lookup of the record with primary key 6 as an example, the search proceeds as follows:

  • Starting from the root node, binary search is used to quickly locate the page whose key range contains the query value. Since the primary key being queried is 6, which falls in the range [1, 7), the search moves down to page 30 to find a more fine-grained directory entry;
  • In the non-leaf node (page 30), binary search is used again to locate the page whose range contains the query value; since the primary key value is greater than 5, the search moves down to the leaf node (page 16);
  • Finally, in the leaf node (page 16), binary search over the page directory slots quickly locates which slot (record group) the target record belongs to, and the records within that slot are then traversed to find the record with primary key 6.

So the page containing the record is located by binary search down the tree; within the page, binary search over the slots locates the record's group, and the group is then traversed to find the record.

What are the isolation levels of transactions?

  • Read uncommitted: changes made by a transaction can be seen by other transactions even before the transaction commits;
  • Read committed: changes made by a transaction can be seen by other transactions only after it commits;
  • Repeatable read: the data seen during a transaction's execution is always consistent with what was seen when the transaction started; this is the default isolation level of MySQL's InnoDB engine;
  • Serializable: read-write locks are added to records. When multiple transactions read and write the same record and a read-write conflict occurs, the later transaction must wait for the earlier one to finish before it can continue. (A quick SQL sketch for checking and setting the isolation level follows this list.)
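
A minimal sketch of inspecting and changing the isolation level in MySQL 8.0 (in older versions the variable is named tx_isolation; shown for illustration only):

SELECT @@transaction_isolation;                          -- show the current session's isolation level
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- change it for the current session only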

What do dirty reads, phantom reads, and non-repeatable reads mean?

  • Dirty read: a transaction reads data that has been modified by another transaction that has not yet committed.
  • Phantom read: within one transaction, the same query for the number of records matching a condition is executed multiple times, and the counts differ between executions.
  • Non-repeatable read: within one transaction, the same data is read multiple times and the values differ between reads.

What are the principles and underlying details of InnoDB's multi-version concurrency control (MVCC)?

Transactions at the "read committed" and "repeatable read" isolation levels are both implemented through a Read View; the difference is when the Read View is created. You can think of a Read View as a data snapshot, like a photo that freezes the scene at a particular moment.

  • At the "read committed" isolation level, a new Read View is generated before each select statement is executed;
  • At the "repeatable read" isolation level, a Read View is generated when the first select is executed, and that same Read View is reused for the entire transaction.

Read View has four important fields:

  • m_ids: the list of transaction ids of the "active transactions" in the database at the moment the Read View is created. Note that it is a list; "active transactions" are transactions that have started but not yet committed.
  • min_trx_id: the smallest transaction id among those active transactions, i.e. the minimum value of m_ids.
  • max_trx_id: note this is not the maximum of m_ids, but the id that the database would assign to the next transaction at the moment the Read View is created, i.e. the largest transaction id so far plus 1;
  • creator_trx_id: the transaction id of the transaction that created the Read View.

For tables using the InnoDB storage engine, each clustered index record contains the following two hidden columns:

  • trx_id: when a transaction modifies a clustered index record, that transaction's id is recorded in the trx_id hidden column;
  • roll_pointer: every time a clustered index record is changed, the old version is written to the undo log, and this hidden column points to the previous version of the record, so it can be used to find the old versions.

After the Read View is created, a record's trx_id can fall into one of three cases. When a transaction accesses a record, besides its own updates always being visible, visibility is determined as follows:

  • If the record's trx_id is smaller than the Read View's min_trx_id, this version was generated by a transaction that had already committed before the Read View was created, so it is visible to the current transaction.
  • If the record's trx_id is greater than or equal to the Read View's max_trx_id, this version was generated by a transaction that started after the Read View was created, so it is not visible to the current transaction.
  • If the record's trx_id is between min_trx_id and max_trx_id, check whether the trx_id is in the m_ids list:
  • If it is in m_ids, the transaction that generated this version was still active (not yet committed), so this version is not visible to the current transaction.
  • If it is not in m_ids, the transaction that generated this version had already committed, so this version is visible to the current transaction. (A rough Java sketch of this visibility check follows.)
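
A minimal, illustrative Java sketch of the visibility rules above (the class and field names are made up for illustration; InnoDB implements this inside the storage engine, not in Java):

import java.util.Set;

class ReadView {
    Set<Long> mIds;     // ids of transactions active when the Read View was created
    long minTrxId;      // smallest id in mIds
    long maxTrxId;      // id the database would assign to the next transaction
    long creatorTrxId;  // id of the transaction that created this Read View

    // Returns true if a record version written by trxId is visible to this Read View.
    boolean isVisible(long trxId) {
        if (trxId == creatorTrxId) return true; // our own changes are always visible
        if (trxId < minTrxId) return true;      // committed before the Read View was created
        if (trxId >= maxTrxId) return false;    // started after the Read View was created
        return !mIds.contains(trxId);           // in between: visible only if already committed
    }
}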

Controlling how concurrent transactions see the same record through this "version chain" is what MVCC (multi-version concurrency control) means.

What is a next-key lock and how is it implemented?

A Next-Key Lock is the combination of a Record Lock and a Gap Lock: it locks a range and also locks the record itself.

Assuming there is a next-key lock on the id range (3, 5] in the table, other transactions can neither insert a record with id = 4 nor modify the record with id = 5. A next-key lock therefore protects the record itself and also prevents other transactions from inserting new records into the gap in front of it.

In what scenarios does an index fail to be used, and do you know how to improve them?

  • When we use left fuzzy matching or both-sided fuzzy matching, that is, like '%xx' or like '%xx%', the index fails;
  • When we apply a function to an index column in the query condition, the index fails;
  • When we perform expression calculations on an index column in the query condition, the index cannot be used;
  • When MySQL compares a string with a number, it automatically casts the string to a number before comparing. If the string is the index column and the parameter in the condition is a number, the index column undergoes an implicit type conversion. Since the implicit conversion is done through the CAST function, this is equivalent to applying a function to the index column, so the index fails;
  • A joint index must follow the leftmost matching principle, i.e. matching starts from the leftmost column; otherwise the index becomes invalid;
  • In a WHERE clause, if the column before OR is an indexed column but the column after OR is not, the index fails. (A few of these failure cases are sketched in SQL below.)
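
A few hedged SQL sketches of these cases, assuming a hypothetical t_user table with an index on name and an index on a VARCHAR phone column (whether the optimizer actually uses an index can be confirmed with EXPLAIN):

-- Left fuzzy match: the index on name fails
SELECT * FROM t_user WHERE name LIKE '%lin';

-- Function applied to the index column: the index fails
SELECT * FROM t_user WHERE LENGTH(name) = 6;

-- Implicit type conversion: phone is VARCHAR but compared with a number, so the index fails
SELECT * FROM t_user WHERE phone = 1300000001;

-- Written as a string, the index on phone can be used
SELECT * FROM t_user WHERE phone = '1300000001';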

What is the full process of executing and committing an update, and how does each log take part?

The specific process of executing UPDATE t_user SET name = 'xiaolin' WHERE id = 1; is as follows:

  1. The executor is responsible for the actual execution: it calls the storage engine interface and looks up the row with id = 1 through the primary key index tree:
  2. If the data page containing the row with id = 1 is already in the buffer pool, it is returned directly to the executor for the update;
  3. If the record is not in the buffer pool, the data page is read from disk into the buffer pool, and the record is then returned to the executor.
  4. After the executor obtains the clustered index record, it checks whether the record before the update is the same as the record after the update:
  5. If they are the same, the subsequent update steps are skipped;
  6. If they are different, both the pre-update and post-update records are passed as parameters to the InnoDB layer, which then actually performs the update;
  7. A transaction is started. Before the InnoDB layer updates the record, it first records the corresponding undo log: since this is an update, the old values of the updated columns must be saved, i.e. an undo log is generated and written into an Undo page in the Buffer Pool. After the Undo page in memory is modified, the corresponding redo log must also be recorded.
  8. When the InnoDB layer updates the record, it first updates it in memory (marking the page as dirty) and then writes the change into the redo log; at this point the update is considered complete. To reduce disk I/O, dirty pages are not written to disk immediately; a background thread later picks an appropriate time to flush them. This is WAL (write-ahead logging): MySQL's writes are not flushed to disk right away; the redo log is written first, and the modified row data is written to disk at an appropriate later time.
  9. At this point, one record has been updated.
  10. After the update statement is executed, the binlog for that statement is recorded. At this moment the binlog is only saved in the binlog cache, not yet flushed to the binlog file on disk; when the transaction commits, all the binlogs produced during the transaction are flushed to disk together.
  11. Transaction commit (for simplicity, group commit is not described here, only two-phase commit):
  12. Prepare phase: set the transaction state in the redo log to prepare, then flush the redo log to disk;
  13. Commit phase: flush the binlog to disk, then call the engine's commit interface and set the redo log state to commit (after the transaction is set to the commit state, it is flushed to the redo log file on disk);
  14. At this point, the update statement has been fully executed. (When exactly the redo log and binlog are flushed to disk is configurable; see the sketch below.)
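
As a side note, the flush timing of the two logs is controlled by MySQL system variables; a minimal sketch of the common "safest" settings (shown for illustration only, the defaults and trade-offs depend on your deployment):

-- Flush the redo log to disk at every transaction commit (1 = safest)
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
-- Sync the binlog to disk at every transaction commit (1 = safest)
SET GLOBAL sync_binlog = 1;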

Java

What does each area of the JVM memory contain?

The memory structure of the JVM is mainly divided into the following parts:

  • Program Counter Register: each thread has its own program counter. When a thread executes a Java method, the program counter holds the address of the instruction currently being executed, so that execution can resume at the correct place after a method call or a thread switch.
  • Java Virtual Machine Stacks: each thread has its own virtual machine stack, which stores local variables, operand stacks, method exit information and so on during method execution. Every time a thread calls a Java method, a stack frame (Stack Frame) is created containing the method's local variables, operand stack, return address and other information; the frame is popped when the method finishes.
  • Native Method Stack: similar to the Java virtual machine stack, but serves native methods.
  • Java Heap: the largest memory area in the JVM, used to store object instances of all kinds, and the main working area of the garbage collector. The heap is shared by all threads.
  • Method Area: also shared by all threads, it stores class metadata, static variables, the constant pool, method bytecode and similar data. Before Java 8 the method area was implemented as the permanent generation (PermGen); since Java 8 it is implemented as the metaspace (Metaspace). (A tiny Java sketch of which area stores what follows this list.)
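
A tiny, illustrative snippet annotating where each piece of data conceptually lives (the class name is made up; exact placement varies with JVM version and implementation details such as escape analysis):

public class MemoryAreasDemo {
    static int counter = 0;              // static variable: conceptually part of the class data in the method area
    static final String GREETING = "hi"; // string literal: string constant pool

    public static void main(String[] args) {
        int localVar = 42;               // local variable: stack frame on this thread's VM stack
        Object obj = new Object();       // the Object instance lives on the heap; the reference obj is in the stack frame
        System.out.println(localVar + " " + obj + " " + GREETING + counter);
        // this thread's program counter tracks which bytecode instruction runs next
    }
}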

What JVM errors can occur?

  • StackOverflowError (a thread's requested stack depth exceeds the maximum allowed by the virtual machine)
  • OutOfMemoryError (not enough heap memory)
  • OutOfMemoryError: PermGen space / Metaspace (not enough memory in the method area). (A small sketch that triggers the first two follows.)
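
A minimal sketch, for illustration only: the recursion triggers a StackOverflowError, and the unbounded allocation afterwards intentionally exhausts the heap and crashes with an OutOfMemoryError (how quickly depends on -Xss and -Xmx):

import java.util.ArrayList;
import java.util.List;

public class JvmErrorDemo {
    // Unbounded recursion keeps pushing stack frames until a StackOverflowError is thrown
    static int depth(int n) {
        return depth(n + 1);
    }

    public static void main(String[] args) {
        try {
            depth(0);
        } catch (StackOverflowError e) {
            System.out.println("StackOverflowError after deep recursion");
        }
        // Allocating without limit eventually exhausts the heap and throws OutOfMemoryError
        List<byte[]> hog = new ArrayList<>();
        while (true) {
            hog.add(new byte[1024 * 1024]); // 1 MB per allocation
        }
    }
}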

Where is the String saved?

String literals are stored in the string constant pool (which has lived in the heap since Java 7), while Strings created with new are ordinary objects on the heap. A String's value is immutable, so the same pooled instance can safely be shared by multiple references.
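
A small sketch of the string constant pool (the comments note the results one would expect on a standard JVM):

public class StringPoolDemo {
    public static void main(String[] args) {
        String a = "hello";                  // literal: taken from the string constant pool
        String b = "hello";                  // same pooled instance
        String c = new String("hello");      // new object on the heap
        System.out.println(a == b);          // true  (same pooled reference)
        System.out.println(a == c);          // false (different objects)
        System.out.println(a == c.intern()); // true  (intern() returns the pooled instance)
    }
}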

What are the main differences between Java 1.7 and 1.8?

  • New features of Java 7: the diamond operator, try-with-resources, support for dynamically typed languages, the Fork/Join framework, etc.
  • New features of Java 8: Lambda expressions, the Stream API, the new Date/Time API, the Nashorn JavaScript engine, etc. (A small Java 8 sketch follows.)
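
A minimal Java 8 sketch combining a lambda expression with the Stream API (the list contents are arbitrary):

import java.util.Arrays;
import java.util.List;

public class Java8Demo {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("Tom", "Jerry", "Spike");
        // Lambda + Stream: filter, transform and print in a declarative style
        names.stream()
             .filter(n -> n.length() > 3)
             .map(String::toUpperCase)
             .forEach(System.out::println);
    }
}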

How to find the garbage that needs to be recycled?

Java's garbage collection mechanism decides which objects can be reclaimed by judging whether they are still reachable. When an object can no longer be reached through any reference chain, it can be reclaimed; this is done automatically by the JVM's garbage collector.

The JVM uses a reachability analysis algorithm to determine whether an object is reachable. Starting from the GC Roots object, traverse all objects through a series of reference chains. If an object is unreachable, it means that it is dead and can be recycled.

In Java, there are 4 types of references: strong, soft, weak, and phantom. Strong references are the most common; as long as a strong reference exists, the garbage collector will not reclaim the object. The other three are progressively weaker: a softly referenced object is reclaimed only when memory is insufficient, a weakly referenced object is reclaimed at the next garbage collection, and a phantom reference cannot be used to access the object at all and only serves to receive a notification when the object is collected. The garbage collector decides whether to reclaim an object according to the reference type.
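
A small sketch of a weak reference (whether the object is actually collected depends on the JVM; System.gc() is only a hint):

import java.lang.ref.WeakReference;

public class ReferenceDemo {
    public static void main(String[] args) {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.out.println(weak.get() != null); // true: the strong reference keeps the object alive

        strong = null;  // drop the only strong reference
        System.gc();    // request a GC (only a hint)
        System.out.println(weak.get()); // typically null: weakly reachable objects are reclaimed
    }
}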

What are the common garbage collectors? Name a few and describe them.

  1. Serial collector: a single-threaded garbage collector using the mark-copy algorithm, suitable for small or client-side applications.
  2. Parallel collector: a multi-threaded garbage collector using the mark-copy algorithm, suitable for throughput-oriented, medium-sized applications running in the background.
  3. CMS collector: a concurrent garbage collector using the mark-sweep algorithm, suitable for medium-sized applications that are sensitive to response time.
  4. G1 collector: a concurrent garbage collector using a region-based mark-compact approach, suitable for applications with large heaps that are sensitive to response time.

Among them, Serial and Parallel are typically used as young-generation collectors (paired with Serial Old and Parallel Old for the old generation), CMS is an old-generation collector, and G1 is a region-based collector that manages the whole heap.

How is the bottom layer of HashMap implemented? Is it thread safe?
The underlying structure of HashMap is an array plus linked lists. Put simply, HashMap maps a key to an array slot via its hash, then looks up the value in the linked list at that slot. When multiple keys hash to the same slot, their key-value pairs are stored in a linked list at that array position. However, when a linked list grows too long, HashMap's performance degrades, so since JDK 1.8, when a list's length exceeds a threshold it is converted into a red-black tree to improve lookup efficiency.

HashMap is not thread-safe: multiple threads accessing a HashMap concurrently can leave its data inconsistent. ConcurrentHashMap can be used when a thread-safe Map is needed.
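
A tiny sketch contrasting the two (for illustration; both implement the Map interface):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MapDemo {
    public static void main(String[] args) {
        Map<String, Integer> unsafe = new HashMap<>();          // fine for single-threaded use
        unsafe.put("a", 1);

        Map<String, Integer> safe = new ConcurrentHashMap<>();  // thread-safe alternative
        safe.put("a", 1);
        safe.merge("a", 1, Integer::sum);                       // atomic read-modify-write
        System.out.println(unsafe.get("a") + " " + safe.get("a"));
    }
}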

What is a red-black tree?

The red-black tree is a self-balancing binary search tree, which can guarantee that the time complexity of basic dynamic operations is O(log n) in the worst case. Each node in a red-black tree has a color attribute, which can be red or black. A red-black tree satisfies the following five properties:

  1. Each node is either red or black.
  2. The root node is black.
  3. Each leaf node (NIL node, empty node) is black.
  4. If a node is red, both of its children are black.
  5. For each node, the simple path from the node to all its descendant leaf nodes contains the same number of black nodes.

Through these properties, the red-black tree can ensure that when inserting and deleting nodes, the structure of the tree is automatically adjusted to maintain the balance of the tree and the satisfaction of the properties. Compared with ordinary binary search trees, red-black trees are more balanced, and search, insertion, and deletion have more stable time complexity, so they are widely used in many scenarios.

What are fair locks and unfair locks?

Fair locks and unfair locks refer to the way locks are acquired.

Fair lock means that multiple threads acquire locks in the order in which they apply for locks, that is, the principle of first come first served. After thread A releases the lock, threads B, C, and D acquire the lock in turn. If thread E applies for the lock at this time, it needs to wait for B, C, and D to acquire and release the lock in turn before acquiring the lock.

Unfair lock means that the order in which multiple threads acquire locks is random, and fairness is not guaranteed. When thread A releases the lock, threads B, C, D and other threads can all acquire the lock through competition, and at this time, thread E can also acquire the lock through competition.

In practice, a fair lock avoids starvation but is relatively slow because a thread queue must be maintained and respected. An unfair lock lets a newly arriving thread try to grab the lock directly, so it has higher throughput, but some threads may be unable to acquire the lock for a long time.
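
A minimal sketch with java.util.concurrent.locks.ReentrantLock, whose constructor flag chooses the policy:

import java.util.concurrent.locks.ReentrantLock;

public class LockDemo {
    private static final ReentrantLock fairLock = new ReentrantLock(true); // fair: FIFO order
    private static final ReentrantLock unfairLock = new ReentrantLock();   // default: unfair

    public static void main(String[] args) {
        fairLock.lock();
        try {
            System.out.println("holding the fair lock");
        } finally {
            fairLock.unlock(); // always release in finally
        }

        unfairLock.lock();
        try {
            System.out.println("holding the unfair lock");
        } finally {
            unfairLock.unlock();
        }
    }
}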

What is ThreadLocal?

ThreadLocal is a thread-confinement technique in Java: it gives each thread its own independent copy of a variable, so the variables of different threads do not interfere with each other and thread safety is ensured.
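
A minimal ThreadLocal sketch (calling remove() matters in thread-pool environments, where threads are reused):

public class ThreadLocalDemo {
    // Each thread gets its own counter, initialized to 0
    private static final ThreadLocal<Integer> COUNTER = ThreadLocal.withInitial(() -> 0);

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            COUNTER.set(COUNTER.get() + 1);
            System.out.println(Thread.currentThread().getName() + " -> " + COUNTER.get());
            COUNTER.remove(); // clean up to avoid leaks when threads are reused
        };
        Thread t1 = new Thread(task, "t1");
        Thread t2 = new Thread(task, "t2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}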

Computer Networks

What are the processes of the TCP three-way handshake and the four-way wave?

The process of three-way handshake:

  • At the beginning, both the client and the server are in the CLOSED state. The server first listens on a port and enters the LISTEN state.
  • The client randomly initializes its sequence number (client_isn), puts it in the sequence number field of the TCP header, and sets the SYN flag to 1, making it a SYN segment. It then sends this first SYN segment to the server to initiate a connection; the segment carries no application-layer data, and the client enters the SYN-SENT state.
  • After receiving the client's SYN, the server randomly initializes its own sequence number (server_isn), fills it into the sequence number field of the TCP header, fills client_isn + 1 into the acknowledgment number field, and sets both the SYN and ACK flags to 1. It then sends this segment to the client; it carries no application-layer data, and the server enters the SYN-RCVD state.
  • After receiving the server's segment, the client responds with a final acknowledgment: the ACK flag is set to 1 and the acknowledgment number field is filled with server_isn + 1. This segment may already carry client-to-server data; the client then enters the ESTABLISHED state.
  • After receiving the client's acknowledgment, the server also enters the ESTABLISHED state.

The process of the four-way wave:

  • The client that intends to close the connection sends a segment with the FIN flag set to 1 in the TCP header (a FIN segment) and enters the FIN_WAIT_1 state.
  • After receiving it, the server sends back an ACK segment to the client and enters the CLOSE_WAIT state.
  • After receiving the server's ACK, the client enters the FIN_WAIT_2 state.
  • Once the server has finished processing its remaining data, it also sends a FIN segment to the client and enters the LAST_ACK state.
  • After receiving the server's FIN, the client replies with an ACK segment and enters the TIME_WAIT state.
  • After receiving that ACK, the server enters the CLOSED state; the server side of the connection is now closed.
  • After waiting for a period of 2MSL, the client automatically enters the CLOSED state; the client side of the connection is now closed as well.

Why shake hands three times and wave four times?

Reasons for the three-way handshake:

  • The three-way handshake prevents historical (old duplicate) connection initiations from establishing a connection
  • The three-way handshake synchronizes the initial sequence numbers of both sides
  • The three-way handshake avoids wasting resources

Reasons for waving four times:

  • The server usually still needs to finish sending and processing its remaining data, so its ACK and FIN are generally sent separately, which is why four waves are needed.

What IO mechanisms does Linux have?

  • Blocking IO (Blocking IO): When the application is performing IO operations, it will always block and wait for the IO to complete, during which no other operations can be performed.
  • Non-blocking IO (Non-blocking IO): When the application is performing an IO operation, it will return immediately. Regardless of whether the IO operation is completed, the application can perform other operations. It is necessary to judge whether the IO is completed by polling, so the efficiency is low.
  • IO multiplexing (IO Multiplexing): Through system calls such as select, poll, and epoll, multiple file descriptors can be monitored simultaneously in one process. When any file descriptor is ready, IO operations can be performed.
  • Signal Driven IO (Signal Driven IO): When the application performs IO operations, it registers a signal processing function with the kernel, and the kernel sends a signal to the application when the IO is completed, and the application performs data processing after receiving the signal.
  • Asynchronous IO (Asynchronous IO): When the application performs an IO operation, it can return immediately. The kernel is responsible for reading the data into the specified buffer and notifying the application after completion. The application can continue to perform other operations. Asynchronous IO requires the support of the operating system and hardware, and is currently mainly used in high-performance IO scenarios.

What are the differences in the underlying implementations of select, poll, and epoll?

There is no essential difference between select and poll. They both use a "linear structure" internally to store the Socket collection that the process cares about.

When using them, the set of Sockets of interest must first be copied from user space to kernel space via the select/poll system call, and the kernel then checks for events. When a network event occurs, the kernel traverses the Socket set, finds the corresponding Socket and marks it readable/writable, then copies the whole set back to user space; the application traverses the set again to find the readable/writable Sockets and handles them.

Obviously, the defect of select and poll is that when there are more clients, that is, the larger the Socket collection, the traversal and copying of the Socket collection will bring a lot of overhead, so it is difficult to deal with C10K.

epoll is a powerful tool to solve the C10K problem, and it solves the select/poll problem in two ways.

  • epoll uses a "red-black tree" in the kernel to track all the Sockets the process wants to monitor. A red-black tree is an efficient data structure: adding, deleting, and modifying entries generally takes O(log n). Because the kernel manages this tree, there is no need to pass the entire Socket set on every call as select/poll do, which greatly reduces data copying and memory allocation between kernel and user space.
  • epoll uses an event-driven mechanism: the kernel maintains a "linked list" of ready events and returns only the Sockets on which events have occurred to the application, instead of polling and scanning the whole set (Sockets with and without events) the way select/poll do, which greatly improves detection efficiency.

What is the difference between BIO and NIO?

In Java, both BIO and NIO are IO models. Their main differences are:

  • Blocking vs. non-blocking: BIO uses blocking mode, i.e. when performing an IO operation the thread blocks until the IO completes. NIO uses non-blocking mode: the IO call returns immediately whether or not the operation has completed, and the thread can do other work in the meantime.
  • IO model: BIO uses the synchronous blocking IO model, where one thread can handle only one connection; with a large number of connections, a large number of threads are needed, which wastes system resources. NIO uses the synchronous non-blocking IO model: client connections are registered with a multiplexer (Selector), so one thread can handle many connections; with a large number of connections only a few threads are needed, which uses system resources much more efficiently. (A minimal NIO sketch follows this list.)
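
A minimal, hedged sketch of an NIO echo server using a Selector (port 8080 is arbitrary and error handling is omitted for brevity):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class NioEchoServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);                   // non-blocking mode
        server.register(selector, SelectionKey.OP_ACCEPT); // the multiplexer watches for new connections

        while (true) {
            selector.select();                              // blocks until at least one channel is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {                   // a new client connection
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {              // a client sent data
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    int n = client.read(buf);
                    if (n == -1) {
                        client.close();                     // client closed the connection
                    } else {
                        buf.flip();
                        client.write(buf);                  // echo the data back
                    }
                }
            }
        }
    }
}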

Algorithm

Determine the intersection node of two linked lists. The two-pointer approach below walks each pointer through both lists in turn; if the lists intersect, the pointers meet at the intersection node, otherwise they both reach null at the same time.

// ListNode is the usual singly linked list node: int val; ListNode next;
public ListNode getIntersectionNode(ListNode headA, ListNode headB) {
    if (headA == null || headB == null) return null;
    ListNode pA = headA, pB = headB;
    // Both pointers traverse listA + listB (or listB + listA), which have equal total length,
    // so they either meet at the intersection node or both become null at the same time.
    while (pA != pB) {
        pA = pA == null ? headB : pA.next;
        pB = pB == null ? headA : pB.next;
    }
    return pA;
}

Interview Summary

Overall feelings

It was a very rewarding interview, worth recording in a separate article later. If the foundation isn't solid, the ground shakes and the mountains sway. Although the questions were all fundamentals, digging deep shows that the roots, stems, and leaves are all connected. The interview question banks have answers to these problems, but I really only knew the surface and went into battle after a superficial look. I'm very grateful to the interviewer: the question-asking session at the end gave me a lot of advice on interviews, strategy, fundamentals, algorithms, and more. It was a valuable learning experience. It would be a lie to say I have no regrets, but I'm very happy and learned a lot.

Shortcomings

My foundation is not solid enough; for many topics I only know the surface, not what's underneath, while the interviewer was experienced and very perceptive. The pressure at the time was also high, and my mindset wavered under questioning.

Few of my answers were particularly good. Whatever I mentioned, the interviewer would dig deeper until I couldn't go further. I won't post my actual answers here because they were very shallow. Every question needs to be studied carefully and the knowledge connected into a network.

Build a scenario for yourself, sort out all the technologies involved, and be able to narrate the whole process from beginning to end; only then do you really understand it. For example: the process of inserting data into a HashMap, how a ThreadLocal is created and released, how String concatenation is implemented, every step of setting up a SpringBoot project, and so on.


Origin: blog.csdn.net/Park33/article/details/131050557