[Freshly released] A big data interview roadmap drawn from interviews at major tech companies

One, Java basics

1. The difference between String, StringBuilder, and StringBuffer

2. The difference between ArrayList, LinkedList, and Vector

3. The difference between Class.forName and classloader

Both Class.forName and ClassLoader in Java can be used to load classes.

In addition to loading the class's .class file into the JVM, Class.forName also initializes the class and executes its static blocks.

ClassLoader only loads the .class file into the JVM and does not run any static initializers; the static blocks execute only when the class is first used, for example via newInstance().
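A minimal sketch of the difference, using a hypothetical Demo class with a static block:

```java
// Demo is a hypothetical class used only to show when the static block runs.
class Demo {
    static { System.out.println("Demo static block executed"); }
}

public class LoadingComparison {
    public static void main(String[] args) throws Exception {
        // loadClass only loads Demo into the JVM; the static block has NOT run yet.
        Class<?> clazz = LoadingComparison.class.getClassLoader().loadClass("Demo");
        System.out.println("Demo loaded but not initialized");

        // Initialization (and the static block) happens on first active use, e.g. newInstance().
        clazz.getDeclaredConstructor().newInstance();

        // By contrast, Class.forName("Demo") loads AND initializes the class,
        // so the static block would run immediately at that call.
    }
}
```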

4. Java design patterns

23 design patterns.

A design pattern is a general solution to a recurring design problem.

Applying a design pattern means adapting that solution to a concrete application to solve a similar problem.

Using design patterns makes code reusable, easier for others to understand, and more reliable.
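As one illustrative example of the classic 23, here is a minimal sketch of the Singleton pattern using double-checked locking; the class name is hypothetical:

```java
// Minimal Singleton sketch with double-checked locking; ConnectionManager is an illustrative name.
public class ConnectionManager {
    private static volatile ConnectionManager instance;

    private ConnectionManager() { }            // private constructor blocks outside instantiation

    public static ConnectionManager getInstance() {
        if (instance == null) {                // first check avoids locking on the fast path
            synchronized (ConnectionManager.class) {
                if (instance == null) {        // second check inside the lock
                    instance = new ConnectionManager();
                }
            }
        }
        return instance;
    }
}
```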

5. The benefits of mysql index and the corresponding data structure

Creating an index can greatly improve system performance. The advantages:

First, by creating a unique index, you can guarantee the uniqueness of each row of data in a database table.

Second, it can greatly speed up data retrieval, which is the main reason for creating indexes.

Third, it can speed up joins between tables, which is particularly meaningful for enforcing referential integrity.

Fourth, when grouping and sorting clauses are used for data retrieval, the time spent on grouping and sorting in the query can be significantly reduced.

Fifth, by using indexes, the query optimizer can be leveraged during query processing to improve system performance.

The data structure underlying MySQL indexes is the B+ tree.

6. ConcurrentHashMap underlying structure

ConcurrentHashMap (in JDK 1.7) is composed of a Segment array and HashEntry arrays. A Segment is a reentrant lock (ReentrantLock) that serves as the lock its data segment competes for, and each HashEntry holds the elements of a linked list.

The hash of a key determines which segment it belongs to, and that segment's lock is the one a thread must acquire when modifying data in it.

Lock segmentation divides the data set into segments, each guarded by its own lock; operations on different segments do not contend for the same lock, which greatly improves throughput under highly concurrent access.

The get method of ConcurrentHashMap does not need to lock, because the shared variables it reads are declared volatile, which guarantees their visibility across threads (each read is synchronized with main memory, so the value is fetched from memory rather than a stale cache).

Although get is not an atomic operation, the happens-before rules of the Java memory model guarantee that a write to a volatile field happens before any subsequent read of it, so dirty reads are avoided. To provide this inter-thread visibility, volatile also forbids reordering of the relevant reads and writes (which reduces the benefit of some caching optimizations).
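To make the lock-striping idea concrete, here is a highly simplified sketch in plain Java (not the real JDK source); all names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// A simplified lock-striping sketch in the spirit of the JDK 1.7 ConcurrentHashMap
// (one ReentrantLock per segment); this is NOT the real implementation.
public class StripedMap<K, V> {
    private static final int SEGMENTS = 16;
    private final ReentrantLock[] locks = new ReentrantLock[SEGMENTS];
    private final Map<K, V>[] tables;

    @SuppressWarnings("unchecked")
    public StripedMap() {
        tables = (Map<K, V>[]) new Map[SEGMENTS];
        for (int i = 0; i < SEGMENTS; i++) {
            locks[i] = new ReentrantLock();
            tables[i] = new HashMap<>();
        }
    }

    // The hash of the key decides which segment (and which lock) it belongs to.
    private int segmentFor(Object key) {
        return (key.hashCode() & Integer.MAX_VALUE) % SEGMENTS;
    }

    public V put(K key, V value) {
        int i = segmentFor(key);
        locks[i].lock();              // only this segment is locked; other segments stay available
        try {
            return tables[i].put(key, value);
        } finally {
            locks[i].unlock();
        }
    }

    public V get(K key) {
        int i = segmentFor(key);
        locks[i].lock();              // the real JDK get() avoids locking thanks to volatile reads
        try {
            return tables[i].get(key);
        } finally {
            locks[i].unlock();
        }
    }
}
```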

7. ThreadLocal underlying structure

8. HashMap underlying structure

9. Four ways to resolve hash conflicts in Java

10. JVM internals (memory structure, GC algorithms, GC tools, reference types, etc.)

Two, Data structures and algorithms

1. Preorder, inorder, and postorder traversal of a binary tree (recursive and non-recursive)
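A minimal in-order traversal sketch in both the recursive and the iterative (explicit stack) form; the TreeNode type is defined here only to make the example self-contained:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class InorderTraversal {
    // Simple node type for the sake of a self-contained example.
    static class TreeNode {
        int val;
        TreeNode left, right;
        TreeNode(int val) { this.val = val; }
    }

    // Recursive in-order traversal: left subtree, current node, right subtree.
    static void inorderRecursive(TreeNode node, List<Integer> out) {
        if (node == null) return;
        inorderRecursive(node.left, out);
        out.add(node.val);
        inorderRecursive(node.right, out);
    }

    // Iterative version using an explicit stack instead of the call stack.
    static List<Integer> inorderIterative(TreeNode root) {
        List<Integer> out = new ArrayList<>();
        Deque<TreeNode> stack = new ArrayDeque<>();
        TreeNode cur = root;
        while (cur != null || !stack.isEmpty()) {
            while (cur != null) {          // go as far left as possible
                stack.push(cur);
                cur = cur.left;
            }
            cur = stack.pop();             // visit the node
            out.add(cur.val);
            cur = cur.right;               // then traverse its right subtree
        }
        return out;
    }
}
```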

2. Depth-first and breadth-first traversal of a binary tree

3. The maximum value among all consecutive nodes encountered while traversing a binary tree

4. Find all possible sub-arrays of the array

5. Given a target number, find two numbers in a sorted array whose sum equals it (can be extended to two unsorted arrays)
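A minimal two-pointer sketch for the sorted case; for unsorted arrays the usual extension is a HashMap of seen complements:

```java
// Returns the indices of two elements in a sorted array that add up to target, or {-1, -1}.
public static int[] twoSum(int[] sorted, int target) {
    int left = 0, right = sorted.length - 1;
    while (left < right) {
        int sum = sorted[left] + sorted[right];
        if (sum == target) return new int[] { left, right };
        if (sum < target) left++;   // sum too small: move the left pointer up
        else right--;               // sum too large: move the right pointer down
    }
    return new int[] { -1, -1 };    // no pair found
}
```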

6. Find the second largest value of an array

7. Sorting large files that cannot be loaded into memory

8. Quick sort, merge sort, bubble sort, selection sort (what are their complexities?)
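As one example from this group, a minimal in-place quicksort sketch (average O(n log n), worst case O(n^2)):

```java
// Quicksort with the Lomuto partition scheme, pivoting on the last element.
public static void quickSort(int[] a, int lo, int hi) {
    if (lo >= hi) return;
    int pivot = a[hi], i = lo;
    for (int j = lo; j < hi; j++) {
        if (a[j] < pivot) {                      // move smaller elements to the left side
            int t = a[i]; a[i] = a[j]; a[j] = t;
            i++;
        }
    }
    int t = a[i]; a[i] = a[hi]; a[hi] = t;       // place the pivot in its final position
    quickSort(a, lo, i - 1);                     // sort the two partitions recursively
    quickSort(a, i + 1, hi);
}
```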

9. What are the time complexities of a hash table, a HashMap, and an index (B-tree / B+ tree)?

The time complexity of a hash table lookup is O(1).

In the ideal case, the time complexity of HashMap is O(1) (when there are no hash collisions);

otherwise it degrades toward O(n) (when most of the time goes into scanning the collision linked list).

The time complexity of a query on a balanced search tree (and on a B-tree / B+ tree index) is O(log N).

Three, Big data frameworks

1. Hadoop

Know the principles and execution flow of HDFS, YARN, and MapReduce, especially MapReduce; it is best to be able to explain them with reference to the source code.
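For context, a minimal sketch of the classic MapReduce WordCount mapper and reducer written against the org.apache.hadoop.mapreduce API; class names are illustrative:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: split each input line into words and emit (word, 1).
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}

// Reducer: after the shuffle groups values by key, sum the counts for each word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```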

2. Flume + Kafka, a real-time stream collection framework

Be familiar with Flume's workflow: source, channel, sink, and interceptor, as well as custom sources, custom sinks, and custom interceptors.

Be familiar with Kafka's main components (broker nodes, replicas, partitions), how Kafka works (its production and consumption model), how Kafka compares with other MQ systems, and how Kafka supports the three delivery semantics (at most once, at least once, exactly once).

Be able to discuss the specific network bottlenecks that may occur with Kafka, together with the GC behavior of ZooKeeper.
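A minimal producer-side sketch of how delivery semantics are usually configured with the Kafka Java client; the broker address and topic name are assumptions made for the example:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ReliableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=all plus idempotence pushes the producer toward at-least-once / effectively-once delivery;
        // acks=0 with no retries would be closer to at-most-once.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "key", "value")); // "events" is an assumed topic
        }
        // On the consumer side, at-least-once typically means disabling auto-commit and
        // calling commitSync() only after a record has been fully processed.
    }
}
```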

3. Storm

Storm's overall structure (spout + bolt), concrete ways to tune Storm, how Storm guarantees high reliability, the ack confirmation mechanism, solutions to Storm "avalanche" (overload) situations,

and concrete ways to compute PV, UV, and DV with Storm (ideally without simply relying on a Set).
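A minimal bolt sketch showing anchoring and acking, which is what lets Storm's acker confirm that a spout tuple was fully processed; it assumes the Storm 2.x API and an upstream component that declares a "page" field:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Counts page views (PV) per page and emits the running count; names are illustrative.
public class PageViewCountBolt extends BaseRichBolt {
    private OutputCollector collector;
    private Map<String, Long> pvCounts;

    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.pvCounts = new HashMap<>();
    }

    @Override
    public void execute(Tuple input) {
        String page = input.getStringByField("page");    // assumes the upstream declares a "page" field
        long pv = pvCounts.merge(page, 1L, Long::sum);
        collector.emit(input, new Values(page, pv));      // anchor the new tuple to the input tuple
        collector.ack(input);                             // ack so the spout's tuple tree can complete
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("page", "pv"));
    }
}
```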

4. HBase

HBase's overall architecture (architecture diagram), the HBase read and write paths (with emphasis on bulk load), HBase table and rowkey design (to prevent hotspot problems), and the harm caused by HBase hotspots.
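A minimal sketch of rowkey salting, one common way to spread monotonically increasing keys across regions and avoid a single hot region; the bucket count and key format are illustrative:

```java
// Prefix the original key with a stable salt derived from its hash,
// e.g. "07_20190501123000_user42", so sequential keys land in different regions.
public class RowKeyUtil {
    private static final int SALT_BUCKETS = 16;   // illustrative bucket count

    public static String saltedRowKey(String originalKey) {
        int bucket = (originalKey.hashCode() & Integer.MAX_VALUE) % SALT_BUCKETS;
        return String.format("%02d_%s", bucket, originalKey);
    }
}
```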

5. Redis

How Redis is used, Redis data structure types, the Redis bitmap structure, Redis persistence, Redis eviction policies, and Redis cache breakdown.
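A minimal sketch of the bitmap idea for counting daily active users, assuming the Jedis client; the key name and user ids are illustrative:

```java
import redis.clients.jedis.Jedis;

public class DailyActiveUsers {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {   // assumed local Redis instance
            String key = "dau:2019-05-01";                   // one bitmap per day
            jedis.setbit(key, 1001L, true);                  // mark user id 1001 as active (one bit per user)
            jedis.setbit(key, 2002L, true);
            long active = jedis.bitcount(key);               // number of set bits = number of active users
            System.out.println("active users: " + active);
        }
    }
}
```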

6. Spark

How Spark submits and executes a job (job division, stage division, task generation, resource scheduling, the shuffle process in detail, etc.), some classic Spark Core programs (secondary sort in Spark Core, grouped top N in Spark Core), Spark SQL optimization, off-heap memory overflow in Spark SQL, and Spark tuning (roughly seven or eight techniques).
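A minimal Spark Core (Java API) sketch of grouped top N (here N = 3); the input data, app name, and local master are assumptions made for the example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class GroupTopN {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("GroupTopN").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Parse "key value" lines into (key, score) pairs.
            JavaPairRDD<String, Integer> scores = sc
                .parallelize(Arrays.asList("a 3", "a 1", "b 7", "a 5", "b 2"))
                .mapToPair(line -> {
                    String[] parts = line.split(" ");
                    return new Tuple2<>(parts[0], Integer.parseInt(parts[1]));
                });

            // groupByKey pulls each key's values together; for very large groups an
            // aggregate-based approach is safer, but this keeps the sketch short.
            JavaPairRDD<String, List<Integer>> top3 = scores.groupByKey().mapValues(values -> {
                List<Integer> all = new ArrayList<>();
                values.forEach(all::add);
                all.sort(Comparator.reverseOrder());
                return new ArrayList<>(all.subList(0, Math.min(3, all.size())));
            });

            top3.collect().forEach(t -> System.out.println(t._1() + " -> " + t._2()));
        }
    }
}
```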

Four, Project-specific issues

1. Cluster resources

2. Data volume

3. Specific projects (problems encountered in them and how they were solved)

Five, Other

1. RPC framework communication protocols

2. Talk about your views on some specific issues (including product design and analysis)

Conclusion

Thank you for reading. If there are any shortcomings, criticism and corrections are welcome.

Finally, I wish you all success in your studies, a satisfying offer, and quick promotions and raises!


Origin blog.csdn.net/sinat_40775402/article/details/89959166