Interview summary from a reader with 8 years of experience: 20 major companies during the epidemic (shared by Ao Bing)

This article is a submission from a big data veteran with 8 years of experience, describing his interviews. I read it from start to finish, and it is remarkably detailed. During the epidemic he failed interview after interview and wrote up a summary each time. The offer he finally got did not come easily, and his persistence is worth learning from.

Preface

I'm no guru, just an Internet industry veteran with eight years of work experience. I have neither a prestigious academic background nor an impressive resume.

This winter, just as my child was being moved from the emergency department to an inpatient ward, I learned that I would be "optimized" (laid off) before the New Year. As the only source of income in the family, I was completely stunned. For a while, the sense of loss and helplessness almost broke me.

In the end, I told my family the truth. They encouraged me greatly, and I found the courage to start again.

Unfortunately, the epidemic hit hard: few openings and high requirements. Phone interviews, video interviews, online coding, and one failure after another made up my daily routine for those two months.

At first, I was anxious and at a loss, and even deeply doubted my ability and work experience.

Later, encouraged by a few good friends, I read many of Ao Bing's interview articles, carefully summarized the weak points in my interviews, filled the gaps in my knowledge, and finally confirmed an offer this week. (Ao Bing: these are his original words, I really didn't add them, haha)

Next, I will summarize the interview questions from companies such as Tencent, Amap (Gaode), JD, Meituan, Ele.me, Kuaishou, ByteDance, Didi, 360 Finance, GSX, NetsUnion Clearing, BMW Brilliance, Kuaikan Manhua, Momo, and Maimai. I hope this starts a discussion and helps everyone.

Summary of interview questions

Basic questions

Linux and network basics

(1) What are kernel mode and user mode in Linux, and what is the difference?

(2) What are BIO, NIO and AIO, and what is the difference?

(3) The difference between TCP and UDP?

(4) Describe the TCP three-way handshake in detail, and the difference between TCP and HTTP. The ByteDance interviewer asked the most detailed questions: the low-level implementation logic of the three-way handshake, and what happens if the third handshake fails.

I recommend also reading up on the four-way teardown used when a TCP connection closes. Ao Bing has an article on it. If you know at least the basics, these questions will not trip you up. This is one of the most fundamental questions, and you can imagine the impression a poor answer leaves on the interviewer.

(5) The difference between RPC and HTTP. Which RPC frameworks do you know?

(6) What encryption does HTTPS add compared with HTTP? Is it symmetric or asymmetric?

(7) How to do a grouped sum with Linux commands, and how to split a string into an array on a delimiter (I suggest reading Ao Bing's article on Linux commands)

JVM basics

(1) Briefly introduce the JVM (this question is not asking about the JMM, class loading, and GC separately; think in advance about how to describe the JVM as a whole)

(2) Briefly describe a GC cycle (do you remember the minor GC and major GC processes?)

(3) What is JMM?

(4) Which JVM memory areas are shared between threads, and what is off-heap memory?

(5) GC regions, garbage collection algorithms, and garbage collectors: an introduction to G1, CMS, ParNew and other collectors, and the differences between them.

(6) The class loading process (it is best to study and understand all 5 stages, because the topic also involves stack frames, local variable tables, operand stacks, dynamic linking, and method return addresses. Ao Bing's article explains it)

(7) Do two ArrayList objects return the same result from getClass()? (understand class loading and the Class type)

(8) How to troubleshoot a deadlock (take a thread dump with jstack, which reports Java-level deadlocks; the -XX:+PrintGCDetails flag is for GC logging)

(9) You must be able to read GC logs. In particular, when asked how to troubleshoot an OOM, you should know to use tools such as jconsole, jstat, jmap, and jvisualvm to check the GC status, and to see whether the young generation is sized too small, causing frequent major GCs, or whether there is a memory leak.
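
Question (7) can be checked with a few lines of Java. This is only a minimal sketch; the class name GetClassDemo and the helper sameRuntimeClass are made up for illustration. Generic type arguments are erased at runtime, and both lists are instances of the one java.util.ArrayList class, so getClass() returns the same Class object for both.

```java
import java.util.ArrayList;
import java.util.List;

class GetClassDemo {
    // Returns true when both objects are instances of exactly the same runtime class.
    static boolean sameRuntimeClass(Object a, Object b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        List<Integer> ids = new ArrayList<>();
        // The type arguments String and Integer exist only at compile time.
        System.out.println(sameRuntimeClass(names, ids));
        System.out.println(names.getClass().getName());
    }
}
```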

Java basics and multithreading basics

(1) What optimizations were made to synchronized in JDK 6? What is the difference between synchronized and Lock?

(2) Is double-checked locking thread-safe for a lazy singleton? Why add volatile?

(3) What is volatile used for, and what is CAS?

(4) What is the happens before principle

(5) What is AQS

(6) The difference between sleep and wait for threads. What does join do?

(7) What kinds of locks does Java have? (see Ao Bing's article)

(8) What types of thread pools are there, and what do the parameters mean: corePoolSize, maximumPoolSize, keep-alive time, work queue, and rejection policy?

(9) How optimistic locking is implemented in Java (CAS + spin)

(10) How to implement a blocking queue; know at least two approaches (single lock, multiple locks, ReentrantLock, Condition)

(11) Differences between CountDownLatch, CyclicBarrier, and Semaphore, and usage scenarios

(12) Is HashMap thread-safe? How is it implemented underneath (get, put, resize)? What changed before and after JDK 1.8? Which map (LinkedHashMap, TreeMap) keeps key-value pairs in order? How does ConcurrentHashMap achieve thread safety (the implementation differs before and after JDK 1.8)?

(13) The difference between ArrayList and LinkedList, the difference between stack and queue. The difference between Queue and Deque

(14) How Netty and Jetty work.

(15) Java static proxy, dynamic proxy

(16) The fork/join model

(17) Java callbacks

(18) The difference between coroutines and threads

(19) What are the new features in JDK 1.8? Do you know functional programming? (if not, have a look at Guava)
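
For question (10), one common way to build a blocking queue is a single ReentrantLock with two Condition objects. The sketch below is only an illustration under that assumption: the class name SimpleBlockingQueue and its methods are hypothetical, and it is not how java.util.concurrent implements its own queues in detail.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class SimpleBlockingQueue<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final int capacity;
    private final ReentrantLock lock = new ReentrantLock();
    // Producers wait on notFull, consumers wait on notEmpty.
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();

    SimpleBlockingQueue(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Blocks while the queue is full. Null items are rejected by ArrayDeque.
    void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (items.size() == capacity) { // loop guards against spurious wakeups
                notFull.await();
            }
            items.addLast(item);
            notEmpty.signal(); // wake one waiting consumer
        } finally {
            lock.unlock();
        }
    }

    // Blocks while the queue is empty.
    T take() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty()) {
                notEmpty.await();
            }
            T item = items.removeFirst();
            notFull.signal(); // wake one waiting producer
            return item;
        } finally {
            lock.unlock();
        }
    }

    int size() {
        lock.lock();
        try {
            return items.size();
        } finally {
            lock.unlock();
        }
    }
}
```

A two-lock variant (one lock for the head, one for the tail) lets producers and consumers proceed concurrently, at the cost of more careful size accounting. That trade-off is a good thing to be able to discuss in the interview.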

Data structure algorithms and design patterns

(1) Design pattern questions usually come from the underlying implementation of your projects or tools, so you need to understand the more common patterns, such as factory, singleton, observer, command, adapter, and proxy.

(2) Algorithm questions are mainly searching and sorting, so you must at least be able to write the mainstream sorting and searching algorithms

(3) How is an LSM tree implemented, and how does it differ from the B+ tree used by MySQL? (the LSM tree is the underlying storage structure of HBase and LevelDB, so there is no excuse for not understanding it)

(4) Binary trees, balanced binary search trees, red-black trees, etc.

(5) Stacks, arrays, linked lists, queues, deques, skip lists (Redis zset), etc.

spring series

(1) AOP, IOC concept

(2) Introduction to the Spring Cloud components. Hystrix and Eureka were asked about in more detail. For Hystrix, the questions were mainly how rate limiting and degradation are implemented (thread pool and semaphore), how the two approaches differ, and the specific configuration for circuit breaking. For Eureka, they were mainly how it differs from ZooKeeper and how registration works

(3) Much of Spring Boot configuration is done with annotations, so you should know the commonly used ones

(4) The difference between filters and Spring interceptors

Message Middleware AMQP

Reading Ao Bing's articles is enough

redis cache related

Reading Ao Bing's articles is enough

(Ao Bing: these two paragraphs nearly made me die laughing)

Other types

(1) How to build a single sign-on (SSO) system

(2) Why choose Cassandra over HBase, and what is the difference between the two?

Big data questions

hadoop

(1) What processes does Hadoop 1.0 have? Introduce HDFS and MapReduce

(2) What does the NameNode do when the cluster initializes? What are the fsimage and the edit log?

(3) What is the role of SecondaryNamenode.

(4) Hadoop read file and write file process

(5) Introduce the MapReduce process (this is fundamental, so don't lose impression points on it), the shuffle process, how the job client submits a job, etc.

(6) How does MapReduce perform serialization and deserialization (InputFormat, OutputFormat)

(7) Which task schedulers are available in the JobTracker

(8) What did Hadoop YARN improve, what processes does YARN have, and how is a job submitted to YARN?

(9) Mapreduce optimization (mapjoin, combiner, small file merging, etc.)

(10) Briefly describe how to implement a Hive table join with MapReduce, MapReduce secondary sort, and the difference between partitioning and grouping in secondary sort

(11) HA for a Hadoop cluster (ZooKeeper-based active/standby, plus federation; it is best to understand the concepts)

(12) How do other frameworks such as Spark integrate with YARN?

(13) What Spark improves over MapReduce (in-memory computation, RDDs, etc.)

(14) Given a user table with 10 billion rows and 100 MB of memory, how do you deduplicate it or check whether a user is in it? (bitmap, Bloom filter, etc.)

(15) Bonus: have you read the Hadoop source code? Be ready to walk through a specific part of it.
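
Question (14) mentions the Bloom filter. The sketch below only illustrates the data structure itself: a bit array plus several hash positions per key, which gives "definitely absent" or "possibly present" answers. The class name SimpleBloomFilter, the seeded FNV-1a style hash, and the double-hashing scheme are illustrative choices, not something the interviewers specified. Sizing it for the 10-billion-row case (for example by hash-partitioning the table first) is a separate discussion.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Seeded FNV-1a style 32-bit hash (an illustrative choice, not a requirement).
    private static int fnv1a(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: the i-th position is derived from two base hashes.
    private int bitIndex(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0);
        int h2 = fnv1a(data, 0x9E3779B9);
        for (int i = 0; i < numHashes; i++) {
            bits.set(bitIndex(h1, h2, i));
        }
    }

    // false means the key was never added; true means it may have been added.
    boolean mightContain(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0);
        int h2 = fnv1a(data, 0x9E3779B9);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(bitIndex(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 20, 5);
        filter.add("user-42");
        System.out.println(filter.mightContain("user-42"));
    }
}
```

A plain bitmap gives exact answers when user ids are dense integers, while a Bloom filter accepts a false-positive rate in exchange for handling arbitrary keys.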

hive

(1) Hive data warehouse architecture

(2) How does Hive convert SQL into MapReduce? (at least know that the SQL parser produces an AST, which is then turned into query blocks, and so on until it enters the execution queue)

(3) Hive primitive types and complex types (I was asked how many integer types Hive has, and I was stumped)

(4) Hive underlying storage formats and compression formats

(5) Hive UDF, UDTF, UDAF, window functions (row_number, rank, cube, rollup, lag, lead) (usually asked alongside SQL coding questions)

(6) Hive optimization (count(distinct xxx), filtering out null values, merging small files, tuning the number of map and reduce tasks, handling data skew)

(7) The difference between Hive partitions and buckets. What problem does bucketing mainly solve? The difference between internal and external tables. How to do dynamic partitioning.

(8) How does Hive automatically add missing partitions (the MSCK command; this is relatively uncommon, just know that it exists)

(9) Hive columnar storage: how data is stored in RCFile, ORC, and Parquet

hbase

(1) Introduction to the HBase architecture

(2) How HBase reads and writes data, in detail

(3) Application scenarios of HBase

(4) HBase optimization (hot spots, pre-splitting, rowKey design, manual compaction, etc.)

(5) Why are HBase writes fast and reads slow? (LSM tree)

(6) Is HBase a CP or an AP system? (do you understand the CAP theorem? HBase is CP)

(7) How does HBase scan data?

kafka

Why is Kafka listed under big data? Because in most scenarios Kafka is the source of ETL and stream-processing pipelines.

(1) Introduction to the Kafka architecture

(2) Why is Kafka fast, with good performance and high throughput? (look into mmap and sendfile)

(3) Will Kafka lose data? Are Kafka messages ordered?

(4) How do Kafka producers and consumers achieve at-most-once and exactly-once delivery? (idempotence and transactions)

(5) How does Kafka achieve high availability (AR, ISR, OSR)? Can split-brain occur? (no; compare with the ZooKeeper election)

(6) Kafka LEO (log end offset) and HW (high watermark)

(7) When Kafka consumers consume a partition of a topic, what is the difference between consumers in different groups and consumers in the same group?

(8) What acks settings does Kafka have, and what does each mean?

(9) What are Kafka's pitfalls, and how would you improve it? (brainstorming)

(10) What is the difference between Kafka and traditional message queues such as RabbitMQ?

zookeeper

(1) Introduction to ZooKeeper

(2) ZooKeeper node types

(3) The ZooKeeper watcher mechanism

(4) ZooKeeper usage scenarios: how to design active/standby high availability with ZK, and how to implement a distributed lock with ZK (see Ao Bing's article; it comes down to creating ephemeral sequential znodes and making clever use of the watcher mechanism)

(5) The ZooKeeper election mechanism. Can split-brain occur?

Other tools

Because I don't have much Spark or Flink project experience, I was asked less about this part

(1) Introduce Storm, Spark, and Flink

(2) Spark RDD

(3) How does Spark divide a job into stages and tasks?

(4) Spark wide and narrow dependencies

(5) Spark shuffle

(6) Why is Spark prone to OOM?

(7) What window types does Flink have?

(8) What is a Flink watermark, what problem does it solve, and how is message order ensured?

(9) How does Flink achieve exactly once

Other tools used in your projects, such as MySQL, Redis, Flume, Sqoop, and Elasticsearch, will also be asked about in detail. I won't go into them here. For Redis and MySQL, you can just read Ao Bing's articles.

Data warehouse, data analysis, and algorithm questions

(1) How did you design the data warehouse

(2) What is a data warehouse, and how does it differ from a database?

(3) How to layer your data warehouse

(4) Dimensional modeling process, other types of modeling methods

(5) What is the difference between the Inmon model and the Kimball model

(6) How to refine business indicators

(7) How to design fact table and dimension table

(8) Some concepts of data cube

(9) What is a slowly changing dimension, and how do you handle it?

(10) For specific projects, they will ask whether logs or data are stored incrementally or in full. This may extend to zipper tables, and I was even asked to implement one (in the second round at ByteDance I did not write the zipper table process correctly; implement it once on MySQL and you will understand it)

(11) Do you use Python and R for data analysis? Do you use tools such as SPSS, Excel, and Tableau?

(12) Which multidimensional query engines have you used (Impala, Kylin, Presto, Druid, etc.; if you haven't used one, don't claim you have, because an interviewer who knows it well may ask very detailed, low-level questions)

(13) The concept of MPP, and the use of tools such as ClickHouse

(14) System design for scheduling systems, reporting systems, metadata management systems, data lineage analysis, and so on; tagging system design, AI algorithm implementation, user profile design, etc.

(15) Your understanding of the data middle platform and what problems it solves (this tests your ability to think and your understanding of the data department's function)

(16) Your understanding of data governance

Coding

These are just some of the simpler problems I encountered. You should master the basic searching and sorting algorithms and be fluent with recursion and greedy approaches. It is even better if you understand dynamic programming.

If you have time, work through problems on LeetCode; with thousands of problems, this takes a lot of time. For those who need to prepare for SQL tests, I recommend doing all the hands-on SQL problems in Nowcoder's online practice set.

(1) Implement a function that merges two sorted int arrays into a new sorted array (Java, encountered twice)

(2) Consider all permutations of an array a[n] in sorted order and find the permutation that precedes a given one. For example, a[3] has {[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]}; given [2,3,1], its predecessor is [2,1,3] (Java)

(3) Given an array arr of positive numbers, find the maximum difference between two elements, where the index of the minuend is not less than the index of the subtrahend. That is, find maxValue = max{arr[j] - arr[i] : j >= i} (Java)

(4) There are 8 balls, one of which is heavier than the other 7. Using a balance scale, find the heavy ball in 2 weighings. (brain teaser)

(5) Find the smallest positive integer missing from an array (Java; this seems to be a problem from a programmer interview guide)

(6) Given a user login table, how to find users who have not logged in for 3 consecutive days (sql)

(7) Given detailed daily income data, how to query the cumulative historical income as of each day (sql)

(8) A Hive table contains duplicate values; how to find the total number of duplicate values (hql)

(9) Given a registration table and a login table, compute 1-7 day retention in a single SQL statement (sql)

(10) Implement a zipper table (hql)

(11) Given an e-commerce order table with fields order id (order_id) and the combination of products in the order (type_list), find the TOP 10 related products for each product, that is, the products most often ordered together with it (hql, rows to columns)

(12) Given an advertisement table ad with fields aid (advertisement id) and citys (a collection of city_id values), and a city table city_info with fields city_id and city_name (city name), find the TOP 10 city names by advertisement volume. (hql, rows to columns)
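
The Java problems (1), (2), (3) and (5) above can be sketched as follows. These are only possible solutions under my reading of the problem statements, not the answers the interviewers expected. The class and method names are made up for illustration, and maxDifference assumes a non-empty array.

```java
import java.util.Arrays;

class CodingSketches {
    // (1) Merge two ascending arrays into a new ascending array with two pointers.
    static int[] mergeSorted(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }

    // (2) Previous permutation in lexicographic order, or null if there is none.
    static int[] previousPermutation(int[] a) {
        int[] p = a.clone();
        int i = p.length - 2;
        while (i >= 0 && p[i] <= p[i + 1]) i--; // rightmost position with p[i] > p[i+1]
        if (i < 0) return null;                 // already the first permutation
        int j = p.length - 1;
        while (p[j] >= p[i]) j--;               // rightmost element smaller than p[i]
        int tmp = p[i]; p[i] = p[j]; p[j] = tmp;
        // Reverse the suffix so that it becomes as large as possible.
        for (int lo = i + 1, hi = p.length - 1; lo < hi; lo++, hi--) {
            tmp = p[lo]; p[lo] = p[hi]; p[hi] = tmp;
        }
        return p;
    }

    // (3) max{arr[j] - arr[i] : j >= i}; j == i is allowed, so the result is at least 0.
    static int maxDifference(int[] arr) {
        int minSoFar = arr[0];
        int best = 0;
        for (int v : arr) {
            minSoFar = Math.min(minSoFar, v);
            best = Math.max(best, v - minSoFar);
        }
        return best;
    }

    // (5) Smallest positive integer missing from the array, using O(n) extra space.
    static int firstMissingPositive(int[] arr) {
        boolean[] seen = new boolean[arr.length + 2];
        for (int v : arr) {
            if (v >= 1 && v <= arr.length) seen[v] = true;
        }
        int candidate = 1;
        while (seen[candidate]) candidate++; // seen[arr.length + 1] is always false
        return candidate;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(mergeSorted(new int[]{1, 3, 5}, new int[]{2, 4})));
        System.out.println(Arrays.toString(previousPermutation(new int[]{2, 3, 1})));
        System.out.println(maxDifference(new int[]{7, 1, 5, 3, 6, 4}));
        System.out.println(firstMissingPositive(new int[]{3, 4, -1, 1}));
    }
}
```

For problem (5), an in-place version that swaps each value v into index v - 1 avoids the extra array, and interviewers sometimes ask for it as a follow-up.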

project

When interviewers examine your project experience, they look not only at your grasp of the fundamentals but also at your understanding of the business, your architecture design, and your own thinking about the project.

So besides the fundamentals that the project touches on, they will also ask which parts of the project's design are good or unreasonable, and how you solved the problems.

These problems may concern the architecture or specific technical details.

As long as you present your own thinking and solutions, an experienced interviewer will get a good sense of your technical depth, architecture design ability, and problem-solving ability.

So pick out the highlights and rehearse them in advance, introduce the project in a structured, layered way, and think through where the design or implementation falls short.

Some interviewers will also ask how you would design a system if asked to. I think it is best to rehearse this kind of question in advance too.

For candidates with little work experience, interviewers mainly look at depth and breadth through their questions. For those with 5 or more years of experience, they care more about whether you have a mature implementation process and methodology.

So a layered, process-oriented design will greatly improve the interviewer's impression of you.

Avoid rambling and an unclear main point.

(This was my own problem too: I had not prepared, so my answers were scattered, which makes the interviewer feel you have no core ideas or method of your own)

Personal values

The later technical rounds are generally conducted by team leads or department heads. They care a great deal about your career plan, work attitude, teamwork, and how you think about realizing your own value, and of course about your ability to deliver projects and the depth of your past project experience.

So it is best to think in advance about how to express yourself clearly and concisely, focusing on the key points.

A final message to fellow readers

These are some of my suggestions for fresh graduates and those who have only just started working.

(1) Keep up with the pace of technological change, especially in big data. When the technology changes, learn the new technology and read its source code. Even if you don't use it at work, learn it anyway, because it is the stepping stone to your next job.

As a veteran, I worked at my last company for 4 years. The company had no business scenarios for Spark or Flink, and I did not push myself to learn these new technologies.

As a result, I kept hitting a wall in interviews. Much of the reason is that these are the mainstream technologies other companies use, and if you don't know them you are eliminated.

(2) Do not build yourself a comfort zone. Don't get lazy after a long time at one company, and don't let yourself go to waste. Keep a clear head and an enterprising spirit, keep learning, and keep improving your technical skills, architecture design ability, project management ability, delivery ability, and so on.

Sum up the lessons and shortcomings from each project promptly, preferably in a journal, and through continued reflection develop your own working methodology.

(3) Have a plan for your career. Have your own ideas about what you want to do in the future. Once you are sure, improve yourself in that direction, and learn and practice more.

At present, not many companies in the big data field have a data-centered architecture. Apart from algorithm roles, most people play a multi-purpose role in the team: ETL today, data warehousing tomorrow, BI the day after, then exporting data, and perhaps also building platforms such as scheduling systems, reporting systems, tagging systems, and anti-cheating platforms.

Nobody gets to focus on just one area, but you must work out which area you want to pursue. Once the direction is set, study it in depth, read more source code, and practice more. When you work on a specific project, consolidate what you learn from it, and eventually form your own body of knowledge.

(4) Be responsible in your work, and don't draw boundaries for yourself based on OKRs. If you have the ability and the spare time, do more. It is also one of the best ways to earn others' recognition.

The Internet industry is a small circle. If people recognize your work, it may be easier to move to a big company or a better job.

Ao Bing's ramblings

Let me add a few words at the end. The reader who submitted this article is an Internet industry veteran, and two months of intensive interviews taught him a lot. There is one sentence I like very much: don't build yourself a comfort zone. You need a sense of crisis, you really do.

When we write code, it really shouldn't be just to make a living, and we shouldn't think of it as a job only for the young. We can treat it as a lifelong career. If you move into product or architect roles after 30, you still need accumulated coding experience. It is not something you can switch to just because you want to.

A lifelong career is worth the time you spend learning. That's enough chicken soup for today.

By the way, do you remember the graduate student who was interviewed by phone? He got into Alibaba. Congratulations to him and to this veteran reader. I really didn't expect that my articles could help you this much, and I will keep writing.

I'm Ao Bing, a tool man who has survived on the Internet.

The best relationship is one of mutual achievement. Your "triple" support is the greatest motivation for Bingbing's writing. See you in the next issue!


Origin blog.51cto.com/14689292/2545807