Notes on data analysis jobs from my eight-month internship

 

Author: a recent master's graduate in statistics from a "double-non" university (neither 985 nor 211), currently working at Didi. Eight months of data analysis internship experience; interviewed more than 10 data analysis intern candidates; eventually became a product manager.

 

Two themes

 

The main goal of this article is to help students who are just starting out understand data analysis positions at Internet companies. It covers two themes:

 

1. What are the main tasks of an entry-level data analyst at an Internet company, and how can they improve on the job?

 

2. How do Internet companies interview for data analysis positions?

 

The first question should help you decide whether to enter this line of work. The second should help students who do want to enter it interview more efficiently, which is good for both the interviewer and the candidate.

 

Based on the author's experience at several Internet companies, data analysis work currently falls into three general directions:

 

1. Business-oriented data analysis

2. Data analysis leaning toward data warehouse development

3. Data analysis leaning toward algorithms

 

Because the author's experience is limited, this article focuses on business-oriented data analysis. Since most readers are probably still looking for a data-related job, it covers the interview first and the day-to-day work second.

 

01 How Internet companies interview for data analysis

 

First, SQL is a must (some small companies accept candidates with weaker SQL and train them after they join). Because the main job of an entry-level data analyst is writing SQL statements, the interview will generally ask you to write SQL on the spot. If you cannot, the interview is at risk.

 

See the end of the article: "Appendix I, a recommended SQL learning path."

 

Once your SQL is confirmed to be up to standard, the conversation moves to your resume: mainly internship experience, then project experience, and finally competition experience (I suggest you all go and do internships). The order exists because I want to see you at your best. An internship usually comes with a manager guiding you, so if you thought enough about the work, you can explain the whole context clearly. Project experience usually has an advisor behind it, but the quality depends on the advisor's ability and attention. Competition experience is mostly students exploring on their own, and is the most error-prone.

 

See the end of the article: "Appendix II, sample questions about a project."

 

My experience as a candidate

 

Below are some of the technical topics I was asked about.

  • Left join, right join, inner join, and full join in SQL

  • Star schema and snowflake schema

  • How to handle missing data when modeling

  • What data skew is and how to deal with it

  • The advantages and disadvantages of k-means clustering

  • How to determine the number of clusters in k-means
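
To make the first topic concrete, here is a minimal sketch of the four join types using Python's built-in sqlite3 module. The tables `drivers` and `orders` and all their rows are hypothetical, invented only for this illustration. Right and full joins are emulated with LEFT JOIN and UNION, on the assumption that the local SQLite build may be too old to support them natively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drivers (driver_id INTEGER, name TEXT);
    CREATE TABLE orders  (order_id INTEGER, driver_id INTEGER);
    -- Hypothetical data: driver 2 has no orders, order 11 has no known driver.
    INSERT INTO drivers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders  VALUES (10, 1), (11, 3);
""")

def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Inner join: only rows that match on both sides.
inner_rows = run("""
    SELECT d.driver_id, o.order_id
    FROM drivers d JOIN orders o ON d.driver_id = o.driver_id
    ORDER BY d.driver_id
""")

# Left join: every driver, with NULL where no order matches.
left_rows = run("""
    SELECT d.driver_id, o.order_id
    FROM drivers d LEFT JOIN orders o ON d.driver_id = o.driver_id
    ORDER BY d.driver_id
""")

# Right join, emulated by swapping the tables of a left join:
# every order, with NULL where no driver matches.
right_rows = run("""
    SELECT d.driver_id, o.order_id
    FROM orders o LEFT JOIN drivers d ON d.driver_id = o.driver_id
    ORDER BY o.order_id
""")

# Full join, emulated as the union of the left and right joins.
full_rows = run("""
    SELECT d.driver_id, o.order_id
    FROM drivers d LEFT JOIN orders o ON d.driver_id = o.driver_id
    UNION
    SELECT d.driver_id, o.order_id
    FROM orders o LEFT JOIN drivers d ON d.driver_id = o.driver_id
""")

print(inner_rows, left_rows, right_rows, full_rows, sep="\n")
```

The inner join keeps only the matched pair, each outer join adds the unmatched rows from one side, and the full join keeps the unmatched rows from both.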

 

Interview write-ups like this are already covered more and more thoroughly on Nowcoder, so I will not repeat them here. Instead I will talk about the core factors that got me the offer:

 

The first is a solid SQL foundation. At the time I had worked through all the SQL problems on Nowcoder and LeetCode, so I could quickly write the optimal solution. The second is thinking on your feet. When the discussion of a project exposed something I had not done well, I could quickly supply the real-world background and explain why it had been done that way at the time (recovering the situation).

 

My experience as an interviewer

 

This is the core of the article, and it is long.

 

First, we need the right view of an interview: the interviewer is not your enemy. In an ideal interview, the interviewer uses the conversation to gently lead the candidate to show that they really can do the job. Below I lay out some of my thinking as an interviewer, to help you see things from the other side and interview more efficiently.

 

(A) One person's honey is another's arsenic. Many students are excellent, with resumes full of awards and dazzling internships, but most of it is irrelevant to the job. I am hiring for a position, not finding a job for an excellent person. So please put only job-relevant experience on your resume; honors, scholarships, and the like can be mentioned in passing or left out entirely. What you need to show is not how excellent you are but how well you match this job. If you do not know what the job requires, that is a more basic problem, and I will not go into it here.

 

(B) Know what you know, and admit what you do not. When you meet something in an interview that you do not know well, tell the interviewer: this is not my area, I do not know it well, let's talk about something else. We accept that your energy is limited and you cannot know everything, but answering with half-knowledge seriously damages your image. I will go into detail on this later.

 

(C) Talk as equals. This is more of a trick and has nothing to do with professional level, but it reflects the candidate's state of mind. For example, when the interviewer asks a question, you can "counter" a little: "That is a good question, it gets at the core issue." Or you can "push back" a little: "What you say makes sense, but the reason we did not do it that way was xx." On one hand, this shows that you are not nervous at all and even want to talk more. On the other, you step out of the role of candidate and into the role of my "colleague", and we talk as equals.

 

In practice, my interviews have always been "open-book exams": when scheduling, I tell the candidate that the interview will cover SQL programming problems and the projects on their resume, so please prepare. Even so, 90% of students get stuck on the SQL problems, and their past internships or projects are mediocre, so they do not pass. In more than four months of hiring (2019.10 to 2020.1) I did not find a suitable student.

 

Below is my interview process, along with some problems I have noticed in candidates:

 

At the start of the interview I give a few SQL problems (Appendix III lists the problems I use in every interview). I begin with one of medium difficulty. If the candidate can write it, I follow with a harder one. If not, I sigh inwardly and give an easy one to ease the awkwardness.

 

After the SQL problems, I talk with the candidate about the projects or internships on their resume. Here I have found a problem: data analysis students easily come to treat themselves as a "tool." For example, one student told me he had sent different coupons to different user groups. So I asked him: how were the groups defined, on what basis were users assigned to them, and how was each group matched with a coupon?

 

He replied: the business side decided.

 

From the candidate's perspective, the question has been answered, and he did everything he was asked to do perfectly well. From the interviewer's perspective, the answer fails, because I think this student has not found a way to grow. He sees himself as someone else's tool, and a person who only does what they are told can hardly make progress in life or work.

 

How do you avoid becoming a "tool"? A good way is to cultivate an owner mentality: I am not here just to give you a hand, I am here to help you get the whole thing done.

 

Here is a project description template. Try fitting your own experience into it to see how much of an owner mentality you had on past projects:

 

Against a background of many driver complaints, we found the problem of orders being dispatched across the river (the symptom). The cause was that the system dispatched orders by straight-line distance (the underlying cause). To solve it, we (even better if it was you alone) proposed dispatching by road distance (ideally compared against several alternative methods). The method resolved 30% of the dispatches across hills and rivers, and complaints fell by 50%. I was responsible for designing the model that decides whether a dispatch crosses the river. Through this project I gained a deeper understanding of dispatching and of drivers. If I did it again, I would talk with drivers early on, because that would have let the model be built faster.

 

This is not hard to understand: it comes down to thinking more and reviewing your work. Without a manager to guide you, a student's limited perspective makes these problems hard to notice, which is why I recommend that everyone do more internships (take your beatings from the real world early, ha ha).

 

When discussing projects, we care most about critical thinking.

 

An example of rigorous thinking: when verifying the effect of a drug, how many groups do you need? Answer: three. One group takes the drug, one takes nothing, and one takes a placebo that looks the same but has no effect, because "taking the drug" can be split into two things: the act of taking, and the drug itself. A practical question follows from this: is giving a user a 5-yuan coupon the same as cutting the price of the product by 5 yuan? If not, which works better? And why would a merchant not simply send discount coupons all the time?

 

Here I recommend a book, "Ask Questions". It helps you communicate more efficiently with people at work and trains your logical thinking, so that anyone can become a skilled "nitpicker."

 

Back to "know what you know, and admit what you do not": there is no need to pad your resume with things you do not understand. The hardest-hit areas I see on data analysis resumes fall into the following categories:

 

(A) Mathematical modeling contests. These often lack guidance from a professional instructor and run on a very tight schedule, so even prize-winning work is often not of high quality. For example, I ask: why did you use this method here? The answer is that most papers on xx use it, and the student can hardly compare it with the pros and cons of other possible methods. In fact, I took part in mathematical modeling myself and found that I could not answer such questions either, because time was so tight that I really had not thought it through. My suggested strategy: you can list modeling contest experience, but do not bring it up yourself, and treat it mainly as an experience. If it does come up, say plainly that contest time was limited and some parts are not especially rigorous.

 

(B) Machine learning and deep learning algorithms. Data students have all learned a little of this and may have run an online demo, but most fall short of what companies require. For example, if you mention neural networks, you should know about vanishing gradients (also called gradient diffusion), and along the way understand how activation functions evolved. If you have used a CNN, you should know why CNNs do better than DNNs on images. Now when I see this kind of content on a resume, I cannot help bringing it up, and the conversation tends to fall flat. My advice: if you have not learned it well, it is best not to write it. AI was very hot a while ago, so more and more interviewers know this material. Not knowing it is no fault at all, but being found to have studied it without really learning it suggests you are not serious about your work.

 

02 What entry-level data analysts at Internet companies mainly do

 

As the cost of storing and collecting data has fallen, companies often collect large amounts of user data, including every click and view. As the number of users grows and operating time lengthens, the volume of stored data keeps increasing (more than 20 million orders per day). At this scale, traditional Excel can hardly operate on the data, so the data has to be processed by writing SQL statements.

 

As a result, an entry-level data analyst spends most of the working day writing SQL. Once you have pulled enough data and taken in enough information, you can raise questions about the current state of the business and propose solutions. One view I have heard is that data analysts are data people who perceive business development and support decisions from the data side. My own view is that data analysts should be business people who understand data (is that why I jumped straight to the business side?).

 

So the way for business-oriented data analysts to improve is to understand the business better. I am glad that in my first internship my manager always pushed me to understand the business. He said: before you do a request (a data pull), always ask why they want this number, and look at how the business side connects the data to the business. Cutting one mistaken request improves you more than completing ten correct ones. Space is limited, so I will not expand on this for now. I hope to update it later, when longer work experience has given me new insights.

 

Appendix I

My SQL learning path started with an online course:

https://www.bilibili.com/video/av9252479?p=26

 

After learning the basic SQL statements, I started working through practice problems on Nowcoder:

https://www.nowcoder.com/ta/sql

 

Before jewels article:

I did some SQL questions.

 

And the LeetCode problems:

https://leetcode-cn.com/problemset/database/

 

If you study every day, you can get through all of it in about two weeks.

 

Appendix II 

Resume content: speech recognition; extracted voiceprint features so that a voice can be matched to its speaker, with 94% accuracy; used the SEGAN algorithm to reduce noise in the original audio and replaced the RNN with a CNN, raising accuracy to 98%.

 

Questions:

  • What is the principle by which SEGAN reduces noise?

  • How much difference does reducing the noise make to the results, compared with not reducing it?

  • Why was the RNN less effective than the CNN?

  • Why use a CNN here, rather than considering an RCNN or another neural network?

  • What mainly causes the remaining 2% of inaccurate judgments, and is there a way to optimize them?

  • What user value or commercial value does this 4% improvement bring?

 

Appendix III

Easy question

 

 

Each row of the order table gives the order id, the id of the driver who completed the order, the order amount, and the order completion time. Write a SQL query: if on a given day a driver completed five or more orders and the total amount of those orders is greater than $50, output that day and the driver's id.

 

Output column names: date, driver_id

Knowledge point: filtering with a subquery or with HAVING.
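
One possible answer to the easy question, sketched with Python's built-in sqlite3 module so it can be run as-is. The original table was not preserved, so the table name `orders`, the column names `order_id`, `driver_id`, `amount`, and `finish_time`, and all the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER,
        driver_id   INTEGER,
        amount      REAL,
        finish_time TEXT   -- assumed format: 'YYYY-MM-DD HH:MM:SS'
    )
""")

rows = []

def add_orders(driver_id, day, count, amount):
    """Append `count` hypothetical orders for one driver on one day."""
    for i in range(count):
        rows.append((len(rows) + 1, driver_id, amount, f"{day} {10 + i:02d}:00:00"))

add_orders(1, "2020-01-01", 5, 12.0)  # 5 orders, total 60: qualifies
add_orders(2, "2020-01-01", 5, 10.0)  # 5 orders, total 50: not greater than 50
add_orders(3, "2020-01-02", 4, 20.0)  # total 80 but only 4 orders
add_orders(1, "2020-01-02", 6, 9.0)   # 6 orders, total 54: qualifies
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)

# Group by day and driver, then filter the groups with HAVING.
qualifying = conn.execute("""
    SELECT date(finish_time) AS date, driver_id
    FROM orders
    GROUP BY date(finish_time), driver_id
    HAVING COUNT(*) >= 5 AND SUM(amount) > 50
    ORDER BY 1, 2
""").fetchall()

print(qualifying)
```

The same result can be obtained with a subquery that computes the count and sum per day and driver, followed by a WHERE clause in the outer query; HAVING simply keeps it to one level.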

 

Medium question

 

 

Each row records that a user was active on a given day. If a user was active on some day and was active again within the following 2 to 30 days, that user counts as a 30-day retained active user for that day. For example, in the table users a and b are both active on 2019/1/1. User a is active again on 2019/1/3, so a meets the 30-day retention condition for 2019/1/1. User b is not active in the following 2 to 30 days, so b does not. For every day, I want the number of active users and the number of 30-day retained active users.

 

[Table: correct output]

 

 

Knowledge point: writing retention with a self-join, and date arithmetic.
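
A hedged sketch of the self-join approach, again using Python's sqlite3 module. The table name `activity`, the columns `user_id` and `active_date`, and the sample rows (including the extra user c) are assumptions. The sketch also assumes that "the following 2 to 30 days" means a date difference of 2 to 30 days inclusive; if the active day itself counts as day 1, the bounds would be 1 and 29 instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (user_id TEXT, active_date TEXT)")
conn.executemany(
    "INSERT INTO activity VALUES (?, ?)",
    [
        ("a", "2019-01-01"),
        ("a", "2019-01-03"),  # a returns 2 days later: retained for 2019-01-01
        ("b", "2019-01-01"),  # b never returns
        ("c", "2019-01-03"),
        ("c", "2019-02-15"),  # c returns after 43 days: outside the window
    ],
)

# Self-join: `a` is the day being measured, `b` is any later activity by the
# same user that falls inside the retention window. The LEFT JOIN keeps users
# who never return, so they still count as active.
retention = conn.execute("""
    SELECT a.active_date               AS date,
           COUNT(DISTINCT a.user_id)   AS active_users,
           COUNT(DISTINCT b.user_id)   AS retained_users
    FROM activity a
    LEFT JOIN activity b
           ON a.user_id = b.user_id
          AND julianday(b.active_date) - julianday(a.active_date) BETWEEN 2 AND 30
    GROUP BY a.active_date
    ORDER BY a.active_date
""").fetchall()

for row in retention:
    print(row)
```

`COUNT(DISTINCT b.user_id)` ignores the NULLs produced by unmatched rows, which is what makes the retained count correct without a separate subquery.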

 

Hard question

 

 

Each row gives the time a driver started a trip (start_time) and the time that trip ended (end_time). For each driver, after one trip ends, how long on average is it before the next trip starts? If a driver has only one trip, do not include that driver.

 

[Table: correct output]

 

 

Knowledge point: using the window function row_number, and time arithmetic.
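
One way to approach the hard question with row_number, sketched with Python's sqlite3 module (window functions need SQLite 3.25 or later, which recent Python builds normally bundle). The table name `trips`, the column `driver_id`, the timestamp format, and the sample rows are assumptions; only `start_time` and `end_time` come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (driver_id INTEGER, start_time TEXT, end_time TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        # Driver 1: gaps of 10 and 30 minutes between consecutive trips.
        (1, "2020-01-01 09:00:00", "2020-01-01 09:30:00"),
        (1, "2020-01-01 09:40:00", "2020-01-01 10:00:00"),
        (1, "2020-01-01 10:30:00", "2020-01-01 11:00:00"),
        # Driver 2: a single trip, so no gap can be computed.
        (2, "2020-01-01 12:00:00", "2020-01-01 12:20:00"),
        # Driver 3: one gap of 5 minutes.
        (3, "2020-01-01 08:00:00", "2020-01-01 08:15:00"),
        (3, "2020-01-01 08:20:00", "2020-01-01 08:50:00"),
    ],
)

# Number each driver's trips in time order, then join trip n to trip n + 1.
# The inner join drops drivers with only one trip, as the question requires.
avg_gaps = conn.execute("""
    WITH ranked AS (
        SELECT driver_id, start_time, end_time,
               ROW_NUMBER() OVER (PARTITION BY driver_id ORDER BY start_time) AS rn
        FROM trips
    )
    SELECT cur.driver_id,
           AVG(CAST(strftime('%s', nxt.start_time) AS INTEGER)
             - CAST(strftime('%s', cur.end_time)   AS INTEGER)) / 60.0 AS avg_gap_minutes
    FROM ranked cur
    JOIN ranked nxt
      ON nxt.driver_id = cur.driver_id
     AND nxt.rn = cur.rn + 1
    GROUP BY cur.driver_id
    ORDER BY cur.driver_id
""").fetchall()

print(avg_gaps)
```

The `LAG` or `LEAD` window functions would avoid the self-join, but the row_number version matches the knowledge point named above and works the same way in most SQL dialects apart from the time arithmetic.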

 

Questions about window function: explain them TMD few popular data analysis interview questions.

 

I think the difficulty is acceptable, because I solved exactly these questions myself at the time and then got my internship offer.

 


Origin blog.csdn.net/sinat_26811377/article/details/104663771