Answer the two basic questions of data analysis


                    The more discouraged you are, the more you need to learn to save yourself.

Introduction: 2020 was destined to be an extraordinary year. At the start of the year, the COVID-19 epidemic broke out and the situation suddenly became tense. Wuhan and China became the focus of the world's attention. Perhaps for professional reasons, I was interested in several data analysis topics:

Forecast the growth trend of the number of infected people

Assess the lethality of the novel coronavirus

Estimate the ultimate number of infections in the West

At the time, these three topics drew many participants and a great deal of attention. Aware of my limited scholarship and analytical skill, I did not dare to join in. In truth, I also felt "embarrassed" by the whole affair. Why? The people making the most noise were all technocrats who completely ignored two core factors: ① the characteristics of the virus, and ② human effort. How could anyone dare to predict the trend when the virus's characteristics (especially the length of its incubation period and its infectiousness) were still unclear? Nonsense! Ebola, Zika, and COVID-19 each met completely different national epidemic-prevention efforts, so how could anyone judge their lethality simply by their mortality rates? Nonsense! At the time, many people even claimed that Western countries would ultimately hold infections to 10,000 or 20,000. A joke!

Ironically, once the virus's incubation period became known, one of my judgments and worries was, unfortunately or fortunately, borne out: "Our country's epidemic will soon be brought under control, but the outlook abroad is uncertain. Preventing and controlling imported cases from overseas will be a difficult problem! China is likely to become the world's backstop in the fight against the epidemic." Apart from a single numerical data point (the length of the incubation period), this judgment rested on "intuition". So why did it turn out to be correct?

This is the basic problem we most often face in data analysis.

01 Who is a data analyst?

I believe a considerable proportion of data analysts see themselves as technical staff, and an even larger proportion of leaders and colleagues regard them that way too. It is true that data analysts have broad responsibilities, even broader knowledge requirements, and need real technical skills; but if we truly regard ourselves as "technical personnel", we will inevitably end up in an awkward position: the supposedly high-flying data analyst becomes a junior who just runs reports.

I believe technology is a tool for achieving goals, while the business and solving business problems are the foundation, the anchor, and the starting point of data analysis. Knowing the business, being familiar with it, and understanding it deeply are therefore things data analysts must do, and they matter most. A data analyst's understanding of the business should often be more thorough, comprehensive, and systematic than that of front-line business staff. Only then can we know:

What data do we need to collect and capture?

What data can substitute for data that cannot be collected?

What each data item means, so that we can choose the most appropriate metric definition.

What the logical relationships among the elements of the analysis are. Is there causation? Correlation? Or no relationship at all?

Is the conclusion valuable? Can it be put into practice?

……

If you are not familiar with the business or lack a thorough understanding of it, so-called data analysis easily becomes a matter of assumptions and empty theorizing, and it cannot solve practical problems. It is then only natural that the company, its leaders, and the business teams will not recognize its value.

This is a question of self-perception of data analysts.

02 What are the limitations of data analysis?

What is data? Data is a description of objective reality. Objective reality is inherently complex, and the relationships among the things in it are even more complicated. Fully understanding any part of objective reality is a huge challenge for anyone, and data analysts have to face this challenge head-on.

1) What factors affect the object of analysis? In practice, it is easy to overlook some key factors, such as the "length of the virus incubation period" and the "degree of human effort" mentioned above, or the impact of weather on human behavior. Even if we pick out key factors from many candidates through big-data analysis, it is hard to guarantee that we have captured all of the key influencing factors.

2) What weight does each influencing factor carry? Keep in mind that the weights change: the influence of each factor shifts as the environment changes over time.

3) Many key factors are hard to quantify or to obtain data for. For example, how would we quantify the "degree of human effort" mentioned above? Even if we measured it simply by financial investment, accurate data would be hard to obtain, let alone data on the investment made in response to the Ebola virus.

4) Is our technical capability sufficient to analyze all these factors as an organic whole? Data analysts who see themselves as technicians are best placed to answer this. What we learn belongs largely to Western methodology, whose drawbacks are that it is not exhaustive and that it is fragmented. We are often trapped into using a priori knowledge (valid only in a limited environment) to verify what may not be verifiable, and using what we (self-assuredly) think we know to verify the unknown. Western medicine, for example, treats each part of the body separately; this seems clear-cut, but attending to one part often means neglecting another. When the business environment changes, everything becomes invalid.

Once we understand the two key issues above, we naturally come to understand:

The deceptiveness of data

The timeliness of data

The limitations of data

We also understand why the judgments of excellent business staff are often more accurate than conclusions drawn from complex data analysis, and why those business staff pay little attention to us. Excellent business professionals, like experienced doctors of traditional Chinese medicine, know how to view and analyze problems systematically and dialectically. Sometimes their process looks like metaphysics, yet it has its own logic, and it works.

Why, then, was my judgment of how the epidemic would develop accurate? My logic was actually very simple:

Because the novel coronavirus has a long incubation period, asymptomatic carriers will spread it widely during that period,

unless people are effectively isolated.

Our country has the conditions for effective isolation in terms of its system, culture, and national character.

However, the prevalence of "liberalism" and individualism in the West, together with poorly organized grassroots institutions, makes such isolation almost impossible.

No other data is involved here, only deduction akin to Yin-Yang and Five Elements reasoning. Yet even without other data, we cannot say this is not data analysis; at the very least, it is still analysis. The key word in "data analysis" is "analysis", not "data".


Source: blog.csdn.net/weixin_49880348/article/details/112007699