In the era of big data: eight ways data can easily mislead

If you work in sales or marketing today and cannot analyze data or let the data speak, you are behind the times. Many business leaders open with "Show me the data. How can I make a decision without data?" Data analysis clearly occupies a very important place in business management today, and data analyst is counted among the ten most promising careers of the next decade.

Consider a case of bluffing with data. During the Spanish-American War, the US Navy's mortality rate was 9 per 1,000, while the mortality rate of New York residents was 16 per 1,000. Navy recruiters later used these figures to argue that joining the military was safer. Do you think this conclusion is correct? Of course not. The two figures do not match: the sailors were young and able-bodied, while the residents' figure includes the sick and the elderly, whose mortality is comparatively high. A fair comparison would set the Navy against New York residents of the same age.

In fact, 9‰ and 16‰ simply cannot be compared with each other.
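To make the age-matching point concrete, here is a minimal Python sketch. The age bands, population counts and death counts are hypothetical, chosen only so that the overall civilian rate comes out at 16 per 1,000; only the 9 and 16 per 1,000 figures come from the text above.

```python
# Hypothetical civilian population by age band: (population, deaths).
# These counts are invented for illustration. Only the overall rate of
# 16 per 1,000 and the Navy's 9 per 1,000 come from the article.
CIVILIANS = {
    "under 18": (300_000, 4_200),
    "18-35": (400_000, 2_000),
    "36-60": (200_000, 4_000),
    "over 60": (100_000, 5_800),
}
NAVY_RATE = 9.0  # deaths per 1,000, as quoted in the article


def rate_per_thousand(population, deaths):
    """Deaths per 1,000 people."""
    return 1000 * deaths / population


def overall_rate(groups):
    """Crude rate across all age bands (the misleading headline figure)."""
    total_population = sum(pop for pop, _ in groups.values())
    total_deaths = sum(deaths for _, deaths in groups.values())
    return rate_per_thousand(total_population, total_deaths)


def matched_rate(groups, band):
    """Rate for a single age band (the fair comparison group)."""
    return rate_per_thousand(*groups[band])


print("All residents:", overall_rate(CIVILIANS))
print("Residents aged 18-35:", matched_rate(CIVILIANS, "18-35"))
print("Navy:", NAVY_RATE)
```

Under these assumed numbers the crude civilian rate is higher than the Navy's, but the rate for civilians of military age is lower, so the recruiters' conclusion reverses once like is compared with like.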

Business executives loathe "fake" data. The reason is self-evident: false data wastes resources, leads to poor decisions, causes missed opportunities, and so on. Below is a brief summary of a few kinds of "questionable data", to help you develop a sharp eye as quickly as possible. Note that "questionable data" is not necessarily "false" data: sometimes the data is true but the conclusion drawn from it is false.

Data is commonly used to mislead in the following ways:

First, freely creating "fake" data to fool customers or consumers

Please forgive me for using the verb "create".

This situation can be seen everywhere. For some people and organizations, rigor about data is simply empty talk: they make up whatever data they want, which has earned them the nickname of "fabrication" committee. When you meet such data, ask a few whys and ask for the data source. Remember: "no data (source), no truth." For example, newspaper circulation is always one of the world's hardest puzzles. I do not know the answer, but I do know this:

1. The circulation that a media outlet announces for itself is really its highest issue on record; outlets generally just drop the word "highest".

2. To create a highest-circulation figure, some newspapers send copies straight from the printing plant to the waste station. This is shameless fraud, yet it persists despite repeated bans.

Let's check whether the number in this sentence is wrong: salesman Xiaoqiang has 24 customers, and in April his ratio of distinct purchasing customers was 78% (note: ratio of distinct purchasing customers = number of distinct customers with orders / total number of customers). The answer is that it is wrong, because this calculation can never produce 78%: with 24 customers, 18 purchasers give 75% and 19 give about 79%.
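This kind of sanity check is easy to automate. The sketch below is illustrative only (the function name and the half-point rounding tolerance are assumptions, not part of the original article): it lists every whole number of purchasing customers that could produce a reported percentage.

```python
def matching_counts(reported_percent, total_customers, tolerance=0.5):
    """Return every customer count k whose ratio k / total_customers
    lands within `tolerance` percentage points of the reported figure.

    An empty list means no whole number of customers can produce the
    reported percentage, so the figure deserves a second look.
    """
    return [
        k
        for k in range(total_customers + 1)
        if abs(100 * k / total_customers - reported_percent) <= tolerance
    ]


print(matching_counts(78, 24))  # []   -> 78% is impossible with 24 customers
print(matching_counts(75, 24))  # [18] -> 75% would be 18 of 24
```

The tolerance of half a percentage point assumes the reported figure was rounded to a whole percent; a looser reporting convention would need a wider tolerance.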

Second, the problem of value orientation

This technique is both well concealed and deceptive. What is value orientation? It means deciding on a conclusion in advance, then selecting the group of people most favorable to that conclusion for the market research or study, and finally presenting the result as a universal law or conclusion. Take the average wage: if I want it to be high, I survey office buildings; if I want it to be low, I go to the labor market. The method is a cheap trick, yet plenty of people are enthusiastic about it.

The most extreme users of this method are certain market research companies and government agencies. For example, one region announced that it would bring house prices down by a stated amount within six months. Six months later the figures showed that it had, yet residents felt no downward trend in prices. Why? It was a numbers game: the sample six months earlier was the average price of city homes, while the sample six months later averaged in suburban homes as well.
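A small Python sketch shows how a change in sample composition alone can manufacture a price drop. All prices here are hypothetical and chosen only for illustration; the article gives no actual figures.

```python
from statistics import mean

# Hypothetical prices per square metre, tagged by area.
# The "before" sample covers city homes only; the "after" sample
# quietly adds suburban homes, as in the example above.
BEFORE = [("city", 20_000), ("city", 22_000), ("city", 18_000)]
AFTER = [
    ("city", 20_500), ("city", 22_500), ("city", 18_500),
    ("suburb", 8_000), ("suburb", 9_000), ("suburb", 7_000),
]


def average_price(sample, area=None):
    """Mean price of the sample, optionally restricted to one area."""
    return mean(price for tag, price in sample if area is None or tag == area)


def percent_change(old, new):
    return 100 * (new - old) / old


headline = percent_change(average_price(BEFORE), average_price(AFTER))
like_for_like = percent_change(
    average_price(BEFORE, "city"), average_price(AFTER, "city")
)

print(f"Headline change: {headline:+.2f}%")
print(f"City-only change: {like_for_like:+.2f}%")
```

With these assumed numbers the headline average falls sharply even though every city home became slightly more expensive, which is why residents would feel no drop at all.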

Most market research companies are enthusiasts of value orientation. Many business owners ask a market research company to run a sample survey that fits their conclusion, then use the data for advertising and public relations, deceiving consumers. Some companies' survey data is true (the sample is large enough and the respondents were not selected to fit a direction), but the conclusion is still false, because the conclusion itself can be slanted. For example (the figures are assumed purely to illustrate the problem, so do not take them seriously), a toothpaste advertisement claims: "Using this brand of toothpaste reduces tooth decay by 23%," citing market research. The figure is certainly tempting, because you assume that the opposite of "reduced" is merely "not reduced". But the full picture behind it might look like this: tooth decay decreased for 23% of users, 40% showed no change, and 37% actually had more tooth decay (although this is unlikely).

Look at this picture and you will understand.

Third, Tian Ji's horse racing

You have surely heard the story of Tian Ji's horse racing, and using the same trick to mislead is quite common. Here is an example. At the end of 2010 a well-known B2C website ran a "universal berserk" promotion. After the event, someone wrote on a microblog: judging by transaction data, turnover during the 24-hour promotion far exceeded the combined average daily sales of the offline stores of Gome, Suning and Brilliance in 2008-09. The sentence itself is not wrong, but the two sides are not comparable: it sets one's own peak daily sales during a big promotion against other companies' ordinary daily averages, and such a comparison is meaningless. It is like Liu Xiang winning a championship at the Paralympic Games: would that count? They are simply not in the same class.

Let's look at another set of data. For the week of December 20 to December 26, 2010, the films "If You Are the One 2" and "Let the Bullets Fly" took 240 million and 210 million yuan at the box office respectively (note: "If You Are the One 2" was released on December 22, "Let the Bullets Fly" on December 16). Can we conclude from these two figures that "If You Are the One 2" far outperformed "Let the Bullets Fly"? No, because the two figures are not comparable: December 20-26 was the first week of release for "If You Are the One 2" but the second week for "Let the Bullets Fly", and blockbusters normally earn the most in their first week. Compare their first weeks instead. "Let the Bullets Fly" took a total of 290 million yuan in the 4 days of its first week, an average of about 70 million a day. "If You Are the One 2" took 240 million yuan in its first 5 days, an average of about 50 million a day. The box office of "Let the Bullets Fly" was in fact much higher.
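The like-for-like comparison is simple arithmetic, sketched here in Python. The day counts are an assumption read from the release dates given above: 4 opening days for "Let the Bullets Fly" (December 16-19) and 5 for "If You Are the One 2" (December 22-26).

```python
MILLION = 1_000_000

# Opening-period gross (yuan) and the number of days it covers.
# The day counts are inferred from the release dates in the article.
OPENING = {
    "If You Are the One 2": {"gross": 240 * MILLION, "days": 5},
    "Let the Bullets Fly": {"gross": 290 * MILLION, "days": 4},
}


def daily_average(gross, days):
    """Average box office per day over the opening period."""
    return gross / days


averages = {
    title: daily_average(info["gross"], info["days"])
    for title, info in OPENING.items()
}

for title, value in averages.items():
    print(f"{title}: {value / MILLION:.1f} million yuan per day")

best = max(averages, key=averages.get)
print("Stronger opening:", best)
```

Normalizing both films to a per-day figure over their own opening periods removes the first-week versus second-week mismatch that made the original comparison misleading.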

Tian Ji's horse racing is really about choosing the data to fit the conclusion. We need to watch the consistency of data at all times. This is an area where mistakes come easily, and a comparison that looks perfectly reasonable may in fact be quite unreasonable.

Fourth, systematic error in data analysis

Errors in data analysis are sometimes caused by people and sometimes by the system itself. For example, suppose the personnel department of a company wants to find out what staff think of the new general manager. There are five options: like very much, like, no feeling, dislike, dislike very much. The vote is anonymous. After the ballots are collected, the results are: 25% like very much, 40% like, 20% no feeling, 10% dislike, 5% dislike very much. Because the vote was anonymous, you may think the data has no problem, right (assuming nobody was flattering the boss)?

My answer is: not necessarily. It is quite likely that many employees did not vote at all. Some may have been busy, unaware of the survey, or short of time. Others abstained on purpose: these are likely to be the people who would have voted "dislike" but did not want to express their true thoughts. Think of abstentions at the UN General Assembly and you get the idea. Also, if the five options were listed in the reverse order (dislike very much, dislike, no feeling, like, like very much) and the same people voted again, the result might well be different.
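One way to see how much non-response matters is to compute best-case and worst-case bounds. The sketch below is illustrative: the staff size (100) and the number of ballots returned (60) are hypothetical, since the article does not give them; only the 25% + 40% favorable share among voters comes from the example above.

```python
def favorable_share_bounds(favorable_votes, votes_cast, staff_total):
    """Bounds on the true favorable share among all staff.

    Lower bound: every non-voter would have answered unfavorably.
    Upper bound: every non-voter would have answered favorably.
    """
    non_voters = staff_total - votes_cast
    low = favorable_votes / staff_total
    high = (favorable_votes + non_voters) / staff_total
    return low, high


# Hypothetical figures: 100 staff, 60 ballots returned.
# 65% of the 60 voters chose "like" or "like very much" -> 39 votes.
STAFF_TOTAL = 100
VOTES_CAST = 60
FAVORABLE_VOTES = 39

low, high = favorable_share_bounds(FAVORABLE_VOTES, VOTES_CAST, STAFF_TOTAL)
print(f"Reported favorable share: {FAVORABLE_VOTES / VOTES_CAST:.0%}")
print(f"Possible true share: {low:.0%} to {high:.0%}")
```

Under these assumptions the reported 65% could correspond to a true figure well below half of all staff, which is why the response rate should always be reported alongside the result.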

Origin blog.csdn.net/yuyuy0145/article/details/92847528