Big data is the correlation of non-correlated data

The point of big data is to find some degree of correlation within vast amounts of data, and then infer the probability of future behavior. From this perspective, much of what many people call "big data" is merely the optimized processing of already-correlated data, which is not the same thing at all.

I am no expert on big data, but since big data is an open proposition, I will share my views at my own level of understanding, focusing on the application of data in finance.

Part One: What Is Big Data

First, big data is not new

As early as 1980, the famous futurist Alvin Toffler, in his book "The Third Wave," enthusiastically eulogized big data as the "third wave." Yet it was not until around 2009 that "big data" became a buzzword in the Internet and IT industries, and only in 2013, when Internet finance became unprecedentedly popular, was "big data" truly pushed to its climax.

If you explore the association between the popularity of Internet finance and big data, one crucial factor stands behind it: Internet finance has never been able to answer a core proposition, risk control. The Internet has not found a risk-control method superior to that of traditional finance, and this left Internet finance open to constant questioning during its rise.

Big data, arriving at just the right moment, became an important spiritual pillar for supporters of Internet finance. Big data is harder to pin down and leaves more room for the imagination, so for Internet finance it carries more explanatory power.

So what is big data? Let me offer my view.

Back in 1993, John Byrne, a senior writer at the American magazine "BusinessWeek," published a best-selling book, "The Whiz Kids" (known in Chinese as "Ten Blue-Blooded Elites"), which has had a tremendous influence on today's Chinese Internet-finance theorists and big-data enthusiasts. The book tells the story of ten young men, elites from Harvard who cherished great ideals, every one of them a genius. During World War II they became heroes of US Air Force logistics, applying quantitative management models to the war with great effect, saving the Allies billions of dollars and helping them to victory.

This shows that data analysis and data-based management have long been used in warfare, national affairs, and commercial operations. Why, then, do we now need to put the word "big" in front of "data"?

At the most recent Hongru forum, listening to a lecture by Professor He Fan of the Chinese Academy of Social Sciences, I heard him mention that all social phenomena are in essence statistical phenomena; they lack the clear causal relationships of laboratory experiments. The law of supply and demand in economics, for example, is a statistical law. Yet the weakest of all human cognitive abilities is precisely statistical thinking.

The Nobel laureate psychologist Daniel Kahneman once said that the human brain has two systems of thinking. One is instinctive: our language ability, our capacity for imitation, our "sixth sense," and so on are innate. As the MIT linguist Noam Chomsky pointed out, why can a child learn to speak by age three, yet not learn calculus until the teenage years? Language is acquired so quickly, he argued, because the brain comes with a pre-installed system for it at birth.

Likewise, the ability to sense the slightest sign of trouble needs no special teaching; children already have it. This is the first human system, the one that lets us react quickly. The second system is what we use for mathematical reasoning, especially when statistical analysis is required. The second system runs very slowly because its computational load is heavy, so we often err by deciding too quickly, and our statistical judgment in particular goes wrong.

So when we talk about big data, we think of it as a trendy concept, but as a methodology big data has a long history. Why, then, has everyone suddenly started talking about it? Mainly because there is more and more data. On one hand, with the IT revolution, storage and computing power keep growing, and we may be heading into an era of effectively unlimited storage and instant computation. On the other hand, more and more things can be digitized: in the past only numbers counted as data, but once accounting systems appeared, economic activity could begin to be captured as accounting data.

Now, with tablets and e-readers, text and images are digitized as well. As more and more things can be turned into data, and the capacity to compute and process that data keeps growing, the combination suddenly becomes very interesting. Once statistics meets large-scale data, many of our original ways of thinking get overturned.

Second, big data is the correlation of non-correlated data

An important reason big data now seems so marvelous is the widespread belief that a large enough data set can effectively deduce trends and future directions, very accurately inferring the probability that certain events will occur; in the financial industry, this translates into the belief that risk can be well controlled. But is that really true?

Isaac Asimov's science-fiction classic "Foundation," first published in 1942, tells of the Galactic Empire at its most prosperous, when its most talented mathematician, Hari Seldon, claimed he could foresee historical change and potential future crises, as long as the amount of data was large enough. The population at the time was already in the trillions, enough for him to predict future changes accurately.

So Seldon quietly built a "Foundation" and a "Second Foundation." Whenever a so-called "Seldon crisis" arrived, his prerecorded message would be played back; in it he would explain the crisis he had predicted, with probabilities as high as 99 percent, and tell people what to do.

In the book, Professor Seldon is first and foremost a mathematician, but also a psychologist. From the behavioral psychology of a society of trillions, he infers the evolution of society by sample analysis. He confines all kinds of accidental factors within pre-set variable ranges; once a mutation appears it is stamped out, so that society returns to its designated course.

He also lays down some basic definitions, implicit assumptions about that society: "The human population taken as the research subject must be large enough for statistical methods to apply; and a necessary assumption is that the population must not know it is itself the sample under psychological analysis, which is the only way to guarantee that all reactions are truly random."

As we can see, the point of big data is to find some degree of correlation within vast amounts of data, and then infer the probability of future behavior. From this perspective, much of what people call "big data" is really just the optimization of already-correlated data, which is not the same thing at all.

What does it mean to correlate non-correlated data? Take public health. Infectious diseases are hard to study because onset is fast and patients die quickly; it is not like cancer, whose pathology can be studied over time. So how was the mode of transmission finally discovered? Take cholera: there were in fact two maps, one showing the distribution of cholera patients, the other showing the wells of the City of London. Eventually a pattern linking the two maps emerged, suggesting the disease might be associated with drinking water.

In that era of less-developed science, people found a correlation between two non-correlated data sets, the distribution of wells and the distribution of cholera (the classic example being John Snow's 1854 map of London). They did not know why, and could not explain it, yet it enabled better prevention. And this correlation between non-correlated data was discovered by accident, not found by deliberate search.
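To make the idea concrete, here is a minimal sketch in Python, entirely my own illustration rather than anything from the original essay. The well and case coordinates are fabricated; it simply shows how two data sets that know nothing about each other (a map of cases and a map of wells) can be laid on top of one another until a pattern appears.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: several wells, one of them contaminated, with cases
# clustering around it. Coordinates are arbitrary map units.
wells = rng.uniform(0, 10, size=(6, 2))
bad_well = wells[0]
cases = bad_well + rng.normal(0, 1.0, size=(200, 2))  # cases cluster near well 0

# Overlay the two "maps": for each well, count cases within a 2-unit radius.
dists = np.linalg.norm(cases[:, None, :] - wells[None, :, :], axis=2)
cases_near_well = (dists < 2.0).sum(axis=0)

for i, count in enumerate(cases_near_well):
    print(f"well {i}: {count} cases within radius 2")
# Well 0 stands out: the two unrelated data sets line up, suggesting an
# association with drinking water, without explaining why.
```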

The reason is that, in the past, there was no technology for collecting data, so the volume of data was too small to compare non-correlated data sets against one another. At the same time, data-processing power was limited: even when data was collected, it could not be processed effectively, so the means for discovering correlations among different data sets were extremely scarce.

Therefore, whether it was called data mining or data analysis, past practice mostly meant collating and analyzing data whose correlation was known in advance, with that prior correlation as the basis of all the logic. This includes the practices in "The Whiz Kids" and the so-called big-data risk control of Ali Finance: the data itself already supports deduction, and probabilities are computed on top of it. If such data is what we define as "big," then big data is no new concept. When we speak of big data, it ought to mean correlation analysis across floods of data in different dimensions; analyzing data that is already correlated should, at most, be called data optimization.

In my view, big data rests on two developments. First, the Internet has vastly increased the capacity to generate data, making it realistically possible to compare and process different data sets. Second, computing power has grown enormously: the original single-center model of computation has given way to cloud computing characterized by distributed computation, massively increasing processing capacity. Only now has it finally become possible to bring statistics and large-scale data together and discover many interesting things.
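As a rough sketch of that distributed-computing point (again my own illustration, with invented data), here is how a statistic over a very large data set can be computed map-reduce style: each worker reduces its chunk to a few partial sums, and the partial sums are merged into a Pearson correlation at the end.

```python
import numpy as np
from multiprocessing import Pool

def partial_sums(chunk):
    """Map step: reduce one chunk of (x, y) values to six running sums."""
    x, y = chunk
    return np.array([len(x), x.sum(), y.sum(),
                     (x * x).sum(), (y * y).sum(), (x * y).sum()])

def pearson_from_sums(s):
    """Reduce step: combine merged partial sums into a Pearson correlation."""
    n, sx, sy, sxx, syy, sxy = s
    cov = sxy - sx * sy / n
    return cov / np.sqrt((sxx - sx * sx / n) * (syy - sy * sy / n))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=1_000_000)
    y = 0.3 * x + rng.normal(size=1_000_000)  # weakly related by construction

    chunks = list(zip(np.array_split(x, 8), np.array_split(y, 8)))
    with Pool(4) as pool:
        total = sum(pool.map(partial_sums, chunks))  # merge the partials
    print(f"r = {pearson_from_sums(total):.3f}")     # about 0.29
```

The same partial-sums trick is what lets clusters of machines compute statistics over data no single machine could hold, which is the capability shift the paragraph above describes.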

On one hand, many events are found to be correlated without our knowing why. On the other, many things previously thought unrelated turn out to be internally linked. These findings eventually become the basis for our decisions, greatly improving our management efficiency and our ability to handle affairs, while also greatly subverting our original ways of thinking.

I once joked: what was our earliest form of big-data thinking? The numerology school derived from the "Book of Changes" is definitely big-data thinking. A hexagram is cast for you, and it says a certain person will die in the evening; and die he does. The two things have no relationship whatsoever, yet statistically the hit rate is often remarkably high. What sustains the correlation between the two? Sometimes there is a causal relationship, often there is none, and of course there may also be a causal relationship that is unknown and cannot be verified.

In my book "Payment Revolution" I set big data aside and talked about small data instead. Why? Because big data tries to collect as much non-correlated data as possible and then compute its correlations, which inevitably brings significant cost and uncertain results. The cost is large because collecting and collating the data is expensive. The results are uncertain because, before you begin, it is very hard to judge whether all that non-correlated data can be correlated at all; you need constant trial-and-error testing, which is not only costly, but may also reveal, after exhaustive search, that the data really is unrelated. So inputs and outputs are often badly asymmetric.
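A toy experiment, once more my own illustration, shows why this exhaustive trial-and-error is not just costly but treacherous: screen enough genuinely unrelated series against each other, and some pairs will look impressively correlated purely by chance.

```python
import numpy as np

rng = np.random.default_rng(42)

# 200 series of pure noise, 50 observations each: unrelated by construction.
data = rng.normal(size=(200, 50))
corr = np.corrcoef(data)            # all pairwise correlations between series

# Keep the upper triangle only (each pair once, no self-correlations).
iu = np.triu_indices_from(corr, k=1)
pairs = corr[iu]
strong = np.abs(pairs) > 0.4

print(f"{len(pairs)} pairs tested, {strong.sum()} look 'strongly' correlated")
# With roughly 19,900 comparisons, dozens of noise pairs typically exceed
# |r| > 0.4: exhaustive search over non-correlated data manufactures
# false discoveries, which then cost even more to test and discard.
```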

The last time I met a vice president of HP, he brought up the concept of big data and said that, by their understanding, no more than 50 companies worldwide have the ability to carry out true big-data applications; my own guess is about the same. Even among the many companies that can use big data, the returns tend not to cover the huge costs. At a recent summit I heard many P2P companies say that their business lets them accumulate a great deal of data, which they will then process with big-data techniques to improve the quality of their risk control. What do we call this? A classic case of the ignorant knowing no fear.
