How should we think in the era of big data?

As rational beings, we are not content with merely venting our feelings; we want to use data to grasp more facts and to think rationally.

  In this day and age, there is no shortage of information. What information consumes is obvious: it consumes the attention of its recipients. The richer the information, the scarcer the attention…

  Today it is not information that is lacking but our ability to process it, and our limited attention span is a major bottleneck in organizational activity. Herbert Simon of Carnegie Mellon University pointed out that human rationality is limited, so every decision is the product of bounded rationality. He went on to propose that if the information stored in computers (that is, data) can be used to assist decision-making, the scope of human rationality will expand and the quality of decisions can improve.

  In the era of big data, one of the problems facing human society is how to make better use of data to assist decision-making.

  

  For small data, the most basic and most important requirement is to reduce errors and ensure quality. Because the amount of information collected is relatively small, we must ensure that the data recorded is as accurate as possible.

  Whether they are determining the position of a celestial object or observing the size of an object under a microscope, scientists work to improve their measuring tools so that the results are more accurate. When sampling, the requirements for accuracy are even more stringent, because the limited amount of information gathered means that subtle errors can be magnified and can distort the overall result.

  In many emerging situations, however, tolerating inaccuracy has become an advantage rather than a shortcoming. Because the standard for error tolerance has been relaxed, people can gather more data and use it to do new things. The point is not simply that a lot of data beats a little data, but that a lot of data produces better results.

  Google Translate works better not because it has a better algorithm, but because it takes in all kinds of data. The trillion-word corpus that Google released in 2006 was compiled from the odds and ends of Internet content.

  Using this as a "training set", Google can estimate how likely English words are to appear together. Google's corpus was a qualitative breakthrough: it used a huge database to make a leap forward in natural language processing.
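
  The idea of inferring how likely words are to be paired together can be illustrated with a very small sketch. It only counts adjacent word pairs in a toy corpus and turns the counts into conditional probabilities; it is not Google's actual method. The function name `bigram_probabilities` and the sample sentences are invented for illustration.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Estimate P(second word | first word) from counts of adjacent word pairs."""
    pair_counts = Counter()
    first_counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        # zip(words, words[1:]) yields each pair of neighbouring words.
        for first, second in zip(words, words[1:]):
            pair_counts[(first, second)] += 1
            first_counts[first] += 1
    return {
        pair: count / first_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


# Hypothetical toy corpus; a real training set would be vastly larger and messier.
toy_corpus = [
    "big data helps us",
    "big data is messy",
    "big deal",
    "data is everywhere",
]

probs = bigram_probabilities(toy_corpus)
print(probs[("big", "data")])
```

  With more text, the counts become more reliable even if individual sentences are sloppy, which is the sense in which volume can compensate for quality.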

  At the same time, we have to contend with all kinds of messiness. Messiness, simply put, means that as the amount of data increases, so does the number of errors. If the amount of data collected grows by a factor of 1,000, it becomes more likely that some of it is wrong, and the error rate may keep rising as the volume grows.

  Integrating different types of information from different sources adds to the messiness as well, because the sources are rarely fully consistent with one another. These errors could be avoided if we put enough effort into it, but in many cases accepting them benefits us more than working to eliminate them.

  If the traditional mindset of exactness is applied to the data-driven, networked 21st century, important information will be missed. The obsession with accuracy is a product of an era of information scarcity. When we have large amounts of new data, accuracy matters less, and we can still grasp the overall trend without relying on it.

  Big data not only frees us from the expectation of accuracy, but also prevents us from achieving it. Although this runs against our intuition at first, accepting the imprecision and imperfection of data allows us to make better predictions and to understand the world better.

  Compared with the era that relied on small data and accuracy, big data brings us closer to the truth because it emphasizes the completeness and messiness of data. The appeal of "partial" and "exact" is understandable. But when our vision is limited to the data we can analyze and pin down precisely, our overall understanding of the world can be mistaken and skewed.

  Big data gives us not only a reason to try to collect all the data, but also the ability to see things from many different angles. Confined to narrow small data, we can take pride in our pursuit of precision, but even if we can analyze the finest details, we still miss the whole picture.

  It is like an Impressionist painting: up close, every stroke looks chaotic, but take a step back and you see a great work, because from a distance the overall idea of the painting comes into view.

  This is one change in the way of thinking in the era of big data: no longer obsessing over precision, but accepting messiness. Another shift is a greater emphasis on correlation rather than on the pursuit of causality based on hypotheses.

  Correlations are also useful in the world of small data, but they shine in the context of big data. By applying correlations, we can analyze things more easily, more conveniently, and more clearly than ever before.

  The core of correlation is quantifying the mathematical relationship between two data values. A strong correlation means that when one data value increases, the other is likely to increase as well. Take Google Flu Trends, for example: the more people in a given geographic area search for particular terms on Google, the more people in that area have the flu.

  Conversely, a weak correlation means that when one data value increases, the other data value hardly changes. For example, we can look for a correlation between individual shoe size and happiness, but find that they have little to do with each other.
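
  The difference between a strong and a weak correlation can be made concrete with the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand. All numbers are invented purely for illustration (they are not real search, flu, shoe-size or happiness data), and the function name `pearson` is our own.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Made-up figures: weekly searches for a flu-related term vs. reported cases.
searches = [10, 20, 30, 40, 50]
flu_cases = [12, 19, 33, 41, 48]

# Made-up figures: shoe size vs. a self-reported happiness score.
shoe_sizes = [38, 40, 42, 44, 46]
happiness = [7, 5, 8, 5, 7]

strong = pearson(searches, flu_cases)
weak = pearson(shoe_sizes, happiness)
print(f"searches vs. flu cases: {strong:.2f}")
print(f"shoe size vs. happiness: {weak:.2f}")
```

  A value near 1 for the first pair and near 0 for the second mirrors the two examples in the text. Neither number says anything about why the values move together.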

  Correlations help us analyze a phenomenon by identifying useful related quantities, not by revealing its inner workings. Of course, even a strong correlation does not explain every case: two things may appear to behave similarly purely by coincidence. There is no certainty here, only probability.

  In other words, not every book Amazon recommends is one the customer wants to buy. However, if the correlation is strong, the probability that a recommendation succeeds is high. Our understanding of the world no longer needs to rest on hypotheses about how phenomena arise and what their internal mechanisms are.

  So we do not need to hypothesize about which search terms indicate when and where the flu is spreading; we do not need to know how airlines price their tickets; we do not need to know the cooking preferences of Walmart customers. Instead, we can run correlation analysis on the data to find out which search terms best indicate the spread of the flu, whether the price of a plane ticket is about to soar, and which foods people staying at home during a hurricane want most.

  We replace error-prone, hypothesis-based approaches with data-driven correlation analysis of big data, which is more accurate, faster, and less susceptible to bias. Prediction based on correlation analysis is the core of big data. Such predictions happen so often that we tend to overlook how innovative they are, and their applications will only continue to grow.

  Finding correlations in social contexts is just one approach that big data analysis takes. An equally useful approach is to address everyday needs by finding connections among new kinds of data. For example, a method called predictive analytics is widely used in business to predict the occurrence of events.

  Take the failure of a car part as an example. A part does not fail in an instant; the problem develops gradually. By collecting all the data, we can catch the early signs that something is about to fail, such as a humming or overheating engine.

  The system compares these readings with normal behavior to work out what is wrong. By spotting anomalies early, it can alert us to replace parts or fix problems before a failure occurs. By finding a correlation and monitoring it, we can predict the future.
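
  A minimal sketch of comparing readings with normal behavior: estimate the mean and spread of a baseline recorded while the part was healthy, then flag later readings that stray too far from it. The temperatures, the three-standard-deviation threshold, and the function name `find_anomalies` are all assumptions made for illustration; a real predictive-maintenance system would use far more signals and a model tuned to the equipment.

```python
import statistics


def find_anomalies(baseline, readings, threshold=3.0):
    """Return (index, value) for readings far from the baseline mean.

    A reading counts as anomalous when it lies more than `threshold`
    sample standard deviations away from the mean of `baseline`.
    """
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)  # requires at least two baseline values
    return [
        (index, value)
        for index, value in enumerate(readings)
        if abs(value - mean) > threshold * stdev
    ]


# Hypothetical engine temperatures (degrees Celsius) under normal operation.
normal_temps = [90, 91, 89, 90, 92, 88, 90, 91]

# Hypothetical later readings that drift upward as a part wears out.
new_temps = [90, 91, 93, 97, 104]

alerts = find_anomalies(normal_temps, new_temps)
for index, value in alerts:
    print(f"reading {index}: {value} is outside the normal range")
```

  The check knows nothing about why the engine runs hot. It only notices that the readings no longer look like the normal ones, which is enough to raise an early warning.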

  In the era of small data, neither correlation analysis nor causal analysis was easy. Both were expensive, and both had to start from a hypothesis, which was then tested by experiment and either confirmed or disproved. Because both started from hypotheses, these analyses were susceptible to bias and prone to error.

  At the same time, the data needed for correlation analysis was hard to come by and expensive to collect. Today, with so much data available, these obstacles have largely disappeared.

  Once we have identified things that may be related, we can carry out causal analysis on that basis and, if a causal relationship exists, go on to find the cause.

  This convenient mechanism lowers the cost of causal analysis carried out through rigorous experiments. Correlations can also point us to important variables, which we can then use in experiments to verify causality. Correlations are useful not only because they give us new perspectives, but also because the perspectives they give are clear.

  In the age of small data, we made hypotheses about how the world works and then tested them by collecting and analyzing data. In the near future, we will explore the world under the guidance of big data, no longer limited by such hypotheses. Our research will start with the data, and through the data we will uncover connections that had gone unnoticed.

  In short, instead of obsessing only over the accuracy, correctness, and rigor of the data, we should also tolerate some imprecision. Data is never completely right or completely wrong, and when its volume grows by orders of magnitude, this messiness stops being a problem.

  In fact, the messiness may even be beneficial, because it can reveal details we would not have thought of. And because correlations in data can be found faster and more cheaply, and are often more useful, we do not have to struggle to establish causality.

  Of course, in some cases, we still have to do causal research and experimentation. However, in many everyday situations, it is enough for us to know the "what" without having to figure out the "why".
