Notes on the Big Data Era

Introduction

  • In 2009, Google predicted the spread of H1N1 flu from users' search records. The correlation with the official figures was as high as 97%. Like the CDC, Google could tell where the flu had spread, but it could do so in a very timely way.

  • In 2003, Oren Etzioni, a computer scientist at the University of Washington, started an artificial intelligence project that collected travel data and developed a system to predict whether ticket prices would rise or fall. Microsoft later acquired the system for $110 million.

    The modern era is one of data explosion.

    The core of big data is prediction, rather than machine learning.

    Three big-data shifts:

    1. In the era of big data, we can analyze more data, and sometimes can even handle all the data related to a particular phenomenon, rather than rely on random sampling.

    2. There is so much data to study that we no longer insist on exactness.

    3. The third shift follows from the first two: we no longer insist on finding causation, and look for correlation instead.

Not a random sample, but all data

Use all the data, rather than relying on just a small part of it.

John Graunt, a British haberdasher (a seller of sewing supplies), proposed a way to estimate London's population at the time of the plague; a variety of statistical methods were developed later.

The 1880 census was very time-consuming: it took 8 years to complete. The 1890 census was expected to take 13 years.

Xoom, a cross-border remittance company, raised an alarm over unusual transactions. Viewed separately, each transaction looked legitimate, but together they turned out to be fraud by a criminal group. The only way to discover the anomaly was to re-examine all the data and find the information that sample analysis had missed.
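The Xoom point can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the records, the field names, the expected shares, and the threshold are made up, and this is not Xoom's data or method. Each toy transaction passes a per-transaction check, yet counting across the whole dataset shows one card/region group appearing far more often than expected.

```python
from collections import Counter

# Toy, made-up records; field names and numbers are assumptions for illustration.
transactions = (
    [{"card": "A", "region": "east", "amount": 200}] * 900
    + [{"card": "B", "region": "east", "amount": 200}] * 40
    + [{"card": "B", "region": "west", "amount": 200}] * 60
)

# Hypothetical historical share of each (card, region) group.
expected_share = {("A", "east"): 0.93, ("B", "east"): 0.05, ("B", "west"): 0.02}


def is_individually_ok(txn, max_amount=1000):
    """Per-transaction check: looks at one record at a time."""
    return txn["amount"] <= max_amount


def unusual_groups(txns, expected, factor=2.0):
    """Flag groups whose share of ALL transactions exceeds factor x expected share."""
    counts = Counter((t["card"], t["region"]) for t in txns)
    total = len(txns)
    return sorted(
        group for group, n in counts.items()
        if n / total > factor * expected[group]
    )


print(all(is_individually_ok(t) for t in transactions))
print(unusual_groups(transactions, expected_share))
```

With this toy data every record passes the individual check, while the group ("B", "west") is flagged only because all records are counted together.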

Who would have imagined that, within a network of relationships, a person with many friends matters less than a person with many contacts outside their own circle? This suggests that, for a group or a society alike, diversity carries additional value.

Not exactness, but messiness

In order to understand the general trend, we are willing to give up some accuracy.

MIT and deflation-prediction software: price data was originally collected by hand, which was costly and produced lagging results. The project instead collected data from the Internet using big data techniques. Although the data was very messy, the project quickly detected a deflationary trend right after the collapse of Lehman Brothers in September 2008, whereas people relying on official data only became aware of the situation in November.

Messiness is not something to avoid at all costs, but a standard way of working.

Sample = whole population

Not causation, but correlation

A comparison test pitted sales driven by content written by critics against sales driven by computer-generated content, and the two results were very different. Products recommended from the data sold almost 100 times more. The computer may not know why a customer who likes Hemingway will also buy Fitzgerald. But that is not important; what matters is sales.

In a given region, the more people search for specific terms, the more people in that region have the flu.
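As a minimal sketch of what "correlation" means here, the snippet below computes a Pearson correlation coefficient between per-region search counts and per-region flu cases. The numbers are made up for illustration, and this is not Google's actual model; it only shows how a strong relationship between the two series can be measured without explaining why it exists.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-region figures (assumed values, one entry per region).
searches = [120, 300, 80, 450, 210]  # searches for flu-related terms
cases = [14, 33, 9, 47, 22]          # reported flu cases

r = pearson(searches, cases)
print(round(r, 3))
```

A value of r close to 1 means regions with more searches also tend to have more cases; it says nothing about which one causes the other.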

Walmart: please put the Pop-Tarts together with the hurricane supplies.

US discount retailer Target and pregnancy prediction: it analyzes browsing and purchase histories, and the correlations among related items, to predict whether a customer is pregnant.

UPS and vehicle-repair prediction.

Anything can be quantified

By collecting fixed-point observations from nautical logs on a large scale, a workable nautical chart was developed.

https://books.google.com/ngrams is Google's book data library, where you can look up how often a word appears from 1500 to 2008.

Books, moods, and more can all be turned into data.

Inexhaustible data innovation

The three pillars: data, technology, and thinking

The worry of letting data dictate everything

Information management that balances responsibility and freedom

Reproduced from: https://www.jianshu.com/p/0e37146a3de6

Origin blog.csdn.net/weixin_34415923/article/details/91273394