The wisdom of big data processing that 80% of people don't know

  Living in an era when data runs around naked, ordinary people worry about how to protect their private data, unscrupulous people scheme about how to sell personal information, and thoughtful people think about how to make good use of big data. Big data processing can be divided into several steps; following them is how raw data is turned into wisdom.

  The smart way to understand big data is to walk through its processing flow, step by step:

  The first step is called data collection.

  First, there must be data. There are two ways to collect data:

  The first way is to go out and take the data; professionally this is called crawling, or web scraping. Search engines, for example, do exactly that: they download pages from across the Internet into their data centers so that you can search them. When you search, the result is a list. Why does this list live at the search engine company? Because it has copied all those pages down. But once you click a link, you leave the search engine and land on the original website. For example, if a piece of news is published on Sina and you find it through Baidu, the result page you see before clicking is served from Baidu's data center, while the page that opens after you click is served from Sina's data center.
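The crawling idea above can be sketched as a breadth-first walk over pages, copying each one into the crawler's own index. This is a minimal sketch: the `PAGES` dictionary and its URLs are hypothetical stand-ins for real HTTP fetches, which keeps the example self-contained and runnable.

```python
from collections import deque

# A tiny in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch each URL over HTTP instead.
PAGES = {
    "news.example.com/a": ("stock rally continues", ["news.example.com/b"]),
    "news.example.com/b": ("diapers on sale", ["news.example.com/a", "blog.example.com/c"]),
    "blog.example.com/c": ("running data and sleep", []),
}

def crawl(seed):
    """Breadth-first crawl from a seed URL, copying each page into a local index."""
    index, frontier, seen = {}, deque([seed]), {seed}
    while frontier:
        url = frontier.popleft()
        text, links = PAGES.get(url, ("", []))
        index[url] = text                 # the search engine keeps its own copy
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return index

index = crawl("news.example.com/a")
```

The `index` now holds a copy of every reachable page, which is exactly why the result list can be served from the search engine's own data center.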

  The second way is push: many terminal devices collect data and send it upstream on their own. For example, a Xiaomi Mi Band can upload your daily step counts, heart-rate data, and sleep data to the data center.
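The push model can be sketched as a device that sends readings to a collector endpoint whenever it has them, rather than waiting to be polled. The `Band` and `Collector` classes and the device ID below are hypothetical; a real device would POST the JSON payload over the network.

```python
import json

class Collector:
    """Stand-in for a data-center endpoint that receives pushed readings."""
    def __init__(self):
        self.store = []
    def receive(self, payload):
        # In reality this would be an HTTP handler; here we just parse and keep it.
        self.store.append(json.loads(payload))

class Band:
    """A wearable that pushes its readings instead of waiting to be polled."""
    def __init__(self, device_id, collector):
        self.device_id, self.collector = device_id, collector
    def push(self, metric, value):
        payload = json.dumps({"device": self.device_id, "metric": metric, "value": value})
        self.collector.receive(payload)

center = Collector()
band = Band("band-001", center)
band.push("steps", 8200)
band.push("heart_rate", 72)
```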

  The second step is the transmission of data.

  This is usually done through a message queue: the volume of data is huge, and it must be processed before it becomes useful, but no downstream system can absorb it all at once. So the data waits in a queue and is consumed at a pace the system can sustain.
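The queuing idea can be sketched with a bounded in-process queue and two threads: a producer that emits a burst of events, and a consumer that drains them one at a time. This is a minimal stand-in for a real message queue such as Kafka; the doubling step is a placeholder for real processing.

```python
import queue
import threading

buf = queue.Queue(maxsize=100)   # bounded buffer absorbs bursts of incoming data
results = []

def producer():
    # Events arrive in a burst, faster than downstream can handle.
    for i in range(10):
        buf.put({"event": i})
    buf.put(None)                # sentinel: no more data

def consumer():
    # Drains the queue at its own pace, one event at a time.
    while True:
        item = buf.get()
        if item is None:
            break
        results.append(item["event"] * 2)   # stand-in for real processing

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
```

Because the buffer is bounded, a producer that outruns the consumer simply blocks instead of overwhelming the system, which is the whole point of queuing.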

  The third step is the storage of data.

  Today data is money: whoever holds the data effectively holds the money. How else would a shopping site know what you want to buy? Because it has your historical transaction data. Such information is very valuable and cannot be handed to others, so it must be stored.
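Storing transaction history and querying it back can be sketched with SQLite from the Python standard library. The users, items, and prices are made up for illustration; a real site would use a large-scale data store, but the idea of keeping purchase history and querying it per user is the same.

```python
import sqlite3

# In-memory database stands in for a real data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user TEXT, item TEXT, price REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [("alice", "diapers", 9.9), ("alice", "beer", 4.5), ("bob", "beer", 4.5)],
)
conn.commit()

# The stored history is what lets a site know what a user may want next.
rows = conn.execute(
    "SELECT item, COUNT(*) FROM purchases WHERE user = ? GROUP BY item",
    ("alice",),
).fetchall()
```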

  The fourth step is data processing and analysis.

  The data stored above is raw data. Most raw data is disorganized and full of junk, so it must be cleaned and filtered to yield high-quality data. High-quality data can then be analyzed: classified, or examined for interrelationships, turning data into knowledge.
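The cleaning step can be sketched as filtering out records that are obviously malformed. The sample records and the two validity rules (a named user and a positive price) are illustrative assumptions; real pipelines apply many such rules.

```python
RAW = [
    {"user": "alice", "item": "beer", "price": 4.5},
    {"user": "", "item": "beer", "price": 4.5},        # missing user: junk
    {"user": "bob", "item": "diapers", "price": -1},   # impossible price: junk
    {"user": "bob", "item": "diapers", "price": 9.9},
]

def clean(records):
    """Keep only well-formed records: a named user and a positive price."""
    return [r for r in records if r["user"] and r["price"] > 0]

good = clean(RAW)
```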

  For example, in the oft-told story of beer and diapers at a Walmart supermarket, analysis of purchase data revealed that men who buy diapers often buy beer at the same time. That discovery is knowledge; applying it in practice, by placing the beer and diaper counters close together, is wisdom.
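The beer-and-diapers style of analysis boils down to counting which items co-occur in the same shopping basket. Here is a minimal sketch with made-up baskets; real association mining (e.g. the Apriori algorithm) adds support and confidence thresholds on top of exactly this kind of counting.

```python
from collections import Counter
from itertools import combinations

# One set of items per checkout; made-up data for illustration.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"bread", "milk"},
    {"diapers", "beer", "bread"},
]

# Count every unordered pair of items that appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
```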

  The fifth step is the retrieval and mining of data.

  Retrieval means search. As the Chinese saying goes, "for questions about the outside world, ask Google; for questions at home, ask Baidu." Either way, search engines load the analyzed data into their indexes, so that when people need information, they can search it out.
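The core data structure behind "putting the data into the search engine" is an inverted index: a map from each word to the documents that contain it. A minimal sketch over a hypothetical three-document corpus:

```python
from collections import defaultdict

# Tiny made-up corpus: document id -> text.
docs = {
    "d1": "beer and diapers sell together",
    "d2": "company stock rose sharply",
    "d3": "executive sold company stock",
}

# Inverted index: word -> set of documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

def search(query):
    """Return documents containing every word of the query."""
    sets = [index[w] for w in query.split()]
    return sorted(set.intersection(*sets)) if sets else []

hits = search("company stock")
```

Intersecting the per-word sets is why a multi-word query only returns documents that contain all of the words.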

  The other is mining. Merely searching no longer satisfies people's needs; the relationships hidden within the information must also be dug out. Take financial search: when you look up a company's stock, shouldn't the company's executives surface as well? Suppose you only search for the stock, see that it has risen nicely, and buy it, while an executive has just issued a statement that is very unfavorable for the stock, and it falls the next day. Wouldn't that hurt ordinary investors? That is why it matters to use algorithms to mine the relationships in the data and build a knowledge base.
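A knowledge base of mined relationships can be sketched as a graph: entities are nodes, and each extracted relation is an edge, so a query about a stock can also surface the executives connected to it a hop or two away. The entities and relations below are entirely fictional; real systems extract them from text with much more sophisticated algorithms.

```python
from collections import defaultdict

# Fictional extracted relations: pairs of entities mentioned together.
relations = [
    ("AcmeCorp", "stock"),
    ("AcmeCorp", "CEO J. Doe"),
    ("CEO J. Doe", "resignation statement"),
]

# Knowledge base as an undirected adjacency map: entity -> related entities.
kb = defaultdict(set)
for a, b in relations:
    kb[a].add(b)
    kb[b].add(a)

def related(entity, depth=2):
    """Entities reachable within `depth` hops of the query entity."""
    frontier, seen = {entity}, {entity}
    for _ in range(depth):
        frontier = {n for e in frontier for n in kb[e]} - seen
        seen |= frontier
    return sorted(seen - {entity})

links = related("stock")
```

With two hops, a search for the stock also brings back the company and its CEO, which is exactly the kind of connection a plain keyword search would miss.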
