Sharing: The Collection and Processing of Big Data

Big data forms a closed-loop chain: data is collected, processed, and finally put into commercial use as solutions and products that benefit the general public. In other words, "from the masses, to the masses." Closing this chain is what completes the industrialization of big data.

The concept of big data suddenly took off over the past three years precisely because, during this period, our ability to collect and process big data underwent a sea change. This carried human society into the era of widespread big data applications:

1. Collecting big data

Two major technological developments have made collecting data much easier:

First, sensors of every kind have become cheap, and their deployment and coverage have greatly increased. The most familiar example is the surveillance camera: in less than 10 years, cameras have spread to every corner of the city, and wherever you look, you see one.

Second, the development of Internet technology. A computer is in fact also a sensor, but the data it records is less standardized and comes in more diverse formats. As Internet technology has developed, connecting a terminal to the Internet has become cheaper and coverage across the population keeps improving, so we now have a sensor network that covers most of the population. For example, at Taobao, where I work, users make visits and purchases on the order of a hundred million every day. In the traditional industrial age, we could never know what a person did in a supermarket, and it was difficult to analyze what each individual shopper bought (even with cash register data). In the Internet era, everyone carries sensors, so virtually every action can be recorded and analyzed to optimize your future experience. (Of course, this can also be used for evil by bad actors. Just as gunpowder can blast through mountains to build dams and can also be used to kill, technology itself is amoral.)

2. Processing big data

Inexpensive parallel computing solutions have emerged, such as the MapReduce framework, the MPI framework, and GPU computing, and new high-performance parallel computing approaches keep appearing. Massive data storage and computing power that was once available only to laboratories and national projects can now be obtained cheaply, scalably, and with little maintenance effort by leasing it (cloud computing).
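To make the MapReduce idea concrete, here is a minimal single-process sketch of the map, shuffle, and reduce pattern, counting purchases per user in a toy shopping log. It does not use Hadoop or any real framework API; the "user_id,item" log format, the sample records, and all function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical log lines in a made-up "user_id,item" format (not a real Taobao schema).
LOG_LINES = [
    "u1,book",
    "u2,phone",
    "u1,pen",
    "u3,book",
    "u2,case",
    "u1,book",
]


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: parse one log line and emit a (user_id, 1) pair for the purchase."""
    user_id, _item = line.split(",", 1)
    yield user_id, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key, as a framework would between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce sequentially in one process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(LOG_LINES))  # {'u1': 3, 'u2': 2, 'u3': 1}
```

In a real framework, many map and reduce tasks run in parallel across a cluster and the framework performs the shuffle, so the user writes only the small map and reduce functions. That division of labor is what makes this style of processing cheap to reuse.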

As @Lee Yang mentioned, the giant particle collider represents the laboratory form of big data. In that era, you first had to build a massive sensor cluster before you could do big data analysis at all, and then you needed highly computer-literate people to write, on very expensive computer clusters, a series of analysis programs that very few people could understand (and that were almost never reused). Such experiments and analyses also served only a fairly limited range of purposes and applications. (Of course, I mean no disrespect to the discovery of the Higgs particle, which was a great achievement.) Big data in the industrial era, by contrast, means modular assembly lines and high reusability.

The sensor clusters already exist: large volumes of user logs can be acquired, processed, and analyzed by Taobao, Baidu, Tencent, Douban, Zhihu, or almost any company with a small investment of a few million (or equivalent resources).

Ready-made big data frameworks are available, and the data can be analyzed with packaged tools and scripting languages that are not difficult to learn (such as SAS, R, HiveSQL, and Hadoop). At the same time, there are business analysts familiar with both the business and data analysis methods, as well as product managers and developers who can quickly apply the results to business development projects.

Together, these developments form the closed-loop chain of big data: collection, processing, and finally commercialization as solutions and products that benefit the general public. "From the masses, to the masses." Closing this chain is what completes the industrialization of big data.


Source: blog.csdn.net/sdddddddddddg/article/details/91357787