A brief introduction to big data

Big Data Definition

"Big Data" refers to data sets whose main features are large volume, high speed, and low value density. Because the data is huge in scale, comes from decentralized sources, and arrives in multiple formats, it calls for new architectures, technologies, algorithms, and analysis methods for acquisition, storage, and correlation analysis, with the goal of extracting the valuable information hidden in it.

4V characteristics of Big Data

Large volume (Volume): the amount of data is large, and so are the collection, storage, and computation workloads. The starting unit of measurement for big data is at least P (1,024 T), E (about 1 million T), or Z (about 1 billion T). A set of figures that Facebook disclosed at a meeting at its headquarters gives an initial impression of how much data Facebook has to handle each day: 2.5 billion pieces of content shared; 2.7 billion "likes"; 300 million photos uploaded; 500+ TB of newly generated data; 105 TB of data scanned by Hive every half hour; and 100+ PB of disk capacity in a single HDFS (distributed file system) cluster. As of 2012, data volumes had jumped from the TB level (1 TB = 1024 GB) to the PB (1 PB = 1024 TB), EB (1 EB = 1024 PB), and even ZB (1 ZB = 1024 EB) level.
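The unit arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the helper name convert and the UNITS list are made up for illustration, and extrapolating the half-hourly Hive figure to a full day assumes a constant scan rate, which the source does not state.

```python
# Binary storage units used in the text: each step up is a factor of 1024.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def convert(value, from_unit, to_unit):
    """Convert a quantity between binary storage units (hypothetical helper)."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1024 ** steps

# How many TB are in 1 PB, 1 EB, and 1 ZB?
for unit in ("PB", "EB", "ZB"):
    print(f"1 {unit} = {convert(1, unit, 'TB'):,} TB")
# 1 PB = 1,024 TB
# 1 EB = 1,048,576 TB
# 1 ZB = 1,073,741,824 TB

# Figure from the text: 105 TB scanned by Hive every half hour.
# Assumption for illustration only: the rate is constant over 48 half hours.
hive_tb_per_day = 105 * 48
print(f"Hive scan per day: {hive_tb_per_day} TB "
      f"= {convert(hive_tb_per_day, 'TB', 'PB'):.2f} PB")
# Hive scan per day: 5040 TB = 4.92 PB
```

The output shows why "1 million T" and "1 billion T" are only round approximations of E and Z when each step is a factor of 1024.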

Many types (Variety): the types and sources of data are diverse. Data may be structured, semi-structured, or unstructured, and shows up concretely as network logs, audio, video, pictures, location information, and so on. Handling so many types places higher demands on data processing capabilities.
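To make the three categories concrete, here is a small sketch using only the Python standard library. All sample records below are invented for illustration and do not come from any real system.

```python
import csv
import io
import json

# Structured: fixed columns, e.g. a row exported from a relational table.
structured = "user_id,age,city\n1001,29,Shanghai\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing, and fields may vary between records
# (here, one JSON log line with a nested optional field).
semi_structured = '{"ts": "2012-08-22T10:00:00", "event": "like", "extra": {"device": "mobile"}}'
event = json.loads(semi_structured)

# Unstructured: free text (or audio/video/image bytes) with no schema at all.
unstructured = "Great photo, see you at the meetup next week!"
words = unstructured.split()

print(rows[0]["city"])           # field access by column name
print(event["extra"]["device"])  # field access by key path
print(len(words))                # no fields; only generic processing is possible
```

Each kind needs a different parsing path, which is one reason variety raises the bar for processing pipelines.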

High speed (Velocity): data grows quickly, must be processed quickly, and is subject to strict timeliness requirements. For example, a search engine needs news from a few minutes ago to be available to user queries, and a personalized recommendation algorithm needs to complete its recommendations in as close to real time as possible. This is a significant feature that distinguishes big data from traditional data mining.

Low value density (Value): the value density of the data is relatively low; finding value in it is like panning for gold in the sand, yet what is found is precious. With the widespread use of the Internet and the Internet of Things, information is sensed everywhere and arrives in a flood, but its value density is low. How to combine business logic with powerful machine algorithms to mine the value of the data is the problem that most needs solving in the era of big data.

 

Origin www.cnblogs.com/lfz0/p/11945674.html