003 Big Data 4V Features

Today's topic: big data.

Let's walk through the past and present of big data together:

One: the definition of big data

The term “big data” was coined in the United States in the 1980s, but it was not until September 2008, when Nature published the special issue “Big Data: Science in the Petabyte Era”, that the term began to spread widely.

The editor found the following definitions of big data from the Internet:

1: Wikipedia:

Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them.

In other words: data sets so large and complex that traditional data-processing software cannot handle them.

2: IBM:

Big Data is being generated at all times. Every digital process and social media exchange produces it. Systems, sensors and mobile devices transmit it. Much of this data is coming to us in an unstructured form, making it difficult to put into structured tables with rows and columns.

This definition mainly emphasizes that big data is generated all the time, through everyday social media, wearable devices, sensors, and so on. It also emphasizes its unstructured character, which makes it difficult to store in a structured database.

3: The world-renowned consulting company McKinsey defines it like this:

Big data refers to a collection of data whose content cannot be collected, stored, managed, and analyzed with traditional database software tools within a certain period of time.

These definitions overlap, and their intersection is the general definition of big data: data that is large in volume and complex in structure, and cannot be processed by traditional means.

Next, let's look at the characteristics of big data, which will make the concept easier to understand.

Two: the characteristics of big data

There are several formulations of the characteristics of big data:

1: The concept gained momentum in the early 2000s, when industry analyst Doug Laney articulated the now-mainstream definition of big data as the three Vs.

This 3V formulation was the first mainstream version:

(1) Volume (large data volume). Organizations collect data from a variety of sources, including business transactions, social media and information from sensor or machine-to-machine data. In the past, storing it would've been a problem – but new technologies (such as Hadoop) have eased the burden.

There are many sources of data: commercial transaction terminals, social media, and sensors. Familiar examples include supermarket cash registers, POS machines, RFID handheld terminals, social media such as QQ, WeChat and Weibo, and motion sensors such as fitness bracelets. All of these devices are sources of massive data. In the past this data simply did not exist; now that it does, traditional data-processing software has not kept up with the demand, and new tools are necessary, such as the Hadoop mentioned above. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant, is designed to be deployed on low-cost hardware, and provides high-throughput access to application data, making it well suited to applications with large data sets.
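The HDFS idea described above, splitting data into fixed-size blocks and replicating each block across several cheap machines, can be sketched in a few lines of Python. This is a toy illustration only, not the real HDFS API; the block size, replication factor, and node names are all made-up assumptions.

```python
# Toy sketch of the HDFS storage idea (NOT the real HDFS API):
# cut a byte stream into fixed-size blocks, then replicate each
# block across several hypothetical nodes for fault tolerance.

BLOCK_SIZE = 4    # HDFS defaults to 128 MB; tiny here for demonstration
REPLICATION = 3   # HDFS's default replication factor

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Cut the byte stream into fixed-size blocks, like an HDFS client."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, nodes, replication: int = REPLICATION):
    """Assign each block to `replication` distinct nodes (round-robin)."""
    placement = {}
    for idx in range(len(blocks)):
        placement[idx] = [nodes[(idx + r) % len(nodes)] for r in range(replication)]
    return placement

blocks = split_into_blocks(b"hello big data!")
placement = place_replicas(blocks, ["node1", "node2", "node3", "node4"])
```

If any single node is lost, every block it held still has two other copies, which is the fault-tolerance property the HDFS design aims for.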

(2) Velocity (high processing speed). Data streams in at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time.

High processing speed mainly emphasizes the timeliness of data, and in particular of streaming data: when data is generated in one second, its impact must be analyzed in the next.
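The streaming idea above can be illustrated with a toy sliding-window counter in Python: events older than the window are discarded as new ones arrive, so the count always reflects roughly the last second of data. The class name and timestamps are hypothetical, for illustration only.

```python
# Toy sketch of near-real-time stream processing: count only the
# events that fall inside a sliding one-second window.
from collections import deque

class SlidingWindowCounter:
    def __init__(self, window_seconds: float = 1.0):
        self.window = window_seconds
        self.events = deque()  # timestamps of events still inside the window

    def add(self, timestamp: float):
        self.events.append(timestamp)
        # Drop events older than the window so the count stays timely.
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()

    def count(self) -> int:
        return len(self.events)

counter = SlidingWindowCounter(window_seconds=1.0)
for t in [0.1, 0.2, 0.9, 1.05, 1.15]:
    counter.add(t)
# After t=1.15, the event at t=0.1 has aged out of the window.
```

Real streaming systems (Flink, Spark Streaming, and so on) implement far more sophisticated windowing, but the core trade-off is the same: results must be produced while the data is still fresh.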

(3) Variety (many data types). Data comes in all types of formats – from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data and financial transactions.

Variety mainly emphasizes the classification of data into structured, unstructured, and a third category, semi-structured. On the difference between structured and unstructured data, I consulted my teachers during graduate school, and the gist is as follows. Structured data is row data: it is stored in a database and can be logically expressed with a two-dimensional table structure (think of an Excel sheet, where a value can be located by its row and column coordinates). Unstructured data includes office documents of all formats, plain text, pictures, XML, HTML, various reports, images, audio/video information, and so on.

2: As big data developed further, additional characteristics were added:

(1) Value (low value density). This can be understood as follows: the cameras at a traffic intersection record essentially around the clock, all year long, so the amount of data generated is enormous. Yet what is really useful to the authorities may only be the moments of accidents and red-light violations, so the value density of the data is very low.
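The value-density point can be made concrete with some back-of-the-envelope arithmetic in Python; every number here is a made-up assumption.

```python
# Back-of-the-envelope value density for a hypothetical traffic camera
# recording around the clock for one year. All numbers are assumptions.
frames_per_second = 25
seconds_per_year = 365 * 24 * 3600          # 31,536,000 seconds
total_frames = frames_per_second * seconds_per_year

useful_frames = 2_000   # hypothetical: frames around accidents/violations

value_density = useful_frames / total_frames
```

Under these assumptions, fewer than one frame in a hundred thousand carries any value, which is exactly the "low value density" the Value characteristic describes.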

(2) Veracity (authenticity). This mainly covers the data's credibility, authenticity, provenance and reputation, and validity.

The 5V features described in points 1 and 2 above basically cover all the commonly cited characteristics of big data, though of course scholars may propose more later.

3: The 5V formulation is probably most popular in academia; I first came across it while reading papers. It was only today, browsing Wikipedia, that I discovered yet another formulation:

Factory work and cyber-physical systems may use a "6C" system:

(1) Connection (sensors and networks)

(2) Cloud (computing and data on demand)

(3) Cyber (model and memory)

(4) Content/context (meaning and correlation)

(5) Community (sharing and collaboration)

(6) Customization (personalization and value)

This formulation may be better suited to industrial and cyber-physical systems.

I hope the content shared above helps everyone understand the concept of big data.

Origin blog.csdn.net/lyw5200/article/details/109407469