Big Data Review (1)

Foreword

I have been in the big data industry for more than two years now and have accumulated a lot of thoughts, so let me say a few words. As in every other computer-related field, learning big data means searching for material online, watching videos, practicing and exploring on your own, and turning to bloggers and experienced community members when you run into problems.

Most of the time it is a lonely road. Worse, the material online is a mixed bag: you not only have to separate the true from the false, but also reconcile sources against one another. Although direct clashes between different viewpoints and interpretations are rare, differing understandings are a real obstacle in the learning process. To what depth a concept or principle should be understood, where careful analysis is needed, and where a rough grasp is enough are all hard for a novice to judge.

With this in mind, the author has resolved to re-examine some of the concepts, frameworks, and principles of big data with a beginner's mindset, drawing on the modest experience of recent years, consulting textbooks and industry books, and organizing the material into a series of articles with my own understanding. If this not only clears things up for myself but also gives novices a little help, I will be satisfied.

1. The era of big data

1.1 Data and information

First of all, the simplest understanding of big data is a large amount of data, so where does all this data come from? Data are the results we obtain through observation, experiment, or computation. Unlike information, isolated pieces of data have little practical value on their own.

Information is a broader, macro-level concept, generally referring to all the content exchanged in human society. In 1948 the mathematician Claude Shannon pointed out that information is that which eliminates random uncertainty. The scientific concept of information can be summarized as follows:

Information is a reflection of the states of motion and change of things in the objective world, a representation of the interrelationships and interactions between objective things, and an expression of the essence of those states of motion and change.
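
As a concrete footnote to Shannon's definition (this formula is not in the original article; it is the standard definition from his 1948 paper): Shannon quantified information by entropy, the expected number of bits needed to resolve the uncertainty of a random variable X with outcomes x_1, ..., x_n occurring with probabilities p(x_i):

```latex
% Shannon entropy: the average amount of uncertainty (in bits)
% that learning the value of X eliminates.
H(X) = -\sum_{i=1}^{n} p(x_i)\,\log_2 p(x_i)
```

A fair coin flip, for example, has entropy of exactly 1 bit; learning the outcome eliminates exactly that one bit of random uncertainty, which is precisely the sense in which information "eliminates random uncertainty".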

1.2 Data generation methods

It can be said that changes in the way data is generated gave birth to the concept of big data. Broadly, the way human society generates data has gone through three stages: the operational systems stage, the user-generated content stage, and the sensing systems stage.

  • Operational systems stage

    The earliest large-scale management and use of data in human society began with the birth of the database. Supermarket point-of-sale systems, bank transaction systems, stock-exchange trading systems, and the like are all built on databases. The defining feature of this stage is that data generation is passive: new data is produced and recorded in the database only when an actual business transaction occurs.

  • User-generated content stage

    With the emergence and development of the Internet, data spread faster. The Web 1.0 era, represented chiefly by portal websites, emphasized the organization and provision of content, and the mass of Internet users did not themselves participate in producing it. In the Web 2.0 era, with the spread of the mobile Internet and smartphones and the rise of the major self-media platforms, Internet users gradually became producers of platform content, and the volume of data began to grow dramatically.

  • Sensing systems stage

    The sensing systems stage is closely tied to the development of the Internet of Things (IoT). The IoT comprises a wide variety of sensors and cameras, and these devices generate large amounts of data every moment. Compared with the human-driven data generation of the Web 2.0 era, the automatic data generation of the IoT produces denser and larger volumes of data in a short period of time.

1.3 The concept of big data

So far, we can finally give a more precise characterization of big data, namely its four defining characteristics: large data volume (Volume), varied data types (Variety), fast processing speed (Velocity), and low value density (Value).

  • Large amount of data (Volume)

    According to estimates by the well-known consulting firm International Data Corporation (IDC), the data generated by human society has been growing at roughly 50% per year, that is, more than doubling every two years, a pattern known as the "Moore's Law of big data". It implies that the data humans produced in the past two years is comparable to the total of all data produced before then. By 2020 the world held roughly 44 ZB of data, nearly 40 times the volume of 2010 (a quick sanity check of these figures follows this list).

  • Variety of data types (Variety)

    Big data comes from many sources: scientific research, enterprise applications, and Web applications are all continuously generating new types of data. Biological, transportation, medical, telecommunications, electric-power, and financial big data are all showing explosive growth, with volumes that have jumped from the TB scale to the PB scale. Every industry, at every moment, is producing data of many different kinds.

  • Fast processing speed (Velocity)

    In the era of big data, data is generated extremely fast. In Web 2.0 applications, within one minute Sina Weibo can generate 20,000 posts, Twitter 100,000 tweets, Apple 47,000 app downloads, Taobao 60,000 product sales, and Baidu 900,000 search queries. The famous Large Hadron Collider (LHC) produces about 600 million particle collisions per second and about 700 MB of data per second, with thousands of computers analyzing those collisions in parallel (see the arithmetic after this list).

  • Low value density (Value)

    Value density is inversely proportional to total data volume: the data is massive, but the portion that is actually valuable is small. In continuously recorded surveillance video, for example, the useful footage may amount to only a few seconds; the challenge of big data is to extract that value from the mass of raw data. [1]
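
The growth and throughput figures quoted in the Volume and Velocity items above can be sanity-checked with a little arithmetic. A minimal sketch (the inputs are the rough estimates from the text, not exact measurements):

```python
import math

# "50% per year": how long does it take the data volume to double?
annual_growth = 1.5
doubling_years = math.log(2) / math.log(annual_growth)
print(f"doubling time at 50%/yr: {doubling_years:.2f} years")  # ~1.71, i.e. roughly every two years

# "~40x growth from 2010 to 2020": what annual rate does that imply?
implied = 40 ** (1 / 10)
print(f"implied annual growth for 40x/decade: {implied - 1:.0%}")  # ~45%, close to the quoted 50%

# LHC: ~700 MB of data per second -> daily volume (decimal units)
tb_per_day = 700 * 86_400 / 1_000_000
print(f"LHC output: ~{tb_per_day:.0f} TB per day")  # ~60 TB per day
```

The two growth figures are thus mutually consistent as order-of-magnitude estimates, and the LHC's per-second rate already amounts to tens of terabytes per day, which hints at why thousands of machines are needed to keep up.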

With such a huge amount of data available, people naturally want to use it to change the world. Information technology must solve three core problems: storing, transmitting, and processing information, and these are also the main problems big data faces. Storage capacity keeps growing and CPUs keep getting faster, but faster CPUs and larger storage also mean higher prices, so cost is one concern; more fundamentally, no matter how powerful a single machine is, it always hits a performance ceiling sooner or later. Distributed technology thus emerged as the times required.

1.4 The concept of distributed systems

Distributed technology makes the storage, transmission, and computation of massive data possible. Broadly speaking, compared with the traditional single-machine architecture, a distributed architecture addresses two major challenges of Internet applications: high concurrency and high availability. These two challenges correspond precisely to the weaknesses of a single machine: performance bottlenecks and single points of failure. For massive data, distributed storage spreads data across hundreds of servers to meet storage demands, while distributed computing provides fast processing capability, as the toy sketch below illustrates.
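
To make the idea of distributed storage slightly more concrete, here is a toy sketch of hash-based partitioning (sharding), the basic trick that lets data spread across many servers. Everything here (the node names, the four-node cluster, the modulo scheme) is a simplified assumption for illustration; production systems such as HDFS add replication, rebalancing, and failure handling on top.

```python
import hashlib

# Hypothetical cluster of storage nodes; real deployments use hundreds.
NODES = ["node-0", "node-1", "node-2", "node-3"]

def node_for(key: str) -> str:
    """Map a record key to a storage node by hashing (simple modulo sharding)."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

# Each record lands deterministically on one node, so both storage load
# and read traffic spread across the whole cluster.
for key in ["user:1001", "user:1002", "order:77", "order:78"]:
    print(key, "->", node_for(key))
```

Because each record maps deterministically to a node, capacity scales by adding machines rather than upgrading one; replicating each shard to several nodes then removes the single point of failure mentioned above.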

[1] Lin Ziyu. Principles and Applications of Big Data Technology [M]. People's Posts and Telecommunications Press (Big Data Innovation Talent Training Series), 2017.

[2] Zhiwu Wang. God-Of-BigData [Z]. Project series articles.
