Big Data Technology Principles and Application Study Notes Chapter 1

Access address for the "golden combination" of course resources: http://dblab.xmu.edu.cn/post/7553/

1. "Big Data Technology Principles and Applications" textbook 

Official website: http://dblab.xmu.edu.cn/post/bigdata/

2. Big data software installation and programming practice guide

Official website: Practical guide for big data software installation and programming, supporting "Big Data Technology Principles and Applications" compiled by Lin Ziyu_Xiamen University Database Laboratory

3. Lesson Preparation Guide

Official website: Teacher's Lesson Preparation Guide for "Principles and Applications of Big Data Technology" compiled by Lin Ziyu_Xiamen University Database Laboratory

4. Lecture videos

Official website: Lin Ziyu lectures on entry-level big data online courses_Xiamen University Database Laboratory

5. Experimental Guide

Official website: Computer room experiment guide - "Big Data Technology Principles and Applications" compiled by Lin Ziyu_Xiamen University Database Laboratory

6. Electronic books

Official website: Big data course teacher exchange group_Xiamen University Database Laboratory

7. Spark Getting Started Tutorial

Official website: Ziyu Big Data's Spark introductory tutorial (Scala version)_Xiamen University Database Laboratory Blog

8. Big Data Course Experiment Case "Analysis of Website User Shopping Behavior"

Official website: Big Data Course Experiment Case: Website User Behavior Analysis (Free Sharing)_Xiamen University Database Laboratory 

Part 1 Big Data Basics

This part includes 2 chapters. Chapter 1 introduces the concepts and applications of big data and analyzes the relationship among big data, cloud computing and the Internet of Things; Chapter 2 introduces the big data processing architecture Hadoop.

Chapter 1 Big Data Overview

1.1 Big data era

1.1.1 The third wave of informatization

1.1.2 Information technology provides technical support for the big data era

  1. Storage device capacity continues to increase
  2. CPU processing power greatly improved
  3. Network bandwidth continues to increase

1.1.3 Changes in data generation methods have led to the advent of the big data era

The way data is generated in human society has roughly gone through three stages: operational system stage, user-generated content stage and perceptual system stage.

1.1.4 The development history of big data

The development process of big data can generally be divided into three important stages: embryonic stage, mature stage and large-scale application stage.

1.2 The concept of big data

Big data is not just the "massification" of data, but also includes multiple attributes such as "rapidity", "diversification" and "value".

Big data has four characteristics (the "4V"): large data volume (Volume), various data types (Variety), fast processing speed (Velocity) and low value density (Value).

1.2.1 Large amount of data

  • According to IDC estimates, data has been growing at a rate of about 50% per year, which means it roughly doubles every two years (the "Moore's Law of big data")
  • The amount of data humans have generated in the past two years is equivalent to the entire amount of data generated before
  • It is expected that by 2023, the world will have a total of 11.7 ZB of data, which will more than double the forecast
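
The growth claim above can be checked with a little arithmetic. The following is a minimal sketch that assumes steady compound growth at the quoted 50% per year; the function names are illustrative and do not come from the textbook.

```python
import math


def growth_factor(annual_rate: float, years: float) -> float:
    """Total multiplication of data volume after `years` of compound growth."""
    return (1.0 + annual_rate) ** years


def doubling_time(annual_rate: float) -> float:
    """Years needed for the volume to double at a steady annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    rate = 0.5  # the 50% per year figure quoted above (assumed constant)
    print(growth_factor(rate, 2))         # 2.25
    print(round(doubling_time(rate), 2))  # 1.71
```

Under this assumption the volume grows 2.25 times in two years, so "doubles every two years" is a rounded, slightly conservative way of stating the trend.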

1.2.2 Various data types

  • Big data is composed of structured and unstructured data
    About 10% is structured data, stored in databases
    About 90% is unstructured data, closely related to human information
  • Scientific research: Genome; LHC accelerator; Earth and space exploration
  • Enterprise applications: Email, documents, files; application logs; transaction records
  • Web 1.0 data: text; images; video
  • Web 2.0 data: query logs/clickstream; Twitter/Blog/SNS; Wiki

1.2.3 Fast processing speed

Many applications in the big data era require real-time analysis results based on rapidly generated data to guide production and life practices. Therefore, the speed of data processing and analysis usually needs to reach second-level response.

  • The time window from data generation to consumption is very small, leaving very little time available to generate decisions.
  • 1-second rule: analysis results are generally expected within seconds, which is also fundamentally different from traditional data mining technology.

1.2.4 Low value density

Low value density, but high commercial value

Take surveillance monitoring as an example. During continuous monitoring, the data that may be useful lasts only a second or two, yet it has high commercial value.
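
To put a number on "low value density", the sketch below computes the share of useful data in the monitoring example. The 24-hour recording length and the function name are assumptions made for illustration; the notes only say that a second or two may be useful.

```python
def value_density(useful_seconds: float, total_seconds: float) -> float:
    """Fraction of the recorded data that is actually useful."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return useful_seconds / total_seconds


if __name__ == "__main__":
    one_day = 24 * 60 * 60  # assumed: one full day of continuous footage
    density = value_density(2, one_day)  # assumed: 2 useful seconds
    print(f"{density:.6%}")  # about 0.0023% of the footage
```

Even with these generous assumptions, well under one hundredth of a percent of the stored data carries the value.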

1.3 Impact of big data

  • Dr. Jim Gray, Turing Award winner and renowned database expert, observed that scientific research has gone through four paradigms since ancient times: experiment, theory, computation and data.
  • In terms of the way of thinking, big data completely subverts the traditional way of thinking: full sample rather than sampling; efficiency rather than precision; correlation rather than causation.
  • In terms of social development, big data decision-making has gradually become a new way of decision-making. Big data applications have effectively promoted the in-depth integration of information technology and various industries. Big data development has greatly promoted the continuous emergence of new technologies and new applications.
  • In terms of the job market, the rise of big data has made data scientist a popular profession.
  • In terms of talent training, the rise of big data will greatly change the existing teaching and scientific research system of information technology-related majors in Chinese universities.
  • Big data is everywhere, and all walks of life, including finance, automobiles, retail, catering, telecommunications, energy, government affairs, medical care, sports, entertainment, etc., have been imprinted with big data.

 

1.4 Application of big data

1.5 Key technologies of big data

From the perspective of the whole process of data analysis, big data technology mainly includes data collection and preprocessing, data storage and management, data processing and analysis, data security and privacy protection, etc.

Two core technologies of big data:

  1. Distributed storage: GFS/HDFS; BigTable/HBase; NoSQL (key-value, column family, graph, document databases); NewSQL (such as SQL Azure)
  2. Distributed processing: MapReduce
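
The four NoSQL families named in item 1 differ mainly in how they shape a record. As a rough sketch only, the snippet below expresses one hypothetical user record in each model using plain Python structures; the record, the keys and the `followers_of` helper are invented for illustration, and no real database API is involved.

```python
# Key-value: an opaque value looked up by a single key.
key_value_store = {"user:1001": '{"name": "Alice", "city": "Xiamen"}'}

# Column family: row key -> column family -> column -> value.
column_family_store = {
    "1001": {
        "info": {"name": "Alice", "city": "Xiamen"},
        "activity": {"last_login": "2023-09-01"},
    }
}

# Document: a self-describing, possibly nested document.
document_store = [
    {"_id": 1001, "name": "Alice", "city": "Xiamen", "tags": ["student"]}
]

# Graph: nodes plus labelled edges between them.
graph_nodes = {1001: {"name": "Alice"}, 1002: {"name": "Bob"}}
graph_edges = [(1001, "FOLLOWS", 1002)]


def followers_of(node_id: int) -> list:
    """Return the ids of nodes that have a FOLLOWS edge pointing at node_id."""
    return [
        src
        for src, label, dst in graph_edges
        if label == "FOLLOWS" and dst == node_id
    ]


if __name__ == "__main__":
    print(column_family_store["1001"]["info"]["city"])  # Xiamen
    print(followers_of(1002))                           # [1001]
```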

1.6 Big data computing model

MapReduce is a big data processing technology that everyone is familiar with. When people mention big data, they will naturally think of MapReduce, which shows its wide influence. In fact, the problems of big data processing are complex and diverse, and a single computing model cannot meet different types of computing needs. MapReduce is actually just one of the big data computing models. It represents a batch processing technology for large-scale data. In addition, there are various big data computing models such as query analysis computing, graph computing, and stream computing.
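
To make the batch-processing model concrete, here is a minimal single-machine sketch of the map, shuffle and reduce steps applied to word counting. It only imitates the programming model in plain Python; it is not Hadoop's API, and all function names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence over an in-memory collection."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big data era"]))
    # {'big': 2, 'data': 2, 'era': 1}
```

In a real cluster the map and reduce calls run in parallel on many machines and the shuffle moves data across the network; the other computing models mentioned above (query analysis, graph and stream computing) exist precisely because not every workload fits this batch pattern.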

1.7 Big data industry

The big data industry refers to the collection of all corporate economic activities related to supporting big data organization management and value discovery.

1.8 The relationship between big data, cloud computing and the Internet of Things

Cloud computing, big data and the Internet of Things represent the latest technological development trends in the IT field. The three complement each other and are both connected and different.

1.8.1 Cloud Computing

1. Cloud computing concept

Cloud computing realizes the provision of scalable and cheap distributed computing capabilities through the network. Users only need to be in a place with network access conditions to obtain the various IT resources they need anytime and anywhere.

2. Key technologies of cloud computing

Key cloud computing technologies include: virtualization, distributed storage, distributed computing, multi-tenancy, etc.

3. Cloud computing data center

  • A cloud computing data center is a complex set of facilities, including blade servers, broadband network connections, environmental control equipment, monitoring equipment, and various security devices.
  • The data center is an important carrier of cloud computing, providing computing, storage, bandwidth and other hardware resources for cloud computing, and providing an operating support environment for various platforms and applications.
  • Data center construction is being promoted across the country.

4. Cloud computing applications

  • Applications such as public security management, disaster recovery and backup, urban management, emergency management, intelligent transportation, and social security can be deployed on the government cloud. Through intensive construction, management, and operation, information resource integration and government resource sharing can be achieved, promoting innovation in government management and accelerating the transformation to a service-oriented government.
  • The education cloud can effectively integrate high-quality educational resources such as early childhood education, primary and secondary education, higher education, and continuing education, and gradually achieve goals such as educational information sharing, educational resource sharing, and in-depth mining of educational resources.
  • The small and medium-sized enterprise cloud can enable enterprises to establish financial, supply chain, customer relationship and other management application systems at low cost, greatly lowering the threshold of enterprise informatization, rapidly improving the level of enterprise informatization, and enhancing the market competitiveness of enterprises.
  • The medical cloud can promote service sharing between hospitals, hospitals and communities, hospitals and emergency centers, hospitals and families, and form a new medical and health service system, thereby effectively improving the quality of medical care.

5. Cloud computing industry

As a strategic emerging industry, the cloud computing industry has developed rapidly in recent years and formed a mature industrial chain structure. The industry covers hardware and equipment manufacturing, infrastructure operations, software and solution providers, Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), terminal equipment, cloud security, cloud computing delivery/consulting/certification, etc.

 

1.8.2 Internet of Things

1. Internet of Things Concept

The Internet of Things is an Internet in which things are connected to one another, and it is an extension of the Internet. It uses communication technologies such as local networks or the Internet to connect sensors, controllers, machines, people and things in new ways, forming connections between people and things and between things and things, so as to realize informatization and remote management and control.

2. Key technologies of the Internet of Things

Key technologies in the Internet of Things include identification and perception technology (QR code, RFID, sensors, etc.), network and communication technology, data mining and fusion technology, etc.

3. Internet of Things applications

The Internet of Things has been widely used in smart transportation, smart medical care, smart homes, environmental monitoring, smart security, smart logistics, smart grids, smart agriculture, smart industry and other fields, and has played an important role in promoting the development of the national economy and society.

4. Internet of Things Industry

The complete IoT industry chain mainly includes six major links: core sensing device providers, sensing layer terminal equipment providers, network providers, software and industry solution providers, system integrators, operations and service providers.

1.8.3 The relationship between big data, cloud computing and the Internet of Things

Cloud computing, big data and the Internet of Things represent the latest technological development trends in the IT field. The three are both different and related.

1.9 Summary of this chapter 

  • This chapter introduces the development history of big data technology, and points out that the continuous progress of information technology provides technical support for the big data era, and the changes in data generation methods contribute to the advent of the big data era.
  • Big data has the characteristics of large data volume, various data types, fast processing speed, and low value density, collectively referred to as "4V". Big data has had an important impact on scientific research, ways of thinking, social development, the job market, and talent training. A deep understanding of these impacts of big data will help us better grasp the direction of learning and applying big data.
  • Big data has been increasingly widely used in all walks of life, including finance, automobiles, retail, catering, telecommunications, energy, government affairs, medical care, sports, entertainment, etc., profoundly changing our social production and daily life.
  • Big data is not a single data or technology, but a combination of data and big data technology. Big data technology mainly includes data collection, data storage and management, data processing and analysis, data security and privacy protection, etc.
  • The big data industry includes the IT infrastructure layer, data source layer, data management layer, data analysis layer, data platform layer and data application layer. A number of market-leading technologies and companies have been formed at different levels.
  • This chapter finally introduces the concepts and key technologies of cloud computing and the Internet of Things, and explains the differences and connections between big data, cloud computing and the Internet of Things.

Origin blog.csdn.net/m0_62110645/article/details/132652450