Nanjing University of Posts and Telecommunications-Cloud Computing Technology and Big Data Final Exam (Summary of Knowledge Points 1)

1. Overview of cloud computing technology

1. The origin and technical characteristics of cloud computing

1.1 Definition of Cloud Computing

●Cloud computing is a virtualized, highly available computing platform built on a dynamic resource pool. The term borrows the concept of the "electron cloud" from quantum physics to emphasize that information processing is diffuse and ubiquitous.
●Computing tasks are distributed across a resource pool formed by a large number of computing nodes, so that application systems can obtain computing power, storage space, and data services on demand.

1.2 Technical characteristics of cloud computing

●The hardware infrastructure is built on a large-scale cheap server cluster.
●Applications and underlying services are developed in collaboration to maximize the use of resources.
●Redundancy of multiple cheap servers makes the system highly available.

[Example: Tencent Cloud operates millions of servers and tens of millions of hard drives. With an annual disk failure rate of 2%, hundreds of hard drives fail every day.]
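As a rough check on the figures above, the arithmetic can be sketched as follows. The drive count of exactly 10 million is an assumption (the source only says "tens of millions"), and failures are assumed to be spread evenly over a 365-day year.

```python
# Assumption: exactly 10 million drives; the source only says "tens of millions".
DRIVES = 10_000_000
ANNUAL_FAILURE_RATE = 0.02  # 2% per year, as stated in the example above

failures_per_year = DRIVES * ANNUAL_FAILURE_RATE
failures_per_day = failures_per_year / 365  # assumes failures are spread evenly

print(f"{failures_per_year:,.0f} failures per year")
print(f"{failures_per_day:.0f} failures per day")
```

Under these assumptions the daily figure lands in the hundreds, which is why redundancy across many cheap servers is treated as a design requirement rather than an option.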

1.3 Cloud computing system architecture

[Figure: cloud computing system architecture]

1.4 Internet ecology with cloud as the core

[Figure: Internet ecology with cloud as the core]

1.5 Smart City with Cloud as the Core

[Figure: smart city with cloud as the core]

1.6 Cloud computing levels and types

[Figure: cloud computing levels and types]

1.7 Advantages of Cloud Computing Technology

Virtualization, distributed and parallel computing, mass storage, desktop applications, resource scheduling, and security.

2. Large-scale cloud computing data center

2.1 Definition of Data Center

●Wikipedia: A data center is a complex set of facilities. It not only includes computer systems and other supporting equipment (such as communication and storage systems), but also includes redundant data communication connections, environmental control equipment, monitoring equipment, and various safety devices.

3. Cloud computing and other popular technologies

3.1 Cloud computing and virtualization

●Wikipedia's definition of virtualization: Virtualization is a method of abstractly representing computer resources. Through virtualization, the abstracted resources can be accessed in the same way as the resources before abstraction. This abstraction is not restricted by the implementation, geographical location, or physical configuration of the underlying resources.
●The three layers of meaning of virtualization:
➢The objects of virtualization are resources of all kinds.
➢The virtualized logical resources hide unnecessary details from users.
➢Users can carry out in the virtual environment the functions they would have in the real environment.

Types of virtualization
●Network virtualization
●Storage virtualization
●Desktop virtualization
●Server virtualization
●System virtualization
●Others
(1) System virtualization
●System virtualization uses virtualization software to create one or more virtual machines (VMs) on a single physical machine.
●The virtual operating environment must provide a virtual hardware environment for the virtual machines running on it, including virtual CPU, memory, I/O devices, and network interfaces.
(2) Server virtualization
●Server virtualization applies system virtualization to servers: server resources are consolidated, and a number of virtual servers are then created according to demand.
(3) Desktop virtualization
●Desktop virtualization decouples the user's desktop environment from the terminal device being used.
●Advantages: with desktop virtualization, the data, resources, and even the operating system that used to live on the terminal are moved to servers in the back-end data center, and the front-end terminal becomes a lightweight client whose main job is display, with computation playing only an auxiliary role.

3.2 Cloud Computing and Big Data

●Cloud computing supports big data.
●Cloud computing emphasizes computing and storage capabilities.
●Big data requires the ability to process large volumes of data (acquisition, cleaning, conversion, storage, analysis, statistics, etc.).

3.3 Cloud Computing and Blockchain

●In essence, a blockchain is a tamper-resistant distributed database running on a peer-to-peer network. It uses a consensus algorithm to keep data consistent between nodes and cryptographic algorithms to secure the data. It also uses timestamps and hash values to link blocks end to end into a chain, creating an open, transparent, verifiable, tamper-resistant, and traceable technical system.
●Blockchain technical features: large scale, high throughput, high latency, and global transactions. The mechanism requires the support of many data center nodes, which spreads out the risk of large-scale downtime.
●The advantages of cloud service providers offering blockchain technology lie mainly in three areas: cost efficiency, application ecosystem, and security and privacy.
●By combining with cloud service providers, blockchain technology can be integrated, packaged, and delivered, laying the foundation for real-world application.
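The chain structure built from timestamps and hash values can be illustrated with a minimal sketch. It covers only the linking and tamper detection; there is no consensus algorithm, peer-to-peer network, or signature scheme, and the names `block_hash`, `append_block`, and `is_valid` are hypothetical helpers chosen for this example.

```python
import hashlib
import json
import time


def block_hash(block: dict) -> str:
    """SHA-256 over the block's fields, excluding its own stored hash."""
    payload = json.dumps(
        {k: block[k] for k in ("index", "timestamp", "data", "prev_hash")},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: list, data: str) -> None:
    """Link a new block to the previous one through prev_hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {
        "index": len(chain),
        "timestamp": time.time(),
        "data": data,
        "prev_hash": prev_hash,
    }
    block["hash"] = block_hash(block)
    chain.append(block)


def is_valid(chain: list) -> bool:
    """Recompute every hash and check every link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


chain = []
for record in ("A pays B 5", "B pays C 2", "C pays A 1"):
    append_block(chain, record)

print(is_valid(chain))             # chain as built
chain[1]["data"] = "B pays C 200"  # tamper with a middle block
print(is_valid(chain))             # stored hash no longer matches the data
```

Because each block stores the hash of its predecessor, changing any earlier block invalidates the check unless every later block is recomputed as well.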

3.4 Cloud Computing and Virus Defense

●The basic idea of cloud security: many for one / many
●Client-side probes are used to collect samples.
●The more customers there are, the more reliable the security analysis based on the collected samples, and the more reliable and timely the virus response.
●The client is small, and virus detection and removal are stronger.

4. Important commercial cloud computing platform

4.1 Google Cloud Computing

1 Google File System GFS (required reading)
[Figure: Google File System (GFS)]

2 Distributed data processing MapReduce (required reading)
●Map function: performs the specified operation on the original data. Each Map operation works on a different part of the original data, so Map operations are independent of one another and can be fully parallelized.
●Reduce operation: merges the intermediate results generated by the Map operations. The sets of intermediate results handled by different Reduce operations do not overlap, so the final results produced by all Reduce operations can simply be concatenated to form the complete result set.
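The Map and Reduce roles described above can be sketched with a single-process word count. This is only an illustration of the programming model, not Google's implementation: the function names `map_fn`, `shuffle`, `reduce_fn`, and `word_count` are hypothetical, and everything runs sequentially in memory.

```python
from collections import defaultdict


def map_fn(document: str):
    """Map: emit (word, 1) for each word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key so each key goes to exactly one Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key: str, values: list):
    """Reduce: merge all intermediate values for one key."""
    return key, sum(values)


def word_count(documents):
    intermediate = []
    for doc in documents:  # each Map call is independent and could run in parallel
        intermediate.extend(map_fn(doc))
    grouped = shuffle(intermediate)
    # Reduce outputs cover disjoint keys, so they can simply be combined.
    return dict(reduce_fn(k, v) for k, v in grouped.items())


splits = ["the cloud stores data", "the cloud computes", "data data data"]
print(word_count(splits))
```

The grouping step is what guarantees that the inputs of different Reduce calls do not overlap.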

3 Distributed structured data table Bigtable (must read)
4 Distributed lock service Chubby
5 Distributed storage system Megastore
6 Large-scale distributed system monitoring architecture Dapper
7 Mass data interactive analysis tool Dremel
8 Memory big data analysis system PowerDrill
9 Google App Engine

4.2 Amazon Cloud Computing

●Amazon S3 is an object storage service that provides industry-leading scalability, data availability, security, and performance.
●Customers of all sizes and industries can use it to store and protect any amount of data for a variety of use cases (such as websites, mobile applications, backup and restore, archiving, enterprise applications, IoT devices, and big data analytics).

5. Representative open source cloud computing platform

●Hadoop: an open source implementation of Google's cloud computing technologies
●OpenStack: a cloud platform management project, jointly developed by NASA and Rackspace
●Eucalyptus: an open source implementation of Amazon's cloud computing
●Cassandra: combines Dynamo's distributed technology with Google's Bigtable data model; a highly scalable, eventually consistent, distributed structured key-value storage system
●Enomaly ECP: provides a cloud computing framework similar to EC2
●Nimbus: based on the grid middleware Globus, provides functions and interfaces similar to EC2

2. Cloud operating system OpenStack

2.1 Introduction to OpenStack

OpenStack is among the most popular open source cloud platform management projects. Many enterprises and organizations use it to speed up the deployment of new products, reduce costs, and upgrade their internal systems, while service providers use it to offer customers reliable and easily accessible cloud infrastructure resources.
Vision: to provide all public and private cloud providers with an open source cloud computing platform that meets their needs, is easy to implement, and scales to large deployments.

2.2 Introduction to OpenStack Components

(1)Nova

●Nova is the code name of the compute service and one of the earliest OpenStack components. It manages OpenStack's compute resources.
●Nova can be thought of as a set of virtualization management programs: it can create, delete, and restart virtual machines, among other operations. OpenStack can serve as a cloud platform precisely because it can create virtual machines.

●Nova-api: provides a unified, standardized external interface. It accepts and responds to end users' Compute API requests and also handles communication with other OpenStack logic modules.
●Nova-conductor: before the G release, nova-compute interacted with the database directly, which caused security problems. Since the G release, nova-conductor acts as a proxy for that database access.

●Nova-scheduler: selects a compute node from the compute resource pool according to a scheduling algorithm (using multiple filters or algorithms) to start a new VM instance.

●Nova-volume: generally runs on storage nodes (in a role similar to an agent) and mainly performs volume-related functions such as creating volumes and attaching volumes to or detaching them from VMs.
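The scheduler's "filter, then pick by some algorithm" behaviour can be sketched as follows. This is a toy model, not Nova's actual filter or weigher classes: the `Host` fields, the resource check, and the "most free RAM wins" weight are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_vcpus: int
    free_ram_mb: int


def filter_hosts(hosts, vcpus: int, ram_mb: int):
    """Filter step: keep only hosts with enough free resources."""
    return [h for h in hosts if h.free_vcpus >= vcpus and h.free_ram_mb >= ram_mb]


def weigh(host: Host) -> int:
    """Weighing step (hypothetical policy): prefer the host with the most free RAM."""
    return host.free_ram_mb


def schedule(hosts, vcpus: int, ram_mb: int):
    """Return the chosen host, or None if no host passes the filters."""
    candidates = filter_hosts(hosts, vcpus, ram_mb)
    if not candidates:
        return None
    return max(candidates, key=weigh)


pool = [
    Host("node-a", free_vcpus=2, free_ram_mb=2048),
    Host("node-b", free_vcpus=8, free_ram_mb=16384),
    Host("node-c", free_vcpus=8, free_ram_mb=8192),
]
chosen = schedule(pool, vcpus=4, ram_mb=4096)
print(chosen.name if chosen else "no valid host")
```

Separating filtering from weighing keeps hard constraints (does the VM fit at all?) apart from preferences (which fitting host is best?).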

(2)RabbitMQ

●OpenStack software modules communicate with each other through the AMQP protocol. In OpenStack, services interact by exchanging messages.
●RabbitMQ is a message broker, an architectural pattern that handles message validation, transformation, and routing. It coordinates communication between applications and minimizes how much applications or software modules need to know about each other, which effectively decouples them.
●RabbitMQ suits large-scale deployments with flexible topologies that are easy to expand, and it helps keep message delivery timely between different modules, nodes, and processes. Its clustering and high-availability (HA) features protect the messaging hub, and a single node can also recover messages: when a process crashes or a node goes down, the message queues that RabbitMQ has persisted are not lost, and after the node restarts, communication resumes based on the saved queue state and message data.
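The decoupling described above can be illustrated with a toy in-process broker. This is not RabbitMQ or an AMQP client: the `Broker` class, the topic names, and the message fields are hypothetical, and there is no persistence, clustering, or HA.

```python
import queue
from collections import defaultdict


class Broker:
    """Toy broker: validates messages and routes them to per-topic queues."""

    def __init__(self):
        self._queues = defaultdict(queue.Queue)

    def publish(self, topic: str, message: dict) -> None:
        if "method" not in message:  # minimal message validation
            raise ValueError("message needs a 'method' field")
        self._queues[topic].put(message)  # routing by topic name

    def consume(self, topic: str):
        """Return the next message for a topic, or None if its queue is empty."""
        try:
            return self._queues[topic].get_nowait()
        except queue.Empty:
            return None


broker = Broker()
# The sender only knows the topic name, not which process will handle the request.
broker.publish("scheduler", {"method": "select_host", "vcpus": 2})
broker.publish("compute", {"method": "boot_instance", "instance_id": "vm-1"})

print(broker.consume("scheduler"))
print(broker.consume("compute"))
print(broker.consume("compute"))  # queue already drained
```

Producers and consumers share only topic names and a message format, so either side can be replaced or scaled out without changing the other.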

(3)Glance

●Glance is the OpenStack image service, used to register, discover, and retrieve virtual machine images. It provides a REST API that lets users query virtual machine image metadata and retrieve the actual image.
●Virtual machine images provided by the image service can be stored in different locations, from a simple file system to object storage systems such as OpenStack Object Storage.
Glance functions
●Image registration and query
●Role-based access control
●Support for multiple image formats (raw, qcow2)
●Support for multiple storage types (S3, Swift, file system, etc.)

(4)Swift

Features:
➢Reliable object storage
➢No single point of failure
➢Support S3 API
➢Massive object secure storage
➢Large file storage
➢Data redundancy management

Problems Swift mainly solves
●Object storage overcomes the shortcomings of NAS (poor scalability) and SAN (data is not easy to share securely) and combines the advantages of both, namely the high-speed direct access of SAN and the data sharing of NAS, providing a storage architecture with high performance, high reliability, cross-platform support, and secure data sharing.

What Swift is not
●Not a file system: Swift uses a REST API instead of traditional file operations such as open(), read(), write(), seek(), and close().
●Does not support file locks.
●Has no file directory structure.
●Not a database: Swift stores objects using the account-container-object concept and can list the objects in a specified container.
●Cannot be used as a block device for virtual machines.
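The account-container-object model can be sketched with a small in-memory class. This is not Swift's API or its REST interface: `ToyObjectStore` and its methods are hypothetical, and replication and durability are left out entirely.

```python
class ToyObjectStore:
    """In-memory sketch of a flat account/container/object namespace."""

    def __init__(self):
        self._objects = {}  # (account, container, name) -> bytes

    def put(self, account: str, container: str, name: str, data: bytes) -> None:
        # Whole-object write: no open()/seek()/partial update, no file locks.
        self._objects[(account, container, name)] = data

    def get(self, account: str, container: str, name: str) -> bytes:
        return self._objects[(account, container, name)]

    def list_container(self, account: str, container: str):
        """List object names in one container, sorted for stable output."""
        return sorted(
            n for (a, c, n) in self._objects if (a, c) == (account, container)
        )


store = ToyObjectStore()
# "2020/a.jpg" is just an object name; there is no real directory behind the slash.
store.put("acct", "photos", "2020/a.jpg", b"first version")
store.put("acct", "photos", "2020/a.jpg", b"second version")  # replaces the object
store.put("acct", "photos", "b.jpg", b"thumbnail")

print(store.list_container("acct", "photos"))
print(store.get("acct", "photos", "2020/a.jpg"))
```

The only operations are whole-object put, get, and container listing, which is the practical meaning of "not a file system" and "not a database" here.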

Uses of Swift
●As a storage service for IaaS
●Integrated with OpenStack Compute to store images for it
●Document storage
●Storing data that must be kept for a long time, such as logs
●Storing website pictures and thumbnails

The difference between Swift and HDFS
●HDFS uses a central system (the NameNode) to maintain file metadata, while in Swift metadata is distributed and replicated across the cluster.
●Swift was designed with multi-tenancy in mind, while HDFS has no concept of multi-tenancy.
●HDFS is optimized for large files, while Swift is designed to store files of any size.
●In HDFS, a file is written once and can have only one writer at a time; in Swift, a file can be written multiple times, and under concurrent operations the most recent write prevails.
●HDFS is written in Java, while Swift is written in Python.

(5)Cinder

●Cinder (Block Storage) is the block storage module. It provides persistent block storage volumes for virtual machines and manages creating block devices and attaching them to and detaching them from virtual machines.
●As demands on nova-volume grew, it was separated from Nova to become a centralized block storage service. Cinder's purpose is to add storage space to virtual machines.
●The Cinder component architecture is a mirror image copy of Nova's architecture.

The difference between Cinder and Swift
●Cinder is block storage: it attaches an additional disk to a virtual machine, that is, a volume created by Cinder is attached to the VM. Cinder was originally the persistent block storage function inside Nova (nova-volume) and was split out as a separate component in the OpenStack F release.
●Swift is a system for uploading and downloading objects. It generally stores content that is not modified frequently, such as VM images, backups, and archives, as well as smaller files such as photos and email messages. It is oriented more toward system-level management.

(6)Neutron

●Neutron is one of the core projects of OpenStack, providing virtual network functions in the cloud computing environment.
●In early OpenStack versions (before Folsom) there was no Neutron/Quantum component. Network functions were implemented in Nova as nova-network, which provided a simple Linux bridge mode and VLAN network structure.
●As the demands placed on OpenStack grew, nova-network could no longer meet them, so Neutron came into being.
●With Neutron, one or more private networks can be created for a project in OpenStack. These networks are logically isolated from other users' networks, and different private networks within the same project are also isolated from each other.

Neutron network management
●Fixed IP: assigned to a virtual machine instance and used for communication between tenant instances.
●Floating IP: a public IP address used for communication between the instance and external networks or the Internet.
●Flat mode: the simplest networking mode. It does not use VLANs and supports only one network. A bridge device must be created manually on each node, and each instance receives a fixed IP from the pool.
●FlatDHCP mode: basically the same as Flat mode, except that a DHCP server must be started to assign fixed IP addresses to VMs. Each instance receives a fixed IP from the pool.
●VLAN mode: each project has its own VLAN, Linux bridge, and DHCP server. All VMs in a project belong to the same VLAN and are connected to the same bridge.
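The idea that each instance receives a fixed IP from a pool can be sketched with Python's standard `ipaddress` module. The `FixedIpPool` class and the `10.0.0.0/29` range are assumptions for illustration only; real deployments also involve bridges, DHCP, and per-project VLANs, none of which are modelled here.

```python
import ipaddress


class FixedIpPool:
    """Toy pool that hands out fixed IPs from one subnet, one per instance."""

    def __init__(self, cidr: str):
        # hosts() skips the network and broadcast addresses of the subnet.
        self._free = list(ipaddress.ip_network(cidr).hosts())
        self._assigned = {}  # instance_id -> IP string

    def allocate(self, instance_id: str) -> str:
        if instance_id in self._assigned:  # an instance keeps its fixed IP
            return self._assigned[instance_id]
        if not self._free:
            raise RuntimeError("fixed IP pool exhausted")
        ip = str(self._free.pop(0))
        self._assigned[instance_id] = ip
        return ip

    def release(self, instance_id: str) -> None:
        ip = self._assigned.pop(instance_id)
        self._free.append(ipaddress.ip_address(ip))


pool = FixedIpPool("10.0.0.0/29")  # hypothetical range with six usable addresses
print(pool.allocate("vm-1"))
print(pool.allocate("vm-2"))
print(pool.allocate("vm-1"))  # same instance, same address
```

A floating IP would be a separate mapping from a public address to one of these fixed addresses, which is omitted to keep the sketch short.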

(7)Keystone

As a core module of OpenStack, Keystone provides unified and complete authentication, service catalog, token, and access policy services. It provides authentication for Nova (compute), Glance (image), Swift (object storage), Cinder (block storage), Neutron (network), and Horizon (dashboard).
Keystone has two key functions:
➢User management: control, manage, and track user access rights
➢Service catalog management: provide the access URLs of services

●User: a person or program that gains access through Keystone. Users are verified through credentials such as passwords and API keys.
●Tenant: can be regarded as a project, group, or organization. It is the collection of resources that can be accessed in each service.
●Role: represents the resource permissions that a group of users can access, such as virtual machines in Nova and images in Glance.
●Service: services such as Nova, Glance, and Swift. Different names are usually used to denote different services.
●Endpoint: the URL of a service. To access a service, you must know its endpoint.
●Token: the key to accessing resources. It is the value Keystone returns after successful verification, and only the token needs to be carried in subsequent interactions with other services. Each token has a validity period and is valid only within that period.
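The relationship between credentials, tokens, validity periods, and endpoints can be sketched as follows. This is not Keystone's API: `ToyIdentity`, its methods, and the endpoint URLs are hypothetical, and passwords are kept in plain text purely to keep the example short.

```python
import secrets
import time


class ToyIdentity:
    """Toy identity service: issues expiring tokens and returns a service catalog."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self._users = {}   # name -> password (plain text only for illustration)
        self._tokens = {}  # token -> (user name, expiry time)
        self._ttl = ttl_seconds
        # Hypothetical endpoints: service name -> URL.
        self.catalog = {
            "compute": "http://controller.example/compute",
            "image": "http://controller.example/image",
        }

    def add_user(self, name: str, password: str) -> None:
        self._users[name] = password

    def authenticate(self, name: str, password: str):
        """Verify credentials; return a new token and a copy of the catalog."""
        if name not in self._users or self._users[name] != password:
            raise PermissionError("invalid credentials")
        token = secrets.token_hex(16)
        self._tokens[token] = (name, time.time() + self._ttl)
        return token, dict(self.catalog)

    def validate(self, token: str, now=None) -> bool:
        """A token is valid only if it was issued here and has not expired."""
        now = time.time() if now is None else now
        entry = self._tokens.get(token)
        return entry is not None and now < entry[1]


identity = ToyIdentity(ttl_seconds=60)
identity.add_user("alice", "s3cret")
token, catalog = identity.authenticate("alice", "s3cret")

print(catalog["compute"])                               # endpoint to call next
print(identity.validate(token))                         # within the validity period
print(identity.validate(token, now=time.time() + 120))  # after expiry
```

Other services would call something like `validate` on each request, which is why only the token has to travel with later calls.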

(8)Horizon

●Horizon is a web control panel used to manage and control OpenStack services. It can manage instances, images, create key pairs, add volumes to instances, and operate Swift containers. In addition, users can also use the terminal (console) or VNC to directly access the instance in the control panel.

Functions:
●Instance management: create and terminate instances, view terminal logs, VNC connections, add volumes, etc.
●Access and security management: create security groups, manage key pairs, and set floating IP.
●Flavor settings: Different flavors (virtual hardware templates) can be configured.
●Image management: Edit or delete images.
●User management: create users, etc.
●Volume management: Create volumes and snapshots.
●Object storage processing: create and delete containers and objects.

2.3 Request flow for provisioning an instance in OpenStack:

1. Dashboard or CLI sends an authentication request to Keystone.
2. Keystone verifies the credentials, then generates and returns a token_id and the service catalog (containing the addresses of the API services: nova-api, glance-api, cinder-api, etc.).
3. Dashboard or CLI sends a create-instance request to nova-api.
4. Nova-api receives the request and asks Keystone to verify that the token is valid.
5. Keystone verifies the token.
6. Nova-api interacts with nova-database.
7. A record for the new instance is created in the database.
8. Nova-api sends an rpc.call request to nova-scheduler to obtain the ID of the physical machine on which the instance will be placed.
9. Nova-scheduler gets the request from the message queue.
10. Nova-scheduler interacts with nova-database, first filtering and then weighting the candidates, and selects a suitable physical machine.
11. The ID of the selected physical machine is returned.
12. Nova-scheduler sends an rpc.cast request to nova-compute to create the new instance on the selected physical machine.
13. Nova-compute gets the request from the message queue.
14. Nova-compute sends an rpc.call request to nova-conductor to obtain instance-related information: host ID, CPU, disk.
15. Nova-conductor gets the request from the message queue.
16. Nova-conductor interacts with nova-database.
17. The instance information is returned.
18. Nova-compute gets the instance information from the message queue.
19. Nova-compute sends a request to glance-api, obtains the image URL from the image ID, and downloads the image.
20. Glance-api verifies the token with Keystone.
21. Nova-compute obtains the image metadata.
22. Nova-compute sends a request to the Network API to configure networking for the instance.
23. Quantum-api verifies the token with Keystone.
24. Nova-compute obtains the network information.
25. Nova-compute sends a request to the Volume API to attach volumes to the instance.
26. Cinder-api verifies the token with Keystone.
27. Nova-compute obtains the block storage information.
28. The virtual machine runs on the hypervisor.
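The steps above use two messaging styles: rpc.call, where the sender waits for a reply, and rpc.cast, where it does not. The sketch below illustrates that difference with a thread and in-process queues. It is not the OpenStack RPC library: `worker`, `rpc_call`, `rpc_cast`, and the message fields are hypothetical.

```python
import queue
import threading


def worker(inbox: queue.Queue, handled: list) -> None:
    """Consume messages until a None sentinel arrives."""
    while True:
        msg = inbox.get()
        if msg is None:
            break
        handled.append(msg["method"])
        reply_to = msg.get("reply_to")
        if reply_to is not None:  # only "call" messages carry a reply queue
            reply_to.put(f"done:{msg['method']}")


def rpc_cast(inbox: queue.Queue, method: str) -> None:
    """Fire and forget: enqueue the request and return immediately."""
    inbox.put({"method": method})


def rpc_call(inbox: queue.Queue, method: str, timeout: float = 2.0) -> str:
    """Request and response: enqueue the request, then block for the reply."""
    reply = queue.Queue(maxsize=1)
    inbox.put({"method": method, "reply_to": reply})
    return reply.get(timeout=timeout)


inbox = queue.Queue()
handled = []
thread = threading.Thread(target=worker, args=(inbox, handled))
thread.start()

rpc_cast(inbox, "create_instance")     # sender does not wait for a result
print(rpc_call(inbox, "select_host"))  # sender blocks until the worker answers

inbox.put(None)  # stop the worker
thread.join()
print(handled)
```

A call suits steps where the sender needs a value back, such as the chosen host in step 8, while a cast suits long-running work such as building the instance in step 12.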

Origin blog.csdn.net/qq_42005540/article/details/108307607