Big Data Storage

I. Cloud Storage
1. Concept: Cloud storage is a concept extended and developed from cloud computing, and is an emerging network storage technology.
2. Features:
Reliability
Availability (multi-paths, controllers, different fiber networks, RAID technology, end-to-end architecture control/monitoring, and mature change management processes can all improve cloud storage availability)
Security
Standardization
Low cost
3. Architecture
The cloud storage architecture can be divided into four layers, from top to bottom: the access layer, application interface layer, basic management layer, and storage layer.
4. Cloud storage technology
4.1 Storage virtualization
1) In a virtualized storage environment, what servers and their application systems see is a logical image of the physical devices, and this image does not change when the physical devices change. This makes resources transparent to system administrators and makes them easier to manage and maintain, while reducing the cost of building a storage system.
2) The virtualization of cloud storage virtualizes storage resources into a global namespace, and provides storage resources to users through multi-tenant technology. During this process, data can flow across nodes and data centers in the storage resource pool.
The global namespace has three main technical solutions:
(1) Algorithm positioning
With this approach, data is located quickly, but the placement algorithm is fixed.
(2) Namespace management
The implementation is simple, but users must be aware of the first-level directory, so data access is not fully transparent.
(3) Dynamic subtree
In theory, it can solve the problem of massive data access, but because the algorithm is so flexible, the engineering implementation is difficult; CephFS has not been commercialized so far.
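As a rough illustration of algorithm positioning, the sketch below computes the owning node directly from a path with a hash, so no lookup table is consulted. The function name, node names, and the choice of SHA-256 with modulo are assumptions made for the example, not something the notes above prescribe.

```python
import hashlib


def locate_node(path: str, nodes: list[str]) -> str:
    """Map a path to a storage node with a fixed hash rule (no lookup table)."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then pick a node.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
owner = locate_node("/tenant1/logs/2024-01-01.log", nodes)
```

Every client that knows the node list computes the same answer, which is why positioning is fast. The rule is also rigid: with plain modulo, changing the number of nodes changes the mapping for most paths.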
Multi-tenancy technology
In cloud storage, multi-tenancy is used to allocate, isolate, and share resources among different users.
In most multi-tenant cloud storage systems, three levels of tenants, sub-tenants and users are used to allocate resources. Physical isolation is adopted between tenants, and sub-tenants under the same tenant are logically isolated and share physical devices. The user is the service terminal under the sub-tenant, and the logical isolation method is also adopted.
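The three-level hierarchy described above could be modeled roughly as follows. This is only a sketch: the class names, the `device_pool` field, and the namespace string format are hypothetical, chosen to show physical isolation per tenant and logical isolation below it.

```python
from dataclasses import dataclass, field


@dataclass
class SubTenant:
    name: str
    users: list[str] = field(default_factory=list)


@dataclass
class Tenant:
    name: str
    device_pool: str  # physical isolation: each tenant gets its own pool
    sub_tenants: dict[str, SubTenant] = field(default_factory=dict)


def namespace_for(tenant: Tenant, sub_tenant: str, user: str) -> str:
    """Return a user's logical namespace inside the tenant's device pool."""
    sub = tenant.sub_tenants[sub_tenant]  # KeyError if the sub-tenant is unknown
    if user not in sub.users:
        raise ValueError(f"{user} is not a user of {sub_tenant}")
    # Sub-tenants and users share the pool and are separated only by path.
    return f"{tenant.device_pool}:/{sub.name}/{user}"


acme = Tenant("acme", "pool-1", {"research": SubTenant("research", ["alice"])})
alice_ns = namespace_for(acme, "research", "alice")
```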
Virtualization Implementation Levels
According to different virtualization implementation locations, virtualization can also be divided into host-based virtualization, storage device-based virtualization, and storage network-based virtualization.
4.2 Distributed storage
(1) Distributed block storage
Block storage means that the server accesses data directly by reading and writing a single address or a range of addresses in the storage space.
Advantages: high reading efficiency
(2) Distributed object storage
Object storage provides key-value access to massive data: each data object is located by its key.
Advantages: It is highly scalable and supports concurrent reading and writing of data. The interface is simple and well suited to massive amounts of small, unstructured data.
Disadvantages: Random write operations of data are generally not supported.
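To make the key-value interface concrete, here is a toy in-memory object store. It is a sketch under simplifying assumptions (one process, no persistence), and the class and method names are illustrative rather than any real product's API.

```python
class ObjectStore:
    """Toy in-memory object store: whole objects are addressed by key."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # Objects are written whole; an update replaces the entire object.
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]  # KeyError if the key does not exist

    def delete(self, key: str) -> None:
        self._objects.pop(key, None)


store = ObjectStore()
store.put("photos/cat.jpg", b"first version")
store.put("photos/cat.jpg", b"second version")  # full replacement, no in-place edit
```

The interface has no call for modifying a byte range inside an object, which mirrors the disadvantage noted above.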
(3) Distributed file system
The file storage system provides a common file access interface for functions such as file and directory operations, file access, and file access control.
Distributed file system storage is currently implemented in two ways: integrated software and hardware, or separated software and hardware.
4.3 Data reduction
(1) Automatic thin provisioning
Use virtualization to reduce the allocation of physical storage space and maximize storage space utilization.
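A minimal sketch of the idea, assuming a block-addressed volume: the volume advertises a large virtual size, but physical space is consumed only when a block is first written. The class name, the 4096-byte block size, and the dictionary-backed storage are assumptions made for illustration.

```python
class ThinVolume:
    """Volume with a large virtual size that allocates blocks on first write."""

    def __init__(self, virtual_blocks: int, block_size: int = 4096) -> None:
        self.virtual_blocks = virtual_blocks
        self.block_size = block_size
        self._allocated: dict[int, bytes] = {}  # only written blocks live here

    def _check(self, block_no: int) -> None:
        if not 0 <= block_no < self.virtual_blocks:
            raise IndexError("block number outside the virtual volume")

    def write(self, block_no: int, data: bytes) -> None:
        self._check(block_no)
        if len(data) > self.block_size:
            raise ValueError("data larger than one block")
        self._allocated[block_no] = data.ljust(self.block_size, b"\0")

    def read(self, block_no: int) -> bytes:
        self._check(block_no)
        # Never-written blocks read back as zeros and use no physical space.
        return self._allocated.get(block_no, b"\0" * self.block_size)

    def physical_bytes(self) -> int:
        return len(self._allocated) * self.block_size


volume = ThinVolume(virtual_blocks=1_000_000)
volume.write(42, b"hello")
```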
(2) Automatic storage tiering
Mainly used to help data centers minimize cost and complexity by automatically placing data on the storage tier that matches how often it is accessed.
(3) Data deduplication
Eliminate redundant data by removing duplicate data from the dataset and keeping only one copy.
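One common way to do this is to fingerprint fixed-size chunks and store each distinct chunk once. The sketch below assumes SHA-256 fingerprints and a tiny chunk size so the effect is visible; the class name and parameters are hypothetical.

```python
import hashlib


class DedupStore:
    """Stores each distinct chunk once and rebuilds files from fingerprints."""

    def __init__(self) -> None:
        self._chunks: dict[str, bytes] = {}      # fingerprint -> single copy
        self._files: dict[str, list[str]] = {}   # file name -> fingerprints

    def put(self, name: str, data: bytes, chunk_size: int = 4) -> None:
        fingerprints = []
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            self._chunks.setdefault(fp, chunk)   # keep only the first copy
            fingerprints.append(fp)
        self._files[name] = fingerprints

    def get(self, name: str) -> bytes:
        return b"".join(self._chunks[fp] for fp in self._files[name])

    def stored_chunks(self) -> int:
        return len(self._chunks)


dedup = DedupStore()
dedup.put("a", b"AAAABBBBAAAA")  # chunks: AAAA, BBBB, AAAA
dedup.put("b", b"BBBBCCCC")      # chunks: BBBB, CCCC
```

Five chunks were written, but only three distinct ones are kept, and both files still read back unchanged.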
4.4 Load Balancing
In cloud storage, in addition to the load balancing device that realizes dynamic and uniform DNS resolution at the network edge, there is also a load balancing mechanism within the system, that is, load balancing between node resources.
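The internal, node-level balancing can be sketched with a simple least-loaded policy: each request goes to the node that currently carries the least load. This is one possible policy chosen for illustration; the function names and the integer "cost" per request are assumptions.

```python
def pick_node(load_by_node: dict[str, int]) -> str:
    """Return the node that currently has the smallest load."""
    return min(load_by_node, key=load_by_node.get)


def assign(requests: list[tuple[str, int]], load_by_node: dict[str, int]) -> dict[str, str]:
    """Place each (request_id, cost) on the least-loaded node, updating loads."""
    placement = {}
    for request_id, cost in requests:
        node = pick_node(load_by_node)
        load_by_node[node] += cost
        placement[request_id] = node
    return placement


loads = {"n1": 5, "n2": 1, "n3": 3}  # hypothetical starting loads
placement = assign([("r1", 3), ("r2", 2), ("r3", 1)], loads)
```

After the three requests are placed, the loads in this example end up even across the nodes.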
II. Big Data Storage
1. Features and challenges of big data storage
Capacity issues, latency issues, security issues, cost issues, data accumulation, flexibility
2. Storage system architecture
Direct Attached Storage (DAS): storage devices are connected directly to the host system.
Applicable environment: (1) The geographical distribution of servers is very scattered, and it is difficult to interconnect through SAN or NAS
(2) The storage system must be directly connected to the application server
(3) Small network
Disadvantages: poor scalability, low resource utilization, poor manageability, severe heterogeneity
Network Attached Storage (NAS) uses a dedicated device connected directly to the network medium to store data.
The physical storage device of NAS requires a dedicated server and a dedicated operating system.
Advantages: (1) Plug and play
(2) The dedicated operating system supports different file systems, which enables file sharing between application servers running different operating systems
(3) The optimized file system on the dedicated server improves file access efficiency
(4) It is independent of the application server, so data can still be read even if the application server fails or stops working
Disadvantages: (1) The shared network mode makes network bandwidth a bottleneck for storage performance
(2) NAS access requires file system format conversion, so data can only be accessed at the file level, which is not suitable for block-level applications.
Storage Area Network (SAN)
A storage area network refers to a network in which storage devices are connected to each other and to server farms, creating networked storage.
Basic components: interfaces, connecting devices, and communication control protocols
SAN supported functions: data archiving and retrieval, backup and recovery, data migration between storage devices, disk mirroring, and data sharing between network servers, etc.
After the emergence of the iSCSI protocol, SANs were divided into FC SAN and IP SAN to distinguish the two.
Defects of FC SAN: poor compatibility, high cost, poor scalability.
IP SAN has the following advantages: high scalability, proven transmission equipment that ensures operational reliability, data centralization, low total cost of ownership, and remote data replication and disaster recovery.
3. Emerging database technologies
(1) NoSQL
NoSQL generally refers to non-relational databases.
Some techniques commonly used in NoSQL systems: simple data model, separation of metadata and application data, weak consistency
Advantages of NoSQL: avoids unnecessary complexity, high throughput, high horizontal scalability on clusters of low-end hardware, and no expensive object-relational mapping.
Disadvantages of NoSQL: the data model and query language have not been mathematically validated, ACID properties are generally not supported, functionality is simple, and there is no unified query model.
(2) NewSQL
NewSQL refers to a new class of relational database management systems. For OLTP (read-write) workloads, they seek to provide the same scalability as NoSQL systems while still maintaining features such as ACID and SQL.
NewSQL systems mainly fall into two categories: those that offer relational database products and services and bring the benefits of the relational model to a distributed architecture, and those that improve the performance of relational databases to the point where horizontal scaling is no longer a concern.