A multi-faceted comparison of the advantages and disadvantages of distributed storage versus traditional SAN and NAS


Building China's first interactive open source cloud computing community

Content focused on Linux, Kubernetes, OpenStack, containers, Ceph, Cloud Foundry...

Introduction


With the rapid development of the information industry, the volume of data generated across all industries has grown explosively. Global data volume has roughly followed Moore's Law, more than doubling every 18-24 months, and this trend has continued for half a century.


Taking medical data as an example: even setting aside the diversification and sophistication of medical techniques, the increasing precision of medical instruments alone causes medical data to grow by about 10% per year.


The data storage industry therefore faces unprecedented opportunities and challenges. When it comes to storing more data, faster and more reliably, traditional centralized storage and distributed storage each have their strengths.


Distributed storage undoubtedly offers longer life-cycle management across the whole storage solution, and enables smoother iteration of storage media.


It has significant advantages for long-term storage of massive data and for highly concurrent application scenarios. As software-defined storage technology matures, traditional centralized storage has come under unprecedented pressure, and a contest of budget versus value has followed. In the mid-to-low-end market, traditional storage vendors keep conceding on price and have gradually regained a price advantage there.


At present, in finance and other high-end storage scenarios, traditional storage still holds an overwhelming advantage. How old and new storage will ultimately fare in such scenarios remains to be verified by time.


The emergence of hyper-convergence has brought distributed storage back to the fore. In essence, hyper-convergence addresses the rational use of resources. For as long as storage and virtualization platforms remained separate, customers hoped for a single platform on which the entire compute resource pool and storage resource pool could be managed and controlled in a unified way, so that IT operations problems could be prevented before they occur, with a clear picture and everything under control.


Hyper-convergence undoubtedly resolves this pain point. The data storage industry will continue to develop rapidly, and the infusion of intelligence, automation and user-friendly design should keep pushing it toward a better future.


— Li Jinbo, Huayun Internet

Special commentator for the Open Source Cloud Chinese Community


Reprint information

The views in this article are reproduced from talkwithtrend.

@baimmi China UnionPay Co., Ltd.

Traditional SAN storage devices generally use a dual-controller architecture in which the two controllers back each other up, with two switches connecting them to the front-end servers. This dual-controller architecture has two main disadvantages:


1. Network bandwidth can easily become the bottleneck of the entire storage performance;

2. If one controller fails, system performance drops sharply, affecting normal use of the storage.


The limitations of traditional storage architectures are mainly reflected in the following aspects:


1. Poor horizontal scalability


Limited by the external service capability of the front-end controllers, adding more disks does not effectively improve a storage device's service capability. At the same time, the horizontal expansion capability of front-end controllers is very limited; the industry can at best scale out to a handful of controllers. The front-end controller therefore becomes the bottleneck of overall storage performance.


2. Management problems caused by differences between traditional storages of different manufacturers

Different manufacturers manage and operate their equipment differently. Because of constraints such as tightly coupled software and hardware and inconsistent management interfaces, unified management and flexible scheduling of resources are impossible, and storage utilization is low. The coexistence of different storage products thus hurts both convenience of use and utilization.


Distributed storage typically adopts a distributed system architecture: multiple storage servers share the storage load, and a location (metadata) service locates stored data. This not only improves the system's reliability, availability and access efficiency, but also makes it easy to expand and minimizes the instability introduced by general-purpose hardware. Its advantages are as follows:


1. High performance


A high-performance distributed storage system can usually manage read and write caches efficiently and supports automatic tiered storage. Distributed storage improves responsiveness by mapping data in hotspot areas to high-speed storage; once those areas are no longer hot, the system moves them back out of the high-speed tier. Write caching, in conjunction with high-speed media, can significantly improve overall performance: following a given policy, data is first written to the high-speed tier and then synchronized to disk at an appropriate time.
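The hotspot promotion and demotion described above can be sketched as a toy policy. Everything here (the class name, the access-count threshold, the two-tier model) is a hypothetical illustration of the idea, not the mechanism of any particular product:

```python
from collections import Counter

class TieredStore:
    """Toy sketch of hotspot-based tiering between a fast and a slow tier."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity  # how many blocks the fast tier holds
        self.fast_tier = set()              # block IDs currently on fast media
        self.access_counts = Counter()      # per-block access frequency

    def read(self, block_id, hot_threshold=3):
        """Serve a read, promoting a block once it becomes hot."""
        self.access_counts[block_id] += 1
        if (self.access_counts[block_id] >= hot_threshold
                and block_id not in self.fast_tier):
            if len(self.fast_tier) >= self.fast_capacity:
                # Demote the coldest resident block back to the slow tier.
                coldest = min(self.fast_tier, key=lambda b: self.access_counts[b])
                self.fast_tier.remove(coldest)
            self.fast_tier.add(block_id)
        return "fast" if block_id in self.fast_tier else "slow"
```

Real systems promote at coarser granularity (extents, not single reads) and decay the counters over time so that formerly hot data can cool down.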


2. Support tiered storage


Because its components are loosely coupled over the network, distributed storage allows high-speed and low-speed storage to be deployed separately or mixed in any proportion. In unpredictable business environments or agile applications, the advantages of tiered storage come fully into play. The biggest problem with current cache-based tiering is that when a read misses the performance pool, the granularity of data fetched from the cold pool is too large, causing high latency and overall performance jitter.


3. Consistency of multiple copies


Unlike traditional storage architectures, which use RAID to ensure data reliability, distributed storage uses a multi-copy backup mechanism. Before data is stored, it is sharded, and the shards are placed on cluster nodes according to certain rules. To keep the copies consistent, distributed storage usually adopts strong-consistency replication, in which a write completes only after every copy has been written while a read can be served from a single copy, and it uses mirroring, striping, distributed parity and other techniques to meet tenants' differing reliability requirements. When a read fails, the system recovers by reading from another copy and rewriting the failed one, keeping the total number of copies fixed; when data remains inconsistent for too long, the system automatically rebuilds and restores it. Tenants can set bandwidth limits on data recovery to minimize the impact on the business.
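The write-all/read-one discipline and the copy-repair behavior above can be illustrated with a minimal in-memory sketch. The class and method names are hypothetical; real systems do this across processes and handle partial failures far more carefully:

```python
class ReplicatedObject:
    """Sketch of strong-consistency replication: write all copies, read any one."""

    def __init__(self, n_replicas=3):
        self.replicas = [None] * n_replicas  # payload held by each replica

    def write(self, value):
        # A write is acknowledged only after every replica holds the value.
        for i in range(len(self.replicas)):
            self.replicas[i] = value
        return all(r == value for r in self.replicas)

    def read(self, replica_index=0):
        # Any single healthy copy can serve a read.
        return self.replicas[replica_index]

    def repair(self):
        # If a copy is lost, rebuild it from a healthy one so the total
        # number of copies stays fixed, as described above.
        good = next(r for r in self.replicas if r is not None)
        self.replicas = [r if r is not None else good for r in self.replicas]
```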


4. Disaster Recovery and Backup


An important method in distributed-storage disaster recovery is multi-time-point snapshot technology, which lets the production system save a version of its data at fixed time intervals. It is particularly worth noting that multi-time-point snapshots support extracting several time points simultaneously and restoring them simultaneously, which is very useful for locating the disaster point of many logical errors. If the user has multiple servers or virtual machines available for system recovery, several time points can be compared and analyzed to quickly determine which one should be restored, which reduces the difficulty of fault location and shortens the time it takes. This capability is also very useful for reproducing failures for analysis and research, to prevent the disaster from recurring. Together, multi-copy technology, data striping, multi-time-point snapshots and periodic incremental replication guarantee the high reliability of distributed storage.
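The side-by-side extraction and restore workflow can be sketched as follows. This is a hypothetical toy (copy-based, not copy-on-write as real snapshot implementations are), meant only to show how comparing two time points locates a logical error:

```python
import copy

class SnapshotStore:
    """Toy multi-time-point snapshot store for disaster recovery."""

    def __init__(self):
        self.data = {}
        self.snapshots = []  # (label, frozen copy), taken at intervals

    def snapshot(self, label):
        """Freeze the current state under a label."""
        self.snapshots.append((label, copy.deepcopy(self.data)))

    def extract(self, *labels):
        """Pull several time points at once for side-by-side comparison."""
        by_label = dict(self.snapshots)
        return {label: by_label[label] for label in labels}

    def restore(self, label):
        """Roll the live data back to the chosen time point."""
        self.data = copy.deepcopy(dict(self.snapshots)[label])
```

Comparing `extract("t1", "t2")` reveals which interval introduced the corruption, after which `restore` rolls back to the last good point.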


5. Elastic expansion


Thanks to a well-designed distributed architecture, distributed storage can scale computing, storage capacity and performance elastically and predictably. Its horizontal expansion has the following characteristics:


1) After a node is added, part of the existing data is automatically migrated to the new node to achieve load balancing and avoid single-point hotspots;

2) Horizontal expansion only needs to connect the new node and the original cluster to the same network, and the whole process will not affect the business;

3) When a node joins the cluster, the overall capacity and performance of the cluster expand linearly; the new node's resources are then taken over by the management platform and used for allocation or reclamation.
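One common way to get the rebalancing behavior in point 1) — only part of the data moves when a node joins — is consistent hashing. The sketch below is a generic illustration under that assumption, not the placement algorithm of any specific product:

```python
import hashlib
from bisect import bisect_right

class HashRing:
    """Consistent hashing: adding a node migrates only a fraction of the keys."""

    def __init__(self, nodes):
        self.ring = sorted((self._h(n), n) for n in nodes)

    @staticmethod
    def _h(key):
        # Stable hash placing keys and nodes on the same ring.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # A key belongs to the first node clockwise from its hash.
        hashes = [h for h, _ in self.ring]
        i = bisect_right(hashes, self._h(key)) % len(self.ring)
        return self.ring[i][1]

    def add_node(self, node):
        self.ring = sorted(self.ring + [(self._h(node), node)])
```

After `add_node`, the only keys whose placement changes are those now owned by the new node, so expansion does not reshuffle the whole cluster.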


6. Storage system standardization


With the development of distributed storage, standardization of the storage industry is also advancing. Distributed storage preferentially uses industry-standard interfaces (SMI-S or OpenStack Cinder) for storage access. At the platform level, heterogeneous storage resources are abstracted so that traditional device-level operations are encapsulated as resource-oriented operations. This simplifies operating heterogeneous storage infrastructure, enables centralized management of storage resources, and automates the whole storage life cycle: create, modify, reclaim and so on. Building on heterogeneous storage integration, users can implement disaster recovery across different brands and media, for example using mid- or low-end arrays as disaster recovery for high-end arrays, or ordinary disk arrays as disaster recovery for flash arrays, which indirectly reduces storage procurement and management costs.
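The abstraction described above — wrapping device-level operations as uniform, resource-oriented calls — can be sketched with a driver interface. This is loosely inspired by the driver-plugin idea behind OpenStack Cinder, but all names here are hypothetical, not Cinder's actual API:

```python
from abc import ABC, abstractmethod

class VolumeDriver(ABC):
    """Uniform, resource-oriented contract every backend must implement."""

    @abstractmethod
    def create_volume(self, name, size_gb): ...

    @abstractmethod
    def delete_volume(self, name): ...

class VendorAArray(VolumeDriver):
    """One vendor's backend, hiding its device-level commands."""

    def __init__(self):
        self.volumes = {}

    def create_volume(self, name, size_gb):
        self.volumes[name] = size_gb  # stands in for vendor-specific calls

    def delete_volume(self, name):
        del self.volumes[name]

class StoragePlatform:
    """Centralized management layer over heterogeneous backends."""

    def __init__(self, drivers):
        self.drivers = drivers  # backend name -> VolumeDriver

    def create(self, backend, name, size_gb):
        self.drivers[backend].create_volume(name, size_gb)
```

Because every backend satisfies the same contract, the platform can schedule, replicate or reclaim volumes across brands without caring which array sits underneath.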


@Liu Dong Neusoft Group

Compared with traditional SAN and NAS, distributed storage has the following advantages:


1. Performance


Once distributed storage reaches a certain scale, its performance exceeds that of traditional SAN and NAS: a large number of disks and nodes, combined with an appropriate data distribution strategy, yields very high aggregate bandwidth. Traditional SAN and NAS hit performance ceilings; once maximum expansion capacity is reached, performance stops improving and may even decline.


2. Price: Traditional SAN and NAS are expensive. SAN network equipment in particular, with its Fibre Channel network, costs a lot, and future expansion requires additional expansion cabinets, driving the cost even higher. Distributed storage needs only an IP network and a few x86 servers with internal hard disks, so the initial cost is low. Expansion is also very convenient: just add a server.


3. Sustainability: Traditional SAN and NAS have limited expansion capability; a single controller head can drive at most a few hundred disks. For shared storage beyond the petabyte scale, distributed storage is effectively the only good choice, and scalability ceases to be a worry.


Disadvantages:


1. It demands relatively strong technical, operations and even development capabilities from users. Traditional storage works out of the box: the hardware comes from the manufacturer, with complete documentation and services. Many distributed systems are open source, or are open-source systems with commercial support from certain companies; versions iterate quickly, and you may need to solve problems yourself.


2. Data consistency. For application scenarios with high consistency requirements, such as Oracle RAC, the performance of distributed storage may be slightly weaker, because synchronizing data across a distributed structure is a hard problem. It is not as dependable as the data storage methods of traditional storage devices.


3. Stability. Distributed storage depends heavily on the network environment and bandwidth. Network jitter or failure can affect the operation of the system; an IP conflict, for example, may render the entire distributed storage inaccessible. Traditional storage generally uses a dedicated SAN or IP network and is more reliable in terms of stability.


@Garyy Continental Insurance

The hyper-converged architecture has developed rapidly because it offers significant advantages and delivers very high customer value. It unifies the management and scheduling of computing, storage, network and other resources, scales out more elastically, and brings the data center optimal efficiency, flexibility, scale, cost and data protection. An integrated computing-and-storage hyper-converged platform replaces the traditional server-plus-centralized-storage architecture, making the whole architecture clearer and simpler and greatly simplifying the design of complex IT systems.


From the user's point of view, the reasons for choosing a hyper-converged architecture are often in the following aspects:


(1) Performance


The demands of business scale, data availability, business continuity and performance are growing rapidly, and traditional IT architectures either cannot meet them or cost too much to do so. A hyper-converged architecture can easily reach hundreds of thousands of IOPS; with all-flash hyper-convergence, performance far exceeds that of an ordinary SAN array.


(2) Cost


At the same level of performance, a traditional IT architecture costs too much. Cost is not hyper-convergence's biggest advantage, but it still saves investment compared with traditional solutions.


(3) Reuse of existing hardware


Reusing old hardware is not what hyper-convergence is mainly for, but it is a real need. Hyper-convergence supports common standard x86 server hardware and can therefore be deployed on existing servers, protecting prior investment.


