Study notes: distributed storage in an e-commerce system

If you need to store large numbers of files such as product pictures, online albums, or configuration files, deploying them alongside the application means they cannot be shared by multiple services in a distributed setup. Moreover, some application servers do not serve static resources well: Tomcat is better suited to dynamic resources, while Apache is better suited to static resource access.

Problems to be solved

1. Sharing across multiple services: build a dedicated image server

2. Scaling: the storage must support horizontal expansion

To solve these two problems we use a FastDFS cluster. FastDFS is an open-source, lightweight distributed file system. It manages files and provides file storage, file synchronization, and file access (upload and download), addressing the problems of mass storage and load balancing. It is especially suitable for online services built around files, such as photo album and video websites. One advantage is horizontal scalability: FastDFS distinguishes storage devices by group, so when storage space runs short, capacity can be expanded by adding groups (and the corresponding devices) horizontally, with no upper limit. Another advantage is high availability: when the nginx instance currently serving requests fails, the FastDFS cluster can automatically switch to another nginx device to keep the service stable.

FastDFS application

It mainly solves the problem of mass data storage and is especially suitable for online services dominated by small and medium files (recommended range: 4 KB < file_size < 500 MB).

The reason it suits small and medium files is that each volume (i.e. each group) keeps a full redundant copy of every file on each of its storage servers. If the stored files are too large, every storage server in the volume carries that redundancy and the space is quickly saturated.
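Because every server in a group holds a full replica, a group's usable capacity is bounded by its smallest member, while total cluster capacity grows with the number of groups. A minimal sketch (the group names and capacities below are made-up illustration values):

```python
# Hypothetical capacities in GB. Every server in a group stores a full
# copy of the group's files, so a group's usable capacity is limited by
# its smallest server; total capacity is the sum over groups.
cluster = {
    "group1": [500, 500],    # two mirrored storage servers
    "group2": [1000, 800],   # the 1000 GB server can only use 800 GB
}

group_capacity = {g: min(sizes) for g, sizes in cluster.items()}
total_capacity = sum(group_capacity.values())

print(group_capacity)   # {'group1': 500, 'group2': 800}
print(total_capacity)   # 1300
```

This is why adding a new group (rather than adding servers to an existing group) is what actually grows capacity.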

Architecture diagram


  1. The tracker server is responsible for scheduling and load balancing; storage servers report their status to it via periodic heartbeats
  2. The tracker server is itself a cluster to avoid a single point of failure. Tracker nodes are peers, so a request can be sent to any of them with no difference
  3. Files differ between volumes, while files within a volume are identical. Storage servers within a volume synchronize through a binlog, and their synchronization status is also reported to the tracker server for scheduling decisions

Upload file process:
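The upload flow can be sketched as a small in-memory simulation: the client asks a tracker for a writable storage server, the tracker picks a group, the client uploads to that storage, and the storage returns a file id of the form `group/path/filename`. The class and method names below are illustrative, not the real FastDFS API:

```python
import itertools

class Storage:
    """Simulated storage server; one instance stands in for a whole group."""
    def __init__(self, group_name: str):
        self.group_name = group_name
        self.files = {}
        self._seq = 0

    def upload(self, data: bytes, ext: str) -> str:
        self._seq += 1
        # Real FastDFS generates the path/filename from store path, time,
        # and server info; a counter is enough for the simulation.
        file_id = f"{self.group_name}/M00/00/00/{self._seq:06d}.{ext}"
        self.files[file_id] = data
        return file_id

class Tracker:
    """Simulated tracker: picks a group for each upload (round robin here)."""
    def __init__(self, groups):
        self._cycle = itertools.cycle(groups)

    def query_storage(self) -> Storage:
        return next(self._cycle)

tracker = Tracker([Storage("group1"), Storage("group2")])
storage = tracker.query_storage()
file_id = storage.upload(b"\x89PNG...", "png")
print(file_id)   # group1/M00/00/00/000001.png
```

The client keeps only the returned `file_id`; for download it asks a tracker which storage in that group can serve the file.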


Download file process:

Thoughts

1. High availability is achieved through redundant backups

2. A tracker layer is added to handle routing and scheduling

3. Redundant backup is still essentially a clustering idea; it does not feel like truly distributed file storage

Comparing with the Redis Cluster architecture, I found that the reason it does not feel distributed is that no hash operation is performed when choosing a volume. In Redis's 3-master/3-replica layout, a key is hashed with CRC16 and taken modulo 16384 to pick a slot, which spreads keys fairly evenly across the 3 master nodes. Because each master holds only part of the data, if one node goes down the data set is incomplete and the cluster cannot serve requests; therefore each master is paired with a replica node that stores a backup copy. That is the common 3-master/3-replica architecture.
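The Redis slot calculation described above is easy to reproduce. Redis Cluster uses CRC16 (the XMODEM variant) modulo 16384; the `node_for` helper below, which maps a slot range to one of 3 masters, is my own simplification of how contiguous slot ranges are assigned:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Every key maps to one of 16384 hash slots.
    return crc16(key.encode()) % 16384

def node_for(key: str, masters: int = 3) -> int:
    # Simplified: each master owns a contiguous range of ~5461 slots.
    return key_slot(key) * masters // 16384

for k in ["user:1", "user:2", "order:99"]:
    print(k, "slot", key_slot(k), "-> master", node_for(k))
```

This is exactly the "hash, then modulo" step that FastDFS's group selection does not perform.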


Looking at FastDFS again: load balancing can be performed between groups, but it is static load balancing. What, specifically, is static load balancing?

Load balancing:

Local traffic management technologies mainly include the following load balancing algorithms:

Static load balancing algorithms include: round robin, ratio, priority

Dynamic load balancing algorithms include: minimum number of connections, fastest response speed, observation method, prediction method, dynamic performance allocation, dynamic server replenishment, service quality, service type, rule mode
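The static/dynamic distinction above is that static algorithms ignore the servers' live state while dynamic ones consult it on every pick. A minimal sketch contrasting round robin (static) with least connections (dynamic); server names are placeholders:

```python
import itertools

servers = ["s1", "s2", "s3"]

# Static: round robin cycles through servers in a fixed order,
# regardless of how loaded each one currently is.
_rr = itertools.cycle(servers)
def round_robin() -> str:
    return next(_rr)

# Dynamic: least connections checks live state (active connection
# counts) before every pick.
active = {s: 0 for s in servers}
def least_connections() -> str:
    s = min(active, key=active.get)
    active[s] += 1   # a connection is opened on the chosen server
    return s

print([round_robin() for _ in range(4)])        # ['s1', 's2', 's3', 's1']
print([least_connections() for _ in range(3)])  # each server picked once
```

In a real balancer `active[s]` would be decremented when a connection closes, so a slow server accumulates connections and receives fewer new ones.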

So the question is: hashing by itself can achieve a dynamic, fairly uniform distribution, so hashing is also a load balancing method, right? Recall that nginx's load balancing methods include round-robin, weighted, ip_hash, fair, and url_hash; ip_hash performs a hash operation on the client's IP address.
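The idea behind ip_hash can be sketched in a few lines: hash the client address so the same client always lands on the same backend (session stickiness). Note this is only the general idea; nginx's real ip_hash works on the leading octets of the address, not MD5, and the backend addresses here are invented:

```python
import hashlib

backends = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

def pick_backend(client_ip: str) -> str:
    # MD5 gives a deterministic, well-mixed integer for any address,
    # so the same client IP always maps to the same backend.
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return backends[h % len(backends)]

print(pick_backend("192.168.0.7"))  # always the same backend for this IP
```

Because the choice depends only on the key (the IP), this is hashing used as a load balancing method, which is what the question above is getting at.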



A simple modulo operation can distribute numbers evenly into 3 buckets, achieving a balanced distribution. In practical applications, further optimization is required.
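The even split is easy to verify, and so is the main reason plain modulo needs optimization in practice: changing the bucket count remaps most keys, which is the problem techniques like consistent hashing are designed to soften.

```python
from collections import Counter

keys = range(30)

# n % 3 splits the numbers into 3 equal buckets.
counts = Counter(n % 3 for n in keys)
print(counts)   # Counter({0: 10, 1: 10, 2: 10})

# The catch: adding a 4th bucket moves most keys to a new bucket.
moved = sum(1 for n in keys if n % 3 != n % 4)
print(f"{moved} of {len(keys)} keys move when buckets go 3 -> 4")
```

With 30 keys, 21 of them change bucket when the count goes from 3 to 4, so a naive modulo scheme forces a large data migration on every expansion.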


