GFS distributed storage platform

GFS overview

GFS, full name Gluster File System (GlusterFS), is an open-source distributed file system. It is the core of scale-out storage solutions and can handle thousands of clients. Compared with traditional solutions, GlusterFS can flexibly combine physical, virtual, and cloud resources to deliver highly available, enterprise-grade performance storage.
Composed of storage servers (Brick Servers), clients, and NFS/Samba storage gateways
No metadata server

GlusterFS features

  • Scalability and high performance
  • High availability
  • Global unified namespace
  • Flexible volume management
  • Based on standard protocols

Commonly used terms
Brick: a storage unit in GFS. It is an export directory of a server in the trusted storage pool, identified by host name and directory name, such as 'SERVER:EXPORT'
Volume: a volume, the logical collection of bricks
FUSE: Filesystem in Userspace, a loadable kernel module that lets unprivileged users create their own file systems without modifying kernel code. The file system code runs in user space, and the FUSE module bridges it to the kernel.
VFS: virtual file system
Glusterd: the Gluster management daemon; it runs on all servers in the trusted storage pool.

The structure of GFS
Modular, stacked architecture
Complex functionality is achieved by combining modules

Elastic HASH algorithm
A hash of the file name yields a 32-bit integer, and the 32-bit space is divided into N contiguous subranges, each corresponding to a brick.
The advantage of the elastic HASH algorithm is that data is distributed evenly across the bricks and there is no dependence on a metadata server, which in turn eliminates the single point of failure and access bottleneck such a server would create.
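
The hash subrange assigned to each brick is recorded as an extended attribute on the brick-side directory itself. As a quick way to see the mechanism at work, something like the following can be run on a brick server (a sketch; it assumes the attr package is installed, and somedir is a placeholder for any directory that exists inside the volume):

getfattr -n trusted.glusterfs.dht -e hex /data/sdb1/somedir	## the last two 32-bit words of the value are the start and end of this brick's hash range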

Common GFS Volume Types
Distributed Volume
Striped Volume
Replicated Volume
Distributed Striped Volume
Distributed Replicated Volume
Striped Replicated Volume
Distributed Striped Replicated Volume


Introduction to common GFS volume types

Distributed volume

Files are not divided into blocks; the hash value is stored in extended file attributes.
The supported underlying file systems include ext3, ext4, ZFS, XFS, etc.

Features

  • Files are distributed on different servers, no redundancy
  • Expand volume size more easily and cheaply
  • Single point of failure can cause data loss
  • Rely on underlying data protection

Create command
gluster volume create dis-volume server1:/dir1 server2:/dir2
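
Whatever the type, a newly created volume must be started before clients can use it; the general pattern looks like this (a sketch using the placeholder names from the create command above):

gluster volume start dis-volume	## start the volume on the servers
mount -t glusterfs server1:dis-volume /mnt	## mount it on a client over FUSE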

Striped volume

Each file is divided into N blocks by offset (N is the number of stripe nodes) and stored round-robin across the Brick Server nodes.
Performance is particularly good when storing large files.
No redundancy; similar to RAID 0

Features

  • Data is divided into smaller pieces and distributed across the stripes in the brick server pool
  • Distribution reduces the load, and the smaller pieces speed up access
  • No data redundancy

Create command
gluster volume create stripe-volume stripe 2 transport tcp server1:/dir1 server2:/dir2

Replicated volume

One or more replicas of the same file are kept, so disk utilization is low in replica mode.
If the storage space on the nodes is inconsistent, the capacity of the smallest node becomes the total capacity of the volume (the barrel effect): for example, a replica 2 volume built on a 20 GB brick and a 10 GB brick only offers 10 GB.

Features

  • All servers in the volume keep a complete copy
  • The number of replicas in the volume is determined at creation time
  • At least two brick servers are required
  • With redundancy

Create command
gluster volume create rep-volume replica 2 transport tcp server1:/dir1 server2:/dir2

Distributed stripe volume

Combines the functions of distributed volumes and striped volumes
Mainly used for large-file access
At least 4 servers are required

Create command
gluster volume create dis-stripe stripe 2 transport tcp server1:/dir1 server2:/dir2 server3:/dir3 server4:/dir4

Distributed replicated volume

Combines the functions of distributed volumes and replicated volumes;
used when redundancy is required

Create command
gluster volume create dis-rep replica 2 transport tcp server1:/dir1 server2:/dir2 server3:/dir3 server4:/dir4

How GFS works

  1. Clients or applications access data through a GlusterFS mount point
  2. The Linux kernel receives the request and passes it to the VFS
  3. The VFS hands the request to the FUSE kernel module, which forwards it to the GlusterFS client process through the /dev/fuse device file
  4. The GlusterFS client processes the request according to its volume configuration
  5. The data is sent over the network to the remote GlusterFS servers and written to the server storage devices
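
On the client side this entire path is set up by a single FUSE mount; every read and write then flows through the VFS and /dev/fuse into the glusterfs client process. A minimal sketch (using the host and volume names from the lab below):

modprobe fuse	## make sure the FUSE kernel module is loaded
mount -t glusterfs jd1:fbj /gfs/fbj	## after this, I/O follows steps 1-5 above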

Enough theory. Showtime!

Introduction to the experimental environment

The lab uses VMware virtual machines.
There are 4 node hosts, each with 4 extra hard disks (default settings, no changes needed).
The IP addresses and host names of the 4 nodes are:
20.0.0.7, host name jd1
20.0.0.4, host name jd2
20.0.0.5, host name jd3
20.0.0.6, host name jd4

There is one client, with IP address 20.0.0.8 and host name client01

Let the show begin!

First, check the hard disks on all four nodes.
You can see the four new disks sdb, sdc, sdd, and sde (this is a lab environment; on a production network each server may have a different number of disks, so read the disk names carefully)

[root@jd1 ~]# fdisk -l
……part of the output omitted; only the newly added disks are shown……
Disk /dev/sdd: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/sdb: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/sdc: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/sde: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Hard disk partition

[root@jd1 ~]# mkdir /script
[root@jd1 ~]# cd /script/
[root@jd1 script]# vim disk.sh	## disk partitioning script
#!/bin/bash
echo "the disks exist list:"
fdisk -l | grep 'Disk /dev/sd[a-z]'
echo "==================================="
PS3="choose which disk you want to create:"
select VAR in `ls /dev/sd*|grep -o 'sd[b-z]'|uniq` quit
do
        case $VAR in
        sda)
                fdisk -l /dev/sda
                break ;;
        sd[b-z])
                # create one primary partition, accepting all the defaults
                echo "n
p



w" | fdisk /dev/$VAR
                # make the filesystem
                mkfs.xfs -i size=512 /dev/${VAR}1 &> /dev/null
                # mount the filesystem
                mkdir -p /data/${VAR}1 &> /dev/null
                echo "/dev/${VAR}1 /data/${VAR}1 xfs defaults 0 0" >> /etc/fstab
                mount -a &> /dev/null
                break ;;
        quit)
                break ;;
        *)
                echo "wrong disk, please check again";;
        esac
done
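
Piping keystrokes into interactive fdisk is fragile; as a non-interactive alternative, the same result can be had per disk with parted (a sketch, not from the original walkthrough, assuming parted is installed):

parted -s /dev/sdb mklabel msdos mkpart primary xfs 0% 100%	## new label plus one full-size partition
mkfs.xfs -i size=512 /dev/sdb1	## same filesystem options as the script uses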

[root@jd1 script]# chmod +x disk.sh 
[root@jd1 script]# ./disk.sh 	## run the partitioning script; it is shown once here, but it has to be run once for each of the four disks on every host
the disks exist list:
Disk /dev/sdd: 21.5 GB, 21474836480 bytes, 41943040 sectors
Disk /dev/sdb: 21.5 GB, 21474836480 bytes, 41943040 sectors
Disk /dev/sdc: 21.5 GB, 21474836480 bytes, 41943040 sectors
Disk /dev/sda: 214.7 GB, 214748364800 bytes, 419430400 sectors
Disk /dev/sde: 21.5 GB, 21474836480 bytes, 41943040 sectors
===================================
1) sdb
2) sdc
3) sdd
4) sde
5) quit
choose which disk you want to create:1
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table
Building a new DOS disklabel with disk identifier 0x9aeb95b2.

Command (m for help): Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): Partition number (1-4, default 1): First sector (2048-41943039, default 2048): Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-41943039, default 41943039): Using default value 41943039
Partition 1 of type Linux and of size 20 GiB is set

Command (m for help): The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

After running the script for all four disks, check whether partitioning succeeded
[root@jd1 script]# ll /dev/ |grep sd
brw-rw----. 1 root disk      8,   0 Sep 13 19:19 sda
brw-rw----. 1 root disk      8,   1 Sep 13 19:19 sda1
brw-rw----. 1 root disk      8,   2 Sep 13 19:19 sda2
brw-rw----. 1 root disk      8,  16 Sep 13 19:31 sdb
brw-rw----. 1 root disk      8,  17 Sep 13 19:31 sdb1
brw-rw----. 1 root disk      8,  32 Sep 13 19:33 sdc
brw-rw----. 1 root disk      8,  33 Sep 13 19:33 sdc1
brw-rw----. 1 root disk      8,  48 Sep 13 19:33 sdd
brw-rw----. 1 root disk      8,  49 Sep 13 19:33 sdd1
brw-rw----. 1 root disk      8,  64 Sep 13 19:33 sde
brw-rw----. 1 root disk      8,  65 Sep 13 19:33 sde1

[root@jd1 script]# df -Th
Filesystem              Type      Size  Used Avail Use% Mounted on
……part of the output omitted……
/dev/sdb1               xfs        20G   33M   20G    1% /data/sdb1
/dev/sdc1               xfs        20G   33M   20G    1% /data/sdc1
/dev/sdd1               xfs        20G   33M   20G    1% /data/sdd1
/dev/sde1               xfs        20G   33M   20G    1% /data/sde1

Set up GFS

[root@jd1 script]# setenforce 0	## put SELinux in permissive mode
[root@jd1 script]# iptables -F	## flush firewall rules
[root@jd1 script]# vim /etc/hosts	## local name resolution file; needed on all four nodes
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
20.0.0.7 jd1
20.0.0.4 jd2
20.0.0.5 jd3
20.0.0.6 jd4
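
To confirm that name resolution works everywhere, a quick loop like this can be run on each node (my own check, not part of the original walkthrough):

for n in jd1 jd2 jd3 jd4; do ping -c 1 $n; done	## every name should resolve and answer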

Install the GlusterFS software (the client must install it too)

[root@jd1 ~]# yum -y install centos-release-gluster6	## install the Gluster repository package
[root@jd1 ~]# yum -y install glusterfs glusterfs-server glusterfs-fuse glusterfs-rdma
[root@jd1 ~]# systemctl start glusterd	## start the service
[root@jd1 ~]# systemctl enable glusterd	## enable it at boot
Created symlink from /etc/systemd/system/multi-user.target.wants/glusterd.service to /usr/lib/systemd/system/glusterd.service.
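
Before going further it is worth confirming the daemon is actually running on every node (a sketch; the expected output is omitted):

systemctl is-active glusterd	## should print "active"
glusterfs --version	## confirm the installed version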

Add the other nodes to the trusted storage pool (run from any one of the four hosts)

[root@jd1 ~]# gluster peer probe jd1
peer probe: success. Probe on localhost not needed
[root@jd1 ~]# gluster peer probe jd2
peer probe: success. 
[root@jd1 ~]# gluster peer probe jd3
peer probe: success. 
[root@jd1 ~]# gluster peer probe jd4
peer probe: success. 
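
Pool membership can then be checked from any node (a sketch; on jd1 this should list the three other peers):

gluster peer status	## each peer should show State: Peer in Cluster (Connected)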

Create distributed volume

[root@jd1 ~]# gluster volume create fbj jd1:/data/sdb1 jd2:/data/sdb1 force
volume create: fbj: success: please start the volume to access data
[root@jd1 ~]# gluster volume start fbj
volume start: fbj: success
[root@jd1 ~]# gluster volume info fbj
 
Volume Name: fbj
Type: Distribute
Volume ID: f71bf71f-90af-41a1-a89d-d2f3ca9e9d7e
Status: Started
Snapshot Count: 0
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: jd1:/data/sdb1
Brick2: jd2:/data/sdb1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on

Client verification
[root@client ~]# mkdir /gfs
[root@client ~]# cd /gfs
[root@client gfs]# mkdir fbj fzj fbtdj fbfzj tdj	## one mount point per planned volume
[root@client gfs]# mount -t glusterfs jd1:fbj /gfs/fbj	## mount the distributed volume
[root@client gfs]# cd fbj
[root@client fbj]# touch 1.txt 2.txt 3.txt

Look in /data/sdb1 on the two hosts; placement is decided by the hash, so the files showing up on either one of the two means it worked
[root@jd2 sdb1]# ls
1.txt  2.txt  3.txt

Create a striped volume

Nothing to see here: newer GlusterFS releases (after 6.10) no longer support striped volumes, as the failed attempt further down confirms

Create a replicated volume

[root@jd1 sdb1]# gluster volume create fzj replica 2 jd1:/data/sdc1 jd2:/data/sdc1 force
volume create: fzj: success: please start the volume to access data
[root@jd1 sdb1]# gluster volume start fzj
volume start: fzj: success
[root@jd1 sdb1]# gluster volume info fzj
 
Volume Name: fzj
Type: Replicate
Volume ID: 99a14b64-1a02-4a97-bf7d-3604475300fe
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: jd1:/data/sdc1
Brick2: jd2:/data/sdc1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

On the client
[root@client fbj]# mount -t glusterfs jd1:fzj /gfs/fzj
[root@client fbj]# cd ..
[root@client gfs]# cd fzj
[root@client fzj]# touch 1.txt 2.txt 6.txt

Verify: both hosts have the files
[root@jd1 sdc1]# ls
1.txt  2.txt  6.txt

[root@jd2 sdc1]# ls
1.txt  2.txt  6.txt

Create a distributed striped volume

As expected: if striped volumes can no longer be created, the distributed striped variant cannot be created either!

[root@jd1 sdc1]# gluster volume create fbtdj stripe 2 jd1:/data/sdd1 jd2:/data/sdd1 jd3:/data/sdd1 jd4:/data/sdd1 force
stripe option not supported
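
In current releases, dispersed (erasure-coded) volumes are the intended replacement for striping when spreading large files across bricks. A sketch of an equivalent volume, using the otherwise unused sde1 bricks (my own example, not run in the original post; it stores data plus parity and tolerates the loss of one brick):

gluster volume create ec-vol disperse 3 redundancy 1 jd1:/data/sde1 jd2:/data/sde1 jd3:/data/sde1 force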

Distributed replicated volume

[root@jd1 sdc1]# gluster volume create fbfzj replica 2 jd1:/data/sdd1 jd2:/data/sdd1 jd3:/data/sdd1 jd4:/data/sdd1 force
volume create: fbfzj: success: please start the volume to access data
[root@jd1 sdc1]# gluster volume start fbfzj
volume start: fbfzj: success
[root@jd1 sdc1]# gluster volume info fbfzj
 
Volume Name: fbfzj
Type: Distributed-Replicate
Volume ID: 6e0725be-204f-41d5-b63a-10beb4817953
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: jd1:/data/sdd1
Brick2: jd2:/data/sdd1
Brick3: jd3:/data/sdd1
Brick4: jd4:/data/sdd1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off


Client verification
[root@client gfs]# mount -t glusterfs jd1:fbfzj /gfs/fbfzj/
[root@client gfs]# cd ..
[root@client /]# cd /gfs/fbfzj/
[root@client fbfzj]# touch 123.txt 234.txt

Look for the files on the hosts
[root@jd1 sdd1]# ls
234.txt
[root@jd2 data]# cd sdd1/
[root@jd2 sdd1]# ls
234.txt
[root@jd3 script]# cd /data/sdd1/
[root@jd3 sdd1]# ls
123.txt
[root@jd4 ~]# cd /data/sdd1/
[root@jd4 sdd1]# ls
123.txt

Destructive testing

Stop jd2's network
[root@jd2 sdd1]# systemctl stop network

View the replicated volume

[root@client gfs]# cd fzj
[root@client fzj]# ls
1.txt  2.txt  6.txt
The replicated volume's files are still there

View the distributed volume

[root@client gfs]# cd fbj
[root@client fbj]# ls
[root@client fbj]# 
The files are gone (they had been hashed onto jd2's brick)

View the distributed replicated volume

[root@client gfs]# cd fbfzj/
[root@client fbfzj]# ls
123.txt  234.txt
The data is still there

Experiment summary

As the tests above show, volumes that keep replicas of the data are comparatively safe: the distributed volume lost its files when a node went down, while the replicated and distributed replicated volumes did not.
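
Once jd2's network comes back, replicated volumes re-synchronize automatically in the background; the self-heal progress can be watched with the command below (a sketch, not run in the original test):

gluster volume heal fzj info	## lists entries still waiting to be healed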

Other common GFS commands

List GlusterFS volumes:      gluster volume list
Show info on all volumes:    gluster volume info
Show status of all volumes:  gluster volume status
Stop/delete a volume:        gluster volume stop <volname>	## stop a volume
                             gluster volume delete <volname>	## delete a volume
Set volume access control:   gluster volume set dis-rep auth.allow 192.168.32.*	## allow every IP in the 192.168.32.0 segment to access the dis-rep volume (a distributed replicated volume)
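
Two more commands worth knowing when a volume needs to grow (a sketch; <volname> and the brick path are placeholders, and on replicated volumes bricks must be added in multiples of the replica count):

gluster volume add-brick <volname> jd3:/data/sdb1 force	## add a brick to an existing volume
gluster volume rebalance <volname> start	## spread existing files across the new layout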
