SeaweedFS deployment and usage (Hadoop compatible)

Copyright: if you reprint this article, please link to https://blog.csdn.net/DPnice/article/details/84990050

Software versions:

Software    Version         Archive name
seaweedfs   seaweedfs-1.11  linux_amd64.tar.gz

GitHub:

https://github.com/chrislusf/seaweedfs

Terminology:

Term        Description
master      Provides the volume => location lookup service and assigns file id sequence numbers.
Node        Abstraction of a node in the system; DataCenter and Rack are both Nodes.
DataNode    A storage node that manages and stores logical volumes.
DataCenter  A data center, corresponding to a physical machine room; a data center may contain multiple racks.
Rack        A rack, corresponding to a physical cabinet; a rack belongs to a particular data center.
Volume      A logical volume, the unit of storage; a logical volume stores Needles, and a VolumeServer contains one Store.
Needle      An object in a logical volume, corresponding to a stored file; a Needle's size is currently limited to 4GB.
Collection  A set of files that may span multiple logical volumes; if no collection is specified when a file is stored, the default collection "" is used.
Filer       The file manager; the Filer uploads data to the weed Volume Servers, splits large files into chunks, and stores the chunk metadata in the Filer's metadata store.
Mount       A FUSE mount in user space; when mount is used together with a filer, only file metadata is retrieved from the filer, and actual file content is read directly between mount and the volume servers, so multiple filers are not required.
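
A file id (fid) such as 3,016beb339d encodes where a Needle lives: the part before the comma is the logical volume id, and the hex string after it identifies the Needle within that volume (needle key plus cookie). A small shell sketch of splitting one:

# Split a fid into its volume id and needle portion.
fid="3,01637037d6"
volume_id="${fid%%,*}"   # "3" -> which logical volume holds the file
needle_key="${fid#*,}"   # "01637037d6" -> needle key + cookie within the volume
echo "volume=$volume_id needle=$needle_key"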

Use $ ./weed -h to list the available commands and their descriptions;
use $ ./weed [command] -h to view a specific command's parameters.

Deployment planning:

node   master  volume  filer
cdh1   yes     yes     yes
cdh2   yes     yes     -
cdh3   yes     yes     yes

Extract the archive:

$ tar -zxvf ./linux_amd64.tar.gz
This produces the weed binary.

Startup commands:

Create the directories:

$ mkdir seaweedfd_master
$ mkdir seaweedfd_data

Start the master servers:

$ ./weed master -ip cdh1 -maxCpu 1 -mdir ./seaweedfd_master -peers cdh1:9333,cdh2:9333,cdh3:9333 -port 9333 -pulseSeconds 5 -defaultReplication 001
$ ./weed master -ip cdh2 -maxCpu 1 -mdir ./seaweedfd_master -peers cdh1:9333,cdh2:9333,cdh3:9333 -port 9333 -pulseSeconds 5 -defaultReplication 001
$ ./weed master -ip cdh3 -maxCpu 1 -mdir ./seaweedfd_master -peers cdh1:9333,cdh2:9333,cdh3:9333 -port 9333 -pulseSeconds 5 -defaultReplication 001

To avoid split brain, only an odd number of masters is supported!

Run in the background: $ nohup ./weed master -ip cdh3 -maxCpu 1 -mdir ./seaweedfd_master -peers cdh1:9333,cdh2:9333,cdh3:9333 -port 9333 -pulseSeconds 5 -defaultReplication 001 > weed_master.out &

For the cluster to keep providing service, at least two of the three masters must be alive (a Raft majority).
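
To check that the cluster has elected a leader, you can query the /cluster/status endpoint (used again later in this article) on each master; a quick sketch:

# Ask every master for its view of the cluster and its leader.
for host in cdh1 cdh2 cdh3; do
  echo "== $host =="
  curl -s "http://$host:9333/cluster/status?pretty=y"
done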

Start the volume servers:

$ ./weed volume -dataCenter dc1 -dir ./seaweedfd_data -ip cdh1 -ip.bind cdh1 -maxCpu 1 -mserver cdh1:9333,cdh2:9333,cdh3:9333 -port 9222 -port.public 9222 -publicUrl cdh1 -rack rack1
$ ./weed volume -dataCenter dc1 -dir ./seaweedfd_data -ip cdh2 -ip.bind cdh2 -maxCpu 1 -mserver cdh1:9333,cdh2:9333,cdh3:9333 -port 9222 -port.public 9222 -publicUrl cdh2 -rack rack1
$ ./weed volume -dataCenter dc1 -dir ./seaweedfd_data -ip cdh3 -ip.bind cdh3 -maxCpu 1 -mserver cdh1:9333,cdh2:9333,cdh3:9333 -port 9222 -port.public 9222 -publicUrl cdh3 -rack rack1

dataCenter: the data center name
rack: the rack name
Run in the background: $ nohup ./weed volume -dataCenter dc1 -dir ./seaweedfd_data -ip cdh1 -ip.bind cdh1 -maxCpu 1 -max 200 -mserver cdh1:9333,cdh2:9333,cdh3:9333 -port 9222 -port.public 9222 -publicUrl cdh1 -rack rack1 > weed_volume.out &
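
Since the start command differs between nodes only by hostname, a small wrapper script can be used on each node instead of typing the command three times (a sketch; it assumes each node's hostname resolves to cdh1/cdh2/cdh3 as planned above):

#!/bin/bash
# start-volume.sh: start a volume server bound to this node's hostname.
HOST=$(hostname)
nohup ./weed volume -dataCenter dc1 -dir ./seaweedfd_data \
  -ip "$HOST" -ip.bind "$HOST" -maxCpu 1 -max 200 \
  -mserver cdh1:9333,cdh2:9333,cdh3:9333 \
  -port 9222 -port.public 9222 -publicUrl "$HOST" -rack rack1 \
  > weed_volume.out 2>&1 &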

Access the master webUI:

http://cdh3:9333/

Upload a directory of files from the command line:

$ ./weed upload -dataCenter dc1 -master=cdh3:9333 -dir="./dir/"

Assign a file key:

# Basic usage:
$ curl http://cdh1:9333/dir/assign
# Specify a replication type:
$ curl "http://cdh1:9333/dir/assign?replication=001"
# Reserve several file ids at once:
$ curl "http://cdh1:9333/dir/assign?count=5"
# Specify a data center:
$ curl "http://cdh1:9333/dir/assign?dataCenter=dc1"
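
These parameters can be combined in a single request, for example:

# Reserve 5 file ids in data center dc1 with replication 001.
$ curl "http://cdh1:9333/dir/assign?dataCenter=dc1&replication=001&count=5"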

Upload file example:

# Get a file key
$ curl "http://cdh1:9333/dir/assign?dataCenter=dc1"
# JSON response
{"fid":"2,016beb339d","url":"cdh2:9222","publicUrl":"cdh2","count":1}
# Upload a file to the assigned fid
$ curl -F file=@./file http://cdh2:9222/2,016beb339d
# JSON response
{"name":"file","size":41629428}

Get the file:

$ curl http://cdh2:9222/2,016beb339d
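
The assign/upload/read cycle is easy to script end to end. Below is a sketch that parses the assignment response with jq (an assumption: jq is installed; any JSON parser would do):

#!/bin/bash
# upload.sh: assign a fid, upload a local file to it, then read it back.
set -e
FILE=${1:?usage: upload.sh <file>}

# Ask a master for a file id and the volume server URL to upload to.
ASSIGN=$(curl -s "http://cdh1:9333/dir/assign?dataCenter=dc1")
FID=$(echo "$ASSIGN" | jq -r .fid)
URL=$(echo "$ASSIGN" | jq -r .url)

# Upload the file to the assigned volume server, then fetch it back.
curl -s -F "file=@$FILE" "http://$URL/$FID"
curl -s -o /dev/null -w "read back: HTTP %{http_code}\n" "http://$URL/$FID"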

Configure and start the filer:

# Print the default filer.toml configuration
$ ./weed scaffold filer

LevelDB is the default filer store.

# Generate the configuration file
$ ./weed scaffold -config filer -output="."
# Example: use postgres as the metadata store
# Create the table
=========================================
CREATE TABLE IF NOT EXISTS filemeta (
dirhash     BIGINT,
name        VARCHAR(1000),
directory   VARCHAR(4096),
meta        bytea,
PRIMARY KEY (dirhash, name)
);
=========================================
# Configure the [postgres] section in filer.toml
$ vi filer.toml
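
For reference, the [postgres] section looks roughly like the following sketch (key names should be checked against the scaffold output above; the credentials are placeholders):

# Append a [postgres] section to filer.toml (values are placeholders).
$ cat >> filer.toml <<'EOF'
[postgres]
enabled = true
hostname = "localhost"
port = 5432
username = "postgres"
password = "secret"
database = "seaweedfs"
sslmode = "disable"
connection_max_idle = 100
connection_max_open = 100
EOF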

Start the filer:

$ ./weed filer -master cdh1:9333,cdh2:9333,cdh3:9333 -port 8888 -port.public 8889

Run in the background: $ nohup ./weed filer -master cdh1:9333,cdh2:9333,cdh3:9333 -port 8888 > weed_filer.out &

It is recommended to run more than one filer; multiple filers can share the same database.

Upload a file:

$ curl -F "[email protected]" "http://cdh1:8888/path/to/sources/"
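
The uploaded file can then be read back through the filer at the same path:

$ curl "http://cdh1:8888/path/to/sources/test.txt"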

Access the filer webUI:

http://cdh1:8888/

Compatible with Hadoop:

# Download the latest version from Maven Central
https://mvnrepository.com/artifact/com.github.chrislusf/seaweedfs-hadoop-client

# Make sure the mapred-site.xml file exists

# Test ls
==================================================================================
../../bin/hdfs dfs -Dfs.defaultFS=seaweedfs://cdh1:8888 \
           -Dfs.seaweedfs.impl=seaweed.hdfs.SeaweedFileSystem \
           -libjars ./seaweedfs-hadoop-client-1.0.2.jar \
           -ls /
# Output
Found 2 items
drwxrwx---   -          0 2018-12-13 10:29 /path
drwxrwx---   -          0 2018-12-13 14:17 /weed

# Test uploading a file
==================================================================================
../../bin/hdfs dfs -Dfs.defaultFS=seaweedfs://cdh1:8888 \
           -Dfs.seaweedfs.impl=seaweed.hdfs.SeaweedFileSystem \
           -libjars ./seaweedfs-hadoop-client-1.0.2.jar \
           -put ./slaves /

# Test downloading a directory
==================================================================================
../../bin/hdfs dfs -Dfs.defaultFS=seaweedfs://cdh1:8888 \
           -Dfs.seaweedfs.impl=seaweed.hdfs.SeaweedFileSystem \
           -libjars ./seaweedfs-hadoop-client-1.0.2.jar \
           -get /path

Configure Hadoop:

$ vi core-site.xml
<property>
    <name>fs.seaweedfs.impl</name>
    <value>seaweed.hdfs.SeaweedFileSystem</value>
</property>
<!-- optional: use the seaweedfs filer address as the default filesystem -->
<property>
    <name>fs.defaultFS</name>
    <value>seaweedfs://cdh1:8888</value>
</property>
# Install the SeaweedFS HDFS client jar on the Hadoop classpath
$ bin/hadoop classpath
$ cp ./seaweedfs-hadoop-client-1.0.2.jar /hadoop/share/hadoop/common/lib/
$ scp ./seaweedfs-hadoop-client-1.0.2.jar cdh2:/hadoop/share/hadoop/common/lib
$ scp ./seaweedfs-hadoop-client-1.0.2.jar cdh3:/hadoop/share/hadoop/common/lib
$ scp ./core-site.xml cdh2:/hadoop/etc/hadoop/
$ scp ./core-site.xml cdh3:/hadoop/etc/hadoop/
# Verify
$ ../../bin/hdfs dfs -ls seaweedfs://cdh3:8888/
# Output
Found 3 items
drwxrwx---   -                        0 2018-12-13 10:29 seaweedfs://cdh3:8888/path
-rw-r--r--   1 dpnice dpnice         15 2018-12-13 14:41 seaweedfs://cdh3:8888/slaves
drwxrwx---   -                        0 2018-12-13 14:17 seaweedfs://cdh3:8888/weed
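
Once core-site.xml is in place, the usual Hadoop tooling works against the filer. For example, distcp can copy data from HDFS into SeaweedFS (a sketch; the namenode address and paths are placeholders):

# Copy a directory from HDFS into SeaweedFS through the filer.
$ bin/hadoop distcp hdfs://namenode:8020/data/input seaweedfs://cdh1:8888/data/input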

API:

Master Server API:

Assign a file key:

# Basic Usage:
curl http://localhost:9333/dir/assign
# To assign with a specific replication type:
curl "http://localhost:9333/dir/assign?replication=001"
# To specify how many file ids to reserve
curl "http://localhost:9333/dir/assign?count=5"
# To assign a specific data center
curl "http://localhost:9333/dir/assign?dataCenter=dc1"

Look up a volume's address:

curl "http://localhost:9333/dir/lookup?volumeId=3&pretty=y"
{
  "locations": [
    {
      "publicUrl": "localhost:8080",
      "url": "localhost:8080"
    }
  ]
}
# Other usages:
# You can actually use the file id to lookup, if you are lazy to parse the file id.
curl "http://localhost:9333/dir/lookup?volumeId=3,01637037d6"
# If you know the collection, specify it since it will be a little faster
curl "http://localhost:9333/dir/lookup?volumeId=3&collection=turbo"
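
Lookup and download are easy to combine: resolve the volume location first, then fetch the file from it (a sketch assuming jq is installed):

# Resolve the volume location for a fid, then download the file.
FID="3,01637037d6"
VOLUME_URL=$(curl -s "http://localhost:9333/dir/lookup?volumeId=${FID%%,*}" | jq -r '.locations[0].url')
curl -s -o output.bin "http://$VOLUME_URL/$FID"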

Garbage collection:

curl "http://localhost:9333/vol/vacuum"
curl "http://localhost:9333/vol/vacuum?garbageThreshold=0.4"

Garbage collection creates compacted copies of the .dat and .idx files that skip deleted entries, then swaps the copies in place of the originals.
garbageThreshold is optional.
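
Since vacuuming is just an HTTP call, it is easy to schedule; a sketch of a nightly crontab entry (the threshold is an example value):

# crontab: every night at 02:00, vacuum volumes with more than 30% garbage.
0 2 * * * curl -s "http://localhost:9333/vol/vacuum?garbageThreshold=0.3" > /dev/null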

Pre-allocate volumes:

# specify a specific replication
curl "http://localhost:9333/vol/grow?replication=000&count=4"
{"count":4}
# specify a collection
curl "http://localhost:9333/vol/grow?collection=turbo&count=4"
# specify data center
curl "http://localhost:9333/vol/grow?dataCenter=dc1&count=4"
# specify ttl
curl "http://localhost:9333/vol/grow?ttl=5d&count=4"

The count parameter specifies how many empty volumes to create.

To delete a collection:

# delete a collection
curl "http://localhost:9333/col/delete?collection=benchmark&pretty=y"

Check the system status:

# Cluster status
curl "http://10.0.2.15:9333/cluster/status?pretty=y"
{
  "IsLeader": true,
  "Leader": "10.0.2.15:9333",
  "Peers": [
    "10.0.2.15:9334",
    "10.0.2.15:9335"
  ]
}
# Topology status
curl "http://localhost:9333/dir/status?pretty=y"
{
  "Topology": {
    "DataCenters": [
      {
        "Free": 567,
        "Id": "dc1",
        "Max": 600,
        "Racks": [
          {
            "DataNodes": [
              {
                "Free": 190,
                "Max": 200,
                "PublicUrl": "cdh2",
                "Url": "cdh2:9222",
                "Volumes": 10
              },
              {
                "Free": 190,
                "Max": 200,
                "PublicUrl": "cdh1",
                "Url": "cdh1:9222",
                "Volumes": 10
              },
              {
                "Free": 187,
                "Max": 200,
                "PublicUrl": "cdh3",
                "Url": "cdh3:9222",
                "Volumes": 13
              }
            ],
            "Free": 567,
            "Id": "rack1",
            "Max": 600
          }
        ]
      }
    ],
    "Free": 567,
    "Max": 600,
    "layouts": [
      {
        "collection": "",
        "replication": "001",
        "ttl": "5d",
        "writables": [15, 16, 17, 18]
      },
      {
        "collection": "",
        "replication": "000",
        "ttl": "",
        "writables": [13, 14, 10, 11, 12, 19, 20, 21, 22]
      },
      {
        "collection": "",
        "replication": "001",
        "ttl": "",
        "writables": [6, 3, 7, 2, 4, 5]
      },
      {
        "collection": "turbo",
        "replication": "001",
        "ttl": "",
        "writables": [8, 9]
      }
    ]
  },
  "Version": "1.11"
}

Volume Server API:

# Upload a file
curl -F file=@/home/chris/myphoto.jpg http://127.0.0.1:8080/3,01637037d6

A file key must first be assigned by the master (see /dir/assign above).

# Upload a file directly and let the master assign the key automatically (note: this goes to the master's port)
curl -F file=@/home/chris/myphoto.jpg http://localhost:9333/submit
{"fid":"3,01fbe0dc6f1f38","fileName":"myphoto.jpg","fileUrl":"localhost:8080/3,01fbe0dc6f1f38","size":68231}
# Delete a file
curl -X DELETE http://127.0.0.1:8080/3,01637037d6
# View the chunk manifest content of a large chunked file
curl http://127.0.0.1:8080/3,01637037d6?cm=false
# Check the Volume Server status
curl "http://localhost:8080/status?pretty=y"
{
  "Version": "0.34",
  "Volumes": [
    {
      "Id": 1,
      "Size": 1319688,
      "RepType": "000",
      "Version": 2,
      "FileCount": 276,
      "DeleteCount": 0,
      "DeletedByteCount": 0,
      "ReadOnly": false
    },
    {
      "Id": 2,
      "Size": 1040962,
      "RepType": "000",
      "Version": 2,
      "FileCount": 291,
      "DeleteCount": 0,
      "DeletedByteCount": 0,
      "ReadOnly": false
    },
    {
      "Id": 3,
      "Size": 1486334,
      "RepType": "000",
      "Version": 2,
      "FileCount": 301,
      "DeleteCount": 2,
      "DeletedByteCount": 0,
      "ReadOnly": false
    },
    {
      "Id": 4,
      "Size": 8953592,
      "RepType": "000",
      "Version": 2,
      "FileCount": 320,
      "DeleteCount": 2,
      "DeletedByteCount": 0,
      "ReadOnly": false
    },
    {
      "Id": 5,
      "Size": 70815851,
      "RepType": "000",
      "Version": 2,
      "FileCount": 309,
      "DeleteCount": 1,
      "DeletedByteCount": 0,
      "ReadOnly": false
    },
    {
      "Id": 6,
      "Size": 1483131,
      "RepType": "000",
      "Version": 2,
      "FileCount": 301,
      "DeleteCount": 1,
      "DeletedByteCount": 0,
      "ReadOnly": false
    },
    {
      "Id": 7,
      "Size": 46797832,
      "RepType": "000",
      "Version": 2,
      "FileCount": 292,
      "DeleteCount": 0,
      "DeletedByteCount": 0,
      "ReadOnly": false
    }
  ]
}

Filer Server API:

# Basic Usage:
# create or overwrite the file, the directories /path/to will be automatically created
curl -F [email protected] "http://localhost:8888/path/to"
{"name":"report.js","size":866,"fid":"7,0254f1f3fd","url":"http://localhost:8081/7,0254f1f3fd"}
# get the file content
curl  "http://localhost:8888/javascript/report.js"   
# upload the file with a different name
curl -F [email protected] "http://localhost:8888/javascript/new_name.js"    
{"name":"report.js","size":866,"fid":"3,034389657e","url":"http://localhost:8081/3,034389657e"}
# list all files under /javascript/
curl  -H "Accept: application/json" "http://localhost:8888/javascript/?pretty=y"           
{
  "Directory": "/javascript/",
  "Files": [
    {
      "name": "new_name.js",
      "fid": "3,034389657e"
    },
    {
      "name": "report.js",
      "fid": "7,0254f1f3fd"
    }
  ],
  "Subdirectories": null
}
# Page through the file listing
curl  "http://localhost:8888/javascript/?pretty=y&lastFileName=new_name.js&limit=2"
{
  "Directory": "/javascript/",
  "Files": [
    {
      "name": "report.js",
      "fid": "7,0254f1f3fd"
    }
  ]
}
# Delete a file
curl -X DELETE "http://localhost:8888/javascript/report.js"
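
With lastFileName and limit, a large directory can be walked page by page; a sketch of such a loop (assuming jq is installed):

#!/bin/bash
# list-all.sh: page through a filer directory using lastFileName/limit.
DIR="http://localhost:8888/javascript/"
LAST=""
while :; do
  PAGE=$(curl -s -H "Accept: application/json" "$DIR?limit=100&lastFileName=$LAST")
  echo "$PAGE" | jq -r '.Files[]?.name'
  LAST=$(echo "$PAGE" | jq -r '.Files[-1]?.name // empty')
  [ -z "$LAST" ] && break
done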
