Cold-Storage Data Backup with Baidu Object Storage (BOS)

https://blog.csdn.net/lavorange/article/details/50639849


2016-02-06 13:18:24, by 忆之独秀


We recently had a requirement to back up cold-storage data off-site for disaster recovery. To further cut local storage, maintenance, and staffing costs, we chose the relatively inexpensive Baidu Object Storage (BOS) as the backup target (see the BOS product introduction). To upload files quickly and in batches, we built a distributed, multi-task upload solution on top of the BOS Python SDK. This post walks through how to use the BOS Python SDK. (That's free advertising for BOS; how are they going to thank me?)

1. Create a virtual environment

 
  # yum install python-virtualenv

  # mkvirtualenv bos

  # workon bos

We use virtualenv to build isolated Python development environments, keeping different Python projects separate so that their packages do not conflict. (Note that the mkvirtualenv and workon commands actually come from the virtualenvwrapper add-on, not virtualenv itself.) A quick tour of the commands:

List the virtual environments:

workon / lsvirtualenv

Create a virtual environment:

mkvirtualenv [name]

Activate or switch to a virtual environment:

workon [name]

Delete a virtual environment:

rmvirtualenv [name]

Leave the current virtual environment:

deactivate

2. Install the BOS SDK

2.1 Download the BOS SDK package

URL:https://bce.baidu.com/doc/SDKTool/index.html#Python

wget http://sdk.bce.baidu.com/console-sdk/bce-python-sdk-0.8.8.zip

2.2 Unzip the package and run the install script

python setup.py install

3. Write the configuration file

bos_sample_conf.py:

 
import logging
import os
import sys
from baidubce.bce_client_configuration import BceClientConfiguration
from baidubce.auth.bce_credentials import BceCredentials

PROXY_HOST = 'localhost:8080'  # optional proxy, unused below
bos_host = "bj.bcebos.com"
access_key_id = "a6748c1334a44c2d8af60fcdf098b30d"
secret_access_key = "3d7621d35b0c426ea2c0dfdbfca45151"

# log the SDK's BOS client activity to sample.log at DEBUG level
logger = logging.getLogger('baidubce.services.bos.bosclient')
fh = logging.FileHandler("sample.log")
fh.setLevel(logging.DEBUG)

formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
logger.setLevel(logging.DEBUG)
logger.addHandler(fh)

# client configuration shared by the upload scripts
config = BceClientConfiguration(credentials=BceCredentials(access_key_id, secret_access_key), endpoint=bos_host)


The configuration file specifies the upload host, the access key ID, and the secret access key, and builds the client configuration object config used to initialize the client.
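The sample hardcodes the access key and secret in source code; in practice you may prefer to pull them from the environment. A minimal sketch (the variable names BOS_ACCESS_KEY_ID / BOS_SECRET_ACCESS_KEY are my own convention, not part of the SDK):

```python
import os

def load_bos_credentials():
    """Read BOS credentials from the environment instead of hardcoding them."""
    ak = os.environ.get("BOS_ACCESS_KEY_ID")
    sk = os.environ.get("BOS_SECRET_ACCESS_KEY")
    if not ak or not sk:
        raise RuntimeError("BOS_ACCESS_KEY_ID / BOS_SECRET_ACCESS_KEY are not set")
    return ak, sk

# demo values only; in production these come from the shell or a secrets manager
os.environ["BOS_ACCESS_KEY_ID"] = "demo-ak"
os.environ["BOS_SECRET_ACCESS_KEY"] = "demo-sk"
access_key_id, secret_access_key = load_bos_credentials()
```

The config object would then be built exactly as above, with BceCredentials(access_key_id, secret_access_key).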

4. Uploading files

 
import os, sys, hashlib
import base64

from baidubce import exception
from baidubce.services import bos
from baidubce.services.bos import canned_acl
from baidubce.services.bos.bos_client import BosClient

import bos_sample_conf

# init a BOS client from the shared configuration
bos_client = BosClient(bos_sample_conf.config)

# create the bucket if it does not exist yet
bucket_name = 'wahaha'
if not bos_client.does_bucket_exist(bucket_name):
    bos_client.create_bucket(bucket_name)
print "init bucket:%s success" % bucket_name

# upload an object from a string
object_key = 'Happy Spring Festival'
data = 'this is the test string'
bos_client.put_object_from_string(bucket_name, object_key, data)

# upload an object from a file
file_name = "/root/baidu_object_storage/test/file_to_be_upload"
response = bos_client.put_object_from_file(bucket_name, object_key + ' plus', file_name)
print "response.metadata.etag = " + response.metadata.etag

# get the object's metadata
response = bos_client.get_object_meta_data(bucket_name, object_key + ' plus')
print "response object meta data:"
print response

# list the objects in the bucket
response = bos_client.list_objects(bucket_name)
for obj in response.contents:
    print 'object.key = ' + obj.key

# list the buckets
response = bos_client.list_buckets()
for bucket in response.buckets:
    print "bucket.name = " + bucket.name

# download an object as a string
print bos_client.get_object_as_string(bucket_name, object_key)

# list unfinished multipart upload tasks
print "get unfinished multipart upload task:"
for item in bos_client.list_all_multipart_uploads(bucket_name):
    print 'item.upload_id = ' + item.upload_id

# abort the unfinished multipart upload tasks
print "abort unfinished multipart upload task"
for item in bos_client.list_all_multipart_uploads(bucket_name):
    bos_client.abort_multipart_upload(bucket_name, item.key.encode("utf-8"), upload_id=item.upload_id)

# after aborting, the list of in-progress multipart uploads should be empty
response = bos_client.list_multipart_uploads(bucket_name)
for item in response.uploads:
    print item.key


Result:
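As noted at the end of this post, the etag returned for a simple (non-multipart) upload is the MD5 of the file, so an upload can be verified by computing the file's MD5 locally and comparing it with response.metadata.etag. A stdlib-only sketch of the local half of that check:

```python
import hashlib
import tempfile

def file_md5(path, chunk_size=8192):
    """MD5 of a file, read in chunks so large files stay out of memory."""
    m = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            m.update(chunk)
    return m.hexdigest()

# demo: hash a small temp file with a known payload
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"this is the test string")
    tmp_path = tmp.name
local_md5 = file_md5(tmp_path)
```

Comparing local_md5 against the etag from put_object_from_file is left out here, since it needs a live BOS connection.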

5. Uploading large files

 
import os
import sys
import hashlib

sys.path.append("../bos")
import bos_sample_conf
from baidubce import exception
from baidubce.services import bos
from baidubce.services.bos import canned_acl
from baidubce.services.bos.bos_client import BosClient

default_path = os.path.dirname(os.path.realpath(__file__))

# init a BOS client
bos_client = BosClient(bos_sample_conf.config)

# create the bucket if it does not exist yet
bucket_name = 'wahaha'
if not bos_client.does_bucket_exist(bucket_name):
    bos_client.create_bucket(bucket_name)

# init the object key
object_key = 'this is object_key of big file'

# start a multipart upload
upload_id = bos_client.initiate_multipart_upload(bucket_name, object_key).upload_id
print 'upload_id = ' + upload_id

file_name = default_path + os.path.sep + 'big_file'
if os.path.isfile(file_name):
    print "file_name = %s" % file_name
else:
    exit(-1)

# bytes still left to upload
left_size = os.path.getsize(file_name)
# offset of the current part within the file
offset = 0
part_number = 1
part_list = []
e_tag_str = ""

while left_size > 0:
    # each part is 50MB, except possibly the last one
    print "size left: %dMB" % (left_size / 1024 / 1024)
    part_size = 50 * 1024 * 1024
    if left_size < part_size:
        part_size = left_size

    response = bos_client.upload_part_from_file(
        bucket_name, object_key, upload_id, part_number, part_size, file_name, offset)

    left_size -= part_size
    offset += part_size
    part_list.append({
        "partNumber": part_number,
        "eTag": response.metadata.etag
    })
    e_tag_str += response.metadata.etag
    print part_number, " ", response.metadata.etag
    part_number += 1

print "\n"
response = bos_client.complete_multipart_upload(bucket_name, object_key, upload_id, part_list)
print response

# the etag of the completed multipart upload is "-" plus the md5
# of the concatenated per-part etag strings (see note 2 below)
m = hashlib.md5()
m.update(e_tag_str)
e_tag_str_to_md5 = "-" + m.hexdigest()

if e_tag_str_to_md5 == response.etag:
    print "e_tag match great!!!"
else:
    print "etag does not match, e_tag_str_to_md5 = %s" % e_tag_str_to_md5

print "\n"
print response.bucket
print response.key
print response.etag
print response.location

Result:
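The slicing arithmetic of the upload loop above (offset, part_size, left_size, part_number) can be checked in isolation with plain bytes. The helper below is illustrative only and independent of the SDK:

```python
def split_into_parts(data, part_size):
    """Split data into (part_number, offset, chunk) tuples, mirroring the upload loop."""
    parts = []
    offset = 0
    part_number = 1
    left_size = len(data)
    while left_size > 0:
        size = min(part_size, left_size)
        parts.append((part_number, offset, data[offset:offset + size]))
        offset += size
        left_size -= size
        part_number += 1
    return parts

# 10 bytes in 4-byte parts -> sizes 4, 4, 2 at offsets 0, 4, 8
parts = split_into_parts(b"abcdefghij", 4)
```

Reassembling the chunks in part order must reproduce the original data, which is exactly the property the server-side complete_multipart_upload relies on.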

Notes:

1. For a small-file upload, the etag in the returned response is simply the MD5 of the file.

2. For a large file uploaded in parts, the etag of the completed multipart upload is obtained by concatenating the etag strings returned for each part, taking the MD5 of that concatenation, and prefixing the result with "-" (rather abstract, I know).
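Note 2, as implemented by the verification code in section 5, amounts to the following (the per-part etags here are made-up placeholders; whether BOS guarantees this etag format is best confirmed against the official docs):

```python
import hashlib

def multipart_etag(part_etags):
    """'-' plus the md5 of the concatenated per-part etag hex strings."""
    joined = "".join(part_etags).encode("ascii")
    return "-" + hashlib.md5(joined).hexdigest()

# two fake 32-hex-digit part etags, purely for illustration
combined = multipart_etag(["a" * 32, "b" * 32])
```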
