Introduction to Amazon Simple Storage Service

In this blog post we look at Amazon Simple Storage Service (S3 for short), the first cloud computing service officially launched by AWS, in 2006. We start our service introductions with it to "honor" its place in history :-).

S3 provides developers with a highly scalable, highly durable, and highly available distributed data storage service. It is a storage service built entirely for the Internet: applications can access data on S3 at any time, over the Internet, through a simple web service interface. Access here covers operations such as reading, writing, and deleting, and you can control access to the data you store on S3 to keep it secure.

When you first encounter S3, you need to distinguish it from what we usually call online disks (network drives). Both belong to the category of cloud storage, but S3 is a service for developers and is used mainly through API programming, while online disks provide a service interface for end users. Although S3 can also be used through the AWS web management console or the command line, it is aimed mainly at developers and can be understood as a back-end service for cloud storage. For example, Dropbox, a cloud storage service that many people like to use, is a typical AWS customer, and all of its user files are stored in S3.

We mentioned in the last blog post that more and more customers are adopting cloud computing at an accelerating pace. The usage of S3 is a practical example. The graph below shows the growth in the number of data objects stored on S3 over the past few years.

Figure 1: Growth of S3

As the figure shows, it took about 6 years for S3 to reach its first trillion objects, but only one more year to reach the second trillion!

 

The basic data structure of S3

The data storage structure of S3 is very simple: a flat, two-layer structure. One layer is the bucket (also known as a storage segment), and the other is the storage object (also known as a data element).

A bucket is a container for data and a way to classify data in S3. Every storage object must be stored in some bucket. The bucket is the top level of the S3 namespace and becomes part of the domain name through which users access data. The bucket name must therefore be unique and DNS-compatible, for example by using lowercase letters and avoiding special characters. If you create a bucket named zhangsan, the corresponding domain name is zhangsan.s3.amazonaws.com, and you can access the data stored in it through http://zhangsan.s3.amazonaws.com/ . Because the geographical location of stored data is sometimes very important to users, S3 prompts you to select a region when you create a bucket.

A storage object is the content the user actually wants to store. It consists of the object data plus some metadata. The object data is usually a file, and the metadata is information that describes the object data, such as when it was last modified. If you store a file picture.jpg in the zhangsan bucket, you can access it through the URL http://zhangsan.s3.amazonaws.com/picture.jpg . This URL shows why the bucket name must be globally unique and the object name must be unique within its bucket: only then does a URL identify exactly one piece of data. The data storage structure of S3 is shown in the following figure:

 

 

Figure 2: Basic storage structure of S3
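
To make the bucket-plus-object addressing concrete, here is a minimal Python sketch that builds the URL of an object from its bucket name and object name. The helper names (is_dns_compatible, object_url) are made up for this post, and the name check is a deliberately simplified subset of the real bucket naming rules, so treat it as an illustration rather than a full validator.

```python
import re
from urllib.parse import quote

# Simplified, illustrative check (not the complete S3 rule set):
# 3-63 characters, lowercase letters, digits, and hyphens only,
# starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_dns_compatible(bucket: str) -> bool:
    """Return True if the bucket name passes the simplified check above."""
    return _BUCKET_RE.fullmatch(bucket) is not None


def object_url(bucket: str, key: str) -> str:
    """Build the URL of an object from its bucket name and object name."""
    if not is_dns_compatible(bucket):
        raise ValueError(f"bucket name is not DNS-compatible: {bucket!r}")
    # Percent-encode the object name, but keep "/" readable.
    return f"http://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


print(object_url("zhangsan", "picture.jpg"))
# → http://zhangsan.s3.amazonaws.com/picture.jpg
```

A name such as "ZhangSan" or "zhang_san" fails the check, which matches the advice above to stick to lowercase letters and avoid special characters.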

The data in an S3 storage object can range in size from 1 byte to 5 TB. By default, each AWS account can create up to 100 buckets, but a bucket can hold any number of storage objects. In theory there is no limit on the number of objects in a bucket, because S3 is designed from the ground up as distributed storage. S3 scales not only in capacity but also in performance, allowing multiple clients and application threads to access data concurrently.

Some people compare the storage structure of S3 with an ordinary file system. Note that S3 has only two layers and does not support a multi-level directory tree. You can, however, simulate a tree by using "/" in storage object names. For example, the "create folder" option that some S3 tools provide is actually implemented through the names of the storage objects.
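
The idea of simulating folders through object names can be sketched in a few lines of Python. The function below is a hypothetical helper written for this post (it is not an S3 API): given a flat list of object names, it works out which names look like direct "files" and which look like "subfolders" under a given prefix. The sample object names are invented.

```python
def list_folder(keys, prefix="", delimiter="/"):
    """Split the object names under `prefix` into direct files and subfolders."""
    files, folders = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Anything up to the next delimiter behaves like a subfolder.
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        elif rest:
            files.append(key)
    return sorted(files), sorted(folders)


# The bucket itself is flat: these are just four object names.
keys = [
    "picture.jpg",
    "photos/2013/beach.jpg",
    "photos/2013/city.jpg",
    "photos/readme.txt",
]

print(list_folder(keys))
# → (['picture.jpg'], ['photos/'])
print(list_folder(keys, "photos/"))
# → (['photos/readme.txt'], ['photos/2013/'])
```

Nothing called "photos" is ever created here; the folder exists only because several object names share the same prefix.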

 

 

Several features of S3

As a typical representative of cloud storage, Amazon S3 has distinctive characteristics in terms of scalability, durability and performance.

1. Durability and availability

Data stored on S3 is automatically and synchronously stored across multiple facilities (data centers) and multiple devices within the selected geographic region. S3 provides the highest level of data durability and availability on the AWS platform. In addition to distributing data, S3 has a built-in data consistency check mechanism that provides error correction. S3 is designed to have no single point of failure and to withstand the simultaneous loss of data in two facilities, which makes it well suited as primary storage for mission-critical data. Specifically, Amazon S3 is designed to provide 99.999999999% ("11 nines") annual durability and 99.99% annual availability for each stored object. Beyond this built-in redundancy, S3 versioning protects data against corruption caused by application failures and against accidental deletion.

For non-critical data that can easily be reproduced when needed (such as transcoded media files or image thumbnails), you can use the Reduced Redundancy Storage (RRS) option in Amazon S3. RRS offers 99.99% durability at a lower storage cost. Although RRS is less durable than standard S3, it is still about 400 times more durable than a typical disk drive.
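
To get a feel for what these durability figures mean, the short Python sketch below turns them into an expected number of lost objects per year. It assumes, purely for illustration, that the durability figure can be read as an independent annual survival probability for each object; the function name and the figure of 10,000 objects are chosen for this example.

```python
from fractions import Fraction


def expected_annual_losses(num_objects, durability_percent):
    """Expected number of objects lost per year.

    `durability_percent` is passed as a string such as "99.99" so that
    the arithmetic is exact instead of suffering float rounding.
    Assumption: each object is lost independently with probability
    (1 - durability) per year.
    """
    loss_probability = 1 - Fraction(durability_percent) / 100
    return num_objects * loss_probability


standard = expected_annual_losses(10_000, "99.999999999")
rrs = expected_annual_losses(10_000, "99.99")

print(float(standard))  # → 1e-07
print(float(rrs))       # → 1.0
print(rrs / standard)   # → 10000000
```

Under this reading, someone storing 10,000 objects with standard durability would expect to lose one object roughly every 10 million years on average, while the same objects on RRS would see about one loss per year.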

 

2. Elasticity and scalability

Amazon S3 is designed to provide a high level of elasticity and scalability automatically. A typical file system may run into problems when a single directory holds a very large number of files, but S3 supports an unlimited number of files in any bucket. And unlike a disk, whose size limits the total amount of data it can hold, an S3 bucket can store an unlimited amount of data. The only size limit in S3 today is that a single storage object cannot exceed 5 TB. You can store any number of objects, and S3 automatically scales and distributes redundant copies of the data to servers in other locations within the same region, all on top of AWS's high-performance infrastructure.

 

3. Good performance

S3 is a storage service for the Internet, so its data access speed cannot match that of a local hard drive. However, Amazon S3 can be accessed quickly from Amazon EC2 within the same region. If you access S3 from multiple threads, multiple applications, or multiple clients at the same time, the aggregate throughput of S3 will often far exceed what a single server can generate or consume. S3 is designed so that server-side latency is much smaller than the latency of the Internet itself.
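
The effect of concurrent access can be sketched with Python's standard thread pool. No real S3 call is made here: fetch_object is a hypothetical stand-in that just sleeps for 50 ms to imitate the latency of one request, and the object names are invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_object(key):
    """Hypothetical stand-in for one S3 read; sleeps to imitate latency."""
    time.sleep(0.05)
    return key, b"data for " + key.encode()


def fetch_all(keys, max_workers=8):
    """Fetch many objects concurrently and return {object name: data}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fetch_object, keys))


keys = [f"logs/part-{i:03d}" for i in range(16)]

start = time.perf_counter()
objects = fetch_all(keys)
elapsed = time.perf_counter() - start

print(f"fetched {len(objects)} objects in {elapsed:.2f}s")
```

Fetching the 16 objects one after another would take at least 0.8 seconds with this simulated latency, while 8 worker threads should finish in a fraction of that, because the time is spent waiting rather than computing.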

To speed up access to related data, many developers use Amazon S3 together with Amazon DynamoDB or Amazon RDS. S3 stores the actual data, while DynamoDB or RDS stores the associated metadata, such as object names, sizes, and keywords. The database provides indexing and search, so a metadata query can efficiently find references to the stored objects. Users can then use the result to locate the storage object itself and retrieve it from S3. To improve performance for end users accessing data in S3, you can also use a CDN service such as Amazon CloudFront.
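
The "data in S3, metadata in a database" pattern can be sketched locally with stand-ins. In the Python example below, a dictionary plays the role of the S3 bucket and an in-memory SQLite database plays the role of RDS or DynamoDB; both substitutions, the table layout, and the function names are assumptions made only for this illustration.

```python
import sqlite3

object_store = {}                    # stand-in for an S3 bucket: name -> bytes
index = sqlite3.connect(":memory:")  # stand-in for RDS or DynamoDB
index.execute(
    "CREATE TABLE objects (name TEXT PRIMARY KEY, size INTEGER, keyword TEXT)"
)


def put(name, data, keyword):
    """Store the data in the object store and its metadata in the index."""
    object_store[name] = data
    index.execute(
        "INSERT OR REPLACE INTO objects VALUES (?, ?, ?)",
        (name, len(data), keyword),
    )


def find(keyword):
    """Search the metadata first, then fetch the matching objects by name."""
    rows = index.execute(
        "SELECT name FROM objects WHERE keyword = ? ORDER BY name", (keyword,)
    ).fetchall()
    return [(name, object_store[name]) for (name,) in rows]


put("photos/beach.jpg", b"...jpeg bytes...", "holiday")
put("photos/city.jpg", b"...more bytes...", "holiday")
put("docs/report.txt", b"quarterly numbers", "work")

print([name for name, _ in find("holiday")])
# → ['photos/beach.jpg', 'photos/city.jpg']
```

The search never scans the stored objects themselves; it touches only the small metadata table and then retrieves exactly the objects it needs by name.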

 

4. Simple interface

Amazon S3 provides two forms of web service API for data management, one based on SOAP and one based on REST. These APIs cover operations on both buckets and storage objects. Using the SOAP or REST API directly is very flexible, but because these APIs are relatively low-level, it is also relatively cumbersome. To make development easier, AWS therefore provides higher-level toolkits, or software development kits (SDKs), built on the REST API for common development languages. The languages supported include Java, .NET, PHP, Ruby, and Python, among others. If you need to manage S3 directly from the operating system, AWS also provides an integrated AWS command line interface (CLI) for Windows and Linux. It offers Linux-like commands such as ls, cp, mv, and sync for common operations. Finally, you can use S3 through the AWS web management console, for operations such as creating buckets and uploading and downloading data objects. There are also many third-party tools that let users work with S3 through a graphical interface, such as S3 Organizer (a free plug-in for Firefox) and CloudBerry Explorer for Amazon S3.
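
To show roughly what "low-level" means for the REST API, the Python sketch below builds (but never sends) the three basic object requests against the zhangsan bucket used earlier in this post: writing, reading, and deleting an object map to the HTTP methods PUT, GET, and DELETE on the object's URL. The helper names are made up, and the sketch leaves out the authentication headers that real requests to a non-public bucket need, which is exactly the kind of detail the SDKs handle for you.

```python
from urllib.request import Request

# Bucket endpoint from the example earlier in this post.
ENDPOINT = "http://zhangsan.s3.amazonaws.com"


def put_object(key, data):
    """Request that would write `data` as the object named `key`."""
    return Request(f"{ENDPOINT}/{key}", data=data, method="PUT")


def get_object(key):
    """Request that would read the object named `key`."""
    return Request(f"{ENDPOINT}/{key}", method="GET")


def delete_object(key):
    """Request that would delete the object named `key`."""
    return Request(f"{ENDPOINT}/{key}", method="DELETE")


# The requests are only constructed and printed, never sent.
requests = [
    put_object("picture.jpg", b"...jpeg bytes..."),
    get_object("picture.jpg"),
    delete_object("picture.jpg"),
]
for req in requests:
    print(req.get_method(), req.full_url)
# → PUT http://zhangsan.s3.amazonaws.com/picture.jpg
# → GET http://zhangsan.s3.amazonaws.com/picture.jpg
# → DELETE http://zhangsan.s3.amazonaws.com/picture.jpg
```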

 

There is still a lot more to say about S3, which we will cover in follow-up posts.

 


Origin blog.csdn.net/u012365585/article/details/15502205