What is MinIO

This is an original article by joshua317. Please credit the source when reprinting: Reprinted from joshua317 blog, What is MinIO - joshua317's blog

What is MinIO?

MinIO is a high-performance, distributed object storage system. It is a software product that runs entirely on standard hardware, so low-cost commodity machines such as x86 servers can run MinIO well.

MinIO provides high-performance, S3-compatible object storage. It is an object storage service written in Go that implements most of the Amazon S3 cloud storage API, so it can be regarded as an open source counterpart to S3. It is well suited to storing large volumes of unstructured data such as pictures, videos, log files, backup data, and container/virtual machine images. A single object can be of any size, from a few KB up to a maximum of 5 TB. Unlike many other distributed storage systems, MinIO is simple, lightweight, and friendly to developers. Its philosophy is that storage should be a development problem rather than an operation and maintenance problem.

MinIO is native to Kubernetes and is the only object storage suite available on every public cloud, every Kubernetes distribution, private cloud and at the edge. MinIO is software defined and 100% open source under the GNU AGPL v3.

MinIO differs from traditional storage and other object stores in that its software architecture was designed from the very beginning for private cloud standards with demanding performance requirements. Because MinIO was designed only for object storage, it could be built in an easier-to-use way: it implements all the functions object storage requires and performs better, and it does not trade away ease of use or efficiency to accommodate additional business functions. The benefit is that a native object storage service with elastic scalability is easier to implement.

MinIO supports the broadest range of use cases in the largest number of environments. Cloud-native since its launch, MinIO's software-defined suite operates seamlessly across public clouds, private clouds, and the edge, making it the leader in hybrid cloud and multi-cloud object storage. With industry-leading performance and scalability, MinIO can serve a range of use cases including AI/ML, analytics, backup/restore, and modern web and mobile applications.

MinIO excels in traditional object storage use cases such as secondary storage, disaster recovery, and archive. At the same time, it stands out as storage for machine learning, big data, private cloud, and hybrid cloud. It also supports data analytics, high-performance application workloads, and cloud-native applications.

In China, more than 9,000 companies, including Alibaba, Tencent, Baidu, China Unicom, Huawei, and China Mobile, also use MinIO products.

One features

MinIO's enterprise-grade features represent the standard in the object storage space. From AWS S3 API support to S3 Select support, and implementations such as erasure coding and data security designed by MinIO, MinIO's code has been widely praised and frequently used by some of the biggest names in technology and business fields.

1.1 Erasure code

MinIO protects data using per-object embedded erasure coding, which is written in assembly code to provide the highest performance. MinIO uses Reed-Solomon codes to partition objects into n/2 data and n/2 parity blocks, although these can be configured to any desired level of redundancy. This means that in a 12 drive setup, an object is sliced into 6 data and 6 parity blocks. Even if up to 5 ((n/2)-1) drives are lost (either parity or data), data can still be reliably rebuilt from the remaining drives. The implementation of MinIO ensures that objects can be read or new objects written even if multiple devices are lost or unavailable. Finally, MinIO's erasure code is at the object level and can repair objects one at a time.
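The arithmetic in the 12-drive example can be checked with a short sketch. It only reproduces the default split and the tolerance figure quoted above; it is not MinIO code, and the function name `erasure_layout` is a hypothetical helper introduced for illustration.

```python
def erasure_layout(total_drives: int) -> dict:
    """Split n drives evenly into data and parity blocks (the default described in the text)."""
    if total_drives < 2 or total_drives % 2 != 0:
        raise ValueError("this sketch assumes an even number of drives (at least 2)")
    half = total_drives // 2
    return {
        "data_blocks": half,
        "parity_blocks": half,
        # (n/2) - 1: the number of lost drives the text says can be tolerated
        "tolerated_drive_losses": half - 1,
    }


layout = erasure_layout(12)
print(layout)
```

For 12 drives this gives 6 data blocks, 6 parity blocks, and 5 tolerated drive losses, matching the figures in the paragraph above.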

1.2 Bitrot Protection

Silent data corruption, or bitrot, is a serious problem faced by disk drives that leads to data corruption without the user's knowledge. The causes are varied (aging drives, current spikes, bugs in disk firmware, phantom writes, misdirected reads/writes, driver errors, accidental overwrites), but the result is the same: corrupted data.

MinIO's optimized implementation of a high-speed hashing algorithm ensures that it never reads corrupted data - it catches and repairs corrupted objects in real-time. End-to-end integrity is ensured by hashing on READ and hashing on WRITE from the application, across the network, and to memory/drive. The implementation is designed for speed and can achieve over 10 GB/sec hash speed on a single core of an Intel CPU.
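The idea of hashing on write and again on read can be illustrated with a minimal sketch. It assumes BLAKE2b from the Python standard library as a stand-in hash; it does not reproduce MinIO's own hashing implementation, and the function names are hypothetical. The sketch only detects corruption. Repairing the object, as described above, would additionally require rebuilding it from erasure-coded blocks.

```python
import hashlib
from typing import Tuple


def write_block(data: bytes) -> Tuple[bytes, str]:
    """Return the data together with a checksum computed at write time."""
    return data, hashlib.blake2b(data).hexdigest()


def read_block(data: bytes, expected_checksum: str) -> bytes:
    """Re-hash on read and refuse to return data that no longer matches."""
    if hashlib.blake2b(data).hexdigest() != expected_checksum:
        raise IOError("bitrot detected: checksum mismatch")
    return data


stored, checksum = write_block(b"hello object storage")

# Simulate silent corruption by flipping one bit in the first byte.
corrupted = bytes([stored[0] ^ 0x01]) + stored[1:]
try:
    read_block(corrupted, checksum)
    detected = False
except IOError:
    detected = True
print("corruption detected:", detected)
```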

1.3 Encryption

In the world of object storage, strong encryption is needed to have a seat at the table. MinIO delivers more with the highest level of encryption along with extensive optimizations that virtually eliminate the overhead typically associated with storage encryption operations.

1.4 WORM

When WORM is enabled, MinIO disables all APIs that might mutate object data and metadata. This means that once the data is written it is protected from tampering. This has practical application for many different regulatory requirements.
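The write-once behaviour can be pictured with a toy in-memory store. This is only an illustration of the concept under the assumption that every mutating call is rejected; `WormStore` is a hypothetical class, not part of MinIO or any SDK.

```python
class WormStore:
    """Toy in-memory store: each key may be written once and never changed."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        # Overwriting an existing key would mutate data, so it is refused.
        if key in self._objects:
            raise PermissionError(f"{key!r} is write-once and already exists")
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def delete(self, key: str) -> None:
        # Deletion is also a mutation and is always refused in this sketch.
        raise PermissionError("deletes are disabled in WORM mode")


store = WormStore()
store.put("audit.log", b"first write")
try:
    store.put("audit.log", b"tampered")
except PermissionError as err:
    print("rejected:", err)
```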

1.5 Identity Authentication and Management

With AWS Identity and Access Management (IAM) compatibility at its core, MinIO IAM exposes the framework to applications and users regardless of the environment, providing the same functionality in different public clouds, private clouds, and edges. MinIO extends AWS IAM compatibility with support for popular external identity providers such as ActiveDirectory/LDAP, Okta, and Keycloak, allowing administrators to offload identity management to their organization's preferred SSO solution.

1.6 Continuous replication

The challenge with traditional replication methods is that they don't scale efficiently beyond a few hundred terabytes. Having said that, everyone needs a replication strategy to support disaster recovery, and that strategy needs to span geographies, data centers, and clouds. MinIO's continuous replication is designed for large-scale cross-datacenter deployments. By leveraging Lambda compute notifications and object metadata, it can compute the delta efficiently and quickly.

Lambda notifications ensure that changes are propagated immediately, in contrast to traditional batch mode. Continuous replication means that even with highly dynamic data sets, data loss is kept to a minimum in the event of a failure. Finally, like everything MinIO does, continuous replication is multi-vendor, which means your backup location can be anywhere from NAS to public cloud.

1.7 Global Consistency

Modern businesses have data everywhere. MinIO allows these various instances to be combined into a unified global namespace. Specifically, up to 32 MinIO servers can be combined into a distributed mode set, and multiple distributed mode sets can be combined into a MinIO server federation. Each MinIO server federation provides a unified admin and namespace.

A MinIO server federation supports an unlimited number of distributed mode sets.

The impact of this approach is that object storage can scale massively for large, geographically dispersed enterprises while retaining the ability to accommodate a wide variety of applications (S3 Select, MinSQL, Spark, Hive, Presto, TensorFlow, H2O) from a single console.

1.8 Multi-cloud gateway

All enterprises are adopting multi-cloud strategies, and this includes private clouds. Therefore, your bare metal, virtualized containers, and public cloud services (including non-S3 providers like Google, Microsoft, and Alibaba) must look identical. While modern applications are highly portable, the data that powers them is not.

The main challenge MinIO addresses is making data available no matter where it resides. MinIO runs on bare metal, network attached storage and every public cloud. What's more, MinIO ensures that you have the exact same view of your data from an application and management perspective through the Amazon S3 API.

MinIO can go a step further and make your existing storage infrastructure compatible with Amazon S3. Its impact is far-reaching. Organizations can now truly unify their data infrastructure - from files to blocks, all appearing as objects accessible through the Amazon S3 API without migration.

1.9 Scalability

MinIO leverages the hard-won knowledge of web scalers to bring a simple scaling model to object storage. This is MinIO's firm philosophy of "scalable simply." In MinIO, scaling starts with a single cluster, which can be federated with other MinIO clusters to create a global namespace, and can span multiple different datacenters if needed. The namespace can be expanded by adding more clusters, more racks, until the goal is achieved.

Scaling has many dimensions, but one principle holds throughout: keep it simple. MinIO scales horizontally (scale out) through a concept called server pools. A server pool groups several components together: each pool is an independent set of nodes with its own compute, network, and storage resources.

1.10 Hybrid and multi-cloud

MinIO is ideal for enterprises looking for consistent, high-performance, and scalable object storage for their hybrid cloud strategy. Designed natively on Kubernetes and compatible with S3 from the ground up, MinIO today has more than 7.7 million instances running in AWS, Azure, and GCP, more than all other private clouds combined. Together with millions of private cloud instances and widespread edge deployments, this makes MinIO the hybrid cloud leader.

1.11 Cloud Native Support

MinIO has been built from the ground up over the past four years and is native to the technologies and architectures that define the cloud. These include containerization, Kubernetes orchestration, microservices, and multi-tenancy. No other object store is more Kubernetes-friendly.

1.12 100% open source + enterprise-grade support

Open source powers the cloud. Open source powers the enterprise. Open source powers MinIO. Every day tens of thousands of customers and community members trust MinIO to deliver security, resiliency, durability and operational excellence to their deployments.

MinIO is 100% open source under the GNU Affero General Public License version 3 (AGPLv3); earlier releases were distributed under the Apache License v2. This means that MinIO's customers are not locked in: they are free to inspect, free to innovate, free to modify, and free to redistribute. MinIO supports and powers many Fortune 500 companies, and the variety of its deployments strengthens the software in ways that proprietary software can never provide.

1.13 Bucket & Object Immutability

Protecting data from deletion (accidental or intentional) is a critical compliance component in every industry. MinIO supports the full set of functionality, including object locking, retention, legal holds, governance, and compliance. MinIO's Bucket and Object Immutability has been certified by Veeam and validated by Cohasset Associates for use in accordance with SEC Rule 17a-4(f), FINRA Rule 4511, and CFTC Regulation 1.31.

1.14 Bucket and Object Versioning

Object-level versioning is a significant evolution from SAN and NAS versioning methods. Versioning not only provides data protection, but also serves as the basis for powerful features such as object locking, immutability, tiering, and lifecycle management.

With MinIO, objects are independently versioned according to Amazon's S3 structure/implementation. MinIO assigns each version of a given object a unique ID - applications can specify a version ID at any time to access a point-in-time snapshot of that object.
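The behaviour described above, where every write produces a new version with its own ID and any version can be read back later, can be sketched with a toy in-memory bucket. `VersionedBucket` is a hypothetical class for illustration, and the use of random UUIDs as version IDs is an assumption of this sketch, not a statement about MinIO's ID format.

```python
import uuid
from typing import Optional


class VersionedBucket:
    """Toy bucket that keeps every version of every object."""

    def __init__(self):
        # key -> list of (version_id, data), oldest first
        self._versions = {}

    def put(self, key: str, data: bytes) -> str:
        """Store a new version and return its unique version ID."""
        version_id = uuid.uuid4().hex
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def get(self, key: str, version_id: Optional[str] = None) -> bytes:
        """Return the latest version, or a specific one if version_id is given."""
        versions = self._versions[key]
        if version_id is None:
            return versions[-1][1]
        for vid, data in versions:
            if vid == version_id:
                return data
        raise KeyError(f"no version {version_id!r} for {key!r}")


bucket = VersionedBucket()
v1 = bucket.put("report.txt", b"draft")
v2 = bucket.put("report.txt", b"final")
print(bucket.get("report.txt"))      # latest version
print(bucket.get("report.txt", v1))  # point-in-time read of the first version
```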

1.15 Data Lifecycle Management and Tiering

As data continues to grow, the ability to jointly optimize access, security, and economics becomes a requirement rather than an optional extra. This is where lifecycle data management comes in. MinIO provides a unique set of capabilities to protect data within and across clouds - both public and private.

MinIO's enterprise data lifecycle management tools, including version control, object locking, and various derivative components, can satisfy many use cases.

1.16 Automated data management interface

Data is a business' most critical asset, so it must be easily and securely available across the organization in order to maximize its value to everyone. Therefore, businesses must adopt a series of data interface methods according to the needs of the audience. MinIO provides a set of options to cover every role in the data-driven enterprise, such as Graphical User Interface (GUI), Command Line Interface (CLI) and Application Programming Interface (API). MinIO's data management interfaces operate interchangeably to provide granular, high-performance, and scalable object storage management.

1.17 Monitoring

Metrics and logging are critical when tracking the health and performance of any system. MinIO provides complete visibility into the cluster with detailed storage performance monitoring, metrics, and logging of every operation. The result is a powerful, transparent, and high-performance answer to object storage monitoring, alerting, and observability.

1.18 AWS S3 standard compliant

S3 compatibility is a hard requirement for cloud native applications. MinIO is uncompromising in its adherence to the API, and with tens of thousands of users (commercial and community), MinIO's S3 implementation is the most widely tested and implemented AWS S3 alternative in the world.

As one of the early adopters of the S3 API (V2 and V4) and one of the only S3-focused storage companies, MinIO's large community ensures that no other AWS alternative is more compatible. MinIO is also one of the few companies that supports S3 Select.

1.19 Performance Benchmarks

MinIO pioneered high-performance object storage and remains the fastest on the market, with GET/PUT throughput of 325 and 165 GiB/sec, respectively, on 32 nodes of NVMe drives. These speeds enable any workload to run on MinIO, from advanced analytics to AI/ML.

1.20 Simple installation, deployment and maintenance

Minimalism is the guiding design principle of MinIO. Simplicity reduces the chance for error, increases uptime, provides reliability, and serves as the foundation for hybrid and multicloud installation performance. MinIO can be installed and configured in minutes. The number of configuration options and changes is kept to a minimum, which results in close to zero system administration tasks and fewer failure paths. MinIO is upgraded with a single command, and the upgrade completes without interrupting service, which greatly reduces total cost of ownership and operation and maintenance costs.

1.21 The fastest growing object storage system in the world

MinIO production deployments cover the world. It's growing by the day as the world's most used and downloaded object storage service - powered by an extraordinary community of contributors and evangelists.

1.22 Comprehensive development documentation

As an open source project written in Go, MinIO offers a high-performance distributed storage solution and very complete official documentation.

Official website document address: https://docs.min.io/cn/

1.23 Multilingual support

At present, MinIO supports the mainstream development languages on the market and can be quickly integrated and used through its SDKs.

1.24 Management interface support

After the MinIO service is installed, you can log in directly through a browser to manage folders and files, which is very convenient.

Two architecture design

Designed to be cloud-native, MinIO can run as a lightweight container managed by an external orchestration service such as Kubernetes. The entire server is a static binary of about 40 MB that makes efficient use of CPU and memory resources even under high load. The result is that you can co-host a large number of tenants on shared hardware.

MinIO runs on commodity servers with local drives (JBOD/JBOF). All servers in the cluster are functionally identical (a fully symmetrical architecture). There are no namenodes or metadata servers.

MinIO writes data and metadata together as objects, eliminating the need for a metadata database. Furthermore, MinIO performs all functions (erasure coding, bitrot checks, encryption) as inline, strictly consistent operations. The result is that MinIO is incredibly flexible.

Each MinIO cluster is a collection of distributed MinIO servers, one process per node. MinIO runs in user space as a single process and uses lightweight coroutines to achieve high concurrency. Drives are grouped into erasure sets (by default, 16 drives per set), and objects are placed on those sets using a deterministic hashing algorithm.
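Deterministic placement means that the same object name always maps to the same drive set, so no lookup table is needed. A minimal sketch of the idea follows. It assumes SHA-256 from the standard library as a stand-in hash and a simple modulo over the number of sets; it is not MinIO's actual placement algorithm, and `pick_set` is a hypothetical helper.

```python
import hashlib


def pick_set(object_name: str, set_count: int) -> int:
    """Map an object name to a set index in [0, set_count) deterministically."""
    if set_count < 1:
        raise ValueError("set_count must be at least 1")
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an integer, then reduce it.
    return int.from_bytes(digest[:8], "big") % set_count


# The same name always lands on the same set, on any node, with no shared state.
first = pick_set("photos/cat.jpg", 4)
second = pick_set("photos/cat.jpg", 4)
print(first == second)
```

Because every node can compute the same answer independently, this style of placement fits the symmetric, metadata-server-free architecture described earlier.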

MinIO is designed for large-scale, multi-datacenter cloud storage services. Each tenant runs its own MinIO cluster that is completely isolated from other tenants, which protects each tenant from disruptions caused by upgrades, updates, and security incidents elsewhere. Each tenant scales independently by federating clusters across geographic regions.

Three hardware support

High-performance software needs high-performance hardware to deliver its best.

Although MinIO is hardware agnostic, it has been tested on industry-standard, widely available servers that match the capabilities of its software. The MinIO team works continuously to optimize the software for the latest hardware, so that it can greatly outperform overpriced, outdated devices.

Four basic concepts

Object: The basic unit stored in MinIO, such as files, byte streams, videos, audio, logs, images, etc.

Bucket: The logical space used to store Objects. The data in each bucket is isolated from other buckets. For the client, it is equivalent to a top-level folder for storing files.

Drive: The disk that stores data, which is passed in as a parameter when MinIO starts. All object data in MinIO is stored on Drives.

Set: A collection of Drives. A distributed deployment automatically divides the Drives into one or more Sets according to the cluster size, and the Drives in each Set are distributed across different locations. An object is stored on one Set. (For example: {1…64} is divided into 4 sets each of size 16.)
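The example in the Set definition, 64 drives divided into 4 sets of 16, can be reproduced with a few lines. This is only a sketch of the arithmetic, assuming consecutive drives are grouped together; how MinIO actually assigns drives to sets across nodes is not covered here, and `split_into_sets` is a hypothetical helper.

```python
def split_into_sets(drive_count: int, set_size: int = 16) -> list:
    """Group drives numbered 1..drive_count into consecutive sets of set_size."""
    if drive_count % set_size != 0:
        raise ValueError("this sketch assumes drive_count is a multiple of set_size")
    drives = list(range(1, drive_count + 1))
    return [drives[i:i + set_size] for i in range(0, drive_count, set_size)]


sets = split_into_sets(64)
print(len(sets), [len(s) for s in sets])  # 4 [16, 16, 16, 16]
```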

Five application scenarios

Storage requirements for massive unstructured data on the Internet

  • E-commerce websites: massive product images
  • Video websites: massive video files
  • Network disks: massive files
  • Social networking sites: massive images
  • Logging systems: audit logs
  • Image registries: massive Docker images


Origin blog.csdn.net/joshua317/article/details/128259365