How to select ECS instance specifications and verify capacity based on actual business needs

With the rapid development of cloud-native technology and the falling prices of cloud products, more and more developers and technology enthusiasts use basic products such as OSS object storage and ECS cloud servers to build their own websites, network disks, and other applications. For enterprises, however, ECS cloud servers come in many types and specifications. The key questions are how to understand the main characteristics of each instance specification, and how to keep the business running stably in scenarios such as insufficient inventory, products going offline, and the use of preemptible instances.

This article explains how to select an ECS cloud server and how to perform capacity planning with PTS. It first introduces three ECS selection methods.

01 Select by instance specification parameters

Before launching an ECS instance, we choose a configuration based on factors such as performance, price, and workload. ECS groups its instance types into instance type families according to their configuration parameters. In practice, the instance specification with the most appropriate parameters can be found in the following two ways.

  • Instance type family [1] : Refer to the documentation for product details of the instance type family.
  • DescribeInstanceTypes [2] : Call the ECS API interface to obtain the latest performance specification parameters.
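As a rough illustration of the second approach, the sketch below filters a DescribeInstanceTypes-style response by minimum vCPU count and memory size. The field names (InstanceTypes, InstanceType, InstanceTypeId, CpuCoreCount, MemorySize) are our understanding of the response shape and should be checked against the API reference [2]. The sample payload is hand-written for illustration and was not fetched from the API, and find_instance_types is a hypothetical helper.

```python
# Hand-written sample shaped like a DescribeInstanceTypes response.
# Assumption: field names follow the API reference; values are illustrative.
SAMPLE_RESPONSE = {
    "InstanceTypes": {
        "InstanceType": [
            {"InstanceTypeId": "ecs.c6.xlarge", "CpuCoreCount": 4, "MemorySize": 8.0},
            {"InstanceTypeId": "ecs.g6.xlarge", "CpuCoreCount": 4, "MemorySize": 16.0},
            {"InstanceTypeId": "ecs.g6.2xlarge", "CpuCoreCount": 8, "MemorySize": 32.0},
            {"InstanceTypeId": "ecs.r6.xlarge", "CpuCoreCount": 4, "MemorySize": 32.0},
        ]
    }
}


def find_instance_types(response, min_vcpus, min_memory_gib):
    """Return IDs of instance types that meet both minimums, smallest first."""
    candidates = [
        t for t in response["InstanceTypes"]["InstanceType"]
        if t["CpuCoreCount"] >= min_vcpus and t["MemorySize"] >= min_memory_gib
    ]
    # Prefer the smallest specification that still satisfies the requirements.
    candidates.sort(key=lambda t: (t["CpuCoreCount"], t["MemorySize"]))
    return [t["InstanceTypeId"] for t in candidates]


if __name__ == "__main__":
    print(find_instance_types(SAMPLE_RESPONSE, min_vcpus=4, min_memory_gib=16))
```

In real use, the response would come from an actual API call, and the filter could be extended with other parameters returned by the API.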

This method requires understanding how instance types are named, so that the required instance type can be found quickly. An instance type family name has the format ecs.<type family>, and an instance type name has the format ecs.<type family>.<nx>large. The naming rules are as follows:

  • ecs: the product code of the ECS cloud server.
  • <type family>: composed of a type family body plus a type family suffix.
  • x86 computing family and ARM computing family

  • Heterogeneous computing type families, elastic bare metal servers, and supercomputing cluster (SCC) instance type families: these generally use their own names, made up of a mix of lowercase letters and digits.

  • <nx>large: indicates the number of vCPU cores. The larger n is in <nx>, the more vCPU cores the instance has. For example, xlarge means 4 cores, 2xlarge means 8 cores, 3xlarge means 12 cores, and so on.
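The naming rule above can be turned into a small parser. The following is a minimal sketch, not an official tool: parse_instance_type is a hypothetical helper, it only handles names that end in large, and it assumes that a plain large suffix means 2 vCPU cores, which this article does not state. Families that use their own naming (heterogeneous computing, bare metal, SCC) may not follow this rule.

```python
import re

# ecs.<type family>.<nx>large, for example ecs.g6e.4xlarge
_PATTERN = re.compile(r"ecs\.([a-z0-9-]+)\.([1-9]\d*)?(x?)large")


def parse_instance_type(name):
    """Return (type_family, vcpus) for names that follow the <nx>large rule."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not an ecs.<type family>.<nx>large name: {name!r}")
    family, n, x = match.groups()
    if x:
        # xlarge = 4 cores, 2xlarge = 8 cores, 3xlarge = 12 cores, and so on.
        vcpus = 4 * (int(n) if n else 1)
    elif n:
        # Something like "4large" does not match the naming rule.
        raise ValueError(f"unexpected size suffix in {name!r}")
    else:
        # Assumption: plain "large" means 2 vCPU cores.
        vcpus = 2
    return family, vcpus


if __name__ == "__main__":
    for example in ("ecs.g6.xlarge", "ecs.g6e.4xlarge", "ecs.d2s.10xlarge"):
        print(example, parse_instance_type(example))
```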

02 Select by self-built services and applications

When an enterprise moves its services to the cloud, it not only purchases various cloud products but also builds its own services or applications to meet actual business needs. To make selection easier, we have summarized the ECS instance specifications that correspond to common self-built services and applications. You can choose the corresponding instance type family based on the applications your enterprise uses and the selection principles.

03 Select by application scenario

Beyond the two methods based on direct parameters and on self-built applications and services, selection can also start from the application scenario. In actual production, many business scenarios cannot be satisfied by a single service or application, and the additional requirements of those scenarios can be relatively complex.

General applications, game services, and live video scenarios

General-purpose scenarios are CPU-intensive and need a relatively balanced ratio of processor to memory resources. The ratio of CPU to memory is usually 1:2. High-efficiency cloud disks are used for system disks, and SSD cloud disks or ESSD cloud disks are used for data disks. If the business needs stronger network performance, for example for live-video bullet comments, you can choose a higher specification in the same series to improve the network packet sending and receiving capability (PPS).

Hadoop, Spark, and Kafka big data scenarios

Big data scenarios such as Hadoop, Spark, and Kafka involve different kinds of nodes, so the performance requirements are more complex. The performance of each node must be balanced, including computing, storage throughput, and network performance. Management nodes and computing nodes can be treated like general-purpose scenarios, with the instance type chosen according to the size of the cluster. For example, ecs.g6e.4xlarge can be used for clusters of fewer than 100 nodes, and ecs.g6e.8xlarge for clusters of more than 100 nodes. Data nodes require high storage throughput, high network throughput, and a balanced processor-to-memory ratio, so they can use the big data d-series type families. For example, ecs.d2s.5xlarge can be selected for MapReduce/Hive, and ecs.d2s.10xlarge for Spark/MLlib.
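The examples in the paragraph above can be written down as a small lookup, which is handy when cluster definitions are generated by scripts. This is only a sketch of those examples: recommend_instance_type and the role and workload labels are hypothetical, and because the text does not say what to use for exactly 100 nodes, the sketch assumes the larger type in that case.

```python
# Data node examples from the text: workload -> instance type.
DATA_NODE_TYPES = {
    "mapreduce": "ecs.d2s.5xlarge",
    "hive": "ecs.d2s.5xlarge",
    "spark": "ecs.d2s.10xlarge",
    "mllib": "ecs.d2s.10xlarge",
}


def recommend_instance_type(role, cluster_nodes=0, workload=""):
    """Map a node role to the example instance type given in the article."""
    if role in ("management", "compute"):
        # Assumption: a cluster of exactly 100 nodes gets the larger type.
        return "ecs.g6e.4xlarge" if cluster_nodes < 100 else "ecs.g6e.8xlarge"
    if role == "data":
        try:
            return DATA_NODE_TYPES[workload.lower()]
        except KeyError:
            raise ValueError(f"no example for workload {workload!r}") from None
    raise ValueError(f"unknown node role {role!r}")


if __name__ == "__main__":
    print(recommend_instance_type("management", cluster_nodes=60))
    print(recommend_instance_type("data", workload="Spark"))
```

These are starting points only. The final choice should still be verified against the real workload.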

Database, cache, and search scenarios

In this type of scenario, the processor-to-memory ratio of the instance type generally needs to be greater than 1:4, that is, more memory per processor core. Some software is also sensitive to storage I/O read and write capability and to latency, so you can choose a type family that offers better cost performance per unit of memory.

Take databases as an example. In the traditional approach, the business system connects directly to the OLTP database, and data redundancy is mostly achieved through RAID disk arrays. With ECS cloud servers, both light-load and heavy-load databases can be deployed flexibly.

  • Light-load database: enterprise-level instance specifications used together with cloud disks are more cost-effective.
  • Heavy-load database: high storage IOPS and low read/write latency are required. The local SSD-type i-series instance type families (with high-I/O local NVMe SSD disks) are recommended to meet the requirements of large-scale heavy-load databases.

Deep learning and image processing scenarios

In scenarios such as deep learning and image processing, applications require high-performance GPU accelerators. The following recommendations are made for the ratio of GPU to CPU.

  • Deep learning training: The ratio of GPU to CPU is recommended to be between 1:8 and 1:12.
  • General deep learning: The ratio of GPU to CPU is recommended to be between 1:4 and 1:48.
  • Image recognition inference: The ratio of GPU to CPU is recommended to be between 1:4 and 1:12.
  • Speech recognition and synthesis inference: The ratio of GPU to CPU is recommended to be between 1:16 and 1:48.
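The recommended ranges above are easy to check programmatically when comparing candidate GPU instance types. The sketch below is an illustration under stated assumptions: ratio_is_recommended and the scenario labels are hypothetical names, and the CPU side of each ratio is interpreted as the number of vCPU cores per GPU.

```python
# (min, max) CPU cores per GPU, taken from the recommendations above.
RECOMMENDED_CPUS_PER_GPU = {
    "deep learning training": (8, 12),
    "general deep learning": (4, 48),
    "image recognition inference": (4, 12),
    "speech recognition and synthesis inference": (16, 48),
}


def ratio_is_recommended(scenario, gpus, vcpus):
    """Return True if the GPU:CPU ratio lies within the recommended range."""
    if gpus <= 0:
        raise ValueError("gpus must be positive")
    low, high = RECOMMENDED_CPUS_PER_GPU[scenario]
    # Compare with multiplication to avoid floating-point division.
    return low * gpus <= vcpus <= high * gpus


if __name__ == "__main__":
    # A candidate with 1 GPU and 8 vCPU cores, checked for training.
    print(ratio_is_recommended("deep learning training", gpus=1, vcpus=8))
```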

In addition to the above scenarios, we have summarized common scenarios and corresponding specification examples for heterogeneous computing and general computing for your reference.

04 Specification Verification and Capacity Planning

Completing the selection and starting to use the ECS cloud server instance is only the beginning for the actual business. As long as resources are not a bottleneck, TPS and CPU utilization grow roughly linearly with concurrency. Once resource utilization is saturated, TPS levels off as business concurrency keeps increasing, and CPU utilization starts to soar. When resource utilization is saturated and concurrency exceeds the limit capacity point, TPS and CPU utilization fluctuate, capacity may even start to avalanche, and the service becomes unavailable.
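One simple way to locate the saturation point described above is to look for the concurrency level after which TPS stops growing noticeably. The following is a heuristic sketch and not the method PTS uses: find_saturation_point, the 5% gain threshold, and the sample numbers are all hypothetical and chosen only for illustration.

```python
def find_saturation_point(samples, min_gain=0.05):
    """Return the concurrency after which TPS no longer grows by min_gain.

    samples: list of (concurrency, tps) pairs in ascending concurrency order.
    Returns None if TPS keeps growing across all measured levels.
    """
    for (prev_conc, prev_tps), (_, tps) in zip(samples, samples[1:]):
        if prev_tps <= 0:
            continue  # cannot compute a relative gain from zero throughput
        gain = (tps - prev_tps) / prev_tps
        if gain < min_gain:
            return prev_conc
    return None


if __name__ == "__main__":
    # Hypothetical measurements: TPS grows, flattens, then collapses.
    measured = [(10, 200), (20, 400), (30, 590), (40, 610), (50, 605), (60, 420)]
    print(find_saturation_point(measured))
```

With the hypothetical data above, TPS gains less than 5% when concurrency goes from 30 to 40, so 30 is reported as the saturation point. Real measurements are noisier, so in practice the threshold should be tuned and the result checked against CPU utilization.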

Therefore, once the appropriate specifications have been selected, service stability, resource utilization, and business throughput become the new points of focus. Different team roles focus on different things:

  • The business team cares about whether the capacity can stably support the business volume during important business activities.
  • The R&D team cares about whether manual scaling, automatic scaling, and release/rollback are free from resource limits.
  • The operation and maintenance team cares about resource utilization, departmental resource watermarks, resource usage, and cost.

However, the traditional way of configuring resource specifications based on manual experience has limitations. To keep online business stable, a considerable amount of spare resources is usually reserved to cope with load fluctuations, which wastes a large amount of resources.

Therefore, it is necessary to use the stress testing tool Performance Testing Service (PTS) [3] to verify whether the specification selected for the ECS cloud server is appropriate. Based on actual business needs, capacity analysis and full-link tracing are used to detect the optimal safe capacity point, the limit capacity point, and the damage capacity point of the system on the cloud, and rate limiting and degradation are used to protect the system, so as to achieve the best balance between system capacity and resource costs on the cloud.

Stress testing strategies for common business scenarios

When simulating the business pressure and the combination of application scenarios of the production environment, or of a production environment with diverted traffic, we choose among stress testing strategies such as spike, concurrency, load, stability, and limit testing to evaluate whether the system's indicators meet the requirements for business volume, availability, stability, and so on. Here, we summarize common business scenarios, the corresponding stress testing strategies, and their advantages.

Best Practice: Create a stress test task with PTS

To accompany this best practice, we provide a hands-on scenario based on Alibaba Cloud's free trials of ECS and PTS, which you can work through while reading.

URL: https://developer.aliyun.com/adc/scenario/f37fb4d355684e189b7d87c9b6c8d10b

(1) Preparation before the start of the experiment

  • If your Alibaba Cloud main account is eligible for the free trial, it is recommended that you activate the free trials of PTS and ECS cloud servers. PTS provides 5,000 VUM of free stress testing resources for the first month, and usage beyond that is billed as the postpaid expert version. For billing details, see Performance Test PTS Billing Rules [4].
  • If your Alibaba Cloud account is eligible for only some of the free trial products, claim the products you are eligible for and then start the experiment. Products that are not covered by the free trial will be created with your own account resources and will incur fees, so keep an eye on your account charges. To avoid wasting resources and incurring charges, configure the parameters strictly as described in this article, and delete or disable the stress test task promptly after the experiment is completed.

1. Before the experiment starts, choose to activate the free trial.

2. Activate the PTS free trial: at the bottom of the lab page, select Performance Test PTS and click Try Now. On the Performance Test PTS panel, accept the service agreement and click Try Now. The system then jumps to the submission success page, which indicates that the trial application succeeded and that you can use the PTS service for free.

Note: Activating the PTS expert version does not incur any fees. In actual use, PTS (pay-as-you-go) provides a free stress testing quota of 5,000 VUM, and usage beyond the quota is billed as the postpaid expert version. For billing details, see the PTS billing rules.

3. Activate the ECS free trial: at the bottom of the lab page, select Cloud Server ECS and click Try Now. On the ECS panel, complete the parameter configuration according to the following instructions, accept the service agreement, and click Try Now. If a new page pops up, you can ignore it for now. This tutorial uses the following configuration as an example. In actual operation, choose the configuration according to your business volume and needs.

4. Go to the ECS console [5]. In the left navigation bar, select Instances and Images > Instances. In the upper left corner of the top menu bar, select the region of the trial instance (East China 1 (Hangzhou) in this tutorial). Then set the instance login password: find the trial instance you created, choose Instance properties > Reset instance password in the Operation column on the right, and follow the prompts to set the login password of the ECS instance. Next, click the ID of the trial instance, select the Security Group tab, click Configure Rules in the Operation column of the security group, and add the ports that need to be allowed in the inbound direction. In this tutorial, ports 80, 443, 22, 3389, and 8080 are allowed in the security group's inbound direction.

Note: The instance password can be reset about 3 to 5 minutes after the instance is created. If the reset fails, wait a while and try again.

(2) One-click configuration and deployment of ECS applications

After the resources are ready, you can quickly complete resource configuration or application building through one-click configuration. One-click configuration is implemented on top of Alibaba Cloud's Resource Orchestration Service (ROS) and is intended to help developers experience automatic resource configuration through IaC. The template does the following:

  • Create a security group for the ECS instance.
  • Deploy the LAMP environment.
  • Based on the LAMP environment, use the PbootCMS source code to build a website.

1. Open the one-click configuration template link [6] and go to the ROS console. The system will automatically open the panel for creating resource stacks using new resources, and display the detailed information of the YAML file in the template content area.

2. On the template selection page, the ROS console defaults to the region you used the last time you accessed the console. In this experiment, the region should be North China 2 (Beijing). Keep all options on the page unchanged, and click Next to go to the configuration template parameters page.

3. On the configuration template parameters page, modify the resource stack name, select the ECS instance you created when applying for the free trial, and set the MySQL database password to change the default password of the database root user. After filling in all required information and confirming it, click Create to start the one-click configuration.

Note:
1. Installing Apache, MySQL, and PHP requires downloading them over the Internet, so the configuration time may vary with network stability. While waiting, you can refresh the resource stack information page to check whether the configuration is complete, or click the Events tab to check the detailed progress.
2. If you run the one-click configuration template of this tutorial repeatedly on the same ECS instance, make sure the MySQL database password is exactly the same as the password set the first time the template was run. Otherwise, the result of the one-click configuration will be unusable.

4. On the resource stack information page, wait about 8 to 10 minutes. When the status shows that the creation succeeded, the one-click configuration is complete.

5. On the resource stack information page, click the Output tab.

6. On the Output tab, click the value of WebUrl (http://<ECS public IP address>/admin.php).

7. The PbootCMS login page appears, which indicates that the one-click configuration succeeded. In PbootCMS, you can customize the content of the website according to the needs of the company, including global configuration, basic content, article content, extended content, member center, and so on. Subsequent console operations in this step are optional and can be skipped.

(3) Activate PTS, create a stress test scenario, and view the report

1. Go to the performance test PTS console [7] .

2. In the left navigation pane, choose Performance Test > Create Scenario .

3. On the Create Scenario page, click  PTS Pressure Test .

4. On the Create PTS Scenario page, set the scenario name. Then, on the Scenario Configuration tab, enter the stress test API name, such as demo, and click the icon. Enter http://<ECS public IP address>:80 in the stress test URL field.

Note: The public IP address of ECS can be viewed on the instance page of the cloud server management console [8] .

5. On the pressure configuration tab, configure the stress test parameters: select auto increment as the increment mode, enter 50 for the maximum concurrency, 10 for the increment percentage, 1 for the single-level duration, and 5 for the total stress test duration. After confirming the parameters, click Save to start the stress test.

Note: Please configure the parameters in strict accordance with the guidelines to avoid unexpected charges due to exceeding the free trial quota.

6. In the prompt dialog box , confirm that the estimated consumption of the task does not exceed the free quota. After confirming , click OK to start the pressure test .

7. Wait about 3 minutes, and you can view the real-time stress test data of the current application on the stress test page, including overview data such as success rate, RT, and TPS.

8. After the stress test is completed, the page automatically jumps to the edit scenario page. Click the stress test report.

9. On the Stress Test Report tab, find your stress test report and click View in the Operation column on the right. The report shows the detailed results of the stress test.
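The parameters in step 5 can be sanity-checked against the free quota before starting a test. The sketch below assumes that VUM consumption equals the number of concurrent virtual users multiplied by the test duration in minutes, and that the durations entered in step 5 are in minutes. Treat both as assumptions and rely on the PTS billing rules [4] and the estimate shown in the confirmation dialog for the actual figure. max_vum and fits_free_quota are hypothetical helper names.

```python
FREE_QUOTA_VUM = 5000  # free stress testing quota mentioned in this article


def max_vum(max_concurrency, total_minutes):
    """Upper bound on VUM: every minute runs at maximum concurrency."""
    # Assumption: VUM = concurrent virtual users x minutes.
    return max_concurrency * total_minutes


def fits_free_quota(max_concurrency, total_minutes, used_vum=0):
    """Return True if the worst-case consumption stays within the free quota."""
    return used_vum + max_vum(max_concurrency, total_minutes) <= FREE_QUOTA_VUM


if __name__ == "__main__":
    # Step 5 of this tutorial: maximum concurrency 50, total duration 5.
    print(max_vum(50, 5), fits_free_quota(50, 5))
```

Because the auto increment mode ramps concurrency up gradually, the real consumption should be lower than this upper bound, which makes the check conservative.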

Capacity planning and performance bottleneck analysis

After obtaining the test result data, the next step is to analyze the bottlenecks in the system in preparation for tuning. The performance bottlenecks of a system are mainly found in security protection, load balancing, back-end applications, middleware, databases, the operating system, and hardware specifications. For specific bottlenecks and tuning details, look out for the "Capacity Planning and Tuning" series of articles.

Related Links:

[1] Instance type family

https://help.aliyun.com/document_detail/25378.htm#concept-sx4-lxv-tdb

[2] DescribeInstanceTypes

https://help.aliyun.com/document_detail/25620.htm#doc-api-Ecs-DescribeInstanceTypes

[3] Performance test PTS

https://www.aliyun.com/product/pts

[4] Performance test PTS billing rules

https://help.aliyun.com/document_detail/433167.html?spm=a2c4g.29269.0.0.67fa7f32p02i2O

[5] ECS Console

https://account.aliyun.com/login/login.htm?oauth_callback=https%3A%2F%2Fecs.console.aliyun.com%2F

[6] One-click configuration template link

https://account.aliyun.com/login/login.htm?oauth_callback=https%3A%2F%2Fros.console.aliyun.com%2Fregion%2Fstacks%2Fcreate%3Fspm%3Da2c4g.611918.0.0.3aec628amGQK9n%26templateUrl%3Dhttps%3A%2F%2Fstatic-aliyun-doc.oss-cn-hangzhou.aliyuncs.com%2Ffile-manage-files%2Fzh-CN%2F20230320%2Fonyv%2F%25E5%25BF%25AB%25E9%2580%259F%25E6%2590%25AD%25E5%25BB%25BA%25E7%25BD%2591%25E7%25AB%2599.yml&lang=zh

[7] Performance test PTS console

https://account.aliyun.com/login/login.htm?oauth_callback=https%3A%2F%2Fpts.console.aliyun.com%2F&lang=zh

[8] Cloud server management console

https://account.aliyun.com/login/login.htm?oauth_callback=https%3A%2F%2Fecs.console.aliyun.com%2Fserver%2Fregion%2Fcn-beijing

Author: Zhao Jiajia


This article is the original content of Alibaba Cloud and may not be reproduced without permission.


Origin blog.csdn.net/yunqiinsight/article/details/131552469