Game Thinking 26: Game Server Stress Test Documentation (Added linux-related commands, to be continued on 02/10)


1. Focus of stress testing

①Traffic
②Memory
③ Stress tests of the main functions, such as simultaneous registration, maximum online players, combat, map movement, and data access.
④ Two kinds of macro pressure data must stay unchanged:
a. The pressure ratio of each interface stays unchanged. First, from a game of the same type or from this game's internal testing stage, insert log instrumentation to collect the call ratio of each interface. Then convert the interface ratios into scene ratios; for example, at the same moment 2% of players are completing login, 15% are fighting, 20% are pulling the friends list, and 10% are gambling (an example of a mobile game scene).
b. The average number of operations per player per minute stays unchanged. This average operating frequency is also collected during the internal testing phase.
Therefore, the goal of the stress test becomes simulating pressure that matches the data in a and b.
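
To make points a and b concrete, here is a minimal sketch of how a robot could pick its next scene according to the collected ratios and pace itself to the collected operating frequency. The four ratios come from the example above; the "idle" filler share, the `OPS_PER_MINUTE` value, and all function names are assumptions for illustration only.

```python
import random
from collections import Counter

# Scene ratios taken from the example above; "idle" is an assumed filler
# for the share of players that the example does not account for.
SCENE_RATIOS = {
    "login": 0.02,
    "fight": 0.15,
    "friend_list": 0.20,
    "gamble": 0.10,
    "idle": 0.53,
}

# Assumed value: average operations per player per minute (measure your own).
OPS_PER_MINUTE = 12


def pick_scene(rng):
    """Pick one scene name with probability equal to its ratio."""
    names = list(SCENE_RATIOS)
    weights = list(SCENE_RATIOS.values())
    return rng.choices(names, weights=weights, k=1)[0]


def interval_between_ops(ops_per_minute):
    """Seconds a robot should wait between operations to hit the target rate."""
    return 60.0 / ops_per_minute


def simulate(num_ops, seed=0):
    """Draw num_ops scenes and return how often each one was chosen."""
    rng = random.Random(seed)
    return Counter(pick_scene(rng) for _ in range(num_ops))


if __name__ == "__main__":
    total = 100_000
    counts = simulate(total)
    for name, ratio in SCENE_RATIOS.items():
        print(f"{name:12s} target={ratio:.2%} actual={counts[name] / total:.2%}")
    print(f"wait between ops: {interval_between_ops(OPS_PER_MINUTE):.1f}s")
```

With enough draws, the actual share of each scene converges on its target ratio, which is what keeps the simulated pressure comparable to the collected data.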

⑤Server configuration information

  • Static information
    1) Number of CPU cores
    2) Memory
    3) Operating system
    4) Bandwidth
    5) Network card
    6) Hard disk
  • Dynamic information
    1) CPU utilization monitoring
    2) Memory monitoring
    3) Intranet bandwidth monitoring (intranet outgoing bandwidth, intranet incoming bandwidth, intranet outgoing packets, intranet incoming packets, number of TCP connections)


⑥Other indicators
●Throughput: the number of transactions processed within a fixed time interval, usually the number of requests processed in 1 second. Unit: transactions per second (TPS).

Average throughput: the mean throughput over a period of time. It cannot reflect instantaneous changes in throughput.

Peak throughput: the maximum throughput over a period of time. It is one of the important indicators for evaluating system capacity.

Minimum throughput: the minimum throughput over a period of time. If the minimum is close to 0, the system has stalled ("stuck") at some point.

70% throughput concentration interval: take the 15th and 85th percentile throughput values as boundaries; 70% of the samples fall between them. The narrower the interval, the more stable the throughput.

Response time: the processing time of one transaction, usually the interval from sending a request, through server processing, to receiving the response data. Unit: milliseconds.

Average response time: the mean response time over a period of time. It cannot reflect fluctuations in response time.

Median response time: the 50th percentile response time over a period of time. Half of the server's response times are lower than this value and half are higher.

90th percentile response time: 90% of transaction response times in the period are shorter than this value. It reflects the overall response speed, with 10% of requests slower than this value. It is one of the important indicators for evaluating system capacity.

Minimum response time: the smallest response time. It reflects the fastest processing the service can deliver.

Maximum response time: the largest response time. It reflects the slowest processing of the server.

CPU utilization: 1 minus the CPU idle rate. It indicates CPU usage and reflects how heavily system resources are used.
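
The indicators above can be computed from raw samples with a few lines of code. The following is a sketch only: it assumes one throughput sample per second and one latency sample per request, uses the nearest-rank definition of a percentile (other definitions give slightly different values), and all function names are illustrative.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def throughput_summary(tps_samples):
    """Summarize per-second throughput samples (transactions per second)."""
    return {
        "average": statistics.mean(tps_samples),
        "peak": max(tps_samples),
        "minimum": min(tps_samples),
        # 15th and 85th percentiles bound the 70% concentration interval.
        "band_70": (percentile(tps_samples, 15), percentile(tps_samples, 85)),
    }


def response_summary(latencies_ms):
    """Summarize per-request response times in milliseconds."""
    return {
        "average": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "p90": percentile(latencies_ms, 90),
        "minimum": min(latencies_ms),
        "maximum": max(latencies_ms),
    }


def cpu_utilization(idle_percent):
    """CPU utilization in percent, computed as 100 minus the idle percentage."""
    return 100.0 - idle_percent


if __name__ == "__main__":
    tps = [100, 120, 110, 0, 130, 115, 105, 125, 118, 112]  # made-up samples
    print(throughput_summary(tps))
    print(response_summary([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]))
```

In the made-up throughput samples, the single 0 drags the minimum to 0 while the average barely moves, which is why the minimum is listed as a separate indicator.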

2. Calculate the most time-consuming loading operation

1) Read data from the database, break the loading down by data type, and find the most time-consuming operations

2) View the percentage of CPU that changes with the number of online users

It can be seen that from 8:00 p.m. to 2:00 p.m. the next day, the CPU usage of each server stays within a fixed range.

3) View memory changes

This chart was also generated as the number of robots changed, from 8:00 p.m. to 2:00 p.m. the next day.
(Memory stays within a fixed range, indicating that the program has no memory leak.)
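
The "memory stays within a range" judgment can also be checked numerically instead of by eye. A minimal sketch, assuming memory is sampled at regular intervals in MB: fit a least-squares line through the samples and flag a leak when the slope exceeds a tolerance. The tolerance value and function names are assumptions, not part of the original test.

```python
def linear_slope(ys):
    """Least-squares slope of ys against the sample index 0..n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2.0
    mean_y = sum(ys) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(ys))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_leak(memory_mb, max_growth_mb_per_sample=1.0):
    """True when memory trends upward faster than the assumed tolerance."""
    return linear_slope(memory_mb) > max_growth_mb_per_sample


if __name__ == "__main__":
    stable = [500, 510, 495, 505, 500, 508, 497, 503]    # fluctuates in a band
    leaking = [500, 520, 541, 559, 580, 601, 620, 640]   # grows steadily
    print(looks_like_leak(stable), looks_like_leak(leaking))
```

A slope test only makes sense while the robot count is held steady; memory that grows because more robots logged in is not a leak.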

4) Remarks

Comparing specific memory and CPU percentages across games is meaningless, because their designs, data storage methods, and storage structures differ. The purpose of this test is to understand, for this game, the relationship between the number of online players and the server's memory and CPU usage, in preparation for launch, so that the maximum number of players carried by each server can be better controlled.

3. Design characteristics of MMORPG server for stress test

1) Common features of MMORPG

① More than 80% of development cost goes into ordinary logic processing, while more than 80% of the performance hotspots are in view-related modules.

  • For example:
    In "Dragon in the Sky", movement packets and skill packets account for more than 30% of CPU consumption. In "Tianya Mingyue Knife", whose combat is well done, skill logic alone consumes more than 50% during group battles. In an MMORPG that Tencent is developing, server-side pathfinding, voxel checks, complex AI defined by behavior trees, and segmented skill design make CPU consumption higher than in similar products. The statistics are as follows:
    1) Scene heartbeat: 75.5%
    2) Combat requests: 11.3%
    3) Movement requests: 3.8%
    4) Other: 6.6%
    5) Remaining client requests: 2.8%

2) Two driving forces behind the MMORPG background

  • ① Message-driven:
    This includes messages driven by the player's upstream protocol and messages from other servers. The main cost here comes from combat request packets and movement request packets; combat and movement account for 80% of the performance consumption of this part.

  • ② Timer:
    This includes the heartbeat logic of each major system and each OBJ. When carrying 5,000 online players, there are often as many as 100,000 monsters and NPCs. The main cost of the timer is therefore the scene heartbeat (AI, cooldown checks, scanning for enemies, and so on), which accounts for about 75% of total CPU time.

  • ③ These two parts constitute the gray area and account for up to 90% of the total. What they have in common is that they involve few cross-scene operations and little access to public data (such as mail and gangs). The remaining 10% consists of various UI requests.
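
The two driving forces can be pictured as one frame of a single-threaded scene loop: drain the pending messages, then run a heartbeat for every object in the scene. This is a toy sketch with hypothetical names and placeholder handlers, not the architecture of any of the games mentioned above.

```python
from collections import deque


class SceneServer:
    """Toy single-threaded scene loop with message-driven and timer-driven work."""

    def __init__(self, objects):
        self.inbox = deque()    # message-driven work: client and server packets
        self.objects = objects  # timer-driven work: every OBJ gets a heartbeat
        self.handled = {"message": 0, "heartbeat": 0}

    def post(self, message):
        """Queue an incoming message for the next tick."""
        self.inbox.append(message)

    def handle_message(self, message):
        # Placeholder for combat and movement request handling.
        self.handled["message"] += 1

    def heartbeat(self, obj):
        # Placeholder for AI, cooldown checks, and enemy scanning.
        self.handled["heartbeat"] += 1

    def tick(self):
        """One frame: drain pending messages, then run the scene heartbeat."""
        while self.inbox:
            self.handle_message(self.inbox.popleft())
        for obj in self.objects:
            self.heartbeat(obj)


if __name__ == "__main__":
    server = SceneServer(objects=list(range(20)))
    for packet in ("move", "fight", "move"):
        server.post(packet)
    server.tick()
    server.tick()
    print(server.handled)
```

Even in this toy, heartbeat calls scale with the number of objects times the number of ticks, while message calls scale only with what players send, which is consistent with the scene heartbeat dominating CPU time.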

3) Remarks of LuaJIT

① LuaJIT has a 2 GB memory limit (as of now, the latest official version has 64-bit support disabled by default, and using it in a release is not recommended). With too many threads, memory may run out.
② If Lua is not used heavily in movement, skill, and AI processing, LuaJIT is recommended to keep efficiency high.
③ If the multi-threaded logic relies heavily on Lua, native Lua is also a good choice for keeping multiple threads running.

4. Evaluation of various test methods

1) Live network data estimation

  • Background
    Live network data estimation uses part of the data gathered during stress testing to estimate what will happen when a large number of users arrive. In the figure, the horizontal axis represents throughput and the vertical axis represents CPU pressure.
  • Method flow
    The green part of the chart represents the current server pressure. After collecting data for a period of time, a curve can be fitted. Assuming the acceptable CPU load of a live server is estimated at 80%, curve fitting can infer the capacity of the live network and thus its maximum upper limit.
  • Advantages and disadvantages
    ① Advantages: the test results are easy to visualize.
    ② Disadvantages: game servers are usually relatively complex. This method is only suitable for fitting simple servers; for complex servers the data is not very accurate.
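
As an illustration of the fitting step, the sketch below fits a straight line through (throughput, CPU) samples by least squares and solves for the throughput at which CPU reaches 80%. The sample points are made up, and the straight-line model is an assumption: real servers often bend upward near saturation, which is exactly the inaccuracy noted in the disadvantages.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def throughput_at_cpu(slope, intercept, cpu_limit):
    """Throughput at which the fitted line reaches cpu_limit percent CPU."""
    return (cpu_limit - intercept) / slope


if __name__ == "__main__":
    # Hypothetical measurements: throughput in TPS, CPU usage in percent.
    tps = [1000, 2000, 3000, 4000]
    cpu = [15.0, 25.0, 35.0, 45.0]
    slope, intercept = fit_line(tps, cpu)
    print(f"estimated capacity at 80% CPU: {throughput_at_cpu(slope, intercept, 80.0):.0f} TPS")
```

Treat the extrapolated number as an upper bound to verify with a real stress run, not as a guaranteed capacity.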

2) Real-player stress testing through paid user acquisition

  • Method flow
    Real-player stress testing invites a certain number of real users to play the game in order to test the server. Its biggest feature is that user behavior is the most realistic, because users are not restricted in any way, just like real users after launch. The "closed test" stage in a game's launch process can be considered a kind of real-player stress test, and it helps developers find some performance problems.
  • Disadvantages
    ① The performance problems exposed are limited: many games that have passed beta testing still have problems at launch. One reason is that the number of beta testers is usually too small. Although hundreds or thousands of users are playing, the concurrency is not high enough to expose server-side performance problems;
    ② Not suitable for tuning: server performance testing must not only expose server problems but also support repeated regression and tuning afterwards, and real people cannot repeat their behavior exactly.

3) Interface test

  • Method flow
    Server interface testing is slightly different from interface testing in the traditional sense. When developers need to evaluate a set of servers but are short of time, they can select some representative functions and some high-risk functions to test, and judge the performance of the whole set of servers from this small sample.
  • Disadvantages
    The main problem is that the interfaces of the entire server cannot all be covered, so some minor problems are hard to avoid.

4) Record playback

  • Method flow
    "Recording" means capturing data packets to obtain the game's protocol, such as capturing the login packet when a user logs in to the game. "Playback" means resending these captured packets to the server. In theory, tools can amplify the protocol volume to the level needed for performance testing; for example, amplifying a previously recorded login protocol 10,000 times simulates 10,000 people logging in at the same time.

  • Disadvantages
    The protocol interaction of a game is very complicated. Simply amplifying the data packets does not put much pressure on the server. This method is better suited to testing services with fixed inputs and outputs.

5) Robot Test

  • Method flow
    Robot simulation testing strikes a balance among the methods above. By closely reproducing the behavior of real players and simulating high-concurrency scenarios, it achieves an effect similar to many people playing at the same time.
  • Advantages of robot simulation
    ① Concurrency is not limited: from 10,000 to 100,000, the pressure can be set freely;
    ② It can be executed repeatedly, which is convenient for performance tuning and regression;
    ③ It enables 7×24 continuous monitoring: after code is developed and submitted, the build is compiled automatically and a new test run starts, so performance can be monitored every day. For tuning, a test can be repeated exactly, so regression and tuning can continue indefinitely.
  • Disadvantages
    The robot simulation must be developed by dedicated staff, and it places relatively high demands on the tester's development and analysis skills.
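
A minimal sketch of the robot idea, using Python's asyncio so that many robots run concurrently in one process. The `fake_server` coroutine is a stand-in assumption: a real robot would speak the game's protocol over the network there. Robot counts and names are illustrative only.

```python
import asyncio
import time


async def fake_server(request):
    """Stand-in for a real game server call; replace with real network I/O."""
    await asyncio.sleep(0)  # yield to other robots, as a network wait would
    return f"ok:{request}"


async def robot(robot_id, num_ops, latencies):
    """One simulated player issuing num_ops requests in sequence."""
    for op in range(num_ops):
        start = time.perf_counter()
        await fake_server(f"{robot_id}:{op}")
        latencies.append((time.perf_counter() - start) * 1000.0)


async def run_robots(num_robots, num_ops):
    """Run all robots concurrently and return every measured latency in ms."""
    latencies = []
    await asyncio.gather(*(robot(i, num_ops, latencies) for i in range(num_robots)))
    return latencies


if __name__ == "__main__":
    result = asyncio.run(run_robots(num_robots=1000, num_ops=5))
    print(f"{len(result)} requests, max latency {max(result):.3f} ms")
```

Because the run is driven entirely by parameters, the same load can be repeated after every build, which is what makes this approach suitable for regression and tuning.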

5. Linux test-related commands

0) Preliminary commands and use of flame graphs

  • Flame graph introduction and links
    ①https://blog.csdn.net/gatieme/article/details/78885908
    ②https://zhuanlan.zhihu.com/p/85654612
  • free
  • ping
  • vmstat (Virtual Memory Statistics) reports virtual memory statistics
  • iostat (I/O statistics) reports CPU statistics and input/output statistics for the entire system, adapters, tty devices, disks, and CD-ROMs
  • dstat shows CPU usage, disk I/O, network packets, and paging. The output is colored and readable, and it is more detailed and intuitive than the output of vmstat and iostat.
  • pidstat is mainly used to monitor the system resources used by all or specified processes, such as CPU, memory, device I/O, task switching, and threads.
  • top: the summary area displays five aspects of system performance: load, process status, CPU usage, memory usage, and swap usage.
  • iotop is a real-time monitoring tool for process I/O on Linux, with an interface similar to the top command
  • htop is an interactive process viewer for Linux systems, a text-mode application (in a console or X terminal) that requires ncurses.
  • mpstat reports processor-related statistics, per CPU.
  • netstat displays statistics for the IP, TCP, UDP, and ICMP protocols and is generally used to check the network connection status of each port on the machine.
  • ps displays the status of current processes
  • strace traces system calls and signals. It tracks the system calls made and signals received during program execution, which helps analyze abnormal conditions encountered while a program or command runs.
  • ltrace is a library call tracer; it traces calls to library functions
  • uptime prints the total running time of the system and its load averages. The last three numbers in its output are the system load averages over the past 1, 5, and 15 minutes.
  • lsof (list open files) lists the files currently open on the system.
  • perf is a performance analysis tool that ships with the Linux kernel. Its advantage is its close integration with the kernel, so it is the first to support new kernel features. It can show hot functions and cache miss ratios, helping developers optimize program performance.
  • tcpdump
  • sar
  • blktrace

1)CPU

  • Question 1: How is the utilization information output by top calculated, and is it accurate?

  • Question 2: The ni column stands for nice; what CPU overhead does it report?

  • Question 3: wa stands for io wait, so is the CPU busy or idle during this time?

(1) Average load

(2) CPU context switching

(3) How to troubleshoot high CPU utilization

(4) Related tools

  • ① vmstat -S m 1
    r: the number of processes running or waiting to run on the CPU
    (a higher r indicates that the CPU is saturated)
  • ② pidstat 1
    Compared with top, it prints the CPU usage of each process as scrolling output. The %CPU here can exceed 100:
    400% equals 4 CPUs running at 100%
  • ③ mpstat -P ALL 1
    prints, for each CPU, how its time breaks down across states.
    If one CPU's user-mode percentage reaches 100%, a single thread has hit a bottleneck
  • ④ On-CPU flame graph
    (1) Instructions for use
    ① The vertical axis represents the depth of the call stack (the number of stack frames) and shows the call relationship between functions: the function below is the parent of the function above.
    ② The horizontal axis represents how often a function appears in the sampled stacks. The wider a box, the more likely it is the cause of the bottleneck.
    ③ Different types of flame graphs suit different optimization scenarios. For example, on-CPU flame graphs are suited to analyzing functions with high CPU usage, and off-CPU flame graphs are suited to solving blocking and lock contention problems.
    (2) Things that carry no meaning
    ① The horizontal order exists only for aggregation and has nothing to do with the dependency or call relationships between functions;
    ② The colors are only for telling boxes apart and have no special meaning in themselves
    (3) Usage steps
    ① Collect stacks: perf, SystemTap, sample-bt
View the top ten processes by CPU and memory usage
ps aux | head -1; ps aux | sort -k3nr | head -n 10  # top 10 processes by CPU usage
ps aux | head -1; ps aux | sort -k4nr | head -n 10  # top 10 processes by memory usage

1. Install perf: my server runs Ubuntu 16.04.6 LTS, so perf must be installed before use. The tool is provided by linux-tools-common, but the kernel-specific packages it depends on must be installed as well.

# Install on Ubuntu
root@master:~# apt install linux-tools-common linux-tools-4.4.0-142-generic linux-cloud-tools-4.4.0-142-generic -y

root@master:~# perf -v # show the perf version
perf version 4.4.167

# Install on CentOS
yum install perf

2. When the installation is complete, we can sample and analyze the process with the process ID 25633 with the highest CPU usage in the above figure. First, we collect the call stack information of the process:

root@master:~# sudo perf record -F 99 -p 25633 -g -- sleep 30
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.039 MB perf.data (120 samples) ]

3. Parameter description
This command can generate a large data file, depending on the process you sample and the CPU configuration. If a server has 16 CPUs and samples 99 times per second for 30 seconds, you get 16 × 99 × 30 = 47,520 call stacks, which can run to hundreds of thousands or even millions of lines. The generated data file is in the current directory and is named perf.data.

1) perf record records samples; the call stacks can later be ranked by percentage from high to low
2) -F 99 samples 99 times per second
3) -p 25633 is the process ID, that is, which process to analyze
4) -g records call stacks
5) sleep 30 keeps sampling for 30 seconds

You can display the percentage of each call stack directly on Linux:

root@master:~# sudo perf report -n --stdio


② Parse the data: stackcollapse-perf.pl (first use perf script to parse perf.data and generate perf.unfold)

# perf script -i /root/perf.data > /root/perf.unfold
perf script -i perf.data > perf.unfold

Use stackcollapse-perf.pl to fold the stacks in perf.unfold, the output parsed by perf script

# Get the FlameGraph scripts (they include stackcollapse-perf.pl)
git clone https://github.com/brendangregg/FlameGraph.git
# Copy stackcollapse-perf.pl and flamegraph.pl to the target machine.
chmod +x flamegraph.pl
chmod +x stackcollapse-perf.pl

# ./stackcollapse-perf.pl /root/perf.unfold > /root/perf.folded
./stackcollapse-perf.pl perf.unfold > perf.folded  # fold the stacks here

③ Generate the flame graph: flamegraph.pl

./flamegraph.pl /root/perf.folded > /root/perf.svg
./flamegraph.pl perf.folded > perf.svg

Open perf.svg in a browser.
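
To show what the folding step produces, here is a small sketch that mimics it on hand-written stacks. It is an illustration of the idea only, not a replacement for stackcollapse-perf.pl; the function names and sample stacks are made up. Each sampled stack becomes one line of semicolon-separated frames (root first) followed by a count, and the share of samples containing a frame corresponds to the width of its box.

```python
from collections import Counter


def fold_stacks(stacks):
    """Collapse sampled stacks (root frame first) into 'a;b;c count' lines."""
    counts = Counter(";".join(frames) for frames in stacks)
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]


def frame_share(stacks, name):
    """Fraction of samples whose stack contains the given frame (box width)."""
    hits = sum(1 for frames in stacks if name in frames)
    return hits / len(stacks)


if __name__ == "__main__":
    samples = [
        ["main", "update", "move"],
        ["main", "update", "move"],
        ["main", "update", "skill"],
        ["main", "net_recv"],
    ]
    for line in fold_stacks(samples):
        print(line)
    print(f"update appears in {frame_share(samples, 'update'):.0%} of samples")
```

Identical stacks are merged and counted, which is why the horizontal order in the final graph reflects aggregation rather than time.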

  • ⑤OFF-CPU flame graph

  • ⑥Memory flame graph

  • ⑦ Display the Lua stack (suggested by ChatGPT; I have not tried it)

  1. Install the perf tool
  2. Enter the following command in the terminal: perf record -g -p $(pgrep lua)
  3. Run the Lua program
  4. Enter the following command in the terminal: perf script | stackcollapse-perf.pl | flamegraph.pl > lua.svg
  5. Open the lua.svg file in a browser to view the flame graph.
    $(pgrep lua) finds the process ID of a running process named "lua"; it is passed to perf record so that perf records performance data for the running Lua program. If the process is written in C++ and only calls into Lua through an interface, perf shows only the C++ stack, not the Lua stack. Adding the --call-graph dwarf parameter to perf record is said to make the Lua stack visible in the flame graph. The modified command is as follows:
    perf record -g --call-graph dwarf -p $(pgrep lua)
    -g: enable call-graph recording, which records function call relationships
    --call-graph dwarf: use DWARF debugging information to generate the call graph
    -p $(pgrep lua): specify the process ID to record; here pgrep finds the ID of the process named "lua"

2) Memory

(1) Memory description: virtual memory and physical memory

(2) buffer and cache in memory

(3) Memory detection tool

(1) vmstat: virtual memory usage statistics
    • Examples of use
      1) Dynamically view memory changes
      2) View memory amount and usage
  • Parameter meanings

swpd   amount of memory swapped out
free   idle, available memory
buff   memory used as buffers
cache  memory used for the page cache
si     memory swapped in (paging)
so     memory swapped out (paging)
  • Note
    If so and si are always non-zero, a lot of paging is taking place. You can use top or ps to see the memory used by each process

  • Command

Usage:
 vmstat [options] [delay [count]]

Options:
 -a, --active           active/inactive memory
 -f, --forks            number of forks since boot
 -m, --slabs            slabinfo
 -n, --one-header       do not redisplay header
 -s, --stats            event counter statistics (output as a list)
 -d, --disk             disk statistics
 -D, --disk-sum         summarize disk statistics
 -p, --partition <dev>  partition specific statistics
 -S, --unit <char>      define display unit  ### unit: k (1000), K (1024), m (1000000), M (1048576) bytes
 -w, --wide             wide output
 -t, --timestamp        show timestamp

 -h, --help     display this help and exit
 -V, --version  output version information and exit
 
For more details see vmstat(8).
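
The note about si and so can be automated when vmstat output is captured to a file. The sketch below parses the column layout and reports sustained swapping; the sample text is made up, the function names are illustrative, and "swapping" is taken here to mean that every sample shows non-zero si or so, which is an assumption about how strict the check should be.

```python
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 812345  20480 404800    0    0     5    12  110  220  3  1 96  0  0
 2  0   1024 500000  20480 404800   40   64     5    12  110  220 30 10 55  5  0
"""


def parse_vmstat(text):
    """Parse vmstat text output into a list of dicts keyed by column name."""
    rows, header = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "si" in fields and "so" in fields:
            header = fields  # the line naming the columns
        elif header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def is_swapping(rows):
    """True when every sample shows non-zero swap-in or swap-out."""
    return bool(rows) and all(r["si"] > 0 or r["so"] > 0 for r in rows)


if __name__ == "__main__":
    parsed = parse_vmstat(SAMPLE)
    print(parsed[1]["si"], parsed[1]["so"], is_swapping(parsed))
```

Mapping values by header name rather than by position keeps the parser working if the captured output uses a different column set, as long as si and so are present.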
(2) PSI (Pressure Stall Information) is skipped; it requires Linux kernel 4.20 or later
(3) ps: view details of each process's memory usage
  • Recommended usage: ps aux
  • Note
    %MEM: main memory usage (physical memory, RSS) as a percentage of total memory
    RSS: resident set size (KB). It shows memory usage including shared memory such as system libraries, which may be mapped by dozens of processes, so part of the shared memory is counted repeatedly here
    VSZ: virtual memory size
root:# ps aux
USER   PID   %CPU  %MEM  VSZ    RSS   TTY  STAT  START  TIME  COMMAND
smmsp  3521  0.0   0.7   6556   1616  ?    Ss    20:40  0:00  sendmail: Queue runner@01:00:00 f
root   3532  0.0   0.2   2428   452   ?    Ss    20:40  0:00  gpm -m /dev/input/mice -t imps2
htt    3563  0.0   0.0   2956   196   ?    Ss    20:41  0:00  /usr/sbin/htt -retryonerror 0
htt    3564  0.0   1.7   29460  3704  ?    Sl    20:41  0:00  htt_server -nodaemon
root   3574  0.0   0.4   5236   992   ?    Ss    20:41  0:00  crond
xfs    3617  0.0   1.3   13572  2804  ?    Ss    20:41  0:00  xfs -droppriv -daemon
root   3627  0.0   0.2   3448   552   ?    SNs   20:41  0:00  anacron -s
root   3636  0.0   0.1   2304   420   ?    Ss    20:41  0:00  /usr/sbin/atd
dbus   3655  0.0   0.5   13840  1084  ?    Ssl   20:41  0:00  dbus-daemon-1 --system
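
When ps aux output is saved during a test run, it can be ranked by resident memory with a short script. This is a sketch with illustrative function names; the sample rows are copied from the listing above. The one subtlety is that COMMAND may contain spaces, so each line is split only as many times as there are columns before it.

```python
SAMPLE = """\
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
smmsp 3521 0.0 0.7 6556 1616 ? Ss 20:40 0:00 sendmail: Queue runner@01:00:00 f
htt 3564 0.0 1.7 29460 3704 ? Sl 20:41 0:00 htt_server -nodaemon
root 3574 0.0 0.4 5236 992 ? Ss 20:41 0:00 crond
xfs 3617 0.0 1.3 13572 2804 ? Ss 20:41 0:00 xfs -droppriv -daemon
"""


def parse_ps_aux(text):
    """Parse ps aux output into dicts; COMMAND keeps its embedded spaces."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # Split on whitespace at most len(header) - 1 times so the last
        # field holds the whole command line.
        fields = line.split(None, len(header) - 1)
        rows.append(dict(zip(header, fields)))
    return rows


def top_by_rss(rows, n=3):
    """Return the n rows with the largest resident set size (KB)."""
    return sorted(rows, key=lambda r: int(r["RSS"]), reverse=True)[:n]


if __name__ == "__main__":
    for row in top_by_rss(parse_ps_aux(SAMPLE), n=2):
        print(row["PID"], row["RSS"], row["COMMAND"])
```

Keep in mind that RSS counts shared pages in every process that maps them, so summing RSS across processes overstates total memory use.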
(4) top: view memory and CPU percentages

Commonly used commands
-o specifies the field to sort by

top -o %MEM
top -o %CPU
(5) pmap: memory mapping and shared memory (omitted)
(7) perf tool

3) File IO performance monitoring

(1) Two ways of I/O (cached I/O and direct I/O)

(2) Commands to monitor disk I/O

4) Network IO performance monitoring

(1) Performance indicators

(2) Network information

(3) Related commands

5) Other tools

(1) nmon performance monitoring

(2) glances system monitoring

(3) w

(4) Log monitoring tools tail and multitail

(5) Flame graph type description


Origin blog.csdn.net/weixin_43679037/article/details/128164582