Performance Analysis and Tuning Methodologies

 Notes from Section 2.5 of Systems Performance (《性能之巅》)

 The skilled warriors of old first made themselves invincible, and then waited for the enemy to become vulnerable. Being invincible lies with oneself; the enemy's vulnerability lies with the enemy.

                    --- Sun Tzu, The Art of War, "Military Disposition"

 

1. The Streetlight Anti-Method

  This method is actually the absence of any deliberate methodology. The user analyzes a problem with whatever observability tools happen to be familiar; those tools may hit upon some problems and overlook others.

  Performance tuning can likewise be done by trial and error: set tunable parameters to various values you are familiar with and see whether each change helps.

  This approach can reveal problems, but progress is slow when the problems you hit are unrelated to the tools and adjustments you happen to know.

  The method is named after an observational bias called the streetlight effect, illustrated by a parable:

  One night, a policeman sees a drunk searching the ground beneath a streetlight and asks what he is looking for. The drunk replies that he has lost his keys. The policeman looks too and cannot find them, so he asks: "Are you sure you lost them here, under the streetlight?" The drunk says: "No, but this is where the light is best."

  The equivalent is looking at top not because it makes sense, but because the user doesn't know how to use any other tool.

  A problem found this way may be a real problem, but it is not necessarily the one you are looking for.

2. The Random Change Anti-Method

  This is an experimental trial-and-error approach. The user guesses at random where the problem may lie, then makes changes until it goes away. To judge whether each change improves or degrades performance, the user picks a metric to study, such as application run time, latency, operation rate (operations per second), or throughput (bytes per second). The overall process is:

  1. Pick an item to change at random (for example, a tunable parameter)

  2. Change it in one direction

  3. Measure performance

  4. Change it in the other direction

  5. Measure performance

  6. Was the result of step 3 or step 5 better than the baseline? If so, keep the change and return to step 1
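The loop above can be sketched in a few lines. This is a minimal sketch, not a recommendation: the measure and apply_setting hooks are hypothetical stand-ins for a real benchmark and a real tunable.

```python
def tune_by_trial_and_error(measure, apply_setting, candidates, baseline=None):
    """Sketch of the random-change loop: try candidate values for one
    setting, keep only the one whose measured metric beats the baseline.
    measure() returns a metric where higher is better (e.g. throughput);
    apply_setting(value) installs a candidate. Both are hypothetical hooks."""
    if baseline is None:
        baseline = measure()                 # measure before any change
    best_value, best_metric = None, baseline
    for value in candidates:                 # step 1: pick something to change
        apply_setting(value)                 # steps 2/4: change in a direction
        metric = measure()                   # steps 3/5: measure performance
        if metric > best_metric:             # step 6: better than baseline?
            best_value, best_metric = value, metric
    if best_value is not None:
        apply_setting(best_value)            # leave the best setting in place
    return best_value, best_metric
```

Note that the loop "succeeds" without anyone understanding why the winning value works, which is exactly the criticism made below.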

  This tuning process may end up applying only to the workload that was measured; the method is time-consuming, and the resulting changes may not hold up over time. For example, a change may work around a database or operating system bug and improve performance as a result, but when the bug is later fixed the change no longer makes sense; the crux is that no one really understood why it worked in the first place.

  There is another risk in making changes that aren't understood: they may cause worse problems during peak production load, so a backup plan for reverting them is needed.

3. The Blame-Someone-Else Anti-Method

  Steps:

  1. Find a system or environment component you are not responsible for

  2. Hypothesize that the problem lies with that component

  3. Hand the problem to the team responsible for that component

  4. When proven wrong, return to step 1

4. The Ad Hoc Checklist Method

  When asked to check and debug a system, technical support staff often spend a while working through a step-by-step checklist. A typical scenario: when a new server or application is deployed into production, support staff spend half a day checking a list of common problems now that the system is under real load. Such checklists are ad hoc, built from experience with that type of system and the problems previously encountered.

  For example, here is one checklist entry:

  Run iostat -x 1 and check the await column. If it is consistently over 10 ms under load, the disks are either slow or overloaded.

  A checklist contains many entries of this kind.
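A checklist item like this can be partially automated. The sketch below parses `iostat -x` extended output and flags devices whose await exceeds the threshold; it assumes an older sysstat layout with a combined await column (newer versions split it into r_await/w_await, so the column name would need adjusting).

```python
def slow_devices(iostat_output, threshold_ms=10.0):
    """Flag devices whose average I/O wait (await) exceeds threshold_ms.
    The await column is located by header name, since column positions
    vary between sysstat versions."""
    rows = [line.split() for line in iostat_output.splitlines() if line.strip()]
    header = next(r for r in rows if r[0].startswith("Device"))
    col = header.index("await")
    devices = rows[rows.index(header) + 1:]
    return {r[0]: float(r[col])
            for r in devices
            if len(r) > col and float(r[col]) > threshold_ms}
```

Fed a captured interval of output, it returns only the devices worth a closer look.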

  Such lists deliver the most value in the least time. They are point-in-time recommendations and need frequent updates to stay current. They also tend to record problems with easy fixes, such as setting tunable parameters, rather than custom fixes to source code or the environment.

  If you manage a team of support professionals, an ad hoc checklist is an effective way to ensure that everyone knows how to check for the worst problems and that all the obvious ones are covered. A checklist should be written clearly and prescriptively, showing how to identify each problem and how to fix it. And, of course, the list must be kept up to date.

5. The Problem Statement Method

  Defining a clear problem statement is routine work for support staff when first taking on an issue. It is done by asking the customer a few questions:

  1. What makes you think there is a performance problem?

  2. Has this system ever performed well?

  3. What changed recently? Software? Hardware? Load?

  4. Can the problem be expressed in terms of latency or run time?

  5. Does the problem affect other people or applications, or just you?

  6. What is the environment? What software and hardware are used? Which versions? What configuration?

  Asking these questions and getting the answers often points immediately to a root cause and a solution. The problem statement is therefore included here as a methodology in its own right, and it should be the first one you use when taking on a new problem.

6. The Scientific Method

  The scientific method studies the unknown by forming hypotheses and testing them. The steps:

  1. Question

  2. Hypothesis

  3. Prediction

  4. Test

  5. Analysis

  The question is the performance problem statement. From it, you hypothesize what the cause of the poor performance may be. Then you construct a test, which may be observational or experimental, to see whether a prediction based on the hypothesis holds. Finally, you analyze the test data collected.

  For example: you find that an application's performance drops after it is migrated to a system with less memory, and you hypothesize that the cause is a smaller file system cache. You might use an observational test to measure the cache miss rate on both systems, predicting that the system with less memory will show a higher miss rate. Or you might run an experimental test: increase the cache size (add memory), predicting that performance will improve; or, more simply, artificially reduce the cache size (via a tunable parameter), predicting that performance will get worse.

  Example (observational):

  1. Question: what is making database queries slow?

  2. Hypothesis: noisy neighbors (other cloud tenants) are performing disk I/O, competing with the database's disk I/O (via the file system)

  3. Prediction: if file system I/O latency is measured during a query, it will show that the file system is responsible for the slow queries

  4. Test: trace file system latency during queries; the time spent waiting on the file system is found to be less than 5% of the total query latency

  5. Analysis: the file system and disks are not responsible for the slow queries

  Although the problem is not yet solved, some large components of the environment have been ruled out. The investigator can return to step 2 and form a new hypothesis.
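Step 4's verdict is simple arithmetic over the traced data. A sketch, assuming the trace yields per-query pairs of total latency and file system wait time (the record shape here is hypothetical):

```python
def fs_latency_share(query_events):
    """query_events: list of (total_latency_ms, fs_wait_ms) per traced query.
    Returns the overall fraction of query time spent waiting on the file
    system; a share under 5% exonerates the file system, as in the example."""
    total = sum(t for t, _ in query_events)
    fs_wait = sum(f for _, f in query_events)
    return fs_wait / total
```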

  Example (experimental):

  1. Question: why do HTTP requests from host A to host C take longer than those from host B to host C?

  2. Hypothesis: hosts A and B are in different datacenters

  3. Prediction: moving host A to the same datacenter as host B will fix the problem

  4. Test: move host A and measure performance

  5. Analysis: performance is fixed, consistent with the hypothesis

  If the problem isn't fixed, reverse the experimental change before starting a new hypothesis test!

7. The Diagnosis Cycle

  Similar to the scientific method is the diagnosis cycle:

  hypothesis -> instrumentation -> data -> hypothesis

  Like the scientific method, this method tests a hypothesis by collecting data. The cycle emphasizes that data can quickly lead to a new hypothesis, which is then tested and refined, and so on.

  Both of these methods keep a good balance between theory and data. Moving quickly from hypothesis to data allows bad theories to be identified and discarded early, and better ones developed in their place.

8. The Tools Method

  A tools-oriented approach goes as follows:

  1. List available performance tools (optionally, tools that could be installed or purchased)

  2. For each tool, list the useful metrics it provides

  3. For each metric, list possible rules for interpreting it

  The result is a checklist that tells you which tool to run, which metrics to read, and how to interpret them. While this is reasonably effective, it relies only on the tools that are available (or known), which gives an incomplete view of the system, similar to the streetlight anti-method; worse, the user doesn't know the view is incomplete, and may never find out. Problems that require custom tooling (such as dynamic tracing) to discover may never be identified and solved.

  In practice, the tools method does identify resource bottlenecks, errors, and some other types of problems, though often not efficiently.

  When a large number of tools and metrics are available, enumerating them one by one is time-consuming. It gets worse when several tools have overlapping functionality and you must spend extra time understanding the pros and cons of each. In some cases, such as choosing a file system micro-benchmark tool, there are quite a lot of options, even though you only need one.

9. The USE Method

  The utilization, saturation, and errors (USE) method is used for identifying system bottlenecks. In a nutshell:

  For every resource, check utilization, saturation, and errors.

  Terminology:

  Resource: any physical server component (CPUs, buses, and so on). Some software resources can also be examined, provided the metrics make sense.

  Utilization: over a set time interval, the percentage of time the resource was busy servicing work. While busy, a resource may still be able to accept more work; the degree to which it cannot is identified by saturation.

  Saturation: the degree to which a resource has extra work that it can't service, often waiting in a queue.

  Errors: the count of error events.

  For some resource types, including memory, utilization instead refers to the capacity of the resource that is used. This differs from the time-based definition. Once a capacity resource reaches 100% utilization, no more work can be accepted: it either queues the work (saturation) or returns an error, and the USE method identifies either outcome.

  Errors should be investigated because they can hurt performance, and they may not be noticed immediately when the failure mode is recoverable. This includes operations that fail and are retried, and failed devices in a redundant device pool.

  In contrast to the tools method, the USE method iterates over system resources rather than tools. This helps you build a complete list of questions to ask, and only then do you search for the tools to answer them. Even when no tool exists to answer a question, knowing that the question is unanswered is extremely useful for performance analysis: it is a "known unknown."

  The USE method directs analysis to a limited number of key metrics, so that all system resources can be checked as quickly as possible. After that, if no problem has been found, other methodologies can be used.

  The flowchart of the USE method puts the check for errors first, before utilization and saturation. Errors are usually quick to interpret, and ruling them out before studying the other metrics is an efficient use of time:

  (Figure: flowchart of the USE method)

  

 

  This method is likely to identify system bottlenecks; however, a system may be suffering from more than one performance problem, so the first one you find may not be the one you care about. Each discovery can be investigated with further methodologies before you return to the USE method to traverse the remaining resources as needed.

  Expressing the metrics:

  The USE method metrics are usually expressed as follows:

  Utilization: a percentage over a time interval

  Saturation: a wait-queue length

  Errors: the number of errors reported

  Though it may seem counterintuitive, a short burst of high utilization can cause saturation and performance problems even while overall utilization is low over a long interval. Some monitoring tools report utilization as a five-minute average; CPU utilization, for example, can vary dramatically from second to second, so a five-minute average may mask short periods of 100% utilization, and therefore saturation.

  Think of a highway toll plaza: utilization corresponds to how many toll booths are busy collecting tolls. Utilization of 100% means you can't find an empty booth and must queue behind someone (saturation). If I tell you the booths were 40% utilized across the whole day, could you tell whether any car had to queue at some point during that day? They probably did during peak hours, when utilization was 100%, but that is invisible in the daily average.
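The toll-booth numbers can be reproduced directly: a one-minute burst at 100% utilization inside a five-minute window averages out to an unremarkable 40%, hiding the saturation entirely.

```python
def interval_average(samples):
    """Average utilization over an interval, as a monitoring tool reports it."""
    return sum(samples) / len(samples)

# Per-second CPU utilization: a 60 s burst at 100%, then 240 s at 25%.
samples = [100.0] * 60 + [25.0] * 240
```

Here interval_average(samples) is 40.0 while max(samples) is 100.0: the burst, and any queueing behind it, is invisible in the five-minute average.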

  Resource list:

  The first step of the USE method is to build a list of resources, as complete as possible. Here is a typical list of server resources, with examples:

  CPUs: sockets, cores, hardware threads (virtual CPUs)

  Memory: DRAM

  Network interfaces: Ethernet ports

  Storage devices: disks

  Controllers: storage, network

  Interconnects: CPU, memory, I/O

  Each component typically acts as a single resource type. For example, memory is a capacity resource, and a network interface is an I/O resource (IOPS or throughput). Some components behave as multiple resource types: a storage device, for example, is both an I/O resource and a capacity resource. All types that can become performance bottlenecks need to be considered. Note also that I/O resources can be further studied as queueing systems, which queue requests and then service them.

  Some physical resources, such as hardware caches (e.g., CPU caches), may be left off the list. The USE method is most effective for resources that degrade under high utilization or saturation, whereas caches improve performance under high utilization and are better checked with other methodologies. If you are unsure whether to include a resource, include it anyway, and see how the metrics work out in practice.

 

  Functional block diagram:

  Another way to iterate over resources is to find or draw a functional block diagram of the system, as shown below. Such a diagram also shows the relationships between components, which is very helpful when hunting for bottlenecks in the flow of data.

  CPU, memory, and I/O interconnects and buses are often overlooked. Fortunately, they are not common system bottlenecks, since they are typically designed with throughput headroom beyond what the components need. If they are the bottleneck, you may need to upgrade the motherboard, or reduce the load; for example, zero-copy techniques lighten the load on memory buses.

  

  Metrics:

  Once you have the list of resources, consider the three metric types for each: utilization, saturation, and errors. The table below lists some resources and metric types, along with possible metrics.

  These metrics are either averages over a time interval or cumulative counts.

  (Figure: example dual-CPU system block diagram)

  (Table: USE method metric examples)

  Repeat for all combinations, and include instructions for fetching each metric. Make a note of metrics that are not currently obtainable: those are the known unknowns. You will end up with a list of around thirty metrics, some difficult to measure and some that can't be measured at all. Fortunately, the most common problems show up with the simpler metrics (for example, CPU saturation, memory capacity saturation, network interface utilization, disk utilization), so those should be measured first.

  Some examples of harder combinations are shown in the table below.

  (Table: advanced examples of USE method metrics)

  Some of these metrics may not be available from standard operating system tools, and may require dynamic tracing or CPU performance counters.

  Software resources:

  Some software resources can be examined in a similar way. This refers to software components, not the application as a whole. Examples:

  1. Mutex locks: utilization is the time the lock was held; saturation is threads queued waiting for the lock

  2. Thread pools: utilization is the time threads were busy processing work; saturation is the number of requests waiting to be serviced by the pool

  3. Process/thread capacity: the system has a limit on the total number of processes or threads; the number in use is utilization, waiting for an allocation is saturation, and a failed allocation is an error

  4. File descriptor capacity: the same as process/thread capacity, but for file descriptors

  If these metrics work well in your case, use them; otherwise, other methodologies such as latency analysis can be applied instead.
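As an example, the thread-pool definitions above map onto code like this. The stats structure is hypothetical, standing in for whatever counters a real pool exposes:

```python
from collections import namedtuple

# Hypothetical snapshot of a thread pool's counters.
ThreadPoolStats = namedtuple("ThreadPoolStats", "busy total queued rejected")

def use_metrics(s):
    """USE metrics for a thread pool: utilization is the fraction of threads
    busy, saturation is requests queued waiting for a thread, and errors
    are rejected work items."""
    return {"utilization": s.busy / s.total,
            "saturation": s.queued,
            "errors": s.rejected}
```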

  Suggested interpretations:

  Here is some general advice for interpreting these metric types:

  Utilization: 100% utilization is usually a sign of a bottleneck (check saturation and its effect to confirm). Utilization above 60% can also be a problem, for two reasons: over an interval, the average can hide short bursts of 100% utilization; and some resources, such as disks, cannot be interrupted mid-operation, even for higher-priority work, so as utilization rises, queueing delays become more frequent and noticeable.

  Saturation: any degree of saturation (non-zero) can be a problem. It can be measured as the length of a wait queue, or as time spent waiting in one.

  Errors: non-zero error counts are worth investigating, especially those that increase while performance degrades.

  Low utilization, no saturation, no errors: such negative results are easy to dismiss, but they are more useful than they look. They narrow the scope of the investigation and help you quickly focus on the problem area, having established that a given resource is probably not the cause. This is a process of elimination.

  Cloud computing:

  In a cloud computing environment, software resource controls impose limits or thresholds on the multiple tenants sharing a system. At Joyent we primarily use OS virtualization, which imposes memory limits, CPU limits, and storage I/O throttles. Each of these resource limits can be examined with the USE method, in the same way as checking the physical resources.

  For example, memory capacity utilization becomes the tenant's memory usage versus its memory cap. Memory capacity saturation shows up as anonymous paging activity, even though the traditional page scanner may be idle.

10. Workload Characterization

  Workload characterization is a simple and effective way to identify one class of problems: those caused by the load applied. It focuses on the input to the system rather than on the resulting performance. A system may have no architectural or configuration problem at all, yet still be under more load than it can reasonably handle.

  A workload can be characterized by answering the following questions:

  1. Who is causing the load? Process ID, user ID, remote IP address?

  2. Why is the load being generated? Code path, stack trace?

  3. What are the load's characteristics? IOPS, throughput, direction (read/write)? Include variance (standard deviation) where appropriate

  4. How is the load changing over time? Is there a daily pattern?

  It is useful to check all of these, even when you have strong expectations about the answers, because you may be surprised.

  Consider this scenario: you have a database performance problem, and database requests come from a pool of web servers. Should you check the IP addresses using the database? You expect them all to be web servers, as configured. But you check, and it seems the entire Internet is throwing load at the database, destroying its performance. You are under a DoS attack!

  The best performance wins come from eliminating unnecessary work. Sometimes unnecessary work is caused by an application malfunctioning, for example a thread stuck in a loop, needlessly burning CPU. It can also be caused by a bad configuration, such as system-wide backups running during the day, or by a DoS attack as described above. Characterizing the workload can identify these issues, which can then be fixed by maintenance or reconfiguration.
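Question 1 above ("who is causing the load?") is often a one-liner over whatever request log is available. A sketch, assuming a hypothetical log of (client_ip, operation) records:

```python
from collections import Counter

def top_clients(request_log, n=3):
    """Count requests per remote IP to characterize who generates the load.
    A surprise entry at the top, as in the DoS scenario above, is exactly
    what this check is meant to catch."""
    return Counter(ip for ip, _ in request_log).most_common(n)
```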

  If an identified load problem cannot be eliminated, system resource controls may be able to throttle it. For example, a system backup task that compresses its data consumes CPU resources, which hurts the database, and uses network resources to transfer the data. Resource controls can cap the backup task's CPU and network usage (if the system supports them), so the backup still happens but no longer affects the database.

  Besides identifying problems, workload characterization can also serve as input for the design of simulation benchmarks. If a workload is measured only by its averages, then ideally you should also collect details of its distribution and variation. This matters for simulating the full variety of the workload rather than testing only the average load.

  By identifying problems of load, workload analysis helps separate them from problems of architecture.

  The tools and metrics for workload characterization depend on the target. Some applications record detailed client activity that can become the data source for statistical analysis. They may also provide daily or monthly user reports, which are worth examining.

11. Drill-Down Analysis

  Drill-down analysis starts by examining a problem at a high level, then narrows the focus based on what is found, discarding the areas that look uninteresting and digging deeper into those that do. The process can descend deep into the software stack, down to the hardware if necessary, to find the root cause of the problem.

  In Solaris Performance and Tools, a drill-down methodology for system performance is given in three stages:

  1. Monitoring: continuously recording high-level statistics over time, and identifying and alerting when a problem may be present

  2. Identification: for a given problem, narrowing the investigation to locate candidate bottlenecks

  3. Analysis: further examination of particular system areas, to find the root cause and quantify the problem

  Monitoring is performed company-wide, with data from all servers and cloud instances aggregated together. The traditional approach is SNMP, which monitors any network device that supports it. The data can reveal long-term patterns that cannot be seen from command-line tools run over short intervals. Most monitoring solutions raise alerts when a problem is suspected, which is the cue to move promptly to the next stage.

  Identification is performed interactively on the server, using standard observability tools to check system components: CPUs, disks, memory, and so on. It is usually done in a command-line session using tools such as vmstat, iostat, and mpstat. Some newer tools support real-time performance analysis through a GUI.

  Some analysis tools also provide tracing or profiling capability for deeper inspection of suspect areas. This kind of deep analysis may require custom tooling, and even inspection of source code. This is where most of the investigative effort goes: peeling away layers of the software stack to find the root cause. Tools for performing this include strace, truss, perf, and DTrace.

  Five Whys:

  An additional technique you can use during the analysis stage is the "five whys": ask yourself "why?" and answer, five times over:

  1. The database has begun to perform poorly for many queries. Why?

  2. It is being delayed by disk I/O due to memory paging. Why?

  3. Database memory usage has grown too large. Why?

  4. The allocator is consuming more memory than it should. Why?

  5. The allocator has a memory fragmentation issue.

  This is a real-world example, and, unexpectedly, the fix was to the system's memory allocation library. It was the persistent questioning and drilling down into the essence of the problem that got it solved.

12. Latency Analysis

  Latency analysis examines the time taken to complete an operation, then breaks that time into smaller components, repeatedly subdividing the component with the highest latency until the root cause is located and quantified. Like drill-down analysis, latency analysis may descend through the layers of the software stack to find the origin of latency problems.

  Analysis can begin with the workload applied, examining how that work is processed in the application, then descend into the operating system libraries, system calls, the kernel, and device drivers.

  For example, analysis of MySQL request latency might involve answering the following questions:

  1. Is there a request latency problem? (yes)

  2. Is the request time mostly spent on-CPU? (no, off-CPU)

  3. What is the off-CPU time waiting on? (file system I/O)

  4. Is the file system I/O time due to disk I/O or lock contention? (disk I/O)

  5. Is the disk I/O time mostly random seeks or data transfer time? (data transfer time)

  In this process, each question divides the latency into two parts and then proceeds with the larger one: a binary search of latency, as pictured below.

  Once the slower of A and B has been identified, it can be further analyzed and subdivided, and so on.
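The binary-search descent can be sketched as a walk down a latency tree. The component names and numbers below are illustrative, loosely following the MySQL example:

```python
def drill_down(breakdown):
    """Repeatedly descend into the higher-latency half of a breakdown.
    breakdown maps component name -> (latency_ms, sub-breakdown or None);
    returns the path of components followed toward the root cause."""
    path = []
    while breakdown:
        name, (latency, child) = max(breakdown.items(), key=lambda kv: kv[1][0])
        path.append(name)
        breakdown = child
    return path
```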

  Latency analysis of database queries is the target of Method R, covered next.

  

 

            (Figure: the latency analysis procedure)

13. Method R

  Method R is a performance analysis methodology developed for Oracle databases, aimed at finding the origin of latency, based on Oracle trace events. It is described as "a response-time-based performance improvement method that yields maximum economic value to your business," and it focuses on identifying and quantifying where time is spent during queries. Although it is used in the database field, its approach could be applied to any system, and it is worth mentioning here as a possible avenue of study.

14. Event Tracing

  A system operates by processing discrete events, including CPU instructions, disk I/O and other disk commands, network packets, system calls, library calls, application events, database queries, and so on. Performance analysis usually studies summaries of these events, such as operations per second, bytes per second, or average latency. Sometimes important details are lost in the summary, and the events are best understood by inspecting them individually.

  Network troubleshooting often requires packet-by-packet inspection, using tools such as tcpdump, which summarizes each network packet as a line of text.

  (Figure: example tcpdump output; tcpdump can print various kinds of information as needed)

  Storage device I/O at the block device layer can be traced with iosnoop.

  (Figure: example iosnoop output)

  Several timestamps are printed in that output, including the start time, the end time, the time between request and completion, and the estimated service time for the I/O.

  The system call layer is another common layer for tracing, with tools including strace on Linux and truss on Solaris-based systems. These tools also have options to print timestamps.

  When performing event tracing, look for the following information:

  1. Input: all attributes of the event request: type, direction, size, and so on

  2. Times: start time, end time, latency

  3. Result: error status, event result

  Sometimes performance problems can be found by examining these event attributes, in either the request or the result. Event timestamps are helpful for latency analysis and are generally included by tracing tools. The tcpdump output above, when given the -ttt option, includes delta timestamps, measuring the time between packets.

  Studying prior events also provides information. An event with unusually high latency, known as a latency outlier, may be caused by earlier events rather than by the event itself. For example, the event at the tail of a queue may show high latency, caused by the events queued ahead of it rather than by anything about the event itself. This situation can only be identified from the event trace.
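The tail-of-queue effect can be demonstrated with a toy single-server queue: total latencies are computed from (start, duration) trace records, with each event waiting for its predecessors to drain.

```python
def event_latencies(events):
    """Total latency (queueing + service) per event for a trace of
    (start_ms, duration_ms) records served one at a time, in start order."""
    finish = 0.0
    latencies = []
    for start, duration in sorted(events):
        begin = max(start, finish)        # wait behind earlier events
        finish = begin + duration
        latencies.append(finish - start)
    return latencies
```

In the test below, the final event needs only 1 ms of service yet shows 19 ms of latency: an outlier caused entirely by the events queued ahead of it, identifiable only from the trace.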

15. Baseline Statistics

  Comparing current performance metrics with past values is often enlightening. Changes in load or resource usage become visible, and a problem can be traced back to when it first began. Some observability tools (those based on kernel counters) can show statistics accumulated since boot, which can be compared with current activity. This is coarse, but better than nothing. A better approach is to collect baseline statistics.

  Collecting a baseline involves a wide range of system observations, with the data saved for future reference. Unlike since-boot statistics, which hide variation, a baseline can include per-second statistics, so variation is visible.

  A baseline can be collected before and after a system or application change, so that performance changes can be analyzed. Baselines can also be collected irregularly and kept as part of the site record, giving administrators a reference for what "normal" looks like. As part of performance monitoring, such collection can be performed at fixed intervals every day.

16. Static Performance Tuning

  Static performance tuning deals with issues of architecture and configuration. The other methodologies focus on the performance of the system under applied load: dynamic performance. Static performance analysis can be performed while the system is at rest, with no load applied.

  For static performance analysis and tuning, confirm the following for every component of the system:

  1. Is the component needed at all?

  2. Is its configuration set for the intended workload?

  3. Was the component autoconfigured optimally for the intended workload?

  4. Has the component experienced an error? Is it in a degraded state?

  Here are some problems that static performance tuning may find:

  1. A network interface that negotiated 100 Mb/s instead of 1 Gb/s

  2. A RAID pool that failed to build

  3. An older version of the operating system, application, or firmware in use

  4. A file system record size that doesn't match the workload I/O size

  5. A server accidentally configured as a router

  6. A server using remote resources, such as authentication from a remote datacenter rather than locally

  Fortunately, these problems are all easy to check for. The hard part is remembering to do it.
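Checks like these lend themselves to scripting. A minimal sketch for the first item; on Linux the negotiated speed could be read from /sys/class/net/&lt;iface&gt;/speed or from ethtool, but the value is passed in here so the check stays self-contained:

```python
def link_speed_ok(negotiated_mbps, expected_mbps=1000):
    """Static check: did the NIC negotiate at least the expected rate
    (e.g. 1 Gb/s), or did it fall back to something like 100 Mb/s?"""
    return negotiated_mbps >= expected_mbps
```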

17. Cache Tuning

  From the application down to the disks, applications and operating systems deploy multiple layers of caching to improve I/O performance. Here is a general tuning strategy for each cache level:

  1. Aim to cache as high in the stack as possible, close to where the work is performed; this reduces the operational overhead of a cache hit

  2. Confirm that the cache is enabled and actually working

  3. Check the cache hit/miss ratio and miss rate

  4. If the cache size is dynamic, check its current size

  5. Tune the cache for the workload; this depends on the cache's tunable parameters

  6. Tune the workload for the cache; this includes reducing unnecessary consumers of the cache, which frees up more space for the target workload

  Watch out for double caching, for example two different caches that consume memory and cache the same data twice.

  Also consider the overall performance gain of tuning each cache level. Tuning the CPU L1 cache may save nanoseconds, since misses are served by L2. Improving the CPU L3 hit rate avoids accesses to much slower main memory, yielding a larger performance gain.
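Step 3's ratios are worth pinning down, since tools report them inconsistently. A small sketch:

```python
def cache_ratios(hits, misses):
    """Hit ratio and miss rate from raw counters. The absolute miss count
    matters too: a high hit ratio is still bad if each miss goes to a
    layer that is orders of magnitude slower."""
    total = hits + misses
    return {"hit_ratio": hits / total, "miss_rate": misses / total}
```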

18. Micro-Benchmarking

  Micro-benchmarking measures the performance of a simple, artificial workload. It can be used to support the scientific method, putting hypotheses and predictions to the test, or it can be executed as part of capacity planning.

  This differs from industry benchmarking, which targets real-world, natural workloads. Such benchmarking requires workload simulation and is complex both to execute and to interpret.

  Because fewer factors are involved, micro-benchmarks are simpler to run and to understand. A micro-benchmark tool can both apply the workload and measure its performance; alternatively, a load generator can apply the load while standard system tools measure performance. Either approach works, but the safest is to use a micro-benchmark tool and then double-check the performance numbers with standard system tools.

  Here are some example micro-benchmark targets, including some two-dimensional tests:

  1. System calls: fork(), exec(), open(), read(), close()

  2. File system reads: from a cached file, varying the read size from 1 B to 1 MB

  3. Network throughput: TCP end-to-end data transfer for different socket buffer sizes

  A micro-benchmark typically performs the target operation on the system under test as fast as possible, measuring the time taken to complete a large number of such operations, and then computes the average (average time = runtime / operation count).
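The average-time formula at the end translates directly to code. A minimal sketch using Python's high-resolution timer; the operation benchmarked here (time.monotonic) is just a convenient stand-in for any cheap call:

```python
import time

def microbench(op, count=100_000):
    """Run op() count times and return the average time per operation
    (average time = runtime / operation count)."""
    start = time.perf_counter()
    for _ in range(count):
        op()
    runtime = time.perf_counter() - start
    return runtime / count

# Example: average cost of a cheap clock call.
avg_seconds = microbench(time.monotonic, count=10_000)
```

Varying the operation or its parameters (read size, buffer size) gives the two-dimensional tests listed above.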

  


Origin www.cnblogs.com/richered/p/11102228.html