What We Do and Don't Know about Software Development Effort Estimation

This article was first published in IEEE Software magazine and is brought to you by InfoQ and the IEEE Computer Society.

 

There is strong evidence that software projects tend to overrun their cost and effort targets; on average, the overrun is around 30% [1]. Moreover, comparing survey results from the 1980s with recent ones shows little improvement in estimation accuracy. (Only the Standish Group's analyses reported a significant improvement. The improvement claimed in their Chaos Reports, however, may simply reflect a change in their own analysis methods: where they previously selected a disproportionate number of problem projects, the selected projects are now more representative [2].) Estimation methods have not changed much either. Although there is a great deal of research on formal estimation models, expert estimation still dominates [3].

Although estimation accuracy has clearly not improved, we do know more about effort estimation than before. In this article I try to summarize some of that knowledge. Some of it has the potential to improve estimation accuracy, some tells us what is unlikely to lead to improvement, and some concerns what we do and don't know about effort estimates. The full set of empirical evidence behind the conclusions in this article can be found elsewhere [1].

 

What we know

After poring over the research on effort estimation, I've selected seven well-documented conclusions.

There is no "best" effort estimation model or method

Numerous studies have compared the accuracy of various estimation models and methods, and they disagree about which is the best choice [4]. The results are inconsistent mainly because core relationships, such as that between development effort and project size, vary across contexts [5]. Furthermore, the factors with the greatest impact on development effort also appear to vary, suggesting that estimation models and methods should be tailored to the context in which they are used.

The instability of these core relationships also explains why estimation models built with advanced statistical methods yield little or no improvement in accuracy over simpler models. Statistically sophisticated models fit the historical data more closely, and are therefore less accurate than simpler models when the context changes. The practical lesson is that software companies should build their own estimation models rather than expect a general-purpose model or tool to be accurate in their environment.
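To make this concrete, here is a minimal sketch of what calibrating a company-specific model might look like. The log-linear form effort = a * size^b, the variable names, and all data are illustrative assumptions, not a model prescribed by the research:

```python
import numpy as np

# Hypothetical historical data from one company's completed projects:
# size in function points, actual effort in person-hours.
sizes = np.array([120, 300, 85, 540, 210, 150, 400])
efforts = np.array([950, 2600, 700, 5200, 1700, 1300, 3400])

# Fit the simple model  effort = a * size^b  by least squares in log space.
b, log_a = np.polyfit(np.log(sizes), np.log(efforts), 1)
a = np.exp(log_a)

def estimate_effort(size):
    """Point estimate from the company-calibrated model."""
    return a * size ** b

print(f"effort = {a:.1f} * size^{b:.2f}")
print(f"Estimated effort for a 250-FP project: {estimate_effort(250):.0f} hours")
```

Recalibrating such a model on the company's own data as new projects complete is exactly the kind of context-tailoring the evidence favors over general-purpose tools.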

 

Clients' focus on low price is a major reason for effort overruns

The tendency to underestimate effort is most pronounced in price-based competitive situations, such as bidding. Where price competition matters less, such as in-house software development, the tendency is weaker; you may even see the opposite. This suggests that a main reason for effort overruns is that customers selecting a software supplier tend to choose the one with the lower price, so bids that underestimate the effort are more likely to win. It also suggests that customers can reduce effort overruns by focusing less on price and more on competence when selecting suppliers.

Minimum-maximum effort intervals are too narrow

Minimum-maximum effort intervals, such as 90% confidence intervals, are consistently too narrow to reflect the actual uncertainty. Although there is strong evidence that we cannot set accurate minimum and maximum effort values through expert judgment, current estimation methods assume we can. This is particularly evident in three-point estimation based on PERT (the Program Evaluation and Review Technique), where the expected effort is derived from the minimum, most likely, and maximum effort.
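For reference, the standard PERT three-point calculation looks like the sketch below; the point is that any expected value or uncertainty derived this way inherits the overconfidence of a too-narrow minimum-maximum interval. The numbers are made up:

```python
def pert_expected_effort(minimum, most_likely, maximum):
    """Classic PERT (three-point) expected value: (min + 4*ml + max) / 6."""
    return (minimum + 4 * most_likely + maximum) / 6

def pert_std_dev(minimum, maximum):
    """Common PERT approximation of the standard deviation."""
    return (maximum - minimum) / 6

# If the min-max interval is set too narrow, both the expected effort
# and the implied uncertainty silently inherit that overconfidence.
print(pert_expected_effort(100, 160, 220))  # -> 160.0 person-hours
print(pert_std_dev(100, 220))               # -> 20.0 person-hours
```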

Instead of relying on unaided expert judgment, software practitioners should use historical data on previous estimation errors to set realistic minimum and maximum effort values [6].
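A minimal sketch of this idea, in the spirit of the prediction-interval approach of [6]: derive the interval from the empirical distribution of past actual-to-estimated effort ratios. The data and the 90% confidence default are illustrative assumptions:

```python
import numpy as np

# Hypothetical history of (estimated, actual) effort for past projects.
past = [(400, 520), (800, 760), (250, 410), (1200, 1450),
        (600, 590), (300, 480), (900, 1150), (500, 530)]

# The ratio actual/estimate captures each past project's estimation error.
ratios = np.array([actual / est for est, actual in past])

def effort_interval(point_estimate, confidence=0.90):
    """Min-max effort interval scaled by the empirical error distribution."""
    lo, hi = np.quantile(ratios, [(1 - confidence) / 2,
                                  1 - (1 - confidence) / 2])
    return point_estimate * lo, point_estimate * hi

# Interval width now reflects documented past error, not gut feeling.
print(effort_interval(1000))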

It is easy to mislead effort estimates, and hard to recover once misled

All software development effort estimation, even with formal estimation models, requires expert judgment. Expert judgment can be accurate, but it is also easily led astray. The most serious biases are likely to occur when the person responsible for the estimate, before or during estimation, learns about the budget, customer expectations, available time, or other so-called estimation anchors. Without realizing it, the estimator may produce estimates that are far too close to these anchors. Knowing, for example, that the customer expects a low price or few work-hours invites underestimation. Even a loaded phrase in the request for an estimate, such as "roughly how much will this small and simple project cost?", can mislead an expert's estimate.

Many studies have looked at how to recover from such misleading information, or how to correct biased estimates, but no reliable method has been found. The safest conclusion is that those responsible for effort estimation should avoid exposure to misleading or irrelevant information in the first place, for example by removing it from the requirements documents they receive.

Relevant historical data and checklists can improve estimation accuracy

There is ample evidence that historical data and estimation checklists improve the accuracy of effort estimates. When the historical data is relevant to the project and the checklist is tailored to the company, work is less likely to be overlooked, sufficient contingency is more likely to be added for known risks, and previous experience can be reused, all of which produce more realistic estimates. Estimation accuracy improves in particular when similar projects are available for analogy-based or reference-class [7] estimation.
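As an illustration of analogy-based estimation, the following hypothetical sketch takes the median actual effort of the k most similar completed projects; the similarity measure and its weighting are arbitrary assumptions made for the example:

```python
# Hypothetical database of completed projects.
completed = [
    {"size": 300, "team": 5, "actual_effort": 2600},
    {"size": 280, "team": 4, "actual_effort": 2900},
    {"size": 320, "team": 6, "actual_effort": 2400},
    {"size": 900, "team": 12, "actual_effort": 11000},
]

def analogy_estimate(size, team, k=3):
    """Median actual effort of the k most similar past projects."""
    # Toy similarity: size distance plus a weighted team-size distance.
    ranked = sorted(completed,
                    key=lambda p: abs(p["size"] - size)
                                  + 50 * abs(p["team"] - team))
    nearest = sorted(p["actual_effort"] for p in ranked[:k])
    return nearest[len(nearest) // 2]  # median (odd k)

print(analogy_estimate(size=310, team=5))  # -> 2600 with this toy data
```

The key point is that the estimate is anchored in documented outcomes of comparable projects rather than in an expert's unaided judgment of the new one.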

While historical data (such as the percentage of effort typically spent on unplanned work and project management) and estimation checklists (such as reminders about easily forgotten activities) are clearly useful, many companies use neither, and so forgo this improvement in estimation accuracy.

Combining multiple independent estimates can improve estimation accuracy

The average of effort estimates from multiple sources may be more accurate than the individual estimates. A key premise is that the estimates are made independently, meaning the estimators differ in expertise, background, and estimation process. Delphi-like estimation processes, such as planning poker, in which software developers simultaneously reveal independently made estimates (their poker cards), seem especially useful in the context of software effort estimation.

A structured, team-based estimation process can add value beyond the mechanical combination of estimates, because knowledge sharing increases the total knowledge available, for example about the full set of work required to complete the project. Negative effects of group-based judgment, such as groupthink and greater risk-taking in groups, have not been documented in the context of software effort estimation.

In general, estimation models are no more accurate than expert estimates. When model and expert estimates are combined, however, their differences can actually improve estimation accuracy.
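A minimal sketch of such mechanical combination, with hypothetical numbers: average independent expert estimates (planning-poker style) with a model estimate, and treat a wide spread as a trigger for discussion rather than something to average away:

```python
import statistics

# Hypothetical independent estimates (person-hours) for the same task:
# three developers estimating independently, plus a calibrated model.
expert_estimates = [120, 200, 150]
model_estimate = 175

# Simple mechanical combination: average the independent sources.
combined = statistics.mean(expert_estimates + [model_estimate])
print(combined)  # -> 161.25

# In planning poker the spread itself is information: a large gap between
# the lowest and highest card should trigger discussion and re-estimation.
spread = max(expert_estimates) - min(expert_estimates)
print(spread)  # -> 80
```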

Estimates can be harmful

Estimates don't just predict the future; they frequently affect it. Underestimating effort tends to result in lower-quality work, which may require rework later; overestimating may reduce productivity, in line with Parkinson's Law: work expands to fill the time available.

It is therefore worth asking whether an effort estimate is needed at all. If it is optional, or needed only later, it may be better to skip it or to postpone it until more information is available. Agile software development, in which only the effort for the next sprint or release is estimated, informed by feedback from previous sprints or releases, can be a good way to avoid the potential harm of premature estimation.

What we don't know

Some problems in estimation have no good solutions that we have been able to find, no matter how much research is done. Three challenges in particular show how far our knowledge falls short of what we would like.

How to accurately estimate effort in very large, complex projects

Very large projects place much higher demands on effort estimation. Not only is more value at risk, but relevant experience and historical data are also comparatively scarce. Many activities typical of very large projects, such as the organizational issues that arise when many stakeholders are involved, are inherently hard to estimate, because they often involve process change and complex interactions among stakeholders and with existing software applications.

How to accurately measure software size and complexity

Although software size and complexity measures have been studied for years, none has proved particularly effective for estimating software development effort. There may be contexts in which size and complexity measures enable accurate estimates, but such contexts are rare.

How to measure and predict productivity

Even with excellent estimates of software size and complexity, you still need a reliable prediction of the productivity of the individuals or teams doing the work. Such predictions are hard, because productivity varies enormously between software developers and between teams, and no good prediction method exists, apart from fairly realistic programming tests (for example, trialsourcing).

At present, we don't even know whether software projects exhibit economies of scale or diseconomies of scale. Much empirical research suggests that typical software projects show economies of scale, yet software practitioners largely believe in diseconomies of scale. The research results on economies of scale, however, seem to depend on how the analysis is done, and they reveal little about the relationship between size and productivity.
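The question is usually operationalized through the exponent b in the relationship effort = a * size^b, as in this sketch with made-up data: b < 1 indicates economies of scale, b > 1 diseconomies. As noted above, the answer obtained from real data sets depends heavily on how the analysis is carried out:

```python
import numpy as np

# Hypothetical project data: sizes and actual efforts.
sizes = np.array([100, 200, 400, 800, 1600])
efforts = np.array([800, 1500, 3100, 6500, 14000])

# Slope of the log-log fit is the scale exponent b in effort = a * size^b.
b, _ = np.polyfit(np.log(sizes), np.log(efforts), 1)
print(f"b = {b:.2f} ->",
      "economies of scale" if b < 1 else "diseconomies of scale")
```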

What we currently know about software effort and cost estimation is, on the whole, not enough to solve the estimation challenges of the software industry. It does, however, point to several measures that can improve estimation accuracy. In particular, software companies are likely to estimate more accurately if they:

  • Develop and use simple estimation models tailored to local contexts, in combination with expert estimation.
  • Use historical estimation error to set minimum-maximum effort intervals.
  • Avoid exposure to misleading and irrelevant estimation information.
  • Use checklists tailored to the organization.
  • Use structured, group-based estimation processes in which the independence of the estimates is ensured.
  • Avoid early estimates based on highly incomplete information.

In competitive bidding rounds, clients tend to focus on low price, which selects for bidders who are overly optimistic about cost, and this in turn leads to cost overruns and poor software quality. In other domains this is known as the winner's curse. In the long run, many software clients may come to understand that insisting on fixed, low prices for software projects works against project success. Until then, software providers should be aware that when they are selected in such situations, it may be because they were too optimistic about cost, and they should have a strategy in place for managing or avoiding the winner's curse.

References

  1. T. Halkjelsvik and M. Jørgensen, “From Origami to Software Development: A Review of Studies on Judgment-Based Predictions of Performance Time,” Psychological Bulletin, vol. 138, no. 2, 2012, pp. 238–271.
  2. M. Jørgensen and K. Moløkken-Østvold, “How Large Are Software Cost Overruns? A Review of the 1994 CHAOS Report,” Information and Software Technology, vol. 48, no. 4, 2006, pp. 297–301.
  3. M. Jørgensen, “A Review of Studies on Expert Estimation of Software Development Effort,” J. Systems and Software, vol. 70, no. 1, 2004, pp. 37–60.
  4. T. Menzies and M. Shepperd, “Special Issue on Repeatable Results in Software Engineering Prediction,” Empirical Software Eng., vol. 17, no. 1, 2012, pp. 1–17.
  5. J.J. Dolado, “On the Problem of the Software Cost Function,” Information and Software Technology, vol. 43, no. 1, 2001, pp. 61–72.
  6. M. Jørgensen and D.I.K. Sjøberg, “An Effort Prediction Interval Approach Based on the Empirical Distribution of Previous Estimation Accuracy,” Information and Software Technology, vol. 45, no. 3, 2003, pp. 123–136.
  7. B. Flyvbjerg, “Curbing Optimism Bias and Strategic Misrepresentation in Planning: Reference Class Forecasting in Practice,” European Planning Studies, vol. 16, no. 1, 2008, pp. 3–21.

About the author

MAGNE JØRGENSEN is a researcher at Simula Research Laboratory and a professor at the University of Oslo. His current research interests include effort estimation, bidding processes, outsourcing, and the assessment of software development skill. Contact him at [email protected].

 

About IEEE Software

This article first appeared in IEEE Software magazine. IEEE Software offers solid, peer-reviewed information about today's strategic technology issues. To meet the challenges of running reliable, flexible enterprises, IT managers and technical leads rely on IT professionals for state-of-the-art solutions.

 

View the original English article: What We Do and Don't Know about Software Development Effort Estimation
