[Peking University] Top software engineering conference paper: Fixing Recurring Crash Bugs via Analyzing Q&A Sites [from ASE 2015]

Copyright notice: if you repost or cite this post, please credit the source. https://blog.csdn.net/weixin_39278265/article/details/82634413

Preface

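I slacked off a bit today.
It's 10 pm now, but I still want to read two papers. Skipping them would feel like breaking the habit I have built up, and I would feel uneasy if I didn't read.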

Also, I noticed that http://program-repair.org/bibliography.html lists many papers I haven't read yet, which is embarrassing. All I can do is try to catch up now and skim a few.

What this post covers

This post introduces the top-venue software engineering paper "Fixing Recurring Crash Bugs via Analyzing Q&A Sites" (ASE 2015).

1 About the conference

ASE is a very well-known conference. Full name: International Conference on Automated Software Engineering.

Official website: http://www.ase-conferences.org/

Homepage of ASE 2019, to be held in the US next year: https://2019.ase-conferences.org/


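Also, I think the homepages of these top conferences contain a lot of useful material, such as the best papers. These are important and well worth studying, and I should follow these conferences more closely from now on.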

2 Authors

Qing Gao, Hansheng Zhang, Jie Wang, Yingfei Xiong, Lu Zhang, Hong Mei

All authors are from Peking University.

3 Abstract

Recurring bugs are common in software systems, especially in client programs that depend on the same framework.
Client programs built on the same framework often share the same bugs (I think this is roughly what the authors mean).

Existing research uses human-written templates, and is limited to certain types of bugs.
However, existing research is still quite limited: it relies on human-written templates and therefore only handles certain types of bugs.


In this paper, we propose a fully automatic approach to fixing recurring crash bugs via analyzing Q&A sites.
Our work fixes recurring crash bugs by analyzing Q&A sites.


Question: aren't crash bugs also "certain types of bugs"?
I didn't understand this part.

By extracting queries from crash traces and retrieving a list of Q&A pages, we analyze the pages and generate edit scripts.
Technique: extract queries from the crash traces, retrieve a list of Q&A pages, then analyze the pages and generate edit scripts.
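To make the first step concrete, here is a minimal Python sketch of turning a Java crash trace into a search query. It is not the paper's implementation: the rule used here (take the last exception line, i.e. the root cause, and keep the exception class plus the identifier-like words of its message, dropping quoted strings and numbers that tend to be project-specific) is my own assumption, and `extract_query` is a hypothetical name.

```python
import re

# Hypothetical sketch; not QAcrashFix's actual query construction.
# Matches "pkg.SomeException: message" and "Caused by: pkg.SomeError: message".
EXCEPTION_LINE = re.compile(
    r"^(?:Caused by: )?([\w.$]+(?:Exception|Error))(?::\s*(.*))?$"
)
# Identifier-like words; numbers and punctuation are dropped.
WORD = re.compile(r"[A-Za-z_][\w$]*")


def extract_query(crash_trace: str) -> str:
    """Build a search query from the last (root-cause) exception line of a trace."""
    query = ""
    for raw in crash_trace.splitlines():
        match = EXCEPTION_LINE.match(raw.strip())
        if not match:
            continue  # stack frames, "... 12 more", etc.
        exc_type, message = match.group(1), match.group(2) or ""
        # Quoted strings are usually project-specific data, so remove them.
        message = re.sub(r'"[^"]*"', " ", message)
        # Later "Caused by:" lines overwrite earlier ones: keep the root cause.
        query = " ".join([exc_type] + WORD.findall(message))
    return query


trace = (
    "java.lang.IllegalStateException: Fragment MyFragment not attached to Activity\n"
    "\tat android.app.Fragment.getResources(Fragment.java:1428)\n"
    "\tat com.example.app.MyFragment.onPostExecute(MyFragment.java:57)\n"
)
print(extract_query(trace))
# -> java.lang.IllegalStateException Fragment MyFragment not attached to Activity
```

Stripping numbers matters because a message like `Invalid index 3, size is 3` would otherwise only match pages that happen to mention the same index.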

Then we apply these scripts to target source code and filter out the incorrect patches.
Then the patches are validated, and incorrect ones are filtered out.
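The filtering step can be pictured as below. This is a hypothetical sketch: `compiles` and `crash_reproduces` are stand-ins for building the patched project and re-running whatever originally triggered the crash; these notes don't cover how QAcrashFix actually performs those checks.

```python
from typing import Callable, Iterable, List


# Hypothetical sketch of patch filtering; the two checks are injected stand-ins.
def filter_patches(
    candidates: Iterable[str],
    compiles: Callable[[str], bool],
    crash_reproduces: Callable[[str], bool],
) -> List[str]:
    """Keep only candidate patches that build and no longer trigger the crash."""
    kept: List[str] = []
    for patch in candidates:
        if not compiles(patch):
            continue  # e.g. the web code referenced names the project lacks
        if crash_reproduces(patch):
            continue  # patch applies cleanly but does not fix the crash
        kept.append(patch)
    return kept


# Toy usage: a real tool would build and run the project instead of using lambdas.
patches = ["p1: add isAdded() guard", "p2: reorder calls", "p3: call undefinedHelper()"]
print(filter_patches(
    patches,
    compiles=lambda p: "undefinedHelper" not in p,
    crash_reproduces=lambda p: p.startswith("p2"),
))
# -> ['p1: add isAdded() guard']
```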

The empirical results show that our approach is accurate in fixing real-world crash bugs, and can complement existing bug-fixing approaches.
The word "complement" is really well chosen here.

recurring
UK [rɪˈkɜːrɪŋ]  US [rɪˈkɜːɪŋ]
v. present participle of recur: to happen again, to recur; to reappear
If something recurs, it happens more than once.

4 This paper feels a bit like software porting


I'm curious how exactly the answers on Q&A pages get transplanted into the buggy program, and a real-world buggy program at that.

5 "Cross-project" bugs

Recurring bugs are bugs that occur often in different projects, and are found common, accounting for 17%-45% of the bugs [1, 2]

[1] S. Kim, K. Pan, and E. E. J. Whitehead, Jr., “Memories of bug fixes,” in SIGSOFT ’06/FSE-14, 2006, pp. 35–45.
[2] T. T. Nguyen, H. A. Nguyen, N. H. Pham, J. Al-Kofahi, and T. N. Nguyen, “Recurring bug fixes in object-oriented programs,” in ICSE ’10, 2010, pp. 315–324.


Reference [2] deserves attention; it may be where this paper's idea came from.

“Recurring bug fixes in object-oriented programs,” in ICSE ’10, 2010

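6 So what the authors argue against is PAR (2013)

The reason is that PAR uses manually defined templates. In reality, bugs are so diverse that the set of templates needed is endless. That is roughly the argument.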

PAR [4] uses ten manually defined fix templates to fix bugs, and thus is not confined by the code in the current project. However, since the templates are extracted manually, only limited types of bugs can be fixed. In real-world programs, bug-fixing patterns can be numerous, and can vary from one framework to another. It is impractical to write every such template manually.

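Before that, though, the paper discusses GenProg: the point is that PAR improves on GenProg because it is no longer limited to finding fix ingredients in the current project.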


[4] D. Kim, J. Nam, J. Song, and S. Kim, “Automatic patch generation learned from human-written patches,” in ICSE ’13, 2013, pp. 802–811
This paper is worth reading. It is such a classic, yet I still haven't read it. Embarrassing.

7 First time I've seen "infer fixes" (before, it was always "generate fixes", "repair programs", and so on); the word "infer" is worth noting

To overcome the problem of manual fix-pattern extraction, in this paper we aim to infer fixes automatically via analyzing Q&A sites.

8 The authors' idea

We observe that, many recurring bugs have already been discussed over the Q&A sites such as Stack Overflow, and we can directly obtain the fixes from the Q&A sites.


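How did they notice this? Impressive!
Without enough accumulated experience, this would be hard to come up with.
You need solid fundamentals, plenty of thinking, divergent thinking, and the courage to try bold ideas.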

9 Arguing for the value and contributions of the idea

As the first step of fixing recurring bugs via analyzing Q&A sites, we focus on a specific class of bugs: crash bugs.
First, the authors make clear that they target a specific type of bug, namely crash bugs.

Crash bugs are among the most severe bugs in real-world software systems, and a lot of research efforts have been put into handling crash bugs,

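This paragraph is really well written. Readers would naturally ask why only crash bugs are targeted, but the sentences that follow, backed by citations, explain the value of the work very convincingly.
The paper positions itself at a high level.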

including localizing the causes of crash bugs [5], keeping the system running under the presence of crashes [6], and checking the correctness of fixes to crash bugs [7]. However, despite the notable progress in automatic bug fixing [8, 9, 4, 10, 11, 12], there is no approach that is designed to directly fix crash bugs within our knowledge.

[5] R. Wu, H. Zhang, S.-C. Cheung, and S. Kim, “Crashlocator: Locating crashing faults based on crash stacks,” in ISSTA 2014, 2014, pp. 204–214.
[6] B. Demsky and M. Rinard, “Automatic detection and repair of errors in data structures,” in OOPSLA, 2003, pp. 78–95.
[7] H. Seo and S. Kim, “Predicting recurring crash stacks,” in ASE. ACM, 2012, pp. 180–189.


there is no approach that is designed to directly fix crash bugs within our knowledge.
Brilliant; this sentence is really strong.

Also, I think all of these references are worth reading. It turns out there was already work on automatic detection and repair of errors back in 2003. Amazing.

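10 I find this paper's writing style unusual: it uses many phrases like "it is easy / not easy / difficult / impractical / common / not feasible". This style is fairly rare and worth studying. (It does describe the problems very clearly, though.)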

11 The two challenges raised in the paper

1)it is not easy to locate a suitable Q&A web page that describes a bug of the same type and contains a solution automatically.

  • Solution: use the crash trace

2)it is still difficult to extract a solution from a page where questions and answers are described in a natural language.

  • many Q&A pages contain code snippets, and it is enough to fix many bugs by only looking at the code snippets on the pages.
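Since the approach only needs the code snippets, the first thing to pull out of a retrieved page is its code blocks. Below is a minimal sketch using Python's standard `html.parser`, assuming the page marks code blocks as `<pre><code>` (as Stack Overflow pages typically do); inline `<code>` inside ordinary prose is ignored. `CodeSnippetExtractor` and `extract_snippets` are hypothetical names, not part of QAcrashFix.

```python
from html.parser import HTMLParser
from typing import List


# Hypothetical sketch: collect the text of every <pre><code>...</code></pre> block.
class CodeSnippetExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &lt; becomes <
        self.snippets: List[str] = []
        self._in_pre = False
        self._in_code = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._in_pre = True
        elif tag == "code" and self._in_pre:
            self._in_code = True  # only code blocks, not inline <code> in prose
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "code" and self._in_code:
            self.snippets.append("".join(self._buffer))
            self._in_code = False
        elif tag == "pre":
            self._in_pre = False

    def handle_data(self, data):
        if self._in_code:
            self._buffer.append(data)  # data may arrive in several chunks


def extract_snippets(html: str) -> List[str]:
    parser = CodeSnippetExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.snippets


page = '<p>Use <code>isAdded()</code>:</p><pre><code>if (isAdded()) {\n  getResources();\n}</code></pre>'
print(extract_snippets(page))
# -> ['if (isAdded()) {\n  getResources();\n}']
```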


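After reading this part, I think the idea is really good. However, the paper does not actually involve natural language processing; it works directly on the code snippets.
That makes the problem less hard (though it would still be beyond me).

This also shows that a good idea matters a lot: even if the technique can't go all the way, we can find a workaround and settle for the next best thing.
What matters is making a high-quality contribution to the APR community.

That said, even without natural language processing, it is still hard...
emmm,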
However, even only analyzing code snippets is not easy. Due to the fuzzy nature of Q&A pages, there may not be a clear correspondence between the buggy and fixed versions of the code. Furthermore, we cannot directly apply the fix described in the web page to the target project, as the code in the web page is usually different from the source code in the target project.

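12 Wow! Special note, following up on section 11: I can feel the authors' strong fundamentals. How do they solve the problem? By reusing existing "wheels", i.e. existing techniques, to achieve their goal. Really impressive!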

To overcome these difficulties, we systematically combine a set of existing techniques, including partial parsing [13, 14], tree-based code differencing [15, 16], and edit script generation [17]. These techniques together allow us to deal with the fuzzy nature of the web code as well as the gap between the project and the web page.
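To get a feel for "the gap between the project and the web page": the snippet on a page and the target code often do the same thing with different variable names. The paper bridges this gap with AST mappings; the sketch below is a much cruder, token-level stand-in of my own built on `difflib.SequenceMatcher`, not the paper's technique. It aligns the buggy snippet from the page with the target code, records one-to-one identifier substitutions, and renames the web fix accordingly. All function names are hypothetical.

```python
import difflib
import re
from typing import Dict, List

# Hypothetical token-level sketch; the paper uses AST-level mappings instead.
# Tokens: identifiers, or any single non-space character (dots, brackets, ...).
TOKEN = re.compile(r"[A-Za-z_]\w*|\S")
IDENT = re.compile(r"[A-Za-z_]\w*")


def tokenize(code: str) -> List[str]:
    return TOKEN.findall(code)


def infer_name_mapping(web_buggy: str, target: str) -> Dict[str, str]:
    """Align token sequences and record one-to-one identifier substitutions."""
    a, b = tokenize(web_buggy), tokenize(target)
    mapping: Dict[str, str] = {}
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "replace" or i2 - i1 != j2 - j1:
            continue  # only trust same-length substitutions
        for web_tok, tgt_tok in zip(a[i1:i2], b[j1:j2]):
            if IDENT.fullmatch(web_tok) and IDENT.fullmatch(tgt_tok):
                mapping.setdefault(web_tok, tgt_tok)
    return mapping


def adapt_fix(web_fixed: str, mapping: Dict[str, str]) -> str:
    """Rename whole identifiers in the web fix using the inferred mapping."""
    return IDENT.sub(lambda m: mapping.get(m.group(0), m.group(0)), web_fixed)


mapping = infer_name_mapping(
    "text.setText(getString(R.string.title));",    # buggy code on the web page
    "label.setText(getString(R.string.header));",  # code in the target project
)
print(adapt_fix("if (isAdded()) text.setText(getString(R.string.title));", mapping))
# -> if (isAdded()) label.setText(getString(R.string.header));
```

Here the inferred mapping is `text -> label` and `title -> header`. Token alignment breaks down quickly once the two pieces of code differ in structure, which is one reason a tree-based approach is the better fit.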

13 Contributions (hats off to the authors' writing skill)

• We propose an approach to fixing recurring crash bugs via analyzing Q&A sites. To our knowledge, this is the first approach for automatic program repair using Internet resources.
The first approach that uses information from Q&A sites to fix crash bugs.

• We demonstrate that fixes in Q&A sites can be obtained and applied by combining a set of fuzzy program analysis techniques, without complex natural language processing.
Fuzzy program analysis techniques, without involving complex natural language processing.

• We evaluate our approach with real-world crash bugs from GitHub, and manually verify the correctness of the generated patches. Our evaluation shows that our approach is effective in fixing real-world recurring crash bugs, and can complement existing bug-fixing approaches.
Patch correctness is checked manually. The approach is effective in fixing real-world bugs.

nice

14 ChangeDistiller and GumTree again!!

4) Code Differencing: The technique we use in analyzing Q&A sites is code differencing. ChangeDistiller [15] is a widely-used approach that builds mappings and generates edit scripts at AST level. GumTree [16] improves ChangeDistiller by removing the assumption that leaf nodes contain a significant amount of text, and it detects move actions better than ChangeDistiller. Chawathe et al. [17] propose an optimal and linear algorithm that generates edit scripts based on AST mappings. We chose GumTree for edit script generation, because it is the state-of-art work in this area.
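To show what "edit scripts at AST level" look like, here is a deliberately naive top-down tree differ in Python. It is not ChangeDistiller or GumTree: it only pairs up children that have identical node labels in the same order (via `difflib.SequenceMatcher`), so it lacks GumTree's bottom-up matching and cannot detect moves; a moved subtree shows up as a delete plus an insert. The `Node` type and node labels are made up for illustration.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List, Tuple


# Hypothetical toy AST; real tools work on full Java ASTs.
@dataclass
class Node:
    label: str                      # node type, e.g. "MethodCall"
    value: str = ""                 # token text for leaves, e.g. a method name
    children: List["Node"] = field(default_factory=list)


Action = Tuple[str, str]            # (kind, human-readable description)


def diff(old: Node, new: Node, path: str = "root") -> List[Action]:
    """Naive top-down diff of two trees whose roots are assumed to match."""
    actions: List[Action] = []
    if old.value != new.value:
        actions.append(("update", f"{path}: {old.value!r} -> {new.value!r}"))
    matcher = SequenceMatcher(
        a=[c.label for c in old.children],
        b=[c.label for c in new.children],
        autojunk=False,
    )
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            # Same labels in the same order: pair them up and recurse.
            for k in range(i2 - i1):
                child = old.children[i1 + k]
                actions += diff(child, new.children[j1 + k], f"{path}/{child.label}[{i1 + k}]")
        else:
            # Unmatched children become deletes and inserts (no move detection).
            for c in old.children[i1:i2]:
                actions.append(("delete", f"{path}: {c.label} {c.value!r}"))
            for c in new.children[j1:j2]:
                actions.append(("insert", f"{path}: {c.label} {c.value!r}"))
    return actions


before = Node("Block", children=[Node("Stmt", "a")])
after = Node("Block", children=[Node("Guard", "isAdded"), Node("Stmt", "a")])
print(diff(before, after))
# -> [('insert', "root: Guard 'isAdded'")]
```

Renaming a leaf, e.g. `Name 'context'` to `Name 'activity'` under the same parent, produces a single `update` action instead.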

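15 QAcrashFix does not use SFL (spectrum-based fault localization); it localizes the fault directly from the information in the crash trace

That saves so much effort. So cool.
This idea is really neat.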
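Localizing from the crash trace can be as simple as picking the topmost stack frame that belongs to the project's own code rather than to the framework. Below is a minimal sketch, assuming standard Java frame syntax `at pkg.Class.method(File.java:123)` and a known package prefix for the project; the exact rule QAcrashFix uses may differ, and `locate_fault` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Hypothetical sketch; assumes standard Java stack-frame syntax.
# "at pkg.Class.method(File.java:123)"; frames like "(Native Method)" do not match.
FRAME = re.compile(r"^\s*at\s+([\w.$]+)\.([\w$<>]+)\(([^:()]+):(\d+)\)")


def locate_fault(crash_trace: str, project_prefix: str) -> Optional[Tuple[str, str, int]]:
    """Return (class, file, line) of the topmost frame inside the project's own code."""
    for line in crash_trace.splitlines():
        match = FRAME.match(line)
        if match and match.group(1).startswith(project_prefix):
            return match.group(1), match.group(3), int(match.group(4))
    return None  # no frame in project code, e.g. the crash is inside framework code


trace = (
    "java.lang.IllegalStateException: Fragment MyFragment not attached to Activity\n"
    "\tat android.app.Fragment.getResources(Fragment.java:1428)\n"
    "\tat com.example.app.MyFragment.onPostExecute(MyFragment.java:57)\n"
)
print(locate_fault(trace, "com.example.app"))
# -> ('com.example.app.MyFragment', 'MyFragment.java', 57)
```

The framework frame (`android.app.Fragment`) is skipped because the fix has to go into the client code, which is exactly where the recurring misuse of the framework happens.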

16 Workflow

[Figure: The overview of QAcrashFix.]

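17 A hasty summary

In short, I now have a basic understanding of this paper and have learned the authors' idea. It's getting late and I don't really feel like reading further, so I'll stop here.

I may come back and revise this later.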
