[Read the paper] An Empirical Study of Architectural Decay in Open-Source Software

This article shares what I learned in my Software Architecture and Middleware coursework. It is fair to say that this paper shaped my first impression of research, and I have since developed the habit of reading papers. Thanks to my teachers, and thanks to Xie Hengheng for the help.

Paper link: https://ieeexplore.ieee.org/document/8417151

I. Overview of the problem

1.1. Basic Concepts

Architectural decay is the phenomenon in which a software system's architecture gradually deteriorates, or its maintainability decreases, as versions iterate over time. Decay tends to arise when a system is modified or when new design decisions are introduced during its life cycle. Figure 1 shows architecture diagrams of the open-source project Chukwa: a large number of dependencies were introduced when the version changed, resulting in architectural decay.

 

Figure 1. Architectural decay often arises when the version changes

Architectural smells are the concrete manifestation of architectural decay. An architectural smell is an instance of a poor design decision at the architectural level, arising from the inappropriate use of architectural abstractions (e.g., components, interfaces). As shown in Table 1, architectural smells can be divided into 17 categories.

Architectural smells negatively affect a system's understandability, testability and reusability. Previous studies, however, relied mostly on personal experience and lacked empirical verification of this impact, which is the question this paper sets out to address.

 

Table 1. Classification of architectural smells

1.2. Motivation

1.2.1. Shortcomings of previous work

Overall, previous work relied on personal experience and small-scale case studies, so its conclusions do not generalize. In terms of method, earlier studies tended to use code smells to represent problems at the architectural level. But code smells differ greatly from architectural smells in both definition and effect: a code smell cannot be treated as an architectural abstraction, and experiments based on code smells performed poorly.

Scholars later proposed design smells to represent a set of code smells. This approach offers some guidance, but it often flags problems that are not architectural, so its results are inaccurate.

1.2.2. The work of this paper

This paper focuses on the following:

  • Selecting representative architectural smells and implementing algorithms that detect them automatically;
  • Introducing issue repositories and measuring the impact of smells on the system by exploiting the relationship between architectural smells and issues;
  • Using the ARCADE model and a large number of cases to demonstrate that architectural smells are harmful to the system.

 

II. Understanding the model

2.1. ARCADE Model Overview

As shown in Figure 2, the paper uses the ARCADE (Architecture Recovery, Change, And Decay Evaluator) model, which comprises several architecture recovery techniques and a set of metrics for measuring different aspects of architectural change.

By function, it can be divided into the following main modules: architecture recovery (Architecture Recovery), smell detection (Smell Detector), the issue tracking system (Issue Tracking Systems), and experimental analysis.

 

Figure 2. Schematic of the ARCADE model

Source files obtained from GitHub are handed to the architecture recovery module, which produces a formal representation of the software architecture (Architecture). This representation is passed to the next module, smell detection, which identifies the smells. The resulting architectural smells, together with the version information and commit records from the GitHub repository and the issues provided by Jira, are fed into the issue tracking module. Everything is finally handed to the analysis module for verification and analysis, which yields the final results.

2.2. Software architecture recovery

Software architecture recovery is the process of recovering a system's architecture from its implementation artifacts, which commonly include source code, executables and .class files. We represent a software architecture as a graph: nodes represent components C, edges represent dependencies and logical couplings, and each component may in turn be made up of entities E.

As shown below, architecture A consists of components C1 and C2, and each component consists of several entities. There are dependencies between entities 1 and 4 and between entities 3 and 5, and a logical coupling between entities 2 and 3.

  

Figure 3. Example of a software architecture

The paper uses three architecture recovery methods: the first is Algorithm for Comprehension-Driven Clustering (ACDC), the second is Architecture Recovery using Concerns (ARC), and the third is recovery based on package structure (PKG). They recover the architecture at the level of components, concerns and package structure, respectively. The three techniques were developed independently and shown to be effective in previous work, and using several recovery techniques also helps ensure the accuracy of the architectural views.

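After the software architecture has been turned into a diagram, the system also expresses the model in a formal notation that satisfies the rules shown in Table 2. Note that an entity e is made up of interfaces I, links L and couplings Cp, and that both L and Cp consist of a source src and a destination dst. In Java, an entity e can be regarded as a class and an interface I as a public method.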

 

Table 2. Rules of the structured representation
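To make the representation concrete, here is a minimal Python sketch of the structure described above: an architecture holds components, a component holds entities, and each entity carries interfaces I, links L and couplings Cp, where both L and Cp have a src and a dst. All class and field names are my own and are not taken from ARCADE. The sample data follows the C1/C2 example, but the assignment of entities 1-3 to C1 and 4-5 to C2, and the direction of each dependency, are assumptions, since the text does not state them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    """A dependency (L): src depends on dst."""
    src: str
    dst: str


@dataclass(frozen=True)
class Coupling:
    """A logical coupling (Cp) between two entities."""
    src: str
    dst: str


@dataclass
class Entity:
    """An entity e; in Java this would be a class."""
    name: str
    interfaces: set[str] = field(default_factory=set)       # I: public methods
    links: set[Link] = field(default_factory=set)           # L
    couplings: set[Coupling] = field(default_factory=set)   # Cp


@dataclass
class Component:
    name: str
    entities: dict[str, Entity] = field(default_factory=dict)


@dataclass
class Architecture:
    components: dict[str, Component] = field(default_factory=dict)

    def component_of(self, entity_name: str) -> str:
        """Return the name of the component that owns the entity."""
        for comp in self.components.values():
            if entity_name in comp.entities:
                return comp.name
        raise KeyError(entity_name)


def build_example() -> Architecture:
    # Assumed layout: e1-e3 live in C1, e4-e5 live in C2.
    c1 = Component("C1", {n: Entity(n) for n in ("e1", "e2", "e3")})
    c2 = Component("C2", {n: Entity(n) for n in ("e4", "e5")})
    # Assumed directions: e1 -> e4 and e3 -> e5.
    c1.entities["e1"].links.add(Link("e1", "e4"))
    c1.entities["e3"].links.add(Link("e3", "e5"))
    c1.entities["e2"].couplings.add(Coupling("e2", "e3"))
    return Architecture({"C1": c1, "C2": c2})


arch = build_example()
print(arch.component_of("e4"))  # C2
```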

      

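2.3. Smell detection

As mentioned above, previous research identifies 17 common architectural smells. In this experiment, however, smells are selected according to the following principles: no duplication, no deliberately introduced human factors, no consideration of connectors, and conformance with the basics of software architecture. This leaves six smells to detect: Concern Overload, Dependency Cycle, Link Overload, Unused Interfaces, Sloppy Delegation, and Co-change Coupling. The next step is to design algorithms that identify these six smells. In the implementation, thresholds are determined with the interquartile method.

[dectCO] To detect Concern Overload, we first determine how many concerns each component implements, and then decide how many concerns count as overload. With the interquartile method, we can identify the components that implement too many concerns.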
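The thresholding step can be sketched as follows. This is a minimal sketch that assumes the common upper fence Q3 + 1.5 × IQR as the cut-off (these notes do not give the exact formula) and uses made-up concern counts per component. The function names are hypothetical.

```python
import statistics


def high_threshold(values):
    """Upper fence Q3 + 1.5 * IQR (assumed cut-off, not confirmed by the notes)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 + 1.5 * (q3 - q1)


def overloaded(concerns_by_component):
    """Return the components whose concern count exceeds the fence."""
    limit = high_threshold(list(concerns_by_component.values()))
    return sorted(c for c, n in concerns_by_component.items() if n > limit)


# Invented data: number of concerns implemented by each component.
concerns = {"A": 1, "B": 2, "C": 2, "D": 3, "E": 3, "F": 3, "G": 4, "H": 20}
print(overloaded(concerns))  # ['H']
```

The same helper works for any per-component metric, which is why the threshold is kept separate from the counting.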

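[dectDC] A Dependency Cycle violates the principle of modularity. It can be detected with the classic method for finding strongly connected components in a directed graph.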
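One classic way to find strongly connected components is Tarjan's algorithm. The sketch below applies it to a component dependency graph and reports every strongly connected component with more than one member as a cycle. The choice of Tarjan's algorithm and the graph format (a dict mapping each component to the components it depends on) are my assumptions, and the component names are invented.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm on a dict: node -> iterable of successor nodes."""
    index, low = {}, {}
    stack, on_stack = [], set()
    sccs = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop it off the stack.
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    nodes = set(graph) | {w for succ in graph.values() for w in succ}
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return sccs


def dependency_cycles(graph):
    """Components that take part in a cycle of two or more members."""
    return [scc for scc in strongly_connected_components(graph) if len(scc) > 1]


deps = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
print(dependency_cycles(deps))
```

Here A, B and C form a cycle, while D only depends on the cycle and is not part of it. The recursive form is fine for small graphs; a very deep dependency graph would need an iterative version.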

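[dectLO] Link Overload is detected much like Concern Overload, except that we compute each component's in-degree and out-degree and then use the interquartile method to identify the components with too many links.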
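A minimal sketch of this check, assuming links are given as (src, dst) pairs between components and that the cut-off is the upper fence Q3 + 1.5 × IQR (an assumption, as these notes do not state the formula). The names and the sample graph are invented.

```python
import statistics
from collections import Counter


def high_threshold(values):
    """Upper fence Q3 + 1.5 * IQR (assumed cut-off)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 + 1.5 * (q3 - q1)


def link_overload(components, links):
    """Map each overloaded component to the directions in which it is overloaded."""
    out_deg = Counter(src for src, _ in links)
    in_deg = Counter(dst for _, dst in links)
    smelly = {}
    for direction, degree in (("in", in_deg), ("out", out_deg)):
        limit = high_threshold([degree[c] for c in components])
        for c in components:
            if degree[c] > limit:
                smelly.setdefault(c, []).append(direction)
    return smelly


# Invented example: seven components all depend on one hub.
components = ["Hub", "A", "B", "C", "D", "E", "F", "G"]
links = [(c, "Hub") for c in components if c != "Hub"]
print(link_overload(components, links))  # {'Hub': ['in']}
```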

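[dectUI] Unused Interfaces add unnecessary complexity and make the system harder to maintain. In the implementation, we simply iterate over the interfaces I of each entity; an interface that is never called is an unused interface.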
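The traversal can be sketched as below. The input format is an assumption: a dict from entity to its public method names, plus a list of observed calls as (caller, callee, method) triples. The class and method names are invented.

```python
def unused_interfaces(interfaces_by_entity, calls):
    """Return, per entity, the public methods that no recorded call targets."""
    used = {(callee, method) for _, callee, method in calls}
    result = {}
    for entity, methods in interfaces_by_entity.items():
        dead = sorted(m for m in methods if (entity, m) not in used)
        if dead:
            result[entity] = dead
    return result


interfaces = {"Parser": {"parse", "reset"}, "Writer": {"write"}}
calls = [("Main", "Parser", "parse"), ("Parser", "Writer", "write")]
print(unused_interfaces(interfaces, calls))  # {'Parser': ['reset']}
```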

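[dectSD] Sloppy Delegation splits a concern where it should not be split, which increases the complexity of data flow and control flow. In the implementation, if an entity is called very few times and does not call any other entity, we consider it Sloppy Delegation.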
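A rough sketch of that rule, with two assumptions of mine: "very few" is a fixed, hypothetical limit (max_incoming), and only calls that cross a component boundary are reported. Entity and component names are invented.

```python
from collections import Counter


def sloppy_delegation(links, component_of, max_incoming=1):
    """Flag calls into entities that are rarely called and call nothing themselves.

    links: list of (src_entity, dst_entity) pairs.
    component_of: dict from entity to its component.
    max_incoming: hypothetical limit for "called very few times".
    """
    in_deg = Counter(dst for _, dst in links)
    out_deg = Counter(src for src, _ in links)
    smells = []
    for src, dst in links:
        crosses = component_of[src] != component_of[dst]
        if crosses and out_deg[dst] == 0 and in_deg[dst] <= max_incoming:
            smells.append((src, dst))
    return smells


links = [("Order", "TaxHelper"), ("Order", "Db"), ("Invoice", "Db"), ("Db", "Driver")]
component_of = {
    "Order": "Sales", "Invoice": "Sales",
    "TaxHelper": "Util", "Db": "Storage", "Driver": "Storage",
}
print(sloppy_delegation(links, component_of))  # [('Order', 'TaxHelper')]
```

TaxHelper is called once and calls nothing, so the delegation from Order is flagged. Db calls Driver and Driver sits in the same component as its only caller, so neither is reported.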

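[dectCC] Coupling is sometimes necessary, but excessive coupling damages the original architecture. This algorithm finds the components that are too tightly coupled: it first computes the degree of coupling of each component, and then flags the components whose coupling exceeds the threshold.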
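One way to obtain a degree of coupling is to count how often files from two different components change in the same commit. The sketch below does that and keeps the pairs at or above a fixed, hypothetical limit (min_count). Both the counting scheme and the limit are my assumptions, and the file names are invented.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits, component_of):
    """Count commits in which two different components change together."""
    counts = Counter()
    for files in commits:
        comps = sorted({component_of[f] for f in files if f in component_of})
        for pair in combinations(comps, 2):
            counts[pair] += 1
    return counts


def co_change_coupling(commits, component_of, min_count):
    """Keep the component pairs that co-change at least min_count times."""
    counts = co_change_counts(commits, component_of)
    return {pair: n for pair, n in counts.items() if n >= min_count}


commits = [{"a.java", "b.java"}, {"a.java", "b.java", "c.java"}, {"c.java"}]
component_of = {"a.java": "X", "b.java": "Y", "c.java": "Y"}
print(co_change_coupling(commits, component_of, min_count=2))  # {('X', 'Y'): 2}
```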

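Together, these six algorithms make up the smell detection module, which automatically identifies the smelly files in the recovered architecture and returns them.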

 

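2.4. Issue tracking system

Jira is an issue management system that stores the issues raised during an open-source project's life cycle. All the open-source projects chosen for this experiment use Jira as their management tool, and the experiment uses the issue repositories provided by Jira for identification and judgment.

Besides the issue repository, this module also takes as input the commit records obtained from GitHub and the architectural smell information obtained above. It analyzes the relationship between issues and smells: if an issue and a smell affect the same version, and resolving the issue changed a smelly file, the two are considered related. The job of this module is to identify these relationships and return the relevant information.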
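The relatedness rule can be written down directly. In this sketch the record types, field names, issue keys and file paths are all hypothetical. A real pipeline would fill them from Jira issues and from the commits that resolved them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    key: str                      # hypothetical issue key
    affected_versions: frozenset  # versions the issue is reported against
    fixed_files: frozenset        # files changed by the commits that resolved it


@dataclass(frozen=True)
class Smell:
    kind: str
    version: str
    files: frozenset              # smelly files in that version


def is_related(issue, smell):
    """Same version, and the fix touched at least one smelly file."""
    same_version = smell.version in issue.affected_versions
    touched_smelly_file = bool(issue.fixed_files & smell.files)
    return same_version and touched_smelly_file


def relate(issues, smells):
    return [(i.key, s.kind) for i in issues for s in smells if is_related(i, s)]


issues = [
    Issue("DEMO-1", frozenset({"1.0"}), frozenset({"core/Router.java"})),
    Issue("DEMO-2", frozenset({"1.1"}), frozenset({"core/Router.java"})),
    Issue("DEMO-3", frozenset({"1.0"}), frozenset({"util/Strings.java"})),
]
smells = [Smell("Link Overload", "1.0", frozenset({"core/Router.java", "core/Bus.java"}))]
print(relate(issues, smells))  # [('DEMO-1', 'Link Overload')]
```

DEMO-2 touches a smelly file but affects a different version, and DEMO-3 affects the right version but touches a clean file, so only DEMO-1 is related.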

 

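III. Understanding the experimental process and results

3.1. Dataset setup

The study uses 421 versions of the 8 open-source systems listed in Table 3, 376M lines of code in total, and collects more than 40,000 related issues. All of them are Apache projects, which have well-maintained data repositories, release notes and bug trackers that make follow-up research easier. All of them also use Jira as their issue management system, which makes it easy to trace issues and the commits that fix them.

The study also applies the 3 architecture recovery methods to every version, producing 1,263 architecture models (421 versions × 3 methods), and detects more than 170,000 architectural smells of the 6 types.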

 

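Table 3. Open-source projects used

3.2. Experimental results

3.2.1. Main experiment

To explore how the impact of architectural smells on a system shows itself, the authors put forward two hypotheses.

Hypothesis 1: smelly files are more likely to have issues than clean files;

Hypothesis 2: smelly files are more likely to change than clean files.

In the main experiment, the authors use the ARCADE model described above and obtain the results in Tables 4 and 5. The tables report the average number of issues and of change commits per file, and a 2-sample t-test is used to verify that the differences are not due to chance.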
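For readers unfamiliar with the test, here is a small standard-library sketch of a two-sample t statistic. It uses Welch's variant, which does not assume equal variances; these notes do not say which variant the paper uses, so that choice is an assumption. The per-file issue counts are made-up toy numbers, not data from the paper.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Toy issue counts per file (invented for illustration only).
smelly_files = [3, 5, 4, 6, 2]
clean_files = [1, 2, 1, 3, 1]

t, df = welch_t(smelly_files, clean_files)
print(f"t = {t:.3f}, df = {df:.2f}")
```

A large positive t means the smelly group has a higher mean than chance alone would explain. Turning t and df into a p-value needs the t distribution, which the standard library does not provide; a statistics package such as SciPy does.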

 

Table 4. Verifying H1

 

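Table 5. Verifying H2

The results show that, at a 95% confidence level, smelly files have 24%-110% more issues and 10%-83% more changes than clean files, so both hypotheses are supported.

One counterexample also appears in the experiment. The guess is that different smell types have different effects, which still needs to be confirmed by follow-up research.

3.2.2. Auxiliary experiment

The paper also analyzes the characteristics of the files in 78 versions of the Camel system under the ACDC method.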

 

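Figure 4. Smelly files across versions

As Figure 4 shows, smelly files already appear in the first version, and architectural decay is a real problem.

The paper also analyzes the relationship between file length and smells. The horizontal axis is file size, and the vertical axis is the proportion of smelly (or clean) files at that size. Unlike previous studies, the paper finds that architectural decay is unrelated to file size.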

 

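Figure 5. Relationship between architectural smells and file size

IV. Contribution

4.1. Innovations and contributions

The main innovations of this paper are:

  • A new validation approach: instead of analyzing architectural problems through code smells, the paper works directly from architecture recovery techniques and uses ARCADE for analysis and verification;
  • A new experimental design: previous work used small datasets and did not analyze architectural smells as instances of architectural decay, whereas this paper conducts an empirical study over more versions of more open-source systems, with several architecture recovery methods.

The paper also makes the following contribution to this line of research:

  • It validates the hypotheses with ARCADE on multiple datasets, showing that architectural smells are harmful to the system.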

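4.2. Validity analysis

The validity of ARCADE rests on the following:

  • Three architecture recovery techniques are used to ensure the accuracy of the architectural views;
  • The architectural smell types used have been shown to be valid, and they are detected with a common thresholding method.

4.3. Future work

For future research, the paper plans to focus on the following improvements:

  • Taking the number of changed lines of code into account as a factor in the analysis model;
  • Using the ARCADE model to predict architectural decay and potential issues.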


Origin www.cnblogs.com/hithongming/p/12022487.html