Multi-scale surveys

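The paper "Using an Interest Ontology for Improved Support in Rule Mining" summarizes this well, so I borrow its related-work discussion to review the state of research on multi-level association rule mining.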

Han [5] discusses data mining at multiple concept levels. His approach is to use associations discovered at one level (e.g., milk → bread) to direct the search for associations at a different level (e.g., milk of brand X → bread of brand Y). As most of our data mining involves only one interest, our problem setting is quite different. Han et al. [6] introduce a top-down progressive deepening method for mining multiple-level association rules. They utilize the hierarchy to collect large itemsets at different concept levels. Our approach utilizes an interest ontology to improve support in rule mining by means of concept raising. Fortin et al. [3] use an object-oriented representation for data mining. Their interest is in deriving multi-level association rules. As we typically use only one data item in each tuple for raising, the possibility of multi-level rules does not arise in our problem setting. Srikant et al. [12] present the Cumulate and EstMerge algorithms, which find associations between items at any level by adding all ancestors of each item to the transaction. In our work, items of different levels do not coexist in any step of mining. Psaila et al. [9] describe how to improve association rule mining by using a generalization hierarchy. Their hierarchy is extracted from the schema of the database and used together with mining queries [7]. In our approach, we make use of a large pre-existing concept hierarchy, which contains concepts from the data tuples. Páircéir et al. [10] also differ from our work in that they mine multi-level rules that associate items spanning several levels of a concept hierarchy. Joshi et al. [8] are interested in situations where rare instances are really the most interesting ones, e.g., in intrusion detection. They present a two-phase data mining method with a good balance of precision and recall.
For us, rare instances are not important by themselves; they matter only because they combine with other rare instances to produce frequently occurring instances for data mining.
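
A minimal Python sketch of the idea attributed to Srikant et al. [12] above: each transaction is extended with all ancestors of its items, and itemset support is then counted over the extended transactions. The taxonomy, item names, and helper functions are hypothetical, invented for illustration; the actual Cumulate algorithm adds optimizations (pre-computing ancestors, dropping ancestors that appear in no candidate) that are omitted here. As in Srikant and Agrawal's formulation, itemsets containing both an item and one of its own ancestors are skipped, because their support always equals that of the itemset without the ancestor.

```python
from collections import Counter
from itertools import combinations

# Hypothetical taxonomy (child -> parent), invented for illustration only.
PARENT = {
    "milk_brand_x": "milk",
    "milk_brand_y": "milk",
    "bread_brand_y": "bread",
    "wheat_bread": "bread",
    "milk": "dairy",
    "bread": "bakery",
}

def ancestors(item, parent=PARENT):
    """Return all ancestors of `item`, nearest first."""
    result = []
    while item in parent:
        item = parent[item]
        result.append(item)
    return result

def extend_transaction(tx, parent=PARENT):
    """Add every ancestor of every item to the transaction."""
    extended = set(tx)
    for item in tx:
        extended.update(ancestors(item, parent))
    return extended

def mixes_item_and_ancestor(itemset, parent=PARENT):
    """True if the itemset contains some item together with one of its ancestors."""
    return any(b in ancestors(a, parent) or a in ancestors(b, parent)
               for a, b in combinations(itemset, 2))

def support_counts(transactions, size=2, parent=PARENT):
    """Absolute support of every itemset of `size` over the extended transactions."""
    counts = Counter()
    for tx in transactions:
        extended = extend_transaction(tx, parent)
        for itemset in combinations(sorted(extended), size):
            if not mixes_item_and_ancestor(itemset, parent):
                counts[itemset] += 1
    return counts

# Invented example transactions.
transactions = [
    {"milk_brand_x", "bread_brand_y"},
    {"milk_brand_y", "wheat_bread"},
    {"milk_brand_x"},
]
counts = support_counts(transactions)
print(counts[("bread", "milk")])  # 2
```

Here {bread, milk} reaches support 2 although no brand-level pair occurs more than once, which is the kind of support gain that mining at higher concept levels provides.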

References cited above:
[3] S. Fortin and L. Liu. An object-oriented approach to multi-level association rule mining. In Proceedings of the Fifth International Conference on Information and Knowledge Management, pages 65-72. ACM Press, 1996.
[5] J. Han. Mining knowledge at multiple concept levels. In CIKM, pages 19-24, 1995.
[7] J. Han, Y. Fu, W. Wang, K. Koperski, and O. Zaiane. DMQL: A data mining query language for relational databases, 1996.
[10] R. Páircéir, S. McClean, and B. Scotney. Discovery of multi-level rules and exceptions from a distributed database. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 523-532. ACM Press, 2000.
[12] R. Srikant and R. Agrawal. Mining generalized association rules. In Proc. of the 1995 Int'l Conf. on Very Large Data Bases (VLDB'95), Zurich, Switzerland, pages 407-419, September 1995.

What this paper proposes: In this paper, we showed that the combination of an ontology of the mined concepts with a standard rule mining algorithm can be used to generate data sets with orders of magnitude more tuples at higher levels. Generating rules from these tuples results in much larger (absolute) support values. In addition, raising often produces rules that, according to our intuition, better represent the domain than rules found without raising. Formalizing this intuition is a subject of future work. According to our extensive experiments with tuples derived from Yahoo interest data, data mining with raising can improve absolute support for rules by more than 6000% (averaged over all common rules in one interest category). Improvements in support may be even larger for individual rules. When averaging over all support improvements for all 16 top-level categories and levels 2 to 5, we get a value of 438%.
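
A minimal Python sketch of why raising increases absolute support. Everything concrete here is a hypothetical assumption, not taken from the paper: each tuple is an age group plus one interest, interests are modeled as category paths from a top-level category down, raising to level k keeps the first k path elements, and support improvement is assumed to be (after - before) / before × 100% (the paper's exact definition may differ). Unlike the ancestor-extension approach of Srikant et al., raising replaces the interest with its ancestor, so items of different levels never coexist.

```python
from collections import Counter

def raise_interest(path, level):
    """Replace an interest by its ancestor at `level` (1 = top-level category).

    Paths already at or above `level` are returned unchanged.
    """
    return path[:level]

def pair_support(records, level=None):
    """Absolute support of each rule age_group -> interest (tuples containing both).

    With level=None the raw interests are used; otherwise every interest is
    raised first, so tuples that differed only below `level` now coincide.
    """
    counts = Counter()
    for age_group, path in records:
        interest = path if level is None else raise_interest(path, level)
        counts[(age_group, interest)] += 1
    return counts

def support_improvement(before, after):
    """Relative increase in absolute support, in percent (assumed definition)."""
    if before == 0:
        raise ValueError("support before raising must be positive")
    return (after - before) / before * 100.0

# Invented example tuples: (age group, interest path).
records = [
    ("18-24", ("Recreation", "Sports", "Soccer")),
    ("18-24", ("Recreation", "Sports", "Tennis")),
    ("18-24", ("Recreation", "Sports", "Soccer")),
    ("18-24", ("Recreation", "Travel")),
]
raw = pair_support(records)
raised = pair_support(records, level=2)
before = raw[("18-24", ("Recreation", "Sports", "Soccer"))]  # 2
after = raised[("18-24", ("Recreation", "Sports"))]           # 3
print(support_improvement(before, after))  # 50.0
```

The Soccer and Tennis tuples merge into one Sports rule after raising, which is the mechanism behind the large support gains reported above; with a real ontology and many sibling categories, the gains compound.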

Reposted from blog.csdn.net/qq_31491859/article/details/80859188