1 Benchmark datasets

The benchmark datasets currently comprise MUSK1, MUSK2 ¹, Elephant, Fox, and Tiger ². Their properties are as follows:

Dataset	MUSK1	MUSK2	Elephant	Fox	Tiger
Dimensions	166	166	230	230	230
Bags	92	102	200	200	200
Positive bags	47	39	100	100	100
Instances	476	6598	1391	1320	1220
Max bag size	40	1044	13	13	13
Min bag size	2	1	2	2	1
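
As a quick sanity check on the table above, the sketch below derives the number of negative bags and the mean bag size for each dataset. The figures are copied from the table; the dictionary layout and the function name `summarize` are illustrative choices and not part of the original datasets.

```python
# Figures copied from the table above: (bags, positive bags, instances).
TABLE = {
    "MUSK1": (92, 47, 476),
    "MUSK2": (102, 39, 6598),
    "Elephant": (200, 100, 1391),
    "Fox": (200, 100, 1320),
    "Tiger": (200, 100, 1220),
}


def summarize(table):
    """Return {name: (negative_bags, mean_bag_size)} for each dataset."""
    return {
        name: (bags - positive, instances / bags)
        for name, (bags, positive, instances) in table.items()
    }


if __name__ == "__main__":
    for name, (negative, mean_size) in summarize(TABLE).items():
        print(f"{name}: {negative} negative bags, "
              f"{mean_size:.2f} instances per bag on average")
```

MUSK2 stands out here: its bags are far larger on average than those of the other four datasets.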

The datasets have been uploaded to GitHub:
https://github.com/InkiInki/data/blob/master/multi-instance/benchmark.rar

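2 Text categorization

The twenty text categorization datasets ³ are derived from the 20 Newsgroups corpus, which is widely used in text categorization. Each dataset contains 100 bags, with equal numbers of positive and negative bags.
The datasets have been uploaded to GitHub: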
https://github.com/InkiInki/data/blob/master/multi-instance/text-categorization.rar

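3 Image datasets

Image categorization is one of the most successful applications of MIL. The 2000-Image ⁴ and 1000-Image ⁵ datasets contain 20 and 10 categories of COREL images, respectively. Each category has 100 images, and each image is treated as a bag.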
[Figure: sample images from the original image dataset]
The original images and the processed data have been uploaded to GitHub:
https://github.com/InkiInki/data/blob/master/multi-instance/2000-image.rar

4 Artificial datasets

These datasets were originally created by Amar et al. ⁶ for multi-instance regression. They have been uploaded to GitHub:
https://github.com/InkiInki/data/blob/master/multi-instance/artificial-dataset.rar

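5 Dataset format

Each of the datasets above has been processed into two .arff files. The …_1 file contains all instances in the dataset; the last column under @data is an instance label that I added during processing and can be ignored. In the …_2 file, only the first two columns under @data are usable: the first column gives the size of each bag, and the second gives the label of each bag.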
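
The two-file layout described above can be turned back into bags with a few lines of Python. This is a minimal sketch under several assumptions that the post does not state: the files use the dense, comma-separated ARFF form with purely numeric values; the instances in the …_1 file are stored in bag order, so that consecutive rows belong to the same bag; and bag labels are integer-valued. The function names `parse_arff_data` and `build_bags` are hypothetical helpers written for this illustration.

```python
from typing import List, Tuple


def parse_arff_data(text: str) -> List[List[float]]:
    """Return the numeric rows listed after the @data marker of a dense ARFF file."""
    rows, in_data = [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        if not in_data:
            # Header lines (@relation, @attribute) are skipped until @data appears.
            in_data = line.lower().startswith("@data")
            continue
        rows.append([float(value) for value in line.split(",")])
    return rows


def build_bags(instance_rows, bag_rows) -> List[Tuple[List[List[float]], int]]:
    """Slice the flat instance list into (instances, bag_label) pairs.

    Assumes instances appear in bag order (an assumption, not stated in the post).
    """
    bags, start = [], 0
    for row in bag_rows:
        size, label = int(row[0]), int(row[1])  # only the first two columns are used
        # Drop the trailing instance-label column, which can be ignored.
        instances = [r[:-1] for r in instance_rows[start:start + size]]
        bags.append((instances, label))
        start += size
    if start != len(instance_rows):
        raise ValueError("bag sizes do not add up to the number of instances")
    return bags
```

To use it, read the text of the …_1 and …_2 files (for example with `open(path).read()`), pass each through `parse_arff_data`, and hand both results to `build_bags`. The final size check is a cheap guard: if it fails, the bag-order assumption or the pairing of the two files is probably wrong.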

¹ T.G. Dietterich, R.H. Lathrop, and T. Lozano-Pérez. Solving the multiple-instance problem with axis-parallel rectangles. Artificial Intelligence, pages 31–71, 1997.
² S. Andrews, I. Tsochantaridis, and T. Hofmann. Support vector machines for multiple-instance learning. Proc. of Neural Information Processing Systems, pages 561–568, 2003.
³ Z.H. Zhou, Y.Y. Sun, and Y.F. Li. Multi-instance learning by treating instances as non-i.i.d. samples. Proceedings of International Conference on Machine Learning, pages 1249–1256, 2009.
⁴ Y.X. Chen, J. Bi, and J.Z. Wang. MILES: multiple-instance learning via embedded instance selection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(12):1931–1947, 2006.
⁵ Y.X. Chen and J.Z. Wang. Image categorization by learning and reasoning with regions. Journal of Machine Learning Research, 5(Aug):913–939, 2004.
⁶ R.A. Amar, D.R. Dooly, S.A. Goldman, and Q. Zhang. Multiple-instance learning of real-valued data. In Proceedings of the 18th International Conference on Machine Learning, pages 3–10, Williamstown, MA, 2001.

因吉


Multi-instance datasets (Multi-instance)
