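Basic Data Table Management in Weka

Abstract: This post uses an example to explain three classes: Instances, Instance, and Attribute. It can be read together with 日撸 Java 三百行 (days 51-60, kNN and NB).

1. Test

Let's run the test first and explain afterwards.

1.1 Test data

First download the test dataset weather-original.arff from here:
https://gitee.com/fansmale/javasampledata

1.2 Code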

package thinking;

import java.io.FileReader;

import weka.core.Instances;
import weka.core.Instance;
import weka.core.Attribute;

/**
 * Test the data management of Weka.
 * 
 * @author Fan Min
 *
 */
public class WekaDataTest {

	/**
	 ***************** 
	 * The only testing method.
	 * 
	 * @param args
	 ***************** 
	 */
	public static void main(String[] args) {
		Instances tempData = null;
		// try-with-resources closes the reader even if parsing fails.
		try (FileReader fileReader = new FileReader("D:/workplace/javasampledata/weather-original.arff")) {
			tempData = new Instances(fileReader);
		} catch (Exception ee) {
			System.out.println("Cannot read the file: \r\n" + ee);
			System.exit(1);
		} // Of try

		// Step 1. Show the data.
		System.out.println("\r\n********* Part 1 *********");
		System.out.println("The data table is:\r\n" + tempData);

		// Step 2. Show one instance.
		System.out.println("\r\n********* Part 2 *********");
		System.out.println("The 3rd instance is: \r\n" + tempData.instance(2));

		// Step 3. Show one attribute.
		System.out.println("\r\n********* Part 3 *********");
		System.out.println("The 2nd attribute is: \r\n" + tempData.attribute(1));
		System.out.println("Its number of values is: \r\n" + tempData.attribute(1).numValues());
		System.out.println("The 3rd attribute is: \r\n" + tempData.attribute(2));
		System.out.println("Its number of values is: \r\n" + tempData.attribute(2).numValues());

		// Step 4. Take out one value from the data table.
		System.out.println("\r\n********* Part 4 *********");
		System.out.println("The 1st attribute value of the 1st instance is: " + tempData.instance(0).value(0));
		System.out.println("The 3rd attribute value of the 1st instance is: " + tempData.instance(0).value(2));
		System.out.println("The 5th attribute value of the 1st instance is: " + tempData.instance(0).value(4));

		// Step 5. Set the class attribute and show.
		System.out.println("\r\n********* Part 5 *********");
		tempData.setClassIndex(0);
		System.out.println("If we use the 1st attribute as the class, it is: \r\n" + tempData.classAttribute());
		tempData.setClassIndex(4);
		System.out.println("If we use the 5th attribute as the class, it is: \r\n" + tempData.classAttribute());
		System.out.println("The class value of the 1st instance is: " + tempData.instance(0).classValue());
	}// Of main
}// Of class WekaDataTest

1.3 Results


********* Part 1 *********
The data table is:
@relation weather.original

@attribute outlook {sunny,overcast,rainy}
@attribute temperature {hot,mild,cool}
@attribute humidity numeric
@attribute windy {TRUE,FALSE}
@attribute play {yes,no}

@data
sunny,hot,95,FALSE,no
sunny,hot,92,TRUE,no
overcast,hot,91,FALSE,yes
rainy,mild,88,FALSE,yes
rainy,cool,78,FALSE,yes
rainy,cool,75,TRUE,no
overcast,cool,72,TRUE,yes
sunny,mild,89,FALSE,no
sunny,cool,77,FALSE,yes
rainy,mild,71,FALSE,yes
sunny,mild,78,TRUE,yes
overcast,mild,88,TRUE,yes
overcast,hot,70,FALSE,yes
rainy,mild,72,TRUE,no

********* Part 2 *********
The 3rd instance is: 
overcast,hot,91,FALSE,yes

********* Part 3 *********
The 2nd attribute is: 
@attribute temperature {hot,mild,cool}
Its number of values is: 
3
The 3rd attribute is: 
@attribute humidity numeric
Its number of values is: 
0

********* Part 4 *********
The 1st attribute value of the 1st instance is: 0.0
The 3rd attribute value of the 1st instance is: 95.0
The 5th attribute value of the 1st instance is: 1.0

********* Part 5 *********
If we use the 1st attribute as the class, it is: 
@attribute outlook {sunny,overcast,rainy}
If we use the 5th attribute as the class, it is: 
@attribute play {yes,no}
The class value of the 1st instance is: 1.0

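2. Related classes

2.1 Attribute

This class manages a single attribute, for example

@attribute temperature {hot,mild,cool}

This is a nominal attribute named temperature with 3 possible values.

@attribute humidity numeric

This is a numeric (real-valued) attribute named humidity, and numValues() reports 0 possible values for it. A numeric attribute has infinitely many possible values and no enumerated value list, so it gives up and just tells you 0, haha.
See the Part 3 output.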
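
To make the nominal/numeric difference concrete, here is a minimal sketch in plain Java. AttributeSketch and all of its methods are hypothetical names invented for this illustration; it is not weka.core.Attribute and makes no claim about how Weka is implemented. It only mimics the two behaviours seen in Part 3: a nominal attribute carries a list of labels, a numeric one carries none, so its value count is 0.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for an attribute description; NOT weka.core.Attribute. */
final class AttributeSketch {
	private final String name;
	private final boolean isNominal;
	private final List<String> labels; // stays empty for numeric attributes

	private AttributeSketch(String name, boolean isNominal, List<String> labels) {
		this.name = name;
		this.isNominal = isNominal;
		this.labels = labels;
	}

	/** A nominal attribute enumerates its possible labels. */
	static AttributeSketch nominal(String name, String... labels) {
		return new AttributeSketch(name, true, Arrays.asList(labels));
	}

	/** A numeric attribute has no label list at all. */
	static AttributeSketch numeric(String name) {
		return new AttributeSketch(name, false, Collections.<String>emptyList());
	}

	/** Number of enumerated labels; 0 when there is nothing to enumerate. */
	int numValues() {
		return labels.size();
	}

	/** Renders the attribute in the ARFF header style shown in the output. */
	String toArffHeader() {
		String type = isNominal ? "{" + String.join(",", labels) + "}" : "numeric";
		return "@attribute " + name + " " + type;
	}

	public static void main(String[] args) {
		AttributeSketch temperature = nominal("temperature", "hot", "mild", "cool");
		AttributeSketch humidity = numeric("humidity");
		System.out.println(temperature.toArffHeader() + " has " + temperature.numValues() + " values");
		System.out.println(humidity.toArffHeader() + " has " + humidity.numValues() + " values");
	}
}
```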

2.2 Instance

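This class manages one row of data; see the Part 2 output.

2.3 Instances

This class manages a whole table; see the Part 1 output.
To get the value in row i and column j, you must write

tempData.instance(i).value(j);

It returns the internal representation of the data. For example, sunny, overcast, and rainy are stored internally as 0.0, 1.0, and 2.0 respectively. To use them as integers, an explicit cast is required: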

(int)tempData.instance(i).value(j);
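
The index encoding can be sketched without Weka on the classpath. NominalCodecSketch, encode, and decode below are hypothetical names made up for this illustration, not part of Weka. The label order is the one declared for outlook in the ARFF header. In Weka itself, going from the internal index back to the label is what Attribute.value(int) and Instance.stringValue(int) are for.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of storing nominal labels as double indices; NOT Weka code. */
final class NominalCodecSketch {
	/** Labels in the order declared by "@attribute outlook {sunny,overcast,rainy}". */
	static final List<String> OUTLOOK_LABELS = Arrays.asList("sunny", "overcast", "rainy");

	/** Maps a label to its position in the declaration, stored as a double. */
	static double encode(List<String> labels, String label) {
		int index = labels.indexOf(label);
		if (index < 0) {
			throw new IllegalArgumentException("Unknown label: " + label);
		} // Of if
		return index;
	}// Of encode

	/** Casts the stored double back to an int index and looks up the label. */
	static String decode(List<String> labels, double internalValue) {
		return labels.get((int) internalValue);
	}// Of decode

	public static void main(String[] args) {
		double stored = encode(OUTLOOK_LABELS, "rainy");
		System.out.println("Internal value: " + stored); // Internal value: 2.0
		System.out.println("Label: " + decode(OUTLOOK_LABELS, stored)); // Label: rainy
	}// Of main
}// Of class NominalCodecSketch
```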

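Note: when reading data from the table, you cannot take the column first and then the row. tempData.attribute(j) returns an attribute (the description of the column), not the full column of data; see the Part 3 output.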
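
If a whole column is needed, one way is to walk the rows and pick the j-th value from each. The sketch below shows that loop on a plain double[][] holding the internal values of the first three weather rows. ColumnSketch and column are hypothetical names for this illustration only. With a real Instances object, the same loop would run i from 0 to tempData.numInstances() - 1 and read tempData.instance(i).value(j); Instances also offers attributeToDoubleArray(int) for this purpose.

```java
import java.util.Arrays;

/** Hypothetical illustration of row-first access to build a column; NOT Weka code. */
final class ColumnSketch {
	/** Collects the j-th value of every row into one array. */
	static double[] column(double[][] rows, int j) {
		double[] result = new double[rows.length];
		for (int i = 0; i < rows.length; i++) {
			result[i] = rows[i][j];
		} // Of for i
		return result;
	}// Of column

	public static void main(String[] args) {
		// Internal values of the first three instances:
		// outlook, temperature, humidity, windy, play.
		double[][] rows = {
				{ 0, 0, 95, 1, 1 }, // sunny,hot,95,FALSE,no
				{ 0, 0, 92, 0, 1 }, // sunny,hot,92,TRUE,no
				{ 1, 0, 91, 1, 0 }, // overcast,hot,91,FALSE,yes
		};
		// The humidity column is at index 2.
		System.out.println(Arrays.toString(column(rows, 2))); // [95.0, 92.0, 91.0]
	}// Of main
}// Of class ColumnSketch
```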

For prediction tasks, you need to set the decision (class) attribute; see Part 5.

Reposted from blog.csdn.net/minfanphd/article/details/123368177