Weka Development [4]: Feature Selection

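     About feature selection: both in theory and in practice, classification accuracy after feature selection often comes out lower than without it, so what is the point of doing it? The conclusion I reached after discussing this with another user online: some features are not cheap to extract, and they also cost computation time. If you classify directly without any feature selection, the running time may be unacceptable, so you can run feature selection on part of the data beforehand. The trade-off to weigh is then clear: which is harder to accept, the accuracy lost through feature selection or the efficiency problems that may come from skipping it?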

 

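     A rough outline: to do feature selection with AttributeSelection you need to set three things. First, the class that evaluates attributes (look it up in the Weka GUI, where it is called Attribute Evaluator). Second, the search strategy (again, see the Weka GUI, where it is called Search Method). Third, the dataset you want to run feature selection on. Finally, call the static method Filter.useFilter. All of this probably reads as stating the obvious, and one look at the code makes it clear. The only point really worth mentioning is not to import AttributeSelection from the wrong package; there is a comment next to that line in the code.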
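
To make the split between the first two settings a bit more concrete, here is a small stand-alone sketch in plain Java. It is not Weka code and does not claim to reproduce how CfsSubsetEval or GreedyStepwise work internally: the SubsetEvaluator interface, the greedyForward method and the toy merit numbers are all made up for illustration. The only idea it tries to show is that the evaluator scores a candidate subset of attributes, while the search method decides which subsets get tried.

```java
import java.util.Set;
import java.util.TreeSet;

/** Conceptual sketch only; none of these names exist in Weka. */
public class GreedyForwardSketch {

    /** Hypothetical stand-in for an "attribute evaluator": scores a subset of attribute indices. */
    public interface SubsetEvaluator {
        double merit(Set<Integer> subset);
    }

    /** Hypothetical stand-in for a "search method": greedy forward selection. */
    public static Set<Integer> greedyForward(int numAttributes, SubsetEvaluator evaluator) {
        Set<Integer> selected = new TreeSet<>();
        double bestMerit = evaluator.merit(selected);
        boolean improved = true;
        while (improved) {
            improved = false;
            int bestCandidate = -1;
            // Try adding each attribute that is not selected yet.
            for (int a = 0; a < numAttributes; a++) {
                if (selected.contains(a)) {
                    continue;
                }
                selected.add(a);
                double merit = evaluator.merit(selected);
                selected.remove(a);
                if (merit > bestMerit) {
                    bestMerit = merit;
                    bestCandidate = a;
                    improved = true;
                }
            }
            // Keep the single best addition; stop when nothing raises the merit.
            if (improved) {
                selected.add(bestCandidate);
            }
        }
        return selected;
    }

    /** Toy merit with invented numbers: reward relevance, penalise size and one redundant pair. */
    public static SubsetEvaluator toyEvaluator() {
        final double[] relevance = {0.9, 0.85, 0.1, 0.6};
        return subset -> {
            double merit = 0.0;
            for (int a : subset) {
                merit += relevance[a];
            }
            merit -= 0.3 * subset.size();
            // Assume attributes 0 and 1 carry almost the same information.
            if (subset.contains(0) && subset.contains(1)) {
                merit -= 0.8;
            }
            return merit;
        };
    }

    public static void main(String[] args) {
        System.out.println(greedyForward(4, toyEvaluator())); // prints [0, 3]
    }
}
```

With the toy merit function the search picks attribute 0 first, then adds attribute 3, and stops because no further addition raises the merit. Swapping in a different evaluator or a different search loop changes the result independently, which is why Weka asks for the two separately.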

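       I won't bother explaining the other function (I didn't write it anyway). It is basically self-explanatory and you are unlikely to have trouble reading it.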

package com.cizito.weka.study;

import java.util.Random;

import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.GreedyStepwise;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.attribute.AttributeSelection;

/**
 * @author zhangwei
 *
 */
public class FilterTest {

    // Full dataset as loaded from the ARFF file
    private Instances m_instances = null;
    // Dataset reduced to the selected attributes
    private Instances selectedIns;

    public static void main( String[] args ) throws Exception {
        FilterTest filter = new FilterTest();

        filter.getFileInstances( "D:/ProgramFiles/Weka-3-6/data/soybean.arff" );
        filter.selectAttUseFilter();

        filter.selectAttUseMC();
    }

    // Load the dataset and use the last attribute as the class
    public void getFileInstances( String fileName ) throws Exception {
        DataSource frData = new DataSource( fileName );
        m_instances = frData.getDataSet();

        m_instances.setClassIndex( m_instances.numAttributes() - 1 );
    }

    // Feature selection as a filter: returns a reduced copy of the dataset
    public void selectAttUseFilter() throws Exception {
        AttributeSelection filter = new AttributeSelection();  // package weka.filters.supervised.attribute!
        CfsSubsetEval eval = new CfsSubsetEval();
        GreedyStepwise search = new GreedyStepwise();
        filter.setEvaluator( eval );
        filter.setSearch( search );
        // Set the input format only after the evaluator and search are configured
        filter.setInputFormat( m_instances );

        System.out.println( "number of instance attribute = " + m_instances.numAttributes() );

        selectedIns = Filter.useFilter( m_instances, filter );
        System.out.println( "number of selected instance attribute = " + selectedIns.numAttributes() );

        for( int i = 0; i < selectedIns.numInstances(); i++ ) {
            System.out.println( selectedIns.instance( i ) );
        }
    }

    // Feature selection inside a meta-classifier, evaluated by cross-validation
    public void selectAttUseMC() throws Exception {
        AttributeSelectedClassifier classifier = new AttributeSelectedClassifier();
        CfsSubsetEval eval = new CfsSubsetEval();
        GreedyStepwise search = new GreedyStepwise();
        NaiveBayes nb = new NaiveBayes();
        classifier.setClassifier( nb );
        classifier.setEvaluator( eval );
        classifier.setSearch( search );
        // 10-fold cross-validation on the full dataset; the Evaluation is built
        // from the same data, so this method does not depend on selectAttUseFilter()
        Evaluation evaluation = new Evaluation( m_instances );
        evaluation.crossValidateModel( classifier, m_instances, 10, new Random( 1 ) );
        System.out.println( evaluation.toSummaryString() );
    }
}
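
One thing about selectAttUseMC that the listing does not spell out: AttributeSelectedClassifier runs attribute selection on whatever training data it is given, so under cross-validation the selection is redone on the training part of every fold instead of once on the whole dataset. The following plain-Java sketch only illustrates that fold structure. The class FoldSplitSketch and the method splitIntoFolds are hypothetical names, and the split here is a simple shuffle, whereas Weka's own cross-validation also stratifies nominal classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Conceptual sketch only; this is not how Weka partitions the data internally. */
public class FoldSplitSketch {

    /** Shuffles the instance indices and deals them into k folds, one index per fold in turn. */
    public static List<List<Integer>> splitIntoFolds(int numInstances, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));

        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < indices.size(); i++) {
            folds.get(i % k).add(indices.get(i));
        }
        return folds;
    }

    public static void main(String[] args) {
        int k = 10;
        List<List<Integer>> folds = splitIntoFolds(50, k, 1L);
        for (int f = 0; f < k; f++) {
            List<Integer> test = folds.get(f);
            List<Integer> train = new ArrayList<>();
            for (int g = 0; g < k; g++) {
                if (g != f) {
                    train.addAll(folds.get(g));
                }
            }
            // A real run would select attributes using only the train indices,
            // build the classifier on the reduced training data, and then
            // score the held-out test indices with the same attribute subset.
            System.out.println("fold " + f + ": select and train on " + train.size()
                    + " instances, test on " + test.size());
        }
    }
}
```

This is also why the cross-validated figure from selectAttUseMC is a fairer estimate than first reducing the whole dataset with the filter and then cross-validating on the reduced copy: in the second case the held-out instances have already influenced which attributes were kept.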

 

Reposted from zwustudy.iteye.com/blog/1847473