1. Classification
Like scikit-learn, Orange provides machine learning algorithms for classification and regression. A typical script trains several learners and compares them with cross-validation:
import Orange
data = Orange.data.Table("voting")
lr = Orange.classification.LogisticRegressionLearner()
rf = Orange.classification.RandomForestLearner(n_estimators=100)
# 5-fold cross-validation of both learners
cv = Orange.evaluation.CrossValidation(k=5)
res = cv(data, [lr, rf])
# one score per learner, in the order the learners were given
print("Accuracy:", Orange.evaluation.scoring.CA(res))
print("AUC:", Orange.evaluation.scoring.AUC(res))
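`CA` and `AUC` score the cross-validated predictions. As a rough illustration of what the two numbers mean, the sketch below computes classification accuracy (the fraction of correct predictions) and AUC for a binary target, taken as the fraction of (positive, negative) pairs in which the positive instance receives the higher score, with ties counting one half. It is a plain-Python sketch, not Orange's implementation, and it covers only two-class targets; the function names and the small data lists are made up for illustration.

```python
def classification_accuracy(actual, predicted):
    """Fraction of instances whose predicted class equals the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


def binary_auc(actual, scores, positive=1):
    """AUC for a binary target: probability that a random positive instance
    scores higher than a random negative one (ties count 0.5)."""
    pos = [s for a, s in zip(actual, scores) if a == positive]
    neg = [s for a, s in zip(actual, scores) if a != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


actual = [1, 0, 1, 1, 0]
predicted = [1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.25, 0.8, 0.3]  # hypothetical probabilities of class 1
print(classification_accuracy(actual, predicted))  # 0.8
print(binary_auc(actual, scores))  # 5 of 6 pairs ranked correctly
```

Accuracy only looks at the hard predictions, while AUC looks at how the scores rank instances, which is why the two can disagree.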
Learners and Classifiers
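Classification uses two kinds of objects: learners and classifiers. A learner takes class-labeled data and returns a classifier; a classifier takes data instances and returns their predicted classes. The script below trains a classifier, prints predictions for the first three instances, compares predictions with the original classes for a few more, and counts misclassified training instances: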
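import numpy as np
import Orange
data = Orange.data.Table("voting")
# the learner
learner = Orange.classification.LogisticRegressionLearner()
# calling the learner on data returns a classifier
classifier = learner(data)
# predicted class indices for the first three instances
print(classifier(data[:3]))
# predict instances 5-7 and compare with their original classes
c_values = data.domain.class_var.values
for d in data[5:8]:
    c = classifier(d)
    print("{}, originally {}".format(c_values[int(c)], d.get_class()))
# count misclassified training instances
x = np.sum(data.Y != classifier(data))
print("Misclassified:", x)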
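The learner/classifier split can be illustrated without Orange at all. The sketch below is a hypothetical, pure-Python mimic of the pattern: `MajorityLearner` is called on labeled data and returns a `MajorityClassifier`, which is then called on instances to get predictions. Both class names and the toy data are invented for illustration; this is not Orange's code.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical classifier: always predicts the class it was built with."""

    def __init__(self, majority):
        self.majority = majority

    def __call__(self, instances):
        # one prediction per instance, like classifier(data[:3]) above
        return [self.majority for _ in instances]


class MajorityLearner:
    """Hypothetical learner: inspects labeled data and returns a classifier."""

    def __call__(self, data):
        labels = [label for _, label in data]
        majority, _ = Counter(labels).most_common(1)[0]
        return MajorityClassifier(majority)


# (features, class label) pairs
train = [((1, 0), "democrat"), ((0, 1), "republican"), ((1, 1), "democrat")]
classifier = MajorityLearner()(train)
print(classifier([(0, 0), (1, 0)]))  # ['democrat', 'democrat']
```

Keeping "how to fit" (the learner) separate from "how to predict" (the classifier) is what lets the same learner object be reused on different training sets, for example once per cross-validation fold.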
Probabilistic Classification
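To see how much probability the classifier assigns to each class, call it with a second argument of `1` (`Model.Probs`), which returns class probabilities instead of predicted values: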
import Orange
data = Orange.data.Table("voting")
learner = Orange.classification.LogisticRegressionLearner()
classifier = learner(data)
target_class = 1
print("Probabilities for %s:" % data.domain.class_var.values[target_class])
# the second argument 1 (Model.Probs) asks for class probabilities
probabilities = classifier(data, 1)
for p, d in zip(probabilities[5:8], data[5:8]):
    print(p[target_class], d.get_class())
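Each row of class probabilities sums to 1, and the usual convention for turning such a row into a single predicted class is to take the most probable one. The sketch below shows that convention in plain Python; the function name and the class names are illustrative assumptions, and it does not reproduce how Orange computes its predictions internally.

```python
def predicted_class(probabilities, class_values):
    """Return the class with the highest probability (the first one wins on ties)."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_values[best]


class_values = ["democrat", "republican"]  # illustrative class names
rows = [[0.97, 0.03], [0.12, 0.88], [0.50, 0.50]]
for probs in rows:
    # each row of class probabilities should sum to 1
    assert abs(sum(probs) - 1.0) < 1e-9
    print(probs, "->", predicted_class(probs, class_values))
```

Looking at the probabilities rather than only the predicted class shows how confident the model is: 0.97 and 0.51 both yield the same label, but they say very different things.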
Cross-Validation
import Orange
data = Orange.data.Table("titanic")
lr = Orange.classification.LogisticRegressionLearner()
cv = Orange.evaluation.CrossValidation(k=5)
res = cv(data, [lr])
print("Accuracy: %.3f" % Orange.evaluation.scoring.CA(res)[0])
print("AUC: %.3f" % Orange.evaluation.scoring.AUC(res)[0])
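In k-fold cross-validation the data are split into k parts. Each part serves once as the test set while the model is trained on the remaining k−1 parts, so every instance is tested exactly once. The sketch below generates such splits as index lists in plain Python. It is a simple, unstratified version written for illustration (the function name and `seed` parameter are assumptions), not Orange's `CrossValidation`.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Split range(n) into k folds; yield (train, test) index lists, one per fold."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # deal shuffled indices round-robin so fold sizes differ by at most one
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


for train, test in k_fold_indices(10, 5):
    print(len(train), sorted(test))
```

With `k=5`, each model is trained on about 80% of the data, and the scores printed above summarize predictions pooled over all five test folds.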
Handful of Classifiers
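Orange includes many classification algorithms, most of which wrap scikit-learn implementations. The script below trains three of them and prints the probability each assigns to the first class for five held-out instances: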
import Orange
import random
random.seed(42)
data = Orange.data.Table("voting")
# hold out 5 random instances for testing; train on the rest
test = Orange.data.Table(data.domain, random.sample(data, 5))
train = Orange.data.Table(data.domain, [d for d in data if d not in test])
tree = Orange.classification.tree.TreeLearner(max_depth=3)
knn = Orange.classification.knn.KNNLearner(n_neighbors=3)
lr = Orange.classification.LogisticRegressionLearner(C=0.1)
learners = [tree, knn, lr]
classifiers = [learner(train) for learner in learners]
target = 0
print("Probabilities for %s:" % data.domain.class_var.values[target])
print("original class ", " ".join("%-5s" % l.name for l in classifiers))
c_values = data.domain.class_var.values
# class probabilities for the whole test table: one (n_test, n_classes) array per model
probs = [c(test, 1) for c in classifiers]
for i, d in enumerate(test):
    print(("{:<15}" + " {:.3f}" * len(classifiers)).format(
        c_values[int(d.get_class())],
        *(p[i][target] for p in probs)))
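Building the training set with `d not in test` relies on comparing instances by their values, so any training row that happens to duplicate a test row would be dropped as well. Choosing rows by index avoids that. The sketch below picks test and training indices in plain Python; the function name and `seed` default are illustrative assumptions. The resulting index lists could then be used to select rows from a table, assuming the table supports indexing by a list of row numbers.

```python
import random


def train_test_indices(n, n_test, seed=42):
    """Pick n_test distinct row indices for testing; the rest are for training."""
    rng = random.Random(seed)
    test = sorted(rng.sample(range(n), n_test))
    test_set = set(test)
    train = [i for i in range(n) if i not in test_set]
    return train, test


train, test = train_test_indices(20, 5)
print(test)
print(len(train))  # 15
```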
2.Regression
Regression works much like classification: there is a learner and a regressor (a regression model). A regression learner receives data and returns a regressor, which predicts the value of a continuous class variable.
import Orange
data = Orange.data.Table("housing")
learner = Orange.regression.LinearRegressionLearner()
model = learner(data)
print("predicted, observed:")
# predict the first three instances in one call
predictions = model(data[:3])
for pred, d in zip(predictions, data[:3]):
    print("%.1f, %.1f" % (pred, d.get_class()))
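Fitting a linear regression means choosing coefficients that minimize the sum of squared differences between predicted and observed values. Orange's learner handles all features of the data set at once; the plain-Python sketch below uses the closed-form least-squares solution for a single feature only, to show what "fitting" amounts to. The function name and data are made up for illustration.

```python
def fit_simple_linear(xs, ys):
    """Least-squares fit of y = intercept + slope * x for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
intercept, slope = fit_simple_linear(xs, ys)
print("intercept %.2f, slope %.2f" % (intercept, slope))  # intercept 1.15, slope 1.94
print("predicted, observed:")
for x, y in zip(xs, ys):
    print("%.1f, %.1f" % (intercept + slope * x, y))
```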
Handful of Regressors
Build a regression tree and print its structure, then compare three regressors on five held-out instances:
import random
import Orange
data = Orange.data.Table("housing")
tree_learner = Orange.regression.SimpleTreeLearner(max_depth=2)
tree = tree_learner(data)
# print the tree structure
print(tree.to_string())
random.seed(42)
test = Orange.data.Table(data.domain, random.sample(data, 5))
train = Orange.data.Table(data.domain, [d for d in data if d not in test])
lin = Orange.regression.linear.LinearRegressionLearner()
rf = Orange.regression.random_forest.RandomForestRegressionLearner()
rf.name = "rf"
ridge = Orange.regression.RidgeRegressionLearner()
learners = [lin, rf, ridge]
regressors = [learner(train) for learner in learners]
print("y ", " ".join("%5s" % l.name for l in regressors))
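# predict the whole test table with each regressor, then print row by row
predictions = [r(test) for r in regressors]
for i, d in enumerate(test):
    print(("{:<5}" + " {:5.1f}" * len(regressors)).format(
        float(d.get_class()),
        *(p[i] for p in predictions)))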
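The regression tree printed earlier was built with `SimpleTreeLearner(max_depth=2)`. A regression tree grows by repeatedly splitting the data on a feature threshold. A common criterion, sketched below for one numeric feature, picks the threshold that minimizes the summed squared error of the two child nodes. This is a generic illustration with made-up names and data; it is not claimed to be the exact criterion `SimpleTreeLearner` uses. With `max_depth=2`, a split like this is applied at most twice along any root-to-leaf path.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Find the threshold on one feature that minimizes the total squared error
    of the two child nodes; returns (threshold, total_sse)."""
    best = (None, sse(ys))  # baseline: no split at all
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (t, total)
    return best


xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]
print(best_split(xs, ys))  # threshold 3 separates the low and high targets
```

Each leaf then predicts the mean target value of the training instances that reach it.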
Cross-Validation
import Orange
data = Orange.data.Table("housing.tab")
lin = Orange.regression.linear.LinearRegressionLearner()
rf = Orange.regression.random_forest.RandomForestRegressionLearner()
rf.name = "rf"
ridge = Orange.regression.RidgeRegressionLearner()
# baseline: always predicts the mean of the training targets
mean = Orange.regression.MeanLearner()
learners = [lin, rf, ridge, mean]
cv = Orange.evaluation.CrossValidation(k=5)
res = cv(data, learners)
rmse = Orange.evaluation.RMSE(res)
r2 = Orange.evaluation.R2(res)
print("Learner  RMSE  R2")
for i in range(len(learners)):
    print("{:8s} {:.2f} {:5.2f}".format(learners[i].name, rmse[i], r2[i]))
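RMSE is the square root of the mean squared prediction error, in the units of the target; R² is 1 − SS_res / SS_tot, the share of the target's variance the model explains. The `MeanLearner` row acts as a baseline: a model that always predicts the mean of the very data it is scored on gets R² of exactly 0, and under cross-validation (where the mean comes from the training folds) its R² is typically close to zero or slightly negative. The plain-Python sketch below computes both measures on made-up numbers; the function names are illustrative, not Orange's API.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [2.0, 4.0, 6.0, 8.0]
model = [2.5, 3.5, 6.5, 7.5]     # a model that is off by 0.5 everywhere
baseline = [5.0] * len(actual)   # always predicts the mean of `actual`
print("model    RMSE %.2f  R2 %.2f" % (rmse(actual, model), r2(actual, model)))
# model    RMSE 0.50  R2 0.95
print("baseline RMSE %.2f  R2 %.2f" % (rmse(actual, baseline), r2(actual, baseline)))
# baseline RMSE 2.24  R2 0.00
```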