SVM vs. LR Speed Test

Dataset 1

import sklearn.datasets as skl_ds
from sklearn.model_selection import train_test_split

X, Y = skl_ds.make_classification(n_samples=200, n_features=50, n_classes=2)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=2019)

	First	Second	Third	Fourth	Fifth	Avg
SVM time (s)	0.0010	0.0020	0.0040	0.0020	0.0030	0.0024
SVM accuracy	0.7833	0.7667	0.7333	0.8167	0.7667	0.77334
LR time (s)	0.0230	0.0010	0.0020	0.0020	0.0020	0.006
LR accuracy	0.9167	0.8167	0.75	0.85	0.8	0.82668
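The tables report five runs plus an average for each model. The original harness is not shown, so the sketch below is an assumed reconstruction: it takes a linear-kernel `SVC` (the analysis says a linear kernel was used) and a default `LogisticRegression`, times only `fit` with `time.perf_counter`, and scores accuracy on the held-out 30%. Each run draws fresh data, because `make_classification` is called without a `random_state`, just as in the code lines above. The helper names `time_fit` and `run_benchmark` are hypothetical.

```python
import time

import sklearn.datasets as skl_ds
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def time_fit(model, X_train, X_test, y_train, y_test):
    """Fit `model` and return (fit time in seconds, test-set accuracy)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)  # assumption: only the fit is timed
    elapsed = time.perf_counter() - start
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return elapsed, accuracy


def run_benchmark(n_samples, n_features, n_runs=5):
    """Repeat the experiment n_runs times and average time/accuracy per model.

    Each run draws a fresh dataset (no random_state in make_classification),
    while the split keeps random_state=2019, matching the code in the text.
    """
    runs = {"SVM": [], "LR": []}
    for _ in range(n_runs):
        X, Y = skl_ds.make_classification(
            n_samples=n_samples, n_features=n_features, n_classes=2
        )
        X_train, X_test, y_train, y_test = train_test_split(
            X, Y, test_size=0.3, random_state=2019
        )
        # Assumed model settings: linear-kernel SVC, default LogisticRegression
        models = {"SVM": SVC(kernel="linear"), "LR": LogisticRegression()}
        for name, model in models.items():
            runs[name].append(time_fit(model, X_train, X_test, y_train, y_test))

    summary = {}
    for name, results in runs.items():
        times, accuracies = zip(*results)
        summary[name] = (sum(times) / n_runs, sum(accuracies) / n_runs)
    return summary


if __name__ == "__main__":
    print(run_benchmark(200, 50))
```

Calling `run_benchmark(2000, 50)` or `run_benchmark(20000, 200)` covers the other two configurations; absolute times will of course depend on the machine and library version.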

Dataset 2

X, Y = skl_ds.make_classification(n_samples=2000, n_features=50, n_classes=2)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=2019)

	First	Second	Third	Fourth	Fifth	Avg
SVM time (s)	0.1700	0.1640	0.1790	0.1449	0.1579	0.16316
SVM accuracy	0.8833	0.93	0.925	0.9366	0.9333	0.92164
LR time (s)	0.0339	0.0069	0.0079	0.0080	0.0079	0.01292
LR accuracy	0.8733	0.9216	0.9216	0.935	0.9416	0.91862

Dataset 3

X, Y = skl_ds.make_classification(n_samples=20000, n_features=200, n_classes=2)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=2019)

	First	Second	Third	Fourth	Fifth	Avg
SVM time (s)	291.80	194.85	162.11	210.30	289.99	229.81
SVM accuracy	0.8853	0.9101	0.9223	0.9065	0.898	0.90444
LR time (s)	0.3450	0.3370	0.3510	0.3800	0.3250	0.3476
LR accuracy	0.8848	0.9125	0.9206	0.9013	0.8963	0.9031

Analysis:

Three datasets were used: the first has 200 samples with 50 features, the second 2,000 samples with 50 features, and the third 20,000 samples with 200 features.
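Because every split uses `test_size=0.3`, the size of the test set, and therefore how finely accuracy can be measured, differs a lot between the three datasets. The short sketch below computes the split sizes with scikit-learn's own `train_test_split`; the helper `split_sizes` is a hypothetical name.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_sizes(n_samples, test_size=0.3):
    """Return (n_train, n_test) for the split used in the text."""
    idx_train, idx_test = train_test_split(
        np.arange(n_samples), test_size=test_size, random_state=2019
    )
    return len(idx_train), len(idx_test)


for n in (200, 2000, 20000):
    n_train, n_test = split_sizes(n)
    # One test sample more or less changes accuracy by 1 / n_test
    print(f"n={n}: train={n_train}, test={n_test}, accuracy step={1 / n_test:.4f}")
```

This gives test sets of 60, 600 and 6,000 samples. For the 200-sample dataset each misclassified sample moves accuracy by about 0.017, so LR's average lead of about 0.053 corresponds to roughly three test samples per run.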

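The results show that with few samples and low-dimensional features, both SVM and LR train very quickly, with SVM slightly faster on average (0.0024 s vs. 0.0060 s). In accuracy, LR is clearly ahead (0.827 vs. 0.773 for SVM). When the sample count rises to 2,000, SVM's training time starts to grow (0.163 s vs. 0.013 s for LR), but its accuracy slightly overtakes LR's (0.922 vs. 0.919). Although SVM is slower, its running time is still acceptable.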
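The averages for the 200-sample dataset are sensitive to a single outlier: LR's first run (0.0230 s) is about ten times slower than its other four. A quick robustness check is to compare the mean with the median of the five runs; the values below are copied from the first table.

```python
from statistics import mean, median

# Per-run fit times in seconds, copied from the 200-sample table
fit_times = {
    "SVM": [0.0010, 0.0020, 0.0040, 0.0020, 0.0030],
    "LR": [0.0230, 0.0010, 0.0020, 0.0020, 0.0020],
}

for name, times in fit_times.items():
    # The mean is pulled up by any single slow run; the median is not
    print(f"{name}: mean={mean(times):.4f} s, median={median(times):.4f} s")
```

Both medians come out at 0.0020 s, so LR's higher mean is driven by its first run (possibly one-time start-up cost, though that is only a guess). At this scale the two models are hard to tell apart on time.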

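When the dataset grows to 20,000 samples with 200 features, SVM's training time rises sharply, to 229.81 s on average versus 0.35 s for LR, roughly 660 times longer, while its accuracy is almost identical (0.904 vs. 0.903). The results suggest that SVM is not an attractive algorithm for very large sample counts and high-dimensional features, at least in terms of running time. Moreover, this experiment used a linear kernel; an RBF kernel might achieve better accuracy, but the running time would likely grow even further.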
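The remark about the RBF kernel is a conjecture rather than a measurement. It can be tested by swapping `kernel="linear"` for `kernel="rbf"` in scikit-learn's `SVC`. The sketch below assumes default `C` and `gamma` and a fixed `random_state` so both kernels see the same data; the helper `compare_kernels` is hypothetical. No outcome is claimed, since the time and accuracy of each kernel depend on the data and the hyperparameters.

```python
import time

import sklearn.datasets as skl_ds
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def compare_kernels(n_samples=2000, n_features=50, kernels=("linear", "rbf"), seed=0):
    """Fit one SVC per kernel on the same split; return {kernel: (seconds, accuracy)}."""
    X, Y = skl_ds.make_classification(
        n_samples=n_samples, n_features=n_features, n_classes=2, random_state=seed
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.3, random_state=2019
    )
    results = {}
    for kernel in kernels:
        clf = SVC(kernel=kernel)  # assumption: default C and gamma
        start = time.perf_counter()
        clf.fit(X_train, y_train)
        elapsed = time.perf_counter() - start
        results[kernel] = (elapsed, clf.score(X_test, y_test))  # mean accuracy
    return results


if __name__ == "__main__":
    for kernel, (seconds, acc) in compare_kernels().items():
        print(f"{kernel}: {seconds:.3f} s, accuracy {acc:.4f}")
```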

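Analyzing the cause: a kernelized SVM already has time complexity $O(mn^2)$, which is quadratic in the number of samples, where $n$ is the number of samples and $m$ the number of features. With large datasets there is also a space problem: the kernel cache may be too small to hold the kernel values the solver needs, so they have to be recomputed repeatedly. As a result, when there are many samples and many features, the actual training time of an SVM can grow well beyond quadratic.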
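The complexity argument can be checked against the tables with simple arithmetic, using the training-set sizes (70% of the samples: 1,400 for the 2,000-sample dataset and 14,000 for the 20,000-sample one). The sketch assumes 4-byte kernel cache entries (libsvm, which `SVC` wraps, caches single-precision values) and `SVC`'s default `cache_size=200`, which libsvm interprets as units of 2**20 bytes. The dictionary keys and the helper `mn2` are hypothetical names.

```python
# Training-set sizes (70% of samples) and feature counts
n_train = {"dataset2": 1400, "dataset3": 14000}
n_features = {"dataset2": 50, "dataset3": 200}


def mn2(name):
    """Cost proxy m * n^2 taken from the O(mn^2) estimate."""
    return n_features[name] * n_train[name] ** 2


predicted_ratio = mn2("dataset3") / mn2("dataset2")
observed_ratio = 229.81 / 0.16316  # average SVM fit times from the tables

# Full n x n kernel matrix for the largest dataset vs. SVC's default cache
BYTES_PER_ENTRY = 4  # assumption: single-precision cache entries
kernel_matrix_mib = n_train["dataset3"] ** 2 * BYTES_PER_ENTRY / 2**20
DEFAULT_CACHE_MIB = 200  # SVC(cache_size=200) default

print(f"predicted x{predicted_ratio:.0f}, observed x{observed_ratio:.0f}")
print(f"kernel matrix ~{kernel_matrix_mib:.0f} MiB vs. {DEFAULT_CACHE_MIB} MiB cache")
```

Under $O(mn^2)$ alone, going from the second to the third dataset should multiply SVM's time by about 400, whereas the measured averages grew by roughly 1,400 times. The full kernel matrix (about 750 MiB) is several times larger than the default cache, which is consistent with the recomputation explanation above.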
