18/08/19 22:01:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Parsing data
Fitting model
train
Currently 8 partitions left
Size of data: 60000
Currently 4 partitions left
Size of data: 28024
Currently 2 partitions left
Warning: 456.0 relevant support vectors thrown away!
('model.support:', array([ 0, 1, 3, ..., 7018, 7020, 7021], dtype=int32), ' self.max_sv:', 5000.0)
Warning: 484.0 relevant support vectors thrown away!
('model.support:', array([ 0, 1, 3, ..., 7065, 7066, 7067], dtype=int32), ' self.max_sv:', 5000.0)
Warning: 348.0 relevant support vectors thrown away!
('model.support:', array([ 0, 1, 2, ..., 6891, 6892, 6893], dtype=int32), ' self.max_sv:', 5000.0)
Warning: 453.0 relevant support vectors thrown away!
('model.support:', array([ 1, 2, 4, ..., 7036, 7038, 7039], dtype=int32), ' self.max_sv:', 5000.0)
Size of data: 20000
Warning: 2935.0 relevant support vectors thrown away!
('model.support:', array([ 9, 19, 23, ..., 9975, 9976, 9984], dtype=int32), ' self.max_sv:', 5000.0)
Warning: 2923.0 relevant support vectors thrown away!
('model.support:', array([ 13, 35, 42, ..., 9968, 9970, 9996], dtype=int32), ' self.max_sv:', 5000.0)
Time: 845.46
Predicting outcomes training set
18/08/19 22:16:16 ERROR Executor: Exception in task 1.0 in stage 9.0 (TID 19)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 229, in main
process()
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 224, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 362, in func
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1056, in <lambda>
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1056, in <genexpr>
File "/home/hduser/pythonwork/CVM/bin/mnist.py", line 57, in <lambda>
labelsAndPredsTrain = trainRDD.map(lambda p: (p.label, model.predict(p.features)))
File "/home/hduser/pythonwork/CVM/cvm/svm.py", line 20, in predict
return self.model.predict(features)
File "/home/hduser/anaconda2/lib/python2.7/site-packages/sklearn/svm/base.py", line 548, in predict
y = super(BaseSVC, self).predict(X)
File "/home/hduser/anaconda2/lib/python2.7/site-packages/sklearn/svm/base.py", line 308, in predict
X = self._validate_for_predict(X)
File "/home/hduser/anaconda2/lib/python2.7/site-packages/sklearn/svm/base.py", line 439, in _validate_for_predict
X = check_array(X, accept_sparse='csr', dtype=np.float64, order="C")
File "/home/hduser/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.py", line 441, in check_array
"if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[0. 0. 0. 0. 0. 0. ... 0. 0. 0. 0. ].
(full printout of the 784-value MNIST feature vector omitted)
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:298)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:438)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:421)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:252)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:939)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:939)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Solution:
labelsAndPredsTrain = trainRDD.map(lambda p: (p.label, model.predict(p.features.reshape(1,-1))))
Add a `reshape` call to the features before passing them to `predict`. `reshape(1, -1)` turns the 1-D feature vector into a 2-D array with a single row (one sample), which is the shape scikit-learn's `predict` expects, as the `ValueError` message itself suggests.
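Instead of reshaping at every call site, the same fix can live once inside the `predict` wrapper from `cvm/svm.py`. A minimal sketch (the `SVMWrapper` class name and structure here are assumptions; only the `self.model.predict(features)` delegation appears in the traceback):

```python
import numpy as np

class SVMWrapper(object):
    """Hypothetical stand-in for the wrapper in cvm/svm.py."""

    def __init__(self, model):
        self.model = model

    def predict(self, features):
        # np.atleast_2d promotes a single 1-D feature vector to shape
        # (1, n_features); an already 2-D batch passes through unchanged.
        X = np.atleast_2d(np.asarray(features, dtype=np.float64))
        return self.model.predict(X)
```

With this, `trainRDD.map(lambda p: (p.label, model.predict(p.features)))` works for both single samples and batches without touching the mapping code.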
Explanation: `reshape(1, -1)` returns a new 2-D array and leaves the original untouched, as this Python 2 session shows:
>>> a = [1,2,3,4]
>>> import numpy as np
>>> b = np.array(a)
>>> c = b.reshape(1,-1)
>>> print c
[[1 2 3 4]]
>>> print b
[1 2 3 4]
>>> print a
[1, 2, 3, 4]