1. Problem Description
Loading the saved model raises NoSuchMethodException: org.apache.spark.ml.classification.GBTClassificationModel:
NoSuchMethodException: org.apache.spark.ml.classification.GBTClassificationModel
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getConstructor(Class.java:1825)
at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:468)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
2. Solution
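Saving and loading work differently for a pipeline model and for a standalone model, and the load call must use the matching API. A trained GBT model must be loaded with GBTClassificationModel, not GBTClassifier. The stack trace is consistent with this: GBTClassifier.load goes through the generic DefaultParamsReader.load, which appears to instantiate the class named in the saved metadata (here GBTClassificationModel) by reflection and finds no matching constructor. The corrected workflow: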
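from pyspark.ml import Pipeline
from pyspark.ml.classification import GBTClassifier, GBTClassificationModel

# labelIndexer, featureIndexer and trainingData are assumed to be defined earlier.
gbt = GBTClassifier(
    labelCol="indexedlabel",
    featuresCol="indexedFeatures",
    maxIter=10,
    maxDepth=12,
    stepSize=0.1,
)
# Despite its name, pipeline_model is an unfitted Pipeline; fit() returns the fitted PipelineModel.
pipeline_model = Pipeline(stages=[labelIndexer, featureIndexer, gbt])
model_0701 = pipeline_model.fit(trainingData)

# Save only the fitted GBT stage, which is a GBTClassificationModel
model_0712 = model_0701.stages[2]
save_path2 = "file:///filepath"
model_0712.write().overwrite().save(save_path2)

# Load the model
# Do not use GBTClassifier.load here. This is a common pitfall that many blog posts get wrong.
model_0712_load = GBTClassificationModel.load(save_path2)
model_0712_load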
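If it is unclear which class a saved directory expects, the metadata Spark writes alongside the model can be inspected before calling load. The following is a minimal sketch, not part of the original workflow. It assumes the model was saved to a local directory and that the metadata is a JSON line in metadata/part-* with a "class" field, which is the layout Spark ML's default writer is understood to produce; verify this against your Spark version. The helper name saved_model_class is hypothetical.

```python
import glob
import json
import os


def saved_model_class(local_dir):
    """Return the JVM class name recorded in a saved Spark ML model directory.

    Assumption: local_dir is a plain local path (not a file:// URI) and the
    metadata lives in <local_dir>/metadata/part-* as one JSON line.
    """
    # glob skips dot-files such as checksum files, so only part files match.
    part_files = sorted(glob.glob(os.path.join(local_dir, "metadata", "part-*")))
    if not part_files:
        raise FileNotFoundError(f"no metadata part file under {local_dir}")
    with open(part_files[0], encoding="utf-8") as fh:
        metadata = json.loads(fh.readline())
    # The "class" field names the class whose load() should be used.
    return metadata["class"]
```

A result ending in GBTClassificationModel means the directory should be loaded with GBTClassificationModel.load; a result ending in PipelineModel means PipelineModel.load is the matching call.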
A whole fitted pipeline can also be loaded, using PipelineModel.
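For the pipeline case, the same rule applies: the fitted object is a PipelineModel, so PipelineModel (not Pipeline) is the class to load with. The following is a minimal sketch under that assumption; the helper names save_fitted_pipeline and load_fitted_pipeline are hypothetical, and the path passed in is whatever location you choose.

```python
def save_fitted_pipeline(fitted_pipeline, path):
    """Persist a fitted PipelineModel, including all its stages, to path.

    overwrite() replaces anything already stored at that location.
    """
    fitted_pipeline.write().overwrite().save(path)


def load_fitted_pipeline(path):
    """Load a fitted pipeline previously written by save_fitted_pipeline."""
    # Imported inside the function so this helper file can be imported
    # without PySpark installed; move it to the top of the file if preferred.
    from pyspark.ml import PipelineModel

    # PipelineModel is the fitted counterpart of Pipeline; loading with
    # Pipeline would be the same kind of mismatch as GBTClassifier above.
    return PipelineModel.load(path)
```

With the names from the example above, model_0701 is the fitted pipeline, so it would be passed to save_fitted_pipeline, and the object returned by load_fitted_pipeline can be used for transform just like model_0701.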