PredictionIO train analysis: walking through the pio train code path

Code walkthrough of executing pio train
1. First, Console's train function is executed.
2. It then calls CreateWorkflow's main function, which parses the engine files and the parameters that train needs,
and then calls CoreWorkflow's runTrain function, passing those parameters along.
3. CoreWorkflow takes the parameters and calls BaseEngine's train interface to train the model, then saves the trained model to the database.
The declaration of BaseEngine's train function is shown below. Right-clicking train and running Find Usages shows two call sites: 1. CoreWorkflow 2. EngineTest
  def train(
    sc: SparkContext,
    engineParams: EngineParams,
    engineInstanceId: String,
    params: WorkflowParams): Seq[Any]
4. Engine extends BaseEngine and provides the concrete implementation of train: using the passed-in parameters, it reads the source data and converts it into the data types supported by the Spark-based algorithm,
then invokes the algorithm to generate models and returns them to its caller (CoreWorkflow). The call that invokes the algorithm and produces the models is val models: Seq[Any] = algorithmList.map(_.trainBase(sc, pd)) (see the first sketch after this list).
5. This in turn calls the abstract trainBase method of BaseAlgorithm, which concrete algorithm classes implement with their own training logic (see the second sketch after this list).
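
To make the data flow in step 4 concrete, here is a minimal sketch. The Mini* traits, the MiniTrainFlow object, and miniTrain are hypothetical names invented for illustration; they only mirror the shape of PredictionIO's controller classes, not its actual source.

  import org.apache.spark.SparkContext

  object MiniTrainFlow {
    // Simplified stand-ins for PredictionIO's controller traits; anything named
    // "Mini*" here is hypothetical and only illustrates the data flow, not the real API.
    trait MiniDataSource[TD] { def readTraining(sc: SparkContext): TD }
    trait MiniPreparator[TD, PD] { def prepare(sc: SparkContext, td: TD): PD }
    trait MiniAlgorithm[PD] { def trainBase(sc: SparkContext, pd: PD): Any }

    // The essence of step 4: read the source data, convert it, then let every
    // algorithm in the list train a model on the prepared data.
    def miniTrain[TD, PD](
        sc: SparkContext,
        dataSource: MiniDataSource[TD],
        preparator: MiniPreparator[TD, PD],
        algorithmList: Seq[MiniAlgorithm[PD]]): Seq[Any] = {
      val td: TD = dataSource.readTraining(sc)  // raw events from the event store
      val pd: PD = preparator.prepare(sc, td)   // algorithm-friendly representation
      algorithmList.map(_.trainBase(sc, pd))    // the call quoted in step 4
    }
  }

With concrete data source, preparator, and algorithm instances, miniTrain returns the same kind of Seq[Any] of models that CoreWorkflow later persists.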
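
Step 5's trainBase eventually reaches a concrete algorithm; in this template it is the com.reco.ALSAlgorithm seen in the log below, which delegates to Spark MLlib's ALS (the netlib BLAS/LAPACK warnings in the log come from that MLlib code falling back to its Java implementation). A hedged sketch of what such a train body could look like, with MiniPreparedData and trainALS as illustrative names rather than the template's actual code:

  import org.apache.spark.SparkContext
  import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
  import org.apache.spark.rdd.RDD

  object MiniALSAlgorithm {
    // Hypothetical holder for the prepared data; the template's real PreparedData
    // likewise wraps user/item/rating triples in an RDD.
    case class MiniPreparedData(ratings: RDD[Rating])

    // Sketch of what an ALS-based train body could look like: factorize the rating
    // matrix with MLlib's ALS and hand the resulting model back to the engine.
    def trainALS(sc: SparkContext, pd: MiniPreparedData): MatrixFactorizationModel =
      ALS.train(pd.ratings, 10, 10, 0.01) // rank = 10, iterations = 10, lambda = 0.01 (illustrative values)
  }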


Log output from one run of pio train:
 
/home/workspacepredictionio/apache-predictionio-0.10.0-incubating/bin/pio train
[INFO] [Console$] Using existing engine manifest JSON at /home/workspacepredictionio/apache-predictionio-0.10.0-incubating/bin/MyRecommendation/manifest.json
[INFO] [Runner$] Submission command: /home/workspacepredictionio/apache-predictionio-0.10.0-incubating/vendors/spark-1.5.1-bin-hadoop2.6/bin/spark-submit 
--class org.apache.predictionio.workflow.CreateWorkflow 
--jars file:/home/workspacepredictionio/apache-predictionio-0.10.0-incubating/bin/MyRecommendation/target/scala-2.10/template-scala-parallel-recommendation-assembly-0.1-SNAPSHOT-deps.jar,file:/home/workspacepredictionio/apache-predictionio-0.10.0-incubating/bin/MyRecommendation/target/scala-2.10/template-scala-parallel-recommendation_2.10-0.1-SNAPSHOT.jar 
--files file:/home/workspacepredictionio/apache-predictionio-0.10.0-incubating/conf/log4j.properties 
--driver-class-path /home/workspacepredictionio/apache-predictionio-0.10.0-incubating/conf:/home/workspacepredictionio/apache-predictionio-0.10.0-incubating/lib/postgresql-9.4-1204.jdbc41.jar:/home/workspacepredictionio/apache-predictionio-0.10.0-incubating/lib/mysql-connector-java-5.1.37.jar file:/home/workspacepredictionio/apache-predictionio-0.10.0-incubating/assembly/pio-assembly-0.10.0-incubating.jar --engine-id EaX0g27pu0MJruMHAnacq7VfNDiGMx7h 
--engine-version b6d0beede5fb6bb095015d9d26c90768d48c5428 
--engine-variant file:/home/workspacepredictionio/apache-predictionio-0.10.0-incubating/bin/MyRecommendation/engine.json 
--verbosity 0 
--json-extractor Both 
--env PIO_ENV_LOADED=1,PIO_STORAGE_SOURCES_MYSQL_PASSWORD=123456,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/root/.pio_store,PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://192.168.107.1:3306/pio,PIO_HOME=/home/workspacepredictionio/apache-predictionio-0.10.0-incubating,PIO_FS_ENGINESDIR=/root/.pio_store/engines,PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=MYSQL,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=MYSQL,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_MYSQL_USERNAME=root,PIO_FS_TMPDIR=/root/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=MYSQL,PIO_CONF_DIR=/home/workspacepredictionio/apache-predictionio-0.10.0-incubating/conf
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(MyApp1,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [Remoting] Starting remoting
[INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://[email protected]:51581]
[WARN] [MetricsSystem] Using default name DAGScheduler for source because spark.app.id is not set.
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: com.reco.DataSource@2c829dbc
[INFO] [Engine$] Preparator: com.reco.Preparator@76e90da5
[INFO] [Engine$] AlgorithmList: List(com.reco.ALSAlgorithm@214beff9)
[INFO] [Engine$] Data sanity check is on.
[INFO] [Engine$] com.reco.TrainingData does not support data sanity check. Skipping check.
[INFO] [Engine$] com.reco.PreparedData does not support data sanity check. Skipping check.
[WARN] [BLAS] Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
[WARN] [BLAS] Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
[WARN] [LAPACK] Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
[WARN] [LAPACK] Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
[INFO] [Engine$] org.apache.spark.mllib.recommendation.ALSModel does not support data sanity check. Skipping check.
[INFO] [Engine$] EngineWorkflow.train completed
[INFO] [Engine] engineInstanceId=6fe6c187-9aa4-4d81-93d9-5141e9097552
[INFO] [CoreWorkflow$] Inserting persistent model
[INFO] [CoreWorkflow$] Updating engine instance
[INFO] [CoreWorkflow$] Training completed successfully.
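
As an aside, the "Datasource params: (,DataSourceParams(MyApp1,None))" line above is the params object that CreateWorkflow extracts from the datasource section of engine.json. A minimal sketch, assuming a simplified two-field params class (MiniDataSourceParams is a hypothetical stand-in; the real template's second field has a richer evaluation-related type):

  object MiniParams {
    // Hypothetical, simplified stand-in for the template's DataSourceParams.
    case class MiniDataSourceParams(
      appName: String,                // "MyApp1" in this run
      evalParams: Option[Int] = None) // None in this run

    // Roughly what the extracted value in the log above corresponds to.
    val dsp = MiniDataSourceParams("MyApp1")
  }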



Reposted from blog.csdn.net/zilong230905/article/details/73072241