Using KNIME to Build a Spark Machine Learning Model, Part 1: Development Environment Setup

1. KNIME Analytics Platform installation

Download the appropriate version from the official website https://www.knime.com/downloads

Unzip the downloaded package to your installation path (installation instructions: https://www.knime.com/installation-0 )

The following picture shows the welcome page after KNIME starts:

[Image: KNIME Analytics Platform welcome page]

To interact with the Spark cluster, the KNIME® Extension for Apache Spark needs to be installed in KNIME, and Spark Job Server must be installed on a Hadoop cluster edge node, or on any node that can execute spark-submit. The architecture is shown below:

[Image: architecture of KNIME Analytics Platform connecting to the Hadoop cluster via Spark Job Server]

2. KNIME® Extension for Apache Spark installation

In KNIME Analytics Platform, click File -> Install KNIME Extensions..., select KNIME Big Data Extensions, and click Next to install.

[Image: KNIME Big Data Extensions installation dialog]

3. Spark Job Server installation

The following steps use CentOS 6.5 + CDH 5.7 as an example.

3.1 Download Spark Job Server

$ wget http://download.knime.org/store/3.5/spark-job-server-0.6.2.3-KNIME_cdh-5.7.tar.gz

3.2 Log in as root (or su root)

3.3 Installation

# LINKNAME=spark-job-server

# useradd -d /opt/${LINKNAME}/ -M -r -s /bin/false spark-job-server

# su -l -c "hdfs dfs -mkdir -p /user/spark-job-server ; hdfs dfs -chown -R spark-job-server /user/spark-job-server" hdfs

# cp spark-job-server-0.6.2.3-KNIME_cdh-5.7.tar.gz /opt

# cd /opt

# tar -xvf spark-job-server-0.6.2.3-KNIME_cdh-5.7.tar.gz

# ln -s spark-job-server-0.6.2.3-KNIME_cdh-5.7 ${LINKNAME}

# chown -R spark-job-server:spark-job-server ${LINKNAME} spark-job-server-0.6.2.3-KNIME_cdh-5.7

3.4 Register the service

# ln -s /opt/${LINKNAME}/spark-job-server-init.d /etc/init.d/${LINKNAME}

# chkconfig --levels 2345 ${LINKNAME} on
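To confirm the registration, you can list the service's runlevel configuration (a quick sanity check; chkconfig --list is the standard tool on CentOS 6):

# chkconfig --list ${LINKNAME}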

3.5 Edit environment.conf

Set master, e.g.:

master = "spark://ifrebdplatform1:7077"

Set the default settings for Spark contexts in the context-settings block.
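For reference, a minimal sketch of a context-settings block; num-cpu-cores and memory-per-node are standard Spark Job Server settings, but the values here are example assumptions to adapt to your cluster:

context-settings {
    num-cpu-cores = 2        # example value, tune for your cluster
    memory-per-node = "512m" # example value, tune for your cluster
}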

3.6 Edit settings.sh

Set SPARK_HOME; the default is correct in this example, so leave it unchanged.

Set LOG_DIR if you do not want to use the default log directory.
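As a sketch, the two variables in settings.sh might look like this, assuming a standard CDH 5.7 parcel layout (keep your file's own defaults if they differ):

SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
LOG_DIR=/var/log/spark-job-server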

3.7 Edit log4j-server.properties as needed
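The file uses standard log4j syntax. For example, raising the root logger level to INFO reduces log volume (LOGFILE here stands for whichever appender your shipped file defines):

log4j.rootLogger=INFO, LOGFILE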

3.8 Start Spark Job Server

# /etc/init.d/${LINKNAME} start
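Once started, you can verify that the server is listening by requesting its web UI from the edge node (Spark Job Server defaults to port 8090; adjust if you changed it in environment.conf):

# curl http://localhost:8090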

3.9 Add a Create Spark Context node in KNIME and test the connection

[Images: adding the Create Spark Context node to a workflow]

Right-click the Create Spark Context node and click Execute.

Then right-click the node again and click Spark Context to view the results:

[Image: Spark Context view showing the result]

To be continued...
