Deploying the graphframes Package on a Linux Server

1. Install Anaconda3.

2. Download the graphframes package from the official download page: https://spark-packages.org/package/graphframes/graphframes. Download the zip format and upload it to the server.

3. On the server, unzip the archive from step 2 (unzip xx.zip) and copy the python/graphframes folder into Anaconda3's site-packages directory (anaconda3/lib/pythonX.Y/site-packages/), for example as shown below.
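
A minimal sketch of this step, assuming the uploaded archive is named xx.zip, that the python/ directory sits at the top level of the archive, and that Anaconda3 is installed at ~/anaconda3 with Python 3.6 (adjust the paths to your layout):

unzip xx.zip -d graphframes-src
cp -r graphframes-src/python/graphframes ~/anaconda3/lib/python3.6/site-packages/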

4. Install pyspark: conda install pyspark.
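
As a quick sanity check (assuming Anaconda3's python is the one on your PATH), both packages should now import without errors:

python -c "import pyspark, graphframes; print(pyspark.__version__)"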

5. Installation is complete. Sample code:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
import graphframes

conf = SparkConf().setAppName("My app")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
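# Create a Vertex DataFrame with a unique "id" column.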
v = sqlContext.createDataFrame([
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
], ["id", "name", "age"])
# Create an Edge DataFrame with "src" and "dst" columns
e = sqlContext.createDataFrame([
  ("a", "b", "friend"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
], ["src", "dst", "relationship"])
g = graphframes.GraphFrame(v, e)

# Query: Get in-degree of each vertex.
g.inDegrees.show()

# Query: Count the number of "follow" connections in the graph.
print(g.edges.filter("relationship = 'follow'").count())

# Run PageRank algorithm, and show results.
results = g.pageRank(resetProbability=0.01, maxIter=20)
results.vertices.select("id", "pagerank").show()
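
For the toy graph above, inDegrees returns an in-degree of 2 for "b" (edges from "a" and "c") and 1 for "c", and the "follow" filter counts 2 edges.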

6. Run the job:

spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 1 \
    --executor-memory 1G \
    --archives hdfs:///tmp/buming/tools/anaconda3_bm_v2.tar#anaconda3 \
    --jars hdfs:///tmp/cangyuan/package/graphframes_graphframes-0.5.0-spark2.1-s_2.11.jar,hdfs:///tmp/cangyuan/package/com.typesafe.scala-logging_scala-logging-api_2.11-2.1.2.jar,hdfs:///tmp/cangyuan/package/org.slf4j_slf4j-api-1.7.7.jar,hdfs:///tmp/cangyuan/package/com.typesafe.scala-logging_scala-logging-slf4j_2.11-2.1.2.jar,hdfs:///tmp/cangyuan/package/org.scala-lang_scala-reflect-2.11.0.jar \
    --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./anaconda3/anaconda3/bin/python \
    graph_frame.py
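
Alternatively, if the machine running spark-submit can reach the Spark Packages repository, the --packages option can resolve graphframes and its transitive dependencies (scala-logging, slf4j, scala-reflect) automatically instead of listing each jar by hand. A sketch using the same 0.5.0/Spark 2.1/Scala 2.11 build and the same Anaconda archive as above:

spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11 \
    --archives hdfs:///tmp/buming/tools/anaconda3_bm_v2.tar#anaconda3 \
    --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./anaconda3/anaconda3/bin/python \
    graph_frame.py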


Reposted from blog.csdn.net/weixin_42247685/article/details/81674082