S2X environment setup and running the example
http://dbis.informatik.uni-freiburg.de/forschung/projekte/DiPoS/S2X.html
Environment
- Maven project
- Built in Eclipse
- Eclipse 3.8/4.2/4.3 (Juno & Kepler)
  - Juno: https://www.eclipse.org/downloads/packages/release/juno/sr2
  - Kepler: https://www.eclipse.org/downloads/packages/release/kepler/sr2
- The Eclipse plug-in M2E (version 1.5.0)
- The Scala plugin from scala-ide.org (version 3.0.3v-2_10...)
  - The 3.0.3 release is the third maintenance release of the 3.0 version.
  - It is available for Scala 2.10, on Eclipse 3.8/4.2/4.3 (Juno & Kepler).
Dependencies
```xml
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10/1.2.0-cdh5.3.0 -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.2.0-cdh5.3.0</version>
  <scope>provided</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-graphx_2.10/1.2.0-cdh5.3.0 -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-graphx_2.10</artifactId>
  <version>1.2.0-cdh5.3.0</version>
  <scope>provided</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client/2.5.0-mr1-cdh5.3.0 -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.5.0-mr1-cdh5.3.0</version>
  <scope>provided</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.jena/jena-arq/2.11.2 -->
<dependency>
  <groupId>org.apache.jena</groupId>
  <artifactId>jena-arq</artifactId>
  <version>2.11.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.esotericsoftware.kryo/kryo/2.24.0 http://blog.51cto.com/nettm/1702453 -->
<dependency>
  <groupId>com.esotericsoftware.kryo</groupId>
  <artifactId>kryo</artifactId>
  <version>2.24.0</version>
  <scope>provided</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/junit/junit/4.11 -->
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.11</version>
  <scope>test</scope>
</dependency>
```
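Note that the `-cdh5.3.0` versions above are Cloudera builds, which are not published on Maven Central. The build usually needs a `<repositories>` entry pointing at Cloudera's public repository; a minimal sketch (the repository `id` is an arbitrary label):

```xml
<repositories>
  <!-- Cloudera's public repository hosts the *-cdh* builds of Spark and Hadoop -->
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>
```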
Structure
Java file | Packages used | Function
---|---|---
QueryExecutor.java | log4j (logging), spark (Spark operations), jena (SPARQL handling) | 1. Hands `args` to ArgumentParser for parsing 2. Creates the Spark context via SparkFacade 3. Loads the (instance-level) HDFS file via SparkFacade 4. Processes the query: clears intermediate results with IntermediateResultsModel; parses the query string with jena.query.QueryFactory (builds the query object, handles prefixes, compiles the query); rewrites the query starting from `opRoot` with AlgebraTranslator to obtain an executable sequence; runs each element of that sequence via SparkOp.execute()
ArgumentParser.java | cls (a command-line parsing package), log4j (logging), de.tf.uni.freiburg.sparkrdf.constants.Const (another package in this project) | 1. Parses the command-line arguments 2. Stores them in de.tf.uni.freiburg.sparkrdf.constants.Const
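The ArgumentParser-to-Const flow in the table can be sketched as below. This is a hypothetical, simplified re-implementation of the pattern (parse flags once, publish them through a static holder class), not the actual S2X code; all class and field names here are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class ArgumentParserSketch {

    /** Stand-in for de.tf.uni.freiburg.sparkrdf.constants.Const: a static holder
     *  the rest of the job reads instead of passing parsed options around. */
    static final class Const {
        static String inputFile;   // HDFS path of the instance file
        static String queryFile;   // SPARQL query to execute
    }

    /** Very small "--key value" parser (the real project uses a CLI library). */
    static void parse(String[] args) {
        Map<String, String> opts = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            if (args[i].startsWith("--")) {
                opts.put(args[i].substring(2), args[i + 1]);
            }
        }
        Const.inputFile = opts.get("input");
        Const.queryFile = opts.get("query");
    }

    public static void main(String[] args) {
        // Simulated command line; QueryExecutor would then read Const directly.
        parse(new String[] {"--input", "hdfs:///data/graph.nt", "--query", "q1.sparql"});
        System.out.println(Const.inputFile);
        System.out.println(Const.queryFile);
    }
}
```

The design point is that `QueryExecutor` never sees the raw `args` again: after parsing, every component reads its configuration from the single `Const` holder.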
Environment setup steps
- Install and configure JDK 1.7 or 1.8
- Download the Eclipse Kepler package from the site above and unpack it; the Java EE edition is recommended, and a domestic mirror speeds up the download
- Configure the Maven environment
  - Visit http://maven.apache.org/download.cgi and download Maven
  - Create a MAVEN_HOME environment variable and add %MAVEN_HOME%\bin to PATH
  - Edit %MAVEN_HOME%\conf\settings.xml
    - Set the local repository: choose a directory that does not require administrator rights
    - Add a remote mirror: the Aliyun mirror or any other reachable one
- Configure Maven inside Eclipse
  - Window -> Preferences -> Maven -> Installations: select the Maven installed in the previous step
  - Window -> Preferences -> Maven -> User Settings: point to your settings.xml and update the local repository
- Configure the Scala environment
  - Visit https://www.scala-lang.org/download/2.10.6.html and download Scala 2.10.6
  - Either scala.msi or scala-2.10.6.zip works: the .msi is a Windows installer that sets up PATH for you, while the .zip only needs to be unpacked and PATH set manually
- Configure Scala inside Eclipse
  - Visit http://scala-ide.org/download/prev-stable.html
  - Pick the update site matching your Eclipse release and Scala series, e.g. http://download.scala-ide.org/sdk/helium/e38/scala210/stable/site (the path encodes the targets: e38 for Eclipse 3.8/Juno, scala210 for Scala 2.10)
  - Help -> Install New Software -> paste the update-site URL -> Add -> install
- Install the Scalastyle plugin
  - See http://www.scalastyle.org/ for the Eclipse update site
- Configure the Hadoop and Spark environments
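The settings.xml changes described above (non-administrator local repository plus a domestic mirror) can be sketched as follows. The local-repository path is an arbitrary example, and the Aliyun URL is the mirror's address at the time of writing:

```xml
<settings>
  <!-- Local repository: any directory writable without admin rights -->
  <localRepository>D:\maven-repo</localRepository>
  <mirrors>
    <!-- Mirror only "central", so extra repositories such as Cloudera's
         are still contacted directly -->
    <mirror>
      <id>aliyun</id>
      <mirrorOf>central</mirrorOf>
      <url>https://maven.aliyun.com/repository/public</url>
    </mirror>
  </mirrors>
</settings>
```

Setting `<mirrorOf>central</mirrorOf>` rather than `*` is deliberate: a wildcard mirror would also intercept requests for the CDH artifacts, which the Aliyun mirror may not carry.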