Spark Java Local Program Development

Copyright notice: this is the blogger's original article; reproduction without permission is prohibited. https://blog.csdn.net/warrah/article/details/81874013

1 Spark SQL Local Test
The test class below, launched from a main method, runs a very simple Spark SQL query, but it is representative.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import com.dzmsoft.dcm.redis.enums.EventTypeEnum;

public class SparkHiveTest {
    private static String appName = "SparkHiveTest";
    private static String master = "local[8]";
    private static SparkSession spark = null;

    public static void main(String[] args) {
        init();
        // Count the number of distinct users with at least one LOGIN event:
        // the inner query groups login events by userId, the outer query counts the groups.
        StringBuilder sql = new StringBuilder();
        sql.append(" select count(*) from (")
           .append(" select count(1) as cn from test_user")
           .append(" where eventType='").append(EventTypeEnum.LOGIN.value()).append("'")
           .append(" group by userId")
           .append(" ) t");
        Dataset<Row> rows = spark.sql(sql.toString());
        rows.show();
        spark.stop();
    }

    public static void init() {
        // enableHiveSupport() makes Spark load hive-site.xml from the classpath
        // and use the Hive metastore configured there.
        spark = SparkSession.builder().appName(appName).master(master)
                .enableHiveSupport().getOrCreate();
    }
}

1.1 What is local[x]?
If the master parameter is "local", Spark runs locally with a single worker thread; if it is "local[4]", it runs locally with 4 worker threads, and "local[*]" uses one thread per available CPU core. For background, see the initialization of SparkContext, SparkConf and SparkSession.
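As a quick illustration, the minimal sketch below (the appName is arbitrary) prints the default parallelism that a given master setting yields:

import org.apache.spark.sql.SparkSession;

public class LocalMasterDemo {
    public static void main(String[] args) {
        // "local"    -> one worker thread, no parallelism
        // "local[4]" -> four worker threads
        // "local[*]" -> one worker thread per available CPU core
        SparkSession spark = SparkSession.builder()
                .appName("LocalMasterDemo")
                .master("local[*]")
                .getOrCreate();
        System.out.println("defaultParallelism = "
                + spark.sparkContext().defaultParallelism());
        spark.stop();
    }
}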
1.2 How does Spark SQL find the corresponding Hive?
Spark automatically looks for the hive-site.xml and hbase-site.xml configuration files on the classpath.
The contents of hive-site.xml are as follows:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://dashuju213:9083,thrift://dashuju214:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
</configuration>
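With this file on the classpath, a quick way to confirm that Spark SQL is really talking to the remote metastore is to list the Hive databases. This is a minimal sketch; the databases printed depend on your cluster:

import org.apache.spark.sql.SparkSession;

public class MetastoreCheck {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("MetastoreCheck")
                .master("local[2]")
                .enableHiveSupport()   // picks up hive-site.xml from the classpath
                .getOrCreate();
        // If the thrift URIs are reachable, this prints the databases
        // registered in the remote Hive metastore.
        spark.sql("show databases").show();
        spark.stop();
    }
}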

The contents of the hbase-site.xml configuration file are as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Path where HBase stores its data on HDFS -->
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://dashuju174:9000/hbase</value>
    </property>
    <!-- Whether HBase runs in distributed mode -->
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <!-- ZooKeeper quorum addresses; separate multiple hosts with "," -->
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>dashuju172:2181,dashuju173:2181,dashuju174:2181</value>
    </property>
</configuration>
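hbase-site.xml is typically loaded by the HBase client classes rather than by Spark itself: HBaseConfiguration.create() reads it from the classpath. The minimal sketch below (assuming the hbase-client dependency is available; the tables listed depend on your cluster) confirms the file is found and the cluster is reachable:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class HBaseConfigCheck {
    public static void main(String[] args) throws Exception {
        // HBaseConfiguration.create() loads hbase-site.xml from the classpath,
        // so the zookeeper quorum configured above is picked up automatically.
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            for (TableName t : conn.getAdmin().listTableNames()) {
                System.out.println(t.getNameAsString());
            }
        }
    }
}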

2 Spark Local Mode
To be continued...
