How to configure the MongoDB sampleSize in a Java Spark SparkSession

SVMJ :

I am new to Java Spark.

I am currently having an issue with a MongoDB-to-Hive ETL job where the same field can end up with different data types. I want to increase the schema-inference sample size, but I have only found Scala examples and I am using Java. Does anyone know how to set the sample size properly? Here is my attempt:

SparkSession spark = SparkSession.builder()
                .master("local[2]")
                .appName("SparkReadMgToHive")
                .config("spark.sql.warehouse.dir", warehouseLocation)
                .config("spark.mongodb.input.uri", "mongodb://localhost:27017/test.testcollection")
                .config("sampleSize", 50000)
                .enableHiveSupport()
                .getOrCreate();

Many thanks.

bottaio :

The correct key is spark.mongodb.input.sampleSize:

SparkSession spark = SparkSession.builder()
                .master("local[2]")
                .appName("SparkReadMgToHive")
                .config("spark.sql.warehouse.dir", warehouseLocation)
                .config("spark.mongodb.input.uri", "mongodb://localhost:27017/test.testcollection")
                .config("spark.mongodb.input.sampleSize", 50000)
                .enableHiveSupport()
                .getOrCreate();
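As an alternative sketch (assuming the MongoDB Spark Connector's DataFrame reader and its "sampleSize" read option; the collection URI here is the one from the question), the sample size can also be supplied per read rather than on the session, which is handy when different collections need different sampling:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Hedged sketch: option names assume the MongoDB Spark Connector's
// "mongo" data source. "sampleSize" controls how many documents are
// sampled to infer the schema for this read only.
Dataset<Row> df = spark.read()
        .format("mongo")
        .option("uri", "mongodb://localhost:27017/test.testcollection")
        .option("sampleSize", "50000")
        .load();

df.printSchema();
```

A per-read option like this overrides the session-level setting for that load, so the session default can stay small while a schema-heavy collection gets a larger sample.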
