Author's email: [email protected]; address: Huizhou, Guangdong
▲ Chapter objectives
⚪ Learn to use Spark SQL through DataFrame methods;
⚪ Learn to call Spark SQL through SQL statements;
1. Spark SQL basic syntax: using DataFrame methods
1. Query
df.select("id","name").show();
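The projection above can be tried outside spark-shell with a small script. This is a minimal sketch, assuming Spark 3.x is on the classpath and a local master; the app name and the sample rows are hypothetical, not from the original. It shows three equivalent ways to name the selected columns.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: Spark 3.x on the classpath; local[*] mode is only for experimenting.
val spark = SparkSession.builder().appName("select-demo").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample rows; the column names follow this chapter's examples.
val df = Seq((1, "aaa", "bj"), (2, "bbb", "sh")).toDF("id", "name", "addr")

// Three equivalent ways to project the id and name columns
df.select("id", "name").show()
df.select($"id", $"name").show()
df.select(col("id"), col("name")).show()

// Column names of the projection, and its rows collected to the driver as tuples
val projectedColumns = df.select($"id", $"name").columns.toSeq
val rows = df.select("id", "name").collect().map(r => (r.getInt(0), r.getString(1)))

spark.stop()
```

The `$"id"` syntax comes from `import spark.implicits._`, which spark-shell performs automatically; in a standalone program the import has to be written out as above.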
2. Conditional query (filtering rows)
df.select($"id",$"name").where($"name" === "bbb").show()
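A minimal sketch of the same filter, assuming Spark 3.x on the classpath and a local master (sample rows are hypothetical). It contrasts the Column predicate built with `===` against the SQL-string form accepted by `filter`, of which `where` is an alias.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 3.x on the classpath; local[*] mode is only for experimenting.
val spark = SparkSession.builder().appName("where-demo").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical rows: two of them match name = "bbb"
val df = Seq((1, "aaa"), (2, "bbb"), (3, "bbb")).toDF("id", "name")

// === builds a Column predicate; plain Scala == would yield a Boolean,
// which where() does not accept
val viaWhere = df.select($"id", $"name").where($"name" === "bbb")
viaWhere.show()

// filter also accepts a SQL expression string
val viaFilter = df.filter("name = 'bbb'").select($"id", $"name")

val ids = viaWhere.collect().map(_.getInt(0)).sorted
val sameIds = viaFilter.collect().map(_.getInt(0)).sorted

spark.stop()
```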
3. Sorted query
orderBy/sort($"column name"): sort in ascending order (the default)
orderBy/sort($"column name".desc): sort in descending order
orderBy/sort($"Column 1", $"Column 2".desc): sort by Column 1 ascending, then by Column 2 descending among rows with equal Column 1 values
df.select($"id",$"name").orderBy($"name".desc).show()
df.select($"id",$"name").sort($"name".desc).show()
df.select($"id",$"name").sort($"id",$"name".desc).show()
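To see what the two-key sort does, here is a minimal sketch assuming Spark 3.x on the classpath and a local master. The sample rows are hypothetical and deliberately repeat an id so the second sort key matters; the `asc`/`desc` helpers from `org.apache.spark.sql.functions` express the same ordering.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{asc, desc}

// Assumption: Spark 3.x on the classpath; local[*] mode is only for experimenting.
val spark = SparkSession.builder().appName("sort-demo").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical rows with a duplicated id
val df = Seq((1, "b"), (2, "a"), (1, "c")).toDF("id", "name")

// Descending by name; orderBy is an alias of sort
val byNameDesc = df.orderBy($"name".desc).collect().map(_.getString(1))

// id ascending, then name descending within the same id
val byTwoKeys = df.sort($"id", $"name".desc).collect().map(r => (r.getInt(0), r.getString(1)))

// The same ordering written with the asc/desc helper functions
val viaHelpers = df.sort(asc("id"), desc("name")).collect().map(r => (r.getInt(0), r.getString(1)))

spark.stop()
```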
4. Grouped query (aggregation)
groupBy("column name", ...).max("column name"): maximum value in each group
groupBy("column name", ...).min("column name"): minimum value in each group
groupBy("column name", ...).avg("column name"): average in each group
groupBy("column name", ...).sum("column name"): sum in each group
groupBy("column name", ...).count(): number of rows in each group
groupBy("column name", ...).agg(...): apply several aggregate functions in one call
(The max/min/avg/sum shortcuts only accept numeric columns.)
scala> val rdd = sc.makeRDD(List((1,"a","bj",100),(2,"b","sh",80),(3,"c","gz",50),(4,"d","bj",45)))
scala> val df = rdd.toDF("id","name","addr","score")
scala> df.groupBy("addr").count().show()
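The grouping shortcuts and `agg` can be compared on the same four rows. This is a minimal sketch, assuming Spark 3.x on the classpath and a local master; it builds the data from a local `Seq` rather than `sc.makeRDD`, and the output column aliases (`n`, `max_score`, and so on) are hypothetical names chosen for readability.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, count, max, min, sum}

// Assumption: Spark 3.x on the classpath; local[*] mode is only for experimenting.
val spark = SparkSession.builder().appName("groupby-demo").master("local[*]").getOrCreate()
import spark.implicits._

// Same rows as the spark-shell session above
val df = Seq((1, "a", "bj", 100), (2, "b", "sh", 80), (3, "c", "gz", 50), (4, "d", "bj", 45))
  .toDF("id", "name", "addr", "score")

// Row count per group; the result column is named "count"
val counts = df.groupBy("addr").count().collect().map(r => r.getString(0) -> r.getLong(1)).toMap

// One aggregate per call; the result column is named "max(score)"
df.groupBy("addr").max("score").show()

// agg computes several aggregates together; alias renames the output columns
val stats = df.groupBy("addr").agg(
  count("*").alias("n"),
  max("score").alias("max_score"),
  min("score").alias("min_score"),
  avg("score").alias("avg_score"),
  sum("score").alias("total_score"))
stats.orderBy("addr").show()

val bj = stats.filter($"addr" === "bj").collect().head
val bjN = bj.getAs[Long]("n")
val bjMax = bj.getAs[Int]("max_score")
val bjMin = bj.getAs[Int]("min_score")
val bjAvg = bj.getAs[Double]("avg_score")
val bjSum = bj.getAs[Long]("total_score")

spark.stop()
```

Note the result types: `count` and `sum` over an integer column return `Long`, `avg` returns `Double`, while `max` and `min` keep the column's `Int` type.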