Using IDEA to develop a Spark 3.4.1 project based on Scala 2.12.18

Use IDEA to create a Spark project

After opening IDEA, select a location to create a new project

Setting the sbt options
(screenshot)

Configure JDK
(screenshots)

Debugging

Solution

(screenshots)

If dependency downloads repeatedly fail, close IDEA, restart it, and let the import finish.

Set up sbt dependencies

  • Point sbt at a domestic mirror (see the repositories sketch after the build file below)
  • Add the Spark dependencies through sbt:
    • spark-sql
    • spark-core

The resulting build.sbt:
ThisBuild / version := "0.1.0-SNAPSHOT"

ThisBuild / scalaVersion := "2.12.18"

lazy val root = (project in file("."))
  .settings(
    name := "Spark341Learning",
    idePackagePrefix := Some("cn.lh.spark341"),
    resolvers += "HUAWEI" at "https://mirrors.huaweicloud.com/repository/maven",
    updateOptions := updateOptions.value.withCachedResolution(true),
    libraryDependencies += "org.apache.spark" %% "spark-core" % "3.4.1",
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.4.1"
  )
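
Besides the per-project HUAWEI resolver above, sbt can also be pointed at a mirror globally through the launcher's repositories file. Below is a minimal sketch, assuming the file lives at ~/.sbt/repositories and that the Huawei Cloud mirror should be tried before Maven Central:

[repositories]
  local
  huaweicloud-maven: https://mirrors.huaweicloud.com/repository/maven/
  maven-central

If this list should override the resolvers declared in build.sbt, start sbt with -Dsbt.override.build.repos=true.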

Create the Spark project structure

(screenshot)

Create the Scala code

The Spark SQL SimpleApp code is as follows:

package cn.lh.spark341
package SQL

import org.apache.spark.sql.SparkSession

object SimpleApp {

  def main(args: Array[String]): Unit = {
    // The three steps of Spark development
    // Step 1: prepare the Spark session (the Spark SQL entry point)
    val spark = SparkSession.builder.appName("SimpleApp").master("local[2]").getOrCreate()
    // Step 2: the Spark processing logic
    val logF = "D:\\Programs\\spark-3.4.1-bin-hadoop3\\README.md"
    val logD = spark.read.textFile(logF).cache()  // Dataset[String], one element per line
    val numA = logD.filter(line => line.contains("a")).count()
    val numB = logD.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numA, Lines with b: $numB")
    // Step 3: stop the Spark session
    spark.stop()
  }

}

(screenshot)
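
Because spark-sql is on the classpath, the same count can also be written with the DataFrame API. The following is a minimal sketch, assuming the same README path as above (SimpleAppDF is a name introduced here for illustration):

package cn.lh.spark341
package SQL

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object SimpleAppDF {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("SimpleAppDF").master("local[2]").getOrCreate()
    // spark.read.text yields a DataFrame with a single "value" column, one row per line
    val df = spark.read.text("D:\\Programs\\spark-3.4.1-bin-hadoop3\\README.md")
    val numA = df.filter(col("value").contains("a")).count()
    println(s"Lines with a: $numA")
    spark.stop()
  }

}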

The Spark RDD code is as follows:

package cn.lh.spark341
package RDD

import org.apache.spark.{SparkConf, SparkContext}

object RDDtest1 {

  def main(args: Array[String]): Unit = {
    // The three steps of Spark RDD development
    // Step 1: create the SparkContext object
    val conf = new SparkConf().setAppName("RDDtest1").setMaster("local[2]")
    val sc = new SparkContext(conf)
    // Step 2: the Spark processing logic
    val data = Array(1, 2, 3, 4, 5)
    val distData = sc.parallelize(data)  // distribute the local collection as an RDD
    val i: Int = distData.reduce((a, b) => a + b)
    println(i)
    // Step 3: stop the SparkContext object
    sc.stop()
  }

}

(screenshot)
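
To go further with the RDD API, the same SparkContext pattern supports the usual transformations. Below is a minimal word-count sketch, assuming the README path used earlier (RDDWordCount is a name introduced here for illustration):

package cn.lh.spark341
package RDD

import org.apache.spark.{SparkConf, SparkContext}

object RDDWordCount {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("RDDWordCount").setMaster("local[2]")
    val sc = new SparkContext(conf)
    // Split each line into words, pair each word with 1, then sum the counts per word
    val counts = sc.textFile("D:\\Programs\\spark-3.4.1-bin-hadoop3\\README.md")
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.take(5).foreach(println)  // print a small sample of (word, count) pairs
    sc.stop()
  }

}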

This completes the setup of a Spark 3.4.1 project based on Scala 2.12.18.

Origin blog.csdn.net/pblh123/article/details/131946741