spark-jdbc-oracle(scala)

Reading Oracle data via JDBC.

JDBC connection properties
Property name and meaning:
url: The JDBC URL to connect to, for example: jdbc:mysql://ip:3306
dbtable: The JDBC table that should be read. A subquery in parentheses can be used instead of a full table.
driver: The class name of the JDBC driver used to connect to this URL, for example: com.mysql.jdbc.Driver (for Oracle: oracle.jdbc.OracleDriver).

partitionColumn, lowerBound, upperBound, numPartitions:
These options apply only to reads and must all be specified together. They describe how to partition the table when it is read in parallel from multiple workers.
partitionColumn: Must be a numeric column of the table.
lowerBound and upperBound are used only to decide the partition stride, not to filter rows in the table.
All rows of the table are read and distributed across the partitions.
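As a sketch of how the bounds turn into per-partition queries (a simplified reimplementation of the idea behind Spark's internal column partitioning, not its exact code), the stride is roughly (upperBound - lowerBound) / numPartitions, and the first and last partitions are open-ended so no rows are lost:

```scala
// Simplified sketch of how numeric partition bounds become WHERE clauses.
// This mirrors the idea behind Spark's JDBC column partitioning, but is
// not the exact implementation.
def partitionWhereClauses(column: String, lower: Long, upper: Long, numPartitions: Int): Seq[String] = {
  val stride = (upper - lower) / numPartitions
  (0 until numPartitions).map { i =>
    val start = lower + i * stride
    if (i == 0)
      s"$column < ${start + stride} or $column is null" // first partition catches everything below lowerBound
    else if (i == numPartitions - 1)
      s"$column >= $start"                              // last partition catches everything above upperBound
    else
      s"$column >= $start AND $column < ${start + stride}"
  }
}

val clauses = partitionWhereClauses("ID", 1, 1000, 4)
// 4 clauses; rows outside [1, 1000) still land in the first or last partition
```

This is why lowerBound/upperBound only shape the partition boundaries: a row with ID = 5000 is still read, it just ends up in the last partition.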

fetchsize: Read-only option. The JDBC fetch size determines how many rows are retrieved per round trip. This can help tune performance for JDBC drivers whose default fetch size is low (e.g., the Oracle driver fetches 10 rows at a time).
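To see why Oracle's default of 10 rows per fetch matters, the number of network round trips is roughly ceil(totalRows / fetchSize). The arithmetic below is illustrative only; real cost also depends on row width and network latency:

```scala
// Rough round-trip count for a result set of a given size and fetch size.
// Illustrative arithmetic only, not a benchmark.
def roundTrips(totalRows: Long, fetchSize: Int): Long =
  (totalRows + fetchSize - 1) / fetchSize

val slow = roundTrips(1000000L, 10)   // Oracle driver default fetch size
val fast = roundTrips(1000000L, 1000) // fetchsize = 1000
```

For a million-row result set, raising fetchsize from 10 to 1000 cuts the round trips from 100,000 to 1,000.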

batchsize: Write-only option. The JDBC batch size determines how many rows are inserted per batch, which can help the JDBC driver tune performance. Defaults to 1000.

isolationLevel: Write-only option. The transaction isolation level applied to the current connection. It can be one of NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, or SERIALIZABLE, corresponding to the standard isolation levels defined by java.sql.Connection. Defaults to READ_UNCOMMITTED. See the documentation for java.sql.Connection.
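The accepted strings map directly onto the java.sql.Connection isolation constants. A sketch of how such a string could be translated before being applied to a connection (the helper name is my own, not a Spark API):

```scala
import java.sql.Connection

// Maps the isolation-level option strings onto JDBC's Connection constants.
// "NONE" means the writes are performed without a transaction.
def toJdbcIsolation(level: String): Int = level match {
  case "NONE"             => Connection.TRANSACTION_NONE
  case "READ_UNCOMMITTED" => Connection.TRANSACTION_READ_UNCOMMITTED
  case "READ_COMMITTED"   => Connection.TRANSACTION_READ_COMMITTED
  case "REPEATABLE_READ"  => Connection.TRANSACTION_REPEATABLE_READ
  case "SERIALIZABLE"     => Connection.TRANSACTION_SERIALIZABLE
  case other              => sys.error(s"Unknown isolation level: $other")
}
```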

truncate: Write-only option. When SaveMode.Overwrite is used, this option makes Spark truncate the existing table instead of dropping and recreating it. This can be more efficient and prevents table metadata (e.g., indexes) from being lost. However, it will not work in some cases, such as when the new data has a different schema. Defaults to false.

createTableOptions: Write-only option. Allows database-specific table and partition options to be appended to the generated CREATE TABLE statement (e.g., CREATE TABLE t (name string) ENGINE=InnoDB).
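Note that createTableOptions holds only the clause appended after the generated column list, not a full CREATE TABLE statement. A sketch of a write-side options map (the URL is a placeholder and the ENGINE clause is MySQL syntax, shown for illustration):

```scala
// Write-side JDBC options as plain strings. createTableOptions is the
// trailing clause of the generated statement, i.e.
//   CREATE TABLE TEST (...) ENGINE=InnoDB DEFAULT CHARSET=utf8
// so it must not contain "CREATE TABLE" itself.
val writeOptions = Map(
  "url"                -> "jdbc:mysql://ip:3306/db", // placeholder URL
  "dbtable"            -> "TEST",
  "batchsize"          -> "1000",
  "isolationLevel"     -> "READ_UNCOMMITTED",
  "truncate"           -> "false",
  "createTableOptions" -> "ENGINE=InnoDB DEFAULT CHARSET=utf8"
)
```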

Method 1


    import java.util.Properties

    val prop = new Properties
    prop.setProperty("user", stateConf.getString(Constant.JdbcUserName))
    prop.setProperty("password", stateConf.getString(Constant.JdbcPasswd))

    val tableName = "test"
    val sql = s"select id, REPORT_TIME from $tableName"

    // One WHERE-clause condition per partition; each predicate becomes one
    // partition of the resulting DataFrame.
    val predicates =
      Array(
        "2015-09-16" -> "2015-09-30",
        "2015-10-01" -> "2015-10-15",
        "2015-10-16" -> "2015-10-31",
        "2015-11-01" -> "2015-11-14",
        "2015-11-15" -> "2015-11-30",
        "2015-12-01" -> "2015-12-15"
      ).map { case (start, end) =>
        s"cast(REPORT_TIME as date) >= date '$start' AND cast(REPORT_TIME as date) <= date '$end'"
      }

    val df = sparkSession.read.jdbc(
      stateConf.getString(Constant.JdbcUrl),
      s"($sql) a", // a subquery in parentheses, with an alias, used as the table
      predicates,
      prop
    )

Method 2

    import org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions

    val tableName = "test"
    val allDayReportDS =
      sparkSession.read.format("jdbc")
        .options(
          Map(
            // URL format: jdbc:oracle:thin:username/password@//host:port/service
            JDBCOptions.JDBC_URL -> s"jdbc:oracle:thin:${stateConf.getString(Constant.JdbcUserName)}/${stateConf.getString(Constant.JdbcPasswd)}@//${stateConf.getString("host")}/${stateConf.getString("database")}",
            JDBCOptions.JDBC_TABLE_NAME -> s"(select id, REPORT_TIME from $tableName) a",
            JDBCOptions.JDBC_DRIVER_CLASS -> stateConf.getString(Constant.JdbcDriver),
            JDBCOptions.JDBC_PARTITION_COLUMN -> "ID", // must be a numeric column
            JDBCOptions.JDBC_LOWER_BOUND -> "1",
            JDBCOptions.JDBC_UPPER_BOUND -> "1000",
            JDBCOptions.JDBC_NUM_PARTITIONS -> "5",
            JDBCOptions.JDBC_BATCH_FETCH_SIZE -> "10"
            // Write-only options (truncate, batchsize, isolationLevel,
            // createTableOptions) have no effect on a read; see the write
            // examples below.
          )
        )
        .load()

Write operation

Method 1

    val prop = new java.util.Properties
    prop.setProperty("user", stateConf.getString(Constant.JdbcUserName))
    prop.setProperty("password", stateConf.getString(Constant.JdbcPasswd))
    prop.setProperty("batchsize", "10")
    prop.setProperty("isolationLevel", "READ_UNCOMMITTED")

    result.write.jdbc(stateConf.getString(Constant.JdbcUrl), "TEST", prop)

Method 2

    result.write
      // partitionBy() applies to file-based sources, not the JDBC sink,
      // so it is omitted here.
      .mode(SaveMode.Overwrite)
      .format("jdbc")
      .option("url", stateConf.getString(Constant.JdbcUrl))
      .option("user", stateConf.getString(Constant.JdbcUserName))
      .option("password", stateConf.getString(Constant.JdbcPasswd))
      .option(JDBCOptions.JDBC_TABLE_NAME, "TEST")
      .option(JDBCOptions.JDBC_TRUNCATE, "true")
      .option(JDBCOptions.JDBC_BATCH_INSERT_SIZE, "1000")
      .option(JDBCOptions.JDBC_TXN_ISOLATION_LEVEL, "READ_UNCOMMITTED")
      // createTableOptions takes only the clause appended to the generated
      // CREATE TABLE statement (MySQL syntax shown); it is ignored when
      // truncate=true keeps the existing table.
      .option(JDBCOptions.JDBC_CREATE_TABLE_OPTIONS, "ENGINE=InnoDB DEFAULT CHARSET=utf8")
      .save()
