Structured Streaming: A Custom MySQL Sink

1. foreachBatch
Since Spark 2.4 you can use foreachBatch to call the JDBC writer that Spark SQL already supports and write each micro-batch to MySQL directly, as follows:

/* Use foreachBatch (Spark 2.4+) */
import org.apache.spark.sql.{DataFrame, SaveMode}

val connectionProperties = PropertyConstants.getProperties()
resultDF
  .writeStream
  .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
    // write is a parameterless method in the Scala API, so it takes no parentheses
    batchDF.write.mode(SaveMode.Append).jdbc(connectionProperties.getProperty("url"),
        "tableName", connectionProperties)
  }
  .outputMode("update")
  .start()

2. foreach
However, the JDBC batch writer supports only append or overwrite; it cannot update rows by unique key, so a custom sink is needed.

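import java.sql.{Connection, DriverManager, PreparedStatement, Timestamp}

import com.xxx.bigdata.utils.PropertyConstants
import org.apache.spark.sql.{ForeachWriter, Row}

// Writes each row with "replace into", so a row whose unique key already exists is replaced.
class MySQLSink(tableName: String, fieldNames: Array[String]) extends ForeachWriter[Row]() {
  val connectionProperties = PropertyConstants.getProperties()
  var conn: Connection = _
  var ps: PreparedStatement = _

  // Called once per partition and epoch: open the connection and prepare the statement.
  override def open(partitionId: Long, epochId: Long): Boolean = {
    // Driver class of MySQL Connector/J 5.x; Connector/J 8.x uses com.mysql.cj.jdbc.Driver.
    Class.forName("com.mysql.jdbc.Driver")
    conn = DriverManager.getConnection(connectionProperties.getProperty("url"),
      connectionProperties)
    conn.setAutoCommit(false)
    val columns = fieldNames.mkString("(", ",", ")")
    val placeholders = fieldNames.map(_ => "?").mkString("(", ",", ")")
    // Prepare once here instead of once per row, so no statement is leaked in process().
    ps = conn.prepareStatement(s"replace into $tableName$columns values$placeholders")
    true
  }

  // Called for every row: bind each column by its runtime type, then execute and commit.
  override def process(value: Row): Unit = {
    for (i <- 0 until value.size) {
      value.get(i) match {
        case null => ps.setObject(i + 1, null) // a null column would otherwise throw MatchError
        case v: Int => ps.setInt(i + 1, v)
        case v: Long => ps.setLong(i + 1, v)
        case v: Float => ps.setFloat(i + 1, v)
        case v: Double => ps.setDouble(i + 1, v)
        case v: String => ps.setString(i + 1, v)
        case v: Timestamp => ps.setTimestamp(i + 1, v)
        case v => ps.setObject(i + 1, v.asInstanceOf[AnyRef]) // let the driver map other types
      }
    }
    ps.executeUpdate()
    conn.commit()
  }

  // Called when the partition finishes or fails; open() may have failed, so guard against null.
  override def close(errorOrNull: Throwable): Unit = {
    if (ps != null) ps.close()
    if (conn != null) conn.close()
  }
}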

Usage:

    /* Use the custom MySQLSink */
    val mysqlSink = new MySQLSink("tableName", resultDF.schema.fieldNames)
    resultDF
      .writeStream
      .outputMode("update")
      .foreach(mysqlSink)
      .start()
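
The sink above relies on MySQL's "replace into", which deletes a conflicting row and inserts the new one. If you would rather update the existing row in place, one possible alternative is "insert ... on duplicate key update". The sketch below only builds the two SQL strings so they can be compared; UpsertSql, its method names, and the keyFields parameter are hypothetical helpers for illustration, not part of Spark or the original sink. It uses the values(col) form, which newer MySQL 8.0 releases deprecate in favor of row aliases, so treat it as an assumption about your server version.

```scala
// Hypothetical helper for illustration only; not part of Spark or JDBC.
object UpsertSql {
  // Same statement shape as the one prepared by MySQLSink.
  def replaceInto(table: String, fields: Seq[String]): String = {
    val columns = fields.mkString("(", ",", ")")
    val placeholders = fields.map(_ => "?").mkString("(", ",", ")")
    s"replace into $table$columns values$placeholders"
  }

  // Updates the non-key columns in place when the unique key already exists.
  // keyFields is an assumption: the caller must know which columns form the key.
  def insertOnDuplicate(table: String, fields: Seq[String], keyFields: Set[String]): String = {
    val updatable = fields.filterNot(keyFields.contains)
    require(updatable.nonEmpty, "at least one non-key column is needed for the update clause")
    val columns = fields.mkString("(", ",", ")")
    val placeholders = fields.map(_ => "?").mkString("(", ",", ")")
    val updates = updatable.map(f => s"$f=values($f)").mkString(",")
    s"insert into $table$columns values$placeholders on duplicate key update $updates"
  }
}
```

Either string can be passed to conn.prepareStatement in open(); the parameter binding in process() stays the same because the placeholders are identical.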

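3. Extending the sink
The sink above inserts one row at a time. It can be extended to write in batches, flushing the remaining rows in close, and the type matching can be extended to cover more column types.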
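
A minimal sketch of the batching idea, kept independent of Spark and JDBC so it can be run on its own. BatchBuffer is a hypothetical helper, and the batch size and flush function are assumptions: the buffer collects items and hands them to the flush function whenever the batch is full, and a final flush() call empties whatever is left.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper for illustration only; not part of Spark or JDBC.
final class BatchBuffer[T](batchSize: Int)(flushFn: Seq[T] => Unit) {
  require(batchSize > 0, "batchSize must be positive")
  private val buf = ArrayBuffer.empty[T]

  // Buffer one item and flush automatically once the batch is full.
  def add(item: T): Unit = {
    buf += item
    if (buf.size >= batchSize) flush()
  }

  // Hand the buffered items to flushFn and clear the buffer; a no-op when empty.
  def flush(): Unit = if (buf.nonEmpty) {
    flushFn(buf.toList)
    buf.clear()
  }
}
```

In MySQLSink this could map to calling add(value) in process and flush() in close when errorOrNull is null, with a flush function that binds each row, calls ps.addBatch(), and then runs ps.executeBatch() followed by conn.commit(). Committing per batch instead of per row reduces round trips, at the cost of replaying a whole batch if the task fails midway.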
