Comparing Structured Streaming and the native Spark Streaming API for reading HDFS files

Reading with Structured Streaming

Code example

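import org.apache.spark.sql.streaming._
// Run in spark-shell, where `spark` (the SparkSession) is predefined.
// Watch the HDFS directory /home/testhdfs for new text files.
val df = spark.readStream.text("/home/testhdfs")
// Print each new micro-batch of lines to the console in Append mode.
val ps = df.writeStream.format("console").outputMode(OutputMode.Append).start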

Conclusion

basedir = /home/testhdfs
Supported: mv file to basedir (/home/testhdfs)
Not supported: mv directory to basedir

Moving a directory into basedir produces the following ERROR:

java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths:
        hdfs://172.17.1.180:9000/home/testhdfs/data1
        hdfs://172.17.1.180:9000/home/testhdfs
If provided paths are partition directories, please set "basePath" in the options of the data source to specify the root directory of the table. If there are multiple root directories, please load them separately and then union them.
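To see why moving a file in works while moving a directory in fails, here is a toy model in plain Scala of the rule the error message states: every directory holding data files under basedir must resolve to a single table root. It is a hypothetical sketch (the object and method names are made up), not Spark's implementation, and it deliberately ignores partition directories of the form key=value and the basePath option.

```scala
// Toy model with hypothetical names: NOT Spark's implementation.
// It reproduces only the rule stated by the error message: the data files
// found under basedir must all imply the same root directory.
object ConflictSketch {

  // Parent directory of a file path such as "/home/testhdfs/data1/part-0".
  private def parentDir(file: String): String =
    file.substring(0, file.lastIndexOf('/'))

  // Returns the "suspicious paths" (sorted) when data files sit in more than
  // one directory, or an empty Seq when they all share a single root.
  // Partition directories (key=value) and basePath are not modeled here.
  def suspiciousPaths(files: Seq[String]): Seq[String] = {
    val roots = files.map(parentDir).distinct.sorted
    if (roots.size > 1) roots else Seq.empty
  }
}
```

Files moved directly into /home/testhdfs all share one root, so no conflict is reported; once the data1 directory is moved in alongside existing files, the model flags /home/testhdfs and /home/testhdfs/data1, the same two suspicious paths listed in the error above.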

Reading with Spark Streaming

Testing the textFileStream API

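import org.apache.spark.streaming._
// Run in spark-shell, where `sc` (the SparkContext) is predefined; batch interval is 120 seconds.
val ssc = StreamingContext.getActiveOrCreate(() => new StreamingContext(sc, Seconds(120)))
// Monitor the HDFS directory /home/testhdfs2 for new text files.
val ds1 = ssc.textFileStream("/home/testhdfs2")
// Print the first 10 lines of each batch.
ds1.print
ssc.start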

Conclusion

Supported: mv file to basedir (/home/testhdfs2)
Supported: mv directory to basedir
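For contrast, a toy model of how a textFileStream-style monitor could choose new entries each batch, assuming (as the Spark Streaming guide describes for file streams) that only entries directly under the monitored directory are considered and that an entry's modification time decides which batch it belongs to. The names Entry and newEntries are hypothetical; skipping names that start with "." is an assumed default filter, and keeping directories in the selection simply mirrors the author's result above.

```scala
// Toy model with hypothetical names, not Spark's FileInputDStream code.
object DirMonitorSketch {

  // One entry directly under the monitored directory: a file or a directory.
  final case class Entry(path: String, isDirectory: Boolean, modTimeMs: Long)

  // Last path component, e.g. "a.txt" for "/home/testhdfs2/a.txt".
  private def name(path: String): String = path.substring(path.lastIndexOf('/') + 1)

  // Selects entries for the batch ending at batchTimeMs: the modification time
  // must fall in (batchTimeMs - batchIntervalMs, batchTimeMs], the entry must not
  // have been selected before, and hidden names (starting with ".") are skipped.
  // Directories pass the same checks as files (assumption, see lead-in).
  def newEntries(listing: Seq[Entry], batchTimeMs: Long, batchIntervalMs: Long,
                 alreadySeen: Set[String]): Seq[Entry] =
    listing.filter { e =>
      e.modTimeMs > batchTimeMs - batchIntervalMs &&
      e.modTimeMs <= batchTimeMs &&
      !alreadySeen.contains(e.path) &&
      !name(e.path).startsWith(".")
    }
}
```

With the 120-second batch interval used above, a file or directory moved in during a batch is selected once by that batch. The Spark Streaming guide also asks that data be moved or renamed into the monitored directory rather than written there in place, which is why both tests use mv.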
