1、Project setup
1. A local archetype catalog can be used (e.g., in the Eclipse new-Maven-project wizard).
2. Create a Maven project by searching the archetype catalog for "scala-a"; the net.alchim31.maven archetype (scala-archetype-simple) can be used. It is best to comment out the test section of the pom and delete the generated test code, since the archetype's sample tests often fail to build.
3. The project's Scala Library Container can be changed (in the Eclipse build path) to match the Scala version the Spark dependency targets.
4. Running Spark locally on Windows 7: set HADOOP_HOME to a directory containing bin\winutils.exe and use a local master, as in the sketch below.
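A minimal sketch, assuming Spark 1.5.x APIs (SparkContext) and that HADOOP_HOME/winutils is set as above; the object name and app name are placeholders:

import org.apache.spark.{SparkConf, SparkContext}

object LocalSmokeTest {
  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM on all cores; no cluster needed
    val conf = new SparkConf().setAppName("local-smoke-test").setMaster("local[*]")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).sum())  // quick sanity check: prints 5050.0
    sc.stop()
  }
}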
5. Declare spark-core and any other Spark dependencies used in the pom, with scope set to provided; they are needed for Maven to compile and package the project (for local runs, importing the Spark jars through a user library is enough). Use maven-assembly-plugin to bundle the remaining non-Spark dependencies into the jar; see the pom sketch below.
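A sketch of the relevant pom fragments (version numbers are placeholders, assuming Spark 1.5.0 built for Scala 2.10):

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.5.0</version>
  <!-- provided: available at compile time, excluded from the packaged jar -->
  <scope>provided</scope>
</dependency>

<!-- inside <build><plugins> -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptorRefs>
      <!-- bundles all non-provided dependencies into one jar -->
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
</plugin>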
6. Submit jobs through the Spark REST API (https://www.nitendragautam.com/bigdata/submit-apache-spark-job-with-rest-api/):
1. URL: http://spark-cluster-ip:6066/v1/submissions/create
2. Method: POST
3. Content-Type: application/json;charset=UTF-8
4. Body:
{
  "clientSparkVersion": "1.5.0",
  "appArgs": ["2018-07", "2018-07"],
  "mainClass": "class-package",
  "appResource": "hdfs-jar-path",
  "action": "CreateSubmissionRequest",
  "sparkProperties": {
    "spark.jars": "hdfs-jar-path",
    "spark.eventLog.enabled": "true",
    "spark.eventLog.dir": "hdfs://m0.unidata:9000/history",
    "spark.app.name": "custom-name",
    "spark.submit.deployMode": "cluster",
    "spark.master": "spark://m0.unidata:6066",
    "spark.driver.supervise": "false",
    "spark.executor.memory": "10g",
    "spark.executor.cores": "5"
  },
  "environmentVariables": { "SPARK_ENV_LOADED": "1" }
}
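For example, with the body above saved to a file (submit.json is a placeholder name), the request can be sent with curl:

curl -X POST -H "Content-Type: application/json;charset=UTF-8" -d @submit.json http://spark-cluster-ip:6066/v1/submissions/create

The response includes a submissionId of the form driver-yyyymmddhhmmss-XXXX, which is what the status call below takes.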
7. Check the status of a submitted Spark job:
http://spark-cluster-ip:6066/v1/submissions/status/driver-yyyymmddhhmmss-XXXX
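For example:

curl http://spark-cluster-ip:6066/v1/submissions/status/driver-yyyymmddhhmmss-XXXX

The same API can also kill a running driver, via POST to http://spark-cluster-ip:6066/v1/submissions/kill/driver-yyyymmddhhmmss-XXXX.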
2、To upload the Spark jar to HDFS, use the HDFS Explorer tool or upload it programmatically; it is best not to use FTP (the jar can arrive incomplete). A command-line example follows.
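For example (paths are placeholders; -f overwrites an existing target):

./hdfs dfs -put -f /local/path/app.jar /hdfs/path/app.jar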
3、Replace the NameNode and restart HDFS
1. Update core-site.xml, the excludes file, hdfs-site.xml, and slaves.
2. Format the NameNode:
./hadoop namenode -format
3. If jps shows no DataNode process after the reformat, the usual cause is a clusterID mismatch between the DataNode's data directory and the freshly formatted NameNode; a fix is sketched below.
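A typical fix, assuming the DataNode keeps its blocks under the default ${hadoop.tmp.dir}/dfs/data (adjust the path to your dfs.datanode.data.dir):

./hadoop-daemon.sh stop datanode
rm -rf /tmp/hadoop-*/dfs/data
./hadoop-daemon.sh start datanode

Alternatively, keep the data and edit current/VERSION under the data directory so its clusterID matches the NameNode's.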
4、Change the HDFS replication factor
1. Set the replication factor to 1 for an existing path:
./hdfs dfs -setrep -R 1 filepath
2. Apply it to the whole filesystem and wait for re-replication to finish (-w blocks until done):
./hdfs dfs -setrep -w 1 /
3. Check the replica count: ./hdfs dfs -ls / (for files, the second column of the listing is the replication factor)
4. Check file health: ./hdfs fsck /
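Note that -setrep only rewrites existing files; newly written files take dfs.replication from hdfs-site.xml. fsck can also report per-file details, e.g.:

./hdfs fsck / -files -blocks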
5、Start a single DataNode on its own
1. On that machine, run the following from the sbin directory of the Hadoop installation:
./hadoop-daemon.sh start datanode
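The same script stops the daemon, and manages the NameNode the same way:

./hadoop-daemon.sh stop datanode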
6、Delete files on HDFS
1. Delete permanently, bypassing the trash:
./hdfs dfs -rmr -skipTrash file_path
(-rmr is deprecated on newer releases; the equivalent is ./hdfs dfs -rm -r -skipTrash file_path)
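If needed, the freed space can be confirmed afterwards, e.g.:

./hdfs dfs -du -h /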