Before we start using HDFS, let's first look at what it actually is. Understanding the architecture will inform our day-to-day operations later on and give us a framework for troubleshooting.
HDFS uses a master/slave architecture. An HDFS cluster consists of a single Namenode and a number of Datanodes. The Namenode is a central server that manages the file system namespace and regulates client access to files.
There is typically one Datanode per node in the cluster, and it manages the storage attached to the node it runs on. HDFS exposes a file system namespace and lets users store data in the form of files. Internally, a file is split into one or more blocks, and these blocks are stored on a set of Datanodes. The Namenode executes namespace operations such as opening, closing, and renaming files and directories, and it determines the mapping of blocks to Datanodes. The Datanodes serve read and write requests from file system clients, and perform block creation, deletion, and replication under instruction from the Namenode.
HDFS is written in Java, so any machine that supports Java can run a Namenode or a Datanode. Thanks to Java's portability, HDFS can be deployed on a wide range of machines. A typical deployment runs a single Namenode instance on a dedicated machine, with each of the other machines in the cluster running one Datanode instance. The architecture does not preclude running multiple Datanodes on one machine, but this is rare in practice. (Pseudo-distributed mode works this way; it suits resource-constrained test environments and should be avoided in production.)
HDFS supports a traditional hierarchical file organization. A user or an application can create directories and store files inside them. The namespace hierarchy is similar to most existing file systems: you can create, delete, move, or rename files. The original design did not include user disk quotas or access permissions, nor hard links and soft links; current releases (including the 2.x line used in this article) do implement POSIX-style permissions and directory quotas, while hard and soft links remain unsupported. The Namenode maintains the file system namespace, and any change to the namespace or its properties is recorded by the Namenode. An application can specify the number of replicas HDFS keeps for a file; this number is called the file's replication factor and is also stored by the Namenode.
HDFS is designed to reliably store very large files across machines in a large cluster. Each file is stored as a sequence of blocks; all blocks in a file except the last one are the same size. For fault tolerance, every block of a file is replicated. The block size and replication factor are configurable per file: an application can specify a file's replication factor at creation time and change it later. Files in HDFS are write-once, and at any time there is strictly one writer.
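As a minimal sketch (the local and HDFS paths here are hypothetical), the replication factor can be set at creation time by passing the dfs.replication property with the generic -D option, and changed afterwards with -setrep:
hadoop fs -D dfs.replication=2 -put /home/hadoop/localfile /user/hadoop/somedir/
hadoop fs -setrep 3 /user/hadoop/somedir/localfile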
The Namenode has full control over block replication. It periodically receives a heartbeat and a block report (Blockreport) from each Datanode in the cluster. Receipt of a heartbeat means the Datanode is functioning properly; a Blockreport contains the list of all blocks on that Datanode.
Replica placement is critical to HDFS reliability and performance, and the optimized placement policy is an important feature that distinguishes HDFS from most other distributed file systems.
Large HDFS instances usually run on clusters of machines spanning multiple racks, and communication between two machines on different racks has to go through switches. The Namenode can determine the rack id of each Datanode. A simple but non-optimal policy is to place replicas on distinct racks. To reduce overall bandwidth consumption and read latency, HDFS tries to serve a read from the replica closest to the reader: if a replica exists on the same rack as the reader, that replica is preferred, and if the cluster spans multiple data centers, a client reads from a replica in its local data center first.
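The rack each Datanode has been assigned to can be inspected from dfsadmin; a hedged sketch (on a cluster with no topology script configured, every node reports /default-rack):
hdfs dfsadmin -printTopology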
When the Namenode starts, it enters a special state called safe mode. While in safe mode, the Namenode does not replicate blocks. It receives heartbeats and Blockreports from the Datanodes; each Blockreport lists the blocks a Datanode holds. Every block has a specified minimum number of replicas, and a block is considered safely replicated once the Namenode has verified that its replica count has reached this minimum. After a configurable percentage of blocks have been confirmed safe (plus an additional 30 seconds), the Namenode exits safe mode. It then determines which blocks, if any, still have fewer replicas than specified, and replicates those blocks to other Datanodes.
The Namenode stores the HDFS namespace. It records every change to the file system metadata in a transaction log called the EditLog. For example, creating a file in HDFS causes the Namenode to insert a record into the EditLog, as does changing a file's replication factor. The Namenode stores the EditLog in its local operating system's file system. The entire namespace, including the mapping of blocks to files and the file system properties, is stored in a file called the FsImage, which also resides in the Namenode's local file system.
The Namenode keeps an image of the entire namespace and the file-to-block mapping (the Blockmap) in memory. On startup, it reads the EditLog and FsImage from disk, applies all the EditLog transactions to the in-memory FsImage, flushes this new version of the FsImage to disk, and truncates the old EditLog, since its transactions have now been applied. This process is called a checkpoint. In the implementation described here, a checkpoint occurs only at Namenode startup; periodic checkpointing was planned for later releases (and in modern deployments is handled by the Secondary Namenode or a standby Namenode).
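A checkpoint can also be triggered by hand; a hedged sketch (saveNamespace requires the Namenode to be in safe mode first, so namespace changes are briefly blocked):
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave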
A Datanode stores HDFS data as files in its local file system and has no knowledge of HDFS files as such; it stores each HDFS block in a separate local file. The Datanode does not create all these files in one directory: it uses a heuristic to determine the optimal number of files per directory and creates subdirectories as appropriate. Creating all local files in one directory would not be optimal, because the local file system may be unable to efficiently support a huge number of files in a single directory. When a Datanode starts, it scans its local file system, generates a list of all the HDFS blocks corresponding to these local files, and sends this list to the Namenode as a report: this is the Blockreport.
All HDFS communication protocols are layered on top of TCP/IP. A client connects to a configurable TCP port on the Namenode and talks to it using the ClientProtocol; Datanodes talk to the Namenode using the DatanodeProtocol.
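The Namenode RPC address that clients connect to is taken from fs.defaultFS (hdfs://192.168.1.230:9000 in the job logs later in this article); a hedged way to print the effective value:
hdfs getconf -confKey fs.defaultFS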
The primary objective of HDFS is to store data reliably even in the presence of failures. The three common failure types are Namenode failure, Datanode failure, and network partitions. Each Datanode sends periodic heartbeats to the Namenode. A network partition can cause a subset of Datanodes to lose contact with the Namenode, which detects this through the absence of heartbeats and marks those Datanodes as dead; no new IO requests are sent to them, and any data registered to a dead Datanode is no longer available. The death of Datanodes can cause the replication factor of some blocks to fall below their specified value; the Namenode constantly tracks which blocks need to be replicated and initiates replication whenever necessary. Re-replication may be needed when a Datanode becomes unavailable, a replica becomes corrupted, a hard disk on a Datanode fails, or the replication factor of a file is increased.
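fsck reports the blocks the Namenode currently considers under-replicated, corrupt, or missing, which is useful for checking whether re-replication has caught up; a hedged sketch:
hdfs fsck /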
The HDFS architecture supports data rebalancing: if the free space on a Datanode falls below a certain threshold, data can be moved from that Datanode to others with free space. In practice this is not fully automatic; an administrator runs the balancer tool, which moves blocks from over-utilized to under-utilized Datanodes until usage is within a threshold.
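A hedged sketch of running the balancer until each Datanode's utilization is within 10 percentage points of the cluster average:
hdfs balancer -threshold 10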
A block fetched from a Datanode may arrive corrupted, whether due to faults in the storage device, network errors, or software bugs. The HDFS client software implements checksum verification of HDFS file contents: when a client creates a file, it computes a checksum for each block and stores the checksums in a separate hidden file in the same HDFS namespace. When a client retrieves file contents, it verifies that the data received from each Datanode matches the checksum in the corresponding checksum file; if not, the client can opt to fetch that block from another Datanode holding a replica.
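The aggregate checksum of a file can also be queried from the shell; a hedged sketch, using a path from the examples below:
hadoop fs -checksum /user/hadoop/wc-in/a.txt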
The FsImage and EditLog are the central data structures of HDFS. If they are corrupted, the entire HDFS instance is unusable. For this reason, the Namenode can be configured to maintain multiple copies of the FsImage and EditLog; when the Namenode restarts, it picks the most recent consistent copies to use.
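The redundant copies are configured by listing several directories (ideally on separate disks, one of them possibly on NFS) in dfs.namenode.name.dir; a hedged check of the effective setting:
hdfs getconf -confKey dfs.namenode.name.dir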
HDFS is designed for large files; the applications it suits are those that process large data sets. Such applications write their data once but read it one or more times, and need reads to proceed at streaming speeds; HDFS supports write-once-read-many semantics for files. The classic default block size was 64 MB; in the Hadoop 2.x line used in this article the default (dfs.blocksize) is 128 MB, as the -stat %o example later confirms (134217728 bytes). Files are split into blocks of this size, and where possible each block is stored on a different Datanode.
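A hedged sketch of checking the default block size and overriding it for a single upload (the local path is hypothetical; 268435456 bytes = 256 MB):
hdfs getconf -confKey dfs.blocksize
hadoop fs -D dfs.blocksize=268435456 -put /home/hadoop/bigfile /user/hadoop/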
A client request to create a file does not reach the Namenode immediately. Initially, the HDFS client caches the file data in a temporary local file, and application writes are transparently redirected there. Once the temporary file accumulates more than one block's worth of data, the client contacts the Namenode. The Namenode inserts the file name into the file system hierarchy, allocates a data block for it, and replies to the client with the Datanode identities and the target block. When the file is closed, the remaining un-flushed data in the temporary file is transferred to the designated Datanodes, and the client then tells the Namenode that the file is closed. Only at this point does the Namenode commit the file creation to its persistent log; if the Namenode dies before the file is closed, the file is lost.
When a client writes data to an HDFS file, it first writes to the local temporary file described above. Suppose the file's replication factor is 3: when the local file accumulates a full block of data, the client obtains from the Namenode a list of Datanodes that will hold the replicas. The client then streams the block to the first Datanode, which receives the data in small portions (4 KB), writes each portion to its local store, and simultaneously forwards it to the second Datanode in the list. The second Datanode does the same, writing each portion locally while forwarding it to the third, which finally receives the data and stores it locally. A Datanode can thus receive data from the previous node in the pipeline while forwarding it to the next, so the data is pipelined from one Datanode to the next.
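After the write completes, fsck can show which Datanodes ended up holding each block of a file; a hedged sketch, using a path from the examples below:
hdfs fsck /user/hadoop/wc-in/a.txt -files -blocks -locations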
When the replication factor of a file is reduced, the Namenode selects the excess replicas for deletion and passes this information to the Datanodes at the next heartbeat. The Datanodes then remove the corresponding blocks, and the freed capacity appears in the cluster. Note that there can be a delay between the user deleting a file and the corresponding increase in free space in HDFS.
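One source of that delay is the trash: with trash enabled, -rm first moves a file into the user's .Trash directory (the rm example below logs `Deletion interval = 0 minutes`, i.e. trash is disabled on this particular cluster). A hedged sketch of bypassing or emptying the trash (the path is hypothetical):
hadoop fs -rm -skipTrash /user/hadoop/somefile
hadoop fs -expunge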
With the architecture covered, let's look at the command syntax you will use most often.
The DFSAdmin command set administers an HDFS cluster; these commands can only be used by an HDFS administrator.
Display the Datanode report:
[hadoop@qk ~]$ hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Configured Capacity: 8318124032 (7.75 GB)
Present Capacity: 7273684992 (6.77 GB)
DFS Remaining: 7273619456 (6.77 GB)
DFS Used: 65536 (64 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (1):
Name: 192.168.1.230:50010 (qk)
Hostname: qk
Decommission Status : Normal
Configured Capacity: 8318124032 (7.75 GB)
DFS Used: 65536 (64 KB)
Non DFS Used: 1044439040 (996.05 MB)
DFS Remaining: 7273619456 (6.77 GB)
DFS Used%: 0.00%
DFS Remaining%: 87.44%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Mar 22 17:19:26 CST 2017
Put the cluster in or out of safe mode (enter: enter safe mode; leave: leave safe mode; get: report whether safe mode is on; wait: block until safe mode is exited):
hdfs dfsadmin -safemode <enter | leave | get | wait>
[hadoop@qk wc-in]$ hdfs dfsadmin -safemode enter
Safe mode is ON
[hadoop@qk wc-in]$ hdfs dfsadmin -safemode leave
Safe mode is OFF
[hadoop@qk wc-in]$ hdfs dfsadmin -safemode get
Safe mode is OFF
Decommission a Datanode. Older documentation shows a hadoop dfsadmin -decommission datanodename command, but no such option exists in the Hadoop 2.x line used here; instead, add the node's hostname to the exclude file referenced by dfs.hosts.exclude in hdfs-site.xml and tell the Namenode to re-read it:
hdfs dfsadmin -refreshNodes
DFSShell (the FS shell) offers commands whose syntax is similar to other shells users already know (e.g. bash, csh).
List the contents of an HDFS path: -ls (-R lists recursively)
[hadoop@qk ~]$ hadoop fs -ls /
Found 2 items
drwx-wx-wx   - hadoop supergroup          0 2017-03-22 17:02 /tmp
drwxr-xr-x   - hadoop supergroup          0 2017-03-22 16:23 /user
[hadoop@qk ~]$ hadoop fs -ls -R /
drwx-wx-wx   - hadoop supergroup          0 2017-03-22 17:02 /tmp
drwx-wx-wx   - hadoop supergroup          0 2017-03-17 17:59 /tmp/hive
drwx------   - hadoop supergroup          0 2017-03-20 11:23 /tmp/hive/hadoop
drwxr-xr-x   - hadoop supergroup          0 2017-03-22 16:23 /user
drwxr-xr-x   - hadoop supergroup          0 2017-03-22 16:27 /user/hadoop
drwxr-xr-x   - hadoop supergroup          0 2017-03-22 16:27 /user/hadoop/wc-in
-rw-r--r--   1 hadoop supergroup          8 2017-03-22 16:27 /user/hadoop/wc-in/a.txt
-rw-r--r--   1 hadoop supergroup         11 2017-03-22 16:27 /user/hadoop/wc-in/b.txt
drwxr-xr-x   - hadoop supergroup          0 2017-03-22 16:27 /user/hadoop/wc-out
-rw-r--r--   1 hadoop supergroup          0 2017-03-22 16:27 /user/hadoop/wc-out/_SUCCESS
-rw-r--r--   1 hadoop supergroup         11 2017-03-22 16:27 /user/hadoop/wc-out/part-r-00000
drwxr-xr-x   - hadoop supergroup          0 2017-03-17 18:40 /user/hive
drwxr-xr-x   - hadoop supergroup          0 2017-03-20 11:22 /user/hive/warehouse
drwxr-xr-x   - hadoop supergroup          0 2017-03-17 18:40 /user/hive/warehouse/test.db
drwxr-xr-x   - hadoop supergroup          0 2017-03-20 11:22 /user/hive/warehouse/wh301.db
Create a directory: -mkdir
[hadoop@qk ~]$ hadoop fs -mkdir /tmp/foodir
[hadoop@qk ~]$ hadoop fs -ls /tmp
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2017-03-22 17:27 /tmp/foodir
drwx-wx-wx   - hadoop supergroup          0 2017-03-17 17:59 /tmp/hive
Print the contents of a file: -cat
[hadoop@qk ~]$ hadoop fs -cat /user/hadoop/wc-out/part-r-00000
bla 3
wa 2
Delete a file: -rm
[hadoop@qk ~]$ hadoop fs -ls -R /user/hadoop/wc-out/
-rw-r--r--   1 hadoop supergroup          0 2017-03-22 16:27 /user/hadoop/wc-out/_SUCCESS
-rw-r--r--   1 hadoop supergroup         11 2017-03-22 16:27 /user/hadoop/wc-out/part-r-00000
[hadoop@qk ~]$ hadoop fs -rm /user/hadoop/wc-out/part-r-00000
17/03/22 17:32:16 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/hadoop/wc-out/part-r-00000
[hadoop@qk ~]$ hadoop fs -ls -R /user/hadoop/wc-out/
-rw-r--r--   1 hadoop supergroup          0 2017-03-22 16:27 /user/hadoop/wc-out/_SUCCESS
Copy from the local file system to HDFS: -put
Note: the target file must not already exist on HDFS, and the destination directory must already exist; if either condition is violated, the command fails (both failure modes are shown below).
[hadoop@qk wc-in]$ hadoop fs -ls -R /user/hadoop/wc-in
-rw-r--r--   1 hadoop supergroup          8 2017-03-22 16:27 /user/hadoop/wc-in/a.txt
-rw-r--r--   1 hadoop supergroup         11 2017-03-22 16:27 /user/hadoop/wc-in/b.txt
[hadoop@qk wc-in]$ hadoop fs -put /home/hadoop/wc-in/a.txt /user/hadoop/wc-in/
put: `/user/hadoop/wc-in/a.txt': File exists
[hadoop@qk wc-in]$ hadoop fs -put /home/hadoop/wc-in/c.txt /user/hadoop/wc-in/
[hadoop@qk wc-in]$ hadoop fs -ls -R /user/hadoop/wc-in
-rw-r--r--   1 hadoop supergroup          8 2017-03-22 16:27 /user/hadoop/wc-in/a.txt
-rw-r--r--   1 hadoop supergroup         11 2017-03-22 16:27 /user/hadoop/wc-in/b.txt
-rw-r--r--   1 hadoop supergroup          8 2017-03-22 17:40 /user/hadoop/wc-in/c.txt
[hadoop@qk wc-in]$ hadoop fs -put /home/hadoop/wc-in/d.txt /user/hadoop/wc-in1/
put: `/user/hadoop/wc-in1/': No such file or directory
[hadoop@qk wc-in]$
Copy a file within HDFS: -cp
[hadoop@qk wc-in]$ hadoop fs -ls -R /user/hadoop/wc-in/
-rw-r--r--   1 hadoop supergroup          8 2017-03-22 16:27 /user/hadoop/wc-in/a.txt
-rw-r--r--   1 hadoop supergroup         11 2017-03-22 16:27 /user/hadoop/wc-in/b.txt
-rw-r--r--   1 hadoop supergroup          8 2017-03-22 17:40 /user/hadoop/wc-in/c.txt
-rw-r--r--   1 hadoop supergroup          3 2017-03-22 17:41 /user/hadoop/wc-in/d.txt
[hadoop@qk wc-in]$ hadoop fs -cp /user/hadoop/wc-in/a.txt /user/hadoop/wc-in1/
[hadoop@qk wc-in]$ hadoop fs -ls -R /user/hadoop/wc-in/
-rw-r--r--   1 hadoop supergroup          8 2017-03-22 16:27 /user/hadoop/wc-in/a.txt
-rw-r--r--   1 hadoop supergroup         11 2017-03-22 16:27 /user/hadoop/wc-in/b.txt
-rw-r--r--   1 hadoop supergroup          8 2017-03-22 17:40 /user/hadoop/wc-in/c.txt
-rw-r--r--   1 hadoop supergroup          3 2017-03-22 17:41 /user/hadoop/wc-in/d.txt
[hadoop@qk wc-in]$ hadoop fs -ls -R /user/hadoop/wc-in1/
-rw-r--r--   1 hadoop supergroup          8 2017-03-22 17:44 /user/hadoop/wc-in1/a.txt
Move a file within HDFS: -mv
[hadoop@qk wc-in]$ hadoop fs -ls -R /user/hadoop/wc-in/
-rw-r--r--   1 hadoop supergroup          8 2017-03-22 16:27 /user/hadoop/wc-in/a.txt
-rw-r--r--   1 hadoop supergroup         11 2017-03-22 16:27 /user/hadoop/wc-in/b.txt
-rw-r--r--   1 hadoop supergroup          8 2017-03-22 17:40 /user/hadoop/wc-in/c.txt
-rw-r--r--   1 hadoop supergroup          3 2017-03-22 17:41 /user/hadoop/wc-in/d.txt
[hadoop@qk wc-in]$ hadoop fs -mv /user/hadoop/wc-in/b.txt /user/hadoop/wc-in1/
[hadoop@qk wc-in]$ hadoop fs -ls -R /user/hadoop/wc-in/
-rw-r--r--   1 hadoop supergroup          8 2017-03-22 16:27 /user/hadoop/wc-in/a.txt
-rw-r--r--   1 hadoop supergroup          8 2017-03-22 17:40 /user/hadoop/wc-in/c.txt
-rw-r--r--   1 hadoop supergroup          3 2017-03-22 17:41 /user/hadoop/wc-in/d.txt
[hadoop@qk wc-in]$ hadoop fs -ls -R /user/hadoop/wc-in1/
-rw-r--r--   1 hadoop supergroup          8 2017-03-22 17:44 /user/hadoop/wc-in1/a.txt
-rw-r--r--   1 hadoop supergroup         11 2017-03-22 16:27 /user/hadoop/wc-in1/b.txt
Count the directories, files, and total bytes under an HDFS path: -count (the output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME)
[hadoop@qk wc-in]$ hadoop fs -ls /user/hadoop/wc-in1/
Found 2 items
-rw-r--r--   1 hadoop supergroup          8 2017-03-22 17:44 /user/hadoop/wc-in1/a.txt
-rw-r--r--   1 hadoop supergroup         11 2017-03-22 16:27 /user/hadoop/wc-in1/b.txt
[hadoop@qk wc-in]$ hadoop fs -count /user/hadoop/wc-in1/
           1            2                 19 /user/hadoop/wc-in1
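With -q, count additionally reports the name quota and space quota columns ahead of the counts; a hedged sketch:
hadoop fs -count -q /user/hadoop/wc-in1/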
Show the size of each file and directory under an HDFS path: -du (-s prints a single aggregate total; -h prints sizes in human-readable units)
[hadoop@qk wc-in]$ hadoop fs -du /user/hadoop/wc-in1/
8   /user/hadoop/wc-in1/a.txt
11  /user/hadoop/wc-in1/b.txt
[hadoop@qk wc-in]$ hadoop fs -du -s /user/hadoop/wc-in1/
19  /user/hadoop/wc-in1
[hadoop@qk wc-in]$ hadoop fs -du -h /user/hadoop/wc-in1/
8   /user/hadoop/wc-in1/a.txt
11  /user/hadoop/wc-in1/b.txt
[hadoop@qk wc-in]$
Show status information for a path: -stat (format options: %b file size in bytes, %o block size, %n file name, %r replication factor, %y last modification date and time)
[hadoop@qk wc-in]$ hadoop fs -stat %b /user/hadoop/wc-in1/a.txt
8
[hadoop@qk wc-in]$ hadoop fs -stat %o /user/hadoop/wc-in1/a.txt
134217728
[hadoop@qk wc-in]$ hadoop fs -stat %n /user/hadoop/wc-in1/a.txt
a.txt
[hadoop@qk wc-in]$ hadoop fs -stat %r /user/hadoop/wc-in1/a.txt
1
[hadoop@qk wc-in]$ hadoop fs -stat %y /user/hadoop/wc-in1/a.txt
2017-03-22 09:44:41
Copy files or directories from HDFS to the local file system: -get (it fails if the source path does not exist on HDFS, or if a file with the same name already exists locally; both cases appear below)
[hadoop@qk wc-in]$ hadoop fs -get /user/hadoop/wc-in1/a.txt /home/hadoop/wc-in/
[hadoop@qk wc-in]$ ls -ltr
total 4
-rw-r--r--. 1 hadoop hadoop 8 Mar 22 18:02 a.txt
[hadoop@qk wc-in]$ hadoop fs -get /user/hadoop/wc-in/b.txt /home/hadoop/wc-in/
get: `/user/hadoop/wc-in/b.txt': No such file or directory
[hadoop@qk wc-in]$ hadoop fs -get /user/hadoop/wc-in1/a.txt /home/hadoop/wc-in/
get: `/home/hadoop/wc-in/a.txt': File exists
Change the replication factor of a file in HDFS: -setrep (-R applies the change recursively to a directory)
[hadoop@qk wc-in]$ hadoop fs -stat %r /user/hadoop/wc-in1/b.txt
1
[hadoop@qk wc-in]$ hadoop fs -setrep 2 /user/hadoop/wc-in1/b.txt
Replication 2 set: /user/hadoop/wc-in1/b.txt
[hadoop@qk wc-in]$ hadoop fs -stat %r /user/hadoop/wc-in1/b.txt
2
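-setrep also accepts -w to block until the requested replication is actually reached, which can take a while on a busy cluster; a hedged sketch:
hadoop fs -setrep -w 2 /user/hadoop/wc-in1/b.txt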
All the input files for a MapReduce run are now in place, so let's try one of the bundled example programs:
[hadoop@qk wc-in]$ hadoop fs -ls -R /user/hadoop/wc-in/
-rw-r--r--? 1 hadoop supergroup? ? ? ? ? 8 2017-03-22 16:27 /user/hadoop/wc-in/a.txt
-rw-r--r--? 1 hadoop supergroup? ? ? ? 11 2017-03-22 16:27 /user/hadoop/wc-in/b.txt
-rw-r--r--? 1 hadoop supergroup? ? ? ? ? 8 2017-03-22 17:40 /user/hadoop/wc-in/c.txt
-rw-r--r--? 1 hadoop supergroup? ? ? ? ? 3 2017-03-22 17:41 /user/hadoop/wc-in/d.txt
[hadoop@qk wc-in]$ hadoop jar $HADOOP_HOME/hadoop-mapreduce-examples-2.6.0.jar wordcount wc-in wc-out
17/03/22 18:25:46 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/03/22 18:25:46 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/03/22 18:25:47 INFO input.FileInputFormat: Total input paths to process : 4
17/03/22 18:25:47 INFO mapreduce.JobSubmitter: number of splits:4
17/03/22 18:25:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local2007528989_0001
17/03/22 18:25:48 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/03/22 18:25:48 INFO mapreduce.Job: Running job: job_local2007528989_0001
17/03/22 18:25:48 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/03/22 18:25:48 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
17/03/22 18:25:48 INFO mapred.LocalJobRunner: Waiting for map tasks
17/03/22 18:25:48 INFO mapred.LocalJobRunner: Starting task: attempt_local2007528989_0001_m_000000_0
17/03/22 18:25:48 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
17/03/22 18:25:48 INFO mapred.MapTask: Processing split: hdfs://192.168.1.230:9000/user/hadoop/wc-in/b.txt:0+11
17/03/22 18:25:49 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
17/03/22 18:25:49 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
17/03/22 18:25:49 INFO mapred.MapTask: soft limit at 83886080
17/03/22 18:25:49 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
17/03/22 18:25:49 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
17/03/22 18:25:49 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/03/22 18:25:49 INFO mapreduce.Job: Job job_local2007528989_0001 running in uber mode : false
17/03/22 18:25:49 INFO mapreduce.Job:  map 0% reduce 0%
17/03/22 18:25:49 INFO mapred.LocalJobRunner:
17/03/22 18:25:49 INFO mapred.MapTask: Starting flush of map output
17/03/22 18:25:49 INFO mapred.MapTask: Spilling map output
17/03/22 18:25:49 INFO mapred.MapTask: bufstart = 0; bufend = 22; bufvoid = 104857600
17/03/22 18:25:49 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214388(104857552); length = 9/6553600
17/03/22 18:25:49 INFO mapred.MapTask: Finished spill 0
17/03/22 18:25:49 INFO mapred.Task: Task:attempt_local2007528989_0001_m_000000_0 is done. And is in the process of committing
17/03/22 18:25:49 INFO mapred.LocalJobRunner: map
17/03/22 18:25:49 INFO mapred.Task: Task 'attempt_local2007528989_0001_m_000000_0' done.
17/03/22 18:25:49 INFO mapred.LocalJobRunner: Finishing task: attempt_local2007528989_0001_m_000000_0
17/03/22 18:25:49 INFO mapred.LocalJobRunner: Starting task: attempt_local2007528989_0001_m_000001_0
17/03/22 18:25:49 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
17/03/22 18:25:49 INFO mapred.MapTask: Processing split: hdfs://192.168.1.230:9000/user/hadoop/wc-in/a.txt:0+8
17/03/22 18:25:49 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
17/03/22 18:25:49 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
17/03/22 18:25:49 INFO mapred.MapTask: soft limit at 83886080
17/03/22 18:25:49 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
17/03/22 18:25:49 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
17/03/22 18:25:49 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/03/22 18:25:49 INFO mapred.LocalJobRunner:
17/03/22 18:25:49 INFO mapred.MapTask: Starting flush of map output
17/03/22 18:25:49 INFO mapred.MapTask: Spilling map output
17/03/22 18:25:49 INFO mapred.MapTask: bufstart = 0; bufend = 16; bufvoid = 104857600
17/03/22 18:25:49 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214392(104857568); length = 5/6553600
17/03/22 18:25:49 INFO mapred.MapTask: Finished spill 0
17/03/22 18:25:49 INFO mapred.Task: Task:attempt_local2007528989_0001_m_000001_0 is done. And is in the process of committing
17/03/22 18:25:49 INFO mapred.LocalJobRunner: map
17/03/22 18:25:49 INFO mapred.Task: Task 'attempt_local2007528989_0001_m_000001_0' done.
17/03/22 18:25:49 INFO mapred.LocalJobRunner: Finishing task: attempt_local2007528989_0001_m_000001_0
17/03/22 18:25:49 INFO mapred.LocalJobRunner: Starting task: attempt_local2007528989_0001_m_000002_0
17/03/22 18:25:49 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
17/03/22 18:25:49 INFO mapred.MapTask: Processing split: hdfs://192.168.1.230:9000/user/hadoop/wc-in/c.txt:0+8
17/03/22 18:25:49 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
17/03/22 18:25:49 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
17/03/22 18:25:49 INFO mapred.MapTask: soft limit at 83886080
17/03/22 18:25:49 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
17/03/22 18:25:49 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
17/03/22 18:25:49 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/03/22 18:25:49 INFO mapred.LocalJobRunner:
17/03/22 18:25:49 INFO mapred.MapTask: Starting flush of map output
17/03/22 18:25:49 INFO mapred.MapTask: Spilling map output
17/03/22 18:25:49 INFO mapred.MapTask: bufstart = 0; bufend = 12; bufvoid = 104857600
17/03/22 18:25:49 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214396(104857584); length = 1/6553600
17/03/22 18:25:49 INFO mapred.MapTask: Finished spill 0
17/03/22 18:25:49 INFO mapred.Task: Task:attempt_local2007528989_0001_m_000002_0 is done. And is in the process of committing
17/03/22 18:25:49 INFO mapred.LocalJobRunner: map
17/03/22 18:25:49 INFO mapred.Task: Task 'attempt_local2007528989_0001_m_000002_0' done.
17/03/22 18:25:49 INFO mapred.LocalJobRunner: Finishing task: attempt_local2007528989_0001_m_000002_0
17/03/22 18:25:49 INFO mapred.LocalJobRunner: Starting task: attempt_local2007528989_0001_m_000003_0
17/03/22 18:25:49 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
17/03/22 18:25:49 INFO mapred.MapTask: Processing split: hdfs://192.168.1.230:9000/user/hadoop/wc-in/d.txt:0+3
17/03/22 18:25:49 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
17/03/22 18:25:49 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
17/03/22 18:25:49 INFO mapred.MapTask: soft limit at 83886080
17/03/22 18:25:49 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
17/03/22 18:25:49 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
17/03/22 18:25:49 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/03/22 18:25:49 INFO mapred.LocalJobRunner:
17/03/22 18:25:49 INFO mapred.MapTask: Starting flush of map output
17/03/22 18:25:49 INFO mapred.MapTask: Spilling map output
17/03/22 18:25:49 INFO mapred.MapTask: bufstart = 0; bufend = 7; bufvoid = 104857600
17/03/22 18:25:49 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214396(104857584); length = 1/6553600
17/03/22 18:25:49 INFO mapred.MapTask: Finished spill 0
17/03/22 18:25:49 INFO mapred.Task: Task:attempt_local2007528989_0001_m_000003_0 is done. And is in the process of committing
17/03/22 18:25:49 INFO mapred.LocalJobRunner: map
17/03/22 18:25:49 INFO mapred.Task: Task 'attempt_local2007528989_0001_m_000003_0' done.
17/03/22 18:25:49 INFO mapred.LocalJobRunner: Finishing task: attempt_local2007528989_0001_m_000003_0
17/03/22 18:25:49 INFO mapred.LocalJobRunner: map task executor complete.
17/03/22 18:25:49 INFO mapred.LocalJobRunner: Waiting for reduce tasks
17/03/22 18:25:49 INFO mapred.LocalJobRunner: Starting task: attempt_local2007528989_0001_r_000000_0
17/03/22 18:25:49 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
17/03/22 18:25:49 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@3c348ac4
17/03/22 18:25:49 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
17/03/22 18:25:49 INFO reduce.EventFetcher: attempt_local2007528989_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
17/03/22 18:25:49 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2007528989_0001_m_000000_0 decomp: 21 len: 25 to MEMORY
17/03/22 18:25:49 INFO reduce.InMemoryMapOutput: Read 21 bytes from map-output for attempt_local2007528989_0001_m_000000_0
17/03/22 18:25:49 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 21, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->21
17/03/22 18:25:49 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2007528989_0001_m_000003_0 decomp: 11 len: 15 to MEMORY
17/03/22 18:25:49 INFO reduce.InMemoryMapOutput: Read 11 bytes from map-output for attempt_local2007528989_0001_m_000003_0
17/03/22 18:25:49 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 11, inMemoryMapOutputs.size() -> 2, commitMemory -> 21, usedMemory ->32
17/03/22 18:25:49 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2007528989_0001_m_000001_0 decomp: 12 len: 16 to MEMORY
17/03/22 18:25:49 INFO reduce.InMemoryMapOutput: Read 12 bytes from map-output for attempt_local2007528989_0001_m_000001_0
17/03/22 18:25:49 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 12, inMemoryMapOutputs.size() -> 3, commitMemory -> 32, usedMemory ->44
17/03/22 18:25:49 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2007528989_0001_m_000002_0 decomp: 16 len: 20 to MEMORY
17/03/22 18:25:49 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
17/03/22 18:25:49 INFO reduce.InMemoryMapOutput: Read 16 bytes from map-output for attempt_local2007528989_0001_m_000002_0
17/03/22 18:25:49 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 16, inMemoryMapOutputs.size() -> 4, commitMemory -> 44, usedMemory ->60
17/03/22 18:25:49 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
17/03/22 18:25:49 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
17/03/22 18:25:49 INFO mapred.LocalJobRunner: 4 / 4 copied.
17/03/22 18:25:49 INFO reduce.MergeManagerImpl: finalMerge called with 4 in-memory map-outputs and 0 on-disk map-outputs
17/03/22 18:25:49 INFO mapred.Merger: Merging 4 sorted segments
17/03/22 18:25:49 INFO mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 33 bytes
17/03/22 18:25:49 INFO reduce.MergeManagerImpl: Merged 4 segments, 60 bytes to disk to satisfy reduce memory limit
17/03/22 18:25:49 INFO reduce.MergeManagerImpl: Merging 1 files, 58 bytes from disk
17/03/22 18:25:49 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
17/03/22 18:25:49 INFO mapred.Merger: Merging 1 sorted segments
17/03/22 18:25:49 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 44 bytes
17/03/22 18:25:49 INFO mapred.LocalJobRunner: 4 / 4 copied.
17/03/22 18:25:49 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
17/03/22 18:25:50 INFO mapred.Task: Task:attempt_local2007528989_0001_r_000000_0 is done. And is in the process of committing
17/03/22 18:25:50 INFO mapred.LocalJobRunner: 4 / 4 copied.
17/03/22 18:25:50 INFO mapred.Task: Task attempt_local2007528989_0001_r_000000_0 is allowed to commit now
17/03/22 18:25:50 INFO output.FileOutputCommitter: Saved output of task 'attempt_local2007528989_0001_r_000000_0' to hdfs://192.168.1.230:9000/user/hadoop/wc-out/_temporary/0/task_local2007528989_0001_r_000000
17/03/22 18:25:50 INFO mapred.LocalJobRunner: reduce > reduce
17/03/22 18:25:50 INFO mapred.Task: Task 'attempt_local2007528989_0001_r_000000_0' done.
17/03/22 18:25:50 INFO mapred.LocalJobRunner: Finishing task: attempt_local2007528989_0001_r_000000_0
17/03/22 18:25:50 INFO mapred.LocalJobRunner: reduce task executor complete.
17/03/22 18:25:50 INFO mapreduce.Job:  map 100% reduce 100%
17/03/22 18:25:51 INFO mapreduce.Job: Job job_local2007528989_0001 completed successfully
17/03/22 18:25:51 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=1358767
FILE: Number of bytes written=2623735
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=117
HDFS: Number of bytes written=26
HDFS: Number of read operations=51
HDFS: Number of large read operations=0
HDFS: Number of write operations=7
Map-Reduce Framework
Map input records=4
Map output records=7
Map output bytes=57
Map output materialized bytes=76
Input split bytes=456
Combine input records=7
Combine output records=5
Reduce input groups=4
Reduce shuffle bytes=76
Reduce input records=5
Reduce output records=4
Spilled Records=10
Shuffled Maps =4
Failed Shuffles=0
Merged Map outputs=4
GC time elapsed (ms)=222
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=931340288
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=30
File Output Format Counters
Bytes Written=26
[hadoop@qk wc-in]$ hadoop fs -cat /user/hadoop/wc-out/part-r-00000
arsenal 1
bla 3
fc 1
wa 2
[hadoop@qk wc-in]$
The wordcount program counts how many times each word occurs in its input. In the result above, the counts are aggregated over all four input files (a.txt through d.txt), not reported per file.
... to be continued