Learning Hadoop Together: Partial Sorting in Hadoop with a Custom Partition Class - summer哥 - 博客園
Source: https://www.cnblogs.com/airnew/p/9574309.html
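Sorting is needed in many business scenarios. This article shows how to implement partial sorting in Hadoop with a custom Partition class. As before, the sorting code is written in both Java and Python.
1. Partial sorting.
Partial sorting means that the data inside each output file is sorted, independently of every other file. Many business scenarios need only a partial sort and not a global one. For example, a fruit e-commerce site wants to rank fruit sales for each month. We can split the reduce output into 12 files, one for each month from January to December. Each file is sorted by sales from high to low, and the ordering for January has nothing to do with the ordering for any other month.
The raw data is shown below. It has three fields: the first is the fruit name, the second is the sales month, and the third is the sales volume.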
Apple 201701 20
Pear 201701 30
Banana 201701 40
Orange 201701 90
Apple 201702 50
Pear 201702 60
Banana 201702 20
Orange 201702 10
Apple 201703 230
Pear 201703 302
Banana 201703 140
Orange 201703 290
Apple 201704 30
Pear 201704 102
Banana 201704 240
Orange 201704 190
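After partial sorting, 12 files are generated. The content of a file looks like the following, with sales sorted from high to low (the rows shown are those for 201703):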
Pear 302
Orange 290
Apple 230
Banana 140
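To make the difference between a partial sort and a global sort concrete, here is a minimal sketch in plain Python that does not use Hadoop at all. The names `partial_sort`, `global_sort` and the in-memory `records` list are assumptions made for illustration; only the sample rows come from the data above.

```python
from collections import defaultdict

# A few rows from the sample data: (fruit, year-month, sales).
records = [
    ("Apple", "201701", 20), ("Pear", "201701", 30),
    ("Banana", "201701", 40), ("Orange", "201701", 90),
    ("Apple", "201703", 230), ("Pear", "201703", 302),
    ("Banana", "201703", 140), ("Orange", "201703", 290),
]


def partial_sort(rows):
    """Group rows by month, then sort each group by sales, high to low.

    Hypothetical helper: each group stands in for one reducer output file.
    """
    groups = defaultdict(list)
    for fruit, yearmm, sales in rows:
        groups[yearmm].append((fruit, sales))
    return {
        yearmm: sorted(items, key=lambda item: item[1], reverse=True)
        for yearmm, items in groups.items()
    }


def global_sort(rows):
    """Sort all rows together by sales, ignoring the month."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for yearmm, items in sorted(partial_sort(records).items()):
        print(yearmm, items)
    print([fruit for fruit, _, _ in global_sort(records)])
```

In the partial sort each month keeps its own ranking, whereas the global sort interleaves rows from different months.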
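Implementation approach:
1. Define a custom Partition class. A year has 12 months, so 12 partitions are needed. The Partition class and the number of partitions must also be specified in the MapReduce entry class.
2. In the map function, use the year-month as the key and emit the value in the form "Apple_20".
3. In the reduce function, compare the sales of each fruit and sort from high to low.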
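The three steps above can be tried out locally before any Hadoop code is written. The following is a rough Python simulation under stated assumptions: `map_record`, `get_partition`, `reduce_values` and `run_job` are hypothetical helper names, and a nested dictionary stands in for the 12 output files.

```python
from collections import defaultdict

NUM_PARTITIONS = 12  # one partition per month, as in step 1


def map_record(line):
    """Step 2: emit (year-month, "name_sales") for one input line."""
    fruit, yearmm, sales = line.split(" ")
    return yearmm, fruit + "_" + sales


def get_partition(yearmm, num_partitions=NUM_PARTITIONS):
    """Step 1: choose a partition from the month part of the key."""
    month = int(yearmm[4:])
    return month % num_partitions


def reduce_values(values):
    """Step 3: sort the "name_sales" values by sales, high to low."""
    pairs = [value.split("_") for value in values]
    pairs.sort(key=lambda pair: float(pair[1]), reverse=True)
    return [(name, float(sales)) for name, sales in pairs]


def run_job(lines):
    """Simulate the job locally: one dict entry per partition (output file)."""
    partitions = defaultdict(lambda: defaultdict(list))
    for line in lines:
        key, value = map_record(line)
        partitions[get_partition(key)][key].append(value)
    return {
        part: {key: reduce_values(vals) for key, vals in keys.items()}
        for part, keys in partitions.items()
    }


if __name__ == "__main__":
    sample = ["Apple 201703 230", "Pear 201703 302",
              "Banana 201703 140", "Orange 201703 290",
              "Apple 201702 50", "Pear 201702 60"]
    for part, keys in sorted(run_job(sample).items()):
        print(part, keys)
```

The simulation keeps the same division of labour as the real job: the key decides the partition, and the sorting happens only inside each reduce call.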
The Java code is as follows. Map class:
<pre>import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PartSortMap extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Read one line of input, e.g. "Apple 201701 30".
        String[] str = value.toString().split(" ");
        // The year-month is the key because the partition is derived from it;
        // the value is name + "_" + sales, e.g. "Apple_30".
        context.write(new Text(str[1]), new Text(str[0] + "_" + str[2]));
    }
}</pre>
Custom Partition class:
<pre>import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class PartParttition extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        // The key is a year-month such as "201703"; everything from index 4 on is the month.
        String yearMonth = key.toString();
        int month = Integer.parseInt(yearMonth.substring(4));
        if (month >= 1 && month <= 12) {
            // With 12 reduce tasks, months 1-11 go to partitions 1-11 and December to partition 0.
            return month % numPartitions;
        }
        // Anything that is not a valid month falls back to partition 0.
        return 0;
    }
}</pre>
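To see which partition each month ends up in, the partition rule can be tabulated outside Hadoop. The sketch below mirrors the `getPartition` logic in Python; `partition_table` and the default year "2017" are assumptions introduced only for this illustration.

```python
def get_partition(yearmm, num_partitions):
    """Mirror of the Java getPartition logic for a key such as "201703"."""
    month = int(yearmm[4:])
    if 1 <= month <= 12:
        return month % num_partitions
    return 0


def partition_table(num_partitions, year="2017"):
    """Map each month 1-12 to the partition it would land in."""
    return {
        month: get_partition("%s%02d" % (year, month), num_partitions)
        for month in range(1, 13)
    }


if __name__ == "__main__":
    print(partition_table(12))  # every month in its own partition
    print(partition_table(4))   # several months share a partition
```

With 12 reduce tasks every month gets its own partition, with December in partition 0. With fewer reduce tasks several months share one output file, although each month is still sorted on its own because the reduce function is called once per key.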
Reduce class:
<pre>import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class PartSortReduce extends Reducer<Text, Text, Text, Text> {

    static class FruitSales implements Comparable<FruitSales> {
        private String name;   // fruit name
        private double sales;  // sales volume

        public void setName(String name) { this.name = name; }
        public String getName() { return this.name; }
        public void setSales(double sales) { this.sales = sales; }
        public double getSales() { return this.sales; }

        @Override
        public int compareTo(FruitSales o) {
            // Higher sales sort first (descending order).
            return Double.compare(o.getSales(), this.getSales());
        }
    }

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        List<FruitSales> fruitList = new ArrayList<FruitSales>();

        // Each value has the form "name_sales", e.g. "Apple_30".
        for (Text value : values) {
            String[] str = value.toString().split("_");
            FruitSales f = new FruitSales();
            f.setName(str[0]);
            f.setSales(Double.parseDouble(str[1]));
            fruitList.add(f);
        }
        Collections.sort(fruitList);

        for (FruitSales f : fruitList) {
            context.write(new Text(f.getName()), new Text(String.valueOf(f.getSales())));
        }
    }
}</pre>
Entry class:
<pre>import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class PartSortMain {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Parse the runtime arguments, which are usually passed in from a shell script.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("An input path and an output path are required");
            System.exit(2);
        }
        // Build the job from conf so that the parsed generic options take effect.
        Job job = Job.getInstance(conf, "PartSort app");
        job.setJarByClass(PartSortMain.class);

        // Input path; the file is read from HDFS.
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        // Output path; MapReduce results are written to files.
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        job.setPartitionerClass(PartParttition.class); // custom Partition class
        job.setNumReduceTasks(12);                     // one reduce task per partition

        job.setMapperClass(PartSortMap.class);         // class implementing map
        job.setReducerClass(PartSortReduce.class);     // class implementing reduce

        // Key and value types of the reduce output.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}</pre>
After the job runs, 12 files are generated in HDFS, as shown in the figure below:
Opening one of the files shows the following content:
You can see that it is sorted by sales from high to low.
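Implementing partial sorting in Python.
Python implements MapReduce through Hadoop Streaming. Unlike the Java approach, you cannot define a custom Partition class, but the launch script can specify which field is used for partitioning and which field is used for sorting.
The figure below shows how the data changes during the partial sort: from the raw data, through the map function, on to the reduce function, and finally sorted by sales from high to low in each file:
In the first step of the figure, the map function converts the "year-month" in the second column of the raw data into a "month" that serves as the partition field, uses the sales figure as the key, and uses the fruit name as the value. The sales figure is not emitted directly: the map function emits 99999 minus the sales, so that the ascending sort done by MapReduce puts the highest sales first. The second step is the result after the MapReduce sort, just before the data reaches the reduce function. In the third step, the reduce function swaps the two: the key from the map output becomes the reduce value (converted back to the real sales figure), and the value from the map output becomes the reduce key.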
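The key inversion is easy to check in isolation. Here is a small sketch, assuming the same constant 99999 that the scripts use; `invert`, `restore` and `sort_descending` are hypothetical helpers, and the keys are compared as strings because Streaming passes keys around as text.

```python
BASE_NUMBER = 99999  # the constant used by map_sort.py and reduce_sort.py


def invert(sales):
    """Turn a sales figure into a key that sorts in the opposite order."""
    return BASE_NUMBER - sales


def restore(key):
    """Recover the original sales figure from an inverted key."""
    return BASE_NUMBER - key


def sort_descending(sales_by_fruit):
    """Sort ascending on the inverted key as text, then restore the sales."""
    rows = sorted((str(invert(sales)), fruit)
                  for fruit, sales in sales_by_fruit.items())
    return [(fruit, restore(int(key))) for key, fruit in rows]


if __name__ == "__main__":
    march = {"Apple": 230, "Pear": 302, "Banana": 140, "Orange": 290}
    for fruit, sales in sort_descending(march):
        print(fruit, sales)
```

Because the comparison is textual, the trick is only reliable while every inverted key has the same number of digits, which with this constant means sales below 90000. Larger figures would need a larger constant or zero-padded keys.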
The code is as follows:
map_sort.py
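<pre>#!/usr/bin/python
import sys

# Subtracting sales from a large constant makes an ascending sort on the
# new key equivalent to a descending sort on sales.
base_number = 99999

for line in sys.stdin:
    ss = line.strip().split(' ')
    fruit = ss[0]
    yearmm = ss[1]
    sales = ss[2]
    new_key = base_number - int(sales)
    mm = yearmm[4:6]
    # Output: month (partition field), inverted sales (sort field), fruit name.
    print("%s\t%s\t%s" % (int(mm), int(new_key), fruit))</pre>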
reduce_sort.py
<pre>#!/usr/bin/python
import sys

base_number = 99999

for line in sys.stdin:
    idx_id, sales, fruit = line.strip().split('\t')
    # Undo the inversion applied in map_sort.py to recover the real sales figure.
    new_key = base_number - int(sales)
    print('\t'.join([fruit, str(new_key)]))</pre>
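Before submitting the job, the map and reduce logic can be checked together on a few lines of input. The sketch below is only an approximation of what Streaming does: `map_line`, `reduce_line` and `run_locally` are hypothetical names, and Python's `sorted` on whole lines stands in for the shuffle, which in the real job sorts on the key fields only.

```python
BASE_NUMBER = 99999


def map_line(line):
    """Same transformation as map_sort.py, returning the output line."""
    fruit, yearmm, sales = line.strip().split(" ")
    return "%s\t%s\t%s" % (int(yearmm[4:6]), BASE_NUMBER - int(sales), fruit)


def reduce_line(line):
    """Same transformation as reduce_sort.py, returning the output line."""
    _month, inverted, fruit = line.strip().split("\t")
    return "\t".join([fruit, str(BASE_NUMBER - int(inverted))])


def run_locally(lines):
    """Map every line, sort the map output as text, then reduce every line."""
    mapped = sorted(map_line(line) for line in lines)
    return [reduce_line(line) for line in mapped]


if __name__ == "__main__":
    sample = ["Apple 201703 230", "Pear 201703 302",
              "Banana 201703 140", "Orange 201703 290"]
    for out in run_locally(sample):
        print(out)
```

Feeding it the four 201703 rows gives the fruits ordered by sales from high to low, which matches the sample output shown earlier.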
The launch script is as follows:
run.sh
<pre>set -e -x
HADOOP_CMD="/usr/local/src/hadoop-2.6.1/bin/hadoop"
STREAM_JAR_PATH="/usr/local/src/hadoop-2.6.1/share/hadoop/tools/lib/hadoop-streaming-2.6.1.jar"
INPUT_FILE_PATH_A="/data/fruit.txt"
OUTPUT_SORT_PATH="/output_sort"

# Assumed cleanup step: remove any previous output, since the job fails if the output path exists.
$HADOOP_CMD fs -rm -r -f $OUTPUT_SORT_PATH

$HADOOP_CMD jar $STREAM_JAR_PATH \
    -input $INPUT_FILE_PATH_A \
    -output $OUTPUT_SORT_PATH \
    -mapper "python map_sort.py" \
    -reducer "python reduce_sort.py" \
    -file ./map_sort.py \
    -file ./reduce_sort.py \
    -jobconf mapred.reduce.tasks=12 \
    -jobconf stream.num.map.output.key.fields=2 \
    -jobconf num.key.fields.for.partition=1 \
    -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner</pre>
-jobconf stream.num.map.output.key.fields=2 determines the fields used for sorting. The number 2 means that the first two columns of the map output form the key, in this example the month and the inverted sales field, and the records are sorted on that key.
-jobconf num.key.fields.for.partition=1 determines the partition field. The number 1 means that the first column of the map output is used for partitioning.
-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner selects Hadoop's KeyFieldBasedPartitioner class, which performs the partitioning on the chosen key field.
All three of these lines are required when implementing partitioning with Streaming.
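The way these options divide a map output line can be pictured with a short sketch. This is only a model of the field split, not Hadoop's implementation: `split_map_output` and its parameter names are hypothetical, and the hashing that turns the partition field into a reducer number is deliberately left out.

```python
def split_map_output(line, num_key_fields=2, num_partition_fields=1, sep="\t"):
    """Split one map output line the way the three options describe.

    Returns (partition_fields, key, value):
    - key: the first `num_key_fields` fields, used for sorting
    - partition_fields: the first `num_partition_fields` fields of the key
    - value: everything after the key
    """
    fields = line.rstrip("\n").split(sep)
    key = fields[:num_key_fields]
    value = fields[num_key_fields:]
    partition_fields = key[:num_partition_fields]
    return sep.join(partition_fields), sep.join(key), sep.join(value)


if __name__ == "__main__":
    # One line as produced by map_sort.py: month, inverted sales, fruit.
    print(split_map_output("3\t99697\tPear"))
```

For the line above, the month alone decides the partition, the month together with the inverted sales decides the order inside the partition, and the fruit name travels along as the value.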
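Summary:
Partial sorting in Hadoop is implemented mainly through partitioning.
In Java, partitioning is done with a custom Partition class. With Streaming, it is done with the KeyFieldBasedPartitioner class, which is specified as the partitioner in the launch script.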