Learning Hadoop Together: Partial Sorting in Hadoop with a Custom Partitioner


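Sorting comes up in many business scenarios. This post shows how to implement partial sorting in Hadoop with a custom Partitioner class. As in earlier posts, the sorting code is implemented in both Java and Python.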

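1. Partial sorting

Partial sorting means that the records inside each output file are sorted, independently of every other file. Many business scenarios need only a partial sort rather than a global one. For example, a fruit e-commerce site wants to rank fruit sales for each month. We can split the reduce output into 12 files, one for each month from January to December. Each file is sorted by sales from high to low, and January's ordering has nothing to do with any other month's.

The raw data looks like this. It has three fields: the fruit name, the sales month, and the sales volume.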

Apple 201701 20

Pear 201701 30

Banana 201701 40

Orange 201701 90

Apple 201702 50

Pear 201702 60

Banana 201702 20

Orange 201702 10

Apple 201703 230

Pear 201703 302

Banana 201703 140

Orange 201703 290

Apple 201704 30

Pear 201704 102

Banana 201704 240

Orange 201704 190

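After partial sorting, 12 files are produced. The contents of one of them (here, the 201703 data) look like this, sorted by sales from high to low: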

Pear 302

Orange 290

Apple 230

Banana 140

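Implementation approach:

1. Define a custom Partitioner class. A year has 12 months, so 12 partitions are needed. The driver class of the MapReduce job must also specify the Partitioner class and the number of partitions.

2. In the map function, use the year-month as the key and rewrite the value in the form "Apple_20".

3. In the reduce function, compare the sales of each fruit and sort them from high to low.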
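The three steps above can be tried out locally before involving Hadoop at all. The following is a minimal sketch in plain Python, not the MapReduce job itself: the name `partial_sort` and the in-memory `ROWS` list are assumptions made for illustration, and the rows are a subset of the sample data above.

```python
from collections import defaultdict

# A subset of the sample rows above: (fruit, year-month, sales).
ROWS = [
    ("Apple", "201701", 20), ("Pear", "201701", 30),
    ("Banana", "201701", 40), ("Orange", "201701", 90),
    ("Apple", "201703", 230), ("Pear", "201703", 302),
    ("Banana", "201703", 140), ("Orange", "201703", 290),
]


def partial_sort(rows):
    """Group rows by year-month, then sort each group by sales, descending."""
    groups = defaultdict(list)
    for fruit, yearmm, sales in rows:
        # "map" step: the year-month becomes the key
        groups[yearmm].append((fruit, sales))
    # "reduce" step: every group is sorted on its own
    return {
        yearmm: sorted(items, key=lambda item: item[1], reverse=True)
        for yearmm, items in groups.items()
    }


result = partial_sort(ROWS)
for yearmm in sorted(result):
    print(yearmm, result[yearmm])
```

Each month's list is ordered independently, which is exactly the property that distinguishes a partial sort from a global one.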

The Java code follows. The Map class:

<pre>import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PartSortMap extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Read one line of input, e.g. "Apple 201701 30"
        String line = value.toString();
        String[] str = line.split(" ");
        // The year-month is the key because partitioning is based on the key;
        // the value is "<fruit>_<sales>", e.g. "Apple_30"
        context.write(new Text(str[1]), new Text(str[0] + "_" + str[2]));
    }
}</pre>

The custom Partitioner class:

<pre>import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class PartParttition extends Partitioner<Text, Text> {

    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        // The key is a year-month such as "201701"; the month starts at index 4
        int month = Integer.parseInt(key.toString().substring(4));
        if (month < 1 || month > 12) {
            return 0; // unexpected month: fall back to partition 0
        }
        // With 12 partitions, months 1-11 map to partitions 1-11 and month 12 maps to 0
        return month % numPartitions;
    }
}</pre>
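To see which partition each month ends up in, the same arithmetic can be checked outside Hadoop. This is a small Python sketch of the mapping only; `get_partition` is a hypothetical helper that mirrors the Java logic and is not part of any Hadoop API.

```python
def get_partition(yearmm, num_partitions):
    """Mirror the Java logic: month number modulo the partition count."""
    month = int(yearmm[4:])
    if not 1 <= month <= 12:
        return 0  # same fallback as the Java code
    return month % num_partitions


for yearmm in ("201701", "201706", "201712"):
    print(yearmm, "->", get_partition(yearmm, 12))

# All 12 months should land in 12 distinct partitions.
partitions = {get_partition("2017%02d" % m, 12) for m in range(1, 13)}
print(len(partitions))
```

With 12 partitions, January maps to partition 1, June to partition 6, and December wraps around to partition 0, so every month still gets a partition of its own.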

The Reduce class:

<pre>import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class PartSortReduce extends Reducer<Text, Text, Text, Text> {

    // One fruit and its sales figure; sorts by sales, highest first
    static class FruitSales implements Comparable<FruitSales> {
        private String name;   // fruit name
        private double sales;  // sales figure

        public void setName(String name) { this.name = name; }
        public String getName() { return this.name; }
        public void setSales(double sales) { this.sales = sales; }
        public double getSales() { return this.sales; }

        @Override
        public int compareTo(FruitSales o) {
            // Reversed argument order gives descending order
            return Double.compare(o.getSales(), this.getSales());
        }
    }

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        List<FruitSales> fruitList = new ArrayList<FruitSales>();

        // Each value looks like "Apple_30": fruit name, underscore, sales
        for (Text value : values) {
            String[] str = value.toString().split("_");
            FruitSales f = new FruitSales();
            f.setName(str[0]);
            f.setSales(Double.parseDouble(str[1]));
            fruitList.add(f);
        }
        Collections.sort(fruitList);

        for (FruitSales f : fruitList) {
            context.write(new Text(f.getName()), new Text(String.valueOf(f.getSales())));
        }
    }
}</pre>

The driver class:

<pre>import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class PartSortMain {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Parse the runtime arguments, usually passed in from a shell script
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("An input path and an output path are required");
            System.exit(2);
        }
        // Build the job from conf so that the parsed generic options take effect
        Job job = Job.getInstance(conf, "PartSort app");
        job.setJarByClass(PartSortMain.class);

        // Input path; the data is read from HDFS
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        // Output path; the MapReduce results are written to files here
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        job.setPartitionerClass(PartParttition.class); // custom partitioner
        job.setNumReduceTasks(12);                     // one reduce task per partition

        job.setMapperClass(PartSortMap.class);     // class that implements map
        job.setReducerClass(PartSortReduce.class); // class that implements reduce

        // Key and value types of the reduce output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}</pre>

After the job runs, 12 files are generated in HDFS, as shown in the figure below:

[Figure: listing of the 12 output files in HDFS]

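Opening one of the files shows content like the following:

[Figure: contents of one output file]

The records are sorted by sales from high to low.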

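Implementing partial sorting in Python.

Python implements MapReduce through Hadoop streaming. Unlike the Java approach, you cannot write a custom Partitioner, but the launch script can specify which field is used for partitioning and which field is used for sorting.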

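The figure below shows how the data changes during the partial sort: from the raw data, through the map function, then to the reduce function, until each file is sorted by sales from high to low:

[Figure: data flow from the raw data through map, sort, and reduce]

In step 1 of the figure, the map function converts the "year-month" in the second column of the raw data into a "month", which serves as the partition field, with the sales figure as the key and the fruit name as the value. Step 2 shows the data after the MapReduce sort, just before it reaches the reduce function. In step 3, the reduce function swaps the two: the key from the map output becomes the reduce value, and the value becomes the reduce key.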

The code is as follows:

map_sort.py

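<pre>#!/usr/bin/python
import sys

base_number = 99999

for line in sys.stdin:
    # Input line: "<fruit> <yearmonth> <sales>", e.g. "Apple 201701 20"
    ss = line.strip().split(' ')
    fruit = ss[0]
    yearmm = ss[1]
    sales = ss[2]
    # Subtracting from base_number inverts the order, so an ascending sort
    # on new_key yields sales from high to low
    new_key = base_number - int(sales)
    mm = yearmm[4:6]
    # Output: month, inverted sales, fruit (tab-separated)
    print("%s\t%s\t%s" % (int(mm), int(new_key), fruit))</pre>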
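The `base_number - sales` step is what turns an ascending sort into a descending one. The sketch below illustrates the idea in isolation, under the assumption that keys are compared as text during the sort; the list of sales figures is the 201703 sample data, and the variable names are chosen for illustration.

```python
BASE_NUMBER = 99999  # same constant as in map_sort.py

sales = [230, 302, 140, 290]  # the 201703 figures from the sample data

# Invert each figure and keep it as text, as it would appear in the map output.
inverted = [str(BASE_NUMBER - s) for s in sales]

# Plain ascending string sort.
inverted.sort()

# Undo the inversion to recover the sales figures.
recovered = [BASE_NUMBER - int(k) for k in inverted]
print(recovered)  # prints [302, 290, 230, 140]
```

String comparison only matches numeric order when the inverted values all have the same number of digits, which with 99999 holds for sales up to 89999. Larger figures would need a bigger constant or zero-padded keys.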

reduce_sort.py

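<pre>#!/usr/bin/python
import sys

base_number = 99999

for line in sys.stdin:
    # Input line: month, inverted sales, fruit (tab-separated)
    idx_id, sales, fruit = line.strip().split('\t')
    # Undo the inversion done in map_sort.py to recover the real sales figure
    new_key = base_number - int(sales)
    print('\t'.join([fruit, str(new_key)]))</pre>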
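Before submitting the job, the two scripts can be exercised together on a few lines of data. The following is a local stand-in for the streaming pipeline, not Hadoop itself: `map_line` and `reduce_line` are hypothetical function versions of the two scripts, and Python's `sorted` on whole lines stands in for the framework's sort on the key (the result is the same here because all keys are distinct).

```python
BASE_NUMBER = 99999

# The 201703 rows from the sample data.
RAW = """Apple 201703 230
Pear 201703 302
Banana 201703 140
Orange 201703 290"""


def map_line(line):
    """Same transformation as map_sort.py: month, inverted sales, fruit."""
    fruit, yearmm, sales = line.split()
    return "%d\t%d\t%s" % (int(yearmm[4:6]), BASE_NUMBER - int(sales), fruit)


def reduce_line(line):
    """Same transformation as reduce_sort.py: fruit, real sales."""
    _month, inverted, fruit = line.split("\t")
    return "%s\t%d" % (fruit, BASE_NUMBER - int(inverted))


mapped = [map_line(line) for line in RAW.splitlines()]
shuffled = sorted(mapped)  # stand-in for the sort between map and reduce
reduced = [reduce_line(line) for line in shuffled]
print("\n".join(reduced))
```

The output lists Pear, Orange, Apple, and Banana in that order, which matches the expected file contents shown earlier.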

The launch script is as follows:

run.sh

<pre>set -e -x
HADOOP_CMD="/usr/local/src/hadoop-2.6.1/bin/hadoop"
STREAM_JAR_PATH="/usr/local/src/hadoop-2.6.1/share/hadoop/tools/lib/hadoop-streaming-2.6.1.jar"
INPUT_FILE_PATH_A="/data/fruit.txt"
OUTPUT_SORT_PATH="/output_sort"

# Remove the output directory left over from a previous run
$HADOOP_CMD fs -rmr -skipTrash $OUTPUT_SORT_PATH

$HADOOP_CMD jar $STREAM_JAR_PATH \
    -input $INPUT_FILE_PATH_A \
    -output $OUTPUT_SORT_PATH \
    -mapper "python map_sort.py" \
    -reducer "python reduce_sort.py" \
    -file ./map_sort.py \
    -file ./reduce_sort.py \
    -jobconf mapred.reduce.tasks=12 \
    -jobconf stream.num.map.output.key.fields=2 \
    -jobconf num.key.fields.for.partition=1 \
    -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner</pre>

-jobconf stream.num.map.output.key.fields=2 tells streaming that the first two tab-separated fields of each map output line form the key, and the key is what the framework sorts on. In this example the key is the month followed by the inverted sales field.

-jobconf num.key.fields.for.partition=1 specifies the partition field: the first field of the map output is used for partitioning.

-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner selects the partitioner class that ships with Hadoop to do the partitioning.

All three lines are required to get partitioning working with streaming.
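How these settings carve up a map output line can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `split_key_value` and `choose_partition` are hypothetical helpers, and the modulo on the month number is a simplified stand-in. The real KeyFieldBasedPartitioner hashes the selected key fields, so the reducer index it picks may differ from the one shown here.

```python
def split_key_value(line, num_key_fields=2):
    """Split a tab-separated map output line into (key, value).

    num_key_fields plays the role of stream.num.map.output.key.fields.
    """
    fields = line.split("\t")
    key = "\t".join(fields[:num_key_fields])
    value = "\t".join(fields[num_key_fields:])
    return key, value


def choose_partition(key, num_partition_fields=1, num_reducers=12):
    """Pick a reducer from the leading key fields.

    Simplified stand-in: uses the month number directly instead of a hash.
    num_partition_fields plays the role of num.key.fields.for.partition.
    """
    partition_part = "\t".join(key.split("\t")[:num_partition_fields])
    return int(partition_part) % num_reducers


key, value = split_key_value("3\t99697\tPear")
print(repr(key), repr(value), choose_partition(key))
```

The point of the two settings is visible in the split: both the month and the inverted sales figure belong to the key and therefore take part in sorting, but only the month decides which reducer receives the record.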

Summary:

Partial sorting in Hadoop is achieved mainly through partitioning.

In Java, a custom Partitioner class performs the partitioning. With streaming, the KeyFieldBasedPartitioner class does the work, and the launch script specifies it as the partitioner.
