ChainMapper/ChainReducer: How They Work, with a Worked Example

How ChainMapper/ChainReducer work

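ChainMapper/ChainReducer were introduced mainly to support linear chains of Mappers. That is, the Map or Reduce phase contains several Mappers that behave like a Linux pipeline: the output of one Mapper is fed directly into the next Mapper as its input, forming a pipeline of the form [MAP+ REDUCE MAP*]. A typical ChainMapper/ChainReducer scenario looks as follows.
In the Map phase, the data passes through Mapper1 and then Mapper2. In the Reduce phase, after shuffle and sort, the data is handed to the corresponding Reducer, and the Reducer's output can in turn be passed to further Mappers. The final result is written to the output directory on HDFS.

Note: in any single MapReduce job, the Map and Reduce phases may together contain any number of Mappers, but there can be only one Reducer.

Chaining reduces the amount of data moved across the network, because the chained Mappers do most of their work locally inside the same task. If the same logic were implemented as a sequence of separate MapReduce jobs, each job would write its output to HDFS and the next job would read it back, so large amounts of data would cross the network, which is very time-consuming.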
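The pipe-like hand-off described above can be sketched in plain Java. This is only an illustration of the idea, not Hadoop's implementation: the `Stage` interface, the `chain` helper and the two sample stages are hypothetical names invented for this sketch, and no Hadoop classes are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

/** Plain-Java sketch of a mapper chain; no Hadoop classes are involved. */
class ChainPipelineSketch {

    /** A stage receives one record and emits zero or more records downstream. */
    interface Stage {
        void map(String key, int value, BiConsumer<String, Integer> out);
    }

    /** Wires the stages so that each one writes directly into the next one. */
    static BiConsumer<String, Integer> chain(List<Stage> stages,
                                             BiConsumer<String, Integer> sink) {
        BiConsumer<String, Integer> next = sink;
        // Build from the tail: the last stage writes to the sink.
        for (int i = stages.size() - 1; i >= 0; i--) {
            Stage stage = stages.get(i);
            BiConsumer<String, Integer> downstream = next;
            next = (k, v) -> stage.map(k, v, downstream);
        }
        return next;
    }

    /** Runs two sample stages (tokenize, then lower-case) over the given lines. */
    static List<String> run(List<String> lines) {
        List<String> output = new ArrayList<>();
        Stage tokenize = (k, v, out) -> {
            for (String w : k.split("\\s+")) {
                if (!w.isEmpty()) {
                    out.accept(w, 1);
                }
            }
        };
        Stage lowerCase = (k, v, out) -> out.accept(k.toLowerCase(), v);
        BiConsumer<String, Integer> head =
                chain(Arrays.asList(tokenize, lowerCase), (k, v) -> output.add(k + "=" + v));
        for (String line : lines) {
            head.accept(line, 0);
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("Hello Hadoop")));
    }
}
```

Each record travels through every stage before the next record is read, so nothing is written to an intermediate file between stages. Hadoop's own hand-off mechanism differs in detail, but the data flow is the same.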

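ChainMapper: official description

The ChainMapper class allows several Mapper classes to be used within a single Map task.

These Mappers are invoked in a way that closely resembles a Linux pipe: the output of the first Mapper becomes the input of the second, the output of the second becomes the input of the third, and so on, until the output of the last Mapper becomes the output of the whole Map task.

The key feature is that the Mappers in the chain do not need to know that they are being executed in a chain. This makes it possible to build small, reusable, specialized Mappers and combine them to perform a composite operation within a single task.

Note: when building a chain, the key/value pairs output by each Mapper are the input of the next Mapper or Reducer in the chain. It is assumed that all Mappers and the Reducer in the chain use matching output and input key and value classes, because the chaining code performs no conversion.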
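The matching rule in the note above can be checked mechanically before a job is submitted. The following is a minimal plain-Java sketch of such a check, under stated assumptions: `Link` and `firstMismatch` are hypothetical names invented here, and `Long` and `String` stand in for `LongWritable` and `Text` so that the example needs no Hadoop classes.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch of a chain type check; no Hadoop classes are involved. */
class ChainTypeCheckSketch {

    /** Declared key/value classes of one chain element (hypothetical helper type). */
    static final class Link {
        final String name;
        final Class<?> inKey, inValue, outKey, outValue;

        Link(String name, Class<?> inKey, Class<?> inValue,
             Class<?> outKey, Class<?> outValue) {
            this.name = name;
            this.inKey = inKey;
            this.inValue = inValue;
            this.outKey = outKey;
            this.outValue = outValue;
        }
    }

    /** Returns null if every link accepts its predecessor's output, else the first bad pair. */
    static String firstMismatch(List<Link> chain) {
        for (int i = 1; i < chain.size(); i++) {
            Link prev = chain.get(i - 1);
            Link cur = chain.get(i);
            boolean keyOk = cur.inKey.isAssignableFrom(prev.outKey);
            boolean valueOk = cur.inValue.isAssignableFrom(prev.outValue);
            if (!keyOk || !valueOk) {
                return prev.name + " -> " + cur.name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Long and String stand in for LongWritable and Text.
        Link aMap = new Link("AMap", Long.class, String.class, String.class, String.class);
        Link bMap = new Link("BMap", String.class, String.class, Long.class, String.class);
        System.out.println(firstMismatch(Arrays.asList(aMap, bMap))); // consistent chain
        System.out.println(firstMismatch(Arrays.asList(aMap, aMap))); // AMap cannot consume its own output
    }
}
```

In the first chain, AMap emits (String, String) and BMap consumes (String, String), so the check passes. In the second, AMap emits a String key but expects a Long key, so the pair is reported.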

Usage

...
Job job = Job.getInstance(conf);

// A Configuration without default values keeps the per-Mapper configuration lightweight.
Configuration mapAConf = new Configuration(false);
...
ChainMapper.addMapper(job, AMap.class, LongWritable.class, Text.class,
 Text.class, Text.class, mapAConf);

Configuration mapBConf = new Configuration(false);
...
// BMap's input types (Text, Text) must match AMap's output types.
ChainMapper.addMapper(job, BMap.class, Text.class, Text.class,
 LongWritable.class, Text.class, mapBConf);

...

job.waitForCompletion(true);
...

Parameters of addMapper

static void addMapper(Job job, Class<? extends Mapper> klass,
  Class<?> inputKeyClass, Class<?> inputValueClass,
  Class<?> outputKeyClass, Class<?> outputValueClass,
  Configuration mapperConf)
// Meaning of the parameters:
// 1. the job
// 2. the class of this Mapper
// 3. the input key type of this Mapper
// 4. the input value type of this Mapper
// 5. the output key type of this Mapper
// 6. the output value type of this Mapper
// 7. the Configuration object for this Mapper

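ChainReducer: official description

The ChainReducer class allows several Mappers to be executed after the Reducer within the same Reduce task.
Each record output by the Reducer is passed as input to the first Mapper registered through ChainReducer, the output of that Mapper becomes the input of the second Mapper, and so on. The output of the last Mapper becomes the output of the whole Reduce task and is written to the job's output.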

Usage

...
Job job = Job.getInstance(conf);
...

Configuration reduceConf = new Configuration(false);
...
ChainReducer.setReducer(job, XReduce.class, LongWritable.class, Text.class,
  Text.class, Text.class, reduceConf);

// Mappers added here run after the Reducer, in the order they are added.
ChainReducer.addMapper(job, CMap.class, Text.class, Text.class,
  LongWritable.class, Text.class, null);

ChainReducer.addMapper(job, DMap.class, LongWritable.class, Text.class,
  LongWritable.class, LongWritable.class, null);

...

job.waitForCompletion(true);
...

Parameters of setReducer

static void setReducer(Job job, Class<? extends Reducer> klass,
  Class<?> inputKeyClass, Class<?> inputValueClass,
  Class<?> outputKeyClass, Class<?> outputValueClass,
  Configuration reducerConf)
// Meaning of the parameters:
// 1. the job
// 2. the class of this Reducer
// 3. the input key type of this Reducer
// 4. the input value type of this Reducer
// 5. the output key type of this Reducer
// 6. the output value type of this Reducer
// 7. the Configuration object for this Reducer

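Example

Problem description

Find the high-frequency words in an article (only words that occur more than 5 times are collected), removing function words and filtering out sensitive words.

Implementation approach

The MapTask contains three chained Mappers, named M1, M2 and M3. The ReduceTask contains one Reducer, named R1, and one Mapper, named RM1.

MapTask phase

M1 splits each line of text into words, M2 filters function words out of M1's output, and M3 filters sensitive words out of M2's output.

ReduceTask phase

R1 counts the occurrences of each word delivered by the shuffle phase and passes the counts to RM1, which outputs only the words whose count is greater than 5.

The approach above is only meant to demonstrate how ChainMapper/ChainReducer are used, so please do not judge it too harshly.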
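To make the intended result concrete, here is a minimal plain-Java sketch that mimics the M1, M2, M3, R1 and RM1 steps on an in-memory list of lines. It uses no Hadoop classes; the class name, the `run` method and the sample input are assumptions made for illustration only, and the filtered words 'of' and 'xxx' are the placeholders used by the demo Mappers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of the example job's logic; no Hadoop classes are involved. */
class HighFrequencyWordsSketch {

    /** Returns the words whose count is strictly greater than the threshold. */
    static Map<String, Integer> run(List<String> lines, int threshold) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {       // M1: split each line into words
                if (word.equals("of")) {                // M2: drop the function word 'of'
                    continue;
                }
                if (word.equals("xxx")) {               // M3: drop the sensitive word 'xxx'
                    continue;
                }
                counts.merge(word, 1, Integer::sum);    // shuffle + R1: sum the 1s per word
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > threshold) {             // RM1: keep only frequent words
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be of to xxx to", "to be to to");
        System.out.println(run(lines, 5)); // prints {to=6}
    }
}
```

In the sample input, "to" occurs six times and "be" twice, so only "to" passes the threshold of 5, while "of" and "xxx" never reach the counting step.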

Code

Mapper1

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class Mapper1 extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        System.out.println("Mapper1 setup===========");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
    throws IOException, InterruptedException {
        System.out.println("map1===========" + value.toString());
        String line = value.toString() ;
        String[] strArr = line.split(" ") ;

        for (String w: strArr) {
            context.write(new Text(w), new IntWritable(1));
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        System.out.println("Mapper1 cleanup===========");
    }
}

Mapper2

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * This Mapper filters out function words. Since word filtering is not the focus of
 * this article, only the single word 'of' is filtered here to keep the demo simple.
 */
public class Mapper2 extends Mapper<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        System.out.println("Mapper2 setup===========");
    }
    @Override
    protected void map(Text key, IntWritable value, Context context)
    throws IOException, InterruptedException {
        System.out.println("map2==================" + key.toString() + ":" + value.toString());
        // Filter out the word 'of'
        if (! key.toString().equals("of")){
            context.write(key, value);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        System.out.println("Mapper2 cleanup===========");
    }
}

Mapper3

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * This Mapper filters out sensitive words. Since word filtering is not the focus of
 * this article, only the single word 'xxx' is filtered here to keep the demo simple.
 */
public class Mapper3 extends Mapper<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        System.out.println("Mapper3 setup===========");
    }
    @Override
    protected void map(Text key, IntWritable value, Context context)
    throws IOException, InterruptedException {
        System.out.println("map3==================" + key.toString() + ":" + value.toString());
        // Filter out the word 'xxx'
        if (! key.toString().equals("xxx")){
            context.write(key, value);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        System.out.println("Mapper3 cleanup===========");
    }
}

Reducer1

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

/**
 * Created by yanzhe on 2017/8/18.
 */
public class Reducer1 extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        System.out.println("Reducer1 setup===========");
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
    throws IOException, InterruptedException {
        int count = 0 ;
        for (IntWritable iw: values) {
            count += iw.get();
        }
        context.write(key, new IntWritable(count));
        System.out.println("reduce=========" + key.toString() + ":" + count);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        System.out.println("Reducer1 cleanup===========");
    }
}

ReducerMapper1

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * Created by yanzhe on 2017/8/18.
 */
public class ReducerMapper1 extends Mapper<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        System.out.println("ReducerMapper1 setup===========");
    }

    @Override
    protected void map(Text key, IntWritable value, Context context)
    throws IOException, InterruptedException {
        if (value.get() > 5)
            context.write(key, value);

        System.out.println("reduceMap======" + key.toString() + ":" + value.toString());
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        System.out.println("ReducerMapper1 cleanup===========");
    }
}

App

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
import org.apache.hadoop.mapreduce.lib.chain.ChainReducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * Created by yanzhe on 2017/8/18.
 */
public class App {
    public static void main(String[] args) throws Exception {

        // Hard-coded local paths for the demo; they override any command-line arguments.
        args = new String[]{"d:/java/mr/data/data.txt", "d:/java/mr/out"};

        Configuration conf = new Configuration();

        FileSystem fs = FileSystem.get(conf);

        // Delete the output directory if it already exists; otherwise the job fails.
        Path outPath = new Path(args[1]);
        if (fs.exists(outPath)) {
            fs.delete(outPath, true);
        }

        Job job = Job.getInstance(conf);
        job.setJarByClass(App.class);

        // Map phase: Mapper1 -> Mapper2 -> Mapper3.
        // Each link gets its own lightweight Configuration without default values.
        ChainMapper.addMapper(job, Mapper1.class, LongWritable.class, Text.class,
                Text.class, IntWritable.class, new Configuration(false));

        ChainMapper.addMapper(job, Mapper2.class, Text.class, IntWritable.class,
                Text.class, IntWritable.class, new Configuration(false));

        ChainMapper.addMapper(job, Mapper3.class, Text.class, IntWritable.class,
                Text.class, IntWritable.class, new Configuration(false));

        // Reduce phase: Reducer1 -> ReducerMapper1.
        ChainReducer.setReducer(job, Reducer1.class, Text.class, IntWritable.class,
                Text.class, IntWritable.class, new Configuration(false));

        ChainReducer.addMapper(job, ReducerMapper1.class, Text.class, IntWritable.class,
                Text.class, IntWritable.class, new Configuration(false));

        FileInputFormat.addInputPath(job, new Path(args[0]));

        FileOutputFormat.setOutputPath(job, outPath);

        job.setNumReduceTasks(2);
        // Combiner1 and MyPartitioner are not listed in this article.
        job.setCombinerClass(Combiner1.class);
        job.setPartitionerClass(MyPartitioner.class);

        // Exit with a non-zero status if the job fails.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
