Writing a MapReduce Program in Eclipse

Let's write a WordCount-style example.

We have a data source in the following format:

[Screenshot: data-source.png]

Task: find out what categories the item field contains.

1. Create a new Map/Reduce Project

File - New - Other - Map/Reduce Project

2. SalesItemCategoryMapper.class

//SalesItemCategoryMapper.class
package SalesProduct;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class SalesItemCategoryMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);

    @Override
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // Convert the incoming plain-text value to a String
        String valueString = value.toString();
        // Split the input into lines; with TextInputFormat each call already receives
        // a single line, so this loop normally runs exactly once
        StringTokenizer tokenizerArticle = new StringTokenizer(valueString, "\n");

        while (tokenizerArticle.hasMoreTokens()) {
            // Split the line on tabs; the 4th field (index 3) is the item category
            String[] items = tokenizerArticle.nextToken().split("\t");
            String itemName = items[3];
            // Emit (category, 1)
            output.collect(new Text(itemName), one);
        }
    }

}
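To see what this does, assume the records look like the tab-separated sample in step 9 (date, time, store_name, item, price, payment_method). For a line such as "2012-01-01 / 09:00 / San Jose / Men's Clothing / 214.05 / Amex" (fields separated by tabs), items[3] is "Men's Clothing", so the mapper emits the pair (Men's Clothing, 1).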

3. SalesItemCategoryReducer.class

//SalesItemCategoryReducer.class
package SalesProduct;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class SalesItemCategoryReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text t_key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // Every distinct category arrives here exactly once as a key, so emitting
        // (category, 1) is enough to list the categories.
        // To count how many records fall into each category instead, sum the values:
        // int frequency = 0;
        // while (values.hasNext()) {
        //     frequency += values.next().get();
        // }
        // output.collect(t_key, new IntWritable(frequency));

        output.collect(t_key, new IntWritable(1));
    }

}

4. SalesItemDriver.class

//SalesItemDriver.class
package SalesResult;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;


public class SalesItemDriver {
    public static void main(String[] args) {
        JobClient my_client = new JobClient();
        JobConf job_conf = new JobConf(SalesItemDriver.class);
        job_conf.setJobName("SaleCategory");
        job_conf.setOutputKeyClass(Text.class);
        job_conf.setOutputValueClass(IntWritable.class);
//      job_conf.setOutputValueClass(DoubleWritable.class);

        // Job 1: get the item categories
        job_conf.setMapperClass(SalesProduct.SalesItemCategoryMapper.class);
        job_conf.setReducerClass(SalesProduct.SalesItemCategoryReducer.class);

        // Alternative job: get the per-category sum (swap the comments to use this pair instead)
//      job_conf.setMapperClass(SalesProduct.SalesCategorySumMapper.class);
//      job_conf.setReducerClass(SalesProduct.SalesCategorySumReducer.class);

        job_conf.setInputFormat(TextInputFormat.class);
        job_conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job_conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(job_conf, new Path(args[1]));

        my_client.setConf(job_conf);
        try {
            JobClient.runJob(job_conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
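The two commented-out setMapperClass/setReducerClass lines above refer to SalesProduct.SalesCategorySumMapper and SalesProduct.SalesCategorySumReducer, which are not listed in this post. Below is a minimal sketch of what such a pair could look like; it is only an illustration, not the original author's classes, and it assumes the tab-separated record format from step 9 with the price in the 5th field. If you switch the driver to this pair, also change setOutputValueClass from IntWritable.class to DoubleWritable.class, as the commented-out line in the driver hints. (Each class goes in its own .java file.)

//SalesCategorySumMapper.class (sketch)
package SalesProduct;

import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class SalesCategorySumMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, DoubleWritable> {

    @Override
    public void map(LongWritable key, Text value, OutputCollector<Text, DoubleWritable> output, Reporter reporter)
            throws IOException {
        // Assumed fields: date, time, store_name, item, price, payment_method (tab-separated)
        String[] items = value.toString().split("\t");
        if (items.length < 5) {
            return; // skip malformed lines
        }
        // Emit (category, price)
        output.collect(new Text(items[3]), new DoubleWritable(Double.parseDouble(items[4])));
    }

}

//SalesCategorySumReducer.class (sketch)
package SalesProduct;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class SalesCategorySumReducer extends MapReduceBase implements Reducer<Text, DoubleWritable, Text, DoubleWritable> {

    @Override
    public void reduce(Text key, Iterator<DoubleWritable> values, OutputCollector<Text, DoubleWritable> output, Reporter reporter)
            throws IOException {
        // Sum all sales values that arrived for this category
        double total = 0.0;
        while (values.hasNext()) {
            total += values.next().get();
        }
        output.collect(key, new DoubleWritable(total));
    }

}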

5. Add logging: log4j.properties

log4j.rootLogger=INFO, stdout  
log4j.appender.stdout=org.apache.log4j.ConsoleAppender  
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout  
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n  
log4j.appender.logfile=org.apache.log4j.FileAppender  
log4j.appender.logfile.File=target/spring.log  
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout  
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n  

6. Right-click the project and choose Run as - Run configurations …

[Screenshot: java-application-config-1]

Set two program arguments:

[Screenshot: java-mapreduce-params]

The first argument is the input file.
The second argument is the directory where the results are saved. Note that you should not create this output folder by hand; the program creates it automatically.
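For example, the two arguments might look like this (the paths are only illustrative; point them at your own input file and desired output directory):

/home/ubuntu/purchases.txt /home/ubuntu/sales-output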

7. Click Run

[Screenshots: demo-result.png, demo-result-2.png, demo-result-3.png]

8. Note: the Hadoop classes you import come from the jars in the following folders:

[Screenshot: hadoop-jars]

9. Want some practice?
Data Set: https://pan.baidu.com/s/1c2J15Qw (password: 4xkd)

The format goes like this:
date time store_name item price payment_method

2012-01-01 09:00 San Jose Men's Clothing 214.05 Amex
2012-01-01 09:00 Fort Worth Women's Clothing 153.57 Visa
2012-01-01 09:00 San Diego Music 66.08 Cash
......
......
......

Use the MapReduce programming model to find out:

  1. What are the item categories? What is the total sales value for each item category?
  2. What is the total sales value for each of the following stores: "Reno", "Toledo", "Chandler"?
  3. How many items in total were sold?
  4. What is the total sales value for all stores?
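A possible starting point for question 4 (just a hint, not the only way): have the mapper emit one constant key for every record, e.g. output.collect(new Text("Total"), new DoubleWritable(price)), and reuse a summing reducer like the SalesCategorySumReducer sketch in step 4. All records then land in a single reduce group, and its one output line is the grand total.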

10. Commands that may be useful
Format namenode:

bin/hdfs namenode -format

Start Hadoop:

sbin/start-dfs.sh
sbin/start-yarn.sh

Stop Hadoop:

sbin/stop-dfs.sh
sbin/stop-yarn.sh

Check report:

bin/hdfs dfsadmin -report

Allow port:

sudo ufw allow from 192.168.9.4 
#(allow access from this ip)
sudo ufw allow 22 
#(here allow 22 port)

HDFS create folder:

bin/hadoop fs -mkdir  /wordcount
#Use copyFromLocal to copy files to HDFS:
bin/hadoop fs -copyFromLocal /home/ubuntu/word.txt /wordcount/word.txt
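The walkthrough above runs the job from inside Eclipse. If you instead export the project as a jar (the jar name below is only a placeholder), you can submit it to the cluster with hadoop jar:

bin/hadoop jar SalesProduct.jar SalesResult.SalesItemDriver <input path> <output path>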

Check Hadoop status:

http://<target IP>:50070/dfshealth.html