1. Configure the plugin
- Copy hadoop-eclipse-plugin-1.2.1.jar into Eclipse's plugins directory and restart Eclipse.
- DFS Locations now appears in the Project Explorer on the left side of Eclipse. Click window -> perspective -> open perspective -> other... and open the Map/Reduce perspective.
- In the panel at the bottom, create a new Hadoop Location.
- Fill in the parameters: the Location name can be anything; for the Map/Reduce Master, either port 9001 or 50020 seems to work, as long as it matches mapred-site.xml; the Port on the right (DFS Master) must match core-site.xml, so enter 9000.
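If you want to double-check these values, a minimal sketch like the hypothetical ConfCheck class below simply loads the two config files and prints the addresses they define. It assumes the standard Hadoop 1.x property names (fs.default.name in core-site.xml, mapred.job.tracker in mapred-site.xml) and the conf path used later in JobRun; adjust the paths to your own installation.

package test0;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Prints the addresses the Hadoop Location dialog has to match.
// The conf directory below is an assumption taken from JobRun; change it to yours.
public class ConfCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/wsj/hadoop121/hadoop-1.2.1/conf/core-site.xml"));
        conf.addResource(new Path("/home/wsj/hadoop121/hadoop-1.2.1/conf/mapred-site.xml"));
        // DFS Master, e.g. hdfs://localhost:9000
        System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
        // Map/Reduce Master, e.g. localhost:9001
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
    }
}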
- After running start-all.sh, you can work with DFS through the plugin.
- Under hadoop-wsj, create the folders input/wc and output, and upload a file into wc to run the word count on; output is where the results will be written.
Note: create only output, not output/wc; the wc subfolder is generated automatically when the job runs, and creating it in advance causes an error.
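As an alternative to uploading through the plugin, a minimal sketch like the hypothetical PrepareInput class below does the same preparation through the HDFS API. The HDFS paths match the ones used in JobRun further down; the local file /home/wsj/words.txt is only a placeholder for whatever text file you want to count.

package test0;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Creates the input directory and uploads a local file into it.
// Note: /tmp/hadoop-wsj/output/wc is deliberately NOT created here -- the job creates it.
public class PrepareInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/wsj/hadoop121/hadoop-1.2.1/conf/core-site.xml"));
        conf.addResource(new Path("/home/wsj/hadoop121/hadoop-1.2.1/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(conf);
        fs.mkdirs(new Path("/tmp/hadoop-wsj/input/wc"));
        // Placeholder local file: replace with the file whose words you want to count.
        fs.copyFromLocalFile(new Path("/home/wsj/words.txt"),
                             new Path("/tmp/hadoop-wsj/input/wc/words.txt"));
        fs.close();
    }
}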
2. Create a Map/Reduce project
- Note: specify the Hadoop installation path when creating the project.
Finally, the MapReduce demo. It consists of three classes: McMapper, WcReducer, and JobRun.
- McMapper.java:
package test0;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// The four type parameters fix the input and output (key, value) types.
public class McMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // map() is called once for each line of the split;
    // key: the offset of that line within the file, value: the line itself.
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            String word = st.nextToken();
            context.write(new Text(word), new IntWritable(1)); // map output: (word, 1)
        }
    }
}
- WcReducer.java:
package test0;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WcReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // reduce() is called once per key (word); values holds every 1 emitted for that word.
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable i : values) {
            sum = sum + i.get();
        }
        context.write(key, new IntWritable(sum)); // reduce output: (word, total count)
    }
}
- JobRun.java:
package test0;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobRun {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        /*
         * The two lines below matter: they point the job at the HDFS file system
         * instead of a local path. This only works if core-site.xml and
         * hdfs-site.xml are written exactly as in the official documentation;
         * do not change the hadoop.tmp.dir path, or the job will fail.
         */
        conf.addResource(new Path("/home/wsj/hadoop121/hadoop-1.2.1/conf/core-site.xml"));
        conf.addResource(new Path("/home/wsj/hadoop121/hadoop-1.2.1/conf/hdfs-site.xml"));
        try {
            Job job = new Job(conf);
            job.setJarByClass(JobRun.class);
            job.setMapperClass(McMapper.class);
            job.setReducerClass(WcReducer.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            job.setNumReduceTasks(1);
            FileInputFormat.addInputPath(job, new Path("/tmp/hadoop-wsj/input/wc"));
            FileOutputFormat.setOutputPath(job, new Path("/tmp/hadoop-wsj/output/wc"));
            try {
                System.exit(job.waitForCompletion(true) ? 0 : 1);
            } catch (ClassNotFoundException e) {
                e.printStackTrace();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
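After the job finishes, the result can be viewed in the DFS Locations tree, or read back programmatically; the hypothetical PrintResult class below is a minimal sketch of the latter. It assumes the single-reducer output file name part-r-00000 and the same config and HDFS paths as JobRun above.

package test0;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Prints the word-count result written by JobRun.
public class PrintResult {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/wsj/hadoop121/hadoop-1.2.1/conf/core-site.xml"));
        conf.addResource(new Path("/home/wsj/hadoop121/hadoop-1.2.1/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(conf);
        // part-r-00000 is the output file of the single reducer configured in JobRun.
        FSDataInputStream in = fs.open(new Path("/tmp/hadoop-wsj/output/wc/part-r-00000"));
        IOUtils.copyBytes(in, System.out, 4096, true); // true: close the stream when done
        fs.close();
    }
}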