Table of Contents
- Preface
- Word Count
- Mobile User Traffic Log Statistics
- More to come...
Requirements Analysis: Mobile User Traffic Log Statistics
We need to aggregate mobile user traffic logs. Sample log content (each line holds a phone number, upstream traffic, and downstream traffic, separated by tabs; this is the same data used as flowdata.log in the Run section below):
13726230501 200 1100
13396230502 300 1200
13897230503 400 1300
13897230503 100 300
13597230534 500 1400
13597230534 300 1200
For each phone number we must accumulate the upstream and downstream traffic and compute the total.
For example, 13897230503 above has two records; accumulating them and computing the sums gives:
Phone Number | Upstream Traffic | Downstream Traffic | Total Traffic |
---|---|---|---|
13897230503 | 500 | 1600 | 2100 |
Implementation Approach
- map
The mapper receives one line of the log at a time; the key is the byte offset of the line and the value is the line's content.
On output, the phone number should be the key, and the value should be a single object holding the upstream traffic, downstream traffic, and total traffic.
The phone number is a string, so Text works as the key. The value, however, cannot be expressed with a basic type; we need to define a custom bean class, and it must be serializable by Hadoop (i.e., implement Writable).
key: 13897230503
value: < upFlow:100, dFlow:300, sumFlow:400 >
- reduce
The reducer receives a phone-number key together with the collection of bean objects for that phone number.
For example:
key:
13897230503
value:
< upFlow:400, dFlow:1300, sumFlow:1700 >,
< upFlow:100, dFlow:300, sumFlow:400 >
It iterates over the bean collection, accumulating each field into a new bean, for example:
< upFlow:400+100, dFlow:1300+300, sumFlow:1700+400 >
and finally outputs:
key: 13897230503
value: < upFlow:500, dFlow:1600, sumFlow:2100 >
Code Walkthrough
- Project structure
- pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>hadoop</groupId>
<artifactId>hadoop</artifactId>
<version>1.0-SNAPSHOT</version>
<name>hadoop</name>
<!-- FIXME change it to the project's website -->
<url>http://www.example.com</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.7</maven.compiler.source>
<maven.compiler.target>1.7</maven.compiler.target>
</properties>
<dependencies>
<dependency>
<groupId>commons-beanutils</groupId>
<artifactId>commons-beanutils</artifactId>
<version>1.9.3</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.7.3</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.7.3</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-common</artifactId>
<version>2.7.3</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>2.7.3</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
<plugins>
<!-- clean lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#clean_Lifecycle -->
<plugin>
<artifactId>maven-clean-plugin</artifactId>
<version>3.1.0</version>
</plugin>
<!-- default lifecycle, jar packaging: see https://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
<plugin>
<artifactId>maven-resources-plugin</artifactId>
<version>3.0.2</version>
</plugin>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.8.0</version>
</plugin>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.22.1</version>
</plugin>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<version>3.0.2</version>
</plugin>
<plugin>
<artifactId>maven-install-plugin</artifactId>
<version>2.5.2</version>
</plugin>
<plugin>
<artifactId>maven-deploy-plugin</artifactId>
<version>2.8.2</version>
</plugin>
<!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
<plugin>
<artifactId>maven-site-plugin</artifactId>
<version>3.7.1</version>
</plugin>
<plugin>
<artifactId>maven-project-info-reports-plugin</artifactId>
<version>3.0.0</version>
</plugin>
</plugins>
</pluginManagement>
</build>
</project>
- FlowBean.java
package flowcount;

import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class FlowBean implements Writable {
    private long upFlow;
    private long dFlow;
    private long sumFlow;

    // Hadoop instantiates Writables by reflection during deserialization,
    // so a public no-arg constructor is required.
    public FlowBean() {
    }

    public FlowBean(long upFlow, long dFlow) {
        this.upFlow = upFlow;
        this.dFlow = dFlow;
        this.sumFlow = upFlow + dFlow;
    }

    public long getUpFlow() {
        return upFlow;
    }

    public void setUpFlow(long upFlow) {
        this.upFlow = upFlow;
    }

    public long getdFlow() {
        return dFlow;
    }

    public void setdFlow(long dFlow) {
        this.dFlow = dFlow;
    }

    public long getSumFlow() {
        return sumFlow;
    }

    public void setSumFlow(long sumFlow) {
        this.sumFlow = sumFlow;
    }

    // Serialization: write the fields in a fixed order...
    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeLong(upFlow);
        dataOutput.writeLong(dFlow);
        dataOutput.writeLong(sumFlow);
    }

    // ...and read them back in exactly the same order.
    @Override
    public void readFields(DataInput dataInput) throws IOException {
        upFlow = dataInput.readLong();
        dFlow = dataInput.readLong();
        sumFlow = dataInput.readLong();
    }

    @Override
    public String toString() {
        return upFlow + "\t" + dFlow + "\t" + sumFlow;
    }
}
- FlowCount.java
package flowcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class FlowCount {

    static class FlowCountMapper extends Mapper<LongWritable, Text, Text, FlowBean> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // One log line: phoneNumber \t upFlow \t dFlow
            String line = value.toString();
            String[] fields = line.split("\t");
            String phoneNbr = fields[0];
            long upFlow = Long.parseLong(fields[1]);
            long dFlow = Long.parseLong(fields[2]);
            context.write(new Text(phoneNbr), new FlowBean(upFlow, dFlow));
        }
    }

    static class FlowCountReducer extends Reducer<Text, FlowBean, Text, FlowBean> {
        @Override
        protected void reduce(Text key, Iterable<FlowBean> values, Context context)
                throws IOException, InterruptedException {
            // Accumulate all records belonging to the same phone number.
            long sumUpFlow = 0;
            long sumDownFlow = 0;
            for (FlowBean flowBean : values) {
                sumUpFlow += flowBean.getUpFlow();
                sumDownFlow += flowBean.getdFlow();
            }
            context.write(key, new FlowBean(sumUpFlow, sumDownFlow));
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(FlowCount.class);
        job.setMapperClass(FlowCountMapper.class);
        job.setReducerClass(FlowCountReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(FlowBean.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FlowBean.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        boolean result = job.waitForCompletion(true);
        if (!result) {
            System.out.println("Task fail!");
        }
    }
}
Run
- Upload flowdata.log to HDFS (example commands follow the file content below). The file content:
13726230501 200 1100
13396230502 300 1200
13897230503 400 1300
13897230503 100 300
13597230534 500 1400
13597230534 300 1200
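One way to do the upload, as a sketch: this assumes flowdata.log sits in the current local directory and uses /flowcount/input as the job's input path, matching the run command below.
hdfs dfs -mkdir -p /flowcount/input
hdfs dfs -put flowdata.log /flowcount/input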
- Run the job
hadoop jar hadoop-1.0-SNAPSHOT.jar flowcount.FlowCount /flowcount/input /flowcount/output
- View the results
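With the single (default) reducer, the aggregated output lands in one file under the output directory:
hdfs dfs -cat /flowcount/output/part-r-00000
Each line holds the phone number followed by the upstream, downstream, and total traffic; for the sample data above this should work out to:
13396230502 300 1200 1500
13597230534 800 2600 3400
13726230501 200 1100 1300
13897230503 500 1600 2100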
Extension
The job above mixes number ranges from different provinces in a single output. Suppose, for example, that the 137 prefix belongs to Hebei and 138 to Henan, and we want each province's numbers written to its own output file.
This requires two additions to the original code:
- Define a custom Partitioner
- Register the custom Partitioner (and a matching number of reduce tasks) in the main method
ProvincePartitioner.java
package flowcount;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

import java.util.HashMap;

public class ProvincePartitioner extends Partitioner<Text, FlowBean> {

    // Maps a phone-number prefix to a partition (reduce task) index.
    private static HashMap<String, Integer> provinceDict = new HashMap<>();

    static {
        provinceDict.put("137", 0);
        provinceDict.put("133", 1);
        provinceDict.put("138", 2);
        provinceDict.put("135", 3);
    }

    @Override
    public int getPartition(Text key, FlowBean value, int numPartitions) {
        String prefix = key.toString().substring(0, 3);
        Integer provinceId = provinceDict.get(prefix);
        // Unknown prefixes all go to the last partition.
        return provinceId == null ? 4 : provinceId;
    }
}
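The value returned by getPartition selects the reduce task that receives the record, and each reduce task writes its own output file (part-r-00000, part-r-00001, and so on). Since unknown prefixes are sent to partition 4, the job must run with at least 5 reduce tasks, which is why the updated main method below calls job.setNumReduceTasks(5).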
FlowCount.java
package flowcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class FlowCount {

    // Mapper and Reducer are unchanged from the previous version.
    static class FlowCountMapper extends Mapper<LongWritable, Text, Text, FlowBean> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] fields = line.split("\t");
            String phoneNbr = fields[0];
            long upFlow = Long.parseLong(fields[1]);
            long dFlow = Long.parseLong(fields[2]);
            context.write(new Text(phoneNbr), new FlowBean(upFlow, dFlow));
        }
    }

    static class FlowCountReducer extends Reducer<Text, FlowBean, Text, FlowBean> {
        @Override
        protected void reduce(Text key, Iterable<FlowBean> values, Context context)
                throws IOException, InterruptedException {
            long sumUpFlow = 0;
            long sumDownFlow = 0;
            for (FlowBean flowBean : values) {
                sumUpFlow += flowBean.getUpFlow();
                sumDownFlow += flowBean.getdFlow();
            }
            context.write(key, new FlowBean(sumUpFlow, sumDownFlow));
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(FlowCount.class);
        job.setMapperClass(FlowCountMapper.class);
        job.setReducerClass(FlowCountReducer.class);
        // Use the custom partitioner, with one reduce task per partition (4 provinces + 1 "other").
        job.setPartitionerClass(ProvincePartitioner.class);
        job.setNumReduceTasks(5);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(FlowBean.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FlowBean.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        boolean result = job.waitForCompletion(true);
        if (!result) {
            System.out.println("Task fail!");
        }
    }
}
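The extended job is run the same way, except that the output directory must not already exist (MapReduce refuses to overwrite an existing output path), so point it at a fresh directory; the output path below is only an illustration:
hadoop jar hadoop-1.0-SNAPSHOT.jar flowcount.FlowCount /flowcount/input /flowcount/output_province
This should produce five output files, part-r-00000 through part-r-00004. With the sample data, the 137 number lands in part-r-00000, 133 in part-r-00001, 138 in part-r-00002, 135 in part-r-00003, and part-r-00004 stays empty because every sample number matches one of the four configured prefixes.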