Scalding: write cascading Hadoop jobs in Scala, with the low-level MapReduce details hidden

Stop going on about Hadoop, your data just isn't big enough - Geek Headlines - CSDN.NET
http://geek.csdn.net/news/detail/2780
I recommend Scalding over Hive or Pig, because it lets you write cascading Hadoop jobs in Scala and hides the low-level MapReduce details.


twitter/scalding: A Scala API for Cascading
https://github.com/twitter/scalding
Scalding is a Scala library that makes it easy to specify Hadoop MapReduce jobs. Scalding is built on top of Cascading, a Java library that abstracts away low-level Hadoop details. Scalding is comparable to Pig, but offers tight integration with Scala, bringing advantages of Scala to your MapReduce jobs.

package com.twitter.scalding.examples

import com.twitter.scalding._
import com.twitter.scalding.source.TypedText

class WordCountJob(args: Args) extends Job(args) {
  // Read the file named by --input one line at a time.
  TypedPipe.from(TextLine(args("input")))
    .flatMap { line => tokenize(line) }
    .groupBy { word => word } // use each word for a key
    .size // in each group, get the size
    // Write (word, count) pairs as TSV to the path named by --output.
    .write(TypedText.tsv[(String, Long)](args("output")))

  // Split a piece of text into individual words.
  def tokenize(text: String): Array[String] = {
    // Lowercase each word and remove punctuation.
    // Backslashes are doubled because these are ordinary Scala string literals.
    text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+")
  }
}
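
To see what the flatMap / groupBy / size chain computes without a Hadoop cluster, the same steps can be sketched on ordinary in-memory Scala collections. This is only an illustration, not part of Scalding: the object name WordCountSketch and the wordCount helper are hypothetical, and only the Scala standard library is used.

```scala
object WordCountSketch {
  // Same tokenizer as the job above: lowercase, drop punctuation, split on whitespace.
  def tokenize(text: String): Array[String] =
    text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+")

  // In-memory analogue of the pipeline: flatMap to words, group by word, count each group.
  // (Hypothetical helper for illustration; Scalding itself runs these steps as MapReduce.)
  def wordCount(lines: Seq[String]): Map[String, Long] =
    lines
      .flatMap(line => tokenize(line).toSeq)
      .groupBy(word => word)
      .map { case (word, occurrences) => word -> occurrences.size.toLong }

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("Hello, world!", "hello Hadoop"))
    // Print one "word<TAB>count" pair per line, sorted by word.
    counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word\t$n") }
  }
}
```

For the two sample lines, "hello" is counted twice and "world" and "hadoop" once each. The Scalding job has the same shape, but reads its lines from the input path and writes the pairs to the output path.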
