Big Data, Crystal Balls and Looking Glasses: Reviewing 2016, predicting 2017
End-of-year reviews are boring -- and everyone does them. Predictions are boring -- and they are hard. Of course, this is different -- because big data.
How do big data people go about making end-of-year reviews and predictions? Using data is the obvious answer, but there are a few issues with that approach: there is no synthesis in data alone -- you have to find the story behind the data, pick an angle and seek meaning. In addition, that approach does not account for subtle hints, industry knowledge, and big ideas.
To paraphrase Carl Sagan, "we wish to find the truth, no matter where it lies. But to find the truth we need imagination and data both. We will not be afraid to speculate, but we will be careful to distinguish speculation from fact." In this spirit, let's keep things equally opinionated and objective in 2017.
It's the end of Hadoop as we know it, and I feel fine
Hadoop turned 10 in 2016. It has come a long way from a pet project named after a toy elephant to the (metaphorical) stampeding beast now on almost every CXO's name-dropping list. The latest Big Data maturity survey showed that 73 percent of respondents are now in production with Hadoop (vs. 65 percent last year). And yet we're here to tell you Hadoop as we know it is dead. And that's not even news.
Hadoop has been constantly evolving, expanding, and re-inventing itself throughout its lifetime. A massive ecosystem has been developing around the initial bare-bones offering, and today Hadoop is more of a platform than "just" a storage and compute framework. The introduction of YARN was a game changer, enabling Hadoop to become a Big Data OS and to break away from its batch-oriented MapReduce origins.
In 2016, data and stories from the trenches all pointed in the same direction: batch, MapReduce Hadoop is dead, long live real-time, Spark Hadoop. 25 percent of organizations are using Spark in production today, with an additional 33 percent using it in development, and all major Hadoop vendors are involved in it. Adding those figures up suggests that by the end of 2017 up to 50 percent of organizations could be using Spark in production.
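The "adding up" behind that 50 percent figure can be made explicit with a back-of-the-envelope sketch. Only the two survey shares (25 and 33 percent) come from the text above; the `conversion_rate` parameter, the function names, and the assumptions that the two groups do not overlap and that no new adopters appear are hypothetical, introduced purely for illustration.

```python
# Survey shares quoted in the article (percent of organizations).
IN_PRODUCTION = 25   # using Spark in production today
IN_DEVELOPMENT = 33  # using Spark in development today


def projected_production_share(conversion_rate: float) -> float:
    """Production share if `conversion_rate` of development users go live.

    `conversion_rate` is a hypothetical parameter, not a survey figure.
    Assumes the two groups are disjoint and ignores brand-new adopters.
    """
    if not 0.0 <= conversion_rate <= 1.0:
        raise ValueError("conversion_rate must be between 0 and 1")
    return IN_PRODUCTION + conversion_rate * IN_DEVELOPMENT


def conversion_rate_needed(target_share: float) -> float:
    """Fraction of development users that must go live to hit `target_share`."""
    return (target_share - IN_PRODUCTION) / IN_DEVELOPMENT


if __name__ == "__main__":
    # Ceiling under these assumptions: every development user goes live.
    print(projected_production_share(1.0))
    # Conversion rate implied by the article's 50 percent figure.
    print(round(conversion_rate_needed(50), 2))
```

Under these simplifying assumptions the ceiling is 58 percent, and reaching 50 percent would need roughly three quarters of the organizations now in development to go live, which is why the prediction reads "up to" rather than "at least".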
But it's not necessarily a Spark-or-bust future: neither is Spark the only streaming game in town, nor is Hadoop the only Big Data platform. Alternatives do exist, and users may migrate or leapfrog to them, skipping Spark or Hadoop altogether, the same way they are now migrating from or skipping MapReduce.
The Big Data landscape is host to a multitude of different approaches. But more and more it looks like everyone is adding everyone else's features. Convergence or me-too? Image: Martin Kleppmann.
Becoming all things to all men to save some
Spark can do both streaming and batch processing. And it can also do SQL, and graphs. And of course on Hadoop you can also do SQL and/or NoSQL in a number of other ways, utilizing a wide choice of tools. That's what being an ecosystem is all about, right? But then again, everyone seems to be at it these days.
NoSQL databases like Cassandra / DataStax Enterprise can now also do graph, in addition to key-value, tabular and document. What about the iconic NoSQL document store - MongoDB? Well, besides document, you can now also do SQL. Microsoft's SQL Server? Your average SQL server no more: it can run on Linux, and it supports R, in-memory processing and column store. MariaDB, the poor man's SQL server, also has its column store now.
Neo4J, the iconic graph store? It's going ACID. Google's BigQuery now supports standard SQL, joining Amazon Redshift, which has had it for a while as it's based on Postgres. Of course, analytics-oriented column stores have long supported SQL. And traditional relational DBs like Oracle and IBM have been adding features like in-memory processing and column store for a while as well. Key-stores do it, document-stores do it, graph-stores do it, even SQL incumbents do it.
The boundaries are blurring, as more and more data platforms try to be more things to more people. Doing almost everything on the same platform is good for vendors that want to increase their retention and good for users who don't want to have to mix and match disparate platforms to get things done. But it's not a sheer land-ho of opportunity - threats lie ahead too. Most notably, vendor lock-in, half-baked features, and half-hearted users.
Some are trying to get the basics right, while some are after up-in-the-sky goals. Yet, there's a place for everyone under Big Data. Image: Martin Kleppmann
This article is from http://www.zdnet.com/article/big-data-crystal-balls-and-looking-glasses-reviewing-2016-predicting-2017/