Spark Transformations and the lazy strategy

Spark's RDD API offers two kinds of operations: Transformations and Actions.

Commonly used Transformations include map, filter, flatMap, union, sortByKey, and reduceByKey.
For more details, see: Spark Transformations
Commonly used Actions include reduce, collect, count, countByKey, foreach, and saveAsTextFile.
For more details, see: Spark Actions

The official documentation explains:

RDDs support two types of operations: transformations, which create a new dataset from an existing one, and actions, which return a value to the driver program after running a computation on the dataset.

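In other words, Transformations derive a new dataset from an existing one, while Actions trigger the computation and return a value to the driver.
The documentation says one more thing about Transformations: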

All transformations in Spark are lazy, in that they do not compute their results right away. Instead, they just remember the transformations applied to some base dataset (e.g. a file).

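All Transformations are lazy: they do not compute a result when they are called, they only record the transformation applied to the dataset. The recorded transformations are computed only when an Action needs a result.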
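To make "recorded but not computed" concrete, here is a minimal sketch using a plain Scala Iterator rather than Spark itself. It is only an analogy: Iterator.map is also lazy, so the function passed to it does not run until something consumes the iterator. The names LazyIteratorSketch and demo are made up for this illustration.

```scala
object LazyIteratorSketch {
  // Returns (calls counted before consuming, the result, calls counted after consuming).
  // Hypothetical helper for illustration only; this is not Spark code.
  def demo(): (Int, List[Int], Int) = {
    var calls = 0
    val base = Iterator(1, 2, 3)

    // Like a Transformation: this only wraps the iterator, the function is not run yet.
    val mapped = base.map { x => calls += 1; x * 10 }
    val before = calls

    // Like an Action: consuming the iterator forces the function to run once per element.
    val result = mapped.toList
    (before, result, calls)
  }
}
```

Calling LazyIteratorSketch.demo() shows that the counter is still zero right after map and only increases once toList consumes the data.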

Transformations code examples

Let's dig into the source code to see how Transformations record the operations applied to a dataset.
Take map as the first example. Here is its code (RDD.scala):

def map[U: ClassTag](f: T => U): RDD[U] = withScope {
  val cleanF = sc.clean(f)
  new MapPartitionsRDD[U, T](this, (context, pid, iter) => iter.map(cleanF))
}

The code is short: it cleans f and then creates a MapPartitionsRDD.
Now look at filter (RDD.scala):

  def filter(f: T => Boolean): RDD[T] = withScope {
    val cleanF = sc.clean(f)
    new MapPartitionsRDD[T, T](
      this,
      (context, pid, iter) => iter.filter(cleanF),
      preservesPartitioning = true)
  }

The two methods follow roughly the same logic. Note that both contain this line:
val cleanF = sc.clean(f)
Why is this step needed? Here is the documentation of the clean method (SparkContext.scala):

  /**
   * Clean a closure to make it ready to serialized and send to tasks
   * (removes unreferenced variables in $outer's, updates REPL variables)
   * If <tt>checkSerializable</tt> is set, <tt>clean</tt> will also proactively
   * check to see if <tt>f</tt> is serializable and throw a <tt>SparkException</tt>
   * if not.
   *
   * @param f the closure to clean
   * @param checkSerializable whether or not to immediately check <tt>f</tt> for serializability
   * @throws SparkException if <tt>checkSerializable</tt> is set but <tt>f</tt> is not
   *   serializable
   */
  private[spark] def clean[F <: AnyRef](f: F, checkSerializable: Boolean = true): F = {
    ClosureCleaner.clean(f, checkSerializable)
    f
  }

The two most important lines are:

Clean a closure to make it ready to serialized and send to tasks
(removes unreferenced variables in $outer's, updates REPL variables)

To explain this sentence we need to start from the components of a cluster. See the figure:

cluster-overview.png

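For details, see Cluster Mode Overview.
Running a Spark program on a cluster involves three components: the Driver, the Cluster Manager, and the Worker Nodes.
The Driver submits tasks and handles the results. The Cluster Manager allocates resources to the application. The Worker Nodes run the executors that carry out the actual tasks.
All of the task code lives on the Driver, so before a task can run its code has to be shipped to a Worker Node. That means the code must first be serialized and then sent to the corresponding node.
With that in mind, "Clean a closure to make it ready to serialized and send to tasks" is easy to understand.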
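Why the shape of a closure matters for serialization can be sketched without Spark, using plain Java serialization. This is not what ClosureCleaner does internally; it only shows the kind of problem that an unwanted reference to an enclosing object causes. The names ClosureSerializationSketch, Multiplier, badClosure, goodClosure, and isSerializable are all hypothetical.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureSerializationSketch {
  // Hypothetical class that is deliberately NOT Serializable.
  class Multiplier(val factor: Int) {
    // Reads the field `factor`, so the closure captures `this` (the whole Multiplier).
    def badClosure: Int => Int = x => x * factor

    // Copies the field into a local val first, so the closure captures only an Int.
    def goodClosure: Int => Int = {
      val localFactor = factor
      x => x * localFactor
    }
  }

  // Tries to serialize obj with Java serialization and reports whether it worked.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

In this sketch badClosure cannot be serialized because it drags the non-serializable Multiplier along, while goodClosure can. Copying a field into a local val is a common way to keep the enclosing object out of a closure that will be shipped to tasks.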

Back to map and filter: both create a MapPartitionsRDD, so let's look at its code.
Pay attention to the class declaration and the compute method.

private[spark] class MapPartitionsRDD[U: ClassTag, T: ClassTag](
    var prev: RDD[T],
    f: (TaskContext, Int, Iterator[T]) => Iterator[U],  // (TaskContext, partition index, iterator)
    preservesPartitioning: Boolean = false)
  extends RDD[U](prev) {

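MapPartitionsRDD extends RDD and keeps a reference to its parent (prev), much like a linked list: every Transformation appends one more node to the list.
Now look at the compute method: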

override def compute(split: Partition, context: TaskContext): Iterator[U] =
  f(context, split.index, firstParent[T].iterator(split, context))

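The compute method simply calls f on the parent's iterator and produces a new Iterator[U].
My guess is that an Action ultimately calls compute to carry out the data transformation.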
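The linked-list idea can be sketched with a toy model in plain Scala. This is a simplified illustration of the pattern described above, not Spark's implementation: there are no partitions, no TaskContext, and no closure cleaning. MiniRDD, SourceRDD, and MappedRDD are hypothetical names.

```scala
object LineageSketch {
  // Hypothetical stand-in for RDD: each node knows how to compute its own data.
  abstract class MiniRDD[T] {
    def compute(): Iterator[T]

    // Transformations: only build a new node that points at this one.
    def map[U](f: T => U): MiniRDD[U] =
      new MappedRDD[U, T](this, iter => iter.map(f))

    def filter(p: T => Boolean): MiniRDD[T] =
      new MappedRDD[T, T](this, iter => iter.filter(p))

    // Action: walks the chain by calling compute and materializes the result.
    def collect(): List[T] = compute().toList
  }

  // The head of the chain, backed by an in-memory sequence.
  final class SourceRDD[T](data: Seq[T]) extends MiniRDD[T] {
    def compute(): Iterator[T] = data.iterator
  }

  // Plays the role of MapPartitionsRDD: applies f to the parent's iterator.
  final class MappedRDD[U, T](prev: MiniRDD[T], f: Iterator[T] => Iterator[U])
      extends MiniRDD[U] {
    def compute(): Iterator[U] = f(prev.compute())
  }
}
```

For example, new LineageSketch.SourceRDD(Seq(1, 2, 3, 4)).map(_ * 2).filter(_ > 4) only builds a chain of three nodes. The functions run when collect() is called, which in this toy model is the only place compute is invoked.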

The next post will cover Actions.
