
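Gelly uses Flink's efficient iteration operators to support iterative graph processing over large-scale data. Currently, Flink Gelly provides implementations of the "Vertex-Centric", "Scatter-Gather", and "Gather-Sum-Apply" computation models. The sections below present the ideas behind these models and the scenarios they suit.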

Vertex-Centric Iterations

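The "Vertex-Centric" iteration model, often referred to as "Pregel", expresses graph computation from the perspective of a vertex. The synchronized iteration steps are called "supersteps". In each superstep, every vertex executes a user-defined function, and vertices communicate through messages: a vertex can send a message to any other vertex in the graph as long as it knows that vertex's unique ID.
The computation model is shown in the figure below. The dotted boxes correspond to parallel computation units (that is, the user-defined compute function). In each superstep, all active vertices execute the same user-defined compute function in parallel. Supersteps are executed synchronously (step by step), so messages sent during one superstep are guaranteed to be delivered at the beginning of the next superstep.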

[Figure: Vertex-Centric Iterations]
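The superstep semantics can be sketched in plain Java, without Flink. This is only an illustrative simulation under stated assumptions: the class `SuperstepSketch`, its `run` method, and the tiny example graph are hypothetical and not part of Gelly. As a workload it uses the same shortest-path logic as the Gelly example further down: every distance starts at infinity, the compute step assigns 0 to the source, and messages produced in one superstep are only read in the next one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of synchronous supersteps (illustrative only, not Gelly code).
class SuperstepSketch {

    // edges[i] = {sourceId, targetId, weight}; vertex IDs are 0..numVertices-1
    static double[] run(int numVertices, double[][] edges, int srcId, int maxIterations) {
        double[] dist = new double[numVertices];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);

        // Messages delivered at the start of the current superstep, keyed by target vertex.
        Map<Integer, List<Double>> inbox = new HashMap<>();

        for (int superstep = 1; superstep <= maxIterations; superstep++) {
            // Messages sent now become visible only in the next superstep.
            Map<Integer, List<Double>> outbox = new HashMap<>();
            boolean anyUpdate = false;

            for (int v = 0; v < numVertices; v++) {
                // "Compute function": minimum over the received candidate distances.
                double min = (v == srcId) ? 0d : Double.POSITIVE_INFINITY;
                for (double msg : inbox.getOrDefault(v, Collections.emptyList())) {
                    min = Math.min(min, msg);
                }
                if (min < dist[v]) {
                    dist[v] = min;
                    anyUpdate = true;
                    for (double[] e : edges) {
                        if ((int) e[0] == v) {
                            outbox.computeIfAbsent((int) e[1], k -> new ArrayList<>())
                                  .add(min + e[2]);
                        }
                    }
                }
            }
            if (!anyUpdate) {
                break; // no vertex changed, so the iteration has converged
            }
            inbox = outbox; // synchronization barrier between supersteps
        }
        return dist;
    }

    public static void main(String[] args) {
        double[][] edges = {{0, 1, 1}, {0, 2, 4}, {1, 2, 2}, {2, 3, 1}};
        // prints [0.0, 1.0, 3.0, 4.0]
        System.out.println(Arrays.toString(run(4, edges, 0, 10)));
    }
}
```

Swapping `inbox` for `outbox` only at the end of the loop body is what stands in for the barrier between supersteps; a real engine distributes the vertices over parallel workers instead of looping over them.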

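In Gelly, users only need to define the vertex compute function (ComputeFunction) to use the "Vertex-Centric" iteration model. This function and the maximum number of iterations are passed to Gelly's runVertexCentricIteration method, which runs the vertex-centric iteration on the input graph and returns a new graph with updated vertex values. Optionally, a MessageCombiner can be defined to reduce communication costs.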

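Let us look at single-source shortest paths implemented with the vertex-centric model. Initially, every vertex carries a distance attribute: 0 for the source vertex and infinity for all other vertices. In the first superstep, the source propagates its distance along its edges to its neighbors. In each following superstep, every vertex inspects the messages it received and picks the minimum distance. If that distance is smaller than the vertex's current distance value, the vertex updates its value and sends new candidate distances to its neighbors; otherwise it does nothing. The algorithm converges when no vertex value changes or the specified number of iterations is reached. In this algorithm, a message combiner can be used to reduce the number of messages sent to a target vertex.
A code example follows: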

// read the input graph
Graph<Long, Double, Double> graph = ...

// define the maximum number of iterations
int maxIterations = 10;

// Execute the vertex-centric iteration
Graph<Long, Double, Double> result = graph.runVertexCentricIteration(
            new SSSPComputeFunction(), new SSSPCombiner(), maxIterations);

// Extract the vertices as the result
DataSet<Vertex<Long, Double>> singleSourceShortestPaths = result.getVertices();


// - - -  UDFs - - - //

public static final class SSSPComputeFunction extends ComputeFunction<Long, Double, Double, Double> {

    // srcId is the ID of the source vertex; it is assumed to be defined elsewhere
    // (for example as a constant or a constructor argument)

    public void compute(Vertex<Long, Double> vertex, MessageIterator<Double> messages) {

        double minDistance = (vertex.getId().equals(srcId)) ? 0d : Double.POSITIVE_INFINITY;

        for (Double msg : messages) {
            minDistance = Math.min(minDistance, msg);
        }

        if (minDistance < vertex.getValue()) {
            setNewVertexValue(minDistance);
            for (Edge<Long, Double> e: getEdges()) {
                sendMessageTo(e.getTarget(), minDistance + e.getValue());
            }
        }
    }
}

// message combiner
public static final class SSSPCombiner extends MessageCombiner<Long, Double> {

    public void combineMessages(MessageIterator<Double> messages) {

        double minMessage = Double.POSITIVE_INFINITY;
        for (Double msg: messages) {
           minMessage = Math.min(minMessage, msg);
        }
        sendCombinedMessage(minMessage);
    }
}
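
To make the effect of the combiner concrete, here is a small plain-Java sketch. It is not Gelly code: `CombinerSketch` and `combineMin` are hypothetical names, and messages are assumed to be simple (target, value) pairs. All messages addressed to the same target are collapsed into their minimum, which is logically what `SSSPCombiner` does, so only one message per target needs to be shipped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: collapses messages per target vertex, like a min-combiner.
class CombinerSketch {

    // messages[i] = {targetId, candidateDistance}
    static Map<Long, Double> combineMin(double[][] messages) {
        Map<Long, Double> combined = new LinkedHashMap<>();
        for (double[] m : messages) {
            // keep the smaller of the stored value and the new candidate
            combined.merge((long) m[0], m[1], Double::min);
        }
        return combined;
    }

    public static void main(String[] args) {
        double[][] messages = {{2, 4.0}, {2, 3.0}, {3, 5.0}, {2, 7.0}};
        // four messages go in, two come out: prints {2=3.0, 3=5.0}
        System.out.println(combineMin(messages));
    }
}
```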

Configuring a Vertex-Centric Iteration

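A vertex-centric iteration can be configured with a VertexCentricConfiguration object. Currently, the following parameters can be specified:

Name: the name of the vertex-centric iteration, shown in logs and messages. It can be set with the setName() method.
Parallelism: the parallelism of the iteration. It can be set with the setParallelism() method.
Solution set in unmanaged memory: defines whether the solution set is kept in managed memory (where Flink stores objects internally in serialized form) or as a plain object map. By default, the solution set is kept in managed memory. This property can be set with the setSolutionSetUnmanagedMemory() method.
Aggregators: iteration aggregators can be registered with the registerAggregator() method. An aggregator combines all aggregated values globally once per superstep and makes the result available in the next superstep. Registered aggregators can be accessed inside the user-defined ComputeFunction.
Broadcast Variables: DataSets can be passed to the ComputeFunction as Broadcast Variables with the addBroadcastSet() method.

An example: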

Graph<Long, Long, Double> graph = ...

// configure the iteration
VertexCentricConfiguration parameters = new VertexCentricConfiguration();

// set the iteration name
parameters.setName("Gelly Iteration");

// set the parallelism
parameters.setParallelism(16);

// register an aggregator
parameters.registerAggregator("sumAggregator", new LongSumAggregator());

// run the vertex-centric iteration, also passing the configuration parameters
Graph<Long, Long, Double> result =
            graph.runVertexCentricIteration(
            new Compute(), null, maxIterations, parameters);

// user-defined function
public static final class Compute extends ComputeFunction {

    LongSumAggregator aggregator = new LongSumAggregator();

    public void preSuperstep() {

        // retrieve the Aggregator
        aggregator = getIterationAggregator("sumAggregator");
    }


    public void compute(Vertex<Long, Long> vertex, MessageIterator inMessages) {

        //do some computation
        Long partialValue = ...

        // aggregate the partial value
        aggregator.aggregate(partialValue);

        // update the vertex value
        setNewVertexValue(...);
    }
}
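
The aggregator timing (combined once per superstep, visible in the next one) can be sketched in plain Java. This is a simplified simulation under assumptions, not the Flink aggregator API: `AggregatorSketch` and `visibleAggregates` are hypothetical names, and the aggregate visible in the first superstep is assumed to be 0.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a global sum that is published once per superstep.
class AggregatorSketch {

    // Returns, for each superstep, the aggregate that every vertex could read in it.
    static List<Long> visibleAggregates(long[] vertexValues, int supersteps) {
        long[] values = vertexValues.clone();
        List<Long> seen = new ArrayList<>();
        long previousAggregate = 0L; // assumption: nothing was aggregated before superstep 1

        for (int s = 1; s <= supersteps; s++) {
            seen.add(previousAggregate);   // readable by all vertices during superstep s
            long current = 0L;
            for (int v = 0; v < values.length; v++) {
                current += values[v];      // stands in for aggregator.aggregate(partialValue)
                values[v] *= 2;            // stands in for setNewVertexValue(...)
            }
            previousAggregate = current;   // published for superstep s + 1
        }
        return seen;
    }

    public static void main(String[] args) {
        // prints [0, 6, 12]
        System.out.println(visibleAggregates(new long[] {1, 2, 3}, 3));
    }
}
```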

Scatter-Gather Iterations

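The scatter-gather model, also known as the "signal/collect" model, is another way of expressing graph computation from the perspective of a vertex. The computation proceeds in synchronized iterations, each of which is called a superstep. In each superstep, every vertex produces messages for other vertices and updates its own value based on the messages it receives. Producing messages is called scatter (or signal); receiving messages and updating the vertex value is called gather (or collect). To use the scatter-gather model in Flink Gelly, the user only needs to define the following two vertex operations for each superstep:

Scatter: produces the messages that a vertex sends to other vertices.
Gather: updates the vertex value using the messages received from other vertices.

Gelly provides methods for scatter-gather iterations; the user only needs to implement the two functions that correspond to the scatter and gather phases. The first, ScatterFunction, lets a vertex send messages to other vertices. Messages are received in the same superstep in which they are sent. The second, GatherFunction, defines how a vertex updates its value after receiving messages. These two functions and the maximum number of iterations are passed as parameters to Gelly's runScatterGatherIteration. This method runs the scatter-gather iteration on the input graph and returns a new graph with updated vertex values.
A scatter-gather iteration can be extended with additional information such as the total number of vertices and the in- and out-degrees. It is also possible to choose which type of neighborhood (in/out/all) the iteration operates on. By default, a vertex receives messages only from its in-neighbors and sends messages to its out-neighbors.
The following walks through single-source shortest paths with scatter-gather iterations and shows what happens within one superstep. Vertex 1 is the source. In each superstep, each vertex sends a candidate distance message to its out-neighbors; the message value is the vertex's current value plus the weight of the edge leading to that neighbor. Each vertex then updates its value based on the candidate distances it received: if the minimum of the received messages is smaller than its current value, the vertex sets its value to that minimum; otherwise it does nothing. In subsequent supersteps, a vertex whose value did not change during a superstep does not send messages to its neighbors. The process converges when no vertex value changes any more or the specified number of iterations is reached.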

[Figure: scatter-gather iteration for single-source shortest paths]
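
The scatter and gather phases can be traced with a plain-Java sketch. It is a simplified simulation, not Gelly's implementation: `ScatterGatherSketch`, its `run` method, and the 0-based vertex IDs are assumptions made for the example. It assumes that every vertex is active in the first superstep and that only vertices updated in the previous superstep scatter afterwards; since the gather step of shortest paths only needs the minimum, the sketch keeps just the smallest message per target.

```java
import java.util.Arrays;

// Illustrative only: scatter and gather phases of SSSP in plain Java.
class ScatterGatherSketch {

    // edges[i] = {sourceId, targetId, weight}; vertex IDs are 0..n-1
    static double[] run(int n, double[][] edges, int srcId, int maxIterations) {
        double[] value = new double[n];
        Arrays.fill(value, Double.POSITIVE_INFINITY);
        value[srcId] = 0d;

        boolean[] active = new boolean[n];
        Arrays.fill(active, true); // assumption: all vertices scatter in superstep 1

        for (int superstep = 1; superstep <= maxIterations; superstep++) {
            // Scatter: active vertices send candidate distances to their out-neighbors.
            double[] minMsg = new double[n];
            Arrays.fill(minMsg, Double.POSITIVE_INFINITY);
            for (double[] e : edges) {
                int src = (int) e[0];
                int trg = (int) e[1];
                if (active[src]) {
                    minMsg[trg] = Math.min(minMsg[trg], value[src] + e[2]);
                }
            }

            // Gather: in the same superstep, each vertex compares and updates.
            boolean changed = false;
            for (int v = 0; v < n; v++) {
                active[v] = minMsg[v] < value[v];
                if (active[v]) {
                    value[v] = minMsg[v];
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no vertex value changed
            }
        }
        return value;
    }

    public static void main(String[] args) {
        double[][] edges = {{0, 1, 1}, {0, 2, 4}, {1, 2, 2}, {2, 3, 1}};
        // prints [0.0, 1.0, 3.0, 4.0]
        System.out.println(Arrays.toString(run(4, edges, 0, 10)));
    }
}
```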

A code example follows:

// read the input graph
Graph<Long, Double, Double> graph = ...

// define the maximum number of iterations
int maxIterations = 10;

// Execute the scatter-gather iteration
Graph<Long, Double, Double> result = graph.runScatterGatherIteration(
            new MinDistanceMessenger(), new VertexDistanceUpdater(), maxIterations);

// Extract the vertices as the result
DataSet<Vertex<Long, Double>> singleSourceShortestPaths = result.getVertices();


// - - -  UDFs - - - //

// scatter: messaging
public static final class MinDistanceMessenger extends ScatterFunction<Long, Double, Double, Double> {

    public void sendMessages(Vertex<Long, Double> vertex) {
        for (Edge<Long, Double> edge : getEdges()) {
            sendMessageTo(edge.getTarget(), vertex.getValue() + edge.getValue());
        }
    }
}

// gather: vertex update
public static final class VertexDistanceUpdater extends GatherFunction<Long, Double, Double> {

    public void updateVertex(Vertex<Long, Double> vertex, MessageIterator<Double> inMessages) {
        Double minDistance = Double.MAX_VALUE;

        for (double msg : inMessages) {
            if (msg < minDistance) {
                minDistance = msg;
            }
        }

        if (vertex.getValue() > minDistance) {
            setNewVertexValue(minDistance);
        }
    }
}

Configuring a Scatter-Gather Iteration

A scatter-gather iteration can be configured with a ScatterGatherConfiguration object. Currently, the following parameters can be specified:

Name: the name of the scatter-gather iteration, shown in logs and messages. It can be set with the setName() method.
Parallelism: the parallelism of the iteration. It can be set with the setParallelism() method.
Solution set in unmanaged memory: defines whether the solution set is kept in managed memory (where Flink stores objects internally in serialized form) or as a plain object map. By default, the solution set is kept in managed memory. This property can be set with the setSolutionSetUnmanagedMemory() method.
Aggregators: iteration aggregators can be registered with the registerAggregator() method. An aggregator combines all aggregated values globally once per superstep and makes the result available in the next superstep. Registered aggregators can be accessed inside the user-defined ScatterFunction and GatherFunction.
Broadcast Variables: DataSets can be passed as Broadcast Variables to the ScatterFunction and the GatherFunction with the addBroadcastSetForScatterFunction() and addBroadcastSetForGatherFunction() methods, respectively.
Number of Vertices: gives access to the total number of vertices within the iteration. It is enabled with the setOptNumVertices() method, and the value is then read with getNumberOfVertices(). If the option is not set, this method returns -1.
Degrees: gives access to the in-degree and out-degree of a vertex within the iteration. It is enabled with the setOptDegrees() method, and the values are then read with getInDegree() and getOutDegree(). If the option is not set, both methods return -1.
Messaging Direction: by default, a vertex sends messages to its out-neighbors and updates its value based on messages received from its in-neighbors. This option lets the user change the messaging direction to EdgeDirection.IN, EdgeDirection.OUT, or EdgeDirection.ALL with the setDirection() method; messages are then sent along the chosen direction.

Examples of setting some of these parameters:

Graph<Long, Double, Double> graph = ...

// configure the iteration
ScatterGatherConfiguration parameters = new ScatterGatherConfiguration();

// set the iteration name
parameters.setName("Gelly Iteration");

// set the parallelism
parameters.setParallelism(16);

// register an aggregator
parameters.registerAggregator("sumAggregator", new LongSumAggregator());

// run the scatter-gather iteration, also passing the configuration parameters
Graph<Long, Double, Double> result =
            graph.runScatterGatherIteration(
            new Messenger(), new VertexUpdater(), maxIterations, parameters);

// user-defined functions
public static final class Messenger extends ScatterFunction {...}

public static final class VertexUpdater extends GatherFunction {

    LongSumAggregator aggregator = new LongSumAggregator();

    public void preSuperstep() {

        // retrieve the Aggregator
        aggregator = getIterationAggregator("sumAggregator");
    }


    public void updateVertex(Vertex<Long, Double> vertex, MessageIterator inMessages) {

        //do some computation
        Long partialValue = ...

        // aggregate the partial value
        aggregator.aggregate(partialValue);

        // update the vertex value
        setNewVertexValue(...);
    }
}


//////////////////////////////////////////////////////////////////////////////
//The following example illustrates the usage of the degree as well as the number of vertices options.
//////////////////////////////////////////////////////////////////////////////

Graph<Long, Double, Double> graph = ...

// configure the iteration
ScatterGatherConfiguration parameters = new ScatterGatherConfiguration();

// set the number of vertices option to true
parameters.setOptNumVertices(true);

// set the degree option to true
parameters.setOptDegrees(true);

// run the scatter-gather iteration, also passing the configuration parameters
Graph<Long, Double, Double> result =
            graph.runScatterGatherIteration(
            new Messenger(), new VertexUpdater(), maxIterations, parameters);

// user-defined functions
public static final class Messenger extends ScatterFunction {
    ...
    // retrieve the vertex out-degree
    long outDegree = getOutDegree();
    ...
}

public static final class VertexUpdater extends GatherFunction {
    ...
    // get the number of vertices
    long numVertices = getNumberOfVertices();
    ...
}
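
What the two options expose can be illustrated with a small plain-Java sketch. It is a hypothetical stand-in and not the Gelly API: `DegreeOptionsSketch` takes the vertex ID as an explicit argument, whereas in Gelly the getters refer to the vertex currently being processed. The -1 return values mirror the behavior described above for options that are not enabled.

```java
// Illustrative only: what the degree and vertex-count options make available.
class DegreeOptionsSketch {

    private final long numVertices;
    private final long[][] edges; // edges[i] = {sourceId, targetId}
    private final boolean optDegrees;
    private final boolean optNumVertices;

    DegreeOptionsSketch(long numVertices, long[][] edges,
                        boolean optDegrees, boolean optNumVertices) {
        this.numVertices = numVertices;
        this.edges = edges;
        this.optDegrees = optDegrees;
        this.optNumVertices = optNumVertices;
    }

    long getOutDegree(long vertexId) {
        return optDegrees ? count(vertexId, 0) : -1;
    }

    long getInDegree(long vertexId) {
        return optDegrees ? count(vertexId, 1) : -1;
    }

    long getNumberOfVertices() {
        return optNumVertices ? numVertices : -1;
    }

    // Counts the edges whose endpoint at the given position equals vertexId.
    private long count(long vertexId, int position) {
        long n = 0;
        for (long[] e : edges) {
            if (e[position] == vertexId) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        long[][] edges = {{1, 2}, {1, 3}, {2, 3}};
        DegreeOptionsSketch on = new DegreeOptionsSketch(3, edges, true, true);
        DegreeOptionsSketch off = new DegreeOptionsSketch(3, edges, false, false);
        System.out.println(on.getOutDegree(1) + " " + on.getInDegree(3) + " " + on.getNumberOfVertices());
        System.out.println(off.getOutDegree(1) + " " + off.getNumberOfVertices());
    }
}
```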


//////////////////////////////////////////////////////////////////////////////
//The following example illustrates the usage of the edge direction option. Vertices update their values to contain a list of all their in-neighbors.
//////////////////////////////////////////////////////////////////////////////

Graph<Long, HashSet<Long>, Double> graph = ...

// configure the iteration
ScatterGatherConfiguration parameters = new ScatterGatherConfiguration();

// set the messaging direction
parameters.setDirection(EdgeDirection.IN);

// run the scatter-gather iteration, also passing the configuration parameters
DataSet<Vertex<Long, HashSet<Long>>> result =
            graph.runScatterGatherIteration(
            new Messenger(), new VertexUpdater(), maxIterations, parameters)
            .getVertices();

// user-defined functions
public static final class Messenger extends ScatterFunction {...}

public static final class VertexUpdater extends GatherFunction {...}

Gather-Sum-Apply Iterations

Like the scatter-gather model, the Gather-Sum-Apply model (GSA below) proceeds in synchronized iterations, each of which is called a superstep. Each GSA superstep consists of the following three phases:

Gather: a user-defined function is invoked in parallel on the edges and neighbors of each vertex, producing a partial value.
Sum: the partial values produced in the Gather phase are aggregated into a single value in a user-defined way.
Apply: each vertex value is updated by applying a function to the current value and the aggregated value produced by the Sum phase.
Again we use single-source shortest paths as the example. As in the figure below, assume vertex 1 is the source. In the Gather phase, a candidate distance is computed for each neighbor of a vertex by adding the vertex value to the weight of the connecting edge. In the Sum phase, the candidate distances are grouped by vertex ID and the minimum is chosen for each vertex. Finally, in the Apply phase, the minimum chosen by the Sum phase is compared with the vertex's current value; if it is smaller, it replaces the current value.

[Figure: Gather-Sum-Apply iteration for single-source shortest paths]

As in the scatter-gather model, if a vertex's value does not change during an iteration, that vertex does not compute candidate distances in the next iteration. The algorithm converges when no vertex value changes.
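
The three phases can be traced with a plain-Java sketch. It is a simplified simulation and not Gelly's implementation: `GsaSketch`, its `run` method, and the 0-based vertex IDs are assumptions for the example, and for brevity it recomputes candidates for every edge in every superstep instead of skipping unchanged vertices.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: Gather, Sum and Apply phases of SSSP in plain Java.
class GsaSketch {

    // edges[i] = {sourceId, targetId, weight}; vertex IDs are 0..n-1
    static double[] run(int n, double[][] edges, int srcId, int maxIterations) {
        double[] value = new double[n];
        Arrays.fill(value, Double.POSITIVE_INFINITY);
        value[srcId] = 0d;

        for (int superstep = 1; superstep <= maxIterations; superstep++) {
            Map<Integer, Double> summed = new HashMap<>();
            for (double[] e : edges) {
                // Gather: one candidate per edge (neighbor value + edge weight)
                double candidate = value[(int) e[0]] + e[2];
                // Sum: pairwise reduction of the candidates per target vertex
                summed.merge((int) e[1], candidate, Double::min);
            }

            // Apply: replace the current value only if the reduced value is smaller
            boolean changed = false;
            for (Map.Entry<Integer, Double> s : summed.entrySet()) {
                if (s.getValue() < value[s.getKey()]) {
                    value[s.getKey()] = s.getValue();
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no vertex value changed
            }
        }
        return value;
    }

    public static void main(String[] args) {
        double[][] edges = {{0, 1, 1}, {0, 2, 4}, {1, 2, 2}, {2, 3, 1}};
        // prints [0.0, 1.0, 3.0, 4.0]
        System.out.println(Arrays.toString(run(4, edges, 0, 10)));
    }
}
```

Note that the Sum step only ever sees two values at a time, which is why it works for a minimum but would not work for a function that needs all candidates at once.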
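To use GSA in Gelly, call the runGatherSumApplyIteration method and provide three user-defined functions: GatherFunction, SumFunction, and ApplyFunction. Iteration synchronization, grouping, value updates, and convergence are handled by the system. Example code: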

// read the input graph
Graph<Long, Double, Double> graph = ...

// define the maximum number of iterations
int maxIterations = 10;

// Execute the GSA iteration
Graph<Long, Double, Double> result = graph.runGatherSumApplyIteration(
                new CalculateDistances(), new ChooseMinDistance(), new UpdateDistance(), maxIterations);

// Extract the vertices as the result
DataSet<Vertex<Long, Double>> singleSourceShortestPaths = result.getVertices();


// - - -  UDFs - - - //

// Gather
private static final class CalculateDistances extends GatherFunction<Double, Double, Double> {

    public Double gather(Neighbor<Double, Double> neighbor) {
        return neighbor.getNeighborValue() + neighbor.getEdgeValue();
    }
}

// Sum
private static final class ChooseMinDistance extends SumFunction<Double, Double, Double> {

    public Double sum(Double newValue, Double currentValue) {
        return Math.min(newValue, currentValue);
    }
}

// Apply
private static final class UpdateDistance extends ApplyFunction<Long, Double, Double> {

    public void apply(Double newDistance, Double oldDistance) {
        if (newDistance < oldDistance) {
            setResult(newDistance);
        }
    }
}

Configuring a Gather-Sum-Apply Iteration

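A GSA iteration can be configured with a GSAConfiguration object. Currently, the following parameters can be specified:

Name: the name of the gather-sum-apply iteration, shown in logs and messages. It can be set with the setName() method.
Parallelism: the parallelism of the iteration. It can be set with the setParallelism() method.
Solution set in unmanaged memory: defines whether the solution set is kept in managed memory (where Flink stores objects internally in serialized form) or as a plain object map. By default, the solution set is kept in managed memory. This property can be set with the setSolutionSetUnmanagedMemory() method.
Aggregators: iteration aggregators can be registered with the registerAggregator() method. An aggregator combines all aggregated values globally once per superstep and makes the result available in the next superstep. Registered aggregators can be accessed inside the user-defined GatherFunction, SumFunction, and ApplyFunction.
Broadcast Variables: DataSets can be passed as Broadcast Variables to the GatherFunction, SumFunction, and ApplyFunction with the addBroadcastSetForGatherFunction(), addBroadcastSetForSumFunction(), and addBroadcastSetForApplyFunction() methods, respectively.
Number of Vertices: gives access to the total number of vertices within the iteration. It is enabled with the setOptNumVertices() method, and the value is then read with getNumberOfVertices(). If the option is not set, this method returns -1.
Neighbor Direction: similar to the Messaging Direction in scatter-gather. By default, values are gathered from the out-neighbors of a vertex. This can be changed with the setDirection() method; the available directions are EdgeDirection.IN, EdgeDirection.OUT, and EdgeDirection.ALL.

Example code: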

Graph<Long, Double, Double> graph = ...

// configure the iteration
GSAConfiguration parameters = new GSAConfiguration();

// set the number of vertices option to true
parameters.setOptNumVertices(true);

// run the gather-sum-apply iteration, also passing the configuration parameters
Graph<Long, Double, Double> result = graph.runGatherSumApplyIteration(
                new Gather(), new Sum(), new Apply(),
                maxIterations, parameters);

// user-defined functions
public static final class Gather extends GatherFunction {
    ...
    // get the number of vertices
    long numVertices = getNumberOfVertices();
    ...
}

public static final class Sum extends SumFunction {
    ...
    // get the number of vertices
    long numVertices = getNumberOfVertices();
    ...
}

public static final class Apply extends ApplyFunction {
    ...
    // get the number of vertices
    long numVertices = getNumberOfVertices();
    ...
}


//////////////////////////////////////////////////////////////////////
//The following example illustrates the usage of the edge direction option.
//////////////////////////////////////////////////////////////////////
Graph<Long, HashSet<Long>, Double> graph = ...

// configure the iteration
GSAConfiguration parameters = new GSAConfiguration();

// set the messaging direction
parameters.setDirection(EdgeDirection.IN);

// run the gather-sum-apply iteration, also passing the configuration parameters
DataSet<Vertex<Long, HashSet<Long>>> result =
            graph.runGatherSumApplyIteration(
            new Gather(), new Sum(), new Apply(), maxIterations, parameters)
            .getVertices();

Iteration Abstractions Comparison

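Although the three iteration abstractions in Gelly look very similar, understanding their differences helps improve both the performance and the maintainability of a program. Of the three, the vertex-centric model is the most general: it supports arbitrary computation and messaging for each vertex. The scatter-gather model decouples the logic that produces messages from the logic that updates vertex values, which makes scatter-gather programs easier to follow and maintain, and the separation can also have a positive effect on performance. Because a scatter-gather implementation does not need concurrent access to the received messages and the outgoing messages, it typically has lower memory requirements. This property also limits expressiveness, however, and makes some computation patterns less intuitive. In particular, if an algorithm requires a vertex to access its received and outgoing messages at the same time, expressing it in scatter-gather can be problematic; strongly connected components is one example.
The GSA model is also very similar to scatter-gather. In fact, any computation that can be expressed in GSA can also be written in scatter-gather. The Apply phase is only used to update the current vertex value. The main difference between the two implementations is that the Gather phase of GSA parallelizes the computation over the edges, whereas the messaging phase of scatter-gather parallelizes it over the vertices. Another difference lies in the implementation: scatter-gather uses a coGroup operation internally, while GSA uses a reduce. Therefore, if the function that combines neighbor values (messages) needs the whole group of values for its computation, scatter-gather should be used. If the update function is associative and commutative, GSA is expected to provide a more efficient implementation, because it can make use of a combiner.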
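
The distinction between a pairwise reduce and a function that needs the whole group can be shown with a short plain-Java sketch. The class and method names are hypothetical, and the median is only an assumed example of a function that is not safe to apply pairwise. A minimum gives the same result however the values are paired up, which is what lets a reduce-based implementation use a combiner; a median does not.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: pairwise-reducible vs. whole-group functions.
class CombineSketch {

    // min is associative and commutative, so a pairwise reduce in any order is safe
    static double reduceMin(List<Double> values) {
        return values.stream().reduce(Double.POSITIVE_INFINITY, Double::min);
    }

    // the median needs to see the whole group of values at once
    static double median(List<Double> values) {
        double[] sorted = values.stream().mapToDouble(Double::doubleValue).sorted().toArray();
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    public static void main(String[] args) {
        // Same minimum regardless of order: prints 3.0 twice
        System.out.println(reduceMin(Arrays.asList(5.0, 3.0, 4.0)));
        System.out.println(reduceMin(Arrays.asList(4.0, 5.0, 3.0)));

        // Whole group: prints 2.0
        System.out.println(median(Arrays.asList(1.0, 2.0, 9.0)));
        // Applied pairwise, the result is different: prints 5.25
        System.out.println(median(Arrays.asList(median(Arrays.asList(1.0, 2.0)), 9.0)));
    }
}
```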
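Note also that GSA works strictly on the neighborhood of a vertex, whereas in the vertex-centric and scatter-gather models a vertex can send a message to any vertex by its ID, whether or not the two are neighbors.
The main differences among the three iteration models are summarized in the table below: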

[Table image: comparison of the three iteration models]

Note: a recent project needs Flink Gelly for part of its graph computation, so I read through the official documentation. I was short on time, so please point out any mistakes!

Reference: https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/libs/gelly/iterative_graph_processing.html
