Facebook: BigGraph Documentation - Evaluation (PyTorch)

Contents

Graph embedding is a method for generating unsupervised node features from a graph; the resulting features can then be used in a variety of machine learning tasks. Modern graphs, especially in industrial applications, often contain billions of nodes and trillions of edges, which is beyond the capacity of existing embedding systems. Facebook has open-sourced an embedding system, PyTorch-BigGraph (PBG), which makes several modifications to traditional multi-relation embedding systems so that it can scale to graphs with billions of nodes and trillions of edges.

This series is a translation of the official PyTorch-BigGraph documentation, intended to help readers get started with GNNs and their use quickly. There are fifteen articles in total; if you spot any errors, please contact the author.

(1) Facebook's Open-Source Graph Neural Network: PyTorch-BigGraph

(2) Facebook: BigGraph Documentation - Data Model (PyTorch)

(3) Facebook: BigGraph Documentation - From Entity Embeddings to Edge Scores (PyTorch)

(4) Facebook: BigGraph Documentation - I/O Format (PyTorch)

(5) Facebook: BigGraph Documentation - Batch Preparation

(6) Facebook: BigGraph Documentation - Distributed Mode (PyTorch)

(7) Facebook: BigGraph Documentation - Loss Calculation (PyTorch)

(8) Facebook: BigGraph Documentation - Evaluation (PyTorch)


Evaluation

During training, the average loss is reported for each edge bucket at each pass. Evaluation metrics can be computed on held-out data during or after training to measure the quality of trained embeddings.

Offline evaluation

The torchbiggraph_eval command will perform an offline evaluation of trained PBG embeddings on a validation dataset. This dataset should contain held-out data not included in the training dataset. It is invoked in the same way as the training command and takes the same arguments.

It is generally advisable to have two versions of the config file, one for training and one for evaluation, with the same parameters except for the edge paths, in order to evaluate a separate (and often smaller) set of edges. (It’s also possible to use a single config file and have it produce different output based on environment variables or other context). Training-specific config parameters (e.g., the learning rate, loss function, …) will be ignored during evaluation.

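As an illustration of the single-file approach, here is a minimal sketch of a config that switches edge paths based on an environment variable. It follows PBG's convention of defining a get_torchbiggraph_config function that returns a dict; all paths, entity/relation names, and values here are hypothetical.

```python
# Hypothetical config sketch: one file serving both training and evaluation.
# Run with: torchbiggraph_train my_config.py  or  torchbiggraph_eval my_config.py
import os

def get_torchbiggraph_config():
    # Evaluate on a held-out edge list when PBG_EVAL=1, otherwise train.
    evaluating = os.environ.get("PBG_EVAL", "0") == "1"
    return dict(
        entity_path="data/entities",
        edge_paths=["data/edges_valid" if evaluating else "data/edges_train"],
        checkpoint_path="model/checkpoints",
        entities={"all": {"num_partitions": 1}},
        relations=[
            {"name": "follows", "lhs": "all", "rhs": "all", "operator": "none"},
        ],
        dimension=400,
        # Training-specific parameters; ignored by torchbiggraph_eval.
        num_epochs=10,
        lr=0.1,
    )
```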

The metrics are first reported on each bucket, and a global average is computed at the end. (If multiple edge paths are in use, metrics are computed separately for each of them but still ultimately averaged).

Many metrics are statistics based on the “ranks” of the edges of the validation set. The rank of a positive edge is determined by the rank of its score against the scores of a certain number of negative edges. A rank of 1 is the “best” outcome as it means that the positive edge had a higher score than all the negatives. Higher values are “worse” as they indicate that the positive didn’t stand out.

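Concretely, the rank is one plus the number of negatives scoring at least as high as the positive. A minimal sketch (illustrative only; PBG's internal tie-breaking convention may differ):

```python
def rank_of_positive(pos_score, neg_scores):
    # Rank 1 means the positive edge outscored every sampled negative;
    # ties count against the positive in this sketch.
    return 1 + sum(1 for s in neg_scores if s >= pos_score)

# Example: the positive beats two of three negatives, so its rank is 2.
assert rank_of_positive(0.9, [0.5, 0.95, 0.1]) == 2
```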

It may happen that some of the negative samples used in the rank computation are in fact other positive samples, which are expected to have a high score and may thus cause adverse effects on the rank. This effect is especially visible on smaller graphs, in particular when all other entities are used to construct the negatives. To fix it, and to match what is typically done in the literature, a so-called “filtered” rank is used in the FB15k demo script (and there only), where positive samples are filtered out when computing the rank of an edge. It is hard to scale this technique to large graphs, and thus it is not enabled globally. However, filtering is less important on large graphs as it’s less likely to see a training edge among the sampled negatives.

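A sketch of the filtered variant, assuming the full set of true edges is available: sampled negatives that are in fact true edges are dropped before ranking. The names here are illustrative, not the demo script's actual code.

```python
def filtered_rank(pos_score, scored_negatives, true_edges):
    # scored_negatives: iterable of (edge, score) pairs. Drop any "negative"
    # that is actually a known true edge, then rank against the rest.
    kept_scores = [s for e, s in scored_negatives if e not in true_edges]
    return 1 + sum(1 for s in kept_scores if s >= pos_score)
```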

The metrics are listed below (a short sketch computing them from a list of ranks follows the list):

Mean Rank: the average of the ranks of all positives (lower is better, best is 1).

Mean Reciprocal Rank (MRR): the average of the reciprocal of the ranks of all positives (higher is better, best is 1).

Hits@1: the fraction of positives that rank better than all their negatives, i.e., have a rank of 1 (higher is better, best is 1).

Hits@10: the fraction of positives that rank in the top 10 among their negatives (higher is better, best is 1).

Hits@50: the fraction of positives that rank in the top 50 among their negatives (higher is better, best is 1).

Area Under the Curve (AUC): an estimation of the probability that a randomly chosen positive scores higher than a randomly chosen negative (any negative, not only the negatives constructed by corrupting that positive).
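Putting the rank-based metrics together, here is a small sketch that computes them from a list of ranks (one rank per validation edge), plus a naive sampling estimate of the AUC as defined above; none of this is PBG's actual implementation.

```python
import random

def rank_metrics(ranks):
    n = len(ranks)
    return {
        "mean_rank": sum(ranks) / n,
        "mrr": sum(1.0 / r for r in ranks) / n,
        "hits@1": sum(1 for r in ranks if r <= 1) / n,
        "hits@10": sum(1 for r in ranks if r <= 10) / n,
        "hits@50": sum(1 for r in ranks if r <= 50) / n,
    }

def auc_estimate(pos_scores, neg_scores, num_samples=100_000):
    # P(random positive > random negative), estimated by sampling pairs.
    wins = sum(
        1
        for _ in range(num_samples)
        if random.choice(pos_scores) > random.choice(neg_scores)
    )
    return wins / num_samples
```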


Evaluation during training

Offline evaluation is a slow process that is intended to be run after training is complete to evaluate the final model on a held-out set of edges constructed by the user. However, it’s useful to be able to monitor overfitting as training progresses. PBG offers this functionality, by calculating the same metrics as the offline evaluation before and after each pass on a small set of training edges. These stats are printed to the logs.

The metrics are computed on a set of edges that is held out automatically from the training set. To be more explicit: using this feature means that training happens on fewer edges, as some are excluded and reserved for this evaluation. The holdout fraction is controlled by the eval_fraction config parameter (setting it to zero thus disables this feature). The evaluations before and after each training iteration happen on the same set of edges, thus are comparable. Moreover, the evaluations for the same edge chunk, edge path and bucket at different epochs also use the same set of edges.

Evaluation metrics are computed both before and after training each edge bucket because this provides insight into whether the partitioned training is working. If the partitioned training is converging, then the gap between the “before” and “after” statistics should go to zero over time. On the other hand, if the partitioned training is causing the model to overfit on each edge bucket (thus decreasing performance for other edge buckets) then there will be a persistent gap between the “before” and “after” statistics.

It’s possible to use different batch sizes for same-batch and uniform negative sampling by tuning the eval_num_batch_negs and eval_num_uniform_negs config parameters.
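All of these knobs live in the config file. A sketch of the relevant fragment follows; the training-side counterparts (num_batch_negs, num_uniform_negs) are included for contrast, and the numbers are illustrative values of my own choosing, not recommendations.

```python
def get_torchbiggraph_config():
    return dict(
        # ... entity, relation, and path settings as in the training config ...
        eval_fraction=0.05,          # hold out 5% of training edges; 0 disables
        num_batch_negs=50,           # same-batch negatives used in training
        num_uniform_negs=50,         # uniformly sampled negatives in training
        eval_num_batch_negs=1000,    # larger pools for the per-pass evaluation
        eval_num_uniform_negs=1000,
    )
```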
