Structure of this article:
- M-NMF
- LANE
- LINE
What is Network Embedding?
LINE
- [Information Network] An information network is defined as $G = (V, E)$, where $V$ is the set of vertices, each representing a data object, and $E$ is the set of edges between the vertices, each representing a relationship between two data objects. Each edge $e \in E$ is an ordered pair $e = (u, v)$ and is associated with a weight $w_{uv} > 0$, which indicates the strength of the relation. If $G$ is undirected, we have $(u, v) \equiv (v, u)$ and $w_{uv} \equiv w_{vu}$; if $G$ is directed, we have $(u, v) \not\equiv (v, u)$ and $w_{uv} \not\equiv w_{vu}$.
- [First-order Proximity] The first-order proximity in a network is the local pairwise proximity between two vertices. For each pair of vertices linked by an edge $(u, v)$, the weight on that edge, $w_{uv}$, indicates the first-order proximity between $u$ and $v$. If no edge is observed between $u$ and $v$, their first-order proximity is 0. The first-order proximity usually implies the similarity of two nodes in a real-world network.
LINE with First-order Proximity: The first-order proximity refers to the local pairwise proximity between the vertices in the network. For each undirected edge $(i, j)$, the joint probability between vertex $v_i$ and $v_j$ is defined as follows:
$$p_1(v_i, v_j) = \frac{1}{1 + \exp(-\vec{u}_i^{\,T} \cdot \vec{u}_j)},$$
where $\vec{u}_i \in R^d$ is the low-dimensional vector representation of vertex $v_i$. The empirical probability is defined as $\hat{p}_1(i, j) = \frac{w_{ij}}{W}$, where $W = \sum_{(i,j) \in E} w_{ij}$. To preserve the first-order proximity we can minimize the following objective function:
$$O_1 = d\big(\hat{p}_1(\cdot,\cdot),\, p_1(\cdot,\cdot)\big),$$
where $d(\cdot,\cdot)$ is the distance between two distributions. We choose to minimize the KL-divergence of the two probability distributions. Replacing $d(\cdot,\cdot)$ with KL-divergence and omitting some constants, we have:
$$O_1 = -\sum_{(i,j) \in E} w_{ij} \log p_1(v_i, v_j).$$
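To make this concrete, here is a minimal Breeze sketch (my own illustration, not part of the original post) that evaluates $p_1$ and one edge's contribution to $O_1$; the vectors and the weight are toy values, and the snippet can be run in a Scala REPL with Breeze on the classpath:
import breeze.linalg._
// toy embeddings of vertices i and j (d = 4) and an edge weight w_ij
val ui = DenseVector(0.1, 0.3, -0.2, 0.5)
val uj = DenseVector(0.2, 0.1, 0.0, 0.4)
val wij = 2.0
// p_1(v_i, v_j) = sigmoid(u_i . u_j)
val p1 = 1.0 / (1.0 + math.exp(-(ui dot uj)))
// this edge's contribution to O_1 = -w_ij * log p_1(v_i, v_j)
val o1Term = -wij * math.log(p1)
println(s"p1 = $p1, O1 contribution = $o1Term")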
- [Second-order Proximity] The second-order proximity between a pair of vertices $(u, v)$ in a network is the similarity between their neighborhood network structures. Mathematically, let $p_u = (w_{u,1}, \ldots, w_{u,|V|})$ denote the first-order proximity of $u$ with all the other vertices; then the second-order proximity between $u$ and $v$ is determined by the similarity between $p_u$ and $p_v$. If no vertex is linked from/to both $u$ and $v$, the second-order proximity between $u$ and $v$ is 0.
The second-order proximity assumes that vertices sharing many connections to other vertices are similar to each other. In this case, each vertex is also treated as a specific “context” and vertices with similar distributions over the “contexts” are assumed to be similar.
Therefore, each vertex plays two roles: the vertex itself and a specific "context" of other vertices. We introduce two vectors $\vec{u}_i$ and $\vec{u}_i'$, where $\vec{u}_i$ is the representation of $v_i$ when it is treated as a vertex while $\vec{u}_i'$ is the representation of $v_i$ when it is treated as a specific "context". For each directed edge $(i, j)$, we first define the probability of "context" $v_j$ generated by vertex $v_i$ as:
$$p_2(v_j \mid v_i) = \frac{\exp(\vec{u}_j'^{\,T} \cdot \vec{u}_i)}{\sum_{k=1}^{|V|} \exp(\vec{u}_k'^{\,T} \cdot \vec{u}_i)},$$
where $|V|$ is the number of vertices or "contexts".
The second-order proximity assumes that vertices with similar distributions over the contexts are similar to each other. To preserve the second-order proximity, we should make the conditional distribution of the contexts $p_2(\cdot \mid v_i)$ specified by the low-dimensional representation be close to the empirical distribution $\hat{p}_2(\cdot \mid v_i)$. Therefore, we minimize the following objective function:
$$O_2 = \sum_{i \in V} \lambda_i \, d\big(\hat{p}_2(\cdot \mid v_i),\, p_2(\cdot \mid v_i)\big),$$
where $d(\cdot,\cdot)$ is the distance between two distributions and $\lambda_i$ represents the prestige of vertex $i$ in the network, which can be measured by the degree or estimated through algorithms such as PageRank. The empirical distribution is defined as $\hat{p}_2(v_j \mid v_i) = \frac{w_{ij}}{d_i}$, where $w_{ij}$ is the weight of the edge $(i, j)$ and $d_i = \sum_{k \in N(i)} w_{ik}$ is the out-degree of vertex $i$. Adopting KL-divergence as the distance function, setting $\lambda_i = d_i$ and omitting some constants, we have:
$$O_2 = -\sum_{(i,j) \in E} w_{ij} \log p_2(v_j \mid v_i).$$
By learning $\{\vec{u}_i\}_{i=1,\ldots,|V|}$ and $\{\vec{u}_i'\}_{i=1,\ldots,|V|}$ that minimize this objective, we are able to represent every vertex $v_i$ with a $d$-dimensional vector $\vec{u}_i$.
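As a small illustration (again my own, not the original post's), the softmax form of $p_2$ can be evaluated directly with Breeze when the vertex set is tiny; `U` and `Uc` below are hypothetical vertex and context embedding matrices with one row per vertex:
import breeze.linalg._
import breeze.numerics._
// hypothetical embeddings: 4 vertices, dimension 2
val U  = DenseMatrix((0.1, 0.4), (0.3, -0.2), (0.0, 0.5), (-0.1, 0.2))  // vertex vectors u_i
val Uc = DenseMatrix((0.2, 0.1), (0.4, 0.3), (-0.3, 0.1), (0.1, 0.0))   // context vectors u_i'
val (i, j, wij) = (0, 2, 1.5)          // a directed edge i -> j with weight w_ij
// p_2(v_j | v_i) = exp(u_j'^T u_i) / sum_k exp(u_k'^T u_i)
val scores = Uc * U(i, ::).t           // vector of u_k'^T u_i for all k
val p2 = exp(scores) / sum(exp(scores))
// this edge's contribution to O_2 = -w_ij * log p_2(v_j | v_i)
val o2Term = -wij * math.log(p2(j))
println(s"p2(v_j|v_i) = ${p2(j)}, O2 contribution = $o2Term")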
- [Large-scale Information Network Embedding] Given a large network $G = (V, E)$, the problem of Large-scale Information Network Embedding aims to represent each vertex $v \in V$ in a low-dimensional space $R^d$, i.e., to learn a function $f_G: V \to R^d$, where $d \ll |V|$. In the space $R^d$, both the first-order proximity and the second-order proximity between the vertices are preserved.
We adopt the asynchronous stochastic gradient descent algorithm (ASGD) for optimizing the objective. In each step, the ASGD algorithm samples a mini-batch of edges and then updates the model parameters. If an edge $(i, j)$ is sampled, the gradient w.r.t. the embedding vector $\vec{u}_i$ of vertex $i$ is calculated as:
$$\frac{\partial O_2}{\partial \vec{u}_i} = w_{ij} \cdot \frac{\partial \log p_2(v_j \mid v_i)}{\partial \vec{u}_i}.$$
Optimizing this objective directly is computationally expensive, because it requires a summation over the entire set of vertices when calculating the conditional probability $p_2(\cdot \mid v_i)$. To address this problem, we adopt the approach of negative sampling, which specifies the following objective for each edge $(i, j)$:
$$\log \sigma(\vec{u}_j'^{\,T} \cdot \vec{u}_i) + \sum_{n=1}^{K} E_{v_n \sim P_n(v)}\big[\log \sigma(-\vec{u}_n'^{\,T} \cdot \vec{u}_i)\big],$$
where $\sigma(x) = 1/(1+\exp(-x))$ is the sigmoid function, $K$ is the number of negative edges sampled for each positive edge, and the noise distribution is $P_n(v) \propto d_v^{3/4}$ with $d_v$ the out-degree of vertex $v$.
Updating the parameters: the derivation above is for optimizing $O_2$, and the resulting vectors $\vec{u}_i$ are the second-order embeddings. The optimization of $O_1$ is similar to that of $O_2$, except that there is only one set of variables $\vec{u}_i$ to update: simply replace $\vec{u}_j'$ with $\vec{u}_j$ in the formulas above.
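The per-edge update can then be written compactly. Below is a minimal sketch of one negative-sampling gradient step for a single sampled edge; it mirrors what the Spark implementation later in this post does, but all values (the embeddings, the edge, the negative sample ids `negIds` and the learning rate) are toy values of my own:
import breeze.linalg._
import breeze.stats.distributions.Rand
def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))
// toy state: 4 vertices, dimension 2
val Uv = DenseMatrix.rand(4, 2, Rand.uniform)   // vertex vectors u_i
val Uc = DenseMatrix.rand(4, 2, Rand.uniform)   // context vectors u_i'
val (i, j, wij) = (0, 2, 1.0)                   // sampled edge (i, j) with weight w_ij
val negIds = Seq(1, 3)                          // negative samples drawn from P_n(v)
val lr = 0.1
val ui  = Uv(i, ::).t.copy                      // copies, so all gradients use the old values
val ujc = Uc(j, ::).t.copy
val sPos = sigmoid(ujc dot ui)
// gradients of the per-edge loss  -w_ij*[log sigmoid(u_j'^T u_i) + sum_k log sigmoid(-u_k'^T u_i)]
val gUi = ujc * ((sPos - 1.0) * wij)                     // d/du_i, positive term
Uc(j, ::) := (ujc - ui * ((sPos - 1.0) * wij * lr)).t    // descend on u_j'
for (k <- negIds) {
  val ukc = Uc(k, ::).t.copy
  val s = sigmoid(ukc dot ui)
  gUi += ukc * (s * wij)                                 // d/du_i, negative terms
  Uc(k, ::) := (ukc - ui * (s * wij * lr)).t             // descend on u_k'
}
Uv(i, ::) := (ui - gUi * lr).t                           // descend on u_i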
M-NMF
The objective function of M-NMF couples the vertex proximity matrix $S$, the basis matrix $M$, the embedding matrix $U$, the community indicator matrix $H$ and the community representation matrix $C$. It is not convex, and we separate the optimization into four subproblems and iteratively optimize them, which guarantees that each subproblem converges to a local minimum.
objective function:
$$\min_{M,U,H,C}\; \|S - MU^T\|_F^2 + \alpha\|H - UC^T\|_F^2 - \beta\,\mathrm{tr}(H^TBH), \quad \text{s.t.}\; M \ge 0,\ U \ge 0,\ H \ge 0,\ C \ge 0,\ \mathrm{tr}(H^TH) = n,$$
where $B$ is the modularity matrix and $\alpha$, $\beta$ are trade-off parameters.
M-subproblem: Updating M with the other parameters in the objective function fixed leads to a standard NMF formulation; the updating rule for M is:
$$M \leftarrow M \odot \frac{SU}{MU^TU},$$
where $\odot$ and the fraction denote element-wise multiplication and division.
U-subproblem: Updating U with the other parameters in the objective function fixed leads to a joint NMF problem; the updating rule is:
$$U \leftarrow U \odot \frac{S^TM + \alpha HC}{U(M^TM + \alpha C^TC)}.$$
C-subproblem: Updating C with the other parameters in the objective function fixed also leads to a standard NMF formulation; the updating rule of C is:
$$C \leftarrow C \odot \frac{H^TU}{CU^TU}.$$
H-subproblem: Updating H with the other parameters fixed is more involved because of the modularity term $-\beta\,\mathrm{tr}(H^TBH)$ and the constraint $\mathrm{tr}(H^TH) = n$. The KKT conditions yield a fixed-point equation that the solution must satisfy at convergence; given an initial value of H, H is updated successively by the element-wise multiplicative rule derived from that fixed-point equation until convergence.
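To make the multiplicative updates concrete, here is a small Breeze sketch of the M-update above; this is my own illustration, with `S` just a random symmetric non-negative matrix rather than a real proximity matrix, and the same element-wise pattern applies to the U- and C-updates:
import breeze.linalg._
import breeze.stats.distributions.Rand
val n = 6; val m = 3
val R = DenseMatrix.rand(n, n, Rand.uniform)
val S = (R + R.t) * 0.5                      // toy symmetric non-negative "proximity" matrix
var M = DenseMatrix.rand(n, m, Rand.uniform)
val U = DenseMatrix.rand(n, m, Rand.uniform)
// one multiplicative step: M <- M .* (S U) ./ (M U^T U), all operations element-wise
M = M *:* ((S * U) /:/ (M * (U.t * U)))
// Frobenius norm of the residual S - M U^T
println(norm((S - M * U.t).toDenseVector))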
LANE
For LANE, see another article on my blog: [Paper reading: LANE - Label Informed Attributed Network Embedding, theory and implementation](http://www.lxweimin.com/p/1abb24bb8a04)
Spark implementation of LINE with second-order proximity (O_2)
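A note on the expected inputs, as far as I can infer them from the code itself (so treat the exact format as an assumption rather than a specification): args(0) is a dense adjacency matrix with one comma-separated row per line, and args(1) is an edge list with one "sourceId,targetId,weight" triple per line, where the ids are 0-based row/column indices of that matrix; args(2) is the number of negative samples per edge and args(3) is the embedding dimension. For a toy 3-vertex graph the two files might look like:
adjacency matrix (args(0)):
0,1,0
1,0,2
0,2,0
edge list (args(1)):
0,1,1
1,0,1
1,2,2
2,1,2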
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import breeze.linalg._
import breeze.numerics._
import breeze.stats.distributions.Rand
import scala.math._
object LINE {
// Generate a list of `num` distinct random integers in [0, Range)
def RandList(Range:Int,num:Int) : List[Int] = {
var resultList:List[Int]=Nil
while (resultList.length < num){
val randomNum = (new util.Random).nextInt(Range)
if(!resultList.exists(s => s==randomNum )){
resultList=resultList:::List(randomNum)
}
}
return resultList
}
def RandNumber(Range:Int) : Int = {
val randomNum = (new util.Random).nextInt(Range)
return randomNum
}
def Sigmoid(In:Double): Double = {
var Out:Double = 1.0/(math.exp(-1.0*In)+1)
return Out
}
def main(args: Array[String]) {
if (args.length < 4) {
System.err.println("Usage: LINE <Adjacent Matrix> <Adjacent Edge> <Negative Sample> <dimension>")
System.exit(1)
}
// Number of negative samples
val NS = args(2).toInt
println("Negative Sample: "+NS)
// Dimension of the embedding
val Dim = args(3).toInt
println("Embedding dimension: "+Dim)
// Spark configuration and context
val conf = new SparkConf().setAppName("LINE")
val sc = new SparkContext(conf)
// Input: adjacency matrix file
val InputFile = sc.textFile(args(0),3)
// Input: edge list file
val EdgeFile = sc.textFile(args(1),3)
// Number of lines in the input file (i.e., number of vertices)
val InputFileCount = InputFile.count().toInt
println("InputFileCount(number of lines): "+InputFileCount)
// Edge sampling rate per iteration
val sample_rate : Double = 0.1
// Size of the hash table that maps [0, HashTableSize] to vertex ids for negative sampling
val HashTableSize: Int = 50000
println("HashTableSize: "+HashTableSize)
// Embedding matrices for LINE O_2 (second-order): vertex vectors and context vectors
var U_vertex = DenseMatrix.rand(InputFileCount, Dim, Rand.uniform)
var U_context = DenseMatrix.rand(InputFileCount, Dim, Rand.uniform)
// RDD of the adjacency matrix (one Array[Double] per row)
val Adjacent = InputFile.map(line => line.split(",")).map(splitline => splitline.map(word => word.toDouble))
val EdgeSet = EdgeFile.map(line => line.split(",")).map(splitline => splitline.map(word => word.toDouble))
// collect() will blow up when the data gets large; optimization point 1
val AdjacentCollect = Adjacent.collect()
// Number of rows and columns of the adjacency matrix
val rows = AdjacentCollect.length
val cols = AdjacentCollect(0).length
// Flatten the adjacency matrix into a one-dimensional array
val flattenAdjacent = AdjacentCollect.flatten
// Convert the adjacency matrix to a Breeze matrix (the data is row-major, hence the transpose)
val AdjacentMatrix = new DenseMatrix(cols,rows,flattenAdjacent).t
//println(Adjacent.take(10).toList)
// Adjacent.foreach{
// rdd => println(rdd.toList)
// }
// RDD of each vertex's (weighted) degree
val VertexDegree = Adjacent.map(line => line.reduce((x,y) => x+y))
// Sum of the degrees over all vertices
var SumOfDegree = VertexDegree.reduce((x,y)=>x+y)
//var SumOfDegree = sc.accumulator(0)
//VertexDegree.foreach(x => SumOfDegree += x)
// Smooth the vertex probabilities by raising them to the 3/4 power
val SmoothProbability = VertexDegree.map(degree => degree/SumOfDegree).map(math.pow(_,0.75))
// Compute the cumulative probabilities of SmoothProbability
val p : Array[Double] = SmoothProbability.collect()
val CumulativeProbability : Array[Double] = new Array[Double](InputFileCount)
var running_sum : Double = 0.0
for(i <- 0 to InputFileCount-1) {
running_sum = running_sum + p(i)
CumulativeProbability(i) = running_sum
}
// Normalize the cumulative probabilities, multiply by HashTableSize and truncate to get integers in [0, HashTableSize]
val HashProbability : Array[Int] = new Array[Int](InputFileCount)
// Maximum of the cumulative probabilities (the normalizer)
var max_cpro = CumulativeProbability(InputFileCount-1)
for(i <- 0 to InputFileCount-1)
{
HashProbability(i) = ((CumulativeProbability(i)/max_cpro)*HashTableSize).toInt
}
// Hash table storing vertex ids
val HashTable : Array[Int] = new Array[Int](HashTableSize+1)
// Build the hash mapping: an array of size HashTableSize+1 whose entries are vertex ids
for(i <- 0 to InputFileCount-1) {
if (i==0) {
var start : Int = 0
var end : Int = HashProbability(0)
for(j <- start to end) {
HashTable(j) = i
}
}
else {
var start : Int = HashProbability(i-1)
var end : Int = HashProbability(i)
for(j <- start to end) {
HashTable(j) = i
}
}
}
println("HashTable(HashTableSize):"+HashTable(HashTableSize))
val sample_num = (sample_rate*InputFileCount).toInt
println("sample_num "+sample_num)
// Objective value (O_2 under negative sampling) recorded for each of the 100 iterations
var O2_Array: Array[Double] = new Array[Double](100)
for(iterator <- 0 to 99)
{
//println("the iterator is "+iterator)
var learningrate = 0.1
var O_2 = 0.0
// takeSample with `false` means sampling without replacement; draw sample_num edges
var sampling = EdgeSet.takeSample(false,sample_num)
for(i <- 0 to sample_num-1)
{
var objective = 0.0
//println("i is " + i)
var row:Int = sampling(i)(0).toInt
var col:Int = sampling(i)(1).toInt
//println("row:"+row)
//println("col:"+col)
var u_j_context = U_context(col,::).t
var u_j_context_t = U_context(col,::)
var u_i_vertex = U_vertex(row,::).t
// Positive-sample part of the gradient of O_2 w.r.t. u_i: -w_ij*(1-sigmoid(u_j'^T u_i))*u_j'
var part1=(-1)*sampling(i)(2)*u_j_context*(1-Sigmoid((u_j_context_t*u_i_vertex).toDouble))
//println("part1: "+part1)
// Generate NS random integers in [0, HashTableSize) to pick negative samples
var negativeSampleSum = DenseVector.zeros[Double](Dim)
var RandomSet : List[Int] = RandList(HashTableSize,NS)
//println("RandomSet is:"+RandomSet)
for(j <- 0 to RandomSet.length-1){
//println(RandomSet(j))
var u_k_context = U_context(HashTable(RandomSet(j)),::).t
var u_k_context_t = U_context(HashTable(RandomSet(j)),::)
negativeSampleSum = negativeSampleSum + u_k_context*Sigmoid((u_k_context_t*u_i_vertex).toDouble)
}
//println("negativeSampleSum: "+negativeSampleSum)
// Negative-sample part of the gradient of O_2 w.r.t. u_i: w_ij*sum_k sigmoid(u_k'^T u_i)*u_k'
var part2 = sampling(i)(2)*negativeSampleSum
//println("part2: "+part2)
// The full gradient w.r.t. u_i is the sum of the positive and negative parts
var d_O2_ui = part1+part2
//println("d_O2_ui: "+d_O2_ui)
// Update u_i (gradient descent step)
var tmp1 = u_i_vertex - learningrate*(d_O2_ui)
//println(tmp1(0)+" "+tmp1(1))
// println("previous U_context(row,::): "+U_context(row,::))
for(k1 <- 0 to Dim-1){
U_vertex(row,k1) = tmp1(k1)
}
//println("after U_context(row,::): "+U_context(row,::))
var d_O2_uj_context = (-1)*sampling(i)(2)*u_i_vertex*(1-Sigmoid((u_j_context_t*u_i_vertex).toDouble))
// Update u_j' (context vector of the target vertex)
var tmp2 = u_j_context - learningrate*(d_O2_uj_context)
for(k2 <- 0 to Dim-1){
U_context(col,k2) = tmp2(k2)
}
// Update the negative samples' context vectors u_k'
var negative_cal = 0.0
for(j <- 0 to RandomSet.length-1){
var u_k_context = U_context(HashTable(RandomSet(j)),::).t
var u_k_context_t = U_context(HashTable(RandomSet(j)),::)
// These lines accumulate the value of the objective function
var sigmoid_uk_ui = Sigmoid((u_k_context_t*u_i_vertex).toDouble)
// log sigmoid(-u_k'^T u_i) = log(1 - sigmoid(u_k'^T u_i)) in the negative-sampling objective
negative_cal = negative_cal + math.log(1.0-sigmoid_uk_ui)
// Gradient with respect to u_k'
var d_O2_uk_context = sampling(i)(2)*u_i_vertex*sigmoid_uk_ui
var tmp3 = u_k_context - learningrate*d_O2_uk_context
for(k3 <- 0 to Dim-1){
U_context(HashTable(RandomSet(j)),k3) = tmp3(k3)
}
}
// Accumulate the per-edge objective value
objective = (-1)*sampling(i)(2)*(math.log(Sigmoid((u_j_context_t*u_i_vertex).toDouble)) + negative_cal)
O_2 = O_2 + objective
}
O2_Array(iterator) = O_2
}
// Note: toArray flattens the embedding matrix into a one-dimensional array, so the row structure is lost in the saved output
val U2_HDFS = sc.parallelize(U_vertex.toArray,3)
val O2_HDFS = sc.parallelize(O2_Array,3)
println("======================")
//println(formZeroToOneRandomMatrix)
//VertexDegree.saveAsTextFile("file:///usr/local/data/line")
//IndexSmoothProbability.saveAsTextFile("file:///usr/local/data/line")
//HashProbability.saveAsTextFile("file:///usr/local/data/line")
U2_HDFS.saveAsTextFile("file:///usr/local/data/U2")
O2_HDFS.saveAsTextFile("file:///usr/local/data/O2")
println("======================")
sc.stop()
}
}
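For reference, a hypothetical way to submit this job; the jar name, master URL and input paths are placeholders of my own, not values from the original post:
spark-submit --class LINE --master "local[4]" line-assembly.jar \
  file:///usr/local/data/adjacent.csv file:///usr/local/data/edges.csv 5 2
Here 5 is the number of negative samples per edge and 2 is the embedding dimension; the learned vertex vectors and the per-iteration objective values are written to file:///usr/local/data/U2 and file:///usr/local/data/O2, as hard-coded at the end of main.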