openhgnn.models.Mg2vec

class Mg2vec(node_num, mg_num, emb_dimension, unigram, sample_num)[source]

This is the mg2vec model from `mg2vec: Learning Relationship-Preserving Heterogeneous Graph Representations via Metagraph Embedding <https://ieeexplore.ieee.org/document/9089251>`__

It contains the following parts:

Obtain the meta-graphs and meta-graph instances by mining the raw graph. Please see `DataMaker-For-Mg2vec <https://github.com/null-xyj/DataMaker-For-Mg2vec>`__ for more details.

Initialize an embedding for every node and meta-graph, and adopt an unsupervised method to train the node embeddings and meta-graph embeddings. In detail, for every node we keep its embedding close to the embeddings of the meta-graphs it belongs to and far away from the meta-graph embeddings obtained by negative sampling.
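As a rough illustration of this initialization step (the names n_embedding and m_embedding and the initialization scheme below are assumptions for the sketch, not necessarily what Mg2vec uses internally):

.. code-block:: python

    import torch.nn as nn

    node_num, mg_num, emb_dimension = 1000, 50, 128   # hypothetical sizes

    # One embedding vector per core node and one per meta-graph.
    n_embedding = nn.Embedding(node_num, emb_dimension)
    m_embedding = nn.Embedding(mg_num, emb_dimension)

    # Small / zero initialization so training starts from near-zero logits.
    nn.init.uniform_(n_embedding.weight, -0.5 / emb_dimension, 0.5 / emb_dimension)
    nn.init.zeros_(m_embedding.weight)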

Every node and meta-graph is represented as an n-dimensional vector. We define a first-order loss and a second-order loss. The first-order loss is for the single core node in each meta-graph instance: we compute the dot product of the node embedding and the positive meta-graph embedding as the true logit, and the dot product of the node embedding and each sampled negative meta-graph embedding as the negative logit, then apply the binary_cross_entropy_with_logits function to obtain the first-order loss. The second-order loss considers two core nodes in each meta-graph instance: we first concatenate the two node embeddings, which gives a 2n-dimensional vector, and then use a 2n×n matrix and an n-dimensional bias vector to map it back to an n-dimensional vector. The mapping function is shown below:

.. math::

f(u, v) = \mathrm{ReLU}([u \| v]W + b)

Here u and v are the original embeddings of the two core nodes, \| is the concatenation operator, W is the 2n×n matrix, b is the n-dimensional bias vector, and ReLU is the activation function; f(u, v) is the transformed n-dimensional vector. The second-order loss is then computed in the same way as the first-order loss. Finally, we use a parameter alpha to balance the first-order and second-order losses:

.. math::

L = (1 - \alpha) L_1 + \alpha L_2
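A minimal sketch of how these two losses could be assembled, assuming per-batch tensors holding the first core-node embeddings u, the second core-node embeddings v, one positive meta-graph embedding and sample_num negative meta-graph embeddings per instance (all names and shapes below are illustrative, not the exact implementation):

.. code-block:: python

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    emb_dimension, alpha = 128, 0.5
    linear = nn.Linear(2 * emb_dimension, emb_dimension)   # the W (2n×n) and b (n) of f(u, v)

    def first_order_loss(u, pos_mg, neg_mg):
        # u:      (batch, n)     core-node embedding
        # pos_mg: (batch, n)     positive meta-graph embedding
        # neg_mg: (batch, k, n)  sampled negative meta-graph embeddings
        pos_logit = (u * pos_mg).sum(dim=-1)                         # true logit
        neg_logit = torch.bmm(neg_mg, u.unsqueeze(-1)).squeeze(-1)   # negative logits
        pos_loss = F.binary_cross_entropy_with_logits(pos_logit, torch.ones_like(pos_logit))
        neg_loss = F.binary_cross_entropy_with_logits(neg_logit, torch.zeros_like(neg_logit))
        return pos_loss + neg_loss

    def second_order_loss(u, v, pos_mg, neg_mg):
        # f(u, v) = ReLU([u || v] W + b); the loss is then the same as the first-order case.
        fuv = F.relu(linear(torch.cat([u, v], dim=-1)))
        return first_order_loss(fuv, pos_mg, neg_mg)

    def total_loss(u, v, pos_mg, neg_mg):
        # L = (1 - alpha) * L_1 + alpha * L_2
        return (1 - alpha) * first_order_loss(u, pos_mg, neg_mg) \
            + alpha * second_order_loss(u, v, pos_mg, neg_mg)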

After the node embeddings are trained, we use them for the relation prediction task, which is cast as edge classification: if two nodes are connected by a relation, we treat that relation as an edge, and edge classification then completes the relation prediction task.
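Sketched below under the assumption of a simple linear classifier over the concatenated endpoint embeddings (the downstream pipeline in OpenHGNN may use a different classifier; names and sizes are illustrative):

.. code-block:: python

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    emb_dimension, num_relations = 128, 4                 # hypothetical sizes
    classifier = nn.Linear(2 * emb_dimension, num_relations)
    optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

    def train_step(u_emb, v_emb, labels):
        # u_emb, v_emb: (batch, n) trained (frozen) embeddings of the two endpoints
        # labels:       (batch,)   relation id of the edge connecting them
        logits = classifier(torch.cat([u_emb, v_emb], dim=-1))
        loss = F.cross_entropy(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()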

Parameters:
  • node_num (int) – the number of core nodes

  • mg_num (int) – the number of meta-graphs

  • emb_dimension (int) – the embedding dimension of nodes and meta-graphs

  • unigram (float) – the frequency of every meta-graph, for negative sampling

  • sample_num (int) – the number of negative meta-graphs to sample
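A hypothetical instantiation following the signature documented above (the import path and argument values are assumptions; in practice the sizes come from the data produced by DataMaker-For-Mg2vec, and the exact form of unigram depends on the dataset):

.. code-block:: python

    from openhgnn.models.Mg2vec import Mg2vec   # assumed import path

    # Illustrative values only; unigram is documented as the meta-graph
    # frequency used for negative sampling.
    model = Mg2vec(node_num=10000, mg_num=200, emb_dimension=128,
                   unigram=0.75, sample_num=5)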