论文笔记：Diffusion-Convolutional Neural Networks （传播-卷积神经网络）

2022-06-16 01:08:28

Diffusion-Convolutional Neural Networks （传播-卷积神经网络）
2018-04-09 21:59:02

1. Abstract:

　　我们提出传播-卷积神经网络（DCNNs），一种处理 graph-structured data 的新模型。随着 DCNNs 的介绍，我们展示如何从 graph structured data 中学习基于传播的表示（diffusion-based representations），然后作为节点分类的有效基础。DCNNs 拥有多个有趣的性质，包括：

　　1). a latent representation for graphical data that is invariant under isomorphism;

　　2). polynomial-time prediction;

　　3). learning that can be represented as tensor operations;

　　4). efficiently implemented on a GPU.

2. Introduction:

　　处理结构化的数据是非常有挑战的。一方面，找到合适的方法来展示和探索数据的结构可以获得预测精度的提升；另一方面，找到这样的结构可能很困难，在模型中添加结构会使得预测复杂度显著的提升。

　　这个工作的目标是：设计一个灵活的模型来处理 general 类型的结构化数据，使得在改善预测精度的同时，避免复杂度的提升。为了完成这个目标，我们通过引入一种“diffusion-convolution”的操作，拓展 CNN 到 general graph-structure data。简单的说，不像传统的卷积操作那样（scanning a "square" of parameters across a grid-structured input），the diffusion-convolution operation 通过在一个 graph-structured input 上处理每一个节点，来扫描一个传播过程，以此来构建一个隐层的表示（builds a latent representation by scanning a diffusion process across each node in a graph-strucured input）。

　　这个模型是受到下面的启发：a representation that encapsulates (压缩) graph diffusion can provide a better basis for prediction than a graph itself. 图的传播可以表示为：a matrix power series, 提供了一个直观的机制来包含 entities 的内容信息（providing a straightforward mechansim for including contextual information about entities that can be computed in polynomial time and efficiently implemented on a GPU）。

　　在本文中，我们提供了一个 diffusion-convolutional neural network (DCNNs)，并且在 graphical data 的不同任务上做了验证。许多技术，包括：分类任务的结构化信息，DCNNs 提供了一种互补的方法，在节点分类任务上取得了显著的提升。

3. Model：　　

　　假设我们有 T 个 graphs g。每个 graph $G_t = （V_t, E_t）$ 是由顶点和边构成的。

　　顶点：$N_t * F$，其中 $N_t$ 是 graph 中节点的个数；

　　边：$N_t * N_t$ 的邻接矩阵 At；由此我们可以计算一个 degree-normalized transition matrix $P_t$，表示了从节点 i 到节点 j 之间跳跃的概率（that gives the probability of jumping from node i to node j in one step）。

　　这个图可以是加权的，也可以是不加权的；有向的或者无向的。节点也可以包含 labels Y。

　　我们对学习预测 Y 很感兴趣；即：来预测每一个 graph 中的每一个节点的标签；或者每一个图的 label。在每种情况下，我们可以访问一些已标注的实例（some labeled entities），我们的工作是预测剩下无标签实例的 label（our task is predict the values of the remaining unlabeled entities）。

　　DCNNs 被设计用来执行符合这种形式的任意任务。DCNN 将 graph g 作为输入，然后输出一个 hard prediction for Y 或者一个条件分布 P(Y|X)。每一个感兴趣的 entity 被转换为一个传播-卷积表示，which is a H*F real matrix defined by H hops of graph diffusion over F features, and it is defined by an H*F real-valued weight tensor $W^c$ and a nonliner differentiable function f that computes the activations. 所以，对于节点分类来说，graph t 的传播-卷积表示，$Z_t$，将会是 $N_t *H * F$ tensor, 如图1（a）所示。

　　该模型是基于一个 diffusion kernel 的，可以认为是：a measure of the level of connectivity between any two nodes in a graph when considering all paths between them, with longer paths being discounted more than shorter paths. DCNNs 没有 pooling operation.

　　Node Classfication.

　　　　考虑节点分类的任务，在 graph 中的每一个输入节点，都会进行标签的估计（a label Y is predicted for each input node in a graph）。

　　　　我们用 $P^*_t$ 表示为 $N_t * H * N_t$ tensor containing the power series of $P_t$，定义为：

　　　　传播-卷积激活 $Z_{tijk}$ for node i, hop j, and feature k of graph t is given by:

　　　　激活可以用 tensor 的形式更加精确的进行表达：

　　　　其中，圆圈这个符号代表 element-wise multiplication.

　　　该模型可以由 dense layer 来完成（the model is completed by a dense layer that connects Z to Y）。对于 Y 的 hard prediction，表示为 $\hat{Y}$，可以通过取最大的 activiation 来获得；and a conditional probability distribution P(X|Y) can be found by applying the softmax function:

　　　　This keeps the same function in the following extensions.

码农公寓

相关文章