Machine Learning: Probabilistic Graphical Models (Homework: Exact Inference)

2023-10-05 09:39:16

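In the first three weeks of homework, I built probabilistic graphical models and solved them with a third-party solver. That gave me the marginal distribution of each random variable (for the directed models) and the maximum a posteriori (MAP) assignment (for the undirected models). The main task this week is to write the PGM solver myself.

Strictly speaking, a solver is not necessary, because all it does is compute marginals or the MAP assignment. Once the joint distribution is available:
- the MAP assignment is simply the entry with the largest value;
- each marginal is obtained by summing the joint over all other variables.

However, this brute-force approach is extremely inefficient. For MAP, the whole joint table has to be searched again every time new evidence arrives. For single-variable marginals, the entire joint has to be marginalized again for every variable. For a six-letter word, for example, the joint table has 26^6 (about 3.1 × 10^8) entries, and repeating these operations wastes an enormous amount of computation.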
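The sketch below makes the cost of the brute-force approach concrete. It is written in plain Python (the assignment code later in this post is MATLAB/Octave). The three-variable chain and its potential values are made up for illustration. The point is that a single marginal query visits every joint assignment, i.e. the product of all cardinalities.

```python
import itertools

def brute_force_marginal(cards, joint, target):
    """Marginal of variable `target` by summing the full joint table.

    cards:  list of cardinalities, one per variable (variables are 0..n-1)
    joint:  function mapping a full assignment tuple to an unnormalized value
    Visits every one of prod(cards) assignments.
    """
    marg = [0.0] * cards[target]
    visited = 0
    for assignment in itertools.product(*(range(c) for c in cards)):
        marg[assignment[target]] += joint(assignment)
        visited += 1
    z = sum(marg)
    return [m / z for m in marg], visited

def chain_joint(a):
    """Toy chain X0 - X1 - X2 with made-up pairwise 'agreement' potentials."""
    phi = lambda x, y: 2.0 if x == y else 1.0
    return phi(a[0], a[1]) * phi(a[1], a[2])

marg, visited = brute_force_marginal([2, 2, 2], chain_joint, target=1)
print(marg, visited)

# The same table for a six-letter word would have 26**6 entries
print(26 ** 6)
```

For the six-letter word, this loop would run 308,915,776 times per query, and it would be repeated for every variable.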

1、Initializing the clique tree

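The idea behind the clique tree algorithm is divide and conquer. Take a set of random variables A, B, C, D, E, F, G. If A is independent of all the others, A can be handled on its own, whether we want marginals or the MAP assignment.

Now suppose A, B, C are tightly coupled, and so are C, D, E. If the two groups agree on the marginal distribution of C, there is no need to multiply all of ABCDE together and then marginalize for each variable. Look at it the other way round: the only thing the two groups share is C. If they already agree on C, multiplying them brings no new information from one side to the other, and the marginals of A, B, D, E stay the same.

More precisely, once the two sides are calibrated, the joint can be written as $\beta_{ABC}(A,B,C)\,\beta_{CDE}(C,D,E)/\mu(C)$, with normalized beliefs $\beta$ and shared marginal $\mu(C)$. Summing out C, D, E just returns the marginal already stored on the ABC side.

Mathematically, the clique tree algorithm performs the same computation as Variable Elimination (VE).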

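In code, a PGM is represented as a factorList, a struct array of factors. Each factor has three fields:
- var: the variables it involves. Which factors share variables implicitly encodes how the nodes are connected.
- card: the cardinality of each variable.
- val: the factor's values over all joint assignments of var, describing how those variables relate.

A clique tree is a special kind of factor list. Each entry is a clique, i.e. a cluster of variables, and its val still describes the relationship among those variables. The connections, however, are no longer implied by shared variables; they are stored explicitly instead. So the clique tree P has two fields: (1) cliqueList and (2) edges, the adjacency matrix between cliques.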
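The sketch below shows how a flat val vector lines up with variable assignments. It is plain Python and assumes the course's convention that the first variable in var changes fastest, the same ordering as the course's AssignmentToIndex/IndexToAssignment helpers. Here indices are 0-based, and the Python function names are hypothetical.

```python
def assignment_to_index(assignment, card):
    """0-based flat index into .val; the first variable changes fastest."""
    index, stride = 0, 1
    for a, c in zip(assignment, card):
        index += a * stride
        stride *= c
    return index

def index_to_assignment(index, card):
    """Inverse of assignment_to_index."""
    assignment = []
    for c in card:
        assignment.append(index % c)
        index //= c
    return tuple(assignment)

# A factor phi(X1, X2) with card [2, 3] in the var/card/val layout (made-up values)
factor = {"var": [1, 2], "card": [2, 3], "val": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]}
for i, v in enumerate(factor["val"]):
    print(index_to_assignment(i, factor["card"]), v)
```

With this layout, entries (0,0), (1,0), (0,1), ... follow each other in val, so a factor is just a dense table plus the bookkeeping in var and card.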

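Initializing a clique tree consists of two steps: (1) grouping the variables into cliques; (2) computing the initial potential of each clique.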

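Grouping variables into cliques is something of a black art: there are many different ways to do it, and they are all correct. Common choices are greedy heuristics such as minimum-edge or minimum-cut criteria. A human can often spot a good grouping at a glance, but getting an algorithm to do it takes some work. Since this does not affect our understanding of the clique tree algorithm itself, Prof. Koller's starter code does it for us.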

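The initial potential of a clique encodes the relationships among the variables inside it. My MATLAB implementation is shown below.

Note that each factor must be used exactly once. A factor expresses one relationship; if the same factor were placed in two cliques, it would be multiplied in twice and counted twice. To borrow the meme: if you help repeat it, you share the blame, you know?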

 %COMPUTEINITIALPOTENTIALS Sets up the cliques in the clique tree that is
 %passed in as a parameter.
 %
 %   P = COMPUTEINITIALPOTENTIALS(C) Takes the clique tree skeleton C which is a
 %   struct with three fields:
 %   - nodes: cell array representing the cliques in the tree.
 %   - edges: represents the adjacency matrix of the tree.
 %   - factorList: represents the list of factors that were used to build
 %   the tree.
 %
 %   It returns the standard form of a clique tree P that we will use through
 %   the rest of the assignment. P is a struct with two fields:
 %   - cliqueList: represents an array of cliques with appropriate factors
 %   from factorList assigned to each clique. Where the .val of each clique
 %   is initialized to the initial potential of that clique.
 %   - edges: represents the adjacency matrix of the tree.
 %
 % Copyright (C) Daphne Koller, Stanford University, 2012
 function P = ComputeInitialPotentials(C)
 % number of cliques
 N = length(C.nodes);
 % initialize cluster potentials
 P.cliqueList = repmat(struct('var', [], 'card', [], 'val', []), N, 1);
 P.edges = C.edges;
 % collect the cardinality of every variable from the factor list
 cardOf = zeros(1, max([C.factorList.var]));
 for j = 1:length(C.factorList)
     cardOf(C.factorList(j).var) = C.factorList(j).card;
 end
 % each factor is assigned to exactly one clique (never reused)
 assigned = false(1, length(C.factorList));
 for i = 1:N
     % start from an all-ones factor over the full clique scope, so a clique
     % that receives no factor (or whose factors do not cover all of its
     % variables) still ends up with the correct .var and .card
     scope = C.nodes{i};
     clique = struct('var', scope, 'card', cardOf(scope), ...
                     'val', ones(1, prod(cardOf(scope))));
     for j = 1:length(C.factorList)
         % assign factor j to the first clique whose scope contains it
         if ~assigned(j) && all(ismember(C.factorList(j).var, scope))
             clique = FactorProduct(clique, C.factorList(j));
             assigned(j) = true;
         end
     end
     % sort the variables into the standard order
     P.cliqueList(i) = StandardizeFactors(clique);
 end
 end
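The factor-to-clique rule can be isolated in a few lines of plain Python. `assign_factors` is a hypothetical helper, not part of the course code. It represents cliques and factor scopes as sets and assigns each factor to the first clique that contains its scope, as the MATLAB loop does. It raises an error if some factor fits nowhere, which would indicate a malformed clique tree.

```python
def assign_factors(cliques, factor_scopes):
    """Assign every factor to exactly one clique whose scope contains it.

    cliques:       list of sets of variables
    factor_scopes: list of sets of variables, one per factor
    Returns one list of factor indices per clique (first fitting clique wins).
    """
    assignment = [[] for _ in cliques]
    for f, scope in enumerate(factor_scopes):
        for c, clique in enumerate(cliques):
            if scope <= clique:              # factor scope is a subset of the clique
                assignment[c].append(f)
                break                        # never place the same factor twice
        else:
            raise ValueError(f"factor {f} with scope {scope} fits no clique")
    return assignment

cliques = [{"A", "B", "C"}, {"C", "D", "E"}]
factors = [{"A"}, {"A", "B"}, {"B", "C"}, {"C"}, {"C", "D"}, {"D", "E"}]
result = assign_factors(cliques, factors)
print(result)
```

The factor over {C} fits both cliques, but it lands only in the first one; putting it in both would count it twice.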

2、Calibrating the clique tree

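Back to the earlier example: A, B, C are tightly coupled and so are C, D, E, so they form two cliques. If the two cliques agree on the marginal of C, we can read off the marginals of A, B, D, E directly from each clique without multiplying them together. Unfortunately, in practice the two marginals of C usually differ. This is where calibration comes in: after "calibration", the two cliques reach consensus on C. Once a clique tree is calibrated, the marginal of any variable is cheap to obtain, since we only need to marginalize a small clique-level joint distribution.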

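The simplest way to make the two sides agree on C is a two-way exchange. Multiply the potential of the CDE clique by the marginal of C computed in the ABC clique. In the other direction, multiply the potential of the ABC clique by the marginal of C computed in the CDE clique. Both cliques then have exactly the same marginal over C:

$$\sum_{A,B}\psi_{ABC}(A,B,C)\,\mu_{CDE}(C) = \mu_{ABC}(C)\,\mu_{CDE}(C) = \sum_{D,E}\psi_{CDE}(C,D,E)\,\mu_{ABC}(C),$$

where $\mu_{ABC}(C)=\sum_{A,B}\psi_{ABC}(A,B,C)$ and $\mu_{CDE}(C)=\sum_{D,E}\psi_{CDE}(C,D,E)$.

This is the naive idea behind calibration. Mathematically, calibration still comes from VE. Once A and B have been eliminated, C goes on to the next round of VE, and what is left after summing out A and B is exactly the marginal of C in the ABC clique.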
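The two-clique exchange can be checked numerically with the plain-Python sketch below. It uses random positive tables (made-up data) and hypothetical dict-based factors in place of the course's factor structs. It multiplies each clique by the other clique's marginal of C, then compares the result against the brute-force joint.

```python
import itertools
import random

CARD = {v: 2 for v in "ABCDE"}   # every variable is binary (illustrative)

def assignments(scope):
    return itertools.product(*(range(CARD[v]) for v in scope))

def random_factor(scope, rng):
    """Random positive table over `scope` (made-up numbers)."""
    return scope, {a: rng.uniform(0.1, 1.0) for a in assignments(scope)}

def product(f, g):
    """Factor product; a factor is a (scope, {assignment: value}) pair."""
    scope = tuple(dict.fromkeys(f[0] + g[0]))        # ordered union of scopes
    table = {}
    for a in assignments(scope):
        asg = dict(zip(scope, a))
        table[a] = (f[1][tuple(asg[v] for v in f[0])] *
                    g[1][tuple(asg[v] for v in g[0])])
    return scope, table

def marginalize(f, keep):
    """Sum out every variable of f that is not in `keep`."""
    scope = tuple(v for v in f[0] if v in keep)
    table = {}
    for a, val in f[1].items():
        key = tuple(x for v, x in zip(f[0], a) if v in keep)
        table[key] = table.get(key, 0.0) + val
    return scope, table

def normalize(f):
    z = sum(f[1].values())
    return {k: v / z for k, v in f[1].items()}

rng = random.Random(0)
psi_abc = random_factor(("A", "B", "C"), rng)
psi_cde = random_factor(("C", "D", "E"), rng)

# Each clique's own opinion about C
mu_abc = marginalize(psi_abc, {"C"})
mu_cde = marginalize(psi_cde, {"C"})

# Exchange opinions: multiply each clique by the other's marginal of C
beta_abc = product(psi_abc, mu_cde)
beta_cde = product(psi_cde, mu_abc)

# Both sides now agree on C, and match the brute-force joint
c_from_abc = normalize(marginalize(beta_abc, {"C"}))
c_from_cde = normalize(marginalize(beta_cde, {"C"}))
c_true = normalize(marginalize(product(psi_abc, psi_cde), {"C"}))
print(c_from_abc, c_from_cde, c_true)
```

The same check works for A: summing beta_abc down to A gives the true marginal of A, without ever building the five-variable table.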

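The key to calibration is the order in which messages are passed. Messages typically flow first from the leaves up to the root, then from the root back down to the leaves. A clique may not send a message to a neighbor until it has received messages from all of its other neighbors. In the assignment, this scheduling is handled by GetNextCliques(P, MESSAGES). It returns the next pair (i, j) such that clique i has heard from every neighbor except j and has not yet sent its message to j, or i = 0 once every message has been sent.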
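The scheduling rule can be sketched in plain Python as follows. `get_next_cliques` is a hypothetical 0-based re-creation of the rule, not the course's GetNextCliques. It returns None instead of i = 0, and its tie-breaking (lowest i, then lowest j) may differ from the course version.

```python
def get_next_cliques(edges, sent):
    """Return (i, j) such that clique i may now send to neighbour j, else None.

    edges: symmetric adjacency matrix (list of lists of 0/1)
    sent:  set of (sender, receiver) pairs already sent
    Rule: i -> j is ready if it has not been sent yet and i has already
    received messages from all of its neighbours other than j.
    """
    n = len(edges)
    for i in range(n):
        for j in range(n):
            if not edges[i][j] or (i, j) in sent:
                continue
            others = [k for k in range(n) if edges[i][k] and k != j]
            if all((k, i) in sent for k in others):
                return i, j
    return None

# Chain of cliques 0 - 1 - 2 - 3
edges = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
sent, order = set(), []
while (step := get_next_cliques(edges, sent)) is not None:
    sent.add(step)
    order.append(step)
print(order)
```

On a tree with K cliques, the loop stops after exactly 2(K - 1) messages: one in each direction along every edge.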


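With the order in hand, we can compute the messages themselves. The message a clique sends should summarize everything it knows about the variables being passed on (the sepset it shares with the recipient). That means both its own potential and the "intelligence" it has received from its other neighbors. When reporting to the next clique you summarize all of this, but you must not include the message the recipient itself sent you. Echoing it back would count that information twice (again: repeat it and you share the blame, you know...).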
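In standard notation, the message described here and the resulting belief can be written as below. $C_i$ is the scope of clique $i$, $\psi_i$ its initial potential, $S_{ij} = C_i \cap C_j$ the sepset, and $\mathcal{N}(i)$ the neighbors of $i$. In the code, to_be_summed corresponds to $C_i \setminus S_{ij}$.

```latex
\delta_{i\to j}(S_{ij}) = \sum_{C_i \setminus S_{ij}} \psi_i(C_i)
    \prod_{k \in \mathcal{N}(i)\setminus\{j\}} \delta_{k\to i}(S_{ki}),
\qquad
\beta_i(C_i) = \psi_i(C_i) \prod_{k \in \mathcal{N}(i)} \delta_{k\to i}(S_{ki})
```

The $\setminus\{j\}$ in the message is exactly the "do not echo the recipient" rule. The belief $\beta_i$, by contrast, uses every incoming message.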

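 N = length(P.cliqueList);
 % MESSAGES(i,j) is the message from clique i to clique j (empty until sent)
 MESSAGES = repmat(struct('var', [], 'card', [], 'val', []), N, N);
 while (1)
     % next ready pair i -> j; GetNextCliques returns i == 0 when all are sent
     [i,j] = GetNextCliques(P,MESSAGES);
     if i == 0
         break
     end
     % variables private to clique i are summed out; the sepset is kept
     to_be_summed = setdiff(P.cliqueList(i).var,P.cliqueList(j).var);
     to_be_propagated = setdiff(P.cliqueList(i).var,to_be_summed);
     % gather the messages from every neighbour except the recipient j
     tmp_ = 1;
     clear factorList
     for k = 1:N
         if P.edges(i,k)==1 && k~=j && ~isempty(MESSAGES(k,i).var)
             factorList(tmp_) = MESSAGES(k,i);
             tmp_ = tmp_+1;
         end
     end
     factorList(tmp_) = P.cliqueList(i);
     % multiply everything together and marginalize onto the sepset
     MESSAGES(i,j) = ComputeMarginal(to_be_propagated,ComputeJointDistribution(factorList),[]);
 end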

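Once messages have been passed in both directions (leaves to root and root to leaves), each clique combines the messages from its neighbors into a belief. The messages, which are themselves marginals over sepsets, are multiplied into the clique's own potential.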

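 N = length(P.cliqueList);
 for i = 1:N
     % reset first: entries left over from a longer factorList (e.g. from the
     % message loop) would otherwise leak into this clique's belief
     clear factorList
     tmp_ = 1;
     for k = 1:N
         % every neighbour's incoming message
         if P.edges(i,k)==1
             factorList(tmp_) = MESSAGES(k,i);
             tmp_ = tmp_+1;
         end
     end
     factorList(tmp_) = P.cliqueList(i);
     % belief = initial potential times all incoming messages
     belief(i) = ComputeJointDistribution(factorList);
 end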

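At this point the clique tree is calibrated. Marginalizing each clique's belief onto a single variable, then normalizing, gives that variable's marginal distribution.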
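The following plain-Python sketch puts the pieces together on a three-clique chain {A,B} - {B,C} - {C,D}. It runs sum-product calibration, then checks that every clique containing a variable yields the same normalized marginal as the brute-force joint. Everything here is a hypothetical stand-in for the course's MATLAB structures, not its API: the (scope, table) factor format, `calibrate`, and the random potentials.

```python
import itertools
import random

CARD = {"A": 2, "B": 3, "C": 2, "D": 2}   # illustrative cardinalities

def assignments(scope):
    return itertools.product(*(range(CARD[v]) for v in scope))

def product(f, g):
    """Factor product on (scope, {assignment: value}) pairs."""
    scope = tuple(dict.fromkeys(f[0] + g[0]))            # ordered union of scopes
    table = {}
    for a in assignments(scope):
        asg = dict(zip(scope, a))
        table[a] = (f[1][tuple(asg[v] for v in f[0])] *
                    g[1][tuple(asg[v] for v in g[0])])
    return scope, table

def marginalize(f, keep):
    """Sum out every variable not in `keep`."""
    scope = tuple(v for v in f[0] if v in keep)
    table = {}
    for a, val in f[1].items():
        key = tuple(x for v, x in zip(f[0], a) if v in keep)
        table[key] = table.get(key, 0.0) + val
    return scope, table

def calibrate(potentials, edges):
    """Sum-product calibration; returns one (unnormalized) belief per clique."""
    n = len(potentials)
    nbrs = [[k for k in range(n) if edges[i][k]] for i in range(n)]
    msgs = {}                                            # (i, j) -> message
    while True:
        ready = [(i, j) for i in range(n) for j in nbrs[i]
                 if (i, j) not in msgs and
                 all((k, i) in msgs for k in nbrs[i] if k != j)]
        if not ready:
            break
        i, j = ready[0]
        f = potentials[i]
        for k in nbrs[i]:
            if k != j:                                   # never echo j's own message
                f = product(f, msgs[(k, i)])
        sepset = set(potentials[i][0]) & set(potentials[j][0])
        msgs[(i, j)] = marginalize(f, sepset)
    beliefs = []
    for i in range(n):
        b = potentials[i]
        for k in nbrs[i]:
            b = product(b, msgs[(k, i)])                 # all incoming messages
        beliefs.append(b)
    return beliefs

def normalize(f):
    z = sum(f[1].values())
    return {k: v / z for k, v in f[1].items()}

def random_factor(scope, rng):
    return scope, {a: rng.uniform(0.1, 1.0) for a in assignments(scope)}

rng = random.Random(1)
pots = [random_factor(("A", "B"), rng), random_factor(("B", "C"), rng),
        random_factor(("C", "D"), rng)]
edges = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
beliefs = calibrate(pots, edges)

# Marginal of B read from clique 0 and from clique 1, versus the brute-force joint
joint = product(product(pots[0], pots[1]), pots[2])
print(normalize(marginalize(joint, {"B"})),
      normalize(marginalize(beliefs[0], {"B"})),
      normalize(marginalize(beliefs[1], {"B"})))
```

Each of the 2(K - 1) messages is computed once, and every marginal afterwards only touches one small clique table.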

3、MAP estimation with a clique tree

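Often we are not interested in the distribution of each individual variable, but in which joint assignment to [A, B, C, D, E] is most probable. This idea is used in signal decoding, OCR, image processing and similar fields. In segmentation, for example, we usually do not care about the label of each individual pixel in isolation, only about the labels of the segmented regions. This kind of problem is called maximum a posteriori (MAP) estimation.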

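$$\max_{a,b} P(a,b) = \max_{a,b} P(a)\,P(b \mid a) = \max_a \Big[ P(a) \max_b P(b \mid a) \Big], \qquad a^* = \arg\max_a P(a) \max_b P(b \mid a), \quad b^* = \arg\max_b P(b \mid a^*)$$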
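The decomposition can be checked with a tiny plain-Python example. The numbers for P(A) and P(B | A) are made up. It compares a brute-force search over all (a, b) pairs with the max-then-traceback route through $\tau(a) = P(a)\max_b P(b \mid a)$.

```python
import itertools

# Made-up P(A) for binary A and P(B | A) for ternary B
p_a = [0.6, 0.4]
p_b_given_a = [[0.5, 0.3, 0.2],    # row a = 0
               [0.1, 0.1, 0.8]]    # row a = 1

# Brute force: search every joint assignment (a, b)
best_joint = max(itertools.product(range(2), range(3)),
                 key=lambda ab: p_a[ab[0]] * p_b_given_a[ab[0]][ab[1]])

# Divide and conquer: tau(a) = P(a) * max_b P(b | a) is still a factor over A,
# and for each a we remember which b achieved the inner max
tau = [p_a[a] * max(p_b_given_a[a]) for a in range(2)]
best_b = [max(range(3), key=lambda b: p_b_given_a[a][b]) for a in range(2)]
a_star = max(range(2), key=lambda a: tau[a])
b_star = best_b[a_star]                       # traceback

print(best_joint, (a_star, b_star), max(tau))
```

Note that A's own marginal favors a = 0 (0.6 vs 0.4), yet the joint MAP has a = 1. This is why MAP needs max rather than sum.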

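This immediately brings to mind the marginalization used earlier; here the sum is simply replaced by a max. The quantity $\tau(a) = P(a)\max_b P(b \mid a)$ is still a factor over A, but each of its entries already assumes that B takes its best value given that a. So if we pick the entry with the largest value, the corresponding assignment is the joint maximizer $(a^*, b^*)$. The point of this operation is that the MAP problem over a group of variables can be solved by divide and conquer. When the MAP assignment is unique, the value that maximizes each variable's (or each clique's) max-marginal is part of the global MAP assignment. The messages are now computed as follows: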

 N = length(P.cliqueList);
 MESSAGES = repmat(struct('var', [], 'card', [], 'val', []), N, N);
 % move to log space: products of potentials become sums
 for i = 1:N
     P.cliqueList(i).val = log(P.cliqueList(i).val);
 end
 while (1)
     [i,j] = GetNextCliques(P,MESSAGES);
     if i == 0
         break
     end
     % variables private to clique i are maxed out; the sepset is kept
     to_be_summed = setdiff(P.cliqueList(i).var,P.cliqueList(j).var);
     % gather the messages from every neighbour except the recipient j
     tmp_ = 1;
     clear factorList
     for k = 1:N
         if P.edges(i,k)==1 && k~=j && ~isempty(MESSAGES(k,i).var)
             factorList(tmp_) = MESSAGES(k,i);
             tmp_ = tmp_+1;
         end
     end
     factorList(tmp_) = P.cliqueList(i);
     % in log space, "multiplying" factors means adding them with FactorSum
     Joint = factorList(1);
     for l = 2:length(factorList)
         Joint = FactorSum(Joint, factorList(l));
     end
     % max-sum message: max out the non-sepset variables
     MESSAGES(i,j) = FactorMaxMarginalization(Joint,to_be_summed);
 end

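The values are converted to logarithms because MAP inference multiplies many probabilities together. With large cardinalities (26 per letter in OCR!), each entry is small, and the product of many such small numbers quickly underflows floating point. In log space, products become sums (hence FactorSum instead of a factor product). Because log is monotonic, the maximizing assignment is unchanged.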
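The underflow problem is easy to reproduce in plain Python. The per-factor probability of 1/26 and the count of 250 factors are illustrative choices, not values taken from the assignment.

```python
import math

p = 1.0 / 26          # a typical per-letter probability in an OCR-sized domain
n = 250               # number of factors multiplied together (illustrative)

product = 1.0
log_sum = 0.0
for _ in range(n):
    product *= p              # the direct product underflows to 0.0
    log_sum += math.log(p)    # the log-space sum stays representable

print(product, log_sum)

# log is monotonic, so working in log space does not change the argmax
scores = [0.2, 0.5, 0.3]
assert max(range(3), key=lambda i: scores[i]) == \
       max(range(3), key=lambda i: math.log(scores[i]))
```

Once the direct product hits 0.0, every assignment looks equally impossible and the argmax is meaningless. The log-space sum (about -814 here) still ranks assignments correctly.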

The messages are combined into (log-space) beliefs as follows:

 for i = 1:N
     % reset first so stale entries from the message loop cannot leak in
     clear factorList
     tmp_ = 1;
     for k = 1:N
         if P.edges(i,k)==1
             factorList(tmp_) = MESSAGES(k,i);
             tmp_ = tmp_+1;
         end
     end
     factorList(tmp_) = P.cliqueList(i);
     % log-belief = log-potential plus all incoming log-messages
     belief = factorList(1);
     for l = 2:length(factorList)
         belief = FactorSum(belief, factorList(l));
     end
     Belief(i) = belief;
 end
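The plain-Python sketch below shows how the MAP assignment is read off the log-space beliefs. It runs max-sum on a two-clique chain {A,B} - {B,C}, takes the argmax of each clique's max-marginal, and compares with a brute-force search. `factor_sum` and `max_marginalize` are hypothetical Python counterparts of FactorSum and FactorMaxMarginalization, not the course functions. The sketch assumes the MAP assignment is unique; random tables make ties very unlikely. With ties, per-clique argmaxes can disagree and a traceback is needed.

```python
import itertools
import math
import random

CARD = {"A": 3, "B": 3, "C": 3}

def assignments(scope):
    return itertools.product(*(range(CARD[v]) for v in scope))

def factor_sum(f, g):
    """Log-space 'product': add the log-values of matching entries."""
    scope = tuple(dict.fromkeys(f[0] + g[0]))
    table = {}
    for a in assignments(scope):
        asg = dict(zip(scope, a))
        table[a] = (f[1][tuple(asg[v] for v in f[0])] +
                    g[1][tuple(asg[v] for v in g[0])])
    return scope, table

def max_marginalize(f, keep):
    """Max out every variable not in `keep`."""
    scope = tuple(v for v in f[0] if v in keep)
    table = {}
    for a, val in f[1].items():
        key = tuple(x for v, x in zip(f[0], a) if v in keep)
        table[key] = max(table.get(key, -math.inf), val)
    return scope, table

def random_log_factor(scope, rng):
    return scope, {a: math.log(rng.uniform(0.1, 1.0)) for a in assignments(scope)}

rng = random.Random(2)
log_ab = random_log_factor(("A", "B"), rng)
log_bc = random_log_factor(("B", "C"), rng)

# Max-sum messages over the sepset {B}, then log-beliefs (max-marginals)
belief_ab = factor_sum(log_ab, max_marginalize(log_bc, {"B"}))
belief_bc = factor_sum(log_bc, max_marginalize(log_ab, {"B"}))

# Decode: argmax of each clique's max-marginal
ab = max(belief_ab[1], key=belief_ab[1].get)
bc = max(belief_bc[1], key=belief_bc[1].get)

# Brute-force MAP over the full joint, for comparison
joint = factor_sum(log_ab, log_bc)
abc = max(joint[1], key=joint[1].get)
print(ab, bc, abc)
```

The two cliques agree on B and together reproduce the brute-force answer, which is the "each piece is part of the global MAP" property in action.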

4、Summary

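The clique tree algorithm is an exact inference method built on VE. By reusing messages, it obtains all marginals (or max-marginals) from one round of message passing over the tree, instead of rerunning VE for every query. This greatly reduces the amount of computation.

As an exact method it still has serious limitations: its cost grows exponentially with the size of the largest clique, so it becomes intractable on densely connected graphs.

Next week's homework is to implement MCMC (Markov chain Monte Carlo). It belongs to the same broad Monte Carlo family as the Monte Carlo tree search used by AlphaGo, though MCMC itself is a different algorithm. Stay tuned!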

