Divergences

Forgive me: writing in Chinese is too tiring, and I trust everyone reading this has a reasonable level of English.

KL divergence

  Consider an unknown distribution $p(x)$, and suppose we have modeled it using an approximating distribution $q(x)$. If we use $q(x)$ to construct a coding scheme for transmitting values of $x$ to a receiver, then, because we used $q(x)$ rather than the true distribution $p(x)$, some additional information is needed when specifying the value of $x$. The average additional amount of information required (in nats) is:

    $\begin{aligned}D_{K L}(p \| q) &=-\int p(x) \log q(x) d x-\left(-\int p(x) \log p(x) d x\right) \\&=-\int p(x) \log \frac{q(x)}{p(x)} d x\end{aligned}$

  This gives:

    $D_{K L}(p \| q)=-\int p(x) \log \frac{q(x)}{p(x)} d x$

    $D_{K L}(q \| p)=-\int q(x) \log \frac{p(x)}{q(x)} d x$

  The KL divergence measures, to some extent, how much two distributions differ, and it has the following properties:

    • $D_{K L}(p \| q) \geq 0$, with equality if and only if $p(x)=q(x)$.
    • It is not symmetric, i.e. $D_{K L}(p \| q) \neq D_{K L}(q \| p)$ in general, so the direction must be chosen carefully when using it to measure the gap between two distributions.
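
  A minimal numerical sketch of the two KL directions for discrete distributions (the arrays p and q and the function name kl below are my own illustrative assumptions, not from the source):

    import numpy as np

    # Two example discrete distributions on the same support (made up).
    p = np.array([0.4, 0.4, 0.2])
    q = np.array([0.5, 0.3, 0.2])

    def kl(p, q):
        # D_KL(p || q) = -sum_x p(x) log(q(x)/p(x)), in nats.
        return -np.sum(p * np.log(q / p))

    print(kl(p, q))  # non-negative
    print(kl(q, p))  # generally different: KL is not symmetric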

$\alpha$-divergence

  Given $\alpha \in \mathbb{R}$ with $\alpha \neq 0,1$, the $\alpha$-divergence can be defined as

    $\frac{1}{\alpha(1-\alpha)}\left(1-\sum\limits _{x} p_{2}(x)\left(\frac{p_{1}(x)}{p_{2}(x)}\right)^{\alpha}\right)$

  The KL divergence is a limiting case of the $\alpha$-divergence: $K L\left(P_{1}, P_{2}\right)$ and $K L\left(P_{2}, P_{1}\right)$ correspond to the limits $\alpha \rightarrow 1$ and $\alpha \rightarrow 0$, respectively.

  The Amari divergence comes from the above by the transformation $\alpha=\frac{1+t}{2}$.
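
  As a sanity check of the two limits above, here is a small sketch (the distributions and the function name alpha_div are illustrative assumptions):

    import numpy as np

    p1 = np.array([0.4, 0.4, 0.2])
    p2 = np.array([0.5, 0.3, 0.2])

    def alpha_div(p1, p2, a):
        # (1 / (a (1 - a))) * (1 - sum_x p2(x) (p1(x)/p2(x))^a), for a != 0, 1.
        return (1.0 - np.sum(p2 * (p1 / p2) ** a)) / (a * (1.0 - a))

    def kl(p, q):
        return np.sum(p * np.log(p / q))

    # Near a = 1 the alpha-divergence approaches KL(P1, P2);
    # near a = 0 it approaches KL(P2, P1).
    print(alpha_div(p1, p2, 0.999), kl(p1, p2))
    print(alpha_div(p1, p2, 0.001), kl(p2, p1))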

JS divergence

  To construct a symmetric quantity, the two KL divergences can be combined; the result is the JS divergence (Jensen-Shannon divergence), with the following expression:

    $D_{J S}(p \| q)=\frac{1}{2} D_{K L}\left(p \| \frac{p+q}{2}\right)+\frac{1}{2} D_{K L}\left(q \| \frac{p+q}{2}\right)$

  Properties:

  • The JS divergence is symmetric.
  • The JS divergence is bounded, with range $[0, \log 2]$.
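
  The following sketch checks both properties numerically (the example arrays and function names are mine, not from the source):

    import numpy as np

    p = np.array([0.4, 0.4, 0.2])
    q = np.array([0.5, 0.3, 0.2])

    def kl(p, q):
        return np.sum(p * np.log(p / q))

    def js(p, q):
        # JS(p, q) = (1/2) KL(p || m) + (1/2) KL(q || m), with m the midpoint.
        m = 0.5 * (p + q)
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    print(np.isclose(js(p, q), js(q, p)))   # symmetric
    print(0.0 <= js(p, q) <= np.log(2))     # bounded by log 2 (in nats)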

f-divergence

  Given a convex function $f(t): \mathbb{R}_{\geq 0} \rightarrow \mathbb{R}$ with $f(1)=0$, $f^{\prime}(1)=0$, $f^{\prime \prime}(1)=1$, the $f$-divergence on $\mathcal{P}$ is defined by

    $\sum \limits _{x} p_{2}(x) f\left(\frac{p_{1}(x)}{p_{2}(x)}\right)$

  • The case $f(t)=t \ln t$ corresponds to the Kullback-Leibler distance.
  • The case $f(t)=(t-1)^{2}$ corresponds to the $\chi^{2}$-distance.
  • The case $f(t)=|t-1|$ corresponds to the variational distance.
  • The case $f(t)=4(1-\sqrt{t})$ (as well as $f(t)=2(t+1)-4 \sqrt{t}$) corresponds to the squared Hellinger metric.
  • The case $f(t)=(t-1)^{2} /(t+1)$ corresponds to the Vajda–Kus semimetric.
  • The case $f(t)=\left|t^{a}-1\right|^{1 / a}$ with $0<a \leq 1$ corresponds to the generalized Matusita distance.
  • The case $f(t)=\frac{\left(t^{a}+1\right)^{1 / a}-2^{(1-a) / a}(t+1)}{1-1 / a}$ corresponds to the Österreicher semimetric.
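
  A generic f-divergence helper makes the first two cases above easy to verify against their direct formulas (a sketch; f_div and the example arrays are assumptions of mine):

    import numpy as np

    p1 = np.array([0.4, 0.4, 0.2])
    p2 = np.array([0.5, 0.3, 0.2])

    def f_div(p1, p2, f):
        # sum_x p2(x) f(p1(x) / p2(x))
        return np.sum(p2 * f(p1 / p2))

    # f(t) = t ln t recovers the Kullback-Leibler distance:
    print(np.isclose(f_div(p1, p2, lambda t: t * np.log(t)),
                     np.sum(p1 * np.log(p1 / p2))))
    # f(t) = (t - 1)^2 recovers the chi^2-distance:
    print(np.isclose(f_div(p1, p2, lambda t: (t - 1) ** 2),
                     np.sum((p1 - p2) ** 2 / p2)))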

Harmonic mean similarity

  The harmonic mean similarity is a similarity on $\mathcal{P}$ defined by

    $2 \sum \limits _{x} \frac{p_{1}(x) p_{2}(x)}{p_{1}(x)+p_{2}(x)} .$
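
  A short sketch (example arrays assumed); note the similarity equals 1 when the two distributions coincide:

    import numpy as np

    p1 = np.array([0.4, 0.4, 0.2])
    p2 = np.array([0.5, 0.3, 0.2])

    print(2 * np.sum(p1 * p2 / (p1 + p2)))   # harmonic mean similarity
    print(2 * np.sum(p1 * p1 / (p1 + p1)))   # equals 1 for identical distributions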

Fidelity similarity

  The fidelity similarity (or Bhattacharya coefficient, Hellinger affinity) on $\mathcal{P}$ is

    $\rho\left(P_{1}, P_{2}\right)=\sum_{x} \sqrt{p_{1}(x) p_{2}(x)} .$

Hellinger metric

  In terms of the fidelity similarity $\rho$ , the Hellinger metric (or Matusita distance, Hellinger-Kakutani metric) on $\mathcal{P}$ is defined by

    $\left(\sum\limits_{x}\left(\sqrt{p_{1}(x)}-\sqrt{p_{2}(x)}\right)^{2}\right)^{\frac{1}{2}}=\sqrt{2\left(1-\rho\left(P_{1}, P_{2}\right)\right)}$
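
  The identity above is easy to confirm numerically (a sketch with made-up arrays):

    import numpy as np

    p1 = np.array([0.4, 0.4, 0.2])
    p2 = np.array([0.5, 0.3, 0.2])

    rho = np.sum(np.sqrt(p1 * p2))   # fidelity similarity
    hellinger = np.sqrt(np.sum((np.sqrt(p1) - np.sqrt(p2)) ** 2))

    # Hellinger metric equals sqrt(2 (1 - rho)):
    print(np.isclose(hellinger, np.sqrt(2 * (1 - rho))))   # True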

Bhattacharya distance 1

  In terms of the fidelity similarity $\rho$ , the Bhattacharya distance 1 (1946) is

    $\left(\arccos \rho\left(P_{1}, P_{2}\right)\right)^{2} $
  for $P_{1}, P_{2} \in \mathcal{P}$. Twice this distance is the Rao distance. It is also used in Statistics and Machine Learning, where it is called the Fisher distance.

Bhattacharya distance 2

  The Bhattacharya distance 2 (1943) on $\mathcal{P}$ is defined by

    $-\ln \rho\left(P_{1}, P_{2}\right)$
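
  Both Bhattacharya distances are simple functions of the fidelity similarity $\rho$; a sketch (example arrays assumed):

    import numpy as np

    p1 = np.array([0.4, 0.4, 0.2])
    p2 = np.array([0.5, 0.3, 0.2])

    rho = np.sum(np.sqrt(p1 * p2))

    b1 = np.arccos(rho) ** 2   # Bhattacharya distance 1 (1946)
    b2 = -np.log(rho)          # Bhattacharya distance 2 (1943)
    print(b1, b2)              # both vanish iff rho == 1, i.e. p1 == p2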

$\chi^{2}$-distance

  The $\chi^{2}$-distance (or Pearson $\chi^{2}$-distance) is a quasi-distance on $\mathcal{P}$, defined by

    $\sum_{x} \frac{\left(p_{1}(x)-p_{2}(x)\right)^{2}}{p_{2}(x)}$

  The Neyman $\chi^{2}$-distance is a quasi-distance on $\mathcal{P}$, defined by

    $\sum_{x} \frac{\left(p_{1}(x)-p_{2}(x)\right)^{2}}{p_{1}(x)} .$

  Half of the $\chi^{2}$-distance is also called Kagan's divergence.

  The probabilistic symmetric $\chi^{2}$-measure is a distance on $\mathcal{P}$, defined by

    $2 \sum_{x} \frac{\left(p_{1}(x)-p_{2}(x)\right)^{2}}{p_{1}(x)+p_{2}(x)} .$
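
  A sketch computing the three chi^2-type quantities side by side (example arrays assumed):

    import numpy as np

    p1 = np.array([0.4, 0.4, 0.2])
    p2 = np.array([0.5, 0.3, 0.2])

    pearson = np.sum((p1 - p2) ** 2 / p2)            # Pearson chi^2-distance
    neyman = np.sum((p1 - p2) ** 2 / p1)             # Neyman chi^2-distance
    sym = 2 * np.sum((p1 - p2) ** 2 / (p1 + p2))     # probabilistic symmetric chi^2-measure

    print(pearson, neyman)   # generally unequal: both are quasi-distances
    print(np.isclose(sym, 2 * np.sum((p2 - p1) ** 2 / (p2 + p1))))  # symmetric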

  Since I do not need the remaining distances for now, I have not written them up.

  This article draws on the Encyclopedia of Distances, in particular its chapter "Distances on Distribution Laws"; contact the blogger if you need the e-book.

  I also referred to another "borrower"'s blog, 《机器学习中的数学
