Divergences

Forgive me: writing in Chinese is too tiring, and I trust everyone reading this has a reasonable level of English.

KL divergence

  Consider an unknown distribution $p(x)$, and suppose we have modeled it using an approximating distribution $q(x)$. If we use $q(x)$ to construct a coding scheme for transmitting values of $x$ to a receiver, then, because we used $q(x)$ rather than the true distribution $p(x)$, some additional information is needed when specifying the value of $x$. The average additional amount of information required (in nats) is:

    $\begin{aligned}D_{K L}(p \| q) &=-\int p(x) \log q(x) d x-\left(-\int p(x) \log p(x) d x\right) \\&=-\int p(x) \log \frac{q(x)}{p(x)} d x\end{aligned}$

  This gives:

    $D_{K L}(p \| q)=-\int p(x) \log \frac{q(x)}{p(x)} d x$

    $D_{K L}(q \| p)=-\int q(x) \log \frac{p(x)}{q(x)} d x$

  The KL divergence measures, to some extent, how much two distributions differ, and it has the following properties:

    • $D_{K L}(p \| q) \geq 0$, with equality if and only if $p(x)=q(x)$.
    • It is not symmetric, i.e. $D_{K L}(p \| q) \neq D_{K L}(q \| p)$ in general, so the direction must be chosen carefully when using it to measure the gap between two distributions.
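
  A minimal numerical sketch of the two KL directions for discrete distributions (the arrays p and q and the function name kl below are my own illustrative assumptions, not from the source):

    import numpy as np

    # Two example discrete distributions on the same support (made up).
    p = np.array([0.4, 0.4, 0.2])
    q = np.array([0.5, 0.3, 0.2])

    def kl(p, q):
        # D_KL(p || q) = -sum_x p(x) log(q(x)/p(x)), in nats.
        return -np.sum(p * np.log(q / p))

    print(kl(p, q))  # non-negative
    print(kl(q, p))  # generally different: KL is not symmetric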

$\alpha$-divergence

  Given $\alpha \in \mathbb{R}$ with $\alpha \neq 0,1$, the $\alpha$-divergence can be defined as

    $\frac{1}{\alpha(1-\alpha)}\left(1-\sum\limits _{x} p_{2}(x)\left(\frac{p_{1}(x)}{p_{2}(x)}\right)^{\alpha}\right)$

  The KL divergence is a limiting case of the $\alpha$-divergence: $K L\left(P_{1}, P_{2}\right)$ and $K L\left(P_{2}, P_{1}\right)$ correspond to the limits $\alpha \rightarrow 1$ and $\alpha \rightarrow 0$, respectively.

  The Amari divergence comes from the above by the transformation $\alpha=\frac{1+t}{2}$.
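
  As a sanity check of the two limits above, here is a small sketch (the distributions and the function name alpha_div are illustrative assumptions):

    import numpy as np

    p1 = np.array([0.4, 0.4, 0.2])
    p2 = np.array([0.5, 0.3, 0.2])

    def alpha_div(p1, p2, a):
        # (1 / (a (1 - a))) * (1 - sum_x p2(x) (p1(x)/p2(x))^a), for a != 0, 1.
        return (1.0 - np.sum(p2 * (p1 / p2) ** a)) / (a * (1.0 - a))

    def kl(p, q):
        return np.sum(p * np.log(p / q))

    # Near a = 1 the alpha-divergence approaches KL(P1, P2);
    # near a = 0 it approaches KL(P2, P1).
    print(alpha_div(p1, p2, 0.999), kl(p1, p2))
    print(alpha_div(p1, p2, 0.001), kl(p2, p1))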

JS divergence

  To construct a symmetric quantity, the two KL divergences can be combined; the result is the JS divergence (Jensen-Shannon divergence), with the following expression:

    $D_{J S}(p \| q)=\frac{1}{2} D_{K L}\left(p \| \frac{p+q}{2}\right)+\frac{1}{2} D_{K L}\left(q \| \frac{p+q}{2}\right)$

  Properties:

  • The JS divergence is symmetric.
  • The JS divergence is bounded, with range $[0, \log 2]$.
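
  The following sketch checks both properties numerically (the example arrays and function names are mine, not from the source):

    import numpy as np

    p = np.array([0.4, 0.4, 0.2])
    q = np.array([0.5, 0.3, 0.2])

    def kl(p, q):
        return np.sum(p * np.log(p / q))

    def js(p, q):
        # JS(p, q) = (1/2) KL(p || m) + (1/2) KL(q || m), with m the midpoint.
        m = 0.5 * (p + q)
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    print(np.isclose(js(p, q), js(q, p)))   # symmetric
    print(0.0 <= js(p, q) <= np.log(2))     # bounded by log 2 (in nats)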

f-divergence

  Given a convex function $f(t): \mathbb{R}_{\geq 0} \rightarrow \mathbb{R}$ with $f(1)=0$, $f^{\prime}(1)=0$, $f^{\prime \prime}(1)=1$, the $f$-divergence on $\mathcal{P}$ is defined by

    $\sum \limits _{x} p_{2}(x) f\left(\frac{p_{1}(x)}{p_{2}(x)}\right)$

  • The case $f(t)=t \ln t$ corresponds to the Kullback-Leibler distance.
  • The case $f(t)=(t-1)^{2}$ corresponds to the $\chi^{2}$-distance.
  • The case $f(t)=|t-1|$ corresponds to the variational distance.
  • The case $f(t)=4(1-\sqrt{t})$ (as well as $f(t)=2(t+1)-4 \sqrt{t}$) corresponds to the squared Hellinger metric.
  • The case $f(t)=(t-1)^{2} /(t+1)$ corresponds to the Vajda–Kus semimetric.
  • The case $f(t)=\left|t^{a}-1\right|^{1 / a}$ with $0<a \leq 1$ corresponds to the generalized Matusita distance.
  • The case $f(t)=\frac{\left(t^{a}+1\right)^{1 / a}-2^{(1-a) / a}(t+1)}{1-1 / a}$ corresponds to the Österreicher semimetric.
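
  A generic f-divergence helper makes the first two cases above easy to verify against their direct formulas (a sketch; f_div and the example arrays are assumptions of mine):

    import numpy as np

    p1 = np.array([0.4, 0.4, 0.2])
    p2 = np.array([0.5, 0.3, 0.2])

    def f_div(p1, p2, f):
        # sum_x p2(x) f(p1(x) / p2(x))
        return np.sum(p2 * f(p1 / p2))

    # f(t) = t ln t recovers the Kullback-Leibler distance:
    print(np.isclose(f_div(p1, p2, lambda t: t * np.log(t)),
                     np.sum(p1 * np.log(p1 / p2))))
    # f(t) = (t - 1)^2 recovers the chi^2-distance:
    print(np.isclose(f_div(p1, p2, lambda t: (t - 1) ** 2),
                     np.sum((p1 - p2) ** 2 / p2)))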

Harmonic mean similarity

  The harmonic mean similarity is a similarity on $\mathcal{P}$ defined by

    $2 \sum \limits _{x} \frac{p_{1}(x) p_{2}(x)}{p_{1}(x)+p_{2}(x)} .$
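
  A short sketch (example arrays assumed); note the similarity equals 1 when the two distributions coincide:

    import numpy as np

    p1 = np.array([0.4, 0.4, 0.2])
    p2 = np.array([0.5, 0.3, 0.2])

    print(2 * np.sum(p1 * p2 / (p1 + p2)))   # harmonic mean similarity
    print(2 * np.sum(p1 * p1 / (p1 + p1)))   # equals 1 for identical distributions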

Fidelity similarity

  The fidelity similarity (or Bhattacharya coefficient, Hellinger affinity) on $\mathcal{P}$ is

    $\rho\left(P_{1}, P_{2}\right)=\sum_{x} \sqrt{p_{1}(x) p_{2}(x)} .$

Hellinger metric

  In terms of the fidelity similarity $\rho$ , the Hellinger metric (or Matusita distance, Hellinger-Kakutani metric) on $\mathcal{P}$ is defined by

    $\left(\sum\limits_{x}\left(\sqrt{p_{1}(x)}-\sqrt{p_{2}(x)}\right)^{2}\right)^{\frac{1}{2}}=\sqrt{2\left(1-\rho\left(P_{1}, P_{2}\right)\right)}$
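
  The identity above is easy to confirm numerically (a sketch with made-up arrays):

    import numpy as np

    p1 = np.array([0.4, 0.4, 0.2])
    p2 = np.array([0.5, 0.3, 0.2])

    rho = np.sum(np.sqrt(p1 * p2))   # fidelity similarity
    hellinger = np.sqrt(np.sum((np.sqrt(p1) - np.sqrt(p2)) ** 2))

    # Hellinger metric equals sqrt(2 (1 - rho)):
    print(np.isclose(hellinger, np.sqrt(2 * (1 - rho))))   # True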

Bhattacharya distance 1

  In terms of the fidelity similarity $\rho$ , the Bhattacharya distance 1 (1946) is

    $\left(\arccos \rho\left(P_{1}, P_{2}\right)\right)^{2} $
  for $P_{1}, P_{2} \in \mathcal{P}$. Twice this distance is the Rao distance. It is also used in Statistics and Machine Learning, where it is called the Fisher distance.

Bhattacharya distance 2

  The Bhattacharya distance 2 (1943) on $\mathcal{P}$ is defined by

    $-\ln \rho\left(P_{1}, P_{2}\right)$
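
  Both Bhattacharya distances are simple functions of the fidelity similarity $\rho$; a sketch (example arrays assumed):

    import numpy as np

    p1 = np.array([0.4, 0.4, 0.2])
    p2 = np.array([0.5, 0.3, 0.2])

    rho = np.sum(np.sqrt(p1 * p2))

    b1 = np.arccos(rho) ** 2   # Bhattacharya distance 1 (1946)
    b2 = -np.log(rho)          # Bhattacharya distance 2 (1943)
    print(b1, b2)              # both vanish iff rho == 1, i.e. p1 == p2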

$\chi^{2}$-distance

  The $\chi^{2}$-distance (or Pearson $\chi^{2}$-distance) is a quasi-distance on $\mathcal{P}$, defined by

    $\sum_{x} \frac{\left(p_{1}(x)-p_{2}(x)\right)^{2}}{p_{2}(x)}$

  The Neyman $\chi^{2}$-distance is a quasi-distance on $\mathcal{P}$, defined by

    $\sum_{x} \frac{\left(p_{1}(x)-p_{2}(x)\right)^{2}}{p_{1}(x)} .$

  Half of the $\chi^{2}$-distance is also called Kagan's divergence.

  The probabilistic symmetric $\chi^{2}$-measure is a distance on $\mathcal{P}$, defined by

    $2 \sum_{x} \frac{\left(p_{1}(x)-p_{2}(x)\right)^{2}}{p_{1}(x)+p_{2}(x)} .$
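
  A sketch computing the three chi^2-type quantities side by side (example arrays assumed):

    import numpy as np

    p1 = np.array([0.4, 0.4, 0.2])
    p2 = np.array([0.5, 0.3, 0.2])

    pearson = np.sum((p1 - p2) ** 2 / p2)            # Pearson chi^2-distance
    neyman = np.sum((p1 - p2) ** 2 / p1)             # Neyman chi^2-distance
    sym = 2 * np.sum((p1 - p2) ** 2 / (p1 + p2))     # probabilistic symmetric chi^2-measure

    print(pearson, neyman)   # generally unequal: both are quasi-distances
    print(np.isclose(sym, 2 * np.sum((p2 - p1) ** 2 / (p2 + p1))))  # symmetric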

  Since I do not need the remaining distances for now, I have not written them up.

  This article draws on the Encyclopedia of Distances, in particular its chapter "Distances on Distribution Laws"; contact the blogger if you need the e-book.

  I also referred to another "borrower"'s blog, 《机器学习中的数学
