优化理论12有约束优化---Rosen的梯度投影法和既约梯度法

Rosen的梯度投影法和既约梯度法

文章目录

基本概念

投影矩阵

设 M \mathbf{M} M 是 m × n ( m ≤ n ) m \times n(m \leq n) m×n(m≤n) 行满秩矩阵, M T \mathbf{M}^{T} MT 的 m m m 个列向量生成的子空间记作 V M , V M V_{\mathbf{M}}, V_{\mathbf{M}} VM​,VM​ 的正交子空间用 V M ⊥ V_{\mathbf{M}}^{\perp} VM⊥​表示,则 V M ⊥ = { x ∣ M x = 0 } V_{\mathrm{M}}^{\perp}=\{\mathbf{x} \mid \mathbf{M} \mathbf{x}=\mathbf{0}\} VM⊥​={x∣Mx=0} 是 M \mathbf{M} M 的零空间。对于任意 x ∈ R n , \mathbf{x} \in \mathbb{R}^{n}, x∈Rn, 可惟一分解为
x = p + q , p ∈ V , q ∈ V M ⊥ (1) \mathbf{x}=\mathbf{p}+\mathbf{q}, \mathbf{p} \in V, \mathbf{q} \in V_{\mathbf{M}}^{\perp}\tag{1} x=p+q,p∈V,q∈VM⊥​(1)
p , q \mathbf{p}, \mathbf{q} p,q 分别是 x \mathbf{x} x 在 V M V_{\mathbf{M}} VM​ 和 V M ⊥ V_{\mathbf{M}}^{\perp} VM⊥​ 上的投影,记
p = P x , q = Q x (2) \mathbf{p}=\mathbf{P} \mathbf{x}, \mathbf{q}=\mathbf{Q} \mathbf{x}\tag{2} p=Px,q=Qx(2)
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-T7KXyep8-1622729384402)(…/…/…/AppData/Roaming/Typora/typora-user-images/image-20201220092346176.png)]

则 P , Q \mathbf{P}, \mathbf{Q} P,Q 分别是 R n \mathbb{R}^{n} Rn 到 V M V_{\mathbf{M}} VM​ 和 R n \mathbb{R}^{n} Rn 到 V M ⊥ V_{\mathbf{M}}^{\perp} VM⊥​ 上的投影矩阵。
R N = P ⊕ Q \mathbf{R}^{N}=\mathbf{P} \oplus \mathbf{Q} RN=P⊕Q
因 p \mathbf{p} p 可表示为 M T \mathbf{M}^{T} MT 的 m m m 个列向量的线性组合,即存在系数向量 η ∈ R n , \mathbf{\eta} \in \mathbb{R}^{n}, η∈Rn, 使
p = M T η (3) \mathbf{p}=\mathbf{M}^{T} \mathbf{\eta}\tag{3} p=MTη(3)
将(3)式代入(1)式,然后两端左乘 M , \mathbf{M}, M, 并注意到 M p = 0 , \mathbf{M} \mathbf{p}=\mathbf{0}, Mp=0, 则
M x = M M T η \mathbf{M x}=\mathbf{M} \mathbf{M}^{T} \mathbf{\eta} Mx=MMTη
因 M \mathbf{M} M 行满秩,有
η = ( M M T ) − 1 M x \mathbf{\eta}=\left(\mathbf{M} \mathbf{M}^{T}\right)^{-1} \mathbf{M} \mathbf{x} η=(MMT)−1Mx
将其代入(3)式得
p = M T ( M M T ) − 1 M x \mathbf{p}=\mathbf{M}^{T}\left(\mathbf{M} \mathbf{M}^{T}\right)^{-1} \mathbf{M x} p=MT(MMT)−1Mx
将上式代入(1)式得
q = [ I − M T ( M M T ) − 1 M ] x \mathbf{q}=\left[\mathbf{I}-\mathbf{M}^{T}\left(\mathbf{M} \mathbf{M}^{T}\right)^{-1} \mathbf{M}\right] \mathbf{x} q=[I−MT(MMT)−1M]x
再根据式(2)得
P = M T ( M M T ) − 1 M Q = I − M T ( M M T ) − 1 M \begin{aligned} \mathbf{P} &=\mathbf{M}^{T}\left(\mathbf{M} \mathbf{M}^{T}\right)^{-1} \mathbf{M} \\ \mathbf{Q} &=\mathbf{I}-\mathbf{M}^{T}\left(\mathbf{M} \mathbf{M}^{T}\right)^{-1} \mathbf{M} \end{aligned} PQ​=MT(MMT)−1M=I−MT(MMT)−1M​
容易验证,投影矩阵 P,Q是幂等的对称矩阵。反之,一个幂等的对称矩阵必是一个投影矩阵,这就
是下面的定理。

定 义 \large\color{magenta}{\boxed{\color{brown}{定义} }} 定义​假设 P ∈ R n × n , P \in R^{n \times n}, P∈Rn×n, 如果 P T = P , P P = P , P^{T}=P, P P=P, PT=P,PP=P, 则 P P P 是投影矩阵

引 理 1 \large\color{magenta}{\boxed{\color{brown}{引理1} }} 引理1​如果 P ∈ R n × n P \in R^{n \times n} P∈Rn×n 是投影矩阵,则

(1)则 P P P 是半正定矩阵 ;
x T P x = x T P 2 x = x T P T P x = ( P x ) T ( P x ) ≥ 0 x^{T} P x=x^{T} P^{2} x=x^{T} P^{T} P x=(P x)^{T}(P x) \geq 0 xTPx=xTP2x=xTPTPx=(Px)T(Px)≥0
(2) 当且仅当 I − P I-P I−P是投影矩阵,其中 I I I是 n n n 阶的单位矩阵 ;
( I − P ) T = I T − P T = I − P ( I − P ) 2 = I − 2 P + P 2 = I − 2 P + P = I − P \begin{array}{l} (I-P)^{T}=I^{T}-P^{T}=I-P \\ (I-P)^{2}=I-2 P+P^{2}=I-2 P+P=I-P \end{array} (I−P)T=IT−PT=I−P(I−P)2=I−2P+P2=I−2P+P=I−P​
(3) 令 Q = I − P Q=I-P Q=I−P, ∀ x ∈ R n \forall x \in R^{n} ∀x∈Rn, 则 L = { P x : x ∈ R n } L=\left\{P x: x \in R^{n}\right\} L={Px:x∈Rn} 与 L ⊥ = { Q x : x ∈ R n } L^{\perp}=\left\{Q x: x \in R^{n}\right\} L⊥={Qx:x∈Rn} 是相互
正交的线性子空间,并且 ∀ x ∈ R n \forall x \in R^{n} ∀x∈Rn 可唯一的表示为
x = p + q , p ∈ L , q ∈ L ⊥ x=p+q, p \in L, q \in L^{\perp} x=p+q,p∈L,q∈L⊥

梯度投影法

考虑线性约束的非线性问题
( P )     min ⁡     f ( x ) s.t.     A x ≥ b     E x = e     x ∈ R n (4) \begin{aligned} (P) ~ ~ ~ \min &~ ~ ~ f(x)\\\text{s.t.} &~ ~ ~ Ax \geq b\\ &~ ~ ~ Ex = e\\ &~ ~ ~x ∈ R^n \end{aligned}\tag{4} (P)   mins.t.​   f(x)   Ax≥b   Ex=e   x∈Rn​(4)
A ∈ R m × n , E ∈ R I × n , b ∈ R m , e ∈ R I \begin{aligned} &A \in R^{m \times n}, E \in R^{I \times n}, b \in R^{m}, e \in R^{I} \end{aligned} ​A∈Rm×n,E∈RI×n,b∈Rm,e∈RI​

f : R n → R 1 \mathrm{f}: \mathrm{R}^{\mathrm{n}} \rightarrow \mathrm{R}^{1} f:Rn→R1有一阶偏导数 .记(4)的可行域为 S S S.

基本思想

首先,要求迭代点都是在容许集内;

其次,在每一迭代点处为了产生一个下降容许方向,希望能使用目标函数在该点的负梯度方向来生成。

具体来说:若有一容许点x,在该点处的最速下降方向就是一个容许方向的话,那干脆直接取该方向作为x点处的寻优方向;但当迭代点在可行域边界上时 ,若它的最速下降方向不是容许方向的话,则一个直观的想法就是把这个方向投影到可行域内,将它 改 造 \large\color{#70f3ff}{\boxed{\color{green}{改造}}} 改造​成一个容许方向!

优化理论12有约束优化---Rosen的梯度投影法和既约梯度法

下降容许方向的确定

在容许点 x k \mathrm{x}^{\mathrm{k}} xk 处将A, b分解, 使得 A 1 x k = b 1 , A 2 x k > b 2 A_1\mathrm{x}^{\mathrm{k}}=\mathrm{b}_1, \mathrm{A}_{2} \mathrm{x}^{\mathrm{k}}>\mathrm{b}_2 A1​xk=b1​,A2​xk>b2​ .

d ≠ 0 d\neq 0 d​=0 在 x k x^{k} xk 处的可行方向当且仅当 d d d 满足条件
A 1 d ≥ 0    ,    E d = 0 (5) A_1d≥0~~,~~ Ed =0\tag{5} A1​d≥0  ,  Ed=0(5)
记 N = ( A 1 E ) , L N = { d ∈ R n ∣ N d = 0 } N=\left(\begin{array}{l}A_1 \\ E\end{array}\right), \quad L_{N}=\left\{d \in R^{n} \mid N d=0\right\} N=(A1​E​),LN​={d∈Rn∣Nd=0} 为 N N N 的零空间,由(5)知,当 d ∈ L N d \in L_{N} d∈LN​ \ { 0 } \{ 0 \} {0}时,d 是 在 x k \boldsymbol{x}^{k} xk 处的可行方向。由假设条件知 x k \boldsymbol{x}^{k} xk 是 S S S 的正则点,因此 N N N 行满秩, Q = I − N T ( N N T ) − 1   N , Q=I-N^{\mathrm{T}}\left(\mathrm{NN}^{\mathrm{T}}\right)^{-1} \mathrm{~N}, Q=I−NT(NNT)−1 N, 为 N N N 的投影
矩阵,我们通过投影矩阵 Q Q Q 将 − ∇ f ( x k ) -\nabla f\left(x^{k}\right) −∇f(xk) 投影到 L N L_{N} LN​ 上由此构造可行下降方向。

记 N = ( A 1 E ) N=\left(\begin{array}{l}A_1 \\ E\end{array}\right) N=(A1​E​) 不妨设N行满秩 ( ( ( 否则可直接去掉多余行).设N有r行, \quad

定 理 \large\color{magenta}{\boxed{\color{brown}{定理} }} 定理​ Q = I − N T ( N N T ) − 1   N , Q=I-N^{\mathrm{T}}\left(\mathrm{NN}^{\mathrm{T}}\right)^{-1} \mathrm{~N}, Q=I−NT(NNT)−1 N, 若 Q ∇ f ( x ( k ) ) ≠ 0 , Q \nabla \mathrm{f}\left(\mathrm{x}^{(\mathrm{k})}\right) \neq 0, Q∇f(x(k))​=0,则 d = − Q ∇ f ( x ( k ) ) \mathrm{d}=-Q \nabla \mathrm{f}\left(\mathrm{x}^{(\mathrm{k})}\right) d=−Q∇f(x(k)) 是下降容许方向。

【证明】

易验证 Q T Q = [ I − N T ( N N ⊤ ) − 1   N ] T [ I − N T ( N N ⊤ ) − 1   N ] \mathrm{Q}^{\mathrm{T}} \mathrm{Q}=\left[\mathrm{I}-\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}\right]^{\mathrm{T}}\left[\mathrm{I}-\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}\right] QTQ=[I−NT(NN⊤)−1 N]T[I−NT(NN⊤)−1 N]
= I − 2   N T ( N N ⊤ ) − 1   N + N T ( N N ⊤ ) − 1   N ⋅ N T ( N N ⊤ ) − 1   N = I − 2   N T ( N N ⊤ ) − 1   N + N T ( N N ⊤ ) − 1   N = I − N T ( N N ⊤ ) − 1   N = Q \begin{aligned} &=\mathrm{I}-2 \mathrm{~N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}+\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N} \cdot \mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N} \\ &=\mathrm{I}-2 \mathrm{~N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}+\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N} \\ &=\mathrm{I}-\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}=\mathrm{Q} \end{aligned} ​=I−2 NT(NN⊤)−1 N+NT(NN⊤)−1 N⋅NT(NN⊤)−1 N=I−2 NT(NN⊤)−1 N+NT(NN⊤)−1 N=I−NT(NN⊤)−1 N=Q​
下降 : ∇ f ( x k ) T d = − ∇ f ( x k ) T Q ∇ f ( x k ) = − ∥ Q ∇ f ( x k ) ∥ 2 2 < 0 : \quad \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)^{\mathrm{T}} \mathrm{d}=-\nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)^{\mathrm{T}} \mathrm{Q} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)=-{\| \mathrm{Q} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) \mathrm{\|}_{2}}^{2}<0 :∇f(xk)Td=−∇f(xk)TQ∇f(xk)=−∥Q∇f(xk)∥2​2<0 。

可行: 只要验证 A 1 d ≥ 0 , E d = 0 , A_1d\geq 0, \mathrm{Ed}=0, A1​d≥0,Ed=0,
N d = − N Q ∇ f ( x k ) = − N [ I − N T ( N N ⊤ ) − 1   N ] ∇ f ( x k ) = − [ N − N ] ∇ f ( x k ) = 0 \begin{aligned} \mathrm{Nd} &=-\mathrm{NQ} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) \\ &=-\mathrm{N}\left[\mathrm{I}-\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}\right] \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) \\ &=-[\mathrm{N}-\mathrm{N}] \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)=0 \end{aligned} Nd​=−NQ∇f(xk)=−N[I−NT(NN⊤)−1 N]∇f(xk)=−[N−N]∇f(xk)=0​
Q ∇ f ( x k ) = 0 Q \nabla \mathbf{f}\left(\mathbf{x}^{\mathbf{k}}\right)=0 Q∇f(xk)=0 的情况
Q ∇ f ( x k ) = [ I − N T ( N N ⊤ ) − 1   N ] ∇ f ( x k ) = ∇ f ( x k ) − N T ( N N ⊤ ) − 1   N ∇ f ( x k ) = ∇ f ( x k ) − N T q = 0 \begin{aligned} \mathrm{Q} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) &=\left[\mathrm{I}-\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}\right] \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) \\ &=\nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)-\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) \\ &=\nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)-\mathrm{N}^{\mathrm{T}} \mathrm{q}=0 \quad \end{aligned} Q∇f(xk)​=[I−NT(NN⊤)−1 N]∇f(xk)=∇f(xk)−NT(NN⊤)−1 N∇f(xk)=∇f(xk)−NTq=0​
记 q = ( N N ⊤ ) − 1   N ∇ f ( x k ) \mathrm{q}=\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) q=(NN⊤)−1 N∇f(xk) ,对应于 N N N可将 q q q也加以分解
q = ( y z ) ( y  对应于  A 1 , z  对应于  E ) \boldsymbol{q}=\left(\begin{array}{l} \boldsymbol{y} \\ z \end{array}\right)\left(\boldsymbol{y} \text { 对应于 } \boldsymbol{A}_1, z\text { 对应于 } \boldsymbol{E}\right) q=(yz​)(y 对应于 A1​,z 对应于 E)
这样上式就成为 ∇ f ( x ( k ) ) − ( A 1 E ) T ( y z ) = 0 \quad \nabla f\left(x^{(k)}\right)-\left(\begin{array}{c}A_1\\ E\end{array}\right)^{T}\left(\begin{array}{l}y \\ z\end{array}\right)=0 ∇f(x(k))−(A1​E​)T(yz​)=0
 即  ∇ f ( x ( k ) ) − A 1 T y − E T z = 0 \text { 即 } \nabla f\left(x^{(k)}\right)-A_1^{ T} y-E^{T} z=0  即 ∇f(x(k))−A1T​y−ETz=0

Kuhn-Tucker定理

混合约束问题
KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ P : minimize …
设 x ∗ x^* x∗为混合约束问题的一个极小点, I = { i ∣ g i ( x ∗ ) = 0 , i = 1 : m } \mathrm{I}=\left\{\mathrm{i|g}_{\mathrm{i}}\left(\mathrm{x}^{*}\right)=0, \mathrm{i}=1: \mathrm{m}\right\} I={i∣gi​(x∗)=0,i=1:m} .函数 f ( x ) , g 1 ( x ) … , g m ( x ) , h 1 ( x ) , … , h k ( x ) \mathrm{f}(\mathrm{x}), \mathrm{g}_{1}(\mathrm{x}) \ldots, \mathrm{g}_{\mathrm{m}}(\mathrm{x}), \mathrm{h}_{1}(\mathrm{x}), \ldots, \mathrm{h}_{k}(\mathrm{x}) f(x),g1​(x)…,gm​(x),h1​(x),…,hk​(x) 在 x ∗ \mathrm{x} ^* x∗ 处都有一阶偏导数,当 ∇ h 1 ( x ∗ ) , … , ∇ h k ( x ∗ ) , ∇ g i ( x ∗ ) ( i ∈ I ) \nabla \mathrm{h}_{1}\left(\mathrm{x}^{*}\right), \ldots, \nabla \mathrm{h}_{k}\left(\mathrm{x}^{*}\right), \nabla \mathrm{g}_{\mathrm{i}}\left(\mathrm{x}^{*}\right)(\mathrm{i} \in \mathrm{I}) ∇h1​(x∗),…,∇hk​(x∗),∇gi​(x∗)(i∈I) 线性无关, , , , 则存在实数 μ 1 , … , μ m , λ 1 , … , λ l \mu_{1}, \ldots, \mu_{\mathrm{m}}, \lambda_{1}, \ldots, \lambda_{l} μ1​,…,μm​,λ1​,…,λl​ 使得
∇ f ( x ∗ ) + [ μ 1 ∇ g 1 ( x ∗ ) + μ 2 ∇ g 2 ( x ∗ ) + … + μ m ∇ g m ( x ∗ ) ] + [ λ 1 ∇ h 1 ( x ∗ ) + λ 2 ∇ h 2 ( x ∗ ) + … + λ k ∇ h l ( x ∗ ) ] = 0 \begin{array}{l} \nabla \mathrm{f}\left(\mathrm{x}^{*}\right)+\left[\mu_{1} \nabla \mathrm{g}_{1}\left(\mathrm{x}^{*}\right)+\mu_{2} \nabla \mathrm{g}_{2}\left(\mathrm{x}^{*}\right)+\ldots+\mu_{\mathrm{m}} \nabla \mathrm{g}_{\mathrm{m}}\left(\mathrm{x}^{*}\right)\right] \\ +\left[\lambda_{1} \nabla \mathrm{h}_{1}\left(\mathrm{x}^{*}\right)+\lambda_{2} \nabla \mathrm{h}_{2}\left(\mathrm{x}^{*}\right)+\ldots+\lambda_{k} \nabla \mathrm{h}_{l}\left(\mathrm{x}^{*}\right)\right]=0 \\ \end{array} ∇f(x∗)+[μ1​∇g1​(x∗)+μ2​∇g2​(x∗)+…+μm​∇gm​(x∗)]+[λ1​∇h1​(x∗)+λ2​∇h2​(x∗)+…+λk​∇hl​(x∗)]=0​
μ i g i ( x ∗ ) = 0 , i = 1 : m , μ i ≥ 0 , i = 1 : m \mu_{\mathrm{i}} \mathrm{g}_{\mathrm{i}}\left(\mathrm{x}^{*}\right)=0, \mathrm{i}=1: \mathrm{m},\mu_{i} \geq 0, i=1: m μi​gi​(x∗)=0,i=1:m,μi​≥0,i=1:m . ( .\left(\right. .( 即 λ j \lambda_{\mathrm{j}} λj​ 可正可负, \quad 但 μ i \mu_{\mathrm{i}} μi​ 必非负 ) ) )

定 理 \large\color{magenta}{\boxed{\color{brown}{定理} }} 定理​ 从而, 若y ⩾ 0 \geqslant 0 ⩾0 的话, 则上式就是K-T条件, 从而知 X ( k ) \mathrm{X}^{(\mathrm{k})} X(k) 为KKT,点。

但若 y ⩽ 0 \mathrm{y} \leqslant 0 y⩽0, 即y中有负分量比如 y j 0 y _{\mathrm{j0}} yj0​ (可能会有多个, 任取一个) , y j 0 = m i n { y j } < 0 y _{\mathrm{j0}}=min\{y_j\}<0 yj0​=min{yj​}<0,这时将 A 1 A_1 A1​ 中与 y j 0 y _{\mathrm{j0}} yj0​ 相对应的那个行整个删去,仍记删去该行后的矩阵为N,用这个新的矩阵N构造Q,从而构造d,这时这个必为下降容许方向 。

直线搜索确定步长 α m a x \alpha_{max} αmax​

$$
\begin{aligned} (P\alpha_{max}) ~ ~ ~ \min &~ ~ ~ f(x^{k}+\alpha d^k)\\text{s.t.} &~ ~ ~

0\leq \alpha \leq \alpha_{max} \end{aligned}
其 中 其中 其中
\alpha_{max} = \left{\begin{matrix}
min\left{ \frac{\hat{b_i}}{\hat{d_i}}, \hat{d_i}<0\right} ,&当 \hat{d}\ngeqslant0\ +\infty ,&当\hat{d}≥0

\end{matrix}\right.
$$
设得 α ˉ \bar\alpha αˉ, 则得新的迭代点 x ( k + 1 ) = x ( k ) + α ˉ   d \mathrm{x}^{(\mathrm{k}+1)}=\mathrm{x}^{(\mathrm{k})}+\mathrm{\bar\alpha} \mathrm{~d} x(k+1)=x(k)+αˉ d

投影梯度算法步骤:

  1. 取初始可行,点 x 0 , \mathrm{x}^{0}, \quad x0, 精度 ϵ > 0 \epsilon>0 ϵ>0, 置 k = 0 \mathrm{k}=0 k=0

  2. 在x k ^{\mathrm{k}} k 处将A, b分解

  3. 构造 N N N,从而构造 Q Q Q(若 N N N空的话,就取 Q = I Q=I Q=I)

  4. 计算 d k = − Q ∇ f ( x k ) d^{k}=-Q \nabla f\left(x^{k}\right) dk=−Q∇f(xk)

  5. 若 d k = 0 , \mathrm{d}^{\mathrm{k}}=0, dk=0, 或者 d k ≤ ϵ \mathrm{d}^{\mathrm{k}}\leq \epsilon dk≤ϵ , 则 当 N N N空时停止, 当 N N N非空时计算 q = ( N N T ) − 1   N ∇ f ( x k ) \mathrm{q}=\left(\mathrm{NN}^{\mathrm{T}}\right)^{-1} \mathrm{~N} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) q=(NNT)−1 N∇f(xk) 并相应分解为两块y , z , \mathrm{z} ,z 。若 y ⩾ 0 \mathrm{y} \geqslant 0 y⩾0,停止迭代。否则, 修正 A 1 A_1 A1​ , 将 A 1 A_1 A1​ 中与 y j 0 y _{\mathrm{j0}} yj0​ 相对应的那个行整个删去,转 3.

  6. 若 d ( k ) ≠ 0 , \mathrm{d}^{(\mathrm{k})} \neq 0, d(k)​=0, 则确定步长,求解

    $$
    \begin{aligned} (P\alpha_{max}) ~ ~ ~ \min &~ ~ ~ f(x^{k}+\alpha d^k)\\text{s.t.} &~ ~ ~

    0\leq \alpha \leq \alpha_{max} \end{aligned}
    $$

设得 α k \alpha _k αk​, 则得新的迭代点 x ( k + 1 ) = x ( k ) + α k   d \mathrm{x}^{(\mathrm{k}+1)}=\mathrm{x}^{(\mathrm{k})}+\mathrm{\alpha}_k \mathrm{~d} x(k+1)=x(k)+αk​ d,置 k = k + 1 , \mathrm{k}=\mathrm{k}+1, k=k+1, 转2

例子

min ⁡ f ( x ) = x 1 2 + 4 x 2 2  s.  t . x 1 + x 2 ≥ 1 15 x 1 + 10 x 2 ≥ 12 x 1 ⩾ 0 x 2 ≥ 0 \begin{array}{l} \min \quad &f(x)=x_{1}^{2}+4 x_{2}^{2} \\ \text { s. } t . \quad &x_{1}+x_{2} \geq 1 \\ &15 x_{1}+10 x_{2} \geq 12 \\ &x_{1} \quad \geqslant 0 \\ &x_{2} \geq 0 \end{array} min s. t.​f(x)=x12​+4x22​x1​+x2​≥115x1​+10x2​≥12x1​⩾0x2​≥0​

取初始容许点 x 0 = ( 0 , 2 ) T \mathrm{x}^{0}=(0,2)^{\mathrm{T}} x0=(0,2)T

【解】
∇ f ( x ) = ( 2 x 1 8 x 2 ) , A = ( 1 1 15 10 1 0 0 1 ) , b = ( 1 12 0 0 ) \nabla f(x)=\left(\begin{array}{l} 2 x_{1} \\ 8 x_{2} \end{array}\right), A=\left(\begin{array}{cc} 1 & 1 \\ 15 & 10 \\ 1 & 0 \\ 0 & 1 \end{array}\right), b=\left(\begin{array}{l} 1 \\ 12 \\ 0 \\ 0 \end{array}\right) ∇f(x)=(2x1​8x2​​),A=⎝⎜⎜⎛​11510​11001​⎠⎟⎟⎞​,b=⎝⎜⎜⎛​11200​⎠⎟⎟⎞​
第一次迭代: x 0 → x 1 x^{0} \rightarrow x^{1} x0→x1

(1) 下降容许方向的确定

要先生成寻优方向 d 0 , d ^{0}, d0, 先求解投影矩阵 Q Q Q ,在 x 0 \mathrm{x}^{0} x0 处 :
A 1 = ( 1 0 ) , b 1 = ( 0 ) A 2 = ( 1 1 15 10 0 1 ) , b 2 = ( 1 12 0 ) \mathrm{A}_1=(1 \quad 0), \mathrm{b}_1=(0) \quad \boldsymbol{A}_2=\left(\begin{array}{ll} 1 & 1 \\ 15 & 10 \\ 0 & 1 \end{array}\right), \boldsymbol{b}_2=\left(\begin{array}{l} 1 \\ 12 \\ 0 \end{array}\right) A1​=(10),b1​=(0)A2​=⎝⎛​1150​1101​⎠⎞​,b2​=⎝⎛​1120​⎠⎞​
从而 N = ( 1 , 0 ) N=(1,0) N=(1,0)
Q = I − N T ( N N T ) − 1 N = ( 0 0 0 1 ) d 0 = − Q ∇ f ( x 0 ) = ( 0 − 16 ) ≠ 0 \begin{array}{l} Q=I-N^{T}\left(N N^{T}\right)^{-1} N=\left(\begin{array}{ll} 0 & 0 \\ 0 & 1 \end{array}\right) \\ d^{0}=-Q \nabla f\left(x^{0}\right)=\left(\begin{array}{l} 0 \\ -16 \end{array}\right) \neq 0 \end{array} Q=I−NT(NNT)−1N=(00​01​)d0=−Q∇f(x0)=(0−16​)​=0​
(2)精确线搜索:

计算 b ^ = b 2 − A 2 x 0 = [ 1 12 0 ] − [ 1 1 15 10 0 1 ] [ 0 2 ] = [ − 1 − 8 − 2 ] ,    d ^ = A 2 d 0 = [ 1 1 15 10 0 1 ] [ 0 − 16 ] = [ − 16 − 160 − 16 ] \hat{b} =b_2-A_2x^{0} = \begin{bmatrix} 1 \\ 12 \\ 0 \end{bmatrix}- \begin{bmatrix} 1 & 1 \\ 15 & 10 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0\\2 \end{bmatrix} =\begin{bmatrix} -1\\-8\\-2 \end{bmatrix} ,~~ \hat{d}= A_2 d^0=\begin{bmatrix} 1 & 1 \\ 15 & 10 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0\\-16 \end{bmatrix} =\begin{bmatrix} -16\\-160\\-16 \end{bmatrix} b^=b2​−A2​x0=⎣⎡​1120​⎦⎤​−⎣⎡​1150​1101​⎦⎤​[02​]=⎣⎡​−1−8−2​⎦⎤​,  d^=A2​d0=⎣⎡​1150​1101​⎦⎤​[0−16​]=⎣⎡​−16−160−16​⎦⎤​

由于 d ^ ≱ 0 \hat{d}\ngeqslant0 d^0,计算
α m a x = m i n { − 1 − 16 , − 8 − 160 , − 2 − 16 } = 1 20 \alpha_{max} = min\left\{ \frac{{-1}}{{-16}},\frac{{-8}}{{-160}},\frac{{-2}}{{-16}}\right\} =\frac{{1}}{{20}} αmax​=min{−16−1​,−160−8​,−16−2​}=201​
而 x 0 + α d 0 = ( 0 2 ) + α ( 0 − 16 ) = ( 0 2 − 16 α ) x^{0}+\alpha d^0=\begin{pmatrix}0 \\ 2 \end{pmatrix} +\alpha \begin{pmatrix}0 \\ -16 \end{pmatrix} =\begin{pmatrix}0 \\2-16\alpha \end{pmatrix} x0+αd0=(02​)+α(0−16​)=(02−16α​)
$$
\begin{aligned} (P\alpha_{max}) ~ ~ ~ \min &~ ~ ~ f(x^{0}+\alpha d^0)=4(2-16\alpha )^2\\text{s.t.} &~ ~ ~

0\leq \alpha \leq \frac{{1}}{{20}} \end{aligned}
$$
令 f ( x 0 + α d 0 ) ′ = 0 , 0 ≤ α ≤ 1 20 f(x^{0}+\alpha d^0)'=0, 0\leq \alpha \leq \frac{{1}}{{20}} f(x0+αd0)′=0,0≤α≤201​解得 α 0 = 1 20 \alpha_0= \frac{{1}}{{20}} α0​=201​.

则 $x{1}=x{0}+\alpha_1d^0=\begin{pmatrix}0
\ 1.2 \end{pmatrix} $

第二次迭代: x 1 → x 2 x^{1} \rightarrow x^{2} x1→x2

(1) 下降容许方向的确定

在 x 1 \mathrm{x}^{1} x1 处2, 3 约束有效 ,则
A 1 = ( 15 10 1 0 ) , b 1 = ( 12 0 ) A 2 = ( 1 1 0 1 ) , b 2 = ( 1 0 ) \quad A_1=\left(\begin{array}{cc}15 & 10 \\ 1 & 0\end{array}\right), \boldsymbol{b}_1=\left(\begin{array}{l}12 \\ 0\end{array}\right) {A}_2=\left(\begin{array}{ll} 1 & 1 \\ 0 & 1 \end{array}\right), \boldsymbol{b}_2=\left(\begin{array}{l} 1 \\ 0 \end{array}\right) A1​=(151​100​),b1​=(120​)A2​=(10​11​),b2​=(10​)

从而
$$
\begin{array}{}
N=\left(\begin{array}{ll}15 & 10 \ 1 & 0\end{array}\right) \
Q=I-N^{T}\left(N N{T}\right){-1} N \ =

\left(\begin{array}{ll}1 & 0 \
0 & 1\end{array}\right)
-\left(\begin{array}{ll}15 & 1 \ 10 & 0\end{array}\right) \left [ \left(\begin{array}{ll}15 & 10 \ 1 & 0\end{array}\right) \left(\begin{array}{ll}15 & 1 \ 10 & 0\end{array}\right) \right ]^{-1} \left(\begin{array}{ll}15 & 10 \ 1 & 0\end{array}\right)=0

\

\text { 从而 } d=-Q \nabla f\left(x^{(1)}\right)=0

\end{array}
计 算 计算 计算
\boldsymbol{q}=\left(\boldsymbol{N} \boldsymbol{N}{\boldsymbol{T}}\right){-1} \boldsymbol{N} \nabla \boldsymbol{f}\left(\boldsymbol{x}^{(1)}\right)=\left(\begin{array}{l}
0.96 \
-14.4
\end{array}\right)
$$
其中第二个分量小于0,故删去 A 1 A_1 A1​中对应的行,从而有新的 A 1 = ( 15 10 ) , \mathrm{A}_1=(15 \quad 10), A1​=(1510), 故有新的 N = ( 15 10 ) \mathrm{N}=(15 \quad 10) N=(1510)

故有新的 Q = I − N T ( N N T ) − 1 N Q=\boldsymbol{I}-\boldsymbol{N}^{T}\left(\boldsymbol{N N}^{T}\right)^{-1} \boldsymbol{N} Q=I−NT(NNT)−1N
= ( 1 0 0 1 ) − ( 15 10 ) [ ( 15 10 ) ( 15 10 ) ] − 1 ( 15 10 ) =\left(\begin{array}{cc}1 & 0 \\ 0 & 1\end{array}\right)-\left(\begin{array}{c}15 \\ 10\end{array}\right)\left[(15 \quad 10)\left(\begin{array}{c}15 \\ 10\end{array}\right)\right]^{-1}(15 \quad 10) =(10​01​)−(1510​)[(1510)(1510​)]−1(1510)
= ( 0.3046 − 0.4635 − 0.4635 0.6923 ) =\left(\begin{array}{cc}0.3046 & -0.4635 \\ -0.4635 & 0.6923\end{array}\right) =(0.3046−0.4635​−0.46350.6923​)

从而有寻优方向 d 1 = − Q ∇ f ( x 1 ) = ( 4.4307 − 6.6462 ) d^{1}=-Q \nabla f\left(x^{1}\right)=\left(\begin{array}{l}4.4307 \\ -6.6462\end{array}\right) d1=−Q∇f(x1)=(4.4307−6.6462​)

x 1 \mathrm{x}^{1} x1 沿着该方向作线性搜索可得新的点 x 2 = ( 0.4 , 0.6 ) T \mathrm{x}^{2}=(0.4,0.6)^{\mathrm{T}} x2=(0.4,0.6)T

既约梯度法

既约梯度法首先由Wolfe提出,解决具有线性约束的非线性规划问题。1969年由Abadie与Carpentier推广至非线性约束;

基本思想:借鉴求解线性规划的单纯形算法,选择某些变量为基变量,其它的作为非基变量,将基变量用非基变量表示,并从目标函数中消去基变量,得到以非基变量为自变量的简化的目标函数,进而利用此函数的负梯度构造下降可行方向。

既约梯度:简化目标函数关于非基变量的梯度。

考虑以下问题
$$
\begin{aligned}
&\min f(x)\
&\begin{array}{l}
\text { s.t. } \quad A x=b, \quad x \geq 0 \

\end{array}
\end{aligned}
$$
其中 A m × n , r ( A ) = m , b m × 1 , m ≤ n A_{m \times n}, r(A)=m, b_{m \times 1} , m \leq n Am×n​,r(A)=m,bm×1​,m≤n

思想:降维
$$
\begin{array}{ll}
\min &f\left(x_{B}, x_{N}\right) \
\text { s.t. } &B x_{B}+N x_{N}=b\
\quad& x_{B}, x_{N} \geq 0
\end{array} ~~~~~~~~~~
\Leftrightarrow ~~~~~~~~~~
\begin{array}{}

\min &F\left(x_{N}\right)\
\text { s.t. }& x_{B}, x_{N} \geq 0
\end{array}
B 可 逆 方 阵 B可逆方阵 B可逆方阵
A=(B ~~~N), \quad x=\left(\begin{array}{l}
x_{B} \
x_{N}
\end{array}\right) \quad \begin{array}{l}
x_{B}=B^{-1} b-B^{-1} N x_{N} \
\Rightarrow F\left(x_{N}\right)=f\left(x_{B}\left(x_{N}\right), x_{N}\right)
\end{array}
KaTeX parse error: Can't use function '$' in math mode at position 2: $̲f(x)$ 的既约梯度为
\begin{array}{l}
r\left(x_{N}\right)=\nabla F\left(x_{N}\right) \
=\nabla_{x_{N}} f(x)-\left(B^{-1} N\right)^{T} \nabla_{x_{B}} f(x)
\end{array}
$$

确定下降可行方向 d ( k ) d^{(k)} d(k)

  • 下降方向 d ( k ) d^{(k)} d(k) : ∇ f ( x ) T d ( k ) < 0 ∇f(x)^Td^{(k)}< 0 ∇f(x)Td(k)<0
  • 可行方向 d ( k ) d^{(k)} d(k): , A d = 0 , d j ≤ 0 ,Ad=0,d_j\leq 0 ,Ad=0,dj​≤0 ,如果 x j = 0 x_j = 0 xj​=0

d ( k ) = ( d B ( k ) d N ( k ) ) , d B ( k ) d^{(k)}=\left(\begin{array}{l}d_{B}^{(k)} \\ d_{N}^{(k)}\end{array}\right), \quad d_{B}^{(k)} d(k)=(dB(k)​dN(k)​​),dB(k)​ 和 d N ( k ) d_{N}^{(k)} dN(k)​ 分别对应基变量和非基变量。

定义 d N ( k ) , d_{N}^{(k)}, dN(k)​, 使得
d N j ( k ) = { − x N j ( k ) r j ( x N ( k ) )  当  r j ( x N ( k ) ) > 0 − r j ( x N ( k ) )  当  r j ( x N ( k ) ) ≤ 0 \begin{aligned} d_{N_{j}}^{(k)} &=\left\{\begin{array}{ll} -x_{N_{j}}^{(k)} r_{j}\left(x_{N}^{(k)}\right) & \text { 当 } r_{j}\left(x_{N}^{(k)}\right)>0 \\ -r_{j}\left(x_{N}^{(k)}\right) & \text { 当 } r_{j}\left(x_{N}^{(k)}\right) \leq 0 \end{array}\right. \end{aligned} dNj​(k)​​=⎩⎨⎧​−xNj​(k)​rj​(xN(k)​)−rj​(xN(k)​)​ 当 rj​(xN(k)​)>0 当 rj​(xN(k)​)≤0​​
为得到可行方向, 应 有 A d ( k ) = 0 , 应 有 A d^{(k)}=0, 应有Ad(k)=0, 即 B d B ( k ) + N d N ( k ) = 0 \quad B d_{B}^{(k)}+N d_{N}^{(k)}=0 BdB(k)​+NdN(k)​=0
 取  d B ( k ) = − B − 1 N d N ( k ) ⇒ d ( k ) = ( − B − 1 N d N ( k ) d N ( k ) ) \text { 取 } \quad d_{B}^{(k)}=-B^{-1} N d_{N}^{(k)} \Rightarrow \quad d^{(k)}=\left(\begin{array}{c} -B^{-1} N d_{N}^{(k)} \\ d_{N}^{(k)} \end{array}\right)  取 dB(k)​=−B−1NdN(k)​⇒d(k)=(−B−1NdN(k)​dN(k)​​)
定 理 \large\color{magenta}{\boxed{\color{brown}{定理} }} 定理​ 设 x x x 是可行解, A = ( B , N ) A=(B, N) A=(B,N) 是 m × n m \times n m×n 矩阵, B B B 为 m m m 阶可逆矩阵, x = ( x B T , x N T ) T , x B > 0 , x=\left(x_{B}^{T}, x_{N}^{T}\right)^{T}, x_{B}>0, x=(xBT​,xNT​)T,xB​>0, 函数 f f f
在点x处可微,又设
d = ( − B − 1 N d N d N ) d=\left(\begin{array}{c} -B^{-1} N d_{N} \\ d_{N} \end{array}\right) d=(−B−1NdN​dN​​)
其中
d N j = { − x N j r j ( x N ) r j ( x N ) > 0 − r j ( x N ) r j ( x N ) ≤ 0 d_{N_{j}}=\left\{\begin{array}{ll} -x_{N_{j}} r_{j}\left(x_{N}\right) & r_{j}\left(x_{N}\right)>0 \\ -r_{j}\left(x_{N}\right) & r_{j}\left(x_{N}\right) \leq 0 \end{array}\right. dNj​​={−xNj​​rj​(xN​)−rj​(xN​)​rj​(xN​)>0rj​(xN​)≤0​
如果 d ≠ 0 , d \neq 0, d​=0, 则 d d d 是下降可行方向, 而且 d = 0 d=0 d=0 的充要条件是x为KKT点。

直线搜索确定步长 α m a x \alpha_{max} αmax​

x k ∈ S , A x k = b , x B k > 0 , x N k ≥ 0 , A d k = 0 x k + α d k ∈ S x k + 1 ≥ 0 , x j k + 1 = x j k + α d j k ≥ 0 , j = 1 , ⋯   , n \begin{array}{l} x^{k} \in S, A x^{k}=b, x_{B}^{k}>0, x_{N}^{k} \geq 0, A d^{k}=0 \\ x^{k}+\alpha d^{k} \in S \\ x^{k+1} \geq 0, x_{j}^{k+1}=x_{j}^{k}+\alpha d_{j}^{k} \geq 0, j=1, \cdots, n \end{array} xk∈S,Axk=b,xBk​>0,xNk​≥0,Adk=0xk+αdk∈Sxk+1≥0,xjk+1​=xjk​+αdjk​≥0,j=1,⋯,n​

(1) 如果 d j k ≥ 0 , d_{j}^{k} \geq 0, djk​≥0, 则 x j k + 1 ≥ 0 , ∀ α > 0 x_{j}^{k+1} \geq 0, \forall \alpha>0 xjk+1​≥0,∀α>0.

(2) 如果 d j k < 0 , d_{j}^{k}<0, djk​<0, 则 α ≤ x j k − d j k \alpha \leq \frac{x_{j}^{k}}{-d_{j}^{k}} α≤−djk​xjk​​.

让 α max ⁡ = { ∞ , d k ≥ 0 , min ⁡ { x j k − d j k ∣ d j k < 0 } ,  else.  \alpha_{\max }=\left\{\begin{array}{ll}\infty, & d^{k} \geq 0, \\ \min \left\{\frac{x_{j}^{k}}{-d_{j}^{k}} \mid d_{j}^{k}<0\right\}, & \text { else. }\end{array}\right. αmax​=⎩⎨⎧​∞,min{−djk​xjk​​∣djk​<0},​dk≥0, else. ​

于是精确线搜索为:
$$
\begin{aligned} (P\alpha_{max}) ~ ~ ~ \min &~ ~ ~ f(x^{k}+\alpha d^k)\\text{s.t.} &~ ~ ~

0\leq \alpha \leq \alpha_{max} \end{aligned}\tag 7
$$

例子

min ⁡ f ( x ) = 2 x 1 2 + x 2 2  s.t.  x 1 − x 2 + x 3 = 2 − 2 x 1 + x 2 + x 4 = 1 x 1 , x 2 , x 3 , x 4 ≥ 0 \begin{array}{l} \min \quad f(x)=2 x_{1}^{2}+x_{2}^{2} \\ \text { s.t. } \quad x_{1}-x_{2}+x_{3} \quad=2 \\ -2 x_{1}+x_{2}+x_{4}=1 \\ x_{1}, x_{2}, x_{3}, x_{4} \geq 0 \end{array} minf(x)=2x12​+x22​ s.t. x1​−x2​+x3​=2−2x1​+x2​+x4​=1x1​,x2​,x3​,x4​≥0​

解:
∇ f ( x ) = ( 4 x 1 2 x 2 0 0 ) , A = ( 1 − 1 1 0 − 2 1 0 1 ) , b = ( 2 1 ) \nabla f(x)=\left(\begin{array}{c} 4 x_{1} \\ 2 x_{2} \\ 0 \\ 0 \end{array}\right), A=\left(\begin{array}{rrrr} 1 & -1 & 1 & 0 \\ -2 & 1 & 0 & 1 \end{array}\right), b=\left(\begin{array}{l} 2 \\ 1 \end{array}\right) ∇f(x)=⎝⎜⎜⎛​4x1​2x2​00​⎠⎟⎟⎞​,A=(1−2​−11​10​01​),b=(21​)
第一次迭代:

(1) x 1 = ( 1 3 4 0 ) x^{1}=\left(\begin{array}{c}1 \\ 3 \\ 4 \\ 0\end{array}\right) x1=⎝⎜⎜⎛​1340​⎠⎟⎟⎞​ ∈ \in ∈ S S S,
∇ f ( x 1 ) = ( 4 6 0 0 ) \nabla f\left(x^{1}\right)=\left(\begin{array}{c}4 \\ 6 \\ 0 \\ 0\end{array}\right) ∇f(x1)=⎝⎜⎜⎛​4600​⎠⎟⎟⎞​
x B = ( x 2 x 3 ) = ( 3 4 ) , x N = ( x 1 x 4 ) = ( 1 0 ) x_{B}=\left(\begin{array}{c}x_{2} \\ x_{3}\end{array}\right)=\left(\begin{array}{l}3 \\ 4\end{array}\right), x_{N}=\left(\begin{array}{c}x_{1} \\ x_{4}\end{array}\right)=\left(\begin{array}{l}1 \\ 0\end{array}\right) xB​=(x2​x3​​)=(34​),xN​=(x1​x4​​)=(10​)
B = ( − 1 1 1 0 ) , B − 1 = ( 0 1 1 1 ) , N = ( 1 0 − 2 1 ) B=\left(\begin{array}{rr}-1 & 1 \\ 1 & 0\end{array}\right), B^{-1}=\left(\begin{array}{ll}0 & 1 \\ 1 & 1\end{array}\right), N=\left(\begin{array}{rr}1 & 0 \\ -2 & 1\end{array}\right) B=(−11​10​),B−1=(01​11​),N=(1−2​01​)
( r N 1 ) T = ∇ N f ( x 1 ) T − ∇ B f ( x 1 ) T B − 1 N = ( 4 0 ) T − ( 6 0 ) T ( 0 1 1 1 ) ( 1 0 − 2 1 ) = ( 16 − 6 ) T \begin{aligned} \left(r_{N}^{1}\right)^{T} &=\nabla_{N} f\left(x^{1}\right)^{T}-\nabla_{B} f\left(x^{1}\right)^{T} B^{-1} N \\ &=\left(\begin{array}{c} 4 \\ 0 \end{array}\right)^{T}-\left(\begin{array}{c} 6 \\ 0 \end{array}\right)^{T}\left(\begin{array}{ll} 0 & 1 \\ 1 & 1 \end{array}\right)\left(\begin{array}{rr} 1 & 0 \\ -2 & 1 \end{array}\right) \\ &=\left(\begin{array}{c} 16 \\ -6 \end{array}\right)^{T} \end{aligned} (rN1​)T​=∇N​f(x1)T−∇B​f(x1)TB−1N=(40​)T−(60​)T(01​11​)(1−2​01​)=(16−6​)T​

d N 1 = ( d 1 1 d 4 1 ) = ( − 16 6 ) d B 1 = ( d 2 1 d 3 1 ) = − ( 0 1 1 1 ) ( 1 0 − 2 1 ) ( − 16 6 ) = ( − 38 − 22 ) d 1 = ( − 16 − 38 − 22 6 ) \begin{aligned} d_{N}^{1} &=\left(\begin{array}{c} d_{1}^{1} \\ d_{4}^{1} \end{array}\right)=\left(\begin{array}{r} -16 \\ 6 \end{array}\right) \\ d_{B}^{1} &=\left(\begin{array}{c} d_{2}^{1} \\ d_{3}^{1} \end{array}\right)=-\left(\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\right)\left(\begin{array}{rr} 1 & 0 \\ -2 & 1 \end{array}\right)\left(\begin{array}{r} -16 \\ 6 \end{array}\right) \\ &=\left(\begin{array}{r} -38 \\ -22 \end{array}\right) \\ d^{1} &=\left(\begin{array}{r} -16 \\ -38 \\ -22 \\ 6 \end{array}\right) \end{aligned} dN1​dB1​d1​=(d11​d41​​)=(−166​)=(d21​d31​​)=−(01​11​)(1−2​01​)(−166​)=(−38−22​)=⎝⎜⎜⎛​−16−38−226​⎠⎟⎟⎞​​

α max ⁡ = min ⁡ { − 1 − 16 , − 3 − 38 , − 4 − 22 } = 1 16 x 1 + α d 1 = ( 1 3 4 0 ) + α ( − 16 − 38 − 22 6 ) = ( 1 − 16 α 3 − 38 α 4 − 22 α 6 α ) f ( x 1 + α d 1 ) = 2 ( 1 − 16 α ) 2 + ( 3 − 38 α ) 2 \begin{array}{l} \alpha_{\max }=\min \left\{\frac{-1}{-16}, \frac{-3}{-38}, \frac{-4}{-22}\right\}=\frac{1}{16} \\ x^{1}+\alpha d^{1}=\left(\begin{array}{c} 1 \\ 3 \\ 4 \\ 0 \end{array}\right)+\alpha\left(\begin{array}{r} -16 \\ -38 \\ -22 \\ 6 \end{array}\right)=\left(\begin{array}{c} 1-16 \alpha \\ 3-38 \alpha \\ 4-22 \alpha \\ 6 \alpha \end{array}\right) \\ f\left(x^{1}+\alpha d^{1}\right)=2(1-16 \alpha)^{2}+(3-38 \alpha)^{2} \end{array} αmax​=min{−16−1​,−38−3​,−22−4​}=161​x1+αd1=⎝⎜⎜⎛​1340​⎠⎟⎟⎞​+α⎝⎜⎜⎛​−16−38−226​⎠⎟⎟⎞​=⎝⎜⎜⎛​1−16α3−38α4−22α6α​⎠⎟⎟⎞​f(x1+αd1)=2(1−16α)2+(3−38α)2​

直线搜索确定步长 α m a x \alpha_{max} αmax​
min ⁡ f ( x 1 + α d 1 ) = 2 ( 1 − 16 α ) 2 + ( 3 − 38 α ) 2  s.t.  0 ≤ α ≤ 1 16 \begin{array}{ll} \min & f\left(x^{1}+\alpha d^{1}\right)=2(1-16 \alpha)^{2}+(3-38 \alpha)^{2} \\ \text { s.t. } & 0 \leq \alpha \leq \frac{1}{16} \end{array} min s.t. ​f(x1+αd1)=2(1−16α)2+(3−38α)20≤α≤161​​
得到 α 1 = 1 16 \alpha_{1}=\frac{1}{16} α1​=161​

第二次迭代:
x 2 = ( 0 5 8 21 8 3 8 ) , ∇ f ( x 2 ) = ( 0 5 4 0 0 ) x^{2}=\left(\begin{array}{c}0 \\ \frac{5}{8} \\ \frac{21}{8} \\ \frac{3}{8}\end{array}\right), \nabla f\left(x^{2}\right)=\left(\begin{array}{c}0 \\ \frac{5}{4} \\ 0 \\ 0\end{array}\right) x2=⎝⎜⎜⎛​085​821​83​​⎠⎟⎟⎞​,∇f(x2)=⎝⎜⎜⎛​045​00​⎠⎟⎟⎞​
x B = ( x 2 x 3 ) = ( 5 8 21 8 ) , x N = ( x 1 x 4 ) = ( 0 3 8 ) x_{B}=\left(\begin{array}{c}x_{2} \\ x_{3}\end{array}\right)=\left(\begin{array}{r}\frac{5}{8} \\ \frac{21}{8}\end{array}\right), x_{N}=\left(\begin{array}{c}x_{1} \\ x_{4}\end{array}\right)=\left(\begin{array}{c}0 \\ \frac{3}{8}\end{array}\right) xB​=(x2​x3​​)=(85​821​​),xN​=(x1​x4​​)=(083​​)
B = ( − 1 1 1 0 ) , B − 1 = ( 0 1 1 1 ) , N = ( 1 0 − 2 1 ) B=\left(\begin{array}{rr}-1 & 1 \\ 1 & 0\end{array}\right), B^{-1}=\left(\begin{array}{ll}0 & 1 \\ 1 & 1\end{array}\right), N=\left(\begin{array}{rr}1 & 0 \\ -2 & 1\end{array}\right) B=(−11​10​),B−1=(01​11​),N=(1−2​01​)

( r N 2 ) T = ∇ N f ( x 2 ) T − ∇ B f ( x 2 ) T B − 1 N = ( 0 0 ) T − ( 5 4 0 ) T ( 0 1 1 1 ) ( 1 0 − 2 1 ) = ( 5 2 − 5 4 ) T \begin{aligned}\left(r_{N}^{2}\right)^{T} &=\nabla_{N} f\left(x^{2}\right)^{T}-\nabla_{B} f\left(x^{2}\right)^{T} B^{-1} N \\ &=\left(\begin{array}{l}0 \\ 0\end{array}\right)^{T}-\left(\begin{array}{c}\frac{5}{4} \\ 0\end{array}\right)^{T}\left(\begin{array}{ll}0 & 1 \\ 1 & 1\end{array}\right)\left(\begin{array}{rr}1 & 0 \\ -2 & 1\end{array}\right) \\ &=\left(\begin{array}{r}\frac{5}{2} \\ -\frac{5}{4}\end{array}\right)^{T} \end{aligned} (rN2​)T​=∇N​f(x2)T−∇B​f(x2)TB−1N=(00​)T−(45​0​)T(01​11​)(1−2​01​)=(25​−45​​)T​

d N 2 = ( d 1 2 d 4 2 ) = ( 0 5 4 ) d B 2 = ( d 2 2 d 3 2 ) = − ( 0 1 1 1 ) ( 1 0 − 2 1 ) ( 0 5 4 ) = ( − 5 4 5 4 ) d 2 = ( 0 − 5 4 − 5 4 − 5 4 ) \begin{aligned} d_{N}^{2} &=\left(\begin{array}{c}d_{1}^{2} \\ d_{4}^{2}\end{array}\right)=\left(\begin{array}{c}0 \\ \frac{5}{4}\end{array}\right) \\ d_{B}^{2} &=\left(\begin{array}{c}d_{2}^{2} \\ d_{3}^{2}\end{array}\right)=-\left(\begin{array}{cc}0 & 1 \\ 1 & 1\end{array}\right)\left(\begin{array}{rc}1 & 0 \\ -2 & 1\end{array}\right)\left(\begin{array}{c}0 \\ \frac{5}{4}\end{array}\right) \\ &=\left(\begin{array}{r}-\frac{5}{4} \\ \frac{5}{4}\end{array}\right) \\ d^{2} &=\left(\begin{array}{r}0 \\ -\frac{5}{4} \\ -\frac{5}{4} \\ -\frac{5}{4}\end{array}\right) \end{aligned} dN2​dB2​d2​=(d12​d42​​)=(045​​)=(d22​d32​​)=−(01​11​)(1−2​01​)(045​​)=(−45​45​​)=⎝⎜⎜⎛​0−45​−45​−45​​⎠⎟⎟⎞​​

α max ⁡ = min ⁡ { 5 / 8 5 / 4 , 21 / 8 5 / 4 } = 1 2 \alpha_{\max }=\min \left\{\frac{5 / 8}{5 / 4}, \frac{21 / 8}{5 / 4}\right\}=\frac{1}{2} αmax​=min{5/45/8​,5/421/8​}=21​
x 2 + α d 2 = ( 0 5 8 21 8 3 8 ) + α ( 0 − 5 4 − 5 4 − 5 4 ) = ( 5 8 − 5 4 α 21 8 − 5 4 α 3 8 + 5 4 α ) x^{2}+\alpha d^{2}=\left(\begin{array}{c}0 \\ \frac{5}{8} \\ \frac{21}{8} \\ \frac{3}{8}\end{array}\right)+\alpha\left(\begin{array}{r}0 \\ -\frac{5}{4} \\ -\frac{5}{4} \\ -\frac{5}{4}\end{array}\right)=\left(\begin{array}{c}\frac{5}{8}-\frac{5}{4} \alpha \\ \frac{21}{8}-\frac{5}{4} \alpha \\ \frac{3}{8}+\frac{5}{4} \alpha\end{array}\right) x2+αd2=⎝⎜⎜⎛​085​821​83​​⎠⎟⎟⎞​+α⎝⎜⎜⎛​0−45​−45​−45​​⎠⎟⎟⎞​=⎝⎛​85​−45​α821​−45​α83​+45​α​⎠⎞​
f ( x 2 + α d 2 ) = ( 5 8 − 5 4 α ) 2 f\left(x^{2}+\alpha d^{2}\right)=\left(\frac{5}{8}-\frac{5}{4} \alpha\right)^{2} f(x2+αd2)=(85​−45​α)2

直线搜索确定步长 α m a x \alpha_{max} αmax​

min ⁡ f ( x 2 + α d 2 ) = ( 5 8 − 5 4 α ) 2  s.t.  0 ≤ α ≤ 1 2 \begin{array}{ll} \min & f\left(x^{2}+\alpha d^{2}\right)=\left(\frac{5}{8}-\frac{5}{4} \alpha\right)^{2} \\ \text { s.t. } & 0 \leq \alpha \leq \frac{1}{2} \end{array} min s.t. ​f(x2+αd2)=(85​−45​α)20≤α≤21​​
α 2 = 1 2 \alpha_{2}=\frac{1}{2} α2​=21​.

第三次迭代:
x 3 = ( 0 0 2 1 ) , ∇ f ( x 3 ) = ( 0 0 0 0 ) , I B = { 3 , 4 } , I N = { 1 , 2 } x^{3}=\left(\begin{array}{l}0 \\ 0 \\ 2 \\ 1\end{array}\right), \nabla f\left(x^{3}\right)=\left(\begin{array}{l}0 \\ 0 \\ 0 \\ 0\end{array}\right), I_{B}=\{3,4\}, I_{N}=\{1,2\} x3=⎝⎜⎜⎛​0021​⎠⎟⎟⎞​,∇f(x3)=⎝⎜⎜⎛​0000​⎠⎟⎟⎞​,IB​={3,4},IN​={1,2}
x B = ( x 3 x 4 ) = ( 2 1 ) , x N = ( x 1 x 2 ) = ( 0 0 ) x_{B}=\left(\begin{array}{c}x_{3} \\ x_{4}\end{array}\right)=\left(\begin{array}{c}2 \\ 1\end{array}\right), x_{N}=\left(\begin{array}{c}x_{1} \\ x_{2}\end{array}\right)=\left(\begin{array}{l}0 \\ 0\end{array}\right) xB​=(x3​x4​​)=(21​),xN​=(x1​x2​​)=(00​)
B = ( 1 0 0 1 ) , B − 1 = ( 1 0 0 1 ) , N = ( 1 − 1 − 2 1 ) B=\left(\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right), B^{-1}=\left(\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right), N=\left(\begin{array}{rr}1 & -1 \\ -2 & 1\end{array}\right) B=(10​01​),B−1=(10​01​),N=(1−2​−11​)

( r N 3 ) T = ∇ N f ( x 3 ) T − ∇ B f ( x 3 ) T B − 1 N = ( 0 0 ) T − ( 0 0 ) T ( 1 0 0 1 ) ( 1 − 1 − 2 1 ) = ( 0 0 ) T \begin{aligned} \left(r_{N}^{3}\right)^{T} &=\nabla_{N} f\left(x^{3}\right)^{T}-\nabla_{B} f\left(x^{3}\right)^{T} B^{-1} N \\ &=\left(\begin{array}{l} 0 \\ 0 \end{array}\right)^{T}-\left(\begin{array}{l} 0 \\ 0 \end{array}\right)^{T}\left(\begin{array}{ll} 1 & 0 \\ 0 & 1 \end{array}\right)\left(\begin{array}{rr} 1 & -1 \\ -2 & 1 \end{array}\right) \\ &=\left(\begin{array}{l} 0 \\ 0 \end{array}\right)^{T} \end{aligned} (rN3​)T​=∇N​f(x3)T−∇B​f(x3)TB−1N=(00​)T−(00​)T(10​01​)(1−2​−11​)=(00​)T​
停止迭代,
x ˉ = x 3 = ( 0 0 2 1 ) \bar{x}=x^{3}=\left(\begin{array}{l} 0 \\ 0 \\ 2 \\ 1 \end{array}\right) xˉ=x3=⎝⎜⎜⎛​0021​⎠⎟⎟⎞​

参考资料

MIT凸优化课程

Stephen BoydConvex Optimization

《凸优化》,Stephen Boyd等著,王书宁等译

“最优化理论与算法”北京邮电大学数学系

ight)$

( r N 3 ) T = ∇ N f ( x 3 ) T − ∇ B f ( x 3 ) T B − 1 N = ( 0 0 ) T − ( 0 0 ) T ( 1 0 0 1 ) ( 1 − 1 − 2 1 ) = ( 0 0 ) T \begin{aligned} \left(r_{N}^{3}\right)^{T} &=\nabla_{N} f\left(x^{3}\right)^{T}-\nabla_{B} f\left(x^{3}\right)^{T} B^{-1} N \\ &=\left(\begin{array}{l} 0 \\ 0 \end{array}\right)^{T}-\left(\begin{array}{l} 0 \\ 0 \end{array}\right)^{T}\left(\begin{array}{ll} 1 & 0 \\ 0 & 1 \end{array}\right)\left(\begin{array}{rr} 1 & -1 \\ -2 & 1 \end{array}\right) \\ &=\left(\begin{array}{l} 0 \\ 0 \end{array}\right)^{T} \end{aligned} (rN3​)T​=∇N​f(x3)T−∇B​f(x3)TB−1N=(00​)T−(00​)T(10​01​)(1−2​−11​)=(00​)T​
停止迭代,
x ˉ = x 3 = ( 0 0 2 1 ) \bar{x}=x^{3}=\left(\begin{array}{l} 0 \\ 0 \\ 2 \\ 1 \end{array}\right) xˉ=x3=⎝⎜⎜⎛​0021​⎠⎟⎟⎞​

参考资料

MIT凸优化课程

Stephen BoydConvex Optimization

《凸优化》,Stephen Boyd等著,王书宁等译

“最优化理论与算法”北京邮电大学数学系

上一篇:SVD奇异值分解


下一篇:C-01背包问题