python - Vectorized (partial) inverse of an N*M*M tensor with numpy

2024-03-12 21:08:32

I am in almost exactly the same situation as the asker of this question from over a year ago:
fast way to invert or dot kxnxn matrix

So I have a tensor indexed as a[n,i,j] with dimensions (N,M,M), and I want to invert the M*M square-matrix part for each n in N.

For example, suppose I have

In [1]:    a = np.arange(12)
           a.shape = (3,2,2)
           a

Out[1]: array([[[ 0,  1],
                  [ 2,  3]],

                  [[ 4,  5],
                  [ 6,  7]],

                  [[ 8,  9],
                  [10, 11]]])

Then inverting with a for loop would look like this:

In [2]: inv_a = np.zeros([3,2,2])
        for m in range(3):
            inv_a[m] = np.linalg.inv(a[m])
        inv_a

Out[2]: array([[[-1.5,  0.5],
                  [ 1. ,  0. ]],

                  [[-3.5,  2.5],
                  [ 3. , -2. ]],

                  [[-5.5,  4.5],
                  [ 5. , -4. ]]])

According to this issue on GitHub, this will apparently be implemented in NumPy 2.0.

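I guess I would need to install the development version, as seberg describes in the GitHub issue thread, but is there another way to do this in a vectorized manner right now?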

Answer:

Update:
In NumPy 1.8 and later, the functions in numpy.linalg are generalized universal functions.
This means you can now do the following:

import numpy as np
a = np.random.rand(12, 3, 3)
np.linalg.inv(a)

This inverts each 3x3 matrix and returns the result as a 12x3x3 array.
See the numpy 1.8 release notes.
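
As a quick check of the batched behaviour described above, the following sketch inverts the (3,2,2) array from the question in a single call and compares the result with the explicit loop. It assumes NumPy 1.8 or later; the names batched and looped are only illustrative.

```python
import numpy as np

# The (3, 2, 2) example array from the question, stored as floats.
a = np.arange(12, dtype=float).reshape(3, 2, 2)

# One call inverts every 2x2 matrix along the leading axis.
batched = np.linalg.inv(a)

# Reference result: invert each matrix in an explicit Python loop.
looped = np.array([np.linalg.inv(m) for m in a])

print(batched.shape)                 # shape of the stacked inverses
print(np.allclose(batched, looped))  # do both approaches agree?
```

The leading axis is treated as a stack of independent matrices, so no reshaping is needed before or after the call.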

Original answer:

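Since N (the size of each square matrix in the code below) is relatively small, how about computing the LU decomposition manually for all the matrices at once?
This ensures that the for loops involved are relatively short.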

Here is how this can be done with ordinary NumPy syntax:

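import numpy as np
from numpy.random import rand

def pylu3d(A):
    # In-place LU decomposition (no pivoting) of every NxN matrix in A.
    N = A.shape[1]
    for j in range(N-1):
        for i in range(j+1,N):
            #change to L
            A[:,i,j] /= A[:,j,j]
            #change to U
            A[:,i,j+1:] -= A[:,i,j:j+1] * A[:,j,j+1:]

def pylusolve(A, B):
    # A holds the LU factors from pylu3d; B is overwritten with the solutions.
    N = A.shape[1]
    # forward substitution with the unit lower-triangular factor L
    for j in range(N-1):
        for i in range(j+1,N):
            B[:,i] -= A[:,i,j] * B[:,j]
    # back substitution with the upper-triangular factor U
    for j in range(N-1,-1,-1):
        B[:,j] /= A[:,j,j]
        for i in range(j):
            B[:,i] -= A[:,i,j] * B[:,j]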
#usage
A = rand(1000000,3,3)
b = rand(3)
b = np.tile(b,(1000000,1))
pylu3d(A)
# A has been replaced with the LU decompositions
pylusolve(A, b)
# b has been replaced with the solutions of
# A[i] x = b[i] for each A[i] and b[i]

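As written, pylu3d modifies A in place to compute the LU decomposition.
After each NxN matrix has been replaced with its LU decomposition, pylusolve can be used to solve for an MxN array b holding the right-hand sides of your matrix systems (one length-N right-hand side per matrix).
It modifies b in place and performs the forward and back substitutions needed to solve each system.
As written, this implementation does not include pivoting, so it is not numerically stable, but it should work well enough in most cases.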
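
Since the question asks for inverses rather than solutions, here is a minimal sketch of how the same unpivoted LU approach could be used to build them: factor once, then solve against each column of the identity. The helper names lu_inplace, lu_solve_inplace and batched_inverse are hypothetical (the first two simply restate pylu3d and pylusolve), and the test matrices are made diagonally dominant on purpose so that skipping pivoting is safe.

```python
import numpy as np

def lu_inplace(A):
    """Unpivoted LU of every matrix in A (shape (K, N, N)), in place."""
    N = A.shape[1]
    for j in range(N - 1):
        for i in range(j + 1, N):
            A[:, i, j] /= A[:, j, j]                                # multiplier, stored in L
            A[:, i, j + 1:] -= A[:, i, j:j + 1] * A[:, j, j + 1:]   # update row i of U

def lu_solve_inplace(LU, B):
    """Solve LU[k] x = B[k] for every k, overwriting B (shape (K, N))."""
    N = LU.shape[1]
    for j in range(N - 1):              # forward substitution (unit lower L)
        for i in range(j + 1, N):
            B[:, i] -= LU[:, i, j] * B[:, j]
    for j in range(N - 1, -1, -1):      # back substitution (upper U)
        B[:, j] /= LU[:, j, j]
        for i in range(j):
            B[:, i] -= LU[:, i, j] * B[:, j]

def batched_inverse(A):
    """Invert every matrix in A by solving against the identity columns."""
    K, N, _ = A.shape
    LU = A.astype(float)                # astype copies, so A is left untouched
    lu_inplace(LU)
    inv = np.empty_like(LU)
    for c in range(N):
        e = np.zeros((K, N))
        e[:, c] = 1.0                   # c-th unit vector for every matrix
        lu_solve_inplace(LU, e)
        inv[:, :, c] = e                # the solution is column c of the inverse
    return inv

rng = np.random.default_rng(0)
# Assumption: diagonally dominant matrices, so no pivoting is needed.
A = rng.random((1000, 3, 3)) + 3.0 * np.eye(3)
A_inv = batched_inverse(A)
print(np.allclose(A_inv, np.linalg.inv(A)))
```

For matrices that are not well conditioned, the lack of pivoting can produce divisions by zero or large errors, so np.linalg.inv remains the safer choice when it is available.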

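Depending on how the array is laid out in memory, it will probably still be a good deal faster to use Cython.
Here are two Cython functions that do the same thing, but they iterate along M first.
They are not vectorized, but they are relatively fast.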

from numpy cimport ndarray as ar
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def lu3d(ar[double,ndim=3] A):
    # In-place LU decomposition (no pivoting) of each matrix A[n].
    cdef int n, i, j, k, N=A.shape[0], h=A.shape[1], w=A.shape[2]
    for n in range(N):
        for j in range(h-1):
            for i in range(j+1,h):
                #change to L
                A[n,i,j] /= A[n,j,j]
                #change to U
                for k in range(j+1,w):
                    A[n,i,k] -= A[n,i,j] * A[n,j,k]

@cython.boundscheck(False)
@cython.wraparound(False)
def lusolve(ar[double,ndim=3] A, ar[double,ndim=2] b):
    # A holds the LU factors from lu3d; b is overwritten with the solutions.
    cdef int n, i, j, N=A.shape[0], h=A.shape[1]
    for n in range(N):
        # forward substitution
        for j in range(h-1):
            for i in range(j+1,h):
                b[n,i] -= A[n,i,j] * b[n,j]
        # back substitution
        for j in range(h-1,-1,-1):
            b[n,j] /= A[n,j,j]
            for i in range(j):
                b[n,i] -= A[n,i,j] * b[n,j]

You could also try Numba, though in this case I could not get it to run as fast as Cython.
