java - How to decompress large files using zstd-jni and ByteBuffers

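I am trying to decompress a large number of 40 MB files while they are being downloaded in parallel, using ByteBuffers and Channels. Channels give me higher throughput than Streams, and throughput matters a great deal here: we need to process 40 TB of files per day, and this step is currently the bottleneck. The files are compressed with zstd-jni. Zstd-jni has APIs for decompressing ByteBuffers, but I get an error when I use them. How can I use zstd-jni to decompress one ByteBuffer at a time?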

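I found these examples in their tests, but unless I am missing something, the examples that use ByteBuffers seem to assume that the entire input file fits in a single ByteBuffer: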
https://github.com/luben/zstd-jni/blob/master/src/test/scala/Zstd.scala

Below is the code I use to compress and decompress files. The compression code works fine, but the decompression code fails with error -70.

public static long compressFile(String inFile, String outFolder, ByteBuffer inBuffer, ByteBuffer compressedBuffer, int compressionLevel) throws IOException {
    File file = new File(inFile);
    File outFile = new File(outFolder, file.getName() + ".zs");
    long numBytes = 0L;

    try (RandomAccessFile inRaFile = new RandomAccessFile(file, "r");
         RandomAccessFile outRaFile = new RandomAccessFile(outFile, "rw");
         FileChannel inChannel = inRaFile.getChannel();
         FileChannel outChannel = outRaFile.getChannel()) {
        inBuffer.clear();
        while(inChannel.read(inBuffer) > 0) {
            inBuffer.flip();
            compressedBuffer.clear();

            long compressedSize = Zstd.compressDirectByteBuffer(compressedBuffer, 0, compressedBuffer.capacity(), inBuffer, 0, inBuffer.limit(), compressionLevel);
            numBytes+=compressedSize;
            compressedBuffer.position((int)compressedSize);
            compressedBuffer.flip();
            outChannel.write(compressedBuffer);
            inBuffer.clear(); 
        }
    }

    return numBytes;
}

public static long decompressFile(String originalFilePath, String inFolder, ByteBuffer inBuffer, ByteBuffer decompressedBuffer) throws IOException {
    File outFile = new File(originalFilePath);
    File inFile = new File(inFolder, outFile.getName() + ".zs");
    outFile = new File(inFolder, outFile.getName());

    long numBytes = 0L;

    try (RandomAccessFile inRaFile = new RandomAccessFile(inFile, "r");
         RandomAccessFile outRaFile = new RandomAccessFile(outFile, "rw");
         FileChannel inChannel = inRaFile.getChannel();
         FileChannel outChannel = outRaFile.getChannel()) {

        inBuffer.clear();

        while(inChannel.read(inBuffer) > 0) {
            inBuffer.flip();
            decompressedBuffer.clear();
            long compressedSize = Zstd.decompressDirectByteBuffer(decompressedBuffer, 0, decompressedBuffer.capacity(), inBuffer, 0, inBuffer.limit());
            System.out.println(Zstd.isError(compressedSize) + " " + compressedSize);
            numBytes+=compressedSize;
            decompressedBuffer.position((int)compressedSize);
            decompressedBuffer.flip();
            outChannel.write(decompressedBuffer);
            inBuffer.clear(); 
        }
    }

    return numBytes;
}

Solution:

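Yes, the static methods you use in your example assume that the whole compressed file fits in a single ByteBuffer. As I understand your requirements, you need streaming decompression using ByteBuffers. ZstdDirectBufferDecompressingStream already provides this: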

https://static.javadoc.io/com.github.luben/zstd-jni/1.3.7-1/com/github/luben/zstd/ZstdDirectBufferDecompressingStream.html

Here is an example (from the tests) of how to use it:

https://github.com/luben/zstd-jni/blob/master/src/test/scala/Zstd.scala#L261-L302

But you also have to subclass it and override the "refill" method.

Edit: here is a new test I just added that has exactly the same structure as your question, moving data between channels:

https://github.com/luben/zstd-jni/blob/master/src/test/scala/Zstd.scala#L540-L586
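
For readers who prefer Java to the Scala test, the approach described above might look roughly like the sketch below. It assumes zstd-jni is on the classpath and that `refill` has the signature `protected ByteBuffer refill(ByteBuffer)` shown in the linked Javadoc. The class name `ZstdChannelDecompressor`, the buffer sizes, and the end-of-file guard are illustrative choices, not part of the library or of the linked test.

```java
import com.github.luben.zstd.ZstdDirectBufferDecompressingStream;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class; the name is an assumption made for this sketch.
public final class ZstdChannelDecompressor {

    /** Streams a zstd-compressed file into outFile; returns the decompressed byte count. */
    public static long decompressFile(Path inFile, Path outFile) throws IOException {
        // The stream works on direct buffers for both source and target.
        // Both sizes are arbitrary and can be tuned.
        ByteBuffer source = ByteBuffer.allocateDirect(1024 * 1024);
        ByteBuffer target = ByteBuffer.allocateDirect(128 * 1024);
        boolean[] eof = {false}; // set once the input channel reports end of file
        long total = 0;

        try (FileChannel in = FileChannel.open(inFile, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(outFile, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {

            // Prime the source buffer so the stream starts with data available.
            eof[0] = in.read(source) < 0;
            source.flip();

            try (ZstdDirectBufferDecompressingStream zis =
                         new ZstdDirectBufferDecompressingStream(source) {
                             @Override
                             protected ByteBuffer refill(ByteBuffer toRefill) {
                                 // Called when the source buffer is exhausted:
                                 // pull the next chunk from the input channel.
                                 toRefill.clear();
                                 try {
                                     eof[0] = in.read(toRefill) < 0;
                                 } catch (IOException e) {
                                     throw new UncheckedIOException(e);
                                 }
                                 toRefill.flip();
                                 return toRefill;
                             }
                         }) {
                while (zis.hasRemaining()) {
                    target.clear();
                    int produced = zis.read(target); // advances target's position
                    target.flip();
                    while (target.hasRemaining()) {
                        total += out.write(target);
                    }
                    // Guard (an addition in this sketch): no output, no input left,
                    // yet the frame is unfinished, so the file is likely truncated.
                    if (produced == 0 && eof[0] && zis.hasRemaining()) {
                        throw new IOException("Truncated zstd stream: " + inFile);
                    }
                }
            }
        }
        return total;
    }
}
```

A call such as `ZstdChannelDecompressor.decompressFile(Paths.get("data.zs"), Paths.get("data"))` (file names are placeholders) would then stream the file through two fixed-size buffers, so neither the compressed nor the decompressed data has to fit in memory at once.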
