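Preface

Before getting into how to implement ZIP compression, we need a basic understanding of the ZIP format. You probably use ZIP archives all the time, yet may know little about what is inside one.

Let's start with a command-line example. My current Downloads directory contains two files: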

$ ls
050617-425.png  test.md

Now use the zip command to compress them into a single example.zip file:

$ zip example.zip ./*

Next, use unzip with the -l option to list the contents of the archive:

$ unzip -l example.zip 
Archive:  example.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
  1159311  2021-11-05 20:23   050617-425.png
      148  2021-11-05 20:23   test.md
---------                     -------
  1159459                     2 files

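Look at the output: there are four columns. The first, Length, is the uncompressed size of each file in bytes. The most important one is the last column, Name, which holds the name of each entry in the archive. Keep this in mind, because Java's ZIP APIs are hard to follow unless you know what this name is.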
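To make the Name column concrete, here is a minimal sketch that reads the same names from Java. It is only an illustration: it uses the JDK's built-in java.util.zip.ZipInputStream rather than Commons Compress, and the class and method names (ZipNameLister, listNames) are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of the article's code.
class ZipNameLister {
    // Returns the entry names of a ZIP archive held in memory, in archive order.
    static List<String> listNames(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                // This is the value that unzip -l prints in the Name column.
                names.add(entry.getName());
            }
        }
        return names;
    }
}
```

For the example.zip above, the returned list would hold the two names shown in the Name column.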

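With that basic picture of a ZIP file in place, let's see how to create and extract ZIP files with Apache Commons Compress.

Compressing files

With the background above, creating a ZIP output object is simple: all you need is a ZipArchiveOutputStream. There are two ways to get one, direct instantiation with new or the factory method:

Direct instantiation (recommended):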

ZipArchiveOutputStream zipOutputStream = new ZipArchiveOutputStream(outputStream);

Using the factory:

// The factory returns the abstract ArchiveOutputStream type
ArchiveOutputStream archiveOutputStream = new ArchiveStreamFactory().createArchiveOutputStream(ArchiveStreamFactory.ZIP, outputStream, "UTF-8");
// Cast to ZipArchiveOutputStream before use so the ZIP-specific setters are available
ZipArchiveOutputStream zipOutputStream = (ZipArchiveOutputStream) archiveOutputStream;

Either approach works. Unless you rely on the SPI extension point, I recommend using new directly.

Compressing plain files

This version handles multiple files, but none of them may be a directory. The code:

public static void createZipArchive(Collection<File> fileList, OutputStream outputStream) {
    try (ZipArchiveOutputStream zipOutputStream = new ZipArchiveOutputStream(outputStream)) {
        zipOutputStream.setUseZip64(Zip64Mode.AsNeeded);
        // Work around garbled Chinese file names
        // issue: https://stackoverflow.com/questions/4212577/zip-file-created-by-java-doesnt-support-chineseutf-8
        zipOutputStream.setEncoding("Cp437");
        zipOutputStream.setFallbackToUTF8(true);
        zipOutputStream.setUseLanguageEncodingFlag(true);
        zipOutputStream.setCreateUnicodeExtraFields(ZipArchiveOutputStream.UnicodeExtraFieldPolicy.NOT_ENCODEABLE);
        // Create one entry per file
        for (File file : fileList) {
            // Open the source in try-with-resources so the input stream is always closed
            try (FileInputStream inputStream = new FileInputStream(file)) {
                // Create the entry; its name becomes the Name column of the archive
                ZipArchiveEntry zipEntry = new ZipArchiveEntry(file, file.getName());
                zipOutputStream.putArchiveEntry(zipEntry);
                // Write the file data
                IOUtils.copy(inputStream, zipOutputStream);
                // This entry is complete
                zipOutputStream.closeArchiveEntry();
            }
        }
        zipOutputStream.finish();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

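The point to stress here is entry creation. At the very beginning we looked at the listing of a ZIP archive, which had a Name column. Where does that Name come from? It comes from the ZipArchiveEntry object in our Java code.

The ZipArchiveEntry constructor used here takes two arguments: file and name. file is the actual file on disk, and name is the entry name, which is what appears in the Name column of the archive listing.

Once a ZipArchiveEntry has been put into the stream, the file's bytes are written, which is the IOUtils.copy(in, out) call above. After the data is written the entry must be closed with closeArchiveEntry(), marking this entry (file) as complete. If there are more files, repeat the same steps.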
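The open entry, write data, close entry cycle is not specific to Commons Compress. As a rough comparison, the sketch below runs the same three steps with the JDK's own java.util.zip.ZipOutputStream on in-memory data. EntryCycleSketch and zip are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the article's code.
class EntryCycleSketch {
    // Writes each (name, content) pair as one entry and returns the archive bytes.
    static byte[] zip(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                out.putNextEntry(new ZipEntry(file.getKey())); // plays the role of putArchiveEntry
                out.write(file.getValue());                    // plays the role of IOUtils.copy
                out.closeEntry();                              // plays the role of closeArchiveEntry
            }
        }
        return bytes.toByteArray();
    }
}
```

Passing a LinkedHashMap keeps the entries in insertion order, which is the order they will appear in the archive listing.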

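Also note how the method handles Chinese file names. Without this, archives containing Chinese file names can show garbled names when extracted. The Stack Overflow question below attributes this to a JDK bug and links to the corresponding Oracle issue: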

https://stackoverflow.com/questions/4212577/zip-file-created-by-java-doesnt-support-chineseutf-8
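The settings in the method combine a Cp437 base encoding with a UTF-8 fallback for names that Cp437 cannot represent. The small sketch below shows why the fallback matters: it checks whether a name is encodable in Cp437. It assumes the Java runtime ships the Cp437 charset, and NameEncodingCheck and needsUtf8 are hypothetical names for this illustration, not Commons Compress API.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of the article's code.
class NameEncodingCheck {
    // True when the name cannot be represented in Cp437, so a UTF-8 fallback is needed.
    static boolean needsUtf8(String entryName) {
        return !Charset.forName("Cp437").newEncoder().canEncode(entryName);
    }
}
```

A plain ASCII name such as test.md is encodable in Cp437, while a name containing Chinese characters is not, which is the case the fallback settings are meant to cover.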

Here is a demonstration.

My /home/kali/Downloads/test directory contains six files:

$ ls /home/kali/Downloads/test
2021_10_04_17_49_IMG_0328.MOV  photo_2021-09-15_20-11-51.jpg  video_2021-10-30_18-30-58.mp4
photo_2021-09-15_12-22-14.jpg  photo_2021-10-09_09-36-06.jpg  代码没写完哪有脸睡觉.jpg

I want to compress these six files into test.zip under /home/kali/Downloads/exp:

public static void main(String[] args) throws IOException {
    File file = new File("/home/kali/Downloads/test");
    if (file.isDirectory()) {
        File[] files = file.listFiles();
        if (files != null && files.length != 0) {
            createZipArchive(Arrays.asList(files), new FileOutputStream("/home/kali/Downloads/exp/test.zip"));
        }
    }
}

The generated archive:

$ ls /home/kali/Downloads/exp/
test.zip

And its contents:

$ unzip -l /home/kali/Downloads/exp/test.zip 
Archive:  test.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
    51014  2021-10-30 18:11   photo_2021-10-09_09-36-06.jpg
    78909  2021-10-30 18:30   代码没写完哪有脸睡觉.jpg
    93505  2021-10-30 21:22   photo_2021-09-15_20-11-51.jpg
   586305  2021-10-30 18:30   video_2021-10-30_18-30-58.mp4
    67048  2021-10-30 21:22   photo_2021-09-15_12-22-14.jpg
702499037  2021-10-30 18:47   2021_10_04_17_49_IMG_0328.MOV
---------                     -------
703375818                     6 files

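And with that the ZIP file is created.

Note
About the garbled Chinese names: the Stack Overflow approach worked in my tests on Linux, but names were still garbled on macOS. I have no Windows environment, so Windows is untested. I am still looking for a complete solution.

Compressing directories and files

The method above can compress many files at once, but only regular files: every File passed in must be a file, not a directory. So how do we include subdirectories?

For example, my Downloads directory contains the following tree (emptydir is an empty folder):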

$ tree ~/Downloads/
.
├── 050617-425.png
├── emptydir
├── html
│   ├── errno-404.html
│   └── index.html
└── test.md

How should this be compressed so that the folders are preserved? Let's use the zip command again to see:

$ zip example.zip ./*
  adding: 050617-425.png (deflated 6%)
  adding: emptydir/ (stored 0%)
  adding: html/ (stored 0%)
  adding: test.md (deflated 20%)

This produces example.zip. Its contents:

$ unzip -l example.zip 
Archive:  example.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
  1159311  2021-11-04 23:15   050617-425.png
        0  2021-11-05 20:10   emptydir/
        0  2021-11-05 20:10   html/
      148  2021-11-04 23:00   test.md
---------                     -------
  1159459                     4 files

Look at the Name column: a folder's entry name carries the folder path and ends with /. Note also that without the -r option zip does not descend into html, so the files inside it are missing from this listing. Entry names map directly onto ZipArchiveEntry objects in Java, so we can build an archive that includes the subdirectories like this:

public static void createZipArchive(File dir, OutputStream outputStream) throws IOException {
    if (!dir.isDirectory()) {
        throw new IOException(String.format("%s is not a directory", dir.getPath()));
    }
    // Files.walk returns a Stream that holds open directories, so close it as well
    try (ZipArchiveOutputStream zipOutputStream = new ZipArchiveOutputStream(outputStream);
         Stream<Path> paths = Files.walk(dir.toPath(), FileVisitOption.FOLLOW_LINKS)) {
        zipOutputStream.setUseZip64(Zip64Mode.AsNeeded);
        // Work around garbled Chinese file names
        // issue: https://stackoverflow.com/questions/4212577/zip-file-created-by-java-doesnt-support-chineseutf-8
        zipOutputStream.setEncoding("Cp437");
        zipOutputStream.setFallbackToUTF8(true);
        zipOutputStream.setUseLanguageEncodingFlag(true);
        zipOutputStream.setCreateUnicodeExtraFields(ZipArchiveOutputStream.UnicodeExtraFieldPolicy.NOT_ENCODEABLE);
        String dirPath = dir.getPath();
        paths.forEach(new Consumer<Path>() {
            @Override
            public void accept(Path path) {
                File file = path.toFile();
                String filePath = file.getPath();
                // If dirPath equals filePath this is the root directory itself; skip it.
                if (filePath.equals(dirPath)) {
                    return;
                }
                // Note the entry name: no longer file.getName() but the path relative to
                // the root, with '/' as the separator that ZIP entry names use.
                String fileName = filePath.substring(dirPath.length() + 1).replace(File.separatorChar, '/');
                try {
                    if (file.isDirectory()) {
                        // Only an empty directory needs its own entry; a non-empty one
                        // is implied by the names of the files inside it.
                        String[] children = file.list();
                        if (children != null && children.length == 0) {
                            // No data is written: the entry is only a marker.
                            // ZipArchiveEntry appends the trailing "/" for directories.
                            zipOutputStream.putArchiveEntry(new ZipArchiveEntry(file, fileName));
                            zipOutputStream.closeArchiveEntry();
                        }
                    } else if (file.isFile()) {
                        // Regular file: open it, then write entry header and data.
                        try (FileInputStream inputStream = new FileInputStream(file)) {
                            zipOutputStream.putArchiveEntry(new ZipArchiveEntry(file, fileName));
                            IOUtils.copy(inputStream, zipOutputStream);
                            zipOutputStream.closeArchiveEntry();
                        }
                    }
                } catch (IOException e) {
                    // Failed to add this file or empty folder; handle the exception
                    e.printStackTrace();
                }
            }
        });
        zipOutputStream.finish();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

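The code is fairly long. It uses a Stream, but for readability it uses an anonymous class instead of a lambda. Let's walk through it.

The setup is the same as before; the difference is what happens inside Files.walk(). Files.walk() visits everything under the given directory one path at a time; think of it as a recursive descent. Before walking we record the directory's own path with dir.getPath(), which is used for the comparisons that follow.

Inside the walk, each visited path is compared against dirPath because the entry name is obtained by stripping dirPath from the front of the path. The first path visited is the dir directory itself, so we skip it. Then we check whether the path is a regular file; if so, it is written as usual.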
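The name derivation can also be expressed with java.nio.file.Path instead of string slicing. The following is a minimal sketch under that assumption; EntryNames and entryName are hypothetical names, and the caller is assumed to know whether the path is a directory.

```java
import java.nio.file.Path;

// Hypothetical helper, not part of the article's code.
class EntryNames {
    // Builds the ZIP entry name for `file` relative to `root`.
    // ZIP entry names use '/' as the separator; directory names end with '/'.
    static String entryName(Path root, Path file, boolean isDirectory) {
        String separator = root.getFileSystem().getSeparator();
        String name = root.relativize(file).toString().replace(separator, "/");
        return isDirectory ? name + "/" : name;
    }
}
```

With root Downloads, the path Downloads/html/index.html maps to html/index.html and the empty folder Downloads/emptydir maps to emptydir/, matching the Name column shown earlier.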

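What if it is a folder? We check whether it is empty, and if so record it in the archive by creating a ZipArchiveEntry for it. Because the folder is empty there is no data to write; the entry is only a marker. It is stored as a zero-length entry whose name ends with /, and extracting the archive recreates the empty folder.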
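To see that a directory entry really is just a name ending in / with no data, here is a small round-trip sketch. It uses the JDK's java.util.zip classes as a stand-in for Commons Compress, and EmptyDirEntrySketch with its two methods is a hypothetical helper for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the article's code.
class EmptyDirEntrySketch {
    // Creates an archive containing only a directory entry; no data is written for it.
    static byte[] zipWithEmptyDir(String dirName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            String name = dirName.endsWith("/") ? dirName : dirName + "/";
            out.putNextEntry(new ZipEntry(name));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Reads the first entry back and reports whether it is a directory.
    static boolean firstEntryIsDirectory(byte[] zipBytes) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry = in.getNextEntry();
            return entry != null && entry.isDirectory();
        }
    }
}
```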

This code has not been optimized in any way, again for readability. Let's run it:

public static void main(String[] args) throws IOException {
    createZipArchive(new File("/home/kali/Downloads"), new FileOutputStream("/home/kali/tmp/examlpe.zip"));
}

Once the ZIP file is created, inspect it with unzip:

$ unzip -l examlpe.zip 
Archive:  examlpe.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
      148  2021-11-04 23:00   test.md
        0  2021-11-05 20:10   emptydir/
  1159311  2021-11-04 23:15   050617-425.png
   259288  2021-11-04 21:21   html/index.html
    17043  2021-11-04 21:21   html/errno-404.html
---------                     -------
  1435790                     5 files

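The Name column lists everything under the original directory. Note that emptydir/ has Length 0: it is an empty folder. The other files look fine as well.

That completes a ZIP archive that includes directories~

Setting the compression level

When creating a ZIP archive we can also choose the compression level. ZipArchiveOutputStream has a setLevel method for this: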

org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream#setLevel

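The level values are defined in java.util.zip.Deflater. Of the constants in that class, only the four marked as levels below are compression levels (any integer from 0 to 9 is accepted as well); the rest are the compression method, strategies, and flush modes:

public static final int DEFLATED = 8;              // compression method, not a level
public static final int NO_COMPRESSION = 0;        // level
public static final int BEST_SPEED = 1;            // level
public static final int BEST_COMPRESSION = 9;      // level
public static final int DEFAULT_COMPRESSION = -1;  // level
public static final int FILTERED = 1;              // strategy
public static final int HUFFMAN_ONLY = 2;          // strategy
public static final int DEFAULT_STRATEGY = 0;      // strategy
public static final int NO_FLUSH = 0;              // flush mode
public static final int SYNC_FLUSH = 2;            // flush mode
public static final int FULL_FLUSH = 3;            // flush mode

Feel free to experiment with these; I won't demo them here.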
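If you want a quick feel for what the level changes, one option is to run the JDK's Deflater directly, outside of any ZIP container. The sketch below does that; LevelComparison and deflatedSize are hypothetical names, and it does not go through ZipArchiveOutputStream#setLevel.

```java
import java.util.zip.Deflater;

// Hypothetical helper, not part of the article's code.
class LevelComparison {
    // Returns the number of bytes produced when deflating `data` at the given level.
    static int deflatedSize(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(data);
            deflater.finish();
            byte[] buffer = new byte[4096];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            // Release the native resources held by the Deflater.
            deflater.end();
        }
    }
}
```

For highly repetitive input, Deflater.BEST_COMPRESSION yields far fewer bytes than the input, while Deflater.NO_COMPRESSION yields slightly more than the input because the data is only wrapped, not compressed.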

Setting the compression method

ZipArchiveOutputStream likewise has a setMethod method for choosing the compression method:

org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream#setMethod

There are two methods, defined in java.util.zip.ZipEntry:

public static final int STORED = 0;
public static final int DEFLATED = 8;

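STORED archives the data without compressing it, much like bundling files with tar. DEFLATED compresses the data. Since compression is usually why we use ZIP in the first place, DEFLATED is the default.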
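The difference between the two methods is easy to observe with the JDK's own ZipOutputStream. This is a sketch for comparison only, not the Commons Compress setMethod call; MethodComparison and zipSingle are hypothetical names. Note that the JDK stream requires the size and CRC to be set up front for STORED entries.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the article's code.
class MethodComparison {
    // Writes a single entry with the given method (ZipEntry.STORED or ZipEntry.DEFLATED).
    static byte[] zipSingle(String name, byte[] data, int method) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            ZipEntry entry = new ZipEntry(name);
            entry.setMethod(method);
            if (method == ZipEntry.STORED) {
                // java.util.zip needs size and CRC before the data for STORED entries.
                CRC32 crc = new CRC32();
                crc.update(data);
                entry.setSize(data.length);
                entry.setCompressedSize(data.length);
                entry.setCrc(crc.getValue());
            }
            out.putNextEntry(entry);
            out.write(data);
            out.closeEntry();
        }
        return bytes.toByteArray();
    }
}
```

For compressible input, the STORED archive is larger than the input itself (data plus headers), while the DEFLATED archive is smaller.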

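Extracting files

Compared with compression, extraction is simpler: it is a matter of creating files or folders according to each Zip Entry Name, so the part that needs care is folder handling. Here are the entries of the examlpe.zip archive we created earlier: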

$ unzip -l examlpe.zip 
Archive:  examlpe.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
      148  2021-11-04 23:00   test.md
        0  2021-11-05 20:10   emptydir/
  1159311  2021-11-04 23:15   050617-425.png
   259288  2021-11-04 21:21   html/index.html
    17043  2021-11-04 21:21   html/errno-404.html
---------                     -------
  1435790                     5 files

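When extracting, pay very close attention to how each Entry Name is handled: whatever layout went into the archive is what must come out.

Take the first Entry Name, test.md. It is a regular file, so it is simply written to the target directory. The second, emptydir/, is clearly a folder, and an empty one, so extraction should create a folder for it. Now the last two: html/index.html and html/errno-404.html. These are files whose names include a folder, so we must first create the html folder and then extract both files into it.

Those are the points to watch. You may also have noticed that the Entry Name lets us extract just one file from the archive, which is called random-access extraction. Where there is random access there is also sequential access. Let's look at both in code:

Sequential extraction

Sequential extraction reads the entries in the order they appear in the archive and extracts every one of them. There is no trick to it; we just read the stream: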

public static void unpackZipArchive(File zipFile, File outputDir) throws IOException {
    if (!outputDir.isDirectory()) {
        throw new IOException(outputDir + " is not a directory");
    }
    Path root = outputDir.toPath().toAbsolutePath().normalize();
    try (ZipArchiveInputStream inputStream = new ZipArchiveInputStream(new FileInputStream(zipFile))) {
        ArchiveEntry entry;
        while ((entry = inputStream.getNextEntry()) != null) {
            // Zip Entry Name
            String fileName = entry.getName();
            // The extracted file goes under outputDir
            Path target = root.resolve(fileName).normalize();
            // Reject names such as ../x that would end up outside outputDir
            if (!target.startsWith(root)) {
                throw new IOException("Entry is outside the target directory: " + fileName);
            }
            // If the entry is a directory, create that directory.
            // If it is a file, its name may have the form dir/file (always with '/',
            // regardless of the platform separator), so create the parent
            // directories first and then write the file itself.
            if (entry.isDirectory()) {
                Files.createDirectories(target);
            } else {
                Files.createDirectories(target.getParent());
                // try-with-resources closes the output file after each entry
                try (OutputStream outputStream = Files.newOutputStream(target)) {
                    IOUtils.copy(inputStream, outputStream);
                }
            }
        }
    }
}

That is sequential extraction. The key part is how the while loop handles each Entry Name: if the entry is a folder we create it; otherwise we make sure any folders in the file's name exist before writing the file.
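The path handling in the loop can be pulled out into a small helper that is easy to test on its own. The sketch below is one way to do it, and it also rejects entry names such as ../evil.txt that would resolve outside the output directory (often called zip slip). SafeResolve and resolveEntry are hypothetical names for this illustration.

```java
import java.io.IOException;
import java.nio.file.Path;

// Hypothetical helper, not part of the article's code.
class SafeResolve {
    // Resolves an entry name against outputDir, rejecting names that escape it.
    static Path resolveEntry(Path outputDir, String entryName) throws IOException {
        Path root = outputDir.toAbsolutePath().normalize();
        Path target = root.resolve(entryName).normalize();
        if (!target.startsWith(root)) {
            throw new IOException("Entry is outside the target directory: " + entryName);
        }
        return target;
    }
}
```

The returned path can then be passed to Files.createDirectories (for folder entries) or used as the output file, with Files.createDirectories(target.getParent()) called first for names like html/index.html.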

Now let's try it on the examlpe.zip created earlier:

public static void main(String[] args) throws IOException {
    unpackZipArchive(new File("/home/kali/tmp/examlpe.zip"), new File("/home/kali/tmp/unpark"));
}

And the extracted files:

$ tree /home/kali/tmp/unpark
/home/kali/tmp/unpark
├── 050617-425.png
├── emptydir
├── html
│   ├── errno-404.html
│   └── index.html
└── test.md
2 directories, 4 files

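Perfect~

Random-access extraction

Random-access extraction is more flexible than sequential extraction because it works by Entry Name: we can extract a specific file by its name. In code, the difference is that instead of wrapping a file stream directly we open a ZipFile object:

public static void unpackZipArchive(File zipFile, File outputDir) throws IOException {
    // ZipFile holds the archive open, so close it when done
    try (ZipFile zip = new ZipFile(zipFile)) {
        // look up and extract entries here
    }
}

ZipFile has several important methods; here are four of them: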

public InputStream getInputStream(final ZipArchiveEntry ze);
public Enumeration<ZipArchiveEntry> getEntries();
public Iterable<ZipArchiveEntry> getEntries(final String name);
public ZipArchiveEntry getEntry(final String name);

Notice anything? Look at the last two methods: their parameter is name, which is the Entry Name, that is, the values in the Name column of examlpe.zip:

$ unzip -l examlpe.zip 
Archive:  examlpe.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
      148  2021-11-04 23:00   test.md
        0  2021-11-05 20:10   emptydir/
  1159311  2021-11-04 23:15   050617-425.png
   259288  2021-11-04 21:21   html/index.html
    17043  2021-11-04 21:21   html/errno-404.html
---------                     -------
  1435790                     5 files

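Suppose I only want to extract 050617-425.png. getEntry(final String name) does the job; a simplified example:

public static void unpackZipArchive(File zipFile, File outputDir) throws IOException {
    try (ZipFile zip = new ZipFile(zipFile)) {
        // Look up the entry by name; getEntry returns null if there is no such entry
        ZipArchiveEntry entry = zip.getEntry("050617-425.png");
        if (entry == null) {
            throw new FileNotFoundException("No such entry: 050617-425.png");
        }
        // We know this is a regular file at the archive root, so no directory
        // handling is done here. For demonstration only.
        File newFile = new File(outputDir.getPath(), entry.getName());
        // Extract; both streams are closed by try-with-resources
        try (InputStream inputStream = zip.getInputStream(entry);
             OutputStream outputStream = new FileOutputStream(newFile)) {
            IOUtils.copy(inputStream, outputStream);
        }
    }
}

After running it, the /home/kali/tmp/unpark directory contains: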

$ tree ~/tmp/unpark/
/home/kali/tmp/unpark/
└── 050617-425.png
0 directories, 1 file

Do you see the essence of it now?
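For comparison, the JDK's own java.util.zip.ZipFile offers the same lookup-by-name idea. The sketch below reads a single entry into memory with it. Note that this is the JDK class, not the Commons Compress ZipFile used above, and SingleEntryReader and readEntry are hypothetical names for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of the article's code.
class SingleEntryReader {
    // Reads one entry by name without reading the other entries' data.
    static byte[] readEntry(File zipFile, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipFile)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new FileNotFoundException("No such entry: " + entryName);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    bytes.write(buffer, 0, n);
                }
                return bytes.toByteArray();
            }
        }
    }
}
```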

You have probably also noticed getEntries(). It can be used for extracting everything as well; in outline:

public static void unpackZipArchive(File zipFile, File outputDir) throws IOException {
    try (ZipFile zip = new ZipFile(zipFile)) {
        // Get all Entry objects and extract them in a loop
        Enumeration<ZipArchiveEntry> entries = zip.getEntries();
        while (entries.hasMoreElements()) {
            ZipArchiveEntry entry = entries.nextElement();
            try (InputStream inputStream = zip.getInputStream(entry)) {
                // do something...
            }
        }
    }
}

Pretty neat, isn't it?
