Commands - Archiving the files named in a file list - Linux Study Notes

Preface
tar
- With xargs
- With the -T option
Zip
- With the -i option
References

Preface

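While training a model on image data downloaded from a server, OpenCV printed the warning `Premature end of JPEG file`. This suggests that some of the JPEG files are corrupted, most likely truncated during download.
The following code confirmed that some JPEG files were indeed damaged: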

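import os

# Detects the truncation behind "Premature end of JPEG file"
def is_valid_jpg(path):
    """Return True if a JPG file was downloaded completely (ends with the EOI marker)."""
    if path.split(".")[-1].lower() in ("jpg", "jpeg"):
        if os.path.getsize(path) < 2:
            return False  # too short to contain the 2-byte EOI marker
        with open(path, "rb") as f:
            f.seek(-2, os.SEEK_END)  # jump to the last two bytes
            return f.read() == b"\xff\xd9"
    else:
        # Files with other extensions are not checked
        return True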

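The data itself is important, so these files have to be downloaded from the server again. The original dataset is very large, close to 70 GB, so downloading all of it again is not reasonable; ideally only the damaged files are refreshed. That leads to the question: on a Linux machine, how can the files named in a given file list be fetched from a Linux server?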
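
As a rough sketch of how such a file list could be produced, the check above can be run over the local copy of the dataset and the damaged paths written out one per line. The names `ends_with_eoi` and `write_broken_list` are hypothetical helpers introduced here for illustration, and the sketch assumes the local directory layout mirrors the one on the server.

```python
import os

JPEG_EOI = b"\xff\xd9"  # end-of-image marker that closes a complete JPEG


def ends_with_eoi(path):
    """Return True if the file's last two bytes are the JPEG EOI marker."""
    if os.path.getsize(path) < 2:
        return False
    with open(path, "rb") as f:
        f.seek(-2, os.SEEK_END)
        return f.read() == JPEG_EOI


def write_broken_list(root, list_path):
    """Write one path per line for every truncated .jpg/.jpeg under root.

    Paths are written relative to the parent of root, so the list can be
    used from the directory that contains the dataset directory.
    """
    base = os.path.dirname(os.path.abspath(root))
    broken = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith((".jpg", ".jpeg")):
                full = os.path.abspath(os.path.join(dirpath, name))
                if not ends_with_eoi(full):
                    broken.append(os.path.relpath(full, base))
    broken.sort()
    with open(list_path, "w", encoding="utf-8") as out:
        out.write("".join(p + "\n" for p in broken))
    return broken
```

Calling `write_broken_list("projects", "file.txt")` would then give a `file.txt` in the one-path-per-line format that the commands in the following sections expect.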

`tar`

tar is used to archive files: https://www.runoob.com/linux/linux-comm-tar.html

With `xargs`

cat file.txt | xargs tar -czvf file.tar.gz

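cat: prints the contents of the file
file.txt: the list of files to archive (one file path per line)
xargs: turns its input into an argument list and passes it, in chunks, to another command

Because xargs works in chunks, a very long list may be split across several tar invocations. Each `tar -c` run overwrites file.tar.gz, so only the files from the last chunk would remain. xargs also splits on whitespace by default, so paths containing spaces are broken apart.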

The file list looks roughly like this:

projects/index.php
projects/js/index.js
projects/css/index.css

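Note: the command above must be run from the directory that contains the projects directory, because the paths in the file list are resolved relative to the directory where the command is executed. Alternatively, use absolute paths that start from the root directory.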
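
To catch a wrong working directory before archiving, the paths in the list can be checked first. The following is a minimal sketch; `missing_paths` and its `base_dir` parameter are hypothetical names introduced here, not part of tar or xargs.

```python
import os


def missing_paths(list_path, base_dir="."):
    """Return the listed paths that do not exist when resolved from base_dir.

    Absolute paths in the list are checked as-is, because os.path.join
    discards base_dir when the second argument is absolute.
    """
    missing = []
    with open(list_path, encoding="utf-8") as f:
        for line in f:
            path = line.strip()
            if path and not os.path.exists(os.path.join(base_dir, path)):
                missing.append(path)
    return missing
```

An empty result from `missing_paths("file.txt")` suggests that the current directory is the right place to run the archive command.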

With the `-T` option

tar -czv -T file.txt -f backup.tar.gz
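
When this step has to run from a script, the same idea of reading the paths from a list file can be sketched with Python's standard `tarfile` module. This is an illustration under assumptions, not a replacement for `tar -T`: `tar_from_list` and its `base_dir` parameter are hypothetical names, and the list is assumed to hold one path per line, relative to `base_dir`.

```python
import os
import tarfile


def tar_from_list(list_path, archive_path, base_dir="."):
    """Create a gzip-compressed tar containing the paths listed in list_path."""
    with open(list_path, encoding="utf-8") as f:
        paths = [line.strip() for line in f if line.strip()]
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            # Store each entry under the path given in the list, so the
            # directory structure is preserved when extracting.
            tar.add(os.path.join(base_dir, path), arcname=path)
    return paths
```

Because the whole list is handled inside a single process, the archive is written once, which sidesteps the chunking behaviour of xargs.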

`Zip`

For the full set of zip options, see: https://www.runoob.com/linux/linux-comm-zip.html

With the `-i` option

-i "file list": compress only the files in the file list
-x "file list": exclude the files in the file list when compressing

-@ file lists. If a file list is specified as -@ [Not on MacOS], zip takes the list of input files from standard input instead of from the command line. For example,

cat diff-files.txt | zip -@ diffedfiles.zip
zip diffedfiles.zip $(cat diff-files.txt) -r
zip output.zip -r . -i@filelist.txt
zip -@ - < filelist.txt > output.zip

Source: https://serverfault.com/questions/652892/create-zip-based-on-contents-of-a-file-list
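
On systems where `zip -@` is not available, a list-driven zip can be approximated with Python's standard `zipfile` module. The sketch below is only an illustration: `zip_from_list` and `base_dir` are hypothetical names, and the list is assumed to hold one file path per line, relative to `base_dir`.

```python
import os
import zipfile


def zip_from_list(list_path, archive_path, base_dir="."):
    """Create a deflate-compressed zip containing the files listed in list_path."""
    with open(list_path, encoding="utf-8") as f:
        paths = [line.strip() for line in f if line.strip()]
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            # arcname keeps the path exactly as written in the list
            zf.write(os.path.join(base_dir, path), arcname=path)
    return paths
```
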
The -i option is not limited to literal file names; it also accepts wildcard patterns:

# which will include only the files that end in .c in the current directory 
# and its subdirectories
zip -r foo . -i \*.c
zip -r foo . --include \*.c

Source: https://www.computerhope.com/unix/zip.htm
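
The wildcard form can be approximated in Python with `os.walk`, `fnmatch`, and `zipfile`. This is a sketch, not an exact equivalent: `zip_matching` is a hypothetical helper, and it matches the pattern against the file name only, whereas zip matches its patterns against the full stored path.

```python
import fnmatch
import os
import zipfile


def zip_matching(root, archive_path, pattern="*.c"):
    """Zip every file under root whose name matches pattern, keeping subdirectories."""
    archive_abs = os.path.abspath(archive_path)
    added = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                if os.path.abspath(full) == archive_abs:
                    continue  # never add the archive to itself
                if fnmatch.fnmatch(name, pattern):
                    rel = os.path.relpath(full, root)
                    zf.write(full, arcname=rel)
                    added.append(rel)
    return sorted(added)
```
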

References

Archiving files from a file list: https://www.cnblogs.com/yuzhoushenqi/p/6321936.html
Linux tar: archiving a file list: http://www.360doc.com/content/12/0418/10/7871_204592739.shtml
