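We routinely transfer fastq data of several GB, sometimes more than ten GB. After copying it with cp or rsync, the copied data should be checked against md5 checksums. This post first computes the md5 checksums of the source files, then those of the target files, and finally checks whether the two agree to decide whether the copy is complete and intact. For more posts, see https://zouhua.top/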

Get the md5 checksums

```shell
md5sum RawData/filename > RawData_md5sum.tsv
md5sum Rename/filename > Rename_md5sum.tsv
```
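The same checksums can also be computed from Python. The sketch below is an assumption and not part of the original workflow: it uses the standard-library `hashlib` and reads each file in 1 MiB chunks, so memory use stays flat even for multi-GB fastq files. The helper names `md5_of_file` and `write_md5_table` are hypothetical. The output uses the same `<md5>  <path>` layout that md5sum prints, so it can be fed to the comparison script in the next section.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks
    (1 MiB by default) so large files never have to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b""
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_md5_table(directory, out_path):
    """Write one '<md5>  <path>' line (md5sum's text-mode layout)
    for every regular file directly inside `directory`."""
    with open(out_path, "w") as out:
        for p in sorted(Path(directory).iterdir()):
            if p.is_file():
                out.write(f"{md5_of_file(p)}  {p}\n")
```

For example, `write_md5_table("RawData", "RawData_md5sum.tsv")`. Unlike `md5sum RawData/filename`, this hashes every regular file in the directory, so write the table outside that directory.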

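Main script

The script reads the source list into a dictionary keyed by md5 value, then looks up each md5 from the target list. Because matching is by checksum rather than by filename, a file that was renamed after copying is still paired with its original.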

```python
#!/usr/bin/env python3
import argparse as ap


def parse_argument():
    parser = ap.ArgumentParser(description='Check md5 checksums of copied files')
    parser.add_argument('-f1', '--file1', metavar='<file1>', type=str, required=True,
                        help='md5sum output for the source files')
    parser.add_argument('-f2', '--file2', metavar='<file2>', type=str, required=True,
                        help='md5sum output for the copied files')
    parser.add_argument('-o', '--out', metavar='<out>', type=str, required=True,
                        help='output table')
    return parser.parse_args()


def main():
    args = parse_argument()

    # md5sum lines look like "<md5>  <filename>": map md5 -> source filename
    dict_f1 = {}
    with open(args.file1, 'r') as f1:
        for line in f1:
            fields = line.strip().split(maxsplit=1)
            if len(fields) == 2:  # skip blank or malformed lines
                dict_f1[fields[0]] = fields[1]

    with open(args.file2, 'r') as f2, open(args.out, 'w') as out_f:
        for line2 in f2:
            fields2 = line2.strip().split(maxsplit=1)
            if len(fields2) != 2:
                continue
            md5, name2 = fields2
            if md5 in dict_f1:
                # the same md5 exists in the source list: the copy is intact
                res = "\t".join([name2, dict_f1[md5], md5, "Correct"])
            else:
                # no source file has this md5: the copy differs from its original
                res = "\t".join([name2, "NA", md5, "Wrong"])
            out_f.write(res + "\n")


if __name__ == "__main__":
    main()
```
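The script only walks the target list, so a source file that was never copied produces no row at all. A possible extension, sketched here with the hypothetical helper names `read_md5_table` and `compare_md5`, also reports source checksums that never appear in the target list. Like the script, it keys by md5, so two files with identical content collapse into a single entry.

```python
def read_md5_table(lines):
    """Map md5 -> filename from md5sum-style lines ('<md5>  <name>').
    Blank lines are skipped, maxsplit=1 keeps filenames with spaces intact,
    and md5sum's binary-mode '*' prefix on the filename is dropped."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        table[digest] = name[1:] if name.startswith("*") else name
    return table


def compare_md5(source_lines, target_lines):
    """Return (matched, extra, missing):
    matched: (target_name, source_name, md5) for checksums on both sides,
    extra:   target files whose md5 is not in the source list,
    missing: source files whose md5 never appears in the target list."""
    src = read_md5_table(source_lines)
    dst = read_md5_table(target_lines)
    matched = [(dst[h], src[h], h) for h in dst if h in src]
    extra = [dst[h] for h in dst if h not in src]
    missing = [src[h] for h in src if h not in dst]
    return matched, extra, missing
```

Open file objects can be passed directly because they iterate line by line. Empty `extra` and `missing` lists mean every source file arrived intact.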

Run

```shell
python check_md5.py -f1 RawData_md5sum.tsv -f2 Rename_md5sum.tsv -o Checkout_md5.tsv
```
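Each row of `Checkout_md5.tsv` ends with a `Correct` or `Wrong` status. A small helper, hypothetically named `summarize_status`, can count the statuses so that a batch containing any `Wrong` rows is easy to spot.

```python
from collections import Counter


def summarize_status(lines):
    """Count rows by their last tab-separated field (e.g. 'Correct', 'Wrong')."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if line:
            counts[line.split("\t")[-1]] += 1
    return counts
```

For example, pass it the opened `Checkout_md5.tsv`; a nonzero `counts["Wrong"]` means at least one file should be copied again.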