Background

I suddenly received a fairly large number of PE file analysis requests, so I classify the files by file size and ImpHash first to avoid analyzing the same sample twice 😣

🐍Python🐍

Logic

  1. Traverse the folder
  2. Get the file's MD5
  3. Get the ImpHash (see the pefile sketch right after this list)
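
For step 3, the pefile library exposes the value directly through PE.get_imphash(), which hashes the import table (imported DLL and function names). A minimal single-file sketch, assuming pefile is installed (pip install pefile); the sample path here is a placeholder, not from the script below:

import pefile

pe = pefile.PE("sample.exe")      # placeholder path; a non-PE input raises pefile.PEFormatError
print(pe.get_imphash().upper())   # ImpHash is derived from the import table
pe.close()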


Code

#coding=utf-8
import os
import hashlib
import pefile


def GetFileMD5(filename):
    # Return the uppercase MD5 of a file, or None if the path is not a regular file.
    if not os.path.isfile(filename):
        print(filename)
        return
    strMD5 = hashlib.md5()
    f = open(filename, 'rb')
    while True:
        fContent = f.read(65536)  # read in chunks so large samples do not exhaust memory
        if not fContent:
            break
        strMD5.update(fContent)
    f.close()
    return strMD5.hexdigest().upper()


def GetImpHash(filename):
    # ImpHash is computed from the PE import table; non-PE files raise pefile.PEFormatError.
    pe = pefile.PE(filename)
    ImpHash = pe.get_imphash().upper()
    pe.close()
    return ImpHash


def EnumFile(dir):
    # Walk the directory tree and print the MD5 and ImpHash of every file.
    for home, dirs, files in os.walk(dir):
        for subdir in dirs:  # renamed so it no longer shadows the dir parameter
            print(subdir)
        for filename in files:
            pathFile = os.path.join(home, filename)
            nameNew = GetFileMD5(pathFile)
            #print(pathFile)
            print("MD5:", nameNew)
            ImpHash = GetImpHash(pathFile)
            print("ImpHash:", ImpHash)
            #os.rename(pathFile, nameNew)


path = "路径"  # directory containing the samples to process

if __name__ == "__main__":
    EnumFile(path)
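
The script above only prints the hashes. To actually bucket the samples as described in the background (classify by file size and ImpHash so that only one file per bucket needs manual analysis), one possible extension is to group paths under a (size, ImpHash) key. A minimal sketch that reuses GetImpHash and path from the script above; the grouping key and output format are my own choice, not part of the original script:

from collections import defaultdict

def GroupSamples(dir):
    # Map (file size in bytes, ImpHash) -> list of file paths with that signature.
    groups = defaultdict(list)
    for home, dirs, files in os.walk(dir):
        for filename in files:
            pathFile = os.path.join(home, filename)
            try:
                key = (os.path.getsize(pathFile), GetImpHash(pathFile))
            except pefile.PEFormatError:
                continue  # skip files that are not valid PE images
            groups[key].append(pathFile)
    return groups

# Print one representative per bucket; the other members are likely duplicates.
for (size, imphash), members in GroupSamples(path).items():
    print(size, imphash, "count:", len(members), "sample:", members[0])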

Upgrade

Later, because the results had to be visualized as charts, I wrote another Python script:
Data Processing | Use pandas to process the data and export it to an Excel spreadsheet
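
As a rough idea of that follow-up step (the full write-up is the post linked above), here is a minimal sketch that collects one row per sample and exports it with pandas. It reuses GetFileMD5, GetImpHash, and path from the script above; the column names and output file name are assumptions, and DataFrame.to_excel needs openpyxl installed:

import pandas as pd

rows = []
for home, dirs, files in os.walk(path):
    for filename in files:
        pathFile = os.path.join(home, filename)
        try:
            imphash = GetImpHash(pathFile)
        except pefile.PEFormatError:
            continue  # skip non-PE files
        rows.append({
            "File": filename,
            "Size": os.path.getsize(pathFile),
            "MD5": GetFileMD5(pathFile),
            "ImpHash": imphash,
        })

df = pd.DataFrame(rows)
df.to_excel("samples.xlsx", index=False)  # one row per sample; requires openpyxl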