Background
I suddenly received a fairly large batch of PE files to analyze, so I classify them by file size and ImpHash to avoid analyzing the same sample twice 😣 (ImpHash is a hash computed from a PE file's import table, so samples built from the same codebase tend to share it.)
🐍Python🐍
Logic
- Walk the target folder
- Compute each file's MD5
- Compute each file's ImpHash
Code
```python
# coding=utf-8
import os
import hashlib
import pefile

def GetFileMD5(filename):
    # Hash the file in fixed-size chunks so large samples do not need to fit in memory
    if not os.path.isfile(filename):
        print(filename)
        return
    strMD5 = hashlib.md5()
    with open(filename, 'rb') as f:
        while True:
            fContent = f.read(8192)
            if not fContent:
                break
            strMD5.update(fContent)
    return strMD5.hexdigest().upper()

def EnumFile(dir):
    # Walk the folder recursively and print the MD5 and ImpHash of every file
    for home, dirs, files in os.walk(dir):
        for subdir in dirs:
            print(subdir)
        for filename in files:
            pathFile = os.path.join(home, filename)
            nameNew = GetFileMD5(pathFile)
            #print(pathFile)
            print("MD5:", nameNew)
            try:
                ImpHash = GetImpHash(pathFile)
                print("ImpHash:", ImpHash)
            except pefile.PEFormatError:
                # Skip files that are not valid PE images
                print("ImpHash: not a PE file")
            #os.rename(pathFile, nameNew)

def GetImpHash(filename):
    # pefile derives the ImpHash from the PE import table
    pe = pefile.PE(filename)
    try:
        return pe.get_imphash().upper()
    finally:
        pe.close()

path = "路径"  # folder containing the PE samples to classify

if __name__ == "__main__":
    EnumFile(path)
```
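The script above only prints each file's MD5 and ImpHash. As a minimal sketch of the classification step mentioned in the intro (my own addition, not from the original post), the results could be bucketed by the (file size, ImpHash) pair; the `group_samples` name and the placeholder folder path are hypothetical:

```python
# coding=utf-8
# Sketch: bucket samples by (file size, ImpHash) so likely duplicates
# only need to be analyzed once. Assumes pefile is installed.
import os
from collections import defaultdict
import pefile

def group_samples(folder):
    groups = defaultdict(list)  # (size, imphash) -> list of file paths
    for home, _, files in os.walk(folder):
        for name in files:
            path = os.path.join(home, name)
            try:
                pe = pefile.PE(path)
                imphash = pe.get_imphash().upper()
                pe.close()
            except pefile.PEFormatError:
                continue  # skip files that are not PE images
            groups[(os.path.getsize(path), imphash)].append(path)
    return groups

if __name__ == "__main__":
    for key, paths in group_samples("路径").items():
        if len(paths) > 1:
            print(key, "->", len(paths), "likely duplicates")
```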
Upgrade
Later, because the results needed to be visualized as charts, I wrote another Python script:
Data Processing | Processing the data with pandas and exporting it to an Excel spreadsheet
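That follow-up is a separate post; as a rough sketch of the idea, assuming the per-file hash records are collected into a list of dicts (the column names, sample rows, and the `pe_samples.xlsx` filename below are made up for illustration, and writing .xlsx needs an engine such as openpyxl):

```python
# coding=utf-8
# Sketch: load hash records into pandas and export a grouped summary to Excel.
import pandas as pd

# Hypothetical records as produced by the enumeration script above
records = [
    {"file": "a.exe", "md5": "AAAA...", "imphash": "1111...", "size": 20480},
    {"file": "b.exe", "md5": "BBBB...", "imphash": "1111...", "size": 20480},
]

df = pd.DataFrame(records)
# Count how many samples share the same (size, imphash) pair
summary = (df.groupby(["size", "imphash"])
             .agg(count=("file", "count"))
             .reset_index())

with pd.ExcelWriter("pe_samples.xlsx") as writer:
    df.to_excel(writer, sheet_name="files", index=False)
    summary.to_excel(writer, sheet_name="by_imphash", index=False)
```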