Python - Reading image content with read() in Python - 《代码笔记》

1. Fewer bytes than expected
2. PIL.Image.frombytes cannot rebuild the image from the content read by raw_loader
3. The correct way to rebuild the image from the raw content

First, how to read a file's content with read():

def raw_loader(path):
    """Load the raw content of a file.

    Args:
        path: str, the file path.
    Returns:
        bytes, the content of the file, or None if it cannot be read.
    """
    try:
        content = b''
        with open(path, 'rb') as f:
            while True:
                tmp = f.read(4096)  # read at most 4096 bytes per call
                if not tmp:  # an empty bytes object means end of file
                    break
                content = content + tmp
        return content
    except IOError:
        return None

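Blog posts usually suggest reading the whole file with a single read() call. To rule out any chance of the content not being read in full, the loader here reads in a loop, 4 KB (4096 bytes) per call, and finally returns the file content as bytes. For a regular file opened in binary mode, a single f.read() with no size argument also reads up to the end of the file, so the loop is mostly a defensive choice.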
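To check that the chunked loop and a single read() really return the same bytes, the two can be compared on a temporary file. This is a minimal sketch: read_in_chunks, read_at_once and compare_readers are hypothetical helper names, not part of the original post, and the chunked variant collects pieces in a list and joins them once instead of concatenating inside the loop.

```python
import os
import tempfile


def read_in_chunks(path, chunk_size=4096):
    """Read a whole file by repeated fixed-size reads (hypothetical helper)."""
    chunks = []
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # b'' signals end of file
                break
            chunks.append(chunk)
    # Joining once avoids re-copying the growing buffer on every iteration.
    return b''.join(chunks)


def read_at_once(path):
    """Read a whole file with a single read() call (hypothetical helper)."""
    with open(path, 'rb') as f:
        return f.read()


def compare_readers(payload):
    """Write payload to a temporary file and read it back both ways."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(payload)
        return read_in_chunks(path), read_at_once(path)
    finally:
        os.remove(path)


chunked, single = compare_readers(os.urandom(10000))
print(len(chunked), len(single), chunked == single)
# 10000 10000 True
```

Both readers return identical bytes for an ordinary file on disk, so the choice between them is mainly about memory use and style.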

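Below are a few problems and points of confusion that come up when reading image data straight from a file. Assume the image being read is tmp_batch.png, in RGBA mode, with size 84 x 588 (that is, 84 x 84*7).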

1. Fewer bytes than expected

For an 84 x 588 RGBA image, the pixel data should take 84*588*4 = 197568 bytes, yet the byte stream read directly with the raw_loader function is shorter than expected.

im_raw = raw_loader('tmp_batch.png')
print(len(im_raw), type(im_raw))
# output
# 109409 <class 'bytes'>

As shown, the byte stream read from the file holds only 109409 bytes, clearly fewer than 197568.

Reading the image file with PIL, however, gives a byte stream of the expected length.

from PIL import Image as PILImage
im = PILImage.open('tmp_batch.png')
print(im.size)
imbytes = im.tobytes()
print(len(imbytes), type(imbytes))
# output
# (84, 588)
# 197568 <class 'bytes'>

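The difference in byte count comes from the encoding format. A PNG file stores its pixel data losslessly compressed, together with a header and chunk metadata, so raw_loader returns the encoded file, while im.tobytes() returns the decoded pixels. Running ls -lh confirms that the length of the raw content matches the on-disk file size that ls reports.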
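The gap between encoded and decoded sizes can be reproduced with the standard library alone. The sketch below hand-builds a minimal PNG for a solid-color 84 x 588 RGBA image (a stand-in, since tmp_batch.png itself is not available here) and reads the dimensions back from the IHDR header. The names encode_rgba_png and decoded_rgba_size are hypothetical helpers, and the encoder only covers the simplest case: 8-bit RGBA, no interlacing, filter type 0.

```python
import struct
import zlib

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack('>I', len(data)) + kind + data + struct.pack('>I', crc)


def encode_rgba_png(width, height, pixels):
    """Encode raw 8-bit RGBA bytes as a minimal PNG (hypothetical helper)."""
    stride = width * 4
    if len(pixels) != stride * height:
        raise ValueError('pixel buffer does not match width * height * 4')
    # Each scanline is prefixed with a filter-type byte (0 = no filter).
    rows = b''.join(
        b'\x00' + pixels[y * stride:(y + 1) * stride] for y in range(height)
    )
    # IHDR fields: width, height, bit depth 8, color type 6 (RGBA),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack('>IIBBBBB', width, height, 8, 6, 0, 0, 0)
    return (PNG_SIGNATURE
            + _chunk(b'IHDR', ihdr)
            + _chunk(b'IDAT', zlib.compress(rows))
            + _chunk(b'IEND', b''))


def decoded_rgba_size(png_bytes):
    """Return width * height * 4 using the dimensions stored in IHDR."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError('not a PNG file')
    # Signature (8) + chunk length (4) + chunk type (4) = offset 16.
    width, height = struct.unpack('>II', png_bytes[16:24])
    return width * height * 4


pixels = bytes([255, 0, 0, 255]) * (84 * 588)  # solid red, fully opaque
png = encode_rgba_png(84, 588, pixels)
print(len(pixels), decoded_rgba_size(png), len(png) < len(pixels))
# 197568 197568 True
```

How much smaller the file is depends on the image content: a solid color compresses far better than a photograph, which is why the post's file still needs 109409 bytes.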

2. `PIL.Image.frombytes` cannot rebuild the image from the content read by `raw_loader`

PILImage.frombytes('RGBA', (84, 588), im_raw)
# raises an error: im_raw holds the encoded PNG file, not decoded RGBA pixels
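The failure follows from what frombytes expects: with its default raw decoder it treats the buffer as uncompressed pixels, so an 8-bit RGBA image needs width * height * 4 bytes. A rough pre-check along those lines might look like the sketch below; describe_buffer and BYTES_PER_PIXEL are hypothetical names, and only a few 8-bit modes are covered.

```python
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'

# Bytes per pixel for a few common 8-bit modes (not an exhaustive table).
BYTES_PER_PIXEL = {'L': 1, 'RGB': 3, 'RGBA': 4}


def describe_buffer(mode, size, data):
    """Guess whether data is an encoded PNG or a raw pixel buffer."""
    width, height = size
    expected = width * height * BYTES_PER_PIXEL[mode]
    if data[:8] == PNG_SIGNATURE:
        # Encoded file content: decode it with an image reader instead.
        return 'encoded PNG file'
    if len(data) == expected:
        return 'raw pixels'
    return 'unknown (expected %d bytes, got %d)' % (expected, len(data))


print(describe_buffer('RGBA', (84, 588), PNG_SIGNATURE + bytes(100)))
# encoded PNG file
print(describe_buffer('RGBA', (2, 2), bytes(16)))
# raw pixels
```

In the post's case the buffer starts with the PNG signature and has 109409 bytes rather than 197568, so it fails on both counts.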

3. The correct way to rebuild the image from the raw content

from io import BytesIO
im = PILImage.open(BytesIO(im_raw))
print(type(im), im.size)
# output
# <class 'PIL.PngImagePlugin.PngImageFile'> (84, 588)
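Putting the two paths side by side makes the distinction concrete: PILImage.open pairs with encoded file bytes, and PILImage.frombytes pairs with the output of tobytes(). The sketch below assumes Pillow is installed and uses an image created in memory as a stand-in for tmp_batch.png.

```python
from io import BytesIO

from PIL import Image as PILImage

# Stand-in image built in memory (solid red, same mode and size as in the post).
original = PILImage.new('RGBA', (84, 588), (255, 0, 0, 255))

# Encoded bytes: the same kind of content raw_loader would return for a .png file.
buffer = BytesIO()
original.save(buffer, format='PNG')
encoded = buffer.getvalue()

# Decoded bytes: the raw pixel buffer, 84 * 588 * 4 bytes long.
raw_pixels = original.tobytes()

# Encoded bytes go through open(); raw pixels go through frombytes().
from_encoded = PILImage.open(BytesIO(encoded))
from_raw = PILImage.frombytes('RGBA', original.size, raw_pixels)

print(from_encoded.size, from_raw.size)
print(from_encoded.tobytes() == raw_pixels, from_raw.tobytes() == raw_pixels)
```

Both reconstructions should report the same size and the same pixel data as the original image.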
