1 Character sequence types

| Character sequence type | Python 3 | Python 2 |
| --- | --- | --- |
| 8-bit values (8 binary digits) | bytes | str |
| Unicode characters | str | unicode |

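Unicode characters (instances of str in Python 3 and of unicode in Python 2) are not tied to any particular binary encoding. In other words, there are many ways to encode Unicode characters as binary data, and the most common one is UTF-8.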
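
To make the "many encodings" point concrete, here is a minimal sketch that encodes the same one-character string with three codecs. The choice of character and codecs is only for illustration.

```python
# One Unicode code point: U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
text = '\u00e9'

# The same character produces different byte sequences per encoding.
utf8 = text.encode('utf-8')
utf16 = text.encode('utf-16-le')
latin1 = text.encode('latin-1')

print(utf8)    # b'\xc3\xa9'
print(utf16)   # b'\xe9\x00'
print(latin1)  # b'\xe9'
```

The str value itself carries no encoding; the bytes only exist once an encoding has been chosen.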

bytes - binary

A bytes instance contains raw binary data, that is, unsigned 8-bit values (usually displayed using the ASCII encoding).
```
a = b'h\x65llo'   # \x65 is the byte for ASCII 'e'
print(a)          # b'hello'
print(list(a))    # [104, 101, 108, 108, 111]
print(type(a))    # <class 'bytes'>
```
str - Unicode
A str instance contains Unicode code points, which correspond to the text characters of human languages.
```
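b = 'a\u0300 propos'  # 'a' followed by U+0300 COMBINING GRAVE ACCENT
print(b)              # renders as "à propos"
print(list(b))        # one element per code point; 'a' and the accent are separate
print(type(b))        # <class 'str'>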
```
Note
To convert Unicode data to binary data, you must call the encode method of **str**.
To convert binary data to Unicode data, you must call the decode method of **bytes**.

2 Encoding and decoding
Unicode characters -> binary data is called encoding (encode).
Binary data -> Unicode characters is called decoding (decode).
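
A short round trip may help here. This is a sketch using the same sample string as in section 1; the variable names are illustrative.

```python
text = 'a\u0300 propos'          # 9 code points

# Encode: str -> bytes. The combining accent takes 2 bytes in UTF-8.
data = text.encode('utf-8')
print(len(text), len(data))      # 9 10

# Decode: bytes -> str. Using the same encoding restores the original.
restored = data.decode('utf-8')
print(restored == text)          # True

# Decoding with a codec that cannot represent the bytes raises an error.
try:
    data.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)                  # False
```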

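When writing Python programs, do encoding and decoding at the outermost boundary of your interfaces.
The core of the program can then work entirely on Unicode data. This approach is often called the Unicode sandwich.
The core of the program should use the str type to represent Unicode data and should not assume any particular character encoding.
This lets the program accept many text encodings (such as Latin-1, Shift JIS, and Big5) and convert them all to Unicode, while guaranteeing that all output text is encoded with a single standard (ideally UTF-8).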
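
The sandwich can be sketched as three layers. The function names (`read_text`, `process`, `write_text`) and the sample inputs are hypothetical and only illustrate the shape of the idea.

```python
def read_text(raw: bytes, encoding: str) -> str:
    """Outer layer (input): decode bytes as soon as they arrive."""
    return raw.decode(encoding)


def process(text: str) -> str:
    """Core: works on str only and knows nothing about encodings."""
    return text.upper()


def write_text(text: str) -> bytes:
    """Outer layer (output): always encode as UTF-8 on the way out."""
    return text.encode('utf-8')


# Inputs that arrive in two different encodings.
inputs = [
    ('caf\u00e9'.encode('latin-1'), 'latin-1'),
    ('\u65e5\u672c'.encode('shift_jis'), 'shift_jis'),
]

outputs = [write_text(process(read_text(raw, enc))) for raw, enc in inputs]
print(outputs)  # [b'CAF\xc3\x89', b'\xe6\x97\xa5\xe6\x9c\xac']
```

Only `read_text` and `write_text` mention an encoding, so supporting another input encoding does not touch the core logic.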

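3 Use cases (Python 3)
Two situations come up often:
- You need to convert Unicode characters to UTF-8-encoded binary data.
- You need to operate on Unicode characters that have no specific encoding.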

Decoding (takes bytes or str, always returns str):
```python
def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value  # Instance of str

print(repr(to_str(b'foo')))
print(repr(to_str('bar')))
```

![image.png](https://cdn.nlark.com/yuque/0/2022/png/22011425/1648170387085-f0cfb6bf-a258-4a3a-9794-c2dab5efc7b9.png#clientId=u8e55ec6a-b17d-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=61&id=u0093d40a&margin=%5Bobject%20Object%5D&name=image.png&originHeight=55&originWidth=402&originalType=binary&ratio=1&rotation=0&showTitle=false&size=4227&status=done&style=none&taskId=u4bcfc410-1498-4738-bb0d-863fb5432d4&title=&width=446.66667849929274)

Encoding (takes bytes or str, always returns bytes):
```python
def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value  # Instance of bytes

print(repr(to_bytes(b'foo')))
print(repr(to_bytes('bar')))
```

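4 Possible problems (Python 3)

Operators cannot be mixed

bytes and str instances cannot be mixed with certain operators. Using > or + with one operand of each type raises TypeError. The == operator does not raise, but a bytes value never compares equal to a str value. With %, a bytes format string rejects a str argument, while a str format string accepts a bytes argument and silently inserts its repr.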
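
The following sketch exercises each operator once. The helper `try_op` is hypothetical and exists only to keep the example compact.

```python
def try_op(func):
    """Run func; return its result, or the exception name if it raises TypeError."""
    try:
        return func()
    except TypeError as exc:
        return type(exc).__name__


same_type = b'one' + b'two'                    # same types combine fine
concat = try_op(lambda: b'one' + 'two')        # bytes + str
compare = try_op(lambda: 'one' > b'two')       # str > bytes
equal = (b'foo' == 'foo')                      # no error, but never equal
str_format = '%s' % b'foo'                     # inserts the repr of the bytes
bytes_format = try_op(lambda: b'%s' % 'foo')   # bytes format, str argument

print(same_type)     # b'onetwo'
print(concat)        # TypeError
print(compare)       # TypeError
print(equal)         # False
print(str_format)    # b'foo'
print(bytes_format)  # TypeError
```

The two silent cases (`==` and `'%s' % bytes`) are the easier ones to miss, because nothing fails at the point of the mistake.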

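Reading and writing files

If you obtain a file handle with the built-in open function, note that by default the handle works in text mode: it reads and writes str and encodes or decodes the data for you (as UTF-8 on many systems, though the default encoding can depend on the platform and Python version).

Problem: writing some random binary data to a file with the code below fails.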

```
import os

with open('/tmp/random.bin', 'w') as f:  # 'w' is text mode: write() expects str
    f.write(os.urandom(10))
>>>
TypeError: must be str, not bytes
```

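Reason: Python 3 added an encoding parameter to open. In text modes such as 'w', the handle encodes whatever is written to it (commonly as UTF-8), so write expects str instances rather than bytes.
Solution: open the file in binary write mode ('wb').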
```
import os

with open('/tmp/random.bin', 'wb') as f:  # binary mode: write() expects bytes
    f.write(os.urandom(10))
```
Reading works the same way: open the file with 'rb'.
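
As a rough end-to-end check, the sketch below writes bytes with 'wb' and reads them back with 'rb'. It also writes text with an explicit encoding so the result does not depend on the platform default. The temporary directory and file names are illustrative.

```python
import os
import tempfile

payload = os.urandom(10)

with tempfile.TemporaryDirectory() as tmp_dir:
    bin_path = os.path.join(tmp_dir, 'random.bin')
    txt_path = os.path.join(tmp_dir, 'note.txt')

    # Binary mode: write and read bytes unchanged.
    with open(bin_path, 'wb') as f:
        f.write(payload)
    with open(bin_path, 'rb') as f:
        data = f.read()

    # Text mode with an explicit encoding: write str, then inspect raw bytes.
    with open(txt_path, 'w', encoding='utf-8') as f:
        f.write('caf\u00e9')
    with open(txt_path, 'rb') as f:
        raw_text = f.read()

print(data == payload)  # True
print(raw_text)         # b'caf\xc3\xa9'
```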

Effective Python

3 - Know the differences between bytes and str
