Pydantic — 强大的数据校验工具,比DRF的校验器还快12倍
2022-01-17 12:51
本文目录
Pydantic 是一个使用Python类型注解进行数据验证和管理的模块。安装方法非常简单,打开终端输入:
pip install pydantic
它类似于 Django DRF 序列化器的数据校验功能,不同的是,Django里的序列化器的Field是有限制的,如果你想要使用自己的Field还需要继承并重写它的基类:
# Django 序列化器class Book(models.Model):id = models.AutoField(primary_key=True)name = models.CharField(max_length=32)price = models.DecimalField(max_digits=5, decimal_places=2)author = models.CharField(max_length=32)publish = models.CharField(max_length=32)
而 Pydantic 基于Python3.7以上的类型注解特性,实现了可以对任何类做数据校验的功能:
# Pydantic 数据校验功能from datetime import datetimefrom typing import List, Optionalfrom pydantic import BaseModelclass User(BaseModel):id: intname = 'John Doe'signup_ts: Optional[datetime] = Nonefriends: List[int] = []external_data = {'id': '123','signup_ts': '2019-06-01 12:22','friends': [1, 2, '3'],}user = User(**external_data)print(user.id)print(type(user.id))#> 123#> <class 'int'>print(repr(user.signup_ts))#> datetime.datetime(2019, 6, 1, 12, 22)print(user.friends)#> [1, 2, 3]print(user.dict())"""{'id': 123,'signup_ts': datetime.datetime(2019, 6, 1, 12, 22),'friends': [1, 2, 3],'name': 'John Doe',}"""
从上面的基本使用可以看到,它甚至能自动帮你做数据类型的转换,比如代码中的 user.id, 在字典中是字符串,但经过Pydantic校验器后,它自动变成了int型,因为User类里的注解就是int型。
当我们的数据和定义的注解类型不一致时会报这样的Error:
from datetime import datetimefrom typing import List, Optionalfrom pydantic import BaseModelclass User(BaseModel):id: intname = 'John Doe'signup_ts: Optional[datetime] = Nonefriends: List[int] = []external_data = {'id': '123','signup_ts': '2019-06-01 12:222','friends': [1, 2, '3'],}user = User(**external_data)"""Traceback (most recent call last):File "1.py", line 18, in <module>user = User(**external_data)File "pydantic\main.py", line 331, in pydantic.main.BaseModel.__init__pydantic.error_wrappers.ValidationError: 1 validation error for Usersignup_tsinvalid datetime format (type=value_error.datetime)"""
即 “invalid datetime format”, 因为我传入的 signup_ts 不是标准的时间格式(多了个2)。
1.Pydantic 模型数据导出
通过Pydantic模型中自带的 json 属性方法,能让经过校验后的数据一行命令直接转成 json 字符串,如前文中的user对象:
print(user.dict()) # 转为字典"""{'id': 123,'signup_ts': datetime.datetime(2019, 6, 1, 12, 22),'friends': [1, 2, 3],'name': 'John Doe',}"""print(user.json()) # 转为json"""{"id": 123, "signup_ts": "2019-06-01T12:22:00", "friends": [1, 2, 3], "name": "John Doe"}"""
非常方便。它还支持将整个数据结构导出为 schema json,它能完整地描述整个对象的数据结构类型:
print(user.schema_json(indent=2))"""{"title": "User","type": "object","properties": {"id": {"title": "Id","type": "integer"},"signup_ts": {"title": "Signup Ts","type": "string","format": "date-time"},"friends": {"title": "Friends","default": [],"type": "array","items": {"type": "integer"}},"name": {"title": "Name","default": "John Doe","type": "string"}},"required": ["id"]}"""
2.数据导入
除了直接定义数据校验模型,它还能通过ORM、字符串、文件导入到数据校验模型:
比如字符串(raw):
from datetime import datetimefrom pydantic import BaseModelclass User(BaseModel):id: intname = 'John Doe'signup_ts: datetime = Nonem = User.parse_raw('{"id": 123, "name": "James"}')print(m)#> id=123 signup_ts=None name='James'
此外,它能直接将ORM的对象输入,转为Pydantic的对象,比如从Sqlalchemy ORM:
from typing import Listfrom sqlalchemy import Column, Integer, Stringfrom sqlalchemy.dialects.postgresql import ARRAYfrom sqlalchemy.ext.declarative import declarative_basefrom pydantic import BaseModel, constrBase = declarative_base()class CompanyOrm(Base):__tablename__ = 'companies'id = Column(Integer, primary_key=True, nullable=False)public_key = Column(String(20), index=True, nullable=False, unique=True)name = Column(String(63), unique=True)domains = Column(ARRAY(String(255)))class CompanyModel(BaseModel):id: intpublic_key: constr(max_length=20)name: constr(max_length=63)domains: List[constr(max_length=255)]class Config:orm_mode = Trueco_orm = CompanyOrm(id=123,public_key='foobar',name='Testing',domains=['example.com', 'foobar.com'],)print(co_orm)#> <models_orm_mode.CompanyOrm object at 0x7f0bdac44850>co_model = CompanyModel.from_orm(co_orm)print(co_model)#> id=123 public_key='foobar' name='Testing' domains=['example.com',#> 'foobar.com']
从Json文件导入:
from datetime import datetimefrom pathlib import Pathfrom pydantic import BaseModelclass User(BaseModel):id: intname = 'John Doe'signup_ts: datetime = Nonepath = Path('data.json')path.write_text('{"id": 123, "name": "James"}')m = User.parse_file(path)print(m)
从pickle导入:
import picklefrom datetime import datetimefrom pydantic import BaseModelpickle_data = pickle.dumps({'id': 123,'name': 'James','signup_ts': datetime(2017, 7, 14)})m = User.parse_raw(pickle_data, content_type='application/pickle', allow_pickle=True)print(m)#> id=123 signup_ts=datetime.datetime(2017, 7, 14, 0, 0) name='James'
3.自定义数据校验
你还能给它增加 validator 装饰器,增加你需要的校验逻辑:
from pydantic import BaseModel, ValidationError, validatorclass UserModel(BaseModel):name: strusername: strpassword1: strpassword2: str@validator('name')def name_must_contain_space(cls, v):if ' ' not in v:raise ValueError('must contain a space')return v.title()@validator('password2')def passwords_match(cls, v, values, **kwargs):if 'password1' in values and v != values['password1']:raise ValueError('passwords do not match')return v@validator('username')def username_alphanumeric(cls, v):assert v.isalnum(), 'must be alphanumeric'return v
上面,我们增加了三种自定义校验逻辑:
1.name 必须带有空格
2.password2 必须和 password1 相同
3.username 必须为字母
让我们试试这三个校验是否成功实现:
user = UserModel(name='samuel colvin',username='scolvin',password1='zxcvbn',password2='zxcvbn',)print(user)#> name='Samuel Colvin' username='scolvin' password1='zxcvbn' password2='zxcvbn'try:UserModel(name='samuel',username='scolvin',password1='zxcvbn',password2='zxcvbn2',)except ValidationError as e:print(e)"""2 validation errors for UserModelnamemust contain a space (type=value_error)password2passwords do not match (type=value_error)"""
可以看到,第一个UserModel里的数据完全没有问题,通过校验。
第二个UserModel里的数据,由于name存在空格,password2和password1不一致,无法通过校验。
4.性能表现
这是最令我惊讶的部分,Pydantic 比 Django-rest-framework 还快了12.3倍:
| Package | Version | Relative Performance | Mean validation time |
|---|---|---|---|
| pydantic | 1.7.3 |
93.7μs | |
| attrs + cattrs | 20.3.0 |
1.5x slower | 143.6μs |
| valideer | 0.4.2 |
1.9x slower | 175.9μs |
| marshmallow | 3.10.0 |
2.4x slower | 227.6μs |
| voluptuous | 0.12.1 |
2.7x slower | 257.5μs |
| trafaret | 2.1.0 |
3.2x slower | 296.7μs |
| schematics | 2.1.0 |
10.2x slower | 955.5μs |
| django-rest-framework | 3.12.2 |
12.3x slower | 1148.4μs |
| cerberus | 1.3.2 |
25.9x slower | 2427.6μs |
而且他们的所有基准测试代码都是开源的,你可以在下面这个Github链接找到:
https://github.com/samuelcolvin/pydantic/tree/master/benchmarks
