1. Introduction and Installation
(1) Introduction
- spaCy is a library for advanced Natural Language Processing in Python and Cython. It’s built on the very latest research, and was designed from day one to be used in real products.
- spaCy comes with pretrained pipelines and currently supports tokenization and training for 60+ languages. It features state-of-the-art speed and neural network models for tagging, parsing, named entity recognition, text classification and more, multi-task learning with pretrained transformers like BERT, as well as a production-ready training system and easy model packaging, deployment and workflow management. spaCy is commercial open-source software, released under the MIT license.
- Trained models and pipelines: https://spacy.io/models
- Training system: https://spacy.io/usage/training
- Trained pipelines for spaCy can be installed as Python packages. This means that they’re a component of your application, just like any other module. Models can be installed using spaCy’s download command, or manually by pointing pip to a path or URL.
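- spaCy is a relatively new package for "industrial-strength NLP in Python", developed by Matt Honnibal at Explosion AI. It is designed with the applied data scientist in mind: it does not burden the user with deciding which esoteric algorithm to use for a common task, and it is fast because it is implemented in Cython.
- spaCy is maintained by this team: https://explosion.ai/about
spaCy is a one-stop shop for the tasks commonly needed in any NLP project, including: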
- Trained pipelines for different languages and tasks
- Multi-task learning with pretrained transformers like BERT
- Support for pretrained word vectors and embeddings
- State-of-the-art speed
- Production-ready training system
- Linguistically-motivated tokenization
- Components for named entity recognition, part-of-speech tagging, dependency parsing, sentence segmentation, text classification, lemmatization, morphological analysis, entity linking and more
- Easily extensible with custom components and attributes
- Support for custom models in PyTorch, TensorFlow and other frameworks
- Built in visualizers for syntax and NER
- Easy model packaging, deployment and workflow management
- Robust, rigorously evaluated accuracy
(2) Installation
pip install spacy
python -m spacy download en_core_web_sm
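- The model must be downloaded before spacy.load("en_core_web_sm") is called later on; otherwise the call fails with an OSError reporting that the model cannot be found.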
- Once downloaded, the model is available as a regular Python package.
- After installation, a spacy command is available:
which spacy
- Reference for the spacy commands: https://spacy.io/api/cli#download
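The two checks described above (the model is installed as a Python package, and a spacy command is on the PATH) can be approximated with the standard library alone. This is a minimal sketch, not part of spaCy: the helper names `model_is_installed` and `find_spacy_cli` are hypothetical, and the code only tests for presence, not for version compatibility.

```python
import importlib.util
import shutil
from typing import Optional


def model_is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be located.

    Hypothetical helper: spaCy models such as en_core_web_sm are installed
    as ordinary Python packages, so the import machinery can find them.
    """
    return importlib.util.find_spec(package_name) is not None


def find_spacy_cli() -> Optional[str]:
    """Return the path of the spacy executable on PATH, or None (like `which spacy`)."""
    return shutil.which("spacy")


if __name__ == "__main__":
    print("en_core_web_sm installed:", model_is_installed("en_core_web_sm"))
    print("spacy command:", find_spacy_cli())
```

Running such a check before spacy.load() lets a script print a helpful "please download the model first" message instead of failing at load time.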
(3) Upgrading spaCy
- Some updates to spaCy may require downloading new statistical models. If you’re running spaCy v2.0 or higher, you can use the validate command (python -m spacy validate) to check whether your installed models are compatible and, if not, print details on how to update them.
- To install a model manually, point pip at a .tar.gz archive or .whl file, given as a path or URL:
pip install /Users/you/en_core_web_sm-3.0.0.tar.gz
pip install /Users/you/en_core_web_sm-3.0.0-py3-none-any.whl
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz
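The URL in the last command follows a visible pattern: the model name and version appear both in the release tag and in the archive file name. As a rough illustration, and assuming that this pattern also holds for other models and versions (the text above only shows en_core_web_sm 3.0.0), a small helper could assemble the pip command. The function names are hypothetical and not part of spaCy.

```python
import sys
from typing import List

# Base URL taken from the example command above.
RELEASES = "https://github.com/explosion/spacy-models/releases/download"


def model_archive_url(name: str, version: str) -> str:
    """Build the .tar.gz release URL, assuming the <name>-<version> naming shown above."""
    tag = f"{name}-{version}"
    return f"{RELEASES}/{tag}/{tag}.tar.gz"


def pip_install_command(name: str, version: str) -> List[str]:
    """Return the pip invocation as an argument list (suitable for subprocess.run)."""
    return [sys.executable, "-m", "pip", "install", model_archive_url(name, version)]


if __name__ == "__main__":
    # Only prints the command; nothing is installed.
    print(" ".join(pip_install_command("en_core_web_sm", "3.0.0")))
```

Using `sys.executable -m pip` rather than a bare `pip` keeps the install tied to the interpreter that will later import the model.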
## 2. Usage Examples
```python
# Method 1: call spacy.load() with a model name or a path to a model directory.
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sentence.")

# Method 2: import the model's Python package and call its load()
# method with no arguments.
import en_core_web_sm
nlp = en_core_web_sm.load()
doc = nlp("This is a sentence.")
```
3. References
- Official GitHub repository: https://github.com/explosion/spacy