1. Introduction and Installation

(1) Introduction

  • spaCy is a library for advanced Natural Language Processing in Python and Cython. It’s built on the very latest research, and was designed from day one to be used in real products.
  • spaCy comes with pretrained pipelines and currently supports tokenization and training for 60+ languages. It features state-of-the-art speed and neural network models for tagging, parsing, named entity recognition, text classification and more, multi-task learning with pretrained transformers like BERT, as well as a production-ready training system and easy model packaging, deployment and workflow management. spaCy is commercial open-source software, released under the MIT license.
  • Trained pipelines for spaCy can be installed as Python packages. This means that they’re a component of your application, just like any other module. Models can be installed using spaCy’s download command, or manually by pointing pip to a path or URL.

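  • spaCy is a relatively new package for "industrial-strength NLP in Python", developed by Matt Honnibal at Explosion AI. It is designed with the applied data scientist in mind: it does not burden the user with decisions about which esoteric algorithm to use for common tasks, and it is fast (because it is implemented in Cython).
  • spaCy is maintained by the Explosion team: https://explosion.ai/about
  • spaCy provides one-stop coverage of the tasks commonly needed in any NLP project, including: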

    • Trained pipelines for different languages and tasks
    • Multi-task learning with pretrained transformers like BERT
    • Support for pretrained word vectors and embeddings
    • State-of-the-art speed
    • Production-ready training system
    • Linguistically-motivated tokenization
    • Components for named entity recognition, part-of-speech tagging, dependency parsing, sentence segmentation, text classification, lemmatization, morphological analysis, entity linking and more
    • Easily extensible with custom components and attributes
    • Support for custom models in PyTorch, TensorFlow and other frameworks
    • Built-in visualizers for syntax and NER
    • Easy model packaging, deployment and workflow management
    • Robust, rigorously evaluated accuracy

(2) Installation

  • pip install spacy

  • python -m spacy download en_core_web_sm
    • The model must be downloaded before calling spacy.load("en_core_web_sm"); otherwise the call fails with an OSError (E050, "Can't find model 'en_core_web_sm'").

  • Once downloaded, the model is available as a regular Python package in the environment.

  • After installation, a spacy command is also available on the command line; you can locate it with:
    • which spacy

  • Some updates to spaCy may require downloading new statistical models. If you’re running spaCy v2.0 or higher, you can use the validate command to check if your installed models are compatible and if not, print details on how to update them:
    • pip install -U spacy
    • python -m spacy validate
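
As noted above, spacy.load fails when the pipeline package has not been downloaded. The following is a minimal sketch of one way to check for that up front and to turn the error into an install hint. The helper names `is_installed` and `load_pipeline` are hypothetical and not part of spaCy; the sketch assumes the pipeline is installed as a package whose name matches the model name.

```python
import importlib.util

MODEL_NAME = "en_core_web_sm"  # pipeline package used throughout these notes


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None


def load_pipeline(name: str = MODEL_NAME):
    """Load a pipeline, turning a missing-model error into an install hint."""
    import spacy  # imported lazily so is_installed() works even without spaCy

    try:
        return spacy.load(name)
    except OSError as err:
        # spaCy raises OSError when the model cannot be found.
        raise RuntimeError(
            f"Pipeline {name!r} not found; try: python -m spacy download {name}"
        ) from err


if __name__ == "__main__":
    if is_installed("spacy") and is_installed(MODEL_NAME):
        nlp = load_pipeline()
        print(nlp.pipe_names)  # names of the components in the loaded pipeline
    else:
        print(f"spaCy or {MODEL_NAME} is missing; see the install commands above")
```

Checking with `importlib` keeps the test cheap, since nothing is loaded until `load_pipeline` is actually called.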

2. Using Models

(1) Downloading models

```shell
# Download best-matching version of specific model for your spaCy installation
python -m spacy download en_core_web_sm

# pip install .tar.gz archive or .whl from path or URL
pip install /Users/you/en_core_web_sm-3.0.0.tar.gz
pip install /Users/you/en_core_web_sm-3.0.0-py3-none-any.whl
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz
```
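
The URL in the last pip command repeats the package name and version in both the release tag and the file name. A small sketch of building such a URL and the matching pip invocation is shown below. The helpers `model_archive_url` and `pip_install_command` are hypothetical, and the URL pattern is only inferred from the single example above, so it is an assumption that other releases follow it.

```python
import sys

# Base of the release URL, taken from the pip command above.
BASE_URL = "https://github.com/explosion/spacy-models/releases/download"


def model_archive_url(name: str, version: str) -> str:
    """Return the assumed .tar.gz URL for a pipeline package release."""
    release = f"{name}-{version}"
    return f"{BASE_URL}/{release}/{release}.tar.gz"


def pip_install_command(name: str, version: str) -> list:
    """Return the argument list for installing that archive with pip."""
    return [sys.executable, "-m", "pip", "install", model_archive_url(name, version)]


print(model_archive_url("en_core_web_sm", "3.0.0"))
# prints https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz
```

The argument list from `pip_install_command` can be passed to `subprocess.run` when the install should happen from a script rather than a shell.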

(2) Usage examples

```python
# Option 1: call spacy.load() with the model name or a path to the model.
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sentence.")

# Option 2: import the model's Python package and call its load()
# method with no arguments.
import spacy
import en_core_web_sm
nlp = en_core_web_sm.load()
doc = nlp("This is a sentence.")
```

3. References