• Each task has an associated pipeline(), but it is simpler to use the general pipeline() abstraction, which wraps all of the task-specific pipelines.
  • pipeline() automatically loads a default model and tokenizer capable of inference for your task.
  • To use a model immediately on a given input (text, image, audio, ...), we provide the pipeline API. Pipelines group together a pretrained model with the preprocessing that was used during that model's training.

```python
from transformers import pipeline

generator = pipeline(task="text-generation")

res = generator("Three Rings for the Elven-kings under the sky, Seven for the "
                "Dwarf-lords in their halls of stone")
# jy: [{'generated_text': 'Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone, Seven for the Iron-priests at the door to the east, and thirteen for the Lord Kings at the end of the mountain'}]
print(res)

# jy: If you have more than one input, pass your input as a list:
ls_res = generator([
    "Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone",
    "Nine for Mortal Men, doomed to die, One for the Dark Lord on his dark throne",
])
```

jy: The text-generation task has a generate() method with several parameters for controlling the output. For example, if you want to generate more than one output, set the num_return_sequences parameter:

```python
res = generator(
    "Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone",
    num_return_sequences=2,
)
```
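Other generation keyword arguments are forwarded to generate() in the same way. A minimal sketch (the max_length and do_sample values below are illustrative choices, not defaults from this document):

```python
res = generator(
    "Three Rings for the Elven-kings under the sky",
    max_length=50,            # illustrative: cap on prompt + generated tokens
    do_sample=True,           # illustrative: sample instead of greedy decoding
    num_return_sequences=2,
)
```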

## Choose a model and tokenizer

• The [pipeline()](https://huggingface.co/docs/transformers/v4.19.2/en/main_classes/pipelines#transformers.pipeline) accepts any model from the [Model Hub](https://huggingface.co/models). Tags on the Model Hub let you filter for a model you'd like to use for your task. Once you've picked an appropriate model, load it with the corresponding `AutoModelFor` class and the `AutoTokenizer` class. For example, load the [AutoModelForCausalLM](https://huggingface.co/docs/transformers/v4.19.2/en/model_doc/auto#transformers.AutoModelForCausalLM) class for a causal language modeling task:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# jy: Create a pipeline() for your task, and specify the model and tokenizer
#     you've loaded:
generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)

# Pass your input text to the pipeline() to generate some text:
res = generator("Three Rings for the Elven-kings under the sky, Seven for the "
                "Dwarf-lords in their halls of stone")
# jy: [{'generated_text': 'Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone, Seven for the Dragon-lords (for them to rule in a world ruled by their rulers, and all who live within the realm'}]
print(res)
```

2、audio-classification

```python
from datasets import load_dataset
import torch
from transformers import pipeline


torch.manual_seed(42)
ds = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", 
                  split="validation")
audio_file = ds[0]["audio"]["path"]


# jy: Find an audio classification model on the Model Hub for emotion recognition 
#     and load it in the pipeline():
audio_classifier = pipeline(
    task="audio-classification", 
    model="ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition"
)

# jy: Pass the audio file to the pipeline():
preds = audio_classifier(audio_file)
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
print(preds)
"""
[{'score': 0.1315, 'label': 'calm'}, 
 {'score': 0.1307, 'label': 'neutral'}, 
 {'score': 0.1274, 'label': 'sad'}, 
 {'score': 0.1261, 'label': 'fearful'}, 
 {'score': 0.1242, 'label': 'happy'}]
"""

3、image-classification

  • For example, what species of cat is shown below?

[image: pipeline-cat-chonk.jpeg, the cat photo classified by the code below]

```python
from transformers import pipeline

# jy: Specify your vision task and pass your image to the classifier. 
#     The image can be a link or a local path to the image.
vision_classifier = pipeline(task="image-classification")
preds = vision_classifier(
    images="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg")

preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
print(preds)
"""
[{'score': 0.4335, 'label': 'lynx, catamount'},
 {'score': 0.0348, 'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor'},
 {'score': 0.0324, 'label': 'snow leopard, ounce, Panthera uncia'},
 {'score': 0.0239, 'label': 'Egyptian cat'},
 {'score': 0.0229, 'label': 'tiger cat'}]
"""

4、sentiment-analysis

• Here is how to quickly use a pipeline to classify positive versus negative texts:

```python
from transformers import pipeline

# jy: Allocate a pipeline for sentiment-analysis; it will download and cache the
#     pretrained model used by the pipeline.
classifier = pipeline('sentiment-analysis')

# jy: Evaluate the given text.
res = classifier('We are very happy to introduce pipeline to the transformers '
                 'repository.')
# [{'label': 'POSITIVE', 'score': 0.9996980428695679}]
print(res)
```
5、question-answering

- Many NLP tasks have a pre-trained `pipeline` ready to go. For example, we can easily extract question answers given context:
```python
from transformers import pipeline

# Allocate a pipeline for question-answering
question_answerer = pipeline('question-answering')

res = question_answerer({
    'question': 'What is the name of the repository ?',
    'context': 'Pipeline has been included in the huggingface/transformers repository'})
# jy: In addition to the answer, the pretrained model used here returns its confidence
#     score, along with the start and end positions of the answer in the
#     tokenized sentence. The returned result looks like:
#     {'score': 0.30970096588134766, 'start': 34, 'end': 58,
#      'answer': 'huggingface/transformers'}
print(res)
```

```python
# jy: For more than one sentence, pass a list of sentences to the pipeline(), which
#     returns a list of dictionaries. (classifier is the sentiment-analysis pipeline
#     from section 4; it is recreated here so this block runs on its own.)
classifier = pipeline('sentiment-analysis')
ls_sents = ["We are very happy to show you the Transformers library.",
            "We hope you don't hate it."]
results = classifier(ls_sents)
for result in results:
    print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
"""
label: POSITIVE, with score: 0.9998
label: NEGATIVE, with score: 0.5309
"""
```

6、Automatic Speech Recognition

```python
import torch
from transformers import pipeline
from datasets import load_dataset, Audio

speech_recognizer = pipeline("automatic-speech-recognition", 
                             model="facebook/wav2vec2-base-960h")

# jy: Load the MInDS-14 dataset with the datasets package
dataset = load_dataset("PolyAI/minds14", name="en-US", split="train")

# jy: We need to make sure that the sampling rate of the dataset matches the sampling
#     rate facebook/wav2vec2-base-960h was trained on.  Audio files are automatically
#     loaded and resampled when calling the "audio" column.
dataset = dataset.cast_column(
    "audio", 
    Audio(sampling_rate=speech_recognizer.feature_extractor.sampling_rate)
)

# jy: extract the raw waveform arrays of the first 4 samples and pass it as a list 
#     to the pipeline:
result = speech_recognizer(dataset[:4]["audio"])
print([d["text"] for d in result])
"""
['I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT', 
"FONDERING HOW I'D SET UP A JOIN TO HET WITH MY WIFE AND WHERE THE AP MIGHT BE", 
"I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AND I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS", 
'HOW DO I TURN A JOIN A COUNT']
"""
  • MInDS-14 dataset: https://huggingface.co/datasets/PolyAI/minds14
  • For a larger dataset where the inputs are big (as in speech or vision), pass a generator instead of a list, so that all the inputs are not loaded into memory at once; see the sketch below.
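A minimal sketch of the generator approach, continuing from the ASR code above (audio_inputs is a name introduced here for illustration): the pipeline consumes the generator lazily and yields results one at a time.

```python
def audio_inputs():
    # Yield one sample at a time so the full dataset is never
    # materialized in memory.
    for example in dataset:
        yield example["audio"]

# Iterating over the pipeline's output drives the generator lazily.
for out in speech_recognizer(audio_inputs()):
    print(out["text"])
```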

7、Loading models from different frameworks (TensorFlow / PyTorch) and combining them with pipeline()

  • Under the hood, the AutoModelForSequenceClassification and AutoTokenizer classes work together to power the pipeline():

```python
# jy: multilingual BERT model fine-tuned for sentiment analysis
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"

# jy: PyTorch -----------------------------------------------------------------
# jy: Use AutoModelForSequenceClassification and AutoTokenizer to load the
#     pretrained model and its associated tokenizer:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# jy: TensorFlow --------------------------------------------------------------
# jy: Use TFAutoModelForSequenceClassification and AutoTokenizer to load the
#     pretrained model and its associated tokenizer:
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

```python
# jy: Then you can specify the model and tokenizer in the pipeline(), and apply the
#     classifier on your target text:
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
res = classifier("Nous sommes très heureux de vous présenter la bibliothèque "
                 "Transformers.")
# jy: [{'label': '5 stars', 'score': 0.7273}]
print(res)
```