- 1、text-generation
- 2、audio-classification
- 3、image-classification
- 4、sentiment-analysis
- 5、question-answering
- 6、Automatic Speech Recognition
- 7、Loading models from different frameworks (TensorFlow / PyTorch) and using them with pipeline
- Each task has an associated `pipeline()`, but it is simpler to use the general `pipeline()` abstraction, which contains all the task-specific pipelines.
- The `pipeline()` automatically loads a default model and tokenizer capable of inference for your task.
- To immediately use a model on a given input (text, image, audio, ...), we provide the `pipeline` API. Pipelines group together a pretrained model with the preprocessing that was used during that model's training.
- The pipeline classes for each task: https://huggingface.co/docs/transformers/main_classes/pipelines
- You can learn more about the tasks supported by the `pipeline` API at: https://huggingface.co/docs/transformers/task_summary
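- As a quick illustration of that default-loading behavior, here is a minimal sketch (inspecting `name_or_path` is my addition, not part of the original notes):
```python
from transformers import pipeline

# With no model argument, pipeline() picks a default checkpoint for the task.
clf = pipeline(task="sentiment-analysis")

# The default model it chose can be inspected afterwards; name_or_path holds
# the checkpoint identifier (assumption: reasonably recent transformers version).
print(clf.model.name_or_path)
```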
1、text-generation
```python
from transformers import pipeline

# Task summary: https://huggingface.co/docs/transformers/task_summary
generator = pipeline(task="text-generation")

res = generator("Three Rings for the Elven-kings under the sky, Seven for the "
                "Dwarf-lords in their halls of stone")
# [{'generated_text': 'Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone, Seven for the Iron-priests at the door to the east, and thirteen for the Lord Kings at the end of the mountain'}]
print(res)

# jy: If you have more than one input, pass your inputs as a list:
ls_res = generator([
    "Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone",
    "Nine for Mortal Men, doomed to die, One for the Dark Lord on his dark throne",
])

# jy: The text-generation task has a generate() method with several parameters for
# controlling the output. For example, if you want to generate more than one output,
# set the num_return_sequences parameter:
res = generator(
    "Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone",
    num_return_sequences=2,
)
```
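Other generation parameters can be forwarded the same way; a minimal sketch reusing the `generator` defined above (the specific kwargs and values are illustrative, not from the original notes):
```python
# Generation kwargs such as max_length and do_sample are passed through to the
# underlying generate() call:
res = generator(
    "Three Rings for the Elven-kings under the sky",
    max_length=50,
    do_sample=True,
    num_return_sequences=2,
)
print(res)
```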
<a name="Mnsed"></a>
## Choose a model and tokenizer
The [pipeline()](https://huggingface.co/docs/transformers/v4.19.2/en/main_classes/pipelines#transformers.pipeline) accepts any model from the [Model Hub](https://huggingface.co/models). There are tags on the Model Hub that allow you to filter for a model you'd like to use for your task. Once you've picked an appropriate model, load it with the corresponding `AutoModelFor` class and `AutoTokenizer`. For example, load the [AutoModelForCausalLM](https://huggingface.co/docs/transformers/v4.19.2/en/model_doc/auto#transformers.AutoModelForCausalLM) class for a causal language modeling task (a programmatic way to filter Hub models is sketched after the code block below):
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import pipeline
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
# jy: Create a pipeline() for your task, and specify the model and tokenizer
# you've loaded:
generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
# jy: Pass your input text to the pipeline() to generate some text:
res = generator("Three Rings for the Elven-kings under the sky, Seven for the "
"Dwarf-lords in their halls of stone")
# jy: [{'generated_text': 'Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone, Seven for the Dragon-lords (for them to rule in a world ruled by their rulers, and all who live within the realm'}]
print(res)
```
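The tag-based filtering mentioned above can also be done from code; a minimal sketch, assuming a reasonably recent `huggingface_hub` client (the task tag and the limit of 5 are arbitrary choices of mine):
```python
from itertools import islice
from huggingface_hub import HfApi

api = HfApi()
# Filter Hub models by task tag, as with the tag filters on the website.
for m in islice(api.list_models(filter="text-generation"), 5):
    print(m.modelId)
```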
2、audio-classification
```python
from datasets import load_dataset
import torch
from transformers import pipeline
torch.manual_seed(42)
ds = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean",
split="validation")
audio_file = ds[0]["audio"]["path"]
# jy: Find an audio classification model on the Model Hub for emotion recognition
# and load it in the pipeline():
audio_classifier = pipeline(
task="audio-classification",
model="ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition"
)
# jy: Pass the audio file to the pipeline():
preds = audio_classifier(audio_file)
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
print(preds)
"""
[{'score': 0.1315, 'label': 'calm'},
{'score': 0.1307, 'label': 'neutral'},
{'score': 0.1274, 'label': 'sad'},
{'score': 0.1261, 'label': 'fearful'},
{'score': 0.1242, 'label': 'happy'}]
"""
3、image-classification
- For example, what species of cat is in the image linked below?
```python
from transformers import pipeline
# jy: Specify your vision task and pass your image to the classifier.
# The image can be a link or a local path to the image.
vision_classifier = pipeline(task="image-classification")
preds = vision_classifier(
images="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg")
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
print(preds)
"""
[{'score': 0.4335, 'label': 'lynx, catamount'},
{'score': 0.0348, 'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor'},
{'score': 0.0324, 'label': 'snow leopard, ounce, Panthera uncia'},
{'score': 0.0239, 'label': 'Egyptian cat'},
{'score': 0.0229, 'label': 'tiger cat'}]
"""
4、sentiment-analysis
- Here is how to quickly use a pipeline to classify positive versus negative texts:
```python
from transformers import pipeline
# jy: Allocate a pipeline for sentiment-analysis; it will download and cache the
# pretrained model used by the pipeline.
classifier = pipeline("sentiment-analysis")
# jy: Evaluate the given text.
res = classifier("We are very happy to introduce pipeline to the transformers "
                 "repository.")
# [{'label': 'POSITIVE', 'score': 0.9996980428695679}]
print(res)
```
<a name="ML7m7"></a>
# 5、question-answering
- Many NLP tasks have a pre-trained `pipeline` ready to go. For example, we can easily extract question answers given context:
```python
from transformers import pipeline
# Allocate a pipeline for question-answering
question_answerer = pipeline('question-answering')
res = question_answerer({
'question': 'What is the name of the repository ?',
'context': 'Pipeline has been included in the huggingface/transformers repository'})
# jy: In addition to the answer, the pretrained model used here returns its
# confidence score, along with the start and end positions of the answer in the
# tokenized sentence. The returned result looks like:
# {'score': 0.30970096588134766, 'start': 34, 'end': 58,
#  'answer': 'huggingface/transformers'}
print(res)
# jy: For more than one sentence, pass a list of sentences to the pipeline(), which
# returns a list of dictionaries (reusing the sentiment-analysis classifier from
# section 4):
ls_sents = ["We are very happy to show you the Transformers library.",
"We hope you don't hate it."]
results = classifier(ls_sents)
for result in results:
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
"""
label: POSITIVE, with score: 0.9998
label: NEGATIVE, with score: 0.5309
"""
6、Automatic Speech Recognition
```python
import torch
from transformers import pipeline
from datasets import load_dataset, Audio
speech_recognizer = pipeline("automatic-speech-recognition",
model="facebook/wav2vec2-base-960h")
# jy: 基于 datasets 包加载 MInDS-14 dataset
dataset = load_dataset("PolyAI/minds14", name="en-US", split="train")
# jy: We need to make sure that the sampling rate of the dataset matches the sampling
# rate facebook/wav2vec2-base-960h was trained on. Audio files are automatically
# loaded and resampled when calling the "audio" column.
dataset = dataset.cast_column(
"audio",
Audio(sampling_rate=speech_recognizer.feature_extractor.sampling_rate)
)
# jy: extract the raw waveform arrays of the first 4 samples and pass it as a list
# to the pipeline:
result = speech_recognizer(dataset[:4]["audio"])
print([d["text"] for d in result])
"""
['I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT',
"FONDERING HOW I'D SET UP A JOIN TO HET WITH MY WIFE AND WHERE THE AP MIGHT BE",
"I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AND I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS",
'HOW DO I TURN A JOIN A COUNT']
"""
- MInDS-14 dataset: https://huggingface.co/datasets/PolyAI/minds14
- For a larger dataset where the inputs are big (as in speech or vision), you will want to pass a generator, rather than a list that loads all the inputs into memory at once, as sketched below.
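A minimal sketch of that generator pattern, reusing the `speech_recognizer` and `dataset` objects from the block above (the helper name `audio_stream` is my own, not from the original notes):
```python
def audio_stream():
    # Yield one decoded audio sample at a time instead of materializing a list.
    for sample in dataset:
        yield sample["audio"]

# The pipeline consumes the generator lazily and yields results one by one.
for out in speech_recognizer(audio_stream()):
    print(out["text"])
```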
7、Loading models from different frameworks (TensorFlow / PyTorch) and using them with pipeline
- Under the hood, the `AutoModelForSequenceClassification` and `AutoTokenizer` classes work together to power the `pipeline()`.
```python
# jy: multilingual BERT model fine-tuned for sentiment analysis
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"

# jy: PyTorch ----------------------------------------------------------------
# jy: Use AutoModelForSequenceClassification and AutoTokenizer to load the
# pretrained model and its associated tokenizer:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# jy: TensorFlow -------------------------------------------------------------
# jy: Use TFAutoModelForSequenceClassification and AutoTokenizer to load the
# pretrained model and its associated tokenizer:
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
```python
from transformers import pipeline

# jy: Then you can specify the model and tokenizer in the pipeline(), and apply the
# classifier on your target text:
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
res = classifier("Nous sommes très heureux de vous présenter la bibliothèque "
"Transformers.")
# jy: [{'label': '5 stars', 'score': 0.7273}]
print(res)
```