- 1、text-generation
- 2、audio-classification
- 3、image-classification
- 4、sentiment-analysis
- 5、question-answering
- 6、Automatic Speech Recognition
- 7、Loading models from different frameworks (TensorFlow / PyTorch) and using them with pipeline
- Each task has an associated `pipeline()`, but it is simpler to use the general `pipeline()` abstraction, which contains all the task-specific pipelines.
- The `pipeline()` automatically loads a default model and tokenizer capable of inference for your task.
- To immediately use a model on a given input (text, image, audio, ...), we provide the `pipeline` API. Pipelines group together a pretrained model with the preprocessing that was used during that model's training.
- The pipeline classes for each task: https://huggingface.co/docs/transformers/main_classes/pipelines
- You can learn more about the tasks supported by the `pipeline` API at: https://huggingface.co/docs/transformers/task_summary
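- As a quick illustration of that default-loading behavior, here is a minimal sketch (inspecting `name_or_path` is my addition, not part of the original notes):
```python
from transformers import pipeline

# With no model argument, pipeline() picks a default checkpoint for the task.
clf = pipeline(task="sentiment-analysis")

# The default model it chose can be inspected afterwards; name_or_path holds
# the checkpoint identifier (assumption: reasonably recent transformers version).
print(clf.model.name_or_path)
```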
1、text-generation
```python
from transformers import pipeline

# Task summary: https://huggingface.co/docs/transformers/task_summary
generator = pipeline(task="text-generation")

res = generator("Three Rings for the Elven-kings under the sky, Seven for the "
                "Dwarf-lords in their halls of stone")
# [{'generated_text': 'Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone, Seven for the Iron-priests at the door to the east, and thirteen for the Lord Kings at the end of the mountain'}]
print(res)

# jy: If you have more than one input, pass your inputs as a list:
ls_res = generator([
    "Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone",
    "Nine for Mortal Men, doomed to die, One for the Dark Lord on his dark throne",
])

# jy: The text-generation task has a generate() method with several parameters for
# controlling the output. For example, if you want to generate more than one output,
# set the num_return_sequences parameter:
res = generator(
    "Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone",
    num_return_sequences=2,
)
```
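Other generation parameters can be forwarded the same way; a minimal sketch reusing the `generator` defined above (the specific kwargs and values are illustrative, not from the original notes):
```python
# Generation kwargs such as max_length and do_sample are passed through to the
# underlying generate() call:
res = generator(
    "Three Rings for the Elven-kings under the sky",
    max_length=50,
    do_sample=True,
    num_return_sequences=2,
)
print(res)
```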
<a name="Mnsed"></a>
## Choose a model and tokenizer
The [pipeline()](https://huggingface.co/docs/transformers/v4.19.2/en/main_classes/pipelines#transformers.pipeline) accepts any model from the [Model Hub](https://huggingface.co/models). There are tags on the Model Hub that allow you to filter for a model you'd like to use for your task. Once you've picked an appropriate model, load it with the corresponding `AutoModelFor` class and `AutoTokenizer`. For example, load the [AutoModelForCausalLM](https://huggingface.co/docs/transformers/v4.19.2/en/model_doc/auto#transformers.AutoModelForCausalLM) class for a causal language modeling task (a programmatic way to filter Hub models is sketched after the code block below):
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import pipeline
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
# jy: Create a pipeline() for your task, and specify the model and tokenizer
# you've loaded:
generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
# jy: Pass your input text to the pipeline() to generate some text:
res = generator("Three Rings for the Elven-kings under the sky, Seven for the "
"Dwarf-lords in their halls of stone")
# jy: [{'generated_text': 'Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone, Seven for the Dragon-lords (for them to rule in a world ruled by their rulers, and all who live within the realm'}]
print(res)
```
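The tag-based filtering mentioned above can also be done from code; a minimal sketch, assuming a reasonably recent `huggingface_hub` client (the task tag and the limit of 5 are arbitrary choices of mine):
```python
from itertools import islice
from huggingface_hub import HfApi

api = HfApi()
# Filter Hub models by task tag, as with the tag filters on the website.
for m in islice(api.list_models(filter="text-generation"), 5):
    print(m.modelId)
```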
2、audio-classification
```python
from datasets import load_dataset
import torch
from transformers import pipeline
torch.manual_seed(42)
ds = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean",
split="validation")
audio_file = ds[0]["audio"]["path"]
# jy: Find an audio classification model on the Model Hub for emotion recognition
# and load it in the pipeline():
audio_classifier = pipeline(
task="audio-classification",
model="ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition"
)
# jy: Pass the audio file to the pipeline():
preds = audio_classifier(audio_file)
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
print(preds)
"""
[{'score': 0.1315, 'label': 'calm'},
{'score': 0.1307, 'label': 'neutral'},
{'score': 0.1274, 'label': 'sad'},
{'score': 0.1261, 'label': 'fearful'},
{'score': 0.1242, 'label': 'happy'}]
"""
3、image-classification
- For example, what species of cat is in the image linked below?
```python
from transformers import pipeline
# jy: Specify your vision task and pass your image to the classifier.
# The image can be a link or a local path to the image.
vision_classifier = pipeline(task="image-classification")
preds = vision_classifier(
images="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg")
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
print(preds)
"""
[{'score': 0.4335, 'label': 'lynx, catamount'},
{'score': 0.0348, 'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor'},
{'score': 0.0324, 'label': 'snow leopard, ounce, Panthera uncia'},
{'score': 0.0239, 'label': 'Egyptian cat'},
{'score': 0.0229, 'label': 'tiger cat'}]
"""
4、sentiment-analysis
- Here is how to quickly use a pipeline to classify positive versus negative texts:
```python
from transformers import pipeline
# jy: Allocate a pipeline for sentiment-analysis; it will download and cache the
# pretrained model used by the pipeline.
classifier = pipeline("sentiment-analysis")
# jy: Evaluate the given text.
res = classifier("We are very happy to introduce pipeline to the transformers "
                 "repository.")
# [{'label': 'POSITIVE', 'score': 0.9996980428695679}]
print(res)
```
<a name="ML7m7"></a>
# 5、question-answering
- Many NLP tasks have a pre-trained `pipeline` ready to go. For example, we can easily extract question answers given context:
```python
from transformers import pipeline
# Allocate a pipeline for question-answering
question_answerer = pipeline('question-answering')
res = question_answerer({
'question': 'What is the name of the repository ?',
'context': 'Pipeline has been included in the huggingface/transformers repository'})
# jy: In addition to the answer, the pretrained model used here returns its
# confidence score, along with the start and end positions of the answer in the
# tokenized sentence. The returned result looks like:
# {'score': 0.30970096588134766, 'start': 34, 'end': 58,
#  'answer': 'huggingface/transformers'}
print(res)
# jy: For more than one sentence, pass a list of sentences to the pipeline(), which
# returns a list of dictionaries (reusing the sentiment-analysis classifier from
# section 4):
ls_sents = ["We are very happy to show you the Transformers library.",
"We hope you don't hate it."]
results = classifier(ls_sents)
for result in results:
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
"""
label: POSITIVE, with score: 0.9998
label: NEGATIVE, with score: 0.5309
"""
6、Automatic Speech Recognition
```python
import torch
from transformers import pipeline
from datasets import load_dataset, Audio
speech_recognizer = pipeline("automatic-speech-recognition",
model="facebook/wav2vec2-base-960h")
# jy: 基于 datasets 包加载 MInDS-14 dataset
dataset = load_dataset("PolyAI/minds14", name="en-US", split="train")
# jy: We need to make sure that the sampling rate of the dataset matches the sampling
# rate facebook/wav2vec2-base-960h was trained on. Audio files are automatically
# loaded and resampled when calling the "audio" column.
dataset = dataset.cast_column(
"audio",
Audio(sampling_rate=speech_recognizer.feature_extractor.sampling_rate)
)
# jy: extract the raw waveform arrays of the first 4 samples and pass it as a list
# to the pipeline:
result = speech_recognizer(dataset[:4]["audio"])
print([d["text"] for d in result])
"""
['I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT',
"FONDERING HOW I'D SET UP A JOIN TO HET WITH MY WIFE AND WHERE THE AP MIGHT BE",
"I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AND I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS",
'HOW DO I TURN A JOIN A COUNT']
"""
- MInDS-14 dataset: https://huggingface.co/datasets/PolyAI/minds14
- For a larger dataset where the inputs are big (as in speech or vision), you will want to pass a generator, rather than a list that loads all the inputs into memory at once, as sketched below.
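A minimal sketch of that generator pattern, reusing the `speech_recognizer` and `dataset` objects from the block above (the helper name `audio_stream` is my own, not from the original notes):
```python
def audio_stream():
    # Yield one decoded audio sample at a time instead of materializing a list.
    for sample in dataset:
        yield sample["audio"]

# The pipeline consumes the generator lazily and yields results one by one.
for out in speech_recognizer(audio_stream()):
    print(out["text"])
```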
7、Loading models from different frameworks (TensorFlow / PyTorch) and using them with pipeline
- Under the hood, the `AutoModelForSequenceClassification` and `AutoTokenizer` classes work together to power the `pipeline()`.
```python
# jy: multilingual BERT model fine-tuned for sentiment analysis
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"

# jy: PyTorch ----------------------------------------------------------------
# jy: Use AutoModelForSequenceClassification and AutoTokenizer to load the
# pretrained model and its associated tokenizer:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# jy: TensorFlow -------------------------------------------------------------
# jy: Use TFAutoModelForSequenceClassification and AutoTokenizer to load the
# pretrained model and its associated tokenizer:
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
```python
from transformers import pipeline

# jy: Then you can specify the model and tokenizer in the pipeline(), and apply the
# classifier on your target text:
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
res = classifier("Nous sommes très heureux de vous présenter la bibliothèque "
"Transformers.")
# jy: [{'label': '5 stars', 'score': 0.7273}]
print(res)
```