- Datasets provides various common and NLP-specific metrics for you to measure your model's performance. In this section of the tutorials, you will load a metric and use it to evaluate your model's predictions.
- metrics: https://huggingface.co/metrics
- You can see what metrics are available with `list_metrics()`:
- `list_metrics()`: https://huggingface.co/docs/datasets/v2.2.1/en/package_reference/loading_methods#datasets.list_metrics

```python
from datasets import list_metrics
metrics_list = list_metrics()
print(len(metrics_list))
print(metrics_list)
"""
47
['accuracy', 'bertscore', 'bleu', 'bleurt', 'cer', 'chrf', 'code_eval', 'comet', 'competition_math', 'coval', 'cuad', 'exact_match', 'f1', 'frugalscore', 'glue', 'google_bleu', 'indic_glue', 'mae', 'mahalanobis', 'matthews_correlation', 'mauve', 'mean_iou', 'meteor', 'mse', 'pearsonr', 'perplexity', 'precision', 'recall', 'rl_reliability', 'roc_auc', 'rouge', 'sacrebleu', 'sari', 'seqeval', 'spearmanr', 'squad', 'squad_v2', 'super_glue', 'ter', 'trec_eval', 'wer', 'wiki_split', 'xnli', 'xtreme_s', 'jordyvl/ece', 'lvwerra/aweeesoooome_metric', 'lvwerra/test']
"""
```
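- Since `list_metrics()` returns plain strings, you can narrow the list down with ordinary Python if it is too long to scan. A small optional sketch (the substring `'bleu'` is just an illustrative choice):

```python
from datasets import list_metrics

# Keep only metric names containing a given substring, e.g. the BLEU family.
bleu_like = [name for name in list_metrics() if 'bleu' in name]
print(bleu_like)  # based on the list above: ['bleu', 'bleurt', 'google_bleu', 'sacrebleu']
```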
## 1、Load metric
- It is very easy to load a metric with Datasets. In fact, you will notice that it is very similar to loading a dataset!
- Load a metric from the Hub with `load_metric()`:
- `load_metric()`: [https://huggingface.co/docs/datasets/v2.2.1/en/package_reference/loading_methods#datasets.load_metric](https://huggingface.co/docs/datasets/v2.2.1/en/package_reference/loading_methods#datasets.load_metric)

```python
from datasets import load_metric

metric = load_metric('glue', 'mrpc')
```
This will load the metric associated with the MRPC dataset from the GLUE benchmark.
## 2、Select a configuration
If you are using a benchmark dataset, you need to select a metric that is associated with the configuration you are using.
Select a metric configuration by providing the configuration name:
```python
metric = load_metric('glue', 'mrpc')
```
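Each GLUE subset is its own configuration, so you pass the subset name as the second argument. A minimal sketch; the SST-2 example mirrors the docstring excerpt shown in the next section:

```python
from datasets import load_metric

# MRPC reports accuracy and F1; SST-2 reports accuracy only
# (see the inputs_description excerpt in the next section).
mrpc_metric = load_metric('glue', 'mrpc')
sst2_metric = load_metric('glue', 'sst2')
```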
## 3、Metrics object
Before you begin using a `Metric` object, you should get to know it a little better. As with a dataset, you can return some basic information about a metric. For example, access the `inputs_description` parameter in `datasets.MetricInfo` to get more information about a metric's expected input format and some usage examples:
- `Metric`: https://huggingface.co/docs/datasets/v2.2.1/en/package_reference/main_classes#datasets.Metric
- `datasets.MetricInfo`: https://huggingface.co/docs/datasets/v2.2.1/en/package_reference/main_classes#datasets.MetricInfo

```python
print(metric.inputs_description)
"""
Compute GLUE evaluation metric associated to each GLUE dataset.
Args:
    predictions: list of predictions to score.
        Each translation should be tokenized into a list of tokens.
    references: list of lists of references for each translation.
        Each reference should be tokenized into a list of tokens.
Returns: depending on the GLUE subset, one or several of:
    "accuracy": Accuracy
    "f1": F1 score
    "pearson": Pearson Correlation
    "spearmanr": Spearman Correlation
    "matthews_correlation": Matthew Correlation
Examples:
    >>> glue_metric = datasets.load_metric('glue', 'sst2')  # 'sst2' or any of ["mnli", "mnli_mismatched", "mnli_matched", "qnli", "rte", "wnli", "hans"]
    >>> references = [0, 1]
    >>> predictions = [0, 1]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print(results)
    {'accuracy': 1.0}
    ...
    >>> glue_metric = datasets.load_metric('glue', 'mrpc')  # 'mrpc' or 'qqp'
    >>> references = [0, 1]
    >>> predictions = [0, 1]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print(results)
    {'accuracy': 1.0, 'f1': 1.0}
    ...
"""
```
Notice that for the MRPC configuration, the metric expects the input format to be zero or one. For a complete list of attributes you can return with your metric, take a look at `MetricInfo`.
- `MetricInfo`: https://huggingface.co/docs/datasets/v2.2.1/en/package_reference/main_classes#datasets.MetricInfo
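A few other commonly useful fields can be read directly off the metric object in the same way. A minimal sketch, assuming the MRPC metric loaded above; treat the exact attribute set as version-dependent:

```python
from datasets import load_metric

metric = load_metric('glue', 'mrpc')

print(metric.citation)  # BibTeX entry for the metric
print(metric.features)  # expected types of `predictions` and `references`
print(metric.homepage)  # URL of the metric's homepage
```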
## 4、Compute metric
Once you have loaded a metric, you are ready to use it to evaluate a model's predictions. Provide the model predictions and references to `compute()`:
- `compute()`: https://huggingface.co/docs/datasets/v2.2.1/en/package_reference/main_classes#datasets.Metric.compute

```python
model_predictions = model(model_inputs)
final_score = metric.compute(predictions=model_predictions, references=gold_references)
```
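When predictions arrive in batches (for example, from an evaluation loop), you can also accumulate them with `add_batch()` and call `compute()` once at the end. An optional sketch using toy batches in place of a real dataloader, with zero/one labels matching the MRPC format:

```python
from datasets import load_metric

metric = load_metric('glue', 'mrpc')

# Toy "batches" standing in for a real evaluation loop over a dataloader.
batches = [
    {'predictions': [0, 1], 'references': [0, 1]},
    {'predictions': [1, 1], 'references': [0, 1]},
]

for batch in batches:
    # Accumulate each batch; nothing is scored until compute() is called.
    metric.add_batch(predictions=batch['predictions'], references=batch['references'])

final_score = metric.compute()
print(final_score)  # accuracy and F1 over all accumulated batches
```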
