Metrics

Inception Score (IS)

Inception Score (IS) measures how well a GAN generates high-fidelity and diverse images. Calculating IS requires the pre-trained Inception-V3 network, and recent approaches use OpenAI’s TensorFlow implementation.
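
As a rough illustration of the quantity being computed, IS is commonly defined as exp(E_x[KL(p(y|x) || p(y))]). The sketch below is a minimal pure-Python version, not the OpenAI or StudioGAN implementation. It assumes the class probabilities p(y|x) have already been obtained from Inception-V3; the function name `inception_score` is hypothetical.

```python
import math

def inception_score(probs):
    """Illustrative IS from per-image class probabilities p(y|x).

    probs: list of probability vectors, one per generated image
    (assumed to be softmax outputs of Inception-V3).
    """
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[k] for p in probs) / n for k in range(num_classes)]
    # Mean KL divergence between each p(y|x) and the marginal p(y).
    mean_kl = 0.0
    for p in probs:
        mean_kl += sum(pk * math.log(pk / mk)
                       for pk, mk in zip(p, marginal) if pk > 0)
    mean_kl /= n
    return math.exp(mean_kl)
```

Confident predictions spread evenly over the classes push the score toward the number of classes, while identical, uncertain predictions give a score of 1.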

Frechet Inception Distance (FID)

FID is a widely used metric for evaluating the performance of a GAN model. Calculating FID requires the pre-trained Inception-V3 network, and modern approaches use a TensorFlow-based FID. StudioGAN uses a PyTorch-based FID so that GAN models can be tested in the same PyTorch environment. We show that the PyTorch-based FID implementation gives almost the same results as the TensorFlow implementation (see Appendix F of our paper).
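
For reference, FID is the Fréchet distance between two Gaussians fitted to Inception-V3 features of real and generated images: ||mu_r - mu_f||^2 + Tr(S_r + S_f - 2 (S_r S_f)^(1/2)). The following is a minimal NumPy sketch of that formula, not StudioGAN's code. It assumes the features are already extracted as `(N, D)` arrays with D >= 2, and it computes the trace of the matrix square root from the eigenvalues of `S_r @ S_f` rather than with an explicit matrix square root; `frechet_distance` is a hypothetical name.

```python
import numpy as np

def frechet_distance(feats_real, feats_fake):
    """Illustrative Frechet distance between two (N, D) feature arrays."""
    mu_r = feats_real.mean(axis=0)
    mu_f = feats_fake.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_f = np.atleast_2d(np.cov(feats_fake, rowvar=False))
    # Tr((S_r S_f)^(1/2)) equals the sum of square roots of the
    # eigenvalues of S_r S_f; clip small negative values from round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * tr_sqrt)
```

Two identical feature sets give a distance of approximately zero, and shifting one set by a constant vector adds the squared norm of that shift.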

Precision and Recall (PR)

Precision measures how accurately the generator learns the target distribution. Recall measures how completely the generator covers the target distribution. Like IS and FID, calculating Precision and Recall requires the pre-trained Inception-V3 model. StudioGAN uses the same hyperparameter settings as the original Precision and Recall implementation, and it calculates the F-beta score suggested by Sajjadi et al.
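
To make the F-beta score concrete, the sketch below applies the standard F-beta formula to a single precision/recall pair. This is an illustration only: the original Precision and Recall implementation works on a whole precision-recall curve, and the function name `f_beta`, the `eps` guard, and the beta values in the usage lines are assumptions rather than StudioGAN's settings.

```python
def f_beta(precision, recall, beta=1.0, eps=1e-10):
    """Standard F-beta score for one (precision, recall) pair.

    beta > 1 weights recall more heavily; beta < 1 weights precision more.
    eps avoids division by zero when both inputs are zero.
    """
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall + eps)

# Illustrative usage: a large beta emphasizes recall, a small beta precision.
recall_weighted = f_beta(0.9, 0.5, beta=8)
precision_weighted = f_beta(0.9, 0.5, beta=1 / 8)
```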

LPIPS

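To evaluate diversity, we use LPIPS, which measures the average feature distance between generated samples. A higher LPIPS score indicates that the generated images are more diverse (Figure 4).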
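
The averaging step can be sketched as a mean over all pairs of generated samples. In the example below, the `distance` argument stands in for a real LPIPS network, and `squared_l2` is only a placeholder so that the snippet runs without model weights; both function names are hypothetical.

```python
from itertools import combinations

def mean_pairwise_distance(samples, distance):
    """Average distance over all unordered pairs of generated samples."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(distance(a, b) for a, b in pairs) / len(pairs)

def squared_l2(a, b):
    """Placeholder distance; a perceptual (LPIPS) distance would go here."""
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

With a perceptual distance plugged in, a larger mean indicates that the generated samples differ more from one another.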

NDB and JSD

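These metrics measure the similarity between real and generated images and assess how many modes a generative model fails to cover. The training samples are first clustered into bins with the K-means algorithm; the bins can be viewed as modes of the real data distribution. Each generated sample is then assigned to the bin of its nearest neighbor. We compute the bin proportions of the training samples and of the synthesized samples to assess the difference between the generated distribution and the real data distribution. Lower values indicate that more modes are captured and that the generated distribution is closer to the real data distribution.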
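
The binning procedure above can be sketched as follows. This is a minimal pure-Python illustration under several assumptions: the K-means centers are taken as already computed, each sample is assigned to its nearest center, and NDB counts a bin as statistically different using a two-proportion z-test with threshold 1.96, which is one common choice rather than a setting stated here. All function names are hypothetical.

```python
import math

def assign_bins(samples, centers):
    """Count how many samples fall into the bin of each (precomputed) center."""
    counts = [0] * len(centers)
    for s in samples:
        nearest = min(range(len(centers)),
                      key=lambda k: math.dist(s, centers[k]))
        counts[nearest] += 1
    return counts

def jsd(p, q):
    """Jensen-Shannon divergence (natural log) between two bin proportions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(u, v):
        return sum(a * math.log(a / b) for a, b in zip(u, v) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def ndb(real_counts, fake_counts, z_threshold=1.96):
    """Number of bins whose proportions differ under a two-proportion z-test."""
    n_r, n_f = sum(real_counts), sum(fake_counts)
    different = 0
    for c_r, c_f in zip(real_counts, fake_counts):
        pooled = (c_r + c_f) / (n_r + n_f)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_r + 1 / n_f))
        if se > 0 and abs(c_r / n_r - c_f / n_f) / se > z_threshold:
            different += 1
    return different
```

A generator that collapses onto a few modes concentrates its samples in a few bins, which raises both the JSD and the number of differing bins.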