Performance metrics evaluate how well the model performs on a specific task. These metrics depend on the type of generative model and its use case.

  • Accuracy and Precision (for tasks like text summarization or image captioning): Measure how often the generated output matches a human-annotated reference.
  • F1-Score: The harmonic mean of precision and recall, useful for evaluating tasks where both false positives and false negatives are costly; a minimal sketch follows this list.
  • Inception Score (IS) and Fréchet Inception Distance (FID) (for image generation models like GANs):
      • Inception Score evaluates the quality and diversity of generated images using a pretrained Inception classifier: realistic images yield confident class predictions, and a diverse set spreads those predictions across many classes. Higher IS is better.
      • Fréchet Inception Distance measures the distance between the feature distributions of generated and real images. Lower FID values indicate more realistic image generation (a worked computation follows this list).
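As a minimal sketch of the precision/recall/F1 relationship, assuming scikit-learn is available; the binary labels below are purely illustrative:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Illustrative binary labels: 1 = relevant content captured, 0 = not.
# In practice these come from comparing model outputs to references.
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# f1 equals 2 * precision * recall / (precision + recall)
```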
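And a sketch of the FID computation itself, assuming the Inception feature vectors for real and generated images have already been extracted; the random arrays below stand in for those features:

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real, feats_gen):
    """FID between two sets of feature vectors, each modeled as a Gaussian.

    FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 * (S_r S_g)^{1/2})
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)

    covmean = sqrtm(sigma_r @ sigma_g)
    # sqrtm can return tiny imaginary parts from numerical error.
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    diff = mu_r - mu_g
    return diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean)

# Stand-ins for Inception activations (real FID uses 2048-dim pool features).
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 64))
fake = rng.normal(0.1, 1.1, size=(500, 64))
print(f"FID ~ {frechet_distance(real, fake):.3f}")
```

The closer the two feature distributions are, the closer the score is to zero, which is why lower FID is better.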

  • BLEU, ROUGE, and METEOR (for text generation models): Metrics commonly used to evaluate machine translation and summarization tasks. They measure the overlap (typically of n-grams) between generated text and reference text; see the sketch below.
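A minimal sketch of computing sentence-level BLEU, assuming the nltk package is installed; the sentence pair is illustrative:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the model generates a short summary of the article".split()
candidate = "the model produces a short summary of the article".split()

# sentence_bleu expects a list of tokenized references per candidate.
smooth = SmoothingFunction().method1  # avoids zero scores on short texts
score = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(f"BLEU = {score:.3f}")
```

ROUGE and METEOR are computed analogously (ROUGE is recall-oriented and common for summarization; METEOR also credits stem and synonym matches), with implementations available in packages such as rouge-score and nltk.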