Performance metrics evaluate how well the model performs on a specific task.
These metrics depend on the type of generative model and its use case.
- Accuracy and Precision (for tasks like text summarization or image captioning): measure how often the generated output matches a human-annotated reference; precision specifically captures what fraction of the generated items are correct.
- F1-Score: the harmonic mean of precision and recall, useful for evaluating tasks where both false positives and false negatives are problematic.
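Precision, recall, and F1 can be computed directly from the counts of true positives, false positives, and false negatives. A minimal sketch (the function name `precision_recall_f1` is illustrative, not a library API):

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Example: 8 correct outputs, 2 spurious, 4 missed
precision, recall, f1 = precision_recall_f1(8, 2, 4)  # 0.8, ~0.667, ~0.727
```

Because F1 is a harmonic mean, it is dragged down by whichever of precision or recall is lower, which is exactly why it is preferred when both error types matter.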
- Inception Score (IS) and Fréchet Inception Distance (FID) (for image generation models like GANs):
- Inception Score evaluates the quality and diversity of generated images using a pre-trained Inception classifier: confident per-image predictions indicate quality, while varied predictions across the set indicate diversity. Higher IS is better.
- Fréchet Inception Distance measures the distance between the feature distributions of generated images and real images. Lower FID values indicate more realistic image generation.
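FID models each set of image features as a multivariate Gaussian and computes the Fréchet distance between the two Gaussians. A minimal sketch with NumPy and SciPy; in practice the features come from an Inception-v3 network, but random vectors stand in here for illustration:

```python
import numpy as np
from scipy.linalg import sqrtm


def frechet_distance(feat_real: np.ndarray, feat_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fit to two feature sets.

    Each input is an (n_samples, n_features) array. For true FID these
    would be Inception-v3 activations; any features work for the math.
    """
    mu_r, mu_g = feat_real.mean(axis=0), feat_gen.mean(axis=0)
    cov_r = np.cov(feat_real, rowvar=False)
    cov_g = np.cov(feat_gen, rowvar=False)

    # Matrix square root of the covariance product; discard tiny
    # imaginary components introduced by numerical error.
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2 * covmean))


rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
same = frechet_distance(real, real)          # ~0: identical distributions
shifted = frechet_distance(real, real + 3.0)  # large: means differ
```

Comparing a feature set against itself yields a distance near zero, while a mean shift inflates the score, matching the intuition that lower FID means the generated distribution is closer to the real one.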