That sounds like it could be an interesting metric. Worth noting that there is a difference between an algorithmic "best of n" selection (via e.g. an FID score) and manual cherry-picking, which takes more factors into account (such as user preference) and takes time to evaluate — the latter is what GP was suggesting.
This is a bit pedantic, but FID wouldn't really be viable for best-of-n selection, since it's only computable over a distribution of samples. It's also pretty high variance at small sample sizes, so you need a lot of samples to get a meaningful FID score.
Better metrics (assuming the goal is text->image) would be some sort of inception score or a CLIP-based text-matching score. Those are computable on single samples.
Yeah I’d likely just pick the best scoring one (that is, the pick is made by the evaluation tool, not the model) - to simulate “whatever the receiver deemed best for what they wanted”.
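The selection step itself is trivially automatable; a minimal sketch below. Note that `clip_score` here is a hypothetical stand-in for a real per-sample scorer (e.g. CLIP text-image cosine similarity via open_clip or torchmetrics) — any "higher is better" single-sample metric slots in the same way:

```python
# Best-of-n selection: the pick is made by the evaluation tool,
# not by the generative model itself.

def best_of_n(candidates, score_fn):
    """Return the candidate the scoring function deems best."""
    return max(candidates, key=score_fn)

# Stand-in scorer: pretend each candidate carries a precomputed
# CLIP-style text-matching score in [0, 1].
def clip_score(candidate):
    return candidate["score"]

samples = [
    {"image": "a.png", "score": 0.21},
    {"image": "b.png", "score": 0.34},
    {"image": "c.png", "score": 0.27},
]

print(best_of_n(samples, clip_score)["image"])  # -> b.png
```

In a real pipeline you'd generate n images from the same prompt, embed prompt and images with a CLIP model, and score each image by cosine similarity to the prompt embedding before the argmax.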