Generative models aim to learn a model distribution p(x|θ), parameterized by θ, that approximates the data distribution q(x) from training samples. Three common ways to measure the similarity between the model distribution p and the data distribution q are:
1) Kullback-Leibler (KL) divergence, which is implicitly minimized in maximum likelihood estimation: minimizing KL(q‖p) over θ is equivalent to maximizing the expected log-likelihood of the data under the model.
2) Jensen-Shannon (JS) divergence, which is implicitly minimized when training generative adversarial networks (GANs) with an optimal discriminator.
3) Optimal transport (OT) distance, such as the 1-Wasserstein (earth mover's) distance, which remains meaningful even when p and q have disjoint supports and is the objective minimized by Wasserstein GANs; see the sketch after this list.
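As a concrete illustration, the sketch below computes all three measures for a pair of toy discrete distributions over a small support, using NumPy and SciPy. The distributions, variable names, and support are illustrative assumptions, not from the text; SciPy's `rel_entr` and `wasserstein_distance` are used for the KL and 1-Wasserstein computations.

```python
import numpy as np
from scipy.special import rel_entr          # elementwise p * log(p / q)
from scipy.stats import wasserstein_distance

def kl_divergence(p, q):
    """KL(p || q) = sum_x p(x) log(p(x) / q(x))."""
    return np.sum(rel_entr(p, q))

def js_divergence(p, q):
    """JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Toy discrete distributions over the support {0, 1, 2, 3} (illustrative).
support = np.arange(4)
q = np.array([0.10, 0.40, 0.40, 0.10])      # "data" distribution
p = np.array([0.25, 0.25, 0.25, 0.25])      # "model" distribution

# KL(q || p) is the direction that corresponds to maximum likelihood.
print("KL(q || p):", kl_divergence(q, p))
print("JS(p, q):  ", js_divergence(p, q))
# 1-Wasserstein distance between the two discrete distributions.
print("W1(p, q):  ", wasserstein_distance(support, support,
                                           u_weights=p, v_weights=q))
```

Note that KL and JS compare probability mass pointwise, so they saturate (KL becomes infinite, JS approaches log 2) when the supports of p and q do not overlap, whereas the Wasserstein distance still varies smoothly with how far the mass must be moved.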