This document summarizes probability distributions commonly used in machine learning, including:
- Binary variables (Bernoulli and binomial distributions) and the beta distribution as a conjugate Bayesian prior
- Multinomial (1-of-K) variables, the multinomial distribution, and the Dirichlet distribution as a conjugate prior
- The Gaussian (normal) distribution for continuous variables, including its properties, conditional and marginal distributions, and Bayesian inference
- Remedies for the Gaussian's limitations, such as restricting the covariance to diagonal or isotropic form (fewer parameters) and using mixture models (multimodal densities)
It provides the key equations for maximum likelihood estimation and Bayesian inference using conjugate prior distributions for these common probability models.
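For binary data, the conjugate update is short enough to sketch directly. This is a minimal illustration, assuming a prior Beta(mu | a, b) on the Bernoulli parameter mu and observations coded as 0/1: with m ones and l zeros, the posterior is Beta(mu | a + m, b + l), and the maximum likelihood estimate is m / N. The function names `bernoulli_mle`, `beta_bernoulli_posterior`, and `beta_mean` are hypothetical.

```python
from fractions import Fraction

def bernoulli_mle(data):
    """Maximum likelihood estimate of mu: the fraction of ones, m / N."""
    return Fraction(sum(data), len(data))

def beta_bernoulli_posterior(a, b, data):
    """Conjugate update: Beta(a, b) prior -> Beta(a + m, b + l) posterior,
    where m counts the ones and l counts the zeros in the data."""
    m = sum(data)
    l = len(data) - m
    return a + m, b + l

def beta_mean(a, b):
    """Mean of Beta(a, b), a / (a + b); under the posterior this is p(x = 1 | D)."""
    return Fraction(a, a + b)

data = [1, 1, 1]  # three heads in a row
print(bernoulli_mle(data))                     # 1
a_post, b_post = beta_bernoulli_posterior(2, 2, data)
print(a_post, b_post)                          # 5 2
print(beta_mean(a_post, b_post))               # 5/7
```

The example shows why the prior helps: three heads give a maximum likelihood estimate of 1, while a Beta(2, 2) prior yields a posterior predictive probability of 5/7. Exact fractions are used only to keep the arithmetic easy to check.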
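The Dirichlet prior plays the same role for multinomial data. As a hedged sketch, assuming each observation is stored as a category index 0..K-1 (rather than as a 1-of-K vector) and a prior Dir(mu | alpha), the posterior is Dir(mu | alpha + m), where m_k counts the observations in category k, and the maximum likelihood estimate is mu_k = m_k / N. The helper names below are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def multinomial_mle(data, K):
    """ML estimate mu_k = m_k / N for observations coded as integers 0..K-1."""
    counts = Counter(data)
    N = len(data)
    return [Fraction(counts[k], N) for k in range(K)]

def dirichlet_posterior(alpha, data):
    """Conjugate update: Dir(alpha) prior -> Dir(alpha + m) posterior,
    where m_k is the number of observations in category k."""
    counts = Counter(data)
    return [a_k + counts[k] for k, a_k in enumerate(alpha)]

def dirichlet_mean(alpha):
    """E[mu_k] = alpha_k / alpha_0, where alpha_0 is the sum of all alpha_k."""
    alpha0 = sum(alpha)
    return [Fraction(a_k, alpha0) for a_k in alpha]

data = [0, 2, 2, 1, 2]  # five draws over K = 3 categories
print(multinomial_mle(data, 3))  # [Fraction(1, 5), Fraction(1, 5), Fraction(3, 5)]
post = dirichlet_posterior([1, 1, 1], data)
print(post)                      # [2, 2, 4]
print(dirichlet_mean(post))      # [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
```

Because the update only adds counts to the prior parameters, the alpha_k can be read as effective prior observations in each category.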
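For the Gaussian, the conditional and marginal of a partitioned joint distribution have closed forms: p(x_a | x_b) has mean mu_a + Sigma_ab Sigma_bb^{-1} (x_b - mu_b) and covariance Sigma_aa - Sigma_ab Sigma_bb^{-1} Sigma_ba, while p(x_a) is simply N(mu_a, Sigma_aa). The NumPy sketch below assumes the partition is given as two lists of indices and that Sigma is symmetric positive definite; the function names are hypothetical. It also includes the maximum likelihood estimates (sample mean and divide-by-N sample covariance).

```python
import numpy as np

def gaussian_conditional(mu, Sigma, idx_a, idx_b, x_b):
    """Mean and covariance of p(x_a | x_b) for a joint Gaussian N(mu, Sigma)."""
    mu_a, mu_b = mu[idx_a], mu[idx_b]
    S_aa = Sigma[np.ix_(idx_a, idx_a)]
    S_ab = Sigma[np.ix_(idx_a, idx_b)]
    S_bb = Sigma[np.ix_(idx_b, idx_b)]
    # Solve linear systems with Sigma_bb instead of forming its inverse explicitly.
    mean = mu_a + S_ab @ np.linalg.solve(S_bb, x_b - mu_b)
    # Sigma_ba equals Sigma_ab transposed because Sigma is symmetric.
    cov = S_aa - S_ab @ np.linalg.solve(S_bb, S_ab.T)
    return mean, cov

def gaussian_marginal(mu, Sigma, idx_a):
    """The marginal p(x_a) is N(mu_a, Sigma_aa): just select the sub-blocks."""
    return mu[idx_a], Sigma[np.ix_(idx_a, idx_a)]

def gaussian_mle(X):
    """ML estimates from rows of X: sample mean and biased (1/N) covariance."""
    mu_ml = X.mean(axis=0)
    diff = X - mu_ml
    return mu_ml, diff.T @ diff / X.shape[0]

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 1.0]])
mean, cov = gaussian_conditional(mu, Sigma, [0], [1], np.array([3.0]))
print(mean, cov)  # [2.] [[1.]]
```

Observing x_b shifts the mean of x_a in proportion to the covariance between the blocks, and the conditional variance (1.0) is smaller than the marginal variance (2.0), as expected from conditioning.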
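Both remedies for the Gaussian's limitations can be seen in one density evaluation. The sketch below is an illustration under stated assumptions: each component uses a diagonal covariance (D variances instead of the D(D+1)/2 free parameters of a full covariance), and the mixture p(x) = sum_k pi_k N(x | mu_k, Sigma_k) can represent a multimodal density. The parameters are assumed to be given (fitting them is not shown), and the function names are hypothetical.

```python
import numpy as np

def diag_gaussian_logpdf(x, mean, var):
    """log N(x | mean, diag(var)) using only the D per-dimension variances."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def gmm_logpdf(x, weights, means, variances):
    """log sum_k pi_k N(x | mu_k, diag(var_k)), computed stably via log-sum-exp."""
    log_terms = np.array([
        np.log(w) + diag_gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ])
    top = log_terms.max()
    return top + np.log(np.sum(np.exp(log_terms - top)))

weights = [0.5, 0.5]
means = np.array([[-2.0], [2.0]])
variances = np.array([[1.0], [1.0]])
# The density is higher at a mode than at the midpoint: the mixture is bimodal,
# which no single Gaussian can represent.
print(gmm_logpdf(np.array([2.0]), weights, means, variances)
      > gmm_logpdf(np.array([0.0]), weights, means, variances))  # True
```

Subtracting the largest log term before exponentiating avoids underflow when component densities are very small, which is common in higher dimensions.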