Probability theory provides a framework for quantifying and manipulating uncertainty, which makes it possible to form optimal predictions from incomplete information. The document outlines key probability concepts: sample spaces, events, the axioms of probability, joint and conditional probabilities, and Bayes' rule. It also covers important probability distributions, including the binomial, Gaussian, and multivariate Gaussian. Finally, it discusses optimization concepts for machine learning: functions, derivatives, and how derivatives are used to locate optima (maxima and minima).
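As a small illustration of how joint probabilities, conditional probabilities, and Bayes' rule fit together, the sketch below computes posterior probabilities over a finite set of hypotheses. It forms the joint probability P(E, H) = P(E | H) P(H) for each hypothesis, sums these to get the marginal P(E), and divides to obtain P(H | E). The function name `posterior` and the example numbers (1% prevalence, 90% sensitivity, 5% false-positive rate) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def posterior(prior, likelihood):
    """Apply Bayes' rule over a finite set of hypotheses (illustrative sketch).

    prior: dict mapping hypothesis -> P(H)
    likelihood: dict mapping hypothesis -> P(E | H)
    Returns a dict mapping hypothesis -> P(H | E).
    """
    # Joint probabilities: P(E, H) = P(E | H) * P(H)
    joint = {h: likelihood[h] * prior[h] for h in prior}
    # Marginal probability of the evidence: P(E) = sum over H of P(E, H)
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("evidence has zero probability under the prior")
    # Conditional probabilities: P(H | E) = P(E, H) / P(E)
    return {h: joint[h] / evidence for h in joint}


# Hypothetical numbers: a condition with 1% prevalence, and a test with
# 90% sensitivity and a 5% false-positive rate; E = "test is positive".
prior = {"condition": Fraction(1, 100), "healthy": Fraction(99, 100)}
likelihood = {"condition": Fraction(9, 10), "healthy": Fraction(5, 100)}

post = posterior(prior, likelihood)
print(post["condition"])  # 2/13
```

With these assumed numbers, the probability of the condition given a positive test is 2/13 (about 15%), far below the test's 90% sensitivity, because the small prior dominates. Exact `Fraction` arithmetic is used so the posteriors sum to exactly 1 without rounding error.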