This document gives an overview of data preparation and descriptive statistics in SystemML. It covers data formatting, pre-processing steps such as transforming categorical features and handling missing values, and three kinds of descriptive statistics: univariate, bivariate, and stratified. Univariate statistics describe the distribution of an individual variable. Bivariate statistics measure the association between a pair of variables. Stratified statistics measure the association between variables within subgroups (strata) defined by a categorical variable, so that the categorical variable is controlled for as a possible confounder.
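The difference between the three kinds of statistics can be made concrete with a small sketch. The example below is plain Python using only the standard library. It is not SystemML's DML scripts or API, and the helper names (`pearson`, `stratified_pearson`) and the toy data are made up for illustration. It computes a univariate summary of one variable, a bivariate Pearson correlation between two variables, and the same correlation computed separately within each stratum.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def stratified_pearson(xs, ys, strata):
    """Pearson correlation of xs and ys computed separately per stratum."""
    groups = defaultdict(lambda: ([], []))
    for x, y, s in zip(xs, ys, strata):
        groups[s][0].append(x)
        groups[s][1].append(y)
    return {s: pearson(gx, gy) for s, (gx, gy) in groups.items()}


# Toy data (illustrative only): two numeric variables and one categorical
# variable that defines the strata.
x = [1, 2, 3, 6, 7, 8]
y = [3, 2, 1, 8, 7, 6]
stratum = ["A", "A", "A", "B", "B", "B"]

# Univariate: summarize the distribution of a single variable.
x_mean = mean(x)
x_var = pvariance(x)  # population variance

# Bivariate: association between the pair (x, y), ignoring the strata.
overall_r = pearson(x, y)

# Stratified: the same association, measured within each stratum.
per_stratum_r = stratified_pearson(x, y, stratum)

print("mean(x) =", x_mean, " var(x) =", x_var)
print("overall r =", overall_r)
print("per-stratum r =", per_stratum_r)
```

In this toy data the overall correlation is positive, while the correlation within each stratum is negative. The categorical variable drives the pooled association, which is the kind of confounding that stratified statistics are meant to expose.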