This document analyzes when undersampling is effective for addressing class imbalance in classification tasks. It introduces two effects of undersampling: a warping of the posterior probability distribution, and an increase in variance caused by removing training samples. It then presents a theoretical condition under which undersampling is expected to improve classification accuracy, obtained by comparing the probability of a ranking error with and without undersampling. Experiments on synthetic univariate and bivariate datasets illustrate the factors that determine whether this condition holds.
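The warping effect can be made concrete with a short sketch. It assumes a common formulation of undersampling that the summary does not spell out: every positive (minority) example is kept, and each negative (majority) example is kept independently with probability beta. Under that assumption, Bayes' rule turns the original posterior p = P(+|x) into p / (p + beta * (1 - p)). The function names `warped_posterior` and `unwarp` are hypothetical and chosen only for illustration.

```python
import numpy as np

def warped_posterior(p, beta):
    """Posterior P(+|x) after undersampling.

    Assumption (illustrative, not taken from the source): all positives are
    kept and each negative is kept independently with probability beta.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    p = np.asarray(p, dtype=float)
    return p / (p + beta * (1.0 - p))

def unwarp(ps, beta):
    """Invert the warping to recover the original posterior from ps."""
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    ps = np.asarray(ps, dtype=float)
    return beta * ps / (beta * ps - ps + 1.0)

if __name__ == "__main__":
    p = np.linspace(0.0, 1.0, 11)
    ps = warped_posterior(p, beta=0.2)
    # Warping pushes probabilities toward the positive class...
    print(np.round(ps, 3))
    # ...but is strictly increasing in p, so the ranking of examples is preserved.
    print(np.all(np.diff(ps) > 0))
    # Knowing beta, the original posterior can be recovered exactly.
    print(np.allclose(unwarp(ps, beta=0.2), p))
```

Because the warped posterior is a strictly increasing function of the original one, warping alone does not change how examples are ranked; under this sketch's assumptions, any change in ranking error would come from the variance added by training on fewer samples, which is why the condition compares ranking errors rather than raw probabilities.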