Mathias Müller discusses data leakage in meta-modeling, particularly in target encoding of high-cardinality categorical (HCC) features, and its effect on machine learning predictions. He explains how mishandled data splits let target information leak into the features, which biases estimates of model performance, and presents methods to mitigate the risk, including nested cross-validation. The presentation stresses that avoiding target leakage is essential for accurate and reliable models.
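To make the leakage mechanism concrete, here is a minimal sketch of out-of-fold target encoding in plain Python. It is not taken from the talk: the function names (`target_means`, `out_of_fold_encode`), the additive smoothing scheme, and the toy data are all assumptions chosen for illustration. The idea is that each row's encoded value is computed only from rows in other folds, so a row never sees its own target.

```python
import random
from collections import defaultdict


def target_means(categories, targets, prior, smoothing=1.0):
    """Smoothed per-category target mean (hypothetical helper, not from the talk)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    # Shrink each category mean toward the prior; smoothing=0 gives the raw mean.
    return {c: (sums[c] + smoothing * prior) / (counts[c] + smoothing) for c in counts}


def out_of_fold_encode(categories, targets, n_folds=5, smoothing=1.0, seed=0):
    """Encode each row using target statistics from the other folds only."""
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")
    indices = list(range(len(categories)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    encoded = [0.0] * len(categories)
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        train_y = [targets[i] for i in train_idx]
        prior = sum(train_y) / len(train_y)
        means = target_means([categories[i] for i in train_idx], train_y, prior, smoothing)
        for i in held_out:
            # Categories unseen in the training folds fall back to the prior.
            encoded[i] = means.get(categories[i], prior)
    return encoded


# Toy data (assumed): every row has its own category, the extreme HCC case.
categories = [f"id_{i}" for i in range(20)]
targets = [i % 2 for i in range(20)]

# Naive encoding on the full data with no smoothing copies the target into the feature.
naive_map = target_means(categories, targets, prior=0.5, smoothing=0.0)
naive = [naive_map[c] for c in categories]

# Out-of-fold encoding cannot see a row's own target.
oof = out_of_fold_encode(categories, targets, n_folds=5)
```

In this toy setup the naive feature is an exact copy of the target, while the out-of-fold feature carries only the fold-level prior. When the downstream model is itself evaluated by cross-validation, the encoding would need to be refit inside each outer training fold, which is the nested arrangement the summary refers to.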