This document discusses translating compound data into predictive models for drug discovery. It describes preprocessing the data through structure standardization and tautomerization to reduce noise. Feature importance analysis found that protonation and partitioning descriptors were important for many of the models. Models were built on large benchmark datasets such as ChEMBL as well as in applied settings, and are reported to perform well. The models are integrated into a discovery platform that delivers predictions to medicinal chemists through an interface, filling gaps in their structure-activity knowledge. Overall, the document outlines an end-to-end workflow for applying machine learning to draw insights from compound data.