Data preprocessing cleans data by handling missing values, smoothing noise, and resolving inconsistencies. It also integrates data from multiple sources and transforms it through normalization and aggregation, and it reduces the data through methods such as dimensionality reduction. The goals are to improve data quality and to reduce the volume of data to be mined while preserving the essential information. Techniques such as binning, clustering, regression, and histograms are used to discretize numerical attributes and reduce the number of distinct values they take.
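
Three of the steps named above (handling missing values, normalization, and binning) can be illustrated with a minimal Python sketch. It is not a definitive implementation: the function names, the choice of mean imputation, the min-max scaling to [0, 1], the equal-width binning scheme, and the sample values are all assumptions made for illustration, and other choices (median imputation, z-score scaling, equal-frequency bins) are equally valid.

```python
def fill_missing(values):
    """Replace None entries with the mean of the observed values (mean imputation)."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def equal_width_bins(values, k):
    """Partition values into k bins that each cover an equal slice of the value range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in values:
        # The maximum value would land at index k, so clamp it into the last bin.
        idx = 0 if width == 0 else min(int((v - lo) / width), k - 1)
        bins[idx].append(v)
    return bins


def smooth_by_bin_means(values, k):
    """Replace each value with the mean of its equal-width bin (reduces noise and distinct values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    means = [sum(b) / len(b) if b else None for b in equal_width_bins(values, k)]
    smoothed = []
    for v in values:
        idx = 0 if width == 0 else min(int((v - lo) / width), k - 1)
        smoothed.append(means[idx])
    return smoothed


# Hypothetical sample attribute, used only to exercise the functions.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
cleaned = fill_missing([4, None, 8])
normalized = min_max_normalize(prices)
bins = equal_width_bins(prices, 3)
smoothed = smooth_by_bin_means(prices, 3)
```

With three equal-width bins over the range 4 to 34, each bin spans 10 units, so the sample values fall into the groups 4 and 8; 15, 21, and 21; and 24, 25, 28, and 34. Smoothing by bin means then leaves only three distinct values, which shows how binning both discretizes an attribute and reduces its volume.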