The document discusses big data challenges and potential solutions. It begins by outlining how big data is generated from various sources and used in applications like search engines. The main challenges are determining which subset of big data to analyze and how to clean noisy data. Two potential solutions discussed are:
1) Intelligent sampling, which selects a representative subset of the data to analyze instead of the entire dataset, in order to reduce running time. Adaptive sampling techniques such as IDASA are proposed.
2) Ensemble filtering, which uses multiple models to identify and remove mislabeled instances from the training data, in order to improve predictive accuracy. Under a Bayesian analysis, filtering can be interpreted as a form of model averaging.
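
To make the sampling idea concrete, here is a minimal Python sketch of one generic adaptive scheme: the sample is grown geometrically until an estimate computed from it is stable enough. This is not IDASA, whose details are not given here. The function name `adaptive_sample_mean`, its parameters, and the stopping rule (standard error of the mean below a relative tolerance) are all assumptions chosen for illustration.

```python
import random
import statistics


def adaptive_sample_mean(data, start=100, growth=2, rel_tol=0.01, seed=0):
    """Estimate the mean of `data` from a growing random sample.

    Hypothetical helper (not IDASA): the sample is enlarged until the
    standard error of the mean is at most rel_tol * |mean|, or until
    the sample covers the whole dataset.
    """
    if not data:
        raise ValueError("data must be non-empty")
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)  # any prefix of a shuffled order is a uniform sample
    n = min(start, len(data))
    while True:
        sample = [data[i] for i in order[:n]]
        mean = statistics.fmean(sample)
        if n > 1:
            stderr = statistics.stdev(sample) / n ** 0.5
        else:
            stderr = float("inf")
        # Stop when the estimate is precise enough or nothing is left to add.
        if stderr <= rel_tol * abs(mean) or n == len(data):
            return mean, n
        n = min(n * growth, len(data))


# Usage: synthetic data standing in for a large dataset.
random.seed(1)
values = [random.gauss(50, 5) for _ in range(100_000)]
estimate, used = adaptive_sample_mean(values)
print(f"estimated mean {estimate:.2f} from {used} of {len(values)} items")
```

The running-time saving comes from stopping early: the analysis touches only as many items as the tolerance requires, and falls back to the full dataset when the data is too variable for a smaller sample to suffice.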
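
The filtering idea in the second solution can be sketched as follows, assuming a majority-vote rule with cross-validation: each model is trained with a fold held out, and an instance is flagged as mislabeled when more than half of the models misclassify it. The three classifiers (1-nearest-neighbor, 3-nearest-neighbor, nearest centroid) are simple stand-ins picked to keep the example dependency-free, and every name here (`knn_factory`, `centroid_fit`, `ensemble_filter`) is hypothetical rather than taken from the document.

```python
import math
from collections import Counter


def knn_factory(k):
    """Return a fit function for a k-nearest-neighbor classifier."""
    def fit(X, y):
        def predict(x):
            nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
            return Counter(y[i] for i in nearest).most_common(1)[0][0]
        return predict
    return fit


def centroid_fit(X, y):
    """Fit a nearest-centroid classifier and return its predict function."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    centroids = {
        label: [sum(col) / len(pts) for col in zip(*pts)]
        for label, pts in groups.items()
    }
    return lambda x: min(centroids, key=lambda c: math.dist(centroids[c], x))


def ensemble_filter(X, y, learners, n_folds=5):
    """Return indices that a majority of learners misclassify when held out."""
    votes = [0] * len(X)
    for fold in range(n_folds):
        test_idx = range(fold, len(X), n_folds)
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        for fit in learners:
            predict = fit(X_train, y_train)
            for i in test_idx:
                if predict(X[i]) != y[i]:
                    votes[i] += 1  # this learner tags instance i as suspicious
    return [i for i, v in enumerate(votes) if v > len(learners) / 2]


# Usage: two well-separated clusters with two deliberately flipped labels.
X = [(i % 5, i // 5) for i in range(20)] + [(20 + j % 5, 20 + j // 5) for j in range(20)]
y = [0] * 20 + [1] * 20
y[3], y[25] = 1, 0  # inject label noise
suspects = ensemble_filter(X, y, [knn_factory(1), knn_factory(3), centroid_fit])
clean_X = [x for i, x in enumerate(X) if i not in suspects]
clean_y = [label for i, label in enumerate(y) if i not in suspects]
print(suspects)
```

On this toy data the filter flags exactly the two flipped instances, indices 3 and 25. Requiring a majority rather than a single dissenting model keeps correctly labeled points that only one classifier gets wrong; a stricter consensus rule (all models must agree) would remove fewer instances at the cost of leaving more noise in.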