This document discusses setting up development and test sets when working on machine learning projects. Some key points:
- Development and test sets should come from the same distribution as each other, and that distribution should reflect the data expected in the future, so that evaluation results carry over to real use.
- The dev set should be large enough to detect differences between the models being compared, while the test set only needs to be large enough to give a confident estimate of overall performance.
- Establish a single-number evaluation metric for the team to optimize, so that models can be compared quickly.
- Work iteratively: build a system, evaluate it on the dev set, use the results to generate new ideas, and repeat quickly to make progress.
- Be prepared to change dev/test sets and metrics if they are no longer pointing the team in the right direction.
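
The points above on shared distributions, dev set size, and a single metric can be illustrated with a minimal Python sketch. The function names (`split_dev_test`, `accuracy_std_error`, `pick_best`) are hypothetical, and the binomial standard error is an assumption used here as a rough gauge of how small a difference a dev set can resolve; it is not a method taken from the document.

```python
import math
import random


def split_dev_test(examples, dev_fraction=0.5, seed=0):
    """Shuffle one pool of examples and split it in two.

    Because both halves are drawn from the same shuffled pool,
    the dev and test sets share a distribution.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy_std_error(accuracy, n_examples):
    """Approximate (binomial) standard error of an accuracy estimate."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_examples)


def pick_best(dev_scores):
    """Return the model name with the highest single-number dev metric."""
    return max(dev_scores, key=dev_scores.get)


if __name__ == "__main__":
    dev, test = split_dev_test(range(10_000))
    print(len(dev), len(test))

    # Noise in a 90% accuracy estimate shrinks as the dev set grows.
    for n in (100, 10_000):
        print(n, round(accuracy_std_error(0.90, n), 4))

    # Illustrative scores only, not results from any real experiment.
    print(pick_best({"model_a": 0.902, "model_b": 0.915}))
```

Under this approximation, a 90% accuracy measured on 100 examples has a standard error of about 3 percentage points, so a 1-point gap between two models would be lost in the noise; on 10,000 examples the error falls to about 0.3 points.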