The attribute to test at the root of the decision tree is the one that yields the maximum information gain when used to split the training data. This is equivalent to choosing the split with the minimum weighted average entropy across the resulting child nodes, since the entropy of the unsplit data is the same for every candidate. In other words, it is the attribute that best separates the data according to the target classes, so the nodes it creates are "purer" with respect to those classes.
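To make the selection rule concrete, here is a minimal sketch in Python, assuming categorical attributes and training rows stored as dictionaries. The function names (`entropy`, `information_gain`, `best_root_attribute`) and the toy weather data are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    parent = entropy([row[target] for row in rows])
    # Group the target labels by the value each row takes for this attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    children = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - children


def best_root_attribute(rows, attributes, target):
    """Return the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy data: "outlook" separates the classes perfectly,
# while "windy" tells us nothing about the target.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]

root = best_root_attribute(rows, ["outlook", "windy"], "play")
print(root)
```

In this toy data, splitting on "outlook" produces two pure child nodes, so its gain equals the full parent entropy, whereas splitting on "windy" leaves both children as mixed as the parent and gains nothing. Numeric attributes would additionally need a threshold search, which this sketch leaves out.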