Deep Learning across label confidence distributions
Source: https://blog.ourcrowd.com


Label confidence and wisdom of crowds

When doing supervised machine learning, we want to learn a relationship between the features of each data point and one or more output values associated with it. In classification, the output values are categorical and are called labels. However, these labels may not all be equally trustworthy (i.e., our confidence in them varies). Labels are commonly produced by averaging experimental measurements (e.g., in biological or chemical contexts) or by aggregating the annotations of multiple experts (or non-experts). The latter idea has its historical roots in Condorcet's jury theorem, named after the French mathematician Marquis de Condorcet (Marie Jean Antoine Nicolas de Caritat, 1743–1794). According to this theorem, if each member of a jury has an equal, independent chance of judging correctly that is better than random but worse than perfect, then the majority is more likely to be correct than any individual juror, and the probability of a correct majority verdict approaches 1 as the jury size increases (https://plato.stanford.edu/entries/social-choice/). Whether we rely on the wisdom of crowds or on multiple measurements (or even multiple measurement techniques), the labels we assign can carry different confidence levels depending on how strongly the human labellers or the measurements agree. The important question is: should we only use the data points with high label confidence?
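Condorcet's theorem is easy to verify numerically. The sketch below (an illustration, not from the article) computes the exact probability that a majority of n independent jurors, each correct with probability p, reaches the right verdict:

```python
from math import comb

def majority_correct_prob(p: float, n: int) -> float:
    """Probability that a majority of n independent jurors, each correct
    with probability p, votes correctly (n odd, so there are no ties)."""
    k_min = n // 2 + 1  # smallest number of correct votes forming a majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# With p = 0.6, the majority grows more reliable as the jury grows:
for n in (1, 11, 101):
    print(n, round(majority_correct_prob(0.6, n), 3))
```

With a per-juror accuracy of 0.6, the majority's accuracy rises steadily toward 1 as the jury size increases, which is exactly why aggregating many imperfect annotators can yield trustworthy labels.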

We can still learn from disagreement

Although we may prefer to rely on high-confidence labels, there is still knowledge in the disagreement between labeling strategies that produces low-confidence labels. We may therefore benefit from learning across the label confidence distribution, i.e., from data points with different label confidence levels. Two classic approaches exist:

  1. Ensemble learning: In ensemble learning, we build multiple models and combine their votes to predict the label of each data point. The same idea can be applied across the label confidence distribution, for example by training one model per confidence tier. Although this can outperform models trained on high-confidence data points alone, it may be impractical in an industrial setting: predicting the label of a new data point requires querying many trained models, which is computationally expensive.
  2. Assigning weights: Confidence-based weight assignment in the optimization process is another way to learn from data points with different confidence levels. Despite its successes, some systems cannot use weight-based strategies, such as those with discrete confidence tiers rather than continuous probabilities, systems with only partial confidence assignments, or systems using simulated random negatives. Drug–target interaction prediction is one such problem, since it relies on simulated random negative data points.
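Where weighting is applicable, it is often a one-line change. The sketch below (synthetic data and confidence values are illustrative assumptions, not from the article) uses scikit-learn's `sample_weight` argument so that each data point contributes to the loss in proportion to our confidence in its label:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Noisy binary labels driven mostly by the first feature
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Pretend annotator agreement gave each label a confidence in [0.5, 1.0]
confidence = rng.uniform(0.5, 1.0, size=200)

clf = LogisticRegression()
clf.fit(X, y, sample_weight=confidence)  # low-confidence points count less
print(clf.score(X, y))
```

This is the strategy that breaks down in the settings listed above: with simulated random negatives, for instance, there is no principled confidence value to assign to the negative class in the first place.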

We present an alternative approach, called Filtered Transfer Learning (FTL), that avoids the computational cost of ensemble learning and can be used in systems incompatible with weight-based strategies.

Filtered Transfer Learning (FTL)

We developed Filtered Transfer Learning (FTL), a technique built on the concept of transfer learning (see the figure below). In transfer learning, a model (such as a deep neural network) is first trained on a reference task (typically with many more data points) and then fine-tuned on a smaller task to obtain a task-specific model. In FTL, a neural network is first trained on data points spanning all confidence levels; lower-confidence data points are then filtered out in a stepwise manner before each retraining. Eventually the model is trained on only the highest-confidence data points and used to predict the labels of new data points (such as those in the test set). We implemented this technique for predicting drug–target interactions using the STITCH dataset and showed that it performs well (see our preprint: https://arxiv.org/abs/2006.02528).
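The training loop can be sketched in a few lines. This is a minimal illustration of the stepwise-filtering idea, not the authors' implementation: the data, confidence thresholds, and model are all assumptions. The key ingredient is `warm_start=True`, which makes scikit-learn's `MLPClassifier` keep its weights between calls to `fit()`, so each stage builds on the previous one in the spirit of transfer learning:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] > 0).astype(int)                  # toy labels
confidence = rng.uniform(0.0, 1.0, size=300)   # stand-in label confidences

# warm_start=True reuses the learned weights on each call to fit(),
# so later stages refine the earlier model rather than restart it.
model = MLPClassifier(hidden_layer_sizes=(16,), warm_start=True,
                      max_iter=200, random_state=0)

for threshold in (0.0, 0.5, 0.8):      # stepwise filtering
    keep = confidence >= threshold     # drop lower-confidence points
    model.fit(X[keep], y[keep])        # retrain from the current weights
```

After the final stage, the model has seen the whole confidence distribution but was last refined on only the highest-confidence data points, which is the behavior FTL is after.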


Source: Tonekaboni, Seyed Ali Madani, et al. arXiv preprint arXiv:2006.02528 (2020). (https://arxiv.org/abs/2006.02528)

Beyond drug-target interaction

Dealing with data points of varying label confidence is not limited to the drug–target interaction problem. Many problems, within and outside the healthcare and pharmaceutical industries, can benefit from the FTL technique, such as:

  1. Radiological or histopathological image (or image segment) labels: Radiological and histopathological images (or image parts) are labeled by experts within hospitals and healthcare settings. Although such labeling can be very accurate, the labels are not perfect, and the confidence of image labels is variable.
  2. Crowdsourced image annotation: Many image datasets, containing images of animals, cars, etc., are labeled via the wisdom of crowds, resulting in images with different label confidence.
  3. Resistance to drugs: Although patients (or model systems) can be categorized as resistant or sensitive to a drug, these categorizations rely on continuous measurements. Applying a threshold to categorize data points as sensitive or resistant therefore yields low-confidence data points (those close to the threshold) and high-confidence data points (those far from it).
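The third scenario can be made concrete with a small sketch. Here, label confidence is derived from the distance of a continuous drug-response measurement to the sensitive/resistant cutoff; the threshold, scale, and squashing function are illustrative assumptions, not from the article:

```python
import numpy as np

def label_with_confidence(measurement, threshold=0.5, scale=0.2):
    """Binary label plus a confidence that grows with distance from the cutoff."""
    label = (measurement > threshold).astype(int)
    # Squash the distance into (0.5, 1): points near the cutoff get ~0.5
    confidence = 1.0 / (1.0 + np.exp(-np.abs(measurement - threshold) / scale))
    return label, confidence

m = np.array([0.51, 0.90, 0.10])
labels, conf = label_with_confidence(m)
# The measurement at 0.51 sits near the cutoff, so its confidence is lowest
```

A confidence distribution built this way is exactly the kind of input FTL can consume: the stepwise filtering progressively discards the borderline measurements while still learning from them in the early stages.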

References:

  1. Almeida, Matthew, et al. "Mitigating Class Boundary Label Uncertainty to Reduce Both Model Bias and Variance." arXiv preprint arXiv:2002.09963 (2020).
  2. Brady, Adrian P. "Error and discrepancy in radiology: inevitable or avoidable?." Insights into imaging 8.1 (2017): 171-182.
  3. Hagenah, Jannis, Sascha Leymann, and Floris Ernst. "Integrating Label Uncertainty in Ultrasound Image Classification using Weighted Support Vector Machines." Current Directions in Biomedical Engineering 5.1 (2019): 285-287.
  4. Hafner, Marc, et al. "Growth rate inhibition metrics correct for confounders in measuring sensitivity to cancer drugs." Nature methods 13.6 (2016): 521.
  5. Luo, Zhe, Xin Dang, and Yixin Chen. "Label confidence based AdaBoost algorithm." 2017 International Joint Conference on Neural Networks (IJCNN). IEEE, 2017.
  6. Oakden-Rayner, Luke. "Exploring Large-scale Public Medical Image Datasets." Academic radiology 27.1 (2020): 106-112.
  7. Reamaroon, Narathip, et al. "Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome." IEEE journal of biomedical and health informatics 23.1 (2018): 407-415.
  8. Szklarczyk, Damian, et al. "STITCH 5: augmenting protein–chemical interaction networks with tissue and affinity data." Nucleic acids research 44.D1 (2016): D380-D384.
  9. Tonekaboni, Seyed Ali Madani, et al. "Learning across label confidence distributions using Filtered Transfer Learning." arXiv preprint arXiv:2006.02528 (2020).
  10. Yosinski, Jason, et al. "How transferable are features in deep neural networks?." Advances in neural information processing systems. 2014.


