Test sensitivity depends on positivity level

Evaluating screening tests requires rigor as well as transparency in reporting. A JAMA paper by Barnell et al. from last year had neither, and I wrote a letter to the editor with Christian Fynbo Christiansen offering some reservations. The main issue is that two tests are compared at different positivity proportions: the fecal immunochemical test (FIT, the currently preferred stool-based screening test for colorectal cancer) and their own ColoSense, which combines FIT, RNA concentrations and smoking status. They conclude that ColoSense has superior sensitivity, but more persons in their study cohort test positive with this test than with FIT, at the expense of more false positive tests. This is not a fair comparison, and I elaborate on why below.

Several letters with similar points have since been published (https://www.webofscience.com/wos/woscc/summary/d11513ba-4a16-4a38-bfdb-b6da5559244d-f6c55927/date-descending/1), and there was no room for ours as well.

However, the most relevant response of all has now come out: an empirical comparison by Niedermaier et al. of ColoSense results with simply lowering the FIT positivity threshold to yield a similar number of positives. I think fellow nerds of screening and test validity will find the subtle burn highly enjoyable.

In their cohort, they find that at a FIT threshold low enough to match the proportion of positive tests by ColoSense, the sensitivity, specificity etc. are... the same!

Read the research letter by Dr. Tobias Niedermaier et al. here:

Lowering Fecal Immunochemical Test Positivity Threshold vs Multitarget Stool RNA Testing for Colorectal Cancer Screening (JAMA)

More on why positivity levels need to be held constant in test comparisons, in this excerpt of our draft of a letter to the editor:

In ColoSense, Barnell et al. combined FIT with RNA testing and information on smoking status in a risk-based screening model (1). They applied a risk threshold to define a positive or negative screening result and thereby used the model as a screening tool to determine who should be referred for colonoscopy. The test results of the FIT and ColoSense methods were then compared in terms of both positive findings (CRC or advanced adenomas) and negative findings (medium-risk or low-risk adenomas, or no findings).

Performance of a screening test is based on comparing test results to results from a “gold standard test” for diagnosing the specific condition (2). For CRC, the gold standard is commonly considered colonoscopy. The possible outcomes of a screening test are “true positive”, “false positive”, “true negative” and “false negative”.

The performance of a diagnostic or screening test is thus evaluated in terms of its sensitivity, specificity, positive predictive value, and negative predictive value, as presented in Table 1. All four values are needed to evaluate a test. For quantitative tests, such as the FIT or the ColoSense risk score, they depend on the cut-off used for positivity (3). Focusing on only one, or even two, of the performance components may be misleading. While not all of these values are provided directly in the paper by Barnell et al., they can be derived from the paper’s results section, as shown in Table 2.
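As a minimal sketch of how the four measures follow from the four possible outcomes, the snippet below computes them from a 2x2 table. The class name `TwoByTwo` and the example counts are hypothetical and chosen only to exercise the formulas; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Screening result cross-classified against the gold standard."""

    tp: int  # test positive, condition present
    fp: int  # test positive, condition absent
    fn: int  # test negative, condition present
    tn: int  # test negative, condition absent

    def sensitivity(self) -> float:
        # Share of persons with the condition who test positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of persons without the condition who test negative.
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Share of positive tests that are true positives.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Share of negative tests that are true negatives.
        return self.tn / (self.tn + self.fn)

    def positivity(self) -> float:
        # Share of all tested persons who test positive.
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.fp) / total


# Hypothetical counts for illustration only (not study data).
example = TwoByTwo(tp=8, fp=92, fn=2, tn=898)
print(f"sensitivity {example.sensitivity():.1%}, specificity {example.specificity():.1%}")
print(f"PPV {example.ppv():.1%}, NPV {example.npv():.1%}, positivity {example.positivity():.1%}")
```

Including the positivity proportion next to the four classic measures makes it harder to overlook how many persons a given cut-off sends on to colonoscopy.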

Thus, 28 of 36 CRCs and 175 of 606 advanced adenomas were correctly identified by FIT, whereas 34 of the 36 CRCs and 278 of the 606 advanced adenomas were correctly identified by ColoSense.

FIT and ColoSense had sensitivities for CRC of 77.8% and 94.4%, respectively. However, 628 (7%) tested positive with FIT, whereas 1,516 (17%) tested positive with ColoSense. Sensitivity depends on the underlying prevalence of the disease, but also on the chosen positivity level. Therefore, the number of positive tests needs to be held constant to judge the difference in risk assessment made by the FIT and ColoSense methods.
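The arithmetic behind these figures can be reproduced with a short sketch. It uses only the counts quoted in the text, and it assumes that the numbers of positive tests (628 and 1,516) refer to the same colonoscopy-verified cohort as the lesion counts, and that "true positive" means CRC or advanced adenoma, as defined earlier in the letter. The variable and function names are illustrative.

```python
# Counts quoted in the text above; nothing here is new data.
CRC_TOTAL = 36   # colorectal cancers found at colonoscopy
AA_TOTAL = 606   # advanced adenomas found at colonoscopy

TESTS = {
    "FIT": {"crc_found": 28, "aa_found": 175, "positives": 628},
    "ColoSense": {"crc_found": 34, "aa_found": 278, "positives": 1516},
}


def summarize(counts):
    """Derive per-test measures from the quoted counts."""
    # Assumption: a true positive is a positive test in a person with
    # CRC or advanced adenoma; all other positives count as false positives.
    true_pos = counts["crc_found"] + counts["aa_found"]
    return {
        "crc_sensitivity": counts["crc_found"] / CRC_TOTAL,
        "aa_sensitivity": counts["aa_found"] / AA_TOTAL,
        "true_positives": true_pos,
        "false_positives": counts["positives"] - true_pos,
        "ppv": true_pos / counts["positives"],
    }


SUMMARY = {name: summarize(counts) for name, counts in TESTS.items()}

for name, s in SUMMARY.items():
    print(
        f"{name}: CRC sensitivity {s['crc_sensitivity']:.1%}, "
        f"advanced adenoma sensitivity {s['aa_sensitivity']:.1%}, "
        f"PPV {s['ppv']:.1%}, false positives {s['false_positives']}"
    )

extra_crc = TESTS["ColoSense"]["crc_found"] - TESTS["FIT"]["crc_found"]
extra_fp = SUMMARY["ColoSense"]["false_positives"] - SUMMARY["FIT"]["false_positives"]
print(f"ColoSense vs FIT: {extra_crc} more CRCs found, {extra_fp} more false positives")
```

Under these assumptions, the 6 additional cancers found by ColoSense come with 779 additional false positives, and the share of positive tests that turn out to be CRC or advanced adenoma drops from roughly 32% to roughly 21%.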

While the ColoSense method found more CRC cases (at the expense of more false positives), this was due to a lower threshold for positivity. ColoSense possibly had an additional performance advantage, but the size of such an advantage cannot be inferred from the results presented in the paper.

The paper states the FDA requirement that sensitivity for CRC of a new screening test should be at least 90%, while maintaining a specificity of more than 80%. However, this is quite easily achieved, especially in a new test incorporating the standard test (FIT). The paper by Barnell et al. simply exemplifies that decreasing the threshold for positivity will lead to identification of more cancers than the current standard screening method. It is important to keep in mind that the decreased threshold for positivity comes at the cost of more false positives.
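The trade-off can be illustrated with a small simulation. This is a sketch on synthetic data only: the log-normal score distributions, the cohort sizes and the function names are assumptions made for illustration and do not represent real FIT haemoglobin values or the ColoSense score. The point is simply that one and the same quantitative test shows higher sensitivity and lower specificity when the cut-off is chosen to flag 17% rather than 7% of the cohort.

```python
import random


def simulate_scores(n_healthy, n_diseased, seed=0):
    """Draw synthetic test scores; persons with disease tend to score higher."""
    rng = random.Random(seed)
    healthy = [rng.lognormvariate(0.0, 1.0) for _ in range(n_healthy)]
    diseased = [rng.lognormvariate(2.5, 1.0) for _ in range(n_diseased)]
    return healthy, diseased


def threshold_for_positivity(healthy, diseased, target):
    """Return the cut-off at which about `target` of all persons test positive."""
    scores = sorted(healthy + diseased, reverse=True)
    k = max(1, round(target * len(scores)))
    return scores[k - 1]  # the k-th highest score


def evaluate(healthy, diseased, threshold):
    """Positivity, sensitivity and specificity when score >= threshold is positive."""
    tp = sum(s >= threshold for s in diseased)
    fp = sum(s >= threshold for s in healthy)
    return {
        "positivity": (tp + fp) / (len(healthy) + len(diseased)),
        "sensitivity": tp / len(diseased),
        "specificity": 1 - fp / len(healthy),
    }


healthy, diseased = simulate_scores(n_healthy=9900, n_diseased=100)

# Same synthetic test, two cut-offs matched to the positivity proportions in the text.
at_7 = evaluate(healthy, diseased, threshold_for_positivity(healthy, diseased, 0.07))
at_17 = evaluate(healthy, diseased, threshold_for_positivity(healthy, diseased, 0.17))

for label, result in (("7% positive", at_7), ("17% positive", at_17)):
    print(
        f"{label}: sensitivity {result['sensitivity']:.1%}, "
        f"specificity {result['specificity']:.1%}"
    )
```

A fair comparison of two tests would run `evaluate` for both at cut-offs that give the same positivity, which is what holding the number of positive tests constant amounts to.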

1. Barnell EK, Wurtzler EM, Rocca J La, Fitzgerald T, Petrone J, Hao Y, et al. Multitarget Stool RNA Test for Colorectal Cancer Screening. JAMA. 2023;330(18):1760.

2. Vandenbroucke JP, Sørensen HT. Clinical Epidemiology. In: Lash TL, VanderWeele TJ, Haneuse S, Rothman KJ, editors. Modern Epidemiology. 4th ed. Wolters Kluwer; 2021. p. 895–930.

3. Cleophas TJ, Zwinderman AH, Cleophas TF, Cleophas EP, editors. Summary of Validation Procedures for Diagnostic Tests. In: Statistics Applied to Clinical Trials. Dordrecht: Springer Netherlands; 2009. p. 433–47.
