This study presents signal-to-noise ratio optimized gene selection and clustering for cancer classification (SNR-OGSCC), a methodology aimed at improving classification accuracy while reducing the dimensionality of gene expression data across cancer types. Implemented on a standard computational setup, SNR-OGSCC combines filtering, clustering, and machine learning techniques, and demonstrates significant improvements in classification accuracy on seven cancer datasets: leukemia, colon cancer, prostate cancer, lung cancer, lymphoma, central nervous system (CNS) tumors, and ovarian cancer. Notably, our approach achieved 100% accuracy for leukemia, lung cancer, and ovarian cancer, and accuracies of 98.4% for colon cancer, 99.1% for prostate cancer, 98.3% for lymphoma, and 99.7% for CNS tumors, while requiring as few as 4–5 genes for effective classification. These findings highlight the efficiency and robustness of SNR-OGSCC and suggest its potential to identify meaningful biomarkers and improve personalized cancer treatment strategies. Further validation with larger datasets and biological experiments is essential to confirm its applicability in clinical settings.
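As an illustration of the gene-filtering idea named in the method's title, the following is a minimal sketch of signal-to-noise ratio ranking for a two-class problem. The abstract does not state the exact SNR formula, so the sketch assumes the commonly used definition, the difference of class means divided by the sum of class standard deviations. The function names `snr_scores` and `top_genes`, the toy data, and the choice of k = 5 are hypothetical, and the clustering and classification stages of SNR-OGSCC are not reproduced here.

```python
import numpy as np

def snr_scores(X, y):
    """Per-gene signal-to-noise ratio for a two-class problem.

    Assumed definition (not taken from the abstract):
        SNR_g = (mean_1 - mean_0) / (std_1 + std_0)
    X: (n_samples, n_genes) expression matrix; y: binary labels (0/1).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    mean_diff = pos.mean(axis=0) - neg.mean(axis=0)
    spread = pos.std(axis=0, ddof=1) + neg.std(axis=0, ddof=1)
    # Small constant guards against division by zero for constant genes.
    return mean_diff / (spread + 1e-12)

def top_genes(X, y, k=5):
    """Indices of the k genes with the largest absolute SNR, best first."""
    scores = snr_scores(X, y)
    return np.argsort(-np.abs(scores))[:k]

# Toy data (hypothetical): 40 samples, 200 genes, one informative gene.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))
y = np.array([0] * 20 + [1] * 20)
X[y == 1, 7] += 5.0  # shift gene 7 in class 1 so it separates the classes

selected = top_genes(X, y, k=5)
```

In this toy setup the artificially shifted gene ranks first. On real expression data the selected subset would then be passed to the clustering and classification steps, which this sketch leaves out.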