This document describes a study that uses machine learning models to predict which compounds are active against lung cancer. Specifically:
1) A dataset of molecules was collected from the ChEMBL database and divided into active and inactive groups based on their inhibitory concentration values. Molecular descriptors were then calculated to encode the chemical structures.
2) Two machine learning models, a neural network and a gradient-boosted tree classifier, were trained on the molecular descriptors to predict compound activity. Feature selection was also performed to identify the most important structural features.
3) The models accurately predicted active compounds for lung cancer based on quantitative structure-activity relationships. A comparative analysis of the two models identified the chemical structures that contribute most to compound effectiveness.
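The labeling step in item 1 can be illustrated with a minimal sketch. It assumes the inhibitory concentration values are IC50 measurements in nanomolar and uses a cutoff of 1000 nM; the summary states neither the unit nor the cutoff, so both are assumptions. The names `ACTIVE_THRESHOLD_NM`, `pic50`, and `label_compounds` are hypothetical, and the records are made-up placeholders, not data from the study.

```python
import math

# Assumed cutoff (not from the study): IC50 <= 1000 nM counts as active.
ACTIVE_THRESHOLD_NM = 1000.0


def pic50(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 M, so -log10(x * 1e-9) = 9 - log10(x)
    return 9.0 - math.log10(ic50_nm)


def label_compounds(records, threshold_nm=ACTIVE_THRESHOLD_NM):
    """Attach a pIC50 value and an active/inactive flag to each record.

    `records` is an iterable of (compound_id, ic50_nm) pairs.
    """
    labeled = []
    for compound_id, ic50_nm in records:
        labeled.append({
            "id": compound_id,
            "pIC50": round(pic50(ic50_nm), 2),
            "active": ic50_nm <= threshold_nm,
        })
    return labeled


# Made-up placeholder records for illustration only.
records = [("cmpd_A", 10.0), ("cmpd_B", 1000.0), ("cmpd_C", 50000.0)]
labeled = label_compounds(records)
for row in labeled:
    print(row)
```

In a pipeline like the one summarized, the `active` flag would serve as the classification target, and the molecular descriptors computed for each compound would form the feature matrix passed to the two classifiers.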