Target Identification Is Foundational For Drug Development
Welcome to Drug Discovery Diaries, where I share and explore my own and others’ thoughts on early drug discovery across industry and academia!
Since this is the inaugural post for Drug Discovery Diaries, I could think of no better or more important topic to cover than target identification. Target identification is the foundational work upon which all drug discovery and development efforts are built. Whether you’re a start-up targeting a rare disease or a global pharma expanding your oncology portfolio, the entire process of taking a molecule from discovery to patient relies on a fundamental and unavoidable fact: the target you have chosen must have a real biological impact on physiology or disease.
In other words, if your favorite gene, protein, lipid, or sugar from your Ph.D. thesis, your side project, or your new start-up doesn’t have a well-defined relationship with the disease or physiology that you wish to work on, then your foundation is already shaky! Alternatively, you might have worked on a disease space and, through careful analysis, found targets of interest, but you have not yet validated their effects in real-world models (see deeper posts on the world of target validation). In either case, you will need to build a foundational model for your target through real-world evidence.
Thankfully, there is no better time than today to start building a foundational model for your target. We live in the most data-rich time in all of human history, and there is a panoply of information to ingest, dissect, and leverage. One of my favorite ways to start building a foundational model is to break up the exploration into four key components:
1) Target-To-Phenotype Data: Do biological models already exist for your target and is there a strong relationship between target and effect? Evaluate the causal relationship between your target and its biology by accessing databases like DepMap for gene essentiality, MGI for gene-to-phenotype relationships in mouse models, Crop-GS Hub for genetic dependencies in plants/crops, SGD for yeast dependencies, and BV-BRC for pathogens.
2) Target-To-Structure Data: Is there structural information for your target already? Has your target been crystallized or modelled in some way? Did your target show up in high-throughput drug screens? Are there already tool compounds that can help you validate target effects in the clinical space of your interest? Evaluate the structural information on your target by accessing databases like PubChem and ChEMBL for pre-existing compounds and bioassay data, PDB and EMDB for X-ray, NMR, and cryo-EM structures, and UniProt for sequences, variants, and paralogues.
3) Target-To-Expression Data: Is your target expressed extracellularly or intracellularly? Is your target found in an off-target tissue that could cause toxicity? Are there certain disease states where your target is highly expressed or lowly expressed? Is there a certain developmental window where your target is elevated? Are there certain single-cell populations that express your target? Find out the expression patterns for your target using tools like TCGA for cancer-specific expression profiles, DepMap for cell-line-specific expression, CELLxGENE and Human Cell Atlas for single-cell profiles, and The Human Protein Atlas for a comprehensive overview of protein expression based on tissue and cell type, secretome, subcellular localization, and much more.
4) Target-To-Text Data: Has your target already been included in a clinical trial? Have inventors patented molecules against your target? Are there hidden textual associations (see this issue’s AI tip below) regarding your target in large data sets that you might be missing? As scientists, we are highly trained to use PubMed and Google Scholar to learn about targets, but there is a wealth of information hidden in places like the patent literature and clinical trial repositories. Check out Google Patents or Espacenet for mentions of your target in the patent literature and the EU Clinical Trials Register or the WHO International Clinical Trials Registry Platform for ongoing or past clinical trials.
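If you like to keep track of this kind of exploration in code, the four components above can be sketched as a simple checklist. This is only an illustrative sketch: the `TargetDossier` class, its method names, the `GENE_X` placeholder, and the example note are all hypothetical, and nothing here queries any of the databases. It just records which of the four evidence categories you have looked at and which are still empty.

```python
from dataclasses import dataclass, field

# The four evidence categories, with the example sources named in the list above.
SOURCES = {
    "phenotype": ["DepMap", "MGI", "Crop-GS Hub", "SGD", "BV-BRC"],
    "structure": ["PubChem", "ChEMBL", "PDB", "EMDB", "UniProt"],
    "expression": ["TCGA", "DepMap", "CELLxGENE", "Human Cell Atlas",
                   "The Human Protein Atlas"],
    "text": ["Google Patents", "Espacenet", "EU Clinical Trials Register",
             "WHO International Clinical Trials Registry Platform"],
}


@dataclass
class TargetDossier:
    """Hypothetical container for notes gathered on one target."""
    target: str
    # One list of (source, note) pairs per evidence category.
    findings: dict = field(
        default_factory=lambda: {category: [] for category in SOURCES}
    )

    def add(self, category: str, source: str, note: str) -> None:
        """Record a free-text note under one of the four categories."""
        if category not in SOURCES:
            raise ValueError(f"Unknown category: {category!r}")
        self.findings[category].append((source, note))

    def gaps(self) -> list:
        """Return the categories that have no recorded evidence yet."""
        return [c for c in SOURCES if not self.findings[c]]


# Example usage with a placeholder target and a made-up note.
dossier = TargetDossier("GENE_X")
dossier.add("phenotype", "DepMap", "placeholder note on gene essentiality")
print(dossier.gaps())
```

Listing the gaps makes it obvious where the foundational model for a target is still thin, for example a target with phenotype data but no structural, expression, or text evidence yet.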
Target identification is a critical first step for any drug discovery or development project. This starting point can make or break the rest of your drug discovery journey. While the process can seem daunting, the sheer volume of accessible data today offers an unprecedented opportunity to build robust, evidence-based models around your target. By systematically exploring phenotype, structure, expression, and text data, you can begin to separate promising hypotheses from scientific wishful thinking. In future posts, we’ll dig deeper into each of these domains and explore how to move from identification to validation and beyond. A well-chosen target isn’t just a starting point. It’s a statement of intent for everything that follows.
What kind of targets are you exploring? What kind of models do you like to build? Are there any tools you would recommend? Let me know in the comments!