Mastering Synthesizability in AI Drug Design
For years, the promise of generative AI in drug discovery has been electrifying: algorithms that can design novel molecules with perfect properties (like high potency, high selectivity, and excellent safety profiles) in a fraction of the time of traditional methods. We've seen an explosion of tools, with over 40 new models mentioned in the previous article. But a critical question has always loomed over these digital blueprints: Are they makeable? This is the challenge of synthetic accessibility, a bottleneck that separates a brilliant in silico idea from a tangible compound in a vial. A molecule that can't be synthesized is a dead end, wasting valuable time and resources.
In this fourth article of my series on generative AI in drug discovery, we'll dive deep into how the field is tackling the synthesizability problem head-on, evolving from simple scoring heuristics to fully integrated systems that design molecules with a viable synthesis plan from the very beginning.
The evolution of the "Can we make this?" score
The first step in solving a problem is measuring it. For years, computational chemists have relied on heuristic scores to get a quick estimate of a molecule's complexity, a critical need when trudging through millions of virtual candidates. This led to the development of several foundational scoring methods.
The classics (heuristics): Tools like SAscore and SCScore became industry standards. SAscore operates on a simple but effective principle: it analyzes a molecule's structure, penalizing it for having rare structural fragments (based on their frequency in large databases like PubChem) or high complexity (e.g., many rings, stereocenters), assuming these features are harder to synthesize. SCScore takes a different approach, using a deep neural network trained on millions of reactions from the Reaxys database to predict the number of reaction steps a synthesis might require. While incredibly useful for a first-pass filter, these heuristics are blunt instruments. They provide a general sense of complexity but often lack the nuance to predict the feasibility of a specific reaction pathway and can be unreliable for molecules outside the typical "drug-like" space, such as natural products or functional materials.
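To make the fragment-plus-complexity idea concrete, here is a minimal sketch of a score in the same spirit. The fragment names, the counts, and the penalty weights are all invented for illustration; they are not the published SAscore parameters, and the real score works on fragments extracted from actual structures.

```python
import math

# Hypothetical fragment counts in a reference database (illustrative only).
FRAGMENT_COUNTS = {
    "benzene": 500_000,
    "amide": 300_000,
    "piperidine": 80_000,
    "spiro_oxetane": 40,
}

def fragment_score(fragments):
    """Mean log10 frequency; fragments never seen before count as frequency 1."""
    logs = [math.log10(FRAGMENT_COUNTS.get(f, 1)) for f in fragments]
    return sum(logs) / len(logs)

def complexity_penalty(n_rings, n_stereocenters, n_heavy_atoms):
    """Made-up weights: rings, stereocenters, and size all add difficulty."""
    return 0.5 * n_rings + 1.0 * n_stereocenters + 0.02 * n_heavy_atoms

def toy_accessibility(fragments, n_rings, n_stereocenters, n_heavy_atoms):
    """Toy convention: a higher value means 'looks easier to make'."""
    return fragment_score(fragments) - complexity_penalty(
        n_rings, n_stereocenters, n_heavy_atoms
    )

simple = toy_accessibility(["benzene", "amide"], 1, 0, 12)
complex_ = toy_accessibility(["piperidine", "spiro_oxetane"], 3, 2, 30)
print(f"simple: {simple:.2f}, complex: {complex_:.2f}")
```

Note that the toy value grows with ease of synthesis, whereas the real SAscore is reported on a scale from 1 (easy) to 10 (hard).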
A prime example of how these classic heuristics are applied is within optimization platforms like Moldrug (Universität des Saarlandes and Boehringer Ingelheim). Rather than being a purely generative model, Moldrug uses a genetic algorithm (GA) to intelligently explore and optimize a population of molecules starting from a known seed, balancing multiple critical drug properties simultaneously. The framework uses the CReM library to modify molecules by replacing fragments, ensuring new structures remain chemically valid and synthetically plausible. The GA then iteratively applies selection and mutation operations to guide molecules toward better properties. Moldrug employs Derringer-Suich desirability functions to create a single "cost" value balancing competing objectives, evaluating: Binding Affinity (AutoDock-Vina score), Drug-likeness (QED), and Synthetic Accessibility (SAscore). This is where classic scores play a crucial role.
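For readers who haven't met desirability functions before, the sketch below shows the general Derringer-Suich recipe: map each property onto a 0-1 desirability, then combine them with a weighted geometric mean. The thresholds, the equal weights, and the "cost = 1 - desirability" convention are assumptions made for this example, not values taken from Moldrug.

```python
import math

def larger_is_better(y, low, target, r=1.0):
    """Desirability rises from 0 at `low` to 1 at `target`."""
    if y <= low:
        return 0.0
    if y >= target:
        return 1.0
    return ((y - low) / (target - low)) ** r

def smaller_is_better(y, target, high, r=1.0):
    """Desirability falls from 1 at `target` to 0 at `high`."""
    if y <= target:
        return 1.0
    if y >= high:
        return 0.0
    return ((high - y) / (high - target)) ** r

def overall_desirability(values, weights):
    """Weighted geometric mean; a single zero makes the whole result zero."""
    if any(d == 0.0 for d in values):
        return 0.0
    log_sum = sum(w * math.log(d) for d, w in zip(values, weights))
    return math.exp(log_sum / sum(weights))

def cost(vina_score, qed, sa_score):
    """Hypothetical thresholds; lower docking and SA scores are better."""
    d = [
        smaller_is_better(vina_score, target=-12.0, high=-6.0),
        larger_is_better(qed, low=0.2, target=0.9),
        smaller_is_better(sa_score, target=2.0, high=6.0),
    ]
    return 1.0 - overall_desirability(d, weights=[1.0, 1.0, 1.0])

print(cost(vina_score=-9.0, qed=0.55, sa_score=4.0))
```

The geometric mean is what makes the objectives "compete": a molecule that is completely unacceptable on any one property gets the worst possible cost, no matter how good the others are.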
Template-based retrosynthesis with AiZynthFinder: Before generative models can be guided by "live" retrosynthesis, the underlying engines must be fast, reliable, and accessible. AiZynthFinder is a prominent open-source tool providing a robust framework for retrosynthetic planning. Developed by AstraZeneca researchers, AiZynthFinder uses a Monte Carlo Tree Search algorithm to recursively break down target molecules into simpler precursors. This search is guided by a neural network trained on reaction templates from sources like the USPTO database. The process continues until it traces a path back to purchasable building blocks cataloged in a "stock" file. Key features include:
Speed: Typically finds a route in under 10 seconds, with a complete search finishing in less than a minute
Flexibility: Modular design allowing customization with different algorithms, policies, and stock libraries
Robustness: Built with modern software engineering principles ensuring reliability
AiZynthFinder serves both as a standalone tool and as a critical component in the AI-driven discovery ecosystem. It's used by frameworks like SynFlowNet (described later) as an external validator for synthesizability, and functions as the type of "oracle" that enables advanced systems like Saturn (see later) to provide real-time feedback, guiding generation toward molecules that are practically achievable in the lab.
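The "break it down until everything is in stock" logic can be illustrated with a much simpler search than the one AiZynthFinder runs. The sketch below uses a plain depth-limited recursive search, not Monte Carlo Tree Search and not a neural policy, over a hand-written table of disconnections. The molecule names, the disconnection table, and the stock set are all hypothetical.

```python
# Hypothetical one-step disconnections: product -> candidate precursor sets.
DISCONNECTIONS = {
    "amide_target": [("acid_A", "amine_B")],
    "acid_A": [("ester_C",)],
    "amine_B": [("nitro_D",), ("halide_E", "ammonia")],
}

# Hypothetical "stock" of purchasable building blocks.
STOCK = {"ester_C", "halide_E", "ammonia"}

def find_route(molecule, max_depth=5):
    """Return a nested route dict, or None if no route ends in stock."""
    if molecule in STOCK:
        return {"molecule": molecule, "in_stock": True, "children": []}
    if max_depth == 0:
        return None
    for precursors in DISCONNECTIONS.get(molecule, []):
        children = [find_route(p, max_depth - 1) for p in precursors]
        if all(child is not None for child in children):
            return {"molecule": molecule, "in_stock": False, "children": children}
    return None  # every disconnection hit a dead end

def leaves(route):
    """Collect the purchasable starting materials of a solved route."""
    if not route["children"]:
        return [route["molecule"]]
    return [leaf for child in route["children"] for leaf in leaves(child)]

route = find_route("amide_target")
print(leaves(route))
```

In this toy table the first disconnection of amine_B leads to a precursor that is neither in stock nor further reducible, so the search falls back to the second option. A real planner faces the same situation at scale, which is why a guided search is needed instead of brute force.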
The rise of retrosynthesis-informed scores: The next logical step was to create scores based not on abstract rules, but on the direct output of a Computer-Aided Synthesis Planning (CASP) tool. This new generation of scores answers a more practical question: "Can an AI actually find a synthetic route for this?" A prominent example of this approach is the RScore from Iktos, which is derived from a full retrosynthetic analysis using the Spaya software. It considers the number of steps, likelihood of reactions, and route convergence to produce a score from 0 (no route found) to 1 (simple, known synthesis). To make this computationally intensive check practical for high-throughput screening, researchers also developed RSPred, a neural network trained to predict the RScore in a fraction of the time (1 ms vs. 42 s per molecule).
The power of this approach was validated in a study where seven chemists blindly labeled molecules as synthetically feasible or not. The results were telling: RScore perfectly classified the molecules according to the chemists' judgment (AUC 1.0), significantly outperforming other scores. The classic SAscore still performed well (AUC 0.96), but scores like RAscore (based on AiZynthFinder) and SCScore showed poor correlation with expert opinion (AUC 0.68 and 0.57, respectively). This established RScore as a highly accurate proxy for human assessment.
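If the AUC numbers feel abstract, it helps to remember what they measure: the probability that a randomly chosen "feasible" molecule gets a better score than a randomly chosen "infeasible" one. Here is a small sketch of that calculation. The labels and scores are made-up toy data, not the values from the study.

```python
def roc_auc(scores, labels):
    """Fraction of (feasible, infeasible) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy chemist labels: 1 = judged feasible, 0 = judged infeasible.
labels = [1, 1, 1, 0, 0, 0]
perfect_score = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # separates the classes cleanly
noisy_score = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]    # one feasible molecule is under-scored

print(roc_auc(perfect_score, labels))
print(roc_auc(noisy_score, labels))
```

An AUC of 1.0 therefore means every feasible molecule outranked every infeasible one, while 0.5 is what a coin flip would achieve.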
The ultimate computational check: "Round-trip" validation: A 2025 study by Penn State University, MIT, and Google DeepMind introduced an even more rigorous validation method to close the logical loop. Their "round-trip score" works in three stages: 1) Propose a retrosynthetic route for a target molecule; 2) Use a forward-reaction model to computationally "re-synthesize" the molecule from the proposed starting materials; 3) Calculate the Tanimoto similarity between the re-synthesized product and the original target. A perfect match (a score of 1) gives extremely high confidence that the proposed route is not just plausible, but chemically sound. This method addresses a key failure mode where a retrosynthesis plan looks good on paper but involves reactions that would fail or produce unwanted side products - a flaw that a forward-prediction model can often catch.
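The three stages map naturally onto a few lines of code. In the sketch below, the retrosynthesis and forward models are stand-in lookup tables, the "fingerprint" is a crude set of character pairs from the SMILES string, and returning 0 when no route is found is my own convention. A real implementation would call trained models and use proper molecular fingerprints.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two sets: |A & B| / |A | B|."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def toy_fingerprint(smiles):
    """Stand-in fingerprint: the set of adjacent character pairs in a SMILES string."""
    return {smiles[i:i + 2] for i in range(len(smiles) - 1)}

# Hypothetical retrosynthesis model: target -> proposed starting materials.
RETRO = {
    "CC(=O)NC": ("CC(=O)O", "CN"),
    "CC(=O)N(C)C": ("CC(=O)O", "CN"),  # looks plausible on paper, but is wrong
}

# Hypothetical forward model: starting materials -> predicted product.
FORWARD = {("CC(=O)O", "CN"): "CC(=O)NC"}

def round_trip_score(target, retro=RETRO, forward=FORWARD):
    materials = retro.get(target)          # stage 1: propose a route
    if materials is None:
        return 0.0
    product = forward.get(materials)       # stage 2: "re-synthesize" in silico
    if product is None:
        return 0.0
    # stage 3: compare the re-synthesized product with the original target
    return tanimoto(toy_fingerprint(target), toy_fingerprint(product))

print(round_trip_score("CC(=O)NC"))
print(round_trip_score("CC(=O)N(C)C"))
```

The second target shows the failure mode described above: the proposed starting materials actually lead to a different product, so the score drops below 1.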
Personalization and context as a new frontier
A "one-size-fits-all" score, while a crucial first step, is inherently limited. The real-world feasibility of a synthesis depends on a project's specific chemistry, available starting materials, the synthetic routes a team is comfortable with, and the unique challenges of a novel chemical space. Acknowledging this, the field's most recent innovations have moved beyond static metrics to create adaptable, context-aware scores that can be tailored to a specific project or even a specific chemist's intuition.
Capturing human expertise with FSscore (focused synthesizability score)
This 2024 tool from EPFL introduces a personalized, "human-in-the-loop" approach. It starts with a baseline model built on a powerful graph attention network, which is already good at capturing nuanced structural features. However, its true innovation lies in its ability to be fine-tuned with small amounts of human feedback. A chemist can simply rank just 20-50 pairs of molecules as "easier" or "harder" to make, and the model rapidly adapts its understanding to that specific chemical space.
This has proven highly effective where generic scores fail. For instance, most scores cannot distinguish between a molecule with undefined stereochemistry and its specific, stereochemically pure counterpart, even though the latter is far more challenging to synthesize. After being fine-tuned on just 50 examples, FSscore learned to correctly penalize the molecules with assigned chirality. Similarly, it has been successfully tailored to the complex chemical space of PROTACs - large molecules that often mislead traditional scores. When integrated with the generative model REINVENT, FSscore-guided optimization led to 40% of its generated molecules having exact matches in a commercial database, compared to just 17% for the standard SAscore, demonstrating its power in guiding AI toward tangible, purchasable compounds.
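To show how a handful of "this one is harder than that one" judgments can reshape a score, here is a minimal pairwise-ranking sketch. It swaps the graph attention network for a two-feature linear scorer and uses a standard logistic pairwise loss, so it illustrates the training signal only, not the FSscore architecture. The features and the example pairs are hypothetical.

```python
import math

def score(weights, features):
    """Linear difficulty score: higher means 'harder to make'."""
    return sum(w * x for w, x in zip(weights, features))

def fine_tune(weights, pairs, lr=0.1, epochs=200):
    """Gradient steps on -log(sigmoid(score(harder) - score(easier)))."""
    weights = list(weights)
    for _ in range(epochs):
        for harder, easier in pairs:
            diff = score(weights, harder) - score(weights, easier)
            grad_scale = 1.0 - 1.0 / (1.0 + math.exp(-diff))
            for i in range(len(weights)):
                weights[i] += lr * grad_scale * (harder[i] - easier[i])
    return weights

# Features: [defined stereocenters, ring count].
# The baseline scorer looks at rings only and ignores stereochemistry.
baseline = [0.0, 1.0]

# Chemist feedback as (harder, easier) pairs: same scaffold, chirality assigned or not.
pairs = [([2, 3], [0, 3]), ([1, 2], [0, 2]), ([3, 1], [0, 1])]

tuned = fine_tune(baseline, pairs)
chiral, undefined = [2, 3], [0, 3]
print(score(baseline, chiral) - score(baseline, undefined))
print(score(tuned, chiral) - score(tuned, undefined))
```

Before fine-tuning the two molecules get identical scores; afterwards the stereochemically defined one is rated harder, which is the behavior described above.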
Adapting to real-world resources with Leap
Another major advance from 2024, Leap from Exscientia, is the first scoring method that can dynamically account for the availability of key intermediates. It conceptualizes synthesis as a tree and uses a GPT-2 model pre-trained on hundreds of thousands of synthetic routes to predict the "tree depth", i.e., the maximum number of steps from the target to a commercially available building block.
Crucially, a user can tell the model that a specific intermediate is already on hand, and Leap will recalculate the synthesis depth accordingly. This directly mirrors real-world scenarios where a project's direction can be transformed by the availability of a crucial building block. Leap's performance is promising: even without considering intermediates, it outperforms existing scores by at least 5% AUC. When a key intermediate is supplied, its predictive power increases (AUC of 0.89), while other scores that cannot process this information become less reliable. The model is also robust, showing that its predictions are not skewed when provided with irrelevant or "false" intermediates. Leap represents an important shift, moving synthesizability assessment from a static prediction to a dynamic calculation that reflects the practical, resource-dependent nature of laboratory chemistry.
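The quantity Leap learns to predict is easy to compute when the route is already known, and doing so shows why an available intermediate matters. The sketch below computes tree depth exactly on a small hand-written route; Leap itself predicts this number with a language model instead of reading it off a known tree. All molecule names here are hypothetical.

```python
# Hypothetical synthesis tree: product -> precursors used to make it.
ROUTE = {
    "target": ["int_1", "bb_A"],
    "int_1": ["int_2", "bb_B"],
    "int_2": ["bb_C", "bb_D"],
}

# Hypothetical commercially available building blocks.
BUILDING_BLOCKS = {"bb_A", "bb_B", "bb_C", "bb_D"}

def tree_depth(molecule, available):
    """Longest chain of reaction steps from `molecule` down to available material."""
    if molecule in available:
        return 0
    precursors = ROUTE.get(molecule)
    if precursors is None:
        raise ValueError(f"no route to {molecule} from the available materials")
    return 1 + max(tree_depth(p, available) for p in precursors)

print(tree_depth("target", BUILDING_BLOCKS))                # from building blocks only
print(tree_depth("target", BUILDING_BLOCKS | {"int_1"}))    # key intermediate on hand
print(tree_depth("target", BUILDING_BLOCKS | {"int_X"}))    # irrelevant intermediate
```

Having int_1 on the shelf cuts the depth from three steps to one, while an intermediate that doesn't appear in the route changes nothing.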
What if we make generation synthesis-aware?
The ultimate goal is not just to filter out bad ideas, but to prevent them from being generated in the first place. This means moving synthesizability from a post-design checkpoint to a core component of the generative engine itself. Recent advances have demonstrated multiple sophisticated approaches to this challenge.
Shifting the paradigm with reaction-based generation
Instead of generating molecules atom-by-atom or as a SMILES string, a new class of models generates the synthetic pathway directly. This fundamentally changes the game:
SynFormer from MIT approaches this through a language-based framework, using postfix notation to represent synthetic pathways linearly. It works with a vast chemical space defined by 115 reaction templates and over 223,000 commercially available building blocks, giving access to >10^60 possible molecules. The framework's encoder-decoder variant (SynFormer-ED) can reconstruct 66% of molecules from the Enamine REAL Diversity Set, while its decoder-only variant (SynFormer-D) excels at goal-directed generation.
SynFlowNet from Cambridge University, EPFL, Microsoft, and Valence Labs leverages Generative Flow Networks (GFlowNets) to treat molecule generation as a flow through a defined synthetic space. Its core innovation is defining an action space not with atoms or fragments, but with valid chemical reactions and a library of over 220,000 purchasable building blocks. This ensures any generated molecule is, by its very construction, synthesizable. The model's effectiveness was demonstrated in a direct comparison: SynFlowNet achieved a 62% success rate when its outputs were validated by the external retrosynthesis tool AiZynthFinder, whereas a comparable fragment-based GFlowNet scored 0%. A key technical achievement was the successful training of its backward policy using REINFORCE, which could find a valid synthetic route back to the starting materials for 99-100% of the molecules it generated. Compared to other baselines like REINVENT, SynFlowNet was shown to generate more diverse and novel molecular scaffolds, highlighting its ability to explore new chemical territory while respecting the hard constraints of synthesis.
SyntheMol-RL, developed by Stanford and McMaster Universities, demonstrated the real-world power of this approach in a complete, end-to-end pipeline for antibiotic discovery. The model uses Monte Carlo Tree Search to explore the Enamine REAL Space - a vast chemical library of over 76 billion molecules that are all guaranteed to be synthesizable in a single step. To guide the search, researchers first trained property prediction models on experimental data from screening thousands of compounds against the priority pathogen Acinetobacter baumannii. SyntheMol-RL then used these predictors to navigate the synthesizable space and find molecules with high predicted antibacterial activity and structural novelty. The results were a powerful validation of the method: of 150 diverse, high-scoring molecules selected for synthesis, 58 were successfully created in the lab. Testing these compounds revealed that 6 of them (roughly a 10% hit rate among the 58 synthesized compounds) were potent antibiotics, effective even against resistant clinical isolates of A. baumannii. The entire pipeline, from AI training to experimental validation of novel antibiotic candidates, was completed in about three months, showcasing how designing for synthesizability from the start can dramatically accelerate discovery.
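The idea of writing a synthetic pathway as a linear sequence, as SynFormer does with postfix notation, is easier to see with a toy example. In the sketch below, building blocks are pushed onto a stack and each reaction token consumes its reactants and pushes the product. The token names, the reaction arities, and the string "products" are hypothetical placeholders; a real system would apply reaction templates to actual structures.

```python
# Hypothetical reactions and the number of reactants each one consumes.
REACTION_ARITY = {"suzuki": 2, "deprotection": 1, "amide_coupling": 2}

def evaluate_postfix(tokens):
    """Fold a postfix pathway into a single product using a stack."""
    stack = []
    for token in tokens:
        if token in REACTION_ARITY:
            n = REACTION_ARITY[token]
            if len(stack) < n:
                raise ValueError(f"{token} needs {n} reactants")
            reactants = stack[-n:]
            del stack[-n:]
            # Placeholder product: records which reaction joined which reactants.
            stack.append(f"{token}({', '.join(reactants)})")
        else:
            stack.append(token)  # a building block
    if len(stack) != 1:
        raise ValueError("pathway does not end in a single product")
    return stack[0]

pathway = [
    "boronic_acid", "aryl_bromide", "suzuki",
    "deprotection",
    "acid", "amide_coupling",
]
print(evaluate_postfix(pathway))
```

Because every valid sequence decodes into building blocks joined by known reactions, a model that generates such sequences proposes a route together with each molecule.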
Direct retrosynthesis oracles
With increasingly efficient models, it's now possible to use a full retrosynthesis engine as a live guide during generation. The Saturn framework from EPFL exemplifies this approach, representing a clear shift in molecular design. Built on the Mamba architecture and pre-trained on PubChem, Saturn actively optimizes for synthesizability during the creative process using reinforcement learning with an augmented memory algorithm. Its model-agnostic design integrates with various retrosynthesis engines (Syntheseus, AiZynthFinder, RetroKNN), achieving remarkable efficiency - requiring 40× fewer oracle calls than comparable methods. Importantly, Saturn enables unprecedented control over synthesis routes:
Can enforce specific reaction types (e.g., Suzuki coupling)
Can avoid unwanted reaction classes
Can target specific building block libraries
In benchmarks, Saturn achieves >90% success finding exact matches in massive chemical libraries. Its applications range from drug discovery to industrial waste valorization and functional materials design. Overall, since traditional synthesizability heuristics like SAscore prove less reliable beyond drug-like molecules, incorporating actual synthesis engines in the design process is essential.
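One simple way to picture this kind of route-level control is as a gate on the reward the generator receives. The sketch below is an assumption about how such constraints could be expressed, not Saturn's actual reward function; the route format, reaction names, and building block identifiers are hypothetical.

```python
def route_is_acceptable(route, required=frozenset(), forbidden=frozenset(),
                        allowed_blocks=None):
    """Check a solved route against the chemist's constraints."""
    reactions = set(route["reactions"])
    if not set(required) <= reactions:
        return False  # a required reaction type is missing
    if reactions & set(forbidden):
        return False  # an unwanted reaction class is used
    if allowed_blocks is not None and not set(route["blocks"]) <= set(allowed_blocks):
        return False  # a building block is outside the chosen library
    return True

def shaped_reward(property_score, route, **constraints):
    """Zero reward unless a route exists and respects every constraint."""
    if route is None:
        return 0.0
    return property_score if route_is_acceptable(route, **constraints) else 0.0

# Hypothetical output of a retrosynthesis oracle for one generated molecule.
route = {
    "reactions": ["suzuki_coupling", "amide_coupling"],
    "blocks": ["bb1", "bb2", "bb3"],
}

print(shaped_reward(0.8, route, required={"suzuki_coupling"}))
print(shaped_reward(0.8, route, forbidden={"amide_coupling"}))
print(shaped_reward(0.8, route, allowed_blocks={"bb1", "bb2"}))
```

With a reward like this, a molecule that scores well on properties but needs a forbidden reaction earns nothing, so the generator is pushed toward chemistry the team actually wants to run.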
Joint 3D and synthesis generation is now possible!
The 2025 framework SynCoGen from Toronto and McGill universities represents a pinnacle of integration, simultaneously generating a molecule's building blocks, connecting reactions, and 3D coordinates. Trained on SynSpace, a dataset of 600,000+ molecules with their synthesis pathways and 3D conformations, it delivers:
Integrated generation of molecular graphs, retrosynthesis pathways, and 3D conformations in one step.
Promising metrics: 96.7% chemical validity and 72% synthesizability when validated with external tools. The 3D structures pass 87.2% of rigorous validity checks.
Practical utility: In fragment-linking challenges for targets like HIV protease, it generated novel connectors with strong docking scores and viable synthesis routes - achieving 58-79% retrosynthesis success versus 0% for synthesis-unaware models.
This approach bridges the gap between computational design and experimental reality by ensuring predicted 3D structures can actually be synthesized through known, practical routes.
The big picture: Foundation models and the Lab-in-the-Loop
These advancements in synthesizability are not happening in a vacuum. They are enabled by two concurrent shifts that are reshaping the R&D landscape: the rise of massive foundation models and the implementation of the "Lab-in-the-Loop" paradigm.
Foundation models are the engines of multi-parameter optimization
The new generation of AI is defined by a "seismic shift" in scale. Where previous models were trained on millions of data points, today's foundation models are built with billions of parameters, allowing them to grasp the complex, unwritten rules of chemistry and biology. This leap in capacity allows them to explore what experts call the "untapped chemical universe," where less than 0.1% of all potentially synthesizable small molecules have been explored.
Their true power lies in solving the multi-parameter puzzle of drug design simultaneously. Traditionally, optimizing one property, like solubility, could negatively impact another, like potency, in a frustrating process likened to solving a Rubik's cube. Foundation models, however, can balance these conflicting criteria in a single design cycle. They can be tasked to optimize for high binding affinity, low toxicity, good metabolic stability, and, crucially, high synthetic feasibility all at once, generating blueprints for molecules that are holistically optimized from the start.
The Lab-in-the-Loop
This computational power plugs directly into the "Lab-in-the-Loop" paradigm, an intelligent, iterative cycle that aims to shorten the path from idea to discovery. The process is a continuous Design-Make-Test-Analyze (DMTA) feedback loop:
Design: A generative AI foundation model designs novel molecules with desired properties, including a high likelihood of being synthesizable.
Make: The most promising candidates are synthesized, often using robotic automation to increase speed and throughput. This is the step where reliable synthesizability assessment is non-negotiable; if the molecule can't be made, the loop breaks.
Test: The newly synthesized compounds are put through a battery of real-world biological assays. This can range from measuring binding affinity to using advanced biomedical imaging to see how cells respond to the compound.
Analyze & refine: The experimental results, the "oracles" or feedback, are fed directly back into the foundation model. This real-world data allows the AI to learn from its successes and failures, continuously sharpening its predictions and guiding the next cycle of molecular design.
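The four steps can be sketched as a loop in a few lines. Everything below is a deliberately crude stand-in: "molecules" are just integers, the synthesizability check and the assay are toy functions, and "learning" means re-centering the next design round on the best tested compound. It is meant to show the flow of information only, not a realistic pipeline.

```python
def design(center, n=4):
    """Design: propose candidates around the current best idea."""
    return [center + i for i in range(n)]

def make(candidates, is_synthesizable):
    """Make: only candidates that can actually be synthesized move on."""
    return [c for c in candidates if is_synthesizable(c)]

def run_assays(compounds, assay):
    """Test: measure every synthesized compound."""
    return {c: assay(c) for c in compounds}

def analyze(center, results):
    """Analyze: feed results back; with no results there is nothing to learn."""
    if not results:
        return center
    return max(results, key=results.get)

def dmta(cycles, center=0,
         is_synthesizable=lambda c: c % 2 == 0,   # toy rule: even numbers are makeable
         assay=lambda c: -abs(c - 10)):           # toy assay: best compound is 10
    history = []
    for _ in range(cycles):
        candidates = design(center)
        compounds = make(candidates, is_synthesizable)
        results = run_assays(compounds, assay)
        center = analyze(center, results)
        history.append(center)
    return history

print(dmta(5))
print(dmta(3, is_synthesizable=lambda c: False))
```

When nothing can be made, the second run never moves off its starting point: no compounds means no data, and no data means the model has nothing to learn from.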
Accurate, reliable, and steerable synthesizability assessment is the critical gear that connects the AI's "design" phase to the lab's "make" phase. Without a high degree of confidence that a proposed molecule can be created, the entire resource-intensive cycle cannot begin. The methods discussed earlier, from retrosynthesis-informed scores to generative models that output full synthetic pathways, provide exactly this confidence, ensuring that the AI is proposing hypotheses that are not just computationally interesting, but experimentally testable.
Conclusions
The journey to address the synthesizability challenge has shown promising developments. We've progressed from simple, post-hoc heuristics like SAscore, which offered a basic first filter, to more nuanced approaches like FSscore and Leap that aim to incorporate human expertise and available resources. More recently, systems like SynFormer and SynCoGen have emerged, where synthesis planning is integrated into the design process, though their real-world effectiveness remains to be fully validated.
This represents a methodological shift from filtering problematic candidates during post-processing toward generating more viable options from the start. However, significant challenges remain in translating computational predictions into laboratory success.
Synthesizability is increasingly recognized not merely as a constraint but as a parameter that chemists might actively influence, potentially guiding generative models to utilize specific reaction types and leverage available building blocks. The future of AI in drug design could be one where computationally generated molecules come with proposed synthetic routes, though these will still require expert validation and refinement. This growing reliability may support the development of more efficient "Lab-in-the-Loop" systems, potentially shortening the path from digital hypothesis to experimental validation, provided these computational approaches continue to demonstrate consistent accuracy in real-world synthesis settings.
References:
Grisoni, F., Huisman, B. J. H., Button, A. L., Moret, M., Atz, K., Merk, D., & Schneider, G. (2021). Combining generative artificial intelligence and on-chip synthesis for de novo drug design. Science Advances, 7(24). https://doi.org/10.1126/sciadv.abg3338
Genheden, S., Thakkar, A., Chadimová, V. et al. (2020) AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning. J Cheminform, 12, 70. https://doi.org/10.1186/s13321-020-00472-1
Skoraczyński, G., Kitlas, M., Miasojedow, B., & Gambin, A. (2023). Critical assessment of synthetic accessibility scores in computer-assisted synthesis planning. Journal of Cheminformatics, 15(6). https://doi.org/10.1186/s13321-023-00678-z
Liu, S., Zhang, D., Tu, Z., Dai, H., & Liu, P. (2025). Evaluating Molecule Synthesizability via Retrosynthetic Planning and Reaction Prediction. arXiv:2411.08306v2.
Neeser, R. M. (2024). FSscore: A Personalized Machine Learning-Based Synthetic Feasibility Score. arXiv:2312.12737v2. https://doi.org/10.48550/arXiv.2312.12737
Laws, L. (2025, May 5). How foundation models are redefining small molecule drug discovery. DiscoverPharma. https://discover-pharma.com/exclusive-interview-how-foundation-models-are-redefining-small-molecule-drug-discovery/
Motente, M., & Chude-Okonkwo, A. K. (2025). Integrating Synthetic Accessibility Scoring and AI-Based Retrosynthesis Analysis. Drugs Drug Candidates, 4(2), 26. https://doi.org/10.3390/ddc4020026
NVIDIA. (n.d.). Lab-in-the-Loop AI for Life Science: Shorten the path from hypothesis to breakthrough by engineering biological intelligence with feedback from the lab. https://www.nvidia.com/en-us/use-cases/lab-in-the-loop-ai-for-life-science/
Calvi, A., Gaudin, T., Miketa, D., Sydow, D., & Wilbraham, L. (2024). Leap: Molecular Synthesizability Scoring with Intermediates. arXiv:2403.13005v2. https://doi.org/10.48550/arXiv.2403.13005
Martínez León, A., Ries, B., Hub, J. S., & Magarkar, A. (2025). Moldrug algorithm for an automated ligand binding site exploration by 3D aware molecular enumerations. Journal of Cheminformatics, 17(85). https://doi.org/10.1186/s13321-025-01022-3
Polykovskiy, D., Zhebrak, A., Sanchez-Lengeling, B., Golovanov, S., Tatanov, O., Belyaev, S., Kurbanov, R., Artamonov, A., Aladinskiy, V., Veselov, M., Kadurin, A., Johansson, S., Chen, H., Nikolenko, S., Aspuru-Guzik, A., & Zhavoronkov, A. (2020). MOSES: A Benchmarking Platform for Molecular Generation Models. Frontiers in Pharmacology, 11. https://doi.org/10.3389/fphar.2020.565644
Thakkar, A., Chadimová, V., Bjerrum, E. J., Engkvist, O., & Reymond, J.-L. (2021). Retrosynthetic accessibility score (RAscore) - rapid machine learned synthesizability classification from AI driven retrosynthetic planning. Chemical Science, 12(9), 3339-3349. https://doi.org/10.1039/d0sc05401a
Guo, J., Sabanza-Gil, V., Jončev, Z., Luterbacher, J. S., & Schwaller, P. (2025). Generative Molecular Design with Steerable and Granular Synthesizability Control. arXiv:2505.08774v1. https://doi.org/10.48550/arXiv.2505.08774
Guo, J., & Schwaller, P. (2025). Directly optimizing for synthesizability in generative molecular design using retrosynthesis models. Chemical Science, 16(16), 6943-6956. https://doi.org/10.1039/d5sc01476j
Parrot, M., Tajmouati, H., da Silva, V. B. R., & Gambin, A. (2023). Integrating synthetic accessibility with AI-based generative drug design. Journal of Cheminformatics, 15(83). https://doi.org/10.1186/s13321-023-00742-8
Rekesh, A., Cretu, M., Shevchuk, D., Somnath, V. R., Liò, P., Batey, R. A., Tyers, M., Koziarski, M., & Liu, C.-H. (2025). SynCoGen: Synthesizable 3D Molecule Generation via Joint Reaction and Coordinate Modeling. arXiv:2507.11818v1. https://doi.org/10.48550/arXiv.2507.11818
Cretu, M., Harris, C., Igashov, I., Schneuing, A., Segler, M., Correia, B., Roy, J., Bengio, E., & Liò, P. (2025). SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints. arXiv:2405.01155v3. https://doi.org/10.48550/arXiv.2405.01155
Gao, W., & Luo, S. (2024). Generative artificial intelligence for navigating synthesizable chemical space. arXiv:2410.03494v1. https://doi.org/10.48550/arXiv.2410.03494
Swanson, K., Liu, G., Catacutan, D. B., McLellan, S., Arnold, A., Tu, M. M., Brown, E. D., Zou, J., & Stokes, J. M. (2025). SyntheMol-RL: a flexible reinforcement learning framework for designing novel and synthesizable antibiotics. https://doi.org/10.1101/2025.05.17.654017
Gao, W., & Coley, C. W. (2020). The Synthesizability of Molecules Proposed by Generative Models. Journal of Chemical Information and Modeling, 60(12). https://pubs.acs.org/doi/10.1021/acs.jcim.0c00174
Malikussaid, & Nuha, H. H. (2025). VALID-Mol: a Systematic Framework for Validated LLM-Assisted Molecular Design. arXiv:2506.23339v1. https://doi.org/10.48550/arXiv.2506.23339
Research Fellow in Molecular Modelling/Computational Chemistry
1mo
Spoiler: it doesn't work much, and when it does it's pretty much something that existed in the training data or nothing that a trivial combinatorial algorithm could do in a 1/100th of the time.
PhD | Digital Drug Hunter | 8 years in industrial & academic Computer-Aided Drug Design (CADD) | Computational Chemistry | Generative AI for Drug Discovery
1mo
Part 5 has been published here: https://www.linkedin.com/pulse/generative-molecular-design-pharma-rd-pragmatic-usecase-serhii-vakal-nicgf
Founder & Chief Architect at ERSTLING Quantum Innovations (Coherion™, ERSTLING Matterworks®, ERSTLING Therapeutics®, Algotronic™, AstroSight™)
1mo
I’d like to gently offer a new perspective. Most of the AI platforms today (Insilico included) are fundamentally statistical: generative, correlational, and heavily reliant on training data. But biology isn't probabilistic at its core. It's governed by deterministic physical and informational constraints. And that's where we come in. At Coherion, we use a quantum-executed validation model (no qubits, no simulation) to collapse biological and chemical systems into their most probable outcome, instantly. It's not generative. It's solved. For example, here’s a real quantum validation we ran on a failed kinase inhibitor: ⚡ Binding free energy: -8.71 kcal/mol ⚡ Pose deviation: 0.34 Å ⚡ ADMET pass, no red flags ⚡ Repositioned indication: Acute Myeloid Leukemia (FLT3 mutant) ⚡ Turnaround: 3 minutes. This is a recovered, validated, IND-ready asset, and we can do up to 50/day. The future isn't "AI-assisted guesswork." It’s quantum-deterministic resolution.
Novartis Leading Scientist. Computer-Aided Drug Design (CADD)
1mo
I am always surprised by the current avalanche of narrative about AI applied to drug design. It seems that we transition from prehistoric ages to a brave new world. No, rational drug design existed and was efficient before the advent of what is called AI (to be precisely defined). No, the consideration of synthetic accessibility was always of major importance and was at the heart of the collaboration between CADD scientists and medicinal chemists before the advent of AI. Drug design is far from being a matter of algorithms.