🔗BInD: AI Model for Drug Design💊, 🌾Winnow-KAN: Single-cell RNA-seq Location Recovery📍, 🔁b-move: Lossless Pattern Matching for Genetic Diversity🧬
Stay Updated with the Latest in Bioinformatics!
Issue: 99 | Date: 15 August 2025
👋 Welcome to the Bioinformer Weekly Roundup!
In this newsletter, we curate and bring you the most captivating stories, developments, and breakthroughs from the world of bioinformatics. Whether you are a seasoned researcher, a student, or simply curious about the intersection of biology and data science, we have got you covered. Subscribe now to stay ahead in the exciting realm of Bioinformatics!
🔬 Featured Research
This study investigates liver-mediated mechanisms underlying growth retardation in piglets using transcriptomics, metabolomics, and ATAC-seq. Piglets with low birth and weaning weights exhibited hepatic vacuolation and structural lesions. Key pathways enriched include PPAR signalling, glutathione metabolism, and ferroptosis, with elevated GCLM and related metabolites linked to ROS scavenging and improved growth. Integrative analysis revealed transcription factor networks and differentially accessible regions associated with liver development and disease.
This study shows that milk-based feeding during weaning in mice enhances gut barrier integrity by enriching Dubosiella newyorkensis, which produces acetate to activate JNK2 signalling. Early intervention reduced inflammation, while delayed milk feeding or a high-fat diet shifted signalling to p38, disrupting the barrier. The effects were stage-specific and dependent on the microbiome’s transitional state.
This study finds that neutrophils from interferon-positive (IFNpos) systemic lupus erythematosus (SLE) patients show increased expression of multiple transposable element (TE) families alongside interferon-stimulated genes (ISGs). Most upregulated TEs were in introns of ISGs and correlated with partial intron retention and altered splicing. These changes were absent in interferon-negative patients, and expression of certain TE families tracked with disease activity. The results suggest a link between IFN responses, TE activation, and gene regulation in SLE neutrophils.
Single-cell RNA-seq of SIV-infected macaques revealed CD4⁺ T cell loss and aberrant expansion of B cells and CD16⁺ monocytes during chronic infection. After five years of ART, myeloid cell dysregulation persisted, with ribosomal pathways marking infection stage and treatment duration. Immune profiles correlated with proviral reservoir size.
Metagenomic analyses identified “Jug” phages—360–402 kb jumbo phages—in human and animal guts, making up 1.1% of human gut reads and likely infecting Bacteroides/Phocaeicola. Over 1,500 genomes revealed broad distribution, shared gene content, and signs of cross-host transmission. Jug phages showed high transcriptional activity, including a calcium-translocating ATPase not previously seen in phages.
This study introduces the individualized co-expression-like index (iCKI), which quantifies gene–gene interaction strength for each individual sample. A higher absolute iCKI value indicates stronger co-expression between a pair of genes. The authors applied iCKI to early prediction of rheumatoid arthritis and survival analysis in pancreatic cancer, finding that co-expression-based biomarkers outperformed single-gene markers. iCKI also supports diverse omics data—such as DNA methylation, protein, and metabolite levels—offering a flexible tool for individualized disease biomarker discovery.
The study aimed to evaluate UL16 binding protein 2 (ULBP2) as a therapeutic target in gastric cancer and assess the efficacy of ULBP2-directed CAR-T cells. ULBP2 overexpression was found to activate TGF-β signalling, promoting cancer-associated fibroblast activation and tumour progression. ULBP2 CAR-T cells demonstrated effective tumour elimination and enhanced survival in both cell-derived and patient-derived xenograft models, especially when combined with anti-PD-1 therapy.
The study aimed to identify and characterize novel extrachromosomal elements (ECEs) in the human oral microbiome. A new family of large circular ECEs, named “Inocles,” was discovered using long-read metagenomics from saliva samples. Inocles were present in 74% of individuals and encoded genes linked to stress tolerance and host interaction. Their abundance correlated with immune responses and was markedly reduced in patients with head and neck cancer and colorectal cancer, suggesting potential biomarker relevance.
This study aimed to characterize RNA-binding protein (RBP) expression and splicing patterns in HPV+ and HPV− cervical cancer using single-cell and bulk transcriptomic data. Key findings include heterogeneous RBP expression across cell types, identification of four RBPs (CSTB, TIPARP, NDRG1, NDRG2) linked to 25 alternative splicing events, and significant downregulation of TIPARP in HPV+ samples. These RBPs may serve as molecular markers and inform prognostic modelling in cervical cancer.
The study aimed to improve early detection of Alzheimer’s disease (AD) by identifying key MRI slices that reveal subtle anatomical changes associated with Early Mild Cognitive Impairment (EMCI). Using the ADNI-3 dataset and deep learning models integrated with Vision Transformers, the approach achieved high reported diagnostic accuracy (99.19% for AD vs. EMCI and 99.45% for AD vs. Late Mild Cognitive Impairment, LMCI), highlighting the effectiveness of targeted slice selection in enhancing early diagnosis.
The study mapped single-cell transcriptomes of Drosophila brains under axenic and microbiome-associated conditions across age groups. Profiling 34,427 cells revealed 56 cell types with microbiome-driven transcriptional shifts, especially in aged glial and dopaminergic neurons. Key pathways affected included mitochondrial activity and Notch signalling, with age-related microbiome changes correlating with heightened brain gene expression.
🛠️ Latest Tools
This article presents HUMESS, a computational framework that integrates transcriptomic analysis with metabolic modelling to identify condition-specific gene signatures. It enhances the interpretation of gene expression data by linking it to metabolic functions under varying conditions. The method improves the accuracy of metabolic flux predictions by incorporating quantitative omics data. HUMESS demonstrates strong performance in identifying biologically relevant gene signatures across diverse cellular environments.
The source code is available here.
This article presents Winnow-KAN, a deep learning framework designed to recover spatial locations of single-cell RNA-seq data using a minimal gene set. It leverages a modified Kolmogorov-Arnold Network to enhance prediction accuracy and interpretability while reducing redundancy in gene features. The model includes a selector layer that identifies the most informative genes for spatial mapping. Benchmarking across brain and cancer datasets shows Winnow-KAN outperforms traditional methods in both precision and efficiency.
The source code is available here.
This article presents HarmoDecon, a semi-supervised deep learning model designed to correct multi-scale biases in cell-type deconvolution for spatial transcriptomics. It addresses inaccuracies at the spot level, at the sample level, and between spatially resolved transcriptomics (SRT) and scRNA-seq datasets using pseudo-spots and Gaussian Mixture Graph Convolutional Networks. HarmoDecon outperforms 11 existing methods in simulations and real datasets, including STARmap, osmFISH, and 10x Visium. It shows high accuracy in spatial domain clustering and strong correlations with cancer markers in breast cancer samples.
The source code is available here.
The study introduces GFFx, a genome annotation toolkit built in Rust to address performance limitations in querying large-scale annotation files. It employs a compact, model-aware indexing system inspired by binning strategies to enable efficient feature- and region-based extraction. GFFx leverages Rust’s multithreading and memory safety features to enhance runtime and scalability. Benchmarking demonstrates substantial improvements over existing tools in handling hierarchical models and large genomes.
The source code is available here.
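To make the idea of a binning-style index concrete, here is a minimal Python sketch of region queries over annotation features. It is not GFFx's index or API: the flat fixed-width scheme, the `BIN_SIZE` value, and the `Feature` and `BinIndex` names are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

BIN_SIZE = 16_384  # hypothetical bin width in bases; not taken from GFFx


@dataclass(frozen=True)
class Feature:
    seqid: str
    start: int  # 1-based, inclusive (GFF convention)
    end: int    # 1-based, inclusive
    feature_id: str


class BinIndex:
    """Map (seqid, bin number) to the features overlapping that bin."""

    def __init__(self, features):
        self._bins = defaultdict(list)
        for feat in features:
            # Register the feature in every bin it touches.
            for b in range(feat.start // BIN_SIZE, feat.end // BIN_SIZE + 1):
                self._bins[(feat.seqid, b)].append(feat)

    def query(self, seqid, start, end):
        """Return features overlapping [start, end], sorted by position."""
        hits = set()
        # Only bins touched by the query region are inspected.
        for b in range(start // BIN_SIZE, end // BIN_SIZE + 1):
            for feat in self._bins.get((seqid, b), ()):
                if feat.start <= end and feat.end >= start:
                    hits.add(feat)
        return sorted(hits, key=lambda f: (f.start, f.end, f.feature_id))


features = [
    Feature("chr1", 1_000, 5_000, "geneA"),
    Feature("chr1", 20_000, 40_000, "geneB"),
    Feature("chr2", 1_000, 2_000, "geneC"),
]
index = BinIndex(features)
region_hits = [f.feature_id for f in index.query("chr1", 4_500, 21_000)]
print(region_hits)  # ['geneA', 'geneB']
```

The point of binning is that a query scans only the bins its region touches instead of the whole annotation file; production indexes typically use hierarchical bins so that very long features are not copied into many small bins.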
This work explores the development of foundation models for single-cell ATAC-seq data, drawing parallels with cell language models used in scRNA-seq. It outlines the unique challenges posed by scATAC-seq, including high dimensionality, sparse and near-binary data distributions, and lack of standardized annotations. The study emphasizes the need for specialized modelling approaches to capture chromatin accessibility patterns and support downstream epigenetic analyses.
The study proposes a novel approach for identifying PACE subtypes in Parkinson’s disease by integrating topological features derived from Parkinson-specific gene graphs (PGG) with gene expression data. Unlike traditional single-view machine learning models that rely on temporal RNA-seq features, this method incorporates structural information from gene and cell graphs. The integration aims to better capture molecular network alterations associated with disease heterogeneity.
The source code is available here.
The study presents a two-stage enhancer prediction framework combining epigenetic and sequence-based features. The first stage uses a Blending-KAN model that integrates multiple base classifiers with Kolmogorov-Arnold Networks to flexibly utilize epigenetic signals. The second stage employs DNABERT-2 for sequence feature extraction and a Stacking-Auto model for enhancer localization. The framework demonstrates high accuracy across cell lines and robustness to noise, outperforming existing methods.
The source code is available here.
This study benchmarks nine graph neural network (GNN) models across seven datasets to assess the impact of integrating causal features into graph classification tasks. It explores how causality-aware architectures enhance generalizability, predictive accuracy, and robustness compared to traditional GNNs. The research highlights the role of attention-based causal models in capturing complex dependencies and improving performance in large-scale graph classification. It also discusses the adaptability of causal models to multi-class datasets and the importance of hyperparameter tuning in model optimization.
The source code is available here.
DualNetM: an adaptive dual network framework for inferring functional-oriented markers | BMC Biology
The paper presents DualNetM, a deep learning framework that integrates gene co-expression and regulatory networks from single-cell data to infer functional markers. It employs adaptive attention mechanisms and bidirectional graph neural networks to construct gene regulatory networks and identify biologically relevant markers. Benchmark comparisons demonstrate its performance across multiple datasets.
The source code is available here.
This work introduces b-move, a bidirectional extension of the move structure for efficient pattern matching in run-length compressed data. The algorithm achieves faster character extensions and improved cache efficiency compared to existing methods, narrowing the performance gap with FM-index-based alternatives. It supports both forward and backward searches for locating occurrences in compressed sequences.
The source code is available here.
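For readers new to this family of indexes, the sketch below shows the FM-index-style baseline that the summary compares against: backward search over a Burrows-Wheeler transform (BWT). It is not the move structure or b-move itself, and it uses naive rank queries that only suit tiny inputs; the function names are illustrative assumptions.

```python
from itertools import groupby


def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (tiny inputs only)."""
    text += "$"  # sentinel, assumed absent from text and smaller than any symbol
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_text, pattern):
    """Count pattern occurrences by backward search over the BWT."""
    # first[c] = number of symbols in the text strictly smaller than c
    first, total = {}, 0
    for c in sorted(set(bwt_text)):
        first[c] = total
        total += bwt_text.count(c)

    lo, hi = 0, len(bwt_text)  # half-open range of matching sorted rotations
    for c in reversed(pattern):  # extend the match one symbol to the left
        if c not in first:
            return 0
        lo = first[c] + bwt_text[:lo].count(c)  # naive rank query
        hi = first[c] + bwt_text[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


def run_lengths(bwt_text):
    """Run-length encode the BWT as (symbol, run length) pairs."""
    return [(c, len(list(g))) for c, g in groupby(bwt_text)]


transformed = bwt("ACGACGTACG")
print(count_occurrences(transformed, "ACG"))
print(run_lengths(transformed))
```

Repetitive collections such as pan-genomes tend to produce a BWT with few, long runs, which is why indexes that work directly on the run-length compressed form are attractive for genetic diversity data.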
MEGDTA is a computational model that predicts drug-target affinity by integrating molecular graphs, fingerprints, and protein 3D structures using ensemble graph neural networks. It processes drug and protein features through parallel architectures and fuses them via cross-attention mechanisms. Evaluations on benchmark datasets demonstrate its capability to extract diverse structural features for affinity prediction.
The source code is available here.
📰 Community News
Researchers at McMaster University identified a new antifungal class, coniotins, derived from Coniochaeta hoffmannii. These molecules target fungal cell walls and show potent activity against Candida auris without harming human cells. The discovery was enabled by prefractionation screening and metabolomics, revealing previously hidden bioactive compounds.
A University of Pennsylvania study introduced two strategies to reduce inflammation from lipid nanoparticle (LNP)-mediated RNA delivery: a biodegradable lipid (4A3-SC8) that limits endosomal rupture and thiodigalactoside (TG), a galectin-blocking drug. These approaches improved mRNA delivery in a mouse model of ARDS, suggesting broader therapeutic potential.
KAIST researchers developed BInD, an AI model that designs drug candidates using only protein structure data. It integrates molecular generation and binding prediction in a single step, optimizing for binding affinity, drug-likeness, and stability. The model uses a diffusion-based approach informed by chemical laws and demonstrated efficacy against EGFR mutations.
Tahoe Therapeutics raised $30M to build a dataset of 1 billion single-cell profiles and 1 million drug-patient interactions. The data, generated using Parse Biosciences and Ultima Genomics platforms, aims to train AI-based virtual cell models for drug development, with plans to share the dataset with a single partner for clinical translation.
Whole-genome sequencing of 490,640 UK Biobank participants revealed over 1 billion SNPs and 101 million indels, significantly surpassing previous exome and array-based datasets. The data uncovered thousands of novel trait associations, especially involving rare and noncoding variants, enhancing precision medicine and population genetics research.
Spatial transcriptomics is gaining interest for clinical trials, offering insights into tissue architecture and cell interactions. However, challenges include high assay costs, complex workflows, and data standardization. Researchers are exploring multiplex immunofluorescence as a more feasible clinical translation pathway.
📅 Upcoming Events
The EMBL Conference on Protein Synthesis and Translational Control will be held on 3–7 September 2025 at EMBL Heidelberg, with a hybrid format allowing virtual attendance. The conference explores the molecular mechanisms of translation, its regulation, and its role in health and disease. Topics include translation initiation, elongation, termination, quality control, emerging technologies, and the impact of translation dysregulation in conditions like cancer and viral infections. The event features keynote speakers, interactive sessions, and recorded talks available on demand for registered participants.
This workshop explores the application of artificial intelligence in clinical decision support for heart, lung, blood, and sleep disorders. Topics include machine learning, natural language processing, and computer vision, with a focus on improving diagnosis, treatment personalization, and healthcare delivery. The event will feature expert talks, panel discussions, and opportunities for collaboration across disciplines.
📚 Educational Corner
The article demonstrates how to enable URL-based bookmarking in Shiny apps using the enableBookmarking = "url" setting. It walks through a practical example using the golem framework, showing how user inputs can be reflected in the URL for reproducibility and sharing. The implementation includes server-side logic and UI integration.
This post explores the use of the {shinytesters} package to simulate input updates in Shiny tests. It addresses challenges with observer-triggered input changes and shows how mocking update functions improves test accuracy. Examples illustrate how to track input attributes like minimum and maximum values during automated testing.
The article discusses strategies for handling missing data in predictive modelling, focusing on the medley package. It presents a case study on student retention using models tailored to different data availability scenarios. The approach avoids imputation by training multiple models and selecting the appropriate one based on observed data patterns.
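The routing idea, one model per data-availability scenario with no imputation, can be sketched in a few lines of Python. This is not the medley package's API (medley is an R package); the toy nearest-neighbour model, the feature names, and the data are all hypothetical.

```python
from statistics import mean


def observed(row):
    """Return the sorted tuple of feature names that are not missing."""
    return tuple(sorted(k for k, v in row.items() if v is not None))


class MeanByNearest:
    """Toy model: average the targets of the k nearest training rows."""

    def __init__(self, features, rows, targets, k=1):
        self.features, self.rows, self.targets, self.k = features, rows, targets, k

    def predict(self, row):
        def dist(other):
            return sum((row[f] - other[f]) ** 2 for f in self.features)

        ranked = sorted(zip(self.rows, self.targets), key=lambda rt: dist(rt[0]))
        return mean([t for _, t in ranked[: self.k]])


def fit_models(rows, targets, feature_sets):
    """Fit one model per feature set, using only rows complete for that set."""
    models = {}
    for features in feature_sets:
        usable = [(r, t) for r, t in zip(rows, targets)
                  if all(r[f] is not None for f in features)]
        if usable:
            xs, ys = zip(*usable)
            models[tuple(sorted(features))] = MeanByNearest(features, xs, ys)
    return models


def predict(models, row):
    """Route the row to the model whose features match its observed pattern."""
    return models[observed(row)].predict(row)


# Hypothetical retention data: 1 = retained, 0 = left.
rows = [
    {"entry_score": 60, "term1_gpa": 2.0},
    {"entry_score": 80, "term1_gpa": 3.5},
    {"entry_score": 70, "term1_gpa": None},  # GPA not yet available
    {"entry_score": 50, "term1_gpa": None},
]
targets = [0, 1, 1, 0]
models = fit_models(rows, targets, [("entry_score",), ("entry_score", "term1_gpa")])

early = predict(models, {"entry_score": 55, "term1_gpa": None})
later = predict(models, {"entry_score": 78, "term1_gpa": 3.4})
print(early, later)  # 0 1
```

In this sketch, a row whose missingness pattern has no matching model raises a `KeyError`; a real workflow would need a fallback, such as the model trained on the largest observed subset of features.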
This tutorial walks through building a Kotlin-based AI agent using Koog and Docker tools. It covers project setup, Docker Compose integration with model endpoints, and deployment using Docker Model Runner. The example centers on a recipe assistant and demonstrates how environment variables link services to AI models.
The article highlights new features in Python 3.14 RC1, including official support for free-threaded execution, deferred type annotations, and template string literals. It also introduces experimental JIT compilation and a new Windows installer. These updates aim to improve concurrency, performance, and developer experience.
This article explains mixin classes as a way to add reusable functionality to Python classes via multiple inheritance. It outlines best practices for naming and structuring mixins and provides examples of combining mixins with base classes. The focus is on modular design and code reuse without implying a hierarchical relationship.
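As a small illustration of the pattern, the sketch below combines two mixins with a base class. The class names (`JsonMixin`, `ReprMixin`, `Sample`, `AnnotatedSample`) are hypothetical and not taken from the article.

```python
import json


class JsonMixin:
    """Adds JSON serialisation to any class that keeps state in __dict__."""

    def to_json(self):
        return json.dumps(self.__dict__, sort_keys=True)


class ReprMixin:
    """Adds a readable repr built from the instance attributes."""

    def __repr__(self):
        fields = ", ".join(f"{k}={v!r}" for k, v in sorted(self.__dict__.items()))
        return f"{type(self).__name__}({fields})"


class Sample:
    """Plain base class; it knows nothing about the mixins."""

    def __init__(self, sample_id, tissue):
        self.sample_id = sample_id
        self.tissue = tissue


# Mixins are listed before the base class so their methods come first in the MRO.
class AnnotatedSample(JsonMixin, ReprMixin, Sample):
    pass


s = AnnotatedSample("S1", "liver")
print(s.to_json())  # {"sample_id": "S1", "tissue": "liver"}
print(repr(s))      # AnnotatedSample(sample_id='S1', tissue='liver')
```

Neither mixin defines `__init__` or holds state of its own, which is what lets them be attached to unrelated classes without implying an is-a relationship.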
We hope you enjoyed this week's edition of the Bioinformer Weekly Roundup. Feel free to share it with your colleagues and friends who share your passion for bioinformatics!
Disclaimer: The information provided in this newsletter is for educational and informational purposes only and does not constitute professional advice.
Contact: bioinformatics@zifornd.com
Copyright © 2025, Bioinformer Weekly Roundup. All rights reserved.