🌪️CYCLONE: RNAseq Contrastive Learning Framework🧬, ⚡PoweREST: Statistical Power Estimation Framework😴, PuMA: PubMed Gene/Cell Type-Relation Atlas🗺

Stay Updated with the Latest in Bioinformatics!

Issue: 97 | Date: 01 August 2025

👋 Welcome to the Bioinformer Weekly Roundup!

In this newsletter, we curate and bring you the most captivating stories, developments, and breakthroughs from the world of bioinformatics. Whether you are a seasoned researcher, a student, or simply curious about the intersection of biology and data science, we have got you covered. Subscribe now to stay ahead in the exciting realm of Bioinformatics!

🔬 Featured Research

Incorporating exon–exon junction reads enhances differential splicing detection | BMC Bioinformatics

This article presents Differential Exon-Junction Usage (DEJU), a workflow that improves the detection of alternative splicing events in RNA-seq data. Unlike traditional methods that rely solely on exon-level read counts, DEJU integrates exon–exon junction reads to improve statistical power and accuracy. The authors benchmarked DEJU against existing tools using simulations and real datasets, showing that it detects more biologically meaningful splicing events while controlling the false discovery rate. Overall, DEJU offers a more precise, efficient, and flexible approach to differential splicing analysis in bulk RNA-seq experiments.

Accurate human genome analysis with element avidity sequencing | BMC Bioinformatics

This article evaluates Element avidity sequencing, a sequencing technology that offers improved accuracy in human genome analysis. It reports strong mapping and variant-calling performance, especially at lower coverage levels (20–30x). The technology also shows lower base error rates, particularly in challenging regions such as homopolymers and tandem repeats. Its support for longer insert sizes further improves accuracy across all coverage levels.

Deficiency in POLE exonuclease causes synthetic lethality in highly aneuploid cancer cells | bioRxiv

This preprint reports that POLE exonuclease deficiency leads to synthetic lethality in highly aneuploid cancer cells. By analyzing nearly half a million tumor samples, the study found strong mutual exclusivity between POLE mutations and a high aneuploidy burden. Modeling and experiments showed that POLE deficiency increases the chance of inactivating essential genes on lost chromosome arms, making aneuploid cells vulnerable. These findings suggest that targeting POLE exonuclease could be a promising strategy for treating highly aneuploid tumours.

Generating realistic artificial human genomes using adversarial autoencoders | Oxford Academic

This article describes a method for generating realistic artificial human genomes using adversarial autoencoders combined with domain knowledge of mutation mechanisms. The approach reduces data dimensionality and mimics chromosomal recombination hotspots to reflect natural mutation transmission. Using data from the 1000 Genomes Project, the model trains variational autoencoders with a Wasserstein GAN to produce synthetic genomes that preserve linkage disequilibrium. The resulting genomes are diverse, realistic, and anonymized, ensuring no individual’s identity is revealed.

Circulating microRNAs Track Circulating Tumor Cells in Metastatic Colorectal Cancer | Technology Networks

The study investigates whether specific circulating microRNA (miRNA) profiles can distinguish between patients with and without circulating tumor cells (CTCs) in metastatic colorectal cancer. Blood samples were analysed using immunomagnetic CTC detection and miRNA microarrays. Distinct miRNA signatures were identified for CTC-positive and CTC-negative patients. The findings suggest potential for miRNAs as non-invasive biomarkers for metastasis monitoring.

Identification of differentially expressed immune-related genes in patients with systemic lupus erythematosus and the development of a hub gene-based diagnostic model | European Journal of Medical Research

This research analyses gene expression data from systemic lupus erythematosus (SLE) patients to identify immune-related differentially expressed genes. Using WGCNA and validation via RT-qPCR, nine hub genes were highlighted and linked to immune cell infiltration and signaling pathways. A diagnostic model using three of these genes showed high accuracy in distinguishing SLE from healthy controls.

Whole genome mutagenicity evaluation using Hawk-Seq™ demonstrates high inter-laboratory reproducibility and concordance with the transgenic rodent gene mutation assay | Genes and Environment

The study evaluates Hawk-Seq™, an error-corrected next-generation sequencing method, for its reproducibility and concordance with the transgenic rodent gene mutation assay. Genomic DNA from mutagen-exposed mice was analysed across three laboratories. Results showed consistent mutation patterns and high correlation in base substitution frequencies, supporting Hawk-Seq™ as a reproducible tool for mutagenicity assessment.

Temporal multiomics gene expression data of human embryonic stem cell-derived cardiomyocyte differentiation | Scientific Data

This dataset captures gene expression, translation, and protein levels during cardiomyocyte differentiation from human embryonic stem cells across ten time points. Techniques used include mRNA-seq, ribosome profiling, and mass spectrometry. The data provide insights into regulatory mechanisms during differentiation and serve as a resource for studying early human development.

Drug-target interaction prediction based on graph convolutional autoencoder with dynamic weighting residual GCN | BMC Bioinformatics

This study addresses limitations in current graph convolutional network (GCN)-based methods for drug-target interaction (DTI) prediction. It highlights the use of shallow networks and lack of guided training as key challenges. The proposed approach leverages deeper graph structures and improved training mechanisms to enhance semantic extraction and network representation for more accurate DTI predictions.

Metabolite-mediated interactions and direct contact between Fusobacterium varium and Faecalibacterium prausnitzii | Microbiome

The research explores interactions between Fusobacterium varium and Faecalibacterium prausnitzii in the human gut. It reveals that F. varium growth is inhibited by acidic conditions and β-hydroxybutyric acid produced by F. prausnitzii, while F. prausnitzii growth is promoted via direct contact with F. varium. These findings underscore the role of metabolite exchange and physical interaction in shaping microbial community dynamics.

High-Resolution Ultrasound Data for AI-Based Segmentation in Mouse Brain Tumor | Scientific Data

This study introduces a publicly available ultrasound image dataset for GL261 glioblastoma segmentation in mouse models. Comprising 1,856 annotated images, the dataset supports automated tumor boundary detection during surgery. It aims to facilitate AI-driven segmentation and improve preclinical modeling of glioblastoma resection techniques, bridging experimental and clinical research.

🛠️ Latest Tools

LDA-SCGB: inferring lncRNA-disease associations based on condensed gradient boosting | BMC Bioinformatics

LDA-SCGB is a computational model designed to infer lncRNA-disease associations. It uses singular value decomposition for feature extraction and condensed gradient boosting for classification. The model was evaluated on three datasets and applied to predict lncRNAs linked to colorectal cancer, heart failure, and lung adenocarcinoma.

Source code is available here.

CYCLONE: recycle contrastive learning for integrating single-cell gene expression data | BMC Bioinformatics

CYCLONE introduces a recycle contrastive learning framework for integrating single-cell RNA-seq data across batches. It combines a contrastive learning network with a variational autoencoder to jointly train low-dimensional representations. The method iteratively refines mutual nearest neighbour (MNN) pairs and augments them with KNN pairs to identify batch-specific cell types. Evaluations on simulated and real datasets demonstrate effective batch effect removal while preserving biologically relevant batch-specific signals.

Source code is available here and the datasets are available here.
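For readers unfamiliar with mutual nearest neighbour (MNN) pairs, the idea can be sketched in a few lines of plain Python. This is a minimal illustration of the general MNN concept only, not CYCLONE's implementation: the function names, the toy 2D "cells", and the choice of Euclidean distance are all assumptions made for the example.

```python
import math


def k_nearest(query, reference, k):
    """For each point in `query`, return the indices of its k nearest points in `reference`."""
    neighbours = []
    for q in query:
        ranked = sorted(range(len(reference)), key=lambda j: math.dist(q, reference[j]))
        neighbours.append(set(ranked[:k]))
    return neighbours


def mutual_nearest_neighbours(batch_a, batch_b, k=1):
    """Return (i, j) pairs where a[i] and b[j] are each among the other's k nearest neighbours."""
    a_to_b = k_nearest(batch_a, batch_b, k)
    b_to_a = k_nearest(batch_b, batch_a, k)
    return sorted(
        (i, j)
        for i, nbrs in enumerate(a_to_b)
        for j in nbrs
        if i in b_to_a[j]
    )


# Toy low-dimensional embeddings (hypothetical values, two batches).
batch_a = [(0.0, 0.0), (5.0, 5.0), (20.0, 20.0)]
batch_b = [(0.5, 0.0), (5.0, 5.5)]

pairs = mutual_nearest_neighbours(batch_a, batch_b, k=1)
print(pairs)  # [(0, 0), (1, 1)]
```

In this toy case the third cell of `batch_a` has no mutual partner in `batch_b`, which is the kind of cell a method might treat as a candidate batch-specific population.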

Phylo-rs: an extensible phylogenetic analysis library in rust | BMC Bioinformatics

Phylo-rs is a general-purpose phylogenetic analysis library developed in Rust, designed for speed, memory efficiency, and WebAssembly compatibility. It provides scalable data structures and algorithms for large-scale phylogenetic inference. The library was applied to analyse influenza A virus diversity in swine and to visualize tree space from MCMC Bayesian analyses, computing billions of tree pair distances to assess convergence. Phylo-rs is open-source and aims to support the development of advanced phylogenetic tools.

Source code is available here.

Geometry-Complete Latent Diffusion Model for 3D Molecule Generation | Oxford Academic

GCLDM introduces a geometry-complete latent diffusion model for 3D molecule generation, integrating an autoencoder that maps features between atom space and latent space. The model supports continuous latent representations, enabling the learning of multi-modal molecular distributions. It aims to improve the fidelity of generated molecular structures by better fitting the true data distribution. Comparative experiments demonstrate its performance across various benchmarks.

Source code and data are provided here.

2OMe-LM: predicting 2’-O-methylation sites in human RNA using a pre-trained RNA language model | Oxford Academic

2OMe-LM is a deep learning framework designed to predict 2′-O-methylation (2OMe) sites in human RNA. It integrates features from RNA pre-trained language models and word2vec embeddings, processed via bidirectional LSTM and fully connected layers. A feature fusion module and attention block enhance prediction accuracy and interpretability. The model outperforms existing predictors and supports motif discovery related to 2OMe. A web server and source code are publicly available for use and further development.

The source code can be obtained from here.

PoweREST: Statistical power estimation for spatial transcriptomics experiments to detect differentially expressed genes between two conditions | PLOS Computational Biology

The study introduces PoweREST, a statistical power estimation framework tailored for spatial transcriptomics (ST) experiments. It addresses the challenge of detecting differentially expressed genes (DEGs) between conditions, especially given the high cost and complexity of ST data generation. PoweREST supports power calculations both before and after data collection, incorporating factors like spatial gene expression, log-fold changes, detection rates, and replicate numbers. It includes a user-friendly web application for interactive visualization and planning, helping researchers optimize experimental design and resource allocation.

Code is available here.

bmdrc: Python package for quantifying phenotypes from chemical exposures with benchmark dose modeling | PLOS Computational Biology

The bmdrc Python package is designed to quantify phenotypes resulting from chemical exposures using benchmark dose (BMD) modeling. It supports analysis of proportional response data—such as morphological or behavioural changes in organisms—across varying chemical concentrations. Built to align with EPA guidelines, bmdrc includes model fitting, filtering steps, and visualization tools to ensure reproducibility and clarity in toxicological assessments. It is open-source and integrates with platforms like SRP, making it a valuable resource for researchers studying chemical risk and dose-response relationships.

The source code is available here.

PuMA: PubMed gene/cell type-relation Atlas | BMC Bioinformatics

PuMA (PubMed Gene/Cell type-Relation Atlas) is a software framework designed to support literature-driven cell type annotation, especially in single-cell RNA-sequencing (scRNA-seq) analysis. It uses a pretrained machine learning-based named entity recognition model to extract gene and cell type concepts from PubMed articles and links them via biomedical ontologies. PuMA provides a local, user-friendly web interface with interactive graph visualizations, enabling exploration of gene-cell type relationships and traceability to original literature sources. It complements manually curated databases by offering automated, regularly updated insights from the latest research.

The source code is available here.

DiffCoRank: a comprehensive framework for discovering hub genes and differential gene co-expression in brain implant-associated tissue responses | BMC Bioinformatics

DiffCoRank is a computational framework developed to identify hub genes and differential gene co-expression patterns in tissue responses to brain implants. It integrates RNA-Seq preprocessing, gene filtering, correlation-based module detection, and network analysis to uncover strongly connected genes (SCGs) using false discovery rate (FDR) thresholds. A hybrid clustering method combining UMAP and DBSCAN enhances module identification. The framework ranks hub genes using multiple centrality metrics and reveals key biological pathways—such as oxidative stress, calcium signaling, autophagy, immune activation, and vascular remodelling—linked to implant-tissue interactions. DiffCoRank also includes a user-friendly visualization tool for interactive exploration of results.

The source code can be found here.
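As a rough illustration of what ranking hub genes by a centrality metric means, the sketch below computes degree centrality on a tiny co-expression network. It is not DiffCoRank's code: the gene labels, the edge list, and the use of degree centrality alone (the summary mentions multiple metrics without naming them) are assumptions for the example.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Fraction of the other nodes each node is connected to, from an undirected edge list."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbours[u].add(v)
            neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


def rank_hubs(edges, top=3):
    """Return the `top` nodes by degree centrality, ties broken alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Hypothetical co-expression edges (placeholder gene labels, not real results).
edges = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneA", "geneD"),
    ("geneB", "geneC"),
]

hubs = rank_hubs(edges, top=2)
print(hubs)
```

Here `geneA` ranks first because it is connected to every other gene in the toy network. A framework that combines several centrality metrics would aggregate rankings like this one across metrics.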

📰 Community News

Scientists create first artificial cell that moves on its own by chemical reactions | Interesting Engineering

Researchers have developed an artificial cell that moves autonomously through internal chemical reactions, without relying on external forces or mechanical components. The system uses a chemical gradient to generate motion, offering a framework for studying minimal cellular behaviour and chemical motility.

Kinase enzymes exist throughout tree of life—those found in bacteria may be vulnerable targets for new antibiotics | Phys.org

A comparative study found that kinase enzymes are conserved across diverse organisms, including bacteria. Structural analysis of bacterial kinases revealed features that may be exploited for therapeutic targeting. The findings contribute to understanding kinase evolution and potential antibiotic strategies.

Immunocompromised people are susceptible to skin cancer caused directly by beta-HPV | News-medical.net

This report covers research on the role of beta-human papillomavirus (beta-HPV) in skin cancer among immunocompromised individuals. Viral gene expression and immune suppression were examined as contributing factors. Results suggest a direct link between beta-HPV activity and tumor development in vulnerable populations.

Researchers investigate probiotics as alternative to formaldehyde in poultry hatcheries | Phys.org

Can probiotics replace formaldehyde in poultry hatcheries? Scientists explored this question by measuring hatchability, microbial composition, and chick health under probiotic treatment. The findings inform strategies for safer hatchery management.

Common Viruses May Wake Dormant Breast Cancer Cells, Study Finds | ScienceAlert

This study explores how common viral infections may reactivate dormant breast cancer cells. Experimental models showed that viral-induced inflammation could influence cellular reawakening. The research examines mechanisms of metastatic recurrence linked to viral exposure.

Inside a Cow's Stomach: The Microbial Magic Behind Milk | The Scientist

Ever wondered what fuels milk production in cows? Scientists are mapping the microbial communities in the bovine gut, revealing how fermentation and digestion support nutrient absorption and dairy output.

📅 Upcoming Events

BTEP: The Basics of Microbiome Analysis | NIH

This training covers the principles of microbiome analysis using 16S rRNA amplicon sequencing. It introduces tools like DADA2 for denoising and generating count tables from raw data. The session focuses on preprocessing techniques and sets the stage for downstream analysis in future modules.

3D Co-Cultures: Choosing the Best Systems & Analysis for Immuno-Oncology | labroots

This session explores methodologies for constructing 3D co-culture systems to better replicate in vivo cellular environments in immuno-oncology research. It discusses the integration of fibroblasts and immune cells into culture models and examines how their interactions influence treatment responses. Various 3D culture techniques are reviewed, with attention to their analytical compatibility and limitations. The presentation also outlines strategies for evaluating cell-type-specific responses within co-culture.

📚 Educational Corner

Bitwise Operators in Python | Real Python

This tutorial offers a deep dive into Python’s bitwise operators, explaining how to manipulate binary data at the bit level. It covers logical operators (AND, OR, XOR, NOT), shift operations, and bitmasks, with practical applications in compression, encryption, and hardware control. The guide also explores binary number systems, data representation, and operator overloading for custom types.
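To give a flavour of the operators the tutorial covers, here is a small sketch of bitmasks and shifts in Python. It is not taken from the tutorial: the flag names and the 2-bit DNA encoding are hypothetical, chosen only to illustrate the operators.

```python
# Hypothetical status flags, one bit each (illustrative names only).
FLAG_QC_PASSED = 0b0001
FLAG_TRIMMED = 0b0010
FLAG_FILTERED = 0b0100


def set_flag(flags: int, mask: int) -> int:
    """Turn on the bits in `mask` with bitwise OR."""
    return flags | mask


def clear_flag(flags: int, mask: int) -> int:
    """Turn off the bits in `mask` with AND and NOT."""
    return flags & ~mask


def toggle_flag(flags: int, mask: int) -> int:
    """Flip the bits in `mask` with XOR."""
    return flags ^ mask


def has_flag(flags: int, mask: int) -> bool:
    """Check that every bit in `mask` is set."""
    return (flags & mask) == mask


def pack_bases(seq: str) -> int:
    """Pack a DNA string into one integer, 2 bits per base, using left shifts."""
    codes = {"A": 0, "C": 1, "G": 2, "T": 3}
    packed = 0
    for base in seq:
        packed = (packed << 2) | codes[base]
    return packed


def unpack_bases(packed: int, length: int) -> str:
    """Recover the DNA string by shifting right and masking the low 2 bits."""
    letters = "ACGT"
    return "".join(
        letters[(packed >> (2 * (length - 1 - i))) & 0b11] for i in range(length)
    )


flags = set_flag(0, FLAG_QC_PASSED | FLAG_TRIMMED)
print(bin(flags))                      # 0b11
print(has_flag(flags, FLAG_FILTERED))  # False
print(pack_bases("ACGT"))              # 27
print(unpack_bases(27, 4))             # ACGT
```

The sequence length is passed to `unpack_bases` explicitly because leading `A` bases encode as zero bits and would otherwise be lost.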

Part I – Using R in Excel – Descriptive Statistics | R-bloggers

This blog introduces the ExcelRAddIn, enabling R script execution within Excel. It demonstrates loading and preparing data using Power Query, creating R data frames from Excel tables, and computing descriptive statistics using R functions. The workflow integrates R packages like tidyverse and ggplot2, with outputs structured for Excel compatibility.

Part IV – Using R in Excel – Calling Python | R-bloggers

This post explores integrating Python into Excel workflows via R using the reticulate package. It addresses limitations of Excel’s native Python support by routing Python scripts through R, enabling access to libraries like yfinance. The setup includes environment configuration, error handling, and execution of Python functions for financial data retrieval within Excel.

Let’s talk about NA-s! | R-bloggers

The post discusses handling missing values (NA) in R, outlining built-in strategies such as na.omit, na.exclude, and na.fail. It emphasizes explicit handling over global settings and illustrates how standard functions like mean() and median() incorporate na.rm parameters. The article also proposes enhancing function design to streamline NA management.

MCP Catalog: Finding the Right AI Tools for Your Project | Docker

This blog introduces the Model Context Protocol (MCP), a framework that enables AI agents to interact with external APIs securely and predictably. It outlines challenges in discovering and integrating MCP servers, including fragmented tool registries and complex configurations. Docker’s MCP Catalog and Toolkit aim to streamline this process by offering a centralized, secure, and portable environment for building and deploying AI-compatible tools.

Connect with Us

Stay connected and engage with us on social media for daily updates, discussions, and more!

📬 Subscribe

Don't miss an issue! Subscribe to the Bioinformer Weekly Roundup and receive the latest insights directly in your inbox.

Subscribe Now

We hope you enjoyed this week's edition of the Bioinformer Weekly Roundup. Feel free to share it with your colleagues and friends who share your passion for bioinformatics!


Disclaimer: The information provided in this newsletter is for educational and informational purposes only and does not constitute professional advice.

Contact: bioinformatics@zifornd.com

Copyright © 2025, Bioinformer Weekly Roundup. All rights reserved.

