🍟 CRISPRware: Guide RNA Library Design🧬, 🧪hDNApipe: streamlining human genome analysis👤, 🦠MiCoDe: Microbiome Community Detection🔍
Stay Updated with the Latest in Bioinformatics!
Issue: 93 | Date: 04 July 2025
👋 Welcome to the Bioinformer Weekly Roundup!
In this newsletter, we curate and bring you the most captivating stories, developments, and breakthroughs from the world of bioinformatics. Whether you are a seasoned researcher, a student, or simply curious about the intersection of biology and data science, we have got you covered. Subscribe now to stay ahead in the exciting realm of Bioinformatics!
🔬 Featured Research
69.9-kb long inverted repeat increases genome instability in a strain of Lactobacillus crispatus | Oxford Academic
This study investigates a 69.9-kb inverted repeat in the genome of a Lactobacillus crispatus strain and its role in increasing genome instability. The repeat may facilitate recombination events or structural rearrangements, affecting genome integrity and possibly influencing strain-specific traits.
This research likely characterizes integrative and conjugative elements (ICEs) across Mollicutes, a class of wall-less bacteria. It may detail how ICEs contribute to horizontal gene transfer, genome plasticity, and adaptation, highlighting their structural diversity and evolutionary significance in shaping microbial genomes.
This study introduces MGFEA, a graph-based algorithm that infers metabolite levels from spatial and single-cell transcriptomic data. It integrates gene interaction and spatial graphs guided by genome-scale metabolic models to estimate metabolic fluxes. MGFEA improves inference accuracy by incorporating metabolome data and addresses limitations of prior models like scFEA and Compass.
The authors evaluate various phylogenomic methods for reconstructing microbial species trees, focusing on gene-level evolutionary histories and their impact on genome-wide phylogenies. They propose a visualization framework using low-dimensional tree space to identify outlier gene histories and improve species tree estimation. The approach aids in selecting gene sets for robust phylogenomic inference.
This research presents bulk2sc, a variational autoencoder model that generates synthetic single-cell RNA-seq data from bulk RNA-seq. It deconvolves pseudo-bulk datasets by learning cell-type distributions, enabling single-cell level insights from bulk data. The model is validated against real scRNA-seq data and offers a cost-effective alternative for disease studies.
The study benchmarks deep learning-based metagenomic binning tools, highlighting COMEBin and GenomeFace for their accuracy and speed. It emphasizes the effectiveness of multi-sample binning and embedding space partitioning for low-coverage datasets. The work provides standardized workflows for evaluating binning performance and improving MAG recovery.
Researchers identified three novel orders within the class Ca. Penumbrarchaeia of Thermoplasmatota using metagenomic mining and enrichments. These rare biosphere members exhibit unique gene content and potential roles in organic matter degradation in anoxic environments. The study highlights their functional novelty and habitat specificity.
Lineage-specific expansions of polinton-like viruses in photosynthetic cryptophytes | BMC Microbiome
Using long-read sequencing, the study uncovers over a thousand polinton-like viruses (PLVs) in cryptophyte genomes, particularly Rhodomonas lacustris. These PLVs show lineage-specific expansions and diverse replication strategies. The findings link PLVs to host-virus interactions and suggest their role as endogenous viral elements in freshwater protists.
This study analyses transcriptomic changes across multiple chicken tissues under acute heat stress. The liver shows significant differential gene expression, with fatty acid metabolism pathways playing a central role. Functional validation of FASN in hepatocytes confirms its involvement in mitigating heat-induced metabolic disruptions.
The authors sequenced and analysed chloroplast genomes of 25 Morus species, identifying conserved structures and SSR polymorphisms. Phylogenetic analyses grouped the species into three clades based on usage (leaf, fruit, wild). The study provides SSR markers for classification and insights into mulberry phylogeny.
🛠️ Latest Tools
2dSpAn-Auto provides two workflows—binary skeletonization (2dSpAn-Auto.b) and fuzzy skeletonization (2dSpAn-Auto.f)—to segment and quantify dendritic spines in 2D maximum intensity projection images. It extracts spine density and morphometry metrics (area, length, head width, neck widths) along with total dendrite length via automated batch processing with optional expert parameter tuning through a GUI. Validation across in vitro, ex vivo, and in vivo imaging demonstrates high accuracy and reproducibility under varying protocols. The open-source tool, released under GPL v3, addresses the need for fast, modality-agnostic spine analysis in neurological research and clinical studies.
The source code is available here.
LabOps introduces a self-hosted Free and Open-Source Software workflow that integrates tools for collaborative writing, instant messaging, data storage, and more, tailored for academic research labs. It provides ready-to-deploy YAML configurations for Mattermost, Nextcloud, Radicale, and OnlyOffice, enabling secure, customizable communication and resource sharing without proprietary constraints. The paper outlines adoption strategies, discusses limitations of FOSS versus commercial suites, and presents a case study of cross-lab collaboration. LabOps aims to enhance data sovereignty and long-term accessibility in research environments.
The source code is available here.
scRepertoire 2 is an R package update for integrated analysis of single-cell adaptive immune receptor sequencing alongside transcriptomic data. New features include expanded clonotype tracking workflows, diversity metrics, longitudinal and comparative visualization modules, and seamless compatibility with Seurat and SingleCellExperiment frameworks. Benchmarking shows an 85.1% speed improvement and 91.9% memory reduction over version 1 across diverse repertoire sizes. The toolkit, available under the MIT license via Bioconductor and GitHub, supports end-to-end immune profiling in health and disease studies.
The source code is available here.
CRISPRware is a locally installable tool for high-throughput design of guide RNA libraries targeting coding, noncoding, and translated genomic regions. It integrates modern on-target scoring algorithms (including deep learning–based predictors) and ensemble strategies, sensitive off-target search, and comprehensive annotations (gene, TSS, SNPs) for five CRISPR modalities (knockout, activation, inhibition, base editing, knockdown). The authors demonstrate genome-wide gRNA generation for six model organisms and host results via UCSC Genome Browser sessions. CRISPRware enhances flexibility and customizability over existing web portals.
The source code is available here.
MiCoDe is a user-friendly web application for unsupervised detection of microbial communities from taxonomic abundance data. It implements a Bayesian weighted stochastic block model tailored to address high dimensionality, compositionality, zero inflation, and nonlinearity inherent to microbiome sequencing. Users upload a CSV of taxa abundances (samples × taxa), select transformations, network estimation methods, and community numbers (with sensible defaults), and receive interactive network visualizations. The source R code is available on GitHub for local use.
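For readers unfamiliar with how a samples × taxa table is typically prepared for this kind of analysis, here is a minimal sketch of one common transformation for compositional count data, the centered log-ratio (CLR). This is an illustration only: the summary above does not say which transformations MiCoDe offers, and the table, the function name `clr_transform`, and the pseudocount of 0.5 are all assumptions made for the example.

```python
import math


def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumed value) keeps zero counts finite under log.
    """
    if not counts:
        raise ValueError("sample has no taxa")
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log is the same as dividing by the geometric mean.
    return [value - mean_log for value in logs]


# Hypothetical abundance table: keys are samples, list positions are taxa.
abundances = {
    "sample_1": [120, 0, 30, 5],
    "sample_2": [10, 45, 0, 0],
}

clr_table = {name: clr_transform(row) for name, row in abundances.items()}

for name, row in clr_table.items():
    print(name, [round(v, 3) for v in row])
```

Each transformed row sums to zero, which removes the arbitrary sequencing depth of a sample while preserving the relative ordering of taxa within it.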
AutoPM3 automates extraction of PM3 criterion evidence—variant co-occurrence in trans—from literature using open-source large language models. The pipeline comprises four modules: variant augmentation for alternative representations, a Retrieval-Augmented Generation system to locate relevant text passages, a TableLLM with Text2SQL for table parsing, and an evidence synthesizer. Evaluation on PM3-Bench (1,027 variant-publication pairs) shows improvements in variant hit rate and in trans identification over baseline methods. A Streamlit interface facilitates local deployment for clinical variant interpretation workflows.
The source code is available here.
AOP-helpFinder 3.0 extends previous text-mining tools by incorporating additional data sources and graph-based methods to identify stressor–event and event–event associations across molecular initiating events, key events, and adverse outcomes. It automatically annotates mined relationships with toxicological database entries (AOP-Wiki, KEGG, Reactome, DisGeNET, etc.) and offers interactive network visualization on the web server. The updated pipeline enhances integrative toxicology by streamlining adverse outcome pathway development directly from PubMed abstracts.
The source code is available here.
hDNApipe is an end-to-end pipeline for human genomic sequencing data that delivers variant calling (SNVs, INDELs, SVs, CNVs), annotation, and optional visualization through both a command-line interface and a Tkinter-based GUI. Distributed as a Docker container for easy setup, it supports WGS, WES, and targeted panels in germline and somatic contexts. Benchmarking against existing pipelines demonstrates competitive precision, sensitivity, and runtime efficiency. hDNApipe simplifies customization through parameter files and dual-mode operation, facilitating rapid genomic analysis deployment.
The source code is available here.
📰 Community News
Researchers sequenced 4,000-year-old genomes from skeletal remains in Chile, identifying Mycobacterium lepromatosis, a pathogen linked to severe forms of leprosy. The findings suggest leprosy was present in South America long before European contact, challenging previous assumptions about its historical spread.
A Stanford-led study in mice found that inhibiting the overactive LRRK2 enzyme, linked to a genetic form of Parkinson’s, may help preserve dopamine-producing neurons. The research highlights a potential therapeutic strategy for early-stage intervention.
A study profiled cerebrospinal fluid from ME/CFS patients and identified distinct neuroinflammatory protein signatures. These immune markers revealed subgroups within the disease, offering insights into its heterogeneous nature and potential diagnostic pathways.
MexOMICs Maps the Genetic and Social Landscape of Disease in Mexico | The Scientist
The MexOmics consortium is collecting genetic, clinical, and social data from twins and patients with lupus and Parkinson’s disease. By integrating functional genomics with community engagement, the initiative aims to understand how genetic and environmental factors shape disease in Mexico.
AlphaGenome is a new AI model developed to predict the regulatory impact of genetic variants across the genome. It integrates convolutional and transformer architectures to analyse long-range DNA sequences and supports variant interpretation across multiple molecular modalities.
Using single-cell RNA sequencing, researchers analysed immune cells from patients with juvenile idiopathic arthritis (JIA) and identified subtype-specific inflammatory profiles. The study revealed distinct cellular interactions and signalling pathways, contributing to improved classification and understanding of JIA pathogenesis.
📅 Upcoming Events
This training demonstrates practical applications of decision trees, survival analysis, and random forest models using R. It includes techniques for handling censored data and building predictive models, with a focus on statistical programming and interpretation of model outputs.
This event will focus on advances in microbiology, including infectious diseases, antimicrobial resistance, and microbiome diagnostics. Expected outcomes include insights into AI-driven genomics, machine learning for gene annotation, and multi-omics approaches for microbial community profiling. Sessions will also address microbial adaptation to climate change and environmental microbiology.
This webinar will present spectral flow cytometry as a tool for deep immune profiling to identify blood-based biomarkers in diseases like COVID-19 and Giant Cell Arteritis. It will highlight distinct cell surface markers and phenotypic shifts linked to disease severity, validated through scRNA-seq integration. The session aims to demonstrate the method’s value in translational immunology and non-invasive disease monitoring.
📚 Educational Corner
The article outlines best practices for validating Shiny applications in regulated domains such as healthcare, pharma, and finance. It emphasizes the importance of reproducibility, traceability, and documentation to meet compliance standards. Key recommendations include modular code design, separation of UI and logic, version control, and reproducible environments. Common pitfalls like hardcoded paths, global variables, and lack of testing are identified, with practical solutions offered to enhance reliability and maintainability.
The workshop introduces practical methods for integrating R with Excel, focusing on importing/exporting workbooks and performing typical Excel-based analyses within R. It covers data visualization, row and column operations, and reproducibility techniques. Led by a biostatistics PhD candidate, the session emphasizes public health and biomedical research applications. Participants receive resources for continued learning and can engage through Q&A segments.
The article explores performance optimization in R by comparing digit-counting implementations across languages, including R, Julia, and Fortran. It introduces the {quickr} package, which transpiles R code into Fortran to enhance execution speed. The author tests various approaches, highlighting differences in syntax, memory management, and computational efficiency. The post includes reflections on Fortran’s continued relevance in scientific computing and its integration with R for high-performance tasks.
The article evaluates local large language models (LLMs) for structured tool calling in agentic applications using Docker Model Runner. It documents manual and scaled testing across models under 10B parameters, including xLAM-2-8b-fc-r and watt-tool-8B. Key issues observed include premature tool invocation, incorrect tool selection, malformed arguments, and ignored responses. A leaderboard ranks models based on performance in a simulated shopping assistant scenario, highlighting variability in tool-handling capabilities among local LLMs.
Connect with Us
Stay connected and engage with us on social media for daily updates, discussions, and more!
📬 Subscribe
Don't miss an issue! Subscribe to the Bioinformer Weekly Roundup and receive the latest insights directly in your inbox.
We hope you enjoyed this week's edition of the Bioinformer Weekly Roundup. Feel free to share it with your colleagues and friends who share your passion for bioinformatics!
Disclaimer: The information provided in this newsletter is for educational and informational purposes only and does not constitute professional advice.
Contact: bioinformatics@zifornd.com
Copyright © 2025, Bioinformer Weekly Roundup. All rights reserved.