Data Analysis In Biology


  • Phillip Compeau

    Professor and Asst Dean at CMU | Online education founder | PhD + MBA

    It’s Monday, which means it’s time for another lecture in Great Ideas in Computational Biology! This week, we’re extending what we learned last week about read-mapping algorithms to a related problem: RNA sequencing. In RNA sequencing, we sequence fragments of RNA, which is transcribed from DNA. By measuring the amount of RNA transcribed from each gene, we can determine how genes are expressed in different cells, at different times, or under different conditions. This week’s lecture explores:

    • How sequencing DNA helps us develop a technology to sequence RNA, and why splice junctions make mapping RNA to a reference genome even more challenging than mapping DNA.
    • Spliced alignment algorithms, which combine biological knowledge with dynamic programming to align reads that span exons.
    • How tools like TopHat combine computational insights with biological understanding to discover splice junctions, leveraging the principles of efficient alignment learned from the Burrows-Wheeler transform.

    RNA sequencing exemplifies how computational biology adapts to biological complexity, creating algorithms that incorporate our growing knowledge of cellular processes. Check out this week's slides below, and as always, please feel free to like, share, and comment!
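    To see why splice junctions break ordinary read mapping, here is a toy Python sketch. All sequences are invented for illustration and far shorter than real exons and introns: a read spanning an exon-exon junction cannot be found contiguously in the genome, but it is found in the spliced transcript, which is exactly what a spliced aligner searches.

```python
# Toy illustration of the splice-junction problem in RNA-seq read mapping.
# All sequences are hypothetical and much shorter than real ones.
exon1 = "ATGGCCAAA"
intron = "GTAAGTCCCTTTAG"          # introns typically start with GT, end with AG
exon2 = "TTTGACTAA"

genome = exon1 + intron + exon2     # what a DNA mapper searches
transcript = exon1 + exon2          # what the RNA read actually came from

read = "CCAAATTTGA"                 # spans the exon1/exon2 junction

# A contiguous (DNA-style) match fails because the intron interrupts it:
print(read in genome)       # False
# A spliced aligner effectively searches the exon concatenation:
print(read in transcript)   # True
```

    Real spliced aligners like TopHat cannot simply enumerate transcripts; they split reads, map the pieces with BWT-based indexing, and infer the junction, but the core difficulty is the one shown above.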

  • Neo Bioinfo

    Learn, Explore & Grow in Bioinformatics

    2 Mini Bioinformatics Projects for Beginners (Start Building Today!)

    If you’re just starting your bioinformatics journey, theory alone won’t take you far — hands-on projects are where real learning begins. Here are 2 beginner-friendly projects to strengthen your skills and build your portfolio 💻👇

    🔹 Project 1: Sequence Alignment & Phylogenetic Tree Construction
    Goal: Analyze evolutionary relationships between different species or genes.
    Tools: NCBI BLAST, Clustal Omega, MEGA, or Phylo.io
    Steps:
    1️⃣ Choose a gene/protein sequence from NCBI.
    2️⃣ Use BLAST to find homologous sequences.
    3️⃣ Perform multiple sequence alignment using Clustal Omega.
    4️⃣ Construct a phylogenetic tree to visualize relationships.
    You’ll Learn: Sequence alignment, FASTA handling, and evolutionary analysis — core skills in bioinformatics.

    🔹 Project 2: Protein Structure Prediction & Visualization
    Goal: Predict and visualize the 3D structure of a protein.
    Tools: AlphaFold, Robetta, PyMOL
    Steps:
    1️⃣ Select a protein sequence (FASTA format) from UniProt.
    2️⃣ Use AlphaFold or Robetta to predict its 3D structure.
    3️⃣ Visualize and analyze the structure in PyMOL.
    You’ll Learn: Structural bioinformatics basics and how to interpret protein folding and function.

    💡 Tip: Document your process, results, and insights on GitHub or LinkedIn. Recruiters and professors love seeing practical work — not just grades or certificates!

    🚀 Start small, stay consistent, and you’ll soon have a solid portfolio that shows your bioinformatics growth and problem-solving mindset.

    #Bioinformatics #Genomics #Proteomics #ComputationalBiology #BLAST #AlphaFold #Research #DataScience #Python #NGS #BioinformaticsProjects #CodingForBiologists #LifeScience
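    Before reaching for BLAST or Clustal Omega, it can help to see the core idea both tools build on. Below is a minimal, self-contained sketch of global pairwise alignment scoring (Needleman-Wunsch) in plain Python; the scoring values are illustrative choices, not what any particular tool uses.

```python
# Minimal global alignment scoring (Needleman-Wunsch), the dynamic-programming
# idea underlying multiple sequence alignment tools. Illustrative scores:
# match +1, mismatch -1, gap -2.
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    n, m = len(a), len(b)
    # score[i][j] = best alignment score for prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # align a[:i] against all gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap          # align b[:j] against all gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            score[i][j] = max(diag,                 # match/mismatch
                              score[i-1][j] + gap,  # gap in b
                              score[i][j-1] + gap)  # gap in a
    return score[n][m]

print(needleman_wunsch("GATTACA", "GCATGCU"))
```

    Clustal Omega generalizes this dynamic-programming idea to many sequences at once, while BLAST trades exactness for speed with seeding heuristics — but both are, at heart, scoring alignments like this.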

  • Dip Ghosh

    Student at North South University

    🔬💻 Where Pipettes Meet Python: Why the Future of Biotech is Hybrid

    In today’s rapidly evolving biotech landscape, the synergy between wet labs and dry labs is no longer optional—it’s a necessity.

    🧫 Wet Labs: Turning Hypotheses Into Tangible Discoveries
    Wet labs are the physical spaces where experiments happen: pipetting, culturing, sequencing, and staining. It’s where techniques like PCR, Western blotting, flow cytometry, and CRISPR-Cas9 gene editing bring biological theories to life.
    📌 Example: In 2023, a study by the Broad Institute demonstrated how CRISPR edits in stem cells led to real-time observations of gene behavior—something only possible through wet lab precision.

    💻 Dry Labs: Where Data Drives Discovery
    Dry labs are all about computational biology, bioinformatics, and systems modeling. Massive biological datasets—think genomic, transcriptomic, proteomic—are analyzed to extract meaningful patterns.
    📊 Key Fact: According to Nature (2021), a single human genome generates 200 GB of raw data, and RNA-Seq datasets often exceed 50 million reads per sample. Without dry lab analysis, this data remains untapped potential.

    🧠 The Real Power Lies in Integration
    When wet and dry lab researchers collaborate, we accelerate innovation:
    • Cancer Genomics: The Cancer Genome Atlas (TCGA) combined wet-lab tissue analysis with dry-lab sequencing data to discover over 200 cancer-causing mutations.
    • Drug Discovery: Machine learning models trained on bioassay data (dry lab) help identify promising compounds for synthesis and testing (wet lab), reducing R&D timelines by 30–50% (McKinsey, 2022).
    • Synthetic Biology: Teams at MIT and ETH Zurich simulate gene networks computationally before building them physically in bacteria or yeast.

    🔄 This loop of prediction (dry lab) → validation (wet lab) → refinement (dry lab again) is the engine behind modern biotech breakthroughs.

    💡 Bottom line: Scientists fluent in both environments—or teams that bridge them—are shaping the next generation of cures, diagnostics, and biological understanding.

    #Biotech #WetLab #DryLab #Bioinformatics #MolecularBiology #Genomics #TranslationalResearch #CRISPR #STEMCareers #LabLife #ScientificInnovation #AIinBiotech #FutureOfScience

  • Tibor Zechmeister

    Founding Member & Head of Regulatory and Quality @ Flinn.ai | Notified Body Lead Auditor | Chair, RAPS Austria LNG | MedTech Entrepreneur | AI in MedTech • Regulatory Automation | MDR/IVDR • QMS • Risk Management

    Clinical evaluation is the most underestimated challenge in your MDR compliance journey.

    Most manufacturers focus heavily on QMS documentation and technical files. But it's your clinical evaluation that often becomes the bottleneck. Why? Because it requires both scientific rigor and regulatory precision. And notified bodies are scrutinizing this area more than ever before. The stakes are clear: insufficient clinical data means delayed market access or even rejection.

    So what does a compliant clinical evaluation actually look like? Here are 5 essential elements every MedTech leader needs to master:

    Clinical Evaluation Plan (CEP)
    ↳ This isn't just a document. It's your roadmap for success.
    ↳ Define specific endpoints that align with your intended purpose.
    ↳ Remember that vague objectives lead to undefined outcomes.

    Literature Review Strategy
    ↳ Simply collecting studies isn't enough anymore.
    ↳ You need a systematic search methodology with clear inclusion/exclusion criteria.
    ↳ Document why certain studies were rejected—auditors always ask this.

    Clinical Data Sufficiency
    ↳ "Sufficient" clinical data is subjective until you define it.
    ↳ Create a clear threshold for what constitutes adequate evidence.
    ↳ Pre-MDR data often falls short of current expectations.

    Post-Market Clinical Follow-Up (PMCF)
    ↳ This isn't optional. It's a fundamental part of your clinical evaluation.
    ↳ Notified bodies expect proactive data collection, not just passive surveillance.
    ↳ The days of "we have no complaints" as sufficient PMCF are long gone.

    Equivalence Justification
    ↳ The bar for equivalence has been raised significantly under MDR.
    ↳ You need access to technical documentation of equivalent devices.
    ↳ Without contractual agreements, equivalence claims are increasingly difficult to defend.

    Clinical evaluation isn't a one-time task. It's a continuous process throughout your device's lifecycle. The manufacturers who succeed are those who integrate clinical thinking from design phase through post-market surveillance.

    P.S. What's been your biggest challenge with clinical evaluations under MDR? Is it finding sufficient data, justifying equivalence, or something else?

    MedTech regulatory challenges can be complex. But smart strategies, cutting-edge tools, and expert insights can make all the difference. I'm Tibor, passionate about leveraging AI to transform how regulatory processes are automated and managed. Let's connect and collaborate to streamline regulatory work for everyone!

    #clinicalevaluation #regulatoryaffairs #medicaldevices

  • EU MDR Compliance

    Take control of medical device compliance | Templates & guides | Practical solutions for immediate implementation

    How do you turn clinical data into real insight in your clinical evaluation? Let’s walk through it step by step.↴

    1. What is “clinical data”?
    Clinical data refers to any information regarding the safety or performance of a medical device that comes from:
    → Clinical investigations of the device itself
    → Other investigations or studies reported in scientific literature
    → Post-market surveillance, especially PMCF
    → Peer-reviewed literature about similar clinical experience
    It’s not just “data”; it’s data that shows how the device behaves in the real world, in real hands, with real patients.

    2. Clinical data must be relevant
    Relevance is contextual: relevant to the medical device and its intended purpose. So:
    → Always match your data to a specific device and a specific clinical question
    → Exclude non-human, vague, or off-label data unless justified
    → Avoid just dumping state-of-the-art (SOTA) or competitor data unless it supports your argument
    → Carefully define search terms and clearly identify the device under discussion

    3. Clinical data must be high quality
    Not all data is equal. High-quality clinical data should be:
    → Scientifically rigorous
    → Complete and methodologically sound
    → Controlled and peer-reviewed
    Use validated frameworks like:
    ✓ PICO (Population, Intervention, Comparator, Outcome)
    ✓ Cochrane Handbook
    ✓ PRISMA
    ✓ MOOSE
    You can find guidance on data appraisal in IMDRF documents, MDCG 2020-6, and MEDDEV 2.7/1. Sources from PubMed, Google Scholar, or Cochrane are preferred, especially if they’re top-tier journals.

    4. Clinical data must be sufficient
    Here’s where many stumble.
    → Small sample size? Not enough
    → Low-quality studies or incomplete data? Weak evidence
    → Case reports and posters? Not generalizable
    → Not on the EU population? Risky
    Sufficient data means:
    → Enough for an expert to form an opinion
    → Strong enough for the device’s risk level
    → Justified by the hierarchy of evidence
    High-risk = needs deep literature and real-world experience
    Low-risk = might rely on limited, lower-quality data

    5. So… when is clinical data enough?
    When it supports a clear, justified, and ethical approach to the device’s clinical evaluation. Risk-based is the key:
    → Higher risk = more data
    → Lower risk = less, if PMS and RM show an acceptable benefit-risk profile
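    One lightweight way to make inclusion/exclusion decisions auditable is to record them in a structured form against your PICO question. The Python sketch below is purely illustrative: the field names, device name, and study IDs are hypothetical placeholders, and the two rules encode only the "exclude non-human or off-label data unless justified" point from the post, not a validated appraisal method.

```python
# Hypothetical record structure for a PICO-framed literature appraisal,
# so every inclusion/exclusion decision is documented with a reason.
pico = {
    "population":   "adults with chronic wounds",
    "intervention": "Device X dressing",          # hypothetical device
    "comparator":   "standard-of-care dressing",
    "outcome":      "complete wound closure at 12 weeks",
}

studies = [
    # study IDs are made-up placeholders
    {"id": "PMID:0000001", "human": True,  "on_label": True,  "include": None, "reason": ""},
    {"id": "PMID:0000002", "human": False, "on_label": True,  "include": None, "reason": ""},
]

# Apply the exclusion rules from the post: non-human or off-label data
# is excluded unless explicitly justified.
for s in studies:
    if not s["human"]:
        s["include"], s["reason"] = False, "non-human data, no justification"
    elif not s["on_label"]:
        s["include"], s["reason"] = False, "off-label use"
    else:
        s["include"], s["reason"] = True, "matches PICO question"

for s in studies:
    print(s["id"], s["include"], "-", s["reason"])
```

    The point is not the code but the discipline: every rejected study carries a written reason, which is exactly what auditors ask for.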

  • Bryce Platt, PharmD

    Consultant Pharmacist | Transforming the Business of Pharmacy | Strategy & Insights Across the U.S. Drug Supply Chain | Passionate about Aligning Incentives to Benefit Patients

    How can we decrease pharmacy spend on high-cost drugs by double digits without worse outcomes?

    Uplift modeling is a common tactic in marketing: it targets a promotion at the specific people who otherwise wouldn’t buy the product. While marketing in general can lead to overconsumption, in healthcare/#pharmacy the same mathematical techniques could be repurposed to support #PrecisionMedicine or personalized medicine, where the goal is to identify which patients are most likely to benefit from a specific treatment while avoiding unnecessary treatments for patients who might not respond well. Which cohort gets most of the outcomes from a drug varies by drug, but for some drugs only a fraction of the treated population drives a larger share of clinical results.

    Here's the basic process for using #UpliftModeling (you can find more details in my Milliman white paper in the comments):

    1. Treatment: Identify the treatment for which you want to predict response (e.g., a high-cost brand/specialty drug like GLP-1s). This could also be done for a medical device or any intervention.
    2. Data collection: Gather comprehensive data and studies about patients, including their medical history, genetic information, and any other relevant attributes. This is often the limiting factor in building a good model.
    3. Control group: Assemble a control group of patients who are similar to those receiving the treatment but are not receiving the treatment themselves. This helps establish a baseline for comparison.
    4. Outcome measurement: Measure the effectiveness of the treatment for both the treatment group and the control group. This could involve monitoring health improvements, cardiac events, or other relevant medical outcomes. For FDA-approved drugs, this could come from published research on the “absolute risk reduction” or “number needed to treat.”
    5. Model building: Develop predictive models using machine learning algorithms that estimate the likelihood of a positive response to the treatment for each individual.
    6. Uplift calculation: Calculate the difference in response rates between the treatment group and the control group to determine the net impact of the treatment.
    7. Segment: Divide patients into segments based on their predicted response probabilities.
    8. Action: Use the insights from uplift modeling to guide treatment, coverage, or other decisions.

    A payer or employer can use this information however they’d like, but I imagine it will be used to adjust formularies or utilization management strategies. It could also be used when setting up contracts for how a drug should be used, or when carving out certain drugs or disease states (e.g., oncology drugs at a center of excellence). There are more potential use cases in the white paper in the comments.

    Would you use this strategy for #PharmacyBenefits or #ValueBasedCare models that take on risk for cost of care?
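    Step 6 above is, at its core, just a difference in response rates computed per segment. Here is a toy Python sketch on entirely synthetic data; a real uplift model would estimate these rates with the machine-learning step described in point 5 rather than by simple counting.

```python
# Toy uplift calculation on synthetic data: uplift is the difference in
# response rates between treated and control patients, per segment.
patients = [
    # (segment, treated, responded) -- entirely made-up records
    ("A", True, True), ("A", True, True), ("A", False, True), ("A", False, False),
    ("B", True, False), ("B", True, True), ("B", False, True), ("B", False, True),
]

def rate(rows):
    """Fraction of rows that responded."""
    return sum(responded for _, _, responded in rows) / len(rows) if rows else 0.0

def uplift(segment):
    treated = [p for p in patients if p[0] == segment and p[1]]
    control = [p for p in patients if p[0] == segment and not p[1]]
    return rate(treated) - rate(control)

print(uplift("A"))   # positive: treatment adds benefit in segment A
print(uplift("B"))   # negative: treatment adds nothing in segment B
```

    A positive uplift flags a segment where the treatment changes outcomes; a near-zero or negative uplift flags patients who would likely have done as well without the high-cost drug, which is where the formulary and utilization-management decisions come in.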

  • Robert Rachford

    CEO of Better Biostatistics 🔬 A Biometrics Consulting Network for the Life Sciences 🌎 Father 👨🏻🍼

    How to review a Case Report Form (CRF) as a Biostatistician.

    The biostatistician MUST ensure all the data necessary for statistical analysis is captured correctly. It is very easy for the biostatistician to brush this task off: they are not the formal owner of the CRFs, and they are typically working on other high-importance items during this time, such as randomization lists and initial versions of the SAP. DO NOT BRUSH THIS TASK OFF. A good and proper review of the CRFs can be the difference between a well-conducted and a poorly conducted trial.

    As a reminder:
    - The biostatistician is ultimately responsible for ALL analyses conducted in the clinical trial
    - All analyses are dependent upon the data they are run on
    - All data comes from some collection vehicle - primarily the CRFs

    Review the CRFs - I beg you. Here are best practices for a biostatistician reviewing CRFs:

    - Start with what is most important: the primary endpoint. Review the protocol and write down every variable needed to conduct that analysis. This can be difficult at the beginning of the study, as you don't yet have a statistical analysis plan. So take your time and really think about what variables you will need to analyze the primary endpoint. Will you be using a model statement? If so, what variables would you need/want to include? (Bonus points: you are naturally getting a head start on your SAP development 😎)
    - Once you have your list of variables, determine the time points at which you need those values. Are you conducting a repeated-measures analysis? If so, what timepoints are to be looked at? Do this for all the variables required for the primary endpoint analysis.
    - At this point you have a list of all the variables you need and a list of all the timepoints at which those variables must be collected. You are now ready to do your formal CRF review!
    - Be sure to ask for a copy of the blank annotated CRFs (note: these are not CDISC annotations at this point - these are the annotations explaining what the Field OIDs represent).
    - Go through each page, noting down when you see a variable being collected that you need and confirming that it is collected at the correct timepoint/visit.
    - Once you go through the entire CRF, you should be able to clearly determine whether everything you need is being collected, and at the timepoints/visits you need it to be collected.

    "But this is only for the primary endpoint!" you say. And yes, you are correct, but the beauty of this process is that it can be repeated for secondary, exploratory, and safety endpoints! Simply follow the above steps for all the other endpoints within the trial and you will be able to quickly and efficiently review the CRFs to ensure all the data you need to conduct your analyses is captured correctly.

    Let me know if any of the above is not clear and Happy CRF Reviewing! Happy Friday
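    The variable-and-timepoint cross-check described above is essentially a set comparison, and you can even prototype it in a few lines of Python once you have your two lists. Variable and visit names below are hypothetical examples, not from any real CRF.

```python
# Cross-check: variables/timepoints needed for the primary analysis
# versus what the CRF actually collects. Names are hypothetical.
needed = {
    ("systolic_bp", "baseline"),
    ("systolic_bp", "week12"),
    ("treatment_arm", "baseline"),
}

crf_collects = {
    ("systolic_bp", "baseline"),
    ("treatment_arm", "baseline"),
    ("heart_rate", "week12"),       # collected, but not needed for the primary
}

# Anything needed but not collected must go back to the CRF owner.
missing = needed - crf_collects
for variable, visit in sorted(missing):
    print(f"MISSING: {variable} at {visit}")
```

    Repeating the same check with a `needed` set per secondary, exploratory, and safety endpoint gives you the full review the post describes.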

  • 🔍 What if we could rapidly search through millions of engineered immune cells to find the few that know exactly how to fight cancer?

    This is the promise of CAR T cell therapy: reprogramming a patient’s immune cells to recognize and destroy cancer. It has already led to remarkable, even curative, results in blood cancers. But when it comes to solid tumors, success has been harder to achieve — largely because we still don’t know which engineered cells will behave best in the complex tumor environment.

    One of the biggest challenges in CAR T cell therapy is figuring out which receptor designs will not only bind a target but activate the T cell effectively — especially in solid tumors where timing, persistence, and safety matter. Traditional pooled screens tell us which cells survive or proliferate, but they often miss a critical piece: real-time function.

    💡 In this study, we developed a modular, high-throughput functional screening platform using nanovials—hydrogel microparticles with tunable antigen and cytokine capture—to evaluate pooled CAR libraries based on actual cytokine secretion (IFN-γ) at the single-cell level.

    📊 Highlights:
    ▶️ We screened >2 million CAR T cells in a single experiment with only a standard flow cytometer and nanovials (no microfluidics or expensive equipment).
    ▶️ By capturing secreted IFN-γ in response to HER2 binding, we identified CAR variants with early vs. sustained activation dynamics. IL15RA-containing constructs (especially IL15RA-CD28) were enriched for rapid cytokine secretion, while CD40-CD40 constructs showed delayed but strong responses after 12 hours.
    ▶️ This tunable antigen system allowed us to mimic antigen density thresholds, potentially helping to reduce on-target, off-tumor effects.

    ⚙️ No custom microfluidics, no specialized instruments — just antigen-coated #nanovials, standard flow sorting, and sequencing. This work highlights how function-first screening can unlock new classes of CAR designs optimized for timing, potency, and safety — and do so in a format accessible to many labs.

    Big congratulations to my PhD student, Citra Soemardy, and our fantastic collaborators at ETH Zürich, Anna M., Rocio Castellanos Rueda, and Sai Reddy, and at The Johns Hopkins University, Jamie Spangler, Monika Kizerwetter, PhD, and Nikol García Espinoza. Thanks to the Chan Zuckerberg Initiative for their support.

    A link to the preprint can be found here: https://lnkd.in/ggG585RU

    We’d love to hear how others might apply this platform — from CARs to TCRs, and beyond.

    #CAR_Tcell #SyntheticBiology #Immunotherapy #SingleCell #Bioengineering #Immunoengineering #Nanovials #FunctionalScreening #CellTherapy

  • 🎯 Ming "Tommy" Tang

    Director of Bioinformatics | Cure Diseases with Data | Author of From Cell Line to Command Line | >100K followers across social platforms | Educator YouTube @chatomics

    Thread: Multi-omics sounds cool—until you actually try it. Here are the nuances.

    1/ You’ve got RNA-seq. Methylation. Proteomics. Time to “integrate” the data. But how? And why? Let’s break it down.
    2/ Multi-omic integration sounds powerful. But it’s not magic. If you don’t ask the right question first, the answer won’t matter.
    3/ Start here: Do you want shared programs across omics? Or unique signals from each modality? That choice decides your method.
    4/ Unsupervised goal? Try MOFA2. Want to predict disease or treatment? DIABLO is your friend. Graph models? Great—if they perform better.
    5/ Real-life example: a chronic kidney disease study used both MOFA2 + DIABLO. Why? Different tools, complementary insights. Paper: https://lnkd.in/eZ_Fu83u Another new preprint for a different disease: https://lnkd.in/esXGmdqQ
    6/ Here’s what makes multi-omics hard: your matrix is incomplete. RNA-seq for 200 samples. Proteomics for 150. Methylation for 180.
    7/ You can’t just “merge” them. Naive concatenation drowns real signal. Or worse—creates phantom clusters driven by batch noise.
    8/ Each modality is different: scATAC-seq is sparse. Proteomics is noisy. RNA-seq has 20K+ features. Methylation may cover anywhere from ~50K regions to over 9 million CpG sites, depending on the platform.
    9/ Good methods normalize each modality, learn weights, or regularize smartly. MOFA2, DIABLO, and weighted PCA all do this.
    10/ Want to see how it fails? Check this post: https://lnkd.in/eMiCtVgW Spatial + gene expression integration went sideways without normalization.
    11/ Math is nice. But biology matters more. If you can’t map your result back to a gene, CpG, or protein—what’s the point?
    12/ These methods uncover correlations, not causes. Interpret carefully. Validate everything.
    13/ Use known pathways. Run orthogonal experiments. Generalize across cohorts. Don’t trust the output blindly.
    14/ Resources: Tools list: https://lnkd.in/eri4hGKR Tool review: https://lnkd.in/etcQfBm4 Overview: https://lnkd.in/esK4M-eG
    15/ Key takeaways: Start with the question. Pick tools based on your goal. Normalize per modality. Validate everything. Biology > black boxes.

    Multi-omics is messy. But it’s worth it—if you know what you’re doing. I hope you've found this post helpful. Follow me for more. Subscribe to my FREE newsletter chatomics to learn bioinformatics: https://lnkd.in/erw83Svn
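    Point 9 (normalize each modality before integrating) can be illustrated with a tiny synthetic example. This is only a sketch of the idea; MOFA2 and DIABLO do far more than z-scoring, but the scale problem they guard against is the one shown here.

```python
# Why per-modality normalization matters: z-score each modality separately
# so the one with the largest raw scale doesn't dominate a concatenated
# feature vector. All values are synthetic.
def zscore(xs):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mean = sum(xs) / len(xs)
    sd = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / sd for x in xs]

rna        = [5.1, 6.3, 4.8, 7.0]             # log counts: small scale
proteomics = [1200.0, 950.0, 1800.0, 400.0]   # raw intensities: huge scale

rna_z, prot_z = zscore(rna), zscore(proteomics)

# After z-scoring, both modalities live on a comparable scale, so naive
# concatenation per sample is no longer dominated by proteomics.
sample0 = [rna_z[0], prot_z[0]]
print(sample0)
```

    Real pipelines normalize with modality-appropriate transforms (library-size correction for RNA-seq, beta-to-M values for methylation, and so on) before any joint factor model; plain z-scoring here just makes the scale argument visible.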

  • Layla Hosseini-Gerami

    Chief Data Science Officer, Ignota Labs | Forbes 30 under 30

    🧬 Multi-instance machine learning (MIL) in chemoinformatics and bioinformatics 🧪

    I recently read a new paper, "Chemical complexity challenge: Is multi-instance machine learning a solution?" by Zankov et al. (doi 10.1002/wcms.1698), after seeing a post about it from Pat Walters. See below for my summary!

    📚 The study explores the application of MIL algorithms for handling complex chemical and biological data. MIL is a learning framework that treats each object as a set of multiple alternative instances, called a "bag". The goal, in contrast to single-instance learning (SIL), is to predict a label for the bag rather than for a single instance. For example, a single chemical structure can be represented by an ensemble of multiple conformations to predict bioactivity. This leads to a better representation of the dynamic nature of chemicals in equilibrium.

    💡 MIL algorithms can be categorised as instance-based or bag-based. Instance-based algorithms consider the instances in a bag as separate objects: predictions are generated for each instance, and a predefined rule (an aggregation function) combines the instance-level predictions into a bag-level prediction. Bag-based algorithms consider the whole bag as a single training object, leading to one prediction for the bag as a whole rather than for individual instances.

    🔬 The paper showcases various applications of MIL in chemoinformatics and bioinformatics. For example, MIL has been used to model bioactivity with conformation ensembles, improving prediction accuracy over SIL from 71% to 91% on a benchmarking dataset. It has also been applied to predict protein-protein interactions, taking into account the different protein isoforms synthesised from the same gene through alternative splicing.

    🎯 One interesting aspect of MIL is its ability to identify key instances, such as specific molecular forms or fragments, that are responsible for a particular property or function of a molecule (key instance detection, or KID). This opens up new possibilities for understanding molecular mechanisms and designing targeted interventions, for example by identifying the key domains in proteins responsible for biological functions.

    📊 However, the authors note the need for benchmark datasets to validate MIL models, especially in the chemistry and biology domains. Current benchmark datasets are limited in size and scope, and KID in particular is challenging to evaluate. They anticipate that the development of new datasets will stimulate progress in this field.

    🌟 Overall, this paper sheds light on the potential of MIL in chemoinformatics and bioinformatics, offering new insights and avenues for research. Exciting times ahead for data-driven approaches in understanding complex chemical and biological systems!

    #MachineLearning #Chemoinformatics #Bioinformatics #MultiInstanceLearning #DataScience #ArtificialIntelligence

    Image: Graphical abstract from paper
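    The instance-based branch of MIL can be sketched in a few lines of Python: score each instance in the bag, then aggregate with a predefined rule. The scores and the 0.5 threshold below are synthetic placeholders standing in for a trained per-instance model, not values from the paper.

```python
# Instance-based MIL in miniature: score each conformation (instance) in a
# bag, then aggregate to a bag-level prediction. The "max" rule reflects a
# common MIL assumption: one active instance makes the whole bag active.
def bag_predict(instance_scores, rule="max", threshold=0.5):
    if rule == "max":
        agg = max(instance_scores)
    elif rule == "mean":
        agg = sum(instance_scores) / len(instance_scores)
    else:
        raise ValueError(f"unknown aggregation rule: {rule}")
    return agg >= threshold, agg

# Each bag = one molecule's conformational ensemble; each score is the
# (synthetic) predicted activity of a single conformation.
active_bag   = [0.1, 0.2, 0.9]   # one active-looking conformation
inactive_bag = [0.1, 0.2, 0.3]

print(bag_predict(active_bag))    # (True, 0.9)
print(bag_predict(inactive_bag))  # (False, 0.3)
```

    Note how the max rule also gives key instance detection for free: the arg-max conformation is the "key instance" driving the bag label, which is the KID idea discussed above.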
