Week 3 of F1L Internship Emulator: The Data

Dean Lee

Figure One Lab: Computational Biology Upskilling

Published Jul 29, 2024

The goal this week is to explore the scRNA-seq data from Kinker et al. (DOI: 10.1038/s41588-020-00726-6) using Scanpy. You will re-enact Figure 1B of the paper, to look at the data in the same way the authors did. This is the core motivation behind Figure One Lab (F1L), and I’m excited to see how far you get! There is also a bonus challenge.

Scanpy

Here is the official Scanpy documentation. No need to read it completely, but you should get familiar with it and refer back frequently.

Here is a demo by Mark Sanborn showing how he analyzes scRNA-seq data: Part 1 and Part 2. This demo was referenced in previous weeks. If you haven’t watched these videos yet, now is the time to do so.

Jupyter Notebooks

In the F1L GitHub repo you will find a few Jupyter notebooks I have prepared. Run through 240701_kinker_anndata.ipynb and then 240702_kinker_scanpy.ipynb in your own working directory, which you specified last week. The input to 240701_kinker_anndata.ipynb is the Kinker et al. scRNA-seq dataset you downloaded last week. 240701_kinker_anndata.ipynb outputs a file that serves as the input to 240702_kinker_scanpy.ipynb.

Understand what each code block does. Get familiar with the structure of AnnData objects. Use GitHub Copilot or ChatGPT as your private tutor. Refer back to the official Scanpy documentation and Mark Sanborn’s demo as many times as you need to. You will see that most of the code is data clean-up and formatting, followed by a boilerplate Scanpy workflow. Would you consider modifying the Scanpy workflow? How? Take notes directly in your Jupyter notebook.

Re-enact Figure 1

By the time you get to the end of 240702_kinker_scanpy.ipynb, you should have produced a UMAP plot, a 2-D visualization of the scRNA-seq dataset. Does this look like Figure 1B of Kinker et al.? How is it similar? How is it different? Take notes directly in your Jupyter notebook.

Now color the dots in the UMAP plot by the different columns of the metadata. What patterns do you see? Do they make sense? Take notes directly in your Jupyter notebook.

Now for the challenge. Try to reproduce Figures 2B and 2D from Kinker et al. See if you can recapitulate some of the patterns in the data that the authors describe. Do your best, but don’t lose any sleep over this.

Once you are happy with what you have accomplished, push your Jupyter notebooks to your GitHub repo and share a link to the repo in our designated Discord. Also feel free to share a screenshot of a plot you are proud of in our Discord.

Other People’s Code

Computational biologists often inherit someone else’s code. If you’re lucky, that code will be annotated clearly. In most cases, like the Jupyter notebooks you had to run through this week, that code will not be adequately annotated. Part of the job is to make sense of that code anyway; there is usually enough structure in someone else’s code that you can parse out the line of thought. I hope this week gives you a realistic sense of that experience.

Resources

Introducing Figure One Lab (F1L)
Introducing the F1L Internship Emulator
Week 1 of F1L Internship Emulator: The KSQ
Week 2 of F1L Internship Emulator: The Paper
Discord for participants
Scanpy documentation
A demo by Mark Sanborn showing how he analyzes scRNA-seq data: Part 1 and Part 2
Suggestions for how to analyze scRNA-seq data from Fabian Theis’s lab, one of the leaders in this kind of analysis. No need to read this completely, as it is quite long, but get familiar with the table of contents.
