Week 3 of F1L Internship Emulator: The Data
The goal this week is to explore the scRNA-seq data from Kinker et al. (DOI: 10.1038/s41588-020-00726-6) using Scanpy. You will re-enact Figure 1B of the paper, to look at the data in the same way the authors did. This is the core motivation behind Figure One Lab (F1L), and I’m excited to see how far you get! There is also a bonus challenge.
Scanpy
Here is the official Scanpy documentation. No need to read it completely, but you should get familiar with it and refer back frequently.
Here is a demo by Mark Sanborn showing how he analyzes scRNA-seq data: Part 1 and Part 2. This demo was referenced in previous weeks. If you haven’t watched these videos yet, now is the time to do so.
Jupyter Notebooks
In the F1L GitHub repo you will find a few Jupyter notebooks I have prepared. Run through 240701_kinker_anndata.ipynb and 240702_kinker_scanpy.ipynb in your own working directory, which you specified last week. The input to 240701_kinker_anndata.ipynb is the Kinker et al. scRNA-seq dataset you downloaded last week. 240701_kinker_anndata.ipynb outputs a file which serves as the input to 240702_kinker_scanpy.ipynb.
Understand what each code block does. Get familiar with the structure of AnnData objects. Use GitHub Copilot or ChatGPT as your private tutor. Refer back to the official Scanpy documentation and Mark Sanborn’s demo as many times as you need to. You will see that most of the code is data clean-up and formatting, followed by a boilerplate Scanpy workflow. Would you consider modifying the Scanpy workflow? If so, how? Take notes directly in your Jupyter notebook.
Re-enact Figure 1
By the time you get to the end of 240702_kinker_scanpy.ipynb, you should have produced a UMAP plot, a 2-D visualization of the scRNA-seq dataset. Does this look like Figure 1B of Kinker et al.? How is it similar? How is it different? Take notes directly in your Jupyter notebook.
Now color the dots in the UMAP plot by the different columns of the metadata. What patterns do you see? Do they make sense? Take notes directly in your Jupyter notebook.
Now for the challenge. Try to reproduce Figure 2B and 2D from Kinker et al. See if you can recapitulate some of the patterns in the data that the authors describe. Do your best, but don’t lose any sleep over this.
Once you are happy with what you have accomplished, push your Jupyter notebooks to your GitHub repo and share a link to the repo in our designated Discord. Also feel free to share a screenshot of a plot you are proud of in our Discord.
Other People’s Code
Computational biologists often inherit someone else’s code. If you’re lucky, that code will be annotated clearly. In most cases, like the Jupyter notebooks you had to run through this week, that code will not be adequately annotated. Part of the job is to make sense of that code anyway; there is usually enough structure in someone else’s code that you can parse out the line of thought. I hope this week gives you a realistic sense of that experience.
Resources