Week 4 of F1L Internship Emulator: The Biology
Last week you should have successfully replicated Figure 1B of Kinker et al. (DOI: 10.1038/s41588-020-00726-6) using Scanpy. You should have observed that cells tended to cluster together in the UMAP visualization by cell line of origin. I hope you didn’t just stop there, but also took some time to interpret what you saw.
This week, you will explore the scRNA-seq data underlying Figure 1B with regard to the KSQ, which I restate below.
The KSQ: Using available scRNA-seq data from cancer cell lines, how would you explore the use of the following FDA-approved antibody therapies in additional cancers?
Trastuzumab: Targets HER2 and is used in the treatment of HER2-positive breast and gastric cancers.
Bevacizumab: Targets VEGF and is used for a variety of cancers, including colorectal, lung, glioblastoma, breast, liver, and kidney cancer.
ERBB2 and Trastuzumab
Run through 240703_kinker_explore.ipynb. Here I have done some extremely basic exploration of the data to help you get started.
I first checked how ERBB2, the gene encoding the HER2 protein, is expressed across each cancer indication represented by the cell lines. A cursory glance reveals that ERBB2 is highest in breast, gastric, and lung cancer. It makes sense that it is more highly expressed in breast and gastric cancer, since trastuzumab is already FDA-approved for these two indications. It is, however, unexpected that some lung cancer cells in the data also have high ERBB2 expression. What is happening there? And does that warrant considering the potential of trastuzumab for lung cancer?
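A per-indication comparison like the one described above can be sketched as follows. This is a minimal illustration on a made-up toy table, not the code from 240703_kinker_explore.ipynb: the column names `indication` and `ERBB2`, the helper `summarize_by_group`, and all expression values are assumptions for the example. In a real notebook the per-cell values would come from the loaded dataset rather than being typed in by hand.

```python
import pandas as pd

# Toy per-cell table (made-up values): one row per cell, with a hypothetical
# cancer indication label and a normalized ERBB2 expression value.
cells = pd.DataFrame(
    {
        "indication": [
            "breast", "breast", "gastric", "gastric",
            "lung", "lung", "skin", "skin",
        ],
        "ERBB2": [4.0, 2.0, 3.0, 1.0, 0.0, 3.0, 0.0, 1.0],
    }
)


def summarize_by_group(df, gene, group_col):
    """Return mean expression and fraction of expressing cells per group."""
    grouped = df.groupby(group_col)[gene]
    summary = pd.DataFrame(
        {
            "mean_expr": grouped.mean(),
            # Fraction of cells in the group with any detected expression.
            "frac_expressing": grouped.apply(lambda s: (s > 0).mean()),
        }
    )
    # Highest-expressing groups first.
    return summary.sort_values("mean_expr", ascending=False)


erbb2_by_indication = summarize_by_group(cells, "ERBB2", "indication")
print(erbb2_by_indication)
```

Reading the mean alongside the fraction of expressing cells helps separate an indication in which most cells express the gene moderately from one in which only a subset of cells expresses it strongly, which is the kind of distinction the lung cancer question above hinges on.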
Then I checked ERBB2 expression across each breast, gastric, and lung cancer cell line to see whether it is higher in some cell lines but not others. That turns out to be the case. Which cell lines are ERBB2-high? Which cell lines are ERBB2-low? What does this discrepancy suggest about the potential of trastuzumab to be repurposed?
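The per-cell-line breakdown can be sketched in a similar way. Again, this is only an illustration under stated assumptions: the cell line names (`breast_A`, `lung_B`, and so on), the expression values, the helper `label_cell_lines`, and the cutoff `HIGH_THRESHOLD` are all hypothetical, and the cutoff in particular is arbitrary rather than a biologically validated definition of "ERBB2-high".

```python
import pandas as pd

# Toy per-cell table (made-up values) with hypothetical cell line names.
cells = pd.DataFrame(
    {
        "indication": ["breast"] * 4 + ["lung"] * 4,
        "cell_line": [
            "breast_A", "breast_A", "breast_B", "breast_B",
            "lung_A", "lung_A", "lung_B", "lung_B",
        ],
        "ERBB2": [5.0, 4.0, 0.5, 0.0, 3.0, 4.0, 0.0, 0.0],
    }
)

# Arbitrary cutoff for this example only; not a validated threshold.
HIGH_THRESHOLD = 2.0


def label_cell_lines(df, gene, threshold):
    """Average a gene per cell line and label each line 'high' or 'low'."""
    per_line = (
        df.groupby(["indication", "cell_line"])[gene]
        .mean()
        .rename("mean_expr")
        .reset_index()
    )
    per_line["label"] = per_line["mean_expr"].apply(
        lambda m: "high" if m >= threshold else "low"
    )
    return per_line


erbb2_by_line = label_cell_lines(cells, "ERBB2", HIGH_THRESHOLD)
print(erbb2_by_line)
```

In this toy table each indication contains one high and one low cell line, which mirrors the situation described above: an indication-level average can hide the fact that only some cell lines within it express the gene strongly.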
Now think back to the KSQ. Using this scRNA-seq dataset, how would you explore (with further computational analysis) the use of trastuzumab for indications that it is not currently approved for? I have shown you just one way to explore this data with the KSQ in mind; which other ways would you take? Which other questions would you ask of the data? Which new ways might you slice up and visualize the data? Make sure you consider the caveats of cancer cell lines, the mechanism of action of trastuzumab, and the biology of the different cancer indications.
VEGFA and Bevacizumab
I did not do any exploration of VEGFA, the gene encoding the VEGF protein, in the data, so I have not addressed the bevacizumab part of the KSQ. I leave this part entirely to you.
Now It’s Your Turn
This is where I push you off the cliff and hope you fly. I will offer no more hints beyond this point. I want to see you explore the data beyond what I have done. I want to see you reason through the patterns you see in the data to arrive at a reasonable response to both parts of the KSQ.
Once you are happy with what you have gleaned from the data, record your response to both parts of the KSQ directly in your Jupyter notebook, push it to your GitHub repo, and share a link to the repo in our designated Discord.
Resources
Discord for participants
A demo by Mark Sanborn showing how he analyzes scRNA-seq data: Part 1 and Part 2
Suggestions for how to analyze scRNA-seq data from Fabian Theis’s lab, one of the leaders in this kind of analysis. No need to read this completely, as it is quite long, but get familiar with the table of contents.
Hi Dean! Thanks for creating Figure One Lab, I've had a great time following along. I noticed in the plot of ERBB2 expression across cancer indications (from the 240703_kinker_explore.ipynb notebook) that lung cancer is mentioned as having the third highest ERBB2 expression. However, based on visual inspection of the plot, it seems like colon/colorectal cancer has higher ERBB2 expression than lung cancer. Could you clarify why lung cancer was highlighted as having the third highest expression? Is this related to an interest in the ERBB2 expression of lung cancer cells specifically, or is there another reasoning behind this observation? Thank you!