Seven days of extreme GWAS p-values, 2023, episode 3 of 7, explainable trans-pQTLs
The GWAS Catalog has collected over 100,000 pQTLs linking a specific variant to a specific protein. I have systematically mapped each protein name to a unique identifier, and each variant to its position relative to the gene encoding that protein (the cognate gene).
On Monday I reviewed the new cis-pQTLs, where the variant associated with a protein falls close to the cognate gene. Yesterday I reviewed cases where the variant was far from the cognate gene and there was no obvious link between the genes near that variant and the protein being measured.
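As a rough illustration of what "position relative to the cognate gene" can mean in practice, here is a minimal sketch that measures the distance from a variant to the cognate gene body and labels the association cis or trans. It assumes a 1 Mb cis window, a common convention but not a cutoff stated in this post, and the gene and variant names and coordinates are hypothetical toy values, not real genomic positions.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: a 1 Mb cis window is a common convention, not this post's stated cutoff.
CIS_WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int  # gene body start (bp)
    end: int    # gene body end (bp), start <= end


@dataclass(frozen=True)
class Variant:
    rsid: str
    chrom: str
    pos: int


def distance_to_gene(variant: Variant, gene: Gene) -> Optional[int]:
    """Distance in bp from the variant to the gene body.

    Returns 0 if the variant lies inside the gene, and None if the
    variant and gene are on different chromosomes.
    """
    if variant.chrom != gene.chrom:
        return None
    if gene.start <= variant.pos <= gene.end:
        return 0
    return min(abs(variant.pos - gene.start), abs(variant.pos - gene.end))


def classify_pqtl(variant: Variant, cognate: Gene, window: int = CIS_WINDOW_BP) -> str:
    """Label an association 'cis' if the variant is within `window` bp of the
    cognate gene on the same chromosome, otherwise 'trans'."""
    d = distance_to_gene(variant, cognate)
    return "cis" if d is not None and d <= window else "trans"


# Hypothetical gene with toy coordinates.
gene_x = Gene("GENE_X", "chr1", 10_000_000, 10_050_000)
print(classify_pqtl(Variant("rs_a", "chr1", 10_020_000), gene_x))  # inside the gene: cis
print(classify_pqtl(Variant("rs_b", "chr1", 13_000_000), gene_x))  # ~3 Mb away: trans
print(classify_pqtl(Variant("rs_c", "chr7", 10_020_000), gene_x))  # other chromosome: trans
```

Note that "trans" here only means "not near the cognate gene"; whether a trans-pQTL is explainable still depends on what the genes near the variant actually do.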
All of which leads us to today: explainable trans-pQTLs!
I am going to invite a bit of audience participation today.
For the top 10 explainable trans-pQTLs of 2023 I will provide the protein name and the closest gene.
If you’ve followed me at all, you know that in GWAS the closest gene is usually the causal gene.
Here, by my estimate, the closest gene is the correct choice in 8 of the 10 cases.
I will lay out the case for one of the associations where you have to move beyond the closest gene.
Your mission, should you choose to accept it, is to find the other entry where the closest gene is not the correct causal gene.
Ready?
Let’s begin!
This is the list of the top 10 trans-pQTLs with an explainable causal gene added to the GWAS Catalog in 2023. Once again, the star is “Mapping the proteo-genomic convergence of human diseases” from Maik Pietzner, Claudia Langenberg and colleagues, which is the source of 9 of the top 10 associations.
I have listed the closest gene to each lead SNP in the last column. Eight of these (I believe) are the true causal genes. Two of them are not.
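For readers who want to reproduce a "closest gene" column themselves, here is a minimal sketch of one way to compute it. It assumes distance is measured to gene body boundaries (some pipelines use the transcription start site instead) and breaks ties alphabetically; the post does not say which annotation or distance definition was used, and the genes and coordinates below are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int  # gene body start (bp)
    end: int    # gene body end (bp), start <= end


def gene_distance(chrom: str, pos: int, gene: Gene) -> Optional[int]:
    """bp from a position to a gene body (0 if inside), None if on another chromosome."""
    if chrom != gene.chrom:
        return None
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(pos - gene.start), abs(pos - gene.end))


def closest_gene(chrom: str, pos: int, genes: Iterable[Gene]) -> Optional[str]:
    """Symbol of the gene nearest to a lead SNP, or None if no gene shares its chromosome.

    Ties are broken alphabetically by symbol so the result is deterministic.
    """
    candidates = []
    for g in genes:
        d = gene_distance(chrom, pos, g)
        if d is not None:
            candidates.append((d, g.symbol))
    if not candidates:
        return None
    return min(candidates)[1]


# Hypothetical genes with toy coordinates.
genes = [
    Gene("GENE_A", "chr1", 1_000, 5_000),
    Gene("GENE_B", "chr1", 20_000, 30_000),
    Gene("GENE_C", "chr2", 4_000, 6_000),
]
print(closest_gene("chr1", 12_000, genes))  # GENE_A is 7,000 bp away, GENE_B is 8,000 bp
```

The output of a function like this is a starting hypothesis, not a verdict: the CELSR2/SORT1 case below is exactly where the nearest gene by distance is not the gene doing the work.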
The one I will explain is the top hit: the association of Granulin abundance with the CELSR2 locus.
One of these days I intend to release a list of “red herring” genes, that is, genes that are frequently the closest gene but are (probably) never the true causal gene.
For example, I wrote a whole Twitter thread about how the hundreds of “N-acetyl-amino acid” associations at the ALMS1 locus are almost certainly due to the N-acetyltransferase gene next door (NAT8), not ALMS1 (but this doesn’t stop people from writing papers suggesting ALMS1 as the causal gene).
Similarly, if you see CELSR2 you should immediately suspect SORT1, especially if the trait is at all lipid related. Kiran Musunuru, Daniel Rader and colleagues demonstrated over a decade ago that SORT1 is the causal gene for LDL-cholesterol (From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus).
Still, because of the strong eQTL for CELSR2 at the locus, many methods will gravitate toward CELSR2.
A quick PubMed search turns up two papers from just the past couple of months that recommend CELSR2 as a target for dyslipidemia, based essentially on eQTLs.
Please don’t do this unless you have biochemical data showing that the CELSR2 gene product (cadherin EGF LAG seven-pass G-type receptor 2) actually interacts with LDL particles.
In 2010, Musunuru et al. demonstrated that SORT1 is the causal gene for LDL-cholesterol.
What about the granulin proteins, all derived from the GRN gene?
Also in 2010, Fenghua Hu et al. demonstrated that sortilin is responsible for the endocytosis of progranulin, which eventually leads to its intracellular processing into granulins (Sortilin-mediated endocytosis determines levels of the frontotemporal dementia protein, progranulin). A cell line with a SORT1 knockout differs significantly in its concentration of progranulin and the stable granulins (Intracellular Proteolysis of Progranulin Generates Stable, Lysosomal Granulins that Are Haploinsufficient in Patients with Frontotemporal Dementia Caused by GRN Mutations).
The case of the hidden causal gene.
I mentioned that I believe only 8 of the closest genes are in fact the causal genes for the protein abundance traits. I walked through the evidence for the strongest association, “Granulins” abundance at the CELSR2/SORT1 locus.
Of the remaining 9 loci, which one harbors an imposter? Which association is better explained by a gene which is not the closest gene?
Correct responses (with rationale) will get a one-of-a-kind photo of my spirit avatar, Mojo, whose face graces all my social media accounts.
Here again are the top 10 explainable trans-pQTLs for 2023:
Tomorrow
Check back tomorrow when we will venture into some non-molecular traits such as prostate cancer and Dupuytren’s disease.
Reader comment (Associate Professor of Medical Genetics, Univ. of Sassari, IT): What about the APOL1 history? "Humans and some higher primates resist trypanosomiasis by T. brucei, an arthropod-borne multispecies pathogen that limits agrarian activity in sub-Saharan Africa because livestock are primary hosts. Two trypanosome lytic factors (TLF), constitutively circulating as innate immune effectors, mediate this protection, and both contain the primate-only proteins, APOL1 and haptoglobin-related protein (HPR). Interestingly, genetic variants in the HPR locus associate with circulating APOL1 levels [7]" [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5678957/]. The causal gene in this case would be HPR, not the closest gene, DHX38.