bioRxiv preprint doi: https://doi.org/10.1101/2021.10.04.463014; this version posted October 5, 2021. The copyright holder for this preprint
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is
made available under aCC-BY-NC-ND 4.0 International license.
Genetic association of TMPRSS2 rs2070788 polymorphism with
COVID-19 Case Fatality Rate among Indian populations
Rudra Kumar Pandey1*, Anshika Srivastava1, Prajjval Pratap Singh1, and Gyaneshwer Chaubey1*
*Corresponding authors: E-mail address: gyaneshwer.chaubey@bhu.ac.in (Gyaneshwer Chaubey),
rudrakumarpandey4@gmail.com (Rudra Kumar Pandey).
1 Cytogenetics Laboratory, Department of Zoology, Banaras Hindu University, Varanasi, India 221005
Abstract
SARS-CoV-2, the causative agent of the ongoing COVID-19 pandemic, enters the host cell through the ACE2 receptor after its spike (S) protein is primed by the serine protease TMPRSS2. Variation in the TMPRSS2 gene may therefore account for differences in disease susceptibility among populations. The haplotype-based genetic sharing and structure of TMPRSS2 among global populations have not been studied so far. In the present work, we used a haplotype-based approach, with a focus on South Asia, to study TMPRSS2 haplotypes and their sharing among populations worldwide. We analysed the TMPRSS2 gene using next-generation sequencing data from 393 individuals. Our analysis of genetic relatedness for this gene showed a closer affinity of South Asians with West Eurasian populations; therefore, host disease susceptibility and severity, particularly in the context of TMPRSS2, are expected to be more akin to those of West Eurasians than of East Eurasians. This contrasts with our prior study of the ACE2 gene, which showed that South Asian haplotypes have a strong affinity towards East Eurasians. Thus, ACE2 and TMPRSS2 show antagonistic patterns of genetic relatedness among South Asians. We also tested the frequencies of SNPs in this gene among populations of various Indian states against the state-wise case fatality rate (CFR). Interestingly, we found a significant positive association between the rs2070788 G allele and the CFR in India. The rs2070788 GG genotype has been shown to have higher TMPRSS2 expression in the lung than the AG and AA genotypes, so this variant might play a vital part in determining differential disease vulnerability. We expect that this information will help underscore the role of TMPRSS2 variants in COVID-19 susceptibility, and that using rs2070788 as a biomarker may help predict populations at risk.
Keywords: COVID-19, TMPRSS2, India, rs2070788, haplotype, Linkage Disequilibrium
1. Introduction
COVID-19, an ongoing pandemic that has cost millions of lives worldwide, is caused by SARS-CoV-2, a member of the genus Betacoronavirus. Along with ACE2 (angiotensin-converting enzyme 2), which acts as the receptor, the serine protease TMPRSS2 (transmembrane protease, serine 2) is involved in virus entry into the host cell through S protein priming (1,2). Besides SARS-CoV-2, the influenza virus and various human coronaviruses, such as HCoV-229E, MERS-CoV, and SARS-CoV, have been shown to use this protein for cell entry (3). Serine proteases have been linked to a variety of physiological and pathological processes. Androgenic hormones were shown to upregulate TMPRSS2 in prostate cancer cells, whereas the gene was found to be downregulated in androgen-independent prostate cancer tissue (4). Northern blot analysis has revealed that in mice TMPRSS2 is mainly expressed in the kidney and prostate, whereas in humans it is largely expressed in the prostate, salivary gland, stomach, and colon (5). In situ hybridization studies of mouse embryos and adult tissues have further shown TMPRSS2 expression in the epithelia of the respiratory, urogenital, and gastrointestinal tracts (5).
The impact of the COVID-19 crisis is not uniform across ethnic groups; patients from some ethnic backgrounds are disproportionately affected (6). Discrepancies in infection and case fatality rates (CFR) could be due to multiple factors, e.g., differences in quarantine and social-distancing policies, access to medical care, reliability and coverage of epidemiological data, and population age structure, since mortality is greater among the elderly and those with comorbidities (7,8). However, many young and healthy people have also lost their lives due to rapid cytokine storms (9). These factors do not appear to account for all the disparities observed among groups, and significant gaps remain that require the scientific community to propose and test hypotheses that will help us better understand the disease etiology. This is all the more important because the numbers of cases and deaths may be poorly reported in some populations; nevertheless, data from countries with strict standards for the collection and presentation of epidemiological data suggest that human genetic variation may account for differential susceptibility and severity of disease outcomes among populations (10). There is evidence supporting a role of ACE2 gene variation in susceptibility to COVID-19 in Indian populations (11,12).

Little is known, however, about the genetic structure of TMPRSS2 haplotypes among South Asian populations. A detailed analysis of TMPRSS2 sequence data from world populations may reveal patterns of haplotype sharing and help clarify the role of TMPRSS2 in disease susceptibility globally. Given the relevance of TMPRSS2 to the SARS-CoV-2 infection process, patterns of COVID-19 infection and severity may be linked to elevated TMPRSS2 expression, resulting in varying disease outcomes across communities worldwide. In particular, the role of TMPRSS2 polymorphisms in disease susceptibility in Indian populations remains largely unexplored. Therefore, in the current study, we analysed the haplotype structure of TMPRSS2 with a focus on South Asia, examined genetic markers that could alter the gene's expression in lung tissue, and tested them for association with COVID-19 epidemiological data from Indian populations.
2. Materials and Methods
The TMPRSS2 haplotype analysis for various world populations was done using next-generation sequencing (NGS) data from Pagani et al. (13). PLINK 1.9 (14) was used to extract TMPRSS2 genotypes for the different populations from this dataset. After excluding samples from Sahul and Africa, as well as relatives up to the second degree, 393 samples and 795 SNPs remained and were used for further analysis (Supplementary Tables 1 and 2). The PLINK file was converted to FASTA (PED to IUPAC codes) with a customized script (15). DnaSP (16) was used for phasing, Fst calculation, calculation of population-wise genetic distances, and generation of input files for Network and Arlequin. MEGA X (17) was used to construct an Fst-based neighbour-joining tree. Nei's genetic distances and average pairwise differences were calculated with Arlequin 3.5 (18) and plotted with R v3.1 (19). Network v5 and Network Publisher (20) were used to draw the median-joining network, while the total and prevalent TMPRSS2 haplotypes for each population were calculated from the XML file generated by Arlequin 3.5 (18).
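The distance measures named above can be illustrated with a minimal Python sketch, assuming phased haplotypes are available as equal-length strings of alleles. It computes average pairwise differences within and between two populations, the corrected (net) between-population difference reported by Arlequin, and a Hudson-style Fst. The function names and toy haplotypes are hypothetical, and Arlequin and DnaSP implement their own estimators (DnaSP offers several Fst variants), so this is not the exact pipeline used here.

```python
from itertools import combinations, product


def n_diffs(a: str, b: str) -> int:
    """Count the sites at which two aligned haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def pi_within(haps: list[str]) -> float:
    """Mean number of pairwise differences among haplotypes of one population."""
    pairs = list(combinations(haps, 2))
    return sum(n_diffs(a, b) for a, b in pairs) / len(pairs)


def pi_between(haps_x: list[str], haps_y: list[str]) -> float:
    """Mean number of differences between haplotypes drawn from two populations."""
    total = sum(n_diffs(a, b) for a, b in product(haps_x, haps_y))
    return total / (len(haps_x) * len(haps_y))


def nei_corrected(haps_x: list[str], haps_y: list[str]) -> float:
    """Net divergence: between-population differences minus mean within-population diversity."""
    return pi_between(haps_x, haps_y) - (pi_within(haps_x) + pi_within(haps_y)) / 2


def hudson_fst(haps_x: list[str], haps_y: list[str]) -> float:
    """Hudson-style Fst = 1 - (mean within-population diversity) / (between-population diversity)."""
    h_within = (pi_within(haps_x) + pi_within(haps_y)) / 2
    return 1 - h_within / pi_between(haps_x, haps_y)


# Hypothetical phased haplotypes over four SNPs (not data from this study).
pop_a = ["AAGT", "AAGT", "AGGT"]
pop_b = ["GAGC", "GACC", "GAGC"]
print(round(nei_corrected(pop_a, pop_b), 3), round(hudson_fst(pop_a, pop_b), 3))  # 2.0 0.75
```

A matrix of such pairwise distances (or Fst values) is the kind of input that a neighbour-joining tree, as built here in MEGA X, is constructed from.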
For the association study, we searched the literature for TMPRSS2 variants reported in relation to COVID-19 susceptibility (4,21–41). Of these, five SNPs (rs2070788, rs734056, rs12329760, rs2276205, and rs3787950) were present in our data and were studied further in detail. Data from the Estonian Biocentre (42–45), phase 3 of the 1000 Genomes Project (46), and our newly genotyped samples from several Indian states were used to calculate the frequency of each of these SNPs among various Indian populations with PLINK 1.9. State-wise frequency maps of rs2070788 and of COVID-19 CFR in India were made with https://www.datawrapper.de/, and the worldwide spatial distribution of rs2070788 was generated with the PGG.SNV toolkit using 1000 Genomes samples (47). Regression plots of state-wise allele frequency vs. CFR were constructed using https://www.graphpad.com/quickcalcs/linear1/ and further validated with Microsoft Excel regression calculations. To verify our results, we also performed a two-tailed Pearson's correlation coefficient test (48) in SPSS (version 26), with a 95% confidence interval estimated from 1,000 bootstrap samples (seed 2,000,000). The LD map and the aggregate frequency of haplotypes carrying the rs2070788 G allele were calculated for each population with Haploview (49).
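The regression and correlation steps described above can be sketched in plain Python, assuming one (allele frequency, CFR) pair per state. This is a minimal illustration, not the GraphPad, Excel, or SPSS computation used in the study: the state-level values are hypothetical, the percentile bootstrap is an assumed choice (SPSS also offers other interval types), and although the seed 2,000,000 mirrors the text, Python's random number generator will not reproduce SPSS's resamples.

```python
import math
import random


def _centered_sums(x, y):
    """Means plus centred sums of squares (Sxx, Syy) and cross-products (Sxy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy


def linear_fit(x, y):
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, R^2)."""
    mx, my, sxx, syy, sxy = _centered_sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    _, _, sxx, syy, sxy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, n_boot=1000, seed=2_000_000, alpha=0.05):
    """Percentile bootstrap CI for r, resampling (x, y) pairs with replacement."""
    if len(set(x)) < 2 or len(set(y)) < 2:
        raise ValueError("x and y each need at least two distinct values")
    rng = random.Random(seed)
    n = len(x)
    rs = []
    while len(rs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # r is undefined for a resample without variation
        rs.append(pearson_r(xs, ys))
    rs.sort()
    return rs[round(n_boot * alpha / 2)], rs[round(n_boot * (1 - alpha / 2)) - 1]


# Hypothetical state-level values (G allele frequency, CFR in %), not the study's data.
freq = [0.20, 0.31, 0.35, 0.42, 0.47, 0.50]
cfr = [0.6, 1.1, 0.9, 1.5, 1.3, 1.8]
slope, intercept, r2 = linear_fit(freq, cfr)
r = pearson_r(freq, cfr)
low, high = bootstrap_ci(freq, cfr)
print(f"slope={slope:.2f}, R^2={r2:.3f}, r={r:.3f}, 95% CI=({low:.2f}, {high:.2f})")
```

The p-value of the two-tailed test is left to a statistics package here, because the Python standard library has no t-distribution.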
3. Results and Discussion
TMPRSS2 is a serine protease encoded in humans by the TMPRSS2 gene, located on chromosome 21q22.3 (50). It facilitates the entry of viruses such as the influenza virus and the human coronaviruses HCoV-229E, MERS-CoV, SARS-CoV, and SARS-CoV-2 into host cells by proteolytically cleaving and thereby activating the viral envelope glycoproteins (51); this step can be blocked by a TMPRSS2 inhibitor (1). Genetic variation in this gene may therefore account for differential vulnerability to COVID-19 among diverse populations, which we examined in the present study with a major focus on South Asia.
We analysed TMPRSS2 sequence data from world populations using a haplotype-based approach to compare the various groups. The Fst-based neighbour-joining (NJ) tree showed clustering of South Asians with the West Eurasian populations (Caucasus, West Asia, Europe, and Central Asia) (Figure 1A). Similarly, the average pairwise differences analysis showed smaller diversity and genetic distances among populations within East Eurasia and within West Eurasia, whereas greater diversity and genetic distances were observed between East and West Eurasian populations. The lowest diversity was found within the West Asian and American populations (Figure 1B). A median-joining (MJ) network analysis of the TMPRSS2 gene revealed 499 haplotypes across the examined populations, including five prevalent haplotypes (Hap_34, Hap_48, Hap_75, Hap_98, and Hap_260), each carried by at least 10 individuals. Haplotypes 48 and 75 were more common in Europe, while haplotypes 98 and 260 were more common in Siberia. Haplotype 34 was most frequent in Southeast Asia, followed by Central Asia (Supplementary Table 3A and Supplementary Figure 1). Altogether, South Asian populations carry 47 haplotypes, of which 6 (Hap_34, Hap_48, Hap_78, Hap_112, Hap_219, and Hap_260) are shared with other continental populations, while the rest are unique to South Asia. Among the shared haplotypes, five are shared with West Eurasian populations, whereas only one is shared with East Eurasian populations (Figure 1C and Supplementary Table 3B). The haplotype-sharing and Fst analyses are thus consistent with a West Eurasian affiliation of the majority of South Asian TMPRSS2 haplotypes (Figures 1A and 1C). Therefore, host susceptibility to SARS-CoV-2 with respect to TMPRSS2 among South Asians is expected to be more similar to that of West Eurasians than to that of East Eurasians. In contrast, our previous studies of the ACE2 gene showed a strong affinity of South Asian haplotypes with East Eurasians (11,12). Thus, for South Asians, ACE2 and TMPRSS2 show antagonistic patterns of genetic relatedness. We therefore propose that the susceptibility of South Asian populations to SARS-CoV-2 falls somewhere between that of West and East Eurasian populations, which may explain the moderate susceptibility observed in the region.
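The haplotype counts above (total haplotypes, prevalent haplotypes, and haplotypes shared with other regions or unique to South Asia) reduce to simple set and counter operations once each phased sequence has been assigned a haplotype ID. The sketch below assumes such (region, haplotype ID) pairs as input; the function names, region labels, and toy data are hypothetical, and the prevalence threshold is applied to haplotype copies, which is an assumption about how "at least 10 individuals" was counted.

```python
from collections import Counter, defaultdict


def summarize(samples, min_count=10):
    """samples: (region, haplotype_id) pairs, one per phased sequence.
    Returns the number of distinct haplotypes, the prevalent ones, and haplotype sets per region."""
    counts = Counter(hap for _, hap in samples)
    prevalent = sorted(hap for hap, c in counts.items() if c >= min_count)
    by_region = defaultdict(set)
    for region, hap in samples:
        by_region[region].add(hap)
    return len(counts), prevalent, dict(by_region)


def sharing(focal, by_region):
    """Haplotypes of `focal` shared with each other region, plus those unique to `focal`."""
    focal_haps = by_region[focal]
    shared_with = {r: focal_haps & haps for r, haps in by_region.items() if r != focal}
    unique = focal_haps - set().union(*shared_with.values())
    return shared_with, unique


# Hypothetical toy input; min_count is lowered to 2 because the example is tiny.
samples = [
    ("South Asia", "Hap_A"), ("South Asia", "Hap_A"), ("South Asia", "Hap_B"),
    ("South Asia", "Hap_C"), ("Europe", "Hap_A"), ("Europe", "Hap_A"),
    ("Europe", "Hap_D"), ("East Asia", "Hap_B"), ("East Asia", "Hap_E"),
]
total, prevalent, by_region = summarize(samples, min_count=2)
shared_with, unique = sharing("South Asia", by_region)
print(total, prevalent, shared_with, unique)
```

Grouping the `shared_with` sets by West versus East Eurasian regions gives the kind of tally reported above (five haplotypes shared with West Eurasians, one with East Eurasians).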
To our knowledge, no association study has so far examined TMPRSS2 variants in relation to COVID-19 in Indian populations. We therefore calculated group-wise allele frequencies in Indian populations for all five SNPs (rs2070788, rs734056, rs12329760, rs2276205, and rs3787950) observed in our data and carried out a linear regression of COVID-19 CFR against the spatial frequency of each SNP across Indian states (Supplementary Tables 4A, 4B, and 5). The regression analysis showed a significant positive correlation between the frequency of the rs2070788 G allele and the CFR (p < 0.05): higher CFR was observed where the allele frequency was higher, and vice versa (Figures 2A and 2B). The goodness of fit (R² = 0.3382) indicates that allele frequency explains 33.82% of the variation in CFR (Figure 2C). Because the pandemic is ongoing and the numbers of infected and deceased patients keep changing, we confirmed our findings at different time points (the latest up to August 2021); the recent data support the earlier observation, with no substantial difference between the outcomes. To further validate our results, we performed a Pearson correlation coefficient test, which showed a significant positive correlation (r = 0.582, p = 0.029), supporting the positive association observed in the regression analysis (Table 1).
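For a linear regression with a single predictor, the coefficient of determination equals the square of Pearson's r, so the two statistics reported above can be cross-checked, assuming both were computed on the same state-level data. The p-value of Pearson's test follows from a t statistic with n − 2 degrees of freedom, where n is the number of states; n is not restated in this section, so no value is substituted for it.

```latex
\begin{aligned}
R^2 &= r^2 && \text{(simple linear regression, one predictor)}\\
0.582^2 &\approx 0.3387, \qquad \sqrt{0.3382} \approx 0.58155 \approx 0.582\\
t &= \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad \mathrm{df} = n-2
\end{aligned}
```

The reported R² of 33.82% and r = 0.582 therefore agree up to rounding.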
TMPRSS2 expression in the lung has been reported to be higher in individuals with the rs2070788 GG genotype than in those with the AA and AG genotypes (52); thus, the G allele may contribute to more severe consequences of SARS-CoV-2 infection in populations where it is frequent. We found that the G allele frequency in India ranges from 20% to 50%, with a mean of 39%, the lowest being in Arunachal Pradesh and the highest in Bihar. This agrees with the epidemiological data, in which Arunachal Pradesh is among the states with the lowest CFR, while Bihar and several other states are among those with a higher CFR (Supplementary Tables 4A and 4B). This may help explain the disparity in pandemic severity among Indian states (Figure 2B).
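The state-wise G allele frequencies summarized above follow from genotype counts as (2 × GG + AG) / (2N). A minimal sketch is given below, with hypothetical counts and state names. It also assumes PLINK 1.9 `--freq` output, whose A1 column is by default the minor allele, so the G frequency has to be taken as 1 − A1 frequency whenever A1 is not G.

```python
from statistics import mean


def g_allele_frequency(n_gg: int, n_ag: int, n_aa: int) -> float:
    """G frequency at a biallelic A/G SNP from genotype counts: (2*GG + AG) / (2*N)."""
    n = n_gg + n_ag + n_aa
    if n == 0:
        raise ValueError("no genotyped individuals")
    return (2 * n_gg + n_ag) / (2 * n)


def target_freq_from_frq(a1: str, a1_freq: float, target: str = "G") -> float:
    """Convert a PLINK --freq row (A1 allele, A1 frequency) to the target allele's frequency.
    Assumes a biallelic SNP whose alleles are `target` and one other allele."""
    return a1_freq if a1 == target else 1 - a1_freq


def summarize_states(freqs: dict[str, float]) -> tuple[str, str, float]:
    """Return (lowest-frequency state, highest-frequency state, mean frequency)."""
    return min(freqs, key=freqs.get), max(freqs, key=freqs.get), mean(freqs.values())


# Hypothetical counts and state names, for illustration only.
state_freqs = {
    "State_X": g_allele_frequency(5, 10, 35),   # 0.20
    "State_Y": g_allele_frequency(25, 50, 25),  # 0.50
    "State_Z": target_freq_from_frq("A", 0.6),  # A1 is A, so G = 1 - 0.6
}
print(summarize_states(state_freqs))
```

Note that the mean here is an unweighted mean over states; a mean weighted by sample size would generally differ.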
As an androgen-responsive gene, TMPRSS2 is known to mediate sex-related effects, and the rs2070788 SNP appears to play an important role in them (53). Higher TMPRSS2 expression in males might make them more prone to viral fusion and could explain the higher COVID-19 mortality in males (54,55).
For linkage disequilibrium (LD) analysis, LD plots were made for each population, focusing on rs2070788 and nearby SNPs on the same haplotype. LD blocks of various sizes were observed among Central Asians, Caucasians, Europeans, South Asians, Siberians, and West Asians, and the highest level of LD was found in Americans (Supplementary Figure 2). We also calculated, for each population, the aggregate frequency of the haplotypes in LD that carry the rs2070788 G allele (Supplementary Table 6). This haplotype frequency varied considerably among populations, being highest in America (0.654) and lowest in Southeast Asia Island (0.322). These findings are consistent with the epidemiological data on COVID-19, which show that the American population has among the highest numbers of cases and deaths, while Southeast Asian populations rank much lower. We also examined the worldwide distribution of the rs2070788 G allele using 1000 Genomes data (Supplementary Table 7 and Supplementary Figure 3) and, consistent with the previous observation, found that its frequency was highest in Americans (0.49) and lowest in Africans (0.27) and East Asians (0.36). This may explain the high fatality among American populations, while African and East Asian populations have been less affected. The low severity among East Asians could also be due to adaptation at many genes that interact with coronaviruses, including SARS-CoV-2, which began about 25,000 years ago in response to a coronavirus or related viral epidemic in East Asia at that time (56).
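The LD measures shown in Haploview plots, Lewontin's D′ and r², can be computed from two-locus haplotype frequencies. The sketch below is a simplification that assumes already phased chromosomes; Haploview itself estimates haplotype frequencies from genotype data, so its values need not match this sketch exactly. The chromosomes are hypothetical, with locus 1 standing for rs2070788 (G/A) and locus 2 for an unnamed nearby SNP (C/T).

```python
def ld_stats(haplotypes: list[str], allele1: str, allele2: str) -> tuple[float, float, float]:
    """D, |D'| and r^2 between allele1 at locus 1 and allele2 at locus 2.
    `haplotypes` holds one two-character string per phased chromosome."""
    n = len(haplotypes)
    p_a = sum(h[0] == allele1 for h in haplotypes) / n
    p_b = sum(h[1] == allele2 for h in haplotypes) / n
    p_ab = sum(h[0] == allele1 and h[1] == allele2 for h in haplotypes) / n
    het = p_a * (1 - p_a) * p_b * (1 - p_b)
    if het == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D (Lewontin's normalisation).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, abs(d) / d_max, d * d / het


# Hypothetical chromosomes: locus 1 = rs2070788 (G/A), locus 2 = a nearby SNP (C/T).
chromosomes = ["GC"] * 6 + ["GT"] * 1 + ["AT"] * 5
d, d_prime, r2 = ld_stats(chromosomes, "G", "C")
# Aggregate frequency of chromosomes whose haplotype carries the G allele at locus 1.
g_carrying = sum(h[0] == "G" for h in chromosomes) / len(chromosomes)
print(f"D={d:.3f}, D'={d_prime:.3f}, r^2={r2:.3f}, G-carrying={g_carrying:.3f}")
```

In this toy example D′ is 1 while r² is only about 0.71, which is why LD plots based on the two measures can look different.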
4. Conclusion
In conclusion, we show for the first time a closer affinity of South Asians with West Eurasian populations for the TMPRSS2 gene. Hence, host disease susceptibility in the context of TMPRSS2 is more likely to be similar to that of West Eurasian populations. This contrasts with our prior study of the ACE2 gene, which showed a closer genetic affinity of South Asian haplotypes with East Eurasians. Thus, for South Asians, ACE2 and TMPRSS2 have an antagonistic genetic relationship. We therefore propose that the susceptibility of South Asian populations to SARS-CoV-2 falls somewhere between that of West and East Eurasian populations, which may explain the moderate susceptibility observed. We also found a genetic association between rs2070788 and CFR among various Indian populations. This variant could serve as a genetic biomarker to predict susceptible populations, which may be useful for policymaking and better resource allocation during the pandemic.
Author Contributions
GC and RKP conceived and designed this study. RKP, AS, and PPS analysed the data. RKP, AS, PPS, and GC wrote the manuscript. All authors contributed to the article and approved the submitted version.
Acknowledgments
This work was supported by the Faculty IOE grant, BHU (6031). RKP is supported by a UGC Non-NET fellowship, AS by a UGC-CAS fellowship, and PPS by a CSIR fellowship.
Funding
This research did not receive any specific grant from funding agencies in the public,
commercial, or not-for-profit sectors.
Data Availability Statement
All datasets generated for this study are included in the article/Supplementary Material.
References
1. Hoffmann M, Kleine-Weber H, Schroeder S, Krüger N, Herrler T, Erichsen S, et al. SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell. 2020 Apr 16;181(2):271-280.e8.
2. Zhou P, Yang X-L, Wang X-G, Hu B, Zhang L, Zhang W, et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020 Mar;579(7798):270–3.
3. Shen LW, Mao HJ, Wu YL, Tanaka Y, Zhang W. TMPRSS2: A potential target for treatment of influenza virus and coronavirus infections. Biochimie. 2017 Nov 1;142:1–10.
4. Mollica V, Rizzo A, Massari F. The pivotal role of TMPRSS2 in coronavirus disease 2019 and prostate cancer. Future Oncol. 2020 Sep 1;16(27):2029–33.
5. Vaarala MH, Porvari KS, Kellokumpu S, Kyllönen AP, Vihko PT. Expression of transmembrane serine protease TMPRSS2 in mouse and human tissues. J Pathol. 2001;193(1):134–40.
6. Webb Hooper M, Nápoles AM, Pérez-Stable EJ. COVID-19 and Racial/Ethnic Disparities. JAMA. 2020 Jun 23;323(24):2466–7.
7. Ejaz H, Alsrhani A, Zafar A, Javed H, Junaid K, Abdalla AE, et al. COVID-19 and comorbidities: Deleterious impact on infected patients. J Infect Public Health. 2020 Dec 1;13(12):1833–9.
8. Sanyaolu A, Okorie C, Marinkovic A, Patidar R, Younis K, Desai P, et al. Comorbidity and its Impact on Patients with COVID-19. SN Compr Clin Med. 2020 Aug 1;2(8):1069–76.
9. Muschitz C, Trummert A, Berent T, Laimer N, Knoblich L, Bodlaj G, et al. Attenuation of COVID-19-induced cytokine storm in a young male patient with severe respiratory and neurological symptoms. Wien Klin Wochenschr [Internet]. 2021 Apr 27 [cited 2021 Aug 28]; Available from: https://doi.org/10.1007/s00508-021-01867-2
10. SeyedAlinaghi S, Mehrtak M, MohsseniPour M, Mirzapour P, Barzegary A, Habibi P, et al. Genetic susceptibility of COVID-19: a systematic review of current evidence. Eur J Med Res. 2021 May 20;26(1):46.
11. Srivastava A, Bandopadhyay A, Das D, Pandey RK, Singh V, Khanam N, et al. Genetic Association of ACE2 rs2285666 Polymorphism With COVID-19 Spatial Distribution in India. Front Genet. 2020;11:1163.
12. Srivastava A, Pandey RK, Singh PP, Kumar P, Rasalkar AA, Tamang R, et al. Most frequent South Asian haplotypes of ACE2 share identity by descent with East Eurasian populations. PLOS ONE. 2020 Sep 16;15(9):e0238255.
13. Pagani L, Lawson DJ, Jagoda E, Mörseburg A, Eriksson A, Mitt M, et al. Genomic analyses inform on migration events during the peopling of Eurasia. Nature. 2016 Oct;538(7624):238–42.
14. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. Am J Hum Genet. 2007 Sep 1;81(3):559–75.
15. Sander N, Abel GJ, Bauer R, Schmidt J. Visualising migration flow data with circular plots [Internet]. Vienna Institute of Demography Working Papers; 2014 [cited 2021 Aug 28]. Report No.: 2/2014. Available from: https://www.econstor.eu/handle/10419/97018
16. Rozas J, Ferrer-Mata A, Sánchez-DelBarrio JC, Guirao-Rico S, Librado P, Ramos-Onsins SE, et al. DnaSP 6: DNA Sequence Polymorphism Analysis of Large Data Sets. Mol Biol Evol. 2017 Dec 1;34(12):3299–302.
17. Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Mol Biol Evol. 2018 Jun;35(6):1547–9.
18. Excoffier L, Lischer HEL. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour. 2010;10(3):564–7.
19. R: The R Project for Statistical Computing [Internet]. [cited 2021 Aug 28]. Available from: https://www.r-project.org/
20. Bandelt HJ, Forster P, Röhl A. Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol. 1999 Jan 1;16(1):37–48.
21. Andolfo I, Russo R, Lasorsa VA, Cantalupo S, Rosato BE, Bonfiglio F, et al. Common variants at 21q22.3 locus influence MX1 and TMPRSS2 gene expression and susceptibility to severe COVID-19. iScience. 2021 Apr 23;24(4):102322.
22. Asselta R, Paraboschi EM, Mantovani A, Duga S. ACE2 and TMPRSS2 variants and expression as candidates to sex and country differences in COVID-19 severity in Italy. Aging. 2020 Jun 5;12(11):10087–98.
23. Bhattacharyya C, Das C, Ghosh A, Singh AK, Mukherjee S, Majumder PP, et al. Global Spread of SARS-CoV-2 Subtype with Spike Protein Mutation D614G is Shaped by Human Genomic Variations that Regulate Expression of TMPRSS2 and MX1 Genes [Internet]. 2020 May [cited 2021 Aug 28] p. 2020.05.04.075911. Available from: https://www.biorxiv.org/content/10.1101/2020.05.04.075911v1
24. Darbani B. The Expression and Polymorphism of Entry Machinery for COVID-19 in Human: Juxtaposing Population Groups, Gender, and Different Tissues. Int J Environ Res Public Health. 2020 Jan;17(10):3433.
25. Hou Y, Zhao J, Martin W, Kallianpur A, Chung MK, Jehi L, et al. New insights into genetic susceptibility of COVID-19: an ACE2 and TMPRSS2 polymorphism analysis. BMC Med. 2020 Jul 15;18(1):216.
26. Irham LM, Chou W-H, Calkins MJ, Adikusuma W, Hsieh S-L, Chang W-C. Genetic variants that influence SARS-CoV-2 receptor TMPRSS2 expression among population cohorts from multiple continents. Biochem Biophys Res Commun. 2020 Aug 20;529(2):263–9.
27. Iyer GR, Samajder S, Zubeda S, S DSN, Mali V, PV SK, et al. Infectivity and Progression of COVID-19 Based on Selected Host Candidate Gene Variants. Front Genet. 2020;11:861.
28. Jeon S, Blazyte A, Yoon C, Ryu H, Jeon Y, Bhak Y, et al. Ethnicity-dependent allele frequencies are correlated with COVID-19 case fatality rate [Internet]. Preprints; 2020 Oct [cited 2021 Aug 28]. Available from: https://www.authorea.com/users/367817/articles/487091-ethnicity-dependent-allele-frequencies-are-correlated-with-covid-19-case-fatalityrate?commit=92f9ba974af4c5e0ff312d7dd9994aa1b1589975
29. Kim Y-C, Jeong B-H. Strong Correlation between the Case Fatality Rate of COVID-19 and the rs6598045 Single Nucleotide Polymorphism (SNP) of the Interferon-Induced Transmembrane Protein 3 (IFITM3) Gene at the Population-Level. Genes. 2021 Jan;12(1):42.
30. Latini A, Agolini E, Novelli A, Borgiani P, Giannini R, Gravina P, et al. COVID-19 and Genetic Variants of Protein Involved in the SARS-CoV-2 Entry into the Host Cells. Genes. 2020 Sep;11(9):1010.
31. Paniri A, Hosseini MM, Akhavan-Niaki H. First comprehensive computational analysis of functional consequences of TMPRSS2 SNPs in susceptibility to SARS-CoV-2 among different populations. J Biomol Struct Dyn. 2021 Jul 3;39(10):3576–93.
32. Piva F, Sabanovic B, Cecati M, Giulietti M. Expression and co-expression analyses of TMPRSS2, a key element in COVID-19. Eur J Clin Microbiol Infect Dis. 2021 Feb 1;40(2):451–5.
33. Ragia G, Manolopoulos VG. Assessing COVID-19 susceptibility through analysis of the genetic and epigenetic diversity of ACE2-mediated SARS-CoV-2 entry. Pharmacogenomics. 2020 Dec 1;21(18):1311–29.
34. Senapati S, Kumar S, Singh AK, Banerjee P, Bhagavatula S. Assessment of risk conferred by coding and regulatory variations of TMPRSS2 and CD26 in susceptibility to SARS-CoV-2 infection in human. J Genet. 2020;99:53.
35. Sharma S, Singh I, Haider S, Malik MZ, Ponnusamy K, Rai E. ACE2 Homo-dimerization, Human Genomic variants and Interaction of Host Proteins Explain High Population Specific Differences in Outcomes of COVID19 [Internet]. 2020 Apr [cited 2021 Aug 28] p. 2020.04.24.050534. Available from: https://www.biorxiv.org/content/10.1101/2020.04.24.050534v1
36. Singh H, Choudhari R, Nema V, Khan AA. ACE2 and TMPRSS2 polymorphisms in various diseases with special reference to its impact on COVID-19 disease. Microb Pathog. 2021 Jan 1;150:104621.
37. Strope JD, PharmD CHC, Figg WD. TMPRSS2: Potential Biomarker for COVID‐19 Outcomes. J Clin Pharmacol. 2020 May 21;10.1002/jcph.1641.
38. Torre-Fuentes L, Matías-Guiu J, Hernández-Lorenzo L, Montero-Escribano P, Pytel V, Porta-Etessam J, et al. ACE2, TMPRSS2, and Furin variants and SARS-CoV-2 infection in Madrid, Spain. J Med Virol. 2021 Feb;93(2):863–9.
39. Vargas-Alarcón G, Posadas-Sánchez R, Ramírez-Bello J. Variability in genes related to SARS-CoV-2 entry into host cells (ACE2, TMPRSS2, TMPRSS11A, ELANE, and CTSL) and its potential use in association studies. Life Sci. 2020 Nov 1;260:118313.
40. Wang F, Huang S, Gao R, Zhou Y, Lai C, Li Z, et al. Initial whole-genome sequencing and analysis of the host genetic contribution to COVID-19 severity and susceptibility. Cell Discov. 2020 Nov 10;6(1):1–16.
41. Wulandari L, Hamidah B, Pakpahan C, Damayanti NS, Kurniati ND, Adiatmaja CO, et al. Initial study on TMPRSS2 p.Val160Met genetic variant in COVID-19 patients. Hum Genomics. 2021 May 17;15(1):29.
324
325
326
42.
Chaubey G, Ayub Q, Rai N, Prakash S, Mushrif-Tripathy V, Mezzavilla M, et al. “Like sugar in
milk”: reconstructing the genetic history of the Parsi population. Genome Biol. 2017 Jun
14;18(1):110.
327
328
329
43.
Pathak AK, Kadian A, Kushniarevich A, Montinaro F, Mondal M, Ongaro L, et al. The Genetic
Ancestry of Modern Indus Valley Populations from Northwest India. Am J Hum Genet. 2018
Dec 6;103(6):918–29.
330
331
44. Re3data.Org. Estonian Biocentre Public Data. 2014 [cited 2021 Aug 28]; Available from: http://service.re3data.org/repository/r3d100010986
45. Tätte K, Pagani L, Pathak AK, Kõks S, Ho Duy B, Ho XD, et al. The genetic legacy of continental scale admixture in Indian Austroasiatic speakers. Sci Rep. 2019 Mar 7;9:3818.
46. Durbin RM, Altshuler D, Abecasis GR, Bentley DR, Chakravarti A, et al. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct;467(7319):1061–73.
47. Zhang C, Gao Y, Ning Z, Lu Y, Zhang X, Liu J, et al. PGG.SNV: understanding the evolutionary and medical implications of human single nucleotide variations in diverse populations. Genome Biol. 2019 Oct 22;20(1):215.
48. Benesty J, Chen J, Huang Y, Cohen I. Pearson Correlation Coefficient. In: Cohen I, Huang Y, Chen J, Benesty J, editors. Noise Reduction in Speech Processing [Internet]. Berlin, Heidelberg: Springer; 2009 [cited 2021 Aug 28]. p. 1–4. (Springer Topics in Signal Processing). Available from: https://doi.org/10.1007/978-3-642-00296-0_5
49. Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005 Jan 15;21(2):263–5.
50. Paoloni-Giacobino A, Chen H, Peitsch MC, Rossier C, Antonarakis SE. Cloning of the TMPRSS2 Gene, Which Encodes a Novel Serine Protease with Transmembrane, LDLRA, and SRCR Domains and Maps to 21q22.3. Genomics. 1997 Sep 15;44(3):309–20.
51. Huggins DJ. Structural analysis of experimental drugs binding to the SARS-CoV-2 target TMPRSS2. J Mol Graph Model. 2020 Nov 1;100:107710.
52. Cheng Z, Zhou J, To KK-W, Chu H, Li C, Wang D, et al. Identification of TMPRSS2 as a Susceptibility Gene for Severe 2009 Pandemic A(H1N1) Influenza and A(H7N9) Influenza. J Infect Dis. 2015 Oct 15;212(8):1214–21.
53. Alshahawey M, Raslan M, Sabri N. Sex-mediated effects of ACE2 and TMPRSS2 on the incidence and severity of COVID-19; The need for genetic implementation. Curr Res Transl Med. 2020 Nov;68(4):149–50.
54. Lamy P-J, Rébillard X, Vacherot F, de la Taille A. Androgenic hormones and the excess male mortality observed in COVID-19 patients: new convergent data. World J Urol. 2020 Jun 2;1–3.
55. Peckham H, de Gruijter NM, Raine C, Radziszewska A, Ciurtin C, Wedderburn LR, et al. Male sex identified by global COVID-19 meta-analysis as a risk factor for death and ITU admission. Nat Commun. 2020 Dec 9;11(1):6317.
56. Souilmi Y, Lauterbur ME, Tobler R, Huber CD, Johar AS, Moradi SV, et al. An ancient viral epidemic involving host coronavirus interacting genes more than 20,000 years ago in East Asia. Curr Biol. 2021 Aug 23;31(16):3504–3514.e9.

Figure legends

Figure 1 (A) Neighbour-Joining (NJ) tree based on FST distances, showing the genetic relationships among the studied populations for the TMPRSS2 gene. (B) Matrix of average pairwise differences for the TMPRSS2 gene: between populations in the upper triangle (green), within populations along the diagonal (orange), and Nei's distance between populations in the lower triangle (blue). Colour intensity is proportional to the value shown. (C) Stacked bar plot of the 47 TMPRSS2 haplotypes observed among South Asian populations; differently coloured bars indicate the frequency of each haplotype in South Asia and its sharing with other geographic regions.

Figure 2 (A) Frequency map (%) showing the spatial distribution of allele rs2258666 among Indian populations. Grey marks the absence of data. (B) Map of the state-wise case fatality rate (CFR, %), updated to 30 August 2021. (C) Linear regression of CFR against allele frequency, showing the goodness of fit and the Pearson correlation coefficient.

Supplementary Figure 1 Median-joining network of the TMPRSS2 gene. Circle size is proportional to the number of samples carrying a given haplotype. The five most common haplotypes are marked.

Supplementary Figure 2 Linkage disequilibrium (LD) maps of the TMPRSS2 gene in world populations, focusing on rs2070788 and its haplotype. Shading from white to red indicates r² from 0 to 1; strong LD (r² > 0.80, displayed as values above 80) is shown as darker red squares.

Supplementary Figure 3 Spatial distribution of SNP rs2070788 in the 1000 Genomes Project data.
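The haplotype summaries behind Figure 1(B) and Supplementary Figure 2 can be illustrated with a short sketch. It assumes phased haplotypes coded as equal-length strings of allele codes, and it assumes that the lower-triangle "Nei's distance" is the corrected average pairwise difference PiXY − (PiX + PiY)/2, as reported by population-genetics software such as Arlequin; the software and settings actually used are not stated here, and all function names and toy haplotypes below are hypothetical. LD is computed as r² = D² / (pA(1 − pA) pB(1 − pB)), the measure shaded in Supplementary Figure 2.

```python
from itertools import combinations, product

# Hypothetical helpers; haplotypes are equal-length strings of allele codes.
def hamming(a: str, b: str) -> int:
    """Number of sites at which two haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))

def mean_within(pop: list[str]) -> float:
    """PiX: mean number of pairwise differences among haplotypes of one population."""
    pairs = list(combinations(pop, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)

def mean_between(pop_x: list[str], pop_y: list[str]) -> float:
    """PiXY: mean number of differences between haplotypes drawn from two populations."""
    pairs = list(product(pop_x, pop_y))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)

def corrected_distance(pop_x: list[str], pop_y: list[str]) -> float:
    """Assumed lower-triangle value: PiXY - (PiX + PiY) / 2."""
    return mean_between(pop_x, pop_y) - (mean_within(pop_x) + mean_within(pop_y)) / 2

def ld_r2(haps: list[str], i: int, j: int) -> float:
    """LD r^2 between biallelic sites i and j from phased haplotypes.

    The allele carried by the first haplotype is used as the reference at each
    site; r^2 does not depend on which allele is chosen.
    """
    n = len(haps)
    a, b = haps[0][i], haps[0][j]
    p_a = sum(h[i] == a for h in haps) / n
    p_b = sum(h[j] == b for h in haps) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haps) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0  # monomorphic site: report 0

# Toy haplotypes (4 sites), purely illustrative.
pop_x = ["0011", "0011", "0111"]
pop_y = ["1100", "1100", "1000"]
print(mean_within(pop_x), mean_between(pop_x, pop_y), corrected_distance(pop_x, pop_y))
print(ld_r2(pop_x + pop_y, 0, 1), ld_r2(pop_x + pop_y, 2, 3))
```

With these toy haplotypes the last two sites always co-occur and are in complete LD (r² = 1), whereas the first two sites are only weakly associated (r² = 1/9). Subtracting the mean within-population diversity from PiXY removes the part of the between-population difference that is simply due to diversity inside each population.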
TABLE 1 | Statistical significance of the association between rs2070788 and COVID-19 case fatality rate (CFR) at different time points of the pandemic in India.

Observation (rs2070788) | Linear regression R² | Linear regression p-value | Pearson's r | Pearson's p-value
June 2021 CFR | 0.3382 | 0.0292 | 0.582 | 0.029
July 2021 CFR | 0.3097 | 0.0387 | 0.557 | 0.039
August 2021 CFR | 0.2888 | 0.0475 | 0.537 | 0.047