In silico Analysis of Single Nucleotide Polymorphism (SNPs) in Human RAG1 and RAG2 Genes of Severe Combined Immunodeficiency
- 1. Department of Rheumatology, Omdurman Teaching Hospital, Sudan
- 2. Department of Applied Bioinformatics, Africa City of Technology, Sudan
Abstract
Severe combined immunodeficiency (SCID) is an inherited primary immunodeficiency (PID) characterized by the absence or dysfunction of T lymphocytes. Defects in RAG1 and RAG2 are known to cause a T−B−NK+ form of SCID. The recombinase activating genes RAG1 and RAG2 (OMIM 179615 and 179616, respectively) are expressed exclusively in lymphocytes and mediate the creation of double-strand DNA breaks at the sites of recombination and in signal sequences during T- and B-cell receptor gene rearrangement. This study focused on the effect of nonsynonymous single nucleotide polymorphisms (nsSNPs) on the function and structure of RAG1 and RAG2 using in silico analysis. Only nsSNPs and 3′UTR SNPs were selected for computational analysis. Deleterious nsSNPs were predicted with bioinformatics software. Five damaging nsSNPs (rs112047157, rs61758790, rs4151032, rs61752933, rs75591129) were predicted in RAG1 and two damaging nsSNPs (rs112927992, rs17852002) in RAG2. All of these nsSNPs lie in domains that are important for binding, so the mutations may affect protein function. This is the first study of its type on RAG1 and RAG2. We hope to provide information that helps researchers carry out further studies of SCID, especially in our country, where consanguineous marriage is common.
Citation
Ali MSAS, Tomador Siddig MZ, Elhadi RA, Yousof MR, Yousif Abdallah SE, et al. (2016) In silico Analysis of Single Nucleotide Polymorphism (SNPs) in Human RAG1 and RAG2 Genes of Severe Combined Immunodeficiency. J Bioinform, Genomics, Proteomics 1(1): 1005.
Keywords
• Severe combined immunodeficiency
• Primary immunodeficiency
• T lymphocytes
• Recombinase activating genes
• Nonsynonymous single nucleotide polymorphisms
ABBREVIATIONS
SCID: Severe Combined Immunodeficiency; PID: Primary Immunodeficiency; OMIM: Online Mendelian Inheritance in Man; nsSNP: nonsynonymous Single Nucleotide Polymorphism; RAG1: Recombinase Activating Gene 1; RAG2: Recombinase Activating Gene 2; AR: Autosomal Recessive; NK: Natural Killer; DNA: Deoxyribonucleic Acid; SIFT: Sorting Intolerant from Tolerant; PolymiRTS: Polymorphism in microRNAs and their Target Sites; PolyPhen-2: Polymorphism Phenotyping v2; PSIC: Position-Specific Independent Count; miRNA: Micro Ribonucleic Acid; 3′UTR: 3′ Untranslated Region; RI: Reliability Index; RSSs: Recombination Signal Sequences; GO: Gene Ontology
INTRODUCTION
SCID is an inherited primary immunodeficiency characterized by the absence or dysfunction of T lymphocytes, affecting both cellular and humoral adaptive immunity [1-5]. Its incidence is estimated at 1 in 75,000-100,000 live births [8-11], and it is more common in male subjects, reflecting the overrepresentation of X-linked SCID (XL-SCID), the most common form of SCID worldwide (50%) [10]. However, in cultures in which consanguineous marriage is common, the incidence of autosomal recessive SCID (AR-SCID) is higher than previously reported [10]. SCID can be classified as T−B+ or T−B−, with further subdivision based on the presence or absence of NK cells [5]. Defects in the recombinase activating genes (RAG1 and RAG2) are known to cause a T−B−NK+ form of AR-SCID (OMIM: 601457) [12,13]. It is now known that SCID in humans can be caused by mutations in at least 13 different genes, resulting in aberrant T cell development [7]. Since the first description of RAG1 and RAG2 deficiency in patients with SCID by Schwarz et al. in 1996 [14], a pleiotropic spectrum of phenotypes associated with RAG1 and RAG2 deficiency has been described. RAG1 and RAG2 are located on chromosome 11p13 [15,16]. Both genes are expressed exclusively in lymphocytes and mediate the creation of double-strand DNA breaks at the sites of recombination and in signal sequences during T- and B-cell receptor gene rearrangement [10].
V(D)J recombination is the site-specific DNA rearrangement process that assembles both B cell receptor and T cell receptor (TCR) genes during lymphoid development. Recombination is initiated by the lymphoid-specific RAG1 and RAG2 recombinase, which introduces double-strand DNA breaks at RSSs flanking variable (V), diversity (D), and joining (J) gene segments spread along the immunoglobulin and TCR loci [17-20]. The recombination process is tightly regulated, occurring at specific stages of development and in specific cell types (e.g., immunoglobulin and TCR genes are rearranged in B and T cells, respectively). It takes place in a temporal order, with immunoglobulin heavy chain rearrangements preceding immunoglobulin light chain rearrangements and D-to-J rearrangements preceding V-to-DJ rearrangements [17-20]. Mutations in either RAG1 or RAG2 hamper the initiation of V(D)J recombination, causing an early block of B and T cell maturation similar to that seen in RAG1 and RAG2 knockout (KO) mice [21].
In this computational study, we focused on the effect of nsSNPs on the function and structure of the RAG1 and RAG2 proteins using in silico analysis. This is the first study of its type on RAG1 and RAG2, and we hope to provide information that helps researchers carry out further studies of SCID, especially in our country, where consanguineous marriage is common.
MATERIALS AND METHODS
SNP data for the RAG1 and RAG2 genes were collected in August 2015 from the NCBI dbSNP database (http://www.ncbi.nlm.nih.gov/projects/SNP). At the time of the study there were 5413 SNPs in RAG1 and 2804 SNPs in RAG2, of which 1048 in RAG1 and 508 in RAG2 were coding SNPs, 147 in RAG1 and 28 in RAG2 occurred in the 3′UTR, eight in RAG1 and ten in RAG2 occurred in the 5′UTR, and 234 in RAG1 and 178 in RAG2 occurred in intronic regions. We selected missense and nonsense nsSNPs and 3′UTR SNPs for our investigation (Figure 1). The nsSNPs (rs IDs) of RAG1 and RAG2 were submitted as a batch to the SIFT server, and the resulting damaging nsSNPs were then submitted to PolyPhen with the query sequences in FASTA format. The change in stability due to mutation was predicted with I-Mutant 2.0. The protein sequences were obtained from the ExPASy database (www.expasy.org/). Project HOPE was used to highlight the changes caused by the deleterious SNPs at the molecular level of the protein 3D structure. SNPs in the 3′UTR were analyzed with PolymiRTS. Predictions of gene function and gene interactions were obtained from the GeneMANIA database.
GeneMANIA (http://www.genemania.org/)
GeneMANIA is an online resource that predicts the function of genes and gene sets. It finds other genes that are related to a set of input genes, using a very large set of functional association data. Association data include protein and genetic interactions, pathways, co-expression, co-localization and protein domain similarity. GeneMANIA can be used to find new members of a pathway or complex, to find additional genes that may have been missed in a screen, or to find new genes with a specific function, such as protein kinases. The query is defined by the set of input genes [22].
SIFT-software
To detect deleterious nsSNPs we used SIFT, a bioinformatics tool that predicts whether an amino acid substitution affects protein function. The program generates alignments with a large number of homologous sequences and assigns each residue a score ranging from zero to one. Scores closer to zero indicate evolutionary conservation and intolerance to substitution, while scores closer to one indicate tolerance to substitution [23] (http://sift.jcvi.org/).
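The score interpretation described above can be sketched in a few lines of Python. This is only an illustration: the 0.05 cutoff is the threshold conventionally used with SIFT and is assumed here, since the text does not state which cutoff was applied, and the function name and example scores are hypothetical.

```python
# Minimal sketch: interpret a SIFT score (0 = intolerant, 1 = tolerant).
# SIFT_CUTOFF = 0.05 is the conventional threshold and is an assumption;
# the study does not state which cutoff was applied.
SIFT_CUTOFF = 0.05


def classify_sift(score: float, cutoff: float = SIFT_CUTOFF) -> str:
    """Return 'damaging' for scores at or below the cutoff, else 'tolerated'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"SIFT score must lie in [0, 1], got {score}")
    return "damaging" if score <= cutoff else "tolerated"


# Hypothetical scores for illustration only (not taken from Table 1).
example_scores = {"variant_a": 0.00, "variant_b": 0.01, "variant_c": 0.42}
sift_calls = {name: classify_sift(s) for name, s in example_scores.items()}
```

With these hypothetical inputs, the first two variants are called damaging and the third tolerated.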
PolyPhen-2
PolyPhen-2 is an online bioinformatics tool produced by Harvard University. It searches for 3D protein structures, then calculates PSIC scores for each of the two variants and the difference between these scores. PolyPhen results were classified as probably damaging (2.00 or more), possibly damaging (1.40-1.90), potentially damaging (1.0-1.5) or benign (0.00-0.90) [24] (http://genetics.bwh.harvard.edu/pph2/index.shtml). We used this software to confirm the SIFT results and took only double-positive results for further workup.
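The double-positive selection can be sketched as a simple set operation. The rs IDs and PolyPhen-2 calls below are those reported in Table 1 and in the Conclusion; the data layout and variable names are hypothetical, and SNPs missing from the PolyPhen-2 mapping are assumed not to have been called damaging by PolyPhen-2, as the Conclusion describes them as damaging by SIFT only.

```python
# Minimal sketch of the double-positive filter: keep only nsSNPs called
# damaging by SIFT AND possibly/probably damaging by PolyPhen-2.
# rs IDs come from this paper; the data layout is hypothetical.
sift_damaging = {
    # RAG1
    "rs112047157", "rs61758790", "rs4151032", "rs61752933", "rs75591129",
    "rs415107", "rs34841221", "rs4151029", "rs2227973", "rs4151034",
    # RAG2
    "rs112927992", "rs17852002", "rs117899975",
}

# PolyPhen-2 damaging calls as listed in Table 1.
polyphen_damaging = {
    "rs112047157": "Possibly Damaging",
    "rs61758790": "Probably Damaging",
    "rs4151032": "Probably Damaging",
    "rs61752933": "Possibly Damaging",
    "rs75591129": "Probably Damaging",
    "rs112927992": "Possibly Damaging",
    "rs17852002": "Probably Damaging",
}

# Double positives go forward; SIFT-only hits are set aside.
double_positive = sorted(sift_damaging & polyphen_damaging.keys())
sift_only = sorted(sift_damaging - polyphen_damaging.keys())
```

Here `double_positive` holds the seven nsSNPs taken forward for stability and structural analysis, and `sift_only` holds the six set aside.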
I-Mutant v2.0c
I-Mutant predicts the stability change upon mutation from the protein sequence or structure. Its output lists the amino acid in the wild-type protein (WT), the new amino acid after mutation (NAW), the reliability index (RI), the temperature in degrees Celsius (T) and the pH [25].
I-Mutant available at: (http:/www.I-Mutant2.0.cgi).
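As a rough illustration of how a predicted free-energy change maps onto the stability calls reported later in Table 1, the sketch below assumes the usual I-Mutant sign convention (DDG below zero means decreased stability). The DDG values are hypothetical placeholders, since the paper reports only the resulting calls, and the function name is invented for this example.

```python
# Minimal sketch: map a predicted free-energy change (DDG, kcal/mol) to a
# stability call. Assumed sign convention: DDG < 0 means the mutation
# decreases stability, DDG > 0 means it increases stability.
def stability_call(ddg: float) -> str:
    """Translate a DDG value into the wording used in Table 1."""
    if ddg < 0:
        return "Decrease stability"
    if ddg > 0:
        return "Increase stability"
    return "No change"


# Hypothetical DDG values for illustration only (not reported in the paper).
example_ddg = {"variant_a": -1.32, "variant_b": 0.27}
stability_calls = {name: stability_call(v) for name, v in example_ddg.items()}
```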
Chimera
Chimera is software produced by the University of California, San Francisco. It was used in this step to generate the mutated 3D protein models; the outcome is a graphic model depicting the mutation [26] (http://www.cgl.ucsf.edu/chimera/).
Project HOPE software (http://www.cmbi.ru.nl/hope/input)
Project HOPE is an online web server to which the user can submit a sequence and a mutation. The software collects structural information from a series of sources, including calculations on the 3D protein structure, sequence annotations in UniProt and predictions from DAS servers. It combines this information to analyze the effect of a given mutation on the protein structure and presents that effect in such a way that even those without a bioinformatics background can understand it [27].
PolymiRTS
PolymiRTS is a database server designed specifically for the analysis of the 3′UTR; we used it to identify SNPs that may alter miRNA target sites [28]. All SNPs located within the 3′UTR were selected separately and submitted to the program (available at: http://compbio.uthsc.edu/miRSNP/).
RESULTS AND DISCUSSION
Prediction of protein structural stability
Seven nsSNPs of the RAG1 and RAG2 genes were selected on the basis of the SIFT and PolyPhen prediction scores. These SNPs were submitted to the I-Mutant web server to predict the stability change (DDG) and RI upon mutation. In RAG1, three SNPs (rs112047157, rs4151032 and rs75591129) showed a decrease in protein stability, while the other two SNPs (rs61758790 and rs61752933) showed an increase in protein stability. Both SNPs in RAG2 were predicted to decrease protein stability (Table 1).
Modeling of mutant structure
The protein sequences carrying the nsSNPs were submitted to Project HOPE, which produced 3D structures of the mutated proteins with their new candidate residues and described the reactions and physicochemical properties of these candidates. Here we present the results for each candidate and discuss the conformational variations and interactions with the neighboring amino acids. All native and mutant structures of the RAG1 and RAG2 proteins are shown in Figure 2, with the wild type displayed in green and the mutant in red.
The A/G mutation (rs112047157) led to the conversion of methionine to valine at position 487. The mutant residue is smaller than the wild-type residue; this might lead to loss of interactions. The wild-type residue is located in an α-helix. The mutation converts the wild-type residue into a residue that does not prefer α-helices as secondary structure, which may disturb the local structure. The mutated residue is located in a domain that is important in DNA binding and nucleic acid binding. Mutation of the residue might disturb this function (Figure 2a).
The G/T mutation (rs61758790) caused the conversion of phenylalanine to leucine at position 520. The mutant residue is smaller than the wild-type residue; this might lead to loss of interactions. The wild-type residue is located in its preferred secondary structure, a β-strand, but the mutant residue prefers to be in another secondary structure; therefore the local conformation will be slightly destabilized. The mutated residue is located in a domain that is important in DNA binding and nucleic acid binding. Mutation of the residue might disturb this function (Figure 2b).
The C/T mutation (rs4151032) resulted in the change of proline to serine at position 525. Prolines are known to have a very rigid structure; changing a proline with such a function into another residue may disturb the local structure. This variant is annotated with severity: Polymorphism (VAR_029263). The mutant residue is smaller than the wild-type residue; this might lead to loss of interactions. The wild-type residue is more hydrophobic than the mutant residue, so hydrophobic interactions, either in the core of the protein or on the surface, will be lost. The mutated residue is located in a domain that is important for binding of other molecules. Mutation of the residue might disturb this function (Figure 2c).
The A/G mutation (rs61752933) caused the change of isoleucine to valine at position 810. The mutant residue is smaller than the wild-type residue; this might lead to loss of interactions. The mutated residue is located in a domain that is important in DNA binding and nucleic acid binding. Mutation of the residue might disturb this function (Figure 2d).
The A/C mutation (rs75591129) caused the conversion of tyrosine to serine at position 913. The wild-type residue, tyrosine, is located in its preferred secondary structure, a β-strand. The mutant residue prefers to be in another secondary structure; therefore the local conformation will be slightly destabilized. The mutated residue is located in a domain that is important in DNA binding and nucleic acid binding. Mutation of the residue might disturb this function (Figure 2e).
The C/T mutation (rs112927992) resulted in the change of serine to phenylalanine at position 291. This residue is part of an InterPro domain named V-D-J Recombination Activating Protein 2. The GO annotations of this domain indicate that it functions in DNA binding and nucleic acid binding. The residue is also part of an InterPro domain named Galactose Oxidase/Kelch, Beta-Propeller (IPR011043). The mutant residue is bigger than the wild-type residue; this might lead to bumps. The mutant residue is more hydrophobic than the wild-type residue; this can result in loss of hydrogen bonds and/or disturb correct folding. The mutated residue is located in a domain that is important for binding of other molecules. Mutation of the residue might disturb this function (Figure 2f).
The T/C mutation (rs17852002) led to the conversion of valine to alanine at position 154. The mutant residue is smaller than the wild-type residue; this might lead to loss of interactions. The wild-type residue is located in its preferred secondary structure, a β-strand. The mutant residue prefers to be in another secondary structure; therefore the local conformation will be slightly destabilized. The residue is part of an InterPro domain named V-D-J Recombination Activating Protein 2. The GO annotations of this domain indicate that it functions in DNA binding and nucleic acid binding. The residue is also part of the InterPro domains Galactose Oxidase/Kelch, Beta-Propeller (IPR011043) and Kelch-Type Beta Propeller (IPR015915); the latter domain is also important in protein binding (Figure 2g).
Both nsSNPs of RAG2 were found in the same domain, which is important for the function of the protein, so they may disturb that function. Mutations in RAG1 or RAG2 leave T and B cells unable to initiate recombination of the DNA variable, diversity, and joining regions, so functional T-cell or B-cell receptors are not formed [29].
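The size comparisons reported above can be cross-checked with a short sketch. It parses the one-letter substitution shorthand (for example M487V) and compares the molecular weights of the free amino acids as a rough proxy for residue size. That proxy is an assumption of this example and is not necessarily the measure used by Project HOPE. The function names are hypothetical, and the positions are those given in the running text.

```python
import re

# Average molecular weights (Da) of the free amino acids involved, used
# only as a rough proxy for residue size (an assumption of this sketch).
MOL_WEIGHT = {
    "A": 89.09, "V": 117.15, "L": 131.17, "I": 131.17, "M": 149.21,
    "F": 165.19, "P": 115.13, "S": 105.09, "Y": 181.19,
}

SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(code: str) -> tuple[str, int, str]:
    """Split shorthand such as 'M487V' into (wild type, position, mutant)."""
    match = SUBSTITUTION.match(code)
    if match is None:
        raise ValueError(f"Not a valid substitution: {code!r}")
    wild, position, mutant = match.groups()
    return wild, int(position), mutant


def size_change(code: str) -> str:
    """Say whether the mutant residue is smaller or bigger than the wild type."""
    wild, _, mutant = parse_substitution(code)
    diff = MOL_WEIGHT[mutant] - MOL_WEIGHT[wild]
    if diff < 0:
        return "smaller"
    if diff > 0:
        return "bigger"
    return "same"


# Substitutions with positions as given in the running text above.
rag1 = ["M487V", "F520L", "P525S", "I810V", "Y913S"]
rag2 = ["S291F", "V154A"]
size_calls = {code: size_change(code) for code in rag1 + rag2}
```

Under this proxy, every substitution except S291F yields a smaller mutant residue, in line with the descriptions above.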
SNPs at the 3′UTR region
SNPs in the 3′UTR of the RAG1 and RAG2 genes were submitted as a batch to the PolymiRTS server. In RAG1, 19 SNPs were predicted, while only three SNPs were predicted in RAG2. The functional classes for both genes are given in Table 2. According to Table 2, some SNPs in the 3′UTR of both RAG1 and RAG2 are related to cancer development. Although patients with SCID generally do not survive long enough to develop cancer [8-10], some patients with delayed-onset RAG1 deficiency do develop cancer [30].
RAG1 and RAG2 have many vital functions: they interact with, are co-expressed with, share protein domains with, or participate in shared functions with many other genes. These relationships were obtained using GeneMANIA and are shown in Figure 3.
Table 1: Prediction of nsSNPs in RAG1 and RAG2 by SIFT, PolyPhen-2 and I-Mutant software.
Gene | SNP ID | Chromosome Location | Nucleotide Change | SIFT Prediction | SIFT Score | SIFT Median | Accession | Amino Acid Change | PolyPhen-2 Result | I-Mutant Result |
RAG1 | rs112047157 | 11:36574763 | A/G | Damaging | 1.000.00 | 4.27 | P15918 | M487V | Possibly Damaging | Decrease stability |
RAG1 | rs61758790 | 11:36574864 | G/T | Damaging | 1.000.00 | 4.27 | P15918 | F520L | Probably Damaging | Increase stability |
RAG1 | rs4151032 | 11:36574877 | C/T | Damaging | 1.000.00 | 4.27 | P15918 | P525S | Probably Damaging | Decrease stability |
RAG1 | rs61752933 | 11:36575732 | A/G | Damaging | 1.000.01 | 4.27 | P15918 | I810V | Possibly Damaging | Increase stability |
RAG1 | rs75591129 | 11:36576096 | A/C | Damaging | 1.000.00 | 4.27 | P15918 | Y931S | Probably Damaging | Decrease stability |
RAG2 | rs112927992 | 11:36614847 | C/T | Damaging | 1.000.00 | 4.32 | P55895 | S291F | Possibly Damaging | Decrease stability |
RAG2 | rs17852002 | 11:36615258 | T/C | Damaging | 1.000.00 | 4.32 | P55895 | V154A | Probably Damaging | Decrease stability |
CONCLUSION
In RAG1 we found five nsSNPs, and in RAG2 two nsSNPs, that were predicted as damaging by both SIFT and PolyPhen, i.e. double-positive results. However, five nsSNPs in RAG1 (rs415107, rs34841221, rs4151029, rs2227973, rs4151034) and one nsSNP in RAG2 (rs117899975) were predicted as damaging by SIFT only, which may be due to the limitations of the software used. We recommend that these nsSNPs be further analyzed with more advanced software to predict their effect, as they are speculated to affect the stability or function of the proteins. From this study, we suggest that the seven double-positive nsSNPs are good candidates that could be useful in the detection of SCID associated with RAG1 and RAG2.
From the PolymiRTS results we noticed that, although the affected miRNA target sites are associated with many cancer types, cervical cancer was the most common. The application of computational tools may provide an alternative approach for selecting target SNPs in association studies, helping in both research and diagnosis.
Table 2A: 3′UTR SNPs of RAG1 as detected by PolymiRTS.
Location | dbSNP ID | miR ID | Cancer type | miR site | Function class | context+ score change |
36598009 | rs189589191 | hsa-miR-548 | Cervical cancer | tgagtTGGTTTTt | Disrupted | -0.102 |
36598009 | rs189589191 | hsa-miR-4637 | acute lymphoblastic leukemia | tgAGTTAGTtttt | Created | -0.176 |
36598259 | rs144069419 | hsa-miR-3191 | Melanoma | aCCAGAGAtgagc | Disrupted | -0.136 |
36598259 | rs144069419 | hsa-miR-330 | Cervical cancer | aCCAGAGAtgagc | Disrupted | -0.081 |
36598259 | rs144069419 | hsa-miR-3126 | Melanoma & breast cancer | aCCAGATAtgagc | Created | -0.089 |
36598426 | rs115582302 | hsa-miR-3646 | solid tumors | tatTTCATTTttg | Disrupted | -0.067 |
36598426 | rs115582302 | hsa-miR-548ad | malignant human B cells | tattTCGTTTTtg | Created | -0.187 |
36598725 | rs4151039 | hsa-miR-3137 | Melanoma | taGCTACAGttag | Disrupted | -0.224 |
36599069 | rs4151040 | hsa-miR-624 | colorectal cancer | ggataACCTTGTA | | -0.129 |
36599086 | rs112766186 | hsa-miR-4524 | malignant human B cells + breast cancer | tccatCTGCTAAg | Created | -0.034 |
36599087 | rs145963034 | hsa-miR-374 | cervical cancer | ccatccGCTAAGT | Disrupted | -0.037 |
36599090 | rs4151041 | hsa-miR-374 | cervical cancer | tccGCTAAGTtta | Disrupted | -0.037 |
36599164 | rs149724031 | hsa-miR-570 | colorectal cancer | tggaaTGTTTTCA | Disrupted | 0.1 |
36599261 | rs113060327 | hsa-miR-5195 | acute lymphoblastic leukemia | tcattTAGGGGTA | Disrupted | -0.282 |
36599261 | rs113060327 | hsa-miR-4640 | Breast cancer | tcatttGGGGGTA | Created | -0.158 |
36599548 | rs185464049 | hsa-miR-5694 | metastatic prostate cancer | gaaactATGATCT | Disrupted | 0.076 |
36599548 | rs185464049 | hsa-mir-27 | cervical cancer | gaaACTGTGAtct | Created | 0.024 |
36599964 | rs4151044 | hsa-miR-3153 | Melanoma | gccacaCTTTCCC | Disrupted | -0.003 |
36599964 | rs4151044 | hsa-miR-4668 | Breast cancer | gccacaTTTTCCC | Created | 0.056 |
36600151 | rs148483119 | hsa-miR-155 | B cell lymphomas + cervical cancer + CLL | taacacTGTAGGA | Disrupted | -0.079 |
36600232 | rs4151045 | hsa-miR-4698 | Breast cancer | atcaCATTTTGAt | Disrupted | 0.095 |
36600232 | rs4151045 | hsa-miR-3973 | acute myeloid leukemia | atcacACTTTGAt | Created | 0.035 |
36600232 | rs4151045 | hsa-miR-595 | colorectal cancer | atCACACTTtgat | Created | -0.052 |
36600527 | rs192931118 | hsa-miR-3622 | Cervical + breast cancer | TCAGGTGcattgc | Disrupted | -0.043 |
36601006 | rs180966342 | hsa-miR-580 | colorectal cancer | AATCATTtttggt | Disrupted | -0.099 |
36601142 | rs183729240 | hsa-miR-1273 | cervical cancer | gatGCAGTGGAtt | Created | -0.466 |
36601142 | rs183729240 | hsa-miR-181 | cervical cancer + CLL | gatgCAGTGGAtt | Created | -0.165 |
36601202 | rs190801060 | hsa-miR-205 | cervical cancer + nasopharyngeal carcinoma | aaaTGAAATAtga | Disrupted | -0.057 |
36601202 | rs190801060 | hsa-miR-5696 | metastatic prostate cancer | AAATGAAatatga | Disrupted | -0.035 |
36601202 | rs190801060 | hsa-miR-579 | colorectal cancer | AAATGAAatatga | Disrupted | -0.059 |
Table 2B: 3′UTR SNPs of RAG2 as detected by PolymiRTS.
Location | dbSNP ID | miR ID | Cancer type | miR site | Function class | context+ score change |
36613631 | rs186462541 | hsa-miR-1273 | Cervical cancer | tTCAAGCAAccct | Disrupted | -0.322 |
36613631 | rs186462541 | hsa-miR-23 | Cervical cancer | ttcaaGGAACCCt | Created | -0.319 |
36613758 | rs3740956 | hsa-miR-193 | Cervical cancer | taGCCAGTAaaga | Created | -0.226 |
36613758 | rs3740956 | hsa-miR-4794 | Breast cancer | TAGCCAGtaaaga | Created | -0.276 |
36613758 | rs3740956 | hsa-miR-664 | Chronic lymphocytic leukemia | TAGCCAGtaaaga | Created | -0.294 |
36613947 | rs142073874 | hsa-miR-4698 | Breast cancer | cctataCATTTTG | Disrupted | -0.194 |
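The rows of Table 2B can be tallied with a short sketch to show how the most frequent cancer association and the split between created and disrupted target sites are obtained. The tuple layout and variable names are hypothetical. Only Table 2B is encoded here to keep the example short, and the same approach would apply to Table 2A.

```python
from collections import Counter

# Rows of Table 2B: (dbSNP ID, miRNA, cancer type, function class).
table_2b = [
    ("rs186462541", "hsa-miR-1273", "Cervical cancer", "Disrupted"),
    ("rs186462541", "hsa-miR-23", "Cervical cancer", "Created"),
    ("rs3740956", "hsa-miR-193", "Cervical cancer", "Created"),
    ("rs3740956", "hsa-miR-4794", "Breast cancer", "Created"),
    ("rs3740956", "hsa-miR-664", "Chronic lymphocytic leukemia", "Created"),
    ("rs142073874", "hsa-miR-4698", "Breast cancer", "Disrupted"),
]

# Count miRNA target sites per cancer type and per function class.
cancer_counts = Counter(row[2] for row in table_2b)
class_counts = Counter(row[3] for row in table_2b)

# Number of distinct 3'UTR SNPs behind those target-site changes.
snp_count = len({row[0] for row in table_2b})
```

For RAG2 this gives three distinct SNPs, with cervical cancer the most frequent association (three of six target sites) and four created versus two disrupted sites.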
ACKNOWLEDGEMENT
The authors thank the Africa City of Technology (Sudan) for providing the facilities to carry out this work.
REFERENCES
1. Notarangelo LD. Primary immunodeficiencies. J Allergy Clin Immunol. 2010; 125: 182-194.
2. Cossu F. Genetics of SCID. Ital J Pediatr. 2010; 36: 76.
11. Fischer A. Severe combined immunodeficiencies (SCID). Clin Exp Immunol. 2000; 122: 143-149.
23. Ng PC, Henikoff S. Predicting deleterious amino acid substitutions. Genome Res. 2001; 11: 863-874.