In silico Analysis of Single Nucleotide Polymorphism (SNPs) in Human RAG1 and RAG2 Genes of Severe Combined Immunodeficiency

Mona Shams Aldeen S. Ali; Tomador Siddig MZ; Rehab A. Elhadi; Muhammad Rahama Yousof; Siddig Eltyeb Yousif Abdallah; Maiada Mohamed Yousif Ahmed; Nosiba Yahia Mohamed Hassen; Sulum Omer Masoud Mohamed; Marwa Mohamed Osman; Mohamed A. Hassan

doi:https://doi.org/10.47739/2576-1102/1005


Research Article | Open Access


Mona Shams Aldeen S. Ali^1,2*, Tomador Siddig MZ^1,2, Rehab A. Elhadi^1,2, Muhammad Rahama Yousof^2, Siddig Eltyeb Yousif Abdallah^1, Maiada Mohamed Yousif Ahmed^2, Nosiba Yahia Mohamed Hassen^1, Sulum Omer Masoud Mohamed^1, Marwa Mohamed Osman^2, Mohamed A. Hassan^2

^1. Department of Rheumatology, Omdurman Teaching Hospital, Sudan
^2. Department of Applied Bioinformatics, Africa City of Technology, Sudan


Corresponding Authors

Mona Shams Aldeen S. Ali, Department of Applied Bioinformatics, Africa City of Technology, Khartoum, Sudan, Tel: 249121784688

Abstract

Severe combined immunodeficiency (SCID) is an inherited primary immunodeficiency (PID) characterized by the absence or dysfunction of T lymphocytes. Defects in RAG1 and RAG2 are known to cause a T-B-NK+ form of SCID. The recombinase activating genes RAG1 and RAG2 (OMIM 179615 and 179616, respectively) are expressed exclusively in lymphocytes and mediate the creation of double-strand DNA breaks at the sites of recombination and in signal sequences during T- and B-cell receptor gene rearrangement. This study focused on the effect of nonsynonymous single nucleotide polymorphisms (nsSNPs) on the function and structure of the RAG1 and RAG2 proteins using in silico analysis. Only nsSNPs and 3′UTR SNPs were selected for computational analysis. Deleterious nsSNPs were predicted with bioinformatics software. Five damaging nsSNPs (rs112047157, rs61758790, rs4151032, rs61752933, rs75591129) were predicted in RAG1 and two damaging nsSNPs (rs112927992, rs17852002) in RAG2; all of these nsSNPs lie in domains that are important for binding, so the mutations may affect protein function. To our knowledge, this is the first study of this type on RAG1 and RAG2. We hope to provide information that helps researchers pursue further studies of SCID, especially in our country, where consanguineous marriage is common.

Citation

Ali MSAS, Tomador Siddig MZ, Elhadi RA, Yousof MR, Yousif Abdallah SE, et al. (2016) In silico Analysis of Single Nucleotide Polymorphism (SNPs) in Human RAG1 and RAG2 Genes of Severe Combined Immunodeficiency. J Bioinform, Genomics, Proteomics 1(1): 1005.

Keywords

•   Severe combined immunodeficiency
•   Primary immunodeficiency
•   T lymphocytes
•   Recombinase activating genes
•   Nonsynonymous single nucleotide polymorphisms

ABBREVIATIONS

SCID: Severe Combined Immunodeficiency; PID: Primary Immunodeficiency; OMIM: Online Mendelian Inheritance in Man; nsSNP: nonsynonymous Single Nucleotide Polymorphism; RAG1: Recombinase Activating Gene 1; RAG2: Recombinase Activating Gene 2; AR: Autosomal Recessive; NK: Natural Killer; DNA: Deoxyribonucleic Acid; SIFT: Sorting Intolerant from Tolerant; PolymiRTS: Polymorphism in microRNAs and their Target Sites; PolyPhen-2: Polymorphism Phenotyping v2; PSIC: Position-Specific Independent Count; miRNA: Micro Ribonucleic Acid; 3′UTR: 3′ Untranslated Region; RI: Reliability Index; RSSs: Recombination Signal Sequences; GO: Gene Ontology

INTRODUCTION

SCID is an inherited primary immunodeficiency characterized by the absence or dysfunction of T lymphocytes, affecting both cellular and humoral adaptive immunity [1-5]. Its incidence is estimated at 1 in 75,000-100,000 live births [8-11], and it is more common in male subjects, reflecting the overrepresentation of X-linked SCID (XL-SCID), the most common form of SCID worldwide (50%) in human subjects [10]. However, in cultures in which consanguineous marriage is common, the incidence of autosomal recessive SCID (AR-SCID) is higher than has been previously reported [10]. SCID can be classified as T−B+ or T−B−, with further subdivision based on the presence or absence of NK cells [5]. Defects in the recombinase activating genes (RAG1 and RAG2) are known to cause a T−B−NK+ form of AR-SCID (OMIM: 601457) [12,13]. It is now known that SCID in humans can be caused by mutations in at least 13 different genes that result in aberrant T cell development [7]. Since the first description of RAG1 and RAG2 deficiency in patients with SCID by Schwarz et al. in 1996 [14], a pleiotropic spectrum of phenotypes associated with RAG1 and RAG2 deficiency has been described. RAG1 and RAG2 are located on chromosome 11p13 [15,16]. They are expressed exclusively in lymphocytes and mediate the creation of double-strand DNA breaks at the sites of recombination and in signal sequences during T- and B-cell receptor gene rearrangement [10].

V(D)J recombination is the site-specific DNA rearrangement process that assembles both B and T cell receptor (TCR) genes during lymphoid development. Recombination is initiated by the lymphoid-specific RAG1 and RAG2 recombinase, which introduces double-strand DNA breaks at RSSs flanking variable (V), diversity (D), and joining (J) gene segments spread along the immunoglobulin and TCR loci [17-20]. The recombination process is tightly regulated, occurring at specific stages of development and in specific cell types (e.g., immunoglobulin and TCR genes are rearranged in B and T cells, respectively). The process is temporally ordered, with immunoglobulin heavy chain rearrangements preceding immunoglobulin light chain rearrangements and D-to-J rearrangements preceding V-to-DJ rearrangements [17-20]. Mutations in either RAG1 or RAG2 hamper the initiation of V(D)J recombination and hence cause an early block in B and T cell maturation, similar to the situation in RAG1 and RAG2 knockout (KO) mice [21].

In this computational study, we focused on the effect of nsSNPs on the function and structure of the RAG1 and RAG2 proteins using in silico analysis. To our knowledge, this is the first study of this type on RAG1 and RAG2. We hope to provide information that helps researchers pursue further studies of SCID, especially in our country, where consanguineous marriage is common.

MATERIALS AND METHODS

The SNPs of the RAG1 and RAG2 genes were collected in August 2015 from the NCBI SNP database (http://www.ncbi.nlm.nih.gov/projects/SNP). At the time of the study, the database contained a total of 5413 SNPs in RAG1 and 2804 SNPs in RAG2, of which 1048 in RAG1 and 508 in RAG2 were coding SNPs, 147 in RAG1 and 28 in RAG2 occurred in the miRNA 3′UTR, eight in RAG1 and ten in RAG2 occurred in the 5′UTR, and 234 in RAG1 and 178 in RAG2 occurred in intronic regions. We selected missense and nonsense nsSNPs and 3′UTR SNPs for our investigation (Figure 1). The nsSNPs of RAG1 and RAG2 were submitted by rs ID as a batch to the SIFT server, and the resulting damaging nsSNPs were then submitted to PolyPhen as query sequences in FASTA format. The change in stability due to mutation was predicted with I-Mutant 2.0. The protein sequences used were obtained from the ExPASy database (www.expasy.org/). Project Hope software was used to highlight the changes caused by the deleterious SNPs at the molecular level of the protein 3D structure. The SNPs in the 3′UTR were analyzed with PolymiRTS software. Predictions of gene function and gene interactions were obtained from the GeneMANIA database.

GeneMANIA (http://www.genemania.org/)

GeneMANIA is an online database that helps predict the function of genes and gene sets. It finds other genes that are related to a set of input genes, using a very large set of functional association data. Association data include protein and genetic interactions, pathways, co-expression, co-localization and protein domain similarity. GeneMANIA can be used to find new members of a pathway or complex, to find additional genes that may have been missed in a screen, or to find new genes with a specific function, such as protein kinases. The query is defined by the set of input genes [22].

SIFT-software

To detect deleterious nsSNPs, we used SIFT, a bioinformatics tool that predicts whether an amino acid substitution affects protein function. The program generates alignments with a large number of homologous sequences and assigns each residue a score ranging from zero to one. Scores closer to zero indicate evolutionary conservation and intolerance to substitution, while scores closer to one indicate tolerance to substitution [23] (http://sift.jcvi.org/).
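As an illustration of how a SIFT score maps to a call, the sketch below applies a cutoff of 0.05, the threshold conventionally used with SIFT. The article does not state which cutoff was applied, so the constant `SIFT_CUTOFF` and the function name `sift_call` are assumptions made for this example.

```python
# Assumption: 0.05 is the conventional SIFT cutoff; the article does not state one.
SIFT_CUTOFF = 0.05


def sift_call(score: float, cutoff: float = SIFT_CUTOFF) -> str:
    """Map a SIFT score in [0, 1] to a qualitative call (hypothetical helper)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"SIFT scores range from 0 to 1, got {score}")
    # Low scores mean the position is conserved and intolerant to substitution.
    return "damaging" if score <= cutoff else "tolerated"


for score in (0.00, 0.01, 0.50):
    print(score, sift_call(score))
```

Keeping the cutoff as a parameter makes it easy to test how sensitive the list of candidate nsSNPs is to the chosen threshold.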

PolyPhen-2

PolyPhen-2 is also an online bioinformatics tool, produced by Harvard University. It searches for 3D protein structures, calculates a PSIC score for each of the two variants, and then computes the difference between the PSIC scores of the two variants. PolyPhen results were classified as probably damaging (2.00 or more), possibly damaging (1.40-1.90), potentially damaging (1.0-1.5), or benign (0.00-0.90) [24] (http://genetics.bwh.harvard.edu/pph2/index.shtml). We used this software to confirm the SIFT results and took only double-positive results for further workup.
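The double-positive selection amounts to a set intersection. The sketch below illustrates it for RAG1 using the rs IDs reported in this article (five double-positive nsSNPs, plus the five listed in the Conclusion as damaging by SIFT only); the variable names are illustrative and not part of any tool's output.

```python
# rs IDs are taken from this article; the grouping into sets is illustrative.
sift_damaging_rag1 = {
    "rs112047157", "rs61758790", "rs4151032", "rs61752933", "rs75591129",
    "rs415107", "rs34841221", "rs4151029", "rs2227973", "rs4151034",
}
polyphen_damaging_rag1 = {
    "rs112047157", "rs61758790", "rs4151032", "rs61752933", "rs75591129",
}

# Double positive: predicted damaging by both SIFT and PolyPhen-2.
double_positive = sift_damaging_rag1 & polyphen_damaging_rag1
# Damaging by SIFT only: not carried forward to stability and structure analysis.
sift_only = sift_damaging_rag1 - polyphen_damaging_rag1

print(sorted(double_positive))
print(sorted(sift_only))
```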

I-Mutant v2.0c

I-Mutant predicts the change in protein stability upon mutation from the protein sequence or structure. Its output shows the amino acid in the wild-type protein (WT), the new amino acid after mutation (NAW), the reliability index (RI), the temperature in degrees Celsius (T) and the pH [25].

I-Mutant available at: (http:/www.I-Mutant2.0.cgi).
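A minimal sketch of how an I-Mutant result can be read, assuming the usual sign convention in which DDG is the free-energy change of the mutant minus that of the wild type (in kcal/mol), so that a negative value indicates decreased stability. The function name, the handling of zero, and the example values are assumptions made for illustration; they are not values reported in this study.

```python
def stability_call(ddg_kcal_mol: float) -> str:
    """Translate a predicted DDG into a stability label (hypothetical helper)."""
    if ddg_kcal_mol < 0:
        return "Decrease stability"
    if ddg_kcal_mol > 0:
        return "Increase stability"
    # Assumption: a DDG of exactly zero is reported as no change.
    return "No predicted change"


# Hypothetical DDG values, not taken from the article.
for ddg in (-1.2, 0.4):
    print(ddg, stability_call(ddg))
```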

Chimera

Chimera is software produced by the University of California, San Francisco. It was used in this step to generate the mutated 3D models of the proteins. The outcome is a graphic model depicting the mutation [26] (http://www.cgl.ucsf.edu/chimera/).

Project Hope software (http://www.cmbi.ru.nl/hope/input)

Project Hope is an online web server to which the user can submit a sequence and a mutation. The software collects structural information from a series of sources, including calculations on the 3D protein structure, sequence annotations in UniProt and predictions from DAS servers. It combines this information to analyze the effect of a given mutation on the protein structure and presents that effect in a way that even those without a bioinformatics background can understand [27].

PolymiRTS

PolymiRTS is a database server designed specifically for the analysis of the 3′UTR region; we used it to identify SNPs that may alter miRNA target sites [28]. All SNPs located within the 3′UTR region were selected separately and submitted to the program (available at: http://compbio.uthsc.edu/miRSNP/).

RESULTS AND DISCUSSION

Prediction of protein structural stability

Seven nsSNPs of the RAG1 and RAG2 genes were selected on the basis of their SIFT and PolyPhen prediction scores and submitted to the I-Mutant web server to predict the change in stability (DDG) and the RI upon mutation. In RAG1, three SNPs (rs112047157, rs4151032 and rs75591129) showed a decrease in protein stability, while the other two (rs61758790 and rs61752933) showed an increase. Both SNPs in RAG2 were predicted to decrease protein stability (Table 1).

Modeling of mutant structure

The protein sequences carrying each nsSNP were submitted to Project Hope, which revealed the 3D structures of the mutant proteins with their new residues and described the reactions and physicochemical properties of these residues. Here we present the results for each candidate and discuss the conformational variations and the interactions with neighboring amino acids; all native and mutant structures of the RAG1 and RAG2 proteins are shown in Figure 2. The wild type is displayed in green and the mutant in red. The A/G mutation (rs112047157) led to conversion of methionine to valine at position 487. The mutant residue is smaller than the wild-type residue; this might lead to loss of interactions. The wild-type residue is located in an α-helix. The mutation converts the wild-type residue into a residue that does not prefer α-helices as secondary structure, which may disturb the local structure. The mutated residue is located in a domain that is important in DNA binding and nucleic acid binding. Mutation of the residue might disturb this function (Figure 2a).

The G/T mutation (rs61758790) caused conversion of phenylalanine to leucine at position 520. The mutant residue is smaller than the wild-type residue; this might lead to loss of interactions. The wild-type residue is located in its preferred secondary structure, a β-strand, but the mutant residue prefers to be in another secondary structure; therefore the local conformation will be slightly destabilized. The mutated residue is located in a domain that is important in DNA binding and nucleic acid binding. Mutation of the residue might disturb this function (Figure 2b).

The C/T mutation (rs4151032) resulted in a change of proline to serine at position 525. Prolines are known to have a very rigid structure, so changing a proline into another residue can disturb the local structure. This variant is annotated with severity: Polymorphism (VAR_029263). The mutant residue is smaller than the wild-type residue; this might lead to loss of interactions. The wild-type residue is more hydrophobic than the mutant residue, so hydrophobic interactions, either in the core of the protein or on the surface, will be lost. The mutated residue is located in a domain that is important for binding of other molecules. Mutation of the residue might disturb this function (Figure 2c).

The A/G mutation (rs61752933) caused a change of isoleucine into valine at position 810. The mutant residue is smaller than the wild-type residue; this might lead to loss of interactions. The mutated residue is located in a domain that is important in DNA binding and nucleic acid binding. Mutation of the residue might disturb this function (Figure 2d).

The A/C mutation (rs75591129) caused conversion of tyrosine to serine at position 913. The wild-type tyrosine is located in its preferred secondary structure, a β-strand. The mutant residue prefers to be in another secondary structure; therefore the local conformation will be slightly destabilized. The mutated residue is located in a domain that is important in DNA binding and nucleic acid binding. Mutation of the residue might disturb this function (Figure 2e).

The C/T mutation (rs112927992) resulted in a change of serine to phenylalanine at position 291. This residue is part of an InterPro domain named V-D-J Recombination Activating Protein 2. The GO annotations of this domain indicate that it has a function in DNA binding and nucleic acid binding. The residue is also part of an InterPro domain named Galactose Oxidase/Kelch, Beta-Propeller (IPR011043). The mutant residue is bigger than the wild-type residue; this might lead to bumps. The mutant residue is more hydrophobic than the wild-type residue; this can result in loss of hydrogen bonds and/or disturb correct folding. The mutated residue is located in a domain that is important for binding of other molecules. Mutation of the residue might disturb this function (Figure 2f).

The T/C mutation (rs17852002) led to conversion of valine to alanine at position 154. The mutant residue is smaller than the wild-type residue; this might lead to loss of interactions. The wild-type residue is located in its preferred secondary structure, a β-strand. The mutant residue prefers to be in another secondary structure; therefore the local conformation will be slightly destabilized. The residue is part of an InterPro domain named V-D-J Recombination Activating Protein 2. The GO annotations of this domain indicate that it has a function in DNA binding and nucleic acid binding. The residue is also part of the InterPro domains Galactose Oxidase/Kelch, Beta-Propeller (IPR011043) and Kelch-Type Beta Propeller (IPR015915); the domain is also important in protein binding (Figure 2g).

Both nsSNPs of RAG2 were found in the same domain, which is important for protein function, so they may disturb that function. Mutations in RAG1 or RAG2 leave T and B cells unable to initiate recombination of the DNA variable, diversity, and joining regions, so functional T-cell or B-cell receptors are not formed [29].
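The size comparisons reported for each substitution can be approximated from the substitution notation alone. The sketch below parses strings such as M487V and compares approximate residue volumes (values in cubic angstroms after Zamyatnin, 1972). This is an assumption made for illustration: Project Hope may use a different size measure, and the helper names are hypothetical.

```python
import re

# Approximate residue volumes in cubic angstroms (Zamyatnin, 1972).
# Assumption: Project Hope may rely on a different measure of residue size.
RESIDUE_VOLUME = {
    "A": 88.6, "F": 189.9, "I": 166.7, "L": 166.7, "M": 162.9,
    "P": 112.7, "S": 89.0, "V": 140.0, "Y": 193.6,
}


def parse_substitution(change: str) -> tuple[str, int, str]:
    """Split a substitution such as 'M487V' into (wild type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", change)
    if match is None:
        raise ValueError(f"Unrecognized substitution: {change!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def size_change(change: str) -> str:
    """Report whether the mutant residue is bigger or smaller than the wild type."""
    wild_type, _, mutant = parse_substitution(change)
    difference = RESIDUE_VOLUME[mutant] - RESIDUE_VOLUME[wild_type]
    if difference > 0:
        return "bigger"
    if difference < 0:
        return "smaller"
    return "same size"


for change in ("M487V", "F520L", "P525S", "I810V", "S291F", "V154A"):
    print(change, size_change(change))
```

Under these assumed volumes, the direction of each size change agrees with the descriptions given above for these six substitutions.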

SNPs at the 3′UTR region

SNPs in the 3′UTR of the RAG1 and RAG2 genes were submitted as a batch to the PolymiRTS server. In RAG1, 19 SNPs were predicted, while only three SNPs were predicted in RAG2. The functional classes for both genes are described in Table 2. According to Table 2, some SNPs in the 3′UTR of both RAG1 and RAG2 are related to cancer development. Although patients with SCID do not survive long enough to develop cancer [8-10], some patients with delayed onset of RAG1 deficiency do develop cancer [30].

RAG1 and RAG2 have many vital functions. They interact with, are co-expressed with, share protein domains with, or act together with many other genes to carry out these functions, as illustrated using GeneMANIA (Figure 3).

Table 1: Prediction of nsSNPs in RAG1 and RAG2 by SIFT, PolyPhen-2 and I-Mutant software.

Gene	SNP ID	Chromosome Location	Nucleotide Change	SIFT Prediction	SIFT Score	SIFT Median	UniProt Accession	Amino Acid Change	PolyPhen-2 Result	I-Mutant Result
RAG1	rs112047157	11:36574763	A/G	Damage	1.000.00	4.27	P15918	M487V	Possibly Damaging	Decrease stability
	rs61758790	11:36574864	G/T	Damage	1.000.00	4.27		F520L	Probably Damaging	Increase stability
	rs4151032	11:36574877	C/T	Damage	1.000.00	4.27		P525S	Probably Damaging	Decrease stability
	rs61752933	11:36575732	A/G	Damage	1.000.01	4.27		I810V	Possibly Damaging	Increase stability
	rs75591129	11:36576096	A/C	Damage	1.000.00	4.27		Y931S	Probably Damaging	Decrease stability
RAG2	rs112927992	11:36614847	C/T	Damage	1.000.00	4.32	P55895	S291F	Possibly Damaging	Decrease stability
RAG2	rs17852002	11:36615258	T/C	Damage	1.000.00	4.32	P55895	V154A	Probably Damaging	Decrease stability

CONCLUSION

In RAG1 we found five nsSNPs, and in RAG2 two nsSNPs, that were predicted to be damaging by both SIFT and PolyPhen (double-positive results). However, five nsSNPs in RAG1 (rs415107, rs34841221, rs4151029, rs2227973, rs4151034) and one nsSNP in RAG2 (rs117899975) were predicted to be damaging by SIFT only, which may reflect limitations of the software used. We suggest that these nsSNPs be analyzed further with more advanced software to predict their effect, as they are speculated to affect the stability or function of the proteins. From this study, we suggest that the seven double-positive nsSNPs are good candidates that may be useful in the detection of SCID associated with RAG1 and RAG2.

From the PolymiRTS results, we noticed that although many cancer types may be associated with altered miRNA target sites, cervical cancer was the most frequently listed. Computational tools might provide an alternative approach for selecting target SNPs in association studies, helping in both research and diagnosis.

Table 2A: 3′UTR SNPs of RAG1 as detected by PolymiRTS.

Location	dbSNP ID	miR ID	Cancer type	miR site	Function class	context+ score change
36598009	rs189589191	hsa-miR-548	Cervical cancer	tgagtTGGTTTTt	Disrupted	-0.102
36598009		hsa-miR-4637	acute lymphoblastic leukemia	tgAGTTAGTtttt	Created	-0.176
36598259	rs144069419	hsa-miR-3191	Melanoma	aCCAGAGAtgagc	Disrupted	-0.136
36598259		hsa-miR-330	Cervical cancer	aCCAGAGAtgagc	Disrupted	-0.081
		hsa-miR-3126	Melanoma & breast cancer	aCCAGATAtgagc	Created	-0.089
36598426	rs115582302	hsa-miR-3646	solid tumors	tatTTCATTTttg	Disrupted	-0.067
36598426	rs115582302	hsa-miR-548ad	malignant human B cells	tattTCGTTTTtg	Created	-0.187
36598725	rs4151039	hsa-miR-3137	Melanoma	taGCTACAGttag	Disrupted	-0.224
36599069	rs4151040	hsa-miR-624	colorectal cancer	ggataACCTTGTA		-0.129
36599086	rs112766186	hsa-miR-4524	malignant human B cells + breast cancer	tccatCTGCTAAg	Created	-0.034
36599087	rs145963034	hsa-miR-374	cervical cancer	ccatccGCTAAGT	Disrupted	-0.037
36599090	rs4151041	hsa-miR-374	cervical cancer	tccGCTAAGTtta	Disrupted	-0.037
36599164	rs149724031	hsa-miR-570	colorectal cancer	tggaaTGTTTTCA	Disrupted	0.1
36599261	rs113060327	hsa-miR-5195	acute lymphoblastic leukemia	tcattTAGGGGTA	Disrupted	-0.282
		hsa-miR-4640	Breast cancer	tcatttGGGGGTA	Created	-0.158
36599548	rs185464049	hsa-miR-5694	metastatic prostate cancer	gaaactATGATCT	Disrupted	0.076
		hsa-miR-27	cervical cancer	gaaACTGTGAtct	Created	0.024
36599964	rs4151044	hsa-miR-3153	Melanoma	gccacaCTTTCCC	Disrupted	-0.003
36599964		hsa-miR-4668	Breast cancer	gccacaTTTTCCC	Created	0.056
36600151	rs148483119	hsa-miR-155	B cell lymphomas + cervical cancer + CLL	taacacTGTAGGA	Disrupted	-0.079
36600232	rs4151045	hsa-miR-4698	Breast cancer	atcaCATTTTGAt	Disrupted	0.095
		hsa-miR-3973	acute myeloid leukemia	atcacACTTTGAt	Created	0.035
		hsa-miR-595	colorectal cancer	atCACACTTtgat	Created	-0.052
36600527	rs192931118	hsa-miR-3622	Cervical + breast cancer	TCAGGTGcattgc	Disrupted	-0.043
36601006	rs180966342	hsa-miR-580	colorectal cancer	AATCATTtttggt	Disrupted	-0.099
36601142	rs183729240	hsa-miR-1273	cervical cancer	gatGCAGTGGAtt	Created	-0.466
36601142		hsa-miR-181	cervical cancer + CLL	gatgCAGTGGAtt	Created	-0.165
36601202	rs190801060	hsa-miR-205	cervical cancer + nasopharyngeal carcinoma	aaaTGAAATAtga	Disrupted	-0.057
		hsa-miR-5696	metastatic prostate cancer	AAATGAAatatga	Disrupted	-0.035
		hsa-miR-579	colorectal cancer	AAATGAAatatga	Disrupted	-0.059

Table 2B: 3′UTR SNPs of RAG2 as detected by PolymiRTS.

Location	dbSNP ID	miR ID	Cancer type	miR site	Function class	context+ score change
36613631	rs186462541	hsa-miR-1273	Cervical cancer	tTCAAGCAAccct	Disrupted	-0.322
36613631		hsa-miR-23	Cervical cancer	ttcaaGGAACCCt	Created	-0.319
36613758	rs3740956	hsa-miR-193	Cervical cancer	taGCCAGTAaaga	Created	-0.226
		hsa-miR-4794	Breast cancer	TAGCCAGtaaaga	Created	-0.276
		hsa-miR-664	Chronic lymphocytic leukemia	TAGCCAGtaaaga	Created	-0.294
36613947	rs142073874	hsa-miR-4698	Breast cancer	cctataCATTTTG	Disrupted	-0.194
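The frequency of each cancer type among the predicted miRNA target-site changes can be tallied directly from the table rows. The sketch below does this for the six rows of Table 2B; the list is transcribed from the "Cancer type" column and the variable names are illustrative.

```python
from collections import Counter

# "Cancer type" column of Table 2B, one entry per row, in table order.
table_2b_cancer_types = [
    "Cervical cancer",
    "Cervical cancer",
    "Cervical cancer",
    "Breast cancer",
    "Chronic lymphocytic leukemia",
    "Breast cancer",
]

# Lowercase the labels so that capitalization differences do not split counts.
counts = Counter(name.lower() for name in table_2b_cancer_types)
most_common_type, n_rows = counts.most_common(1)[0]

print(counts)
print(most_common_type, n_rows)
```

Rows in Table 2A that list several cancer types in one cell would need to be split on their separators before being counted in the same way.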