Computational Insight into Identification and Analysis of SSR-FDM in Citrus limon
1. Tectona Biotech Resource Center, Bhubaneswar, India
Abstract
Identifying SSRs (microsatellites) and analyzing their function play a key role in many areas of genomics, including genome organization, gene regulation, quantitative genetic variation, gene evolution and plant breeding. A computational approach was therefore used to identify and analyze SSRs within the functional domains of Citrus limon (C. limon), a vital and effective medicinal plant of the family Rutaceae. A total of 1644 ESTs of C. limon were extracted and validated with Tandem Repeats Finder and VecScreen. The retained sequences were then assembled with CAP3, which produced 55 contigs and 1183 singletons. MISA identified 420 SSR-containing sequences (SSR-ESTs). Mononucleotide motifs were the most abundant (75.23%), followed by di- (9.52%), tri- (7.61%) and tetranucleotide (0.95%) repeats. Finally, 128 SSR sequences with suitable primer properties were selected; these could be used as markers to test transferability to related species. Functional annotation was then performed with Blast2GO. These findings should help clarify the significance of SSR markers and make it easier to evaluate genetic diversity in medicinal plant flora.
Citation
Ray M, Pradhan SP, Sahoo S (2019) Computational in Sight into Identification and Analysis of SSR-FDM in Citrus limon. Int J Plant Biol Res 7(1): 1111.
Keywords
• SSRs; ESTs; Genomics; MISA; Plant breeding; Primer
ABBREVIATIONS
EST: Expressed Sequence Tag; SSR: Simple Sequence Repeat; NCBI: National Center for Biotechnology Information; dbEST: Database of ESTs; MISA: MIcroSAtellite identification tool; KEGG: Kyoto Encyclopedia of Genes and Genomes; BP: Biological Process; MF: Molecular Function; CC: Cellular Component
INTRODUCTION
Plant-based natural resources are vital to human life. Over the last century in particular, the irresponsible use of natural resources has become an alarming problem that threatens nature and the environment. The range of plant-derived medicines has expanded only slowly to meet demand [1]. Knowledge of the patterns of genetic variation within and among populations of medicinal plants is therefore essential for devising effective genetic-resource management strategies for their conservation, sustainable use and genetic improvement [2].
Citrus fruits are among the world's most important fruit crops and are valued for their nutritive value and distinctive aroma. Citrus is mainly consumed as fresh fruit or juice. Many in vivo and in vitro studies have reported that citrus fruit is effective against chronic diseases such as cancers and vascular disorders. Lemon is very rich in important natural compounds, including citric acid, ascorbic acid, minerals, flavonoids and essential oils. New Citrus cultivars have been developed mainly for fresh consumption, and these plants need to be screened to validate their use in food and medicine and to reveal their active ingredients by characterizing their constituents. Their distinctive characteristics, including their phenolic compounds and especially their flavonoid content, have led to their use in new fields such as pharmacology and food technology [3].
The taxonomy of Citrus, including Citrus limon, is complicated, controversial and ambiguous, and the genus includes some of the most commercially important fruits [4]. This motivated the present molecular marker study. Taxonomic classification provides the information needed for future breeding and genetic improvement, and the following analyses were undertaken to support it.
Expressed sequence tags (ESTs) are subsequences of cDNA clones. They provide direct evidence of gene expression and also serve as a source of microsatellites, or simple sequence repeats (SSRs), which are tandemly repeated DNA motifs 1-6 base pairs long. Several studies suggest that SSRs are abundant in non-coding regions of the genome. SSRs are widely applied in plant genetic studies, including genetic variation, linkage mapping, gene tagging, evolution and breeding, because they are multi-allelic and reproducible and show co-dominant inheritance [5].
EST-SSR markers are expected to show high interspecific transferability because they lie in conserved genic regions of the genome [6]. The objective of this work was therefore the in silico identification of EST-SSR markers in Citrus limon.
Primer design from EST-SSRs was a further component of this study. Primer sequences located in expressed DNA regions are expected to be well conserved, which improves the chance of marker transferability across taxonomic boundaries [7]. The final component is the functional annotation of SSR-FDM. This shows how EST-SSRs are involved in distinct metabolic functions and opens a route to investigating the genetic potential of C. limon.
MATERIALS AND METHODS
Retrieval of EST sequences
The expressed sequence tag (EST) sequences of Citrus limon were retrieved from the EST database (dbEST) (https://www.ncbi.nlm.nih.gov/nucest/?term=) on the web server of the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/).
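As a minimal sketch, assume the dbEST search results were saved as a FASTA file; the text does not describe the download format. The records can then be loaded and counted before any filtering. The parser `read_fasta` below is a hypothetical helper, not part of any tool used in the study.

```python
from typing import Iterator, List, Optional, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text.

    The identifier is the first word after '>'; sequence lines are
    concatenated and upper-cased.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            words = line[1:].split()
            header, chunks = (words[0] if words else ""), []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)
```

For example, `len(list(read_fasta(open("c_limon_ests.fasta").read())))` (hypothetical file name) would be expected to return the 1644 records reported in the Results.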
Detection of repeat locations
The collected EST sequences of Citrus limon were screened for repeat regions with Tandem Repeats Finder (TRF) (https://tandem.bu.edu/trf/trf.html), so that sequences containing repeats could be removed. TRF is a program that locates patterns of one or more nucleotides repeated in tandem in DNA sequences.
Screening of vector regions
After the sequences containing tandem repeats were removed, the remaining EST sequences were screened for vector regions with VecScreen (https://www.ncbi.nlm.nih.gov/tools/vecscreen/). VecScreen identifies segments of a nucleotide sequence that may be of vector origin, that is, regions likely to be contaminated by vector sequence.
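The idea of flagging vector-derived segments can be illustrated with a crude exact-match screen. This is explicitly not VecScreen's method: VecScreen runs BLASTN against the UniVec database and applies its own match categories. The sketch below flags an EST that shares any exact k-mer (k = 20, an arbitrary assumption) with a supplied vector sequence on either strand. The function names and the vector input are hypothetical.

```python
from typing import Iterable, Set

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_set(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_vector_kmer(est: str, vectors: Iterable[str], k: int = 20) -> bool:
    """Flag an EST that shares any exact k-mer with a vector on either strand.

    A stand-in for illustration only; VecScreen itself uses BLASTN
    against UniVec with its own scoring rules.
    """
    est_kmers = kmer_set(est.upper(), k)
    for vec in vectors:
        vec = vec.upper()
        if est_kmers & (kmer_set(vec, k) | kmer_set(revcomp(vec), k)):
            return True
    return False
```

An exact-match rule misses diverged or short vector fragments that an alignment-based search would catch, which is why a BLAST-based tool was used in the study.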
Sequence assembly analysis
The remaining EST sequences of C. limon were assembled with the CAP3 sequence assembly program (http://doua.prabi.fr/software/cap3). CAP3 assembles overlapping reads into contiguous sequences (contigs) and reports reads that do not assemble as singletons.
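To illustrate the difference between contigs and singletons, here is a toy greedy overlap assembler. It is an assumption-laden sketch, not CAP3's algorithm. CAP3 additionally clips low-quality ends, uses base qualities, builds multiple alignments and consensus sequences, and applies forward-reverse constraints. The sketch repeatedly merges the pair of sequences with the longest exact suffix-prefix overlap and stops when no overlap reaches `min_overlap`. The function names and the threshold are hypothetical.

```python
from typing import List, Tuple


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Longest n >= min_len such that the last n bases of a equal the first n of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: List[str], min_overlap: int = 20) -> Tuple[List[str], List[str]]:
    """Toy greedy assembly: merge the best-overlapping pair until none remain.

    Returns (contigs, singletons); a contig is any sequence built from two or
    more reads. Reads contained inside other reads are not handled.
    """
    pool = [(r.upper(), False) for r in reads]  # (sequence, built_from_merge)
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, (a, _) in enumerate(pool):
            for j, (b, _) in enumerate(pool):
                if i != j:
                    n = suffix_prefix_overlap(a, b, min_overlap)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining pair overlaps enough
        merged = pool[best_i][0] + pool[best_j][0][best_n:]
        pool = [s for k, s in enumerate(pool) if k not in (best_i, best_j)]
        pool.append((merged, True))
    contigs = [s for s, was_merged in pool if was_merged]
    singletons = [s for s, was_merged in pool if not was_merged]
    return contigs, singletons
```

With `min_overlap=5`, the reads `AAACCCGGGTTT` and `GGGTTTACGTAC` merge into one contig through their shared `GGGTTT`, and an unrelated read such as `CATCATCATCAT` is left as a singleton.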
Detection of SSR containing EST sequences
The resulting contigs and singletons were then analyzed to find sequences containing simple sequence repeats (SSRs) with the MIcroSAtellite identification tool (MISA) (http://pgrc.ipk-gatersleben.de/misa/). MISA identifies and locates perfect microsatellites as well as compound microsatellites, which are interrupted by a certain number of bases.
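MISA-style detection of perfect SSRs can be sketched with regular expressions. The minimum repeat numbers below (10 for mono-, 6 for di- and 5 for tri- to hexanucleotides) are MISA's commonly used example settings and are an assumption here, because the text does not report the thresholds given to MISA. Motifs that are themselves repeats of a shorter unit (e.g. `AA`, `ATAT`) are skipped so that each repeat is reported under its shortest unit. Compound SSRs are not merged in this sketch. `find_ssrs` is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple

# Assumed minimum repeat numbers per motif length (MISA's example settings);
# the study does not state the exact values it used.
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_reducible(motif: str) -> bool:
    """True if motif is a repeat of a shorter unit, e.g. 'AA' or 'ATAT'."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[Tuple[str, int, int]]:
    """Find perfect SSRs as (motif, repeat_count, start) with 0-based starts.

    Compound SSRs (two repeats separated by a short gap) are not merged here.
    """
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-base motif followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_reducible(motif):
                continue  # e.g. 'AA' runs are reported as mononucleotide repeats
            hits.append((motif, len(m.group(0)) // k, m.start()))
    return sorted(hits, key=lambda h: (h[2], len(h[0])))
```

For instance, `find_ssrs("GG" + "AT" * 7 + "GG")` reports a single (AT)7 dinucleotide repeat starting at position 2. Like MISA, the regular expression reports the leftmost phase of a repeat, so a run may be named by a rotation of its motif (e.g. CTT rather than TTC) depending on the flanking bases.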
Retrieval of primer sequences
The SSR-containing EST sequences were then processed in Primer3 (http://bioinfo.ut.ee/primer3-0.4.0/) to design suitable forward and reverse primers flanking each repeat.
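One way to hand SSR-ESTs to a local Primer3 installation is to write one Boulder-IO record per sequence. Each record marks the repeat with `SEQUENCE_TARGET` so that the amplicon must span it. This is a sketch under stated assumptions. The tag names follow Primer3 2.x; the 0.4.0 web release cited above used older tag names. The default size ranges are taken from the primer design parameters reported in the Results. `primer3_record` is a hypothetical helper.

```python
from typing import Tuple


def primer3_record(seq_id: str, template: str, target_start: int, target_len: int,
                   product_range: Tuple[int, int] = (103, 250),
                   size_range: Tuple[int, int] = (18, 25),
                   opt_size: int = 21) -> str:
    """Build one Boulder-IO record asking Primer3 for primers flanking an SSR.

    SEQUENCE_TARGET is "start,length" of the repeat; starts are 0-based under
    Primer3's default PRIMER_FIRST_BASE_INDEX of 0.
    """
    tags = [
        ("SEQUENCE_ID", seq_id),
        ("SEQUENCE_TEMPLATE", template.upper()),
        ("SEQUENCE_TARGET", f"{target_start},{target_len}"),
        ("PRIMER_PRODUCT_SIZE_RANGE", f"{product_range[0]}-{product_range[1]}"),
        ("PRIMER_MIN_SIZE", str(size_range[0])),
        ("PRIMER_OPT_SIZE", str(opt_size)),
        ("PRIMER_MAX_SIZE", str(size_range[1])),
    ]
    # Each Boulder-IO record is terminated by a line containing only '='.
    return "".join(f"{tag}={value}\n" for tag, value in tags) + "=\n"
```

Concatenated records can be written to a file and passed to `primer3_core` on standard input.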
SSR-FDM analysis
The sequences with selected primers were functionally annotated with Blast2GO (https://www.blast2go.com/). Blast2GO is a bioinformatics platform for high-quality functional annotation and analysis of genomic datasets, and it provides functional information for the selected sequences.
RESULTS AND DISCUSSION
Sequence retrieval and validation
A total of 1644 EST sequences of Citrus limon were retrieved from the NCBI EST database. They were analyzed with Tandem Repeats Finder to find sequences in which one or more nucleotides are repeated in tandem. Tandem repeats occur not only in intergenic regions but also in both the non-coding and coding regions of many genes. Repeat expansion diseases are a group of human genetic disorders caused by long, highly polymorphic tandem repeats. Repeats within an exon or coding region cause Huntington disease (HD) or spinobulbar muscular atrophy (SBMA), whereas repeats outside the open reading frame can cause myotonic dystrophy (DM) or fragile X syndrome (FXS) [8]. To avoid the complications arising from such repeats, 210 EST sequences containing tandem repeats were removed manually. The remaining 1434 sequences were then analyzed with VecScreen [9] to detect and eliminate vector-contaminated sequences, that is, segments that may be of vector origin. Of these 1434 sequences, 63 were deleted as vector contaminated, and the remaining 1371 EST sequences of Citrus limon were used for further analysis (Figure 1).
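The sequence counts above can be tracked as a chain of named discard filters that logs how many records survive each step. The predicates in the usage below are hypothetical placeholders chosen only to reproduce the reported counts (1644, then 1434, then 1371). In practice they would wrap the Tandem Repeats Finder and VecScreen results. `apply_filters` is a hypothetical helper.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def apply_filters(records: Sequence[T],
                  filters: Sequence[Tuple[str, Callable[[T], bool]]]
                  ) -> Tuple[List[T], List[Tuple[str, int]]]:
    """Apply named discard predicates in order and log how many records remain."""
    kept: List[T] = list(records)
    log = [("retrieved", len(kept))]
    for name, discard in filters:
        kept = [r for r in kept if not discard(r)]  # drop records flagged by this step
        log.append((name, len(kept)))
    return kept, log


if __name__ == "__main__":
    ids = list(range(1644))  # placeholder record IDs
    _, log = apply_filters(ids, [
        ("tandem repeats", lambda i: i < 210),       # 210 flagged records (placeholder rule)
        ("vector", lambda i: 210 <= i < 273),        # 63 flagged records (placeholder rule)
    ])
    print(log)
```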
Sequence assembly analysis and identification of SSR
After retrieval and validation, the remaining 1371 EST sequences of Citrus limon were assembled with CAP3, which assembles EST reads and predicts contigs and singletons. CAP3 clips low-quality 5′ and 3′ regions of reads, uses base quality values when computing overlaps between reads, constructs multiple sequence alignments of reads and generates consensus sequences. It also uses forward-reverse constraints to correct assembly errors and link contigs [10]. CAP3 predicted 55 contigs and 1183 singletons from the 1371 EST sequences. These were then screened with MISA for simple sequence repeats (SSRs), or microsatellites, which are widely used markers in plant genetics and forensics and are useful for primer design [11]. MISA can identify SSR-containing sequences among both contigs and singletons [12]. This screening yielded 420 SSR-containing sequences, termed SSR-ESTs (Figure 1).
Primer designing
The basic parameters for primer pair design were as follows:
- minimum number of SSR motif repeats: 10 for dinucleotides, 7 for trinucleotides and 4 for tetranucleotides;
- product size range: 103–250 bp (optimal: 150 bp);
- primer length: 18–25 bases (optimal: 21 bases);
- GC content: 57.45%–61.76% (optimal: 50%);
- annealing temperature: 31.82°C–60°C (optimal: 56°C);
- default values for all other parameters [13].

Applying these criteria, Primer3 gave suitable forward and reverse primers for 128 of the 420 EST-SSRs. Primer3 is widely used for primer design, often in high-throughput genomics projects [14] (Table 1) (Figure 1).
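The length and GC criteria can be checked for a candidate primer with a few lines of code. This is a rough sketch. The melting temperature here uses the Wallace rule, Tm = 2(A+T) + 4(G+C) °C, which is only a crude approximation for 18–25-mers; Primer3 itself uses a nearest-neighbor thermodynamic model. The 18–25 base window comes from the text. The 40–60% GC window is an assumed general rule of thumb, not the study's setting. The helper names are hypothetical.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Wallace-rule melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def check_primer(primer: str, min_len: int = 18, max_len: int = 25,
                 min_gc: float = 40.0, max_gc: float = 60.0) -> bool:
    """Length window from the text; the GC window is an assumed rule of thumb."""
    return min_len <= len(primer) <= max_len and min_gc <= gc_percent(primer) <= max_gc
```

For example, the 20-mer `AGCTAGCTAGCTAGCTAGCT` has 50% GC and a Wallace Tm of 60 °C, so it passes, whereas a 20-base run of G fails the GC window.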
Frequency distribution of SSRs
MISA classified the identified SSRs (microsatellites) into mononucleotide, dinucleotide, trinucleotide, tetranucleotide and compound repeats. Among the 420 SSR-containing sequences, mononucleotide repeats (MNRs) were the most abundant (316; 75.23%). They were followed by dinucleotide repeats (DNRs; 40; 9.52%), trinucleotide repeats (TNRs; 32; 7.61%) and tetranucleotide repeats (4; 0.95%).
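The frequency distribution is a simple share of each motif class among the 420 SSR-containing sequences. The sketch below recomputes it from the reported counts. Rounded to two decimals, 316/420 and 32/420 give 75.24% and 7.62%, so the values in the text (75.23% and 7.61%) appear to be truncated rather than rounded. The four classes together account for 392 of the 420 sequences.

```python
from typing import Dict

# Motif-class counts among the 420 SSR-containing sequences, as reported in the text.
COUNTS: Dict[str, int] = {
    "mononucleotide": 316,
    "dinucleotide": 40,
    "trinucleotide": 32,
    "tetranucleotide": 4,
}
TOTAL = 420


def percentages(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Share of each class in percent, rounded to two decimals."""
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


if __name__ == "__main__":
    print(percentages(COUNTS, TOTAL))
    print(sum(COUNTS.values()))
```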
Functional domain analysis of SSR markers
The 128 SSR-EST sequences were assigned gene ontology (GO) terms for functional domain annotation with Blast2GO, which can produce high-throughput functional annotation [15]. Of the 128, only 115 EST-SSRs were analyzed in Blast2GO (Figure 1). The analysis comprises alignment, mapping and annotation of the given sequences with programs such as BLAST and InterProScan (Figure 2). Within Blast2GO, functional domains were analyzed with InterProScan, which draws on databases such as PatternScan, SignalPHMM, TMHMM, HMMPanther and FPrintScan [7]. The associated metabolic pathways and enzyme codes of the EST-SSRs were also studied in Blast2GO through the KEGG database, a collection of biological pathways, chemical substances, diseases, drugs and more [16]. In total, 913 mapped and annotated GO terms were obtained for 146 EST-SSR sequences: 392 for biological process, 389 for molecular function and 132 for cellular component.
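Tallying annotations per GO namespace is a simple counting step. This sketch assumes an annotation table of (sequence ID, term) pairs in which each term carries a one-letter namespace prefix, such as `P:translation`. That is an assumed export convention; adjust the parsing to the actual table layout. For the study's data, the three totals would be expected to sum to the reported 913 (392 + 389 + 132). `tally_namespaces` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, Tuple

# Assumed export convention: "P:" biological process, "F:" molecular function,
# "C:" cellular component, prefixed to each GO term name.
NAMESPACES = {"P": "biological_process", "F": "molecular_function", "C": "cellular_component"}


def tally_namespaces(annotations: Iterable[Tuple[str, str]]) -> Counter:
    """Count GO annotations per namespace from (sequence_id, prefixed_term) pairs."""
    counts: Counter = Counter()
    for _seq_id, term in annotations:
        prefix = term.split(":", 1)[0]
        counts[NAMESPACES[prefix]] += 1
    return counts
```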
Biological processes
A biological process is a larger process: a series of events accomplished by one or more ordered assemblies of molecular functions. The most frequently observed biological processes were Translation (13 SSR-ESTs); Transmembrane transport (6 SSR-ESTs); Fatty acid biosynthetic process (3 SSR-ESTs); Metabolic process (3 SSR-ESTs); Oxidation-reduction process (3 SSR-ESTs); Response to water (3 SSR-ESTs); ATP hydrolysis coupled proton transport (2 SSR-ESTs); Cell redox homeostasis (2 SSR-ESTs); Ceramide metabolic process; De novo pyrimidine nucleobase biosynthetic process (2 SSR-ESTs); DNA replication (2 SSR-ESTs); Electron transport chain (2 SSR-ESTs); Glutamine metabolic process (2 SSR-ESTs); Lipid catabolic process (2 SSR-ESTs); Lipid transport (2 SSR-ESTs); Negative regulation of transcription, DNA-templated (2 SSR-ESTs); Phosphorelay signal transduction system (2 SSR-ESTs); Photosynthetic electron transport in photosystem II (2 SSR-ESTs); Protein glycosylation (2 SSR-ESTs); Proton transport (2 SSR-ESTs); Protein ubiquitination (2 SSR-ESTs); Protein-chromophore linkage (2 SSR-ESTs); Regulation of transcription, DNA-templated (2 SSR-ESTs); Response to abscisic acid (2 SSR-ESTs); Response to stress (2 SSR-ESTs); and RNA processing (2 SSR-ESTs). The remaining markers were involved in fewer biological processes, each observed less often, so only the most frequent biological processes are considered here (Figure 3a).
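Lists like the one above rank GO terms by the number of distinct SSR-ESTs annotated with them. The sketch below counts each sequence once per term, drops rare terms and sorts the rest. It assumes the same prefixed (sequence ID, term) pairs as a hypothetical Blast2GO export. The `min_seqs` cutoff is a hypothetical parameter; the text does not state the exact cutoff used. The same function applies to the molecular function (`"F"`) and cellular component (`"C"`) lists that follow.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rank_terms(annotations: Iterable[Tuple[str, str]], prefix: str,
               min_seqs: int = 2) -> List[Tuple[str, int]]:
    """Rank GO terms of one namespace by the number of distinct SSR-ESTs.

    annotations holds (sequence_id, "P:term name")-style pairs; terms seen in
    fewer than min_seqs sequences are dropped (a hypothetical cutoff).
    """
    seqs_per_term: Dict[str, Set[str]] = defaultdict(set)
    for seq_id, term in annotations:
        ns, _, name = term.partition(":")
        if ns == prefix:
            seqs_per_term[name].add(seq_id)  # a set counts each sequence once
    ranked = [(name, len(ids)) for name, ids in seqs_per_term.items() if len(ids) >= min_seqs]
    # Most frequent first; ties broken alphabetically for a stable listing.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))
```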
Molecular functions
Molecular function describes the activities that a gene product (or complex) performs. The most frequent molecular functions were Structural constituent of ribosome (13 SSR-ESTs); DNA binding (7 SSR-ESTs); ATP binding (6 SSR-ESTs); RNA (rRNA, tRNA) binding (5 SSR-ESTs); Transmembrane transporter activity (4 SSR-ESTs); Heme binding (4 SSR-ESTs); Zinc ion binding (3 SSR-ESTs); Transferase activity (3 SSR-ESTs); Monooxygenase activity (3 SSR-ESTs); Lipid binding (3 SSR-ESTs); Iron ion binding (3 SSR-ESTs); GTP binding (3 SSR-ESTs); Electron transfer activity (3 SSR-ESTs); and DNA-binding transcription factor activity (3 SSR-ESTs) (Figure 3b).
Cellular components
A cellular component is a part of a cell, with the proviso that it is part of some larger object. The most frequently observed cellular components in this study were Integral component of membrane (9 SSR-ESTs); Ribosome (6 SSR-ESTs); Nucleus (6 SSR-ESTs); Small ribosomal subunit (5 SSR-ESTs); Proton-transporting ATP synthase complex, catalytic core F(1) (3 SSR-ESTs); Photosystem I (3 SSR-ESTs); and Golgi membrane (3 SSR-ESTs) (Figure 3c).
After FDM assessment, the SSR-ESTs were further analyzed in Blast2GO for EC mapping and identification of their KEGG pathways. EC mapping and KEGG pathway enrichment analysis yielded 87 metabolic pathways and the enzyme codes for 146 EST-SSR sequences. Several sequences were involved in more than one metabolic pathway, and a single sequence could carry more than one enzyme code. The analysis showed that enzyme code ec:3.6.1.15 (phosphatase) maps to the thiamine metabolism pathway, which involved the largest number of SSR-ESTs (4) (Table 2). This prediction links the primers selected from C. limon SSR-ESTs to specific metabolic pathways.
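Because a sequence can carry several enzyme codes and an enzyme code can map to several pathways, counting the SSR-ESTs behind each pathway needs a set per (pathway, EC) pair. This avoids counting a sequence twice. The sketch assumes two hypothetical lookup tables: EC codes per sequence and pathways per EC code. In the usage below, only the ec:3.6.1.15 to thiamine metabolism link comes from the text; the sequence IDs, `ec:0.0.0.0` and "Pathway B" are made-up placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def pathway_support(ec_by_seq: Dict[str, Iterable[str]],
                    pathways_by_ec: Dict[str, Iterable[str]]) -> List[Tuple[str, str, int]]:
    """Count distinct SSR-ESTs behind each (pathway, enzyme code) pair.

    A sequence may carry several enzyme codes and an enzyme code may map to
    several pathways, so one sequence can contribute to many pairs.
    """
    support: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for seq_id, ecs in ec_by_seq.items():
        for ec in ecs:
            for pathway in pathways_by_ec.get(ec, ()):
                support[(pathway, ec)].add(seq_id)
    rows = [(pathway, ec, len(ids)) for (pathway, ec), ids in support.items()]
    # Best-supported pairs first, ties broken by name for a stable order.
    return sorted(rows, key=lambda row: (-row[2], row[0], row[1]))


if __name__ == "__main__":
    ec_by_seq = {  # hypothetical sequence IDs and EC assignments
        "est_a": ["ec:3.6.1.15"], "est_b": ["ec:3.6.1.15", "ec:0.0.0.0"],
        "est_c": ["ec:3.6.1.15"], "est_d": ["ec:3.6.1.15"], "est_e": ["ec:0.0.0.0"],
    }
    pathways_by_ec = {"ec:3.6.1.15": ["Thiamine metabolism"], "ec:0.0.0.0": ["Pathway B"]}
    print(pathway_support(ec_by_seq, pathways_by_ec))
```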
CONCLUSION
Citrus limon is an evergreen plant of high medicinal value. Microsatellites, or SSRs, play a major role in polymorphism analysis and marker-assisted selection. The in silico approach to predicting SSRs from EST sequences proved both cost- and time-effective and also helps to develop a new generation of molecular markers. This study identified 420 EST-SSR sequences, which yielded 128 candidate primer pairs that are likely to be useful for genetic mapping, population genetic studies and related applications. The functional domain analysis (GO annotation) of the resulting EST-SSRs also provides information on the putative functions of these transcribed genetic markers. This may pave the way for future breeding and genetic studies of Citrus limon and for its functional characterization.
ACKNOWLEDGEMENT
We sincerely thank Tectona Biotech Resource Centre (TBRC), Odisha, for providing the necessary facilities. We also express our heartfelt gratitude to Dr. Shovan Kumar Mishra, Director of TBRC, for his constant support throughout this work.