Surveillance of Emerging SARS-CoV-2 Variants by Nanopore Technology-based Genome Sequencing
- 1. Department of Virology, Medical Research Institute, P.O.Box: 527, Baseline Road, Colombo 08, Sri Lanka
- 2. Department of Virology, Medical Research Institute, P.O.Box: 527, Baseline Road, Colombo 08, Sri Lanka
- 3. Department of Virology, Medical Research Institute, P.O.Box: 527, Baseline Road, Colombo 08, Sri Lanka
- 4. Department of Pathology, Stanford University School of Medicine, Palo Alto, California, USA
Abstract
Background:
To detect emerging variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), genome sequencing of at least 1% of infections is recommended in all countries. A nanopore technology platform was set up at the reference laboratory during the pandemic, and sequencing has continued in order to understand the variants circulating in the country.
Objectives:
This study aimed to describe the surveillance of emerging variants by nanopore technology-based genome sequencing during different COVID-19 waves in Sri Lanka and to demonstrate the association with sample characteristics and vaccination status.
Methodology:
The study analyzed 207 RNA-positive swab samples received by the sequencing laboratory during different waves. An N gene cycle threshold (Ct) value below 30 was the major inclusion criterion. Viral RNA was extracted, and the eluates were subjected to nanopore sequencing according to the manufacturer’s instructions using the SQK-RBK110.96 rapid barcoding kit. All sequencing data were uploaded to the publicly accessible database GISAID.
Results:
Analysis revealed that the variants distributed throughout the period were 58% Omicron, 22% Delta, 4% Alpha, and less than 1% Kappa; 16% of the study samples remained unassigned. The Omicron variant circulated among all age groups and in all provinces. Variants were assigned to 100% of samples with Ct values of 10-15 but to only 45% of samples with Ct values over 25.
Conclusion:
The present study describes the emergence, prevalence, and distribution of SARS-CoV-2 variants locally and summarizes the establishment of nanopore technology, which enabled whole genome sequencing in a low-resource country.
Keywords
Emerging SARS-CoV-2 Variants, Laboratory Surveillance, Nanopore Technology, Genome Sequencing, Bioinformatics Analysis And Phylogeny, Sociodemographic And Sample Cut off (Ct) Threshold, Global Sharing Of Genomic Data/GISAID
Citation
Abeynayake JI, Chathuranga GP, Fernando MAY, Sahoo MK (2023) Surveillance of Emerging SARS-CoV-2 Variants by Nanopore Technology-based Genome Sequencing. Clin Res Infect Dis 7(1): 1058.
INTRODUCTION
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) continues to evolve, giving rise to many variants with higher transmissibility and immune evasion abilities, which continue to drive the pandemic [1]. To detect the emergence of such new variants, the World Health Organization has recommended that all countries carry out genomic sequencing for at least 1% of their infections [2]. New variants of SARS-CoV-2 reinforce the critical role of whole genome sequencing, since it provides useful information about viral lineages, Variants of Interest (VOI) and Variants of Concern (VOC) [3].
SARS-CoV-2 is a single-stranded RNA virus. The genome is about 30 kb and consists of genes encoding multiple non-structural, structural, and accessory proteins. The non-structural proteins include NSP1 to NSP16, which are necessary for virus transcription and replication [4]. The structural proteins include Spike (S), Envelope (E), Membrane (M) and Nucleoprotein (N). Virus attachment, entry, and infectivity are mediated by the structural proteins [3,4].
During virus replication, mutations can occur that alter protein functions. Most have little to no impact on the virus’s properties, but some affect the response to vaccines, therapeutic medicines, diagnostic tools, and other public health and social measures [5]. Factors such as possible transmission between humans and other mammals have contributed to the rapid increase in the number of mutations [3]. During the pandemic, different variants have been identified and broadly categorized as VOC, VOI, and Variants Under Monitoring (VUM). Identifying these variants and monitoring their diverse transmission routes within communities in all countries is of key importance.
Next-Generation Sequencing (NGS) is now an effective method for identifying mutations and new variants of epidemiological and clinical importance. Although capillary sequencing was the first sequencing technology designed, it has been progressively displaced by high-throughput NGS technologies. There are several NGS platforms, such as Illumina dye sequencing, pyrosequencing, and single-molecule real-time sequencing [6,7]. Nanopore sequencing is another technology that has emerged; its reads are on average much longer, which is especially valuable in Whole-Genome Sequencing (WGS) applications. Further, nanopore technology provides fast, low-cost SARS-CoV-2 sequencing on a portable platform that requires relatively minimal laboratory infrastructure [8].
Therefore, a nanopore technology platform was set up in the state sector laboratories during the pandemic, and positive SARS-CoV-2 samples were sequenced during different waves and the post-COVID (coronavirus disease) period to better understand the circulating SARS-CoV-2 variants and their mutations in the country.
OBJECTIVES
The objective of this study was to describe the laboratory surveillance of emerging SARS-CoV-2 variants by nanopore technology-based genome sequencing during different COVID waves and the post-COVID period in Sri Lanka and to demonstrate the association with sample characteristics, clinical profiles, and vaccination status.
MATERIAL AND METHODOLOGY
Clinical Samples
The study retrospectively analyzed nasopharyngeal/oropharyngeal (NP/OP) samples received for WGS at the sequencing laboratory of the National Virus Reference Laboratory (NVRL) during different COVID waves, using nanopore technology-based genome sequencing. All included samples had previously tested positive for SARS-CoV-2 RNA by rRT-PCR (real-time reverse transcriptase polymerase chain reaction) with targets in the nucleocapsid and envelope genes, or by lateral flow testing with the Panbio COVID-19 Ag rapid test (Abbott). Following primary detection of COVID-19 positivity, samples were matched against pre-defined criteria developed by the Ministry of Health [9] and selected for genomic sequencing.
The sequencing laboratory at NVRL received NP/OP specimens in viral transport media, with the cold chain maintained, from different provinces of the country through health care institutions of both the state and private sectors in Sri Lanka. Upon arrival, samples were aliquoted, de-identified, and stored at -80 °C until nucleic acid extraction and the test run were initiated. An N gene cycle threshold (Ct) value of less than 30 was the major inclusion criterion during sample selection [9,10]. The other criteria applied were positive samples from overseas returnees/foreign travelers, critically ill patients in the ICU, inpatients with moderate to severe symptoms despite a full course of COVID-19 vaccination, patients positive after initial recovery and discharge, community clusters with high complication rates, and samples from COVID-19 deaths.
RNA extraction and Genomic sequencing
A total of 207 samples were re-extracted after a single freeze-thaw cycle. The QIAamp® Viral RNA Mini Kit was used following the manufacturer’s instructions to extract the viral RNA [11]. The workflow of the nanopore sequencing procedure is shown in Figure 1.
Figure 1: Workflow of the Nanopore Sequencing.
The extracted samples were subjected to nanopore sequencing according to the manufacturer’s instructions using the SQK-RBK110.96 rapid barcoding kit (ONT, Oxford, UK). The extracted RNA was converted to cDNA using LunaScript RT SuperMix, followed by amplification in two reactions with Midnight primer pools A and B and Q5 HS Master Mix, designed for whole genome amplification.
Tiled PCR amplicons of 1200 bp were generated with the Midnight primers as described by Freed et al., 2020 [12]. All thermal cycling steps were carried out on the ABI 7500 Real-Time PCR instrument (Applied Biosystems, USA). Barcodes were attached to the resulting DNA amplicons with the Rapid Barcoding Kit, and the barcoded samples were pooled before the clean-up step. Subsequently, the DNA library was loaded onto an R9 version Oxford Nanopore MinION Spot-ON flow cell (FLO-MIN106D) and sequenced on the Oxford Nanopore MinION Mk1C. The run was terminated once the desired number of reads, a minimum of 20,000 reads per sample, was achieved.
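The stopping rule of at least 20,000 reads per sample can be checked with a short script. The following is a minimal sketch, not the procedure used in the study: it assumes plain-text FASTQ with the standard four lines per record, and the function names, barcode labels and read counts are hypothetical.

```python
import io

MIN_READS = 20_000  # per-sample minimum stated in the text


def count_fastq_reads(handle):
    """Count records in a FASTQ stream, assuming four non-empty lines per read."""
    n_lines = sum(1 for line in handle if line.strip())
    if n_lines % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of 4")
    return n_lines // 4


def samples_below_target(read_counts, min_reads=MIN_READS):
    """Return the sorted sample labels that have not yet reached the read target."""
    return sorted(s for s, n in read_counts.items() if n < min_reads)


# Toy FASTQ with two reads (hypothetical data).
demo_fastq = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nIIII\n")
n_demo = count_fastq_reads(demo_fastq)

# Hypothetical per-barcode read counts from a run in progress.
demo_counts = {"barcode01": 25_310, "barcode02": 18_422, "barcode03": 20_000}
pending = samples_below_target(demo_counts)

print(n_demo, pending)
```

In this sketch a run would be allowed to stop only once `pending` is empty.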
During the sequencing run, the raw data generated were converted to FASTQ files through a process called basecalling. The FASTQ text files, containing sequence data for each read, were then uploaded to the EPI2ME Agent software, a cloud-based data analysis platform that appraises the putative SARS-CoV-2 variant. Furthermore, the resulting consensus sequences (FASTA files) were analyzed with several web-based tools, namely Nextclade, the Pangolin COVID-19 lineage assigner, and the Stanford University Coronavirus Antiviral & Resistance Database. Finally, all sequencing data were uploaded to and published in GISAID, a publicly accessible global initiative for sharing genomic data [13]. In addition, the distribution of variants during different COVID-19 waves, the cycle threshold (Ct) values of the tested samples, vaccination status, clinical profile and sociodemographic data gathered from the accompanying test requests were examined alongside the sequencing results.
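Lineage calls such as AY.104 or BA.5 have to be grouped under WHO labels before variants can be tallied. The sketch below illustrates one way to do this with a prefix lookup. The lookup table is an illustrative assumption that covers only the lineages reported in this study; it is not how Pangolin or Nextclade derive WHO labels, and the function name is hypothetical.

```python
from collections import Counter

# Illustrative mapping from Pango lineage (or alias prefix) to WHO label.
# Covers only lineages reported in this study; an assumption, not a full table.
WHO_LABEL_BY_PREFIX = {
    "B.1.1.7": "Alpha",
    "B.1.617.1": "Kappa",
    "B.1.617.2": "Delta",
    "AY": "Delta",
    "B.1.1.529": "Omicron",
    "BA": "Omicron",
    "BE": "Omicron",
    "BF": "Omicron",
    "CH": "Omicron",
    "XBB": "Omicron",
}


def who_label(lineage):
    """Map a lineage string to a WHO label, matching whole dotted prefixes only."""
    if not lineage or lineage == "Unassigned":
        return "Unassigned"
    for prefix, label in WHO_LABEL_BY_PREFIX.items():
        if lineage == prefix or lineage.startswith(prefix + "."):
            return label
    return "Unmapped"


# Sub-variant calls for January 2023 as listed in Table 3.
jan_2023_calls = (
    ["BA.5"] * 4 + ["BA.2"] * 4 + ["XBB.1"] * 4
    + ["BF.15", "BF.28", "XBB.3", "CH.1.1"]
)
tally = Counter(who_label(call) for call in jan_2023_calls)
print(tally)
```

Matching on a whole dotted prefix avoids grouping an unrelated lineage that merely starts with the same characters.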
Statistical Analysis
Excel was used to calculate means and percentages. The mode was calculated using SPSS version 25. Descriptive statistics were used to show the characteristics of the study sample.
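The same descriptive statistics can be reproduced outside Excel and SPSS. The following is a minimal sketch using only the Python standard library; the gender counts are taken from Table 1, whereas the list of ages is hypothetical and serves only to show the mean and mode calls.

```python
from statistics import mean, mode


def frequency_table(counts):
    """Return {category: (count, percentage of total rounded to 2 decimals)}."""
    total = sum(counts.values())
    return {k: (n, round(100 * n / total, 2)) for k, n in counts.items()}


# Gender counts as reported in Table 1 (n = 207).
gender_counts = {"Female": 76, "Male": 131}
gender_table = frequency_table(gender_counts)

# Hypothetical ages in years, for illustration only.
ages = [0.3, 12, 28, 28, 35, 41, 52, 67, 80]
age_mean = mean(ages)
age_mode = mode(ages)

for category, (n, pct) in gender_table.items():
    print(f"{category}: {n} ({pct:.2f}%)")
print(f"mean age {age_mean:.1f}, modal age {age_mode}")
```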
RESULTS
Here we present genome sequencing results for samples from October 2021 through January 2023, together with sample characteristics and the distribution of variants across the variables.
Sample Characteristics
A total of 207 swab-based samples from SARS-CoV-2-positive patients were sequenced. Sample characteristics such as age range, gender, geography, vaccination status, and clinical profile are shown in Table 1.
Table 1: Sample characteristics.
Variable | Frequency (percentage) |
Age range, Mean, Mode, (n = 207) | 4 months - 80 years, 37 years, 28 years |
< 1 year | 2 (0.97%) |
1-15 years | 22 (10.63%) |
16-50 years | 134 (64.73%) |
51-65 years | 40 (19.32%) |
> 65 years | 9 (4.35%) |
Gender (n=207) | |
Female | 76 (36.71%) |
Male | 131 (63.29%) |
Geographical distribution of cases (n=207) | |
Western | 162 (78.26%) |
Northwestern | 32 (15.46%) |
Eastern | 2 (0.97%) |
Sabaragamuwa | 1 (0.48%) |
Vaccination status (n = 207) | |
Vaccinated | 84 (40.58%) |
Not vaccinated | 7 (3.38%) |
Unknown | 116 (56.04%) |
Clinical Profile (n = 207) | |
Symptomatic | 59 (28.50%) |
Asymptomatic | 16 (7.73%) |
Unknown | 132 (63.77%) |
N Gene Ct value (n = 207) | |
10-15 | 23 (11.11%) |
15-25 | 139 (67.15%) |
> 25 | 31 (14.98%) |
Antigen Positive cases | 6 (2.90%) |
Unknown | 8 (3.86%) |
Of these, the majority were nasopharyngeal samples. Among the sequenced samples, 63% were from males and 37% from females. Ages ranged from four months to 80 years, with a mean age of 37 years. The highest number of sequenced samples came from the Western province, and the rest represented the North-western, Southern, Eastern and Sabaragamuwa provinces. Vaccination status was provided for only 44% of samples and was unknown for the others. Of the samples, 59 (28.5%) were from symptomatic patients and 16 (7.7%) from asymptomatic patients, while the clinical profile of the remainder was not documented. The sequenced samples covered an N gene cycle threshold (Ct) range of 10-30, with 139 samples having a Ct of 15-25.
Bioinformatics analysis of the sequencing data generated the SARS-CoV-2 phylogenetic tree depicted in Figure 2, which illustrates the different variants according to WHO nomenclature during the period.
Figure 2: Nextclade-based phylogenetic tree created using sequence data from the study period in Sri Lanka.
The circles represent the Sri Lankan sequences in comparison with published sequences from around the world. Nextclade lineages are clustered according to the indicated color code. Figure 2 shows evolutionary details such as nucleotide and amino acid changes from the root, divergence, and clade, which help in understanding the molecular epidemic profile of the virus in the country at a given time point. Retrospective analysis revealed that the variants distributed throughout the period were 58% Omicron, 22% Delta, 4% Alpha, and less than 1% Kappa; 16% of the study samples remained unassigned (Table 2).
Table 2: Variants distribution during the period.
Variant | Frequency (n = 207) |
Omicron (B.1.1.529) | 119 (57.49%) |
Delta (B.1.617.2) | 46 (22.22%) |
Alpha (B.1.1.7) | 8 (3.87%) |
Kappa (B.1.617.1) | 1 (0.48%) |
Unassigned | 33 (15.94%) |
Table 3 displays the different sub-variants that circulated throughout the study period.
Table 3: Distribution of Variants/Sub variants during the period.
Month | Variant, frequency | Sub-variant | Sub-variant frequency |
December 2021 (n = 53) | Delta 42 (79.2%) | AY.104 | 28 (66.7%) |
 | | AY.28 | 6 (14.3%) |
 | | AY.39 | 4 (9.5%) |
 | | AY.67 | 2 (4.8%) |
 | | AY.95 | 1 (2.4%) |
 | | AY.101 | 1 (2.4%) |
 | Omicron 3 (5.7%) | BA.1 | 3 (100%) |
January 2022 (n = 95) | Omicron 66 (69.47%) | BA.1 | 66 (100%) |
 | Alpha 8 (8.42%) | B.1.1.7 | 8 (100%) |
 | Delta 4 (4.21%) | AY.104 | 3 (100%) |
 | Kappa 1 (1.05%) | - | - |
July 2022 (n = 36) | Omicron 34 (94.4%) | BA.5 | 27 (79.4%) |
 | | BA.2 | 6 (17.6%) |
 | | BE.1 | 1 (2.9%) |
January 2023 (n = 23) | Omicron 16 (69.57%) | BA.5 | 4 (25.00%) |
 | | BA.2 | 4 (25.00%) |
 | | XBB.1 | 4 (25.00%) |
 | | BF.15 | 1 (6.25%) |
 | | BF.28 | 1 (6.25%) |
 | | XBB.3 | 1 (6.25%) |
 | | CH.1.1 | 1 (6.25%) |
The leading variants in December 2021 were Delta (79%) and Omicron (6%), whereas in January 2022 they were Omicron (69%) and Delta (4%). Different variants circulated during the period, and the highest number of sub-variants was associated with the Omicron variant. The association between Ct value and variant assignment is shown in Figure 3.
Figure 3: Association of Ct value and Variant Assigned %.
Variants were assigned in 100% of samples with Ct values of 10-15 but in only 45% of samples with a Ct value over 25.
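The relationship between Ct value and assignment success can be computed by binning samples and taking the share with a lineage call in each bin. The sketch below uses the three Ct bins of Table 1. The bin boundaries (15 and 25 taken as inclusive upper limits) are an assumption, since the source does not state how boundary values were handled, and the records are hypothetical.

```python
from collections import defaultdict


def ct_bin(ct):
    """Assign a Ct value to one of the Table 1 bins (upper limits assumed inclusive)."""
    if ct <= 15:
        return "10-15"
    if ct <= 25:
        return "15-25"
    return ">25"


def assignment_rate_by_bin(records):
    """Return {bin: percentage of samples with a lineage call} for (ct, lineage) pairs."""
    totals = defaultdict(int)
    assigned = defaultdict(int)
    for ct, lineage in records:
        b = ct_bin(ct)
        totals[b] += 1
        if lineage is not None:
            assigned[b] += 1
    return {b: round(100 * assigned[b] / totals[b], 1) for b in totals}


# Hypothetical (Ct, lineage) records; None marks an unassigned sample.
demo_records = [
    (12.4, "BA.1"), (14.9, "AY.104"),
    (18.2, "BA.5"), (22.7, None),
    (26.1, "BA.2"), (28.8, None),
]
rates = assignment_rate_by_bin(demo_records)
print(rates)
```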
Sample variables and variant distribution are displayed in Table 4 and Figure 4.
Table 4: Distribution of Variants with the sample Variables.
Variable | Alpha (n = 8) | Delta (n = 46) | Omicron (n = 119) |
Age | |||
< 1 | - | - | 2 (0.97%) |
1-15 | 6 (2.90%) | 4 (1.93%) | 13 (6.28%) |
16-50 | 2 (0.97%) | 26 (12.56%) | 80 (38.65%) |
51-65 | - | 12 (5.80%) | 18 (8.70%) |
>65 | - | 4 (1.93%) | 6 (2.90%) |
Province | |||
Western | 6 (2.90%) | 25 (12.08%) | 105 (50.72%) |
Northwestern | - | 21 (10.14%) | 6 (2.90%) |
Southern | 2 (0.97%) | - | 7 (3.38%) |
Eastern | - | - | 2 (0.97%) |
Sabaragamuwa | - | - | 1 (0.48%) |
Figure 4: Distribution of Variants with the sample Variables.
The Omicron variant circulated among all age groups and in all provinces. The Delta variant was detected in all age groups except patients under one year of age, but in only two provinces. Both genders were affected by Omicron (58%), Delta (22%) and Alpha (4%). Of the vaccinated population, 30% were affected by the Omicron variant. For both the Delta and Omicron variants, a symptomatic clinical profile was detected in a higher percentage of cases than an asymptomatic profile.
DISCUSSION
Genomic surveillance is a crucial weapon in the public health fight against infectious diseases, providing rapid identification and complete characterization of pathogens. Surveillance of SARS-CoV-2 is highly recommended by WHO, since RNA viruses are often characterized by high mutation rates [14,15]. Moreover, identifying mutations is critical not only for understanding the infectious mechanism but also for tracking the evolution and transmission routes of the virus [4,16-18]. This manuscript describes nanopore-based genome sequencing and the criteria for sample selection, and appraises the putative variants during a period covering different COVID waves and the post-COVID period. The sample selection criteria filtered for viral load and tracked imported variants, vaccine escape mutants, and variants in critically ill patients, in deaths, and in reinfected patients.
We set up and used nanopore technology for generating and analyzing sequencing data because of its several advantages, which include single-molecule sequencing, rapid generation of sequencing data, and real-time analysis [19]. Further, in our experience it requires comparatively simple library preparation procedures, offers flexibility in sample throughput by accommodating low to high numbers of specimens per flow cell, and, most importantly, has low capital and recurrent costs, which makes it well suited to any low-resource country. Built-in quality control, which produces standard plots such as the distributions of read lengths and quality scores, the number of reads generated per barcode, and the total yield of bases over time, is an added benefit of this technology. These plots also simplify the optimization of laboratory sequencing procedures through rapid diagnosis of common issues such as bubbles introduced during library loading and the presence of contaminants during sequencing, as has been explained in several studies [20,21].
In this study, 207 SARS-CoV-2-positive samples were subjected to genomic sequencing. The study analyzed a combination of residual samples received at the reference laboratory and samples aliquoted and stored at -80 °C in the reference laboratory during the 1st and 2nd waves. In addition, samples that were positive on COVID-19 Ag testing were also included for genome sequencing. The study comprised samples from patients aged four months to 80 years, whose clinical profiles ranged from asymptomatic to symptomatic infection and death, with some profiles unknown. The majority of the samples were from males. The study subjects were a mix of vaccinated, unvaccinated, and unknown vaccination status. Samples from different provinces were included, with the highest number from the Western province.
Genomic sequencing data showed that the majority of samples were assigned a variant, while some samples remained unassigned, most likely because of greater RNA instability or an inadequate original RNA load. Interestingly, almost 70% of the Ag-positive samples subjected to sequencing were assigned a variant, which is an encouraging sign for sequencing during the post-COVID-19 period. The overall predominance of the heavily mutated Omicron variant indicated Omicron activity from October 2021 to January 2023, confirming that the study period fell largely after the 1st and 2nd COVID-19 waves, which ran from January 2020 to October 2021. However, Delta, Alpha and Kappa variants were also identified to a lesser extent, as the study analyzed a few stored samples at the reference center. Meanwhile, the monthly share of the Omicron variant showed a marked increase from 5.7% to 69.47%, while that of the Delta variant showed a marked decrease from 79.2% to 4.21%, from December 2021 to January 2022, mirroring the trends in some other countries [22]. In addition, the same data demarcated the 2nd and 3rd COVID-19 waves in Sri Lanka. The data also demonstrated the significant surge of the 3rd wave around January 2022, which may coincide with the emergence of the Omicron variant. Furthermore, the data showed how the Delta variant was replaced by the Omicron variant between January 2022 and July 2022. This shift in the country is consistent with findings observed globally, which are attributed to the increased mutations of the Omicron variant [22].
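The monthly shift from Delta to Omicron can be recomputed directly from the counts in Table 3. The following is a minimal sketch: the counts are those reported in Table 3 (Delta is taken as zero in months where none is listed), while the function and variable names are hypothetical.

```python
# Monthly counts from Table 3; Delta is 0 where the table lists none.
monthly = {
    "2021-12": {"n": 53, "Delta": 42, "Omicron": 3},
    "2022-01": {"n": 95, "Delta": 4, "Omicron": 66},
    "2022-07": {"n": 36, "Delta": 0, "Omicron": 34},
    "2023-01": {"n": 23, "Delta": 0, "Omicron": 16},
}


def monthly_shares(data, variants=("Delta", "Omicron")):
    """Return {month: {variant: percentage of that month's samples}}."""
    return {
        month: {v: round(100 * row[v] / row["n"], 1) for v in variants}
        for month, row in data.items()
    }


shares = monthly_shares(monthly)

# First month (in chronological order) in which Omicron outnumbers Delta.
first_omicron_dominant = next(
    month for month in sorted(monthly)
    if monthly[month]["Omicron"] > monthly[month]["Delta"]
)

for month in sorted(shares):
    print(month, shares[month])
print("Omicron first dominant in", first_omicron_dominant)
```

Rounded to one decimal place, the December 2021 and January 2022 shares agree with the percentages quoted in the paragraph above.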
Interestingly, bioinformatics analysis of the sequencing data generated a SARS-CoV-2 family tree that branched into many limbs and illustrated the phylogeny of the variants detected in Sri Lanka during the study period. This powerful genetic tool makes sense of the genome sequence diversity that leads to the emergence of viruses and sub-variants, and traces how they are related by sub-sampling across different geographic regions and time periods. It conveys vital information on mutation rates over time, which were low in the Delta variant and considerably higher in Omicron, leading to massive changes in Omicron over time. The rapid mutations acquired by the virus contributed significantly to the 3rd COVID-19 wave of the pandemic.
The sequence data showed Delta variant dominance during December 2021, with AY sub-variants. Detection of the Alpha variant (B.1.1.7) was relatively low, reflecting that it was more prevalent during the 1st and 2nd waves across the world [22]. Although different variants circulated during the study period, the highest number of sub-variants was associated with Omicron. Omicron emerged sharply in January 2022 and continued to predominate over time with a significant number of sub-variants, namely BA.1, BA.5, BA.2, BE.1, XBB.1, XBB.3, BF.15, BF.28 and CH.1.1, during the rest of the study period, in parallel with other countries [23]. Sri Lanka also has a series of recent XBB and BF mutants, although BF.7 and XBB.1.5, which are more virulent and transmissible [23,24], were not identified in the current data. The emergence of variants and sub-variants reflects the high mutation rate of SARS-CoV-2, an RNA virus. Furthermore, bioinformatics analysis of the sequencing data highlights the interval over which BF and XBB emerged. These significant mutations are associated with the gradual disappearance of the BA.5 sub-variant, which is comparable to the global scenario [23,24,25].
As mentioned above, 201 representative samples were selected with Ct values spanning 10 to 30, and six others were Ag-positive samples tested during the post-COVID-19 period, owing to the limited availability of rRT-PCR-positive samples. The analysis showed that lower Ct values were more productive for genome analysis, with 100% or near-100% variant detection at Ct values below 25 among the study samples. The findings also support sequencing Ag-positive samples in preference to samples with Ct values over 25. This is justifiable, since antigen positivity appears in the early stage of infection, when the viral load is relatively high. It further indicates that the ability to analyze the SARS-CoV-2 genome with nanopore technology correlates with a higher viral load. However, nasopharyngeal swabs predominated in this study, and the low number of other sample types limited the possibility of identifying the best sample type for sequencing.
The data indicated that Omicron infection occurred irrespective of age, and this observation held across the provinces that sent samples to this study. Delta variant activity was largely consistent with that of Omicron in relation to age. However, mixed circulation of Alpha, Delta and Omicron was evident in the Western province during the period, possibly owing to the high population density of the province. Although Omicron infection was common despite positive vaccination status, the current study was unable to compare Omicron activity in the unvaccinated population. Regardless of the circulating variant, individuals with a symptomatic clinical profile predominated in the study population, which is comparable to some other countries [4]. Even with the percentage difference between Delta and Omicron, the available data are inadequate to establish an association between variants and deaths during the pandemic. Although the percentage of Omicron activity was greater across many variables than that of the other variants, this study only partially demonstrates the impact of the other variants in the country because of the low number of samples included from the 1st and 2nd COVID-19 waves.
CONCLUSION
In conclusion, this document summarized the advantages of nanopore technology and its establishment, which enabled a whole genome sequencing facility in a low-resource country to contribute to SARS-CoV-2 surveillance locally, further empowered by global surveillance. The present study described the emergence, prevalence, and distribution of variants locally, which were related to different COVID-19 waves in Sri Lanka. Ongoing sequencing efforts are critical to monitor subtypes and assist in identifying emerging variants of concern, which will contribute to the global effort toward elucidating the molecular epidemic profile of the virus.
ACKNOWLEDGMENT
We thank Dr. Palitha Abeykoon, former WHO Special Envoy, and Dr. Anil Jasinghe, former DGHS, for their assistance and guidance towards the establishment of gene sequencing at the Reference Laboratory in Sri Lanka, and the staff of the Department of Virology, Medical Research Institute, Colombo 8.
REFERENCES
6. Whole genome sequencing. Wikipedia. https://en.wikipedia.org
8. Nanopore Sequencing. https://www.sciencedirect.com
11. QIAamp Viral RNA mini kit 250, QIAGEN group. https://www.qiagen.com/virus.
23. SARS-CoV-2 Viral Mutations: Impact on COVID-19 Tests. FDA. 2023. https://www.fda.gov/
24. TAG-VE statement on Omicron sub lineages BQ.1 and XBB. WHO Statement, 2022. https://www.who.int/
25. Smith-Schoenwalder C. CDC: Omicron Subvariants BA.4.6, BF.7 increasing while BA.5 declines. 2022. https://www.usnews.com