Metagenomic next-generation sequencing aids the diagnosis of viral infections in febrile returning travellers

a MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, 464 Bearsden Road, Glasgow G61 1QH, UK b Department of Infectious Diseases, Queen Elizabeth University Hospital, 1345 Govan Rd, Govan, Glasgow G51 4TF, UK c West of Scotland Specialist Virology Centre, Level 5, New Lister Building, Glasgow Royal Infirmary, 10-16 Alexandra Parade, Glasgow G31 2ER, UK d Queen Elizabeth University Hospital, 1345 Govan Rd, Govan, Glasgow G51 4TF, UK


Introduction
The ease of international travel has increased the potential for transmission of a wide range of viruses, including those of epidemic potential. Identifying these pathogens is crucial for treatment and for the prevention of further transmission. Traditional diagnostic tests require a priori knowledge of potential pathogenic agents, and it is increasingly challenging for infectious disease physicians to select relevant diagnostic tests for pathogens imported from a wide range of locations across the globe. 1 Such tests are often batched and sent to a reference laboratory, which frequently delays diagnosis. Even specialized tests may fail to identify a rapidly evolving pathogen because of variation at PCR primer binding sites. Furthermore, new and emerging pathogens cannot be identified using standard assays. In contrast, new or emerging infections and those not considered by a treating physician may readily be identified with metagenomic next-generation sequencing (MNGS). 2-6 Whole viral genome sequences generated using MinION or Illumina-based technology can additionally be used for resistance screening and fine-scale phyloepidemiological analyses to inform public health control strategies. 7,8 In this study, we aimed to evaluate the potential for MNGS as an all-in-one diagnostic tool in febrile patients with a recent history of overseas travel. The value of MNGS as a tool to identify pathogens not identified by traditional methods has been described in case reports of patients infected with a number of pathogens. One notable example involved the use of MNGS to diagnose a 14-year-old boy with neuroleptospirosis, resulting in a change of management to incorporate appropriate antibacterial therapy. 9 It has also been used in outbreaks of viral haemorrhagic fever (VHF) where standard diagnostic assays did not reveal a diagnosis. 5,10
A novel strain of Ebola virus, Bundibugyo virus, not readily identified using standard assays, was identified following metagenomic 454 pyrosequencing of patient samples in 2008 during an outbreak of VHF in Western Uganda. 5 Pyrosequencing (454) and subsequently Illumina sequencing were used in another outbreak of VHF in the Democratic Republic of the Congo and revealed a novel rhabdovirus, Bas Congo virus, as the aetiological agent. 10 Causes of VHF had previously included members of the Filovirus and Arenavirus families but not the Rhabdovirus family, and therefore pathogen-specific diagnostic assays would have been unhelpful. Without the use of MNGS, it is unlikely that this pathogen would have been identified.
A number of NGS platforms, based on different technologies, have been used to diagnose viral pathogens. 11 These are divided into short-read and long-read methods. At present, short-read methods are higher-throughput and cheaper than long-read methods. Short-read methods divide into sequencing by ligation and sequencing by synthesis. 454 pyrosequencing (a sequencing-by-synthesis method in which the pyrophosphate released on nucleotide incorporation drives a light-emitting reaction) was the first available short-read NGS platform but is limited by a high error rate (largely due to polynucleotide repeats). Illumina sequencing is also a short-read sequencing-by-synthesis method and is based on the high-throughput detection of fluorescent markers as they are incorporated into an expanding chain. It is cheaper and more accurate, allowing accurate sequencing of genomes from new pathogens and fine-scale phyloepidemiology. It is limited by sequence read lengths of around 500 base pairs, requiring advanced bioinformatic software to reconstruct viral genomes. Longer sequence reads may be acquired on other platforms such as the PacBio, the Oxford Nanopore and the Ion Torrent platforms, the last of which is based on semiconductor technology (detecting a change in ionic charge when a base is added to the expanding chain). These technologies are limited, however, by decreased accuracy when compared with Illumina-based methods. A particularly exciting advance has been the development of the MinION platform, which is portable, the size of a mobile telephone and can be used in the field without an external internet or power source. 12 The MinION platform has been used recently to track the Zika virus outbreak in South America and Ebola virus disease in the Democratic Republic of Congo. 12,13
It is limited, however, in accuracy and by the need for high concentrations of input DNA, necessitating prior amplification of genomic material by PCR, but has potential for further development as a diagnostic tool if high error rates can be addressed.

Patient consent and sample collection
Approval for the retrospective use of samples from patients admitted with a febrile illness following overseas travel between 2013 and 2016 was granted by the NHSGGC Bio-repository and Pathology Tissue Resource committee (16/WS/0207NHS). Samples were anonymized, and patients who had not had an HIV test during admission were excluded (to avoid the possibility of diagnosing a new HIV infection using MNGS). Inclusion criteria were a fever greater than 38 °C, age ≥18 years, a preceding negative HIV test and overseas travel within 12 weeks of presentation to hospital. Forty-four samples from 40 patients were available for MNGS analysis. Diagnoses were made by two blinded investigators (an infectious diseases consultant physician and a senior laboratory scientist with expertise in NGS data analysis) prior to comparison with clinical data.

Extraction of DNA and RNA
RNA and DNA were extracted from 200 μl of serum or plasma samples using the automated NucliSens EasyMAG platform at the West of Scotland Specialist Virology Centre, Glasgow Royal Infirmary. Nucleic acid was eluted in 110 μl nuclease-free water and stored at −80 °C until further processing.

Double stranded cDNA synthesis
RNA was reverse transcribed into cDNA using Superscript III reverse transcriptase (Invitrogen) according to manufacturer's recommendations. The total reaction volume from single-stranded cDNA synthesis (20 μl) was used to create double stranded cDNA with a New England Biolabs Second Strand cDNA synthesis kit according to manufacturer's recommendations in a final volume of 80 μl.

Library preparation and MiSeq sequencing
For library preparation, either the Kapa Library Preparation Kit (KK8232, Kapa Biosystems) or the Nextera XT Kit (Illumina FC-131-1024) was used in combination with the NEBNext® Multiplex Oligos for Illumina® (Index Primers Set 1 and 2, E7335 and E7500) or the Nextera® XT Index Kit (FC-131-2001). To estimate the molarity of DNA libraries after indexing PCR, DNA concentration was measured using the High Sensitivity double-stranded DNA Qubit Assay (Invitrogen), and the quality of the DNA was assessed by automated gel electrophoresis on a 2200 TapeStation Instrument (Agilent Technologies). For DNA purification between the library preparation steps, Agencourt® AMPure® XP magnetic beads (Beckman Coulter) and Solid Phase Reversible Immobilization solution (SPRI, 20% PEG, 2.5 M NaCl) were used, respectively. Up to 20 DNA libraries were pooled together for 150 or 250 bp paired-end sequencing using v2 or v3 cartridges (Illumina) on the MiSeq or NextSeq platform according to manufacturer's recommendations.
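The molarity estimate mentioned above is commonly obtained from the measured concentration and the mean fragment length. The following is a minimal sketch of that conversion, not the authors' own calculation. It assumes an average mass of roughly 660 g/mol per double-stranded base pair; the function name and the example values are hypothetical.

```python
# Approximate average mass of one double-stranded DNA base pair (assumption).
AVG_BP_MASS_G_PER_MOL = 660.0


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA concentration (ng/µl) and mean fragment length (bp) to nM."""
    if conc_ng_per_ul < 0:
        raise ValueError("concentration must be non-negative")
    if mean_fragment_bp <= 0:
        raise ValueError("mean fragment length must be positive")
    # ng/µl equals 1e-3 g/l; dividing by g/mol gives mol/l; multiplying by 1e9 gives nM.
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


if __name__ == "__main__":
    # Hypothetical library: 2 ng/µl (fluorometric reading), 400 bp mean fragment size.
    print(f"{library_molarity_nm(2.0, 400):.2f} nM")
```

Under these assumptions, libraries can be diluted to a common molarity before pooling so that each contributes a similar share of reads.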

Bioinformatic analysis
Sequences of low quality and complexity and Illumina adaptor sequences were removed from raw fastq files using Trim_galore (Babraham Bioinformatics) and Prinseq. 14 Raw reads were used to carry out a blastx search using DIAMOND (version 0.8.20) software. Subsequently, de novo assembly and mapping approaches were applied using SPAdes-3.8.0, 15 dipSPAdes 16 and Tanoti (version 1.3, https://github.com/vbsreenu/Tanoti). Contigs were identified using BLASTn and BLASTx (blastall version 2.2.26) against NCBI nr and nt databases and DIAMOND databases. 17 The output was visualized using Krona plots and BLAST tables. 18 Additionally, a rapid hash-based exploration tool was created and evaluated (Anwesh). This software applies BLAT to screen MNGS datasets with customized curated databases of nucleotide sequences. Two databases were created for this purpose, derived from NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/Viruses/) and the 'Approved List of biological agents' published by the UK Health and Safety Executive (HSE). When a match to a sequence was detected during screening with Anwesh or by de novo assembly, this was used for mapping using Tanoti. As MNGS is subject to low-level cross-contamination, we used a minimum coverage threshold of 10% of the viral genome for diagnosis and repeated testing from any run where a sample was positive with a higher number of mapped reads (Supplementary Fig. 1). This threshold was chosen as it approximates the size of PCR product used in routine diagnostic assays.
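The 10% genome coverage threshold can be illustrated with a short calculation over per-base read depth. This is a minimal sketch, not the authors' pipeline. It assumes that per-position depths for the reference genome are already available as a list of integers, and that a position counts as covered when at least one read maps to it. The function names and the `min_depth` parameter are hypothetical.

```python
from typing import Sequence

# Minimum fraction of the viral genome that must be covered (from the text).
MIN_GENOME_FRACTION = 0.10


def covered_fraction(depth: Sequence[int], min_depth: int = 1) -> float:
    """Return the fraction of reference positions with depth >= min_depth."""
    if len(depth) == 0:
        raise ValueError("depth must contain one value per reference position")
    covered = sum(1 for d in depth if d >= min_depth)
    return covered / len(depth)


def passes_threshold(
    depth: Sequence[int],
    min_fraction: float = MIN_GENOME_FRACTION,
    min_depth: int = 1,  # assumption: a single mapped read marks a position as covered
) -> bool:
    """Return True when breadth of coverage meets the reporting threshold."""
    return covered_fraction(depth, min_depth) >= min_fraction


if __name__ == "__main__":
    # Hypothetical 1,000-base genome with reads covering only the last 100 bases.
    example_depth = [0] * 900 + [5] * 100
    print(covered_fraction(example_depth), passes_threshold(example_depth))
```

Breadth of coverage, rather than the number of mapped reads, is the quantity checked here: a handful of contaminating reads that pile up on one short region leaves most of the genome uncovered and so falls below the threshold.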

Confirmatory testing
Confirmatory tests were carried out by PCR or serology in the West of Scotland Specialist Virology Centre (WoSSVC), Glasgow Royal Infirmary or sent to the Rare and Imported Pathogens Laboratory (RIPL, Public Health England Porton Down).

Results
Plasma samples from 40 febrile returning travellers and 1 afebrile control patient were analyzed using MNGS (Fig. 1). Relevant travel histories and clinical and standard-of-care laboratory diagnoses are listed in Supplementary Table 1. MNGS analysis resulted in the detection of 8 viral genomes: Dengue virus 1 (DENV1; n = 2; P17 and P35), Chikungunya virus (CHIKV; n = 2; P18 and P40), hepatitis E virus (HEV; n = 1; P11), hepatitis A virus (HAV; n = 1; P22), mumps virus (n = 1; P34) and Ebola virus (EBOV; n = 1; P41) (Fig. 1(a)). A range of depth and coverage of viral reads was identified (likely reflecting a variety of viral loads) (Fig. 1(b)). Three patients who were infected with respiratory viruses based on viral throat swab testing (P12, influenza A virus; P13, influenza B virus; and P30, rhinovirus) could not be fully evaluated by MNGS because respiratory samples were not available for analysis. Additionally, human pegivirus (HPgV) was detected in 3 patients (P4, P10 and P30). This is not currently considered to be a human pathogen. These were all genotype 2, and one strain was highly divergent, nearly meeting the criteria for a new genotype based on current ICTV recommendations (Supplementary Fig. 2). While the focus of this project was the detection of viral infections, we also detected Plasmodium falciparum and Plasmodium malariae genomic material in patients P2, P3, P21 and P29. Three patients (P24, P34 and P39) were found to have malaria on blood film sampling but not NGS (in each case the date of viral sampling for NGS occurred subsequent to blood film positive results). One of these patients (P34) had been diagnosed previously with P. falciparum malaria on blood film and had sampling for other pathogens because of ongoing fever and lymphadenopathy despite therapy; fragments of mumps virus genome were detected in this sample. Subsequent PCR testing by the diagnostic laboratory was negative.
Maximum likelihood phylogenetic analysis of DENV and CHIKV full genomes (Fig. 2) revealed that the CHIKV viral genome from P18 clustered within the Asian strain of CHIKV and was most closely associated with other CHIKV strains from Indonesia, in keeping with the travel history of the patient. Phylogenetic analysis of the DENV strains revealed that P35, a returning traveller from Thailand, had a strain clustering with genotype 1 strains from China and Thailand, while P17, a returning traveller from the Maldives, had a strain most closely related to strains from Sri Lanka (this is the first published DENV whole genome sequence from the Maldives). Whole genomes detected in the study have been submitted to GenBank (accession numbers to follow).

Discussion
MNGS has the potential to improve the diagnostic yield of viral, bacterial and parasitic infectious diseases. 2,19-28 Importantly, it does not require the requesting physician to consider all diagnostic possibilities when ordering the test; in our small proof-of-concept study, 3 additional viral diagnoses were made using MNGS (Chikungunya virus in two cases and mumps virus in one). Furthermore, MNGS has the potential to detect multiple pathogens in a single sample. In our study, one patient (P34) who had been admitted with Plasmodium falciparum infection but was noted to have prominent cervical lymphadenopathy and ongoing fever despite treatment was found to have fragments of mumps virus following MNGS. This was in keeping with ongoing symptoms of lymphadenopathy and fever despite negative blood films (P. falciparum was not detected in this case, but at the time of sampling, the blood film was noted to be negative, having been positive at an earlier time point). Full genome data generated by MNGS has other potential applications, for example, identification of virulence genes or resistance mutations, including those present as minority variants, and the identification of clusters of infection using phylogenetic approaches. Infections of consequence can also be identified and relevant treatment and public health precautions carried out in real time to aid the treatment of an affected patient. 29 In this study, one returning traveller presented with a relapse of Ebola virus disease associated with meningoencephalitis, 10 months after her original presentation. In this case, the full open reading frame of the virus was sequenced in real time both in plasma and cerebrospinal fluid in order to facilitate experimental treatment selection and was found to be identical in both sites on relapse. This was the first evidence of EBOV causing disease in the central nervous system and the only available sequence obtained from cerebrospinal fluid, a site not previously considered to harbor the virus.
There are a number of limitations to this study. Firstly, our method is designed to detect RNA viruses (the majority of viruses associated with travel are RNA viruses) and employs a DNAse step to reduce sequencing of human genomic material. It is likely to detect DNA viruses with an RNA stage in the life cycle but with reduced sensitivity. To improve the detection of DNA viruses, a different approach such as target enrichment (allowing separation of viral from host DNA) or removal of methylated DNA should be considered. Secondly, we did not have the opportunity to test respiratory, CSF and urine samples in the majority of cases; however, others have identified novel viral pathogens such as astroviruses using other body fluids and tissues. 20,25,30 Thirdly, we have not formally measured the sensitivity of MNGS for the detection of each pathogen identified in this study. We have previously evaluated the threshold for detection of the hepatitis C virus: a full HCV genome can be reliably reconstructed above a threshold of 10,000 IU/ml using MNGS and 1000 IU/ml using target enrichment approaches. 7 Finally, we did not design this study to include analysis of bacterial genomes. Further validation of this technique and the evaluation of detection sensitivity will be required as the method is rolled out.
There are still barriers to overcome before MNGS is implemented as a routine diagnostic assay. Firstly, MNGS is highly subject to cross-contamination and therefore stringent controls must be in place to manage the pipeline (separation of preparation rooms at each step of the sequencing pathway and strict bioinformatic thresholds). Secondly, cost must be considered. At the time of writing, the cost of MNGS is approximately US$100 per sample; this is not much higher than the cost of individual tests requested in febrile patients from centralized specialist centers but will require consideration of batching and may only be feasible in major centers with a high throughput of patients. Formal cost-effectiveness and validation studies are required.
At the current time, the Illumina platform has the highest accuracy, but this may not be essential when using the tool simply to make a diagnosis. The portable MinION platform has potential as a bedside test but requires further development of library preparation platforms that are also portable and a reduction in the high amount of starting material required at present. 31,32 Inaccuracy in sequence reads may not be of major importance if diagnosis is the main goal, particularly with viruses where genomes are extremely diverse, differing from one another by more than 30%.
There is potential for MNGS to provide a single test to diagnose pathogens that may be viral, parasitic or bacterial on the basis of specific syndromic manifestations, for example, returning travellers with fever. However, wet lab techniques and bioinformatic methods will need to be adapted to issues associated with different pathogens. Diagnosis of bacterial infections using MNGS may be best carried out following culture or 16S PCR. The potential for contamination from bacteria on the skin surface must be considered but is a lesser problem when identifying viral or parasitic pathogens. Malaria is an essential consideration in returning travellers but may require adaptations of sample processing; for example, the use of whole blood would be likely to increase the sensitivity of detection of blood-stage parasites but would require more sequencing because of dilution of samples with host DNA. 33,34 It is even possible that a single sample could yield information on host mutations rendering the patient more or less susceptible to certain agents (for example, CCR5 deficiency is likely to provide protection against HIV but susceptibility to tick-borne encephalitis virus, West Nile virus and influenza). 35,36 However, the development of such a test will require specialists from multiple disciplines to co-ordinate and work together to develop appropriate approaches to diagnosis. It is an aim well worth pursuing.

Declaration of competing interest
None.

Funding
This work was supported by the Wellcome Trust (personal fellowship ET -102789/Z/13/A) and the Medical Research Council (MC_UU_12014/1).

Supplementary materials
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.jinf.2019.08.003.