Genomic analyses of Neisseria gonorrhoeae reveal an association of the gonococcal genetic island with antimicrobial resistance

Summary Objectives Antimicrobial resistance (AMR) threatens our ability to treat the sexually transmitted bacterial infection gonorrhoea. The increasing availability of whole genome sequence (WGS) data from Neisseria gonorrhoeae isolates, however, provides us with an opportunity in which WGS can be mined for AMR determinants. Methods Chromosomal and plasmid genes implicated in AMR were catalogued on the PubMLST Neisseria database (http://pubmlst.org/neisseria). AMR genotypes were identified in WGS from 289 gonococci for which MICs against several antimicrobial compounds had been determined. Whole genome comparisons were undertaken using whole genome MLST (wgMLST). Results Clusters of isolates with distinct AMR genotypes were apparent following wgMLST analysis consistent with the occurrence of genome wide genetic variation. This included the presence of the gonococcal genetic island (GGI), a type 4 secretion system shown to increase recombination and for which possession was significantly associated with AMR to multiple antimicrobials. Conclusions Evolution of the gonococcal genome occurs in response to antimicrobial selective pressure resulting in the formation of distinct N. gonorrhoeae populations evidenced by the wgMLST clusters seen here. Genomic islands offer selective advantages to host bacteria and possession of the GGI may, not only facilitate the spread of AMR in gonococcal populations, but may also confer fitness advantages.

KEYWORDS Whole-genome sequencing; Antimicrobial resistance; Type IV secretion system; Gene-by-gene annotation Summary Objectives: Antimicrobial resistance (AMR) threatens our ability to treat the sexually transmitted bacterial infection gonorrhoea. The increasing availability of whole genome sequence (WGS) data from Neisseria gonorrhoeae isolates, however, provides us with an opportunity in which WGS can be mined for AMR determinants. Methods: Chromosomal and plasmid genes implicated in AMR were catalogued on the PubMLST Neisseria database (http://pubmlst.org/neisseria). AMR genotypes were identified in WGS from 289 gonococci for which MICs against several antimicrobial compounds had been determined. Whole genome comparisons were undertaken using whole genome MLST (wgMLST). Results: Clusters of isolates with distinct AMR genotypes were apparent following wgMLST analysis consistent with the occurrence of genome wide genetic variation. This included the presence of the gonococcal genetic island (GGI), a type 4 secretion system shown to increase recombination and for which possession was significantly associated with AMR to multiple antimicrobials. Conclusions: Evolution of the gonococcal genome occurs in response to antimicrobial selective pressure resulting in the formation of distinct N. gonorrhoeae populations evidenced by the wgMLST clusters seen here. Genomic islands offer selective advantages to host bacteria and Introduction Neisseria gonorrhoeae, the aetiological agent of the sexually transmitted disease gonorrhoea, annually causes an estimated 108 million cases globally. 1 Untreated gonorrhoea can result in severe sequelae including pelvic inflammatory disease, infertility, neonatal conjunctivitis as well as disseminated gonococcal infections. Gonorrhoea may also lead to increased HIV transmission. 2 While effective treatment of gonorrhoea is a priority for public health globally, treatment options have diminished as N. gonorrhoeae strains have developed resistance to multiple classes of antibiotics. 3 Gonococci become resistant to antibiotics through spontaneous mutation and/or horizontal genetic transfer (HGT) with resistance conferred through all known mechanisms including antimicrobial inactivation, antimicrobial target alteration as well as increased export and decreased uptake of antimicrobial compounds. 2 For example, resistance to fluoroquinolones, which inhibit the action of topoisomerase enzymes involved in DNA replication, occurs through amino acid alterations in the chromosomal DNA gyrase gene, gyrA and/or the DNA topoisomerase gene, parC. 4 The penicillin binding proteins 1 and 2 (PBP1 and PBP2) encoded by ponA and penA respectively are essential in the final stages of peptidoglycan synthesis involved in cell wall assembly. Beta-lactams such as penicillin and cephalosporin target PBP1 and PBP2 inhibiting cell wall synthesis; however, non-synonymous mutations combined with recombination alter the antibiotic target thereby limiting beta-lactam activity. 5 Resistance to spectinomycin and azithromycin, which both interfere with protein synthesis, occurs through point mutations in the nucleotide sequences encoding either 16S rRNA or 23S rRNA respectively. 6,7 Increased export of antimicrobial compounds may occur through alterations of the mtrR efflux pump repressor gene and/or its associated promoter resulting in overexpression of the MtrCDE efflux pump, 8,9 while decreased antimicrobial uptake occurs through alteration of the major outer membrane protein PorB encoded by porB1b (also known as penB). 10 Finally, antimicrobial inactivation may result from plasmid-mediated beta-lactamases and/or tetM genes which facilitate penicillin and/or tetracycline resistance. 11,12 Advances in sequencing and bioinformatics technology provide rapid and automated analysis of whole genome sequence data (WGS) and understanding antimicrobial resistance (AMR) using WGS is likely to become essential in combatting AMR. For example, associations between resistance to the third generation cephalosporin, cefixime and possession of penA mosaic alleles have been identified in several WGS studies undertaken in N. gonorrhoeae. 6,13,14 The PubMLST.org/neisseria website archives and annotates, at the time of writing, >7000 WGS data from multiple Neisseria species including N. gonorrhoeae. 15 WGS data deposited in the database are annotated, gene-by-gene, enabling rapid extraction of strain information and enhancing surveillance. 16 Pivotal to surveillance is the capacity for AMR detection to be comparable across datasets and requires AMR determinants to be annotated in a readily accessible and reproducible format available to the entire community. In this study, a catalogue of all known genes implicated in AMR is provided with genomic comparison of WGS data from a representative N. gonorrhoeae dataset identifying distinct gonococcal populations clustering by AMR genotype indicative of the presence of additional genomic elements associated with AMR. This included a type 4 secretion system (T4SS) also known as the gonococcal genetic island (GGI) which is known to enhance HGT through the secretion of single stranded DNA. 17 Data presented here reveal that the T4SS was significantly associated with gonococci exhibiting reduced susceptibility to multiple antimicrobial compounds. The presence of the T4SS may therefore not only offer selective advantages to host bacteria but may also facilitate the spread of AMR in gonococcal populations.

Materials and methods
Isolate collections and WGS analyses WGS data from published isolate collections included: i) 236 isolates collected from sentinel public STD clinics by the US Centers for Disease Control and Prevention Gonococcal Isolate Surveillance Project; and ii) 53 isolates of diverse origin dating from the 1980s to 2011. Isolates had been analysed for antimicrobial minimum inhibitory concentrations (MICs) to several antibiotics. 13,14 Short reads were obtained from the European Nucleotide Archive (ENA) and assembled de novo using VELVET in combination with VEL-VETOPTIMISER as previously described. 18 The resulting contigs were uploaded to the Bacterial Isolate Genome Sequence (BIGSdb) genomics platform hosted on www. pubmlst.org/neisseria. 15 WGS data were compared using the genome comparator tool, implemented within the PubMLST.org/neisseria website which runs the BIGSdb genomics platform. 15,18,19 Using this tool, loci defined in the database or an annotated reference genome can be compared among genomes. Using a reference genome, the coding sequences within the reference annotation are extracted and compared against assembled WGS contigs. Unique allele sequences at each locus are designated with an integer starting at 1 (representing identity with the reference sequence) eventually leading to a genome-wide multi locus profile (wgMLST) from which a distance matrix can be generated and resolved into networks using the NeighborNet algorithm implemented in Splitstree. 20 In this study, the reference genomes from N. gonorrhoeae isolates FA1090 (accession number NC_002946) and MS11 (accession number NC_022240) were employed.
The GGI characterised in N. gonorrhoeae isolate MS11 (Accession number AY803022) was used as a reference. It is composed of 62 open reading frames and, sequences from each of these were defined in the database (Supplementary Table 1). 17 WGS were then annotated for the presence or absence of this element.

Annotation of AMR loci
Loci defined in pubmlst.org/neisseria are allocated a valuefree nomenclature using the prefix NEIS followed by 4 digits. 18 AMR loci were designated accordingly and were linked with any number of aliases including locus tags from finished genomes or gene names (Table 1). For example, penicillin binding protein 2 was defined as NEIS1753 and was associated with the locus tag NGO1542 (from the reference N. gonorrhoeae isolate FA1090) and the gene name, penA. As alterations in promoter regions located upstream of specific loci have been found to increase antibiotic resistance, 9 specific loci were assigned the pro suffix (for promoter) followed by the corresponding locus prefix for the adjacent gene to differentiate them from coding sequences, e.g. pro NEIS1635 ( Table 1).
The plasmid containing the tetM gene, conferring resistance to tetracycline from N. gonorrhoeae 5289 (GU479466), 12 was used to define loci NEIS2202-NEIS2249, with NEIS2210 designating the tetM gene. Sequences from the beta-lactamase plasmid conferring resistance to betalactams were retrieved from plasmid pSJ5.2 containing bla-TEM1 , and defined as NEIS2357eNEIS2360 (DQ355980) with NEIS2357 designating the bla TEM gene (Table 1). 21 The BIGSdb software includes 'autotagger' and "autodefiner" tools which scan deposited WGS against defined loci identifying alleles greater than or equal to 98% sequence identity. This process runs in the background and automatically updates isolate records with specific allele numbers, marking regions on assembled contiguous sequences (contigs) for any of the defined loci. Loci with sequence identity <98% were manually checked and curated. Using the molecular evolutionary analysis software MEGA v6, deduced amino acid sequences were aligned identifying polymorphic sites associated with antimicrobial resistance and enabling alleles containing these mutations to be detected ( Table 2). 22 Four copies of 16S rRNA and 23S rRNA are present in N. gonorrhoeae genomes. Reference sequences containing 16S and 23S rRNA along with flanking loci were    created against which short reads from isolates were mapped using the BurrowseWheeler Alignment (BWA) software package and subsequently viewed using Tablet. 23,24 Mapped reads were then visually inspected and nucleotide substitutions verified.

Antimicrobial resistance phenotype
MIC cut-offs, guided by the US GISP antimicrobial susceptibility criteria, were defined for each antimicrobial compound (Supplementary Table 2). Phenotypic testing of AMR is the current preferred method for determining antimicrobial susceptibility and is the "gold standard" with any new approaches, such as genotypic AMR, requiring validation against this using sensitivity, specificity and predictive values. These were calculated as described previously. 25

N. gonorrhoeae wgMLST
Whole genome analysis identified a star burst phylogeny with isolates forming discrete clusters associated with distinct AMR genotypes and the presence/absence of the T4SS, known as the gonococcal genetic island (GGI) (Fig. 1,  Table 2) which is not associated with reduced susceptibility to cephalosporins. 13 Allelic profiles for the other AMR genes were, however, the same as the other ST-1901 isolates in this cluster. Another group of isolates, previously identified as cluster 2 by Grad et al. but indicated here as cluster 8, were ST-1580, contained NEIS1753 allele 266 as well as the GGI but were susceptible to ciprofloxacin. 13 These isolates possessed the smaller transferrin binding protein B gene (isotype I) implicated in iron acquisition and predominantly associated with Neisseria meningitidis isolates from clonal complex ST-11 as well as commensal Neisseria. 13,26 Cluster 7 contained isolates from ST-9363 and lacked the GGI but were  (Fig. 1). Clusters 5 and 6 included ST-1588 and ST-1893 isolates on longer branches indicative of diversity. FA1090 and MS11 were part of a large diverse group of isolates, some of which dating from the 1980s. 14,27

AMR analysis
A catalogue of all AMR determinants in gonococci is described (Table 1). Alleles containing mutations associated with resistance were identified and linked with annotations describing principal mutations ( Table 2). Mutations in ponA (NEIS0414) associated with resistance to betalactam compounds were identified in 203/289 (70%) isolates with allele 13 the most predominant (187/203, 92%) ( Table 2, Supplementary Table 4). Penicillin binding protein 2, penA (NEIS1753) alleles 266 and 281 contained penA mosaic motif XXXIV, which is associated with reduced susceptibility to third generation cephalosporins; however, penA allele 281 contained an additional non-synonymous mutation (D101 / E) found in one isolate only, GCGS126. This isolate had a cefixime MIC >0.125 mg/ml but did not possess mutations conferring resistance in any other AMRassociated loci (Supplementary Table 4). penA (NEIS1753) allele 266 was found in 122/289 (42%) isolates; however, 26/122 (21%) did not have mutations associated with resistance in other AMR loci. Although these isolates exhibited reduced susceptibility to cefixime, they did not have resistant phenotypic MIC values to any of the other antimicrobials (Supplementary Table 4).
The adenine deletion in the 13bp promoter region associated with increased expression of the MtrCDE efflux pump was found in 178/289 (62%) isolates ( pro NEIS1635 allele 3) 9 and was associated with mutations in many of the other AMR loci including penA (NEIS1753), ponA (NEIS0414) and porB (NEIS2020). Four isolates were found with an A / C substitution in the promoter region, pro-NEIS1635 allele 4, with these isolates also containing premature stop codons in mtrR gene consistent with putative non-functional MtrR proteins. These were associated with resistant MIC to azithromycin 8 ( Table 2). Mutations associated with high AMR MIC values were not detected in the efflux pumps MacAB and FarAB. 32,33 None of the isolates were found with nucleotide substitution G / T in the À10 promoter region (5 0 -TAGAAT-3 0 ) upstream of macA ( pro-NEIS0488) and no significant mutations were found in the transcriptional regulator, NEIS0374 (farR). 34 Overexpression of NEIS0763 (norM ) may occur when a T / C nucleotide occurs in the À35 box in the promoter region ( pro NEIS0763) (TTGACG to CTGACG) 35 and all of the isolates contained this substitution.

Phenotype vs genotype correlation
High congruence was observed between phenotypic AMR and the predicted genotypic AMR (Table 3). Discrepancies occurred when comparing beta-lactam resistance profiles with, for example, nine isolates containing MIC values 1 mg/ml to penicillin but which had AMR amino acid mutations associated with resistance in loci NEIS1753, NEIS0414, NEIS2020, NEIS0408 and pro NEIS1635 for which other isolates with the same mutations had MIC values ranging from 2 to 8 mg/ml to penicillin. In addition, four isolates with an AMR genotype had reduced susceptibility to cefixime and penicillin but were susceptible to ceftriaxone and cefpodoxime. Two isolates had genotypic profiles consistent with reduced susceptibility to cefixime but did not have a corresponding resistant phenotype. Three isolates contained a beta-lactamase plasmid but had MICs 1 to penicillin.
PPV scores were over 95% for each antimicrobial compound consistent with genotypic AMR performing as well as phenotypic AMR in detecting antimicrobial resistance (Table 4). NPV scores indicated whether isolates with a susceptible phenotype also had a susceptible genotype and NPV scores were low for penicillin and tetracycline but high for cefixime, ciprofloxacin and azithromycin (Table 4). Sensitivity and specificity scores were high for all compounds.

Discussion
Direct deduction of resistance from WGS data provides an important opportunity for the enhanced surveillance of AMR for public health benefit. Gonococcal AMR is, however, a complex phenotype resulting from single to multiple genetic changes often occurring in synergy and resulting in increasing levels of antimicrobial resistance to several compounds with the added uncertainty that additional unknown genetic elements may also be playing a role. 3,5 The complexity of gonococcal AMR is further exacerbated by the presence of multiple gene names and lack of webaccessible repositories with which sequence data can be queried. In this study, all of the known genes implicated in AMR were catalogued defining AMR determinants in a readily accessible, reproducible format found on the www.pubmlst.org/neisseria website, which hosts WGS data from multiple Neisseria species (Tables 1 and 2). 19 Gene-by-gene annotation of AMR loci, combined with wgMLST analysis, identified clusters of isolates with distinct AMR genotypes. Some of these also possessed the GGI, a T4SS known to facilitate HGT through the secretion of single stranded DNA (ssDNA) into the extracellular environment (Fig. 1, Supplementary Table 1). 17 T4SSs are mobile genetic elements and play a major role in HGT allowing bacteria to outcompete other bacterial species through the acquisition of a variety of fitness genes including catabolic, virulence and antibiotic resistance. For example, antimicrobial resistance in Haemophilus influenzae has been shown to be associated with the acquisition of integrative conjugative elements known as ICEs, a type of T4SS. 36 It is also known that mobile genetic elements such as plasmids, phages and genomic islands play an important role in the emergence of pathogenic Enterobacteriaceae. 37 A number of hypothetical genes remain to be characterised in the GGI which may offer additional selective advantages to host gonococci (Supplementary Table 1). The association, however, of the T4SS in this study with N. gonorrhoeae isolates exhibiting reduced susceptibility to multiple antimicrobial compounds is consistent with the likelihood that this element will accelerate the spread of AMR.
Expansion of distinct gonococcal populations may also be promoted through the activity of toxineantitoxin (TA) subunits encoded by the genes, ydhB (NEIS2281) and ydcA (NEIS2282), located in the GGI (Supplementary Table 1). TA are common features of mobile genetic elements and the negative effects of cell growth conferred by the toxin are suppressed by an antitoxin. Cells lacking the mobile genetic element and, therefore the TA, are harmed by the toxin producing cells, which are themselves immune due to possession of the antitoxin. 38,39 Thus, the presence of the GGI may have been a significant factor in the expansion in the Western Hemisphere of gonococci belonging to ST-1901 (NG-MAST ST-1407). In addition, isolates possessing both the GGI and plasmid mediated AMR were not prevalent in this dataset. Many of GGI encoded genes show similarity to those from the Escherichia coli F-plasmid conjugation system and, the order of the genes in the GGI is highly similar to the IncF family of conjugative plasmids, consistent with the GGI being an ancestral chromosomally inserted plasmid. 17,40  In particular, the DNA methylases, ydg (NEIS2288) and ydhA (NEIS2289) can be found which may enable within host competition between plasmids consistent with the low prevalence of isolates here possessing both a plasmid and the GGI 41 (Supplementary Tables 1 and 4). The increasing number of bacteria becoming resistant to multiple antimicrobials is a major global concern with health officials warning of the possibility of untreatable bacterial infections. 42 The tools developed in this study present a means through which AMR can be deduced from WGS while also permitting AMR genotypes to be compared between isolates and linked with additional genomic data. Furthermore, the availability of a web-accessible database enables globally distinct isolate collections, where selection pressures will be different, to be compared, thereby enriching surveillance. Concordance was high between phenotypic and genotypic AMR with most of the discrepancies observed for the beta-lactam compounds and tetracycline, for which multiple genetic components are implicated in conferring resistance (Tables 3 and 4). In most cases, AMR genotypes were identified which did not correlate with AMR phenotypes (i.e. isolates had susceptible phenotypes despite the presence of resistant genotypes). These correlated with some of the lower NPV scores obtained for penicillin and tetracycline ( Table 4). The high PPV, specificity and sensitivity values are encouraging, however, and indicate that molecular AMR diagnosis may be useful in surveillance particularly in settings where diagnosis relies on nucleic amplification tests (NAATS) and cultures are not available (Table 4). 43 N. gonorrhoeae has developed resistance to all antimicrobials recommended in the first-line empirical treatment of gonorrhoeae and in order to understand and limit the onset of an era of untreatable gonorrhoea, it is essential that factors underpinning the acquisition of antimicrobial resistance are understood and monitored. The data and tools presented here provide a model in which this can be accomplished using an easily accessible database with the likelihood that such interfaces will become particularly important as more WGS data become available.

Funding
This study was jointly funded by a Wellcome Institutional Strategic Support Fund (WTISSF) and the Oxford Martin School, University of Oxford (H2RXJo00). MCJM was supported by the Wellcome Trust (087622). YHG was supported by the National Institutes of Health (K08-AI104767-01). DLT supported by the CDC and CDC's Office of Advanced Molecular Detection (AMD-18).