two new core chromosome-encoded in

Superantigens are ubiquitous within the Streptococcus pyogenes genome, which suggests that superantigen-mediated T-cell activation provides a significant selective advantage. S. pyogenes can carry a variable complement of the 11 known superantigens. We have identified two novel S. pyogenes superantigens, denoted speQ and speR, adjacent to each other in the core chromosome of isolates belonging to eleven different emm-types. Although distinct from other superantigens, speQ and speR were most closely related to speK and speJ, respectively. Recombinant SPEQ and SPER were mitogenic towards human peripheral blood mononuclear cells at ng/ml concentrations, and SPER was found to be more mitogenic than SPEQ.

found in other streptococcal species as well. Three superantigen genes, szeN, szeP and szeF, have been found only in S. equi subsp. zooepidemicus [3]. Commons et al. later suggested renaming these to speN, speP and speO, respectively, to standardize the nomenclature across all streptococci [1].

The superantigen genes speG, speJ and smeZ are encoded on the core chromosome but are not ubiquitous among S. pyogenes isolates. The other eight identified S. pyogenes superantigens are associated with prophages, which have the potential to be mobile, introducing variability among isolates. As there is variability in the complement of superantigens carried by S. pyogenes isolates, along with mobility and sharing across other streptococcal species, there may be streptococcal superantigens that are yet to be identified.

Identification of two new potential superantigen genes: speQ and speR
We sequenced the genomes of two viable emm60 isolates, originally collected in 1938 from two puerperal sepsis patients, and analyzed the genomes for the presence of superantigens. We could not detect any of the known streptococcal superantigens by short-read sequence mapping or BLAST analysis of de novo assembled genomes. The analysis did, however, indicate the presence of sequence in the genomes of both emm60 isolates with partial homology to speK. We identified this homologous sequence to be within one of two hypothetical genes located immediately downstream of the gene ideS (also known as mac), which encodes an immunoglobulin-cleaving protease (Figure 1A). BLAST indicated that these genes were closely related to other streptococcal superantigens and carried the typical superantigen C-terminal β-grasp domain [1]. We therefore predicted that these would be superantigen genes and denoted them speQ and speR. PCR and Sanger sequencing confirmed the WGS data. BLASTn and BLASTp of available complete S. pyogenes genomes also identified speQ and speR in an emm87 strain, NGAS743 (DI45_05770 and DI45_05775, respectively; GenBank CP007560.1) [16]. In isolates where full-length speQ and speR genes were absent, a C-terminal fragment of speR was present immediately downstream of ideS (Figure 1A). We also performed BLASTp analysis of the entire NCBI database, excluding S. pyogenes, but did not identify SPEQ or SPER in any other available genomes, including other streptococcal species.

Phylogenetic analysis of the amino acid sequences of SPEQ, SPER, and all other available superantigen alleles from all streptococcal species [1] demonstrated that, although phylogenetically distinct, SPEQ is closely related to the prophage-associated SPEK, sharing 84% amino acid identity, and SPER is most closely related to the chromosomal SPEJ, sharing 64% amino acid identity (Figure 1B).
Comparisons were made between SPEQ, SPEK, SPEJ and SPER to identify two superantigen signature amino acid motifs (Supplementary Figure 1) [1,3]. SPER, like SPEJ, had the motif Y-G-G-(LIV)-T-x4-N (Prosite PS00277), but only a partial match for this was identified in SPEQ and SPEK. All four superantigens had the motif K-x2-(LIVF)-x4-(LIVF)-D-x3-R-x2-L-x5-(LIV)-Y (Prosite PS00278) and a C-terminal zinc-binding domain (HxD).

To determine the presence of speQR in other S. pyogenes genotypes, publicly available WGS fastq data were obtained from the short read archive for UK isolates [5,6] and USA isolates [4], totaling 4,262 genomes covering 86 different genotypes (Supplementary Table 2). Complete speQ and speR were identified in the assembled genome sequence of isolates belonging to the emm-types emm9, 15, 18, 42, 53, 58, 60, 77, 87, 94 and 169 (Supplementary Table 3). However, not all isolates belonging to some of these genotypes carried the complete speQR locus, which was unexpected given the lack of association with mobile genetic elements. Only one out of 41 emm18 isolates (USA isolate 20154046) had complete speQR, as did 21/24 emm58 isolates and 49/72 emm77 isolates. The presence or absence of complete speQR in these genotypes appeared to be associated with divergent lineages and multi-locus sequence types (MLST) within these emm-types (Supplementary Figure 2, Supplementary Table 3), indicative of the same emm gene on completely different genetic backgrounds. In contrast, all emm94 isolates were MLST-89, but 2/50 did not carry the complete speQR allele and formed a separate sub-lineage. As these isolates were still relatively closely related, there may have been a horizontal gene transfer event of the speQR region.

The majority of isolate genomes that were positive for speQR were also positive for at least one other superantigen gene (Supplementary Table 3).
The exceptions to this were 4/5 emm60, 1/72 emm77, and 2/2 emm169 isolates, where no superantigen genes other than speQR were detected [4].

From the WGS analysis, thirteen DNA alleles for speQ and seven DNA alleles for speR were identified. The variation was limited to single nucleotide polymorphisms, except for a region in speQ which varied in the number of copies of a 15bp/5aa repeat (Supplementary Figure 3). This 15bp/5aa region repeated twice, four times and five times in three alleles, speQ.2, speQ.4 and speQ.5, respectively; these alleles were found only in genotype emm9 isolates. Based on amino acid sequence, SPEQ.1, SPEQ.6, SPEQ.8, SPEQ.11 and SPEQ.12 were identical. SPEQ.3, SPEQ.7, SPEQ.9 and SPEQ.13 each differ from SPEQ.1 by one amino acid residue, and SPEQ.10 differs by two amino acid residues. For SPER, SPER.1, SPER.2, SPER.4, and SPER.5 were identical by amino acid sequence, but SPER.3, SPER.6 and SPER.7 each differ by one amino acid.
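The two signature motifs quoted in this section (Prosite PS00277 and PS00278) can be screened for with ordinary regular expressions. The following is a minimal sketch, not the procedure used in this study: the helper names `motif_to_regex` and `find_motif` are hypothetical, the parser only handles the notation as it is written in the text (fixed residues, bracketed residue sets and `xN` gaps), and the test sequences are artificial toy strings rather than real superantigen sequences.

```python
import re

def motif_to_regex(pattern: str) -> str:
    """Translate a motif written as in the text, e.g. 'Y-G-G-(LIV)-T-x4-N',
    into a Python regular expression (hypothetical helper)."""
    parts = []
    for token in pattern.split("-"):
        if token.startswith("(") and token.endswith(")"):
            # Set of alternative residues, e.g. (LIV) -> [LIV]
            parts.append("[" + token[1:-1] + "]")
        elif token.startswith("x"):
            # Gap of n arbitrary residues, e.g. x4 -> .{4}; bare x means one residue
            n = token[1:] or "1"
            parts.append(".{" + n + "}")
        else:
            # Single fixed residue
            parts.append(token)
    return "".join(parts)

# The two motifs exactly as written in the text
MOTIF_PS00277 = motif_to_regex("Y-G-G-(LIV)-T-x4-N")
MOTIF_PS00278 = motif_to_regex("K-x2-(LIVF)-x4-(LIVF)-D-x3-R-x2-L-x5-(LIV)-Y")

def find_motif(regex: str, protein: str):
    """Return (start, matched_text) for the first full match, or None."""
    m = re.search(regex, protein)
    return None if m is None else (m.start(), m.group())

if __name__ == "__main__":
    toy = "MAAYGGVTKHEGNQQ"  # artificial sequence, for illustration only
    print(MOTIF_PS00277, find_motif(MOTIF_PS00277, toy))
```

A search of this kind only reports full matches; the partial matches described for SPEQ and SPEK would need an alignment-based or mismatch-tolerant comparison instead.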

Recombinant SPEQ and SPER induced proliferation of human mononuclear cells
To determine if SPEQ and SPER were capable of inducing proliferation of human T cells, we recombinantly expressed both proteins in E. coli (Figure 2A). These recombinant toxins represented gene alleles speQ.1 and speR.1. Purified toxins were then used to stimulate human mononuclear cells (MNCs) and proliferation was measured by BrdU uptake (Figure 2B). Both SPEQ and SPER induced proliferation, although a 10-fold greater concentration of SPEQ than SPER was required to generate an equivalent response. Proliferation after stimulation with another streptococcal superantigen, SPEC, required a 100-1000-fold lower concentration than SPEQ and SPER. As a control, the non-mitogenic IdeS was recombinantly expressed and purified in the same manner as SPEQ and SPER but failed to stimulate any proliferation, as expected.

Both speQ and speR are expressed by S. pyogenes during culture
To confirm speQ and speR expression by S. pyogenes, transcription and protein expression were measured. RNA was extracted at early, mid and late-logarithmic phases of growth of two emm60 strains and converted to cDNA for PCR. Primers that spanned across both speQ and speR confirmed the two genes are co-transcribed (Figure 2C). Although only semi-quantitative, transcription appeared greatest at early and mid-logarithmic phases of growth.

Culture supernatants from the same two strains of emm60 S. pyogenes were probed by Western blot for SPEQ and SPER using an antibody raised in mice against recombinant proteins. SPEQ could be detected at late-logarithmic phase and following overnight culture in both emm60 strains (Figure 2D). Using rSPEQ at known concentrations, the estimated concentration of SPEQ was ~90-127 ng/ml in late-logarithmic phase culture and increased to ~155-163 ng/ml by overnight culture. We were, however, unable to detect SPER using the rSPER murine antibody in either strain at any growth phase, which was unexpected given the co-transcription.
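Estimating a concentration from recombinant standards of known concentration is commonly done by interpolating the sample band intensity on a standard curve. The sketch below illustrates that general idea only; the text does not describe how the estimates above were calculated. The function name `interpolate_concentration` and every number in the example are hypothetical, and the curve is assumed to be piecewise linear between standards with distinct intensities.

```python
def interpolate_concentration(sample_intensity, standards):
    """Estimate a concentration (ng/ml) by linear interpolation between standards.

    standards: list of (concentration_ng_per_ml, band_intensity) pairs measured
    on the same blot; intensities are assumed to be distinct.
    """
    points = sorted(standards, key=lambda p: p[1])  # order by band intensity
    lowest, highest = points[0][1], points[-1][1]
    if not lowest <= sample_intensity <= highest:
        # Extrapolating beyond the standards is unreliable, so refuse to do it
        raise ValueError("band intensity lies outside the standard curve")
    for (c0, i0), (c1, i1) in zip(points, points[1:]):
        if i0 <= sample_intensity <= i1:
            fraction = (sample_intensity - i0) / (i1 - i0)
            return c0 + fraction * (c1 - c0)

# Hypothetical densitometry readings (arbitrary units), not data from the study
toy_standards = [(25.0, 1000.0), (50.0, 2000.0), (100.0, 4000.0), (200.0, 8000.0)]

if __name__ == "__main__":
    print(interpolate_concentration(3000.0, toy_standards))
```

Rejecting intensities outside the range of the standards is a deliberate choice here, since Western blot signal is rarely linear far beyond the calibrated range.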

Discussion
We identified two potential superantigen genes present in the chromosomes of two 1930s S. pyogenes emm60 isolates and subsequently identified the same genes in modern international isolates belonging to 10 other emm-types. We termed these genes speQ and speR and confirmed that they were capable of inducing proliferation of human cells.

We tested the genomes of over 4,000 different isolates representing 86 emm-types and detected speQR in strains belonging to emm9, 15, 18, 42, 53, 58, 60, 77, 87, 94 and 169, although both speQR-positive and speQR-negative lineages existed within these genotypes (Supplementary Figure 2). Both emm77 and emm87 have been reported as common causes of invasive disease in various countries [17].

Like the majority of other superantigens, both speQ and speR carry two of the three classic superantigen motifs. The third was absent in speQ, as also observed in its closest relative speK, but present in speR; this may relate to their different mitogenic potential, as 10-fold more SPEQ than SPER was required to generate an equivalent mitogenic response. The mitogenic activity of both SPEQ and SPER was 10-100 fold lower than that of SPEC. This may limit the contribution of SPEQ/R to virulence in the presence of much more potent superantigens. The majority of isolates whose genomes tested positive for speQ/R also carried at least one other superantigen gene.

Across the entire collection of 1441 USA isolates (previously all typed for the 11 known superantigen genes), the prevalence of speQR was 6%, similar to speL (5%) and speM (6%) [4]. The most commonly found superantigen gene within the USA collection was speG (93%), followed by smeZ (91%), speC (51%), speJ (41%), speA (26%), speH (25%), speI (23%), ssa (10%) and speK (9%) [4].
Of those that were positive for speQR, the prevalence of smeZ was still high (94%) and similar for speK (11%) and speC (40%), but fewer were positive for speG (54%) as well as speA (4%), speH (4%), speI (1%), speJ (28%), and more were positive for ssa (41%), speL (16%) and speM (16%). This may, however, reflect an association of superantigen complement with emm-type.

At least one superantigen gene was detected in all 1441 USA isolates, except for four (of 5) emm60 isolates and one (of 54) emm77 isolate [4]. We identified that these five 'superantigen negative' isolates were positive for speQR, consistent with our initial finding that two 1930s emm60 were only positive for speQR and no other known