Epidemiological data and genome sequencing reveals that nosocomial transmission of SARS-CoV-2 is underestimated and mostly mediated by a small number of highly infectious individuals

Objectives: Despite robust efforts, patients and staff acquire SARS-CoV-2 infection in hospitals. We investigated whether whole-genome sequencing enhanced the epidemiological investigation of healthcare-associated SARS-CoV-2 acquisition. Methods: From 17-November-2020 to 5-January-2021, 803 inpatients and 329 staff were diagnosed with SARS-CoV-2 infection at four Oxfordshire hospitals. We classified cases using epidemiological definitions, looked for a potential source for each nosocomial infection, and evaluated genomic evidence supporting transmission. Results: Using national epidemiological definitions, 109/803(14%) inpatient infections were classified as definite/probable nosocomial, 615(77%) as community-acquired and 79(10%) as indeterminate. There was strong epidemiological evidence to support definite/probable cases as nosocomial. Many indeterminate cases were likely infected in hospital: 53/79(67%) had a prior-negative PCR and 75(95%) contact with a potential source. 89/615(11% of all 803 patients) with apparent community-onset had a recent hospital exposure. Within 764 samples sequenced 607 genomic clusters were identified (>1 SNP distinct). Only 43/607(7%) clusters contained evidence of onward transmission (subsequent cases within ≤ 1 SNP). 20/21 epidemiologically-identified outbreaks contained multiple genomic introductions. Most (80%) nosocomial acquisition occurred in rapid super-spreading events in settings with a mix of COVID-19 and non-COVID-19 patients. Conclusions: Current surveillance definitions underestimate nosocomial acquisition. Most nosocomial transmission occurs from a relatively limited number of highly infectious individuals.


Introduction
Limiting acquisition of SARS-CoV-2 by patients and staff in hospitals is an infection prevention and control (IPC) priority. Despite robust efforts, both patients and staff are infected in hospitals; 10-40% of hospital-diagnosed COVID-19 cases are thought to have been acquired in hospital, with 8700 deaths following nosocomial infection reported in the UK, [1][2][3][4] and higher rates of seroconversion are reported in healthcare workers compared to the general population. 5 , 6 Distinguishing which patients have acquired infection in hospital allows potential transmission events to be investigated. Epidemiological rules are frequently used for nosocomial classification and outbreak investigation, using spatial and temporal patient data to make assumptions about acquisition and transmission. However, such rules may exclude plausible transmission and leave uncertainty around the source of individual infections. SARS-CoV-2 whole-genome sequencing (WGS) has been proposed as an adjunct to assist hospital outbreak investigation. Individuals infected with identical or near-identical (≤ 1 SNP) viruses are more likely to be linked in a transmission chain than those with more distantly related viruses, as demonstrated by previous retrospective studies that have utilised WGS to identify nosocomial infections and outbreaks. [7][8][9][10][11][12] We investigated whether sequencing could enhance epidemiological investigation of healthcare-associated SARS-CoV-2 acquisition in two areas: i) confirming/excluding nosocomial acquisition and ii) understanding the role of outbreaks in nosocomial acquisition. We highlight the benefits and pitfalls of this approach, to help guide local practice in individual centres.

Study design, setting and participants
The Oxford University Hospitals NHS Foundation Trust comprises four hospitals with ∼1100 beds (mostly in 4-bed bays within wards of 20-30 beds) and ∼13,500 staff. The four hospitals are presented as "A", a large acute hospital admitting both COVID-19 and non-COVID-19 patients; "B", a smaller general hospital admitting both COVID-19 and non-COVID-19 patients; "C", a hospital focused predominantly on cancer care; and "D", a largely elective orthopedic hospital, with C and D not routinely admitting COVID-19 patients. Ward admission and discharge dates were available for all patients from 14 days before the first positive PCR, and the work location for those staff working exclusively or predominantly on a single ward. Public Health England (PHE) guidance for COVID-19 IPC was followed throughout the study, including the use of patient pathways, personal protective equipment (PPE), symptomatic and asymptomatic staff and patient testing (summarised in Supplement). 13 Infections in patients and hospital staff were detected by symptomatic and asymptomatic SARS-CoV-2 PCR testing of combined nasal and oropharyngeal swabs by Thermo Fisher TaqPath assay (2553/2773, 92% samples) and other platforms (details in Supplement). PCR-positive samples were stored at −80 °C for WGS. Sequencing was attempted on all stored samples, regardless of cycle threshold (Ct) value, using the ARTIC LoCost protocol 14 (details in Supplement).

Definitions
Nosocomial SARS-CoV-2 infection was defined following NHS England and NHS Improvement definitions: 15
• Community-Onset, PCR-positive ≤ 2 days after hospital admission/attendance.
Enhanced nosocomial classification: prior-negative PCR results (available as a result of admission screening, weekly ward screening and symptomatic testing) and admissions in the 14 days prior to diagnosis were used to determine whether additional support existed for nosocomial acquisition.
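The time-based categories and the enhanced checks can be illustrated with a minimal sketch. It uses only the thresholds that appear in this article (≤ 2 days community-onset, 3-7 days indeterminate, > 7 days probable/definite); the function names, the way days are counted, and the wording of the returned notes are assumptions made for illustration, not the study's code.

```python
def classify_onset(days_after_admission: int) -> str:
    """Map days from admission to first positive PCR onto the categories in the text.

    How day 0/day 1 is counted is left to the caller (an assumption of this sketch).
    """
    if days_after_admission <= 2:
        return "community-onset"
    if days_after_admission <= 7:
        return "indeterminate"
    # The text groups cases diagnosed after > 7 days as probable/definite.
    return "probable/definite"


def enhanced_evidence(category: str, prior_negative_pcr: bool,
                      admitted_in_previous_14_days: bool) -> list:
    """Return hypothetical free-text notes on extra support for nosocomial acquisition."""
    notes = []
    if category != "community-onset" and prior_negative_pcr:
        notes.append("prior-negative PCR in this admission supports nosocomial acquisition")
    if category == "community-onset" and admitted_in_previous_14_days:
        notes.append("recent hospital exposure: check for a plausible donor")
    return notes
```

For example, a case diagnosed 5 days after admission with a negative admission swab would be classified as indeterminate but carry one supporting note.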
For the purpose of identifying plausible transmission events, indicative incubation periods were defined as 1-14 days prior to a positive PCR test. 16 Infectious periods were defined from 4 days before to 7 days after a positive PCR for patients, and 4 days before to the day of the positive PCR test for staff (reflecting that staff isolated at home for 10 days following a positive test). 17 Mean serial intervals, i.e. the duration between the symptom-onset times of a transmission donor and recipient, have been estimated at 4-7 days; here 5 days is used. 18 , 19 Individuals acquiring SARS-CoV-2 are denoted "recipients", and those transmitting infection "donors". A "plausible donor" for a recipient is identified by the donor and recipient being present on the same ward ("ward contact") during the donor's infectious period and the recipient's incubation period. "Hospital contact" was defined as presence in the same hospital on the same calendar day, during the donor's infectious period and the recipient's incubation period.
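The ward-contact rule above can be sketched as an interval-overlap check. The windows follow the definitions in the text; the `Case` record, its field names, and the representation of ward stays as inclusive date ranges are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Case:
    case_id: str
    is_staff: bool
    positive: date                              # date of first positive PCR
    stays: list = field(default_factory=list)   # (ward, first_day, last_day), inclusive


def infectious_window(c: Case):
    # 4 days before the positive PCR to 7 days after (patients) or to the test day (staff).
    end = c.positive if c.is_staff else c.positive + timedelta(days=7)
    return c.positive - timedelta(days=4), end


def incubation_window(c: Case):
    # 1-14 days prior to the positive PCR.
    return c.positive - timedelta(days=14), c.positive - timedelta(days=1)


def overlap(a_start, a_end, b_start, b_end):
    """Return the shared (start, end) of two inclusive date ranges, or None."""
    start, end = max(a_start, b_start), min(a_end, b_end)
    return (start, end) if start <= end else None


def is_plausible_donor(donor: Case, recipient: Case) -> bool:
    """True if both were on the same ward on a day inside both windows."""
    if donor.case_id == recipient.case_id:
        return False
    window = overlap(*infectious_window(donor), *incubation_window(recipient))
    if window is None:
        return False
    for ward_d, start_d, end_d in donor.stays:
        for ward_r, start_r, end_r in recipient.stays:
            if ward_d != ward_r:
                continue
            shared = overlap(start_d, end_d, start_r, end_r)
            if shared and overlap(*shared, *window):
                return True
    return False
```

Because the recipient's incubation window ends the day before their positive test, the check is directional: swapping donor and recipient generally gives a different answer.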
Epidemiological outbreaks were defined following PHE guidance: 20 ≥ 2 cases of COVID-19 in patients or hospital staff 'associated with the same setting', with ≥ 1 case (if a patient) meeting the definition of probable/definite nosocomial infection, ending when no cases were diagnosed for 28 days. Here 'associated with the same setting' was defined as a ward contact.
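As a rough sketch of how this definition could be applied to the cases diagnosed on one ward, the function below splits a ward's diagnoses into runs separated by more than 28 days and keeps runs with ≥ 2 cases and at least one probable/definite patient case. The input format, the category labels, and the exact handling of the 28-day boundary are assumptions; requiring a probable/definite case in every outbreak is a simplification of the guidance.

```python
from datetime import date

NOSOCOMIAL = {"probable", "definite"}


def ward_outbreaks(cases, gap_days=28):
    """Group (diagnosis_date, category) tuples from a single ward into outbreaks.

    A run ends when more than `gap_days` pass without a new diagnosis
    (boundary handling is an assumption of this sketch).
    """
    runs, current = [], []
    for diagnosed, category in sorted(cases):
        if current and (diagnosed - current[-1][0]).days > gap_days:
            runs.append(current)
            current = []
        current.append((diagnosed, category))
    if current:
        runs.append(current)
    # Keep runs with >= 2 cases and >= 1 probable/definite nosocomial patient case.
    return [run for run in runs
            if len(run) >= 2 and any(cat in NOSOCOMIAL for _, cat in run)]
```

A run consisting only of staff cases is dropped by this filter, as is a single isolated nosocomial case.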
Genomic outbreaks were defined as for epidemiological outbreaks with the additional requirement for individual viral sequences to be genomically linked. Genomically linked sequences were defined as those sharing ≤ 1 SNPs, an association close enough to support transmission whilst minimising over-calling of linkage in non-nosocomial cases (see Supplement). Genomic clusters were defined as for genomic outbreaks, but without the requirement for ≥ 1 definite/probable nosocomial case.
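The genomic half of these definitions can be sketched as single-linkage grouping on pairwise SNP distance: two sequences are linked if they differ at ≤ 1 position, and a case joins a group if it is linked to any member. This is an illustrative sketch on toy aligned strings, not the study pipeline; ignoring ambiguous or missing bases when counting differences is an assumption, and the epidemiological (ward contact) requirement is not modelled here.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two aligned sequences, skipping non-ACGT sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b)
               if x != y and x in VALID_BASES and y in VALID_BASES)


def link_sequences(seqs: dict, max_snps: int = 1) -> list:
    """Single-linkage groups: each member is within max_snps of at least one other member."""
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= max_snps:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

Groups of size one correspond to singletons; with single linkage, two members of the same group may be more than 1 SNP apart if an intermediate sequence connects them.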

Epidemiologic and genetic analysis
We initially classified all cases according to epidemiological definitions above, and then tested if there was epidemiological evidence of a potential source case for each new definite/probable/indeterminate patient and staff infection. We then evaluated how many of these epidemiologically linked cases were within ≤ 1 SNPs of each other, i.e. had genomic evidence to support transmission. Following this, we searched for epidemiologically defined outbreaks involving infected patients and staff. Community-onset cases were only included as part of an outbreak if they could have plausibly seeded the outbreak (i.e. their diagnosis preceded the first staff or nosocomial patient case on that ward), and not if admitted during an ongoing outbreak.
Combined epidemiological and genomic analysis was performed using R version 4.0.2 21 , with visualisation using ggplot2 22 and igraph 23 packages. Multiple sequence alignment and phylogenetic analysis were performed with MAFFT 24 and IQTree 25 respectively; phylogenies were prepared and visualised using Treeswift 26 and Toytree 27 . The epidemiological datasets analysed are not publicly available as they contain personal data but are available from the Infections in Oxfordshire Research Database (https://oxfordbrc.nihr.ac.uk/research-themes-overview/antimicrobial-resistance-and-modernising-microbiology/infections-in-oxfordshire-researchdatabase-iord/), subject to an application and research proposal meeting the ethical and governance requirements of the Database.

Nosocomial classification
Based on standard national definitions, 188/803 (23%) inpatient infections were classified as nosocomial, subgrouped as definite ( n = 51), probable ( n = 58) or indeterminate ( n = 79). In the UK, patients who acquired SARS-CoV-2 infection in hospital but were discharged before testing positive are not reported as nosocomial and so are not accounted for in these numbers, in part because community testing results are not routinely available to hospitals.

"Enhanced" nosocomial classification
Admission screening (within 24 h) was performed in 916/1104 (83%) of admissions. All 51 definite cases had prior-negative PCRs earlier in the same admission, providing additional support for nosocomial acquisition. 56/58 (97%) probable cases had prior-negative PCRs. Although described as indeterminate, 53/79 (67%) of those diagnosed 3-7 days after admission had ≥ 1 prior negative sample (Table 2). Indeterminate cases with a prior-negative PCR were diagnosed later during admission (median day 5, range 3-7), than those without a prior-negative (median day 3.5, range 3-7; p = 0.001). Therefore the greatest uncertainty around nosocomial acquisition exists for the 26 indeterminate cases without prior-negative PCRs and relatively short intervals between admission and first positive PCR. By definition, those without a prior-negative PCR had no prior PCR tests obtained in the same hospital admission, i.e. admission testing was not done.
Amongst the 615 community-onset cases, a retrospective look-back at the 14 days prior to SARS-CoV-2 diagnosis revealed that 89 (11% of all 803 inpatients) had prior hospital admissions during which SARS-CoV-2 infection could have been acquired, and 69/89 (78%) of these had ward contact with a plausible donor during that admission, suggesting that reporting definitions based only on current admissions under-estimate the extent of nosocomial infection.

Epidemiological outbreak identification
Applying an epidemiological outbreak definition considering all ward overlaps led to the identification of 3 outbreaks, the largest containing over 700 individuals, highlighting that it is an unworkable definition when inpatient prevalence is high. Therefore, to more closely replicate IPC practice and provide more interpretable data, the definition of an epidemiological outbreak was restricted to only include ward overlaps with patients and staff on the ward of nosocomial diagnosis. A total of 246 individual infections (46 definite, 53 probable, 56 indeterminate, 7 community-onset and 84 staff), occurred in one of 25 outbreaks on 24 different wards. The median outbreak size was 8 (IQR 3-12, range 2-32); 99/109 (91%) definite and probable cases occurred in outbreaks.
Can genomics help to confirm/exclude nosocomial acquisition?

Genomics improves precision for cases without prior-negative PCR results
Genomics helped clarify uncertainty around cases without a prior-negative PCR ( Table 2 ). In the probable group, two individuals lacked a prior-negative PCR; one was sequenced alongside a donor and confirmed as genomically-linked ( ≤ 1 SNP), and therefore likely nosocomially-acquired. In contrast, in the indeterminate group, 26 individuals lacked a prior-negative PCR. 15/24 (63%) were sequenced alongside ≥ 1 potential donors; only 6/15 (40%) were genomically-linked. Hence absence of a prior-negative PCR in the indeterminate group was associated with a lower likelihood of nosocomial-acquisition, but sequencing did support some of these infections having a nosocomial source.

Genomics can confirm nosocomial acquisition in "community-onset" cases
Of the 69 individuals with community-onset infection with a prior hospital admission and a plausible donor, 37 were sequenced alongside ≥ 1 plausible donor(s). 17/37 were genomically-linked, indicating 17 additional infections previously categorised as "community-associated" were plausibly nosocomially acquired (Table 2).

Undiagnosed/unsequenced individuals limit utility of genomic data
Amongst the 116 nosocomial cases sequenced, 13 (11%) were genetically-linked to ≥ 1 other case within 0-1 SNPs but with no documented ward or hospital contact (either patient or staff). These may represent community acquisition in the case of indeterminate cases, but may also be due to undiagnosed/unsequenced individuals providing the missing epidemiological link, e.g. due to incomplete admission and ward-based patient screening and undiagnosed staff cases.
Genomic data was unable to provide confirmation of nosocomial acquisition for 22/116 (19%) of sequenced nosocomial cases. Although we can conclude that these cases were not linked to any of the other cases sequenced, we cannot use this information to exclude nosocomial acquisition from undiagnosed/unsequenced individuals, due to incomplete sampling/sequencing.

Epidemiological outbreaks often contain multiple genomically-distinct introductions
Genomic data were used to refine the epidemiologically-defined outbreaks (Fig. 2 B). Of the 246 staff and patients in an epidemiological outbreak, 171 were sequenced (26/46 definite, 37/53 probable, 33/56 indeterminate patients, 2/7 community-onset and 73/84 staff); 21 of the 25 epidemiological outbreaks had ≥ 2 members sequenced and are described further. One epidemiological outbreak was confirmed genetically as a single outbreak (each case within 0-1 SNPs of another) and in 7 'outbreaks' no individuals were genetically linked (all ≥ 2 SNPs from all other cases). In the remaining 13 epidemiological outbreaks there was a mix of genetically-linked and unlinked cases; 9 consisted of a single genetically-distinct outbreak in addition to unlinked cases, 3 had two distinct genomic outbreaks and one ward had evidence of 3 genomic outbreaks. Overall 116/171 sequenced cases from epidemiologically defined outbreaks were confirmed to be in a genomically-supported outbreak (17/26 (65%) definite, 34/37 (92%) probable, 26/33 (79%) indeterminate, 1/2 (50%) community-onset and 38/73 (52%) staff). This highlights that epidemiological investigation may overestimate the size of outbreaks, which often occur alongside genetically-distinct introductions.

Most cases do not lead to onwards hospital transmission
Considering the cohort as a whole, rather than just those in an epidemiological outbreak as above, of the 764 individuals with samples sequenced, 200 were placed in one of 43 genomic clusters on the basis of being linked to at least one other case within 0-1 SNPs and 564 were singletons. Therefore, during the period of study, SARS-CoV-2 was introduced to OUH on at least 607 occasions, with evidence of onward transmission in 43 clusters (7% of introductions) (Fig. 3). The median cluster size was 2 (range 2-32). Of the 43 clusters, 17 contained both staff and patients, 16 patients only and 10 staff only. 16/43 genomic clusters were classified as genomic outbreaks (i.e. contained ≥ 1 definite/probable nosocomial case). Compared to the epidemiological estimate that 91% of nosocomially-acquired cases were linked to outbreaks, combining epidemiological and genomic data suggested 52/69 (75%) of all sequenced definite/probable nosocomial cases occurred in one of 16 genomic outbreaks, whereas 17/69 (25%) occurred as genomic singletons, ≥ 2 SNPs from any other case.
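The arithmetic behind the minimum number of introductions can be reproduced in a few lines. The figures are taken from the paragraph above; the variable names are illustrative, and counting each cluster as exactly one introduction gives a lower bound rather than a precise estimate.

```python
# Figures reported in the text.
sequenced = 764      # individuals with a sequenced sample
in_clusters = 200    # individuals within 0-1 SNPs of at least one other case
clusters = 43        # genomic clusters containing those 200 individuals

# Everyone not in a cluster is a singleton, i.e. a separate introduction.
singletons = sequenced - in_clusters

# Each cluster is counted as at least one introduction (a lower bound).
introductions = clusters + singletons

# Share of introductions with evidence of onward transmission.
onward_share = clusters / introductions

print(singletons, introductions, f"{onward_share:.1%}")
```

The same bookkeeping applies to any cohort once cases have been grouped by a SNP threshold: introductions are at least the number of clusters plus the number of singletons.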
Use of hospital-level ward data, accounting for all patient moves before and after testing PCR positive, led to identification of an unfeasibly large epidemiological outbreak of over 700 individuals. However, using these data in combination with WGS provides a more plausible identification of 15 additional individuals linked to outbreaks, who were missed by ward-based application of the outbreak definition due to patient ward moves during their incubation period, highlighting that outbreaks can span multiple wards.
Only 7/25 epidemiologically-defined outbreaks started with a known community-onset case, 2/7 were successfully sequenced, and only 1 confirmed to be genetically-related to subsequent cases within 0-1 SNPs. Despite the partial sequencing of community-onset cases, these data are consistent with limited direct patient-patient spread from known community-onset SARS-CoV-2 infected patients. Approximately two-thirds of staff infections were genetically distinct in this dataset, with 170/261 (65%) > 1 SNP different to all other cases, across 90 work locations. Although these cases occurred on wards with existing outbreaks, they were more common in areas with transient patient contact e.g. outpatient areas and dialysis units.
The distribution of genomic clusters differed by hospital site, consistent with the extent of exposure to COVID-19 admissions: only isolated/single cases occurred at hospital "D" and only two clusters were observed in hospital "C" (one staff pair and one trio containing 1 staff member and 2 patients). In contrast, hospitals "A" and "B" saw multiple larger clusters (notably, the proportion of cases sequenced was the same across all sites). In addition to differences in COVID-19 case load/infectious pressure, other factors may have played a role, such as: patient pathways including co-location of non-COVID-19 and COVID-19 cohort wards, estates/facilities, including number of patient side rooms and ventilation, differences in staff mobility between COVID-19 and non-COVID-19 wards, staff facilities (communal/break areas) and adherence to social distancing.

Two patterns of nosocomial acquisition seen
Broadly two patterns of nosocomial acquisition were seen; patterns are shown on a representative example phylogeny in Fig. 4.
Fig. 2. Outbreaks containing at least one definite or probable nosocomial case. (A) Using epidemiological data alone (nodes are linked purely using ward-based contacts); isolated grey nodes indicate individuals in a genomic but not epidemiologically-defined outbreak. (B) Using both epidemiological data and genomic data (nodes are linked both epidemiologically and genomically); isolated nodes indicate individuals in an epidemiological but not genomic outbreak. Each node represents an individual; all individuals in an epidemiological or genomic outbreak are shown in panel A with the sequenced subset in panel B. Node colours indicate the epidemiological group; grey nodes were not assigned to an epidemiological group. Lines indicate ward contact within an outbreak; line length is not significant. This demonstrates that epidemiological outbreaks consist of multiple genomic outbreaks and individual introductions, and conversely genomic outbreaks span multiple wards/epidemiological outbreaks. 69/176 (39%) nosocomial cases were not sequenced.
Superspreading events. These are characterised by a rapid accumulation of multiple cases within 1-2 serial intervals, e.g. > 5 cases within a 7-10 day period, implying multiple transmissions per serial interval. Eight such events occurred during this study, 7 at hospital "A" and one at hospital "B", typically occurring on non-COVID-19 wards with open bays, involving both staff and patients, and in specialties with patients highly dependent on nursing care (e.g. trauma, acute medicine, neurology). The median outbreak size was 9 (range 7-32) and median duration 12 days (range 7-35 days). Although infrequent, these 8 superspreading events accounted for 80% of cases linked to a genomic outbreak. With a serial interval of 5 days, some outbreaks may represent exposure to a single superspreading infection, but most are subsequently propagated amongst staff/patients. Incomplete sampling and asymptomatic individuals without symptom onset dates prevent confident identification of the source of each outbreak; however, in two clusters, staff cases preceded patient cases, so staff could have acted as index cases. In the remaining six outbreaks, there were no cases diagnosed prior to the first definite/probable nosocomial case to act as a plausible index; the outbreak was therefore likely seeded by an undiagnosed or unsequenced patient/staff/visitor. No outbreaks were seeded by direct patient-patient transmission from known positive patients; however, we cannot exclude a non-sequenced cross-covering staff member providing the missing epidemiological link, by acquiring infection from a known positive patient and seeding an outbreak on a non-COVID ward.
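The rapid-accumulation criterion can be sketched as a sliding window over the diagnosis dates of one genomic outbreak. The defaults below (more than 5 cases, i.e. at least 6, within 10 calendar days) are one reading of the illustrative "> 5 cases within a 7-10 day period" given above; the function name and parameters are hypothetical and the check is not the study's own implementation.

```python
from datetime import date, timedelta


def has_rapid_accumulation(diagnosis_dates, min_cases=6, window_days=10):
    """True if at least `min_cases` diagnoses fall within any `window_days` calendar days."""
    ordered = sorted(diagnosis_dates)
    start = 0
    for end in range(len(ordered)):
        # Shrink the window until it spans fewer than window_days days.
        while (ordered[end] - ordered[start]).days >= window_days:
            start += 1
        if end - start + 1 >= min_cases:
            return True
    return False
```

Cases spread evenly over several weeks, as in the recurrent-introduction pattern, would not trigger this check even if the total count is the same.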
Recurrent introductions. These are characterised by slow "rumbling" accumulation of nosocomial and staff cases on a ward, on both non-COVID and mixed wards with side rooms accommodating COVID and non-COVID patients ( Fig. 4 ). They consist of multiple introductions of distinct viral variants over a more prolonged period of time, giving the appearance of a slowly progressing outbreak, but with no, or minimal, onward transmission within the unit. Genomic data is required to distinguish recurrent introductions from genomic outbreaks.
Recurrent introductions involving one or more definite/probable case occurred on 6 different wards across hospitals "A", "B" and "D". Each mimicked an outbreak, with between 3 and 6 staff and nosocomial cases occurring on the ward over 2-6 week periods; however, all were genetically distinct introductions.

Discussion
In this retrospective cohort study of healthcare-associated SARS-CoV-2 transmission using combined epidemiological and sequencing data we make several key findings that challenge current surveillance definitions and reveal most nosocomial transmission occurs from a relatively limited number of highly infectious individuals.
Fig. 4. Example phylogeny demonstrating a superspreading event and recurrent introductions on a single ward at hospital "B". The node and label colour indicates broad epidemiological classification (community-onset, nosocomial, staff). The tip label gives the day of the outbreak the individual tested positive followed by the full epidemiological classification. The scale bar represents SNP distance. All cases were classified as part of the same epidemiological outbreak, however WGS reveals multiple introductions. The "community-onset" case diagnosed on day 44 of the outbreak, had a previous hospital admission with exposure on this ward during a superspreading event.
Our findings suggest that the majority of cases occurring after > 7 days in hospital are nosocomially acquired, for example 107/109 probable/definite cases had ≥ 1 prior-negative PCR test in the same admission, and most had plausible ward-based sources for their infection. However, surveillance definitions identifying probable/definite nosocomially-acquired cases (on the basis of prior hospital stays of > 7 days) likely under-estimate the extent of acquisition in hospital. In the UK, nationally reported nosocomial figures exclude indeterminate cases (diagnosed on day 3-7 of their hospital stay); however several of these cases in our study had prior-negative PCR tests during the same admission, plausible exposure to infectious patients, and genomic-linkage with other cases, all supporting acquisition in hospital, particularly for cases diagnosed on days 5-7. Furthermore, surveillance definitions considering only the current hospital stay, as in the UK, do not capture nosocomial acquisition during a recent prior hospital stay, for which we found both epidemiological and genomic evidence. Consideration should be given to revising surveillance definitions to account for prior-negative tests and infections diagnosed < 7 days into admission.
Consistent with defining most cases within 2 days of admission as community-acquired, genomics demonstrated most cases in staff and patients are genomically-distinct from all others in the hospital; there were 607 genomic clusters within the 764 samples sequenced. This is similar to WGS-based findings in other healthcare-associated infections over the last decade. 28 , 29 However, in contrast to other nosocomial infections, we found evidence that most nosocomial acquisition occurs in explosive superspreading events, with clusters of genomically-related cases occurring in short time periods, as observed by others for SARS-CoV-2 in both community and hospital settings. 9 , 30-33 .
WGS added most value when investigating outbreaks during periods of high SARS-CoV-2 prevalence, given high rates of ward-based contact with infected patients. The majority of epidemiologically-defined outbreaks consisted of multiple genomic introductions with some smaller genomic clusters. The role of staff in outbreaks is overestimated from epidemiological data alone, with genomics confirming only 52% of staff epidemiologically placed in an outbreak were genomically-linked, and the majority of sequenced staff cases were genomic singletons ( ≥ 2 SNPs from any other case). Additionally, in hospitals not routinely admitting COVID-19 patients, rates of transmission were low, suggesting that isolated acquisition from staff is relatively uncommon, and that transmission requires a 'perfect storm' of mixed COVID and non-COVID wards, emergency admissions and dependent patients accommodated in bays.
The main limitations of genomic data were two-fold. Firstly, although epidemiological data is available for all patients, genomic data is limited by sample availability and difficulty of generating sequences at low viral loads. Here 67% of the cohort were successfully sequenced, in line with other similar hospital cohorts (20-70%). 7 , 11 , 34 As such, genomic data does not enable nosocomial acquisition to be ruled out. Incomplete hospital sequencing datasets suffer from an 'absence of evidence' when attempting to exclude nosocomial acquisition, which should not be mistaken for 'evidence of absence' of nosocomial acquisition. This may be mitigated in the future by integrated community epidemiological and genomic datasets, and could be addressed through probabilistic inference methods that can account for missing data, or by further optimising sequencing yields. Future approaches to evaluating transmission could also consider proxy markers of infectiousness such as Ct values (reflecting viral loads).
Secondly, SARS-CoV-2 transmits rapidly relative to its rate of evolution, and outbreaks span too short a time for substantial genetic variation to accumulate; genomic data alone is therefore insufficient to establish linkage or resolve the ordering of transmission, and a combination of epidemiological and genomic data is required. A 1 SNP cut-off for defining linkage captures the majority of cases genuinely linked to the cluster, with the compromise of including a few community-onset cases likely linked by chance. The sensitivity and specificity of this SNP threshold for defining linkage varies according to the point during the pandemic at which it is being applied, both in terms of time since the start of the pandemic (greater overall viral diversity afforded by later time points in the pandemic), and the current rate of transmission (locally reduced diversity during exponential periods of spread, such as was observed with the emergence of the alpha variant in the winter of 2020). Improvements in sensitivity and specificity to detect transmission might also be gained from considering patterns of intra-host variation. 35 , 36 Generally, however, joint epidemiologic and genomic analysis enables the limitations of one method to be compensated for by the strengths of the other, acknowledging both approaches are limited by undiagnosed cases.
Our data have several practical IPC implications. The small proportion of cases leading to detected onward transmission highlights that existing enhanced IPC practices are generally effective at preventing most patient-patient spread from known positive patient cases, e.g. via triaging of patients into pathways on admission and widespread diagnostic testing reducing contact between infectious/susceptible individuals. However, rates of nosocomial infection remain too high, with the highest rates in hospitals caring for both COVID-19 and non-COVID-19 patients.
When investigating nosocomial SARS-CoV-2 transmission, it is vital to consider the contributions of both patients and staff in initiating and amplifying transmission. The identification of superspreading events highlights the importance of screening of both asymptomatic patients and staff to help identify and control outbreaks early (whilst acknowledging that virus in up to half of positive staff may not genetically be part of the outbreak). Several strategies can be used and scaled according to the situation, from universal admission testing, weekly ward screening, asymptomatic staff lateral flow device testing as standard, scaling up to on-demand full-ward lateral flow and PCR screening, and more frequent regular screening if nosocomial cases are identified. If multiple cases are observed within a single serial interval, highly suggestive of a superspreading event, rapid action should be taken, which may involve temporary ward closure to mitigate secondary transmission, recognising that those recently infected have the highest viral loads 37 and are most infectious to patients and staff 38 . If resources allow, use of dedicated staff in high risk areas, and self-isolation at home for staff exposed to a high risk event, may also be appropriate. Challenges include recognising outbreaks spanning multiple areas and implementing effective testing and control measures, e.g. for patients who move between wards and staff who cross-cover multiple wards, including during nights and those contracted by outside agencies. Communication that patients discharged from a superspreading ward are at high risk for acquisition should lower the threshold for post-discharge SARS-CoV-2 screening/testing. Variations in rates of nosocomial transmission suggest screening should be prioritised on wards and in specialities with the highest risks (e.g. acute medicine, trauma, neurology in our setting). 
As vaccination-mediated reductions in inpatient COVID cases occur, it will be important to raise awareness that patients on low risk wards/pathways are still at risk of nosocomial acquisition, in addition to highlighting that in general outbreaks are caused by patients or staff not known to be positive.
This study demonstrates that retrospective analysis of genomic data is useful in some circumstances to guide future IPC practice, with results consistent with similar studies in the UK. 7-10 It remains to be seen whether the additional costs of generating and analyzing this genomic data in near real-time (< 48 hrs from sample to dissemination of results) are justified by additional IPC gains, or whether the rapid and rigorous application of gold standard epidemiological methods in response to fast accumulation of nosocomial PCR-based diagnoses is the key intervention. This question will be addressed by studies such as the COG-HOCI trial. 39 Regardless of WGS, there is a clear need for automated systems to rapidly assimilate epidemiological data tracking patients over space and time to allow transmissions based on locations other than ward of diagnosis to be quickly identified and fed to IPC teams.
In conclusion, epidemiological investigation can be enhanced by genomic data, to provide insights into nosocomial acquisition and outbreaks in the hospital setting, and provide practical insights to optimise IPC interventions.

Declaration of Competing Interest
DWE declares lecture fees from Gilead, outside the submitted work. No other author has a conflict of interest to declare.