Best practices for reporting workplace outbreaks/super spreading event data

We are posting new data from a workplace outbreak that has been reported publicly. We have about half of the 30 outbreak genomes sequenced. We normally report the County and State residency of the host (when available). We have been going back and forth internally as to what to do in this case to 1. make the data most useful to others and 2. ensure the info does not make the individuals (indirectly) identifiable. We are wondering what the best practice is for a GISAID submission and downstream in a Nextstrain build. A few options seem to be:

  1. In location, continue listing the County/State of residence and just add to the additional location information field the workplace (e.g., Workplace X).
  2. In location, list the location of the workplace (state or County/state) and add to the additional location information field the workplace (e.g., Workplace X).
  3. Other thoughts?

Finally, I’m not 100% sure what to list in the Outbreak detail field. Is anyone else using this?

Hi @oroak, thanks for reaching out! I can only give advice based on what we do here at Nextstrain, so it certainly might be worth finding out what other pipelines or people using the GISAID data might do!

For us, it would be most useful to continue putting Country/State into the location field. This is often reported in a pretty standard manner by most submitters, so we’re able to parse this automatically when it contains standard levels like states/regions, counties/cities etc - and that makes things much easier for us! For things that fail the parser, we have to adjust manually.
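As a rough illustration of the kind of automatic parsing described above, here is a minimal sketch. It is not Nextstrain's actual parser: the function name `parse_location`, the level names in `LEVELS`, and the " / " separator are all assumptions made for the example.

```python
from typing import Optional

# Level names are assumptions for illustration, ordered broadest to narrowest.
LEVELS = ("region", "country", "division", "location")


def parse_location(raw: str) -> Optional[dict]:
    """Split an 'A / B / C / D' string into named levels.

    Returns None when the string has no usable parts or more parts than
    LEVELS, so the caller can queue it for manual curation instead.
    """
    parts = [p.strip() for p in raw.split("/")]
    parts = [p for p in parts if p]  # drop empty segments such as a trailing '/'
    if not parts or len(parts) > len(LEVELS):
        return None
    # Missing finer levels (e.g. no county) are left as empty strings.
    padded = parts + [""] * (len(LEVELS) - len(parts))
    return dict(zip(LEVELS, padded))


if __name__ == "__main__":
    print(parse_location("North America / USA / Oregon / Columbia County"))
    print(parse_location("North America / USA / Oregon"))
```

Anything that does not fit the expected shape returns `None` here, which mirrors the idea that entries failing the parser need manual adjustment.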

We do then take information in the ‘additional location information’ (and ‘additional host information’) field - sometimes this has extra information like county or city, so we’ll add this to our location information. In other cases we can’t integrate the data into Nextstrain itself (like workplace), but anything in the ‘additional’ fields is done manually to semi-manually (there’s so much variation in what people put in them!) so it’s easy for us to handle these appropriately.

I do not actually know anything about the ‘Outbreak’ field myself! Unfortunately it doesn’t seem to be a field one can search by on the GISAID website, so I’m also unsure how others have used it! It may well be that using this to indicate your samples are all part of the same outbreak(s) might be most appropriate. You might want to reach out to GISAID and see if they have any information on how the field is intended to be used!

I hope this helps!

@emmahodcroft Thanks for weighing in. As you said, normally we include Country/State & County (if available) in the location field. But this is the host’s home location. It sounds like, for Nextstrain at least, switching location to the Country/State/County of exposure might make more sense?

We may be just overthinking this, because we often just don’t know the exposure source location. But we are working toward County-level builds now, as well as expanded sequencing of additional outbreaks, so standardizing how we handle this is my goal. Since we know in this case that someone living in County A was actually exposed in County B, it seems that it would make the most sense to not have the substrain showing up in County A (unless there was additional community spread). Disclaimer: I’m a molecular geneticist and not an epidemiologist, so we’d love to learn from the experts.

Ah I see! I’m sorry, I missed the point about exposure!

For this case, we actually can and do incorporate exposure information if it differs from where the person lives. We generally assume that wherever the sample was taken is the place where the person was tested and was exposed. If the only information we have is what’s in the ‘location’ field on GISAID, then this is our implicit assumption.

We do look at information in additional information, though - and incorporate this if it speaks about ‘travelled from’ or ‘exposed in’ (or similar)! We do this by modifying internal fields that we track, some of which show up on the website: (the colored sequences have a different country of exposure than the country the sequence is from).

However, we don’t actually include the location_exposure (county-level) field in our analyses right now. We can track it and include it in our own files, but it doesn’t get used in analysis or exported in metadata.tsv – mostly just because we generally don’t have this level of detail! So that might be a big caveat.
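The implicit assumption described above (exposure equals the sampling location unless other information says otherwise) could be sketched roughly as follows. This is only an illustration, not Nextstrain's code: `fill_exposure` is a hypothetical helper, and apart from `location_exposure` the `*_exposure` column names are assumed to follow the same pattern.

```python
def fill_exposure(record: dict) -> dict:
    """Return a copy where missing *_exposure fields default to the sampling location."""
    filled = dict(record)  # copy so the caller's record is not modified
    for level in ("region", "country", "division", "location"):
        key = f"{level}_exposure"  # assumed naming pattern for exposure columns
        if not filled.get(key):
            # No explicit exposure information: assume exposure where sampled.
            filled[key] = filled.get(level, "")
    return filled


if __name__ == "__main__":
    # Hypothetical record: sampled in County A, known exposure in County B.
    rec = {
        "region": "North America",
        "country": "USA",
        "division": "Oregon",
        "location": "County A",
        "location_exposure": "County B",
    }
    print(fill_exposure(rec))
```

In this sketch an explicit exposure value is always kept, and only the blank levels are filled in from the sampling location.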

I think there are then two options that would fit with how we normally interpret the data:

  • You could put the exposure location in the ‘additional location information’ field; we would add it to our own internal files, but it won’t show up in any analysis or the metadata file
  • You could use the exposure location as the ‘given’ location, and specify the ‘resident’ location in the additional info - we can’t note this in any way in our own database, but then the sequence will show up in the location where exposure happened, which may be most useful for analysis!

This is all very Nextstrain specific, of course - but that’s hopefully a little useful insight on how it would work on our end. Really exciting to hear you’re setting up County-level builds!

Hi Brian. I’m used to this sort of specific location information being referenced as:

but still providing State and County data. I can’t of course guarantee that this would pass IRB review, but this seems to be standard practice. The lat/long of the “nursing home B” would be fuzzed from actual lat/longs.

In terms of Nextstrain, this could be a separate location from normal county listing, or it could be a separate metadata column entirely, the way that folks at the Broad do for “conference exposure” here:

Thanks @trvrb. Maybe we just have a forward-looking IRB, but we don’t have to dance around the workplace exposure. We can list the exact business and location, likely because the Oregon Health Authority now publicly reports all outbreaks of a given size in real time. While we can certainly add additional fields in a custom build, I noted none of the exposure data is in the MGH GISAID submission data. We’d like to make this information available for others beyond our site in the GISAID data release (even if this isn’t one of the fields normally captured in Nextstrain).

For the main location indicated in GISAID, my takeaway from your and @emmahodcroft’s comments is that it would make the most sense to list the likely exposure location (that is, the workplace State/County location) and not the home residency location. This would actually anonymize things a bit more for the individual hosts. Portland is a bit interesting in that we have larger population counties covering the city limits and additional smaller ones in the metro area.

I don’t know how much this really matters for the epi, but in this case (and ones in the future) it is a difference of having a substrain showing in 3-4 counties versus 1 on the map (which is why I’m interested in a “best practice” long-term).

That’s really interesting about IRB. In terms of GISAID, yes, I’d have main Location as something like: North America / USA / Oregon / Columbia County where the particular county is the exposure location. And then have Additional location information list as much detail as you’re comfortable providing.