Contextual strain list from augur filter

Hi all,

I recently found that most contextual sequences were dropped in the combine_samples step. I traced this back to the subsample step, which generates the contextual strain list (e.g. sample-global.txt), and found that most of the sampled strains were not in “priorities_country.tsv”, “proximity_country.tsv”, or “combined_sequences_for_subsampling.fasta”. How were these strains sampled (i.e. written to the strain list) if they appear in neither the .fasta file nor the priority-related files?
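A check like the one described above can be sketched as follows. This is only an illustration under some assumptions: the strain list has one name per line, the priorities file is a headerless TSV with the strain name in the first column, and each FASTA header starts with the strain name. The helper names are hypothetical and not part of augur, and the three file paths are passed on the command line rather than hard-coded.

```python
import sys


def read_strain_list(path):
    """Read one strain name per line, ignoring blank lines and '#' comments."""
    strains = set()
    with open(path) as handle:
        for line in handle:
            name = line.split("#", 1)[0].strip()
            if name:
                strains.add(name)
    return strains


def read_fasta_names(path):
    """Collect the first whitespace-delimited token of every FASTA header."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    names.add(tokens[0])
    return names


def read_tsv_first_column(path):
    """Collect the first column of a TSV (assumed: no header, strain in column 1)."""
    names = set()
    with open(path) as handle:
        for line in handle:
            field = line.rstrip("\n").split("\t")[0].strip()
            if field:
                names.add(field)
    return names


def missing_from(sampled, reference):
    """Return the sampled strains that are absent from the reference set, sorted."""
    return sorted(sampled - reference)


if __name__ == "__main__":
    # Usage (paths are whatever your workflow produced):
    #   python check_strains.py <strain_list.txt> <priorities.tsv> <sequences.fasta>
    if len(sys.argv) == 4:
        sampled = read_strain_list(sys.argv[1])
        not_in_priorities = missing_from(sampled, read_tsv_first_column(sys.argv[2]))
        not_in_fasta = missing_from(sampled, read_fasta_names(sys.argv[3]))
        print(f"{len(sampled)} sampled strains")
        print(f"{len(not_in_priorities)} not in the priorities file")
        print(f"{len(not_in_fasta)} not in the FASTA file")
```

Using sets keeps the comparison simple and independent of the order in which strains appear in each file.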

I compared this with the subsample step in the previous workflow. It seems the old workflow took sequence data into account in that step. I don’t know whether I need to manually change the rule to solve the issue, or whether the problem has another cause.

I have attached my entire command line for subsampling below:

augur filter --metadata results/combined_metadata.tsv.xz --include defaults/include.txt --exclude defaults/exclude.txt --min-date '2021-01-01' --exclude-where 'country=XX' --priority results/XX_region/priorities_country.tsv --group-by year month --subsample-max-sequences 2000 --output-strains results/XX_region/sample-global.txt

Thanks!