I tried to make a build restricted to Pango lineage B.1.160 and all of its sub-lineages, but augur filter fails with an error
that I think is caused by the way I specified the list of lineages.
This is from my build file:
builds:
  turbuss:
    region: global
    country: Norway
    subsampling_scheme: turbuss-scheme
    pango_lineage: "'B.1.160', 'AB.1', 'B.1.160.1', 'B.1.160.2', 'B.1.160.3', 'B.1.160.4', 'B.1.160.5', 'B.1.160.6', 'B.1.160.7', 'B.1.160.8', 'B.1.160.9', 'B.1.160.10', 'B.1.160.11', 'B.1.160.12', 'B.1.160.14', 'B.1.160.15', 'B.1.160.16', 'B.1.160.17', 'B.1.160.18', 'B.1.160.19', 'B.1.160.20', 'B.1.160.21', 'B.1.160.22', 'B.1.160.23', 'B.1.160.24', 'B.1.160.25', 'B.1.160.26', 'B.1.160.27', 'B.1.160.28', 'B.1.160.29', 'B.1.160.30', 'B.1.160.31', 'B.1.160.32', 'B.1.160.33'"
subsampling:
  turbuss-scheme:
    country:
      group_by: "country"
      max_sequences: 4000
      # query: --query "(country == '{country}') & (pango_lineage == '{pango_lineage}')"
      query: --query "(country == '{country}') & (pango_lineage in '{pango_lineage}')"
      #query: --query "(country == '{country}') & ('{pango_lineage}' in pango_lineage)"
    related:
      group_by: "country year month"
      max_sequences: 1000
      # exclude: --exclude-where "country!='{country}'"
      # query: --query "(pango_lineage == '{pango_lineage}') & (country != '{country}') "
      query: --query "('{pango_lineage}' in pango_lineage) & (country != '{country}') "
      # sampling_scheme: --probabilistic-sampling
      priorities:
        type: "proximity"
And this is from the output:
augur filter --sequences results/filtered_turbuss.fasta.xz --metadata results/sanitized_metadata_turbuss.tsv.xz --sequence-index results/combined_sequence_index.tsv.xz --include my_profiles/fhi/include.txt --exclude defaults/exclude.txt --query "(country == 'Norway') & (pango_lineage in ''B.1.160', 'AB.1', 'B.1.160.1', 'B.1.160.2', 'B.1.160.3', 'B.1.160.4', 'B.1.160.5', 'B.1.160.6', 'B.1.160.7', 'B.1.160.8', 'B.1.160.9', 'B.1.160.10', 'B.1.160.11', 'B.1.160.12', 'B.1.160.14', 'B.1.160.15', 'B.1.160.16', 'B.1.160.17', 'B.1.160.18', 'B.1.160.19', 'B.1.160.20', 'B.1.160.21', 'B.1.160.22', 'B.1.160.23', 'B.1.160.24', 'B.1.160.25', 'B.1.160.26', 'B.1.160.27', 'B.1.160.28', 'B.1.160.29', 'B.1.160.30', 'B.1.160.31', 'B.1.160.32', 'B.1.160.33'')" --group-by country --subsample-max-sequences 4000 --output results/turbuss/sample-country.fasta --output-strains results/turbuss/sample-country.txt 2>&1 | tee logs/subsample_turbuss_country.txt
Traceback (most recent call last):
File "/home/jonr/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/util_support/metadata_file.py", line 45, in metadata
metadata = metadata.query(self.query).copy()
File "/home/jonr/miniconda3/envs/nextstrain/lib/python3.8/site-packages/pandas/core/frame.py", line 3469, in query
res = self.eval(expr, **kwargs)
File "/home/jonr/miniconda3/envs/nextstrain/lib/python3.8/site-packages/pandas/core/frame.py", line 3599, in eval
return _eval(expr, inplace=inplace, **kwargs)
File "/home/jonr/miniconda3/envs/nextstrain/lib/python3.8/site-packages/pandas/core/computation/eval.py", line 342, in eval
parsed_expr = Expr(expr, engine=engine, parser=parser, env=env)
File "/home/jonr/miniconda3/envs/nextstrain/lib/python3.8/site-packages/pandas/core/computation/expr.py", line 798, in __init__
self.terms = self.parse()
File "/home/jonr/miniconda3/envs/nextstrain/lib/python3.8/site-packages/pandas/core/computation/expr.py", line 817, in parse
return self._visitor.visit(self.expr)
File "/home/jonr/miniconda3/envs/nextstrain/lib/python3.8/site-packages/pandas/core/computation/expr.py", line 397, in visit
raise e
File "/home/jonr/miniconda3/envs/nextstrain/lib/python3.8/site-packages/pandas/core/computation/expr.py", line 393, in visit
node = ast.fix_missing_locations(ast.parse(clean))
File "/home/jonr/miniconda3/envs/nextstrain/lib/python3.8/ast.py", line 47, in parse
return compile(source, filename, mode, flags,
File "<unknown>", line 1
(country =='Norway')and (pango_lineage in ''B .1 .160 ', 'AB .1 ', 'B .1 .160 .1 ', 'B .1 .160 .2 ', 'B .1 .160 .3 ', 'B .1 .160 .4 ', 'B .1 .160 .5 ', 'B .1 .160 .6 ', 'B .1 .160 .7 ', 'B .1 .160 .8 ', 'B .1 .160 .9 ', 'B .1 .160 .10 ', 'B .1 .160 .11 ', 'B .1 .160 .12 ', 'B .1 .160 .14 ', 'B .1 .160 .15 ', 'B .1 .160 .16 ', 'B .1 .160 .17 ', 'B .1 .160 .18 ', 'B .1 .160 .19 ', 'B .1 .160 .20 ', 'B .1 .160 .21 ', 'B .1 .160 .22 ', 'B .1 .160 .23 ', 'B .1 .160 .24 ', 'B .1 .160 .25 ', 'B .1 .160 .26 ', 'B .1 .160 .27 ', 'B .1 .160 .28 ', 'B .1 .160 .29 ', 'B .1 .160 .30 ', 'B .1 .160 .31 ', 'B .1 .160 .32 ', 'B .1 .160 .33 '')
^
SyntaxError: Python keyword not valid identifier in numexpr query
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jonr/miniconda3/envs/nextstrain/bin/augur", line 10, in <module>
sys.exit(main())
File "/home/jonr/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/__main__.py", line 10, in main
return augur.run( argv[1:] )
File "/home/jonr/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/__init__.py", line 75, in run
return args.__command__.run(args)
File "/home/jonr/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/filter.py", line 312, in run
filtered = set(filter_by_query(list(seq_keep), args.metadata, args.query))
File "/home/jonr/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/filter.py", line 92, in filter_by_query
filtered_meta_dict, _ = read_metadata(metadata_file, query)
File "/home/jonr/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/utils.py", line 74, in read_metadata
return MetadataFile(fname, query).read()
File "/home/jonr/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/util_support/metadata_file.py", line 21, in read
self.check_metadata_duplicates()
File "/home/jonr/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/util_support/metadata_file.py", line 55, in check_metadata_duplicates
self.metadata[self.key_type]
File "/home/jonr/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/util_support/metadata_file.py", line 47, in metadata
raise ValueError(
ValueError: Error applying pandas query to metadata: `(country == 'Norway') & (pango_lineage in ''B.1.160', 'AB.1', 'B.1.160.1', 'B.1.160.2', 'B.1.160.3', 'B.1.160.4', 'B.1.160.5', 'B.1.160.6', 'B.1.160.7', 'B.1.160.8', 'B.1.160.9', 'B.1.160.10', 'B.1.160.11', 'B.1.160.12', 'B.1.160.14', 'B.1.160.15', 'B.1.160.16', 'B.1.160.17', 'B.1.160.18', 'B.1.160.19', 'B.1.160.20', 'B.1.160.21', 'B.1.160.22', 'B.1.160.23', 'B.1.160.24', 'B.1.160.25', 'B.1.160.26', 'B.1.160.27', 'B.1.160.28', 'B.1.160.29', 'B.1.160.30', 'B.1.160.31', 'B.1.160.32', 'B.1.160.33'')` (Python keyword not valid identifier in numexpr query (<unknown>, line 1))
Waiting at most 5 seconds for missing files.
MissingOutputException in line 368 of /home/jonr/Prosjekter/Nextstrain/ncov/workflow/snakemake_rules/main_workflow.smk:
Job Missing files after 5 seconds:
results/turbuss/sample-country.fasta
results/turbuss/sample-country.txt
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 16 completed successfully, but some output files are missing. 16
File "/home/jonr/miniconda3/envs/nextstrain/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 589, in handle_job_success
File "/home/jonr/miniconda3/envs/nextstrain/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 252, in handle_job_success
Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /home/jonr/Prosjekter/Nextstrain/ncov/.snakemake/log/2021-08-18T103107.376129.snakemake.log
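To narrow this down, here is a minimal sketch of what appears to go wrong. It uses only Python's ast module rather than augur or pandas, on the assumption (suggested by the ast.parse frame in the traceback) that the query string has to be a valid Python expression. The lineage list is shortened for illustration, and the helper name parses is hypothetical.

```python
import ast

# Shortened lineage list for illustration; the real build lists every sub-lineage.
lineages = ["B.1.160", "AB.1", "B.1.160.1"]


def parses(expr: str) -> bool:
    """Return True if expr is a syntactically valid Python expression."""
    try:
        ast.parse(expr, mode="eval")
        return True
    except SyntaxError:
        return False


# What the current config expands to: the already-quoted names are wrapped
# in one more pair of quotes by the '{pango_lineage}' placeholder.
quoted_names = ", ".join(f"'{name}'" for name in lineages)
broken = f"(country == 'Norway') & (pango_lineage in '{quoted_names}')"

# Candidate alternative (an assumption, not confirmed against the workflow):
# a bracketed list literal with no surrounding quotes.
fixed = f"(country == 'Norway') & (pango_lineage in {lineages!r})"

print(broken, "->", parses(broken))
print(fixed, "->", parses(fixed))
```

In this sketch the doubled quotes (''B.1.160', ...) make the first expression unparseable, while the bracketed form parses. If the same holds inside augur filter, a candidate fix would be to give pango_lineage a bracketed list as its value and drop the quotes around {pango_lineage} in the query; this has not been checked against the ncov workflow.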