Logfile logs/subsampling_proximity_asia_israel_country.txt:
Traceback (most recent call last):
  File "/home/vishwajeet/data/ncov/scripts/get_distance_to_focal_set.py", line 155, in <module>
    focal_seqs_dict = calculate_snp_matrix(focal_seqs, consensus = ref, ignore_seqs=args.ignore_seqs)
  File "/home/vishwajeet/data/ncov/scripts/get_distance_to_focal_set.py", line 73, in calculate_snp_matrix
    raise ValueError('Fasta file appears to have sequences of different lengths!')
ValueError: Fasta file appears to have sequences of different lengths!
I ran sanitize_sequences.py and sanitize_metadata.py before the run.
Hi @vrmarathe - can you share the builds.yaml file you are running the workflow with?
I would double-check that results/filtered_israel-data.fasta.xz and results/asia_israel/sample-country.fasta actually contain sequences, and that those sequences are all the same length. (They should be, but it is worth double-checking.)
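One way to run that check is a small standalone script. This is only a sketch and not part of the ncov workflow: the helper names open_fasta and sequence_lengths are made up for illustration, and it assumes the files are plain, gzip-compressed, or xz-compressed FASTA.

```python
import gzip
import lzma
from collections import Counter


def open_fasta(path):
    """Open a plain, gzip-compressed, or xz-compressed FASTA file in text mode."""
    if path.endswith(".xz"):
        return lzma.open(path, "rt")
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def sequence_lengths(path):
    """Return a Counter mapping sequence length to the number of records with that length."""
    lengths = Counter()
    current = None  # length of the record being read; None until the first header
    with open_fasta(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if current is not None:
                    lengths[current] += 1
                current = 0
            elif current is not None:
                current += len(line)
    # Close the final record.
    if current is not None:
        lengths[current] += 1
    return lengths
```

Calling sequence_lengths("results/asia_israel/sample-country.fasta") and printing the result shows how many records there are of each length. An empty Counter means the file has no sequences, and more than one key means the sequences are not all the same length, which is the condition the ValueError above complains about.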
Hi @james, I am using the following file. I am using msa_0908.fasta (the aligned file from GISAID) as input, set as aligned: data/sequences_gisaid.fasta.gz. I used the cleaning script from Nextstrain to generate this.
# Define input files.
inputs:
  - name: israel-data
    metadata: data/metadata_gisaid.tsv.gz
    aligned: data/sequences_gisaid.fasta.gz
builds:
  asia_israel:
    subsampling_scheme: country
    region: Asia
    country: Israel
# Here, USA is in North America
files:
  auspice_config: "my_profiles/example/my_auspice_config.json"
  description: "my_profiles/example/my_description.md"
Hi @vrmarathe - the way you have defined your inputs is for aligned sequences, but from the error you are getting I'm guessing they are unaligned sequences. I would try changing that section to use the sequences key in place of aligned, so that the workflow aligns the sequences itself.
Hi @james, I have successfully run Nextstrain for Israel. Thank you for the help.
Is there a way to skip the tree-building for other countries? I am just looking to get the mutation data, such as amino acid mutations and nucleotide mutations.
Sure - there are a bunch of ways you could get this information - you could stop this pipeline at the relevant step – alignment / masking / filtering, depending on the exact data you want – and examine the alignments; we are using nextalign for this. Alternatively you could use nextclade (this also uses nextalign internally).
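If you go the route of examining the alignments yourself, nucleotide substitutions can be read off by comparing each aligned sequence with the reference position by position. The sketch below is illustrative only: it is not how nextalign or nextclade report mutations, the function name nucleotide_substitutions is made up, and it assumes both sequences are already aligned to the same length. It skips gaps and N, does not treat other ambiguity codes specially, and does not call amino acid mutations.

```python
def nucleotide_substitutions(reference, sequence):
    """List substitutions such as 'C241T' between two aligned sequences.

    Positions are 1-based. Sites where either sequence has a gap ('-')
    or an 'N' are skipped rather than reported as mutations.
    """
    if len(reference) != len(sequence):
        raise ValueError("Sequences must be aligned to the same length")
    skipped = {"N", "-"}
    substitutions = []
    pairs = zip(reference.upper(), sequence.upper())
    for position, (ref_base, base) in enumerate(pairs, start=1):
        if base != ref_base and base not in skipped and ref_base not in skipped:
            substitutions.append(f"{ref_base}{position}{base}")
    return substitutions
```

For example, nucleotide_substitutions("ACGTAC", "ATGTNC") returns ["C2T"]: the N at position 5 is ignored. For amino acid mutations, nextclade is the easier option, since it handles translation for you.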