While generating a Nextstrain build that uses a subsampling scheme for Israel only, I am getting the following error:
determine priority for inclusion in as phylogenetic context by
genetic similiarity to sequences in focal set for build 'asia_israel'.
python3 scripts/get_distance_to_focal_set.py --reference defaults/reference_seq.fasta --alignment results/filtered_israel-data.fasta.xz --focal-alignment results/asia_israel/sample-country.fasta --ignore-seqs Wuhan/Hu-1/2019 --chunk-size 10000 --output results/asia_israel/proximity_country.tsv 2>&1 | tee logs/subsampling_proximity_asia_israel_country.txt
Traceback (most recent call last):
  File "/home/vishwajeet/data/ncov/scripts/get_distance_to_focal_set.py", line 155, in <module>
    focal_seqs_dict = calculate_snp_matrix(focal_seqs, consensus = ref, ignore_seqs=args.ignore_seqs)
  File "/home/vishwajeet/data/ncov/scripts/get_distance_to_focal_set.py", line 73, in calculate_snp_matrix
    raise ValueError('Fasta file appears to have sequences of different lengths!')
ValueError: Fasta file appears to have sequences of different lengths!
I ran sanitize_sequences.py and sanitize_metadata.py on the data before this run.
What should I do?
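
In case it helps narrow this down, the error suggests that not every record in one of the FASTA inputs has the same length. Here is a minimal sketch of how I could check that. It is not part of the ncov workflow: the function names are my own (hypothetical), and it assumes plain or .xz-compressed FASTA and takes the most common length as the "expected" aligned length.

```python
import lzma
from collections import Counter


def open_fasta(path):
    """Open a FASTA file as text, handling .xz compression by file extension."""
    if str(path).endswith(".xz"):
        return lzma.open(path, "rt")
    return open(path, "r")


def sequence_lengths(path):
    """Return {record name: sequence length} for every record in a FASTA file.

    If a name occurs more than once, only the last record is kept.
    """
    lengths = {}
    name = None
    with open_fasta(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Header line: start counting a new record.
                name = line[1:].strip()
                lengths[name] = 0
            elif name is not None:
                # Sequence line: add its length to the current record.
                lengths[name] += len(line)
    return lengths


def report_odd_lengths(path):
    """Return (most common length, {name: length} for records that differ)."""
    lengths = sequence_lengths(path)
    if not lengths:
        return None, {}
    expected, _ = Counter(lengths.values()).most_common(1)[0]
    odd = {n: l for n, l in lengths.items() if l != expected}
    return expected, odd
```

Calling report_odd_lengths("results/asia_israel/sample-country.fasta") and report_odd_lengths("results/filtered_israel-data.fasta.xz") should list any records whose length differs from the majority. If either file has such records, my guess is that it is not a proper alignment, although I am not sure that is the actual cause here.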