In the tutorial “Explore SARS-CoV-2 evolution”, the section “Preparing your data”, subsection “Curate data from the full GISAID database”, says: “Find the ‘Download packages’ section and select the ‘FASTA’ button.”
My question is: why don’t we download the “MSA full XXXX” file under the “alignment and proteins” section instead? As far as I can tell, the FASTA file under the “Download packages” section is not aligned.
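For anyone who wants to check this themselves, here is a minimal Python sketch (standard library only) that tests whether every record in a FASTA file has the same length. Equal length is necessary for a multiple sequence alignment but not sufficient, so this is only a quick sanity check. The function names are my own, and the parser streams records so it does not need to load a large download into memory.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def looks_aligned(lines: Iterable[str]) -> bool:
    """True if there is at least one record and all records share one length."""
    lengths = {len(seq) for _, seq in read_fasta(lines)}
    return len(lengths) == 1


unaligned = ">seqA\nACGTACGT\n>seqB\nACGTAC\n"
aligned = ">seqA\nACGTACGT\n>seqB\nACGT--GT\n"
print(looks_aligned(unaligned.splitlines()))  # False
print(looks_aligned(aligned.splitlines()))    # True
```

On a real file one would pass the open file handle directly, e.g. `with open("sequences.fasta") as fh: print(looks_aligned(fh))`, where the file name is only a placeholder.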
By the way, I found that some sequences in the FASTA file contain ambiguous bases, i.e. letters such as “k” and “y”. Is there a way to filter out the sequences that contain these letters?
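As a possible starting point, this is a minimal Python sketch (standard library only, names are my own) that drops every record containing an IUPAC ambiguity code. It assumes the ambiguity letters to reject are R, Y, K, M, S, W, B, D, H and V, compares case-insensitively, and deliberately keeps N (any base) out of the set, because many SARS-CoV-2 genomes contain runs of N and rejecting them could remove far more records than intended.

```python
import io
from typing import Iterable, Iterator, List, Optional, TextIO, Tuple

# IUPAC nucleotide ambiguity codes; N is left out on purpose (see above).
AMBIGUOUS = frozenset("RYKMSWBDHV")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_ambiguous(lines: Iterable[str], out: TextIO,
                     banned: frozenset = AMBIGUOUS) -> Tuple[int, int]:
    """Write records with no banned letter to `out`; return (kept, dropped)."""
    kept = dropped = 0
    for header, seq in read_fasta(lines):
        if banned & set(seq.upper()):
            dropped += 1
        else:
            out.write(f">{header}\n{seq}\n")
            kept += 1
    return kept, dropped


example = ">clean\nACGTN\n>withK\nACGKT\n>withy\nacgyt\n"
buf = io.StringIO()
print(filter_ambiguous(example.splitlines(), buf))  # (1, 2)
print(buf.getvalue(), end="")  # only the "clean" record is written
```

For the full download the same function could be run as `with open("sequences.fasta") as src, open("filtered.fasta", "w") as dst: filter_ambiguous(src, dst)`, with placeholder file names. To also drop sequences containing N, pass `banned=AMBIGUOUS | {"N"}`.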
Thank you very much, and best regards,