Accession in sequences.fasta, but not on the Nextstrain website

Thomas · December 19, 2023, 2:05am

Hi everyone,

I was looking for some accessions in the Nextstrain RSV-A dataset (Auspice), but cannot find them when filtering for those samples. I downloaded the sequences FASTA and can find them there. One example of such an accession is KU950643. Looking at the downloaded metadata file, it is not obvious to me why the sequence was filtered out. This is the row in the metadata file: KU950643 KU950643.1 KU950643 10/2/2012 North America USA Homo sapiens 4/6/2016 Das et al Das,S.R.,Halpin,R.A.,Shilts,M.,Puri,V.,Akopov,A.,Fedorova,N.,Stockwell,T.,Amedeo,P.,Bishop,B.,Katzel,D.,Schobel,S.,Shrivastava,S.,Hartert,T. 22266 A.D GA2 0 good 45219 1 15225 1 1 1.

Can someone help me understand this?

Thank you!
Thomas

corneliusroemer · December 19, 2023, 4:19pm

Without looking at the details, my first guess would be subsampling: not all good sequences are included, as that would make the analysis too slow, unrepresentative, and hard to view.

In the workflow, the subsampling happens here: rule filter (GitHub link)
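
To illustrate why a sequence with good QC can still be absent, here is a minimal Python sketch of group-wise subsampling. It is not the workflow's actual code: the grouping keys (country, year), the per-group cap, and the toy records with made-up accessions are all assumptions for illustration.

```python
import random
from collections import defaultdict


def subsample(records, group_keys, max_per_group, seed=0):
    """Keep at most max_per_group randomly chosen records per group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        # Records sharing the same values for all group keys land in one group.
        groups[tuple(rec[k] for k in group_keys)].append(rec)
    kept = []
    for members in groups.values():
        if len(members) <= max_per_group:
            kept.extend(members)
        else:
            kept.extend(rng.sample(members, max_per_group))
    return kept


# Toy metadata: hypothetical accessions, all passing QC (assumed values).
records = [
    {"accession": f"SEQ{i:03d}", "country": "USA", "year": 2012, "qc": "good"}
    for i in range(10)
]

kept = subsample(records, group_keys=("country", "year"), max_per_group=3)
dropped = {r["accession"] for r in records} - {r["accession"] for r in kept}
print(len(kept), "kept;", len(dropped), "dropped despite good QC")
```

Every record here passes QC, yet most are dropped simply because their group is over the cap. Which ones survive depends on the random draw.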

Another reason it might not show up is a failed QC check, but since the metadata row you posted shows a good QC status, subsampling is the most likely explanation for why it’s not included.

The sequences.fasta you’re referring to is probably the output of the ingest pipeline and the input to the phylogenetic workflow. It contains all sequences from GenBank, whereas the workflow filters and subsamples, starting from that full set.
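
If you want to list which accessions are in the input but not in the displayed tree, something like the following Python sketch might help. It makes several assumptions: that the first token of each FASTA header is the accession, that the tree is a nested dict with "name" and "children" keys (as in an Auspice dataset JSON, where the tree sits under a top-level "tree" key), and that tip names match the accessions. The data below are toy stand-ins, not real files.

```python
import io


def fasta_ids(handle):
    """Return the set of record IDs (first token of each '>' header line)."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def tree_names(node):
    """Collect node names from a nested dict with 'name' and optional 'children'."""
    names = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if "name" in current:
            names.add(current["name"])
        stack.extend(current.get("children", []))
    return names


# Toy stand-ins (assumed): a two-record FASTA and a tiny tree.
# HYPOTHETICAL_1 and NODE_0000001 are made-up names for illustration.
fasta = io.StringIO(">KU950643\nACGT\n>HYPOTHETICAL_1\nACGT\n")
tree = {"name": "NODE_0000001", "children": [{"name": "HYPOTHETICAL_1"}]}

in_input = fasta_ids(fasta)
in_tree = tree_names(tree)
only_in_input = sorted(in_input - in_tree)
print(only_in_input)
```

With real data, you would replace the toy objects with an open file handle for sequences.fasta and the tree loaded via json.load from the dataset JSON you downloaded.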

Thomas · December 20, 2023, 3:36pm

Thank you for the explanation, that makes sense!
