Preparing data from paired-end FASTQ files

maryj · March 6, 2024, 10:54pm

I have a very basic question about how to prepare data for Nextstrain. I have around 100 paired-end reads of influenza samples (i.e., files ending in _R1.fastq or _R2.fastq), and I am unsure how to turn them into consensus genomes as described in the Nextstrain documentation.

Based on my understanding, I think I need to use Bioconductor or Samtools or something to align the paired-end reads to a reference genome and generate a consensus genome - is that correct? Do I do this pairwise for each set of paired-end reads?

Thanks, I appreciate any help getting started in the right direction.

joverlee · March 7, 2024, 6:43pm

Hi @maryj,

Consensus genomes are usually generated by a bioinformatics pipeline that combines multiple tools rather than by a single program. INSaFLU can be a good platform to get started with your influenza samples.
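On the "pairwise" part of the question: each sample's R1/R2 pair is processed together, and each sample ends up with its own consensus genome. Whatever pipeline you choose, a useful first step is to check that every sample has both mate files. Here is a minimal Python sketch of that check, assuming the `_R1.fastq` / `_R2.fastq` naming from the question. The function name `pair_fastq_files` and the example file names are made up for illustration and are not part of Nextstrain or INSaFLU.

```python
from collections import defaultdict
from pathlib import Path


def pair_fastq_files(paths):
    """Group FASTQ paths into {sample: (r1_path, r2_path)}.

    Assumes files are named <sample>_R1.fastq and <sample>_R2.fastq
    (hypothetical helper, not part of any Nextstrain tool).
    """
    by_sample = defaultdict(dict)
    for path in map(Path, paths):
        for mate in ("R1", "R2"):
            suffix = f"_{mate}.fastq"
            if path.name.endswith(suffix):
                # Everything before the suffix is treated as the sample name.
                sample = path.name[: -len(suffix)]
                by_sample[sample][mate] = path

    pairs = {}
    for sample, mates in sorted(by_sample.items()):
        # Refuse to continue if a sample has only one of its two mate files.
        if set(mates) != {"R1", "R2"}:
            raise ValueError(f"sample {sample!r} is missing a mate file")
        pairs[sample] = (mates["R1"], mates["R2"])
    return pairs


if __name__ == "__main__":
    # Made-up file names, for illustration only.
    example = [
        "reads/sampleA_R1.fastq",
        "reads/sampleA_R2.fastq",
        "reads/sampleB_R2.fastq",
        "reads/sampleB_R1.fastq",
    ]
    for sample, (r1, r2) in pair_fastq_files(example).items():
        print(sample, r1.name, r2.name)
```

With real data you could pass in something like `Path("reads").glob("*.fastq")` and then hand each pair to the pipeline one sample at a time.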

Best,
Jover
