This example shows how to create reference-based assemblies of viral transcriptomes.
Trimming transcriptome reads:
Download the reference genomes (assembly accessions listed, one per line, in a text file called acc_list.txt) and unzip the output from NCBI
programs/ncbi_datasets/datasets download genome accession --inputfile acc_list.txt --include gff3,genome,gbff
unzip ncbi_dataset.zip
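Before downloading, it can help to check that acc_list.txt contains only well-formed assembly accessions. The sketch below assumes the usual GCA_/GCF_ format (nine digits followed by a version number); check_accessions is a hypothetical helper name, not part of the NCBI tools.

```shell
# Print every non-blank line that does not look like an assembly accession
# (GCA_ or GCF_, nine digits, a dot, then a version number).
# check_accessions is a hypothetical helper, not an NCBI command.
check_accessions() {
    grep -v -E '^(GC[AF]_[0-9]{9}\.[0-9]+)?$' "$1"
}
```

Running check_accessions acc_list.txt prints nothing when every line is valid; any line it prints is worth fixing before the download.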
Index the reference genomes (repeat for each reference)
# load hisat2
source /programs/HISAT2/hisat2.sh
#index reference
hisat2-build GCA_027256855.1_ASM2725685v1_genomic.fna GCA_027256855.1_index
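Since the indexing step is repeated for each reference, a small loop can generate the commands. This is a minimal sketch: index_prefix and build_commands are hypothetical helper names, and the index is assumed to be written next to its FASTA file. The commands are printed rather than run, so they can be inspected first.

```shell
# Derive an index prefix such as GCA_027256855.1_index from a FASTA file name
# by keeping the first two underscore-separated fields (GCA + accession.version).
# index_prefix and build_commands are hypothetical helpers for illustration.
index_prefix() {
    local base
    base=$(basename "$1")
    printf '%s_index\n' "$(printf '%s' "$base" | cut -d_ -f1,2)"
}

# Print one hisat2-build command per FASTA file given as an argument.
# The index is assumed to live in the same directory as the FASTA file.
build_commands() {
    local fna
    for fna in "$@"; do
        printf 'hisat2-build %s %s/%s\n' \
            "$fna" "$(dirname "$fna")" "$(index_prefix "$fna")"
    done
}
```

For example, build_commands /path/to/references/*/*.fna lists the commands, and piping that output to sh runs them. Replace the glob with wherever your FASTA files are stored.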
Map reads to reference
# -S writes the alignments to a SAM file; without it hisat2 prints them to stdout
hisat2 -p 4 -x /path/to/index -1 /path/to/R1 -2 /path/to/R2 -S alns.sam
# for example,
hisat2 -p 4 -x ./references/GCA_027256855.1/GCA_027256855.1_index -1 ../HaeL_reads/CM7_2022_93_R1_paired.fastq -2 ../HaeL_reads/CM7_2022_93_R2_paired.fastq -S alns.sam
# sort the alignments by coordinate and write them as BAM
samtools sort -o alns.sorted.bam alns.sam
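When several samples are mapped, mapping and sorting can be combined in one pipeline so that no intermediate SAM file is written. The sketch below assumes read files are named <sample>_R1_paired.fastq and <sample>_R2_paired.fastq, as in the example above; map_command is a hypothetical helper that only prints the command.

```shell
# Print a hisat2 | samtools sort pipeline for one paired-end sample.
# Arguments: index prefix, path to the R1 file.
# Assumes the R2 file sits next to R1 and differs only by _R1_ / _R2_.
# map_command is a hypothetical helper for illustration.
map_command() {
    local index r1 r2 sample
    index=$1
    r1=$2
    r2=$(printf '%s' "$r1" | sed 's/_R1_paired\.fastq$/_R2_paired.fastq/')
    sample=$(basename "$r1" _R1_paired.fastq)
    # "-" tells samtools sort to read the alignments from standard input
    printf 'hisat2 -p 4 -x %s -1 %s -2 %s | samtools sort -o %s.sorted.bam -\n' \
        "$index" "$r1" "$r2" "$sample"
}
```

Calling map_command /path/to/index ../HaeL_reads/CM7_2022_93_R1_paired.fastq prints a pipeline that writes CM7_2022_93.sorted.bam; pipe the output to sh to run it.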
Assemble transcripts using the aligned reads
mkdir assembly
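# convert the reference annotation from GFF3 to GTF
programs/bin/cufflinks/gffread ../../references/GCA_027256855.1/GCA_027256855.1.gff -T -o ../../references/GCA_027256855.1/genomic.gtf
# assemble transcripts from the coordinate-sorted alignments (StringTie requires sorted input)
stringtie mapped_reads/aligned_reads.bam -o stringtie_out/transcripts.gtf -p 4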
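A quick sanity check after assembly is to count the transcripts StringTie reported. This sketch relies on the GTF convention that column 3 holds the feature type ("transcript" or "exon" in StringTie output); count_transcripts is a hypothetical helper name.

```shell
# Count assembled transcripts in a GTF file: lines whose third
# tab-separated column is "transcript". Header comment lines are ignored
# because they have no third column.
# count_transcripts is a hypothetical helper for illustration.
count_transcripts() {
    awk -F '\t' '$3 == "transcript" { n++ } END { print n + 0 }' "$1"
}
```

For example, count_transcripts stringtie_out/transcripts.gtf prints a single number. If the converted annotation (genomic.gtf) is meant to guide the assembly, StringTie accepts it through its -G option; whether that applies to this workflow is an assumption, since the command above does not use it.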