Reference-based transcriptome assembly

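This example shows how to build reference-based assemblies of viral transcriptomes: the reads are quality-checked and trimmed, mapped to reference genomes with HISAT2, and assembled into transcripts with StringTie.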

Quality-check and trim the transcriptome reads:
1. FastQC (for example, mkdir -p qc_reports && fastqc -o qc_reports R1.fq.gz R2.fq.gz; FastQC does not create the output directory itself)
2. Trimmomatic (see viral sequence detection)
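Before mapping, it can help to confirm that the two paired files written by Trimmomatic still contain the same number of reads, because HISAT2 pairs mates by their order in the -1 and -2 files. The sketch below assumes standard four-line FASTQ records (no wrapped sequence lines); count_reads and check_pair are hypothetical helper names made up for this example.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Count reads in a FASTQ file (plain text or gzip-compressed).
# Assumption: standard FASTQ with exactly 4 lines per record.
count_reads() {
  local fq="$1" lines
  case "$fq" in
    *.gz) lines=$(gzip -cd "$fq" | wc -l) ;;
    *)    lines=$(wc -l < "$fq") ;;
  esac
  echo $(( lines / 4 ))
}

# check_pair (hypothetical helper): compare the mate-1 and mate-2 files.
# HISAT2 pairs mates by position, so both files must hold the same number of reads.
check_pair() {
  local n1 n2
  n1=$(count_reads "$1")
  n2=$(count_reads "$2")
  if [ "$n1" -eq "$n2" ]; then
    echo "OK: $n1 read pairs"
  else
    echo "MISMATCH: $1 has $n1 reads, $2 has $n2 reads" >&2
    return 1
  fi
}
```

For example, check_pair ../HaeL_reads/CM7_2022_93_R1_paired.fastq ../HaeL_reads/CM7_2022_93_R2_paired.fastq prints the number of read pairs, or returns a non-zero status if the counts differ.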
Download the reference genomes with the NCBI datasets command-line tool (their assembly accessions are listed in a text file called acc_list.txt), then unzip the archive it produces:
1. Jingmen tick virus: GCF_000919875.1
2. Alongshan virus: GCA_027256855.1
```
programs/ncbi_datasets/datasets download genome accession --inputfile acc_list.txt --include gff3,genome,gbff
unzip ncbi_dataset.zip
```
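acc_list.txt simply holds one assembly accession per line. One way to create it for the two viruses above (a sketch; any text editor works just as well):

```bash
#!/usr/bin/env bash
set -euo pipefail

# One NCBI assembly accession per line; this is the file passed to
# `datasets download genome accession --inputfile acc_list.txt`.
cat > acc_list.txt <<'EOF'
GCF_000919875.1
GCA_027256855.1
EOF

# Sanity check: how many accessions will be requested
echo "$(wc -l < acc_list.txt | tr -d ' ') accessions in acc_list.txt"
```

After unzipping, the downloaded files are usually placed in one folder per accession under ncbi_dataset/data/; check the actual layout before copying the files into a references/ folder like the one used in the mapping example.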

Index the reference genomes (repeat for each reference)

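```
# load hisat2
source /programs/HISAT2/hisat2.sh

# index reference: hisat2-build <reference FASTA> <index basename>
# (writes GCA_027256855.1_index.1.ht2 ... .8.ht2 in the current directory)
hisat2-build GCA_027256855.1_ASM2725685v1_genomic.fna GCA_027256855.1_index
```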
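The indexing step has to be repeated for every reference. A minimal bash sketch of that loop, assuming (hypothetically) that each genome FASTA sits in its own folder references/<accession>/ and that each index is named <accession>_index, as in the mapping example. REF_DIR, DRY_RUN and index_reference are illustrative names, not part of any tool; by default the script only prints the hisat2-build commands so the paths can be checked first.

```bash
#!/usr/bin/env bash
set -euo pipefail
shopt -s nullglob   # unmatched globs expand to nothing instead of themselves

# Hypothetical layout: $REF_DIR/<accession>/<accession>_*_genomic.fna
REF_DIR="${REF_DIR:-references}"
# DRY_RUN=1 (default here) only prints the commands; DRY_RUN=0 runs them.
DRY_RUN="${DRY_RUN:-1}"

# Build (or print) the hisat2-build command for one accession.
index_reference() {
  local acc="$1"
  local -a fastas
  fastas=("$REF_DIR/$acc/$acc"_*_genomic.fna)
  if [ "${#fastas[@]}" -ne 1 ]; then
    echo "skipping $acc: expected 1 FASTA in $REF_DIR/$acc, found ${#fastas[@]}" >&2
    return 1
  fi
  # Index basename follows this page's naming: <accession>_index
  local -a cmd=(hisat2-build "${fastas[0]}" "$REF_DIR/$acc/${acc}_index")
  if [ "$DRY_RUN" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

for acc in GCF_000919875.1 GCA_027256855.1; do
  index_reference "$acc" || true
done
```

Run it with DRY_RUN=0 to build the indexes; a reference whose folder does not contain exactly one *_genomic.fna file is skipped with a message on stderr.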

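Map the trimmed reads to the reference

```
# general form: -x takes the index basename; SAM goes to the file given with -S (stdout if -S is omitted)
hisat2 -p 4 -x /path/to/index -1 /path/to/R1 -2 /path/to/R2 -S /path/to/output.sam

# for example,
hisat2 -p 4 -x ./references/GCA_027256855.1/GCA_027256855.1_index -1 ../HaeL_reads/CM7_2022_93_R1_paired.fastq -2 ../HaeL_reads/CM7_2022_93_R2_paired.fastq -S alns.sam

# sort the alignments by coordinate into a BAM file (StringTie needs sorted input)
mkdir -p mapped_reads
samtools sort -o mapped_reads/aligned_reads.bam alns.sam
```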
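With several samples, the same command can be looped over every R1 file, and the SAM output can be piped straight into samtools sort (the - argument makes samtools read from standard input), which avoids writing a large intermediate SAM file. A sketch under these assumptions: reads follow the <sample>_R1_paired.fastq / <sample>_R2_paired.fastq naming used above, and each sample gets a hypothetical output mapped_reads/<sample>.sorted.bam. INDEX, READS_DIR, OUT_DIR, DRY_RUN, sample_name and map_sample are illustrative names; the default dry run only prints each pipeline.

```bash
#!/usr/bin/env bash
set -euo pipefail
shopt -s nullglob

# Hypothetical defaults, taken from the example command; override as needed.
INDEX="${INDEX:-./references/GCA_027256855.1/GCA_027256855.1_index}"
READS_DIR="${READS_DIR:-../HaeL_reads}"
OUT_DIR="${OUT_DIR:-mapped_reads}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print each pipeline, 0 = run it

# ../HaeL_reads/CM7_2022_93_R1_paired.fastq -> CM7_2022_93
sample_name() {
  local base
  base=$(basename "$1")
  echo "${base%_R1_paired.fastq}"
}

# Map one read pair and write a coordinate-sorted BAM (hypothetical name <sample>.sorted.bam).
map_sample() {
  local r1="$1" r2 bam
  r2="${r1%_R1_paired.fastq}_R2_paired.fastq"
  bam="$OUT_DIR/$(sample_name "$r1").sorted.bam"
  if [ "$DRY_RUN" = "1" ]; then
    echo "hisat2 -p 4 -x $INDEX -1 $r1 -2 $r2 | samtools sort -o $bam -"
    return 0
  fi
  mkdir -p "$OUT_DIR"
  # hisat2 writes SAM to stdout; samtools sort reads it from stdin ('-').
  hisat2 -p 4 -x "$INDEX" -1 "$r1" -2 "$r2" | samtools sort -o "$bam" -
}

for r1 in "$READS_DIR"/*_R1_paired.fastq; do
  map_sample "$r1"
done
```

Each resulting BAM can then be passed to stringtie in place of mapped_reads/aligned_reads.bam.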

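Assemble transcripts from the sorted alignments with StringTie, using the reference annotation (converted from GFF3 to GTF with gffread) as a guide

```
mkdir assembly
# convert the reference GFF3 annotation to GTF (-T)
programs/bin/cufflinks/gffread ../../references/GCA_027256855.1/GCA_027256855.1.gff -T -o ../../references/GCA_027256855.1/genomic.gtf
# assemble transcripts, guided by the reference annotation (-G), into the assembly folder
stringtie mapped_reads/aligned_reads.bam -G ../../references/GCA_027256855.1/genomic.gtf -o assembly/transcripts.gtf -p 4
```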
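A quick way to check an assembly is to count the transcript and exon records in the GTF that StringTie writes (column 3 holds the feature type). A small awk-based sketch; count_features and summarize_gtf are hypothetical helper names:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Count GTF records whose feature type (column 3) equals $2, skipping '#' comment lines.
count_features() {
  awk -F'\t' -v f="$2" '!/^#/ && $3 == f { n++ } END { print n + 0 }' "$1"
}

# Print transcript and exon counts for one StringTie output GTF.
summarize_gtf() {
  printf 'transcripts\t%s\nexons\t%s\n' \
    "$(count_features "$1" transcript)" \
    "$(count_features "$1" exon)"
}

# When run as a script with a GTF path as the first argument, summarize it.
if [ "$#" -gt 0 ]; then
  summarize_gtf "$1"
fi
```

For example, summarize_gtf /path/to/transcripts.gtf prints one line each for transcripts and exons.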