This is a work-in-progress pipeline for detecting viral sequences in RNA-seq data. I’ve based it on the methods used in this paper: https://www.nature.com/articles/s41564-024-01796-6
Software:
- Trimmomatic
- SPAdes
- VirSorter2, CheckV, DRAMv
Input data:
- RNA-seq data in fastq format
Workflow:
- Trim adapter sequences and low-quality bases with Trimmomatic, dropping reads that end up too short
- De novo transcriptome assembly with rnaviralSPAdes
- Follow protocol described here to detect viral sequences in assembly
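The assembly step in the workflow above could look roughly like the sketch below. It assumes SPAdes 3.15 or later (where the `--rnaviral` mode is available) and that `spades.py` is on the PATH. The `assemble` helper, the `rnaviral_assembly` output directory, and the default thread count are hypothetical choices for illustration, not settings from the paper. Setting `DRY_RUN=1` prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: helper name, output directory and thread count are assumptions.
# Inputs are the paired outputs of the trimming step.

# The function body runs in a subshell, so its variables stay local.
assemble() (
  fwd=$1
  rev=$2
  outdir=${3:-rnaviral_assembly}
  threads=${THREADS:-8}

  # Build the command as positional parameters so arguments stay quoted.
  set -- spades.py --rnaviral -1 "$fwd" -2 "$rev" -t "$threads" -o "$outdir"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"   # print the command for review
  else
    "$@"        # run the assembly
  fi
)
```

For example, `assemble output_forward_paired.fastq output_reverse_paired.fastq` would assemble the paired reads produced by the trimming step.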
Steps:
- Run Trimmomatic (available on the HPC). For example:
java -jar /programs/trimmomatic/trimmomatic-0.39.jar PE -phred33 input_forward.fastq input_reverse.fastq output_forward_paired.fastq output_forward_unpaired.fastq output_reverse_paired.fastq output_reverse_unpaired.fastq ILLUMINACLIP:/programs/trimmomatic/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:35
- This command does the following:
- Removes adapters (ILLUMINACLIP:/programs/trimmomatic/adapters/TruSeq3-PE-2.fa:2:30:10)
- ILLUMINACLIP:<fastaWithAdaptersEtc>:<seedMismatches>:<palindromeClipThreshold>:<simpleClipThreshold>
- fastaWithAdaptersEtc: specifies the path to a fasta file containing all the adapters, PCR sequences etc. The naming of the various sequences within this file determines how they are used.
- seedMismatches: specifies the maximum mismatch count which will still allow a full match to be performed
- palindromeClipThreshold: specifies how accurate the match between the two 'adapter ligated' reads must be for PE palindrome read alignment.
- simpleClipThreshold: specifies how accurate the match between any adapter etc. sequence must be against a read.
- Note: the path to the adapter file on the HPC is /programs/trimmomatic/adapters/TruSeq3-PE-2.fa
- Removes leading low quality or N bases (below quality 3) (LEADING:3)
- Removes trailing low quality or N bases (below quality 3) (TRAILING:3)
- Scans the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15 (SLIDINGWINDOW:4:15)
- Drops reads shorter than 35 bases (MINLEN:35)
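To apply the same trimming settings to several samples, the command above can be wrapped in a small shell function. This is a minimal sketch, not a tested part of the pipeline. The `trim_sample` helper, the `<sample>_R1.fastq` / `<sample>_R2.fastq` naming scheme, and the `trimmed` output directory are assumptions for illustration. The jar path, adapter path, and trimming parameters are the ones from the example command. Setting `DRY_RUN=1` prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: sample naming and output layout are assumptions.

TRIMMOMATIC_JAR=/programs/trimmomatic/trimmomatic-0.39.jar
ADAPTERS=/programs/trimmomatic/adapters/TruSeq3-PE-2.fa

# ILLUMINACLIP settings, named after the parameters described above
SEED_MISMATCHES=2
PALINDROME_CLIP_THRESHOLD=30
SIMPLE_CLIP_THRESHOLD=10

# The function body runs in a subshell, so its variables stay local.
trim_sample() (
  sample=$1
  outdir=${2:-trimmed}

  # Build the command as positional parameters so arguments stay quoted.
  set -- java -jar "$TRIMMOMATIC_JAR" PE -phred33 \
    "${sample}_R1.fastq" "${sample}_R2.fastq" \
    "${outdir}/${sample}_R1_paired.fastq" "${outdir}/${sample}_R1_unpaired.fastq" \
    "${outdir}/${sample}_R2_paired.fastq" "${outdir}/${sample}_R2_unpaired.fastq" \
    "ILLUMINACLIP:${ADAPTERS}:${SEED_MISMATCHES}:${PALINDROME_CLIP_THRESHOLD}:${SIMPLE_CLIP_THRESHOLD}" \
    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:35

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"                        # print the command for review
  else
    mkdir -p "$outdir" && "$@"       # create the output directory, then trim
  fi
)
```

A loop such as `for s in sampleA sampleB; do trim_sample "$s"; done` would then trim each sample in turn (the sample names here are placeholders).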