| Skill Category | Why it’s useful | References |
|---|---|---|
| Command line | Many programs are run via the command line | - Basics and cheat sheet<br>- Useful 1 hr lecture aimed at MIT CS students (but fairly entry level)<br>- Game to practice |
| Text editors | Text editors are useful for writing bash scripts that run software in loops or chain tools into pipelines | - Sublime (Kristina’s choice for macOS)<br>- Emacs (Kristina’s choice for HPC); Emacs cheat sheet |
| Version control | Supports reproducibility and keeps track of your work as you go | - GitHub intro tutorials |
| Accessing HPC | A more powerful computer, which some sequence analysis tasks require | - VPN, scp, screen<br>- Transferring files<br>- Software available (see below for more information) |
| File formats | Several types of standard file formats are used to store DNA, RNA, amino acid, and variant data | Examples |
| Multiple sequence alignment | Sequence analysis is built on comparing a sequence to other sequences; sequence alignment algorithms make that comparison possible | |
| Phylogenetic trees | Understand relationships between sequences | |
| Conda | | |
| Geneious | | |
These tutorials cover topics like creating directories and copying/moving files on the Linux command line.
DataCamp requires a paid subscription, but I was able to do the first chapter for free, and it has a built-in terminal on the website, so you don’t have to download anything. I preferred the second tutorial, though: the whole thing is free, so it ends up covering more topics!
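As a rough taste of what these tutorials cover, here is a minimal sketch of creating directories and copying/moving files. It is not taken from either tutorial, and every file and directory name in it (`my_project`, `example.fasta`, `renamed.fasta`) is made up for illustration. It works inside a temporary directory so it should not touch any of your real files.

```shell
# Work inside a throwaway directory so nothing real is touched.
# (All file and directory names below are hypothetical examples.)
workdir=$(mktemp -d)
cd "$workdir"

# Create a project directory with two subdirectories in one step.
# -p creates parent directories as needed and does not fail if they exist.
mkdir -p my_project/data my_project/results

# Make a tiny FASTA-style file to practice on.
printf '>seq1\nACGT\n' > example.fasta

# Copy the file into the data directory (the original stays where it is).
cp example.fasta my_project/data/

# Move the original into the results directory, renaming it on the way.
mv example.fasta my_project/results/renamed.fasta

# List everything recursively to check the outcome.
ls -R my_project
```

After running this, `my_project/data/` should hold the copy and `my_project/results/` the moved file. `pwd` prints where you are, and `man cp` (or `cp --help` on most Linux systems) shows the available options.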
–OR–
Cornell BIOHPC Workshops:
Useful beginning workshops: