Computational background

Skill Category | Why it’s useful | References
Command line | Many programs are run via the command line | Basics and cheat sheet; Useful 1 hr lecture aimed at MIT CS students (but fairly entry level); Game to practice
Text editors | Useful for writing bash scripts that run software in loops or chain tools into pipelines | Sublime (Kristina’s choice for macOS); Emacs (Kristina’s choice for HPC); Emacs cheat sheet
Version control | Supports reproducibility and keeps track of your work as you go | GitHub intro tutorials
Accessing HPC | The HPC is a more powerful computer, necessary for some sequence analysis tasks | VPN, scp, screen; Transferring files; Software available; see below for more information
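
The text editors entry above mentions bash scripts that run software in loops. A minimal sketch of that idea follows. The directory layout and file names are hypothetical, and `wc -l` is only a stand-in for whatever analysis program you would actually run on each file.

```shell
#!/usr/bin/env bash
# Minimal sketch: run the same command on every file in a directory.
# The paths, file names, and the use of `wc -l` in place of a real
# analysis tool are assumptions made for illustration.
set -euo pipefail

workdir="$(mktemp -d)"            # scratch directory so nothing real is touched
mkdir -p "$workdir/reads" "$workdir/results"

# Create two small placeholder input files.
printf 'line1\nline2\n' > "$workdir/reads/sample_A.txt"
printf 'line1\nline2\nline3\n' > "$workdir/reads/sample_B.txt"

# Loop over every input file and write one output file per sample.
for infile in "$workdir"/reads/*.txt; do
    sample="$(basename "$infile" .txt)"      # strip directory and extension
    wc -l < "$infile" > "$workdir/results/${sample}.count"
done

ls "$workdir/results"
```

Saving a loop like this in a script file, rather than typing it each time, is what makes the run repeatable and easy to put under version control.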

Bioinformatics basics

Skill Category | Why it’s useful | References
File formats | Several standard file formats are used to store DNA, RNA, amino acid, and variant data | Examples
Multiple sequence alignment | The foundation of sequence analysis is comparing a sequence to other sequences; sequence alignment algorithms allow us to do that |
Phylogenetic trees | Understand relationships between sequences |
Conda | |
Geneious | |
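
As one illustration of the file formats entry above, FASTA stores each sequence as a header line beginning with `>` followed by one or more lines of sequence. The sketch below writes a tiny FASTA file and inspects it from the command line. The record names and sequences are made up for the example.

```shell
#!/usr/bin/env bash
# Minimal sketch: create a small FASTA file and inspect it.
# The records below are hypothetical and not real sequence data.
set -euo pipefail

fasta="$(mktemp)"
cat > "$fasta" <<'EOF'
>seq1 hypothetical example record
ATGCGTACGTTAG
>seq2 hypothetical example record
ATGCCTACGATAG
GGCTA
EOF

# Each record has exactly one header line, so counting lines that
# start with '>' gives the number of sequences in the file.
n_records="$(grep -c '^>' "$fasta")"

# The identifier is the first word of the header, without the '>'.
ids="$(grep '^>' "$fasta" | cut -d ' ' -f 1 | tr -d '>')"

echo "records: $n_records"
echo "$ids"
```

Note that a sequence may wrap over several lines (as in the second record), so counting lines is not the same as counting sequences.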

Microbial genomics

Tutorials about Command Line Basics:

These tutorials cover topics like creating directories and copying/moving files on the Linux command line.
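
For a sense of what those topics look like in practice, here is a minimal sketch using the standard `mkdir`, `cp`, and `mv` commands. The directory and file names are hypothetical, and everything happens inside a temporary directory so no existing files are affected.

```shell
#!/usr/bin/env bash
# Minimal sketch of creating directories and copying/moving files.
# All directory and file names here are made up for illustration.
set -euo pipefail

cd "$(mktemp -d)"                      # work in a fresh scratch directory

# Create a project folder with two subfolders (-p also creates parents).
mkdir -p project/data project/scripts

# Make a small text file to work with.
echo "example text" > notes.txt

# Copy: the original stays where it is, and a second file is created.
cp notes.txt project/data/notes_copy.txt

# Move: the file is relocated, so it no longer exists at the old path.
mv notes.txt project/notes.txt

# List the resulting layout, including subdirectories.
ls -R project
```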

Datacamp requires a paid subscription, but I was able to do the first chapter for free, and it has a built-in terminal on the website, so you don’t have to download anything. I preferred the second tutorial, though: the whole thing is free, so it ends up covering more topics!

–OR–

Cornell BIOHPC Workshops:

Workshops

Useful beginning workshops: