Below is a protocol for running the Dorado software to basecall nanopore data.
Before running this protocol, keep a few key themes in mind:
- Moving large volumes of data between computers (local machine to server, server to server, etc.): it's best to be familiar with tools like rsync. Data moves both before the run (copying inputs to the GPU server) and after the run (copying results from the GPU server back to a storage server or your local machine).
- Terminal multiplexing: tools like tmux are very useful because long-running steps keep going after you disconnect, and you can reattach to keep an eye on each step of the process.
- Time management: keep a close eye on the clock. You are charged by the hour for the GPU reservation whether or not you are actually running computations, so stay vigilant.
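As a minimal sketch of the tmux workflow, the Python below builds (but does not run) the commands to start a basecalling job in a named, detached tmux session and to reattach to it later. The paths are hypothetical placeholders, and the Dorado invocation (`dorado basecaller hac <pod5_dir> > calls.bam`) is an assumed example; check it against `dorado basecaller --help` on your server before use.

```python
import shlex
from typing import List


def tmux_new_session_cmd(session: str, command: str) -> List[str]:
    """Build the argv that starts `command` inside a new, detached tmux session."""
    # -d: create the session without attaching to it
    # -s: name the session so it is easy to find again later
    # The final argument is handed to the shell, so redirections like `>` work.
    return ["tmux", "new-session", "-d", "-s", session, command]


def tmux_attach_cmd(session: str) -> List[str]:
    """Build the argv that reattaches your terminal to a named session."""
    return ["tmux", "attach-session", "-t", session]


if __name__ == "__main__":
    # Hypothetical paths; the dorado arguments are an assumed example,
    # so verify them with `dorado basecaller --help` on the GPU server.
    pod5_dir = "/workdir/myuser/run01/pod5"
    out_bam = "/workdir/myuser/run01/calls.bam"
    basecall = f"dorado basecaller hac {shlex.quote(pod5_dir)} > {shlex.quote(out_bam)}"

    # Print the commands instead of running them, so nothing starts by accident.
    print(shlex.join(tmux_new_session_cmd("dorado", basecall)))
    print(shlex.join(tmux_attach_cmd("dorado")))
```

Once attached, the default tmux key binding Ctrl-b followed by d detaches again while the job keeps running, and `tmux ls` lists the sessions that are still alive. Note that the session closes when the command exits, which is fine here because the output goes to a file.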
Step 1: Log in and reserve a GPU server
Go to: https://biohpc.cornell.edu/Default.aspx
- Log in
- Under User:{user}
- Click “Reservations”
- Scroll down to “GPU gen2 servers”
- In the “New reservation” section:
- Set the date and time to your specifications. I recommend reserving at least 24 hours at a time; just keep in mind that you will then have to CANCEL your reservation once your data has finished running.
- Make sure the correct BioHPC credit account is selected before you reserve.
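Because the reservation is billed per hour whether or not the GPU is busy, it can help to work out when a block ends and what cancelling early saves. The Python below is a rough sketch under stated assumptions: billing runs from the reservation start until you cancel or the block ends, and `HYPOTHETICAL_RATE` is a placeholder, not an actual BioHPC rate; real rates and rounding rules may differ.

```python
from datetime import datetime, timedelta
from typing import Optional


def reservation_end(start: datetime, hours: float) -> datetime:
    """Return when a reservation of `hours` that begins at `start` runs out."""
    return start + timedelta(hours=hours)


def billed_hours(start: datetime, end: datetime, cancelled_at: Optional[datetime] = None) -> float:
    """Hours charged, assuming billing runs from `start` until you cancel or the reservation ends."""
    stop = end if cancelled_at is None else min(cancelled_at, end)
    # Idle hours count too: the clock runs whether or not the GPU is busy.
    return max(0.0, (stop - start).total_seconds() / 3600)


def estimated_charge(hours: float, rate_per_hour: float) -> float:
    """Simple hours x rate estimate; actual BioHPC pricing may differ."""
    return hours * rate_per_hour


if __name__ == "__main__":
    HYPOTHETICAL_RATE = 2.50  # placeholder $/hour, not an actual BioHPC rate
    start = datetime(2024, 5, 1, 9, 0)
    end = reservation_end(start, 24)  # the recommended 24-hour block
    finished = datetime(2024, 5, 1, 19, 0)  # basecalling done; cancel now

    print("reservation ends:", end)
    print("if never cancelled:", estimated_charge(billed_hours(start, end), HYPOTHETICAL_RATE))
    print("if cancelled at finish:", estimated_charge(billed_hours(start, end, finished), HYPOTHETICAL_RATE))
```

In this made-up example, cancelling as soon as the run finishes means paying for 10 hours instead of all 24.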
Step 2: Move data and scripts to reserved GPU
- Do this however you prefer. I prefer to use rsync and stage all the data in one place first, then move everything to the GPU server at once.
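A minimal sketch of the staging approach, assuming rsync over SSH: the Python below builds (but does not run) rsync commands for staging the data, pushing it to the GPU server, and later pulling results back. Every host name and path here is a hypothetical placeholder.

```python
import shlex
from typing import List


def rsync_cmd(src: str, dest: str, dry_run: bool = False) -> List[str]:
    """Build an rsync argv that copies the contents of directory `src` into `dest`."""
    # A trailing slash on the source makes rsync copy the directory's contents
    # rather than creating an extra nested directory inside `dest`.
    if not src.endswith("/"):
        src += "/"
    # -a: archive mode (recursive, keeps timestamps and permissions)
    # -v -h: verbose output with human-readable sizes
    # --partial: keep partly transferred files so an interrupted copy can resume
    # --progress: show per-file progress for large POD5/FAST5 files
    cmd = ["rsync", "-avh", "--partial", "--progress"]
    if dry_run:
        cmd.append("--dry-run")  # list what would be copied without copying it
    return cmd + [src, dest]


if __name__ == "__main__":
    # All hosts and paths below are hypothetical placeholders.
    stage = rsync_cmd("/data/run01/pod5", "myuser@storage-host:/home/myuser/staging/run01/")
    to_gpu = rsync_cmd("/home/myuser/staging/run01", "myuser@gpu-host:/workdir/myuser/run01/")
    results_back = rsync_cmd("/workdir/myuser/run01/results", "myuser@storage-host:/home/myuser/results/run01/")
    for cmd in (stage, to_gpu, results_back):
        print(shlex.join(cmd))
```

The second command would be run from the staging machine and the third from the GPU server. Running each one first with `dry_run=True` is a cheap way to confirm the source and destination before moving large amounts of data while the reservation clock is running.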