THE CLUSTER
Non-computationally-intensive analyses can be run on your personal desktop or laptop computer. But computationally intensive analyses (read: fMRI data) need to be run on the Discovery Cluster. (Not sure what a cluster is? Look here.)
What this page covers:
Cluster resources
Accessing the cluster
Logging into the cluster
Navigating to your folder on the lab share
Changing folder permissions
Running analyses on the cluster
Using modules
Bonus: Cluster hacks
CLUSTER RESOURCES
For cluster questions, John Hudson is our go-to person. He and other members of the Cluster team can help you debug, optimize, and parallelize your code.
This tutorial explains in more depth how Discovery works and how to connect to it. You can find other handy tutorials on that website as well.
Research computing also sometimes holds intro classes on using the Cluster; look out for them on their calendar.
ACCESSING THE CLUSTER
We have a lab share on Dartmouth's Discovery Supercomputing Cluster, with subfolders for each individual lab member's data and analyses. You can request an account to access the cluster here. Once you have your account, email John Hudson and ask him to grant you permission to access the lab share and submit jobs on Discovery.
Logging Into the Cluster
To log into the cluster, you'll need to use the command line. Never used the command line before? Check out this tutorial.
Once you have a terminal window open, log in using the following line of code:
ssh -Y yourDartmouthID@discovery7.hpcc.dartmouth.edu
To exit the cluster, simply enter:
exit
Hint: if you have trouble connecting to the cluster, check that you are connected to the eduroam network (not Dartmouth Public).
If you want to connect to the cluster off campus, you can connect to Dartmouth’s computing network via VPN. You will need to install the VPN client for Mac or Windows, and then follow these directions to establish your connection.
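If you log in often, the ssh command above can be shortened with a host alias in your SSH configuration. This is a minimal sketch, not part of the lab's official setup: the alias name discovery is made up for illustration, yourDartmouthID is a placeholder, and ForwardX11 / ForwardX11Trusted are the OpenSSH config options that correspond to the -Y flag.

```shell
# Sketch: add a host alias so that "ssh discovery" behaves like the full login command.
# Assumptions: the alias name "discovery" is hypothetical; replace yourDartmouthID.
SSH_CONFIG="${SSH_CONFIG:-$HOME/.ssh/config}"
mkdir -p "$(dirname "$SSH_CONFIG")"

# Append the entry only if it is not already there.
if ! grep -qs '^Host discovery$' "$SSH_CONFIG"; then
cat >> "$SSH_CONFIG" <<'EOF'
Host discovery
    HostName discovery7.hpcc.dartmouth.edu
    User yourDartmouthID
    ForwardX11 yes
    ForwardX11Trusted yes
EOF
fi
```

After this, typing ssh discovery should be equivalent to the longer login command.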
Navigating to Your Folder on the Lab Share
Via the command line
Once you've logged in, you can navigate to the lab share using the following command:
cd /dartfs-hpc/rc/lab/M/MeyerM/
Each lab member runs analyses in their personal folder on the lab share. If you don't already have one, you can create one:
mkdir yourlastname
Via Mac's Finder app
Open Finder > Go > Connect to Server
In the Server Address field, type: smb://dartfs-hpc/rc/lab/M/MeyerM/
Click the plus sign to save it to your favorite servers
Click Connect
Select "Registered User" and type your Dartmouth ID and password
IMPORTANT: Never, EVER edit or delete another lab member's folder (obviously).
Changing Folder Permissions
Need someone else to be able to run and/or edit your scripts? By default, files you create on Discovery are yours and yours only. You can give other users permission to modify them by using Discovery’s modifyacl script:
Add the script’s directory to your path by adding this line to your ~/.bash_profile:
PATH=$PATH:/dartfs-hpc/admin/local/bin
export PATH
Enter the following command:
modifyacl -u their_dartmouth_ID readwrite name_of_folder
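Because ~/.bash_profile can be re-read several times (for example, each time you run source on it), a slightly more defensive variant of the PATH step adds the directory only when it is missing. This is a sketch, not Discovery's documented method; the variable name ACL_DIR is made up for illustration.

```shell
# Directory that holds modifyacl, taken from the step above.
ACL_DIR="/dartfs-hpc/admin/local/bin"

# Append ACL_DIR to PATH only if it is not already listed.
case ":$PATH:" in
  *":$ACL_DIR:"*) ;;                 # already present, nothing to do
  *) PATH="$PATH:$ACL_DIR" ;;
esac
export PATH   # export takes the variable name, without the $
```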
Running Analyses on the Cluster
Running analyses on the Cluster is just like running analyses on your personal computer: you'll execute a script (e.g., in Python or FSL) that reads in your data, analyzes it, and outputs results. The main difference is that on a supercomputer, you have a lot more options for exactly where and how you execute your analysis script. To specify these options, you'll submit a "job" via a master script called a "pbs script" (because its filename ends in ".pbs").
The pbs script tells Discovery: "Run this analysis script, in this folder, with these specifications." You can find a basic template here. Some notes on the pbs script:
"walltime" is the maximum time your job is allowed to run before it terminates. For more complex analyses and more subjects, you'll need longer walltimes.
Where it says "cd $PBS_O_WORKDIR", replace $PBS_O_WORKDIR with the path to whatever directory you want your analyses to run in. (If you leave it as is, PBS fills in the directory you submitted the job from.)
Where it says "./program_name arg1 arg2 ...", replace ./program_name with the path to & name of your analysis script, and replace arg1 arg2 ... with any input arguments your script needs.
The pbs script outputs a .o file for the job output, and a .e file that logs errors. These are not your analysis results, but a record of the job itself!
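To make these notes concrete, here is a hedged sketch of what a small pbs script might look like, written out with a shell heredoc so it can be created from the command line. It is not the lab's template: the job name, the resource request, the walltime, the folder yourlastname/myproject, and the script analysis.py with its argument are all made-up placeholders, and the #PBS lines are standard Torque/PBS directives that are assumed to match what Discovery accepts.

```shell
# Write an example pbs script to myscript.pbs (all values below are placeholders).
cat > myscript.pbs <<'EOF'
#!/bin/bash
# Job name (hypothetical)
#PBS -N example_job
# Request one core on one node
#PBS -l nodes=1:ppn=1
# Maximum run time, as hours:minutes:seconds
#PBS -l walltime=01:00:00

# Directory the analysis should run in (replaces "cd $PBS_O_WORKDIR")
cd /dartfs-hpc/rc/lab/M/MeyerM/yourlastname/myproject

# Analysis script and its input arguments (replaces "./program_name arg1 arg2 ...")
python analysis.py sub-01
EOF
```

Lines that start with #PBS are read by the scheduler; to the shell itself they are ordinary comments.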
Once you have your pbs script ready to go, make sure you're in the directory where it's located. Then submit your job using the following command (replace myscript with whatever you named your pbs script):
mksub myscript.pbs
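Since the job should be submitted from the directory that holds the pbs script, the two steps can be wrapped in a small shell function. This is a sketch only: submit_pbs is a made-up helper name, not a Discovery command, and it assumes mksub is on your PATH once you are logged in.

```shell
# submit_pbs: hypothetical helper that changes into the script's folder, then calls mksub.
submit_pbs() {
  script="$1"
  if [ ! -f "$script" ]; then
    echo "No such pbs script: $script" >&2
    return 1
  fi
  if ! command -v mksub >/dev/null 2>&1; then
    echo "mksub not found; are you logged into Discovery?" >&2
    return 2
  fi
  # Run in a subshell so your current directory is left unchanged.
  ( cd "$(dirname "$script")" && mksub "$(basename "$script")" )
}
```

With this in place, submit_pbs path/to/myscript.pbs would submit the job from any directory.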
Some useful commands to check up on your job as it runs:
myjobs—shows you what jobs you're currently running
checkjob ####—shows you what's going on if your job is failing (replace #### with your job number [can find using myjobs])
qr—query the resources you're using
qshow—gives you an overview of who's using the cluster and how much space they're using
Using Modules
On your home computer, you couldn't run a MATLAB script without opening MATLAB. Similarly, to run a script on the cluster, you'll need to open the corresponding program first. You do this through a module. Discovery already has modules for most common programs, but if you need one that isn't on there, you can get it added. To open a module (e.g., MATLAB):
Make sure you included -Y in your login command. This turns on X11 forwarding (handled by XQuartz on a Mac), which allows GUIs to be opened.
To explore what modules are available, enter the following command:
module avail
To load a module, enter the following command (replace matlab with your desired program):
module load matlab
You won't see a MATLAB GUI window open up as you would on your computer, but you can now run MATLAB scripts. You can unload the module with the same command, using "unload" instead of "load."
To open a MATLAB GUI, simply enter:
matlab
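Jobs submitted through a pbs script have no screen on which to show a GUI, so MATLAB scripts there are usually run non-interactively. Below is a sketch of one way to do that. run_matlab_script is a made-up helper name and my_analysis.m a placeholder, while -nodisplay, -nosplash, and -r are standard MATLAB startup options.

```shell
# run_matlab_script: hypothetical helper that runs one MATLAB script without a GUI.
run_matlab_script() {
  script="$1"
  if ! command -v matlab >/dev/null 2>&1; then
    echo "matlab not found; run 'module load matlab' first" >&2
    return 1
  fi
  # run(...) executes the script. On error, print the report and quit with a
  # non-zero status; otherwise exit normally so the job can finish.
  matlab -nodisplay -nosplash \
    -r "try, run('$script'), catch e, disp(getReport(e)), exit(1), end, exit"
}
```

For example, after module load matlab, you could call run_matlab_script my_analysis.m from the command line or from inside a pbs script.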
Bonus: Cluster Hacks
There are some cool ways you can customize your Cluster account to get around fast!
Automatically load modules whenever you log in
Simply type:
module initadd modulename
Navigate to folders with one command
Open your bash profile:
vim ~/.bash_profile
Enter editing mode by pressing "i"
At the bottom, under the line that says "User specific environment and startup programs," create whatever shortcut you want using the alias command. The following example navigates to the project folder for my Orange is the New Black study whenever I type OINB:
alias OINB="cd /dartfs-hpc/rc/lab/M/MeyerM/Collier/OINB"
Save and exit by pressing [esc], then ":wq"
Reload your bash profile so the edits take effect in your current session:
source ~/.bash_profile
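An alias always expands to the same fixed text, so you need one per folder. If you would rather have a single command that takes the folder as an argument, a shell function in ~/.bash_profile is one option. This is a sketch: labcd and LAB_SHARE are made-up names, not existing Discovery commands.

```shell
# Base of the lab share; can be overridden by setting LAB_SHARE beforehand.
LAB_SHARE="${LAB_SHARE:-/dartfs-hpc/rc/lab/M/MeyerM}"

# labcd: hypothetical helper that changes to a folder given relative to the lab share.
labcd() {
  cd "$LAB_SHARE/$1" || return 1
}
```

With this defined, labcd Collier/OINB would do the same thing as the OINB alias above, and labcd with any other subfolder path works without adding a new alias.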
Open programs with customized settings
As in the above example, edit your .bash_profile
Create an alias to open your desired program with customized settings
The following example launches SPM12 via MATLAB without opening the MATLAB desktop GUI, saving time and desktop clutter. Once the alias is defined, you can simply type spm12 from the command line whenever you want to open SPM12.
alias spm12="matlab -nosplash -nodesktop -softwareopengl -r \"addpath('/dartfs-hpc/rc/lab/M/MeyerM/spm12'); spm\""
To open SPM8 instead, add another version of this alias, replacing 12 with 8:
alias spm8="matlab -nosplash -nodesktop -softwareopengl -r \"addpath('/dartfs-hpc/rc/lab/M/MeyerM/spm8'); spm\""