THE CLUSTER

Analyses that are not computationally intensive can be run on your personal desktop or laptop computer. But for computationally intensive analyses (read: fMRI data), you'll need to run them on the Discovery Cluster. (Not sure what a cluster is? Look here.)

What this page covers:

  • Cluster resources

  • Accessing the cluster

  • Logging into the cluster

  • Navigating to your folder on the lab share

  • Changing folder permissions

  • Running analyses on the cluster

  • Using modules

  • Bonus: Cluster hacks

CLUSTER RESOURCES

  • For cluster questions, John Hudson is our go-to person. He and other members of the Cluster team can help you debug, optimize, and parallelize your code.

  • This tutorial explains in more depth how Discovery works and how to connect to it. You can find other handy tutorials on that website as well.

  • Research computing also sometimes holds intro classes on using the Cluster; look out for them on their calendar.

ACCESSING THE CLUSTER

We have a lab share on Dartmouth's Discovery Supercomputing Cluster, with subfolders for each individual lab member's data and analyses. You can request an account to access the cluster here. Once you have your account, you'll need to email John Hudson and ask him to grant you permission to access the lab share and submit jobs on Discovery.

Logging Into the Cluster

To log into the cluster, you'll need to use the command line. Never used the command line before? Check out this tutorial.

Once you have a terminal window open, log in using the following line of code:

   ssh -Y yourDartmouthID@discovery7.hpcc.dartmouth.edu

To exit the cluster, simply enter:

   exit

Hint: if you have trouble connecting to the cluster, check that you are connected to the eduroam network (not Dartmouth Public).

If you want to connect to the cluster off campus, you can connect to Dartmouth’s computing network via VPN. You will need to install the VPN client for Mac or Windows, and then follow these directions to establish your connection.

Navigating to Your Folder on the Lab Share

  1. Via the command line

    • Once you've logged in, you can navigate to the lab share using the following command:

      cd /dartfs-hpc/rc/lab/M/MeyerM/
    • Each lab member runs analyses in their personal folder on the lab share. If you don't already have one, you can create one:

      mkdir yourlastname

  2. Via Mac's Finder app

    • Open Finder > Go > Connect to Server

    • In the Server Address field, type: smb://dartfs-hpc/rc/lab/M/MeyerM/

    • Click the plus sign to save it to your favorite servers

    • Click Connect

    • Select "Registered User" and type your Dartmouth ID and password

IMPORTANT: Never, EVER edit or delete another lab member's folder (obviously).

Changing Folder Permissions

Need someone else to be able to run and/or edit your scripts? By default, files you create on Discovery are yours and yours only. You can give other users permission to modify them by using Discovery’s modifyacl script:

  1. Add the script’s directory to your path by adding this line to your ~/.bash_profile:

    PATH=$PATH:/dartfs-hpc/admin/local/bin

    export PATH

  2. Enter the following command:

    modifyacl -u users_dartmouth_ID readwrite name_of_folder
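
If you re-run source ~/.bash_profile in the same session, a plain PATH=$PATH:... line appends the same directory again each time. The sketch below shows one way to guard against that. Note that add_to_path is a hypothetical helper name introduced here, not something Discovery provides.

```shell
# add_to_path: hypothetical helper (an assumption, not a Discovery tool).
# Appends a directory to PATH only if it is not already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, do nothing
        *) PATH="$PATH:$1" ;;       # not present, append it
    esac
}

# Directory that holds the modifyacl script (path taken from step 1 above)
add_to_path /dartfs-hpc/admin/local/bin
export PATH
```

With this in place of the two lines from step 1, sourcing your profile repeatedly leaves PATH unchanged after the first time.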

Running Analyses on the Cluster

Running analyses on the Cluster is just like running analyses on your personal computer: you'll execute a script (e.g. Python, FSL, etc.) that reads in your data, analyzes it, and outputs results. The main difference is that on a supercomputer, you have a lot more options for exactly where and how you execute your analysis script. To specify these options, you'll submit a "job" via a master script called a "pbs script" (because its filename ends in ".pbs").

The pbs script tells Discovery: "Run this analysis script, in this folder, with these specifications." You can find a basic template here. Some notes on the pbs script:

  • "walltime" is the maximum time your job is allowed to run before it terminates. For more complex analyses and more subjects, you'll need longer walltimes.

  • Where it says "cd $PBS_O_WORKDIR", replace $PBS_O_WORKDIR with the path to whatever directory you want your analyses to run in.

  • Where it says "./program_name arg1 arg2 ...", replace ./program_name with the path to & name of your analysis script, and replace arg1 arg2 ... with any input arguments your script needs.

  • The pbs script outputs a .o file for the job output, and a .e file that logs errors. These are not your analysis results, but a record of the job itself!
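
To make the notes above concrete, here is a minimal sketch of a shell function that writes out a small pbs script. It is not the lab's template: write_pbs is a hypothetical helper, and the directives it emits (#PBS -N, #PBS -l nodes=1:ppn=1, #PBS -l walltime=...) are standard PBS/Torque options that are assumed, not confirmed, to match what the template uses. Check the template for anything else Discovery requires.

```shell
# write_pbs: hypothetical helper (not part of Discovery) that writes a
# minimal PBS job script.
# Usage: write_pbs OUTFILE JOBNAME WALLTIME WORKDIR COMMAND [ARGS...]
write_pbs() {
    outfile="$1"; jobname="$2"; walltime="$3"; workdir="$4"
    shift 4
    # Everything up to the closing EOF is written to OUTFILE, with the
    # variables filled in.
    cat > "$outfile" <<EOF
#!/bin/bash -l
# Job name
#PBS -N $jobname
# One core on one node (an assumption; adjust to your analysis)
#PBS -l nodes=1:ppn=1
# Maximum run time, HH:MM:SS
#PBS -l walltime=$walltime
# Run from the analysis directory
cd "$workdir"
# The analysis command and its arguments
$*
EOF
}

# Example call (paths and script names are placeholders):
# write_pbs myscript.pbs myjob 02:00:00 /path/to/analysis ./program_name arg1 arg2
```

The generated file is an ordinary text file, so you can open it and adjust any line by hand before submitting it.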

Once you have your pbs script ready to go, make sure you're in the directory where it's located. Then submit your job using the following command (replace myscript with whatever you named your pbs script):

   mksub myscript.pbs
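
If you have one pbs script per subject, you can submit them in a loop instead of typing mksub for each one. This is a minimal sketch, assuming your pbs scripts are already written. Both submit_all and the SUBMIT variable are hypothetical names introduced here. Setting SUBMIT=echo gives a dry run that only prints the script names.

```shell
# submit_all: hypothetical helper that submits each pbs script it is given.
# It uses mksub by default; set SUBMIT=echo for a dry run.
submit_all() {
    submit="${SUBMIT:-mksub}"
    for script in "$@"; do
        # Stop at the first script that fails to submit
        "$submit" "$script" || {
            echo "Could not submit $script" >&2
            return 1
        }
    done
}

# Example calls (file names are placeholders):
# SUBMIT=echo submit_all sub01.pbs sub02.pbs   # dry run, prints the names
# submit_all sub01.pbs sub02.pbs               # actually submits
```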

Some useful commands to check up on your job as it runs:

  • myjobs: shows you what jobs you're currently running

  • checkjob ####: shows you what's going on if your job is failing (replace #### with your job number, which you can find using myjobs)

  • qr: queries the resources you're using

  • qshow: gives you an overview of who's using the cluster and how much space they're using

Using Modules

On your home computer, you couldn't run a MATLAB script without opening MATLAB. Similarly, to run a script on the cluster, you'll need to open the corresponding program first. You do this through a module. Discovery already has modules for most common programs, but if you need one that isn't on there, you can get it added. To open a module (e.g., MATLAB):

  • Make sure you included -Y in your login command. This enables X11 forwarding (through XQuartz on a Mac), which allows GUIs to open on your screen.

  • To explore what modules are available, enter the following command:

      module avail
  • To load a module, enter the following command (replace matlab with your desired program):

      module load matlab

You won't see a MATLAB GUI window open up as you would on your computer, but you can now run MATLAB scripts. You can unload the module with the same command, using "unload" instead of "load".

  • To open a MATLAB GUI, simply enter:

      matlab
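
If a script fails with "command not found", the module for that program probably isn't loaded in the current session (module list shows what is loaded). The sketch below is one way to check before running an analysis. check_program is a hypothetical helper name, and the hint it prints assumes the module has the same name as the program.

```shell
# check_program: hypothetical helper that reports whether a program is
# available in the current session.
check_program() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        # Assumes the module is named after the program, which may not hold
        echo "$1 not found; try: module load $1" >&2
        return 1
    fi
}

# Example call:
# check_program matlab
```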

Bonus: Cluster Hacks

There are some cool ways you can customize your Cluster account to get around fast!

  1. Automatically load modules whenever you log in

    • Simply type:

      module initadd modulename
  2. Navigate to folders with one command

    • Open your bash profile:

      vim ~/.bash_profile
    • Enter editing mode by pressing "i"

    • At the bottom, under the line that says "User specific environment and startup programs," create whatever shortcut you want using the alias command. The following example navigates to the project folder for my Orange is the New Black study whenever I type OINB:

      alias OINB="cd /dartfs-hpc/rc/lab/M/MeyerM/Collier/OINB"

    • Save and exit by pressing [esc], then ":wq"

    • Apply your edits to your current session:

      source ~/.bash_profile
  3. Open programs with customized settings

    • As in the above example, edit your .bash_profile

    • Create an alias to open your desired program with customized settings

    • The following example launches SPM12 via MATLAB without opening the MATLAB GUI, saving time and desktop clutter. You can now simply type spm12 from the command line whenever you want to open spm12.

      alias spm12="matlab -nosplash -nodesktop -softwareopengl -r \"addpath('/dartfs-hpc/rc/lab/M/MeyerM/spm12'); spm\""

    • To open spm8 instead, add another version of this alias, but replace 12 with 8:

      alias spm8="matlab -nosplash -nodesktop -softwareopengl -r \"addpath('/dartfs-hpc/rc/lab/M/MeyerM/spm8'); spm\""
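
As an alternative to writing one alias per project folder (hack 2), a single shell function can jump to any folder under the lab share. This is a sketch only: lab and LAB_SHARE are hypothetical names introduced here, and the default path is the lab share location given earlier on this page.

```shell
# Root of the lab share; override LAB_SHARE if yours lives elsewhere
LAB_SHARE="${LAB_SHARE:-/dartfs-hpc/rc/lab/M/MeyerM}"

# lab: hypothetical helper that changes into a folder under the lab share.
# Usage: lab Collier/OINB
lab() {
    cd "$LAB_SHARE/$1" || return 1
}
```

Add it to your ~/.bash_profile the same way as the aliases above, then run source ~/.bash_profile to pick it up.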