What are compute servers?

We provide two different kinds of compute servers for running compute-intensive research applications.

The first kind is general usage compute servers; the second is distributed slurm partitions.

Our general usage compute servers can be used by multiple people to run long-term jobs. This does mean that the resources are shared, which is not always ideal if your job has to contend with other processes. However, all of the general usage compute servers can be accessed directly through a regular SSH login.

Our distributed slurm partitions, on the other hand, allow you to claim resources exclusively for up to three days per job; the details are covered on our slurm cluster page.

Please do not run applications such as web browsers or email clients on the compute servers; general-purpose computing should be done on your desktop if you have one, or on an application server if you are connecting remotely. Likewise, compute-intensive processes should not be run on the application servers as they are meant for general-purpose computing.

Although most software on the compute servers should be available as part of the packages provided by the OS vendor, some software which is provided by alternate sources resides in /opt. You should consider setting your PATH or using aliases to make running software in /opt easier; your Point of Contact (PoC) can assist you with this.
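For example, to make a hypothetical tool installed under /opt easier to run, you could add lines like the following to your ~/.bashrc (the directory and program names here are placeholders; the actual paths depend on what is installed):

    # Put the hypothetical tool's bin directory on your PATH
    export PATH="/opt/sometool/bin:$PATH"

    # Or define an alias for a single program instead of changing PATH
    alias sometool="/opt/sometool/bin/sometool"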

If a desired (but not installed) package is available as part of our current Ubuntu distribution, contact your PoC and have them put in a request that it be added to the core software.

For software that is neither available in the OS distribution's package list nor in /opt, you or your PoC can install programs in your home directory or a suitable work partition for use on these servers, as long as the programs do not require Administrator/root privileges to install or run.
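As a rough sketch of what user-level installs can look like (the package name below is a placeholder, not software we provide), Python packages can be installed into your home directory and source builds can be pointed at a prefix you own:

    # Install a Python package under ~/.local rather than system-wide
    python3 -m pip install --user somepackage

    # Build a typical autotools-based source package into your home directory
    ./configure --prefix="$HOME/.local"
    make
    make install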

General Usage Compute Servers

  • comps0 and comps1 each have an AMD Ryzen Threadripper 2990WX (32 cores with two threads per core, for 64 threads total) and 128 GB of RAM.
  • comps2 is a Silicon Mechanics Rackform iServ R331.v4 with two 12-core Intel E5-2697v2 CPUs (one thread per core, for 24 threads total) and 128 GB of RAM.
  • comps3 is an HPE ProLiant DL3855 with two 3 GHz AMD EPYC 7302 CPUs (16 cores and 32 threads per CPU) and 512 GB of DDR4 RAM. Note that this server has the most installed memory of all our current compute servers.

Slurm Compute Cluster

Our slurm deployment is a fairly typical heterogeneous cluster with job scheduling, resource management, and accounting. From a login session on one of the compute servers above, a user submits a job to a particular 'partition' (slurm nomenclature for a group of associated nodes), along with a specification of the resources required (cores, memory, GPUs).
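As an illustrative sketch (the partition name 'general', the resource amounts, and the program name below are hypothetical; the real partitions are listed on the slurm cluster page), a batch script might look like:

    #!/bin/bash
    #SBATCH --partition=general       # which group of nodes to run on
    #SBATCH --cpus-per-task=8         # CPU cores requested
    #SBATCH --mem=32G                 # memory requested
    #SBATCH --gres=gpu:1              # one GPU, if the partition provides them
    #SBATCH --time=2-00:00:00         # wall-clock limit (2 days, within the 3-day cap)

    ./my_analysis                     # the compute-intensive program to run

It would be submitted from one of the compute servers with 'sbatch myjob.sh'.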

The head node schedules the job on whichever node(s) meet those criteria. Resources used are tracked against a user's account (the slurm documentation calls this 'billing'); the general formula is the amount of resources multiplied by the time they are in use, with a weight factor for more powerful hardware. Total resource usage over time is then taken into account when setting scheduling priority relative to other users contending for the same resources.
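Loosely, and assuming per-resource weights w_r (slurm configures these with TRESBillingWeights; the values below are hypothetical), the charge for a single job can be thought of as

    usage = t × Σ_r (w_r × n_r)

where n_r is the amount of resource r held by the job and t is how long it is held. For example, a job holding 8 cores (weight 1.0) and one GPU (weight 8.0) for 2 hours would accrue 2 × (8 × 1.0 + 1 × 8.0) = 32 units.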

The slurm cluster page has more information.