MSI Queues

Publicly available queues at MSI

System	Queue	Nodes	Cores/node	Mem/node (GB)	Max walltime (hours)	Max running jobs
lab	lab (default)	73	8-32	15-128	72	6
	lab-long	?	8	15	150	6
	lab-600	?	8	128	600	1
	oc	4	12	23	72	3
Itasca	batch (default)	1086	8	22	24	2
	devel	32	8	22	2	?
	long	28	8	22	48	?
	jay	1	8	22	24	?
	sb	35	16	64	48	2
	sb128	8	16	128	96	2
	sb256	8	16	256	96	2
Mesabi	small (default)	256	24	64	96	?
	large (default)	360	24	64	24	?
	ram256g	32	24	256	96	?
	ram1t	16	32	1000	96	?
	k40	40	24	128	24	?

Notes:

Jobs cannot request more than 1 node on the lab queue
8 nodes in the lab (default) queue have 128G of memory and 16-32 cores per node; the remaining 65 nodes have 15G of memory and 8 cores per node
Mesabi’s default queue automatically routes jobs requesting 10 or more nodes to the “large” queue and smaller jobs to the “small” queue.

Lab Cluster Queues

The Lab cluster is the oldest and slowest general-purpose cluster at MSI. All of its queues allow only single-node jobs, but each user can run up to six jobs simultaneously (fewer on some queues; see the table above). The hardware is very old and is not considered a high-performance system, so your jobs will run much faster on Itasca or Mesabi. The “isub” command launches interactive jobs on this cluster, which is useful for general-purpose command-line work.

lab (default): This is the main queue on the Lab cluster. Many nodes are available, but most are small. Jobs requesting 15G of memory or less and 8 cores or less will have much shorter wait times in the queue than jobs requesting more than 15G of memory or more than 8 cores
lab-long: Useful if your job needs more than 72 hours of walltime
lab-600: Useful if your job needs more than 150 hours of walltime
oc: A queue for four overclocked nodes, particularly useful for serial (single-core) jobs. Also good for general use, since these nodes are much newer than the other Lab nodes
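
The walltime limits above suggest a simple rule for picking a Lab queue. The following is a minimal sketch rather than an MSI tool: the names `LAB_WALLTIME_LIMITS`, `pick_lab_queue`, and `fits_small_lab_node` are hypothetical, and the limits are copied from the queue table on this page (walltime in hours). The oc queue is left out because it is chosen for its hardware rather than its walltime.

```python
# Walltime limits in hours, copied from the queue table on this page.
# Ordered from the shortest limit to the longest.
LAB_WALLTIME_LIMITS = [("lab", 72), ("lab-long", 150), ("lab-600", 600)]


def pick_lab_queue(walltime_hours):
    """Return the first Lab queue whose walltime limit covers the request.

    Hypothetical helper for illustration; it only encodes the limits listed
    in the table and does not query the scheduler.
    """
    if walltime_hours <= 0:
        raise ValueError("walltime must be positive")
    for queue, limit in LAB_WALLTIME_LIMITS:
        if walltime_hours <= limit:
            return queue
    raise ValueError("no Lab queue allows more than 600 hours of walltime")


def fits_small_lab_node(mem_gb, cores):
    """True if the request fits the small Lab nodes (15G memory, 8 cores)."""
    return mem_gb <= 15 and cores <= 8


print(pick_lab_queue(100))        # lab-long
print(fits_small_lab_node(8, 4))  # True
```

A request that fits the small nodes should see the shorter wait times described for the lab queue.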

Itasca Queues

Itasca is the second-tier general-purpose cluster at MSI, slower than Mesabi but faster than the Lab Cluster.

batch (default): This is the main queue on Itasca. A huge number of nodes are available, but each node is not particularly powerful. Great for jobs that can make use of many nodes, and for general use
devel: This queue is for testing your PBS scripts. It works just like the batch queue, but you are limited to 2 hours of walltime and 32 nodes. The advantage is that jobs on this queue have high priority, so they should start very quickly
long: Useful if your job needs more than 24 hours of walltime
jay: A queue for a special high-performance node with a high-speed internet connection
sb: Sandy Bridge queue (64G memory). The Sandy Bridge nodes are much more powerful than the standard Itasca nodes in the batch queue, but there aren’t very many of them. Great for single-node and smaller multi-node jobs. Larger multi-node jobs (more than 6 nodes) tend to have long wait times in the queue. This queue has about four times as many nodes as either the sb128 or the sb256 queue, so use it unless you need more than 64G of memory
sb128: Sandy Bridge queue (128G memory). Refer to the sb queue
sb256: Sandy Bridge queue (256G memory). Refer to the sb queue
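
Choosing among the Itasca queues mostly comes down to memory per node and walltime. The sketch below is illustrative only: `ITASCA_QUEUES` and `pick_itasca_queue` are hypothetical names, the limits are copied from the queue table on this page, and the preference order (standard nodes first, then the smallest Sandy Bridge queue that fits) is an assumption based on the advice above. The devel and jay queues are omitted because they serve special purposes, and node-count limits are ignored.

```python
# (queue, memory per node in GB, walltime limit in hours), copied from the
# queue table on this page. The order encodes an assumed preference:
# standard nodes first, then the smallest Sandy Bridge queue that fits.
ITASCA_QUEUES = [
    ("batch", 22, 24),
    ("long", 22, 48),
    ("sb", 64, 48),
    ("sb128", 128, 96),
    ("sb256", 256, 96),
]


def pick_itasca_queue(mem_gb, walltime_hours):
    """Return the first queue whose memory and walltime limits fit the job."""
    for queue, mem_limit, walltime_limit in ITASCA_QUEUES:
        if mem_gb <= mem_limit and walltime_hours <= walltime_limit:
            return queue
    raise ValueError("no Itasca queue in the table fits this request")


print(pick_itasca_queue(16, 12))   # batch
print(pick_itasca_queue(16, 36))   # long
print(pick_itasca_queue(100, 36))  # sb128
```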

Mesabi Queues

Mesabi is the newest and fastest general-purpose cluster at MSI, and it also contains some specialized hardware. MSI has not determined how queues will be set up on the new Mesabi system. However, it is likely that the queue structure will be derived from the hardware structure:

small (default): This is the main queue on Mesabi for jobs requesting fewer than 10 nodes. Jobs may request partial nodes (as on the lab queue). A large number of powerful nodes are available. Great for smaller multi-node jobs or for large numbers of small jobs (including single-node, single-core jobs)
large (default): This is the main queue on Mesabi for jobs requesting 10 or more nodes. A large number of powerful nodes are available. Great for jobs that can make use of many nodes
ram256g (mid-mem): Queue for general-purpose 256G memory nodes
ram1t (high-mem): Queue for general-purpose 1T memory nodes. These nodes have 32 cores per node, so they are also good for jobs that scale well across multiple cores, but can’t make use of multiple nodes.
k40 (gpu): Queue for nodes with NVIDIA Tesla GPUs, useful for jobs running software that can use GPU accelerators
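
Since the Mesabi queues follow the hardware, a queue can be suggested from the node count, the memory needed per node, and whether a GPU is required. This is a rough sketch under stated assumptions, not MSI's actual routing logic: `pick_mesabi_queue` is a hypothetical helper, the thresholds are taken from the queue table on this page, and the small/large split mirrors the default-queue routing rule in the notes.

```python
def pick_mesabi_queue(nodes, mem_gb, needs_gpu=False):
    """Suggest a Mesabi queue from node count, memory per node, and GPU need.

    Hypothetical helper; thresholds come from the queue table on this page.
    """
    if nodes < 1 or mem_gb <= 0:
        raise ValueError("nodes and mem_gb must be positive")
    if needs_gpu:
        if mem_gb > 128:
            raise ValueError("k40 nodes have 128G of memory")
        return "k40"
    if mem_gb > 1000:
        raise ValueError("no Mesabi queue in the table offers more than 1000G per node")
    if mem_gb > 256:
        return "ram1t"
    if mem_gb > 64:
        return "ram256g"  # queue name as spelled in the table
    # Standard 64G nodes: mirror the default queue's routing rule.
    return "large" if nodes >= 10 else "small"


print(pick_mesabi_queue(2, 32))                   # small
print(pick_mesabi_queue(16, 32))                  # large
print(pick_mesabi_queue(1, 500))                  # ram1t
print(pick_mesabi_queue(1, 100, needs_gpu=True))  # k40
```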